RFC 1123 - Load balancing question
Kevin Darcy
kcd at daimlerchrysler.com
Tue Oct 3 00:56:19 UTC 2006
Jon Doe wrote:
> Can someone please help me in understanding the information contained in the
> following section of RFC 1123?
> I'm currently involved in a situation where the other company I'm dealing with is
> quoting RFC1123 with claims that my mailer is not complying
> with this passage and therefore, it's my fault that mail is being delayed in
> reaching them.
>
> RFC 1123
>
> 5.3.4 Reliable Mail Transmission
>
> (1) Multiple MX Records -- If there are
> multiple destinations with the same preference and there
> is no clear reason to favor one (e.g., by address
> preference), then the sender-SMTP SHOULD pick one at
> random to spread the load across multiple mail exchanges
> for a specific organization;
>
> The short story is that we send over 15,000 messages to one company, and
> during peak times we see those messages delayed by over 4 hours (I don't
> see delays to any other domain). They have 6 MX records and during these
> peak times, my SMTP logs show attempts to connect, but there are always
> delays. Even when I telnet to port 25, it takes up to 20 seconds to even get
> the banner screen. The other guy says their logs show most of my
> mail sender's traffic attempting just 2 of the 6 MX records they have, and
> therefore not "spreading the load".
>
> So my question is, how should this passage in the RFC be interpreted? What
> does the RFC mean by "spread the load"? My SMTP sender simply finds the MX
> server, connects and keeps sending to that one server until the TTL expires.
>
Why are you using the words "server", singular, and "TTL", singular? If
what the guy is saying is true, there are 6 MX records with TTLs
(plural), collectively pointing to 6 servers (plural). You should be
trying those 6 servers in preference-value order, as per the algorithm
outlined in the RFC.

Note that, unless you are running a broken *DNS* implementation, you'll
never have differing TTLs for the RRs of a given RRset (see RFC 2181),
so you'll never have a "partial" RRset being returned by your DNS
subsystem. You'll always see all 6 records (or possibly none at all if
your DNS subsystem is having trouble resolving the query, but that would
be an exception case and cause for message-delivery deferral-and-retry).
None of the records of the MX set will expire before any of the others.

If the MX records in question (why didn't you tell us what MX name
you're talking about, by the way?) have exactly 2 records with the
lowest (i.e. best) preference value, and those 2 records point to the 2
mail servers that you're using most of the time, then the behavior
observed is perfectly normal and expected. The only reason for going to
the higher-preference-valued (i.e. less preferred) MXes would be if you
failed to connect to *either* of those 2 "preferred" targets. This may
be cause for you to review your connection-timeout settings, but it
doesn't point to any non-compliance problem of your software or
configuration with respect to the DNS provisions of any RFC. On the
other hand, if your MTA is preferring those 2 targets without proper
regard for preference values, then the guy is perfectly correct: your
MTA software and/or its configuration would appear to be broken.
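
To make the fallback behavior concrete, here is a minimal sketch of the "try the best preference first, move on only when the connection fails" logic. It is not taken from any particular MTA: the `(preference, hostname)` tuple shape and the `try_connect` callable are assumptions made purely for illustration, and tie-breaking among equal preference values is deliberately left out.

```python
def deliver(records, try_connect):
    """Return the first MX host that accepts a connection, or None.

    records     -- list of (preference, hostname) tuples (hypothetical shape)
    try_connect -- hypothetical callable(hostname) -> bool, True on success
    """
    # Lowest preference value means most preferred, so sort ascending.
    for _, host in sorted(records, key=lambda rec: rec[0]):
        if try_connect(host):
            return host
    # Every target failed: the message should be deferred and retried later.
    return None
```

With two targets at the best preference value, a sender following this logic would only ever reach the others when both of those two are unreachable, which matches the "normal and expected" case described above.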
You could argue, of course, if more than just those 2 records are at the
most "preferred" value, e.g. if all 6 of the records are at the *same*
preference value, that, as provided in the RFC, you have a "clear reason
to favor" those 2 addresses over the others at the same preference
value. But how could that argument prevail, when favoring those 2
addresses is, by your own admission/observation, causing 4-hour delays?
If you really do have such a "preference" configured, then, again, I
would say your configuration is broken.
> Also, what does the RFC mean by "random"?
random [ran-duhm]
–adjective
1. proceeding, made, or occurring without definite aim, reason, or
pattern: the random selection of numbers.
Do you need more definitions?
> When I do an nslookup on my DNS
> server, I get a list of MX records, and it's this list that my mail sender
> uses in determining where to send the mail.
It may be the same list, but the order in which the MX records are
listed is, in itself, meaningless. Mail-sending software is required to
parse the preference values of those records and work through the list
in *preference-value order*. Only when multiple records share the *same*
preference value must you randomize, unless, as mentioned, there is a
"clear reason to favor" some records over others. The RFCs speak quite
plainly here.
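
As a rough illustration of that ordering rule, the sketch below sorts MX records by preference value and shuffles only within each group of equal preference. The function name `order_mx`, the `(preference, hostname)` input shape, and the injectable `rng` parameter are all assumptions for the example, not anything prescribed by the RFCs.

```python
import random
from itertools import groupby


def order_mx(records, rng=random):
    """Return MX hostnames in the order a sender might try them.

    records -- iterable of (preference, hostname) tuples (hypothetical shape)
    rng     -- object with a shuffle() method; defaults to the random module
    """
    ordered = []
    by_pref = sorted(records, key=lambda rec: rec[0])
    for _, group in groupby(by_pref, key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # randomize only among equal-preference targets
        ordered.extend(hosts)
    return ordered
```

If all 6 records carried the same preference value, each delivery attempt would start from a randomly chosen one of the 6, which is what "spread the load" amounts to in practice.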
> By the way, I found a similar
> passage in RFC 2821 section 5.
>
That would be the controlling document, actually. The part that you
elided from the end of your quote from RFC 1123 explained that the
preceding text was a refinement of the algorithm from RFC 974; since RFC
974 was obsoleted by RFC 2821, this "refinement" has been obsoleted as
well. Not that it really matters, since the text is pretty much
identical anyway...
- Kevin