Unusual TSIG problem

Wed Dec 8 21:42:21 UTC 2010

I just ran into an odd issue with a TSIG-signed zone transfer.

On occasion I was logging a "clocks are unsynchronized" message while
doing a transfer from a customer server at a site about 30 ms away. I
dropped a note to the manager there asking that he check his system for
a time issue. He checked and found no problems.

Today I looked at the problem more closely. I realized that the problem
was NOT a clock sync issue. The two clocks were probably within a
millisecond of one another. I found the following in the log:
Dec  8 06:26:18 ns1 named[67170]: zone XXXXXX.gov/IN: notify from 123.234.1.1#33372: refresh in progress, refresh check queued
Dec  8 06:31:18 ns1 named[67170]: transfer of 'XXXXXX.gov/IN' from 123.234.1.1#53: failed while receiving responses: clocks are unsynchronized
Dec  8 06:31:18 ns1 named[67170]: transfer of 'XXXXXX.gov/IN' from 123.234.1.1#53: Transfer completed: 1 messages, 397 records, 59674 bytes, 898.462 secs (66 bytes/sec)

The transfer, probably due to a hardware problem, was taking over 5
minutes to transfer the zone, and RFC 2845 suggests that the difference
between clocks should be limited to 300 seconds (5 minutes). This really
means that, should the transfer take over 5 minutes, you get the
unsynced clocks error. (4.5.2. TIME check and error handling)
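
The check in question can be sketched roughly as follows. This is only an
illustration of the time-window comparison described in RFC 2845, not
BIND's actual code; the function name, the constant name, and the example
timestamp are all made up for the sketch.

```python
# Illustrative sketch only; the names here are hypothetical, not BIND's.
FUDGE_SECONDS = 300  # the fudge value RFC 2845 recommends


def tsig_time_ok(time_signed: int, now: int, fudge: int = FUDGE_SECONDS) -> bool:
    """Return True if `now` falls within time_signed +/- fudge (seconds)."""
    return abs(now - time_signed) <= fudge


# Both clocks are assumed to be perfectly synchronized here; the only
# variable is how long after signing the message gets checked.
signed_at = 1_291_789_578  # arbitrary example timestamp
print(tsig_time_ok(signed_at, signed_at + 2))    # quick transfer
print(tsig_time_ok(signed_at, signed_at + 301))  # slow transfer
```

The first call prints True and the second prints False, so a message
checked more than 300 seconds after it was signed fails the comparison
even though neither clock is wrong.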

Clearly, something is broken when a zone transfer takes over 5
minutes. (This one SHOULD take about 2-3 seconds.) But the message
certainly pointed in the wrong direction. Is there more appropriate
language that might indicate that it could also be an effective time-out
because the transfer took too long? Maybe "failed while receiving
responses: clocks are unsynchronized or maximum transfer time exceeded"?
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman at es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751