BIND 9.7 Serial Number Decrease Problem

Fri Jun 10 16:01:08 UTC 2011

On 07/06/11 13:51, I wrote:
> I now have this situation on one Solaris 10 slave; the problem
> probably also exists on the other Sol 10 slave and the two
> Ubuntu hardy slaves:
>
>The _tcp zone on the master MS DNS Server:
>
>      1238 600 86400 3600
>
>The _tcp zone on the BIND 9.7.3-P1 Solaris 10 server disk:
>
>      1239       ; serial
>      900        ; refresh (15 minutes)
>      600        ; retry (10 minutes)
>      86400      ; expire (1 day)
>      3600       ; minimum (1 hour)
>
>The _udp zone on the master MS DNS Server:
>
>      842 900 600 86400 3600
>
>The _udp zone on the BIND 9.7.3-P1 Solaris 10 server disk:
>
>      843        ; serial
>      900        ; refresh (15 minutes)
>      600        ; retry (10 minutes)
>      86400      ; expire (1 day)
>      3600       ; minimum (1 hour)
>
>Note that the zone serial number for both zones on the master is
>one LESS than the serial number on the slave.  The last messages
>in /var/adm/messages are
>
>      _tcp:
>      Jun  4 07:46:57 serial number (1238) received from master ... < ours (1239)
>      Jun  4 07:47:35 zone ... expired
>      Jun  4 07:47:35 zone ... transfer started
>      Jun  4 07:47:35 zone ... transferred serial 1238
>      Jun  4 07:47:35 zone ... Transfer completed: ...
>
>      _udp:
>      Jun  4 07:39:22 serial number (842) received from master ... < ours (843)
>      Jun  4 07:42:22 zone ... expired
>      Jun  4 07:42:22 zone ... transfer started
>      Jun  4 07:42:22 zone ... transferred serial 842
>      Jun  4 07:42:22 zone ... Transfer completed
>
>There was a zone serial number mismatch, each zone expired three days
>ago, and new zones were transferred from the master.  But the zone
>files on disk still have the higher serial numbers.  There are no .jnl
>files on the disk.  A "dig" on the server for the zone serial numbers
>shows the lower numbers, so BIND has those correct serial numbers.  I
>assume that if I stopped BIND (rndc stop) and restarted it, then I
>would again see the serial number mismatches.  I can try this during
>the day, as this server is not heavily used.  Is there any debugging I
>need to run?  Thanks.

I ran a test this morning on one of the Solaris 10 slave servers.
A query to the server showed serial numbers:

      _tcp   1238
      _udp    842

Both of these match the zone on the MS Windows DNS Server.
I checked the zone files on the slave server:

      _tcp   1239
      _udp    843

Both of these are increased by one from what BIND returns in
response to a query.
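
The comparison above (serial served by BIND vs. serial in the zone
file on disk) can be sketched in a few lines of Python.  This is only
an illustration, not something I ran on the servers.  It assumes the
served serial comes from one line of "dig +short SOA" output, where
the serial is the third field, and that the zone file carries the
"; serial" comment shown in the dumps above.  The names ns.example.
and admin.example. and both function names are made up.

```python
import re

def serial_from_dig_short(soa_line: str) -> int:
    """Return the serial from one line of `dig +short SOA` output.

    Field order: mname rname serial refresh retry expire minimum.
    """
    return int(soa_line.split()[2])

def serial_from_zone_text(zone_text: str) -> int:
    """Return the serial from zone file text that has a '; serial' comment."""
    match = re.search(r"(\d+)\s*;\s*serial", zone_text)
    if match is None:
        raise ValueError("no '; serial' comment found")
    return int(match.group(1))

# Hypothetical SOA line; only the numeric fields mirror the _udp values above.
served = serial_from_dig_short(
    "ns.example. admin.example. 842 900 600 86400 3600"
)
on_disk = serial_from_zone_text("""
      843        ; serial
      900        ; refresh (15 minutes)
""")
print(served, on_disk, on_disk - served)   # prints: 842 843 1
```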

The two zones have NO .jnl files.

I then ran:

      ./rndc stop
      <<Wait for the "exiting" message.>>
      /etc/init.d/named.anl start;tail -f /var/adm/messages

Once BIND started, the serial numbers it returned were INCREASED
(matching the zone files on disk), as I expected they would be,
given the lack of .jnl files.

And a few minutes later BIND complained about the serial
number on the master being less than that on the slave
for both zones.  I consider this a bug in BIND 9.
What further diagnostics do I need to get?
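
For reference, the "master serial < ours" complaint is consistent
with the serial number comparison defined in RFC 1982.  Below is a
minimal Python sketch of that comparison for 32-bit SOA serials.  It
is my own illustration, not BIND's code, and the function name
serial_lt is made up.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # serials wrap at 2**32
HALF = 1 << (SERIAL_BITS - 1)   # 2**31

def serial_lt(s1: int, s2: int) -> bool:
    """True if s1 < s2 under RFC 1982 serial number arithmetic.

    When the two serials are exactly 2**31 apart the comparison is
    undefined in the RFC; this sketch returns False in both directions.
    """
    return s1 != s2 and ((s2 - s1) % MOD) < HALF

# Master serial vs. serial loaded from the slave's zone file after restart.
print(serial_lt(1238, 1239))    # _tcp: True, so master is "behind"
print(serial_lt(842, 843))      # _udp: True
print(serial_lt(4294967295, 0)) # wraparound: True
```

With the values above the master always compares as older, so the
slave has no reason to refresh from it until the zone expires.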

I have another Solaris 10 slave on which, I assume, I can
reproduce this.  From past experience, within one day, once
the zones have expired and been transferred again, I will be
back in the same state on this slave.
----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
Building 240, Room 5.B.8             Internet: BSFinkel at anl.gov
Argonne, IL   60439-4828             IBMMAIL:  I1004994