BIND 10 #2244: remove ddns component, but boss still keeps trying to start it

BIND 10 Development do-not-reply at isc.org
Mon Oct 8 12:26:15 UTC 2012


#2244: remove ddns component, but boss still keeps trying to start it
-------------------------------------------+-----------------------------------------
                   Reporter:  jreed        |                 Owner:  jinmei
                       Type:  defect       |                Status:  reviewing
                   Priority:  medium       |             Milestone:  Sprint-20121009
                  Component:  Boss of BIND |            Resolution:
                   Keywords:               |             Sensitive:  0
            Defect Severity:  N/A          |           Sub-Project:  Core
Feature Depending on Ticket:               |  Estimated Difficulty:  6
        Add Hours to Ticket:  0            |           Total Hours:  0
                  Internal?:  0            |
-------------------------------------------+-----------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei


Comment:

 Hello,

 Did you try running the system tests? They fail reliably for me, like
 this:

 {{{
 cd tests/system; \
 sh /home/vorner/work/bind10/tests/system/runall.sh
 S:bindctl:Mon Oct  8 14:13:03 CEST 2012
 T:bindctl:1:A
 A:System test bindctl
 Using SQLite3 database file
 /home/vorner/work/bind10/tests/system/bindctl/nsx1/zone.sqlite3
 Zone name is .
 Loading file "/home/vorner/work/bind10/tests/system/bindctl//nsx1/root.db"
 4 RR(s) loaded in 0.35 second(s) (100.00% of
 /home/vorner/work/bind10/tests/system/bindctl//nsx1/root.db)
 Done.
 I:starting server nsx1
 I:Checking b10-auth is disabled by default (0)
 I:Starting b10-auth and checking that it works (1)
 I:Checking BIND 10 statistics after a pause (2)
 I:Stopping b10-auth and checking that (3)
 I:Restarting b10-auth and checking that (4)
 I:failed
 I:Rechecking BIND 10 statistics after a pause (5)
 I:failed
 I:Changing the data source from sqlite3 to in-memory (6)
 I:failed
 I:Rechecking BIND 10 statistics after changing the datasource (7)
 I:failed
 I:Starting more b10-auths and checking that (8)
 I:failed
 I:Rechecking BIND 10 statistics consistency after a pause (9)
 I:failed
 I:Stopping extra b10-auths and checking that (10)
 I:failed
 I:exit status: 1
 R:FAIL
 E:bindctl:Mon Oct  8 14:14:52 CEST 2012
 S:glue:Mon Oct  8 14:14:52 CEST 2012
 T:glue:1:A
 A:System test glue
 Using SQLite3 database file ./nsx1/zone.sqlite3
 Zone name is .
 Loading file "./nsx1/root.db"
 16 RR(s) loaded in 0.32 second(s) (100.00% of ./nsx1/root.db)
 Done.
 Using SQLite3 database file ./nsx1/zone.sqlite3
 Zone name is root-servers.nil.
 Loading file "./nsx1/root-servers.nil.db"
 5 RR(s) loaded in 0.19 second(s) (100.00% of ./nsx1/root-servers.nil.db)
 Done.
 Using SQLite3 database file ./nsx1/zone.sqlite3
 Zone name is com.
 Loading file "./nsx1/com.db"
 8 RR(s) loaded in 0.18 second(s) (100.00% of ./nsx1/com.db)
 Done.
 Using SQLite3 database file ./nsx1/zone.sqlite3
 Zone name is net.
 Loading file "./nsx1/net.db"
 7 RR(s) loaded in 0.17 second(s) (100.00% of ./nsx1/net.db)
 Done.
 I:starting server nsx1
 I:testing that a TLD referral gets a full glue set from the root zone (0)
 I:testing that we don't find out-of-zone glue (1)
 I:exit status: 0
 R:PASS
 E:glue:Mon Oct  8 14:14:57 CEST 2012
 S:ixfr/in-2:Mon Oct  8 14:14:57 CEST 2012
 T:ixfr/in-2:1:A
 A:System test ixfr/in-2
 Using SQLite3 database file
 /home/vorner/work/bind10/tests/system/ixfr/zone.sqlite3
 Zone name is example.
 Loading file "/home/vorner/work/bind10/tests/system/ixfr/db.example.n6"
 10 RR(s) loaded in 0.31 second(s) (100.00% of
 /home/vorner/work/bind10/tests/system/ixfr/db.example.n6)
 Done.
 /home/vorner/work/bind9/bin/tests/system/testsock.pl: bind(10.53.0.1,
 53210): Address already in use
 I:Couldn't bind to socket (yet)
 /home/vorner/work/bind9/bin/tests/system/testsock.pl: bind(10.53.0.1,
 53210): Address already in use
 I:Couldn't bind to socket (yet)
 /home/vorner/work/bind9/bin/tests/system/testsock.pl: bind(10.53.0.1,
 53210): Address already in use
 I:Couldn't bind to socket (yet)
 /home/vorner/work/bind9/bin/tests/system/testsock.pl: bind(10.53.0.1,
 53210): Address already in use
 I:Couldn't bind to socket (yet)
 /home/vorner/work/bind9/bin/tests/system/testsock.pl: bind(10.53.0.1,
 53210): Address already in use
 /home/vorner/work/bind10/tests/system/start.pl: could not bind to server
 addresses, still running?
 I:server sockets not available
 R:FAIL
 Can't open perl script
 "/home/vorner/work/bind10/tests/system/ixfr/stop.pl": No such file or
 directory
 }}}

 I think the testsock.pl errors are caused by the earlier failed tests,
 which leave processes running; I had to kill bind10 manually. The failures
 could be related to this change, since the first failing test is the one
 about restarting. Many lettuce tests are also failing, with tracebacks like:

 {{{
  Given I have bind10 running with configuration
 xfrin/retransfer_master.conf with cmdctl port 47804 as master #
 features/terrain/bind10_control.py:107
     Traceback (most recent call last):
       File "/home/vorner/.local/lib64/python2.7/site-
 packages/lettuce/core.py", line 117, in __call__
         ret = self.function(self.step, *args, **kw)
       File
 "/home/vorner/work/bind10/tests/lettuce/features/terrain/bind10_control.py",
 line 120, in have_bind10_running
         step.given(start_step)
       File "/home/vorner/.local/lib64/python2.7/site-
 packages/lettuce/core.py", line 326, in given
         return self.behave_as(string)
       File "/home/vorner/.local/lib64/python2.7/site-
 packages/lettuce/core.py", line 366, in behave_as
         assert not steps_failed, steps_failed[0].why.exception
     AssertionError: Got: 2012-10-08 14:23:30.925 FATAL [b10-boss.boss]
 BIND10_STARTUP_ERROR error during startup: b10-msgq already running, or
 socket file not cleaned , cannot start
 }}}
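
 To tell leftover processes apart from a real regression, it may help to
 check that the test port is free before rerunning. The following is only a
 minimal sketch, not part of the test suite: `port_is_free` is a
 hypothetical helper name, and it assumes a plain TCP bind is a good enough
 probe (testsock.pl may check more than that).

```python
import errno
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Hypothetical helper for illustration; not part of the BIND 10 tests.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        # "Address already in use": something still holds the port.
        if exc.errno == errno.EADDRINUSE:
            return False
        # Any other error (e.g. the address is not configured) is re-raised.
        raise
    finally:
        sock.close()
    return True
```

 With the address and port from the log above, the check would be
 `port_is_free("10.53.0.1", 53210)`. If it returns False before any test
 has started, a previous run most likely left a process behind.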


 And then there are the minor things. The log description is missing an "m"
 at the end of the first line ("syste" should be "system"):
 {{{
 The boss module simply skipped restarting that module, and the whole syste
 went back to the expected state (except that the crash itself is likely
 }}}

 The method name `is_failed` reads wrong in English. I know it was chosen
 for consistency, but `has_failed` or `is_restarting` would look more
 correct.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2244#comment:7>
BIND 10 Development <http://bind10.isc.org>