BIND 10 #775: b10-auth should not exit if it cannot bind to ports

Mon Apr 18 09:31:59 UTC 2011

#775: b10-auth should not exit if it cannot bind to ports
-------------------------------------+-------------------------------------
                 Reporter:  shane    |                Owner:  hanfeng
                     Type:  defect   |               Status:  reviewing
                 Priority:  critical |            Milestone:  Sprint-20110419
                Component:  b10-auth |           Resolution:
                 Keywords:           |            Sensitive:  0
Estimated Number of Hours:  0.0      |  Add Hours to Ticket:  0
                Billable?:  1        |          Total Hours:  0
                Internal?:  0        |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => hanfeng

Comment:

 Hello

 Replying to [comment:9 hanfeng]:
 > Replying to [comment:8 vorner]:
 > > So, in short, the throw must stay there. On the first startup, if it
 > > throws, we might want to catch it and not exit the whole program, at
 > > least as a short-term workaround, before we make it possible to
 > > configure things even when they are not running.
 > I don't agree on this point. If we restart every time port binding
 > fails, it will keep the boss quite busy. During our last test we saw the
 > server launch and quit several times. The auth server is neither alive
 > nor dead but keeps jumping between the two states, which is quite
 > terrible: you don't even get a chance to modify the configuration by
 > hand.

 Well, for one, it wouldn't get that busy: one restart every 10 seconds
 isn't busy (but it can be annoying, of course).

 Anyway, the abort is not responsible for the jumping. If the process is
 starting up and it can't bind to the ports, the rollback is to the empty
 set of addresses, so the second exception cannot happen, and therefore
 the abort can't happen either. What did kill the process was the first
 exception, which you now catch.

 The effect of the abort is this: if the user changes the configuration at
 runtime and the change fails, the server tries to return to the original
 addresses. If that fails as well (which it should not, in reality),
 there's some serious problem. So in that case, it aborted, making the
 boss restart it (with the old config). That should work, because it
 already worked some time before. But if that fails as well, it rolls back
 to the empty set of sockets, you catch the exception, and it sits there.
 So it would jump only once, and only in a really improbable situation.
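
 To make the sequence above concrete, here is a minimal Python sketch of
 the rollback logic as I described it. It is not the actual b10-auth code:
 apply_addresses, bind_all, BindError and RollbackFailed are hypothetical
 names, and RollbackFailed only stands in for the abort.

```python
class BindError(Exception):
    """Hypothetical: a listening address could not be bound."""


class RollbackFailed(Exception):
    """Hypothetical stand-in for the abort after a failed rollback."""


def apply_addresses(bind_all, current, requested):
    """Switch from `current` to `requested`; return the addresses in effect.

    `bind_all(addresses)` is an assumed callback that releases whatever is
    bound, binds `addresses` instead, and raises BindError on failure.
    """
    try:
        bind_all(requested)
        return requested
    except BindError:
        try:
            # Roll back to the set that was in effect before the change.
            bind_all(current)
        except BindError:
            # Even the old set no longer binds: give up and let the
            # supervisor (the boss) restart the process.
            raise RollbackFailed(current)
        # The rollback worked; re-raise the original failure to the caller.
        raise
```

 At startup `current` is the empty list, so the rollback has nothing to
 bind and only the first exception can reach the caller.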

 But, having explained the situation, I don't really care much about it;
 it's rare. In the long term, we should rewrite the changing of sockets so
 that the old ones are released only after the new ones are successfully
 bound, so we wouldn't have to care about it (e.g. the rollback couldn't
 throw).
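
 A rough Python sketch of that long-term idea, under the assumption of a
 hypothetical open_socket(address) helper that returns a bound socket or
 raises. It deliberately ignores the case where a new address is also in
 the old set; a real implementation would have to keep such sockets
 instead of binding them a second time.

```python
def replace_sockets(open_socket, old_sockets, new_addresses):
    """Bind every new address first; close the old sockets only on success."""
    new_sockets = []
    try:
        for address in new_addresses:
            new_sockets.append(open_socket(address))
    except Exception:
        # Undo the partial work. The old sockets were never touched, so
        # there is no rollback step that could fail to rebind them.
        for sock in new_sockets:
            sock.close()
        raise
    # Every new address is bound; only now release the old sockets.
    for sock in old_sockets:
        sock.close()
    return new_sockets
```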

 > As for the last throw in port config, I have restored it.

 ACK. I don't really like the catch-all there, or the fact that the
 server would be sitting there in a completely useless way (and, in fact,
 in a somewhat inconsistent configuration). But given the current problem
 with configuration, it's probably the lesser evil. So, could you add a
 comment around it saying that it's a temporary solution and should be
 removed once we are able to configure modules while they are not running?
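
 Something along these lines is the kind of comment I mean, shown as a
 Python sketch rather than the real code. start_listening and
 configure_ports are hypothetical names, and the logger name is made up
 for the example.

```python
import logging

logger = logging.getLogger("b10-auth-sketch")  # hypothetical logger name


def start_listening(configure_ports):
    """Run the initial port configuration without exiting on failure.

    `configure_ports` is an assumed callable that raises if binding fails.
    Returns True when the ports were bound and False otherwise.
    """
    try:
        configure_ports()
        return True
    except Exception as error:
        # TEMPORARY workaround: remove this catch-all once modules can be
        # configured while they are not running. Until then, swallowing
        # the error keeps the process alive so that it can be reconfigured
        # instead of being restarted over and over.
        logger.error("could not bind to the configured ports: %s", error)
        return False
```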

 And maybe it should have a changelog entry.

 Otherwise, it is OK.

 Thanks

-- 
Ticket URL: <https://bind10.isc.org/ticket/775#comment:11>
BIND 10 Development <http://bind10.isc.org>