BIND 10 #640: cfgmgr hanging and command and configurations for no-longer-running components
BIND 10 Development
do-not-reply at isc.org
Wed Feb 8 16:26:23 UTC 2012
#640: cfgmgr hanging and command and configurations for no-longer-running
components
-------------------------------------+-------------------------------------
Reporter: jreed | Owner: jinmei
Type: | Status: reviewing
defect | Milestone:
Priority: major | Sprint-20120221
Component: | Resolution:
Unclassified | Sensitive: 0
Keywords: | Sub-Project: DNS
Defect Severity: N/A | Estimated Difficulty: 0.0
Feature Depending on Ticket: | Total Hours: 0
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by jelte):
* owner: jelte => jinmei
Comment:
Answering both your comment fields as one, in an attempt to cope with the
potentially confusing non-threaded trac comment view :)
Replying to [comment:10 jinmei]:
> First, comments on the rest of the original branch (before you updated
> it based on the initial comments).
>
> '''some higher level points'''
>
> The relationship between bindctl and module CC session is complicated
> and difficult to understand. For example, it took some time for me to
> understand the interaction of UIModuleCCSession and bindctl - these
> two (defined in separate modules) are tightly coupled and work based
> on some implicit assumption of each other's features. So, to
> understand changes to the module CC session I needed to look into
> bindctl. Is it only me, or is this a known issue and kind of a longer
> term TODO? In any case, however, I'm not requesting it be resolved
> within this ticket.
>
It isn't an explicit TODO so far, but it grew this way partly because of
the way bindctl was originally written and how it changed over time. The
idea should be that bindctl is just one tool that can use 'the API'
(UIModuleCCSession) to control everything (hence a lot of functionality
does not reside in bindctl itself), but I readily admit that it may have
gotten quite muddled over time. Also see next point.
> Maybe unrelated to this ticket but: does bindctl try to get the full
> latest spec and config data of all running modules on accepting every
> command? If so, is the overhead acceptable?
>
Yes, it (currently) does; IIRC this got put in by Likun when he made
bindctl a stateless http server, and we found that on the one hand you
want to modify things locally (i.e. you need to be able change multiple
settings and submit them as one), but do want things like tab-completion,
for which you need at least the full specs and config. Rather than
inventing a 'given your previous state, these are the changes' protocol he
opted for a full fetch of the remote data on every command, since that was
easiest at the time. As long as it's run locally and we don't have
kilobytes of data to send, I think the overhead is acceptable. There are a
few easy targets for improvements; we could consider having a very simple
'anything changed?' request/response prior to requesting all data, and we
could make the actual data one big request (instead of a small one for
each module). But I must say I have not yet though much about doing it
completely differently (we could for instance go the other way completely
and make it more like AJAX websites; putting all interactive knowledge on
the server side in a session and making bindctl a much more basic tool).
> When the application doesn't die gracefully, sending the stop message
> can be skipped. Is it a TODO thing when it supports some kind of
> RAII? For that matter, the whatever successor of msgq may make it
> possible for cfgmgr to (easily) detect it when a specific module is
> disconnected. If so, we may not have to have a "send stopping"
> primitive, either directly or indirectly.
>
Yes, this is what I meant by making ModuleCCSession a Context; the way to
do RAII in python is different from C++, since the often implicit
reference counting messes up much of the reliability of __del__(), so what
you do is make your objects implement the Context interface; give it a
method __enter__() and __exit()__, and use your object with the with-
statement. So what I had in mind would look something like
{{{
with ModuleCCSession(args) as mccs:
main_proccessing_loop()
}}}
But since this requires some fundamental changes in nearly all modules, I
deemed it out of scope for this ticket itself, but have not made a new
ticket yet (want to work out a bit of an approach for that first).
> '''ccsession.py'''
>
> I've come up with additional comments on this file while reading other
> part of the branch.
>
> - request_specifications: it would be helpful if it had more detailed
> description, including it clears the previous data. It would also
> be helpful how this method (via update_specs_and_config()) is
> expected to be used - and, for that matter, whether
> request_specification() is expected to be public or essentially
> private (in the latter case adding `__` will clarify the intent a
> bit clearer, although this will make some update to the test code).
> - request_specifications: not directly related to this ticket, but I
> suspect we can remove or at least revise the comment.
>
updated the docs a bit (and moved the update function to below the two it
calls). In short, the update one is a convenience function, since most of
the time you need them both. There are exceptions (mainly in the tests, at
this moment, but if we change the way we do the updates to something more
efficient, this may or may not be the case for bindctl too).
> '''ccsession_mock.py'''
>
> - I'd like to have some pydoc description for MockModuleCCSession.
>
ok
> '''cfgmgr_test.py'''
>
> - This part is a bit confusing:
> {{{#!python
> # drop the two messages for now
> self.assertEqual(len(self.fake_session.message_queue), 2)
> self.fake_session.get_message("Cmdctl", None)
> self.fake_session.get_message("TestModule", None)
>
> self.assertEqual(len(self.fake_session.message_queue), 0)
> }}}
> It's probably for clearing the faked message queue for the
> convenience of subsequent tests, but this part itself raises some
> unrelated questions: what are these two messages? Where do the
> magic string of "Cmdctl" or "TestModule" come from? Maybe the
> answer can be found if I read the other tests above this part (I
> didn't), but unless these preceding tests are prerequisite of the
> added tests, I'd rather be free from the duty of understanding it.
> Also related, test_handle_msg() now looks too long and is generally
> difficult to understand when something is added. I'd suggest
> separating the message handling tests into independent sub tests. I
> suspect the handling of "stopping" can be a separate test case, and
> if so, it will much easier to understand.
>
Refactored the entire test, split it into four separate ones (and redid a
bit of the internals)
> '''cmdctl.py.in'''
>
> - While the comment says: "If it is None, delete the module if it is
> known (and ignore otherwise)", it seems that None will be set in the
> "otherwise" case. And, is it okay to ignore it anyway? If this is
> a bogus input I think we should rather return an error, log it, or
> both. Or, if it's an okay value, the comment should be updated.
>
i originally opted for lenience, but returning an error is probably a good
idea (though i'm not sure what we should do with that error). Anyway the
code itself was wrong to put in that none, so added a test for that as
well, and it now sends back an error
> '''cmdctl_test.py'''
>
> - I think it should also test the case when arg[1] is NULL and args[0]
> is not in modules_spec. See also the comment on cmdctl.py.
>
ah, yes, just did :)
> '''bind10_src.py'''
>
> - Is "rest of the system" correct (or appropriate)?
> {{{#!python
> # If ccsession is still there, inform rest of the system this
module
> # is stopping.
> }}}
> I guess it's only delivered to cfgmgr and cmdctl (and perhaps
> bindctl via cmdctl).
I used 'system' here because the fact that this message goes to the
ConfigManager channel is kind of an implementation detail for
ModuleCCSession (and if we want to be pedantic, this is just a channel
that cfgmgr happens to listen on, its name being the same as the module
name by convention).
> - Is it possible that self.ccs is None?
> {{{#!python
> if self.ccs is not None:
> self.ccs.send_stopping()
> }}}
>
hmm no maybe i was overly conservative there, removed the check
> '''ddns_test.py'''
>
> - Why doesn't this use `MockModuleCCSession`?
>
these tests use much more details about the CCSession than the rest, and
tbh i figured if we're gonna seriously refactor those parts of the tests
anyway, we could better do it then (for starters, mockmoduleccsession
would need to be a subclass of configdata for these tests, and so it needs
specfiles for input, etc.). So I chose to defer that part, and left the
MyCCSession in here.
> '''Others'''
>
> - stats and xfrout don't seem to be tested.
i knew i missed some :p
Or actually just one; in the first commit after the review I've added the
b10-stats test.
Xfrout went into a separate commit; the main Xfrout code wasn't tested at
all, and there is a problem with the interaction between
XfroutServer.shutdown() and threads in unit tests. I've modified the
shutdown code to only wait for other threads when the was a
UnixSocketServer, but simply go on and not join() them if there wasn't.
I'm not entirely sure whether this is correct, so i've put it in a
separate commit for now.
(next points will go into a third commit)
Replying to [comment:11 jinmei]:
> Replying to [comment:9 jelte]:
>
> > Oh, sorry for not being clear; that ConfigManager timeout was actually
closely related; the timeout values for communication from bindctl to
configmanager and from configmanager to any other (in this case non-
running) module are the same; so the 'configmanager is not responding' was
not because configmanager was hanging, but because it was simply waiting
for a timeout itself. So this would address the root cause, but not that
specific problem (we probably need to make cmdctl->cfgmgr timeouts a bit
higher than the rest).
>
> Okay, I actually suspected it was not hanging, just a long timeout.
> What about the traceback?
>
I could not reproduce it at this time, and if it is an existing bug I
consider it a separate problem.
> > > - logging itself could cause an exception, so the destructor could
> > > still throw. However, this is not the only point that has this
> > > problem, so maybe we can leave it open for now...
> > >
> >
> > hmm, imo we should either make logging calls exception-free or add
exception-free ones, since it is usually about the only thing you can do
in a destructor.
>
> Yeah, I made a ticket before, but it didn't seem to be very easy.
>
ok, added a catchall there, but we should probably 'try again' :)
> > > - In our convention we'd explicitly add 'virtual' (for clarity) to
the
> > > destructor declaration.
> >
> > actually, it's even needed for technical reasons for classes that are
subclassed, but this one isn't supposed to be subclassed afaik I realize
we can't stop people from doing it, but personally i tend to look at the
virtualness of the destructor to see if it should be subclassed in the
first place. Of course, one could argue that it would be better to add it
when it's not necessary than to not have it when it is, so if you insist
i'll make it virtual.
>
> I meant this is because it has a base class and the destructor is
> derived from it, not because this class could be derived from others.
> I thought it was in the coding guideline, but I couldn't find it...
>
oh, ok. Can't find it either, but do recall some discussion about this.
OK, made it virtual.
> > > '''cfgmgr.py'''
> > >
> > > - if `_handle_module_stopping` is private, should it better be
> > > prefixed with `__`? Same for `_send_module_spec_to_cmdctl`.
> >
> > as you can see from the rest of the code here, i tend to only use the
one when I don't think the class/method name mangling is necessary. Should
we start doing this as a principle? (and if so, should we update all
'private' methods?)
>
> I think it's a good practice, but not a strong opinion. In Python
> it's not fully enforcible anyway.
>
ok, changed, there were a number of tests that actually used them, but
they were easily converted to use the general public method
> > > - Not directly related to this ticket, but the if-else in handle_msg
> > > is getting too big to understand/manage. I guess it's time to
> > > consider refactoring, maybe in some OO way.
> >
> > depends on how; as it is now, it is indeed getting long, but is
essentially a switch. I'm open to proposals, but some 'smart' solutions
might actually be more complicated :p
>
> It's probably related to extract common initial patterns of Python
> apps. In any case it'd be beyond the scope of this ticket.
>
noted
> > > '''ccsession_test.py'''
> > >
> > > - test_stop: not necessarily a problem, but it seems the same
pattern
> > > repeats in multiple tests. It would be better to unify the common
> > > sequence of tests for conciseness.
> >
> > Yes, actually, more generally, this is partly why I was looking at the
callcounter thingy a few nights ago; [...]
>
> I'm not sure if we are talking about the same thing...what I meant is
> something rather simple: repeating something like this:
> {{{#!python
> fake_session = FakeModuleCCSession()
> self.assertFalse("Spec1" in fake_session.subscriptions)
> mccs = self.create_session("spec1.spec", None, None,
fake_session)
> self.assertTrue("Spec1" in fake_session.subscriptions)
> }}}
> but anyway, I don't request it be solved within this ticket. The
> branch is already quite large, so it's better to complete it only with
> focused and relevant changes.
>
ah, right, noted again :)
> One other comment on the revised diff:
>
> '''(lettuce)bind10_control.py'''
>
> - run_bindctl: I'd set the default for cmdctl_port to None to avoid
> duplicate hardcoding of 47805.
>
ack, done
--
Ticket URL: <http://bind10.isc.org/ticket/640#comment:14>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list