BIND 10 #547: Update of statistics daemon for HTTP/XML reporting: implement

Fri Apr 8 10:08:55 UTC 2011

#547: Update of statistics daemon for HTTP/XML reporting: implement
-------------------------------------+-------------------------------------
                 Reporter:           |                Owner:  naokikambe
  naokikambe                         |               Status:  reviewing
                     Type:           |            Milestone:
  enhancement                        |  Sprint-20110419
                 Priority:  major    |           Resolution:
                Component:           |            Sensitive:  0
  statistics                         |  Add Hours to Ticket:  0
                 Keywords:           |          Total Hours:  0
Estimated Number of Hours:  40.0     |
                Billable?:  1        |
                Internal?:  0        |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => naokikambe

Comment:

 Hello

 Replying to [comment:9 naokikambe]:
 > Thank you very much for so deeply reviewing and for giving so many
 comments!  And I'm very sorry for confusing reviewers a lot. So I would
 like to suggest you this. I'll close this ticket and split it into
 following 4 tickets.
 > And I'll delete the existing branches on the repository (trac547,
 trac547-summarized) and recreate new branch and complete history by each
 ticket. How about that? Can it make sense for you? I'll do that if it is
 okay for you. Please let me know.

 No need to. I already went trough the code, so splitting it up now is no
 longer needed, it would only waste your time.

 > >  * Every process/module has a separate directory in the bin/. Would
 the xml stats server deserve it's own directory as well?
 > Because I think the stats daemon and the HTTP server are grouped in the
 same category according to the picture in [[DesignDiagrams]], and they
 almost share the same libraries and the same stub codes, so I put them
 into the same directory. If it is incorrect, I will separate the
 directory.

 Well, with these things, there's no correct or incorrect, it's just how we
 feel about it. It surprised me, so I asked ‒ if it is on purpose, then
 it's OK.

 > >  * Do we expect users to want to know the internal how it works? Like
 how it is started by boss, etc. Maybe the man page could be less detailed
 in that sense, and just say that it presents the statistics from stats
 module trough XML/HTTP?
 > Yes, the users needs to know details but I don't exactly know what we
 should add in the man page. What about this? Please show me more
 suggestion if you have.

 Why do they need to know?

 I mean, if I was a user, I would care about if it runs or not and how to
 ask it for my web page with statistics. Where or how it gets the data,
 it's none of my problems. So, why would any other user want to care what
 is the internal format that goes from one place in the BIND10 system to
 another? I fear that the man page is too long and confusing users about
 too much detail.

 This information is, of course, valuable as developer documentation. But
 man page is for the end user.

 > >  * The man page talks about parameters when it talks on which
 addresses it listens. IMO that's little bit misleading, user might want to
 put it into the command line after reading it. Maybe call them options or
 configuration points so it is clear they are not command line arguments?
 > OK. What about adding the following description into OPTIONS section in
 the man page?
 > {{{
 > The verbose mode is only supported in the command-line option of
 > b10-stats-httpd. You can set the parameters like addresses and ports in
 HTTP
 > into the specification file (stats-httpd.spec). Please see
 "CONFIGURATION AND
 > COMMANDS" section below for details.
 > }}}

 But that's not true. No user edits the .spec file, it's configured trough
 bind control.

 > >  * The {{{BASE_LOCATION = BASE_LOCATION.replace("${datarootdir}",
 DATAROOTDIR).replace("${prefix}", PREFIX)}}} is little bit suspicious. Can
 the `BASE_LOCATION` really contain strings like '${prefix}'? Where do they
 come from?
 > Yes. In my environment, BASE_LOCATION is "${datarootdir}/bind10-devel",
 DATAROOTDIR is "${prefix}/share" and PREFIX is "/home/kambe/bind10", then
 ${datarootdir} in BASE_LOCATION is replaced with DATAROOTDIR and ${prefix}
 in DATAROOTDIR is replaced with "/home/kambe/bind10". That is,
 BASE_LOCATION contains ${datarootdir} and ${datarootdir} contains
 ${prefix}. Is that correct?

 Hmm, I see, it does so even here. OK, we have a strange build system.

 > >  * You seem to use {{{dict(key=value)}}} consistently trough the code.
 Is there any special reason about that? I think the use of
 {{{{'key':value}}}} is more common in the python world.
 > I don't have so special reason. The way of dict method is more intuitive
 for me because we don't need to bracket the key name with quotations. If
 it is the bad manner, I will correct through the all codes.

 I don't know, it probably isn't bad. Again, I was just surprised to see
 it, because I wouldn't write it that way myself, but it's OK if you like
 it this way more.

 > >  * Why is this limitation in place?:
 > >    {{{#!python
 > >    # The size of template file should be under 100k. TODO: should be
 > >    # considered later.
 > >    MAX_SIZE_OF_TEMPLATE = 102400
 > >    }}}
 > >    Allowing arbitrary size should be no problem, since the file is
 local one and the only one able to change it is the administrator. Why
 stop him? Reading the whole file by single call without the limitation
 should be no problem in python (the file object should have something like
 {{{readAll()}}} or similar).
 > I don't have so specific reason for it. But I cannot expect the
 situation which so huge file is read-in. For the unexpected situation, I
 set such limitation. In almost cases an administrator dosen't need to
 change this limitation. Do we need to remove it?

 Well, the limitation probably will not be hit, but there's no technical
 reason for this exact number. The code and hardware can surely handle much
 more. So it only makes the code little bit more complicated and there's no
 benefit for the user (it can, in theory, make him unhappy actually), so I
 don't see much reason to have the limitation there.

 > >  * If you want to raise the original exception:
 > >    {{{#!python
 > >    except select.error as err:
 > >        if err.args[0] == errno.EINTR:
 > >        (rfd, wfd, xfd) = ([], [], [])
 > >    else:
 > >        raise select.error(err)
 > >    }}}
 > >    You might want to use just raise without any parameters. That
 raises the exact instance of exception as it caught in the except,
 preserving all other information (like errno, which your way doesn't) and
 the backtrace is more helpful (it doesn't show the note about exception
 being raised while handling another exception).
 > OK. I will replace with another exception instead of the original
 exception(select.error).

 I'm not sure if we understand each other here. Was the original intention
 to catch the sellect.error only in the case it is EINTR, but let it fall
 when it's different errno? If so, then you don't need to replace any
 exception, the correct code would be:
 {{{#!python
     except select.error as err:
         if err.args[0] == errno.EINTR:
         (rfd, wfd, xfd) = ([], [], [])
     else:
         raise
 }}}

 Just `raise`, without any parameter. It will just re-raise the original,
 unmodified, exception.

 > >  * You use select around the httpd object and when a socket is
 readable, you call it. But, does it use non-blocking sockets? If there's a
 client that sends `GET url` and gets stuck before sending the rest of
 headers, will our program get stuck as well, trying to read until the end
 of the headers?
 > The HTTP server can handle only one request at a time. If the server is
 trying to read, it will be timed out and the client will be reset later. I
 think this interface is not so busy like a DNS server, because I think
 this feature is for administration use only. Is that correct?

 If you are aware of it and think it's not so big problem to complicate the
 code, then OK. But it would deserve a FIXME comment in the code at last,
 to note that the issue is known.

 > >  * You seem to use regexps in the tests when you really think
 substrings:
 > >  {{{#!python
 > >  self.assertRegexpMatches(handler.response.body,
 stats_httpd.XSD_NAMESPACE)
 > >  self.assertRegexpMatches(handler.response.body,
 stats_httpd.XSD_URL_PATH)
 > >  }}}
 > >  The paths aren't valid regexps, they are just string. Furthermore, if
 they contain regexp metacharacters, the test would break.
 > That's right. I'll use find method in string object for it. What about
 this?
 > {{{#!python
 > self.assertTrue(handler.response.body.find(stats_httpd.XSD_NAMESPACE)>0)
 > self.assertTrue(handler.response.body.find(stats_httpd.XSD_URL_PATH)>0)
 > }}}

 Yes, that seems better.

 > No, but I'll remove these assertions as I mentioned above.
 > >  * In `test_config`, you call it with various configs and check that
 it doesn't crash and it returns the correct answer. But shouldn't there be
 a check it did anything „internal“, like it sets any value inside?
 > OK. I will add tests to check internal values. What about this?
 > {{{#!python
 >   self.assertEqual(
 >       self.stats_httpd.config_handler(
 >                   dict(listen_on=[dict(address="::2",port=8000)])),
 >       isc.config.ccsession.create_answer(0))
 >   self.assertTrue(self.stats_httpd.config in "listen_on")
 >   self.assertTrue(self.stats_httpd.config["listen_on"] in "address")
 >   self.assertTrue(self.stats_httpd.config["listen_on"] in "port")
 >   self.assertTrue(self.stats_httpd.config["listen_on"]["address"] ==
 "::2")
 >   self.assertTrue(self.stats_httpd.config["listen_on"]["port"] == 8000)
 > }}}

 I don't really remember the exact internal layout, but yes, I mean
 something like this.

 > >  * What is tested in `test_no_buildpath`? How can the test fail, if
 there's no self.assert* call? The same with `test_osenv` (probably older
 code, but I saw it in the diff as well).
 > There aren't so specific reason. I intended to let it go through the
 following code in stats_http.py. It will fail if it has errors in syntax.
 Do we need to remove `test_no_buildpath`?

 No need to remove it, but a comment about it would help, it looks like
 part of the code might be missing or something.

 > >  * The get_value seems suspicious. It returns either some spec file
 default value or default value of a type. But is it able to return any
 actual configured value? Or why isn't it needed?
 > No, it is not able to return any actual value. The purpose of this
 get_value is only for getting the default value in the spec file, because
 this method is called only when stats_http is loading the spec file at
 first.  But the behavior of this method is unlike to the behavior of the
 original method in the original class. So what about switching the name of
 it to `get_default_value`?

 OK, seems reasonable.

 > >  * The mock select has some hidden state which can make it quite
 surprising, if someone tried to reuse it in some other tests or wrote
 another test here that used it. Shouldn't the variable be exported and
 explicitly set in the test using it?
 > I think these mock classes (select.py, socket.py, etc) are not reusable
 for other tests, the codes should be more coded if they need to be, and I
 think these tasks are out of scope of this ticket. Should they be
 reusable?

 They don't really need to be reusable, but again, a comment with warning
 about it might prevent someone from trying to reuse it and failing.

 OK to anything I left out from direct answer.

 Thanks

-- 
Ticket URL: <http://bind10.isc.org/ticket/547#comment:10>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development