[bind10-dev] comments on the statistics design

Fri Jul 27 05:06:09 UTC 2012

Hello,

Thank you for the comments. I will answer as many of them as I can.

> - should each module really do validation on the statistics data it's
>   sending?  It doesn't seem to make much sense to me; normally the
>   sender should know what it's sending is valid (otherwise it's a
>   bug), and in any case the receiver needs to validate it.
>   It could even be a non-negligible overhead for a busy module like
>   b10-auth.

I understand the overhead. Also, the stats module can log a message if
the data are invalid. I will revise the description of that obligation
of the target module, e.g. the target module may validate its
statistics data with a function such as validate_statistics if it
wants to.
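As a rough illustration of receiver-side validation, here is a minimal
Python sketch. The spec layout (item name mapped to a Python type), the
signature of validate_statistics and the helper receive_statistics are
hypothetical; only the name validate_statistics comes from this
discussion. A target module could call the same function on its own
data, but the stats module validates and logs regardless.

```python
import logging

logger = logging.getLogger("stats")

# Hypothetical spec: statistics item name -> expected Python type.
AUTH_SPEC = {
    "queries.tcp": int,
    "queries.udp": int,
}


def validate_statistics(spec, data):
    """Return a list of problems found in `data`; an empty list means valid.

    Hypothetical signature: only the function name comes from the design
    discussion.
    """
    errors = []
    for name, value in data.items():
        if name not in spec:
            errors.append("unknown item: %s" % name)
        elif type(value) is not spec[name]:
            errors.append("bad type for %s: %r" % (name, value))
    return errors


def receive_statistics(module, spec, data, store):
    """Stats-module side: validate every report, log and drop invalid ones."""
    errors = validate_statistics(spec, data)
    if errors:
        for err in errors:
            logger.warning("invalid statistics from %s: %s", module, err)
        return False
    # Valid data: keep a copy of the latest values per module.
    store.setdefault(module, {}).update(data)
    return True


store = {}
receive_statistics("Auth", AUTH_SPEC, {"queries.udp": 5}, store)
receive_statistics("Auth", AUTH_SPEC, {"queries.udp": "five"}, store)
```

The second report is logged and dropped, so the stored value stays 5.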

> - Regarding how to identify "how many...", I don't think we can always
>   rely on 'special'.  Not all modules would have it.  Maybe beyond the
>   scope of this discussion, but my gut feeling is that we should solve
>   this type of thing as part of more fundamental restructuring of
>   inter-module relationships (how to know which module and how many
>   instances of it is running, when something stops/starts, etc)

Yes, but I couldn't think of a better way for the stats module to
count instances. Does anyone know a better way?

> - what should happen if stats dies and restarts?  e.g. Is it okay to
>   drop all statistics collected by then?

In this design, the stats module drops all collected statistics when
it restarts. I think it would not be so difficult for the stats module
to store them in a permanent file or some database. However, if the
whole data structure changes while the stats module is down, such a
file or DB might no longer make sense.

> - what should other modules do if stats doesn't send requests or isn't
>   even running at all?  Should they keep maintaining local statistics
>   anyway?  What if the amount of the statistics is huge (like the case
>   of per-zone statistics with billions of zones)?

IMO the target module doesn't have to care whether the stats module is
running. If the 'getstats' command is invoked, the target module
should reply to it. However, I think we should consider the memory
usage of the auth module somewhere, for the case where it keeps a huge
amount of statistics data. E.g. to prevent the memory usage from
growing, the target module could discard statistics data once it has
sent them.

> - is each module expected to reset their statistics to 0 (if that's
>   resettable) every time it responds to a request from the stats?  I
>   guess so because otherwise the accumulated data at stats won't make
>   sense, but it's not clear from the document.  What if it's not
>   "resettable" (such as number of RRs of some zone/cache, etc,
>   assuming we consider it "statistics" and want to maintain it)?

To be honest, I think the stats module just keeps a copy of the
statistics data of each module, so the target module does not need to
reset its statistics. If the target module says "the new value is
100", the stats module replaces the old value with the new value. That
said, the stats module might not make much sense if an administrator
can invoke the 'getstats' command of each module directly via bindctl.
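A minimal sketch of the "keep a copy" behaviour on the stats side,
assuming each module reports its current cumulative values. The class
name StatsCopy and its methods are hypothetical.

```python
class StatsCopy:
    """Stats-side store keeping a copy of each module's latest statistics.

    Hypothetical sketch: modules report current cumulative values and the
    stats module overwrites, rather than adds to, what it already holds.
    """

    def __init__(self):
        self._data = {}

    def update(self, module, reported):
        # Replace old values: "the new value is 100" means 100.
        self._data.setdefault(module, {}).update(reported)

    def get(self, module, item):
        return self._data.get(module, {}).get(item)


stats = StatsCopy()
stats.update("Auth", {"queries.udp": 40})
stats.update("Auth", {"queries.udp": 100})
stats.update("Auth", {"queries.udp": 100})  # repeated reply: no double count
print(stats.get("Auth", "queries.udp"))  # 100
```

Because values are overwritten rather than added, a repeated reply with
the same values does not double-count, and the target module never has
to reset its counters.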

> - keeping statistics even if some (instance of) module stops would
>   generally make sense, but I'm not sure if it's always the case.
>   If someone completely stops authoritative service once and then
>   restarts it with a completely new set of config weeks later, it
>   would rather be desirable if we start from fresh statistics.

In that case, the administrator might have to restart the stats module
too, so that the old statistics data are discarded.

> - does that make sense? "If the target module caught an error when
>   returning, the module should return (1, 'error message'), the
>   return code is 1 and the error message is included detail of
>   the error. Error message is text type."  If responding to a stat
>   request fails, it's also pretty likely that sending an error message
>   fails too.

If the target module can still reply via the CC session but cannot
return the statistics data, it can include the reason in the error
response. I think the case where sending the error response also fails
is very rare, so we might not have to consider it.
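The reply could be built as in the following sketch. handle_getstats
and the collect callable are hypothetical, and treating code 0 as
success is an assumption; only the (1, 'error message') form comes from
the design document.

```python
def handle_getstats(collect):
    """Build the reply to a 'getstats' command as a (code, value) pair.

    Hypothetical: `collect` returns the module's statistics as a dict.
    Assumption: code 0 means success; code 1 carries a text error message.
    """
    try:
        return (0, collect())
    except Exception as exc:
        # Report the reason as text instead of letting the handler crash.
        return (1, "failed to collect statistics: %s" % exc)


def broken_collect():
    raise RuntimeError("counter table unavailable")


print(handle_getstats(lambda: {"queries.udp": 3}))
print(handle_getstats(broken_collect))
```

This only helps while the CC session itself works; if it does not,
neither form of reply reaches the stats module.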

> - related to the previous point, what should the sending module do
>   about the statistics if sending statistics fails?  Keep it until the
>   next request?  Clear it?  Like asked above, what if the amount could
>   be huge?

In this document, the module should keep the statistics data that it
failed to send, and reply with them again when the stats module asks
next time.
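If the target module takes the "discard once sent" approach to bound
its memory usage, keeping unsent data could be sketched like this.
PendingStatistics and the send callable (assumed to raise on failure)
are hypothetical. Note that with this approach the stats module would
have to add the received values to what it already holds rather than
replace them.

```python
class PendingStatistics:
    """Module-side counters that are cleared only after a successful send.

    Hypothetical sketch: `send` is any callable that raises on failure.
    """

    def __init__(self):
        self._counters = {}

    def increment(self, item, n=1):
        self._counters[item] = self._counters.get(item, 0) + n

    def pending(self):
        return dict(self._counters)

    def reply(self, send):
        snapshot = dict(self._counters)
        try:
            send(snapshot)
        except Exception:
            # Sending failed: keep everything for the next request.
            return False
        # Sending succeeded: discard only what was actually sent.
        for item, value in snapshot.items():
            remaining = self._counters[item] - value
            if remaining:
                self._counters[item] = remaining
            else:
                del self._counters[item]
        return True


def failing_send(data):
    raise ConnectionError("stats is not running")


stats = PendingStatistics()
stats.increment("queries.udp", 3)
stats.reply(failing_send)      # fails, counters are kept
sent = []
stats.reply(sent.append)       # succeeds, counters are discarded
```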

> - I guess we need to revisit the representation of statistic
>   (counters), especially in terms of the spec.   Things like per
>   RR-type counter aren't easily represented in this style.

I think the JSON spec format is basically not friendly to representing
statistics items that vary, such as per-RR-type counters. :(
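For reference, per-RR-type counters could be carried as a nested map
keyed by RR type, as in this hypothetical Python sketch. The layout and
the count_rrtype helper are assumptions; the point is that the set of
keys is not known in advance, so it cannot be declared as a fixed list
of items.

```python
import json

# Hypothetical layout: the keys under "rrtype" are not fixed in advance.
stats = {
    "Auth": {
        "queries.udp": 120,
        "rrtype": {"A": 80, "AAAA": 30, "MX": 10},
    }
}


def count_rrtype(stats, rrtype):
    # A new RR type simply appears as a new key when first counted.
    counters = stats["Auth"].setdefault("rrtype", {})
    counters[rrtype] = counters.get(rrtype, 0) + 1


count_rrtype(stats, "TXT")
print(json.dumps(stats, sort_keys=True))
```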

> - Now that we switch to the "request model", I think it also makes
>   sense to consider more synchronized update upon user request (via
>   bindctl or http).

Do you mean that a new command is needed for the stats module to
collect statistics data immediately, e.g. a "collect-immediate"
command? If so, that makes sense to me. However, if an administrator
sets 'poll-interval' to 1, statistics data will be refreshed every
second anyway. I didn't add such a command in order to prevent a high
load on the system.

With the multi-process model and the inter-process communication
protocol, I find it difficult to design statistics data collection and
to keep the collected data consistent. :(

Thanks,

Naoki Kambe