[bind10-dev] comments on the statistics design
JINMEI Tatuya / 神明達哉
jinmei at isc.org
Fri Jul 27 00:57:55 UTC 2012
I've quickly read the revised version of the statistics design:
http://bind10.isc.org/wiki/StatsModule
Some random comments (absolutely not comprehensive):
- should each module really do validation on the statistics data it's
sending? It doesn't seem to make much sense to me; normally the
sender should know what it's sending is valid (otherwise it's a
bug), and in any case the receiver needs to validate it.
It could even be a non-negligible overhead for a busy module like
b10-auth.
- Regarding how to identify "how many...", I don't think we can always
rely on 'special'. Not all modules would have it. Maybe beyond the
scope of this discussion, but my gut feeling is that we should solve
this type of thing as part of more fundamental restructuring of
inter-module relationships (how to know which modules are running and
how many instances of each, when something stops/starts, etc.).
- what should happen if stats dies and restarts? e.g. Is it okay to
drop all statistics collected by then?
- what should other modules do if stats doesn't send requests or isn't
even running at all? Should they keep maintaining local statistics
anyway? What if the amount of the statistics is huge (like the case
of per-zone statistics with billions of zones)?
- is each module expected to reset its statistics to 0 (if they're
resettable) every time it responds to a request from stats? I
guess so because otherwise the accumulated data at stats won't make
sense, but it's not clear from the document. What if it's not
"resettable" (such as number of RRs of some zone/cache, etc,
assuming we consider it "statistics" and want to maintain it)?
- keeping statistics even if some (instance of) module stops would
generally make sense, but I'm not sure if it's always the case.
If someone completely stops authoritative service once and then
restarts it with a completely new set of config weeks later, it
would be preferable to start from fresh statistics.
- does the following make sense? "If the target module caught an error when
returning, the module should return (1, 'error message'), the
return code is 1 and the error message is included detail of
the error. Error message is text type." If responding to a stat
request fails, it's also pretty likely that sending an error message
fails too.
- related to the previous point, what should the sending module do
about the statistics if sending statistics fails? Keep it until the
next request? Clear it? Like asked above, what if the amount could
be huge?
- I guess we need to revisit the representation of statistics
(counters), especially in terms of the spec. Things like per-RR-type
counters aren't easily represented in this style.
- Now that we're switching to the "request model", I think it also makes
sense to consider more synchronized updates upon user request (via
bindctl or http).
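
To make the reset and send-failure questions above a bit more concrete,
here is a rough Python sketch of one possible behavior. It is only an
illustration under my own assumptions: the class and method names
(ModuleStats, StatsCollector, report, restore, merge) are hypothetical
and do not come from the design document or from existing BIND 10 code.
It assumes resettable counters are sent as deltas and cleared on reply,
non-resettable values (like the number of RRs in a zone) are sent as-is
and overwritten at the receiver, and a module merges deltas back if
sending the reply fails.

```python
class ModuleStats:
    """Hypothetical per-module statistics holder (not BIND 10 code)."""

    def __init__(self):
        self.counters = {}  # resettable values, e.g. queries received
        self.gauges = {}    # non-resettable values, e.g. RRs in a zone

    def inc(self, name, step=1):
        self.counters[name] = self.counters.get(name, 0) + step

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def report(self):
        """Build a reply for stats: hand over deltas and reset them."""
        reply = {"counters": self.counters, "gauges": dict(self.gauges)}
        self.counters = {}  # start counting from zero again
        return reply

    def restore(self, reply):
        """Merge deltas back in if sending the reply failed."""
        for name, delta in reply["counters"].items():
            self.counters[name] = self.counters.get(name, 0) + delta


class StatsCollector:
    """Hypothetical receiver: accumulate deltas, overwrite gauges."""

    def __init__(self):
        self.totals = {}
        self.latest = {}

    def merge(self, reply):
        for name, delta in reply["counters"].items():
            self.totals[name] = self.totals.get(name, 0) + delta
        self.latest.update(reply["gauges"])


if __name__ == "__main__":
    auth = ModuleStats()
    stats = StatsCollector()
    auth.inc("queries", 3)
    auth.set_gauge("zone_rrs", 100)
    stats.merge(auth.report())
    auth.inc("queries", 2)
    auth.set_gauge("zone_rrs", 120)
    stats.merge(auth.report())
    print(stats.totals, stats.latest)
```

Note that this does nothing for the "huge amount of data" or "stats
isn't running" cases; the module still keeps everything locally until
someone asks for it.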
---
JINMEI, Tatuya