BIND 10 #170: document how stats are collected (via spec files)
BIND 10 Development
do-not-reply at isc.org
Fri Jun 4 11:00:53 UTC 2010
#170: document how stats are collected (via spec files)
------------------------+---------------------------------------------------
Reporter: larissas | Owner: naokikambe
Type: task | Status: reviewing
Priority: major | Milestone: y2 6 month milestone
Component: statistics | Resolution:
Keywords: | Sensitive: 0
------------------------+---------------------------------------------------
Comment(by naokikambe):
First, I apologize for my poor English. This is my reply to
Fujiwara's comments. His original comments are on the bind10-dev list:
https://lists.isc.org/pipermail/bind10-dev/2010-June/000982.html
> |+------ Sending modules -----+
> || +------+ +------+ +------+ |
> || | Boss | | Auth | | etc. | | <- *1
> || +------+ +------+ +------+ |
> |+-----^--------^--------^----+
> | | | |
> | +--[CC protocol]--+
> | | <- *2
> | v
> | +--------------+
> | | Stats | <- *3
> | +--------------+
> | | <- *4
> | v
> | +-----------------+
> | | Cmd-Ctrld | <- *5
> | +-----------------+
> |
> |*1 Modules except Boss and Auth, which send stats data to stats module,
> | is not supported in initial version of stats module
>
> No, it has nothing to do with the statistics specification.
For example, statistics from other modules, such as xfrin or xfrout,
may be required in a near-future release. However, only the Boss and
Auth modules are expected to be supported in the initial version.
>
> |== Procedure of stats module ==
> |=== Basic procedure ===
> | * Initial process:
> | 0. Boss starts stats daemon and other modules.
> | * Main process in loop:
>
> First, "statistics module" contacts config manager.
>
> Configuration changes and commands from bindctl are come from config
> manager.
That's right. I have already described this in a later section of the
document.
>
> | 1. Stats starts to subscribe in stats channel.
> | 1. Other modules send stats data to stats module periodically.
> | 1. Stats module collects data and then aggregates it.
> | 1. When print_stats command is invoked via bindctl, stats daemon
> | reports formatted statistics data via bindctl.
> | * Final process:
> | X. When Boss is shutting down, stats module and other modules are
> | killed.
>
> |== Collecting items ==
> |Stats module collects following items from Boss and Auth.
> | * In general (for both modules)
> | * version -- A version number of this stats data definition
>
> version is not necessary in the protocol because the version number
> will be written in *.spec configuration file.
This item may be optional rather than always required. I don't know
yet whether it is necessary. However, it may be needed to validate the
format if the definition of the format changes in a future release. If
the format version differs between the sender and the receiver, the
receiver may drop the data.
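A minimal sketch of the version check described above, assuming the
"version" item lives in the "General" section as in the message format
quoted later in this mail. The function name and the SUPPORTED_VERSION
constant are hypothetical and only illustrate the idea:

```python
# Hypothetical constant: the format version this receiver understands.
SUPPORTED_VERSION = "1.0"


def accept(message, supported=SUPPORTED_VERSION):
    """Return the stats data if its format version matches, else None.

    Returning None stands for "the receiver drops the data".
    """
    stats_data = message.get("stats_data", {})
    general = stats_data.get("General", {})
    if general.get("version") != supported:
        return None
    return stats_data
```

A message without a "version" item is also dropped in this sketch.
Whether that is the right behaviour is still an open question.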
> | * module -- A module name which sends the stats data
> | * process_id -- A process id of the module
>
> process_id is not used in another part of BIND 10.
> So, local name defined in msgq may be better.
I don't think process_id is a very useful item, but it is simple
information to provide for the initial statistics features.
> | * processes -- A number of processes of the same module, if
> | multiple processes of the module are running
>
> then, "process_id" may be a list of processes.
"process_id" is not a list. It is the numeric process ID of a single
existing process.
> | * send_time -- Milli-seconds of current time since epoch time
> | (1970-01-01T00:00:00Z)
>
> why milli second?
> I prefer unixtime + microsecond (struct timeval) format.
I adopted JSON Schema for the definition of the statistics data in
BIND 10. It is described in the Internet-Draft
http://tools.ietf.org/html/draft-zyp-json-schema . "utc-millisec" is a
format described in that draft.
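To illustrate the "utc-millisec" idea (milliseconds since
1970-01-01T00:00:00Z), here is a small sketch using only the Python
standard library. The helper names are hypothetical; they are not part
of BIND 10:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_millisec(dt):
    """Convert a timezone-aware datetime to whole milliseconds since epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_utc_millisec(ms):
    """Convert milliseconds since epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)
```

Integer timedelta arithmetic is used here instead of floating-point
timestamps so that the round trip is exact.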
>
> What are "T" and "Z" characters?
> Text printable format is hard to parse.
"YYYY-MM-DDThh:mm:ssZ" is the ISO 8601 format. "T" is the delimiter
between the date and the time, and "Z" means UTC. Because it is a
standard format, it should not be hard to parse.
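As a sketch of how such a timestamp could be parsed and produced with
the Python standard library (the function names are hypothetical):

```python
from datetime import datetime, timezone

# Matches "YYYY-MM-DDThh:mm:ssZ", e.g. "2010-05-13T05:19:43Z".
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(text):
    """Parse an ISO 8601 UTC timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)


def format_time(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime(ISO_FORMAT)
```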
>
> | * sequence -- A sequence number which must be unique and consistent
> | in the sending module
>
> Is this necessary?
I think it is necessary. If the sequence of data from the sender is
wrong, for example because of packet loss, the receiver can detect
that the data is wrong.
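A rough sketch of such a check on the receiver side. It assumes that
each sender increases "sequence" by exactly one per message, which the
document does not state (it only says the number must be unique and
consistent), and the class name is hypothetical:

```python
class SequenceChecker:
    """Track the last sequence number seen from each sender."""

    def __init__(self):
        self._last = {}  # sender key -> last sequence number seen

    def check(self, sender, sequence):
        """Return True if this is the expected next number for the sender.

        The first message from a sender is always accepted. Assumption:
        sequence numbers increase by one per message.
        """
        last = self._last.get(sender)
        self._last[sender] = sequence
        return last is None or sequence == last + 1
```

The sender key could be, for example, the pair of "module" and
"process_id", so that two Auth processes are tracked separately.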
>
> | * For Boss module
> | * boot_time -- A date time when BIND 10 starts up, format is
> | YYYY-MM-DDTHH:MM:SSZ
> | * For Auth module
> | * queries_in [[BR]]
> | * tcp -- A number of query counts per a process which Auth servers receives in
> | TCP since it sends last
> | * udp -- A number of query counts per a process which Auth servers receives in
> | UDP since it sends last
>
> CC module has good counters.
The current version of the CC module doesn't seem to have counters.
But if it does, they would be good items for statistics.
> |== Reporting items ==
> |Stats module reports following items via bindctl.
>
> This format will be generated by parsing each *.spec file.
Could the spec file for the stats module include a template for the
output format of the stats data?
>
> | * Local name -- A localname, which is returned from msgq module in CC protocol
>
> Local name is assigned for each process.
>
> | * Boot time -- A date time when BIND 10 process starts
> | * Reported time -- A date time when stats module reports
> | * Process id -- Process ids of all related modules
> | * Incoming Queries (TCP) -- A calculated query counts by stats module
> | * Incoming Queries (UDP) -- A calculated query counts by stats module
> |
> |This is an example of output image via bindctl.
> |{{{
> | ++ BIND 10 Statistics Report ++
> | Local name: 4bea7903_4 at host
> | Boot time: 2010-05-13T05:19:43Z
> | Report time: 2010-05-13T05:44:41Z
> | Process id(Boss): 777
> | Process id(Auth): 888
> | Process id(Stats): 999
> | Incoming Queries (TCP): 8888
> | Incoming Queries (UDP): 9999
> | ++ BIND 10 Statistics Report ++
> |}}}
>
> The output format requires another knowledge.
> The statistics module only knows input data format.
> Or we must define output data format definition.
>
> My idea was:
>
> * BIND 10 Statistics report
> Report time: ...
>
> bind10.LocalName: xxxx at localhost
> bind10.BootTime: ...
>
> auth.LocalName: xxx
> auth.queries.tcp: ...
> auth.queries.udp: ...
This is also a good idea. I think the output format via bindctl should
be human-friendly, easy for an administrator to change, and should
require little extra knowledge. We must decide on the best format for
it. The property names printed via bindctl may need to be defined
somewhere.
>
> |== Available commands in bindctl ==
> |Two commands via bindctl are available in initial version of stats
> |module.
> | * "print_stats" command:
> | Stats module aggregates current numbers and prints the list of
> | them by using formatted text.
>
> print_stats command may have module name arguments.
> print_status without arguments show all statistics.
I think this command needs no arguments in the initial version of the
stats module, because the output data should be summarized by the
stats module. However, somebody may want to see the stats data of a
specific module for debugging or similar purposes.
>
> | * "print_clear" command:
>
> typo. It is "clear_statistics".
Yes, the "print_clear" command name makes no sense. I'll change it to
"clear_stats".
>
> | Stats module resets query counts to zero. If this command is
> | invoked, then at first 'Are you sure?' prompt to confirm it.
>
> The command may also have module name arguments.
This command clears only the two query counters. It doesn't require a
module name argument in the initial version.
>
> |== Backend DB for stats module ==
> |'''(TBD)'''
> |A specific DB, like sqlite3 or Berkeley DB, is not used in stats
> |module in initial version. It's assumed that stats module keeps
> |aggregated data in memory.
>
> It is not defined, I think.
It is simple. The stats module in the initial version requires no
specific backend DB. It may store data only in Python variables. It
may depend on the Python runtime.
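As a sketch of what "only in Python variables" could look like for the
two query counters, assuming each Auth message carries the counts
received since its previous message (as described under "Collecting
items"). The class and method names are hypothetical:

```python
class QueryCounters:
    """Keep aggregated query counts in memory, with no backend DB."""

    def __init__(self):
        self._totals = {"tcp": 0, "udp": 0}

    def add(self, queries_in):
        """Add the per-message counts from an Auth "queries_in" item."""
        for proto in self._totals:
            self._totals[proto] += queries_in.get(proto, 0)

    def totals(self):
        """Return a copy of the current totals, e.g. for print_stats."""
        return dict(self._totals)

    def clear(self):
        """Reset both counters to zero, e.g. for clear_stats."""
        for proto in self._totals:
            self._totals[proto] = 0
```

With this approach the totals are lost whenever the stats process
stops, which seems acceptable for the initial version.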
>
> |== Message format ==
> |'''Message format from Boss module to Stats:'''
> |{{{
> |#!js
> |{
> | "stats_data":
> | {
> | "General":
> | {
> | "version": "1.0",
>
> It is not necessary.
>
> | "module": "Auth",
>
> The parameter may be included in the data itself.
That's my mistake. "Boss" is correct. This item is required because
the stats module doesn't otherwise know which module sent the data.
>
> | "process_id": 777,
>
> The localname in the envelope is sufficient.
>
> | "send_time": "2010-05-13T05:40:41Z",
>
> It may be included in the data itself.
I think the time when the stats data leaves the module may be
required, because only the sending module knows it.
>
> | "sequence": 2345,
>
> It is not necessary.
>
> | },
> | "Boss":
>
> Is it a mistake? Is it "Auth:"?
> But It contains module name.
This is correct. This property name is the same as the module name
above. The stats module reads the "module" item above and then follows
it to this section. Otherwise, it doesn't know which module the stats
data comes from. Without this item, it is too difficult to express the
data schema of the stats data, and therefore too difficult for the
stats module to validate incoming data.
> Then "General" section is not necessary.
Yes, the "General" section may be dropped and its items moved to the
upper level.
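The lookup I have in mind, where the stats module reads the "module"
item and then follows the section with the same name, could be
sketched like this. It assumes the message layout with the "General"
section as originally proposed; the function name is hypothetical:

```python
def extract_module_stats(message):
    """Return (module name, module-specific stats) from a stats message.

    Raises ValueError if the section named by "module" is missing,
    so that malformed data can be rejected by the receiver.
    """
    stats_data = message["stats_data"]
    module = stats_data["General"]["module"]
    if module not in stats_data:
        raise ValueError("no section for module %r" % module)
    return module, stats_data[module]
```

If the "General" items are moved to the upper level, only the line
that reads the module name would need to change.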
>
> |'''Message format from Auth module to Stats:'''
> |{{{
> |#!js
> |{
> | "stats_data":
> | {
> | "General":
> | {
> | "version": "1.0",
> | "module": "Auth",
> | "process_id": 888,
> | "processes": 2,
> | "send_time": "2010-05-13T05:40:41Z",
> | "sequence": 2345,
> | },
> | "Auth":
> | {
> | "queries_in":
> | {
> | "tcp" : 123,
> | "udp": 4567
> | }
> | }
> | }
> |}
> |}}}
>
> To simplify the format, I propose a new data format.
> Add a "timestamp" as a unixtime in the outermost.
> From or Local name is obtained from envelope.
> Module name is "Auth" which exists in the data.
I think at least the items "module", "process_id", "send_time", and
"sequence" may be required. I gave the reasons above.
>
> {
> "timestamp": unixtime,
> | "Auth":
> | {
> | "queries_in":
> | {
> | "tcp" : 123,
> | "udp": 4567
> | }
> | }
> }
>
> |== Data schema ==
> |A schema which defines above massage formats, filename of which is configured in
> |spec file for stats module.[[BR]]
> |'''stats_data_schema.spec:'''
> |{{{
> |#!js
> |{
> | "stats_data":
> | {
> | "description": "A schema for BIND 10 stats data definitions \
> | using JSON schema syntax (http://json-schema.org/)",
> | "type": "object",
> ~~~~~~~~
> I prefer "dict".
"object" is a type defined in JSON Schema. See the Internet-Draft
above.
> The "stats_data" schema is defined for each module.
> I prefer it will be written in *.spec file.
>
> I prefer new spec file format will be:
> { "module_spec": { ... },
> "commands": { ... },
> "stats_spec": { ... }
> }
>
> The "stats_spec" format should be compatible with "module_spec" and
> "commands" format.
I'm afraid that the spec file will become very long if it also
contains all the stats data definitions. It also depends on the
implementation of the config manager, which may need to parse this new
item. Besides, if the spec file for each module contains the stats
data schema, it may become very verbose, because the modules share the
schema and the same schema would be repeated in each module's spec
file.
--
Ticket URL: <http://bind10.isc.org/ticket/170#comment:6>
BIND 10 Development <http://bind10.isc.org>