approach on parsing the query-log file

Wed Apr 29 02:35:14 UTC 2009

After feedback and running some tests today I've found that the most
"cost-effective" approach as far as performance goes is to use the
native querylog and rotate it often enough to have as "live" data as
possible.

Some quick notes (all tests done with perl):
- Parse the querylog 500 000k queries: 3 seconds
- Parse tcpdump while running 1 million queries: 300k picked up the
rest lost due to too high CPU load

I haven't tried to pipe querylog through stderr but it feels like that
could look a bit ugly running something that os more layered is
favored.

At this point I'll have to make the sacrifice of having real-time
data, parsing the querylog is the most efficient way as I see it based
on my tests.

Thanks for all the feedback on this, I'll publish my code once I'm finished.

/Jonathan

On Tue, Apr 28, 2009 at 5:24 PM, Scott Haneda <talklists at newgeo.com> wrote:
> I have read the other posts here, and it looks like you are setting on tail,
> or a pipe, but that log rotation is causing you headaches.
>
> I have had to deal with things like this in the past, and took a different
> approach.  Here are some ideas to think about.
>
> Since you mentioned below you wanted this in real time, and that parsing an
> old log file is out, what about setting up a second log in named, of the
> same data, but do not rotate the log at all?
>
> This gives you a log that you can run tail on.  It probably is going to grow
> too large.  I solved this for a different server in the past, by telling the
> log that was a clone to be be limited in size.  In this way, it was not
> rolled out, but rather, truncated.
>
> I am not sure how named would do this.  If it will not truncate it, you can
> write a small script to do it for you.  Now that you have a log that is
> maintained at a fixed size that is manageable, you can do your tail business
> on it.
>
> I also seem to remember, tail has some flags that may help you with dealing
> with the log ration issues.  I only remember them vaguely, as they were not
> applicable to what I was doing at the time.
>
> Hope this helps some.
>
> On Apr 27, 2009, at 10:26 PM, Jonathan Petersson wrote:
>
>> Hi all,
>>
>> I'm thinking of writing a quick tool to archive the query-log in a
>> database to allow for easier reports.
>>
>> The obvious question that occurs is; What would be what's the best
>> approach to do this?
>>
>> Running scripts that parses through the query-log would cause locking
>> essentially killing BIND on a heavy loaded server and only parsing
>> archived files wouldn't allow real-time information, also re-parsing
>> the same set of data over and over again until the log has rotated
>> would cause unnecessary I/O load. I'm guessing the best would be to
>> have BIND write directly to a script that dumps the data where-ever it
>> makes sense to.
>>
>> I've used BIND statistics and found it highly useful but then again it
>> doesn't allow me to make breakdowns based on host/query.
>>
>> If anyone has done something like this or having pointers on how this
>> could achieved any information is welcome!
>
> --
> Scott * If you contact me off list replace talklists@ with scott@ *
>
>