Possible timing problems with cache cleaning and forwarding queries?
Jeff Schreiber
schreiber at process.com
Wed Jun 21 19:57:01 UTC 2000
I've been trying to track down a problem I've been seeing lately on one
customer's site. The code is 8.1.2, but I didn't see any changes for
8.2.3 in the area where my questions lie.
Basically, every so often the nameserver goes down with a panic caused
by the check in DRCNTDEC: the databuf's d_rcnt is already 0 when the
code tries to decrement it.
I think I might have found a situation that could be causing the problem.
What happens if:
1) We have a query scheduled to retry.
2) We go through clean_cache.
3) The retry timer fires and we go into retry().
4) We've tried to query the remote nameserver 3 times, so we
go to remove the query.
5) We go qremove()->ns_freeqry()->ns_freeqns().
But by then the nsdata databuf's d_rcnt has already been decremented and
the databuf marked DB_F_FREE, because clean_cache saw that the record
had expired?
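
To make the suspected sequence concrete, here is a minimal standalone
model of it. This is a sketch, not BIND code: the struct layout, the
DB_F_FREE value, the d_expire field, and every function name below are
assumptions made up for illustration. Only the names d_rcnt and
DB_F_FREE and the "decrement at zero means panic" behaviour come from
what I observed. It also shows that the sequence only goes wrong if the
pending query is not holding its own reference on the databuf.

```c
#include <stdio.h>

#define DB_F_FREE 0x01          /* hypothetical flag value for this sketch */

struct databuf {                /* simplified stand-in, not the real struct */
    int      d_rcnt;            /* reference count */
    unsigned d_flags;           /* DB_F_FREE set once the record is released */
    long     d_expire;          /* hypothetical expiry time, in seconds */
};

/* Guarded decrement: returns -1 where the real server would panic. */
static int drcnt_dec(struct databuf *dp) {
    if (dp->d_rcnt == 0) {
        fprintf(stderr, "panic: d_rcnt already 0\n");
        return -1;
    }
    dp->d_rcnt--;
    return 0;
}

/* Model of step 2: the cache drops its reference to an expired record. */
static void clean_cache_model(struct databuf *dp, long now) {
    if (dp->d_expire <= now) {
        drcnt_dec(dp);
        dp->d_flags |= DB_F_FREE;
    }
}

/* Model of steps 3-5: the query gives up and releases the nsdata databuf. */
static int free_query_model(struct databuf *dp) {
    return drcnt_dec(dp);
}

/* Runs the whole sequence; query_holds_ref says whether the pending
 * query took its own reference when it was scheduled. */
static int run_scenario(int query_holds_ref) {
    struct databuf ns = { 1, 0, 100 };   /* one reference, held by the cache */
    if (query_holds_ref)
        ns.d_rcnt++;
    clean_cache_model(&ns, 130);         /* 30 seconds after expiry */
    return free_query_model(&ns);
}
```

In this model run_scenario(0) reaches the decrement with d_rcnt already
at 0, while run_scenario(1) does not. So the question reduces to whether
a query waiting on retry really keeps its own count on the nsdata
databuf across a clean_cache pass.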
Perhaps I'm jumping to conclusions, so let me add exactly what I know.
I know the panic was in the DRCNTDEC() when decrementing the d_rcnt of
the nsdata databuf in ns_freeqns(). I know the record expired about 30
seconds previously. I know the record had the DB_F_FREE flag set. So
something marked it free. The only candidate I can think of is clean_cache.
Anyone have any other suggestions, or know what I might be able to do
to further debug this [I have a dump of the process at the time of the
panic]?
-Jeff