Possible timing problems with cache cleaning and forwarding queries?

Wed Jun 21 19:57:01 UTC 2000

    I've been trying to track down a problem I've been seeing lately on one
    customer's site.  The code is 8.1.2, but I didn't see any changes for
    8.2.3 in the area where my questions lie.

    Basically, every so often the nameserver goes down with a panic raised
    by the check in DRCNTDEC: the databuf's d_rcnt is already 0 when
    something tries to decrement it.

    I think I might have found a situation that could be causing the problem.

    What happens if:

        1) We have a query scheduled to retry.
        2) We go through clean_cache.
        3) The retry timer fires and we go into retry().
        4) We've tried to query the remote nameserver 3 times, so we
           go to remove the query.
        5) We go qremove()->ns_freeqry()->ns_freeqns().

    But what if clean_cache, back in step 2, had already decremented the
    nsdata databuf's d_rcnt and marked the databuf DB_F_FREE because it saw
    that the record had expired?
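
    To make the suspected sequence concrete, here is a toy model in C.  It
    is not the BIND source: the struct, the MODEL_F_FREE flag, and every
    model_* function are hypothetical stand-ins, and the idea that the
    pending query might not hold its own reference is only an assumption
    I am testing, not something I have confirmed in the code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_F_FREE 0x01        /* hypothetical stand-in for DB_F_FREE */

/* Hypothetical, stripped-down stand-in for a databuf. */
struct model_databuf {
    int  d_rcnt;                 /* reference count */
    int  d_flags;                /* MODEL_F_FREE once the cache dropped it */
    long d_expire;               /* expiry time, arbitrary units */
};

/* Drop one reference.  Returns 0 on success, or -1 in the case where
 * the real check would panic (count already 0). */
static int model_rcnt_dec(struct model_databuf *dp)
{
    assert(dp != NULL);
    if (dp->d_rcnt <= 0)
        return -1;
    dp->d_rcnt--;
    return 0;
}

/* Cache sweep: for an expired record, drop the cache's reference and
 * mark the record free.  Assumed behaviour, not the real clean_cache. */
static void model_clean_cache(struct model_databuf *dp, long now)
{
    assert(dp != NULL);
    if (dp->d_expire <= now && !(dp->d_flags & MODEL_F_FREE)) {
        (void)model_rcnt_dec(dp);
        dp->d_flags |= MODEL_F_FREE;
    }
}

/* Query teardown: drop the reference the query is supposed to hold. */
static int model_free_query(struct model_databuf *dp)
{
    return model_rcnt_dec(dp);
}

/* Run the suspected sequence.  query_holds_ref selects whether the
 * pending query took its own reference when it was scheduled. */
int run_scenario(int query_holds_ref)
{
    struct model_databuf ns = { 1, 0, 100 };  /* 1 = the cache's reference */

    if (query_holds_ref)
        ns.d_rcnt++;                 /* query pins the record */
    model_clean_cache(&ns, 130);     /* record expired 30 units earlier */
    return model_free_query(&ns);    /* retries exhausted, query removed */
}
```

    In this model run_scenario(1) returns 0 and run_scenario(0) returns -1,
    so the underflow only shows up when some path decrements the count
    without a matching increment.  If the real code behaves the same way,
    the thing to look for in the dump is which holder never took its
    reference.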

    Perhaps I'm jumping to conclusions, so let me add exactly what I know.
    I know the panic was in the DRCNTDEC() when decrementing the d_rcnt of
    the nsdata databuf in ns_freeqns().  I know the record expired about 30
    seconds previously.  I know the record had the DB_F_FREE flag set.  So
    something marked it free.  The only thing I can think of is clean_cache.

    Anyone have any other suggestions, or know what I might be able to do
    to further debug this [I have a dump of the process at the time of the
    panic]?

                                                            -Jeff