Latest BIND: Error "rpz_rewrite_name: mismatched summary data; continuing"

Sat Apr 27 19:17:07 UTC 2019

>> I think there must be something wrong with the log message.  It
>> seems excessive to log this message about once per query,
>> especially since it seems to (misleadingly?) indicate an error
>> condition?  I'm not intimate enough with the code to suggest what
>> the exact problem is, though.
>>
>> And ... as stated, configuring without --enable-querytrace
>> removes the log message.
>
> I can't speak for ISC BIND 9 (you seem to have mailed my old email
> address)[...]

That was the one sitting in the old bind-users@ messages...

> [...] but --enable-querytrace was not meant to be used in
> production. One should expect to observe excessive logging when it is
> configured so, because that's what the configure argument implies
> ("perform abundant logging about query processing even if it hurts
> performance").

OK, that's now clear to me.  For some reason I had mixed up the
potential for "query logging" with "query tracing".

> To summarize, the log message is that the code has observed something
> unexpected (the carpet has been pulled from under it) but it copes with
> it. Perhaps the logging call can be removed from the code.
>
> In more detail, BIND uses a per-view RPZ summary data structure from the
> contents of response policy zones configured within that view. The
> summary datastructure is a singular combination of the contents of all
> the RPZ zones from that view, so that queried names can be efficiently
> matched against RPZ triggers in the query path. The summary
> datastructure is built and updated when the RPZ zones it is dependent on
> are updated. The logged message complains that the RPZ summary
> datastructure is out-of-sync with the RPZ zones, and so named was not
> able to lookup the action for a trigger from the RPZ zone it was looking
> at. In this case, named tries to continue from the next matching zone in
> precedence.

I've now replicated the setup on my local workstation, and I can confirm
that the "rpz_rewrite_name: mismatched summary data; continuing" log
message is happening at least once per received query (confirmed by
temporarily also turning on query logging).

There's no race here (the rpz zone hasn't been updated today).  So
either the code doesn't actually work the way it's supposed to, and this
is a valid error condition, or the error message itself is misleading
and the condition which triggers the logging is "normal".

Reading what you say above leads me to beleive that there is an internal
inconsistency in the RPZ datastructures -- that the "summary data
structure" says one thing about the queried-for name, while an actual
lookup in the RPZ zones gives a different result, and therefore that the
log message is in some sense "valid" and points out this inconsistency.

I've addd some logging here and there, and have verified that both
rpz_get_zbits() and dns_rpz_find_name() both indicates that they've
found a match(?) (zbits != 0), while looking up the name with
rpz_find_p() returns DNS_R_NXDOMAIN, which I think indicates "no rpz
handling for this queried-for name".  It would thus seem to me that the
intended optimization(?) of at least dns_rpzz_find_name() isn't actually
working at all, and when RPZ processing is turned on, all lookups end up
going through rpz_find_p().

Debugging a bit further reveals that in my setup, dns_rbt_findnode()
inside of dns_rpz_find_name() always returns DNS_R_PARTIALMATCH.

The entries in our single rpz zone file all follow the following
pattern:

<blocked-domain>.<rpz-zone-name>. CNAME <some-local-host>.
*.<blocked-domain>.<rpz-zone-name>. CNAME <some-local-host>.

if that makes any difference.

Regards,

- Håvard