BIND 10 #2228: research requirements for DB-based data source performance
BIND 10 Development
do-not-reply at isc.org
Thu Nov 8 18:24:06 UTC 2012
#2228: research requirements for DB-based data source performance
-------------------------------------+-------------------------------------
 Reporter: jinmei                    | Owner: larissas
 Type: task                          | Status: assigned
 Priority: medium                    | Milestone:
 Component: Unclassified             | Resolution:
 Keywords:                           | Sensitive: 0
 Defect Severity: N/A                | Sub-Project: DNS
 Feature Depending on Ticket:        | Estimated Difficulty: 0
 Add Hours to Ticket: 0              | Total Hours: 0
 Internal?: 0                        |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:1 shane]:
> Just to be clear, we are not building a custom solution for a single
> user, so there ARE no performance requirements, in the sense that "we must
> serve X queries per second on hardware Y" or "we must handle N zones of
> size M".
Yes, I understand that. But knowing real examples is still very
useful. If we have a sufficient number of such examples, we'll be
able to figure out a sensible target for a general purpose
implementation. Even if we have only one or a few examples, we can at
least base the discussion on some real requirement, and consider whether
it can be a reasonable data point for the general purpose case. That's a
far cry from just guessing an arbitrary number (or even skipping that) and
trying random optimization ideas.
Anyway, thanks for the pointers. These are exactly what I wanted to
see.
> Our main competition in this space is PowerDNS, which has successfully
> worked into DNS hosting markets in Europe.
>
> According to this presentation, PowerDNS gets 46k queries per second
> with 10 million domains:
>
> http://www.sanog.org/resources/sanog14/sanog14-devdas-dns-scalability.pdf
This looks quite useful, but some important points are missing,
especially about the benchmark.
- the query pattern for the test data
- which DB backend was used in the case of PowerDNS
- regarding "PowerDNS, RBT", whether it used any DB backend at all, or
whether it's the in-memory data with hot spot cache. I suspect it
was the latter.
In my rough (and quite possibly incorrect) understanding, the response
performance of PowerDNS largely relies on its cache, especially its
full packet (response) cache. So, if the above query pattern results
in a high cache hit rate, it's not surprising that it has seemingly very
good performance. And, if that's actually based on the pattern of
their live queries, it means such a caching architecture completely
makes sense; on the other hand, if it's just crafted for the
experiments, the results are not that informative, if not misleading.
That's one of the things we still need to know (from someone).
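To illustrate why the query pattern matters so much for a cache-centric design, here is a minimal sketch that replays two synthetic query streams against a simple LRU cache. Everything in it is an assumption for illustration only: the cache is a plain LRU keyed by query name (not PowerDNS's actual packet cache), and the name count, query count, cache size and the 1/rank popularity curve are made-up parameters, not values from the slides.

```python
import random
from collections import OrderedDict


def hit_rate(queries, cache_size):
    """Replay queries against an LRU cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for name in queries:
        if name in cache:
            hits += 1
            cache.move_to_end(name)        # mark as most recently used
        else:
            cache[name] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(queries)


def make_queries(num_names, num_queries, skewed, rng):
    """Draw query names uniformly, or with a Zipf-like (1/rank) popularity."""
    names = range(num_names)
    if skewed:
        weights = [1.0 / (rank + 1) for rank in names]
        return rng.choices(names, weights=weights, k=num_queries)
    return rng.choices(names, k=num_queries)


rng = random.Random(2228)  # fixed seed so runs are repeatable
# Hypothetical sizes, chosen only to keep the run short.
NUM_NAMES, NUM_QUERIES, CACHE_SIZE = 100_000, 200_000, 1_000

skewed_rate = hit_rate(
    make_queries(NUM_NAMES, NUM_QUERIES, True, rng), CACHE_SIZE)
uniform_rate = hit_rate(
    make_queries(NUM_NAMES, NUM_QUERIES, False, rng), CACHE_SIZE)
print(f"skewed (popular names dominate): {skewed_rate:.1%} hits")
print(f"uniform (no hot spots):          {uniform_rate:.1%} hits")
```

With the same cache, the skewed stream hits far more often than the uniform one, so a benchmark number says little unless we know which kind of stream produced it.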
I'd also note that if the startup time is the main issue, the
imaginary full mmap version of our in-memory data source could also be
a solution.
Another thing I wonder is what "user defined / hardcoded schema" means
(as on p.13 about MyDNS, or on p.7 mentioning "user defined
queries"). In my understanding PowerDNS also defines a specific schema
for zone data. Did they want to define a different DB schema for
their zone data and configure the DNS server to send queries based on
it? Is that possible with PowerDNS (or BIND 9 DLZ)?
> We might not need that level of performance, but the user has seen DoS
> attacks of 10k queries per second, so we *do* need at least that level of
> performance.
Sure, this part is probably the most informative bit in the slides (it
also suggests that in their normal operation the expected qps is much
lower than that). I'd note, however, that smart attackers would choose
query patterns that make the internal cache ineffective, so we need to
think about achieving that level of performance without benefiting
from things like a full packet cache.
> Note that this does not match my previous research, which was from
> around the same time (at my previous job):
>
> https://lists.dns-oarc.net/pipermail/dns-operations/2009-February/003556.html
I think your numbers actually match the slides to some extent.
I suspect the slides mixed apples and oranges and pineapples and can
be misleading (I don't think that was intentional, though). I suspect
the data in the slides that most closely matches your experiments is the
case of "PowerDNS, hash as cache, 3M domains". In that case the cache was
probably not that effective, so the bottleneck of DB queries should
affect the overall performance.
(Your email message to dns-oarc also seems to mix different fruits, but
that's pretty obvious from the context so it's not that misleading:-)
> In any case, 2k queries/second is not enough.
If you're referring to my previous experiments:
https://lists.isc.org/pipermail/bind10-dev/2012-October/003866.html
I'd note that they used only a single core, and ran on my laptop (not
that poor in terms of CPU, but generally far from ideal for production
server operations). I can also think of other types of optimizations
that wouldn't rely on a cache that depends on specific query patterns,
so assuming we implement them and run on a reasonably high-performance
machine with several CPU cores, I guess it's not that difficult to
reach 10K-ish qps.
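As a back-of-the-envelope check of the multi-core argument, the sketch below estimates how many cores it would take to go from the roughly 2k qps single-core figure quoted above to the 10k qps target. The scaling efficiency values are purely hypothetical (real scaling depends on the DB backend, locking, and so on), and `cores_needed` is just an illustrative helper, not anything in BIND 10.

```python
import math


def cores_needed(target_qps, per_core_qps, scaling_efficiency):
    """Cores required if each core delivers per_core_qps * scaling_efficiency."""
    effective_per_core = per_core_qps * scaling_efficiency
    return math.ceil(target_qps / effective_per_core)


TARGET_QPS = 10_000    # DoS-level rate mentioned in the slides
PER_CORE_QPS = 2_000   # single-core figure quoted above

# Hypothetical efficiencies: perfect scaling, and a 30% loss to contention.
for efficiency in (1.0, 0.7):
    cores = cores_needed(TARGET_QPS, PER_CORE_QPS, efficiency)
    print(f"efficiency {efficiency:.0%}: {cores} cores")
```

Even with a pessimistic efficiency the count stays within what a commodity multi-core server offers, which is all the estimate is meant to show.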
--
Ticket URL: <http://bind10.isc.org/ticket/2228#comment:3>
BIND 10 Development <http://bind10.isc.org>