BIND 10 #1028: Large memory footprint for b10-xfrin
BIND 10 Development
do-not-reply at isc.org
Thu Oct 27 07:23:48 UTC 2011
#1028: Large memory footprint for b10-xfrin
-------------------------------------+-------------------------------------
Reporter: shane                      | Owner: jelte
Type: defect                         | Status: reviewing
Priority: major                      | Milestone: Sprint-20111108
Component: xfrin                     | Resolution:
Keywords:                            | Sensitive: 0
Defect Severity: Medium              | Sub-Project: DNS
Feature Depending on Ticket:         | Estimated Difficulty: 9
Add Hours to Ticket:                 | Total Hours: 0
Internal?: 0                         |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:12 jinmei]:
> > Code looks good. I did however find that when I repeatedly send
> > retransfer commands, it still looks like b10_ixfr keeps growing in size...
>
> You mean b10-xfrin? Hmm.
Okay, I believe I've found other leaks.
The first one is in DataSourceClient.get_updater(). See commit
65bd895. The fix is trivial, although it was difficult to track
down because the cause was only indirectly related to the visible
symptom.
The other one is a circular (self) reference within XfrinConnection,
which is fixed in commit 1fc79b9.
Both of these somehow prevent XfrinConnection from being released
(I'm not really sure how exactly it happened, though, because
having a self reference or composing an object with a nonzero
reference count doesn't always seem to cause a leak. It may be
specific to threaded cases).
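
As a rough illustration of why a self reference can keep an object alive, here is a minimal sketch. The `Connection` class and `released_without_gc()` are hypothetical stand-ins, not the real XfrinConnection code, and the behavior shown relies on CPython's reference counting. It only demonstrates the general mechanism; whether this is exactly what happened in b10-xfrin is not established above.

```python
import gc
import weakref


class Connection:
    """Hypothetical stand-in for a connection object."""

    def __init__(self, keep_self_reference):
        if keep_self_reference:
            # A bound method stored on the instance refers back to the
            # instance itself, which creates a reference cycle.
            self.handler = self.handle

    def handle(self):
        return "handled"


def released_without_gc(keep_self_reference):
    """Return True if dropping the last name frees the object at once."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting alone
    try:
        conn = Connection(keep_self_reference)
        probe = weakref.ref(conn)
        del conn
        return probe() is None
    finally:
        if gc_was_enabled:
            gc.enable()
        gc.collect()  # clean up the cycle created by the experiment


print(released_without_gc(False))
print(released_without_gc(True))
```

Without the self reference the object goes away as soon as the last name is dropped; with it, the object survives until the cycle collector runs (or until the cycle is broken explicitly, for example by clearing the attribute when the connection is closed).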
And, while fixing the second leak, I noticed there are other
(though less likely to happen) possibilities of similar leaks in
process_xfrin. So I also fixed them in commit 738b11d. This also
addresses some part of the concern described in #1292 (with this
fix the failure will at least be logged, and the session "lock" will
be released - although it still doesn't help much for #1292 because
xfr won't succeed anyway unless the fundamental issue of dlopen is
solved).
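
The general shape of "log the failure and always release the lock" can be sketched as follows. This is not the code from commit 738b11d; `ZoneLocks`, `run_transfer()` and the logger name are hypothetical names made up for the illustration.

```python
import logging
import threading

logger = logging.getLogger("xfrin_sketch")


class ZoneLocks:
    """Hypothetical registry of zones that have a transfer in progress."""

    def __init__(self):
        self._guard = threading.Lock()
        self._busy = set()

    def acquire(self, zone):
        with self._guard:
            if zone in self._busy:
                return False
            self._busy.add(zone)
            return True

    def release(self, zone):
        with self._guard:
            self._busy.discard(zone)

    def is_busy(self, zone):
        with self._guard:
            return zone in self._busy


def run_transfer(locks, zone, do_transfer):
    """Run do_transfer(zone); log any failure and always release the zone."""
    if not locks.acquire(zone):
        return False
    try:
        do_transfer(zone)
        return True
    except Exception:
        # An unexpected error is logged rather than silently lost.
        logger.exception("transfer of %s failed", zone)
        return False
    finally:
        # Runs on success and on failure, so the zone never stays "locked".
        locks.release(zone)


def failing_transfer(zone):
    raise RuntimeError("simulated failure")


locks = ZoneLocks()
print(run_transfer(locks, "example.org", failing_transfer))
print(locks.is_busy("example.org"))
```

Putting the release in `finally` means a later retransfer of the same zone is not blocked by an earlier attempt that died with an exception.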
Finally, I made a small, unrelated cleanup: commit 1e9bb55.
I've been running the fixed code while repeating retransfer, and
I don't see significant memory growth. Actually, I've still seen a
gradual increase in the memory footprint - right now I'm not sure
whether there's still a leak in our code or it's a system level
effect such as memory fragmentation. But even if there is a real
remaining leak in our code, I believe the current set of fixes is
worth merging.
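
One way to tell a Python-level leak apart from allocator effects is to count live objects of a given class before and after repeating an operation. The sketch below assumes CPython; `Transfer`, `count_live()` and `growth_after()` are hypothetical names for the illustration, not part of b10-xfrin.

```python
import gc


class Transfer:
    """Hypothetical per-transfer object."""

    def __init__(self, leak_into=None):
        self.payload = bytearray(1024)
        if leak_into is not None:
            # Simulate a leak: something long-lived keeps a reference.
            leak_into.append(self)


def count_live(cls):
    """Count instances of cls currently tracked by the garbage collector."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def growth_after(repeat, operation, cls):
    """Return how many more cls instances are alive after the repeats."""
    before = count_live(cls)
    for _ in range(repeat):
        operation()
    return count_live(cls) - before


leaked = []
print(growth_after(50, lambda: Transfer(), Transfer))
print(growth_after(50, lambda: Transfer(leaked), Transfer))
```

If the count stays flat across many retransfers while the process size still creeps up, the growth is more likely to come from somewhere other than leaked Python objects of that class, such as fragmentation or memory held below the Python level. A count that rises with each repeat points at a remaining reference leak.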
Another question, related to commit 65bd895 but not related to the
main topic of this ticket: I've moved Py_INCREF in
createZoneUpdaterObject() inside the first if block; the reference
seems to leak otherwise if tp_alloc fails and returns NULL. If I'm
correct here, we'll need the same change to createZoneIteratorObject()
and createZoneFinderObject(), but I've not touched them because
they are not really relevant to the topic of the ticket (and this
failure mode would be unlikely to happen in practice). Also, is there
a valid case where base_obj is NULL? If not, we should probably
rather throw an exception, or maybe we could pass base_obj by
reference if it can never be NULL.
This is the updated changelog entry:
{{{
305.? [bug] jinmei
Python isc.dns, isc.datasrc, xfrin, xfrout: fixed reference leak
in Message.get_question(), Message.get_section(),
RRset.get_rdata(), and DataSourceClient.get_updater().
The leak caused severe memory leak in b10-xfrin, and (although no
one reported it) should have caused less visible leak in
b10-xfrout. b10-xfrin had its own leak, which was also fixed.
(Trac #1028, git TBD)
}}}
--
Ticket URL: <http://bind10.isc.org/ticket/1028#comment:14>
BIND 10 Development <http://bind10.isc.org>