BIND 10 #1028: Large memory footprint for b10-xfrin

Thu Oct 27 07:23:48 UTC 2011

#1028: Large memory footprint for b10-xfrin
-------------------------------------+-------------------------------------
                   Reporter:  shane  |                 Owner:  jelte
                       Type:         |                Status:  reviewing
  defect                             |             Milestone:
                   Priority:  major  |  Sprint-20111108
                  Component:  xfrin  |            Resolution:
                   Keywords:         |             Sensitive:  0
            Defect Severity:         |           Sub-Project:  DNS
  Medium                             |  Estimated Difficulty:  9
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:         |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:12 jinmei]:

 > > Code looks good. I did however find that when I repeatedly send
 > > retransfer commands, it still looks like b10_ixfr keeps growing in size...
 >
 > You mean b10-xfrin?  Hmm.

 Okay, I believe I've found other leaks.

 The first one is in DataSourceClient.get_updater().  See commit
 65bd895.  The fix is trivial, although it was difficult to track down
 because it was only indirectly related to the visible symptom.

 The other one is a circular (self) reference within XfrinConnection,
 which is fixed in commit 1fc79b9.

 Both of these somehow prevented XfrinConnection from being released.
 (I'm not sure exactly how, because holding a self reference or
 composing an object with a non-zero reference count doesn't always
 seem to cause a leak.  It may be specific to threaded cases.)
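
 As a rough illustration of why a self reference can delay release, here
 is a minimal sketch in plain Python.  The Connection class and the
 is_released_immediately() helper are hypothetical stand-ins, not the
 real XfrinConnection, and the behavior shown is specific to CPython's
 reference counting.  It shows one possible mechanism only; it does not
 claim to reproduce what actually happened in b10-xfrin.

```python
import gc
import weakref


class Connection:
    """Hypothetical stand-in for a connection object (not XfrinConnection)."""

    def __init__(self, keep_self_reference):
        self.buffer = bytearray(1024)
        self.handler = None
        if keep_self_reference:
            # A bound method stored on the instance refers back to the
            # instance, so this creates a reference cycle.
            self.handler = self.handle

    def handle(self):
        pass

    def close(self):
        # Break the cycle explicitly.
        self.handler = None


def is_released_immediately(keep_self_reference, close_first=False):
    """Return True if dropping the last name frees the object right away."""
    gc.disable()  # rule out the cycle collector; test refcounting alone
    try:
        conn = Connection(keep_self_reference)
        probe = weakref.ref(conn)
        if close_first:
            conn.close()
        del conn
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # reclaim any cycle left behind
```

 Without the self reference, or with the cycle broken by close(), the
 object goes away as soon as the last name is dropped.  With the cycle
 in place it survives until the cycle collector runs.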

 While fixing the second leak, I noticed other (though less likely)
 possibilities of similar leaks in process_xfrin, so I fixed those too
 in commit 738b11d.  This also addresses part of the concern described
 in #1292: with this fix the failure will at least be logged and the
 session "lock" will be released.  It still doesn't help much for
 #1292, though, because xfr won't succeed anyway until the fundamental
 dlopen issue is solved.
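
 The general shape of "log the failure and always release the session
 lock" can be sketched as follows.  This is an assumption-laden
 illustration, not the code in commit 738b11d: run_transfer(), the
 in_progress set, and the do_transfer callable are all hypothetical
 names.

```python
import logging
import threading

logger = logging.getLogger("xfrin_sketch")


def run_transfer(zone_name, in_progress, lock, do_transfer):
    """Run do_transfer(zone_name), always clearing the in-progress marker.

    in_progress is a set of zone names acting as a per-zone "lock";
    lock is a threading.Lock guarding that set.
    """
    with lock:
        if zone_name in in_progress:
            return "busy"
        in_progress.add(zone_name)
    try:
        do_transfer(zone_name)
        return "success"
    except Exception:
        # Any unexpected error is logged instead of silently killing
        # the worker thread.
        logger.exception("transfer of %s failed", zone_name)
        return "failure"
    finally:
        # Runs on success and on failure, so the zone is never left
        # marked as "in progress".
        with lock:
            in_progress.discard(zone_name)
```

 The point of the finally clause is that a later retransfer of the same
 zone is not rejected just because an earlier attempt died halfway.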

 Finally, I made a small, unrelated cleanup: commit 1e9bb55.

 I've been running the fixed code while repeating retransfer, and I no
 longer see significant memory growth.  I do still see a gradual
 increase in the memory footprint; right now I'm not sure whether it's
 a remaining leak in our code or a system-level effect such as memory
 fragmentation.  But even if it is a real remaining leak in our code,
 I believe the current set of fixes is worth merging.
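
 One way to separate retained Python objects from system-level effects
 is to measure only the allocations Python itself tracks.  The sketch
 below uses the standard tracemalloc module, which is newer than this
 ticket and was not used for the measurements above; the
 python_heap_growth() helper is a hypothetical name.  It cannot see
 memory allocated directly by C++ code or lost to fragmentation, so a
 flat result here with a growing process size would point away from
 leaked Python objects.

```python
import tracemalloc


def python_heap_growth(operation, repetitions):
    """Return the change in traced Python allocations, in bytes."""
    tracemalloc.start()
    try:
        operation()  # warm-up so one-time caches are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(repetitions):
            operation()
        after, _peak = tracemalloc.get_traced_memory()
        return after - before
    finally:
        tracemalloc.stop()
```

 Here operation would be whatever triggers one retransfer; growth that
 scales with the number of repetitions suggests objects are still being
 retained somewhere.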

 Another question, related to commit 65bd895 but not related to the
 main topic of this ticket:  I've moved Py_INCREF in
 createZoneUpdaterObject() inside the first if block; the reference
 seems to leak otherwise if tp_alloc fails and returns NULL.  If I'm
 correct here, we'll need the same change to createZoneIteratorObject()
 and createZoneFinderObject(), but I've not touched them because
 they are not really relevant to the topic of the ticket (and this
 failure mode would be unlikely to happen in practice).  Also, is there
 a valid case where base_obj is NULL?  If not, we should probably
 rather throw an exception, or maybe we could pass base_obj by
 reference if it can never be NULL.

 This is the updated changelog entry:
 {{{
 305.?   [bug]           jinmei
         Python isc.dns, isc.datasrc, xfrin, xfrout: fixed reference leak
         in Message.get_question(), Message.get_section(),
         RRset.get_rdata(), and DataSourceClient.get_updater().
         These leaks caused a severe memory leak in b10-xfrin, and
         (although no one reported it) should have caused a less visible
         leak in b10-xfrout.  b10-xfrin had its own leak, which was also
         fixed.
         (Trac #1028, git TBD)
 }}}

-- 
Ticket URL: <http://bind10.isc.org/ticket/1028#comment:14>
BIND 10 Development <http://bind10.isc.org>