BIND 10 #2853: Python wrapper of data source extensions

Thu Jun 13 17:28:37 UTC 2013

#2853: Python wrapper of data source extensions
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:
                Type:  task          |  vorner
            Priority:  medium        |                       Status:
           Component:  data source   |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130625
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |  shared memory data source
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 (Sorry for the frequent intrusion.  I've been making additional notes
 as I found things related to this branch while working on #2854, but
 now I'm done with #2854 it won't happen anymore)

 Replying to [comment:26 muks]:

 > > Regarding the test: if you meant how to test the case of unused state,
 > > you should simply use a data source with disabling in-memory cache.
 >
 > With `MasterFiles`, it expects the cache to be always enabled. With
 > other names, it looks for a loadable `.so` module of that name.

 I don't see why it's a problem, or I still didn't understand what you
 mean.  Some tests for the python datasrc modules actually use sqlite3
 data sources.

 > > Regarding the code, I personally think it's better to check the state
 > > explicitly and use Py_None if it's SEGMENT_UNUSED.
 >
 > I wanted to avoid doing this in the Python wrapper, because we are
 > introducing this `SEGMENT_UNUSED` checking behavior in two places (one
 > in the C++ code and another in the Python wrapper). Is there something
 > wrong with how we do it in the current Python wrapper?

 There's nothing absolutely "wrong", but it doesn't seem to be better
 either.

 In general, I'd avoid using C++ exceptions in this type condition,
 i.e., checking a direct result of a single function call.  C++
 exceptions are generally more expensive than a simple if block
 (although in this specific case performance wouldn't matter much), and
 the code is more complicated - we at least need a pair of try and
 catch; and the exception isn't expected to propagated from a deeper
 level of call stack, so we are not saving any intermediate code in
 this case.

 Also, in my sense C++ exceptions are for really unexpected events.
 The original C++ `getSegmentType()` method shouldn't really called for
 with an UNUSED state, and the exception is thrown to indicate the
 programming error.  This Python wrapper actually uses this mechanism
 as a test if the state is UNUSED, with the understanding it's a
 possible case.  Such "abuse" of C++ exception makes me confused
 (because if I see try-catch, this should be something related to error
 handling or some other unexpected event).

 This is also related to your reasoning of avoiding the double check
 for "UNUSED".  As a general matter, I agree; so if we were
 implementing a Python wrapper of `getSegmentType()` itself, I'd
 definitely leave it to the actual C++ code, rather than checking the
 state within the python wrapper.  But I don't think it applies to the
 `getStatus` case; it actually does the check for the state, simply
 with a different tool (i.e., exception, and in my sense an abuse of
 it) instead of the straightforward `if`.

 Further, the `InvalidOperation` exception is quite generic, and,
 although it's unlikely such a simple method as getSegmentType() would
 throw it for a different reason, it might still be possible in a
 future extension.  If and when that happens this implementation will
 become buggy.  The explicit check on the state is more robust against
 such changes.

 That said...I admit these may be subtle points and can be a matter of
 taste.  And, especially because I'm not the official reviewer of the
 branch, I'd leave it you and Michal.

 > > And, not directly on this point, but on a closer look the code looks
 > > generally unsafe.  For example, in `ConfigurableClientList_getStatus`
 > > if this fails:
 > >
 > > {{{#!cpp
 > >             PyObject* tup = Py_BuildValue("(sOI)",
 > >                                           status[i].getName().c_str(),
 > >                                           segment_type,
 > >
 status[i].getSegmentState());
 > > }}}
 > >
 > > then the reference to segment_type will be lost.  You'll need a help
 > > of `PyObjectContainer` here.
 >
 > By "fails", do you mean it throws an exception that causes
 > control flow to exit that block at that location? `Py_BuildValue()` is a
 > C function. If there is a failure within it, it will return `NULL`. In

 No, I understand it.

 > this case, we first drop our reference on `segment_type` before
 > returning a failure. `status[i]`, `getName()` and `getSegmentState()`
 > should not throw in the way our code is written.

 On looking at it again, I realized `segment_type` won't leak.  So,
 after a very careful read I can conclude there's no safety issue in
 the code.  I'd still suggest revising it with `PyObjectContainer`,
 though, because the safety isn't obvious and is (IMO) indeed fragile:

 - One confusing point is that this block works for both normal and
   failure cases:
 {{{#!cpp
             if (segment_type) {
                 // The Py_BuildValue() above increments its refcount,
                 // so we drop our reference.
                 Py_DECREF(segment_type);
             }
 }}}
   But from a quick look it seems it only intends to handle the
   successful case, so I need to make an extra effort to be sure about
   the safety.
 - Failure of this isn't handled immediately:
 {{{#!cpp
                 segment_type = Py_BuildValue(
                     "s", status[i].getSegmentType().c_str());
 }}}
   Again, with a careful read you can understand it's still safe
   because this will make the next Py_BuildValue fails, and the `if`
   block is skipped as `segment_type` should be NULL in this case,
   and the error will be returned here:
 {{{#!cpp
             if (!tup) {
                 Py_DECREF(slist);
                 return (NULL);
             }
 }}}
   But deferring the error handling is itself not a good practice, and
   even if it's safe in the end, you'll need to use an extra effort to
   be sure about it.  It's also more fragile to future extensions.
   On the other hand, we could catch the failure of building
   segment_type immediately with an additional `if-else`.  There's
   nothing obviously wrong here, too, but this way the entire code will
   be full of this type of error handling, making it more unreadable.

 But, like the case of the previous point, I won't insist.  I'd leave
 it to you two, too.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2853#comment:29>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development