BIND 10 #363: "H" and "I" are probably harmful for PyArg_ParseTuple()

Wed Mar 16 06:12:00 UTC 2011

#363: "H" and "I" are probably harmful for PyArg_ParseTuple()
-------------------------------------+-------------------------------------
                 Reporter:  jinmei   |                Owner:  jinmei
                     Type:  defect   |               Status:  reviewing
                 Priority:  minor    |            Milestone:  A-Team-
                Component:           |  Sprint-20110316
  DNSPacket API                      |           Resolution:
                 Keywords:           |            Sensitive:  0
Estimated Number of Hours:  0.0      |  Add Hours to Ticket:  0
                Billable?:  1        |          Total Hours:  0
                Internal?:  0        |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:9 zzchen_pku]:

 > > It didn't pass a test for me:

 > For my 32bit OS, the range of interger is (-0x80000000, 0x7fffffff),so
 the test will throw TypeError for 0x80000000. Since the unittest is
 platform dependent, I don't think it make sense to keep it.

 You mean "since the unittest shouldn't be platform dependent"?  Anyway
 I see the point.  We could still test the overflow cases selectively
 (with a helper routine to identifying the size of int, long, etc), but
 that's probably overkilling.  So I'm okay with removing it.

 > > Also, it was not clear to me why you chose long (instead of int) in
 > > some cases while choosing int for others.  Please clarify the
 > > rationale.
 > I find this in python2 doc:
 > {{{
 > python2 supports numeric types: plain integers, long integers
 > Plain integers (also just called integers) are implemented using long in
 C, which gives
 > them at least 32 bits of precision,Long integers have unlimited
 precision.
 > }}}
 > and also
 > {{{
 > In Python 3, there is only one integer type, called int, which mostly
 behaves like the
 > long type in Python 2.
 > }}}
 > PyArg_ParseTuple() is responsible for converting a Python integer to a
 plain C numeric type, as described above, I think long precision is more
 suitable to prevent overflow.

 On thinking about it further, I now agree long is better than int, but
 not for that reason.  The above citation is about internal
 representation of integers in the python language, and regardless of
 the internal representation PyArg_ParseTuple() can convert them to
 various C integer types, so the question of int vs long (vs short,
 etc) still holds.

 In our usage many integers are unsigned 16-bit ones.  To convert them
 safely (without worrying about overflow), we need at least 32-bit
 signed integers.  That's why I agree long (which is ensured to be at
 least 32-bit) is better than int (which can be e.g. 16-bit).  For this
 rationale, in some cases we really don't need long, and in some other
 cases even long is not safe enough.  See specific cases below.

 Now, specific comments about the code:

 - this may not specific to this patch, but some calls to PyErr_Clear()
   seem to be redundant.  example:
 {{{
     if (messageflag < 0 || messageflag > 0xffff) {
         PyErr_Clear();
         PyErr_SetString(PyExc_OverflowError, "Message header flag out of
 range");
 }}}

 - As far as I can see, some of the long int variables don't have to be
   so (they can be a plain int): Message_init, Message_clear,
   MessageRenderer_setCompressMode. (although being long is not
   necessarily incorrect; it just consumes more (stack) space
   unnecessarily).  Actually, in these cases I think it make more sense
   to do this check within the corresponding C++ class (Message,
   MessageRenderer, etc) and delegate the check to it.

 - on the other hand, in this case the integer has to be long:
  - Message_setQid() (because size of int may be 16 bits)

 - Message_addRRset: I'd avoid using hardcoded range here.  Actually,
   in this case I think it make more sense to delegate the check to the
   C++ code.

 - Name_init: I wouldn't bother to check '> 0xffff'.  In fact, the
   buffer could store longer data than 64KB.  Invalid (positive)
   positions will be rejected in setPosition() anyway.  (with removing
   that check) I'd also consider this case a "type error" rather than
   overflow, that is, a negative integer type is passed where an
   unsigned is expected.  Same comment applies to other similar cases.

 - Name_at, Name_split: likewise.  And, in this case it doesn't even have
 to be
   long (int should be sufficient).

 - Name_split: is this case tested? (catching it seems to be correct and
 good)
 {{{
 +    PyErr_Clear();
 +    PyErr_SetString(PyExc_TypeError,
 +                    "No valid type in split argument");
 }}}

 - Rcode_init: ext_code doesn't have to be long.

 - RRTTL_init: this doesn't seem to be correct:
 {{{
 -        } else if (PyArg_ParseTuple(args, "I", &i)) {
 +        } else if (PyArg_ParseTuple(args, "k", &i)) {
 }}}
  "k" omits overflow check, so it essentially has the same problem as
  "I".  One solution is to use "L" with long long (assuming our target
  system has it; this is probably acceptable.  We already rely on
  uint64_t etc, anyway).  Another solution is to ignore overflow in
  this specific case.

 - message_python_test.py: the change makes the following comment
   incorrect:
 {{{
          # this would cause overflow and result in a "valid" flag
 }}}

 - tests in general.  I've not closely checked the test coverage, but
   from a quick look there seems to be some cases that were not tested
   (e.g. for RRTTL).  Please make sure all new code is covered by tests
   (note also that the coverage problem is unlikely to happen if it was
   developed test driven); or, if I miss something and the code was
   fully covered already, please say so.

-- 
Ticket URL: <http://bind10.isc.org/ticket/363#comment:10>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development