New vector API

Russ Allbery rra at stanford.edu
Wed May 9 15:25:10 UTC 2001


I needed this for some parts of configuration parsing, and I've also been
wanting for a while to replace argify and glom with something that could
also easily be used to hold multiple occurrences of headers and similar
things.

Here's what I came up with for an API for people to check.  The vector
object isn't opaque; it's too useful to be able to easily fiddle with it
or walk through it.  It's a bit more complicated than the existing
NULL-terminated list of pointers that argify generates, but it should
allow reuse of the same memory to a much larger degree to save on
constantly allocating and freeing small bits of memory.

I've already written all of the code to implement this, but I haven't
tested it yet (and don't let that stop you from suggesting major changes).

The argify and glom equivalents are:

  struct vector *vector = vector_split_whitespace(string, false, NULL);
  char *string = vector_join(vector, " ");
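
As a rough illustration of how those two calls could behave, here is a
self-contained sketch.  It is not the implementation mentioned above: it
only covers the shallow case (copy false, no vector passed in for reuse),
and the sketch_ names are stand-ins invented for this example rather than
part of the proposed API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct vector {
    size_t count;
    size_t allocated;
    char **strings;
    bool shallow;
};

/* Stand-in for vector_split_whitespace(string, false, NULL): destructively
   split string in place and return a new shallow vector, or NULL if memory
   allocation fails (an assumption; the real code may abort instead). */
static struct vector *
sketch_split_whitespace(char *string)
{
    struct vector *v;
    char **grown;
    char *p = string;

    v = calloc(1, sizeof(*v));
    if (v == NULL)
        return NULL;
    v->shallow = true;
    while (*p != '\0') {
        /* Overwrite each run of whitespace with nul characters. */
        while (*p != '\0' && isspace((unsigned char) *p))
            *p++ = '\0';
        if (*p == '\0')
            break;
        if (v->count == v->allocated) {
            size_t size = v->allocated == 0 ? 4 : v->allocated * 2;

            grown = realloc(v->strings, size * sizeof(char *));
            if (grown == NULL) {
                free(v->strings);
                free(v);
                return NULL;
            }
            v->strings = grown;
            v->allocated = size;
        }
        /* Stash a pointer into the original string; nothing is copied. */
        v->strings[v->count++] = p;
        while (*p != '\0' && !isspace((unsigned char) *p))
            p++;
    }
    return v;
}

/* Stand-in for vector_join: returns a newly allocated string that the caller
   must free, or NULL if memory allocation fails. */
static char *
sketch_join(const struct vector *v, const char *separator)
{
    size_t i, length = 1;
    char *result;

    assert(v != NULL && separator != NULL);
    for (i = 0; i < v->count; i++)
        length += strlen(v->strings[i]);
    if (v->count > 1)
        length += (v->count - 1) * strlen(separator);
    result = malloc(length);
    if (result == NULL)
        return NULL;
    result[0] = '\0';
    for (i = 0; i < v->count; i++) {
        if (i > 0)
            strcat(result, separator);
        strcat(result, v->strings[i]);
    }
    return result;
}
```

Because the vector is shallow, its entries point into the caller's buffer,
so cleaning up in this sketch means freeing only the strings array and the
struct, never the individual strings.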

Down the road, I want to add a vector_split_table that works like the
other vector_split functions but takes a char[128] table saying whether or
not each character is a delimiter; that would be useful for parsing the
Path header.  (The reason for 128 rather than 256 is that I think we
should start thinking in terms of UTF-8 rather than any 8-bit character
set, so any byte >= 128 is potentially part of a multibyte character, and
the easiest and best way of handling that for the medium term is to just
outlaw non-ASCII delimiters for the time being.)
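
To make the 128-entry table idea concrete, here is a minimal sketch of the
delimiter test such a function might use.  vector_split_table doesn't exist
yet, and the helpers delim_table_set, is_delim, and count_fields are
hypothetical names invented for this illustration.  The point is only that
bytes with the high bit set are never treated as delimiters, so UTF-8
multibyte sequences pass through intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: mark every character in delims as a delimiter in a
   128-entry table.  Non-ASCII delimiters are outlawed, so the assert rejects
   them loudly. */
static void
delim_table_set(char table[128], const char *delims)
{
    const unsigned char *p;

    memset(table, 0, 128);
    for (p = (const unsigned char *) delims; *p != '\0'; p++) {
        assert(*p < 128);
        table[*p] = 1;
    }
}

/* Hypothetical helper: true if c is a delimiter.  Any byte >= 128 may be part
   of a UTF-8 multibyte character and is therefore never a delimiter. */
static bool
is_delim(const char table[128], char c)
{
    unsigned char u = (unsigned char) c;

    return u < 128 && table[u] != 0;
}

/* Count the fields in string, treating adjacent delimiters as a single
   delimiter so that zero-length fields are never counted. */
static size_t
count_fields(const char table[128], const char *string)
{
    size_t count = 0;
    bool in_field = false;

    for (; *string != '\0'; string++) {
        if (is_delim(table, *string))
            in_field = false;
        else if (!in_field) {
            in_field = true;
            count++;
        }
    }
    return count;
}
```

For a Path-style string, one would set up the table with "!" and walk the
string with is_delim; a real vector_split_table would add each field to the
vector where this sketch only counts it.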


/*  $Id$
**
**  Vector handling (counted lists of char *'s).
**
**  Written by Russ Allbery <rra at stanford.edu>
**  This work is hereby placed in the public domain by its author.
**
**  A vector is a simple array of char *'s combined with a count.  It's a
**  convenient way of managing a list of strings, as well as a reasonable
**  output data structure for functions that split up a string.
**
**  Vectors can be "deep," in which case each char * points to allocated
**  memory that should be freed when the vector is freed, or "shallow," in
**  which case the char *'s are taken to be pointers into some other string
**  that shouldn't be freed.
*/

#ifndef INN_VECTOR_H
#define INN_VECTOR_H 1

#include <inn/defines.h>

struct vector {
    size_t count;
    size_t allocated;
    char **strings;
    bool shallow;
};

BEGIN_DECLS

/* Create a new, empty vector. */
struct vector *vector_new(bool shallow);

/* Add a string to a vector.  If vector->shallow is false, the string will be
   copied; otherwise, the pointer is just stashed.  Resizes the vector if
   necessary. */
void vector_add(struct vector *, char *string);

/* Resize the array of strings to hold size entries.  Saves reallocation work
   in vector_add if it's known in advance how many entries there will be. */
void vector_resize(struct vector *, size_t size);

/* Reset the number of elements to zero, freeing all of the strings if the
   vector isn't shallow, but not freeing the strings array (to cut down on
   memory allocations if the vector will be reused). */
void vector_clear(struct vector *);

/* Free the vector and all resources allocated for it. */
void vector_free(struct vector *);

/* Split functions build a vector from a string.  vector_split splits on a
   specified character, while vector_split_whitespace splits on any sequence
   of whitespace.  If copy is true, a deep vector will be constructed;
   otherwise, the provided string will be destructively modified in place to
   insert nul characters between the strings.  If the vector argument is NULL,
   a new vector is allocated; otherwise, the provided one is reused.

   Empty strings will yield zero-length vectors.  Adjacent delimiters are
   treated as a single delimiter (zero-length strings are not added to the
   vector). */
struct vector *vector_split(char *, char sep, bool copy, struct vector *);
struct vector *vector_split_whitespace(char *, bool copy, struct vector *);

/* Build a string from a vector by joining its components together with the
   specified string as separator.  Returns a newly allocated string; caller is
   responsible for freeing. */
char *vector_join(struct vector *, const char *separator);

END_DECLS

#endif /* INN_VECTOR_H */
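
For anyone who wants to see the deep/shallow distinction in action, here is
a minimal sketch of how the basic functions declared above might be
implemented.  It is one possible reading of the comments in the header, not
the code referred to earlier.  The checked_realloc helper is hypothetical
(a stand-in for whatever checked allocator the real code uses), and the
growth policy and the shrinking behaviour of vector_resize are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct vector {
    size_t count;
    size_t allocated;
    char **strings;
    bool shallow;
};

/* Hypothetical helper: realloc that aborts on failure. */
static void *
checked_realloc(void *p, size_t size)
{
    p = realloc(p, size);
    if (p == NULL && size > 0) {
        fputs("out of memory\n", stderr);
        abort();
    }
    return p;
}

struct vector *
vector_new(bool shallow)
{
    struct vector *v = checked_realloc(NULL, sizeof(*v));

    v->count = 0;
    v->allocated = 0;
    v->strings = NULL;
    v->shallow = shallow;
    return v;
}

/* Frees the strings of a deep vector but keeps the array for reuse. */
void
vector_clear(struct vector *v)
{
    size_t i;

    if (!v->shallow)
        for (i = 0; i < v->count; i++)
            free(v->strings[i]);
    v->count = 0;
}

void
vector_resize(struct vector *v, size_t size)
{
    size_t i;

    /* Assumption: shrinking below count drops the trailing entries, freeing
       them first if the vector is deep. */
    if (size < v->count) {
        if (!v->shallow)
            for (i = size; i < v->count; i++)
                free(v->strings[i]);
        v->count = size;
    }
    if (size == 0) {
        free(v->strings);
        v->strings = NULL;
    } else {
        v->strings = checked_realloc(v->strings, size * sizeof(char *));
    }
    v->allocated = size;
}

void
vector_add(struct vector *v, char *string)
{
    char *copy;

    assert(v != NULL && string != NULL);
    /* Assumption: start at four entries and double when full. */
    if (v->count == v->allocated)
        vector_resize(v, v->allocated == 0 ? 4 : v->allocated * 2);
    if (v->shallow) {
        v->strings[v->count++] = string;
    } else {
        copy = checked_realloc(NULL, strlen(string) + 1);
        strcpy(copy, string);
        v->strings[v->count++] = copy;
    }
}

void
vector_free(struct vector *v)
{
    vector_clear(v);
    free(v->strings);
    free(v);
}
```

The memory reuse comes from vector_clear leaving allocated and strings
alone, so a vector that is cleared and refilled in a loop only touches the
allocator for the string copies (and not at all if it's shallow).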

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

