brainstorming about shared memory and configuration files (esp. for sm -d)

Fri Apr 27 02:51:26 UTC 2001

xload showed a humongous load spike.

I looked through the log files and apparently the load spike
was due to dozens of instances of sm being called, not sure from
where, as part of handling of a burst of control files.

I assume this is normal.  I also suppose this kind of thing is
what kept killing my ovdb, and hope that moving the log files
to a different spindle, as suggested on the ovdb man page, will
help. But I digress.

Each sm starts by reading the control file, then it does what
it is told to do, then exits.  So a burst of cancels would cause
the load spike.

I wonder these things:

1:  if the sm tool could queue up commands instead of seeing
to them directly, possibly with a -defer switch

2:  if shared memory segments could be used for these purposes:

	a: eliminating configuration file reading, by passing
	   a segment ID of a previously read configuration to each
	   tool, or having a one-line file that the segment ID can
	   be read from, eliminating the parsing step (the segment
	   would hold the struct conf_vars; a compliant program
	   would, on error, read the file, share the segment, and
	   rewrite the (hypothetical) $pathdb/shmid/inn.conf file)

	b: a fast queueing system.
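
A minimal sketch of idea 2a, in Python rather than INN's C. Everything
named here is hypothetical: parse_conf, publish_conf, load_conf, and the
one-line ID file are illustrations, and JSON stands in for a raw
struct conf_vars (a C version would copy the struct itself, so there would
be no decode step at all). It assumes inn.conf uses "key: value" lines.

```python
import json
from multiprocessing import shared_memory


def parse_conf(path):
    """Slow path: parse 'key: value' lines into a dict."""
    conf = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition(":")
                conf[key.strip()] = value.strip()
    return conf


def publish_conf(conf, id_path):
    """Copy the parsed config into a new segment and record its name."""
    payload = json.dumps(conf).encode()
    seg = shared_memory.SharedMemory(create=True, size=4 + len(payload))
    seg.buf[:4] = len(payload).to_bytes(4, "little")  # length prefix
    seg.buf[4:4 + len(payload)] = payload
    with open(id_path, "w") as fh:
        fh.write(seg.name + "\n")  # the one-line "segment ID" file
    return seg  # whoever publishes is responsible for unlink() later


def load_conf(conf_path, id_path):
    """Fast path: attach to the segment; on error, re-read and republish."""
    try:
        with open(id_path) as fh:
            name = fh.read().strip()
        seg = shared_memory.SharedMemory(name=name)
    except (OSError, ValueError):
        conf = parse_conf(conf_path)
        return conf, publish_conf(conf, id_path)
    length = int.from_bytes(seg.buf[:4], "little")
    return json.loads(bytes(seg.buf[4:4 + length])), seg
```

The first caller pays for the parse and publishes; later callers only read
the ID file and attach. Who gets to unlink the segment, and when, is left
open here just as it is in the idea itself.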

I do not know what kind of locking sm uses, so it is possible that
a dozen sm utilities running at the same time are all simply waiting
for the lock instead of using any CPU, in which case the problem is
simply with the OS for showing a blocked process as part of the load.

If not, if it is busy-waiting, that could be changed, for instance by
designating a temporary directory to copy sm directives to, one instance
per file, moving the directives to a second directory, and having an smd
that checks the second directory every few seconds and serializes the
directives, in effect doing a Maildir delivery to the daemon. We might lose
or even exacerbate elevator issues this way, by giving the fs Yet More to
do. I imagine the directories would live w/in pathdb, next to history, and
be called sm_queue and sm_queue_temp, and that few files in there would
live long enough to ever get written to disk.
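
A rough sketch of that Maildir-style handoff, in Python for brevity. The
function names enqueue and drain, the file-naming scheme, and the handle
callback are all made up for illustration; a real smd would call into the
storage API instead of a callback, and would sleep between passes.

```python
import os
import tempfile
import time


def enqueue(directive, temp_dir, queue_dir):
    """Write one directive to its own file, then rename it into the queue."""
    # Timestamp plus pid in the prefix keeps names unique across processes.
    prefix = f"{time.time_ns()}.{os.getpid()}."
    fd, tmp_path = tempfile.mkstemp(dir=temp_dir, prefix=prefix)
    with os.fdopen(fd, "w") as fh:
        fh.write(directive + "\n")
    final_path = os.path.join(queue_dir, os.path.basename(tmp_path))
    # rename() is atomic when both directories are on the same filesystem,
    # so the daemon never sees a half-written directive.
    os.rename(tmp_path, final_path)
    return final_path


def drain(queue_dir, handle):
    """One daemon pass: process queued directives one at a time."""
    count = 0
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        with open(path) as fh:
            handle(fh.read().strip())
        os.remove(path)
        count += 1
    return count
```

Because only the daemon ever touches storage, the directives are serialized
no matter how many sm instances enqueue at once.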

Synchronizing them all by doing what I described above, but into data
structures in a big shared memory block instead of via the FS, would be
sneakier.  The first sm process would want to stay alive as the daemon,
after starting out by creating the Big Shared Memory Block, writing its ID
to ${pathdb}/sm.shmid and queuing up whatever it is supposed to do for
itself when it switches hats.

If the queueing system gets full (!) (how big is a token, anyway?) sm
could open a second shared block or could start blocking for a chance to
write into the queue.

Say we use a queue block with space in it for 200 tokens.  sm takes its pid
mod 200 and checks to see if that space is empty (set to all 0 -- is that a
valid article ID?) and if so, writes the tokens it wants deleted into its
hashed slot and exits.

The sm server process checks all 200 slots until they are all zeros, sleeps
a second, tries again.  When it finds a token in a slot, it deletes the
file (using sm as it is now).
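
Here is a small model of the 200-slot scheme, with a plain bytearray
standing in for the shared segment. The names try_post and sweep are
invented, the 16-byte token size is an assumption, and all-zero is assumed
to mean "empty". Note that the check-then-write in try_post is not atomic,
so two clients hashing to the same slot could still collide.

```python
TOKEN_SIZE = 16            # assumed token size in bytes
NUM_SLOTS = 200
EMPTY = bytes(TOKEN_SIZE)  # all zeros marks a free slot


def try_post(table, pid, token):
    """Client side: write token into slot pid % NUM_SLOTS if it is empty."""
    start = (pid % NUM_SLOTS) * TOKEN_SIZE
    if bytes(table[start:start + TOKEN_SIZE]) != EMPTY:
        return False  # slot occupied; caller must retry or fall back
    table[start:start + TOKEN_SIZE] = token
    return True


def sweep(table, delete):
    """Server side: one pass over all slots, deleting and zeroing each hit."""
    found = 0
    for slot in range(NUM_SLOTS):
        start = slot * TOKEN_SIZE
        token = bytes(table[start:start + TOKEN_SIZE])
        if token != EMPTY:
            delete(token)  # stand-in for the real article deletion
            table[start:start + TOKEN_SIZE] = EMPTY
            found += 1
    return found
```

The server would call sweep in a loop and sleep for a second whenever a
pass returns 0.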

Maybe slots have space in them for many tokens. Maybe locking of some kind
is required before slotting tokens.

Tokens are 16 bytes, which means an 8k segment could hold 512 of them.

Mod your pid by 128 and multiply by four and start putting tokens in
zeroed slots.  If you get all the way back around to where you started and
you still have tokens left to delete, open your own shared segment and
rewrite $dbfiles/shmid/sm
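
The arithmetic and the wrap-around probing can be sketched like this, again
with a bytearray in place of the real segment. post_tokens is a made-up
name, and what to do with the leftovers (opening a fresh segment) is not
shown.

```python
SEGMENT_BYTES = 8192
TOKEN_SIZE = 16
NUM_SLOTS = SEGMENT_BYTES // TOKEN_SIZE  # 8192 / 16 = 512 slots
EMPTY = bytes(TOKEN_SIZE)


def post_tokens(table, pid, tokens):
    """Fill empty slots starting at (pid % 128) * 4, wrapping around once.

    Returns the tokens that did not fit, which is the cue to open
    another segment.
    """
    pending = list(tokens)
    start = (pid % 128) * 4  # 128 starting points, four slots apart
    for step in range(NUM_SLOTS):
        if not pending:
            break
        off = ((start + step) % NUM_SLOTS) * TOKEN_SIZE
        if bytes(table[off:off + TOKEN_SIZE]) == EMPTY:
            table[off:off + TOKEN_SIZE] = pending.pop(0)
    return pending
```

Spacing the starting points four slots apart gives each process a little
room before it runs into its neighbour's tokens.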

If "copy to destination address if value there is zero and return success"
is not atomic enough (is it?) for use in SMP architectures --- hmmm ---
semaphores can be atomically incremented and can hold integers, so a
semaphore could indicate which slot in the deletion table is the next one
to write to. A second semaphore is needed to do locking on the first
semaphore, which makes using the semaphore for anything other than
controlling mutexes silly; we would use a semaphore to control access to
incrementing the slot counter.  So a second or later sm process, instead
of directly dealing with storage to delete an article, would do this,
after reading in the semaphore and shm IDs from a configuration file:

	Acquire the mutex
	find a zero slot and copy the token into it, for all tokens
	we are to schedule deletions for
	release the mutex
	exit.
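
Those steps might look roughly like this. The sketch is in Python, and the
lock argument is a stand-in for the semaphore-based mutex: any object usable
in a "with" statement works, for example multiprocessing.Lock() between
processes or threading.Lock() between threads. schedule_deletions is a
hypothetical name and the bytearray again models the shared segment.

```python
TOKEN_SIZE = 16
EMPTY = bytes(TOKEN_SIZE)


def schedule_deletions(table, lock, tokens):
    """Acquire the mutex, copy each token into a zeroed slot, release.

    Returns any tokens that found no free slot.
    """
    pending = list(tokens)
    with lock:  # acquire on entry, release on exit, even on error
        for off in range(0, len(table), TOKEN_SIZE):
            if not pending:
                break
            if bytes(table[off:off + TOKEN_SIZE]) == EMPTY:
                table[off:off + TOKEN_SIZE] = pending.pop(0)
    return pending
```

With the mutex held, the "is it zero? then write" pair no longer needs to
be atomic on its own, since no other writer can be between the check and
the copy. The server would have to take the same mutex while zeroing slots.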

http://www.p-nand-q.com/linux/mutex.htm provides some alleged example code
of setting and releasing a semaphore-based mutex.

I just want to smooth the performance so you can stack up a hundred
sm -d calls and you don't get as high of a load spike.

what think everyone?

-- 
                      David Nicol 816.235.1187 dnicol at cstp.umkc.edu
                                      and they all say "yodelahihu"