HDR/OVER/XPAT and keyword generation

Julien ÉLIE julien at trigofacile.com
Sun Nov 1 16:52:50 UTC 2009


Hi Nix,

> I thought the point of keyword generation was that it kicked in only
> for those posts that *don't* already have a keywords header?
>
> ... but, hm, it looks like it appends generated keywords to the
> existing keyword header value. That's really not ideal, is it :/

It appears that keyword generation is done for every article, even
one that already has a Keywords: header.
Like you, I tend to think that it should not regenerate keywords for
such articles.

We can change the behaviour of keyword generation.
Does anyone think we should keep regenerating the Keywords: header
of articles which already have one?
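
To make the proposed behaviour concrete, here is a minimal sketch of
"generate only when there is no Keywords: header".  It is not INN's
actual code:  the function names needs_generated_keywords() and
overview_keywords() are hypothetical, and treating a whitespace-only
header as missing is my assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not INN's actual code:  returns true when the
 * article carries no usable Keywords: header, that is, when the header
 * is absent (NULL) or contains only whitespace (an assumption). */
static bool
needs_generated_keywords(const char *existing)
{
    if (existing == NULL)
        return true;
    for (; *existing != '\0'; existing++)
        if (!isspace((unsigned char) *existing))
            return false;
    return true;
}

/* Hypothetical helper:  choose the value stored in the overview.  A
 * human-written header is kept untouched; generated keywords are used
 * only as a fallback. */
static const char *
overview_keywords(const char *existing, const char *generated)
{
    assert(generated != NULL);
    return needs_generated_keywords(existing) ? generated : existing;
}
```

With such a check, an article that arrives with its own Keywords:
header would keep exactly that value in the overview, and nothing
would be appended to it.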



>> "HDR keywords" would give the contents of the real Keywords: header
>> and "HDR :keywords" would give the contents of the computed Keywords:
>> header...
>> But I doubt clients know how to deal with that...
>
> I was assuming that the point of the keyword generation was that it
> made those articles which didn't have keywords look as if they did,
> leaving those which *did* have human-written keywords alone. But it
> looks like that's not quite what it does...

We do not modify the article, only its overview data, as the INN 2.5.1
documentation says:  "In order to use this feature, the regex
library should be available and INN configured with the --enable-keywords
flag.  Otherwise, no keywords will be generated, even though this boolean
value is set to true.  You also have to add the integration of the
Keywords: header into the overview with extraoverviewadvertised or
extraoverviewhidden."

Note that all versions of INN also mention that "INN has optional support
for generating keyword information automatically from article body text
and putting that information in *overview* for the use of clients that know
to look for it."

My point was that HDR and XPAT also return that overview data.  I do not
know whether they should, though I lean towards yes.



Incidentally, I have just caught another bug in keyword generation,
depending on the value of the keylimit: inn.conf parameter.

Just set it to "-5" and innd will die with:
    SERVER cant malloc 4294967292 bytes at keywords.c line 138
because of an xmalloc() called with a negative size.

Also, just set it to "5" and wait for an article containing
a Keywords: header.  (It also crashes with an article containing a
very long Keywords: header, with the default "512" value.)
This time it is because of a memcpy() that copies more than the
buffer can hold.
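
The size in that error message is consistent with keylimit + 1 wrapping
around as a 32-bit unsigned value (-5 + 1 = -4, that is 2^32 - 4 =
4294967292); that is my inference from the number, not a reading of
keywords.c.  Below is a minimal sketch of the two guards that would
avoid both crashes.  The names wrapped_size32(), sane_keylimit() and
copy_keywords_bounded() are hypothetical, and falling back to 512 for
a non-positive value is an assumption, not a decided behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_KEYLIMIT 512 /* the default value mentioned above */

/* What a size of keylimit + 1 becomes once converted to a 32-bit
 * unsigned quantity:  -5 + 1 = -4 wraps to 2^32 - 4. */
static uint32_t
wrapped_size32(int keylimit)
{
    return (uint32_t) (keylimit + 1);
}

/* Hypothetical helper:  fall back to the default for non-positive
 * values so that they never reach an allocator (assumed policy). */
static size_t
sane_keylimit(long keylimit)
{
    return keylimit > 0 ? (size_t) keylimit : DEFAULT_KEYLIMIT;
}

/* Hypothetical helper:  copy an existing Keywords: value into a freshly
 * allocated buffer of limit + 1 bytes, truncating instead of
 * overflowing.  Returns NULL if the allocation fails; the caller frees
 * the result. */
static char *
copy_keywords_bounded(const char *header, size_t limit)
{
    size_t len;
    char *buf;

    assert(header != NULL);
    len = strlen(header);
    if (len > limit)
        len = limit;
    buf = malloc(limit + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, header, len);
    buf[len] = '\0';
    return buf;
}
```

With keylimit set to "5", a long Keywords: header would then simply be
truncated to its first five bytes instead of overrunning the buffer.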

-- 
Julien ÉLIE

« -- Tu dois avoir un messager zélé autant qu'ailé
  pour faire rapidement le trajet.
  -- Oui ! et c'est une fine mouche ! » (Astérix) 



