sed and encodings

Julien ÉLIE julien at trigofacile.com
Mon Jan 19 20:40:38 UTC 2009


Hi William,

>    sed 's/.*/x/'
>
> normally should replace an arbitrary string by a single x.
> The dot, however, does not match non-Ascii characters any more!
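
For reference, here is a small sketch of how the behaviour can be checked. It is not taken from docheckgroups. The sample() helper and the en_US.UTF-8 locale name are assumptions made for the illustration, and the output of the second command depends on the sed implementation and on the locales installed.

```shell
#!/bin/sh
# Hypothetical helper: one line containing a lone 0xE9 byte (Latin-1 "e acute"),
# which is not valid UTF-8.  The octal escape keeps this script pure ASCII.
sample() {
    printf 'caf\351\n'
}

# In the C locale every byte is a single character, so the dot matches all of
# them and the whole line is replaced.
sample | LC_ALL=C sed 's/.*/x/'

# In a UTF-8 locale the result may differ; od shows the exact bytes produced.
# en_US.UTF-8 is an assumption: substitute any UTF-8 locale installed locally.
sample | LC_ALL=en_US.UTF-8 sed 's/.*/x/' | od -c
```

Forcing the C locale is only shown as a point of comparison; the patch below takes a different route.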

Could you please try the attached docheckgroups file?
(You only need to change the path to innshellvars on line 2.)

I replaced every sed expression that relied on .* with cut or perl.
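
As a rough sketch of the cut-based replacement (the first_field name and the sample line are made up for the illustration; only the cut pipeline itself comes from the patch):

```shell
#!/bin/sh
# Hypothetical helper, not part of docheckgroups: keep only the text before
# the first tab, then before the first space, as the patched pipeline does.
first_field() {
    cut -f1 | cut -f1 -d' '
}

# Made-up checkgroups-style line: group name, a tab, then a description that
# contains a non-ASCII byte (0xE9).  The octal escape keeps the script ASCII.
printf 'fr.test\tcaf\351 du commerce\n' | first_field
```

With single-byte delimiters, cut only has to find the tab or the space, so the bytes after the delimiter should not need to be valid characters in the current locale.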



[Patch where tabs are *not* preserved.]

Index: docheckgroups.in
===================================================================
--- docheckgroups.in    (revision 8292)
+++ docheckgroups.in    (working copy)
@@ -28,26 +28,25 @@

 ##  Get the top-level newsgroup names from the message and turn it into
 ##  an egrep pattern.
-PATS=`${SED} <${T}/$$msg \
-        -e 's/[        ].*//' -e 's/\..*//' \
-        -e 's/^!//' -e '/^$/d' \
+PATS=`cut -f1 <${T}/$$msg | cut -f1 -d' ' | cut -f1 -d'.' | \
+        ${SED} -e 's/^!//' -e '/^$/d' \
         -e 's/^/^/' -e 's/$/[.         ]/' \
     | ${SORT} -u \
     | (tr '\012' '|' ; echo '' )\
     | ${SED} -e 's/|$//'`

 ##  Check for missing and obsolete newsgroups in active.
-${EGREP} "${PATS}" ${ACTIVE} | ${EGREP} "${1:-.}" | ${SED} 's/ .*//' | ${SORT} >${T}/$$active
-${EGREP} "${PATS}" ${T}/$$msg | ${EGREP} "${1:-.}" | ${SED} 's/[       ].*//' | ${SORT} >${T}/$$newsgrps
+${EGREP} "${PATS}" ${ACTIVE} | ${EGREP} "${1:-.}" | cut -f1 -d' ' | ${SORT} >${T}/$$active
+${EGREP} "${PATS}" ${T}/$$msg | ${EGREP} "${1:-.}" | cut -f1 | cut -f1 -d' ' | ${SORT} >${T}/$$newsgrps

 comm -13 ${T}/$$active ${T}/$$newsgrps >${T}/$$missing
 comm -23 ${T}/$$active ${T}/$$newsgrps >${T}/$$remove

 ##  Check for proper moderation flags in active (we need to be careful
 ##  when dealing with wire-formatted articles manually fed from the spool).
-${EGREP} "${PATS}" ${ACTIVE} | ${EGREP} "${1:-.}" | ${SED} -n '/ m$/s/ .*//p' | ${SORT} >${T}/$$amod.all
+${EGREP} "${PATS}" ${ACTIVE} | ${EGREP} "${1:-.}" | ${PERL} -n -e 'if (/ m$/) {s/ .*//; print $_;}' | ${SORT} >${T}/$$amod.all
 ${EGREP} "${PATS}" ${T}/$$msg | ${EGREP} "${1:-.}" | ${SED} 's/\r\?$//' \
-    | ${SED} -n '/ (Moderated)$/s/[    ].*//p' | ${SORT} >${T}/$$ng.mod
+    | ${PERL} -n -e 'if (/ \(Moderated\)$/) {s/[       ].*//; print $_;}' | ${SORT} >${T}/$$ng.mod

 comm -12 ${T}/$$missing ${T}/$$ng.mod >${T}/$$add.mod
 comm -23 ${T}/$$missing ${T}/$$ng.mod >${T}/$$add.unmod


-- 
Julien ÉLIE

"The chain of marriage is so heavy that it takes two to carry it,
  often three." (Alexandre Dumas)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: docheckgroups
Type: application/octet-stream
Size: 6412 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/inn-workers/attachments/20090119/b5aacf83/attachment.obj>

