The Porter Stemming Algorithm
Glenn Scheper
glenn_scheper at
Thu Dec 8 07:34:47 CST 2005
I decided to stop surfing and get my hands dirty, my
feet wet. So first, I compiled this simple program:
The purpose of word stemming is to fold related words
into the same string for information-retrieval indexing.
Unique word count in GR (Gravity's Rainbow):
Before stemming: 28963
After stemming: 20504
(This count includes such memorable non-words as:
1 a-
1 a-a-a
51 a-and
1 a-and-
1 a-ballin'
1 a-bloomin'
1 a-bout
1 a-bustle
1 a-fracturing...)
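A count like the one above can be sketched in a few lines of Python. This is only an illustration, not the program used for this post: the tokenizing rule (lowercase runs of letters, digits, apostrophes and hyphens, which would let strings such as a-and through) is an assumption, and toy_stem is a made-up stand-in that just drops a final "s", not the Porter stemmer.

```python
import re

def tokenize(text):
    # Assumed rule: lowercase the text and keep runs of letters, digits,
    # apostrophes and hyphens as "words".
    return re.findall(r"[a-z0-9'-]+", text.lower())

def unique_counts(text, stem):
    # Return (unique words before stemming, unique strings after stemming).
    words = set(tokenize(text))
    stems = {stem(w) for w in words}
    return len(words), len(stems)

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: drop one trailing "s",
    # but leave words ending in "ss" alone.
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

sample = "the light, no lights; the face and the faces; glass"
before, after = unique_counts(sample, toy_stem)
print("Before stemming:", before)
print("After stemming:", after)
```

Any function from word to stem can be passed in place of toy_stem, so a real Porter implementation would slot in the same way.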
Here is an example of applying the Porter stemmer to GR:
=== before stemming: ===
a screaming comes across the sky
a screaming comes across the sky.
it has happened before,
but there is nothing to compare it to now.
it is too late.
the evacuation still proceeds,
but it's all theatre.
there are no lights inside the cars.
no light anywhere.
above him lift girders old as an iron queen,
and glass somewhere far above that would let the light of day through.
but it's night.
he's afraid of the way the glass will fall soon it will be a spectacle:
the fall of a crystal palace.
but coming down in total blackout,
without one glint of light,
only great invisible crashing.
inside the carriage,
which is built on several levels,
he sits in velveteen darkness,
with nothing to smoke,
feeling metal nearer and farther rub and connect,
steam escaping in puffs,
a vibration in the carriage's frame,
a poising,
an uneasiness,
all the others pressed in around,
feeble ones,
second sheep,
all out of luck and time:
old veterans still in shock from ordnance 20 years obsolete,
hustlers in city clothes,
exhausted women with more children than it seems could belong to anyone,
stacked about among the rest of the things to be carried out to salvation.
only the nearer faces are visible at all,
and at that only as half-silvered images in a view finder,
green-stained vip faces remembered behind bulletproof windows speeding through the city....
they have begun to move.
they pass in line,
out of the main station,
out of downtown,
and begin pushing into older and more desolate parts of the city.
is this the way out?
faces turn to the windows,
but no one dares ask,
not out loud.
rain comes down.
this is not a disentanglement from,
but a progressive knotting into they go in under archways,
secret entrances of rotted concrete that only looked like loops of an underpass ...
certain trestles of blackened wood have moved slowly by overhead,
and the smells begun of coal from days far to the past,
smells of naphtha winters,
of sundays when no traffic came through,
of the coral-like and mysteriously vital growth,
around the blind curves and out the lonely spurs,
a sour smell of rolling-stock absence,
of maturing rust,
developing through those emptying days brilliant and deep,
especially at dawn,
with blue shadows to seal its passage,
to try to bring events to absolute zero ...
=== after stemming: ===
a scream come across the sky
a scream come across the sky.
it ha happen befor,
but there is noth to compar it to now.
it is too late.
the evacu still proce,
but it's all theatr.
there ar no light insid the car.
no light anywher.
abov him lift girder old as an iron queen,
and glass somewher far abov that would let the light of dai through.
but it's night.
he's afraid of the wai the glass will fall soon it will be a spectacl:
the fall of a crystal palac.
but come down in total blackout,
without on glint of light,
onli great invis crash.
insid the carriag,
which is built on sever level,
he sit in velveteen dark,
with noth to smoke,
feel metal nearer and farther rub and connect,
steam escap in puff,
a vibrat in the carriag's frame,
a pois,
an uneasi,
all the other press in around,
feebl on,
second sheep,
all out of luck and time:
old veteran still in shock from ordnanc 20 year obsolet,
hustler in citi cloth,
exhaust women with more children than it seem could belong to anyon,
stack about among the rest of the thing to be carri out to salvat.
onli the nearer face ar visibl at all,
and at that onli as half-silver imag in a view finder,
green-stain vip face rememb behind bulletproof window speed through the citi....
thei have begun to move.
thei pass in line,
out of the main station,
out of downtown,
and begin push into older and more desol part of the citi.
is thi the wai out?
face turn to the window,
but no on dare ask,
not out loud.
rain come down.
thi is not a disentangl from,
but a progress knot into thei go in under archwai,
secret entranc of rot concret that onli look like loop of an underpass ...
certain trestl of blacken wood have move slowli by overhead,
and the smell begun of coal from dai far to the past,
smell of naphtha winter,
of sundai when no traffic came through,
of the coral-like and mysteri vital growth,
around the blind curv and out the lone spur,
a sour smell of roll-stock absenc,
of matur rust,
develop through those empti dai brilliant and deep,
especi at dawn,
with blue shadow to seal it passag,
to try to bring event to absolut zero ...
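Some of the stems above (dai, thi, citi, thei, with sky and try left alone) are what two early rules of Porter's algorithm, usually called steps 1a and 1c, would produce. The Python sketch below covers only those two rules. It is not a full Porter stemmer and not the program used for this post, and the function names are invented for illustration.

```python
def is_consonant(word, i):
    # Porter's definition: a, e, i, o, u are vowels; y counts as a
    # consonant only at the start of a word or after a vowel.
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def step_1a(word):
    # Plural rules: SSES -> SS, IES -> I, SS -> SS, S -> (nothing).
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

def step_1c(word):
    # Final y becomes i when the part before it contains a vowel.
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word

def partial_stem(word):
    # Steps 1a and 1c only; step 1b and steps 2 to 5 are left out.
    return step_1c(step_1a(word))

for w in ["days", "this", "city", "they", "sky", "try"]:
    print(w, "->", partial_stem(w))
```

The same rules explain less welcome results such as this becoming thi and its becoming it: the stemmer strips the final s without knowing anything about the word.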
Yours truly,
Glenn Scheper
glenn_scheper + at +
Copyleft(!) Forward freely.