WordsEx
Glenn Scheper
glenn_scheper at earthlink.net
Thu Jul 5 14:42:35 CDT 2007
> hey, I finally got a chance to play with WordsEx.
>
> It probably does more than this, but I was thrilled
> to be able to pull in the W.A.S.T.E. p-list archive
> from the whole month of March 1997 and browse around
> by clicking on titles! Next up: using it intelligently...
>
> This is something I've daydreamed about implementing
> in Lisp through emacs, but haven't really learned enough
> of either to even start.
>
> question: is this use considerate of the W.A.S.T.E. server -
> i.e., am I doing something harmful or selfish pulling these in?
>
> Finally - yes, the green is very easy on the eyes.
>
> Thanks, Glenn!
> forwarding freely,
> the program is at http://home.earthlink.net/~glenn_scheper/
Thanks Mike!
WASTE is one of the "search engines" that I have examined in
designing WordsEx. The web page has a search form, and there
is no login password required, so it can be used by WordsEx.
Here is a typical sequence of HTML tags used in a FORM:
<base href="http://www.waste.org/pynchon-l/">
<form method="post" action="http://waste.org/mail/?list=pynchon-l">
<input type="hidden" name="list" value="pynchon-l">
<input name="keywords" value="" size=40> (no type given, so it defaults to type="text")
<input type="submit" value="Search">
</form>
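As a rough illustration of how a tool might pull the method, action, and input fields out of a form like the one above, here is a minimal Python sketch. It is not WordsEx's own code; the names FormCollector, collect_forms, and SAMPLE are made up for this example, and it relies only on the standard html.parser module.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each form's method, action, and input fields (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> seen
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "get").upper(),
                "action": attrs.get("action") or "",
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # An input with no type attribute defaults to type="text".
            kind = (attrs.get("type") or "text").lower()
            self._current["inputs"].append(
                (kind, attrs.get("name") or "", attrs.get("value") or "")
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# The form from the WASTE search page, as quoted above.
SAMPLE = """
<form method="post" action="http://waste.org/mail/?list=pynchon-l">
<input type="hidden" name="list" value="pynchon-l">
<input name="keywords" value="" size=40>
<input type="submit" value="Search">
</form>
"""


def collect_forms(html):
    """Parse html and return the list of forms found."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


for form in collect_forms(SAMPLE):
    print(form["method"], form["action"])
    for kind, name, value in form["inputs"]:
        print(" ", kind, name + "=" + value)
```

The sketch ignores select and textarea fields, which this particular form does not use.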
So I run WordsEx, and do:
Add, One Web Page, and specify URL:
http://www.waste.org/mail/?list=pynchon-l
As an aside, I do:
Help, Memory Stores
and find that I have encountered and saved "1 html forms"
Then I do:
File, Save Query Forms, and specify a filename, "wq.txt"
Then I edit wq.txt, finding originally:
...lots of text about the file format...
...and the current paucity of 8 embedded search urls, a list soon to grrOOOWWWW!!!....
GET 001 http://websearch.cs.com/cs/search?fromPage=cscom&uType=5049181&query=
GET 002 http://www.google.com/search?hl=en&num=100&ie=ISO-8859-1&btnG=Google+Search&q=
GET 003 http://search.looksmart.com/p/search?tb=dir&qt=
GET 004 http://monstercrawler.com/beta-bin/nph-beta.pl?qry=
GET 005 http://ixquick.com/do/metasearch.pl?cat=web&cat=web&cmd=process_search&language=english&query=
GET 006 http://www.ask.com/web?o=0&l=dir&q=
GET 007 http://gigablast.com/search?n=100&q=
GET 008 http://www.altavista.com/web/results?nbq=50&itag=wrx&kgs=0&kls=0&q=
... and finally, the following data about the HTML FORM
that was encountered fetching the WASTE search page URL:
Location: http://www.waste.org/mail/
--Specimens--
FYI 000 http://www.waste.org/mail/?donename=mailing lists&doneurl=/mail/&list=pynchon-l&action=subscribe&addr=
FYI 000 http://www.waste.org/mail/?list=pynchon-l&keywords=
--Alternatives--
ACTION
ACTION /mail/
IN_CHECKBOX digest=
IN_HIDDEN donename=mailing lists
IN_HIDDEN doneurl=/mail/
IN_HIDDEN list=pynchon-l
IN_RADIO action=subscribe
IN_RADIO action=unsubscribe
IN_SUBMIT =
IN_TEXT addr=
IN_TEXT keywords=
METHOD POST
So while I am out editing the saved file wq.txt,
I replace the entire contents of the wq.txt file
with this single line that I have put together:
GET 001 http://www.waste.org/mail/?list=pynchon-l&keywords=
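To make the role of that single line concrete, here is a small Python sketch of how a line of the form "GET 001 url-prefix" could be turned into a query URL by appending the search topic. It assumes the fields are separated by whitespace and that the topic is URL-encoded before being appended; neither detail is confirmed for WordsEx, and the function names are invented for this example.

```python
from urllib.parse import quote_plus


def parse_engine_line(line):
    """Split 'GET 001 http://...' into (method, number, url_prefix).

    Assumes whitespace-separated fields; the prefix keeps any later spaces.
    """
    method, number, url_prefix = line.strip().split(None, 2)
    return method, number, url_prefix


def build_query_url(url_prefix, topic):
    """Append the URL-encoded topic to a prefix that ends in 'name='."""
    return url_prefix + quote_plus(topic)


line = "GET 001 http://www.waste.org/mail/?list=pynchon-l&keywords="
method, number, prefix = parse_engine_line(line)
print(build_query_url(prefix, "wordsex"))
# http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex
```

Because the query word is simply appended, the search field (here keywords=) has to be the last parameter in the prefix.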
And I again run WordsEx, and first do:
File, Load Search Engines, and specify that filename, "wq.txt"
Now I can query WASTE, which is the only active search engine,
since it has replaced the original list of 8 engines, so I do:
Add, Internet Search, and specify a topic, say, "wordsex".
Doing Help, Memory Stores shows that I now have 15 web pages in memory.
Viewing the search thread result, I see that WordsEx
scraped 14 hits from the original result page.
From the form of the URLs, I see that 2 are to continuation result pages,
most others are to actual "hit" emails, and others are to overhead URLs,
which I could next eliminate by adding some NOT rules to the WQ.TXT file.
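The three kinds of URL described above can be told apart from their query parameters alone. The following Python sketch shows one way to do that; it is only an illustration, not the NOT-rule syntax of the WQ.TXT file, and it assumes that hit emails carry a msg= parameter and continuation pages carry page= together with keywords=, as the WASTE URLs in the result listing do.

```python
from urllib.parse import urlsplit, parse_qs


def classify(url):
    """Label a scraped link as 'hit', 'continuation', or 'overhead'."""
    query = parse_qs(urlsplit(url).query)
    if "msg" in query:
        return "hit"           # a single archived email
    if "page" in query and "keywords" in query:
        return "continuation"  # a further page of search results
    return "overhead"          # site home page, list index, and so on


urls = [
    "http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex&page=3",
    "http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115405&keywords=wordsex",
    "http://www.waste.org/",
    "http://www.waste.org/mail/?list=pynchon-l",
]
for url in urls:
    print(classify(url), url)
```

A filter built on this would keep the hits, queue the continuation pages for another fetch, and drop the overhead links.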
Since I am still working on searching, and have not yet implemented the
intelligent following of hit URLs and continuation URLs on result pages,
I would next view each of the continuation pages and, on each one, do:
Add, All Links on Page.
Then all of the hit results would be in memory. But today I will not.
Here, I'll copy and paste the entire search result view:
========================================================
Add Web Search:
wordsex
Thread started.
Trying search engine 001
http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 0 phrases, 0 words, 3 kb, 0 terms, 14 links.
Query 001 took 2 seconds, produced 14 hits, 14 novel.
Now fetching the 14 novel hit pages.
http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex&page=3
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 7 phrases, 124 words, 1 kb, 92 terms, 7 links.
scheper glenn bytes vineland mikebailey june wrote wed jun just gmt miscellany
happened before notorious omitted pages major during reread results corner
cannot sensing famous getting iran note tore matches
http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex&page=2
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 18 phrases, 333 words, 3 kb, 181 terms, 14 links.
scheper glenn bytes wordsex gmt atdtda omitted type apr fri mikebailey neville
message egyptian alchemy mon wed jan got exe reef try earth original category
conservapedia encyclopedias unicode preservation
http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115405&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
53 score, 60 phrases, 969 words, 5 kb, 476 terms, 13 links.
him lew sin scheper glenn she her god when shall thought original remember
supposed everyone done sure like wife job night once mouth could wheel three
following before committed condemned pulled ferris
http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113877&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
8 score, 7 phrases, 133 words, 1 kb, 87 terms, 15 links.
scheper glenn yahoo wordsex waste exe monropolitan trepidation functionality
opened everyone omitted obliged message given discussions raving freeware todays
date update certain here dave beta pynchon
http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=114097&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
11 score, 16 phrases, 262 words, 2 kb, 155 terms, 15 links.
scheper glenn studies wordsex riddles riddle exe there pynchon authors
literature revelation literary immediately functionality several correlate
superficial however remarkable copyleft something specific
http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115754&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
30 score, 34 phrases, 550 words, 3 kb, 341 terms, 22 links.
scheper glenn wordsex text latin iso add url yet http limitation characters
coded pages reform usacii folder process eset file gave index them sich exe just
gions tue due ytt try www php accesibilidad
http://www.waste.org/mail/?list=pynchon-l&month=0706&msg=119190&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
10 score, 11 phrases, 248 words, 2 kb, 173 terms, 19 links.
scheper glenn html edu ferguson haraway cone few www http htm jacketmagazine
cyborgmanifesto literacy manifesto immanenscendence okeeffemuseum erasure
minutes everything rosamond modernpostmodern feminism
http://www.waste.org/mail/?list=pynchon-l&month=0704&msg=117018&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
14 score, 9 phrases, 204 words, 1 kb, 130 terms, 15 links.
scheper glenn wordsex watchman exe says sheep demolition evidence copyleft
binaries followed burritos omitted amazing professors controlled convincing
burrito suspicious quickening files freeware patriotsquestion
http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115134&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
10 score, 11 phrases, 204 words, 1 kb, 133 terms, 14 links.
scheper glenn dance fair chicago dancers timeline scandalized divided
interpretations relevant credited copyleft specific financial performed
containing village amalgam enraged fictitious given probably
http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=114850&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
18 score, 10 phrases, 361 words, 2 kb, 118 terms, 12 links.
matches text diagnostic file txt has this awake tokens stemmer scheper stemming
kin words glenn search thread yet memory awaked awaking containing concurrent
coded pages algorithm results porter started
http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113855&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
4 score, 3 phrases, 102 words, 1 kb, 73 terms, 13 links.
scheper glenn wordsex cache exe functionality especially freeware review
present todays currently date update certain items file pynchon index scrolling
previous waste arrow text then them jan fun horn
http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113812&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
52 score, 42 phrases, 870 words, 5 kb, 441 terms, 15 links.
scheper glenn tribulation tap metanoia upon god know thou revelation jesus
language wordsex knowledge abject wog them such exe which could chums unorthodox
happened autopoetic individuals divine another
http://www.waste.org/
waste
0000-00-00 / Unknown / Windows-1252
0 score, 0 phrases, 69 words, 1 kb, 69 terms, 53 links.
lostchocolatelab homepages oxymoron recipes lameness burnunit uvulas ivanish
apricot dimitri archives linux roadrunner embedded donors darkside azure engine
danger sensor egyptian unbroken ferret oracle
http://www.waste.org/mail/?list=pynchon-l
waste mailing lists / pynchon-l
0000-00-00 / Unknown / Windows-1252
42 score, 22 phrases, 578 words, 3 kb, 195 terms, 186 links.
mar jun jul sep nov dec jan aug apr oct waste pynchon feb list subscribe send
org may archives digest unsubscribe please instead then group message discussion
welcome replies whole reply leave dash want
Finished search engine 001
Thread ended.
-- Add Web Search:
========================================================
Next, I do:
Add, Word Search, and again specify as parameter, wordsex.
This is the result of that, choosing KWIC format. I could next
click on any KWIC result line to view the containing source text.
So that word-wrap during copy and paste does not uglify these
KWIC lines, I first set a very narrow window to reduce the size
of the KWIC context that gets presented in the results:
========================================================
Find Words:
wordsex
Thread started.
Search tokens, without stemming:
wordsex - 29 matches
Search tokens, matches after stemming:
wordsex - 29 matches
s mornings work on WordsEx went very smoothly. M
ed, dash, and made WordsEx emulate it when copyi
g them opened in in WordsEx, and clicked on each
---------- You know, WordsEx fetched all these page
ation jesus language wordsex knowledge abject wog
scheper glenn wordsex watchman exe says sh
cked on each URL. WordsEx also supplied this text-
08:00) (9626 bytes) WordsEx hey, I finally got a cha
b tool this weekend: WordsEx.exe">http://home.eart
cheper glenn studies wordsex riddles riddle exe ther
posts today, I used WordsEx to read a folder from
of them were mine. WordsEx.exe">http://home.eart
net/~glenn_scheper/WordsEx.exe Yours truly, Glenn
net/~glenn_scheper/WordsEx.exe Yours truly, Glenn
chance to play with WordsEx. It probably does mor
g connects. I asked WordsEx for "write under erasu
scheper glenn wordsex cache exe functionalit
hed in my freeware, WordsEx.exe, which, by the wa
ain fun functionality: WordsEx.exe">http://home.eart
scheper glenn wordsex text latin iso add url ye
n tool WordsEx.exe: WordsEx.exe">http://home.eart
are information tool WordsEx.exe: WordsEx.exe">h
net/~glenn_scheper/WordsEx.exe I put this URL in i
scheper glenn bytes wordsex gmt atdtda omitted typ
net/~glenn_scheper/WordsEx.exe I had a few good
net/~glenn_scheper/WordsEx.exe Especially, you c
3 bytes) NP: Try my WordsEx now. http://home.earth
> functionality: > > WordsEx.exe">http://home.eart
net/~glenn_scheper/WordsEx.exe Due to, I think it w
nformation). When I WordsEx'd that url, and asked t
net/~glenn_scheper/WordsEx.exe ... Glenn Scheper
cheper glenn yahoo wordsex waste exe monropolita
net/~glenn_scheper/WordsEx.exe ______________
onality every day... WordsEx.exe">http://home.eart
hon-l search results: wordsex 23 matches were foun
hon-l search results: wordsex 23 matches were foun
Thread ended.
-- Find Words:
========================================================
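For readers unfamiliar with KWIC (keyword in context) listings like the one above, here is a minimal Python sketch of the idea: each match is printed with a fixed amount of text on either side, so the keyword lines up in a column. It uses a plain case-insensitive substring match with no stemming, which is an assumption for the example and not a description of how WordsEx tokenizes; the function name kwic and the width parameter are invented here.

```python
import re


def kwic(text, word, width=20):
    """Return one context line per case-insensitive match of word."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    lines = []
    for m in re.finditer(re.escape(word), flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Pad the left context so every keyword starts in the same column.
        lines.append(left.rjust(width) + m.group(0) + right)
    return lines


sample = "hey, I finally got a chance to play with WordsEx. It probably does more"
for line in kwic(sample, "wordsex"):
    print(line)
```

A narrower width gives shorter context lines, which is the effect of shrinking the window before copying the results.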
So what else can WordsEx do?
On any of your longish emails, I do a select all and copy
from Outlook, then paste the whole email text into WordsEx for
easier reading using green, big fonts, and smooth scrolling.
One of WordsEx's text reduction strategies is to remove the
typical case of email quoting using right angle brackets,
as atop this email. I should try it, by saving this email
and querying for this unlikely search string: kskjebnkjj.
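The quote-removal strategy can be sketched in a few lines of Python. This is a simplified guess at the idea, not WordsEx's actual rule: it drops every line whose first non-blank character is a right angle bracket, and the function name strip_quoted is made up for the example.

```python
def strip_quoted(text):
    """Drop lines whose first non-blank character is '>'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    return "\n".join(kept)


email = "> hey, I finally got a chance to play with WordsEx.\n>\nThanks Mike!\n> Thanks, Glenn!\nWASTE is one of the search engines"
print(strip_quoted(email))
```

One caveat of so simple a rule is that an unquoted line that happens to begin with a right angle bracket is dropped as well.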
The last program update to my web site was June 20, but I
really am trying to work on it, and especially the search
engine URL choices (like 80 or so?) and techniques next.
As to the consumption of WASTE server resources, I'd say:
"Smoke 'em if you got 'em"
Yours truly,
Glenn Scheper
http://home.earthlink.net/~glenn_scheper/
glenn_scheper + at + earthlink.net
Copyleft(!) Forward freely.