WordsEx

Glenn Scheper glenn_scheper at earthlink.net
Thu Jul 5 14:42:35 CDT 2007


> hey, I finally got a chance to play with WordsEx.
> 
> It probably does more than this, but I was thrilled
> to be able to pull in the W.A.S.T.E. p-list archive
> from the whole month of March 1997 and browse around
> by clicking on titles!  Next up: using it intelligently...
> 
> This is something I've daydreamed about implementing
> in Lisp through emacs, but haven't really learned enough
> of either to even start.
> 
> question: is this use considerate of the W.A.S.T.E. server -
> i.e., am I doing something harmful or selfish pulling these in?
> 
> Finally - yes, the green is very easy on the eyes.
> 
> Thanks, Glenn!
> forwarding freely,
> the program is at http://home.earthlink.net/~glenn_scheper/


Thanks Mike!

WASTE is one of the "search engines" that I have examined in
designing WordsEx. The web page has a search form, and there
is no login password required, so it can be used by WordsEx.

Here is a typical sequence of HTML tags used in a FORM:
<base href="http://www.waste.org/pynchon-l/">
<form method="post" action="http://waste.org/mail/?list=pynchon-l">
<input type="hidden" name="list" value="pynchon-l">
<input name="keywords" value="" size=40>   (no type attribute, so it defaults to type="text")
<input type="submit" value="Search">
</form>

So I run WordsEx, and do:
	Add, One Web Page, and specify URL:
	http://www.waste.org/mail/?list=pynchon-l

As an aside, I do:
	Help, Memory Stores
and find that I have encountered and saved "1 html forms"

Then I do:
	File, Save Query Forms, and specify a filename, "wq.txt"

Then I edit wq.txt, finding originally:

	...lots of text about the file format...

	...and the current paucity of 8 embedded search urls, a list soon to grrOOOWWWW!!!....

GET 001 http://websearch.cs.com/cs/search?fromPage=cscom&uType=5049181&query=
GET 002 http://www.google.com/search?hl=en&num=100&ie=ISO-8859-1&btnG=Google+Search&q=
GET 003 http://search.looksmart.com/p/search?tb=dir&qt=
GET 004 http://monstercrawler.com/beta-bin/nph-beta.pl?qry=
GET 005 http://ixquick.com/do/metasearch.pl?cat=web&cat=web&cmd=process_search&language=english&query=
GET 006 http://www.ask.com/web?o=0&l=dir&q=
GET 007 http://gigablast.com/search?n=100&q=
GET 008 http://www.altavista.com/web/results?nbq=50&itag=wrx&kgs=0&kls=0&q=

	... and finally, the following data about the HTML FORM
        that was encountered fetching the WASTE search page URL:

Location: http://www.waste.org/mail/

  --Specimens--
FYI 000 http://www.waste.org/mail/?donename=mailing lists&doneurl=/mail/&list=pynchon-l&action=subscribe&addr=
FYI 000 http://www.waste.org/mail/?list=pynchon-l&keywords=

  --Alternatives--
  ACTION  
  ACTION /mail/ 
  IN_CHECKBOX digest=
  IN_HIDDEN donename=mailing lists
  IN_HIDDEN doneurl=/mail/
  IN_HIDDEN list=pynchon-l
  IN_RADIO action=subscribe
  IN_RADIO action=unsubscribe
  IN_SUBMIT =
  IN_TEXT addr=
  IN_TEXT keywords=
  METHOD POST
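
As a rough illustration of how a form listing like the one above could be
turned into a GET-style specimen URL, here is a minimal Python sketch. It is
not WordsEx's actual code. The names FormFields and specimen_url are
hypothetical, and the rule used (resolve the ACTION against the page URL,
append the hidden fields, then leave each text field empty at the end) is an
assumption inferred from the specimens shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFields(HTMLParser):
    """Collect the action, method, and input fields of a form on a page."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.inputs = []  # (type, name, value) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "GET").upper()
        elif tag == "input":
            # An <input> with no type attribute defaults to a text field.
            kind = (a.get("type") or "text").lower()
            self.inputs.append((kind, a.get("name") or "", a.get("value") or ""))


def specimen_url(page_url, html):
    """Build a query URL prefix that ends with an empty text field."""
    parser = FormFields()
    parser.feed(html)
    base = urljoin(page_url, parser.action)
    hidden = [f"{n}={v}" for k, n, v in parser.inputs if k == "hidden" and n]
    text = [f"{n}=" for k, n, v in parser.inputs if k == "text" and n]
    sep = "&" if "?" in base else "?"
    return base + sep + "&".join(hidden + text)
```

For a form with ACTION /mail/, a hidden list=pynchon-l, and a text field
keywords, fetched from http://www.waste.org/mail/?list=pynchon-l, this yields
http://www.waste.org/mail/?list=pynchon-l&keywords= as in the second specimen.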


While editing the saved file wq.txt, I replace its entire contents
with this single line, built from the second specimen above:

GET 001 http://www.waste.org/mail/?list=pynchon-l&keywords=
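
A minimal Python sketch of how such a file could be read and used, assuming
only what the lines above show: each engine is a line of the form
"GET nnn URL", and the URL ends where the search topic belongs. The function
names load_engines and query_url are hypothetical, and URL-encoding the topic
with quote_plus is an assumption, not a documented WordsEx behavior.

```python
from urllib.parse import quote_plus


def load_engines(text):
    """Return (number, url_prefix) pairs from lines of the form 'GET nnn URL'."""
    engines = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Other lines (comments, FYI specimens, form data) are skipped.
        if len(parts) == 3 and parts[0] == "GET":
            engines.append((parts[1], parts[2].strip()))
    return engines


def query_url(url_prefix, topic):
    # Assumption: the topic is URL-encoded and appended to the prefix.
    return url_prefix + quote_plus(topic)
```

With the one-line wq.txt above and the topic "wordsex", query_url gives
http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex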


And I again run WordsEx, and first do:

	File, Load Search Engines, and specify that filename, "wq.txt"

Now WASTE is the only active search engine, since my one-line file
replaced the original list of 8 engines. To query it, I do:

	Add, Internet Search, and specify a topic, say, "wordsex".


Doing Help, Memory Stores shows that I now have 15 web pages in memory.

Viewing the search thread result, I see that WordsEx
scraped 14 hits from the original result page.

From the form of the URLs, I see that 2 are to continuation result pages,
most others are to actual "hit" emails, and others are to overhead URLs,
which I could next eliminate by adding some NOT rules to the WQ.TXT file.
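
The sorting by URL form described above could be sketched in Python like
this. It assumes, from the WASTE URLs in this search only, that hit emails
carry a msg= parameter and continuation pages carry a page= parameter; the
function name classify is hypothetical, and this is not the NOT-rule syntax
of the WQ.TXT file.

```python
from urllib.parse import urlsplit, parse_qs


def classify(url):
    """Label a scraped link as 'hit', 'continuation', or 'overhead'."""
    query = parse_qs(urlsplit(url).query)
    if "msg" in query:
        return "hit"           # a single archived email
    if "page" in query:
        return "continuation"  # a further page of search results
    return "overhead"          # site navigation, list index, and so on
```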

Since I am still working on searching, and have not yet implemented the
intelligent following of hit URLs and continuation URLs on result pages,
the next step would be manual: view each of the continuation pages, and
on each one, do:
	Add, All Links on Page.

Then all of the hit results would be in memory. But I will not do that today.


Here, I'll copy and paste the entire search result view:
========================================================
Add Web Search:
wordsex

Thread started.

Trying search engine 001 

http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 0 phrases, 0 words, 3 kb, 0 terms, 14 links.



Query 001 took 2 seconds, produced 14 hits, 14 novel.

Now fetching the 14 novel hit pages.
http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex&page=3
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 7 phrases, 124 words, 1 kb, 92 terms, 7 links.
 scheper glenn bytes vineland mikebailey june wrote wed jun just gmt miscellany 
happened before notorious omitted pages major during reread results corner 
cannot sensing famous getting iran note tore matches


http://www.waste.org/mail/?list=pynchon-l&keywords=wordsex&page=2
waste mailing lists / pynchon-l search results: wordsex
0000-00-00 / Unknown / Windows-1252
0 score, 18 phrases, 333 words, 3 kb, 181 terms, 14 links.
 scheper glenn bytes wordsex gmt atdtda omitted type apr fri mikebailey neville 
message egyptian alchemy mon wed jan got exe reef try earth original category 
conservapedia encyclopedias unicode preservation


http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115405&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
53 score, 60 phrases, 969 words, 5 kb, 476 terms, 13 links.
 him lew sin scheper glenn she her god when shall thought original remember 
supposed everyone done sure like wife job night once mouth could wheel three 
following before committed condemned pulled ferris


http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113877&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
8 score, 7 phrases, 133 words, 1 kb, 87 terms, 15 links.
 scheper glenn yahoo wordsex waste exe monropolitan trepidation functionality 
opened everyone omitted obliged message given discussions raving freeware todays 
date update certain here dave beta pynchon


http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=114097&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
11 score, 16 phrases, 262 words, 2 kb, 155 terms, 15 links.
 scheper glenn studies wordsex riddles riddle exe there pynchon authors 
literature revelation literary immediately functionality several correlate 
superficial however remarkable copyleft something specific


http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115754&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
30 score, 34 phrases, 550 words, 3 kb, 341 terms, 22 links.
 scheper glenn wordsex text latin iso add url yet http limitation characters 
coded pages reform usacii folder process eset file gave index them sich exe just 
gions tue due ytt try www php accesibilidad


http://www.waste.org/mail/?list=pynchon-l&month=0706&msg=119190&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
10 score, 11 phrases, 248 words, 2 kb, 173 terms, 19 links.
 scheper glenn html edu ferguson haraway cone few www http htm jacketmagazine 
cyborgmanifesto literacy manifesto immanenscendence okeeffemuseum erasure 
minutes everything rosamond modernpostmodern feminism


http://www.waste.org/mail/?list=pynchon-l&month=0704&msg=117018&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
14 score, 9 phrases, 204 words, 1 kb, 130 terms, 15 links.
 scheper glenn wordsex watchman exe says sheep demolition evidence copyleft 
binaries followed burritos omitted amazing professors controlled convincing 
burrito suspicious quickening files freeware patriotsquestion


http://www.waste.org/mail/?list=pynchon-l&month=0702&msg=115134&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
10 score, 11 phrases, 204 words, 1 kb, 133 terms, 14 links.
 scheper glenn dance fair chicago dancers timeline scandalized divided 
interpretations relevant credited copyleft specific financial performed 
containing village amalgam enraged fictitious given probably


http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=114850&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
18 score, 10 phrases, 361 words, 2 kb, 118 terms, 12 links.
 matches text diagnostic file txt has this awake tokens stemmer scheper stemming 
kin words glenn search thread yet memory awaked awaking containing concurrent 
coded pages algorithm results porter started


http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113855&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
4 score, 3 phrases, 102 words, 1 kb, 73 terms, 13 links.
 scheper glenn wordsex cache exe functionality especially freeware review 
present todays currently date update certain items file pynchon index scrolling 
previous waste arrow text then them jan fun horn


http://www.waste.org/mail/?list=pynchon-l&month=0701&msg=113812&keywords=wordsex
waste mailing lists / pynchon-l by date
0000-00-00 / Unknown / Windows-1252
52 score, 42 phrases, 870 words, 5 kb, 441 terms, 15 links.
 scheper glenn tribulation tap metanoia upon god know thou revelation jesus 
language wordsex knowledge abject wog them such exe which could chums unorthodox 
happened autopoetic individuals divine another


http://www.waste.org/
waste
0000-00-00 / Unknown / Windows-1252
0 score, 0 phrases, 69 words, 1 kb, 69 terms, 53 links.
 lostchocolatelab homepages oxymoron recipes lameness burnunit uvulas ivanish 
apricot dimitri archives linux roadrunner embedded donors darkside azure engine 
danger sensor egyptian unbroken ferret oracle


http://www.waste.org/mail/?list=pynchon-l
waste mailing lists / pynchon-l
0000-00-00 / Unknown / Windows-1252
42 score, 22 phrases, 578 words, 3 kb, 195 terms, 186 links.
 mar jun jul sep nov dec jan aug apr oct waste pynchon feb list subscribe send 
org may archives digest unsubscribe please instead then group message discussion 
welcome replies whole reply leave dash want


Finished search engine 001 


Thread ended.
  -- Add Web Search:
========================================================


Next, I do:
	Add, Word Search, and again specify as parameter, wordsex.

This is the result of that, choosing KWIC (keyword in context) format.
I could next click on any KWIC result line to view the containing source text.

So that word-wrap during copy and paste does not uglify these
KWIC lines, I first set a very narrow window to reduce the size
of the KWIC context that gets presented in the results:

========================================================
Find Words:
wordsex

Thread started.

Search tokens, without stemming:
wordsex - 29 matches

Search tokens, matches after stemming:
wordsex - 29 matches

s mornings work on WordsEx went very smoothly. M
ed, dash, and made WordsEx emulate it when copyi
g them opened in in WordsEx, and clicked on each 
---------- You know, WordsEx fetched all these page
ation jesus language wordsex knowledge abject wog
         scheper glenn wordsex watchman exe says sh
cked on each URL. WordsEx also supplied this text-
08:00) (9626 bytes) WordsEx hey, I finally got a cha
b tool this weekend: WordsEx.exe">http://home.eart
cheper glenn studies wordsex riddles riddle exe ther
 posts today, I used WordsEx to read a folder from 
of them were mine. WordsEx.exe">http://home.eart
net/~glenn_scheper/WordsEx.exe Yours truly, Glenn
net/~glenn_scheper/WordsEx.exe Yours truly, Glenn
chance to play with WordsEx. It probably does mor
g connects. I asked WordsEx for "write under erasu
         scheper glenn wordsex cache exe functionalit
hed in my freeware, WordsEx.exe, which, by the wa
ain fun functionality: WordsEx.exe">http://home.eart
         scheper glenn wordsex text latin iso add url ye
n tool WordsEx.exe: WordsEx.exe">http://home.eart
are information tool WordsEx.exe: WordsEx.exe">h
net/~glenn_scheper/WordsEx.exe I put this URL in i
scheper glenn bytes wordsex gmt atdtda omitted typ
net/~glenn_scheper/WordsEx.exe I had a few good 
net/~glenn_scheper/WordsEx.exe Especially, you c
3 bytes) NP: Try my WordsEx now. http://home.earth
 > functionality: > > WordsEx.exe">http://home.eart
net/~glenn_scheper/WordsEx.exe Due to, I think it w
nformation). When I WordsEx'd that url, and asked t
net/~glenn_scheper/WordsEx.exe ... Glenn Scheper
cheper glenn yahoo wordsex waste exe monropolita
net/~glenn_scheper/WordsEx.exe ______________
onality every day... WordsEx.exe">http://home.eart
hon-l search results: wordsex 23 matches were foun
hon-l search results: wordsex 23 matches were foun

Thread ended.
  -- Find Words:

========================================================
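
For readers new to KWIC, here is a minimal Python sketch of the idea: find
each match and show a fixed amount of context on either side. The function
name kwic and the width parameter are hypothetical, and WordsEx's own
alignment, stemming, and window-based sizing are not reproduced here.

```python
import re


def kwic(text, word, width=25):
    """Return one context line per case-insensitive match of word in text."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    lines = []
    for m in re.finditer(re.escape(word), flat, re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(flat), m.end() + width)
        lines.append(flat[start:end])
    return lines
```

A smaller width gives shorter context lines, which is roughly the effect of
the narrow window used for the listing above.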

So what else can WordsEx do?

On any of your longish emails I do a select all, copy from
Outlook, and paste the whole email text into WordsEx for
easier reading using green, big fonts, and smooth scrolling.

One of WordsEx's text reduction strategies is to remove the
typical case of email quoting using right angle brackets,
as atop this email. I should test that by saving this email
and querying for this unlikely search string: kskjebnkjj.
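
The quote-removal idea can be sketched in a few lines of Python. This is only
an assumed, simplest-case rule (drop any line whose first non-blank character
is a right angle bracket); the function name strip_quoted is hypothetical and
WordsEx's real strategy may differ.

```python
def strip_quoted(text):
    """Drop lines that begin with '>' (after optional leading whitespace)."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept)
```

One caveat of so simple a rule: any unquoted line that happens to start with
">" is dropped as well.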

The last program update to my web site was June 20, but I
really am trying to work on it, and especially the search
engine URL choices (like 80 or so?) and techniques next.

As to the consumption of WASTE server resources, I'd say:
"Smoke 'em if you got 'em"

Yours truly,
Glenn Scheper
http://home.earthlink.net/~glenn_scheper/
glenn_scheper + at + earthlink.net
Copyleft(!) Forward freely.



