Got Pynchon links?
Glenn Scheper
glenn_scheper at earthlink.net
Wed Oct 10 07:59:06 CDT 2007
Wow! I just impressed the hell out of myself. On Oct 10 2007,
I added a new feature to WordsEx that saves a list of all URLs
currently cataloged in memory to a Unicode text file. It sorts
the fetched pages by the count of links to each page, subsorts
by each page's text complexity score, and shows each page's
mechanically derived annotation headers. After each URL, fetched
or not, it lists all the referrers along with the anchor texts
they use to describe the linked URL.
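
For readers who want to picture the ordering, here is a minimal
Python sketch of that kind of two-level sort. It is not WordsEx's
code: the Page class, its field names, the descending sort
direction, and the output layout are all assumptions made up for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical record for one cataloged URL (not WordsEx's own type)."""
    url: str
    fetched: bool
    complexity: float = 0.0  # assumed: a higher score means more complex text
    # Each referrer is a (referrer_url, anchor_text) pair.
    referrers: list = field(default_factory=list)


def order_pages(pages):
    """Return fetched pages, most-linked first, ties broken by complexity.

    Descending order for both keys is an assumption; the post does not
    say which direction WordsEx sorts in.
    """
    fetched = [p for p in pages if p.fetched]
    return sorted(fetched, key=lambda p: (-len(p.referrers), -p.complexity))


def format_entry(page):
    """Render one URL followed by its referrers and their anchor texts."""
    lines = [page.url]
    for ref_url, anchor in page.referrers:
        lines.append(f"    <- {ref_url} [{anchor}]")
    return "\n".join(lines)
```

Negating both keys gives a descending sort in a single pass, and
because format_entry does not look at the fetched flag, it works
for unfetched URLs too.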
The WordsEx saved URL list is divided into seven sections:
======= Group 1: Good URLs in memory
======= Group 2: Non-text/html URLs
======= Group 3: Query results URLs
======= Group 4: Not found etc. URLs
======= Group 5: Redirection URLs
======= Group 6: Query rejected hit URLs
======= Group 7: Unfetched new URLs
To try the feature out, I made a query for "Pynchon", which
ran for 10 minutes on a T1 connection. It pulled in 260 files.
I then had WordsEx save the cross-referenced URL list from just
that one query, and got a five-megabyte Unicode text file.
Five megabytes is half of my web site's maximum space allowed,
so I used Notepad to open the Unicode file and save it again in
UTF-8 encoding, which brought it down to 2.7 megabytes.
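
The same re-encoding can be scripted instead of done by hand in
Notepad. The sketch below assumes that the "Unicode" file is
UTF-16 with a byte-order mark, which is what Notepad means by
that label; the function name and paths are hypothetical.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file (with BOM) as UTF-8.

    Assumes the source starts with a byte-order mark, which the
    "utf-16" codec uses to detect endianness. newline="" keeps the
    original line endings untouched.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

UTF-16 spends two bytes on every ASCII character where UTF-8
spends one, so a mostly ASCII file shrinks to roughly half its
size, which fits the drop from five megabytes to 2.7. Unlike
Notepad of that era, this sketch writes no UTF-8 byte-order mark;
use encoding="utf-8-sig" for the output if one is wanted.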
(...more verbiage on home page...)
See it here: PynchonQueryResultUrls.txt
http://home.earthlink.net/~glenn_scheper/PynchonQueryResultUrls.txt
Get the tool: Windows version:
http://home.earthlink.net/~glenn_scheper/WordsEx.exe
Yours truly,
Glenn Scheper
http://home.earthlink.net/~glenn_scheper/
glenn_scheper + at + earthlink.net
Copyleft(!) Forward freely.
More information about the Pynchon-l
mailing list