Sept. 10, 2007, 6:43 p.m.
posted by neverloop
Archive Yahoo! Groups Messages with yahoo2mbox
Looking to keep a local archive of your favorite mailing list? With yahoo2mbox, you can import the final results into your favorite mailer.
With the popularity of Yahoo! Groups (http://groups.yahoo.com) comes a problem. Sometimes, you want to save the archives of a Yahoo! Group, access the archives outside the Yahoo! Groups site, or move your list somewhere else and take your existing archive with you.
The Code
Vadim Zeitlin had these same concerns, which is why he wrote yahoo2mbox (http://www.tt-solutions.com/en/products/yahoo2mbox). This hack retrieves all the messages from a mailing list archive at Yahoo! Groups and saves them to a local file in mbox format. Plenty of options make this handy to have when you're trying to transfer information from Yahoo! Groups. You'll need Perl and several additional modules to run this code, including Getopt::Long, HTML::Entities, HTML::HeadParser, HTML::TokeParser, and LWP::UserAgent.
Running the Hack
Running the code looks like this:
perl yahoo2mbox.pl [options] [-o mbox] groupname
The options for running the program are as follows:
    --help        give the usage message showing the program options
    --version     show the program version and exit
    --verbose     give verbose informational messages (default)
    --quiet       be silent, only error messages are given
    --resume      resume an interrupted download
    -o mbox       save the message to mbox instead of file named groupname
    --start=n     start retrieving messages at index n instead of 1
    --end=n       stop retrieving messages at index n instead of the last one
    --last=n      get the last specified number of messages from the list
    --noresume    don't resume, **overwrites** the existing output file if any
    --user=name   login to eGroups using this username (default: guest login)
    --pass=pass   the password to use for login (default: none)
    --cookies=xxx file to use to store cookies (default: none,
                  'netscape' uses netscape cookies file).
    --proxy=url   use the given proxy, if 'no' don't use proxy at all
                  (even not the environment variable http_proxy which
                  is used by default), may use
                  http://username:password@full.host.name/
    --country=xx  use the given country code in order to access
                  localized yahoo
    --x-yahoo     add X-Yahoo-Message-Num header to identify Yahoo! messages
    --delay=n     sleep for the specified number of seconds between requests
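If the download needs to be scripted, for instance from a scheduled job, the options above can be assembled programmatically. What follows is a minimal Python sketch and not part of yahoo2mbox itself: the `build_command` helper and its parameter names are hypothetical, it covers only a few of the flags listed above, and it assumes `perl` is on the PATH with `yahoo2mbox.pl` in the current directory.

```python
def build_command(group, output=None, start=None, end=None,
                  delay=None, resume=False, script="yahoo2mbox.pl"):
    """Assemble an argument list for yahoo2mbox (hypothetical helper).

    Only flags documented in the option list are emitted; anything left
    as None is omitted so the program's own defaults apply.
    """
    argv = ["perl", script]
    if resume:
        argv.append("--resume")
    if start is not None:
        argv.append(f"--start={start}")
    if end is not None:
        argv.append(f"--end={end}")
    if delay is not None:
        argv.append(f"--delay={delay}")
    if output is not None:
        argv += ["-o", output]
    argv.append(group)  # the group name always comes last
    return argv


# Show the command line that would be run for one group.
print(" ".join(build_command("weirdalclub2", start=3258, delay=2)))
# prints: perl yahoo2mbox.pl --start=3258 --delay=2 weirdalclub2
```

The returned list can be passed to `subprocess.run(argv, check=True)` to run the download for real. Keeping the arguments as a list rather than a single string avoids shell quoting problems with group names or file paths.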
For example, this command downloads messages from the Weird Al Club group (weirdalclub2), starting at message 3258:
    % perl yahoo2mbox.pl --start=3258 weirdalclub2
    Logging in anonymously... ok.
    Getting number of messages in group weirdalclub2...
    Retrieving messages 3258..3287: .............................. done!
    Saved 30 message(s) in weirdalclub2.
Here, the messages are saved to a file called weirdalclub2. Renaming the file to weirdalclub2.mbx lets me open the messages in Eudora right away, as shown in the figure below. Of course, you can also open the resulting file in any mail program that can import (or natively read) the mbox format.
Figure: A Yahoo! Groups archive in Eudora
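Before importing the file into a mailer, it can be worth checking that the archive parses as mbox and holds the number of messages you expect. The sketch below uses the `mailbox` module from Python's standard library, which is not something yahoo2mbox ships with. The `summarize_mbox` function name is hypothetical, and the file name in the usage comment is simply the one from the example run above.

```python
import mailbox


def summarize_mbox(path):
    """Return (message_count, subjects) for the mbox file at `path`.

    create=False makes a missing file raise mailbox.NoSuchMailboxError
    instead of silently creating an empty mailbox.
    """
    box = mailbox.mbox(path, create=False)
    try:
        # str() guards against header objects returned for non-ASCII subjects.
        subjects = [str(msg.get("Subject", "(no subject)")) for msg in box]
    finally:
        box.close()
    return len(subjects), subjects


# Usage, assuming the download above has already produced the file:
#   count, subjects = summarize_mbox("weirdalclub2")
#   print(count, "messages; first subject:", subjects[0])
```

If the count matches the "Saved 30 message(s)" line reported by yahoo2mbox, the archive is very likely intact.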
Hacking the Hack
Because this is someone else's program, there's not too much hacking to be done. On the other hand, you might find that you don't want to end this process with the mbox file; you might want to convert to other formats for use in other projects or archives. In that case, check out these other programs to take that mbox format a little further:
hypermail (http://sourceforge.net/projects/hypermail/)
- Converts mbox format to cross-referenced HTML documents.
mb2md (http://www.gerg.ca/hacks/mb2md/)
- Converts mbox format to Maildir. Requires Python and Procmail.
- A separate tool that also converts mbox format to Maildir. Uses Perl.
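If none of these tools is at hand, a simple mbox-to-Maildir conversion can also be sketched with Python's standard `mailbox` module. This is an alternative approach, not how any of the programs above work internally. The `mbox_to_maildir` function name and the paths in the usage comment are hypothetical.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    `maildir_path` should either not exist yet (it is created along with
    its tmp/new/cur subdirectories) or already be a valid Maildir.
    """
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            dest.add(message)  # each message is written to disk immediately
            copied += 1
    finally:
        dest.close()
        source.close()
    return copied


# Usage, assuming the archive downloaded earlier:
#   n = mbox_to_maildir("weirdalclub2", "weirdalclub2-maildir")
#   print("copied", n, "messages")
```

The original mbox file is only read, never modified, so the conversion can be repeated into a fresh directory if something goes wrong.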
Kevin Hemenway and Tara Calishain