On Wed, February 20, 2019 10:24, John Rigg wrote:
On Wed, Feb 20, 2019 at 12:41:08AM +0100, Thomas Brand wrote:
Currently the mailing list archive is for members only. Then again
nabble.com seems to have a copy of the whole archive. Would there be any
issue for anybody if all archives are published publicly (say in a
GitHub repository)? Any thoughts?
If you do publish it please make sure email addresses are obscured to
make automated address harvesting more difficult, as is done currently on
nabble.com and the official archive.
John
The mail archive covers roughly a decade. While merely testing the
conversion output I stumbled upon mails that are valuable for
understanding the history of the JACK project better. It's a
concentrated resource (without ads!) of information that can be queried
with grep.
Attached to this mail are samples of 3 arbitrarily chosen archived mails.
Each mail is provided as a plain text file and as an HTML file.
The encoding is UTF-8. If the plain text file is viewed in a browser, the
display setting should be "Unicode" for it to display correctly. The
HTML variant should work out of the box with the appropriate header for
UTF-8. Some mails have garbled text, which is possibly the result of mail
clients sending text back and forth with small encoding errors. Only very
few mails have that problem. Where possible, the encoding is converted
based on the MIME part information (using reformime and iconv).
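The charset conversion described above can be sketched roughly as follows. This is not the reformime/iconv pipeline used for the archive; it swaps in Python's standard email package to show the same idea. The function name body_as_utf8 is made up for illustration, and falling back to UTF-8 when no usable charset is declared is an assumption.

```python
from email import message_from_bytes
from email.policy import default


def body_as_utf8(raw_mail: bytes) -> bytes:
    """Return the first text/plain part of a mail, re-encoded as UTF-8."""
    msg = message_from_bytes(raw_mail, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return b""
    # Undo the Content-Transfer-Encoding (base64, quoted-printable, ...).
    payload = part.get_payload(decode=True) or b""
    # Charset declared in the MIME part; UTF-8 fallback is an assumption.
    charset = part.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is unknown to Python.
        text = payload.decode("utf-8", errors="replace")
    return text.encode("utf-8")
```

Bytes that cannot be decoded are replaced rather than dropped, so a mail with small encoding errors would still come out readable, just with a few substitute characters.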
Mail addresses are obfuscated with the pattern
Full Name <_hidden_(a)domain-untouched.tld>
This happens for all header addresses and all addresses in the mail body
(e.g. "Jon Doe <email here> wrote:" will be replaced). Addresses that are
already obfuscated are left as is.
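As a minimal sketch of this obfuscation rule (not the script actually used for the conversion): the regular expression below is a deliberately simplified address pattern, not full RFC 5322 syntax, and the names ADDRESS and obfuscate are made up for illustration.

```python
import re

# Simplified address pattern (an assumption): local part, "@", dotted domain.
ADDRESS = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def obfuscate(text: str) -> str:
    """Hide the local part and replace '@' with '(a)'; keep the domain."""
    return ADDRESS.sub(r"_hidden_(a)\1", text)


print(obfuscate("Jon Doe <jon.doe@example.org> wrote:"))
```

Already obfuscated addresses contain "(a)" instead of "@", so the pattern does not match them and they pass through unchanged.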
vCard (VCF) attachments are removed. PGP signature attachments are removed.
HTML variant:
-every mail is in a folder
-index.html links to available attachments (images, pdf, diff files, ...),
if any, in the same folder
-links to In-Reply-To and Follow-Up messages
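One way the In-Reply-To and Follow-Up links could be derived is from the Message-ID and In-Reply-To headers of each mail. The sketch below only illustrates that idea and is not the converter's actual code; the function name follow_ups is hypothetical, and mails without an In-Reply-To header are simply treated as thread starters.

```python
from collections import defaultdict
from email import message_from_string


def follow_ups(raw_mails):
    """Map each Message-ID to the Message-IDs of the mails replying to it."""
    replies = defaultdict(list)
    for raw in raw_mails:
        msg = message_from_string(raw)
        msg_id = msg.get("Message-ID")
        parent = msg.get("In-Reply-To")
        if msg_id and parent:
            replies[parent.strip()].append(msg_id.strip())
    return dict(replies)
```

The In-Reply-To link of a mail is its own header value, while its Follow-Up links are the entries collected under its Message-ID in the returned mapping.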
Text variant:
-no inline attachments
-all text concatenated into a single gzip file is around 3.5 MB
I'd like to make this resource available without restriction (e.g. without
requiring mailing list membership) as a source of information that can be
used stand-alone or referenced from other places.
Please have a look at the examples.
If you have written to this list and would like to be excluded from the
archives, please say so (this will be fiddly and make the archive less
useful, so please think twice before even considering this).
Any other feedback on form and function is welcome!
Greetings
Thomas