On Wed, February 20, 2019 10:24, John Rigg wrote:
> On Wed, Feb 20, 2019 at 12:41:08AM +0100, Thomas Brand wrote:
> > Currently the mailing list archive is for members only. Then again
> > nabble.com seems to have a copy of the whole archive. Would there be any
> > issue for anybody if all archives are published publicly (say in a
> > GitHub repository)? Any thoughts?
>
> If you do publish it please make sure email addresses are
> obscured to make automated address harvesting more difficult, as is done
> currently on nabble.com and the official archive.
>
> John
Yes, I will do that. So far it looks pretty easy to extract just the mails for
jack-devel. It needs a small parser per mail to pick just the headers of
interest and handle content encoding (some mails have a base64 body).
Obfuscating the mail addresses fits naturally into that step. I think nabble just
replaces the '@' with ' at ', which is better than nothing.
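To make the per-mail step concrete, here is a rough sketch in Python using the
standard library's email package. The header list, the address pattern and the
function names (extract, obfuscate) are my own assumptions for illustration,
not a finished tool. The Message-ID and In-Reply-To values are left untouched
on purpose, since they contain an '@' but are needed later for threading.

```python
import re
from email import message_from_string, policy

# Assumed set of headers worth keeping; adjust as needed.
PLAIN_HEADERS = ("Date", "Message-ID", "In-Reply-To")  # copied unchanged
OBFUSCATED_HEADERS = ("From", "Subject")               # may contain addresses

# Deliberately simple address pattern (an assumption, not full RFC 5322).
ADDRESS_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def obfuscate(text):
    """Replace 'user@host' with 'user at host', similar to what nabble shows."""
    return ADDRESS_RE.sub(r"\1 at \2", text)


def extract(raw):
    """Parse one raw mail and return the headers of interest plus the body."""
    msg = message_from_string(raw, policy=policy.default)
    record = {}
    for name in PLAIN_HEADERS:
        if msg[name] is not None:
            record[name] = str(msg[name])
    for name in OBFUSCATED_HEADERS:
        if msg[name] is not None:
            record[name] = obfuscate(str(msg[name]))
    # get_content() undoes base64 / quoted-printable and decodes the charset.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    record["Body"] = obfuscate(body)
    return record
```

Calling extract() on the raw text of a single mail gives a small dict that can
be written out as is. Mails without a text/plain part end up with an empty
body in this sketch.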
A flat file ordered by time would be the minimum. A better solution would
respect the In-Reply-To and Message-ID headers in order to follow the threads.
It's good that the data is not lost. In any case, once there is something ready
to put out, I'll send a sample here first.
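For the threaded variant, a minimal sketch could look like the following. It
assumes each mail has already been reduced to a dict with the hypothetical keys
'id' (Message-ID), 'parent' (In-Reply-To or None) and 'date' (any sortable
value); those names are made up for the example. Replies whose parent is not
in the archive are simply treated as thread starters.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads using their 'id' and 'parent' fields."""
    known = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    # Sorting first keeps both roots and replies in chronological order.
    for m in sorted(messages, key=lambda m: m["date"]):
        parent = m.get("parent")
        if parent in known and parent != m["id"]:
            children[parent].append(m)
        else:
            # No In-Reply-To, or the parent is missing from the archive.
            roots.append(m)
    return roots, children


def flatten(roots, children, depth=0):
    """Yield (depth, message) pairs, each thread followed by its replies."""
    for m in roots:
        yield depth, m
        yield from flatten(children[m["id"]], children, depth + 1)
```

Printing each message indented by its depth gives a simple threaded listing,
and ignoring the depth gives back a flat file. Mails that reference each other
in a loop are not handled here and would be dropped.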
Greetings
Thomas