Dear Luiz,
Thank you for your patience. I released WP-MIRROR 0.7.3 a few days ago, and have now turned my attention to your feature requests. Thanks also for the nicely detailed e-mail.
0) Bug report
I have not been able to reproduce the bug. That said, I set up `kvm' with my own script, rather than use VirtualBox. So I have not actually reproduced the environment in which you tested WP-MIRROR.
I will have to become familiar with VirtualBox for an additional reason. A Swiss friend who attended the recent Zurich Hackathon wrote to me about a presentation on `Vagrant', which apparently sets up a near-current version of MediaWiki using VirtualBox. I will want to check this out.
I will get back to you on this.
1) MediaWiki Extensions
The variety of namespace IDs has me concerned. I count 64 `wikisource' wikis as follows:
(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/ | grep wikisource | wc -l
64
I am wondering how to determine which wiki uses which namespace IDs. I know this information can be found in the <siteinfo> element at the head of every XML data dump. Choosing three of the languages that I can read, I immediately see an inconsistency:
In `dewikisource-20140519-pages-articles.xml.bz2' I see:
<namespace key="102" case="first-letter">Seite</namespace>
<namespace key="103" case="first-letter">Seite Diskussion</namespace>
<namespace key="104" case="first-letter">Index</namespace>
<namespace key="105" case="first-letter">Index Diskussion</namespace>
In `enwikisource-201405023-pages-articles.xml.bz2' and `zhwikisource-20140517-pages-articles.xml.bz2' I see:
<namespace key="104" case="first-letter">Page</namespace>
<namespace key="105" case="first-letter">Page talk</namespace>
<namespace key="106" case="first-letter">Index</namespace>
<namespace key="107" case="first-letter">Index talk</namespace>
Two problems: 1) My reading comprehension does not extend to all 64 languages. 2) When the user is browsing, and Extension:ProofreadPage is executed, that <siteinfo> element is long gone.
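As a rough sketch of how the namespace IDs might be captured while the dump is still at hand, the following Python reads the <namespace> elements from the head of a dump and stops before the page data. It is for illustration only and is not WP-MIRROR code: the function names are hypothetical, and the sample XML (including its xmlns URI) is an assumed stand-in for a real dump.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{uri}' XML-namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_namespaces(stream):
    """Return {namespace ID: name} from the <siteinfo> at the head of a dump."""
    namespaces = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "namespace":
            # The main namespace (key 0) has no text, so fall back to "".
            namespaces[int(elem.get("key"))] = elem.text or ""
        elif name == "siteinfo":
            break  # stop here; the (very large) page data follows
    return namespaces


def read_namespaces_from_dump(path):
    """Same as read_namespaces, but for a bzip2-compressed dump file."""
    with bz2.open(path, "rb") as stream:
        return read_namespaces(stream)


# Assumed sample, modelled on the `dewikisource' excerpt quoted above.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <siteinfo>
    <namespaces>
      <namespace key="0" case="first-letter" />
      <namespace key="102" case="first-letter">Seite</namespace>
      <namespace key="103" case="first-letter">Seite Diskussion</namespace>
      <namespace key="104" case="first-letter">Index</namespace>
      <namespace key="105" case="first-letter">Index Diskussion</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""

all_namespaces = read_namespaces(io.BytesIO(SAMPLE))
# Keep only the wiki-specific namespaces (IDs of 100 and above).
custom = {key: name for key, name in all_namespaces.items() if key >= 100}
print(custom)
```

This lists the IDs and their localized names, but it does not say which name means `Page' and which means `Index'; that still needs someone who reads the language, or a table maintained by the community.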
Possible solution: I may need to implement some kind of look-up table that maps each language code to its pair of namespace IDs (one for Page, one for Index).
I do not want my `LocalSettings.php' file to become excessively crufty due to a plethora of special cases. I may need advice from someone in the `wikisource' community. To begin with: Has anyone compiled a table of (language code, namespace ID pair) entries that I could use? Is anyone managing the assignment of namespace IDs?
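To make the look-up table idea concrete, here is a minimal sketch in Python. It is an illustration only, not a proposal for the actual implementation: the names `PROOFREAD_NAMESPACES' and `proofread_namespaces' are hypothetical, and the only entries are the three taken from the dumps quoted above.

```python
# Hypothetical table: language code -> (Page namespace ID, Index namespace ID).
# Only these three entries are known, from the <siteinfo> excerpts above.
PROOFREAD_NAMESPACES = {
    "de": (102, 104),
    "en": (104, 106),
    "zh": (104, 106),
}


def proofread_namespaces(language_code):
    """Return (page_ns, index_ns) for a wikisource language, or None if unknown."""
    return PROOFREAD_NAMESPACES.get(language_code)


for code in ("de", "en", "fr"):
    pair = proofread_namespaces(code)
    if pair is None:
        print(code, "-> no entry; needs input from the wikisource community")
    else:
        page_ns, index_ns = pair
        print(code, "-> Page:", page_ns, "Index:", index_ns)
```

Keeping the special cases in one table like this, rather than as a series of conditionals, is one way to keep the configuration from becoming crufty, but the remaining 61 entries would still have to come from somewhere.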
2) Special wikis