Folks who use the XML dumps of our projects will know that the dumps process has been stalled while we investigated bug 23264. We have been running individual project dumps manually and asking people to inspect them carefully. We have just restarted the automated dumps, and various code fixes should be checked in shortly. Thanks to all for your assistance and your patience.
If you are working with the XML dumps of the English-language Wikipedia that contain all page revisions (pages-meta-history), please note the following issues with the two completed runs.
The January 30 run is missing the text for a large number of old article revisions, primarily revisions created between January 1, 2005 and May 14, 2005. This was due to bug 20757, which has since been fixed. If you are doing analysis that uses the text data, you can retrieve the missing text by extracting it from an earlier dump file; see the archives.
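If you want to fill in the missing text yourself, here is a minimal Python sketch of one way to do it. It is not an official tool. It assumes the MediaWiki XML export layout (`<page>` elements containing `<revision>` elements, each with a direct `<id>` child and a `<text>` child), and it assumes the affected revisions show up in the January 30 file with an empty `<text>` element. The helper names `iter_revisions`, `find_missing` and `recover_text` are hypothetical. Some revisions are legitimately blank, so an empty `<text>` only marks a candidate, not proof of loss.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Set, Tuple


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source: IO[bytes]) -> Iterator[Tuple[int, str]]:
    """Yield (revision_id, text) for every <revision> in a MediaWiki XML export.

    Uses iterparse so a large history dump is streamed rather than loaded whole.
    Assumes the export layout described above (hypothetical helper, not a Wikimedia tool).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "page":
            elem.clear()  # drop the finished page's (already cleared) revisions
            continue
        if name != "revision":
            continue
        rev_id = None
        text = ""
        for child in elem:  # direct children only, so <contributor><id> is skipped
            child_name = _local(child.tag)
            if child_name == "id" and rev_id is None:
                rev_id = int(child.text)
            elif child_name == "text":
                text = child.text or ""  # <text /> has no content, so .text is None
        elem.clear()  # free the revision subtree once its fields are copied out
        if rev_id is not None:
            yield rev_id, text


def find_missing(source: IO[bytes]) -> Set[int]:
    """Return ids of revisions with empty text (assumed symptom of the missing text)."""
    return {rev_id for rev_id, text in iter_revisions(source) if not text}


def recover_text(source: IO[bytes], wanted: Set[int]) -> Dict[int, str]:
    """Collect text for the wanted revision ids from an earlier, complete dump."""
    found: Dict[int, str] = {}
    for rev_id, text in iter_revisions(source):
        if rev_id in wanted and text:
            found[rev_id] = text
    return found


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the newer dump and an earlier, complete dump.
    newer = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">'
             b"<page><title>A</title><id>1</id>"
             b"<revision><id>10</id><text>current</text></revision>"
             b"<revision><id>11</id><contributor><id>99</id></contributor><text /></revision>"
             b"</page></mediawiki>")
    earlier = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">'
               b"<page><title>A</title><id>1</id>"
               b"<revision><id>11</id><text>old text</text></revision>"
               b"</page></mediawiki>")
    missing = find_missing(io.BytesIO(newer))
    print(missing)                                     # {11}
    print(recover_text(io.BytesIO(earlier), missing))  # {11: 'old text'}
```

For real dumps, open the compressed files with `bz2.open(path, "rb")` and pass the file object in place of the in-memory samples. Both passes stream the file rather than loading it whole, at the cost of reading the earlier dump from start to finish.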
The March 12 run is incomplete; it is missing roughly the last third of the revisions because the job terminated early during the compression step.
The stub files and the current-page dumps appear to be fine, so statistical or other analyses that use only these files should not be affected. The MySQL table dumps are also unaffected.
We apologize for the inconvenience and are working on getting out a set of complete full history dumps with all revision text intact.