Set up English Wikipedia

From XOWA: a free, open-source, offline wiki application

Overview

English Wikipedia has a lot of data. There are 14.5+ million pages with 20.0+ GB of text, as well as 3.9+ million thumbnails.

Setting all this up on your computer will not be a quick process. As a general estimate, you will need at least 30 GB of disk space: 10 GB for the dump file and 20 GB for the resulting wiki. The import will take about 5 hours of processing time. If you want images as well, the numbers increase to 100 GB of disk space and 30 hours of processing time. However, when you are done, you will have a complete, recent copy of English Wikipedia with images that can fit on a 128 GB SD card.
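Before starting, it may help to confirm that the target drive has enough free space. The following is a minimal Python sketch, not part of XOWA. It uses the 30 GB and 100 GB estimates from the paragraph above, treats 1 GB as 10^9 bytes (an assumption), and uses "." as a placeholder for wherever you keep your XOWA folder.

```python
import shutil

# Estimates taken from the text above, in GB. They are rough figures, not limits
# enforced by XOWA.
REQUIRED_GB_TEXT_ONLY = 30     # 10 GB dump file + 20 GB resulting wiki
REQUIRED_GB_WITH_IMAGES = 100  # text plus images

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def has_enough_space(free_bytes: int, with_images: bool) -> bool:
    """Return True if free_bytes covers the estimated requirement."""
    required_gb = REQUIRED_GB_WITH_IMAGES if with_images else REQUIRED_GB_TEXT_ONLY
    return free_bytes >= required_gb * BYTES_PER_GB


def check_drive(path: str, with_images: bool) -> bool:
    """Check the drive that contains `path` (placeholder for your XOWA folder)."""
    free_bytes = shutil.disk_usage(path).free
    return has_enough_space(free_bytes, with_images)


if __name__ == "__main__":
    # "." stands in for the XOWA folder; replace it with the real location.
    print("Enough space for text only:", check_drive(".", with_images=False))
    print("Enough space with images:  ", check_drive(".", with_images=True))
```

Note that the 10 GB dump file can be deleted after the import, so the text-only figure is a peak requirement rather than the final size.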

Although the process itself is not hard, I strongly recommend that you try Simple Wikipedia first. Simple Wikipedia has 184,000 pages and 90,000 images. The text version uses 200 MB and sets up in 5 minutes. With images, this expands to 2 GB and 30 minutes of downloading time. Simple Wikipedia is a reasonably accurate simulation of English Wikipedia -- just much smaller. It'll also give you a pretty good idea of what XOWA can do.

Part 1: Set up the wiki

The first part is to set up the wiki. You have two approaches for this part: import the wiki yourself or download my copy by torrent.

Option 1: Import the wiki with XOWA

Note the following details about this approach:

  • XOWA will download the database dump from the Wikimedia backup servers
  • The database dump will be 10 GB and take about 3 hours to download
  • XOWA will take about 2.5 hours to build the wiki and use an additional 20 GB.[1]
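The 3-hour download figure depends on your connection. As a rough guide, 10 GB in 3 hours corresponds to a sustained speed of about 7.4 Mbit/s. The Python sketch below is illustrative only: the function name is hypothetical, it assumes a steady speed, and it treats 1 GB as 10^9 bytes.

```python
def estimate_download_hours(size_gb: float, speed_mbit_per_s: float) -> float:
    """Estimate the download time in hours for size_gb at a steady speed."""
    if speed_mbit_per_s <= 0:
        raise ValueError("speed must be positive")
    size_bits = size_gb * 10**9 * 8                    # decimal GB -> bits
    seconds = size_bits / (speed_mbit_per_s * 10**6)   # Mbit/s -> bit/s
    return seconds / 3600


if __name__ == "__main__":
    # The 10 GB dump at a few example connection speeds (example values only).
    for speed in (5, 7.4, 20, 50):
        hours = estimate_download_hours(10, speed)
        print(f"{speed:>5} Mbit/s: about {hours:.1f} hours")
```

Real downloads rarely hold a constant speed, so treat the result as a lower bound.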

If this sounds okay, then let's start:

  • Launch XOWA
  • Use the menu bar and select Tools -> Import From List. Alternatively, you can enter home/wiki/Help:Import/List into the address bar
  • Find en.wikipedia.org (it is the 9th item on the list)
  • Click the "latest" link to the left of that entry

That's it. The import process has now started. This part takes at least 5 hours, so you may want to leave it running unattended. When it's done, it will automatically load the Main Page.

Option 2: Download the wiki by torrent

Note the following details about this approach:

  • The torrent will use the latest version from my machine
  • The torrent will download approximately 15 GB
  • The torrent may be slow, as I am generally the only seeder (it will get faster as more seeders become available)
  • No further disk space or processing time is required. You will just need to move the files and you're done.

If this sounds okay, then let's start:

  • Download the torrent from here
  • After the download completes, unzip the archive file into /xowa/wiki/en.wikipedia.org. When you are done, you should have a file like /xowa/wiki/en.wikipedia.org/en.wikipedia.org.000.sqlite3
  • Launch XOWA
  • Enter "w:" in the address bar. The Main_Page should load.
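If the Main Page does not load, one thing to check is whether the archive was extracted to the right place. The Python sketch below is not an XOWA tool. It builds the path named in the steps above and checks that the file starts with the standard 16-byte SQLite 3 header. The /xowa root is taken from the text; adjust it to your own install location.

```python
from pathlib import Path

# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def wiki_db_path(xowa_root: Path, domain: str = "en.wikipedia.org") -> Path:
    """Build the path of the first database file named in the steps above."""
    return xowa_root / "wiki" / domain / f"{domain}.000.sqlite3"


def looks_like_sqlite(path: Path) -> bool:
    """Return True if `path` is a file that starts with the SQLite 3 header."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


if __name__ == "__main__":
    db = wiki_db_path(Path("/xowa"))  # replace /xowa with your XOWA folder
    if looks_like_sqlite(db):
        print(f"{db}: found and looks like a SQLite file")
    else:
        print(f"{db}: missing or not a SQLite file")
```

This only confirms that the file is present and has the right header; it does not verify that the download is complete.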

Part 2: Download the images

This part takes much longer to complete. It will require at least 70 GB of disk space and 24+ hours of download time. You'll be downloading compressed files from archive.org.

If this sounds okay, then let's start:

  • Go to https://archive.org/details/Xowa_enwiki_latest
  • Download each of the listed links from 2014-07-07 #01 to 2014-07-07 #07
  • Extract the files at /xowa/. When you are done, you will have files from /xowa/file/en.wikipedia.org/fsdb.main/fsdb.bin.0000.sqlite3 to /xowa/file/en.wikipedia.org/fsdb.main/fsdb.bin.0098.sqlite3 as well as many others
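With this many files, it is easy to miss one. The following Python sketch lists any image database files that are absent after extraction. It assumes the files are numbered contiguously from 0000 to 0098, as the range above suggests, and that XOWA lives at /xowa. The function names are hypothetical, and the "many others" mentioned above are not checked.

```python
from pathlib import Path


def expected_image_dbs(xowa_root: Path, first: int = 0, last: int = 98) -> list[Path]:
    """List the fsdb.bin.NNNN.sqlite3 paths expected after extraction."""
    base = xowa_root / "file" / "en.wikipedia.org" / "fsdb.main"
    # Assumption: numbering is contiguous and zero-padded to four digits.
    return [base / f"fsdb.bin.{n:04d}.sqlite3" for n in range(first, last + 1)]


def missing_image_dbs(xowa_root: Path) -> list[Path]:
    """Return the expected image database files that do not exist yet."""
    return [p for p in expected_image_dbs(xowa_root) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_image_dbs(Path("/xowa"))  # replace with your XOWA folder
    if missing:
        print(f"{len(missing)} file(s) missing, for example {missing[0]}")
    else:
        print("All expected image database files are present")
```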

Updating the wiki

Wikipedia is constantly updating. New pages are added, and existing pages are changed to include different images. The above steps will give you a complete set of images for 2014-07-07. However, if you want to stay up to date with Wikipedia, then you may also want to download the monthly updates.

Monthly updates will be posted at the same URL: https://archive.org/details/Xowa_enwiki_latest. Each update appears as a new link named after the wiki dump (for example, 2014-08-14) and contains the new images introduced in the Wikipedia dump for that month. Download and unzip these updates in order (i.e., first 2014-08-14, then 2014-09-14, etc.). Some files appear in multiple sets; the most recent copy of the file should always "win".
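The ordering rule can be sketched in Python as follows. This is an illustration only: it assumes update names are plain YYYY-MM-DD dates like the examples above, and the file names in the usage example are made up. Extracting updates oldest first means that a file present in several sets ends up as the newest copy.

```python
from datetime import date


def extraction_order(update_names: list[str]) -> list[str]:
    """Sort update names of the form YYYY-MM-DD from oldest to newest."""
    return sorted(update_names, key=date.fromisoformat)


def resolve_conflicts(updates: dict[str, list[str]]) -> dict[str, str]:
    """Map each file name to the update that should supply it (newest wins)."""
    winner: dict[str, str] = {}
    for name in extraction_order(list(updates)):
        for file_name in updates[name]:
            winner[file_name] = name  # a later update overwrites an earlier one
    return winner


if __name__ == "__main__":
    # Hypothetical contents; real archives will contain different files.
    updates = {
        "2014-09-14": ["a.sqlite3"],
        "2014-08-14": ["a.sqlite3", "b.sqlite3"],
    }
    print(extraction_order(list(updates)))
    print(resolve_conflicts(updates))
```

Names in the base set carry a suffix such as "#01", so they would need separate handling before being passed to this sort.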

Note that if you update your wiki, you do not have to update the images. The two are independent of each other. In other words, you can use the 2015-01-01 English Wikipedia XML dump with the 2014-07-07 English Wikipedia images. However, new images in the 2015-01-01 dump will not show up until you download the appropriate monthly update.

Disk space usage

Some may wonder why XOWA needs so much disk space, especially when compared to other apps. The following is a brief list of reasons:

  • XOWA includes all articles across all namespaces, including the Wikipedia namespace, the Portal namespace, the Help namespace, and several others. It also includes redirect stubs. Other apps will only provide articles in the Main namespace.
  • XOWA includes Categories as well. Other apps will skip Categories altogether.
  • XOWA shows all content on the page. Other apps will omit sections, such as Table of Contents or Navigation boxes at the bottom of the page.
  • XOWA includes all images for the Main namespace, the Portal namespace and the Wikipedia namespace. Other apps will only provide images for the Main namespace.
  • XOWA provides an accurately sized thumbnail for an article. If an article shows an 800-pixel-wide image, XOWA shows an 800-pixel-wide image. Other apps will actually show a smaller 220-pixel-wide image.
  • XOWA includes the latest content. Other apps may be many months (if not years) behind.

Notes

  1. ^ Note that when the import completes, it will move the 10 GB file to /xowa/wiki/#dump/done. This file can be deleted safely. XOWA doesn't delete the file itself, as some users may want to keep the 10 GB file around for archival purposes, and redownloading 10 GB would be time-consuming.
