Archiving an Old WordPress Site

by Andy Appleton

I recently retired the crappy, neglected old blog which had been festering on my personal domain, appleton.me, and replaced it with a shiny new one-page site. Whilst I don’t think many people will miss the old content, I was keen not to just let it disappear (what would Jeremy Keith say…).

The proper thing to do would be to move the old site to a subdomain and set up 301 redirects. This would mean that the (few) links to the old site would not die and the three people who ever cared would still be able to get to the content.

Static Content

The old site ran on WordPress, but there is really no need for a CMS when the site is never going to be updated again. It is also a good idea to archive information in the simplest file format available – who knows if PHP & MySQL will be available in 50 years and if they will be able to run a 50-year-old weblog? The simplest format here would be plain HTML and CSS files. This also offers the benefit of reduced server load and improved page load times because we aren’t waiting for PHP code and SQL queries to execute.

A Plan

So the idea seems simple:

  1. Create a static HTML & CSS version of the old site.
  2. Set it up on a subdomain.
  3. Configure 301 redirects to all of the old content.

And it turns out it was simple. Here’s how:

Staticification

I played around with wget for a while and learned that I am not really suited to command line tools. Then I found the Really Static plugin for WordPress which will scrape the entire site and produce a static version all nicely arranged into files and folders. This static version will directly mimic the URL structure of the dynamic site and is exactly what I was after.

I added a big old red banner to the top of the dynamic site (to let people know they’re looking at out-of-date content) and then set the plugin scraping. One thing to remember here is that all links will now be hard-coded into the site – that means that you need to tell Really Static what subdomain you will be using (in my case old.appleton.me) before you start the process.

Now I have a plain HTML version of the site which I can configure on a subdomain. The second step is to redirect incoming links to the new (old) subdomain.

.htaccess and 301 redirects

I’m on an Apache web server so I need to configure a set of rewrite rules for all the old pages. Since the new site is only one page I can create a rule which says “Redirect everything at http://appleton.me/* to http://old.appleton.me/* except for the site root”.

If only .htaccess files were as easy as that. I needed a bit of help with this part, but fortunately Stack Overflow is awesome.

Here is the rewrite rule which got the job done:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://old.appleton.me/%1$1 [R=301,QSA,L]

Let’s step through it.

The first block tells Apache to enable URL rewriting:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

Next we tell it to apply the rule only to requests for the appleton.me host, and to skip the site root (or /index.html or /index.php):

RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$

Tell it to also ignore requests for files and directories which really exist on the server (so that the CSS and JavaScript requests for the new index.html don’t get rewritten):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

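And finally tell it to return a 301 header and redirect to the same path on http://old.appleton.me/. Here $1 is the requested path captured by ^(.*)$. The %1 would refer to a capture from one of the conditions, but none of them supplies one (the host pattern has no groups and a negated pattern can’t provide any), so it expands to nothing: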

RewriteRule ^(.*)$ http://old.appleton.me/%1$1 [R=301,QSA,L]
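
For reference, here is a slightly tidier sketch of the same rule set. It is a variation, not the rule above, and it assumes the same two hostnames. Everything sits inside the IfModule block, so Apache won’t choke on the rewrite directives if mod_rewrite is missing. The empty %1 is dropped, and QSA is left out because the query string is passed through unchanged anyway when the target has none of its own.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On

  # Only act on requests for the bare domain (case-insensitive).
  RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
  # Leave the site root, /index.html and /index.php alone.
  RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$
  # Leave files and directories which exist on disk alone.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  # Send everything else to the same path on the archive subdomain.
  RewriteRule ^(.*)$ http://old.appleton.me/$1 [R=301,L]
</IfModule>
```

With this in place, a request for a hypothetical http://appleton.me/2010/some-post/ should come back as a 301 pointing at http://old.appleton.me/2010/some-post/.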

That just about covers it. I now have a new single page on appleton.me and the old site archived at old.appleton.me. There’s one last thing to consider though…

404 Errors

It is good practice to serve a 404 page when someone requests a page which doesn’t exist. The redirect rules we have set up will send any URL (real or non-existent) to the old.appleton.me subdomain. What I really want is for these 404 errors to be sent to my nice custom 404 page at appleton.me/404.html.

To do this I set up another redirect, this time in the old.appleton.me domain root:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

ErrorDocument 404 http://appleton.me/404.html

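This little rule tells Apache to send all 404 errors to appleton.me/404.html. ErrorDocument is a core directive, so the rewrite engine isn’t strictly needed for it. One caveat: because the target is a full URL, Apache answers with a redirect to that page rather than with a 404 status code.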
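
A variation which keeps the 404 status code, sketched on the assumption that a copy of 404.html is uploaded to the old.appleton.me document root (the setup described here keeps it only on appleton.me). Given a local path rather than a full URL, Apache serves the page itself along with the 404 status.

```apache
# .htaccess in the old.appleton.me document root.
# Assumes 404.html has been copied here; the path is relative to this site's root.
ErrorDocument 404 /404.html
```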

So there it is, a fully static archive of my old WordPress site complete with 301 redirects to all old content – no link rot and no reliance on databases and PHP.