Archiving an Old WordPress Site

24th March 2011
10:31 am

I recently retired the crappy old neglected blog which had been festering on my personal domain, appleton.me, and replaced it with a shiny new one-page site. Whilst I don’t think many people will miss the old content, I was keen not to just let it disappear (what would Jeremy Keith say…).

The proper thing to do would be to move the old site to a subdomain and set up 301 redirects. This would mean that the (few) links to the old site would not die and the three people who ever cared would still be able to get to the content.

Static Content

The old site ran on WordPress, but there is really no need for a CMS when the site is never going to be updated again. It is also a good idea to archive information in the simplest file format available: who knows if PHP & MySQL will be available in 50 years, and if they will be able to run a 50-year-old weblog? The simplest format here would be plain HTML and CSS files. This also offers the benefit of reduced server load and improved page load times because we aren’t waiting for PHP and SQL queries to execute.

A Plan

So the idea seems simple:

  1. Create a static HTML & CSS version of the old site.
  2. Set it up on a subdomain
  3. Configure 301 redirects to all of the old content

And it turns out it was simple. Here’s how:

Staticification

I played around with wget for a while and learned that I am not really suited to command line tools. Then I found the Really Static plugin for WordPress which will scrape the entire site and produce a static version all nicely arranged into files and folders. This static version will directly mimic the URL structure of the dynamic site and is exactly what I was after.

I added a big old red banner to the top of the dynamic site (to let people know they’re looking at out-of-date content) and then set the plugin scraping. One thing to remember here is that all links will now be hard-coded into the site, which means that you need to tell Really Static what subdomain you will be using (in my case old.appleton.me) before you start the process.

Now I have a plain HTML version of the site which I can configure on a subdomain. The second step is to redirect incoming links to the new (old) subdomain.
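
For anyone with access to the main Apache configuration, pointing the subdomain at the static files might look roughly like the sketch below. The directory path is hypothetical, a DNS record for old.appleton.me is assumed to point at the server already, and on shared hosting this step is usually done through the host's control panel instead.

```apache
# Hypothetical virtual host for the archive; adjust the path to wherever
# the static files were uploaded. Access-control directives differ between
# Apache versions and are left out of this sketch.
<VirtualHost *:80>
  ServerName old.appleton.me
  DocumentRoot /var/www/old.appleton.me

  # Let .htaccess files in the archive set directives such as ErrorDocument.
  <Directory /var/www/old.appleton.me>
    AllowOverride All
  </Directory>
</VirtualHost>
```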

.htaccess and 301 redirects

I’m on an Apache web server so I need to configure a set of rewrite rules for all the old pages. Since the new site is only one page I can create a rule which says “Redirect everything at http://appleton.me/* to http://old.appleton.me/* except for the site root”.

If only .htaccess files were as easy as that. I needed a bit of help with this part, but fortunately Stack Overflow is awesome.

Here is the rewrite rule which got the job done:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://old.appleton.me/%1$1 [R=301,QSA,L]

Let’s step through it.

The first block tells Apache to enable URL rewriting:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

Next we restrict the rule to requests for the appleton.me host and tell it to rewrite all URLs unless they are the site root (or /index.html or /index.php):

RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$

Tell it to also ignore requests for files and directories which actually exist on the server (so that the CSS and JavaScript requests for the new index.html don’t get rewritten):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

And finally tell it to return a 301 header and redirect to the same path on http://old.appleton.me/:

RewriteRule ^(.*)$ http://old.appleton.me/%1$1 [R=301,QSA,L]
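
As a sketch rather than what is actually running on the site, the same rule could be written a little more tightly. This version assumes, as is the case above, that none of the conditions capture a group, so `%1` expands to nothing and can be dropped. It also leaves out `QSA`, since the query string is passed through by default when the target has none of its own, and it keeps every rewrite directive inside the `IfModule` block so that a server without mod_rewrite ignores the rules instead of raising an error.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On

  # Only act on requests for the bare domain (case-insensitive).
  RewriteCond %{HTTP_HOST} ^appleton[.]me$ [NC]
  # Skip the site root, /index.html and /index.php.
  RewriteCond %{REQUEST_URI} !^(/(index[.](html|php))?)?$
  # Skip anything that exists on disk (the new page's CSS, JavaScript, images).
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  # Send everything else to the same path on the archive subdomain.
  RewriteRule ^(.*)$ http://old.appleton.me/$1 [R=301,L]
</IfModule>
```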

That just about covers it. I now have a new single page on appleton.me and the old site archived at old.appleton.me. There’s one last thing to consider though…

404 Errors

It is good practice to serve a 404 page when someone requests a page which doesn’t exist. The redirect rules we have set up will send any URL (real or non-existent) to the old.appleton.me subdomain. What I really want is for these 404 errors to be sent to my nice custom 404 page at appleton.me/404.html.

To do this I set up another rule, this time in an .htaccess file in the old.appleton.me domain root:

<IfModule mod_rewrite.c>
  RewriteEngine On
</IfModule>

ErrorDocument 404 http://appleton.me/404.html

This little rule enables the rewrite engine and sends all 404 errors to appleton.me/404.html. Strictly speaking, ErrorDocument is a core Apache directive, so it works whether or not the rewrite engine is on.
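
One caveat worth knowing: because the ErrorDocument target above is a full URL, Apache answers with a redirect to that page rather than a 404 status, so browsers and crawlers never see the original error code. If the 404 status matters, a possible alternative is to point at a local path instead. The sketch below assumes a copy of 404.html has been placed in the archive's own root, which is not part of the setup described here.

```apache
# .htaccess in the old.appleton.me domain root.
# Assumes a copy of 404.html sits alongside the archived pages.
# A local path makes Apache serve the page with a real 404 status.
ErrorDocument 404 /404.html
```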

So there it is, a fully static archive of my old WordPress site complete with 301 redirects to all old content — no link rot and no reliance on databases and PHP.

7 Comments

  1. Matt Hill
    28th July 2011
    1:32 pm

    Awesome, thanks for the tutorial, great work! This is exactly what I’m going to need when I move to a new domain and retire the old one.

  2. calex
    2nd August 2011
    7:01 pm

    Very interesting, thanks! I’m trying to figure out something like this for a job. Did you set up the subdomain via a WordPress MU network? My client needs a way to move http://www.domain.com to 2011.domain.com when he’s done with it, and then have http://www.domain.com populated with his new content for 2012, and so on, on a yearly basis. HTML/CSS scraping like this might be the best way to do it, though I wish it could be completely automatized — looks like he’ll have to get his hands on at least some code.

    • Andrew Appleton
      2nd August 2011
      7:05 pm

      @calex:
      No, it’s just a standard cname pointing to another directory on the webserver containing my archive.

      I guess you could go for some kind of shell script which you could fire off from the WordPress admin but I wouldn’t know where to start with that!

  3. james
    14th October 2011
    12:33 am

    This is a very beautiful website. I love it :)

  4. Kirthi Raman
    4th November 2011
    12:04 pm

    This is very creative. I must admire your work and encourage you to keep working, you are very creative !!

  5. xiangzi
    20th December 2011
    12:34 pm

    Your comment style is great. I like it.

  6. Venkatesh
    6th April 2012
    2:23 pm

    Hi,
    Thanks for the write up. I was looking to create an archive of my old site and start off a new one. I want to keep the new one active, much like what Calex is talking.

    How would the redirect command then be different??

    Any and all help will be highly appreciated. Thanks a ton :)
