My live migration from WordPress to Ghost

A few months ago I heard about the Ghost blogging platform as it picked up some momentum and was being talked about in all the tech circles. My initial thoughts were "Why do we need another blogging platform? What's wrong with WordPress?"

Well, to get to the point, there's not a whole lot wrong with WordPress. Sure, it's kind of old, but that also means it's a stable, functional, and feature-packed framework. If it doesn't do something out of the box, there's a variety of plugins available that'll do the trick.

What drew me to Ghost were four things that are fundamentally different from WordPress, starting with...

Markdown!

I'm sure there's a plugin somewhere that'll let you write your posts in Markdown on WordPress, but it comes out of the box with Ghost. If you've never composed an article in Markdown, you may not know what you're missing. I started using it to compose Stack Overflow questions and realized how powerful it was. It's especially useful when I want to write some inline code without worrying about the HTML or using some special WYSIWYG editor. Basically, I can express my thoughts faster in Markdown than I can in HTML. I also can't stand the superfluous <span><span><p>some</p></span>text</span> that you often get when WYSIWYG editors convert your article to HTML.

Responsive design

What does responsive mean? A website is considered "responsive" if the elements on the page automatically resize and rearrange to fit your web browser's viewport. The viewport (window size) differs depending on whether you're viewing the site on a mobile device, a tablet, or a desktop computer. While there's probably some sort of responsive theme for WordPress that'll do this, it's built into the core of Ghost and comes out of the box. More and more people are visiting sites on mobile devices, so it's important that your site offers a pleasant experience there.

Speed

Ghost is fast! How fast? I don't know, but it sure feels zippy creating articles and navigating around the site. I hope to have some benchmarks soon. WordPress often needs some help via plugins like WP Super Cache.

Built on NodeJS

This is more of a bonus than anything in that I have some familiarity with NodeJS. Since I've built websites and services using NodeJS, I should have some comfort digging into the source code or following along with the new feature developments. Making modifications and plugins should be easier as well. I've also set up Amazon EC2 instances to run NodeJS web applications (sitting behind Nginx), and I'm already familiar with NPM (Node's excellent package manager) and some of the helpful NPM packages like Forever that are used to keep this site up.

Okay, so how did you do it?

My plan was to get an empty Ghost blog up and running on an Amazon EC2 instance and then import my data over from WordPress. Afterwards I would point the domain name netinstructions.com over to the IP address of the EC2 instance by modifying the A record. Then I could safely power down the WordPress instance.

Installing Ghost on an Amazon EC2 instance

A few months ago the t2.micro instances were announced. These are super low cost machines that are perfect for running websites. If you plan on using one for the next three years you can order a heavy utilization reserved instance, pay $109 upfront and $1.46 per month ($0.002 per hour). Over three years, that comes out to about $161.56 total, or $4.49 per month. If you're afraid of a 3 year commitment you can do:

  • 1 year commitment and pay $6.44 per month ($51 upfront and $0.003 per hour).
  • No commitments with on-demand instance prices and pay $9.50 per month ($0.013 per hour).
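If you want to sanity-check those numbers, here's a quick sketch in Python. It assumes an 8,760-hour year (so a 730-hour month) and uses only the prices quoted above; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days ignored for simplicity


def effective_monthly_cost(upfront, hourly, years):
    """Average monthly cost over the term: upfront fee plus hourly charges."""
    total = upfront + hourly * HOURS_PER_YEAR * years
    return total / (12 * years)


# Prices as quoted in this post for a t2.micro
three_year = effective_monthly_cost(upfront=109, hourly=0.002, years=3)
one_year = effective_monthly_cost(upfront=51, hourly=0.003, years=1)
on_demand = effective_monthly_cost(upfront=0, hourly=0.013, years=1)

print(f"3-year reserved: ${three_year:.2f}/month")
print(f"1-year reserved: ${one_year:.2f}/month")
print(f"On-demand:       ${on_demand:.2f}/month")
```

With these assumptions the on-demand price works out to $9.49 per month, a cent below the figure in the list, which presumably comes down to rounding.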

I spun up a t2.micro instance running Ubuntu 14.04 LTS, converted my .pem private key to a .ppk PuTTY key, and used PuTTY to SSH onto the box. There were three things I wanted to install:

sudo apt-get update
sudo apt-get install nginx
sudo apt-get install nodejs
sudo apt-get install npm

My plan was to have the Nginx web server sit in front of the NodeJS web server and act as a reverse proxy. There are pros (the ability to host more than one website on one EC2 instance) and cons (two web servers to configure and manage) to doing this, but I think the pros outweigh the cons. In this configuration a request comes in, Nginx looks at the Host header and decides which web server it should be sent to. Perhaps a request came in for netinstructions.com, so my Ghost blog should handle it. But maybe I want to set up a test blog at test.netinstructions.com or host catsarereallysuperawesome.com here as well. Nginx can inspect those requests and send them to the right place.
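To make that routing decision concrete, here's a toy model of it in Python. This is not how Nginx is implemented, just an illustration of name-based routing; the function name and the second backend port (2369) are hypothetical.

```python
# Hypothetical mapping from Host header to the backend that should serve it.
BACKENDS = {
    "netinstructions.com": "http://127.0.0.1:2368",       # the Ghost blog
    "www.netinstructions.com": "http://127.0.0.1:2368",
    "test.netinstructions.com": "http://127.0.0.1:2369",  # assumed test blog port
}


def pick_backend(host_header, default=None):
    """Return the backend URL for a request's Host header, or the default."""
    # Host headers may carry a port ("example.com:80") and vary in case.
    hostname = host_header.split(":")[0].strip().lower()
    return BACKENDS.get(hostname, default)


print(pick_backend("netinstructions.com"))
print(pick_backend("TEST.netinstructions.com:80"))
print(pick_backend("unknown.example"))
```

In Nginx itself the same mapping is expressed with one server block per site, each with its own server_name and proxy_pass.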

Anyways... We have almost everything except the Ghost blogging software itself. We need to download and run that. The official guide is here but the TL;DR version is:

$ curl -L https://ghost.org/zip/ghost-latest.zip -o ghost.zip
$ unzip -uo ghost.zip -d ghost
$ cd ghost
$ npm install --production
$ npm start

You should see something like

Ghost is running in development...
Listening on 127.0.0.1:2368
Url configured as: http://localhost:2368
Ctrl+C to shut down

Find your EC2's public IP address and attempt to visit your site with that IP and the default Ghost port. Do you get something like this?

Well, you need to change your EC2 security group settings to allow inbound connections to port 2368.
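A quick way to tell whether the port is reachable at all, without a browser, is to try opening a TCP connection to it. Here's a minimal sketch in Python using only the standard library; the function name is my own, and you'd pass in your instance's public IP.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling port_is_open("your.ec2.public.ip", 2368) from your own machine returns False while the port is blocked or nothing is listening on a public interface, and True once both are sorted out.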

You also need to change the server host to 0.0.0.0 so that Ghost listens on all network interfaces instead of only on localhost. In config.js:

// ### Development **(default)**
development: {
    url: 'http://localhost:3050',
    database: {
        // snip
    },
    server: {
        // Listen on all interfaces so the blog is reachable by public IP
        host: '0.0.0.0',
        port: 2368
    },
    paths: {
        // snip
    }
},

Restart Ghost with npm start. You should now be able to run in development mode and reach the blog directly by IP address and port.

But how do I run Ghost in production and behind Nginx?

Well, you can delete that firewall rule for port 2368 since incoming requests will be passing through Nginx on port 80, then make sure you allow incoming HTTP requests on port 80. In config.js we'll be editing the part for production:

production: {
    url: 'http://54.68.205.33',
    mail: {},
    database: {
        // snip
    },
    server: {
        // Only Nginx talks to Ghost now, so localhost is enough
        host: '127.0.0.1',
        port: '2368'
    }
}

I then used forever to start up the process

$ NODE_ENV=production forever start index.js

If we tail the forever log (use forever list to find the UID and then tail -20f ~/.forever/UID.log) we should see:

Ghost is running...
Your blog is now available on http://54.68.205.33

Okay, now to teach Nginx about it. I created a file in /etc/nginx/sites-available/www.netinstructions.com that looks like this:

server {
  server_name netinstructions.com www.netinstructions.com;
  listen 80;
  listen [::]:80;

  location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header HOST $http_host;
    proxy_set_header X-NginX-Proxy true;

    proxy_pass http://127.0.0.1:2368;
    proxy_redirect off;
  }
}

which essentially tells Nginx to listen on port 80 and forward incoming HTTP requests to 127.0.0.1:2368 which is where our ghost blog is listening.

A convention of Apache and Nginx is to keep two separate directories sites-enabled and sites-available and symlink the sites you want to activate. One advantage of this is you can shut down a misbehaving site by just breaking that symlink.

But we want to activate the site. So I'll do that by creating the symlink and reloading Nginx:

sudo ln -s /etc/nginx/sites-available/www.netinstructions.com /etc/nginx/sites-enabled/
sudo service nginx reload

So the tricky part here is that I can't just go to www.netinstructions.com and see this new Ghost blog. Remember that the DNS is still pointing to my old host? My A record is pointing to a Dreamhost machine with the WordPress blog.

DNS for netinstructions.com:
Record  Type  Value
(none)  A     75.119.222.60
www     A     75.119.222.60

But my new blog will be hosted at 54.68.205.33, which is the IP address of the EC2 instance. And unfortunately, as far as I could tell, Nginx doesn't let you define a server_name of 54.68.205.33.

There are three options to get around this:

  1. Update your A record to point to the EC2 instance. But this sends all visitors to the not-yet-complete Ghost blog.
  2. Add another A record for a subdomain. Maybe www.netinstructions.com points to 75.119.222.60 but test.netinstructions.com points to 54.68.205.33.
  3. Modify your HTTP request headers when accessing 54.68.205.33 in a web browser and add in a value for the Host.

Option 1 is bad if you have frequent traffic going to your existing WordPress blog. Option 2 is not a bad one, but you'll need to change your Ghost config.js file so it knows about the subdomain. I actually went with option 3 and found a Chrome extension to modify my headers.
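Option 3 also works from a script if you'd rather not install a browser extension. Here's a minimal sketch in Python's standard library that aims a request at the raw IP while sending the real hostname in the Host header; the helper name is my own.

```python
import urllib.request


def build_request(ip, hostname, path="/"):
    """Build a request aimed at a raw IP that carries an explicit Host header."""
    url = f"http://{ip}{path}"
    return urllib.request.Request(url, headers={"Host": hostname})


req = build_request("54.68.205.33", "netinstructions.com")
print(req.full_url, req.get_header("Host"))

# To actually send it (needs network access to the instance):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Nginx only sees the Host header, so it routes this request exactly as if DNS already pointed at the new machine.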

Then I was able to go to http://54.68.205.33/admin and walk through the first-time account setup process like creating a username and password.

Migrating all that data from WordPress...

The first step is installing the Ghost WordPress plugin on your WordPress blog. Once it's installed you can navigate to the Tools section and click the export button. You'll end up with a (potentially large) wp2ghost_export.json file containing all your posts. You might want to save this file somewhere safe as a restore point if anything crazy happens.

The next step is importing that data to your Ghost blog which is probably empty at this point. If it contains a 'Hello World' post or anything else, that should be fine as well, since importing your data is just adding additional posts to your existing Ghost blog.

I headed over to the top secret URL at http://54.68.205.33/ghost/debug/ and imported the .json file. Voila! All my WordPress data was now on the Ghost blog.

However, there were some small things to fix. For example, the picture captions in WordPress didn't translate very well onto the new Ghost blog. In fact, as of right now (12/6/2014) Ghost does not support picture captions out of the box. You would have to jury-rig something yourself if that is important to you. I just went through the ten or so posts I had and manually cleaned up the image captions. If you have more than that, you may want to write a script that goes through the .json file and cleans it up before you import it into the Ghost blog.
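Such a cleanup script might look something like the sketch below. It assumes the leftovers are WordPress's [caption ...]...[/caption] shortcodes sitting inside string values of the export, which I haven't verified for every export. To avoid guessing at the file's schema it simply walks every string in the JSON. The sample data and its "posts" and "markdown" keys are made up for the demo.

```python
import json
import re

# WordPress wraps captioned images in a [caption ...]...[/caption] shortcode.
CAPTION_RE = re.compile(r"\[caption[^\]]*\](.*?)\[/caption\]", re.DOTALL)


def strip_captions(text):
    """Remove the shortcode wrapper, keeping the image tag and caption text."""
    return CAPTION_RE.sub(lambda m: m.group(1).strip(), text)


def clean(node):
    """Recursively apply strip_captions to every string in parsed JSON."""
    if isinstance(node, str):
        return strip_captions(node)
    if isinstance(node, list):
        return [clean(item) for item in node]
    if isinstance(node, dict):
        return {key: clean(value) for key, value in node.items()}
    return node


# Hypothetical sample; a real export will have a different shape.
sample = {"posts": [{"markdown": 'Intro [caption id="a" width="300"]<img src="x.png" /> A cat[/caption] outro'}]}
print(json.dumps(clean(sample)))
```

For the real file you would json.load the export, pass the result through clean, and json.dump it to a new file, keeping the untouched original as your restore point.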

Fixing (and preventing) broken URLs with redirects

One last thing to worry about is all the URLs out there that link to my site. Google Webmaster Tools shows me that there are 229 links to my site out in the world.

It would really suck if some visitor found a link to one of my posts (perhaps on someone else's blog or a different website), clicked on it, and was met with a 404 not found page.

If you consult the Ghost roadmap it looks like custom permalinks are in the works. Until then we can use Nginx rewrite rules. I wrote a regular expression to translate

http://54.68.205.33/2011/10/next-steps-for-aspiring-programmers-after-you-know-the-basics/

into

http://54.68.205.33/next-steps-for-aspiring-programmers-after-you-know-the-basics/

Have a look at the rewrite line in /etc/nginx/sites-available/www.netinstructions.com:

server {
  server_name netinstructions.com www.netinstructions.com;
  listen 80;
  listen [::]:80;

  rewrite '^/\d{4}/\d{2}/(.*)$' /$1 last;

  location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header HOST $http_host;
    proxy_set_header X-NginX-Proxy true;

    proxy_pass http://127.0.0.1:2368;
    proxy_redirect off;
  }
}

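I reloaded Nginx and was on my way. This takes care of the /YYYY/MM/ patterns that WordPress likes to use but that Ghost does not yet support. Note that the last flag rewrites the URL internally, so the old URL answers with a 200 rather than a redirect; swapping last for permanent would make Nginx send a true 301 instead. It looks like WordPress tags are supported in Ghost, so I didn't need any special rewrites for those.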
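If you'd like to try the pattern against a few paths before touching the server config, here's a small sketch in Python. Nginx uses PCRE rather than Python's re module, but for a pattern this simple the two should behave the same; the function name is just for illustration.

```python
import re

# Same pattern as the Nginx rewrite rule: /YYYY/MM/slug -> /slug
DATE_PREFIX = re.compile(r"^/\d{4}/\d{2}/(.*)$")


def rewrite_path(path):
    """Drop a leading /YYYY/MM/ segment; leave other paths unchanged."""
    return DATE_PREFIX.sub(r"/\1", path)


print(rewrite_path("/2011/10/next-steps-for-aspiring-programmers-after-you-know-the-basics/"))
print(rewrite_path("/about/"))
print(rewrite_path("/tag/2014/"))
```

Only paths that start with a four-digit and then a two-digit segment are touched, so ordinary Ghost URLs like /about/ pass through as they are.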

If you want an easy way to test this, try the curl command (and don't forget your header if you need it)

$ curl -I -L --header "Host: netinstructions.com" http://54.68.205.33/2011/10/next-steps-for-aspiring-programmers-after-you-know-the-basics/

HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Tue, 09 Dec 2014 04:21:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 35681
Connection: keep-alive
X-Powered-By: Express
Cache-Control: public, max-age=0
ETag: W/"uKAIXdXPOi/Oaw+ZQwcilA=="
Vary: Accept-Encoding

Going live

Once I was satisfied with how the Ghost blog looked, it was just a simple matter of updating the A record for my hostname netinstructions.com so that new requests were routed to Nginx on my EC2 instance instead of to my WordPress blog on Dreamhost.

Does this mean I can shut down my WordPress blog? No, as the pictures are still hosted there. For example, on WordPress the images are linked something like this:

http://www.netinstructions.com/wp-content/uploads/2012/05/google-header-animated.gif

But on Ghost the images are linked like this

http://www.netinstructions.com/content/images/2014/12/404-not-found-ghost.png

Can you guess what the Nginx rewrite looks like? In /etc/nginx/sites-available/www.netinstructions.com:

rewrite '^/wp-content/uploads/(.*)$' /content/images/$1 last;

Don't forget to reload Nginx. And now we need to transfer all the files from the Dreamhost machine in the /wp-content/uploads/ directory and put them in the /content/images/ directory of the EC2 machine. I just used FileZilla for that, but SCP would also work.
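Because the rewrite keeps everything after /wp-content/uploads/ intact, the files need to land under /content/images/ with the same year/month subdirectories. Here's a rough sketch of a check for stragglers, assuming you have both directory trees available on one machine (say, a local download of the uploads and a copy of the Ghost images directory); the function name is my own.

```python
from pathlib import Path


def missing_uploads(wp_uploads, ghost_images):
    """List files under wp_uploads with no copy at the same relative path in ghost_images."""
    wp_root, ghost_root = Path(wp_uploads), Path(ghost_images)
    missing = []
    for src in sorted(wp_root.rglob("*")):
        if src.is_file():
            rel = src.relative_to(wp_root)
            if not (ghost_root / rel).is_file():
                missing.append(rel.as_posix())
    return missing
```

An empty list means every old image URL has a file behind it on the new server.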

The very last step was just to point the domain name to the EC2 machine that's now all ready to handle incoming requests. It was pointing to the Dreamhost machine (which hung out at IP 75.119.222.60), but I overwrote the A record with the EC2 instance's address, 54.68.205.33.

And that's it! A migration to a self-hosted installation of Ghost on a dirt-cheap EC2 instance without any downtime and without breaking any of my URLs out on the web.