Moving from Wordpress to Pelican

In the process of finding Site Sucker I also found something else. Apparently the cool kids have been moving their blogs over to something called static HTML generators for the past couple of years. As usually happens with trends, I only found out about it by accident.

After reading about the advantages of this approach, I decided to try it. Be warned: static site generators are built for and used by geeks. If you're not comfortable using the command line, stick with Wordpress. If you're ready to take the plunge into static site building, though, here's some information that could save you some time. This is the blog post I wish I'd found before I began the trial-and-error phase of this process. I'll walk through a complete installation of Pelican and then describe how I moved my old site's content over. If you've already gotten Pelican working, you can scroll down.

Installing Pelican

There's a slew of static site generators out there. They're like text editors or Twitter clients: every serious programmer apparently decides to build one from scratch at some point, so the world is littered with these projects. I suggest sticking with one of the more popular ones, which is much more likely to be supported in a few years than the more obscure options. Here is a list sorted by popularity.

I chose Pelican for a couple of rational and probably many irrational reasons. It's written in Python, a language I've been meaning to learn for a while, and seems to have pretty good support and documentation. I also liked the look of the default theme.

The installation process is pretty easy but not entirely obvious. Here's what appears to be the right way, at least on a Mac running OS X Yosemite. Linux users will do essentially the same. Windows users are on their own.

First, update Python to 2.7.x. Pelican is theoretically compatible with Python 3, but everyone seems to use it with 2.x so I used that.

Now open Terminal and do the following:

sudo pip install virtualenv

This calls the built-in Python package manager, pip, and tells it to install a package called virtualenv. This is technically optional, but it's a very good idea.

Next, let's find a place for our new site:

mkdir -p Sites/static/sitename

cd Sites/static/sitename

virtualenv env

Terminal says something like: "New python executable in env/bin/python Installing setuptools, pip…done". We've now created a virtual Python environment that will load from a directory called env in our website folder. All of our Python scripts (including Pelican) for the site will live in there. The rest of the site files will be out here in sitename. To activate our new virtual environment and set up our site:

source env/bin/activate

The prompt now has (env) before it, indicating that whatever we do here will use our virtual environment.

pip install pelican markdown

Text scrolls by for a minute. We're installing Pelican and the Markdown interpreter.

pelican-quickstart

This will ask you a bunch of questions to create a new site. You can just accept the defaults for most of them. You'll need to know your FTP or SSH information if you want to configure this for automatic uploads to your web server. SSH is better, but if you only have FTP that will work. In either case, tell Pelican what path to save your web site into once it logs onto your web server. If you only want to play around with a local installation, just leave those options blank. You can add upload information later in the configuration files.

Now create your first post as described in the documentation, then start the server:

make devserver

Visit http://localhost:8000 in your browser. If your new site appears, great. If not, you'll need to troubleshoot that before proceeding. The documentation is pretty good.

Once the site is working, take a look at the directory structure. There should be some new subdirectories and files in your site directory. The content folder is where you'll put text files and images for posts and pages, output is where Pelican saves the finished HTML files after building your site, and pelicanconf.py and publishconf.py are configuration files where you can change various options. Subdirectories within content can include posts and pages.

When you've finished playing around, shut off the development server:

make stopserver

and exit the virtual environment:

deactivate

If this is a new blog, you can just start creating posts now. The workflow is quite natural for a writer, as it separates the writing, editing, and publishing steps.

Write a post in your favorite text editor and save the file in the site's content/posts directory. Use simple Markdown tags for links and formatting, and make sure the first line of the file says "Title: " and the title of your post. You can put other header information below that if you like, including categories and tags; check the Pelican documentation for details.
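As a rough illustration of that layout, here's a shell snippet that writes a minimal post into content/posts. The file name and every value in it are made-up placeholders, and the Date, Category, and Tags lines are my reading of the optional metadata the Pelican documentation describes, so check them there before relying on them.

```shell
# Create the posts directory if it isn't there yet (path taken from the text above).
mkdir -p content/posts

# Write a minimal Markdown post. The file name and all values are placeholders.
cat > content/posts/my-first-post.md <<'EOF'
Title: My First Post
Date: 2015-01-15 10:20
Category: Misc
Tags: pelican, example

This is the body of the post, written in *Markdown*.
Here is [a link](http://example.com/).
EOF

# Show the first line to confirm that the Title line really is at the top.
head -n 1 content/posts/my-first-post.md
```

The quoted 'EOF' keeps the shell from expanding anything inside the post body, which matters once your writing includes dollar signs or backticks.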

Once your masterpiece is ready to go, it's time for copyediting and layout. Open Terminal, go to the site directory, activate the virtual environment (source env/bin/activate), start the development server (make devserver), and view the page in your browser (http://localhost:8000). If you spot errors at this stage, go back to the text editor, make changes, save them, and then refresh your browser to see how it looks.

Finally, publish the post with make rsync_upload (or FTP if that's your setup), and your new post is online five seconds later.

Moving Wordpress content to Pelican

If you're moving to Pelican from an established Wordpress site, there's a bit more work to do. I should probably act all macho-geek and claim that this procedure was obvious, but in fact it took me several hours of blundering around to figure it out. Hopefully I can make it easier for you.

There's a script available on the Pelican site that purports to take a Wordpress XML export file (what you get when you choose "Tools > Export" from the standard Wordpress dashboard) and convert it into the files for a Pelican site. Try it. Maybe it will work for you. It failed spectacularly for me.

Instead, I eventually found the plugin I described in the previous post, Jekyll-export. This plugin does 99% of the job, but configures the resulting Markdown text files for a different static site generator. Once you have all of these files, duplicate the directory and put one copy in a safe place. If anything goes wrong in subsequent steps you'll be able to backtrack to the clean Jekyll-export files.
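The backup step might look something like this. Both directory names are placeholders, and the snippet builds a stand-in export directory in a temporary location so it can be tried without touching real files.

```shell
# Stand-in for the directory of exported files (names are placeholders).
work=$(mktemp -d)
mkdir "$work/jekyll-export"
printf 'stand-in post\n' > "$work/jekyll-export/2014-03-01-hello-world.md"

# Keep an untouched copy before editing anything in place.
cp -R "$work/jekyll-export" "$work/jekyll-export-backup"

# diff -r prints nothing and exits 0 when the two trees are identical.
diff -r "$work/jekyll-export" "$work/jekyll-export-backup" && echo "backup matches"
```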

The files - one for each post and page - are saved with date-encoded file names in the format yyyy-mm-dd-this-is-the-post-slug.md. Pelican will automatically read these files and correctly infer the date of each post. Unfortunately, the bird chokes on other components of Jekyll's potion.

I eventually discovered that Pelican can't find the "Title:" line unless it's the first line in the text file, and without this it won't process the post. Pelican also reacts badly to some of the other cruft that the Jekyll-export plugin inserts.

Given several hundred text files that all needed similar but not quite identical edits, I turned to my cantankerous old chum, sed. He lives in the Terminal, of course. Navigate the command line to the working directory of Jekyll-export files, and run these two commands:

find *.md -exec sed -i '' '1,2d' {} \;

find *.md -exec sed -i '' '2,/^---/d' {} \;

The first deletes the first two lines (putting the "title:" line at the top), and the second deletes from the second line (now right after title:) to the line beginning ---, to remove the rest of the header.
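If you'd like to see what those two sed scripts do before pointing them at several hundred files, here's a dry run on a throwaway file. The sample header is an assumption based on the description above (two lines above title:, more header lines after it, then a closing ---); real export files will carry other fields. The sketch also writes to a temporary file and moves it back instead of using -i, because BSD sed (on the Mac) and GNU sed (on most Linux systems) expect different forms of that option.

```shell
# Work in a throwaway directory so no real export files are touched.
scratch=$(mktemp -d)

# A made-up file in the shape described above.
cat > "$scratch/2014-03-01-hello-world.md" <<'EOF'
---
id: 42
title: 'Hello World'
date: 2014-03-01
layout: post
---
Body text.
EOF

# Same two sed scripts, applied in two passes without the -i option.
for f in "$scratch"/*.md; do
  sed '1,2d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"        # drop the two lines above "title:"
  sed '2,/^---/d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"   # drop the rest of the header
done

cat "$scratch/2014-03-01-hello-world.md"
```

On the sample file this leaves just the title: line followed by "Body text.". The two passes are kept separate on purpose: inside a single sed run the line numbers refer to the original input, so simply chaining '1,2d' and '2,/^---/d' would not behave the same way.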

Follow that with find *.md -exec sed -i '' "s/^title: '\(.*\)'/Title: \1/g" {} \; to remove the single quotes surrounding some posts' titles (capitalizing title: on those lines along the way), and we seem to be all set.
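Before copying anything into the site, it may be worth checking that every file really does start with a title line. Here's a small sketch of such a check. The function name list_missing_title is hypothetical, and matching the key without regard to case is an assumption of this sketch, made because the last sed command leaves unquoted titles in lower case.

```shell
# Print the name of every .md file in a directory whose first line is not a
# "Title:" line. The case-insensitive match is an assumption of this sketch.
list_missing_title() {
  dir=$1
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue    # no .md files at all: the glob stays literal
    head -n 1 "$f" | grep -qi '^title:' || printf '%s\n' "$f"
  done
}

# Small self-check on throwaway files (made-up names and contents).
demo=$(mktemp -d)
printf '%s\n' 'Title: Good post' '' 'Body.' > "$demo/good.md"
printf '%s\n' '---' 'title: Bad post' '---' 'Body.' > "$demo/bad.md"
list_missing_title "$demo"
```

In the self-check only bad.md is listed, since its first line is --- rather than a title. Run against the real working directory (list_missing_title .), an empty result suggests the header cleanup reached every file.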

Copy about 100 or so of the cleaned-up posts into content/posts, and some cleaned-up pages into content/pages. I sub-sorted my posts into folders for each year, but that's optional.

Build the site with:

make devserver

Pelican apparently dislikes near-duplicate file names in the content folder, such as filename.md and filename-2.md, so rename any of those that Jekyll-export created. It also repeatedly rejected one post for no obvious reason, so I just deleted that post. It was a link to a TWiV episode, which people can presumably find more easily by going straight to the TWiV site.
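Hunting for those near-duplicates by eye gets tedious, so here's one way to list them. The function name list_numbered_twins is hypothetical, and the "-digits.md" suffix pattern is an assumption based on the filename-2.md example above; other naming schemes would need a different pattern.

```shell
# List files named like "slug-2.md" whose un-numbered twin "slug.md" also
# exists in the same directory.
list_numbered_twins() {
  dir=$1
  for f in "$dir"/*-[0-9]*.md; do
    [ -e "$f" ] || continue
    # Strip a trailing -<digits> from the name, if there is one.
    base=$(printf '%s\n' "$f" | sed 's/-[0-9][0-9]*\.md$/.md/')
    # Report only when the name really ended in -<digits>.md and the twin exists.
    if [ "$base" != "$f" ] && [ -e "$base" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Self-check with made-up file names.
dupes=$(mktemp -d)
touch "$dupes/2014-03-01-hello.md" "$dupes/2014-03-01-hello-2.md" "$dupes/2014-03-02-top-10.md"
list_numbered_twins "$dupes"
```

Only 2014-03-01-hello-2.md is reported here. The top-10 file also ends in a number, but it has no un-numbered twin, so it is left alone.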

Continue transferring files, turning the server off and on again periodically and checking the site to see that everything is going okay.
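The batch-by-batch transfer can be scripted too. This is only a sketch: move_batch and its three arguments are hypothetical names, and it moves files rather than copying them, on the assumption that a clean backup of the export already exists elsewhere.

```shell
# Move the next N .md files (in name order) from a staging directory into
# the site's posts directory.
move_batch() {
  src=$1; dest=$2; count=$3
  mkdir -p "$dest"
  moved=0
  for f in "$src"/*.md; do
    [ -e "$f" ] || break                  # nothing left to move
    [ "$moved" -lt "$count" ] || break    # this batch is full
    mv "$f" "$dest"/
    moved=$((moved + 1))
  done
  echo "moved $moved file(s)"
}

# Self-check with five made-up files, moved two at a time.
stage=$(mktemp -d)
site=$(mktemp -d)
for n in 1 2 3 4 5; do
  printf 'Title: Post %s\n' "$n" > "$stage/2014-01-0$n-post.md"
done
move_batch "$stage" "$site/content/posts" 2
```

Because the file names start with the date, name order is also chronological order, so each batch picks up where the last one stopped. For the real site, something like move_batch . ~/Sites/static/sitename/content/posts 100 would match the batch size mentioned earlier.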

When it's all working, make rsync_upload (or FTP if you prefer), and the site goes online for the world to see.
