Blogger to WordPress
I have been tasked with migrating a few blogs from Blogger to a self-hosted WordPress. So far, my experience is limited to migrating my own blog between different software.
Time to try a new migration combination.
The plan here is:
- export what can be exported from Blogger;
- start a local WordPress;
- import the data exported from Blogger to the local WordPress;
- fix issues and repeat the process until the data are good enough;
- back up the local WordPress database and files;
- load the backup on a staging WordPress instance;
- have the blog owner configure WordPress’ theme and fix the posts;
- migrate the staging blog to production.
Export the Blogger blog
Log in with the blog’s owner account on Blogger.
Under Manage Blog, click on Back up content.
This will export and download an XML file containing much of the blog’s data, such as settings, posts, labels and comments.
Spin up a local WordPress
First of all, I started a WordPress instance on my computer to try the import.
So I installed it with Docker, using a simple Compose file.
Then start it with docker compose up.
Open http://localhost:8080/wp-admin/ and follow the WordPress installation instructions (set the language, blog title and administrator account).
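For reference, here is a minimal sketch of a Compose setup for this step. It is an assumption modeled on the example from the official WordPress image documentation, not a copy of a verified file. The project directory is named container so that Compose produces the container and volume names used in the rest of this post (container-wordpress-1, container-db-1, container_wordpress). The password examplepass is a placeholder.

```shell
# Assumption: the project directory is called "container"; Compose derives
# container and volume names from it.
mkdir -p container
cat > container/docker-compose.yml <<'EOF'
services:
  wordpress:
    image: wordpress
    ports:
      - "8080:80"            # WordPress reachable on http://localhost:8080/
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: exampleuser
      WORDPRESS_DB_PASSWORD: examplepass
      WORDPRESS_DB_NAME: exampledb
    volumes:
      - wordpress:/var/www/html
  db:
    image: mysql:8.0
    environment:
      MYSQL_DATABASE: exampledb
      MYSQL_USER: exampleuser
      MYSQL_PASSWORD: examplepass
      MYSQL_RANDOM_ROOT_PASSWORD: '1'
    volumes:
      - db:/var/lib/mysql
volumes:
  wordpress:
  db:
EOF
# Then: cd container && docker compose up
```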
Import the blog
In WordPress, go to the import tools, find Blogger and click to install the importer.
Once it’s done, click Run Importer, select the XML file downloaded from Blogger and upload it. I then assigned all the posts to the admin user and waited.
Once done, the posts and comments are available in WordPress 🎉.
More information about the blogger importer plugin is on wordpress.org.
The WordPress Docker image is… bad. Basically, it is a WordPress with default configuration running on a server with default Apache and PHP configuration.
Obviously that’s not what is going to production; it’s only for testing on my computer, but it still comes with issues. The first one is the default file upload size, which is 2 MB. And of course, one of the blogs I’m migrating has an XML file bigger than that.
Let’s do a quick modification of the running container:
$ docker exec container-wordpress-1 bash -c "echo \"upload_max_filesize = 40M\" > /usr/local/etc/php/conf.d/upload.ini"
After restarting the container, the new configuration is taken into account and I can upload the file.
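One caveat: PHP also caps the size of a whole POST request with post_max_size, which defaults to 8 MB, so an export bigger than that would presumably still be refused. Here is a minimal sketch that sets both values, assuming the same container name and configuration path as in the command above:

```shell
# Write both overrides to a local file first.
cat > upload.ini <<'EOF'
upload_max_filesize = 40M
post_max_size = 40M
EOF
# Then copy it into the container and restart it:
#   docker cp upload.ini container-wordpress-1:/usr/local/etc/php/conf.d/upload.ini
#   docker restart container-wordpress-1
```

Like the original command, this only changes the running container; the file is lost if the container is recreated.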
Semi-manual cleanup
The XML file has content nodes containing the raw HTML code of each post.
I don’t know if it’s because of the way Blogger works or the people who wrote the posts, but the HTML content is clearly not clean.
There are style sections included in each post, which is not great for consistency and portability.
So I did quite a lot of search and replace to clean some of it, but still, a lot will have to be done manually after the import.
Labels on Blogger’s posts are converted to Categories on WordPress’ posts, which seems the obvious choice given that the Labels are in a category node in the XML file.
The issue here is that, on the blogs I’m migrating, the Labels were used like Tags. Since WordPress has both Categories and Tags, migrating the Labels to Tags would make more sense here.
So before running the import task, let’s do a quick modification of the Blogger importer plugin to store categories as tags:
$ docker exec container-wordpress-1 bash -c "sed -i 's/wp_create_categories(array_map('\''addslashes'\'', \$this->categories), \$post_id)/wp_add_post_tags(\$post_id, \$this->categories)/' /var/www/html/wp-content/plugins/blogger-importer/blogger-entry.php"
I discovered afterward that in WordPress’ import tools, there’s one called “Categories and Tags Converter”. That’s probably the normal way to fix this issue.
The Blogger XML file contains text only. So what about all the images?
After the import is done, the website shows the pictures but looking into WordPress’ media library, I can see only a tiny subset of the pictures. In fact, it’s only the pictures of the most recent posts. The older ones still have the pictures fetched from Blogger ☹️.
This one seems to be a genuine bug.
On line 118 of the plugin’s code, the process_images method is called, and it imports the pictures for 20 posts.
The way it’s written shows that it was meant to be called in a loop, but somehow this part was missed.
So I opened an issue and submitted a pull request to fix it.
The import process also crashed on me. Some links seem to be invalid and that’s the reason the process crashed.
It’s an old plugin, and the quality is what I remember the average WordPress quality was when I looked at it years ago. In this specific case the code was copied from somewhere else and the person clearly stated that it should not work… and that person was right. Too bad they didn’t think about fixing it.
Hop, another issue opened for that.
Weird line breaks
After importing, the posts had plenty of line breaks in the middle of sentences.
This is because WordPress automatically converts line breaks to real HTML ones (<br/>)… which is a completely stupid thing to do for HTML content.
I solved this by “cleaning” the posts’ content during the import. Basically I minified the HTML, thus removing all unnecessary whitespace and line breaks. While doing so, I took the opportunity to convert HTTP links to HTTPS.
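To illustrate the idea, here is a rough sketch of that cleanup as a shell filter. It is not the code that ran during the import, only an approximation under simple assumptions: clean_html is a hypothetical helper name, the input is a plain HTML fragment, and every http:// URL can safely become https://.

```shell
# clean_html: read HTML on stdin, write a single-line version on stdout.
clean_html() {
  # 1. Turn line breaks and tabs into spaces, so WordPress has nothing
  #    left to convert into <br/>.
  # 2. Collapse runs of spaces and trim both ends.
  # 3. Rewrite http:// links to https://.
  tr '\r\n\t' '   ' |
    sed -E -e 's/ +/ /g' -e 's/^ //' -e 's/ $//' -e 's#http://#https://#g'
}

# Usage:
#   clean_html < post.html > post.min.html
```

Two limits to keep in mind: content inside pre elements loses its line breaks too, and the HTTPS rewrite is blind, so links to sites without HTTPS would break.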
Back up the local WordPress
Back up the database by connecting to the container (if you don’t have MariaDB/MySQL tools installed on your computer) then dump the database:
$ docker exec -it container-db-1 /bin/bash
# mysqldump --add-drop-table -u exampleuser -p exampledb > wordpress.sql
$ docker cp container-db-1:/wordpress.sql .
At the moment I’m only interested in the uploaded files, located in WordPress’ wp-content/uploads directory.
On my system, Docker volumes are in /var/lib/docker/volumes.
Let’s create an archive file from the uploads directory:
$ sudo tar -cvzf wordpress.tar.gz /var/lib/docker/volumes/container_wordpress/_data/wp-content/uploads
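Note that with an absolute path, tar keeps the whole var/lib/docker/volumes/… hierarchy inside the archive. If you prefer an archive that only contains uploads/, a small sketch using tar’s -C option could look like this (archive_uploads is a hypothetical helper name):

```shell
# archive_uploads WP_CONTENT_DIR ARCHIVE
# Create ARCHIVE with paths relative to WP_CONTENT_DIR, so that it only
# contains the uploads/ directory.
archive_uploads() {
  tar -czf "$2" -C "$1" uploads
}

# With the volume path used above (needs root):
#   sudo tar -czf wordpress.tar.gz \
#     -C /var/lib/docker/volumes/container_wordpress/_data/wp-content uploads
```

The archive can then be extracted directly into the wp-content directory of the target instance.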
Once the database is restored in a new environment, you may need to update URLs in the database to make it work:
UPDATE wp_options SET option_value = 'https://blog.example.net' WHERE option_name = 'siteurl';
UPDATE wp_options SET option_value = 'https://blog.example.net' WHERE option_name = 'home';
UPDATE wp_posts SET post_content = REPLACE(post_content, 'http://localhost:8080/', 'https://blog.example.net/');
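A plain REPLACE is fine for post_content, but URLs can also sit in serialized PHP values (some options, post meta), where changing the length of a string corrupts the data. If WP-CLI is available in the target environment (an assumption, it is not part of the setup described here), its search-replace command takes care of that. A minimal sketch, with migrate_urls as a hypothetical helper name:

```shell
# migrate_urls OLD NEW: preview the replacement of OLD by NEW in all tables.
migrate_urls() {
  # --skip-columns=guid leaves the post GUIDs untouched.
  # --dry-run only reports what would change; remove it to apply.
  wp search-replace "$1" "$2" --skip-columns=guid --dry-run
}

# Usage, from the WordPress directory:
#   migrate_urls 'http://localhost:8080' 'https://blog.example.net'
```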
And that’s it for the “interesting” part of the conversion of a Blogger blog to a WordPress one.