Import IntenseDebate in Wordpress

After migrating a blog from blogger to WordPress, the next step was to import the comments. On the original blog, the comments where not inside blogger but using a third party system called IntenseDebate.

On IntenseDebate website, there’s an option to export all the comments for a given blog as an XML file.

The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<output>

<blogpost>
<url>http%3A%2F%2Fblogname.blogspot.com%2F2010%2F01%2Fmy-blog-post-title.html</url>
<title>http://blogname.blogspot.com/2010/01/my-blog-post-title.html</title>
<guid>http://blogname.blogspot.com/2010/01/my-blog-post-title.html</guid>
<comments>
	<comment id='123456789' parentid='0'>
		<isAnon>1</isAnon>
		<name><![CDATA[Commenter&amp;#039;s Name]]></name>
		<email>commenter@example.com</email>
		<url></url>
		<ip>12.34.56.78</ip>
		<text><![CDATA[My blog&#039;s link <a href="http:\/\/example.net\/2010\/01\/yolo.html?utm_source=rss&amp;amp;utm_medium=rss" target="_blank"> with some things $amp; others.  ]]></text>
		<date>2010-01-20 10:11:12</date>
		<gmt>2010-01-20 10:11:12</gmt>
		<score>0</score>
	</comment>
</comments>
</blogpost>
</output>

There used to be WordPress plugins to import those files, but they haven’t been maintained for more than 10 years… Shouldn’t be too difficult to come up with a little script to get the job done.

Looks like some cleanup, unescaping and decoding of content and URL is required. But there are 2 important things to find out:

  • how to match a blogpost node from the XML file with a post in WordPress;
  • how to keep the comments threaded.

The second point is easy, each comment has an id attribute that we will use as identifier in WordPress, and each comment has a parentid attribute pointing to the parent comment.

For the first point, after looking a bit in WordPress database, it appears that while importing the blogger’s XML file, a blogger_permalink entry has been created for each post in wp_postmeta table. That permalink, once prefixed with https:// and domain name, is what we find in the url tag of the blogpost node.

Here is what I used:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
<?php

function getPostId($xml_post, $ps) {
  $url = substr(rawurldecode($xml_post->url), strlen('https://www.blogname.com'));
  mysqli_stmt_bind_param($ps, 's', $url);
  mysqli_stmt_execute($ps);
  mysqli_stmt_bind_result($ps, $post_id);
  mysqli_stmt_fetch($ps);
  return $post_id;
}

function decode($text) {
  $text = stripslashes($text);
  $text = html_entity_decode($text, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
  $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');
  $from = ['&rsquo;'];
  $to = ['’'];
  $text = str_replace($from, $to, $text);
  return trim($text);
}

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$sql_conn = mysqli_connect('localhost', 'username', 'password', 'dbname') or die('Connection failed: ' . mysqli_connect_error());
$xml = simplexml_load_file('intensedebate-backup.xml') or die('Error reading XML file.');

$ps1 = mysqli_prepare($sql_conn, "SELECT post_id FROM wp_postmeta WHERE meta_key='blogger_permalink' AND meta_value=?");
$ps2 = mysqli_prepare($sql_conn, "INSERT INTO wp_comments (comment_ID, comment_post_ID, comment_author, comment_author_email, comment_author_url, comment_author_IP, comment_date, comment_date_gmt, comment_content, comment_parent, user_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)");
foreach($xml->blogpost as $post) {
  $post_id = getPostId($post, $ps1);
  mysqli_stmt_reset($ps1);
  if (!$post_id) {
    print('No post found for ' . $post->title . PHP_EOL);
    continue;
  }
  print('Processing comments for ' . $post->title . PHP_EOL);
  foreach($post->comments->comment as $comment) {
    $id = $comment->attributes()->id;
    $parent_id = $comment->attributes()->parentid;
    $ip = $comment->ip;
    $email = strtolower(trim($comment->email));
    $user_id = $email == 'livgaunt@gmail.com' ? 1 : 0;
    $author = $user_id ? 'Liv' : decode($comment->name);
    $author_url = $user_id ? 'https://www.blogname.com/author/liv' : $comment->url;
    $date = $comment->date;
    $date_gmt = $comment->gmt;
    $content = decode($comment->text);
    print('- ' . join(' | ', [$id, $parent_id, $user_id, $ip, $author, $email, $author_url, $date, $date_gmt]) . PHP_EOL);
    mysqli_stmt_bind_param($ps2, 'iisssssssii', $id, $post_id, $author, $email, $author_url, $ip, $date, $date_gmt, $content, $parent_id, $user_id);
    mysqli_stmt_execute($ps2);
    mysqli_stmt_reset($ps2);
  }
}

$result = mysqli_query($sql_conn, "UPDATE wp_posts p SET comment_count = (SELECT COUNT(*) FROM wp_comments c WHERE c.comment_post_ID = p.ID) WHERE p.post_type = 'post'");
if ($result === FALSE) {
  print('Failed to update comments count on posts.' . PHP_EOL);
}

mysqli_close($sql_conn);

Some explanations:

  • Lines 3 to 10: a function to retrieve WordPress post id from a blogpost URL.
  • Lines 12 to 20: a function to clean content. The content was not consistant, hence requiring different unescaping methods and still some had to be manually done.
  • Line 23: open database connection.
  • Line 26 and 27: create prepared statements for the 2 SQL queries that will be executed several times. The first one is to find the blog post ID, cf. function getPostId. The second one is to insert a comment in the database.
  • Line 28: looping over each blog post from the XML file.
  • Line 29: find the post id.
  • Line 36: looping over each comment of current blog post from the XML file.
  • Lines 37 to 46: extract and parse the different fields needed to create a comment entry.
  • Lines 48 and 49: bind the fields to the prepared statement to insert the comment in the database then execute it.
  • Line 54: update the comments count on each blog post.

This script and the comments XML file are then uploaded to the container running WordPress. The script can be executed with PHP CLI or by calling it through the web interface if exposed (not recommended).

Note: raw HTML is saved in the comment. Make sure it does not contain something you don’t want or could be a security issue.

Comments Add one by sending me an email.