In this post I’ll discuss moving the content and files from my wife’s WordPress site to the Strapi system I set up in the previous post. I threw this together mostly in one afternoon, so it’s very crude and very much tailored to her site and content. You might be able to reuse parts of it. Feel free; it’s on GitHub.
I didn’t have time to write this post last week when I wrote the migration scripts, and at that point I had only tested them locally. In the last hour I ran the scripts against her latest content and uploaded the result to her production Strapi system. It generally went OK, with a few glitches that I’ll discuss shortly. The content we decided to migrate is now on Strapi, so she’ll have to post new content in both environments for a while, until we finish the revamped, statically generated site design.
Steps involved in the migration
The plan was to write a few scripts that export the data from WordPress and import that same data into Strapi, nothing else. Having done a fair bit of WordPress over the years, I know how varied these environments can get, so I only wrote the scripts with her website in mind. WordPress websites range from simple blogging systems to complex and elaborate ecosystems, with hundreds of custom post types and custom fields, and thousands of records secured by clever roles & permissions settings. I won’t even get into the mess shortcodes create in the content; luckily, there isn’t much of that in her blog. Her blog has 200+ posts and 600+ files. The dozen or so pages will need to be reconstructed entirely on the new site to reflect the new motif, as they are mostly a decade old and don’t really apply anymore.
The real goal behind the move is what’s important, I think, as moving data and content into a headless CMS represents both a promise and a contract with oneself. Turning content into data, and treating it as such, gives it a portable nature that is decoupled from how and where it is stored. The move is a promise that the content can be leveraged across current and future systems as they evolve and become available. It’s important to keep that promise in mind, so that we don’t fall into the trap, now or later, of using features of a system like Strapi that would alter the content for the system’s own purposes rather than preserve the content in its most “timeless” and “unblemished” format.
With this in mind, I decided to tackle the project as follows:
- Download the WordPress xml export file to the file system
- Convert the WordPress xml export file to a more usable json format, and split the json into a set of files, primarily by post type
- Download all the images and files from the site into a matching directory folder structure, also maintaining the “relationship” information between those images/files and the posts they are referenced in
- Convert all the WordPress encoded html content into markdown
- Generate the taxonomies and tags from WordPress in Strapi
- Generate the WordPress posts in Strapi
- Upload all the images/files to Strapi
- Modify the content in Strapi to reference images/urls of embedded content from Strapi, instead of WordPress (by finding urls and making the appropriate substitutions). On my local system, the image references would be pulled from the local Strapi uploads directory, and on the production site, they would reflect their URLs inside of AWS S3.
Exporting the content from WordPress
The first thing I did was export the content in xml format, which is already a feature built into WordPress. That’s just a couple of clicks.
Once exported, it’s just a matter of parsing it, reformatting it, and downloading files from WordPress. This script is available here.
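The "reformatting" part, splitting the parsed export into one file per post type, can be sketched roughly as follows. This is a minimal illustration and not the actual script: it assumes each parsed item carries a `wp:post_type` key (as a parsed WordPress export item would), and the function names and the `<type>/<type>_collection.json` layout are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Group parsed export items by their post type.
// Assumption: each item has a "wp:post_type" key from the parsed export.
const groupByPostType = (items) =>
  items.reduce((groups, item) => {
    const type = item["wp:post_type"] || "unknown";
    (groups[type] = groups[type] || []).push(item);
    return groups;
  }, {});

// Write one JSON file per post type and return the list of written paths.
// The directory and file naming is hypothetical.
const writeCollections = (groups, outDir) =>
  Object.entries(groups).map(([type, items]) => {
    const dir = path.join(outDir, type);
    fs.mkdirSync(dir, { recursive: true });
    const file = path.join(dir, `${type}_collection.json`);
    fs.writeFileSync(file, JSON.stringify(items, null, 2), "utf8");
    return file;
  });

// Usage with made-up items, written to a temporary directory
const sampleItems = [
  { title: "Hello", "wp:post_type": "post" },
  { title: "About", "wp:post_type": "page" },
  { title: "World", "wp:post_type": "post" },
];
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "wp-export-"));
const writtenFiles = writeCollections(groupByPostType(sampleItems), outDir);
console.log(writtenFiles);
```

Keeping each post type in its own file means the import side can load only what it needs for a given step.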
- Converting the xml to json is made easy with the fast-xml-parser npm module.
const parser = require("fast-xml-parser");
const path = require("path");
const fs = require("fs");
const _get = require("lodash.get");
- Locating the html urls inside the free-text content can be done with the html-urls npm module
const htmlUrls = require("html-urls");
- Downloading files was really simple with the image-downloader npm module
const download = require("image-downloader");
- Once files were downloaded, determining whether a file was an image or not could be done with the is-image-url npm module
const isImageUrl = require("is-image-url");
- Parsing dates is a job that calls for the moment library
const moment = require("moment");
- Turning html into markdown is a job for turndown
const TurndownService = require("turndown");
The rest is all primarily just nodejs and javascript scripting.
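To show how the "relationship" between files and posts might be recorded, here is a rough sketch that uses only Node built-ins. It is not the script from the repository: a regular expression stands in for html-urls, a file-extension check stands in for is-image-url, and the manifest entry shape (`postId`, `url`, `isImage`, `localPath`) is an assumption made for illustration.

```javascript
const path = require("path");

// Pull href/src attribute values out of an HTML fragment.
// A regex is a crude stand-in for a real parser and will miss edge cases.
const extractUrls = (html) => {
  const urls = new Set();
  const attr = /(?:href|src)\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = attr.exec(html)) !== null) urls.add(match[1]);
  return [...urls];
};

// True only for absolute URLs hosted on the site being migrated.
const isOnSite = (url, siteOrigin) => {
  try {
    return new URL(url).origin === siteOrigin;
  } catch {
    return false; // relative or malformed URL
  }
};

// Extension check standing in for the is-image-url module.
const IMAGE_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"]);
const looksLikeImage = (url) =>
  IMAGE_EXTENSIONS.has(path.extname(new URL(url).pathname).toLowerCase());

// One manifest entry per (post, file) pair.
const buildManifest = (posts, siteOrigin) =>
  posts.flatMap((post) =>
    extractUrls(post.content)
      .filter((url) => isOnSite(url, siteOrigin))
      .map((url) => ({
        postId: post.id,
        url,
        isImage: looksLikeImage(url),
        // The local path mirrors the URL path so the folder structure matches the site.
        localPath: path.posix.join("files", new URL(url).pathname),
      }))
  );

// Usage with a made-up post
const samplePosts = [
  {
    id: 7,
    content:
      '<p><img src="https://example.com/wp-content/uploads/2019/01/cat.jpg"></p>' +
      '<a href="https://example.com/wp-content/uploads/2019/01/menu.pdf">Menu</a>' +
      '<a href="https://other.example.org/page">Elsewhere</a>',
  },
];
const manifest = buildManifest(samplePosts, "https://example.com");
console.log(manifest);
```

Links to other sites are filtered out here, since only files hosted on the WordPress site need to be downloaded.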
Importing content into Strapi
Strapi doesn’t let you manage content types using an API, at least as of right now. So, I had to manually create the content types before doing the import. Then, there’s a bit of hard-coding to do in the import to map the exported content to the Strapi types and fields. I’m not sure there’s any good way around that, and I do see that as a major deficiency in Strapi. To be honest, having to maintain and develop locally with Strapi is a big deficiency, but I’m sure over time, they will address some of this, so I’ll give them the benefit of the doubt right now and see how things evolve over the next few months.
That said, the Strapi API is really easy to work with. I set up the import using the REST API; it’s simple and works as expected.
Reading the posts from the file system is as simple as:
const wpPosts = JSON.parse(
  fs.readFileSync("./wp-export/posts/post_collection.json", "utf8")
);
And posting to Strapi is just as simple:
const axios = require("axios");
const { data } = await axios.post(url, obj);
Before we do anything with Strapi, we have to authenticate and set the JWT token into the axios http header. We also get the list of users and get a default user that we can map items to if there isn’t a Strapi user that maps to the matching WordPress username.
let users = [];
let defaultUser = null;
let _axios = null;
const authenticate = async () => {
  try {
    // Log in and pull the JWT out of the response
    const { data } = await axios.post(strapiUrl + "/admin/auth/local", {
      identifier: process.env.STRAPI_USERNAME,
      password: process.env.STRAPI_PASSWORD,
    });
    const { jwt } = data;
    // Authenticated client used for every subsequent request
    _axios = axios.create({
      baseURL: strapiUrl,
      timeout: 1000,
      headers: { Authorization: "Bearer " + jwt },
    });
    // _limit=-1 requests the full user list; _limit=1 would return a single user
    users = (await _axios.get("/users?_limit=-1")).data;
    // Optional fallback user, looked up by the username set in the environment
    defaultUser =
      process.env.STRAPI_POSTS_DEFAULTUSER &&
      process.env.STRAPI_POSTS_DEFAULTUSER.length > 0
        ? users.find((u) => u.username === process.env.STRAPI_POSTS_DEFAULTUSER)
        : null;
    console.log("Authenticated");
  } catch (e) {
    console.error(e);
    throw e;
  }
};
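The user mapping described above, matching a WordPress username to a Strapi user and falling back to the default, could look something like this. It is a small sketch under assumptions: `resolveUser` is a hypothetical helper name, and users are assumed to be plain objects with a `username` field.

```javascript
// Pick the Strapi user for a WordPress author, falling back to a default user.
// Returns null when there is neither a match nor a default.
const resolveUser = (wpUsername, users, defaultUser) =>
  users.find((u) => u.username === wpUsername) || defaultUser || null;

// Usage with made-up users
const strapiUsers = [
  { id: 1, username: "admin" },
  { id: 2, username: "jane" },
];
const fallbackUser = strapiUsers[0];

console.log(resolveUser("jane", strapiUsers, fallbackUser).id); // 2
console.log(resolveUser("guest-author", strapiUsers, fallbackUser).id); // 1
```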
I had a few issues with uploading files using axios, and ended up just using the needle npm module. This is what my upload method looks like:
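const needle = require("needle");
// Assumption: mime is the mime-types package, which provides contentType()
const mime = require("mime-types");
const _upload = async (file, name, caption, alternativeText) => {
  try {
    const data = {
      // Metadata stored alongside the file in Strapi
      fileInfo: JSON.stringify({
        alternativeText,
        caption,
        name,
      }),
      // needle reads the file from disk and sends it as a multipart field
      files: { file, content_type: mime.contentType(file) },
    };
    const { body } = await needle("post", strapiUrl + "/upload", data, {
      multipart: true,
      headers: {
        // Reuse the JWT header set up during authentication
        authorization: _axios.defaults.headers.Authorization,
      },
    });
    // The response may be an array of files; return the first one
    return Array.isArray(body) ? (body.length > 0 ? body[0] : null) : body;
  } catch (e) {
    console.error(`File upload error: ${e.message}`);
    return null;
  }
};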
The import is set up so that I can run it as many times as I want without causing any issues, and it doesn’t repeat the same task twice. If a blog post has already been created, it doesn’t try to re-create it, and the same goes for all the other types of content, including files.
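One way to get that "safe to re-run" behaviour is to look an entry up before creating it. The sketch below is an illustration rather than the actual import code: `createOnce` is a hypothetical helper, it assumes the collection endpoint can be filtered by a `slug` query parameter, and it uses an in-memory stand-in for the axios client so it runs without a Strapi server.

```javascript
// Create an entry only if no entry with the same slug exists yet.
// `client` is any axios-like object exposing get(url, config) and post(url, body).
// Assumption: the endpoint accepts a `slug` query filter.
const createOnce = async (client, endpoint, entry) => {
  const { data: existing } = await client.get(endpoint, {
    params: { slug: entry.slug },
  });
  if (Array.isArray(existing) && existing.length > 0) {
    return { created: false, entry: existing[0] };
  }
  const { data } = await client.post(endpoint, entry);
  return { created: true, entry: data };
};

// In-memory stand-in for the HTTP client (hypothetical, for demonstration only).
const makeFakeClient = () => {
  const store = [];
  return {
    get: async (_url, { params }) => ({
      data: store.filter((e) => e.slug === params.slug),
    }),
    post: async (_url, body) => {
      const saved = { id: store.length + 1, ...body };
      store.push(saved);
      return { data: saved };
    },
  };
};

// Running the same create twice only creates the entry once.
const demo = async () => {
  const client = makeFakeClient();
  const post = { slug: "hello-world", title: "Hello World" };
  const first = await createOnce(client, "/posts", post);
  const second = await createOnce(client, "/posts", post);
  return [first.created, second.created];
};
demo().then((result) => console.log(result)); // [ true, false ]
```

The same check-then-create pattern applies to tags, comments, and files, as long as each has a stable key to look up.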
To create slugs for content items, I used a function I found that does the job nicely here.
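For readers who just want the idea, a slug helper can be as small as the sketch below. This is not the function linked above, only a minimal stand-in that lowercases the text, strips accents, and joins the words with hyphens.

```javascript
// Minimal slug helper (illustrative; not the function referenced in the post).
const slugify = (text) =>
  String(text)
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens

console.log(slugify("Crème Brûlée, Revisited!")); // creme-brulee-revisited
```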
The last thing that took a little time to figure out was how to replace all the URLs inside the content with the new URLs of the files hosted over on AWS S3. During the export, I create a manifest.json file listing all the files. At import time, I reconcile that manifest with the files uploaded to Strapi and build a dictionary mapping each “from” URL to its “to” URL. I also added a urls property to the posts during the export that contains all the URLs inside the post content, including feature images in WordPress. I can then find the corresponding Strapi URL with a simple key lookup in that dictionary. This avoids iterating over all possible options on every post, and reduces lookups to only the known elements that need to be replaced. Of course, links to external content stay the same.
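The dictionary lookup can be sketched as follows. The shapes are assumptions made for illustration: manifest entries are taken to be `{ url, fileName }`, uploaded files `{ name, url }`, and posts `{ content, urls }`; the real field names in the scripts may differ.

```javascript
// Build a lookup from the original WordPress URL to the new Strapi/S3 URL.
// Assumed shapes: manifest entries { url, fileName }, uploaded files { name, url }.
const buildUrlMap = (manifest, uploadedFiles) => {
  const byName = new Map(uploadedFiles.map((f) => [f.name, f.url]));
  const map = {};
  for (const { url, fileName } of manifest) {
    if (byName.has(fileName)) map[url] = byName.get(fileName);
  }
  return map;
};

// Replace only the URLs recorded on the post; URLs with no mapping are left alone.
// Longer URLs go first so a URL that is a prefix of another cannot clobber it.
const rewriteContent = (post, urlMap) =>
  [...(post.urls || [])]
    .sort((a, b) => b.length - a.length)
    .reduce(
      (content, from) =>
        urlMap[from] ? content.split(from).join(urlMap[from]) : content,
      post.content
    );

// Usage with made-up data
const manifestSample = [
  { url: "https://example.com/wp-content/uploads/cat.jpg", fileName: "cat.jpg" },
];
const uploadedSample = [
  { name: "cat.jpg", url: "https://bucket.s3.amazonaws.com/cat_abc123.jpg" },
];
const postSample = {
  content:
    "![cat](https://example.com/wp-content/uploads/cat.jpg) and [a link](https://other.example.org/)",
  urls: [
    "https://example.com/wp-content/uploads/cat.jpg",
    "https://other.example.org/",
  ],
};
const rewritten = rewriteContent(
  postSample,
  buildUrlMap(manifestSample, uploadedSample)
);
console.log(rewritten);
```

Because the external link has no entry in the map, it passes through unchanged.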
That’s all there is to it. I had a few bugs to fix, and I ran into a few comments that wouldn’t upload because of weird unicode characters, from what I could tell, so I just added those comments manually (only 5 out of a couple hundred didn’t process). 3 of the 600 files didn’t upload the first time around, but they uploaded on the 2nd run.
Migrating content was a very dev-heavy exercise here, and I’m not sure how someone without coding or scripting experience would do this. It does raise the question of whether it’s worth it at this time for a casual blogger with a heavy prior investment in WordPress to take the plunge into the headless CMS ecosystem. I’m sure that over time tools will appear and improve, and there will be a visual drag & drop mapping tool to move data, files, and images, and to map post types to content types, fields, etc. For a dev, it’s obviously not too difficult, but it’s still a very custom exercise on a per-site basis.
I can start dreaming up what the new site should look like now. Any design ideas? Send them my way.
Enjoyed following along and could definitely use bits of it for an upcoming project of mine! Thanks for sharing.
Curious – did you consider / why didn’t you use WordPress’ rest api as the backend? Was my first option/consideration for said project.
Cheers!
I apologize for taking so much time to comment back as well. I’ve played around with using WordPress as a backend, but it takes a half-dozen plugins and some configuration to get it to work, and doesn’t feel quite so turnkey. That said, there are lots of complementary plugins in WordPress that could add value (image manipulation, CDN/S3 integrations, etc.), so I think it’s a personal choice. After Strapi commercialized, I took a step back, and I’m evaluating a range of headless CMSs right now, even considering writing my own. Strapi is solid, but I’m not a fan of having to develop the schema as a dev task and then deploy the schema to production separately. For small sites, I’d like a fully integrated environment that lets me do everything in one place in the browser on the headless side of things, which is how other headless options work. I think Strapi is only going to improve, but this one thing is a show-stopper for me personally.