Building BIG WordPress sites

So you need to build a BIG site in WordPress?

We work a lot with smaller WordPress sites handling well under 1000 posts and pages.

Recently, however, we built and populated a site with about 1.3M posts and 33M rows of metadata.

For the most part, WordPress ‘just worked’ as the numbers went up.

However, there were a few interesting technical challenges, and areas where techniques that work on smaller sites started to become problematic at larger scales.

This post is an opportunity to note some of the lessons and techniques we learned along the way.

Meta queries don’t scale well

If you want your WordPress site to scale, it’s important to make good early decisions about how you structure your data.

The wp_postmeta table lies at the heart of a lot of WordPress functionality, and when it gets big it can have a significant impact in a lot of areas.

As the name suggests, wp_postmeta is designed to store metadata about posts, and it is optimised for queries where you know a post ID and want to find its meta values.

It is not optimised for what WordPress developers know as ‘meta queries’. These are the reverse of the scenario above: you have a meta value and want to find the post IDs which match.

Meta queries work absolutely fine on small sites, but they can be incredibly slow when the database gets big, even if you manage to avoid expensive ‘LIKE’ comparisons.

The most scalable way to store relational data like this is with taxonomies, like the built-in WordPress categories and tags. The database tables used by taxonomies are designed for exactly this purpose – finding all posts using a particular taxonomy term.
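As a rough illustration, here is how the two approaches look with WP_Query (the ‘book’ post type and ‘genre’ field/taxonomy are made-up examples):

```php
// Slow at scale: wp_postmeta has no index on meta_value,
// so this query forces a scan of a very large table.
$slow = new WP_Query( array(
    'post_type'  => 'book',
    'meta_query' => array(
        array(
            'key'   => 'genre',
            'value' => 'horror',
        ),
    ),
) );

// Fast at scale: the taxonomy tables are indexed for exactly
// this "term -> posts" lookup.
$fast = new WP_Query( array(
    'post_type' => 'book',
    'tax_query' => array(
        array(
            'taxonomy' => 'genre',
            'field'    => 'slug',
            'terms'    => 'horror',
        ),
    ),
) );
```

Both queries return the same posts; only the underlying SQL (and its cost at scale) differs.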

Obviously this all depends on your data; taxonomies are not so great for values like unique ids from legacy systems, where you would have many taxonomy terms and only one post associated with each term.

In that case you probably need another strategy like a custom table with indexes tailored for your queries.
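For instance, a small lookup table keyed on the legacy ID turns that lookup into a single indexed read (the table and column names here are purely illustrative):

```php
global $wpdb;

// One-off setup: a lookup table with a primary key on the
// legacy ID, so "legacy ID -> post ID" never scans wp_postmeta.
$wpdb->query( "CREATE TABLE IF NOT EXISTS {$wpdb->prefix}legacy_map (
    legacy_id VARCHAR(64) NOT NULL,
    post_id   BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (legacy_id),
    KEY post_id (post_id)
)" );

// Lookup: find the post for a given legacy ID.
$post_id = $wpdb->get_var( $wpdb->prepare(
    "SELECT post_id FROM {$wpdb->prefix}legacy_map WHERE legacy_id = %s",
    $legacy_id
) );
```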

Be careful with custom fields

It’s easy to create custom fields for every bit of metadata you need to store, especially if you’re using the amazing ACF plugin (and you probably should be). When you break all your data up into separate fields, you have all sorts of flexibility around querying and using that data.

However, it comes at a cost; every new post added to your site will also add a lot of rows to the wp_postmeta table discussed above – as many as you have meta fields on the post, and double this if you’re using ACF (it stores another meta field for every one of yours). That means a single post with 30 fields results in 60 new rows in wp_postmeta!

This becomes a problem at large scales for the reasons discussed above – wp_postmeta is used by many different features, plugins and themes, and not always efficiently.

The best way to speed this table up is to keep it as small as you can.

One way to reduce your row count is to consider whether you really need separate fields for every single bit of metadata on your posts.

Storing serialized data in a single meta field does go against the general point of relational databases, and should usually be avoided, but it can provide big performance gains in some scenarios – for example, seldom-used data which, when it is needed, is always needed all at once.
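A sketch of that pattern, assuming a hypothetical ‘import_details’ blob that is only ever read as a whole:

```php
// One meta row instead of one row per value; WordPress
// serializes the array automatically on save.
update_post_meta( $post_id, 'import_details', array(
    'source_file' => 'products-2019.csv',
    'import_date' => '2019-06-01',
    'batch'       => 42,
) );

// Reading it back unserializes the whole array in one go.
$details = get_post_meta( $post_id, 'import_details', true );
```

The trade-off is that you can no longer query on the individual values, which is exactly why this only suits seldom-queried data.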

Use core WordPress fields

When you’re working with your own custom post types, consider whether you can use core WordPress fields in wp_posts for some of your data, rather than adding custom meta fields.

For example, you could associate a single user with your custom post type using post_author, or save a date in post_date. This lets you run really fast queries on wp_posts alone, without joining it to larger tables.
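For instance, a query by author and date touches only wp_posts (the ‘invoice’ post type is a made-up example):

```php
// No meta_query, so no join against wp_postmeta: this stays
// fast however large the meta table grows.
$query = new WP_Query( array(
    'post_type'  => 'invoice',
    'author'     => $user_id,
    'date_query' => array(
        array( 'after' => '2019-01-01' ),
    ),
) );
```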

Limit the use of plugins and fine-tune your themes

This is good advice for WordPress in general, but it’s magnified in large WordPress sites.

A poorly-written plugin or theme can introduce big performance hits with inefficient database queries, and it can take some time to track them down.

Cache smart

Caching is one of the best ways to scale any website. The idea is simple: do ‘expensive’ or slow queries only when necessary, and save the results in a quickly-accessible location for later use.

Pages which are only shown to anonymous or not-logged-in users can be cached as one big chunk of HTML and served up extremely quickly.

More complex sites with user accounts need to be selective with what they cache, because (for example) my ‘recent orders’ page is not going to be the same as yours.

However, it should be entirely possible to cache each user’s list of recent orders and only refresh that specific cache when the user places a new order.
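With the WordPress object cache, that pattern might look something like this (everything except the wp_cache_* calls is hypothetical):

```php
function get_recent_orders( $user_id ) {
    $key    = 'recent_orders_' . $user_id;
    $orders = wp_cache_get( $key, 'orders' );

    if ( false === $orders ) {
        // The expensive query only runs on a cache miss.
        $orders = query_recent_orders_from_db( $user_id ); // hypothetical helper
        wp_cache_set( $key, $orders, 'orders', HOUR_IN_SECONDS );
    }

    return $orders;
}

// When a user places a new order, invalidate just their cache.
function on_new_order( $user_id ) {
    wp_cache_delete( 'recent_orders_' . $user_id, 'orders' );
}
```

Because the cache key includes the user ID, one user placing an order never disturbs anyone else’s cached list.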

In conclusion

This is a huge topic and I’ve only scratched the surface in this post. If you’d like to know more or have a large WordPress project in mind, please don’t hesitate to get in touch with us here at Mogul.