I’ve recently been working on the search engine for a large, information-dense WordPress site.
The Hawke’s Bay Knowledge Bank has more than 36,000 documents published and searchable, so there is plenty of text for a search engine to work with.
In this post I want to take a look at some of the search engine options available for large WordPress sites, and go into a little more detail on the solution I arrived at.
Vanilla search
The default WordPress search engine is perfectly fine for small sites (up to a couple of hundred pages), perhaps with a plugin or two to manage which content types and custom fields are indexed.
However, it’s only a very basic keyword search. It does make an attempt to sort by relevance, but it doesn’t offer too many features beyond that unless you build them yourself.
It also directly searches your website database, which is not going to scale with a large site, especially if it’s busy.
Relevanssi – a big step up
For a while we used Relevanssi. This plugin is a full replacement for the default WordPress search, and a major improvement too.
You can get the free version installed and working in a few minutes. Unlike some plugins, the free version has most of the core features you really need (the premium version does things like indexing PDF files, which might be an attractive feature in some cases).
If you’re a developer, it also has good documentation and a bunch of WordPress hooks to let you modify its behaviour and results.
Unfortunately Relevanssi can only take you so far. It stores its own index tables in your website database, so it’s faster than the default search, but its performance will decrease (and its impact on your web server will increase) as your site grows.
Going external
The next stop is an external dedicated search engine.
Options like Google’s Programmable Search Engine are great in some cases, but the Knowledge Bank has mainly used open source software since its inception, so I wanted to stick with an open source search solution too.
The most popular open source search engine options are Apache Solr and ElasticSearch, which are both built on the Apache Lucene search engine framework.
Solr
Having used Apache Solr some years ago when the Knowledge Bank was on a different CMS, my first inclination was to go back to it.
I quickly found the official Solr Power WordPress plugin and (less quickly) got a shiny new Solr instance up and running.
At that point I discovered the plugin requires a version of Solr released during the first Obama administration, five major versions ago.
To be fair it does still work, but I decided to investigate ElasticSearch. If that didn’t pan out, I’d look into possibly forking the Solr plugin to work with a more modern version of the search engine.
ElasticSearch
The leading ElasticSearch plugin for WordPress is ElasticPress.
Getting the ElasticSearch server up and running was extremely simple and quick. The plugin setup went fine too; it was immediately on speaking terms with the ElasticSearch server, and after some clicking around in its settings it indexed all ~36,000 posts in a few minutes.
One thing I immediately noticed was a lack of custom fields among the options for indexing.
I later learned it will automatically index all public meta fields (i.e. all except the ones prefixed with _) but won’t actually search them unless you tell it to with a hook.
Fortunately the plugin offers a number of useful hooks for a developer to control exactly what gets indexed and searched, and what kind of search is performed.
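To make the indexing rule above concrete, here is a minimal sketch of the selection logic, written in Python purely for illustration (the plugin itself is PHP and this is not its code). The function names and the sample meta keys are hypothetical; the only rule taken from the text is that keys prefixed with an underscore are treated as non-public.

```python
from typing import Dict, Optional, Set


def is_public_meta_key(key: str) -> bool:
    """Keys starting with an underscore are treated as protected (not public)."""
    return not key.startswith("_")


def select_meta_for_index(
    meta: Dict[str, str], allowed_keys: Optional[Set[str]] = None
) -> Dict[str, str]:
    """Keep public meta keys, optionally narrowed to an explicit allow-list.

    The allow-list stands in for the kind of developer-controlled filtering
    a hook makes possible; it is an assumption, not the plugin's behaviour.
    """
    selected = {}
    for key, value in meta.items():
        if not is_public_meta_key(key):
            continue  # protected key, never indexed
        if allowed_keys is not None and key not in allowed_keys:
            continue  # public, but not one of the keys we care about
        selected[key] = value
    return selected


# Hypothetical post meta: one protected key and two public ones.
example_meta = {
    "_edit_lock": "1700000000:1",
    "accession_number": "A-1234",
    "gallery_0_caption": "Street scene",
}

indexed_by_default = select_meta_for_index(example_meta)
indexed_with_allow_list = select_meta_for_index(example_meta, {"accession_number"})
```

Without an allow-list every public key passes through, which is how a site with many generated keys can end up with thousands of indexed fields.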
I decided to stay with ElasticSearch, but I did make some significant modifications to the defaults. Here’s what I did, in case it helps another developer:
- Restrict the meta fields being indexed
Due to the way ACF stores its data, the Knowledge Bank has >10,000 unique meta keys, which makes ElasticSearch a little uncomfortable (its default is 1000 max search fields). We really only need to search 20 or 30 unique keys in total.
Hook: ep_search_fields
- Add meta fields to the plugin’s settings page
It only needs a small snippet of code to get our chosen custom fields into the field settings page, which gives us better visibility over what is being indexed and the relative priorities of different fields.
Hook: ep_weighting_fields_for_post_type
- Reformat meta fields before they are indexed
Some of the Knowledge Bank’s fields are a little more complex than a simple text string. Not to worry, we can reformat any field on its way to be indexed.
Hook: ep_prepare_meta_data
- Change which fields are searched by default
No point indexing data if we are not going to search it!
Hook: ep_search_fields
- Completely change the type of ElasticSearch query being performed
Saving the best for last, I changed the fundamental type of ElasticSearch query the plugin performs.
It uses multi-match by default, but I decided to go with the simple query string search instead. This offers some handy search operators and features useful for the typical kind of searching our users do.
Hook: ep_formatted_args – this one is not in the docs at time of writing, which is odd considering it’s probably the most powerful of all the hooks mentioned.
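For readers who have not met the two query types, the sketch below builds both request bodies side by side. It is written in Python for illustration only and is not the plugin's code: the helper names, the field names (including the `meta.<key>.value` path), the boosts and the sample search text are all assumptions, and the real arguments ElasticPress passes through its hook are likely to be more deeply nested than this. The `multi_match` and `simple_query_string` clauses and their `query`, `fields` and `default_operator` parameters are standard ElasticSearch query DSL.

```python
from typing import Dict, List


def boosted_fields(weighted_fields: Dict[str, int]) -> List[str]:
    """Turn {"post_title": 3} into ["post_title^3"], the DSL's boost syntax."""
    return [f"{name}^{boost}" for name, boost in weighted_fields.items()]


def multi_match_query(text: str, weighted_fields: Dict[str, int]) -> dict:
    """Plain keyword matching across several fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(weighted_fields),
            }
        }
    }


def simple_query_string_query(text: str, weighted_fields: Dict[str, int]) -> dict:
    """Same fields, but the search text may contain operators such as
    "quoted phrases", +term (must match), -term (must not match) and term*."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": boosted_fields(weighted_fields),
                "default_operator": "AND",  # assumption: require every term
            }
        }
    }


# Hypothetical weights: title first, then body, then one custom field.
weights = {"post_title": 3, "post_content": 1, "meta.accession_number.value": 2}

before = multi_match_query("art deco napier", weights)
after = simple_query_string_query('"art deco" -napier', weights)
```

The only structural difference is the clause name and the extra `default_operator`, but the second form lets the user's own text carry phrase and exclusion operators.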
In conclusion
ElasticPress is good. You’ll probably need to write a bit of your own code, but if you’re looking at this level of search engine that is probably not a major hurdle.
If you need a more powerful search engine for your WordPress site and need some help making it happen, please get in touch.