Static Site Search with Gatsby and Algolia

About a month ago, I wrote about developing a static website with server-side JavaScript. As I discovered, there are a lot of advantages to a static site. But one disadvantage of not having a database is that it isn’t obvious how to make the site searchable: with no database to pull from, generating search results can be a challenge.

Rather than pulling search results from a database and generating results pages with server-side scripts, static sites tend to store a search index in a single file or an external database and rely on AJAX to query that index. That’s easier said than done, though. Every time you update the site, you need to rebuild the index (whether it’s a JSON file or a database hosted somewhere else). Then, when your users are actually searching the website, you need to set up the AJAX calls to the index and transform the response into a user interface.
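As a rough illustration of the single-file approach, the sketch below builds a small index at build time (the array that would be serialized to a JSON file) and filters it on the client. The page fields, the helper names `buildIndex` and `search`, and the naive substring matching are all hypothetical. In a browser the index would be loaded with something like `fetch('/search-index.json')` (a made-up path); here it stays in memory so the sketch runs under Node.

```javascript
// Build time: flatten each page into an index entry. The text is lower-cased
// once here so queries don't have to re-process it.
function buildIndex(pages) {
  return pages.map(({ slug, title, body }) => ({
    slug,
    title,
    text: `${title} ${body}`.toLowerCase(),
  }));
}

// Client side: keep entries that contain every whitespace-separated term.
function search(index, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return index
    .filter((entry) => terms.every((term) => entry.text.includes(term)))
    .map(({ slug, title }) => ({ slug, title }));
}

// Hypothetical pages standing in for the site's real content.
const index = buildIndex([
  { slug: '/gatsby-algolia/', title: 'Static Site Search', body: 'Gatsby and Algolia' },
  { slug: '/env-vars/', title: 'Environment Variables', body: 'Netlify and .env files' },
]);

// What would be written to disk at build time.
const serialized = JSON.stringify(index);
console.log(serialized.length > 0, search(index, 'algolia'));
```

Everything a real search engine adds on top (ranking, typo tolerance, highlighting) is missing here, and that is a large part of what a hosted service takes off your hands.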

There are a bunch of ways to do this, but the one I settled on was to use Algolia, a commercial hosted search solution. Basically, Algolia stores all of the information about your site in a database on its servers. When you build your Gatsby site, you submit the searchable information to Algolia in JSON format. Each time a user conducts a search, your site sends the search term to Algolia. Algolia’s speedy servers run the search against your database and return the results in a standard format. Then your website formats and displays the results.
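The last step, formatting and displaying results, is code you write yourself. Algolia returns matching records in a `hits` array, each carrying the record’s `objectID` plus whatever fields you indexed. The sketch below turns such a response into plain list markup. The `title` and `slug` fields, the helper names, and the hand-written response object standing in for a real API call are assumptions for illustration; a React-based Gatsby site would render components rather than strings.

```javascript
// Characters that must be escaped before record text is inserted as HTML.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const escapeHtml = (value) => String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);

// Turn a search response's `hits` array into list markup.
// Assumes each record was indexed with `title` and `slug` fields.
function renderHits(response) {
  if (response.hits.length === 0) {
    return '<p>No results</p>';
  }
  const items = response.hits.map(
    (hit) => `<li><a href="${escapeHtml(hit.slug)}">${escapeHtml(hit.title)}</a></li>`
  );
  return `<ul>${items.join('')}</ul>`;
}

// Hand-written stand-in for what a real Algolia query would return.
const exampleResponse = {
  hits: [{ objectID: 'post-1', title: 'Static Site Search with Gatsby & Algolia', slug: '/static-site-search/' }],
};
console.log(renderHits(exampleResponse));
```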

There’s a Gatsby plugin, gatsby-plugin-algolia, that handles most of that process (pretty much everything except the UI). However, the plugin is still in active development, and there isn’t a ton of documentation on it. There’s a writeup in the Gatsby documentation (which appears to be somewhat out of date), barebones documentation on the plugin’s GitHub page, and a couple of blog posts. Piecing those together, I was able to create a functional and attractive search interface for my website.

API Keys & Environment Variables

A quick note on API keys. Services like Algolia use API keys (typically randomly generated strings of characters unique to a user) to authenticate requests and track usage. That way, only someone who knows my API key can submit a request to Algolia using my search index. You need to use these keys in your application, but you don’t want them stored in your git repo, because you don’t want them publicly available. I use a private repository for this website, so it’s not a huge security risk, but it’s still not good practice to commit API keys to a repository.

Probably the simplest solution is to use environment variables. These are variables you define for your build environment that can be changed independently of the source code. You can read about Gatsby’s implementation of environment variables here. But defining environment variables in the .env file (which should be listed in .gitignore so it stays out of the repo) only gave my local development server access to them; it didn’t help my deploy to Netlify. Fortunately, Netlify has an interface for defining environment variables. As long as you prefix the variables your browser-side code reads with GATSBY_ and use the same variable names in both the Netlify interface and your .env file, your code should run just fine in both development and production.
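A variable that exists in .env but was never added in Netlify’s interface tends to surface as a confusing failure deep in the build, so it can help to check for it up front. The helper below is a hypothetical sketch (the name `requireEnv` and the message wording are made up), not part of Gatsby or Netlify.

```javascript
// Hypothetical helper for gatsby-config.js or gatsby-node.js: fail the build
// early, with a readable message, when a required variable is missing or empty.
function requireEnv(name) {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(
      `Missing environment variable ${name}: add it to .env and to Netlify's environment settings`
    );
  }
  return value;
}

module.exports = { requireEnv };
```

In gatsby-config.js it would be called wherever a key is read, for example `appId: requireEnv('GATSBY_ALGOLIA_APP_ID')`, with whatever variable name you chose.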

Partial Updates

I immediately noticed a few problems. First, sometimes after a build, I’d find that my entire search index on Algolia had been cleared. There would be nothing in it, and no search results would appear. This was annoying and a little concerning, but rebuilding the site (without any code changes) consistently fixed the problem. Second, every time I rebuilt the index, Algolia cleared all of my index configuration settings, including searchable attributes, ranking, and sorting, whether or not it had also cleared the records themselves. Third, since I was rebuilding the entire search index on every update, I was using a lot of Algolia index operations. Algolia bills customers based on “units” that include search queries and index operations, and you get 10,000 per month for free. Ten thousand might seem like a lot, but I was burning through them pretty quickly by rebuilding the search index on every site build.
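To see how quickly full rebuilds add up, here is a back-of-the-envelope calculation. The site size, build frequency, and change rate are made-up numbers, and counting one operation per record written is a simplifying assumption rather than Algolia’s documented billing formula.

```javascript
// Rough monthly estimate of indexing operations. Assumes (hypothetically)
// one operation per record written to the index.
function monthlyIndexOps({ records, builds, changedPerBuild, partial }) {
  const writesPerBuild = partial ? changedPerBuild : records;
  return writesPerBuild * builds;
}

// Made-up site: 300 indexed pages, 40 builds a month, ~2 pages changed per build.
const site = { records: 300, builds: 40, changedPerBuild: 2 };
console.log(monthlyIndexOps({ ...site, partial: false })); // 12000
console.log(monthlyIndexOps({ ...site, partial: true })); // 80
```

With those hypothetical numbers, full rebuilds alone would exceed the 10,000 free units before counting a single search, while sending only changed records would use a tiny fraction of them.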

The Gatsby Algolia plugin provides support for “partial updates”; instead of rebuilding the index from scratch on every build, you only send Algolia the items that have been added, deleted, or changed. To do that, you need to tell the Gatsby plugin which items have changed. You do that by adding a modified field to your search index. This can be either a boolean or a timestamp.
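The idea behind partial updates can be sketched as a diff keyed on `objectID`: records that are new or whose `modified` value changed get re-sent, and records the build no longer produces get deleted. This illustrates the concept under that assumption; it is not gatsby-plugin-algolia’s actual implementation, and `diffRecords` and its Map-of-timestamps input are hypothetical.

```javascript
// existing: Map of objectID -> `modified` value currently stored in the index.
// fresh: records produced by this build, each with objectID and modified.
function diffRecords(existing, fresh) {
  const toSave = [];
  const seen = new Set();
  for (const record of fresh) {
    seen.add(record.objectID);
    // New record, or its modified timestamp changed since the last build.
    if (!existing.has(record.objectID) || existing.get(record.objectID) !== record.modified) {
      toSave.push(record);
    }
  }
  // Records still in the index that this build no longer produces.
  const toDelete = [...existing.keys()].filter((id) => !seen.has(id));
  return { toSave, toDelete };
}

const existing = new Map([
  ['a', '2020-05-01T10:00:00-04:00'],
  ['b', '2020-05-02T10:00:00-04:00'],
  ['c', '2020-05-03T10:00:00-04:00'],
]);
const fresh = [
  { objectID: 'a', modified: '2020-05-01T10:00:00-04:00' }, // unchanged
  { objectID: 'b', modified: '2020-06-01T09:30:00-04:00' }, // edited
  { objectID: 'd', modified: '2020-06-02T08:00:00-04:00' }, // new
];
const { toSave, toDelete } = diffRecords(existing, fresh);
console.log(toSave.map((r) => r.objectID), toDelete); // [ 'b', 'd' ] [ 'c' ]
```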

Adding the modified field would be easy enough if I had a database powering my site; it would be a trivial matter to add a timestamp every time the database is updated. It’s less straightforward with a Gatsby static site. Fortunately, Gatsby exposes the information I needed through its GraphQL queries. There’s a modifiedTime attribute for files, but it can be thrown off if you’re using a git repo (like I am): a fresh clone, like the one a build server works from, gives every file the time of the checkout rather than the time it was last edited. Using this blog post, I figured out how to access the timestamp of the most recent git commit associated with a given file. By modifying my gatsby-node.js file, I told Gatsby to add a field storing that timestamp to each file I wanted to index, where it’s accessible through GraphQL:

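// gatsby-node.js
const { execSync } = require('child_process');

// Add a gitAuthorTime field to every Markdown/MDX node so it can be queried
// through GraphQL (as fields.gitAuthorTime) and used as the "modified" value.
exports.onCreateNode = ({ node, actions }) => {
  if (node.internal.type === 'MarkdownRemark'
      || node.internal.type === 'Mdx') {
    // %aI prints the author date of the file's most recent commit in strict
    // ISO 8601 format. The path is quoted in case it contains spaces; the
    // output is empty for files that haven't been committed yet.
    const gitAuthorTime = execSync(
      `git log -1 --pretty=format:%aI "${node.fileAbsolutePath}"`
    ).toString().trim();
    actions.createNodeField({
      node,
      name: 'gitAuthorTime',
      value: gitAuthorTime
    });
  }
};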

Then it was a simple matter to add the gitAuthorTime field to my search index as the modified field. Once I enabled partial updates in my gatsby-config.js file, they worked perfectly.
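The gatsby-config.js side of this might look roughly like the sketch below: the query pulls the `gitAuthorTime` field that gatsby-node.js creates, and the transformer copies it into `modified`. The `enablePartialUpdates` and `matchFields` option names come from the plugin’s README as of its earlier releases and may differ in the version you install; the query, the `slug` field, and the `partialUpdateOptions` name are hypothetical.

```javascript
// Query exposing the gitAuthorTime field that gatsby-node.js adds to each node.
// `fields.slug` is a hypothetical extra field.
const postQuery = `{
  posts: allMarkdownRemark {
    nodes {
      id
      frontmatter { title }
      fields { slug gitAuthorTime }
    }
  }
}`;

// Copy the git commit time into `modified`, the field compared against
// what is already in the index to decide which records to re-send.
const postsToRecords = ({ data }) =>
  data.posts.nodes.map((node) => ({
    objectID: node.id,
    title: node.frontmatter.title,
    slug: node.fields.slug,
    modified: node.fields.gitAuthorTime,
  }));

// Options to merge into the plugin's `options` object in gatsby-config.js.
const partialUpdateOptions = {
  queries: [{ query: postQuery, transformer: postsToRecords }],
  enablePartialUpdates: true,
  matchFields: ['slug', 'modified'],
};

module.exports = { partialUpdateOptions };
```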

Conclusion

Overall I’m very happy with my Algolia-powered search, and I plan to continue using it for the foreseeable future. If you want to test it out, click or tap on the search box in this site’s navigation menu. But there are some drawbacks to using Algolia, so I’ll keep an eye out for other solutions I might like better. Here’s my pro and con list for Algolia.

Advantages

  • Speed: it’s really fast
  • Instant search: I don’t have to wait for the user to submit a query to display results; results show up as you type
  • Flexibility: I can customize exactly what is in my search indices, use different indices for production and development, configure ranking and sorting, and build the search input and results interfaces myself
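Instant search means a query can go out on every keystroke, and each query counts toward the same unit quota as index operations. One common way to trim that is to debounce the input handler so a query is sent only after the user pauses typing. This is a generic sketch: `runSearch` is a hypothetical stand-in for the call that queries Algolia and renders the results, and the 200 ms delay is an arbitrary choice.

```javascript
// Run `fn` only after `waitMs` milliseconds pass with no new calls;
// each call cancels the previously scheduled one.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical wiring: in the browser, onInput would be attached to the
// search box's input event and runSearch would query Algolia.
const runSearch = (query) => console.log(`searching for "${query}"`);
const onInput = debounce(runSearch, 200);

// Three quick keystrokes result in a single search for "gat".
onInput('g');
onInput('ga');
onInput('gat');
```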

Disadvantages

  • Commercial: Algolia isn’t open source, and there are limitations on the free tier
  • Third-party: I like my sites to be self-contained for cosmetic reasons, but there’s also the issue that if anything happens to Algolia, or if they stop offering a free tier, I’ll have to find another search solution, and I may have to build it from scratch
  • Requires JavaScript: it doesn’t work for users who have disabled JavaScript, and there’s no fallback