"Call for backup" ... with Elasticsearch

Introduction

I bet you have all heard this old adage:

There are two types of people: those who back up, and those who will back up.

This post is dedicated to the second group: those who have just started using Elasticsearch with their production data and have a nagging feeling that something is wrong.

 

Tools

No doubt about it: I'm a big Docker fan. It saves tons of time and is super easy to use. If you haven't used it before, I strongly encourage you to learn it now.

To perform the backup we will use the elasticsearch-dump tool (elasticdump), a community-maintained project, running inside a Docker container. In the example below I'm running Elasticsearch 2.4.

We will create two bash scripts: one for backing up the data and the other, more importantly, for restoring it. Optionally, you can also configure cron to run the backup automatically.

Feel free to adjust the directories and file names to your needs; there are no strict rules here.

Backup

Create a file called perform_backup.sh with the following content:

Make sure to provide a valid path to the ES server and replace <INDEX> with the name of the index you want to back up.
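A minimal sketch of what perform_backup.sh might look like. The Docker image name, the default server URL, the backup directory, the timestamped file name and the `--network host` flag are all assumptions made for illustration; only the `--input`, `--output` and `--type` options come from elasticdump itself.

```shell
#!/usr/bin/env bash
# perform_backup.sh - sketch: dump one index into a timestamped JSON file.
set -eu

# Assumed defaults; override via environment variables or edit in place.
ES_URL="${ES_URL:-http://localhost:9200}"               # address of the ES server
INDEX="${INDEX:-<INDEX>}"                               # index to back up
BACKUP_DIR="${BACKUP_DIR:-${HOME:-$(pwd)}/es_backups}"  # host directory for dumps
IMAGE="${IMAGE:-elasticdump/elasticsearch-dump}"        # assumed image name

perform_backup() {
  backup_file="${INDEX}_$(date +%Y-%m-%d_%H-%M-%S).json"
  mkdir -p "$BACKUP_DIR"
  # Mount the backup directory into the container and dump the documents.
  # --network host lets "localhost" reach an ES running on the Docker host
  # (Linux); it can be dropped when ES_URL points to a remote machine.
  docker run --rm --network host -v "$BACKUP_DIR:/backup" "$IMAGE" \
    --input="$ES_URL/$INDEX" \
    --output="/backup/$backup_file" \
    --type=data
  echo "Backup written to $BACKUP_DIR/$backup_file"
}

# Refuse to run until the placeholder has been replaced.
if [ "$INDEX" = "<INDEX>" ]; then
  echo "Replace <INDEX> in the script (or export INDEX) before running." >&2
else
  perform_backup
fi
```

Make the file executable with `chmod +x perform_backup.sh`. This sketch copies documents only; elasticdump can also export mappings with `--type=mapping` in a separate run if you need them.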

That's it.

Restore

Create a file revert_backup.sh and paste the code below:

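As in the previous example, make sure that the --output argument points to the destination you want, which this time is the Elasticsearch server and index you are restoring into.

Also note that you need to pass the backup file as an argument for the script to run correctly, i.e. call ./revert_backup.sh followed by the path to the backup file.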
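A minimal sketch of what revert_backup.sh might look like, under the same assumptions as before: the image name, the default URL and the `--network host` flag are illustrative choices, and the dump is assumed to have been created with `--type=data`.

```shell
#!/usr/bin/env bash
# revert_backup.sh - sketch: load a dump file back into an index.
set -eu

# Assumed defaults; override via environment variables or edit in place.
ES_URL="${ES_URL:-http://localhost:9200}"         # address of the ES server
INDEX="${INDEX:-<INDEX>}"                         # index to restore into
IMAGE="${IMAGE:-elasticdump/elasticsearch-dump}"  # assumed image name

revert_backup() {
  backup_path="$1"
  if [ ! -f "$backup_path" ]; then
    echo "Backup file not found: $backup_path" >&2
    return 1
  fi
  # Docker needs an absolute host path for the volume mount.
  backup_dir="$(cd "$(dirname "$backup_path")" && pwd)"
  backup_file="$(basename "$backup_path")"
  # The dump is mounted read-only and pushed into the target index.
  docker run --rm --network host -v "$backup_dir:/backup:ro" "$IMAGE" \
    --input="/backup/$backup_file" \
    --output="$ES_URL/$INDEX" \
    --type=data
}

# Run only when a backup file was given and the placeholder was replaced.
if [ "$#" -eq 1 ] && [ "$INDEX" != "<INDEX>" ]; then
  revert_backup "$1"
else
  echo "Usage: $0 <backup-file>  (set INDEX in the script or environment first)" >&2
fi
```

Restoring into an index that already holds data will add to or overwrite documents rather than replace the index, so it is worth trying the restore against a scratch index first.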

Cron

If you want to become a pro, you can go one step further and schedule automatic backups.

It takes only one command, which adds an entry to crontab that executes the backup script every day at 5 AM.
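A sketch of such a command, wrapped in a small function so it is easy to reuse. The script location (`~/perform_backup.sh`) and the `--install` switch are assumptions made for this example; the schedule `0 5 * * *` means minute 0, hour 5, every day.

```shell
#!/usr/bin/env bash
# Sketch: register the backup script in the current user's crontab.

# Assumed location of the backup script; adjust to where you saved it.
SCRIPT_PATH="${SCRIPT_PATH:-$HOME/perform_backup.sh}"

# minute hour day-of-month month day-of-week command
CRON_ENTRY="0 5 * * * $SCRIPT_PATH"

add_backup_cron() {
  # Print the existing crontab (if any), append our entry, and install the result.
  # Running this twice adds the entry twice, so check with `crontab -l` afterwards.
  (crontab -l 2>/dev/null; echo "$CRON_ENTRY") | crontab -
}

# Install the entry only when explicitly asked to.
if [ "${1:-}" = "--install" ]; then
  add_backup_cron
fi
```

The body of `add_backup_cron` is the one-liner itself and can be pasted into a terminal with the entry written out literally. Cron runs with a minimal environment, so the index name should be set inside the script rather than in your shell profile.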

Summary

Sooner or later everybody will be backing up their data. The feeling that we can deal with unexpected situations is reassuring and worth pursuing.

Remember to test the whole process thoroughly a couple of times to see whether you can fully rely on it. The scripts provided above are only examples, and you should adjust them to your needs.
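One simple way to test a backup is to compare the number of documents in the dump with the number in the index. The sketch below assumes the dump was written with `--type=data` (one JSON document per line) and that the index is not being written to while you check; the script name verify_backup.sh and its variables are hypothetical.

```shell
#!/usr/bin/env bash
# verify_backup.sh - sketch: compare a dump's document count with the index.
set -eu

ES_URL="${ES_URL:-http://localhost:9200}"   # assumed address of the ES server
INDEX="${INDEX:-<INDEX>}"                   # index the dump was taken from

# Documents in the dump: count non-empty lines (one JSON document per line).
dump_doc_count() {
  grep -c . "$1" || true
}

# Documents in the index, read from the "count" field of the _count API response.
index_doc_count() {
  curl -s "$ES_URL/$INDEX/_count" |
    sed -n 's/.*"count"[ ]*:[ ]*\([0-9][0-9]*\).*/\1/p'
}

verify_backup() {
  in_dump="$(dump_doc_count "$1")"
  in_index="$(index_doc_count)"
  if [ "$in_dump" = "$in_index" ]; then
    echo "OK: $in_dump documents in both dump and index"
  else
    echo "MISMATCH: dump has $in_dump, index has ${in_index:-unknown}" >&2
    return 1
  fi
}

# Run only when a dump file was given and the placeholder was replaced.
if [ "$#" -eq 1 ] && [ "$INDEX" != "<INDEX>" ]; then
  verify_backup "$1"
else
  echo "Usage: $0 <backup-file>  (set INDEX in the script or environment first)" >&2
fi
```

Matching counts do not prove that the contents are identical, so a full restore into a scratch index remains the more convincing test.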

Norbert

Let's combine software craftsmanship and data engineering skills to produce some clean and understandable code.