"Call for backup" ... with Elasticsearch

Introduction

I bet you have all heard this old adage:

There are two types of people - those who back up, and those who will back up.

This post is dedicated to the second group: those who have just started using Elasticsearch with their production data and have a nagging feeling that something is wrong.

 

Tools

No doubt - I'm a big Docker fan. It saves tons of time and is super easy to use. If you haven't used it before, I strongly encourage you to learn it now.

To perform the backup we will use the elasticsearch-dump tool inside a Docker container. The examples below were run against Elasticsearch 2.4.

We will create two bash scripts: one for backing up the data and another, more importantly, for restoring it. Optionally, you can also configure cron to run the backup for you automatically.

Feel free to adjust the directories and file names to suit your needs - there are no strict rules here.

Backup

Create a file called perform_backup.sh with the following content:

Make sure to provide a valid address for the ES server and replace <INDEX> with the name of the index you want to back up.
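As a rough illustration of the kind of command such a script wraps, here is a Python sketch that assembles a docker run call for elasticsearch-dump. It is not the script from this post: the image name, the /backup mount point, the server URL and the file names are all assumptions made for the example, so pin an image tag that matches your Elasticsearch version. The --input, --output and --type flags are the ones elasticsearch-dump itself uses.

```python
import shlex


def build_backup_command(es_url, index, backup_dir, filename,
                         image="elasticdump/elasticsearch-dump"):
    """Assemble a `docker run` command that dumps one index to a JSON file.

    The image name and the /backup mount point are assumptions.
    """
    return [
        "docker", "run", "--rm",
        # Mount the host backup directory into the container.
        "-v", "%s:/backup" % backup_dir,
        image,
        # Source: the index on the ES server.
        "--input=%s/%s" % (es_url.rstrip("/"), index),
        # Destination: a file inside the mounted directory.
        "--output=/backup/%s" % filename,
        "--type=data",
    ]


# Hypothetical values; replace them with your own server, index and paths.
command = build_backup_command(
    "http://localhost:9200", "<INDEX>", "/var/backups/es", "backup.json")
print(" ".join(shlex.quote(part) for part in command))
```

The function only builds and prints the command, so you can inspect it before running anything against production data.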

That's it.

Restore

Create a file revert_backup.sh and paste the code below:

As in the previous example, make sure that the --output argument points to the destination you intend.

Also note that the script needs the backup file passed as its argument in order to run correctly.
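Restoring is the same call with the direction reversed: the dump file becomes the input and the ES index becomes the output. The Python sketch below shows that swap together with the argument check. As before, the image name, mount point, URL and file names are assumptions for illustration, not the contents of revert_backup.sh.

```python
import os


def build_restore_command(backup_file, es_url, index,
                          image="elasticdump/elasticsearch-dump"):
    """Assemble a `docker run` command that loads a dump file into an index.

    The image name and the /backup mount point are assumptions.
    """
    directory, filename = os.path.split(os.path.abspath(backup_file))
    return [
        "docker", "run", "--rm",
        # Mount the directory holding the dump file into the container.
        "-v", "%s:/backup" % directory,
        image,
        "--input=/backup/%s" % filename,
        "--output=%s/%s" % (es_url.rstrip("/"), index),
        "--type=data",
    ]


def main(argv):
    """Require exactly one argument: the backup file to restore."""
    if len(argv) != 2:
        print("usage: %s <backup file>" % argv[0])
        return 1
    command = build_restore_command(argv[1], "http://localhost:9200", "<INDEX>")
    print(" ".join(command))
    return 0


# Hypothetical invocation with a made-up backup file path.
main(["revert_backup.sh", "/var/backups/es/backup.json"])
```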

Cron

If you want to become a pro you can even go one step further - schedule automatic backups.

It takes only one command, which adds an entry to crontab that executes the backup script every day at 5 AM.
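For reference, a crontab entry that fires every day at 5 AM has 0 in the minute field, 5 in the hour field and wildcards for the rest. The small Python sketch below builds such a line. The script path is a made-up example, and the function name is mine.

```python
def daily_cron_entry(script_path, hour=5, minute=0):
    """Return a crontab line that runs `script_path` once a day."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minute, hour, day of month, month, day of week, command.
    return "%d %d * * * %s" % (minute, hour, script_path)


# Hypothetical location of the backup script.
print(daily_cron_entry("/opt/scripts/perform_backup.sh"))
# prints: 0 5 * * * /opt/scripts/perform_backup.sh
```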

Summary

Sooner or later everybody will be backing up their data. The feeling that we can deal with unexpected situations is reassuring and worth pursuing.

Remember to test the whole process thoroughly a couple of times and see if you can fully rely on it. The scripts provided above are only examples, and you should adjust them to suit your needs.

Boolean Multiplexer in Practice

Introduction

There are two popular types of problems for evaluating learning classifier systems:

  • single-step - like "question-answer" systems,
  • multi-step - problems where multiple consecutive steps are needed to reach a solution. The most popular are various kinds of mazes (often referred to in the literature as MAZE or WOODS environments).

This article focuses on a method for testing single-step systems, where the environment has the Markov property (each state is independent of its predecessor).

A method known as the boolean multiplexer function is described first, followed by some examples and a simple Python implementation.

Enter ...

Multiplexer

First, let's gain some intuition about the idea of a multiplexer:

Multiplexing is the generic term used to describe the operation of sending one or more analog or digital signals over a common transmission line at different times or speeds. [source]

In the following scheme, an example of a 4-1 multiplexer with 4 inputs, 2 control signals, and 1 output is presented. The output Q can be any one of the input signals A, B, C or D, depending on the values of the control signals a and b.

There are of course many different configuration options available but this knowledge should be sufficient for now.

Boolean multiplexer function

A boolean multiplexer is the case where each signal is represented in binary, using either 0 or 1.

By convention, the incoming signal consists of two concatenated parts: control bits and data bits.

In the example above we are dealing with a 6-bit boolean multiplexer. The first 2 bits are capable of addressing the 4 data bits (2^2 = 4) that follow them.

The output is the data bit at the location obtained by converting the control bits to decimal (in this case bin(01) = dec(1)). Data bit indexing starts from zero.
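The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from this post. The function name and the sample input are my own: the control bits 01 match the case discussed above, while the data bits 0100 are made up.

```python
def multiplexer(signal, control_bits):
    """Return the data bit selected by the leading control bits.

    `signal` is a string of '0'/'1' characters: `control_bits` address
    bits followed by 2 ** control_bits data bits.
    """
    expected = control_bits + 2 ** control_bits
    if len(signal) != expected:
        raise ValueError("expected %d bits, got %d" % (expected, len(signal)))

    address = int(signal[:control_bits], 2)  # binary address -> decimal index
    data = signal[control_bits:]             # data bits, indexed from zero
    return int(data[address])


# Control bits "01" select data bit number 1 of the (hypothetical) data "0100".
print(multiplexer("010100", 2))
# prints: 1
```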

Examples

Below you will find three examples of multiplexer functions.

3-bit

Control bits: 1, Data bits: 2

The output is the 0-th data bit.

6-bit

Control bits: 2, Data bits: 4

The output is the 3-rd data bit.

11-bit

Control bits: 3, Data bits: 8

The output is the 5-th data bit.
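The three sizes used above are not arbitrary: k control bits can address 2^k data bits, so the whole signal is k + 2^k bits long. A quick Python check (the helper name is my own):

```python
def multiplexer_length(control_bits):
    """Total signal length: the control bits plus the data bits they address."""
    return control_bits + 2 ** control_bits


for k in (1, 2, 3):
    print("%d control + %d data = %d bits" % (k, 2 ** k, multiplexer_length(k)))
# prints:
# 1 control + 2 data = 3 bits
# 2 control + 4 data = 6 bits
# 3 control + 8 data = 11 bits
```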

Implementation

The following implementation generates a random binary signal (the user provides the number of control bits) and prints the correct output for that signal.

Note that the bitstring module must be available in your Python environment.

Here is an example of using 2 bits for controlling the signal (6-bit multiplexer):
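If you just want to experiment, the same idea can be sketched with the standard library only. Note that this swaps the bitstring module for plain strings and the random module, so it is an approximation of the approach described here and not the original code. The function names are my own.

```python
import random


def random_signal(control_bits, rng=random):
    """Generate a random multiplexer input as a string of '0'/'1'."""
    length = control_bits + 2 ** control_bits
    return "".join(rng.choice("01") for _ in range(length))


def correct_output(signal, control_bits):
    """Return the data bit addressed by the leading control bits."""
    address = int(signal[:control_bits], 2)
    # Data bits start right after the control bits.
    return int(signal[control_bits + address])


# 2 control bits give a 6-bit multiplexer; the signal differs on every run.
signal = random_signal(2)
print(signal, "->", correct_output(signal, 2))
```

Passing a seeded random.Random instance as rng makes the generated signals reproducible, which is handy when comparing classifier runs.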

 

Integrating Apache Spark 2.0 with PyCharm CE

This post shows how to configure the JetBrains PyCharm CE IDE for developing applications with the Apache Spark 2.0+ framework.

  1. Download an Apache Spark distribution pre-built for Hadoop (link).
  2. Unpack the archive. The resulting directory is referred to below as $SPARK_HOME.
  3. Start PyCharm and create a new project: File → New Project. Call it "spark-demo".
  4. Inside the project, create a new Python file: New → Python File. Call it run.py.
  5. Write a simple script counting the occurrences of A's and B's inside Spark's README.md file. Don't worry about the errors, we will fix them in the next steps.
  6. Add the required libraries. Go to PyCharm → Preferences ... → Project spark-demo → Project Structure → Add Content Root. Select all ZIP files from $SPARK_HOME/python/lib. Apply changes.
  7. Create a new run configuration. Go into Run → Edit Configurations → + → Python. Name it "Run with Spark" and select the previously created file as the script to be executed.
  8. Add environment variables. Inside the created configuration, add the corresponding environment variables. Save all changes.
  9. Run the script - Run → Run 'Run with Spark'. You should see that the script is executed properly within Spark context.
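For step 5, a run.py along the following lines should do. It is a sketch modelled on the Spark quick-start example and makes a few assumptions: that SPARK_HOME is one of the environment variables set in step 8, that README.md sits directly in that directory, and that "counting A's and B's" means counting the lines that contain each letter. The count_lines_containing helper is my own addition and only mirrors, in plain Python, what the Spark filter computes.

```python
import os


def count_lines_containing(lines, letter):
    """Plain-Python equivalent of the Spark filter-and-count used below."""
    return sum(1 for line in lines if letter in line)


def main(spark_home):
    # Imported here so the IDE errors mentioned in step 5 stay in one place.
    from pyspark import SparkContext

    readme = os.path.join(spark_home, "README.md")
    sc = SparkContext("local", "spark-demo")
    log_data = sc.textFile(readme).cache()

    # Count the lines that contain each letter.
    num_as = log_data.filter(lambda line: "a" in line).count()
    num_bs = log_data.filter(lambda line: "b" in line).count()
    print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))

    sc.stop()


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        main(spark_home)
    else:
        print("SPARK_HOME is not set; add it to the run configuration first.")
```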

Now you can take advantage of advanced IDE features such as debugging and code completion.

Happy coding.