Reproducible infrastructure for data scientists

You should always care about the reproducibility of your data analysis. Many related resources focus on how the data is processed; this article is about maintaining a unified data science technology stack.

An original version of this text was published on the Kaggle blog. The main idea is to use Docker containers as a central place for storing all the necessary libraries and tools. Such a container can later be exported and imported across different environments, making sure that everything works exactly the same.

These are alternative commands that can be used to spin up the container.

Mind the differences from the original article:

  • adjusted to work on *nix systems,
  • no explicit Docker Machine is needed,
  • ability to render graphics (plots, etc.).


  • Make sure that the Docker Engine is installed.
  • Download the kaggle/python image with `docker pull kaggle/python` (note that it is nearly 8 GB).

  • At the end of ~/.bashrc, add new aliases:
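A sketch of what such aliases could look like, assuming the kaggle/python image and X11 socket sharing for rendering plots on *nix; the exact flags may differ from the original post:

```shell
# Run a Python script inside the Kaggle container; the current directory
# is mounted as the working directory, and the X11 socket is shared so
# that plots can be displayed on the host.
kpython() {
  docker run --rm -it \
    -v "$PWD":/tmp/working -w /tmp/working \
    -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix \
    kaggle/python python "$@"
}

# Start an interactive IPython session with the same mounts.
ikpython() {
  docker run --rm -it \
    -v "$PWD":/tmp/working -w /tmp/working \
    -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix \
    kaggle/python ipython
}

# Start a Jupyter notebook server, reachable at http://localhost:8888.
kjupyter() {
  docker run --rm -it \
    -v "$PWD":/tmp/working -w /tmp/working \
    -p 8888:8888 \
    kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working
}
```

Functions are used here instead of plain aliases so that arguments (such as the script name for kpython) are passed through cleanly.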

  • Reload the shell with `source ~/.bashrc`.

These three commands spin up disposable containers on demand to execute the desired Python tasks.

Happy coding.

Securing Docker container with HTTP Basic Auth


At a certain stage of developing a product, we want to make it publicly visible. Naturally, access needs to be restricted to privileged visitors only. You might consider options like:

  • implementing custom authentication within the system,
  • configuring a server to act as a proxy between the user and the application,
  • limiting access to certain IP addresses.

We will also follow the good practice of keeping infrastructure as code. There are many ways to provision the server: Chef, Puppet, Ansible, or Docker.

This article presents the steps needed to secure a container exposing a public port, using an extra nginx container acting as a proxy.

Docker 1.9+ and Docker Compose are required.


Begin with the docker-compose.yml for the exemplary demo application:
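A minimal sketch of such a file, assuming the training/webapp demo image (a small Flask app that listens on port 5000 and prints a greeting); any simple "Hello World" web image would do. The file uses the version 1 Compose format, which matches Docker 1.9-era tooling:

```yaml
webapp:
  image: training/webapp
  ports:
    - "80:5000"
```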

You can start the application with docker-compose up -d and then visit it in the web browser. You should see "Hello World".

The architecture is shown in the diagram below. In this case, Docker forwards the container's internal port 5000 to port 80 on the host.

Nginx proxy

To secure the web-app we are going to:

  • remove the port mapping from the web-app (so it is no longer directly accessible),
  • add an extra nginx container with a custom configuration (proxying all traffic),
  • connect nginx with the web-app using the networking feature introduced in Docker 1.9.

The new architecture can be expressed as follows:

Before moving on, stop the previously started web-app container, create a directory called nginx, and modify the docker-compose.yml to match the following snippet:
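A sketch of the updated file, assuming the same training/webapp demo image and the official nginx image; the mounted file names are illustrative:

```yaml
webapp:
  image: training/webapp

nginx:
  image: nginx
  ports:
    - "80:80"
  volumes:
    - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    - ./nginx/.htpasswd:/etc/nginx/.htpasswd:ro
```

Note that the webapp service no longer declares any ports; only nginx is exposed to the host.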

User credentials

Data about the users allowed to access the web-app will be stored in a .htpasswd file.

Let's create credentials for the admin user:
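One way to do it is the htpasswd tool from the apache2-utils package (the file path matches the volume mounted into the nginx container above; you will be prompted for the password):

```shell
# -c creates the file; omit it when adding further users.
htpasswd -c nginx/.htpasswd admin
```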

The command creates a file with the encrypted user credentials.

Proxy configuration

Inside the nginx directory create a file nginx.conf and paste the following content:
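A sketch of a matching configuration, assuming the web-app container is reachable under the webapp host name provided by container networking:

```nginx
events {
    worker_connections 1024;
}

http {
    server {
        listen 80;

        location / {
            # Require credentials from the mounted .htpasswd file.
            auth_basic           "Restricted";
            auth_basic_user_file /etc/nginx/.htpasswd;

            # Forward authenticated traffic to the web-app container.
            proxy_pass           http://webapp:5000;
        }
    }
}
```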

The configuration proxies all traffic arriving on port 80 to port 5000 on the webapp host (resolvable thanks to container networking). HTTP Basic Auth security is also declared and configured here.

Run the application

Make sure that the directory structure looks the same:
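Assuming the file names used in the snippets above, the layout should be:

```
.
├── docker-compose.yml
└── nginx
    ├── .htpasswd
    └── nginx.conf
```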

If everything looks good, spin up the infrastructure by typing docker-compose --x-networking up -d.

Now you can access the website only after providing proper credentials.