Configuring Docker data storage a 101

This is a short walkthrough on configuring Docker storage options on your development machine.

I’ll use my preferred version of Hello world on Docker – “Setting up mongodb” which lends itself nicely to a walk through of the storage options.

This walkthrough assumes basic familiarity with Docker. First let’s look at setting everything up on a single container.

I started from the DockerFile described here
mongoDB Dockerfile for demoing Docker storage options

Creating the image using

docker build -t mongodb .

You will note that in this Dockerfile we use the VOLUME command to define the target data directory for mongoDB

# Define the MongoDB data directory
VOLUME ["/data/db"]

I am walking through all this on my Mac thus I am using the following lean & mean command to start a mongodb container up as a background process ( daemon) from the mongodb image created from the docker file :

docker run -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

I can then add some data to a mongodb collection ( see Data loading below) That is quick and for some quick tests as part of a SDLC that might be fine but having to recreate your database and reload each time you create a container will eventually prove limiting.
We all know that you need representative datasets for a true test and it’s likely that your datasets are going to be more than 118 records and reloading data every time you run up a mongodb container is not going to be practical!

So we have two options as to how to address the persistance requirements:

  1. Data volume
  2. Data volume container

Data Volume

We will want to create a volume that maps to a folder on your local host in my case I will be mounting a folder on my Mac called $HOME/mongodata ( replace $HOME with your folder name if you are following this through on another OS )

We then create the container from the image but the difference is we now get the container to mount the local folder using this command to create a container:

$ docker run -v $HOME/mongodata/:/data/db -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

Note that as virtualbox shared folders does not support fsync() on directories mongodb will not actually start but you can validate that the mounting of a shared folder on the host works as the logs will show the error and you will see that it created some files in the shared folder before it halted. This part of the walkthrough will work as expected using mongoDB on AWS ec2 for example and is perfectly valid for those applications that do not require fsync() if you are using virtualbox.

Data volume container

This option in my opinion is the most flexible.

First you need to create a data container

docker run -v /data/db --name mongodata busybox

The above creates a data volume contaner based on the busybox image. (Its a small image)

Next you need to start up the application container but this time mounting the data container created earlier

docker run -p 27017:27017 --name mongo_instance_001  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

Load some data into mongoDB

To validate this works as expected stop container 1 then start another container using a similar start up command attaching the Data volume container

docker run -p 27017:27017 --name mongo_instance_002  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

You can check that now when you start mongoDB and look at the databases and collections that the data you loaded using the previous container is available.

You can remove the application containers whenever you like and create new ones as required mounting the data volume container. Note that using the docker ps command does not give you any indication of what containers are mounted to the data volume container .
You can also tar the data volume and copy to another docker host etc see the docker docs for detail on the process

Data loading

I am assuming some familiarity with mongoDB . If you need a quick primer have a look here: Getting started with mongodb

I am using a json file that consists of a dataset of the elements of the Periodic table to populate my database. Here’s how I load my demo databases with data :

mongoimport --db demo --collection periodictable  --type json --file periodictable.json  --jsonArray 

For the purposes of this walkthrough I am using images that are on my local machine rather than pushing up to a registry and pulling back down again.

This walkthrough has been focused on the practicalities of storage with Docker for a deeper dive on storage have a read of this excelent post  on the Overview of storage scalablity in Docker on the RedHat developer blog


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s