A real quick quick start with Google Cloud Platform command line tools

Getting started with Google Cloud Platform (GCP) is actually very easy, but as with getting started with anything, sometimes you just want a quick 101 of the essential steps to set you on your merry way.

Note that this post is meant to get you up and running as quickly as possible, but it does assume that you have read the best practices for configuring permissions on GCP.

Developing and deploying applications on GCP is arranged around projects, so understanding how you intend to set up development and admin access for projects is an important initial step.

The next thing to do is to sign up for a Google Cloud account via the sign-up page: Sign Up for Google Compute Engine

Now that you are ready to get started, start a terminal session.

Now that you’ve signed up, download the GCloud SDK, following the instructions for your OS: gcloud compute

Then go through the authorisation process by typing:

gcloud auth login

The GCloud SDK uses Google’s OAuth 2.0 service to authorize users and programs to access Google APIs. For more detail, see: Managing authentication and credentials

The GCloud SDK actually bundles the individual command line tools for each service as well as the appropriate SDKs for each supported language. The modularity is great but can be a bit confusing at first.

So what exactly do you get when you download the Google Cloud Platform SDK?
If you’re following along, type:

gcloud components list

This gives you a list of the modules for each service.

For any component that is not installed or that needs an update, use the command:

gcloud components update component-name


So, for example, to update the App Engine SDK for Go I would type:

gcloud components update gae-go


Next, make sure you are working in the correct project. Type


gcloud config list

to check which project is currently set as the default.

You can set a different default project by typing:

gcloud config set project YourProjectID

You can run this at any time to change the default project. If you are working on more than one project you will need to specify the non-default project explicitly, and where and how you do this depends on the command.

Note: you will probably have to activate any services you need for a particular project using the console. Make sure you are in the project you wish to activate the service for, then select APIs under APIs and Auth and set the status to ON for each service you want activated for the project.

Each GCP product has its own set of commands, and you need to use the appropriate set to interact with the service.
See the list produced by:

gcloud components list

For example, to interact with BigQuery you use bq, and for Cloud Storage you use gsutil.

Below is an example showing how easy it is to get started: the set of commands I used to upload a CSV file to Cloud Storage using gsutil, load the data into BigQuery using bq, and start querying that data.

First I created a schema, as this needs to be passed to bq when creating the table.
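The schema itself is not shown in this post. As a rough sketch only, assuming the CSV holds just the three columns used in the query further down (name, symbol and Z), a bq schema file could be written like this. The file name schema.json and the column types are my assumptions; a real schema needs one entry per CSV column, in column order.

```shell
# Write a hypothetical three-column schema file for bq.
# The column names come from the query used later in this post; the types are assumed.
cat > schema.json <<'EOF'
[
  { "name": "name",   "type": "STRING" },
  { "name": "symbol", "type": "STRING" },
  { "name": "Z",      "type": "INTEGER" }
]
EOF

# bq load accepts the schema file as an optional last argument, for example:
#   bq load elements.ptable gs://my-periodictable-bucket/PeriodicTableDataSet.csv ./schema.json
```

Keeping the schema in a file rather than inline means it can be versioned alongside the data set and reused whenever the table is recreated.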
Then I uploaded my data set to cloud storage:

gsutil cp PeriodicTableDataSet.csv gs://my-periodictable-bucket

Next I created a BigQuery dataset called elements:

bq mk elements

Then, in a single command, I created a table and loaded it with my data set:

bq load elements.ptable gs://my-periodictable-bucket/PeriodicTableDataSet.csv

This returns a success status like the one below if everything is okay:
Waiting on bqjob_r1c83caf93cc4a0db_0000014a0057532d_1 … (27s) Current status: DONE

So now I could start querying my data after just 3 steps (4 if you include creating the schema) once I had signed up:

bq query "SELECT name, symbol from elements.ptable where Z >100"

Waiting on bqjob_r64487038b0ec017d_0000014a005b650e_1 ... (0s) Current status: DONE
|      name      | symbol |
| Mendelevium    | Md     |
| Nobelium       | No     |
| Lawrencium     | Lr     |
| Rutherfordium  | Rf     |
| Dubnium        | Db     |
| Seaborgium     | Sg     |
| Bohrium        | Bh     |
| Hassium        | Hs     |
| Meitnerium     | Mt     |
| Darmstadtium   | Ds     |
| Roentgenium    | Rg     |
| Ununbium       | Uub    |
| Ununtrium      | Uut    |
| Ununquadium    | Uuq    |
| Ununpentium    | Uup    |
| Ununhexium     | Uuh    |
| Ununseptium    | Uus    |
| Ununoctium     | Uuo    |

Help is available by typing command --help or command help.
Thus for Cloud Storage you would type gsutil --help, and for BigQuery bq --help.

So as you can see, within a few minutes of setting up your account you are ready to start using the command line tools for GCP and getting immediate payback.

Google Cloud Platform and the choices to be made on how to deploy an application

The Cloud gives you plenty of choices, but this is a double-edged sword, as deciding how to architect your solution and how best to deploy it can lead to some hair-tearing times. I keep my hair short for a reason! 😃

This post will not help with any of those decisions, though. All it will do is walk you through deploying the same application (Jenkins) on a single cloud platform, Google Cloud Platform (GCP), in different ways using the gcloud command line tools.

The cool thing is that each method literally takes minutes! Personally I’m a big fan of immutable infrastructure and of minimising errors by using scripts, so it won’t be a surprise that I like the most hands-off Docker approach (even if I detest YAML), but I’ll leave it to you to decide which method best suits you.

Note that this assumes you have some familiarity with the Google Cloud SDK (if not, look out for my 101 post). It also assumes some familiarity with basic Docker commands.

Method 1 : Installing direct to a GCP instance

First deploy an instance

gcloud compute instances create jenkins-instance --image debian-7 --zone us-central1-a

Grab the external IP

gcloud compute instances list

Connect to the instance

gcloud compute ssh jenkins-instance

Install Jenkins (add the repository key and the package source first, then update and install):

wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -

sudo bash -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ >> /etc/apt/sources.list'

sudo apt-get update
sudo apt-get install jenkins

Next, set up a firewall rule to expose port 8080 (note that you should make this very restrictive at first, so that you can set up security before opening it up more widely):

gcloud compute firewall-rules create allow-http --description "Incoming http allowed." --allow tcp:8080
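The rule above is open to any source address. As a minimal sketch of the more restrictive variant mentioned in the note, the helper below only prints a command that limits access to a single source range using the --source-ranges flag, so you can review it before running it. The function name, the rule name allow-jenkins-8080 and the example address 203.0.113.10/32 (a documentation address) are all placeholders of mine.

```shell
# Hypothetical helper: prints (does not run) a firewall rule that only
# admits traffic to port 8080 from one source range.
restricted_jenkins_rule() {
    source_range=$1   # e.g. your own public address as a /32 CIDR block
    echo gcloud compute firewall-rules create allow-jenkins-8080 \
        --description "\"Jenkins UI from a single source range\"" \
        --allow tcp:8080 \
        --source-ranges "$source_range"
}

# Print the command for a placeholder address; copy and run it once it looks right.
restricted_jenkins_rule 203.0.113.10/32
```

Once Jenkins security has been configured you can widen the source range or replace the rule.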

Check the firewall rules have been set up okay

gcloud compute firewall-rules list

You can now access the Jenkins admin web interface via the external IP address on port 8080.

Method 2 : Using a Container-optimized Google Compute Engine image interactively

You need to select a Container-optimized Google Compute Engine image.

List the available versions:

gcloud compute images list --project google-containers

This will list the available container optimised images. Select an appropriate image name (in this walkthrough I select the default Google container optimised image):

NAME                                PROJECT           ALIAS              DEPRECATED STATUS
container-vm-v20141016              google-containers container-vm                  READY
centos-6-v20141108                  centos-cloud      centos-6                      READY
centos-7-v20141108                  centos-cloud      centos-7                      READY
coreos-alpha-509-1-0-v20141124      coreos-cloud                                    READY
coreos-beta-494-1-0-v20141124       coreos-cloud                                    READY
coreos-stable-444-5-0-v20141016     coreos-cloud      coreos                        READY
backports-debian-7-wheezy-v20141108 debian-cloud      debian-7-backports            READY
debian-7-wheezy-v20141108           debian-cloud      debian-7                      READY
container-vm-v20141016              google-containers container-vm                  READY
opensuse-13-1-v20141102             opensuse-cloud    opensuse-13                   READY
rhel-6-v20141108                    rhel-cloud        rhel-6                        READY
rhel-7-v20141108                    rhel-cloud        rhel-7                        READY
sles-11-sp3-v20140930               suse-cloud        sles-11                       READY
sles-11-sp3-v20141105               suse-cloud        sles-11                       READY
sles-12-v20141023                   suse-cloud                                      READY
ubuntu-1204-precise-v20141031       ubuntu-os-cloud   ubuntu-12-04                  READY
ubuntu-1404-trusty-v20141031a       ubuntu-os-cloud   ubuntu-14-04                  READY
ubuntu-1410-utopic-v20141030a       ubuntu-os-cloud   ubuntu-14-10                  READY

Start a container optimised instance:

gcloud compute instances create jenkins-cnt-vm  --image container-vm-v20141016  --image-project google-containers  --zone us-central1-a  --machine-type f1-micro

Note that you need to declare the project that the image you select is associated with (here, via --image-project).

Once the instance is up and running, ssh into it and install Jenkins by pulling the official Jenkins image down and running it with port 8080 exposed:

gcloud compute ssh jenkins-cnt-vm
sudo docker pull jenkins:latest
sudo docker run -p 8080:8080 -d -t jenkins

List the running Docker containers:

sudo docker ps 
CONTAINER ID        IMAGE                     COMMAND                CREATED             STATUS              PORTS                               NAMES
2c8dfb26da3a        jenkins:latest            "/usr/local/bin/jenk   10 seconds ago      Up 9 seconds        50000/tcp,>8080/tcp   jovial_thompson
d7d799d93d55        google/cadvisor:latest    "/usr/bin/cadvisor"    32 minutes ago      Up 32 minutes                                           k8s_cadvisor.417cd83c_cadvisor-agent.file_4da26b48
3d719fdc322e        kubernetes/pause:latest   "/pause"               33 minutes ago      Up 33 minutes>8080/tcp              k8s_net.f72d85c8_cadvisor-agent.file_19d8274a

If firewall rules have not been set up for the project, do that now so you can access the Jenkins admin web interface via the external IP address on port 8080 (see above).

Method 3 : Using a Container-optimized Google Compute Engine image without logging on to the instance

Create a YAML manifest file. This is the equivalent of a Dockerfile, so it will pull down any images and run any commands. In my example the containers.yaml file contains:

   version: v1beta2
   containers:
     - name: jenkins-demo
       image: jenkins:latest
       ports:
          - name: allow-http-8080
            hostPort: 8080
            containerPort: 8080

Then deploy a container optimised image, passing the manifest:

gcloud compute instances create jenkins-instance  --image container-vm-v20141016  --image-project google-containers  --metadata-from-file google-container-manifest=containers.yaml  --zone us-central1-a  --machine-type f1-micro

If firewall rules have not been set up for the project, do that now so you can access the Jenkins admin web interface via the external IP address on port 8080 (see above).

A comparison of Cloud object stores



This is an update to my 2011 summary table comparing key features of Amazon Web Services (AWS) S3 and Microsoft Azure (Azure) Blob Storage. I’ve also expanded it to cover more features added since then, and I have now included Google Cloud Platform (GCP) Cloud Storage.

All data is collated from information available on public sites (so you don’t have to) and reflects what you as the consumer see as an out-of-the-box experience. Anything that is not available using just the SDKs, command line tools or console, without third-party libraries, is not covered; for example, I do not include the various solutions on GitHub such as the Azure encryption extensions.
This is focused on the storage of immutable objects that are typically used for website static objects and Big Data projects. It does not cover any specific features related to the storage of AWS EBS snapshots, Azure page blobs (which are not immutable) or GCP Compute Engine images.

Costs are not included, as these change faster (happily downwards) than I ever update my blog posts.

To keep this to a sane length I haven’t provided lots of explanatory notes; I leave it to readers to delve deeper as required.

Note that this is not an opinionated post, but hopefully you will find it a helpful table that assists in decision making.


Each feature below is listed for AWS Simple Storage Service (S3), Azure Blob Storage and GCP Cloud Storage in turn.

Feature: Namespace considerations
AWS S3:
  • Activating S3 is associated with an AWS account but the account name is NOT associated with the namespace of the objects stored on S3
  • The bucket name you choose must be unique across all existing bucket names in Amazon S3
Azure Blob Storage:
  • A Storage account is a globally uniquely identified entity within blob storage. The account is the parent namespace for the Blob service
GCP Cloud Storage:
  • Activating Cloud Storage is associated with a project but the project name or ID is NOT associated with the namespace of the objects stored on Cloud Storage
  • Every bucket must have a unique name across the entire Google Cloud Storage namespace

Feature: How objects are grouped together
AWS S3: Objects are placed in containers called buckets
Azure Blob Storage: Objects are placed in containers called containers
GCP Cloud Storage: Objects are placed in containers called buckets

Feature: Definition of object
AWS S3: An object is a file and optionally any metadata that describes that file
Azure Blob Storage: An object is represented by a blob. A blob is made up of resources that include content, properties, and metadata
GCP Cloud Storage: Objects have two components: object data and object metadata

Feature: Limits
AWS S3:
  • An account can have a maximum of 100 buckets
  • A bucket can store an unlimited number of objects
  • Maximum object size = 5 TB
Azure Blob Storage:
  • An account can contain an unlimited number of containers
  • A container can store an unlimited number of blobs
  • Up to 500 TB of total storage per account
  • A single subscription supports up to 50 storage accounts
  • Maximum block blob size = 200 GB
  • Maximum page blob size = 1 TB
GCP Cloud Storage:
  • There is no limit on the number of buckets that you can create in a project
  • There is no limit on the number of objects that you can create in a bucket
  • Maximum object size = 5 TB

Feature: Interacting with buckets and objects
AWS S3: Interaction with buckets and objects is via the REST API
Azure Blob Storage: Interaction with containers and blobs is via the REST API
GCP Cloud Storage: Interaction with buckets and objects is via the REST API

Feature: Bucket naming
AWS S3:
The bucket name you choose must be unique across all existing bucket names in Amazon S3

Bucket names must comply with the following requirements:

  • Can contain lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
  • Must start with a number or letter
  • Must be between 3 and 255 characters long
  • Must not be formatted as an IP address (e.g., 192.168.5.4)

To conform with DNS requirements, AWS recommend following these additional guidelines when creating buckets:

  • Bucket names should not contain underscores (_)
  • Bucket names should be between 3 and 63 characters long
  • Bucket names should not end with a dash
  • Bucket names cannot contain two adjacent periods
  • Bucket names cannot contain dashes next to periods
Azure Blob Storage:
The container name must be unique within a storage account

The container name must be a valid DNS name, conforming to the following naming rules:

  • Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
  • Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
  • All letters in a container name must be lowercase.
  • Container names must be from 3 through 63 characters long.
  • Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.
GCP Cloud Storage:
Every bucket must have a unique name across the entire Google Cloud Storage namespace

  • Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification
  • Bucket names must start and end with a number or letter
  • Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters
  • Bucket names cannot be represented as an IP address in dotted-decimal notation
  • Bucket names cannot begin with the “goog” prefix
  • Bucket names cannot contain “google” or close misspellings of “google”
  • If creating a bucket with a custom domain (e.g. ending .com, .co.uk etc.) then domain name verification will be part of the process

For DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash

Feature: Object naming
AWS S3:
  • Flat structure
  • The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long
  • Can infer logical hierarchy using key name prefixes and delimiters. You can use the delimiter ‘/’ to present a folder
Azure Blob Storage:
  • Flat storage scheme, not a hierarchical scheme
  • A blob name can contain any combination of characters, but reserved URL characters must be properly escaped.
  • A blob name must be at least one character long and cannot be more than 1,024 characters long
  • You may specify a delimiter such as “/” within a blob name to create a virtual hierarchy
GCP Cloud Storage:
  • Flat namespace to store objects
  • Object names can contain any combination of Unicode characters (UTF-8 encoded) less than 1024 bytes in length
  • By using “/” in an object name, you can make objects appear as though they’re stored in a hierarchical structure.

Feature: Nesting
AWS S3: You cannot nest buckets
Azure Blob Storage: You cannot nest containers
GCP Cloud Storage: You cannot nest buckets

Feature: Locality
AWS S3: S3 buckets can be created in specific regions
Azure Blob Storage: Storage accounts can be created in specific regions
GCP Cloud Storage: Cloud Storage buckets can be created in a specific geographical region. Regional buckets are in alpha at time of writing

Feature: Security
AWS S3:
  • Access to objects and buckets is managed via access control lists (ACLs) and bucket policies. You can use them independently or together
  • Query string authentication
  • Server side encryption with Customer-Provided Keys
  • Server side encryption with Amazon S3 Key Management
  • Encrypt your data at rest using keys that you manage in the AWS Key Management Service
  • Logging – Access (requests) and API calls to S3 via CloudTrail
Azure Blob Storage:
  • Access to blobs and containers is controlled via ACLs, which allow you to grant public access, and shared access signatures, which provide more granular access
  • Shared access signature
  • Object and container ACLs
  • Logging – Transactions, storage account blob and container size details
GCP Cloud Storage:
  • Access to objects and buckets is managed via access control lists (ACLs)
  • Signed URLs
  • Automatic server-side encryption; Google manages the cryptographic keys on your behalf
  • Client-side encryption; you manage your own encryption keys and encrypt data before writing it to Google Cloud Storage. In this case, your data is encrypted twice, once with your keys and once with Google’s keys
  • Logging – Access (requests) and storage logs

Feature: Object consistency
AWS S3:
  • Provides read-after-write consistency for PUTS of new objects
  • Eventual consistency for overwrite PUTS and DELETES

This applies to all regions apart from US Standard, which provides eventual consistency for all requests.
Azure Blob Storage:
  • Strong read-after-write consistency model for all PUT requests
  • Eventually consistent model for all List (GET) operations
GCP Cloud Storage:
  • Strong global consistency for all read-after-write, read-after-update, and read-after-delete operations (note that caching can override read-after-delete consistency if Cache-Control metadata has not been explicitly set to disable caching of the object)
  • List operations are eventually consistent

Feature: Uploading large objects
AWS S3: To load large objects use Multipart upload, which allows you to upload a single object as a set of parts. Multipart upload allows the upload of parts in parallel to improve throughput. Smaller part sizes minimize the impact of restarting a failed upload due to a network error.
Azure Blob Storage: To upload large blobs use block blobs. Block blobs allow the upload of blobs larger than 64MB. They allow the upload of blocks in parallel, and the resumption of failed uploads by retrying only the blocks that weren’t already uploaded.
GCP Cloud Storage: Resumable upload

Feature: URI request
AWS S3: The location of your object is a URL, generally of the form: http://bucket-name.S3.amazonaws.com/
Azure Blob Storage: For a blob, the base URI includes the name of the account, the name of the container, and the name of the blob.
GCP Cloud Storage: The URI for accessing objects is storage.googleapis.com/yourbucket/yourobject or yourbucket.storage.googleapis.com/yourobject
If using a CNAME alias to redirect requests, use c.storage.googleapis.com in the host name portion of the CNAME record

Feature: Programmatic access (check website for languages supported)
AWS S3: To access programmatically use the AWS SDK; various languages supported
Azure Blob Storage: To access programmatically use the Azure SDK; various languages supported
GCP Cloud Storage: To access programmatically use the GCloud SDK; various languages supported

Feature: Custom domain support
AWS S3: To use a custom domain requires the use of CNAMES or ALIAS records (if using Route 53)
Azure Blob Storage: To use a custom domain requires the use of CNAMES
GCP Cloud Storage: To use a custom domain requires the use of CNAMES

Feature: Ability to trigger notification against bucket action
AWS S3: Amazon S3 event notifications – can be configured to trigger on any event that results in the creation of an object, including PUTs, POSTs, COPYs, and when a multi-part upload is complete.
Azure Blob Storage: No
GCP Cloud Storage: Object change notification – A notification event is sent when a new object is added to a bucket, an existing object’s content or metadata has been modified, or an object is deleted from a bucket.

Feature: Lifecycle management
AWS S3:
  • Versioning
  • Object expiration actions
  • Object archival (migration to Glacier)
Azure Blob Storage: (no entry)
GCP Cloud Storage:
  • Versioning
  • Object deletion policies

Feature: Storage availability options
AWS S3:
  • Standard redundancy
  • Reduced redundancy
  • Amazon Glacier – for cold storage (infrequent access)
Azure Blob Storage:
  • Locally redundant storage
  • Zone redundant
  • Geo redundant
GCP Cloud Storage:
  • Standard storage class
  • Durable Reduced Availability

Feature: Host static website
AWS S3: Yes
Azure Blob Storage: Yes
GCP Cloud Storage: Yes
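The bucket naming rules above are easy to trip over, so here is a minimal sketch of a pre-flight check. It is a hypothetical helper of my own, not part of any SDK, and it only covers the DNS-friendly subset recommended for S3 and Cloud Storage (3 to 63 characters, lowercase letters, numbers, dashes and periods, no IP-style names). It does not check the Cloud Storage “goog”/“google” restrictions, and Azure container names are stricter still (no periods, no consecutive dashes).

```shell
# Hypothetical helper (not part of any SDK): returns 0 if the name fits the
# DNS-friendly subset of the S3 and Cloud Storage bucket naming rules.
is_dns_safe_bucket_name() {
    name=$1
    len=${#name}

    # Between 3 and 63 characters long
    [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1

    # Lowercase letters, numbers, dashes and periods only,
    # starting and ending with a letter or number (so no underscores)
    printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$' || return 1

    # No two adjacent periods, and no dash next to a period
    case $name in
        *..*|*.-*|*-.*) return 1 ;;
    esac

    # Must not be formatted as a dotted-decimal IP address
    if printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
        return 1
    fi
    return 0
}

# Try a few candidate names
for candidate in my-periodictable-bucket My_Bucket 192.168.5.4; do
    if is_dns_safe_bucket_name "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

Passing this check does not guarantee the name is available, since bucket names also have to be globally unique on both S3 and Cloud Storage.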

The pricing model is pretty much the same across all three, based on the amount of storage, the redundancy level selected, request operations and egress charges.

Note: If you believe I have missed something out please leave a comment and I’ll review and update accordingly.


Keep it simple please: a small hope for the evolution of Docker!

Okay, Docker is probably past the hipster stage, and the recent announcements from the big players in the Cloud playground get it to that next stage of “we’re not going to go away any time soon” respectability. All good but…

I started playing around with Docker purely because I was intrigued as to why containers were suddenly the new hotness (container tech has been around for a while after all). It was fun, easy to grok, and the potential is huge. It was also a nice thing to get my hands dirty with in my own time that wasn’t directly (at that time anyway) connected with my day job.

One of the first things I did when I started playing with Docker was to test the mantra of write once, deploy anywhere: a simple Docker image I created on my Mac worked on AWS EC2 and Elastic Beanstalk, and I’ve tried it on Google Cloud too! It did what it said on the tin with no pain to speak of.

Creating and running simple Docker-based applications really is easy, but creating micro services, knitting them together and building hugely distributed applications is where I personally feel that Docker comes into its own! However, setting up the networking and managing that sort of distributed application architecture using Docker is less than easy, and a lot of side projects have popped up to address these pain points.

Basically, managing containers at scale is hard. Having players who have been running things at scale for years come in with managed services in this area is great, as it heads off the disappointment that sets in when the ease of developing the apps is replaced by the effort needed just to get them running properly in a highly available (HA), distributed configuration.

Deving (is this even a word?) locally, you can take your pick for starters among Chef, Puppet, Ansible, SaltStack, Vagrant, fig, or just stick to boot2docker (with a little bash around it). There is even a pretty GUI-based solution just for use on your Mac.

Coming more from the Ops than the Dev side, I have always had a keen interest in the deployment and management of solutions, so I have managed some hands-on time with the likes of Kubernetes and Panamax so far. (There really isn’t enough spare time to play around with everything I would like to.) There is a list of potential solutions in this area. The managed services side of things takes that choice off your hands, which imho kinda makes sense: you should just worry about your application and let someone else take care of managing it at scale, since focusing on that area ultimately gives you no business advantage!

This is what is great about Docker: you have this unit that you can use with whatever wrappers the ecosystem can come up with around it.

Recently, though, there have been some concerns about what and how Docker should evolve, and my concern is that if they bloat it too much and add too many bells and whistles, the simple idea of build once, run anywhere won’t be so sweet.

Three areas where this concern has really bubbled up to the surface are:

The incorporation of fig-like functionality into Docker itself (I like the way this one is developing)

Docker clustering

Docker extensibility 

The good thing, though, is that this is all being discussed in the open. Read the comments, see how the discussion is going, and you can join in the conversation too.

Docker needs to allow the ecosystem to thrive, so delivering functionality that is best placed around Docker via a plugin approach has surely got to be the right route here. Otherwise we’ll start seeing forks, and the phenomenal momentum and support from the ecosystem may start splintering into different camps as Docker takes on more and cannot deliver on its original promise of “Build, ship and run any app anywhere” because the targets all run ‘optimised’ versions of Docker for their platforms.

Configuring Docker data storage: a 101

This is a short walkthrough on configuring Docker storage options on your development machine.

I’ll use my preferred version of Hello World on Docker, “Setting up mongodb”, which lends itself nicely to a walkthrough of the storage options.

This walkthrough assumes basic familiarity with Docker. First let’s look at setting everything up on a single container.

I started from the Dockerfile described here: mongoDB Dockerfile for demoing Docker storage options

Create the image using:

docker build -t mongodb .

You will note that in this Dockerfile we use the VOLUME instruction to define the target data directory for mongoDB:

# Define the MongoDB data directory
VOLUME ["/data/db"]

I am walking through all this on my Mac, so I am using the following lean & mean command to start a mongodb container as a background process (daemon) from the mongodb image built from the Dockerfile:

docker run -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

I can then add some data to a mongodb collection (see Data loading below). That is quick, and for some quick tests as part of an SDLC it might be fine, but having to recreate your database and reload it each time you create a container will eventually prove limiting.
We all know that you need representative datasets for a true test, and it’s likely that your datasets are going to be more than 118 records, so reloading data every time you run up a mongodb container is not going to be practical!

So we have two options for addressing the persistence requirements:

  1. Data volume
  2. Data volume container

Data Volume

We want to create a volume that maps to a folder on the local host. In my case I will be mounting a folder on my Mac called $HOME/mongodata (replace $HOME/mongodata with your own folder name if you are following this through on another OS).

We then create the container from the image as before, but this time we get the container to mount the local folder:

docker run -v $HOME/mongodata/:/data/db -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

Note that because VirtualBox shared folders do not support fsync() on directories, mongodb will not actually start if you are using VirtualBox. You can still validate that mounting a shared folder on the host works, as the logs will show the error and you will see that mongodb created some files in the shared folder before it halted. This part of the walkthrough will work as expected using mongoDB on AWS EC2, for example, and the approach is perfectly valid under VirtualBox for applications that do not require fsync().

Data volume container

This option in my opinion is the most flexible.

First you need to create a data container:

docker run -v /data/db --name mongodata busybox

The above creates a data volume container based on the busybox image. (It’s a small image.)

Next you need to start up the application container, but this time mounting the volume from the data container created earlier:

docker run -p 27017:27017 --name mongo_instance_001  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

Load some data into mongoDB (see Data loading below).

To validate that this works as expected, stop container 1, then start another container using a similar start-up command, attaching the data volume container:

docker run -p 27017:27017 --name mongo_instance_002  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

You can now check, when you connect to mongoDB and look at the databases and collections, that the data you loaded using the previous container is available.

You can remove the application containers whenever you like and create new ones as required, mounting the data volume container each time. Note that the docker ps command does not give you any indication of which containers have mounted volumes from the data volume container.
You can also tar the data volume and copy it to another Docker host; see the Docker docs for detail on the process.
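As a sketch of that tar step, following the backup pattern described in the Docker documentation, the helper below prints the command that archives /data/db from the mongodata data volume container into the current directory on the host. It only prints the command so you can check it first; the function name and the archive name mongodata.tar are placeholders of mine, and mounting the current directory with -v assumes your Docker host can see that folder.

```shell
# Hypothetical helper: prints (does not run) a docker command that tars the
# /data/db volume of a data volume container into the current host directory.
backup_volume_cmd() {
    data_container=$1   # name of the data volume container
    archive=$2          # file name for the tar archive
    echo docker run --rm --volumes-from "$data_container" \
        -v '"$(pwd)":/backup' busybox \
        tar cvf "/backup/$archive" /data/db
}

# Print the backup command for the data volume container used in this walkthrough
backup_volume_cmd mongodata mongodata.tar
```

The resulting archive can then be copied to another Docker host and unpacked into a fresh data volume container there.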

Data loading

I am assuming some familiarity with mongoDB. If you need a quick primer have a look here: Getting started with mongodb

I am using a JSON file containing a dataset of the elements of the Periodic table to populate my database. Here’s how I load my demo database with data:

mongoimport --db demo --collection periodictable  --type json --file periodictable.json  --jsonArray 
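The --jsonArray flag tells mongoimport that the file holds a single JSON array of documents rather than one document per line. As an illustration only, a cut-down file in that shape could be created as follows; the two elements and the field names are my own sample, not the actual data set used here.

```shell
# Write a hypothetical two-document sample in the JSON array form that
# mongoimport --jsonArray expects. Field names are assumed for illustration.
cat > periodictable.json <<'EOF'
[
  { "name": "Hydrogen", "symbol": "H",  "Z": 1 },
  { "name": "Helium",   "symbol": "He", "Z": 2 }
]
EOF
```

Without --jsonArray, mongoimport would instead expect each document on its own line with no enclosing brackets.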

For the purposes of this walkthrough I am using images that are on my local machine rather than pushing up to a registry and pulling back down again.

This walkthrough has focused on the practicalities of storage with Docker. For a deeper dive on storage, have a read of this excellent post, Overview of storage scalability in Docker, on the RedHat developer blog.

Scaling out the security attack surface when using Docker – A timely reminder

With all the excitement over Docker some folks seem to forget that it’s more than just making life easy for developers. This stuff will need to be exposed to the big wide scary world and exploits such as the Bash vulnerability will be dispersed over a wider landscape than just the hosts themselves!

Yes you might point out that containers are being managed at scale by the likes of Google but they do have the resources to look after the infra so you don’t have to!

Remember that the tools and processes you use today to manage patches will need to be applied up the stack as well, and that means looking at your Docker images and containers too.

If you really are running immutable infrastructure and can afford to tear everything down and throw updated Docker images out there, then that is an alternative path, although you still need to worry about the underlying hosts even in that scenario.

Daniel Walsh from RedHat is writing a great series on Docker security and how RedHat are dealing with the issues. It is a great read and brings a little sobering realism to the areas that still need to be thought about when deploying Docker-based solutions.

From Daniel’s posts, I want to reiterate this list of good practice as a timely reminder:

  • Only run applications from a trusted source
  • Run applications on an enterprise quality host
  • Install updates regularly
  • Drop privileges as quickly as possible
  • Run as non-root whenever possible
  • Watch your logs
  • setenforce 1
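
A couple of these items can be checked or applied from the command line. The sketch below is only a starting point: it assumes an SELinux-enabled host where getenforce is available, and the UID 1000 default, the RUN_AS_UID variable and the function names are illustrative choices of mine rather than anything from Daniel’s posts.

```shell
# Succeeds only when SELinux is in enforcing mode (the state "setenforce 1" switches to).
selinux_enforcing() {
  [ "$(getenforce)" = "Enforcing" ]
}

# Start a throwaway container as a non-root user via docker's --user flag.
# Assumption: UID 1000 is a sensible unprivileged user for the image you run.
run_unprivileged() {
  image="$1"
  shift
  docker run --rm --user "${RUN_AS_UID:-1000}" "$image" "$@"
}
```

For example, selinux_enforcing || echo "SELinux is not enforcing" makes a handy pre-flight check, and run_unprivileged busybox id -u should print the non-root UID.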

Docker what’s all the fuss about?

It’s been a while since I’ve blogged here, and as I’ve been looking at the hottest thing in “hipster tech” (see below for a definition), namely Docker, I thought I’d get that blogging mojo back by starting to share my thoughts on the subject!

For a detailed description of Docker, there are plenty of great articles, slide decks and videos. The Docker site is a good starting point, and this page, what is Docker, has two diagrams that depict what Docker is about versus a VM. This post from Sleekd discussing the difference between Docker and virtualization is also nice background reading for the layman, so I won’t be repeating a Docker 101 here. To set the scene, though, I summarise Docker like this:

  • It provides operating-system-level virtualization. Containers run in user space on top of an operating system’s kernel, which makes them lightweight and fast.
  • It uses resource isolation features of the Linux kernel, such as cgroups and kernel namespaces, to allow independent “containers” to run within a single Linux instance.
  • It uses the power of Linux containers (LXC) (although it is more accurate to say it has evolved from there) and aufs (Another Union File System) to provide a way of packaging applications and isolating processes.
  • It allows you to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.
  • It allows applications to use the same Linux kernel as the system that they’re running on, and only requires applications to be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.
  • Ultimately, it should provide more certainty for application developers by providing a set of known abstractions that define how the application will run, no matter what hardware is underneath.
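
The shared-kernel point is easy to see for yourself. As a small sketch, the function below compares the kernel release reported on the host with the one reported inside a container. It assumes a Linux host running the Docker daemon directly (not inside a helper VM) and a small image such as busybox; the function name is my own.

```shell
# Compare the kernel release seen on the host with the one seen inside a container.
# Returns success when they match, which is what you would expect on a native Linux host.
same_kernel() {
  image="${1:-busybox}"
  host_kernel=$(uname -r)
  container_kernel=$(docker run --rm "$image" uname -r)
  echo "host: $host_kernel container: $container_kernel"
  [ "$host_kernel" = "$container_kernel" ]
}
```

Calling same_kernel busybox prints both values, whereas a VM would report whatever kernel its own guest operating system runs.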

For a good initial deep dive on containers, my current favourite slide deck is this one: Inside Docker for Fedora20/RHEL7

Okay, let’s start with a list of the fundamental issues that need to be solved first for mere mortals (see below for a definition) to really get on board. No depth, just headlines at this point, with pointers to more info. Otherwise I’d never have got this post out to kickstart blogging again!

It’s early days, and Docker has a blossoming, growing ecosystem. This lovely mindmap makes a nice stab at illustrating the ecosystem that has been building around it (personally I would not have placed some of the tech in the sections they have been put in, but hey, it’s still lovely). Equally, it shows the bewildering choices that have yet to be made with regard to which, if any, approaches will win out and what may or may not suit your environment.

The potential, however, is huge, and I think that with a list of USPs like the one below you can begin to understand the rallying round and the fuss.

  • Simplifying the use of containers (container technology is not new, despite the hype)
  • Micro services
  • Portability
  • Mutable infrastructures
  • PaaS solutions

In upcoming posts I’ll discuss some of the ecosystem tools where I’ve had hands-on experience (I promise it won’t take years, though), some of the issues, and the USPs in more depth.

This post was just to get me into blogging mode again!

I won’t be neglecting my passion for DevOps, though (it’s not just about the tools, stoopid, although I’ll probably be talking about the tools a lot!!).


My definition of Hipster Tech – the latest cool idea in tech where the hype has overtaken the reality but the potential is very high on the potential-ometer.

Thanks to @chrismunns for the succinct definition of mere mortals – those running responsible and performant production environments
(My spin, i.e. those with no time to get distracted by debugging and feeding & watering the tools they use to deploy those solutions, who are instead focused on delivering value to their business)


Windows in the cloud a 1st class citizen

The perception is often that Windows instances running in the cloud are second-class citizens. This is just not true. Both Opscode and Puppet Labs have made great strides in making their configuration management tools ‘Windows friendly’ (Disclaimer: I’ve used Chef with Windows but have no actual experience with Puppet). To add to this, Amazon Web Services introduced CloudFormation-friendly Windows base AMIs. The combination of these AMIs with the more Windows-friendly configuration management tools means you can really treat Windows instances as you would Linux instances and use the same tools to manage both.

You can use PowerShell as you would normally, so the learning curve isn’t as steep as you’d expect as a Windows administrator. Go on, give it a go.

If you have an estate that is made up of both Windows and Linux, starting from a point where you can use the same tools to manage both environments makes life easy for your Operations/DevOps team, or whatever label you place on the team that makes sure you have systems that are up and running each day.