A comparison of Cloud object stores


This is an update to my 2011 summary table comparing key features of Amazon Web Services (AWS) S3 and Microsoft Azure (Azure) blob storage. I’ve also expanded it to cover more features added since then, and I have now included Google Cloud Platform (GCP) Cloud Storage.

All data is collated from information available on public sites (so you don’t have to) and reflects what you, as the consumer, see as an out-of-the-box experience. Anything not available using just the SDKs, command-line tools or console, without requiring third-party libraries, is not covered – for example, I do not include various solutions on GitHub such as the Azure encryption extensions.
This is focused on the storage of immutable objects that are typically used for website static objects and Big Data projects. It does not cover any specific features related to the storage of AWS EBS snapshots, Azure page blobs (which are not immutable) or GCP Compute Engine images.

Costs are not included, as these change faster (happily, downwards) than I ever update my blog posts.

To keep this to a sane length I haven’t provided lots of explanatory notes; I leave it to readers to delve deeper as required.

Note: this is not an opinionated post, but hopefully you will find it a helpful table that assists in decision making.


Feature comparison: AWS Simple Storage Service (S3) vs Azure Blob Storage vs GCP Cloud Storage
Namespace considerations
AWS S3:
  • Activating S3 is associated with an AWS account, but the account name is NOT associated with the namespace of the objects stored on S3
  • The bucket name you choose must be unique across all existing bucket names in Amazon S3
Azure Blob Storage:
  • A storage account is a globally uniquely identified entity within blob storage. The account is the parent namespace for the Blob service
GCP Cloud Storage:
  • Activating Cloud Storage is associated with a project, but the project name or ID is NOT associated with the namespace of the objects stored on Cloud Storage
  • Every bucket must have a unique name across the entire Google Cloud Storage namespace

How objects are grouped together
AWS S3: Objects are placed in containers called buckets
Azure Blob Storage: Objects are placed in containers called containers
GCP Cloud Storage: Objects are placed in containers called buckets

Definition of an object
AWS S3: An object is a file and optionally any metadata that describes that file
Azure Blob Storage: An object is represented by a blob. A blob is made up of resources that include content, properties, and metadata
GCP Cloud Storage: Objects have two components: object data and object metadata

Limits
AWS S3:
  • An account can have a maximum of 100 buckets
  • A bucket can store an unlimited number of objects
  • Maximum object size = 5 TB
Azure Blob Storage:
  • An account can contain an unlimited number of containers
  • A container can store an unlimited number of blobs
  • Up to 500 TB of total storage per account
  • A single subscription supports up to 50 storage accounts
  • Maximum block blob size = 200 GB
  • Maximum page blob size = 1 TB
GCP Cloud Storage:
  • There is no limit on the number of buckets that you can create in a project
  • There is no limit on the number of objects that you can create in a bucket
  • Maximum object size = 5 TB

Interacting with buckets and objects
AWS S3: Interaction with buckets and objects is via the REST API
Azure Blob Storage: Interaction with containers and blobs is via the REST API
GCP Cloud Storage: Interaction with buckets and objects is via the REST API

Bucket naming
AWS S3: The bucket name you choose must be unique across all existing bucket names in Amazon S3.

Bucket names must comply with the following requirements:

  • Can contain lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
  • Must start with a number or letter
  • Must be between 3 and 255 characters long
  • Must not be formatted as an IP address (e.g., 192.168.5.4)

To conform with DNS requirements, AWS recommends following these additional guidelines when creating buckets:

  • Bucket names should not contain underscores (_)
  • Bucket names should be between 3 and 63 characters long
  • Bucket names should not end with a dash
  • Bucket names cannot contain two adjacent periods
  • Bucket names cannot contain dashes next to periods

Azure Blob Storage: The container name must be unique within a storage account.

The container name must be a valid DNS name, conforming to the following naming rules:

  • Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
  • Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
  • All letters in a container name must be lowercase.
  • Container names must be from 3 through 63 characters long.
  • Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.

GCP Cloud Storage: Every bucket must have a unique name across the entire Google Cloud Storage namespace.

  • Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification
  • Bucket names must start and end with a number or letter
  • Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters
  • Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4)
  • Bucket names cannot begin with the “goog” prefix
  • Bucket names cannot contain “google” or close misspellings of “google”
  • If creating a bucket with a custom domain (e.g. ending .com, .co.uk, etc.) then domain name verification will be part of the process

For DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash.


Object naming
AWS S3:
  • Flat structure
  • The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long
  • Can infer a logical hierarchy using key name prefixes and delimiters. You can use the delimiter ‘/’ to represent a folder
Azure Blob Storage:
  • Flat storage scheme, not a hierarchical scheme
  • A blob name can contain any combination of characters, but reserved URL characters must be properly escaped
  • A blob name must be at least one character long and cannot be more than 1,024 characters long
  • You may specify a delimiter such as “/” within a blob name to create a virtual hierarchy
GCP Cloud Storage:
  • Flat namespace to store objects
  • Object names can contain any combination of Unicode characters (UTF-8 encoded) less than 1024 bytes in length
  • By using “/” in an object name, you can make objects appear as though they’re stored in a hierarchical structure

Nesting
AWS S3: You cannot nest buckets
Azure Blob Storage: You cannot nest containers
GCP Cloud Storage: You cannot nest buckets

Locality
AWS S3: S3 buckets can be created in specific regions
Azure Blob Storage: Storage accounts can be created in specific regions
GCP Cloud Storage: Cloud Storage buckets can be created in a specific geographical region. Regional buckets are in alpha at the time of writing

Security
AWS S3:
  • Access to objects and buckets is managed via access control lists (ACLs) and bucket policies. You can use them independently or together
  • Query string authentication
  • Server-side encryption with customer-provided keys
  • Server-side encryption with Amazon S3 key management
  • Encrypt your data at rest using keys that you manage in the AWS Key Management Service
  • Logging – access (requests) and API calls to S3 via CloudTrail
Azure Blob Storage:
  • Access to blobs and containers is controlled via ACLs, which allow you to grant public access, and shared access signatures, which provide more granular access
  • Shared access signatures
  • Object and container ACLs
  • Logging – transactions, storage account blob and container size details
GCP Cloud Storage:
  • Access to objects and buckets is managed via access control lists (ACLs)
  • Signed URLs
  • Automatic server-side encryption; Google manages the cryptographic keys on your behalf
  • Client-side encryption; you manage your own encryption keys and encrypt data before writing it to Google Cloud Storage. In this case, your data is encrypted twice, once with your keys and once with Google’s keys
  • Logging – access (requests) and storage logs

Object consistency
AWS S3:
  • Provides read-after-write consistency for PUTs of new objects
  • Eventual consistency for overwrite PUTs and DELETEs
  (This applies to all regions apart from US Standard, which provides eventual consistency for all requests)
Azure Blob Storage:
  • Strong read-after-write consistency model for all PUT requests
  • Eventually consistent model for all list (GET) operations
GCP Cloud Storage:
  • Strong global consistency for all read-after-write, read-after-update, and read-after-delete operations (note: read-after-delete can be overridden if Cache-Control metadata has not been explicitly set to disable caching of the object)
  • List operations are eventually consistent

Uploading large objects
AWS S3: To load large objects use multipart upload, which allows you to upload a single object as a set of parts. Multipart upload allows the upload of parts in parallel to improve throughput, and smaller part sizes minimize the impact of restarting a failed upload due to a network error.
Azure Blob Storage: To upload large blobs use block blobs. Block blobs allow the upload of blobs larger than 64 MB, allow the upload of blocks in parallel, and allow the resumption of failed uploads by retrying only the blocks that weren’t already uploaded.
GCP Cloud Storage: Resumable upload

URI request (see the curl sketch after the table)
AWS S3: The location of your object is a URL, generally of the form http://bucket-name.s3.amazonaws.com/
Azure Blob Storage: For a blob, the base URI includes the name of the account, the name of the container, and the name of the blob:
http://youraccount.blob.core.windows.net/yourcontainer/yourblob
GCP Cloud Storage: The URI for accessing objects is storage.googleapis.com/yourbucket/yourobject or yourbucket.storage.googleapis.com/yourobject.
If using a CNAME alias to redirect requests, use c.storage.googleapis.com in the host name portion of the CNAME record

Programmatic access (check each provider’s website for languages supported)
AWS S3: To access programmatically, use the AWS SDK (various languages supported)
Azure Blob Storage: To access programmatically, use the Azure SDK (various languages supported)
GCP Cloud Storage: To access programmatically, use the Google Cloud SDK (various languages supported)

Custom domain support
AWS S3: To use a custom domain requires the use of CNAMEs or ALIAS records (if using Route 53)
Azure Blob Storage: To use a custom domain requires the use of CNAMEs
GCP Cloud Storage: To use a custom domain requires the use of CNAMEs

Ability to trigger a notification on a bucket action
AWS S3: Amazon S3 event notifications – can be configured to trigger on any event that results in the creation of an object, including PUTs, POSTs, COPYs, and when a multi-part upload is complete
Azure Blob Storage: No
GCP Cloud Storage: Object change notification – a notification event is sent when a new object is added to a bucket, an existing object’s content or metadata has been modified, or an object is deleted from a bucket

Lifecycle management
AWS S3:
  • Versioning
  • Object expiration actions
  • Object archival (migration to Glacier)
Azure Blob Storage: –
GCP Cloud Storage:
  • Versioning
  • Object deletion policies

Storage availability options
AWS S3:
  • Standard redundancy
  • Reduced redundancy
  • Amazon Glacier – for cold storage (infrequent access)
Azure Blob Storage:
  • Locally redundant storage
  • Zone redundant storage
  • Geo redundant storage
GCP Cloud Storage:
  • Standard storage class
  • Durable Reduced Availability

Host a static website
AWS S3: Yes
Azure Blob Storage: Yes
GCP Cloud Storage: Yes
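To make the URI request row above concrete, here is a hedged sketch using curl to fetch a publicly readable object from each store (the bucket, container, account and object names are all made up):

curl http://mybucket.s3.amazonaws.com/myobject                    # AWS S3
curl http://myaccount.blob.core.windows.net/mycontainer/myblob    # Azure Blob Storage
curl http://storage.googleapis.com/mybucket/myobject              # GCP Cloud Storage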

The pricing model is pretty much the same for all three, based on the amount of storage, the redundancy level selected, request operations and egress charges.

Note: If you believe I have missed something out please leave a comment and I’ll review and update accordingly.

Useful Links to start delving:


Keep it simple please – a small hope for the evolution of Docker!

Okay, Docker is probably past the hipster stage, and the recent announcements from the big players in the cloud playground get it to that next stage of “we’re not going away any time soon” respectability. All good, but…

I started playing around with Docker purely because I was intrigued as to why containers were suddenly the new hotness (container tech has been around for a while, after all). It was fun, easy to grok, and the potential is huge. It was also a nice thing to get my hands dirty with in my own time that wasn’t directly (at that time anyway) connected with my day job.

One of the first things I did when I started playing with Docker was test the mantra of write once, deploy anywhere: a simple Docker image I created on my Mac worked on AWS EC2 and Elastic Beanstalk, and I’ve tried it on Google Cloud too! It did what it said on the tin with no pain to speak of.

Creating and running simple Docker-based applications really is easy, but really exploiting the potential – creating microservices, knitting them together and building hugely distributed applications – is where I personally feel that Docker comes into its own! However, setting up the networking and managing that sort of distributed application architecture using Docker is less than easy, and a lot of side projects have popped up to address these pain points.

Basically, managing containers at scale is hard, and having players who can and have been running things at scale for years come in with managed services in this area is great. It saves disappointment setting in when the ease of developing the apps is replaced by the frustration and effort needed just to get those microservices running properly in a highly available, distributed configuration.

Deving (is this even a word?) locally, you can take your pick for starters among Chef, Puppet, Ansible, SaltStack, Vagrant, Fig, or just stick to boot2docker (with a little bash around it). There is even a pretty GUI-based solution just for use on your Mac.

Coming more from the Ops than the Dev side, I have always had a keen interest in the deployment and management of solutions, so I have managed some hands-on time with the likes of Kubernetes and Panamax so far (there really isn’t enough spare time to play around with everything I would like to). There is a long list of potential solutions in this area. The managed services side of things takes care of which solution in this area you choose, which imho kinda makes sense: you should just worry about your application and let someone else take care of managing it at scale, as focusing on that area ultimately gives you no business advantage!

This is what is great about Docker: you have this unit that you can use with whatever wrappers around it the ecosystem can come up with.

Recently, though, there have been some concerns about what Docker should become and how it should evolve, and my concern is that if they bloat it too much and add too many bells and whistles, the simple idea of build once, run anywhere won’t be so sweet.

Three areas where this concern has really bubbled up to the surface are:

The incorporation of Fig-like functionality into Docker itself (I like the way this one is developing)

Docker clustering

Docker extensibility 

The good thing, though, is that this is all being discussed in the open. Read the comments, see how the discussion is going with each of these, and join in the conversation too.

Docker needs to allow the ecosystem to thrive, and thus delivering functionality around Docker via a plugin approach has surely got to be the right route here. Otherwise we’ll start seeing forks, and the phenomenal momentum and support from the ecosystem may start splintering into different camps as Docker takes on more and cannot deliver on its original promise of “Build, ship and run any app anywhere” because the targets all run ‘optimised’ versions of Docker for their platforms.

Configuring Docker data storage – a 101

This is a short walkthrough on configuring Docker storage options on your development machine.

I’ll use my preferred version of Hello World on Docker – “Setting up mongoDB” – which lends itself nicely to a walkthrough of the storage options.

This walkthrough assumes basic familiarity with Docker. First let’s look at setting everything up on a single container.

I started from the Dockerfile described here:
mongoDB Dockerfile for demoing Docker storage options

Create the image using:

docker build -t mongodb .

You will note that in this Dockerfile we use the VOLUME command to define the target data directory for mongoDB

# Define the MongoDB data directory
VOLUME ["/data/db"]

I am walking through all this on my Mac, so I am using the following lean-and-mean command to start a mongodb container up as a background process (daemon) from the mongodb image created from the Dockerfile:

docker run -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

I can then add some data to a mongodb collection (see Data loading below). That is quick, and for some quick tests as part of an SDLC that might be fine, but having to recreate your database and reload it each time you create a container will eventually prove limiting.
We all know that you need representative datasets for a true test; it’s likely that your datasets are going to be more than 118 records, and reloading data every time you run up a mongodb container is not going to be practical!

So we have two options as to how to address the persistence requirements:

  1. Data volume
  2. Data volume container

Data Volume

We want to create a volume that maps to a folder on the local host. In my case I will be mounting a folder on my Mac called $HOME/mongodata (replace $HOME with your folder name if you are following this through on another OS).

We then create the container from the image, but the difference is that we now get the container to mount the local folder, using this command:

$ docker run -v $HOME/mongodata/:/data/db -p 27017:27017 --name mongo_instance_001 -d mongodb --noprealloc --smallfiles

Note that as VirtualBox shared folders do not support fsync() on directories, mongodb will not actually start, but you can validate that mounting a shared folder on the host works: the logs will show the error, and you will see that it created some files in the shared folder before it halted. This part of the walkthrough will work as expected using mongoDB on AWS EC2, for example, and if you are using VirtualBox it is perfectly valid for applications that do not require fsync().
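A quick way to confirm this behaviour (a sketch, using the container name and folder from above):

docker logs mongo_instance_001    # the mongod log should show the error before the container halted
ls -l $HOME/mongodata             # the files mongod created in the shared folder should be visible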

Data volume container

This option in my opinion is the most flexible.

First you need to create a data container

docker run -v /data/db --name mongodata busybox

The above creates a data volume container based on the busybox image (it’s a small image).
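Note that this container exits straight away (busybox has nothing long-running to do), but the volume it declares sticks around. A quick, hedged check:

docker ps -a | grep mongodata     # shows the exited busybox container that owns the /data/db volume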

Next you need to start up the application container, but this time mounting the volumes from the data container created earlier:

docker run -p 27017:27017 --name mongo_instance_001  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

Load some data into mongoDB (see Data loading below).

To validate that this works as expected, stop the first container, then start another container using a similar start-up command, attaching the same data volume container:

docker run -p 27017:27017 --name mongo_instance_002  --volumes-from mongodata -d mongodb --noprealloc --smallfiles

You can check that when you now connect to mongoDB and look at the databases and collections, the data you loaded using the previous container is available.
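One way to check from the host (a sketch – it assumes the mongo shell is installed locally and uses the demo database and periodictable collection from the Data loading section below):

mongo --port 27017 demo --eval "db.periodictable.count()"    # should report the number of documents loaded via the first container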

You can remove the application containers whenever you like and create new ones as required, mounting the data volume container. Note that the docker ps command does not give you any indication of which containers are mounted to the data volume container.
You can also tar the data volume and copy it to another Docker host; see the Docker docs for detail on the process.
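As a hedged sketch of that backup step (this follows the pattern in the Docker docs; the tar file name is made up):

docker run --rm --volumes-from mongodata -v $(pwd):/backup busybox tar cvf /backup/mongodata.tar /data/db    # writes a tarball of the data volume into the current directory on the host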

Data loading

I am assuming some familiarity with mongoDB. If you need a quick primer, have a look here: Getting started with mongodb

I am using a JSON file containing a dataset of the elements of the periodic table to populate my database. Here’s how I load my demo database with data:

mongoimport --db demo --collection periodictable  --type json --file periodictable.json  --jsonArray 

For the purposes of this walkthrough I am using images that are on my local machine rather than pushing up to a registry and pulling back down again.

This walkthrough has been focused on the practicalities of storage with Docker. For a deeper dive on storage, have a read of this excellent post, an overview of storage scalability in Docker, on the Red Hat developer blog.

Scaling out the security attack surface when using Docker – A timely reminder

With all the excitement over Docker, some folks seem to forget that it’s about more than just making life easy for developers. This stuff will need to be exposed to the big wide scary world, and exploits such as the Bash vulnerability will be dispersed over a wider landscape than just the hosts themselves!

Yes, you might point out that containers are being managed at scale by the likes of Google, but they have the resources to look after the infra so you don’t have to!

Remember that the tools and processes you use today to manage patches will need to be applied up the stack as well, and that means looking to your Docker images and containers too.

If you really are running immutable infrastructure and can afford to tear everything down and push updated Docker images out there, then that is an alternative path, although you still need to worry about the underlying hosts even in that scenario.

Daniel Walsh from Red Hat is writing a great series on Docker security and how Red Hat are dealing with the issues. It is a great read and brings a little sobering realism to the areas that still need to be thought about when deploying Docker-based solutions.

From Daniel’s posts I want to reiterate this list of good practice as a timely reminder (a couple of the points are sketched below the list):

  • Only run applications from a trusted source
  • Run applications on an enterprise-quality host
  • Install updates regularly
  • Drop privileges as quickly as possible
  • Run as non-root whenever possible
  • Watch your logs
  • Run setenforce 1 (keep SELinux in enforcing mode)
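Two of those points are easy to illustrate on the command line (a sketch – the uid and the image are illustrative, not prescriptive):

docker run --rm -u 1000 ubuntu id    # run the container process as an unprivileged uid rather than root
setenforce 1                         # on the Docker host, keep SELinux in enforcing mode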

Docker – what’s all the fuss about?

It’s been a while since I’ve blogged here, and as I’ve been looking at the hottest thing in “hipster tech” (see below for a definition), namely Docker, I thought I’d get that blogging mojo back by starting to share my thoughts on the subject!

For a detailed description of Docker there are plenty of great articles, slide decks and videos. The Docker site is a good starting point, and its what is Docker page has two diagrams that graphically depict what Docker is about versus a VM. This post from Sleekd discussing the difference between Docker and virtualization is also a nice background read for the layman, so I won’t be repeating a Docker 101 here. To set the scene, though, I summarise Docker like this:

  • It provides operating-system-level virtualization: containers run as user space on top of an operating system’s kernel, which makes them lightweight and fast
  • It uses resource isolation features of the Linux kernel, such as cgroups and kernel namespaces, to allow independent “containers” to run within a single Linux instance
  • It uses the power of Linux containers (LXC) (although it is more accurate to say it has evolved from there) and aufs (another union file system) to provide a way of packaging and process isolation
  • It allows you to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package
  • Docker allows applications to use the same Linux kernel as the system that they’re running on, and only requires applications to be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application (see the quick check after this list)
  • Ultimately, it should provide more certainty for application developers by providing a set of known abstractions that define how the application will run, no matter what hardware is underneath
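A quick check of the shared-kernel point above (a hedged sketch; it assumes the stock ubuntu image):

docker run --rm ubuntu uname -r    # reports the host's kernel version – the container has no kernel of its own
uname -r                           # run on the host for comparison; the two should match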

For a good initial deep dive on containers, my current favourite slide deck is this one: Inside Docker for Fedora20/RHEL7

Okay, let’s start with a list of the fundamental issues that need to be solved first for mere mortals (see below for a definition) to really get on board. No depth, just headlines at this point, with pointers to more info – else I’d never have got this post out to kickstart blogging again!

It’s early days, and Docker has a blossoming, growing ecosystem. This lovely mindmap makes a nice stab at illustrating the ecosystem that has been building around it (personally I would not have placed some of the tech in the sections they have been put in, but hey, it’s still lovely). Equally, it shows the bewildering choices that have yet to be made with regard to which approaches, if any, will win out and what may or may not suit your environment.

The potential, however, is huge, and with a list of USPs like the one below you can begin to understand the rallying round and the fuss.

  • Simplifying the use of containers (container technology is not new, despite the hype)
  • Micro services
  • Portability
  • Immutable infrastructures
  • PaaS solutions

In upcoming posts I’ll discuss some of the ecosystem tools where I’ve had hands-on time (I promise it won’t take years though), some of the issues, and the USPs in more depth.

This post was just to get me into blogging mode again !

I won’t, however, be neglecting my passion for DevOps (it’s not just about the tools, stoopid! – although I’ll probably be talking about the tools a lot!!)

Definitions:

My definition of hipster tech – the latest cool idea in tech where the hype has overtaken the reality, but the potential is very high on the potential-ometer.

Thanks to @chrismunns for the succinct definition of mere mortals – those running responsible and performant production environments
(my spin: i.e. no time to get distracted by debugging and feeding & watering the tools they use to deploy those solutions, but focused on delivering value to their business).


Windows in the cloud – a 1st class citizen

The perception is that Windows instances in the cloud are often treated as second-class citizens. This is just not true. Both Opscode and Puppet Labs have made great strides in making their configuration management tools ‘Windows friendly’ (disclaimer: I’ve used Chef with Windows but have no actual experience with Puppet). To add to this, Amazon Web Services introduced CloudFormation-friendly Windows base AMIs. The combination of these AMIs with the more Windows-friendly configuration management tools means you really can treat Windows instances as you would Linux instances and use the same tools to manage both.

You can use PowerShell as you would normally, so the learning curve isn’t as steep as you’d expect as a Windows administrator. Go on, give it a go.

If you have an estate that is made up of both Windows and Linux, starting from a point where you can use the same tools to manage both environments makes life easier for your Operations/DevOps team, or whatever label you place on the team that makes sure your systems are up and running each day.

One tool to manage them all


I’ve been waiting to give the chef-client MSI a try ever since I noticed it had been released. I wanted to see if it really has made the numerous (albeit fairly straightforward) steps to get chef-client working on Windows 2008 R2 that much easier. After all, the easier it becomes, the more converts there will be as the barriers to adoption are removed.

Running the MSI is simple. It takes care of installing Ruby (version 1.9.2-p290) and installing chef-client. Now all you need to do is set up a couple of files to allow your client to authenticate with your Chef server, as detailed quite nicely here:

http://wiki.opscode.com/display/chef/Installing+Chef+Client+on+Windows

That’s it – you’re good to go. First impressions: a big thumbs up.

I then had a quick look into how things have got better in the Windows recipe development department. I started by checking out:

The Opscode-supported windows cookbook

This is looking really promising, as the ability to install roles & features and, more importantly, install MSIs can be treated in the same way as you would install services and packages on Linux. Meaning you could actually have one person who is capable of writing high-level recipes for both platforms. You will always need someone who understands the target OS, but this just means you can get admin staff using Chef (yeah, I know I’m talking about the hosted version) and it doesn’t really matter which OS they are more comfortable with. Opscode, in my humble opinion, have removed a layer of obstruction to adoption by the work they’ve done here.

Blathering on

Last night (8th Sept 2011) I had the privilege of being interviewed, together with my ex-colleague and good friend James, by Richard & Carl of .NET Rocks! to talk about DevOps: how we see it evolving, what we think it means, and the effect it is having on the industry, just for starters. The fact this is on a show aimed at .NET developers is a statement in itself.

It’s due to be broadcast on 13th September 2011. It will be interesting to see what folks think, as it’s not the usual type of subject covered. James & I are both passionate about the subject, and I must confess I kept forgetting during the interview that I wasn’t there just to listen to James, but his eloquence about the subject, with or without the DevOps label, is always compelling. Please feel free to blame Howard of endjin for suggesting letting James & me loose on your ears though :)

The interview is here:  .NET Rocks! Show 697

Targeting Windows 2008 R2 nodes using chef

Just a quick note.

I’d advise sticking to Ruby 1.8.7-p344 on the target node if you are targeting Windows 2008 R2. I recently revisited targeting Windows 2008 R2 and found that using the latest version of Ruby, 1.9.2-p180, on the Windows 2008 R2 target node and attempting to run chef-client after installing the chef gem is a proverbial pain. I’m not sure if Opscode are looking into this, but it’s easy to reproduce the pain :-)