A real quick quick start with Google Cloud Platform command line tools

Getting started with Google Cloud Platform (GCP) is actually very easy, but as with getting started with anything, sometimes you just want a quick 101 of the essential steps to set you on your merry way.

Note: this post is to help you get up and running as quickly as possible, but it does assume that you have read the best practices for configuring permissions on GCP.

Developing and deploying applications on GCP is arranged around projects, and thus understanding how you intend to set up development and admin access for projects is an important initial step.

The next thing to do is to sign up for a Google Cloud account via the Sign Up for Google Compute Engine page.

Now that you are ready to get started, open a terminal session.

Now that you’ve signed up, download the GCloud SDK by following the gcloud compute instructions for your OS.

Then go through the authorisation process by typing:

gcloud auth login

The GCloud SDK uses Google’s OAuth 2.0 service to authorize users and programs to access Google APIs. See Managing authentication and credentials for the details.
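
Once the browser flow completes, you can check which accounts have credentials stored locally (gcloud auth list is part of the SDK):

# show which accounts have credentials stored locally and which one is active
gcloud auth list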

The GCloud SDK actually bundles the individual command line tools for each service as well as the appropriate SDKs for each supported language. The modularity is great but can be a bit confusing at first.

So what exactly do you get when you download the Google Cloud Platform SDK?
If you’re following along, type the following:

gcloud components list

This gives you a list of the modules for each service.

For those components that you do not have installed, or that need updating, use the command:

gcloud components update component-name


So, for example, to update the App Engine SDK for Go I would type:

gcloud components update gae-go


Next, make sure you are working in the correct project. Type

gcloud config list

to check what the default project is currently set to.
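
On my machine this prints something like the following (illustrative values only; the exact properties shown depend on what you have configured):

[core]
account = you@example.com
project = your-default-project-id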

You can set a different default project by typing

gcloud config set project YourProjectID

You can run this at any time to reset the default project. If you are working on more than one project you will need to specify the non-default project explicitly, and where or when you do this depends on the command.
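
For example, most gcloud commands accept a global --project flag, bq accepts a global --project_id flag, and gsutil mb takes -p, so you can target a project other than the default for a single command. The project and bucket names below are made-up placeholders:

# target a non-default project for one command (placeholder project ID)
gcloud compute instances list --project my-other-project-id

# bq uses --project_id rather than --project
bq --project_id=my-other-project-id ls

# gsutil buckets are tied to a project when created
gsutil mb -p my-other-project-id gs://some-new-bucket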

Note: you will probably have to activate any services you need for a particular project by using the console. Make sure you are in the project you wish to activate the service for, then select APIs under APIs & auth and set the status to ON for the services you want activated for that project.
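
Newer releases of the SDK also let you enable an API from the command line; the exact command group has changed over time, so treat this as a sketch and check gcloud help if it is not available in your version:

# enable an API for the current default project (service name shown is just an example)
gcloud services enable bigquery.googleapis.com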

Each GCP product has its own set of commands and you need to use the appropriate set to interact with each service. See the list produced by the

gcloud components list

command.

For example, to interact with BigQuery you use bq, and for Cloud Storage you use gsutil.
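
As a quick sanity check that both tools are talking to your default project, you can simply list what already exists (on a brand new project these return nothing):

# list the Cloud Storage buckets in the default project
gsutil ls

# list the BigQuery datasets in the default project
bq ls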

Below is an example showing how easy it is to get started, listing the set of commands that I used to upload a CSV file to Cloud Storage using gsutil, load the data into BigQuery using bq, and start querying that data.

First I created a schema, as this needs to be passed to bq when creating the table. A cut-down sketch of such a schema file is shown below.
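
The real ptable_schema.json describes every column in the CSV, but as an illustration, the three fields used later in this post (the element name, its symbol and its atomic number Z) would look something like this:

# create a cut-down schema file (illustrative only; the real file lists every CSV column)
cat > ptable_schema.json <<'EOF'
[
  {"name": "name",   "type": "STRING"},
  {"name": "symbol", "type": "STRING"},
  {"name": "Z",      "type": "INTEGER"}
]
EOF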
Then I uploaded my data set to Cloud Storage:

gsutil cp PeriodicTableDataSet.csv gs://my-periodictable-bucket

Next I created a BigQuery dataset called elements:

bq mk elements

Then, in a single command, I created a table and loaded it with my dataset:

bq load elements.ptable gs://my-periodictable-bucket/PeriodicTableDataSet.csv ptable_schema.json

This returns a success status like the one below if everything is okay:
Waiting on bqjob_r1c83caf93cc4a0db_0000014a0057532d_1 … (27s) Current status: DONE
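
At this point bq show will confirm that the table exists and display its schema:

# display the schema and details of the newly loaded table
bq show elements.ptable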

So, now that I had signed up, I could start querying my data after just 3 steps (4 if you include creating the schema):

bq query "SELECT name, symbol from elements.ptable where Z >100"

Waiting on bqjob_r64487038b0ec017d_0000014a005b650e_1 ... (0s) Current status: DONE
+----------------+--------+
|      name      | symbol |
+----------------+--------+
| Mendelevium    | Md     |
| Nobelium       | No     |
| Lawrencium     | Lr     |
| Rutherfordium  | Rf     |
| Dubnium        | Db     |
| Seaborgium     | Sg     |
| Bohrium        | Bh     |
| Hassium        | Hs     |
| Meitnerium     | Mt     |
| Darmstadtium   | Ds     |
| Roentgenium    | Rg     |
| Ununbium       | Uub    |
| Ununtrium      | Uut    |
| Ununquadium    | Uuq    |
| Ununpentium    | Uup    |
| Ununhexium     | Uuh    |
| Ununseptium    | Uus    |
| Ununoctium     | Uuo    |
+----------------+--------+

Help is available by typing command --help or command help.
Thus for Cloud Storage you would type gsutil --help, and for BigQuery bq --help.

So, as you can see, within a few minutes of setting up your account you are ready to start using the command line tools for GCP and getting immediate payback.


A comparison of Cloud object stores

 

 

This is an update to my 2011 summary table comparing key features of Amazon Web Services (AWS) S3 and Microsoft Azure (Azure) blob storage. I’ve also expanded it to cover more features added since then, and I have now included Google Cloud Platform (GCP) Cloud Storage.

All data is collated from information available on public sites (so you don’t have to) and reflects what you as the consumer see as an out-of-the-box experience. Anything not available using just the SDKs, command line tools or console without requiring third party libraries is not covered – for example I do not include the various solutions on GitHub such as the Azure encryption extensions.
This is focused on the storage of immutable objects that are typically used for website static objects and Big Data projects. It does not cover any features specific to the storage of AWS EBS snapshots, Azure page blobs (which are not immutable) or GCP Compute Engine images.

Costs are not included, as these change faster (happily, downwards) than I ever update my blog posts.

To keep this to a sane length I haven’t provided lots of explanatory notes; I leave it to readers to delve deeper as required.

Note: this is not an opinionated post, but hopefully you find it a helpful table that assists in decision making.

 

Each feature below is compared across AWS Simple Storage Service (S3), Azure Blob Storage and GCP Cloud Storage.

Namespace considerations
  AWS S3:
  • Activating S3 is associated with an AWS account, but the account name is NOT associated with the namespace of the objects stored on S3
  • The bucket name you choose must be unique across all existing bucket names in Amazon S3
  Azure Blob Storage:
  • A storage account is a globally uniquely identified entity within blob storage. The account is the parent namespace for the Blob service
  GCP Cloud Storage:
  • Activating Cloud Storage is associated with a project, but the project name or ID is NOT associated with the namespace of the objects stored on Cloud Storage
  • Every bucket must have a unique name across the entire Google Cloud Storage namespace

How objects are grouped together
  AWS S3: Objects are placed in containers called buckets
  Azure Blob Storage: Objects are placed in containers called containers
  GCP Cloud Storage: Objects are placed in containers called buckets

Definition of an object
  AWS S3: An object is a file and optionally any metadata that describes that file
  Azure Blob Storage: An object is represented by a blob. A blob is made up of resources that include content, properties, and metadata
  GCP Cloud Storage: Objects have two components: object data and object metadata
Limits
  AWS S3:
  • An account can have a maximum of 100 buckets
  • A bucket can store an unlimited number of objects
  • Maximum object size = 5 TB
  Azure Blob Storage:
  • An account can contain an unlimited number of containers
  • A container can store an unlimited number of blobs
  • Up to 500 TB of total storage per account
  • A single subscription supports up to 50 storage accounts
  • Maximum block blob size = 200 GB
  • Maximum page blob size = 1 TB
  GCP Cloud Storage:
  • There is no limit on the number of buckets that you can create in a project
  • There is no limit on the number of objects that you can create in a bucket
  • Maximum object size = 5 TB

Interacting with buckets and objects
  AWS S3: Interaction with buckets and objects is via the REST API
  Azure Blob Storage: Interaction with containers and blobs is via the REST API
  GCP Cloud Storage: Interaction with buckets and objects is via the REST API
Bucket naming
  AWS S3: The bucket name you choose must be unique across all existing bucket names in Amazon S3. Bucket names must comply with the following requirements:
  • Can contain lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
  • Must start with a number or letter
  • Must be between 3 and 255 characters long
  • Must not be formatted as an IP address (e.g., 192.168.5.4)
  To conform with DNS requirements, AWS recommends following these additional guidelines when creating buckets:
  • Bucket names should not contain underscores (_)
  • Bucket names should be between 3 and 63 characters long
  • Bucket names should not end with a dash
  • Bucket names cannot contain two adjacent periods
  • Bucket names cannot contain dashes next to periods
  Azure Blob Storage: The container name must be unique within a storage account. The container name must be a valid DNS name, conforming to the following naming rules:
  • Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character
  • Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names
  • All letters in a container name must be lowercase
  • Container names must be from 3 through 63 characters long
  • Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two
  GCP Cloud Storage: Every bucket must have a unique name across the entire Google Cloud Storage namespace.
  • Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification
  • Bucket names must start and end with a number or letter
  • Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters
  • Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4)
  • Bucket names cannot begin with the “goog” prefix
  • Bucket names cannot contain “google” or close misspellings of “google”
  • If creating a bucket with a custom domain (e.g. ending .com, .co.uk etc.) then domain name verification will be part of the process
  For DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash.
Object naming
  AWS S3:
  • Flat structure
  • The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long
  • Can infer a logical hierarchy using key name prefixes and delimiters. You can use the delimiter ‘/’ to represent a folder
  Azure Blob Storage:
  • Flat storage scheme, not a hierarchical scheme
  • A blob name can contain any combination of characters, but reserved URL characters must be properly escaped
  • A blob name must be at least one character long and cannot be more than 1,024 characters long
  • You may specify a delimiter such as “/” within a blob name to create a virtual hierarchy
  GCP Cloud Storage:
  • Flat namespace to store objects
  • Object names can contain any combination of Unicode characters (UTF-8 encoded) less than 1024 bytes in length
  • By using “/” in an object name, you can make objects appear as though they’re stored in a hierarchical structure

Nesting
  AWS S3: You cannot nest buckets
  Azure Blob Storage: You cannot nest containers
  GCP Cloud Storage: You cannot nest buckets

Locality
  AWS S3: Buckets can be created in specific regions
  Azure Blob Storage: Storage accounts can be created in specific regions
  GCP Cloud Storage: Buckets can be created in a specific geographical region. Regional buckets are in alpha at the time of writing
Security
  AWS S3:
  • Access to objects and buckets is managed via access control lists (ACLs) and bucket policies. You can use them independently or together
  • Query string authentication
  • Server-side encryption with customer-provided keys
  • Server-side encryption with Amazon S3 key management
  • Encrypt your data at rest using keys that you manage in the AWS Key Management Service
  • Logging – access (requests) and API calls to S3 via CloudTrail
  Azure Blob Storage:
  • Access to blobs and containers is controlled via ACLs, which allow you to grant public access, and shared access signatures, which provide more granular access
  • Shared access signatures
  • Object and container ACLs
  • Logging – transactions, storage account blob and container size details
  GCP Cloud Storage:
  • Access to objects and buckets is managed via access control lists (ACLs)
  • Signed URLs
  • Automatic server-side encryption; Google manages the cryptographic keys on your behalf
  • Client-side encryption: you manage your own encryption keys and encrypt data before writing it to Google Cloud Storage. In this case your data is encrypted twice, once with your keys and once with Google’s keys
  • Logging – access (requests) and storage logs
Object consistency
  AWS S3:
  • Provides read-after-write consistency for PUTs of new objects
  • Eventual consistency for overwrite PUTs and DELETEs
  • This applies to all regions apart from US Standard, which provides eventual consistency for all requests
  Azure Blob Storage:
  • Strong read-after-write consistency model for all PUT requests
  • Eventually consistent model for all list (GET) operations
  GCP Cloud Storage:
  • Strong global consistency for all read-after-write, read-after-update, and read-after-delete operations (note: read-after-delete consistency can be overridden if Cache-Control metadata has not been explicitly set to disable caching of the object)
  • List operations are eventually consistent

Uploading large objects
  AWS S3: To load large objects use multipart upload, which allows you to upload a single object as a set of parts. Parts can be uploaded in parallel to improve throughput, and smaller part sizes minimize the impact of restarting a failed upload due to a network error
  Azure Blob Storage: To upload large blobs use block blobs. Block blobs allow the upload of blobs larger than 64 MB, allow blocks to be uploaded in parallel, and allow failed uploads to be resumed by retrying only the blocks that weren’t already uploaded
  GCP Cloud Storage: Resumable upload
URI request
  AWS S3: The location of your object is a URL, generally of the form http://bucket-name.s3.amazonaws.com/
  Azure Blob Storage: For a blob, the base URI includes the name of the account, the name of the container, and the name of the blob: http://youraccount.blob.core.windows.net/yourcontainer/yourblob
  GCP Cloud Storage: The URI for accessing objects is storage.googleapis.com/yourbucket/yourobject or yourbucket.storage.googleapis.com/yourobject. If using a CNAME alias to redirect requests, use c.storage.googleapis.com in the host name portion of the CNAME record
  (There are worked curl examples of these URI forms just after the table.)

Programmatic access (check each website for the languages supported)
  AWS S3: use the AWS SDK (various languages supported)
  Azure Blob Storage: use the Azure SDK (various languages supported)
  GCP Cloud Storage: use the GCloud SDK (various languages supported)

Custom domain support
  AWS S3: requires the use of CNAMEs, or ALIAS records if using Route 53
  Azure Blob Storage: requires the use of CNAMEs
  GCP Cloud Storage: requires the use of CNAMEs

Ability to trigger notifications against bucket actions
  AWS S3: Amazon S3 event notifications can be configured to trigger on any event that results in the creation of an object, including PUTs, POSTs, COPYs, and when a multi-part upload is complete
  Azure Blob Storage: No
  GCP Cloud Storage: Object change notification – a notification event is sent when a new object is added to a bucket, an existing object’s content or metadata has been modified, or an object is deleted from a bucket
Lifecycle management
  AWS S3:
  • Versioning
  • Object expiration actions
  • Object archival (migration to Glacier)
  GCP Cloud Storage:
  • Versioning
  • Object deletion policies

Storage availability options
  AWS S3:
  • Standard redundancy
  • Reduced redundancy
  • Amazon Glacier – for cold storage (infrequent access)
  Azure Blob Storage:
  • Locally redundant storage
  • Zone redundant storage
  • Geo redundant storage
  GCP Cloud Storage:
  • Standard storage class
  • Durable Reduced Availability

Host static website
  AWS S3: Yes
  Azure Blob Storage: Yes
  GCP Cloud Storage: Yes
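
To make the URI request row above concrete, here is roughly what fetching a publicly readable object looks like with plain curl. All of the bucket, account, container and object names below are made-up placeholders, and the object must be publicly accessible for an unauthenticated GET to work:

# AWS S3: virtual-hosted style URL (placeholder names)
curl http://my-example-bucket.s3.amazonaws.com/logo.png

# Azure Blob Storage: account/container/blob path (placeholder names)
curl http://myexampleaccount.blob.core.windows.net/mycontainer/logo.png

# GCP Cloud Storage: path style URL (placeholder names)
curl http://storage.googleapis.com/my-example-bucket/logo.png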

The pricing model is pretty much the same for all three, based on the amount of storage, the redundancy level selected, request operations and egress charges.

Note: If you believe I have missed something out please leave a comment and I’ll review and update accordingly.

Useful Links to start delving: