The term "containers" has become popular in recent times, thanks to Docker. However, the idea of containers has been around for a long time, through things like Solaris Zones, Linux Containers, etc. (even though the underlying implementations are different). In this post, I try to give a small overview of the container ecosystem (as it stands in 2017), from my perspective.
This post is written in response to a question by hacker extraordinaire Varun on what one should know about Containers as of today. Though the document is mostly generic, some parts of it are India-specific, which I have highlighted clearly. Please mention in the comments if there is anything else that should have been covered, if I have made any mistakes, or if you have any opinions.
So, what exactly are Containers?
Containers are a unit of packaging and deployment that guarantees repeatability and isolation. Let us see what each part of that sentence means.
Containers are a packaging tool like RPMs or EARs, in the sense that they offer you a way to bundle up your binaries (or sources, in the case of interpreted languages). But instead of merely archiving your sources, Containers also provide a way to deploy your archive, and to do so repeatably.
Anyone who has done packaging knows how much pain dependency hell can cause. For example, an application A needs a library L of version 0.1, whereas another application B needs the same library L but of version 0.3. Just to screw up the life of packagers, versions 0.1 and 0.3 may conflict with each other and may not be able to co-exist on a system, even in different installation paths. Containerising your applications puts A and B each into their own bundle, with their own library dependencies. However, the real power of containerising is that each of your applications, A and B, gets an isolated view in which it appears to be running in a private environment, so L 0.1 and L 0.3 never share any runtime data.
One may be reminded of Virtual Machines (VMs) while reading the above text. VMs solve the same isolation problem, but they are very heavy. The fundamental difference between a VM and a Container is this: a VM virtualizes/abstracts the hardware/operating system and gives you a machine abstraction, while a Container virtualizes/abstracts an application of your choice. Containers are thus very lightweight and far more approachable.
The Ecosystem
Docker is the most used container technology today. There are other container runtimes too, such as rkt. There is an Open Containers Initiative to create standards for container runtimes. All these container runtimes make use of Linux kernel features, especially cgroups (together with namespaces), to provide process isolation. Microsoft has been making a lot of effort for quite some time now to support containers natively in the Windows kernel and as part of their Azure cloud offering.
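To make the cgroups mention a little more concrete: on Linux, every process can see which cgroups it belongs to by reading /proc/self/cgroup, where each line has the form "hierarchy-ID:controller-list:cgroup-path". The following is only a minimal sketch of a parser for that file. The function and class names are my own, and the sample paths used with it are made up for illustration.

```python
from pathlib import Path
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_file(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may itself contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                hierarchy_id=int(hierarchy_id),
                # On cgroup v2 the controller list is empty ("0::/path").
                controllers=[c for c in controllers.split(",") if c],
                path=path,
            )
        )
    return entries


if __name__ == "__main__":
    cgroup_file = Path("/proc/self/cgroup")
    if cgroup_file.exists():  # the file is only present on Linux
        for entry in parse_cgroup_file(cgroup_file.read_text()):
            print(entry)
```

Running this inside and outside a container on the same machine will typically show different cgroup paths, which is one visible trace of the runtime placing the containerised process into its own group.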
Container Orchestration is a way of deploying different containers on a bunch of machines. While Docker is arguably the champion of container runtimes, Kubernetes is inarguably the King/Queen of container orchestration. Google has been using containers in production since long before it became fashionable. In fact, the first patch for cgroups support in the Linux kernel was submitted to LKML by Google as far back as 2006. Google had (and has) a large-scale cluster management system named Borg, which deployed containers (not Docker containers) across the humongous Google cloud farm. Kubernetes is an open source evolution of Borg, supporting Docker containers natively. Docker Swarm is an attempt by Docker (the company behind the Docker project) to achieve container orchestration across machines, but it simply is no competition for Kubernetes in terms of quality, documentation or feature coverage (in my limited experience).
In addition to these, there are some poorly implemented, company-specific tools that try to emulate Kubernetes, but these are mostly technical debt, and it is wise (imho) for companies to ditch such efforts and move to open projects backed by companies like Google, Red Hat and Microsoft. A distinguished engineer once told me, "There is no compression algorithm for experience", and there is no need for us to repeat the mistakes made by these companies decades ago. If you are a startup focussing on solving a user problem, you should focus on your business problem, and container orchestration software should be the last thing that you need to implement.
Kubernetes, though initially a Google project, has now attracted a lot of contributors from a variety of companies such as Red Hat, Microsoft, etc. Red Hat have built OpenShift, a platform that provides a lot of useful features, such as Pipelines, Blue-Green deployments, etc., on top of Kubernetes. They even offer a hosted version.
Tectonic (on top of Kubernetes) by CoreOS is also a big player in this ecosystem (at least in terms of developer mindshare).
SUSE has recently come up with the Kubic project for containers (though I have not played with it myself).
Microsoft have hired some high-profile names in the container ecosystem (including people like Brendan Burns, Jess Frazelle, etc.) to work on Kubernetes and the Azure cloud. Azure is definitely way ahead of Google in India when it comes to cloud business. Its pricing page is localised for India, while Google does not even support Indian currency yet and charges in USD (leading to jokes like the oil/dollar conspiracy among the Indian startup ecosystem ;) ). AWS and Azure definitely have a bigger developer mindshare in India than Google Cloud Platform (as of 2017).
The founding team of Kubernetes (Xooglers) have started a company named Heptio. While I have no doubts about their engineering prowess, I worry that relying on such companies may be risky for startups in India (lack of support in the same timezone, etc.). If you are in the west, these options (and others such as Rancher) may be interesting.
Kubernetes Basics
In Kubernetes, the unit of deployment is a Pod. A Pod is merely a collection of Docker containers that are always deployed together. For example, if your application is an API server that makes use of a Redis cache before hitting the database for each request, you create a Pod with two containers, an API server container and a Redis container, and you deploy them together.
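The two-container Pod described above can be sketched as a manifest. Kubernetes manifests are usually written in YAML, but kubectl also accepts JSON, so the sketch below builds the manifest as a Python dict and prints it as JSON. The Pod name, container names and image names are hypothetical, chosen only for illustration.

```python
import json


def make_pod_manifest(name, containers):
    """Build a Pod manifest as a plain dict.

    `containers` maps a container name to the image it should run.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {"name": cname, "image": image}
                for cname, image in containers.items()
            ]
        },
    }


# Hypothetical names and images, for illustration only.
pod = make_pod_manifest(
    "api-with-cache",
    {
        "api-server": "example.com/my-api-server:1.0",
        "redis": "redis:3.2",
    },
)

if __name__ == "__main__":
    print(json.dumps(pod, indent=2))
```

If the output is saved as, say, pod.json, it could be submitted to a cluster with `kubectl create -f pod.json`. Containers in the same Pod share a network namespace, so in this setup the API server would be able to reach Redis on localhost.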
Kubernetes refers to an umbrella of projects that run on a cloud, to manage a cloud. It has various components, such as an API server to interact with the Kubernetes system, an agent named kubelet that runs on each machine in the cloud, a fluentd-like daemon to accumulate logs from various containers and provide a single point of access, a web dashboard, a CLI tool named kubectl to perform various operations, etc. In addition to these Kubernetes-specific components, there are also other services, such as the distributed key-value store etcd (originally from CoreOS), that you need in order to set up a basic Kubernetes cluster. However, if you are a small company, it will be wise to make use of GKE, Azure hosting or OpenShift hosting instead of deploying your own Kubernetes system managed by your own admins. It is not worth the hassle.
If you want to play with Kubernetes on your development laptop (unless you can afford to treat production as your test box), there is a tool named minikube to help you with that. If you are an application developer and are considering dockerizing and deploying your application, then minikube is definitely the best place to start.
There are quite a few Kubernetes meetups happening all around the world. Visiting some of these may be enlightening. The webinar series by Janakiram was good, but it was a little too long for my taste and I lost interest halfway. The persistent ones among you may find it very useful.
Docker Compose
One of the tools from the Docker project that I love a lot is the handy Docker Compose. It is a tool to work with multiple containers. In a sense, it is somewhat like Kubernetes Pods, but without having to install or manage the heavyweight Kubernetes ecosystem. I use Docker Compose extensively in CI, where it is the perfect fit for end-to-end testing of a web stack, if your sources are in a monolithic repository. In your CI system, you can bring up all your components (say, an API server, a database and a front-end node server) and perform end-to-end testing (say, via Selenium). In fact, I cannot fathom how I was doing CI earlier without docker-compose (just like I cannot fathom how I used CVS before git, etc.).
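The CI flow described above (bring the stack up, run the end-to-end tests, tear everything down) can be sketched as a small wrapper around the docker-compose CLI. This is only a sketch under assumptions: the project name, the compose file name and the helper function names are my own, and the test command is whatever your CI actually runs.

```python
import subprocess


def compose_command(*args, project="ci", compose_file="docker-compose.yml"):
    """Return a docker-compose command line as an argument list."""
    return ["docker-compose", "-p", project, "-f", compose_file, *args]


def run_end_to_end(test_command, runner=subprocess.run):
    """Bring the stack up, run the tests, and always tear the stack down.

    Returns the exit code of the test command.
    """
    # Build the images and start all services in the background.
    runner(compose_command("up", "-d", "--build"), check=True)
    try:
        return runner(test_command).returncode
    finally:
        # -v also removes volumes, so that each CI run starts clean.
        runner(compose_command("down", "-v"))
```

A CI job could then call something like `run_end_to_end(["pytest", "e2e"])` (a hypothetical test command) and fail the build on a non-zero return value. The `runner` parameter exists only so that the flow can be exercised without Docker installed.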
AWS
No blog post on cloud technologies would be complete without mentioning the 800-pound gorilla, Amazon Web Services. Amazon supports containers natively. You can deploy either single-container or multi-container setups natively, via Amazon Beanstalk. It is very similar to Google App Engine (if you have used it). Beanstalk is a PaaS offering; it takes a Container image and scales it automagically depending on various factors (such as CPU usage, HTTP traffic, etc.). I have run Beanstalk and am very satisfied with it (perhaps not as much as with App Engine, though). It is very reliable, performant and scales well (tested for a few hundred users, in my limited experience).
For larger workloads, and for those who want more control, Amazon offers the Elastic Container Service. You can create a bunch of EC2 instances and a bunch of Containers, and ask ECS to run these containers on these VMs in the way that you prefer. This, however, locks you in to the AWS platform (unlike k8s).
Neither Beanstalk nor ECS costs anything extra beyond the price of the VMs, which you already pay.
I do, however, wish that Amazon would start supporting Kubernetes natively. There are other ways to make use of Kubernetes on AWS. The most enterprisey is probably Tectonic by CoreOS, but we also have projects like kube-aws and kops.
Conclusion:
If you have actually read until this point, thanks a lot :-) I could have written in a little more detail about the nuts and bolts of container technology, but I believe that this post, as is, will be good material for a 101 type of introduction. Also, there are people with far more working knowledge than me who are better equipped to write about the details. So, I have left it as an exercise for the readers to find such talks, blogs or books :)