Towards a Strong Mental Model of Docker

TL;DR I'm a full stack engineer who works almost exclusively on Macs. I don't have a computer science degree nor do I have strong experience with Linux architecture. This post documents my journey of learning Docker.

What We'll Learn

If you stick with this blog post, you'll have decent answers for the following questions. If you've already used Docker, these questions will strengthen your understanding.

  1. What is "Docker"?
  2. What is a "container"? Did Docker invent containers?
  3. What does it mean to "run a container"?
  4. Are containers stateful? What does container state even mean?
  5. Do containers run only one process? Or can they run multiple processes?
  6. What exactly is "Docker for Mac"? I thought Docker was cross platform?
  7. Does Docker work for Windows development?
  8. What are "docker-compose" and "BuildKit"? Why are there so many different applications?
  9. Most Docker images start with something like FROM ubuntu. Is my container running a full Ubuntu operating system? If not, why is it from Ubuntu? In fact, what does it even mean to "run an operating system?"

The summarized answers for all of these questions are at the end of this post.

Towards a Strong Mental Model

How might you explain React.js to a newcomer? You could say:

React is a JavaScript library for building user interfaces

I don't think this would help a newcomer form a mental model of React, as it's too high level. You could also say:

React is a declarative DSL that outputs a tree-like "virtual DOM," mutating the DOM (or another view target) with O(n) reconciliation performance, as a pure function of component state and props

And I think, justifiably, the newcomer would slap you in the face, because you just told them to go to hell.

Something more approachable is:

React is mainly a JavaScript library that takes in data, like a username, and spits out HTML based on that data. It gives you a performant way to update the view in real time based on changes to that data. React can do a lot more than this, but the most common use case is taking in data and spitting out HTML.

Does this accurately describe all of React? No, and once someone dives in, they'll have to learn the intricacies of the API. But it's more approachable.

I find the Docker documentation is written by, and targeted to, people who already have a deeper understanding of Linux fundamentals, virtualization, and kernels versus operating systems.

If someone tells you that Docker is "cgroups and namespaces" with no other information, it's technically accurate, but not approachable.

Why Care About Learning Docker?

If you just want to build web servers, why care about Docker at all? Well, I've personally encountered many of the issues containers claim to solve:

  • Requiring multiple versions of userland libraries (like Node, and also npm packages) in different repositories, causing conflicts for local development because they aren't isolated
  • Requiring multiple service or database versions causing conflict in local development (have you ever tried to run two versions of MySQL on a Mac using Homebrew?)
  • Running into dependency installation issues when checking out both old and new projects (Ruby / Rails are particularly bad at this, damn you Nokogiri). Who hasn't run an old project only to learn everything is broken?
  • Needing to easily run multiple services for local development, some of which need multiple commands (such as both a server process and a static file builder process like Webpack)
  • Losing parity between environments due to version or process execution discrepancies, making it impossible to reproduce situations that cause some production bugs

What This Post Isn't

This post isn't a deep dive on Docker's API. We won't explain the syntax of Dockerfiles, docker-compose.yml files, nor the CLI. We also won't talk much about "orchestration," the term used to describe running multiple containers and their dependencies and dealing with things like networking and automatically restarting containers.

From Zero

What is Docker?

Believe it or not, let's start with the Docker logo. It's great for our mental model!

The Docker Logo, which is a whale with a stack of shipping containers on its back

We have a whale carrying some shipping containers. (His name is "Moby Dock," Moby for short). He's a whale because, a long time ago, he beat out other animals in a community logo design contest.

In our mental model, the whale is Docker. Docker is a platform. It's not one thing, it's a collection of different technologies and standards that provide a platform to run containers! The whale is supporting, or "running" containers. We'll learn more about "containers" soon.

Key concept: Docker is a platform, including online hosting, and a suite of local tools, like docker-compose and the Docker engine.

When learning Docker, you might wonder what the "right" or "Docker" way is. By viewing Docker as a platform, we realize there are multiple ways to do things. It's like Git: Git is a platform and suite of command line tools. In Git there are different ways to use the suite of tools. Some people use forks, some use feature branches, some use rebasing, some use merging. Docker provides you tools to build "images" and "run" "containers," with multiple ways to achieve the same thing.

"Docker" also means the company. I think this overloading makes learning Docker confusing. The core of the Docker platform, the Docker engine, is open source, so what does the company do? It makes money by charging for private hosting and management tools, and offering enterprise services for complex image/container management.

An aside: In 2019 Docker's enterprise business was acquired by a company named Mirantis, which makes money by supporting cloud application development around Kubernetes.

"Containers"

What is a "container"? Did Docker invent containers?

Containers were the most difficult part of Docker for me to understand. I had some idea they were running a process in isolation, but I didn't really know what that meant. To answer this, let's make sure we understand the Docker platform by looking at a different question.

What exactly is "Docker for Mac?" I thought Docker worked on all platforms?

I initially thought Docker was "cross platform." This is both true and a complete, filthy lie.

Take Docker for Mac out of the equation and pretend like you're developing directly on a Linux operating system, like Ubuntu. All operating systems have a "kernel", the core computer program of the operating system that controls everything and facilitates access to hardware.

The Linux kernel has features that allow you to run processes (computer programs) in "isolation":

  • "Control groups", usually called "cgroups," let you run a Linux process with a specifically allocated set of resources, such as how much memory it has.
  • "Namespaces", which add additional isolation, like making your process look like it's the only one on the system.

Key concept: Docker uses existing Linux features to run containers. These features only work on Linux. macOS and Windows don't have cgroups or namespaces.
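If you have a Linux machine handy (or, later in this post, a shell inside a container), you can see both features for yourself under /proc. This is only a minimal sketch: show_isolation is a helper name I made up, and it assumes bash and a mounted /proc. On a Mac the directory simply isn't there, which is the whole point of the key concept above.

```shell
#!/usr/bin/env bash
# Sketch: list the namespaces and cgroup membership of a process on Linux.
# show_isolation is a hypothetical helper name, not a Docker or Linux command.
show_isolation() {
  local pid="${1:-$$}"   # default to the current shell's PID
  if [ ! -d "/proc/$pid/ns" ]; then
    echo "/proc/$pid/ns not found: not Linux, or no such process"
    return 0
  fi
  echo "Namespaces of PID $pid:"
  ls "/proc/$pid/ns"              # one entry per namespace type
  if [ -r "/proc/$pid/cgroup" ]; then
    echo "Cgroup membership of PID $pid:"
    cat "/proc/$pid/cgroup"       # which control group(s) the process is in
  fi
  return 0
}

show_isolation
```

Every Linux process belongs to some set of namespaces and cgroups. A container is a process that was given its own set instead of sharing the system-wide defaults.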

So what is Docker for Mac, if these technologies only exist on Linux? Docker for Mac runs a full Linux "virtual machine." Virtual machines are a broader topic, but they contain full operating systems, including kernels, so we have access to these Linux features.

My mental image is the whale icon on my Mac's status bar holds a little Linux computer.

This also has the implication that the most common type of application developed with Docker is an application for Linux. For example, you can't run a Mac native application like Photoshop in a Docker container. On Windows there's Docker container support for both Windows (.exe) applications as well as Linux applications, but they use different underlying technologies. And you can't run a Windows container on a Mac. See what I mean about Docker not really being "cross platform"?

On Mac and Windows, the Linux virtual machine is hidden from you as an implementation detail. You can typically SSH into computers and run commands on them. Docker for Mac doesn't let you directly SSH into this virtual machine. Instead, the Docker commands you run locally are what coordinate sending files and commands into this Linux virtual machine.
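A quick way to convince yourself the hidden VM is really there is to ask for the kernel name on both sides. This is a small sketch, assuming Docker is installed and running; compare_kernels is a name I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the kernel your terminal sees with the kernel a container sees.
# compare_kernels is a hypothetical helper name.
compare_kernels() {
  echo "host kernel: $(uname -s)"
  # --rm removes the throwaway container as soon as uname exits
  echo "container kernel: $(docker run --rm ubuntu uname -s)"
}

# Usage (needs Docker running): compare_kernels
```

On a Mac the host line reports Darwin, while the container line reports Linux, because the container's process is talking to the kernel of the Linux virtual machine.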

Finally, did Docker invent containers? No. Containers are a general concept, and you can create and run containers without Docker. Cgroups and namespaces also aren't the only way to run processes in isolation; they're just the tools Docker uses.

So does Docker work on Windows?

The official Docker Enterprise video, at 0:45, mentions "Windows containers." So you can use Docker on Windows?

It depends on what you mean by "use on Windows." If you're developing Linux applications, then Docker for Windows does the same thing as Docker for Mac: it runs a Linux virtual machine, and you communicate with that.

If however you want to develop an application that only runs on Windows, like a file ending in ".exe," you have to run your containers on a Windows host somehow. This is supported by Docker Enterprise, but much less common.

What does it mean to "run a container"?

"Container" is a noun. What we've talked about so far is the verb of "running" a container.

Recap: When Docker runs your container, it starts your process (your computer program, developed for Linux) wrapped in a suite of Linux kernel tools to make it look like it's the only process on the system. Docker for Mac (and Windows) provides a Linux virtual machine so you can use these technologies. Running the container is basically the same as what you'd do on your host machine (like npm start or rails s), it's just in "isolation" inside the Linux VM.

Let's poke at containers a little more to strengthen our understanding.

Do containers have only one process? Or can they have multiple processes?

Let's check! First, on your Mac (not in a container), let's install htop, the awesome process viewer: brew install htop.

Now run htop on your Mac. If you haven't used htop, don't worry too much about the interface, just note you see all sorts of processes running:

A screenshot of htop running on Mac, showing multiple processes

Press ctrl-c to exit htop. Now let's see what htop shows inside a Docker container. Let's run bash, a shell, a process, inside a container.

docker run -it ubuntu bash  

Explanation:

  • We use the image named ubuntu, which Docker will automatically download from the image hosting platform "Docker Hub."
  • Run the computer program bash as the only process in the container.
  • The -i flag makes the container interactive: it keeps the process's standard input open so bash can read what you type, which shells require.
  • And -t, a "pseudo-tty", essentially running one program on the host and one in the Linux VM that forwards key presses so you can type in the container.
  • We run the container, which is actually two commands: create and start, which hints at the noun container.
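To make the last bullet concrete, here's a sketch of doing the two steps by hand. It assumes Docker is running; run_in_two_steps and the container name you pass it are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: do by hand the two steps that `docker run -it ubuntu bash` combines.
# run_in_two_steps and the container name you pass are hypothetical.
run_in_two_steps() {
  local name="$1"
  # Step 1: create the container (the noun). Nothing is running yet.
  docker create -it --name "$name" ubuntu bash
  # Step 2: start it, attaching (-a) and keeping input open (-i) so you can type.
  docker start -ai "$name"
}

# Usage (needs Docker running): run_in_two_steps two_step_demo
```

After step 1 the container exists but has no running process. Step 2 is what actually launches bash inside it.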

Back to our question about multiple processes. In the container, install and run htop:

# install htop in the container
apt-get update && apt-get install htop

# then run it
htop  

htop output in an ubuntu container showing what appears to be two top level processes, bash and htop

We see two processes in our container, htop and bash. So containers can run multiple top level processes? Let's check. Still in htop, press t to switch to tree view:

htop running in an ubuntu container showing that htop is a child process of bash

It's subtle, but htop is showing us that it's running as a child process of bash, which makes sense, since we ran it from bash.

The command you start the container with becomes the only "top level" process in the container. Also notice that in the leftmost PID (process ID) column, bash has PID 1, which has special responsibilities in Linux. So... containers can't run multiple top level processes?

Let's do one more thing. On your Mac, still with your container running, open a new terminal and type docker ps. You should see your container:

docker ps  
CONTAINER ID   IMAGE    COMMAND   CREATED          STATUS  
31a94aa45fbe   ubuntu   "bash"    2 minutes ago    Up 2 minutes  

Grab the container ID, and run another bash command in the container. docker exec is a command which executes a command against a running container. It doesn't create a new container.

docker exec -it YOUR_ID_FROM_ABOVE bash  
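If copying the ID by hand gets tedious, you can ask docker ps for it directly. This is a sketch under the assumption that the container you want is the one you created most recently; both function names are made up.

```shell
#!/usr/bin/env bash
# Sketch: look up the container ID instead of copying it from `docker ps`.
# Both function names are hypothetical helpers.
latest_container_id() {
  # --latest: only the most recently created container; --quiet: print just its ID
  docker ps --latest --quiet
}

exec_bash_in_latest() {
  # Runs a second bash inside the existing container; no new container is created.
  docker exec -it "$(latest_container_id)" bash
}

# Usage (needs a running container): exec_bash_in_latest
```

Note that --latest also considers stopped containers, so if the most recent one has already exited, docker exec will complain that it isn't running.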

Then check htop again:

htop running in an ubuntu container showing that we have two top level bash processes

We see two bash programs, and the new program is not a child of the first bash you ran, which has PID 1. So yes, Docker containers can run multiple "top level" processes!

Under the hood, a container isn't technically 1:1 with a process. Instead, a container is 1:1 with a process namespace, which can hold multiple processes. This is the namespace created by Linux's namespace feature that Docker uses when you run a container. Most of the time, you only run one process in a container. docker exec isn't usually something you run as part of deploying containers. It's more common in debugging.

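However, if the initial process stops, the whole container stops! If you type exit in the first bash (the one with PID 1), the second terminal ends abruptly too. Typing kill 1 may not be enough to show this: an interactive bash ignores the default termination signal, and Linux shields PID 1 from signals it hasn't set up a handler for, which is part of what makes PID 1 special.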

Key concept: Docker containers create a process namespace that can run multiple processes. The first process, with PID 1, has special responsibilities in Linux, and if it stops, the whole container stops.

We're almost at a strong understanding of containers. But there's something else we should know.

Do containers have state? What does container state even mean?

Once again, let's check! If you're still running both instances of bash from the previous section, exit out of the second one (you can type exit in the container to exit bash, or use the keyboard shortcut ctrl-d to do the same thing). You should now still have bash running in the container. If you haven't done the previous section, start a new container with:

docker run -it ubuntu bash  

Now in bash in the container, let's create a file:

touch MyFile  

Now let's exit the container by typing exit or hitting ctrl-d. You should be back on your host terminal now. Let's run bash again and check for the file:

# On your Mac Terminal
docker run -it ubuntu bash

# Then, in the container
ls  

Our file isn't there anymore! So containers are stateless? Well, in this case, it turns out we've actually created and run two separate containers!

Now let's run a new container, and name it for easier referencing later:

docker run --name docker_tutorial -it ubuntu bash  

Again, let's create a file in the container: touch MyFile and exit the container with ctrl-d.

Let's look for the container named docker_tutorial. Run docker ps. There's no container with that name. So is this an ephemeral, stateless container? What is happening when we "exit the container?"

Our container is stopped. It still exists! Let's look for all containers:

docker ps -a

CONTAINER ID  COMMAND  STATUS  NAMES  
fb8fcf9e670d  "bash"   Exited  docker_tutorial  

Let's try to execute bash against the stopped container:

docker exec -it docker_tutorial bash

Error response from daemon: Container bc1baaa2d is not running  

Hmm, that didn't work. docker exec only works against a running container, and ours is stopped: its bash process is completely dead and can't be revived. So while containers can run multiple processes, we can't get back into a stopped container by running a parallel bash instance.

Instead we can use the little known docker start command to restart the container, which has --attach and --interactive as flags. Notice that start doesn't accept a command to run. It restarts the same process (bash) as what we originally ran the container with, but it will be a new bash process.

docker start -ai docker_tutorial  

And finally, let's check for our file:

# In the container
ls  
# You should see:
MyFile  

Even though we stopped and restarted the container, the file was persisted. We can definitively say, yes, Docker containers have state, even between container restarts! The state is artifacts on the filesystem. When you run a container, Docker creates a thin "writeable layer" as the final layer, which is part of the "container." We won't dive into layers, but think of them as filesystem diffs.
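If "filesystem diffs" sounds abstract, Docker can show you the diff. Here's a sketch that uses docker diff to list what a stopped container changed relative to its image. It assumes Docker is running; show_writable_layer and the container name you pass are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: peek at a container's writeable layer without starting a shell.
# show_writable_layer and the container name you pass are hypothetical.
show_writable_layer() {
  local name="$1"
  # Create and start a container whose only process is `touch`; it stops right away.
  docker run --name "$name" ubuntu touch /MyFile
  # List what the stopped container changed compared to its image.
  docker diff "$name"
  # Removing the container throws that layer, and the file, away.
  docker rm "$name"
}

# Usage (needs Docker running): show_writable_layer layer_demo
```

I'd expect the docker diff output to mention /MyFile, since that's the only thing this container added on top of the ubuntu image.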

So "running a container" is two steps: creating the container (which includes creating the writeable layer) and then running it. We can stop a container by stopping the container's primary process, but the container and its state live on.

If you're new to Docker, when you run docker ps -a, you might see dozens of stopped containers lying around. You can manually remove them with docker rm CONTAINER_ID, but another useful trick is the --rm flag, which tells Docker to remove the container immediately after it's stopped.

docker run --rm -it ubuntu bash  
exit  
# Because you ran bash with --rm, it won't show up below
docker ps -a  

Handy!
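If you already have a pile of stopped containers, here's a sketch for finding and clearing them in bulk. It assumes a Docker version recent enough to have docker container prune; the function names are made up.

```shell
#!/usr/bin/env bash
# Sketch: find and remove stopped containers in bulk.
# Both function names are hypothetical helpers.
list_stopped_containers() {
  # --all includes stopped containers; the filter keeps only those that have exited
  docker ps --all --quiet --filter status=exited
}

remove_stopped_containers() {
  # Deletes every stopped container, and its writeable layer, without asking.
  docker container prune --force
}

# Usage (needs Docker running): list_stopped_containers, then remove_stopped_containers
```

Check the list first. Pruning is permanent, and any files that live only in those containers go with them.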

Solidifying our understanding of containers

Let's sum up our more foundational idea of what a "container" is:

  • A container is a noun that is created from an image by adding a writeable filesystem as a layer to the image.
  • A container is run with a process that's isolated from other processes using Linux technologies. By default, when this process exits, the container stops.
  • All of this happens in Linux, and if on Mac, in a Linux VM. Containers do not run natively on Mac or Windows.
  • A container is not 1:1 with a process, it's 1:1 with a process namespace. We can execute multiple commands in a container, and can restart stopped containers. Although most of the time, you only run one process in one container.
  • Containers hold state in the form of a final writeable layer. State persists between container stops and starts.
  • Containers can be removed. Removing a container will permanently delete its writeable layer and state.
  • If you end up with lots of containers lying around, now you know why. They must be explicitly removed, or started with --rm, which will remove them when stopped.

Containers Vs Operating Systems

Is my container running a full Ubuntu operating system?

Most Docker images start with something like FROM ubuntu. Is my container running a full Ubuntu operating system? If not, why is it from Ubuntu? In fact, what does it even mean to "run an operating system?"

If you haven't already, on your Mac, run brew install htop and then run it with htop. Now press t to switch to tree view. Look at the top of the process tree, and you'll see that every process is a child of one of two top level processes:

A screenshot of htop for Mac showing that kernel_task and launchd are the two top level processes in tree view

When your computer boots, it loads the kernel program into memory and hands over control to it. The kernel is the core glue between the hardware and the operating system. Then the kernel runs a process, which you can see on your Mac as launchd. Processes ending in d signify they're a "daemon," which is a process that runs in the background and doesn't accept user input.

I think of launchd as the program that boots the operating system. It's responsible for setting up networking, and scheduling jobs and services the operating system needs.

All operating systems have some variation of this kernel and initialization program. On your Mac, the initialization program is launchd. On the server hosting this website, I see /sbin/init as the root process. It's different per operating system, and can also be changed by savvy users.

Now let's run htop in Ubuntu in a Docker container again:

# on your host mac
docker run --rm -it ubuntu bash

# then, in the container
apt-get update && apt-get install htop  
htop  

htop running in an ubuntu Docker container showing the only processes visible are bash and htop itself

Nope, we don't see an init process here! So no, we aren't booting a full Ubuntu operating system in the container. Remember that this all happens in isolation in a Linux VM. The Linux VM does have an init process, otherwise it wouldn't be running. We just can't see it because we're running in isolation in a process namespace.
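You can also check this without htop by asking what program holds PID 1 wherever you happen to be. This is a minimal sketch; pid1_name is a made-up helper, and it assumes either a Linux /proc or a ps that understands -p and -o (as on a Mac).

```shell
#!/usr/bin/env bash
# Sketch: report which program is PID 1 in the current environment.
# pid1_name is a hypothetical helper name.
pid1_name() {
  if [ -r /proc/1/comm ]; then
    cat /proc/1/comm       # Linux: the kernel's short name for PID 1
  else
    ps -p 1 -o comm=       # macOS/BSD fallback when /proc is missing
  fi
}

pid1_name
```

Run inside the bash container from above, this should report bash. Run on your Mac, it should point at launchd. Same question, different answer, because the container only sees its own process namespace.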

So what does it mean to be FROM ubuntu and why would we specify an operating system? Recall that earlier we made a distinction between the kernel and the operating system. The operating system (in this case, Ubuntu) brings along lots of software and libraries that we want to be repeatable between builds and environments. We've already seen Ubuntu has the shell bash in it (vs Alpine Linux FROM alpine, which only has sh, not bash).

We've also seen operating systems have their own package managers (apt-get in our case), which we want to specify per-container to ensure we can install dependencies for the right operating system. Operating systems also have system libraries that our programs might depend on to run.
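To see the split between "operating system files" and "kernel" directly, here's a sketch that asks two different images who they are and which kernel they're running on. It assumes Docker is running and can pull the ubuntu and alpine images; userland_vs_kernel is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: two images, two sets of operating system files, one shared kernel.
# userland_vs_kernel is a hypothetical helper name.
userland_vs_kernel() {
  local image
  for image in ubuntu alpine; do
    echo "== $image =="
    # PRETTY_NAME comes from the image's files; uname -r asks the shared kernel.
    docker run --rm "$image" sh -c 'grep PRETTY_NAME /etc/os-release; uname -r'
  done
}

# Usage (needs Docker running): userland_vs_kernel
```

The PRETTY_NAME lines differ because each image ships its own files, but the kernel version is the same in both, since both containers run on the Linux VM's kernel.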

The Docker Platform

What are "docker-compose" and "BuildKit"? Why are there so many different applications?

Docker only gives us a fairly low level and clunky CLI to build and manage individual containers and images. If we want to repeatedly run multiple containers locally based off a configuration file, we turn to docker-compose. This introduces two concepts:

  • docker-compose.yml, which is that configuration file of the containers you want to run.
  • "Services," which is the definition of the Dockerfile, environment variables, networking, and other configuration your service needs.

docker-compose also has a much nicer command line interface, partly because most of the options you would pass to docker by hand are captured in the docker-compose.yml file. It's also more thoughtfully designed than docker. For example, it assumes interactive by default when you docker-compose run servicename bash.

Docker itself is lacking in many features we need for modern application development. The default Docker engine doesn't have easy secret management inside containers (like passwords, which you don't want to store in the container). BuildKit is a work-in-progress replacement for the Docker build engine by the Docker team. You have to opt in to using it with docker with DOCKER_BUILDKIT=1 docker build.... It adds many features the Docker engine is dearly lacking.

If you're feeling like these tools are disjointed, you're right. Docker tooling is still immature and full of pitfalls, bugs, and inconsistencies. This is despite Docker being around for years. Examples:

  • Docker is sometimes unusably slow on Mac.
  • docker-compose is inconsistent with itself. It has a version 2 and a version 3, but version 3 isn't an upgrade, it's a fork. You should only use version 3 if you're running swarm locally. Version 2 has the useful depends_on for describing which local services should start before others.
  • docker and docker-compose build different images, and have for five years.
  • BuildKit isn't in core, and how you use BuildKit with docker is inconsistent with how you use BuildKit with docker-compose.

To Docker

We started this journey with the section "from zero," so let's end it with "to Docker." Let's abbreviate the answers to the questions we started off with:

1. What is "Docker"?

Docker is both the name of a suite of tools to manage and run processes in isolation, and a company that provides paid services for doing the same.

2. What is a "container"? Did Docker invent containers?

A container is a noun that's created by adding a writeable filesystem to an image. Docker didn't invent containers and the tools that Docker uses, "cgroups" and "namespaces," aren't the only way to run processes in isolation.

3. What does it mean to "run a container"?

A process is run in an isolated "namespace." The process is usually, but not technically, 1:1 with the container.

4. Do containers have state? What does container state even mean?

Containers have state in the form of a writeable filesystem from a final layer placed over the image as part of creating the container. Container state is persisted for as long as the container lives (whether stopped or started). Removing a container removes the state.

5. Do containers have only one process? Or one process tree? Or can they have multiple processes?

Containers are usually 1:1 with a process, but can have more than one process run in them. It's more accurate to say a container is 1:1 with a process namespace. If you stop the process a container was started with, the entire container stops, even if other processes are still running in it.

6. What exactly is "Docker for Mac"?

Docker for Mac is a suite of tools for managing and communicating with a Linux virtual machine that builds and runs containers.

7. Does Docker work for Windows development?

Docker Desktop for Windows also uses Linux virtualization. Running Windows programs like .exes is quite rare and requires a Windows host environment to do it on. Docker Enterprise is one of the only platforms that currently supports this.

8. For that matter, what are "docker-compose" and "BuildKit"? Why are there so many different applications?

docker-compose is a tool for managing multiple "services." BuildKit is a work-in-progress replacement for the Docker build engine which adds much needed features.

9. Is my container running a full Ubuntu operating system?

No, the init process isn't being run in the container, although it is run on the parent Linux VM in Docker for Mac. The process you run in the container takes advantage of programs provided by the operating system, like package managers.

And Beyond

This post documents my journey from understanding very little about Docker to an intermediate understanding. I hope it also brought you further in your journey.

Where to go from here

The best resource I found that helped me understand Docker is the Udemy course Docker for Node.js Projects. Seriously, the class is incredible, in depth, and approachable. I recommend it to anyone building web services using Docker. This blog post wouldn't have been possible without it.

This post didn't cover many topics. We didn't talk about images, layers, volumes, Docker Hub, networking, orchestration, how to use the command line tools, nor how to build docker-compose.yml files. I hope that, armed with stronger foundations, you are more easily able to dive into the Docker documentation to continue learning.
