Advanced Containers
In the first part, we pulled and ran existing container images from Docker Hub. In this section, we will build an image from scratch for running some of our own Python3 code. Then, we will push that image back up to Docker Hub so others may find and use it. After going through this module, students should be able to:
Install and test code in a container interactively
Write a Dockerfile from scratch
Build a Docker image from a Dockerfile
Push a Docker image to Docker Hub
Design Principles: As with the previous lecture, we will see how containers contribute to the portability of software projects
Getting Set Up
Scenario: You are a developer who has written some code that provides a JSON API to some data. You now want to distribute that code for others to use in what you know to be a stable production environment (including OS and dependency versions). End users may want to use this application on their local workstations, in the cloud, or on an HPC cluster.
The first step in a typical container development workflow entails installing and testing an application interactively within a running Docker container.
Note
We recommend doing this on your own student VM. However, one of the most important features of Docker is that it is platform agnostic: these steps could be done anywhere Docker is installed.
To begin, make a new folder for this work and prepare to gather some important files.
[user-vm]$ cd ~/coe-332/
[user-vm]$ mkdir docker/
[user-vm]$ cd docker/
[user-vm]$ pwd
/home/ubuntu/coe-332/docker
We’re going to containerize the degrees_api Flask application from the end of Unit 4. Specifically, you need the degrees_api.py script that we wrote together in class. To simplify our work today, we will just copy the file into our docker directory:
[user-vm]$ cp ~/coe-332/degrees_api.py .
Note
You may need to update the path above to point to your degrees_api.py file.
Warning
It is important to carefully consider what files and folders are in the same directory as a Dockerfile (known as the ‘build context’). The docker build process will index and send all files and folders in the same directory as the Dockerfile to the Docker daemon, so take care not to run docker build from a root-level directory.
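To get a feel for why this Warning matters, the following minimal Python sketch walks a directory and totals the files that would end up in the build context. The function name build_context_size is hypothetical. It also assumes there is no .dockerignore file (which Docker would use to exclude files), so treat the result as a rough estimate rather than an exact figure.

```python
import os


def build_context_size(context_dir="."):
    """Return (file_count, total_bytes) for all files under context_dir.

    A rough estimate of what `docker build` sends to the Docker daemon.
    Assumption: no .dockerignore file is excluding anything.
    """
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # skip broken symlinks and files we are not allowed to stat
                continue
            file_count += 1
    return file_count, total_bytes
```

If you run print(build_context_size(".")) from inside ~/coe-332/docker, it should report only a few small files. Running it from / would take a long time and count far more.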
Containerize Code Interactively
There are several questions you must ask yourself when preparing to containerize code for the first time:
What is an appropriate base image?
What dependencies are required for my program?
What is the install process for my program?
What environment variables may be important?
We can work through these questions by performing an interactive installation of our Python script. The official Python container image that we worked with last lecture is an excellent choice for Python projects. It already contains a well-maintained installation of Python, and we can easily pick the version of Python that we want by selecting different tags. The Python we have been using on the VM is Python 3.8.10, so we’ll stick with that version to avoid any Python-specific issues that could arise from changing it.
To get that exact version of the official Python image, use docker pull python:3.8.10. We can also run a container from the image to explore it interactively:
[user-vm]$ docker pull python:3.8.10
[user-vm]$ docker run --rm -it python:3.8.10 /bin/bash
[root@7ad568453e0b /]#
Here is an explanation of the options:
docker run # run a container
--rm # remove the container on exit
-it # interactively attach terminal to inside of container
python:3.8.10 # image and tag from Docker Hub
/bin/bash # shell to start inside container
The command prompt will change, signaling you are now ‘inside’ the container. Let’s check that we can run python from the shell:
[root@7ad568453e0b /]# python
Python 3.8.10 (default, Jun 23 2021, 15:19:53)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
What about Flask? What happens if we try to import it from the Python REPL?
[root@7ad568453e0b /]# python
Python 3.8.10 (default, Jun 23 2021, 15:19:53)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask
The base Python images have the Python interpreter and standard library, but they do not include any third-party packages. If we try to import flask, we’ll get a ModuleNotFoundError exception.
>>> import flask
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'flask'
To use Flask, we’ll need to install it in the container image ourselves.
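You can run the same check without triggering an exception. This sketch uses importlib.util.find_spec(), which returns None when a top-level module cannot be found. The helper name missing_modules is our own and is not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here.

    find_spec() locates a module without importing it, so this is a cheap
    way to check an environment before running an application.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # in a fresh python:3.8.10 container this is expected to list 'flask'
    print(missing_modules(["json", "flask"]))
```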
Install Required Packages
For our Python Flask program to work, we need to install the Flask package.
How do we typically install Python packages? We use the pip package manager. Does our Python container image have pip installed? Let’s check:
[root@7ad568453e0b /]# pip -h
Usage:
pip <command> [options]
Commands:
install Install packages.
...
It does! That’s great. So we should be able to use pip to install a particular version of Flask. Which version do we want to install? We can check which version we were using on our VM and use the same one in the container. That way, we won’t hit any issues due to version changes with the package.
Back out in the VM, we can get a list of packages pip knows about using pip freeze:
[user-vm]$ pip freeze
appdirs==1.4.3
apturl==0.5.2
asttokens==2.2.1
attrs==19.3.0
Automat==0.8.0
backcall==0.2.0
blinker==1.4
Brlapi==0.7.0
cached-property==1.5.1
ceph==1.0.0
cephfs==2.0.0
certifi==2019.11.28
. . .
Wow, that’s a long list! We really just want to know the version of the Flask package, so we can pipe the output of pip freeze to grep to select only the lines containing Flask (note the capital F):
[user-vm]$ pip freeze | grep Flask
Flask==2.2.2
Great, so we need Flask version 2.2.2. Back in the container, we can install Flask using pip:
[root@7ad568453e0b /]# pip install Flask==2.2.2
Collecting Flask==2.2.2
Downloading Flask-2.2.2-py3-none-any.whl (101 kB)
|████████████████████████████████| 101 kB 3.3 MB/s
Collecting itsdangerous>=2.0
. . .
Successfully installed Flask-2.2.2 Jinja2-3.1.2 MarkupSafe-2.1.2 Werkzeug-2.2.3 click-8.1.3 importlib-metadata-6.0.0 itsdangerous-2.1.2 zipp-3.14.0
That worked! Note that when pip installed Flask 2.2.2 it also installed its dependencies for us.
Warning
An important question to ask is: Do the versions of Python and other dependencies match the versions you are developing with in your local environment? If not, make sure to install the correct versions.
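Instead of grepping the output of pip freeze, you can ask Python directly which version of a package is installed. This works the same way on the VM and inside the container. Below is a sketch using importlib.metadata, which has been in the standard library since Python 3.8 and so works with python:3.8.10. The helper names installed_versions and as_requirements are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Format the packages that were found as pip-style 'name==version' pins."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    found = installed_versions(["Flask", "Werkzeug"])
    print(found)
    print(as_requirements(found))
```

Run this on the VM and in the container, then compare the printed pins. That is a quick way to answer the question in the Warning.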
Assemble a Dockerfile
After going through the build process interactively, we can translate our build steps into a Dockerfile. A Dockerfile is just a text file that contains the instructions for building a new image. We’ll cover a few of the different Dockerfile instructions below.
Create a new file called Dockerfile and open it with a text editor.
The FROM Instruction
We can use the FROM instruction to start our new image from a known base image. This should be the first line of our Dockerfile. In our scenario, we want to use an official Python image that contains the same version of Python that we have been using on the VM:
FROM python:3.8.10
At this point, our new image just has the Python 3.8.10 official image in it.
The RUN Instruction
We can install updates, install new software, or download code to our image by running commands with the RUN instruction. The RUN instruction works by literally running the command line provided after RUN in the existing container image. Any files created, modified, or deleted by the command line will be correspondingly changed in the image.
In our case, our only dependency was the Flask library, which we can install with pip. We will use a RUN instruction to execute the pip command to install it.
RUN pip install Flask==2.2.2
Each RUN instruction creates an intermediate image (called a ‘layer’). Too many layers can make the Docker image less performant and the build less efficient. We can minimize the number of layers by combining RUN instructions. Dependencies that are more likely to change over time (e.g. Python3 libraries) might still be better off in their own RUN instruction. That way, later rebuilds can reuse the cached layers that come before them.
The COPY Instruction
Now we need to add our Flask application. There are a couple of different ways to get your source code inside the image. When you are developing, the most practical method is usually to copy code in from the Docker build context using the COPY instruction. For example, we can copy our script to the root-level / directory with the following instruction:
COPY degrees_api.py /degrees_api.py
The CMD Instruction
Another useful instruction is the CMD instruction. This sets a default command line to run in the container when none is provided to a docker run command that makes use of the image. To run our Flask application, we can simply execute the file with python:
[user-vm]$ python degrees_api.py
To provide the command line to the CMD instruction, separate each part of the command line into a list of strings:
CMD ["python", "degrees_api.py"]
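The exec form of CMD is parsed as a JSON array, which is why each argument is a separate double-quoted string. This short sketch converts an ordinary command line into that form using shlex.split() and json.dumps(). The function cmd_instruction is hypothetical.

```python
import json
import shlex


def cmd_instruction(command_line):
    """Turn a shell-style command line into an exec-form CMD instruction.

    shlex.split() breaks the line into arguments the way a POSIX shell
    would, and json.dumps() renders them as a double-quoted JSON array.
    """
    return "CMD " + json.dumps(shlex.split(command_line))


print(cmd_instruction("python degrees_api.py"))  # CMD ["python", "degrees_api.py"]
```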
Putting It All Together
The contents of the final Dockerfile should look like:
FROM python:3.8.10

RUN pip install Flask==2.2.2

COPY degrees_api.py /degrees_api.py

CMD ["python", "degrees_api.py"]
Build the Image
Once the Dockerfile is written and we are satisfied that we have minimized the number of layers, the next step is to build an image. Building a Docker image generally takes the form:
[user-vm]$ docker build -t <dockerhubusername>/<code>:<version> .
The -t flag is used to name or ‘tag’ the image with a descriptive name and version. Optionally, you can preface the tag with your Docker Hub username. Adding that namespace allows you to push your image to a public registry and share it with others. The trailing dot ‘.’ in the line above specifies the build context, which here is also the directory containing the Dockerfile (a single ‘.’ means ‘the current directory’).
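To make the <dockerhubusername>/<code>:<version> pattern concrete, here is a simplified sketch of how such a reference splits into its parts. The function parse_image_ref is hypothetical and does not handle every form Docker accepts; for example, it mishandles digests such as @sha256:.... The one rule it borrows from Docker is that an omitted tag means latest.

```python
def parse_image_ref(ref):
    """Split a 'namespace/repository:tag' image reference into its parts.

    Simplified sketch: no special handling of registry hosts or digests.
    As in Docker, a reference without a tag defaults to 'latest'.
    """
    name, tag = ref, "latest"
    # only a ':' after the last '/' separates the tag
    if ":" in ref.rsplit("/", 1)[-1]:
        name, tag = ref.rsplit(":", 1)
    namespace, _, repository = name.rpartition("/")
    return {"namespace": namespace or None, "repository": repository, "tag": tag}
```

This default is worth remembering. If you built username/degrees_api:1.0, then running username/degrees_api with no tag refers to username/degrees_api:latest, which is a different image reference.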
To build the image, use:
[user-vm]$ docker build -t username/degrees_api:1.0 .
Note
Don’t forget to replace ‘username’ with your Docker Hub username.
Use docker images to ensure you see a copy of your image has been built. You can also use docker inspect to find out more information about the image.
[user-vm]$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
jstubbs/degrees_api 1.0 2883079fad18 About a minute ago 928MB
...
[user-vm]$ docker inspect username/degrees_api:1.0
If you need to rename your image, you can either re-tag it with docker tag, or you can remove it with docker rmi and build it again. Issue each of these commands with no arguments to see usage information.
Test the Image
We can now test our newly-built image! Let’s start a container from the image using the docker run command. Execute the following in your VM:
[user-vm]$ docker run -it --rm -p 5000:5000 username/degrees_api:1.0
* Serving Flask app 'degrees_api'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://172.17.0.2:5000
Press CTRL+C to quit
* Restarting with stat
* Debugger is active!
* Debugger PIN: 127-571-634
Note the use of the -p flag to bind a port on the container to a port on the VM. We’ll talk more about container networking later, but for now, understand that every container gets a complete set of “private” ports that, by default, are not connected to the host ports. Our Flask application listens on port 5000, so if we do not connect the container’s port 5000 to the host’s port 5000, we won’t be able to communicate with our Flask program on the VM. The -p flag takes the form <host port>:<container port> and connects the two.
With our degrees_api container running, let’s use curl in another window to interact with our program:
[user-vm]$ curl 127.0.0.1:5000/degrees
[
{
"degrees": 5818,
"id": 0,
"year": 1990
},
{
"degrees": 5725,
"id": 1,
"year": 1991
},
{
"degrees": 6005,
"id": 2,
"year": 1992
},
{
"degrees": 6123,
"id": 3,
"year": 1993
},
{
"degrees": 6096,
"id": 4,
"year": 1994
}
]
It worked!
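The same request can be made from Python, which is handy if you want to script checks against the container. Below is a minimal sketch that uses only the standard library's urllib.request. The function name fetch_degrees is hypothetical, and the sketch assumes the container is running with -p 5000:5000 as above.

```python
import json
import urllib.request


def fetch_degrees(base_url="http://127.0.0.1:5000"):
    """GET /degrees from the running container and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/degrees", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, for record in fetch_degrees(): print(record["year"], record["degrees"]) prints one line per year in the response.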
Application Configuration Files and Container Volume Mounts
Sometimes we need to provide additional files to a container that we do not want to include in the public image available on Docker Hub. For example, if our application needs to use passwords or other secret information, we don’t want to include that data in the public image, because then anyone who downloaded the image could see the secrets.
Moreover, it is often convenient to allow applications to be configured to run in different ways. One way to allow them to be configured is with a configuration file. The configuration file contains settings and other values that the software reads to know how it should run.
First, we’ll modify our Flask application to use a configuration file. We’ll use YAML because it is easy to read and write, it can include comments, and it will give us some practice for when we write Kubernetes YAML files later in the semester.
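In order to work with YAML, we’ll use the pyyaml library, which must be installed with pip. For example:
[user-vm]$ pip install pyyaml==6.0
Because the application running inside the container will also import yaml, the image needs the same package. For example, change the RUN instruction in the Dockerfile to RUN pip install Flask==2.2.2 pyyaml==6.0 and rebuild the image.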
For the very first version of our configuration file, we’ll accept a single configuration setting called debug that takes a bool value (true or false). If debug is set to true in the configuration file, we’ll start our server in debug mode; otherwise, we’ll start our server in normal mode.
Our strategy will be to look for a file called config.yaml in the current working directory; that is, in the same directory as our degrees_api.py file. We’ll use the following rules for configuring our application:
If the file doesn’t exist or is not valid YAML, we’ll simply ignore the configuration file and start the server in debug mode.
If the file exists and is valid YAML, we will read the file, check for a config variable called debug, and use its value, if it exists.
If the configuration file exists and is valid YAML but debug is not set in it, we will ignore it and start the server in debug mode.
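Separating the decision logic from the file reading makes these rules easy to check without Docker or even a YAML file. The sketch below uses a hypothetical helper, resolve_debug, that takes whatever yaml.safe_load() returned and applies the three rules. An empty YAML file loads as None, which falls under the "debug not set" case. The sketch also makes one assumption beyond the rules above: a debug value that is not a boolean, such as the quoted string "false", is treated as missing.

```python
def resolve_debug(config, default=True):
    """Apply the configuration rules to already-parsed YAML data.

    `config` is whatever yaml.safe_load() returned: usually a dict, None for
    an empty file, or another type for unusual YAML. Assumption (not one of
    the stated rules): a non-boolean `debug` value is treated as missing.
    """
    if not isinstance(config, dict):
        return default  # no usable settings: fall back to debug mode
    value = config.get("debug", default)
    if not isinstance(value, bool):
        return default
    return value
```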
Reading the Configuration File
Our first step is to read the configuration file. For that, we need to import the yaml package:
from flask import Flask, request
import json
import yaml
To read the file, we’ll use the yaml.safe_load() function inside a function called get_config. Our first attempt might look something like this:
def get_config():
    with open('config.yaml', 'r') as f:
        return yaml.safe_load(f)
However, we’ll have a problem if either the file does not exist or is not valid YAML. How can we fix it?
Note
We can use exceptions to handle different kinds of errors at run time.
Let’s use a try/except block to handle these errors:
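def get_config():
    default_config = {"debug": True}
    try:
        with open('config.yaml', 'r') as f:
            config = yaml.safe_load(f)
    except Exception as e:
        print(f"Couldn't load the config file; details: {e}")
        # if we couldn't load the config file, return the default config
        return default_config
    # an empty file loads as None, and non-mapping YAML can't hold settings
    if not isinstance(config, dict):
        return default_config
    return config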
Use the Configuration to Start Flask
We need to read the configuration file before we start the Flask server so we know whether to use debug mode. We can add a call to get_config() in the main block:
if __name__ == '__main__':
    config = get_config()
    if config.get('debug', True):
        app.run(debug=True, host='0.0.0.0')
    else:
        app.run(host='0.0.0.0')
If we rebuild the image and start up our Docker container again as before, there will not be a configuration file, so we should see the default behavior of debug mode. We should also see a message indicating that it could not find the configuration file:
[user-vm]$ docker run -p 5000:5000 --rm -it username/degrees_api:1.0
Couldn't load the config file; details: [Errno 2] No such file or directory: 'config.yaml'
* Serving Flask app 'degrees_api'
* Debug mode: on
Using Volume Mounts to Add a Configuration File
Now, let’s test our new configuration feature by providing a configuration file. We won’t add the configuration file to the image. Instead, we’ll allow operators to write their own and add it directly to the container. How will we do that?
We’ll use a Docker volume mount. In Docker, a volume mount allows you to add files and directories from the host computer to a container. To add a volume mount, use the -v </path/on/host>:</path/in/container> format.
Note
The host path and the container path should be provided as absolute paths.
Note
If you mount a path on the host to a path in the container that already exists in the image, the contents in the image will be replaced by those on the host path.
Let’s first create a new file called config.yaml in the current directory and place the following contents into it:
debug: false
Now let’s modify our docker run statement to include a volume mount that adds the configuration file into our running container:
[user-vm]$ docker run -p 5000:5000 -v /home/ubuntu/coe-332/docker/config.yaml:/config.yaml --rm username/degrees_api:1.0
Here is an explanation of the options used above:
docker run # run a container
-p 5000:5000 # map the host port 5000 to the container port 5000
-v /home/ubuntu/coe-332/docker/config.yaml:/config.yaml # mount the config.yaml file on the host at /config.yaml in the container
--rm # remove the container after the program exits
username/degrees_api:1.0 # image to use for the container
Now you should see output similar to:
* Serving Flask app 'degrees_api'
* Debug mode: off
Great! The Flask app read our configuration file and turned off debug mode.
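If you launch this container often, it can help to build the command in Python so the host path is always absolute, as the Note on volume mounts requires. Below is a minimal sketch that assumes the image name and mount point used above. docker_run_args is a hypothetical helper. It checks that the config file exists first, because when a host path given to -v does not exist, Docker creates it as an empty directory instead of failing.

```python
import os


def docker_run_args(image, config_path, host_port=5000, container_port=5000):
    """Build the argument list for the `docker run` command shown above.

    The config path is made absolute, because -v expects an absolute host
    path, and is mounted at /config.yaml, matching the command above.
    """
    host_path = os.path.abspath(config_path)
    if not os.path.isfile(host_path):
        raise FileNotFoundError(f"config file not found: {host_path}")
    return [
        "docker", "run",
        "-p", f"{host_port}:{container_port}",
        "-v", f"{host_path}:/config.yaml",
        "--rm",
        image,
    ]
```

You can pass the returned list to subprocess.run(), for example subprocess.run(docker_run_args("username/degrees_api:1.0", "config.yaml"), check=True).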