Docker provides a simple way to configure any Docker image and create your own custom image with the help of a Dockerfile.
I’ve always wondered how Docker works in this regard: should I use as many separate “RUN apt-get install” commands as possible, or as few RUN commands as possible, since each one adds to the number of layers? For example: RUN apt-get update && apt-get install -y python-qt4 python-pyside python-pip python3-pip python3. Docker builds images automatically by reading the instructions from a Dockerfile, a text file that contains all commands, in order, needed to build a given image. A Dockerfile adheres to a specific format and set of instructions which you can find at Dockerfile reference.
In this tutorial, you will learn how to pull an official image from the Docker repository and customize it according to your own requirements. And then you can run and verify the custom docker image.
Creating custom docker image
So, in this example, you’ll be using an Alpine Linux image that does not include the Vim editor by default. You’ll modify this image to create a new Docker image of Alpine Linux that includes the Vim editor by default. It cannot get any simpler than this, can it?
Prerequisite
If you have not already, please install Docker on Ubuntu or whichever Linux distribution you are using. Make sure to add yourself to the docker group so that you can run docker without sudo.
You’ll need an active internet connection for downloading the base Docker image.
Step 1: Get docker image [optional]
I chose Alpine Linux in this example because it is really small. The Alpine docker image is hardly 5 MB in size, can you believe it? It’s the perfect Linux distribution for containerization.
This step is optional. I included it so that you can compare the base image with the customized Docker image.
Pull the latest Docker image of Alpine Linux using the docker pull command: docker pull alpine
Step 2: Create Dockerfile with the needed customization
Now let’s create a new empty file named Dockerfile using the touch command: touch Dockerfile
Now edit this file, add three lines to it (a FROM line followed by two RUN lines, explained below), and save it. You can use an editor like Vim or Nano, or use the cat command to add these lines to the Dockerfile.
The FROM line creates a new Docker image based on the latest Alpine Docker image, which is downloaded from Docker Hub.
Where Debian-based distributions use apt, Alpine uses the apk package manager. So the next two commands tell Alpine Linux to update the available package cache (apk update) and then install Vim (apk add vim).
As you can see, with RUN in Dockerfile, you can customize your base docker image by running specific commands.
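Based on the description above, the three lines would look like the following sketch. The alpine:latest tag is inferred from "the latest Alpine docker image"; the exact tag in the author's file is an assumption.

```dockerfile
# Start from the latest official Alpine Linux image on Docker Hub.
FROM alpine:latest
# Refresh the apk package index.
RUN apk update
# Install the Vim editor.
RUN apk add vim
```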
Step 3: Create the custom docker image with Dockerfile
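The command to build the custom image from the Dockerfile has this general form, where both arguments are placeholders: docker build -t new_docker_image_name PATH_to_Dockerfile
With the -t option, you specify the name (tag) of your custom Docker image.
Considering that your Dockerfile is in your current directory, you can create the new Docker image of Alpine Linux with Vim installed like this (alpine_with_vim is just an example name): docker build -t alpine_with_vim .
Let’s see the available Docker images on the system now, using the docker images command.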
You can see that the base docker image which was hardly 5 MB in size is now 33 MB with Vim installed on it (and the package cache updated).
Now, let’s verify that your modified Docker image has Vim installed by running a container from it with docker run -it followed by the image name you chose in the build step.
Once you are inside the container, you can verify that Vim is installed by checking its version with vim --version.
Exit the container by typing exit in the terminal. Stop the container, remove the container and remove the docker images (if you want to) to free up disk space.
Congrats! You just learned how to create your very own customized docker image.
I know it’s not a very extensive tutorial and you may have more complex needs. But this tutorial works as the first step towards understanding Docker image customization.
I highly recommend reading more on Dockerfile to know what other options you have available for customizing docker images.
If you have questions or suggestions, please leave a comment below.
This document covers recommended best practices and methods for building efficient images.
Docker builds images automatically by reading the instructions from a Dockerfile, a text file that contains all commands, in order, needed to build a given image. A Dockerfile adheres to a specific format and set of instructions which you can find at Dockerfile reference.
A Docker image consists of read-only layers, each of which represents a Dockerfile instruction. The layers are stacked and each one is a delta of the changes from the previous layer. Consider a Dockerfile made up of four instructions: FROM, COPY, RUN, and CMD.
Each instruction creates one layer:
FROM creates a layer from the ubuntu:18.04 Docker image.
COPY adds files from your Docker client’s current directory.
RUN builds your application with make.
CMD specifies what command to run within the container.
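A minimal sketch of a Dockerfile matching the four instructions above. The /app path and app.py are assumptions, and because ubuntu:18.04 ships neither make nor Python, the RUN step installs them before building. It also assumes the build context contains a Makefile.

```dockerfile
# Base layer: the ubuntu:18.04 image.
FROM ubuntu:18.04
# Copy the build context (the client's current directory) into /app.
COPY . /app
# Install the build tools (not part of the base image), then build with make.
RUN apt-get update && apt-get install -y make python3 && make -C /app
# Default command for containers started from this image (app.py is hypothetical).
CMD ["python3", "/app/app.py"]
```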
When you run an image and generate a container, you add a new writable layer (the “container layer”) on top of the underlying layers. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer.
For more on image layers (and how Docker builds and stores images), see About storage drivers.
General guidelines and recommendations
Create ephemeral containers
The image defined by your Dockerfile should generate containers that are as ephemeral as possible. By “ephemeral”, we mean that the container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum set up and configuration.
Refer to Processes under The Twelve-factor App methodology to get a feel for the motivations of running containers in such a stateless fashion.
Understand build context
When you issue a docker build command, the current working directory is called the build context. By default, the Dockerfile is assumed to be located here, but you can specify a different location with the file flag (-f). Regardless of where the Dockerfile actually lives, all recursive contents of files and directories in the current directory are sent to the Docker daemon as the build context.
Build context example
Create a directory for the build context and cd into it. Write “hello” into a text file named hello and create a Dockerfile that runs cat on it. Build the image from within the build context (.):
Move Dockerfile and hello into separate directories and build a second version of the image (without relying on cache from the last build). Use -f to point to the Dockerfile and specify the directory of the build context:
Inadvertently including files that are not necessary for building an image results in a larger build context and larger image size. This can increase the time to build the image, time to pull and push it, and the container runtime size. To see how big your build context is, look for a message like this when building your Dockerfile:
Pipe Dockerfile through stdin
Docker has the ability to build images by piping Dockerfile through stdin with a local or remote build context. Piping a Dockerfile through stdin can be useful to perform one-off builds without writing a Dockerfile to disk, or in situations where the Dockerfile is generated, and should not persist afterwards.
The examples in this section use here documents for convenience, but any method to provide the Dockerfile on stdin can be used.
For example, the following commands are equivalent:
You can substitute the examples with your preferred approach, or the approach that best fits your use-case.
Build an image using a Dockerfile from stdin, without sending build context
Use this syntax to build an image using a Dockerfile from stdin, without sending additional files as build context. The hyphen (-) takes the position of the PATH, and instructs Docker to read the build context (which only contains a Dockerfile) from stdin instead of a directory:
The following example builds an image using a Dockerfile that is passed through stdin. No files are sent as build context to the daemon.
Omitting the build context can be useful in situations where your Dockerfile does not require files to be copied into the image, and improves the build-speed, as no files are sent to the daemon.
If you want to improve the build-speed by excluding some files from the build-context, refer to exclude with .dockerignore.
Note: Attempting to build a Dockerfile that uses COPY or ADD will fail if this syntax is used. The following example illustrates this:
Build from a local build context, using a Dockerfile from stdin
Use this syntax to build an image using files on your local filesystem, but using a Dockerfile from stdin. The syntax uses the -f (or --file) option to specify the Dockerfile to use, using a hyphen (-) as filename to instruct Docker to read the Dockerfile from stdin:
The example below uses the current directory (.) as the build context, and builds an image using a Dockerfile that is passed through stdin using a here document.
Build from a remote build context, using a Dockerfile from stdin
Use this syntax to build an image using files from a remote git repository, using a Dockerfile from stdin. The syntax uses the -f (or --file) option to specify the Dockerfile to use, using a hyphen (-) as filename to instruct Docker to read the Dockerfile from stdin:
This syntax can be useful in situations where you want to build an image from a repository that does not contain a Dockerfile, or if you want to build with a custom Dockerfile, without maintaining your own fork of the repository.
The example below builds an image using a Dockerfile from stdin, and adds the hello.c file from the “hello-world” Git repository on GitHub.
Under the hood
When building an image using a remote Git repository as build context, Docker performs a git clone of the repository on the local machine, and sends those files as build context to the daemon. This feature requires git to be installed on the host where you run the docker build command.
Exclude with .dockerignore
To exclude files not relevant to the build (without restructuring your source repository) use a .dockerignore file. This file supports exclusion patterns similar to .gitignore files. For information on creating one, see the .dockerignore file.
Use multi-stage builds
Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files.
Because an image is built during the final stage of the build process, you can minimize image layers by leveraging build cache.
For example, if your build contains several layers, you can order them from the less frequently changed (to ensure the build cache is reusable) to the more frequently changed:
Install tools you need to build your application
Install or update library dependencies
Generate your application
A Dockerfile for a Go application could look like:
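A sketch of such a multi-stage Dockerfile, ordered from least to most frequently changed. It assumes a Go module with go.mod and go.sum at the repository root and a main package in that same directory. The image tag, paths, and binary name are illustrative choices, not taken from the original listing.

```dockerfile
# Build stage: toolchain and sources live only here.
FROM golang:1.22-alpine AS build
WORKDIR /src
# Dependencies change less often than source code, so resolve them first;
# this layer is reused from cache until go.mod or go.sum changes.
COPY go.mod go.sum ./
RUN go mod download
# Copy the rest of the project and build a static binary.
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Final stage: only the compiled binary is copied into the image.
FROM scratch
COPY --from=build /bin/app /bin/app
ENTRYPOINT ["/bin/app"]
CMD ["--help"]
```

The final image contains none of the Go toolchain or intermediate files, which is the size saving that multi-stage builds are meant to deliver.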
Don’t install unnecessary packages
To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages just because they might be “nice to have.” For example, you don’t need to include a text editor in a database image.
Decouple applications
Each container should have only one concern. Decoupling applications into multiple containers makes it easier to scale horizontally and reuse containers. For instance, a web application stack might consist of three separate containers, each with its own unique image, to manage the web application, database, and an in-memory cache in a decoupled manner.
Limiting each container to one process is a good rule of thumb, but it is not a hard and fast rule. For example, not only can containers be spawned with an init process, some programs might spawn additional processes of their own accord. For instance, Celery can spawn multiple worker processes, and Apache can create one process per request.
Use your best judgment to keep containers as clean and modular as possible. If containers depend on each other, you can use Docker container networks to ensure that these containers can communicate.
Minimize the number of layers
In older versions of Docker, it was important that you minimized the number of layers in your images to ensure they were performant. The following features were added to reduce this limitation:
Only the instructions RUN, COPY, and ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.
Where possible, use multi-stage builds, and only copy the artifacts you need into the final image. This allows you to include tools and debug information in your intermediate build stages without increasing the size of the final image.
Sort multi-line arguments
Whenever possible, ease later changes by sorting multi-line arguments alphanumerically. This helps to avoid duplication of packages and makes the list much easier to update. This also makes PRs a lot easier to read and review. Adding a space before a backslash (\) helps as well.
Here’s an example from the buildpack-deps image:
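A sketch in the style described above, with one package per line, sorted, and a space before each backslash. The base image and the exact package list are assumptions chosen for illustration, not a copy of the buildpack-deps Dockerfile.

```dockerfile
FROM debian:bookworm
# Sorted, one-per-line arguments make duplicates and diffs easy to spot.
RUN apt-get update && apt-get install -y \
    git \
    mercurial \
    openssh-client \
    procps \
    subversion \
 && rm -rf /var/lib/apt/lists/*
```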
Leverage build cache
When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.
If you do not want to use the cache at all, you can use the --no-cache=true option on the docker build command. However, if you do let Docker use its cache, it is important to understand when it can, and cannot, find a matching image. The basic rules that Docker follows are outlined below:
Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.
In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.
For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.
Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match.
Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.
Dockerfile instructions
These recommendations are designed to help you create an efficient and maintainable Dockerfile.
FROM
Whenever possible, use current official images as the basis for your images. We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.
LABEL
You can add labels to your image to help organize images by project, record licensing information, to aid in automation, or for other reasons. For each label, add a line beginning with LABEL and with one or more key-value pairs. The following examples show the different acceptable formats. Explanatory comments are included inline.
Strings with spaces must be quoted or the spaces must be escaped. Inner quote characters (") must also be escaped.
An image can have more than one label. Prior to Docker 1.10, it was recommended to combine all labels into a single LABEL instruction, to prevent extra layers from being created. This is no longer necessary, but combining labels is still supported.
The above can also be written as:
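A sketch of both styles. All keys and values here (the com.example.* names, the vendor string, the dates) are hypothetical placeholders.

```dockerfile
FROM alpine:latest
# One label per instruction.
LABEL com.example.version="0.0.1-beta"
LABEL vendor="ACME Incorporated"
LABEL com.example.release-date="2015-02-12"
# A quoted value containing spaces and escaped inner double quotes.
LABEL com.example.description="An \"example\" image"

# The first three labels again, combined into a single instruction
# with backslash line continuations.
LABEL com.example.version="0.0.1-beta" \
      vendor="ACME Incorporated" \
      com.example.release-date="2015-02-12"
```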
See Understanding object labels for guidelines about acceptable label keys and values. For information about querying labels, refer to the items related to filtering in Managing labels on objects. See also LABEL in the Dockerfile reference.
RUN
Split long or complex RUN statements on multiple lines separated with backslashes to make your Dockerfile more readable, understandable, and maintainable.
apt-get
Probably the most common use-case for RUN is an application of apt-get. Because it installs packages, the RUN apt-get command has several gotchas to look out for.
Always combine RUN apt-get update with apt-get install in the same RUN statement, in the form RUN apt-get update && apt-get install -y followed by the package names.
Using apt-get update alone in a RUN statement causes caching issues, and subsequent apt-get install instructions can fail. For example, say you have a Dockerfile in which RUN apt-get update is one instruction and RUN apt-get install -y with a list of packages is the next.
After building the image, all layers are in the Docker cache. Suppose you later modify apt-get install by adding an extra package.
Docker sees the unchanged RUN apt-get update instruction as identical and reuses its cached layer from the previous build. As a result, apt-get update is not executed again. Because apt-get update is not run, your build can potentially get an outdated version of the curl and nginx packages.
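The scenario above can be sketched as follows. The base image tag is an assumption; curl and nginx are the packages named in the text. The problematic form is shown as comments, followed by the combined form that avoids it.

```dockerfile
FROM ubuntu:22.04

# Problematic: the update runs in its own instruction, so its layer is
# cached and reused even after the install line is edited later.
#   RUN apt-get update
#   RUN apt-get install -y curl nginx

# Preferred: update and install share one instruction, so editing the
# package list invalidates the cache for both and the index is refreshed.
RUN apt-get update && apt-get install -y \
    curl \
    nginx
```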
Using RUN apt-get update && apt-get install -y ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as “cache busting”. You can also achieve cache-busting by specifying a package version. This is known as version pinning, for example by requesting a package as name=version (such as 1.3.*) instead of by name alone.
Version pinning forces the build to retrieve a particular version regardless of what’s in the cache. This technique can also reduce failures due to unanticipated changes in required packages.
Below is a well-formed RUN instruction that demonstrates all the apt-get recommendations.
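A sketch of such an instruction, written as a fragment that belongs after a Debian- or Ubuntu-based FROM line. The package list is illustrative. The s3cmd=1.1.* pin mirrors the discussion that follows and only resolves on a release whose archive carries s3cmd 1.1.x, so adjust the pin to a version your base image actually offers.

```dockerfile
# Update and install in one instruction, packages sorted one per line,
# one package pinned to a version, and the apt lists removed at the end.
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    mercurial \
    s3cmd=1.1.* \
 && rm -rf /var/lib/apt/lists/*
```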
The s3cmd argument specifies a version 1.1.*. If the image previously used an older version, specifying the new one causes a cache bust of apt-get update and ensures the installation of the new version. Listing packages on each line can also prevent mistakes in package duplication.
In addition, when you clean up the apt cache by removing /var/lib/apt/lists it reduces the image size, since the apt cache is not stored in a layer. Since the RUN statement starts with apt-get update, the package cache is always refreshed prior to apt-get install.
Official Debian and Ubuntu images automatically run apt-get clean, so explicit invocation is not required.
Using pipes
Some RUN commands depend on the ability to pipe the output of one command into another, using the pipe character (|), as in the following example: RUN wget -O - https://some.site | wc -l > /number
Docker executes these commands using the /bin/sh -c interpreter, which only evaluates the exit code of the last operation in the pipe to determine success. In the example above this build step succeeds and produces a new image so long as the wc -l command succeeds, even if the wget command fails.
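The behaviour described above can be reproduced outside Docker. This sketch uses false as a stand-in for a failing wget, and assumes bash is installed alongside sh.

```shell
# Plain sh: the pipeline's exit status is that of the last command (wc),
# so the failure of the first stage goes unnoticed.
status_plain=0
sh -c 'false | wc -l >/dev/null' || status_plain=$?

# bash with pipefail: a failure in any stage fails the whole pipeline.
status_pipefail=0
bash -c 'set -o pipefail; false | wc -l >/dev/null' || status_pipefail=$?

echo "plain=$status_plain pipefail=$status_pipefail"
# prints: plain=0 pipefail=1
```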
If you want the command to fail due to an error at any stage in the pipe, prepend set -o pipefail && to ensure that an unexpected error prevents the build from inadvertently succeeding. For example: RUN set -o pipefail && wget -O - https://some.site | wc -l > /number
Not all shells support the -o pipefail option.
In cases such as the dash shell on Debian-based images, consider using the exec form of RUN to explicitly choose a shell that does support the pipefail option. For example:
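A sketch of the exec form on a Debian base, where /bin/sh is dash. The base image tag and the example.com URL are placeholders chosen for illustration.

```dockerfile
FROM debian:bookworm
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
# Exec form: the pipeline runs under bash, which supports pipefail, so a
# failing wget now fails the build step instead of being masked by wc.
RUN ["/bin/bash", "-c", "set -o pipefail && wget -O - https://example.com | wc -l > /number"]
```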
CMD
The CMD instruction should be used to run the software contained in your image, along with any arguments. CMD should almost always be used in the form of CMD ["executable", "param1", "param2"…]. Thus, if the image is for a service, such as Apache and Rails, you would run something like CMD ["apache2","-DFOREGROUND"]. Indeed, this form of the instruction is recommended for any service-based image.
In most other cases, CMD should be given an interactive shell, such as bash, python and perl. For example, CMD ["perl", "-de0"], CMD ["python"], or CMD ["php", "-a"]. Using this form means that when you execute something like docker run -it python, you’ll get dropped into a usable shell, ready to go. CMD should rarely be used in the manner of CMD ["param", "param"] in conjunction with ENTRYPOINT, unless you and your expected users are already quite familiar with how ENTRYPOINT works.
EXPOSE
The EXPOSE instruction indicates the ports on which a container listens for connections. Consequently, you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80, while an image containing MongoDB would use EXPOSE 27017 and so on.
For external access, your users can execute docker run with a flag indicating how to map the specified port to the port of their choice. For container linking, Docker provides environment variables for the path from the recipient container back to the source (for example, MYSQL_PORT_3306_TCP).
ENV
To make new software easier to run, you can use ENV to update the PATH environment variable for the software your container installs. For example, ENV PATH=/usr/local/nginx/bin:$PATH ensures that CMD ["nginx"] just works.
The ENV instruction is also useful for providing required environment variables specific to services you wish to containerize, such as Postgres’s PGDATA.
Lastly, ENV can also be used to set commonly used version numbers so that version bumps are easier to maintain, as seen in the following example:
Similar to having constant variables in a program (as opposed to hard-coding values), this approach lets you change a single ENV instruction to auto-magically bump the version of the software in your container.
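A sketch of the idea, with illustrative variable names and version numbers. The RUN line only echoes the value where a real Dockerfile would download and unpack that release.

```dockerfile
FROM debian:bookworm
# Version numbers live in one place (values are placeholders).
ENV PG_MAJOR=9.3
ENV PG_VERSION=9.3.4
# Later instructions reference the variables instead of hard-coding them.
ENV PATH=/usr/local/postgres-$PG_MAJOR/bin:$PATH
RUN echo "a real build would fetch postgres-$PG_VERSION here"
```

Bumping the version then means editing only the two ENV lines at the top.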
Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped. You can test this by building a Dockerfile that sets a variable with ENV and unsets it in a later RUN instruction, and then printing the variable in a container started from the image.
To prevent this, and really unset the environment variable, use a RUN command with shell commands, to set, use, and unset the variable all in a single layer. You can separate your commands with ; or &&. If you use the second method, and one of the commands fails, the docker build also fails. This is usually a good idea. Using \ as a line continuation character for Linux Dockerfiles improves readability. You could also put all of the commands into a shell script and have the RUN command just run that shell script.
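A sketch of the single-layer approach. ADMIN_USER, its value, and the output file are placeholders. The leaky variant is shown as comments for contrast.

```dockerfile
FROM alpine:latest

# Leaky: ENV stores the value in the image metadata, and a later unset in
# a separate RUN does not remove it.
#   ENV ADMIN_USER="mark"
#   RUN echo $ADMIN_USER > ./mark
#   RUN unset ADMIN_USER

# Set, use, and unset the variable inside one RUN instruction, so it only
# ever exists in that shell. && stops the build if any step fails.
RUN export ADMIN_USER="mark" \
    && echo $ADMIN_USER > ./mark \
    && unset ADMIN_USER
CMD ["sh"]
```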
ADD or COPY
Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /.
If you have multiple Dockerfile steps that use different files from your context, COPY them individually, rather than all at once. This ensures that each step’s build cache is only invalidated (forcing the step to be re-run) if the specifically required files change.
For example, copying only the file a RUN step needs before that step, and running COPY . /tmp/ after it, results in fewer cache invalidations for the RUN step than if you put the COPY . /tmp/ before it.
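The ordering described above, sketched for a hypothetical Python project. The base image, requirements.txt, and the pip command are assumptions used for illustration.

```dockerfile
FROM python:3.12-slim
# Copy only the file the next step depends on.
COPY requirements.txt /tmp/
# This layer is rebuilt only when requirements.txt changes.
RUN pip install --requirement /tmp/requirements.txt
# Copy everything else afterwards; source edits no longer invalidate the
# dependency installation above.
COPY . /tmp/
```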
Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; you should use curl or wget instead. That way you can delete the files you no longer need after they’ve been extracted and you don’t have to add another layer in your image. For example, you should avoid doing things like:
And instead, do something like:
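A sketch contrasting the two approaches. The URL, the /usr/src/things path, and the make target are placeholders, so the build only succeeds once they are replaced with a real archive and project.

```dockerfile
FROM debian:bookworm
RUN apt-get update && apt-get install -y curl make xz-utils \
 && rm -rf /var/lib/apt/lists/*

# Discouraged: the downloaded archive is stored in its own layer and stays
# in the image even if a later instruction deletes it.
#   ADD https://example.com/big.tar.xz /usr/src/things/
#   RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
#   RUN make -C /usr/src/things all

# Preferred: download, extract, and build in one instruction, so the
# archive itself never lands in a layer.
RUN mkdir -p /usr/src/things \
    && curl -SL https://example.com/big.tar.xz \
    | tar -xJC /usr/src/things \
    && make -C /usr/src/things all
```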
For other items (files, directories) that do not require ADD’s tar auto-extraction capability, you should always use COPY.
ENTRYPOINT
The best use for ENTRYPOINT is to set the image’s main command, allowing that image to be run as though it was that command (and then use CMD as the default flags).
Let’s start with an example of an image for the command line tool s3cmd:
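A sketch of the relevant instructions, written as a fragment. It assumes earlier instructions in the same Dockerfile have already installed s3cmd.

```dockerfile
# The image's main command.
ENTRYPOINT ["s3cmd"]
# Default arguments, replaced by whatever is passed to docker run.
CMD ["--help"]
```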
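Now the image can be run like this to show the command’s help (assuming it was tagged s3cmd): docker run s3cmd
Or using the right parameters to execute a command (mybucket is a placeholder bucket name): docker run s3cmd ls s3://mybucket
This is useful because the image name can double as a reference to the binary as shown in the command above.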
The ENTRYPOINT instruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.
For example, the Postgres Official Image uses the following script as its ENTRYPOINT:
Configure app as PID 1
This script uses the exec Bash command so that the final running application becomes the container’s PID 1. This allows the application to receive any Unix signals sent to the container. For more, see the ENTRYPOINT reference.
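The exec pattern can be tried outside a container. The helper below is a simplified, hypothetical script, not the actual Postgres entrypoint: myservice stands in for a real server, and echo stands in for launching it.

```shell
# Work in a scratch directory so nothing is left in the current one.
workdir=$(mktemp -d)

# Hypothetical helper script illustrating the pattern.
cat > "$workdir/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
if [ "$1" = "myservice" ]; then
    # One-time setup (permissions, data directory, ...) would go here.
    shift
    # exec replaces the shell, so in a container the service becomes PID 1.
    exec echo "myservice started with: $*"
fi
# Any other command is run as given, again replacing the shell.
exec "$@"
EOF

out_service=$(sh "$workdir/entrypoint.sh" myservice --verbose)
out_other=$(sh "$workdir/entrypoint.sh" echo hello)
echo "$out_service"
echo "$out_other"
```

Because both branches end in exec, no wrapper shell stays between the container runtime and the final process, so signals reach that process directly.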
The helper script is copied into the container and run via ENTRYPOINT on container start:
This script allows the user to interact with Postgres in several ways.
It can simply start Postgres: docker run postgres
Or, it can be used to run Postgres and pass parameters to the server: docker run postgres postgres --help
Lastly, it could also be used to start a totally different tool, such as Bash: docker run --rm -it postgres bash
VOLUME
The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your Docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
USER
If a service can run without privileges, use USER to change to a non-root user. Start by creating the user and group in the Dockerfile with something like RUN groupadd -r postgres && useradd --no-log-init -r -g postgres postgres.
Consider an explicit UID/GID
Users and groups in an image are assigned a non-deterministic UID/GID in that the “next” UID/GID is assigned regardless of image rebuilds. So, if it’s critical, you should assign an explicit UID/GID.
Due to an unresolved bug in the Go archive/tar package’s handling of sparse files, attempting to create a user with a significantly large UID inside a Docker container can lead to disk exhaustion because /var/log/faillog in the container layer is filled with NULL (\0) characters. A workaround is to pass the --no-log-init flag to useradd. The Debian/Ubuntu adduser wrapper does not support this flag.
Avoid installing or using sudo as it has unpredictable TTY and signal-forwarding behavior that can cause problems. If you absolutely need functionality similar to sudo, such as initializing the daemon as root but running it as non-root, consider using “gosu”.
Lastly, to reduce layers and complexity, avoid switching USER back and forth frequently.
WORKDIR
For clarity and reliability, you should always use absolute paths for your WORKDIR. Also, you should use WORKDIR instead of proliferating instructions like RUN cd … && do-something, which are hard to read, troubleshoot, and maintain.
ONBUILD
An ONBUILD command executes after the current Dockerfile build completes. ONBUILD executes in any child image derived FROM the current image. Think of the ONBUILD command as an instruction the parent Dockerfile gives to the child Dockerfile.
A Docker build executes ONBUILD commands before any command in a child Dockerfile.
ONBUILD is useful for images that are going to be built FROM a given image. For example, you would use ONBUILD for a language stack image that builds arbitrary user software written in that language within the Dockerfile, as you can see in Ruby’s ONBUILD variants.
Images built with ONBUILD should get a separate tag, for example: ruby:1.9-onbuild or ruby:2.0-onbuild.
Be careful when putting ADD or COPY in ONBUILD. The “onbuild” image fails catastrophically if the new build’s context is missing the resource being added. Adding a separate tag, as recommended above, helps mitigate this by allowing the Dockerfile author to make a choice.
Examples for Official Images
These Official Images have exemplary Dockerfiles: