Container Security Protection – Runtime Security
Runtime Security
- Security Configuration for Container Launch
A container runs on the host as a process. Running container processes are isolated from one another. Each has its own file system, networking, and isolated process tree separate from the host. The following sections detail how to use the docker run[1] command to define a container’s resources at runtime.
1.1 Kernel Option Configuration
(1) Enable AppArmor.
AppArmor protects the Linux system and applications from various threats by enforcing security policies, which are defined in AppArmor profiles (for details, see section 4.1.4.2). Users can create their own AppArmor profiles for containers or use Docker's default AppArmor profile.
# docker run --security-opt="apparmor=$PROFILE"
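To confirm which AppArmor profile a running container is using, one possible check is to inspect its AppArmorProfile field (the container name here is a placeholder):
# docker inspect --format '{{ .AppArmorProfile }}' <container_name>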
(2) Set SELinux.
SELinux provides an effective mandatory access control mechanism for Linux. Users are advised to configure SELinux when starting the Docker service and to use this security option when running containers as well. Creating or importing an SELinux policy template can effectively secure Docker containers.
# docker run --security-opt label=level:TopSecret
(3) Restrict the use of kernel capabilities.
The Linux kernel capabilities mechanism makes it possible to grant a Docker container only the kernel capabilities it actually needs. Removing unnecessary capabilities narrows the attack surface, which in turn enhances container security. For example, the NET_ADMIN, SYS_ADMIN, and SYS_MODULE capabilities can usually be removed, as they are rarely required by containers.
Users can run the following commands to add or delete capabilities as required:
# docker run --cap-add={"NET_ADMIN","SYS_ADMIN"}
# docker run --cap-drop={"NET_ADMIN","SYS_ADMIN"}
A recommended practice is to delete all capabilities first by using the following command and then add those that are required:
# docker run --cap-drop=all --cap-add={"NET_ADMIN","SYS_ADMIN"}
In addition, particular attention should be paid to the --privileged option, which grants a container all Linux kernel capabilities and lifts all restrictions imposed by the device cgroup, meaning the container has almost the same access to the host as processes running directly on the host. Therefore, unless absolutely necessary, users are advised not to run a container in privileged mode. Conversely, a process can set the no_new_privs bit in the Linux kernel, which ensures that neither the process nor its child processes gain additional privileges through suid or sgid bits. In Docker, this is enabled with the following security option:
# docker run --security-opt="no-new-privileges:true"
(4) Do not disable the default seccomp profile.
Secure computing mode (seccomp) has been a security feature of the Linux kernel since version 2.6.12. On a Linux system, most system calls are directly exposed to user-mode programs, but not all of them are required, and the abuse of system calls by insecure code poses a security threat to the system. seccomp restricts the system calls a program may use, thereby reducing the system's exposure to threats and putting the program into a "secure" mode.
By default, Docker applies a default seccomp profile. Since Docker 1.10, this default profile blocks certain system calls even if the related capabilities have been added to a container with the --cap-add option. In such cases, users can supply a custom seccomp profile to allow the required system calls on a whitelist basis.
# docker run --security-opt="seccomp=profile.json"
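As a sanity check, the security options currently enabled on the Docker daemon, including the seccomp profile in use, can typically be listed as follows:
# docker info --format '{{ .SecurityOptions }}'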
(5) Set ulimits when necessary.
Generally, the Docker daemon uses its default configuration for container resource allocation. If a container requires custom resource limits, the --ulimit option can be set to override the defaults when the container is launched.
# docker run --ulimit nofile=1024:1024
(6) Do not share kernel namespaces.
Linux implements isolation through kernel namespaces. For example, it uses the process ID (PID) namespace to isolate processes. That is to say, processes in different namespaces can have the same PID. If a container has the same PID namespace as the host, it is possible to view or even kill all processes on the host from within the container.
The same is true for other types of namespaces. If containers share a namespace with the host, container isolation is effectively nullified. Therefore, when launching a container, users should be cautious about namespace-related parameters such as docker run --pid=host (uses the PID namespace of the host), docker run --ipc=host (uses the IPC namespace of the host), and docker run --uts=host (uses the UTS namespace of the host).
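To audit whether an existing container shares namespaces with the host, the corresponding HostConfig fields can be inspected; a possible check (the container name is a placeholder) is:
# docker inspect --format 'Pid={{ .HostConfig.PidMode }} Ipc={{ .HostConfig.IpcMode }} Uts={{ .HostConfig.UTSMode }}' <container_name>
A value of host for any of these fields indicates that the container shares that namespace with the host.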
1.2 Resource Management and Control
To prevent DoS attacks caused by system resource exhaustion, Docker can use specific parameters to limit containers' usage of resources such as CPU, memory, and disk, thereby preventing any single container from monopolizing resources.
(1) Limit containers’ memory usage.
By default, each container on a Docker host can use the host's available memory without limit. Users can cap a container's memory usage to prevent it from exhausting host resources, which could cause a denial of service and disrupt the normal running of other containers on the same host.
Specifically, the -m or --memory parameter can be used with the docker run command:
# docker run --memory 256m
(2) Limit containers’ CPU usage.
By default, Docker allocates CPU time evenly among all containers on the same host. Users can use the --cpu-shares option to set a relative priority. The CPU share mechanism lets a container take precedence over others for CPU usage and prevents lower-priority containers from hogging the CPU, which guarantees the proper running of higher-priority containers and effectively prevents CPU resource exhaustion.
For example, the -c or --cpu-shares parameter can be used with the docker run command:
# docker run --cpu-shares 512
By default, each newly created container has a CPU share of 1024, which serves as the baseline relative weight. In the preceding example, the CPU share of the container is set to 512, meaning that under CPU contention it receives half the CPU time of a container running with the default share.
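If a hard cap rather than a relative weight is needed, newer Docker versions also provide the --cpus option. As an illustrative sketch (the image name is a placeholder), the following limits a container to at most half of one CPU core:
# docker run --cpus 0.5 <image>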
(3) Limit containers’ storage usage.
If data isolation is not achieved between containers or between the host and the containers on it, attackers can easily obtain important data, causing security risks. A better solution is to rely on the copy-on-write (CoW) mechanism, which lets all running containers share a read-only base file system while each container writes its changes to its own writable layer. This avoids the risks arising from data shared directly between containers and at the same time achieves data isolation.
Moreover, strict control should be exercised over containers' mounted directories. Sensitive host directories such as /dev, /boot, /etc, and /sys must never be mounted as container volumes. In addition, users should be especially cautious when sharing host devices with containers through --device, because, by default, containers have read/write access to these devices, which even makes it possible to remove devices from the host. Therefore, it is not advisable to share host devices with containers directly; if this has to be done, the corresponding permissions should be properly restricted.
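If a host device must be shared, its access can be narrowed with the r (read), w (write), and m (mknod) permission flags of the --device option; the device path and image name below are only examples:
# docker run --device=/dev/sda:/dev/sda:r <image>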
1.3 Network Configuration
(1) Do not use the host networking mode on Docker hosts in production environments.
In the default bridge networking mode, each container is given its own virtualized network protocol stack, whereas the host networking mode reuses the host's local network stack and therefore delivers better network performance. For this reason, containers may run in host mode in scenarios where high network performance is required.
Note, however, that a container using the host's network stack can "see" all network interfaces of the host and gains full access to local system services such as D-Bus. For this reason, host mode is considered insecure and is not recommended in production environments.
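To find containers that are already running in host networking mode, a possible audit command is:
# docker ps --quiet | xargs docker inspect --format '{{ .Name }}: NetworkMode={{ .HostConfig.NetworkMode }}'
Containers whose NetworkMode is host should be reviewed and, where possible, moved to a bridge or custom network.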
(2) Do not use the default bridge docker0 of Docker.
In the default bridge mode, all containers on the Docker host are connected to one another via the docker0 bridge. This mode brings in the risks of ARP spoofing, sniffing, and broadcast storm attacks on the local area network (LAN), thus threatening the security of containers. Therefore, users are advised not to use the default docker0 bridge when running containers.
It is recommended that a custom network be used to interconnect containers. For example, users can run the following commands to set up their own network on the Docker host:
# docker network create --subnet 102.102.0.0/24 test
# docker run --network test …
Alternatively, a container cluster management platform, such as Docker Swarm or Kubernetes, can be leveraged to create an overlay network of clusters.
# docker network create -d overlay overlaynetwork-test
# docker run --network overlaynetwork-test …
(3) Map only necessary ports.
There are two points to note regarding port usage. On the one hand, on Linux, common users should not, by default, use privileged ports (those numbered below 1024). Privileged ports are usually used to send and receive sensitive and privileged data, so mapping container ports onto privileged host ports may have severe consequences.
Of course, binding an HTTP service to port 80 or an HTTPS service to port 443 is a legitimate exception and is not within the scope of this discussion. Other privileged ports, such as 22 (SSH), 21 (FTP), 25 (SMTP), and 110 (POP3), should not be used as mapping targets.
On the other hand, Dockerfiles of container images must be strictly audited to ensure that only necessary ports are exposed.
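When publishing ports, map the container port only to the specific, non-privileged host port that is actually needed; an illustrative example (nginx is just a sample image) is:
# docker run -d -p 8080:80 nginx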
(4) Do not run unnecessary services, such as SSH, in containers.
It is not unusual for people to treat containers like virtual machines, for example by worrying about how to enter a container for debugging and how to configure sshd. To use containers properly, we must adopt a containerization mindset and see containers for what they are: in a microservice architecture, a container is essentially the minimum runtime environment, comprising one or more processes and the dependencies required to execute them.
Running services such as SSH and Telnet in a container adds no functionality to the microservice. Instead, it introduces a series of security threats and complicates security O&M, which would then involve, for example, SSH access policy and security compliance management, key and password management for containers, and security upgrades of the SSH service. Therefore, in production environments, SSH and other unnecessary services should be excluded from containers.
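If interactive access to a running container is occasionally needed for debugging, docker exec can be used instead of an in-container SSH service; the container name below is a placeholder:
# docker exec -it <container_name> /bin/sh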
1.4 Other Configurations
(1) Configure a container restart policy.
If the number of restart attempts is not limited (for example, when the restart policy is set to always), Docker will continuously attempt to restart the container until it succeeds, which, in extreme situations, may crash the host.
When a container exits due to an error, the right thing to do is limit the number of container restart attempts and find the root cause instead of attempting to restart the container infinitely. For Docker, this number can be limited by setting the on-failure value. For the sake of security, it is recommended that this value be set to 5.
# docker run --restart=on-failure:5 redis
(2) Do not run the container service as root.
When a container runs as root, the root user inside the container is, by default, the same UID 0 as root on the host, so a container escape can give an attacker root access to the host. To prevent attacks that exploit this, the container service should be launched by a common (non-root) user rather than root, and no single container should be granted more privileges or resources than it needs.
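One way to do this is to specify a non-root user and group when launching the container (the UID/GID and image name below are examples); the USER instruction in the Dockerfile achieves the same effect at build time:
# docker run --user 1000:1000 <image>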
(3) Check the health status of running containers.
If the HEALTHCHECK instruction is not specified in a container image, users are advised to add the --health-cmd option to check the health status of the running container. For example, the option can be set as follows:
# docker run --health-cmd='stat /etc/passwd || exit 1'
(4) Restrict Linux kernel capabilities within containers.
The default Linux kernel capabilities granted by Docker include chown, dac_override, fowner, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, setfcap, and audit_write. Unnecessary ones can be removed with the --cap-drop option and, when necessary, added back with the --cap-add option.
- Runtime Security Monitoring and Audit
The monitoring and audit mechanism is a common approach to ensuring service security. Please read on to learn how to perform security monitoring and audits in the container environment.
2.1 Runtime Monitoring
Unlike conventional environments, the container environment requires monitoring and auditing at the levels of hosts, container instances, and images to ensure the stable running of containers.
(1) Host monitoring
Host monitoring in the container environment is the same as that in conventional computing environments. It generally covers the host’s operating system, CPU, memory, processes, file system, and network status. In O&M, host monitoring is nothing new and there are quite a few open-source tools available to implement it, such as Performance Co-Pilot[2], Icinga[3], and Munin[4].
(2) Container instance monitoring
Monitoring of container instances covers basic information of containers on the host (container ID, image name, time of creation, status, port information, and container name), and usage of various resources by containers, including the CPU, memory, block I/O, and network resources. The docker stats command can be used to view the preceding information:
# docker stats 9d83d3116584
CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT     MEM %   NET I/O         BLOCK I/O         PIDS
9d83d3116584   dev_proxy_1   0.00%   14.86MiB / 125.9GiB   0.01%   286MB / 286MB   14.3MB / 8.19kB   25
[1] https://docs.docker.com/engine/reference/run/
[2] https://pcp.io/
[3] https://www.icinga.com/
[4] http://munin-monitoring.org/