Container Networking
Looking back at the evolution of cloud computing systems, the industry has reached a consensus: while continuous breakthroughs have driven the maturation of computing virtualization and storage virtualization, network virtualization has lagged behind and become a major bottleneck constraining the rapid growth of cloud computing. Features such as network virtualization, multitenancy, and hybrid clouds pose new challenges, to varying degrees, for the secure development of cloud networks.
Container technology provides lightweight virtualization, significantly reducing the resource footprint of each instance and improving the performance of distributed computing systems. Networking, however, remains one of the more complicated parts of a distributed container system.
This part covers container networking, which is divided into host networking and cluster networking.
- Underlying Technologies of Container Networking
Container networks take various forms, but most use a fixed set of underlying technologies, including network namespaces, Linux bridges, and virtual Ethernet interface pairs (veth pairs).
1.1 Network Namespace
A network namespace is a Linux mechanism for network isolation. Once created, a network namespace has an independent network environment with its own network resources, such as network interfaces, routes, and access control rules (iptables), and this environment is isolated from the networks of other namespaces.
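A minimal sketch of this isolation, using the iproute2 tools and a hypothetical namespace named demo:

```bash
# Create an isolated network namespace (the name "demo" is arbitrary).
ip netns add demo

# The new namespace contains only a loopback interface, which is down by default.
ip netns exec demo ip link show

# It also has its own routing table and iptables rules, independent of the host's.
ip netns exec demo ip route show

# Clean up.
ip netns delete demo
```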
1.2 Linux Bridge
A Linux bridge is a virtual switch implemented in the Linux kernel. It connects multiple network interfaces, physical or virtual, at layer 2 so that traffic can be forwarded between them and the attached devices can communicate with one another.
After Docker starts, a Linux bridge named docker0 is created by default. As long as no container has been created, the docker0 bridge has no interfaces attached to it.
When a container (d458f9bd528) is created, Docker creates a virtual network interface for it and attaches that interface to the docker0 bridge.
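This can be observed on the host, assuming the bridge-utils and iproute2 tools are available:

```bash
# Right after Docker starts, docker0 exists but has no interfaces attached.
brctl show docker0

# Once a container is running, its host-side veth interface appears under docker0.
# The same information is available with iproute2:
ip link show master docker0
```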
1.3 Veth Pair
For communication with the host network and with the external network, the container needs to connect to a Linux bridge through a veth pair.
When launching a container (d458f9bd528), Docker creates a veth pair: two interconnected virtual Ethernet interfaces. One end is placed inside the container and becomes its network interface card (NIC) eth0; the other end is attached to the docker0 bridge. Packets sent from within the container therefore pass through eth0 and its peer veth6ed0a8c in turn before reaching the docker0 bridge. In this manner, different containers on the same subnet can communicate with one another.
Run the following command to obtain the peer index (peer_ifindex) of the container's NIC eth0:
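The original listing is not reproduced here; a typical command, using the container ID from the example above and illustrative output, looks like this:

```bash
# Read the veth statistics of eth0 inside the container; peer_ifindex is the
# interface index of the other end of the veth pair on the host.
docker exec d458f9bd528 ethtool -S eth0
# NIC statistics:
#      peer_ifindex: 37
```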
Continue with the following command to find, on the host, the name of the interface whose index is 37:
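A typical lookup, with illustrative output:

```bash
# Find the host interface whose index is 37; it is the veth end attached to docker0
# (the "@if..." suffix and other attributes shown are illustrative).
ip link | grep "^37:"
# 37: veth6ed0a8c@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ... master docker0 ...
```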
Finally, run ethtool on the host-side interface to confirm the index of its peer interface:
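A sketch of this check, assuming the host-side interface is veth6ed0a8c as found above and an illustrative peer index of 36:

```bash
# The peer of veth6ed0a8c is the container's eth0, confirming the two ends of the pair.
ethtool -S veth6ed0a8c
# NIC statistics:
#      peer_ifindex: 36
```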
- Host Networking
Take Docker for example. Currently, the Docker container host network comes in one of the following modes.
- None Mode
In None networking mode, each container has its own network namespace but no network configuration: a container on such a network has only a loopback interface. The user has to add NICs and configure an IP address for the container manually.
Rkt also supports the None networking mode. It is mainly used for testing containers and assigning networks to them afterwards, as well as in scenarios that demand a high level of security and require no network connections.
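A minimal Docker example of None mode, using the busybox image for illustration:

```bash
# The container gets its own, empty network namespace: only the loopback interface exists.
docker run --rm --network none busybox ip addr show
```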
- Bridge Mode
Bridge mode provides a single-host network built on iptables NAT and port mapping. Similar to the NAT networking of virtual machines, this mode allows containers on the same host to communicate with one another, but the IP address assigned to each container cannot be accessed directly from outside the host.
Bridge networking is the default type used by Docker. After installation, Docker creates the docker0 bridge by default and uses a veth pair to connect each container to this bridge, so all containers on the host are located on the same layer 2 network.
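A sketch of bridge mode in practice; the container name web and host port 8080 are arbitrary choices:

```bash
# Run a container on the default bridge network and publish container port 80
# on host port 8080; externally, the service is reachable only through this mapping.
docker run -d --name web -p 8080:80 nginx

# The published port is implemented as an iptables DNAT rule on the host.
iptables -t nat -L DOCKER -n
```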
Figure 2.5 Bridge networking
- Host Mode
In host mode, the Docker daemon, when launching a container, does not create an isolated network environment for it; instead, the container is added to the host's network and shares the host's network namespace (/var/run/docker/netns/default). The container shares the same network configuration as the host (network address, routing table, iptables, and so on) and communicates with the external network through the host's NIC and IP address.
Because the container has no independent network namespace, the ports of the container's services are exposed directly on the host, and no port mapping is required. Consequently, the ports used by the container's services must not conflict with ports already in use on the host.
A container created in this networking mode has access to all network interfaces of the host, but may not reconfigure the host's network stack unless it is deployed in privileged mode. Host networking is the default type used within Apache Mesos: if no network type is specified, a new network namespace is not created for the container, which is instead associated with the host network.
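A quick illustration of host mode with the busybox image:

```bash
# The container shares the host's network namespace, so it sees exactly the
# host's interfaces and IP addresses; no separate eth0 is created for it.
docker run --rm --network host busybox ip addr show
```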
Figure 2.6 Host networking
- Container Mode
The container mode is a special networking type in which a newly created container shares the network namespace of an existing container. The new container does not configure its own NIC or IP address; it shares the IP address and port range of the specified container.
The two containers share only their network configuration; their file systems, process lists, and other resources remain isolated. Processes in the two containers communicate with each other through the loopback NIC.
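A small sketch of container mode; the container name app is arbitrary:

```bash
# Start a container that owns a network namespace and runs a service on port 80.
docker run -d --name app nginx

# Start a second container that joins the network namespace of "app": it shares
# app's IP address and ports, and can reach app's service via localhost.
docker run --rm --network container:app busybox wget -qO- http://localhost:80
```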
Figure 2.7 Container networking
- Cluster Networking
This section takes Docker Swarm and Kubernetes as examples to illustrate how container cluster networking is implemented.
- Docker Swarm
Docker 1.9 and later versions add native support for overlay networking (used by Docker Swarm), which relies on network tunnels for communication between hosts. Containers on the same overlay network can therefore communicate across hosts, whereas containers on different overlay networks cannot communicate with one another even if they run on the same host. This is consistent with the generally accepted notion of an overlay network in cloud computing. Overlay networking in Docker Swarm is implemented with the Virtual Extensible LAN (VXLAN) technology, requires a Linux kernel of version 3.19 or later, and involves the following networks:
(1) docker_gwbridge network
The docker_gwbridge network is a bridge network created when the Docker Swarm cluster is initialized. It is present on every host node in the cluster and enables communication between containers and the hosts on which they run. To check the docker_gwbridge network, run the following command:
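For example (the exact output varies by Docker version; the excerpt below is illustrative):

```bash
docker network inspect docker_gwbridge
# Relevant options in the output typically include:
#   "com.docker.network.bridge.enable_icc": "false",
#   "com.docker.network.bridge.enable_ip_masquerade": "true",
#   "com.docker.network.bridge.name": "docker_gwbridge"
```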
Because the com.docker.network.bridge.enable_icc option is set to false, containers attached to the docker_gwbridge bridge cannot communicate with one another through its interfaces.
(2) Ingress network
The ingress network is created when the Docker Swarm cluster is initialized. It is present on every host node in the cluster and is used to expose services to the external network and to provide the routing mesh. For how an ingress network exposes services to the external network, see section 4.5.1.
(3) Custom overlay network
A custom overlay network is created after the Docker Swarm cluster has been initialized and before services are created. It is mainly used for communication among containers on the same overlay network.
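A minimal sketch of creating such a network and attaching a service to it; the service name my_service and the nginx image are arbitrary, while test matches the network name used in Figure 2.8:

```bash
# Create a user-defined overlay network on a swarm manager node.
docker network create --driver overlay test

# Create a replicated service whose tasks are attached to that overlay network;
# its containers can reach one another across hosts through "test".
docker service create --name my_service --replicas 3 --network test nginx
```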
Figure 2.8 Container networking model for a service
In Figure 2.8, the three containers provide the same service and, in Docker Swarm mode, share the overlay network defined by the user; test is that custom overlay network. Containers on different hosts all connect to docker_gwbridge, but its interfaces cannot be used for communication between containers on the same host. Communication between containers of the same service is instead achieved through the test overlay network.
- Kubernetes
Kubernetes introduces the design of pod objects, where a pod corresponds to a logical host in a particular application. Each service is split by task, and the related processes are packaged into corresponding pods. A pod consists of one or more containers, which run on the same host and share the same network namespace and Linux protocol stack.
A Kubernetes cluster usually involves the following types of communication:
- Communication between containers in the same pod
- Communication between pods on the same host
- Communication between pods across hosts
Figure 2.9 Container networking model within a pod
Communication between containers in the same pod is the simplest case: these containers share the same network namespace and Linux protocol stack, so, as if they were running on the same machine, they communicate directly through the local inter-process communication (IPC) mechanisms of Linux and can reach one another simply via localhost plus a port number.
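For instance, with a hypothetical pod mypod that contains an app container listening on port 80 and a sidecar container, the sidecar can reach the app simply via localhost:

```bash
# Both containers share the pod's network namespace, so "localhost" refers
# to the same loopback interface for both of them.
kubectl exec mypod -c sidecar -- wget -qO- http://localhost:80
```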
Figure 2.10 Container networking model for different pods on the same host
Different pods on the same host connect to the same bridge (docker0) through veth pairs, and each pod dynamically obtains from docker0 an IP address on the same segment as docker0's own address. The default route of each pod points to the IP address of docker0, so all non-local network data is sent to docker0 by default for forwarding. This is equivalent to setting up a local layer 2 network.
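This can be observed from inside a pod in such a deployment, assuming the pod image ships iproute2; the addresses below are illustrative and depend on the node's docker0 configuration:

```bash
# The pod's default route points at docker0's address on the node, so any
# traffic to a non-local destination is handed to the bridge for forwarding.
kubectl exec mypod -- ip route show
# default via 172.17.0.1 dev eth0
# 172.17.0.0/16 dev eth0 scope link  src 172.17.0.5
```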
Figure 2.11 Container networking model for different pods across hosts
Communication between pods across hosts is more complicated. The IP address of each pod is on the same segment as that of docker0 on the host where the pod runs, but docker0 and the host's physical network are on different segments. The key to successful communication is therefore ensuring that network data leaving a pod can traverse the physical network, find the address of the physical host running the peer pod, and finally reach the peer pod.
For this reason, a Kubernetes network requires a module for global network address planning, so that Pod1 can obtain the host IP address of Pod2 when sending data to it. The data sent by Pod1 goes through the docker0 bridge of Host1 to Host1's physical NIC eth0, then over the physical network to Host2's physical NIC eth0 and its docker0 bridge, before finally reaching Pod2.
In most private container cloud environments, it is common practice to implement the cluster network with a third-party open-source network plug-in such as Flannel[i] or Calico[ii]. Flannel is an overlay networking tool designed by the CoreOS team for Kubernetes; it gives every host in the Kubernetes cluster its own subnet within a single integrated address space.
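On a node running Flannel, the subnet leased to that node can be seen in the file Flannel writes for the container runtime; the values below are illustrative:

```bash
# FLANNEL_NETWORK is the cluster-wide pod address space; FLANNEL_SUBNET is the
# per-node slice from which this node's pods are assigned their addresses.
cat /run/flannel/subnet.env
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.1.1/24
# FLANNEL_MTU=1450
```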
Figure 2.12 Kubernetes Flannel network architecture
(To be continued)
[i] CoreOS Flannel, https://github.com/coreos/flannel
[ii] Calico, https://github.com/projectcalico