Hi Podman Developers and Users,
Thank you very much for Podman and related tools. It's a fantastic project.
I'm trying to convert my current container host VPS into a number of rootless pods and
I'm thinking about pod networking. Some pods need to communicate with each other (for
example, HAProxy has to be able to connect to both WordPress and Nextcloud) and some
don't (WordPress and Nextcloud don't need to talk to each other). Following the
principle of least privilege, pods that don't need to communicate shouldn't be allowed
to.
The obvious solution is to use the default slirp4netns settings and publish ports on
127.0.0.1, or maybe on a dedicated private IP assigned to a dummy interface created by
"ip link add name something type dummy". That means that, for example, WordPress will
listen on 8080 and Nextcloud on 8081 (more info in Brent Baude's article
https://www.redhat.com/sysadmin/container-networking-podman).
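To make the loopback variant concrete, here is a minimal sketch that only prints the
podman commands it would use. The container names, the image references and the
assumption that both images serve HTTP on port 80 are mine and purely illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) podman commands that publish a container port on
# the host loopback address only. Names, images and container port 80 are
# illustrative assumptions.
loopback_publish_cmd() {
    local name=$1 host_port=$2 image=$3
    printf 'podman run -d --name %s -p 127.0.0.1:%s:80 %s\n' \
        "$name" "$host_port" "$image"
}

loopback_publish_cmd wordpress 8080 docker.io/library/wordpress:latest
loopback_publish_cmd nextcloud 8081 docker.io/library/nextcloud:latest
```

HAProxy on the host would then use 127.0.0.1:8080 and 127.0.0.1:8081 as its backends.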
As Dan Walsh often mentions in his Podman presentations, one of the best things about
Podman is that it's not just one tool: it's Podman/libpod, Buildah, Skopeo, CRI-O and
runc, and each of them does one thing and does it well. That enables me to try some DIY
networking.
DIY:
==========================================================
### create bridge using "ip"
$ sudo ip link add name bridge1 type bridge
$ sudo ip link set dev bridge1 up
$ sudo ip address add 10.11.22.1/24 dev bridge1
### or by "systemd-networkd"
$ sudo systemctl --now enable systemd-networkd
$ cat << EOF | sudo tee /etc/systemd/network/bridge1.netdev
[NetDev]
Name=bridge1
Kind=bridge
EOF
$ cat << EOF | sudo tee /etc/systemd/network/bridge1.network
[Match]
Name=bridge1
[Network]
Address=10.11.22.1/24
EOF
### run rootless container
$ sudo mkdir /test-www
$ echo "Hello, World!" | sudo tee /test-www/index.html
$ cont_id=$(podman run --net=none -d --volume=/test-www:/usr/share/nginx/html \
    docker://docker.io/library/nginx:latest)
$ [[ ${cont_id} =~ ^[0-9a-z]{64}$ ]] && printf '%s\n' "OK: \"${cont_id}\""
OK: "5811ac2e25dec942fd22c2e83657d103bbce199aa7775d7f4d10bf5c53af4778"
$ net_ns_name="cont-${cont_id}"
$ cont_pc_id=$(podman inspect -f '{{.State.Pid}}' "${cont_id}")
$ [[ ! -d /var/run/netns ]] && sudo mkdir -v /var/run/netns
$ sudo ln -sfTv "/proc/${cont_pc_id}/ns/net" "/var/run/netns/${net_ns_name}"
'/var/run/netns/cont-5811ac2e25dec942fd22c2e83657d103bbce199aa7775d7f4d10bf5c53af4778' -> '/proc/1217/ns/net'
$ ip netns list
cont-5811ac2e25dec942fd22c2e83657d103bbce199aa7775d7f4d10bf5c53af4778
$ sudo ip link add veth300 type veth peer name veth300p
$ sudo ip link set dev veth300 master bridge1
$ sudo ip link set veth300p netns "${net_ns_name}"
### optional: rename peer in namespace
$ sudo ip -netns "${net_ns_name}" link set veth300p name eth0
$ sudo ip link set dev veth300 up
$ sudo ip -netns "${net_ns_name}" link set dev eth0 up
$ sudo ip -netns "${net_ns_name}" address add 10.11.22.50/24 dev eth0
$ sudo ip -netns "${net_ns_name}" route add default via 10.11.22.1
### to make it work, the host has to have routing enabled
$ sudo sysctl -w net.ipv4.ip_forward=1
### and iptables/nftables configured
$ sudo nft add table ip nat
$ sudo nft add chain ip nat nat-prerouting \
    "{ type nat hook prerouting priority -100; policy accept; }"
$ sudo nft add chain ip nat nat-postrouting \
    "{ type nat hook postrouting priority 100; policy accept; }"
$ sudo nft add rule ip nat nat-prerouting iifname "eth0" \
    tcp dport { 80, 8080, 8081 } counter dnat 10.11.22.50
$ sudo nft add rule ip nat nat-postrouting oifname "eth0" counter masquerade
### and to test that the container can go out
$ podman exec -it "${cont_id}" curl https://1.1.1.1/
<a lot of html>
### and to access the container (the web server)
$ curl http://<container host public IP>/
Hello, World!
==========================================================
For those that don't want to read the code:
1. create bridge
2. run the container without slirp4netns (--net=none), so it has only a loopback
interface
3. expose the container's network namespace to "ip netns" by symlinking
/proc/<PID>/ns/net into /var/run/netns
4. create a virtual ethernet (veth) pair, attach one interface to the new bridge and
move the second into the container's network namespace
5. make it work by assigning IP addresses, adding a default route in the namespace,
enabling routing on the host and configuring NAT on the host firewall
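
Steps 3 to 5 (without the host routing and NAT part) could be wrapped into one helper,
roughly like the sketch below. The names "run" and "attach_to_bridge", the DRY_RUN
switch and the PID-based namespace name are hypothetical and not part of my setup
above. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: attach a --net=none container to an existing bridge.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
# Set DRY_RUN=0 to really run them (needs sudo).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# attach_to_bridge PID BRIDGE VETH ADDRESS/PREFIX GATEWAY
attach_to_bridge() {
    local pid=$1 bridge=$2 veth=$3 addr=$4 gw=$5
    # Assumption: name the namespace after the PID to keep the name short.
    local ns="cont-${pid}"

    # Step 3: make the container's namespace visible to "ip netns".
    run sudo mkdir -p /var/run/netns
    run sudo ln -sfT "/proc/${pid}/ns/net" "/var/run/netns/${ns}"

    # Step 4: veth pair, one end on the bridge, the other in the namespace.
    run sudo ip link add "${veth}" type veth peer name "${veth}p"
    run sudo ip link set dev "${veth}" master "${bridge}"
    run sudo ip link set "${veth}p" netns "${ns}"
    run sudo ip -netns "${ns}" link set "${veth}p" name eth0

    # Step 5: bring the links up, assign the address and the default route.
    run sudo ip link set dev "${veth}" up
    run sudo ip -netns "${ns}" link set dev eth0 up
    run sudo ip -netns "${ns}" address add "${addr}" dev eth0
    run sudo ip -netns "${ns}" route add default via "${gw}"
}

# Same values as in the transcript above, printed only.
attach_to_bridge 1217 bridge1 veth300 10.11.22.50/24 10.11.22.1
```

With a real container the PID would come from
podman inspect -f '{{.State.Pid}}' as in the transcript.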
Note: At the moment this is not possible for pods, since pods in the current stable
version of Podman don't support --net=none. But that will change in 3.0:
https://github.com/containers/podman/issues/9165,
https://github.com/mheon/libpod/commit/6bd3a6bcabda682243f531bacf3659b95d...,
https://github.com/containers/podman/releases/tag/v3.0.0-rc3.
Thank you Matthew Heon!
The benefits I get by doing this:
1. Rootless containers, no need to run rootful for this.
2. Easy to firewall: for example, interfaces in one bridge can connect to interfaces in
another bridge but not the other way around
3. Easy to understand and visualize
4. Can be integrated with VLANs, Open vSwitch VXLANs and anything that uses bridges (QEMU
VMs...)
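
For benefit 2, this is roughly what I have in mind with nftables. It is an untested
sketch: the second bridge "bridge2", the table name "bridge_isolation" and the helper
name "one_way_ruleset" are hypothetical. It assumes each bridge has its own subnet, so
that traffic between them is routed by the host and passes the forward hook.

```shell
#!/usr/bin/env bash
# Sketch: print an nftables ruleset that lets SRC open connections to DST
# but drops new connections from DST to SRC. Replies still pass because
# established/related traffic is accepted first.
# The bridge and table names are illustrative assumptions.
one_way_ruleset() {
    local src=$1 dst=$2
    cat <<EOF
table inet bridge_isolation {
    chain forward {
        type filter hook forward priority 0; policy accept;
        ct state established,related accept
        iifname "${dst}" oifname "${src}" drop
    }
}
EOF
}

# Print the ruleset for bridge1 -> bridge2 only.
one_way_ruleset bridge1 bridge2
```

To load it, the output could be piped into "sudo nft -f -".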
Could you please tell me whether this is a good idea?
Thank you.
Kind regards,
Rudolf Vesely