Hi Podman team,
I came across unexpected systemd warnings when running systemd inside a
container. I emailed systemd-devel (this email summarises the thread, which
you can find at
https://lists.freedesktop.org/archives/systemd-devel/2023-January/048723....)
and Lennart suggested emailing here. Any thoughts would be great!
There are two different warnings, seen in different scenarios. Both are
cgroup-related, and I believe they are related to each other because both
satisfy the points below.
The first warning is seen after 'podman restart $CTR', coming from
https://github.com/systemd/systemd/blob/v245/src/shared/cgroup-setup.c#L279:
Failed to attach 1 to compat systemd cgroup
/machine.slice/libpod-5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313.scope/init.scope:
No such file or directory
The second warning is seen on every boot when using '--cgroupns=private',
coming from
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967:
Couldn't move remaining userspace processes, ignoring: Input/output error
Failed to create compat systemd cgroup /system.slice: No such file or
directory
...
Both warnings are seen together when restarting a container that uses a
private cgroup namespace.
To summarise:
- The warnings are seen when running the container on a CentOS 8 host, but
not on an Ubuntu 20.04 host
- I assume this issue is specific to cgroups v1, based on the warning
messages
- Disabling SELinux on the host with 'setenforce 0' makes no difference
- Seen with systemd v245 but not with v230
- Seen both with '--privileged' and, without it, with '--cap-add
sys_admin'
- Changing the cgroup driver/manager doesn't seem to have any effect
- The same is seen with Docker, except that when running privileged the
first warning becomes a fatal error after hitting "Failed to open pin file:
No such file or directory" (coming from
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2972) and
the container exits (however, Docker doesn't claim to support systemd)
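For the cgroups v1 point above, one quick way to check which hierarchy a
host uses is the filesystem type mounted at /sys/fs/cgroup. This is only a
minimal sketch: the helper name classify_cgroup_fs is made up for
illustration, and it assumes GNU stat (coreutils) is available.

```shell
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a
# cgroup hierarchy version. On a v2 (unified) host the mount is cgroup2fs;
# on a v1 (legacy or hybrid) host it is a tmpfs holding per-controller mounts.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Fall back to "none" if the path is missing or stat is not GNU stat.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
echo "cgroup hierarchy: $(classify_cgroup_fs "$fstype")"
```

Running this on both the CentOS 8 and the Ubuntu 20.04 host would show
whether the two actually differ in cgroup version.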
Some extra details copied from the systemd email thread:
- On first boot PID 1 can be found in
/sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope/cgroup.procs,
whereas when the container restarts the 'init.scope/' directory does not
exist and PID 1 is instead found in the parent (container root) cgroup
/sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/cgroup.procs
(also reflected by /proc/1/cgroup). This is strange because systemd must be
the one to create this cgroup dir in the initial boot, so I'm not sure why
it wouldn't on subsequent boot.
- I confirmed that the container has permissions to create the dir by
executing a 'mkdir' in
/sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/ inside the
container after the restart, so I have no idea why systemd is not creating
the 'init.scope/' dir. I notice that inside the container's systemd cgroup
mount 'system.slice/' does exist, but 'user.slice/' does not (both
exist on normal boot).
This should be reproducible using the following:
cat << EOF > Dockerfile
FROM ubuntu:20.04
RUN apt-get update -y && apt-get install systemd -y && ln -s /lib/systemd/systemd /sbin/init
ENTRYPOINT ["/sbin/init"]
EOF
podman build . --tag ubuntu-systemd
podman run -it --name ubuntu --privileged --cgroupns private ubuntu-systemd
podman restart ubuntu
Thanks,
Lewis