Lewis Gaul <lewis.gaul(a)gmail.com> writes:
Hi Podman team,
I came across an unexpected systemd warning when running inside a container. I emailed systemd-devel (this email summarises the thread, which you can find at https://lists.freedesktop.org/archives/systemd-devel/2023-January/048723....) and Lennart suggested emailing here. Any thoughts would be great!
There are two different warnings, seen in different scenarios. Both are cgroups-related, and I believe they are related to each other given that they both satisfy the points below.

The first warning is seen after 'podman restart $CTR', coming from https://github.com/systemd/systemd/blob/v245/src/shared/cgroup-setup.c#L279:

Failed to attach 1 to compat systemd cgroup /machine.slice/libpod-5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313.scope/init.scope: No such file or directory

The second warning is seen on every boot when using '--cgroupns=private', coming from https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967:

Couldn't move remaining userspace processes, ignoring: Input/output error
Failed to create compat systemd cgroup /system.slice: No such file or directory
...

Both warnings are seen together when restarting a container that uses a private cgroup namespace.
To summarise:
- The warnings are seen when running the container on a CentOS 8 host, but not on an Ubuntu 20.04 host
- The issue is assumed to be specific to cgroups v1, based on the warning messages
- Disabling SELinux on the host with 'setenforce 0' makes no difference
- Seen with systemd v245 but not with v230
- Seen both with '--privileged' and in a non-privileged container with '--cap-add sys_admin'
- Changing the cgroup driver/manager doesn't seem to have any effect
- The same is seen with docker, except that when running privileged the first warning becomes a fatal error after hitting "Failed to open pin file: No such file or directory" (coming from https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2972) and the container exits (however, docker doesn't claim to support systemd)
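Since the points above hinge on the host running cgroups v1, a quick way to confirm which mode a host (or container) is in is to inspect /proc/1/cgroup: on a pure v2 (unified) setup the file contains only a single "0::<path>" line, while v1 adds numbered controller hierarchies. A minimal sketch — the cgroup_mode helper is hypothetical, not part of podman or systemd:

```shell
#!/bin/sh
# Hypothetical helper: classify a /proc/<pid>/cgroup file as cgroup v1 or v2.
# On a pure v2 (unified) host the file holds a single "0::<path>" line;
# on v1 there are additional numbered controller hierarchy lines.
cgroup_mode() {
    if grep -q '^[1-9]' "$1"; then
        echo v1
    else
        echo v2
    fi
}

# Usage on a live system:
#   cgroup_mode /proc/1/cgroup
```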
I am afraid you are using a combination that is not well tested. systemd takes different actions depending on what capabilities you give it, and I am not sure how CAP_SYS_ADMIN affects it. My suggestion is to avoid granting the systemd container more capabilities than it needs, since in this case you don't want systemd to manage your system.

Have you considered using cgroup v2? cgroup delegation works much better there, and it is safe.
Giuseppe
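Following Giuseppe's suggestion: CentOS 8 boots with cgroups v1 by default, but the host can be switched to the unified (v2) hierarchy via a kernel command-line flag. A sketch of the host-side change, assuming the stock RHEL/CentOS grubby tool (requires a reboot, run as root):

```shell
# On the CentOS 8 host: enable the unified cgroup v2 hierarchy for all
# installed kernels, then reboot.
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
reboot

# After reboot, /sys/fs/cgroup should be a single cgroup2 mount:
#   stat -fc %T /sys/fs/cgroup   ->  cgroup2fs
```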
Some extra details copied from the systemd email thread:
- On first boot PID 1 can be found in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope/cgroup.procs, whereas when the container restarts the 'init.scope/' directory does not exist and PID 1 is instead found in the parent (container root) cgroup /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/cgroup.procs (also reflected by /proc/1/cgroup). This is strange because systemd must be the one to create this cgroup dir on the initial boot, so I'm not sure why it wouldn't on subsequent boots.
- I confirmed that the container has permission to create the dir by executing a 'mkdir' in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/ inside the container after the restart, so I have no idea why systemd is not creating the 'init.scope/' dir. I notice that inside the container's systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' does not (both exist on normal boot).
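To make the before/after comparison above easier to repeat, the directory checks can be scripted. A minimal sketch — check_scopes is a hypothetical helper; on the host the root would be /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope:

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected systemd cgroup
# directories exist under a container's root cgroup. On a healthy first
# boot all three are present; after 'podman restart' the thread observes
# init.scope and user.slice missing.
check_scopes() {
    cgroot=$1
    for scope in init.scope system.slice user.slice; do
        if [ -d "$cgroot/$scope" ]; then
            echo "$scope: present"
        else
            echo "$scope: missing"
        fi
    done
}

# Usage on the host (full container ID in place of <ctr-id>):
#   check_scopes /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope
```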
This should be reproducible using the following:

cat << EOF > Dockerfile
FROM ubuntu:20.04
RUN apt-get update -y && apt-get install systemd -y && ln -s /lib/systemd/systemd /sbin/init
ENTRYPOINT ["/sbin/init"]
EOF

podman build . --tag ubuntu-systemd
podman run -it --name ubuntu --privileged --cgroupns=private ubuntu-systemd
podman restart ubuntu
Thanks,
Lewis
_______________________________________________
Podman mailing list -- podman(a)lists.podman.io
To unsubscribe send an email to podman-leave(a)lists.podman.io