On 2019-08-10 13:17, Max Bigras wrote:
Given an alpine:3.10.1 image
```
podman pull alpine:3.10.1
```
And a unit file foo.service
```
[Service]
ExecStart=/usr/bin/podman run --name %N --rm --tty alpine:3.10.1 sleep 99999
ExecStop=/usr/bin/podman stop %N
```
And starting `foo.service` with `systemctl`
```
# systemctl daemon-reload
# systemctl start foo.service
```
I don't see my `sleep` process in `foo.service` status:
```
# systemctl status foo.service | head
● foo.service
   Loaded: loaded (/etc/systemd/system/foo.service; static; vendor preset: enabled)
   Active: active (running) since Sat 2019-08-10 19:58:05 UTC; 40s ago
 Main PID: 15524 (podman)
    Tasks: 9
   Memory: 7.3M
      CPU: 79ms
   CGroup: /system.slice/foo.service
           └─15524 /usr/bin/podman run --name foo --rm --tty alpine:3.10.1 sleep 99999
```
I see `conmon` land in a different cgroup, visible with the
`systemd-cgls` command:
```
# systemd-cgls
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─machine.slice
│ ├─libpod-conmon-c598f5a0c84881c69dcd69c5af981dd5071385138e45ce0c3b94dcc5308953a
│ │ └─15648 /usr/bin/conmon -s -c c598f5a0c84881c69dcd69c5af981dd5071385138e45ce0
│ └─libpod-c598f5a0c84881c69dcd69c5af981dd5071385138e45ce0c3b94dcc5308953a5.scope
│   └─15662 sleep 99999
├─system.slice
│ ├─mdadm.service
│ │ └─880 /sbin/mdadm --monitor --pid-file /run/mdadm/monitor.pid --daemonise --s
│ ├─foo.service
│ │ └─15524 /usr/bin/podman run --name foo --rm --tty alpine:3.10.1 sleep 99999
```
From listening to YouTube presentations about Podman, I thought that
because Podman uses a traditional fork/exec model, all my processes
would show up in the same `systemctl status` output and be in the same
control group managed by systemd.
Looking at the output of `ps` also shows that the `conmon` process is
the parent of the `sleep` process, and that `conmon` itself is parented
to PID 1 rather than to the `podman` process:
```
# ps -Heo pid,ppid,comm,cgroup
15524     1 podman   11:memory:/system.slice/foo.service,8:pids:/system.sl
15648     1 conmon   11:memory:/machine.slice/libpod-conmon-c598f5a0c84881
15662 15648 sleep    11:memory:/machine.slice/libpod-c598f5a0c84881c69dcd6
```
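To follow the parent chain programmatically instead of reading the PID/PPID columns by eye, each process's parent PID can be read from the `PPid:` line of `/proc/<pid>/status`. This is a minimal sketch; `parent_chain` is a hypothetical helper name, not part of Podman or systemd:

```shell
#!/bin/sh
# parent_chain PID: print "PID COMM" for PID, its parent, its grandparent, ...
# stopping at PID 1 or when a parent is not visible (PPid 0 or unreadable).
parent_chain() {
    pid="$1"
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        comm=$(cat "/proc/$pid/comm" 2>/dev/null) || break
        echo "$pid $comm"
        if [ "$pid" -eq 1 ]; then
            break
        fi
        # The "PPid:" line in /proc/<pid>/status holds the parent's PID.
        pid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    done
}

# Example: walk up from the current shell.
parent_chain "$$"
```

On the host in this thread, `parent_chain 15662` would be expected to go from `sleep` to `conmon` (15648) and then to PID 1, never passing through the `podman` process (15524).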
Instead, `conmon` appears to be in a `scope` unit named:
```
libpod-conmon-c598f5a0c84881c69dcd69c5af981dd5071385138e45ce0c3b94dcc5308953a5.scope
```
Why don't `conmon` and `sleep` land in the same `foo.service` systemd unit?
_______________________________________________
Podman mailing list -- podman(a)lists.podman.io
To unsubscribe send an email to podman-leave(a)lists.podman.io
All Linux containers, at present, will create and manage their own
CGroups, independent of systemd (with a few potential exceptions).
This is mostly done so they can independently manage resources, though
it is also necessary to do some common container operations - the
'podman stats' and 'podman top' commands, for example, track processes
in the container by its CGroup.
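To see which CGroup a given process belongs to (the same information the `cgroup` column of `ps` shows), one can read `/proc/<pid>/cgroup` directly. A minimal sketch; the function name `show_cgroups` is hypothetical:

```shell
#!/bin/sh
# show_cgroups PID: print each "hierarchy-id:controllers:path" line for PID.
# On CGroup v1 there is one line per hierarchy; on v2 a single "0::/path" line.
show_cgroups() {
    pid="$1"
    if [ ! -r "/proc/$pid/cgroup" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    cat "/proc/$pid/cgroup"
}

# Example: inspect the current shell.
show_cgroups "$$"
```

Run against the `sleep` process from the question (`show_cgroups 15662`), this should list the same `machine.slice/libpod-…` paths that `ps` reported, not `system.slice/foo.service`.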
The exception I mentioned was rootless containers in a CGroup v1
environment. There is no support for delegation to rootless containers
in the v1 hierarchy, so the containers have no permission to create
CGroups.
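Whether a host is in the rootless-on-v1 situation described above can be checked from the filesystem type mounted at `/sys/fs/cgroup`: it is `cgroup2fs` on a unified (v2) host and `tmpfs` on a legacy or hybrid (v1) host. A minimal sketch; `cgroup_version` is a hypothetical helper name:

```shell
#!/bin/sh
# cgroup_version: print "v2" if /sys/fs/cgroup is the unified hierarchy,
# "v1" if it is a tmpfs holding per-controller v1 mounts (legacy or hybrid),
# or "unknown" if the type cannot be determined.
cgroup_version() {
    fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)
    case "$fstype" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

cgroup_version

# Non-root user on a v1 host: no delegation, so containers cannot create CGroups.
if [ "$(cgroup_version)" = v1 ] && [ "$(id -u)" -ne 0 ]; then
    echo "rootless on CGroup v1: containers cannot create their own CGroups"
fi
```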
For most people, this is perfectly sufficient, but we definitely
recognize that there are use cases that require keeping containers in
CGroups managed elsewhere - most notably, from systemd unit files.
There is work ongoing at [1] to enable this, though I caution that
there are a lot of moving parts here - we don't just need support in
Podman, but also in 'runc' - Podman isn't actually creating the new
CGroup for the container, the OCI runtime (usually 'runc') is.
Also, a bit more context: the Conmon CGroup is not the container
CGroup. Conmon creates its own CGroup (for various legacy reasons -
we're evaluating whether these still hold true, and this could change)
and then spawns the OCI runtime - and then the OCI runtime creates a
second CGroup for the container itself. So you'll have a Conmon CGroup and another for the actual
container (the two 'libpod' cgroups).
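The two `libpod` CGroups can be told apart by name, using the `libpod-conmon-<id>` and `libpod-<id>` scope names visible in the `systemd-cgls` output in the question. The sketch below scans `/proc` and tags each process accordingly; the function name and the `PROC` override (used only to test against a fake tree) are hypothetical:

```shell
#!/bin/sh
# list_libpod_cgroups: for every process whose CGroup path mentions a libpod
# scope, print "PID conmon PATH" for libpod-conmon-<id> scopes and
# "PID container PATH" for libpod-<id> scopes. $PROC overrides /proc.
list_libpod_cgroups() {
    proc="${PROC:-/proc}"
    for f in "$proc"/[0-9]*/cgroup; do
        [ -r "$f" ] || continue
        pid=$(basename "$(dirname "$f")")
        # First line mentioning libpod; field 3 onward is the CGroup path.
        path=$(grep 'libpod-' "$f" 2>/dev/null | head -n 1 | cut -d: -f3-)
        case "$path" in
            *libpod-conmon-*) echo "$pid conmon $path" ;;
            *libpod-*)        echo "$pid container $path" ;;
        esac
    done
}

list_libpod_cgroups
```

With the container from the question running, one would expect 15648 (`conmon`) tagged as `conmon` and 15662 (`sleep`) tagged as `container`.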
The parent situation is simpler to explain; Podman launches conmon,
and then conmon double-forks to daemonize. At that point, the two are
effectively separate; Podman itself can go away completely, and the
Conmon process will continue managing the container. This is
deliberate - Podman is just the frontend that launches the container,
and we don't need it to keep running once the container is started.
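The effect of daemonizing by double fork can be illustrated in plain shell: a subshell starts a background job and exits immediately, so the job is reparented (normally to PID 1, or to the nearest child subreaper) instead of remaining a child of the launching shell. This only illustrates the general technique; it is not Conmon's actual code, which is written in C:

```shell
#!/bin/sh
# Launch "sleep" via a short-lived subshell, the shell analogue of a double fork.
# The subshell records the grandchild's PID and exits, orphaning the sleep.
pidfile=$(mktemp)
( sleep 30 & echo $! > "$pidfile" )
child=$(cat "$pidfile")
rm -f "$pidfile"

# Field 4 of /proc/<pid>/stat is the parent PID (safe here: "sleep" has no spaces).
ppid=$(cut -d' ' -f4 "/proc/$child/stat")
echo "launcher shell: $$  sleep: $child  sleep's parent: $ppid"
# The sleep's parent should no longer be $$; it exits by itself after 30 seconds.
```

This mirrors the `ps` output in the question, where `conmon` has PPID 1 even though `podman` launched it.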
Because of this, we recommend tracking Conmon, not Podman, with unit
files. In Podman 1.5.0 and later, 'podman generate systemd' will
properly handle this, creating a unit file that tracks Conmon using
a PID file.
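For a hand-written unit, the same idea can be sketched as follows. This is an assumption, not the output of `podman generate systemd`: the container is started detached with `podman run -d --conmon-pidfile`, and systemd's `PIDFile=` points at that file so the unit tracks Conmon instead of the short-lived `podman` frontend. The PID-file path `/run/foo-conmon.pid` is hypothetical, and `UNIT_PATH` defaults to a local file so the script can be tried without root:

```shell
#!/bin/sh
# Write a hypothetical foo.service that tracks Conmon through a PID file.
# UNIT_PATH defaults to ./foo.service; point it at /etc/systemd/system/ to install.
UNIT_PATH="${UNIT_PATH:-./foo.service}"

cat > "$UNIT_PATH" <<'EOF'
[Service]
# "podman run -d" returns once the container is up, so treat it as forking.
Type=forking
# Conmon's PID is written here via --conmon-pidfile; systemd supervises it.
PIDFile=/run/foo-conmon.pid
# Remove any stale container left by a previous run (the "-" ignores failure).
ExecStartPre=-/usr/bin/podman rm -f %N
ExecStart=/usr/bin/podman run -d --name %N --conmon-pidfile /run/foo-conmon.pid alpine:3.10.1 sleep 99999
ExecStop=/usr/bin/podman stop %N
EOF

echo "wrote $UNIT_PATH"
```

Because Conmon runs outside the unit's own CGroup, systemd may log a warning that it is supervising a process that is not its child.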
I hope this helps explain things.
Thanks,
Matt Heon
[1] https://github.com/containers/libpod/pull/3581