Hello everyone,
**Disclaimer: This is a long e-mail.**
I am on NixOS (23.05), using the podman binary provided by the
distribution package. There are several issues that I am facing but
the issue that I want resolved is that _I want rootless Podman
containers started at boot_.
I won't get much into NixOS other than what is needed (i.e. no
advocacy for NixOS). NixOS, being a distribution with reproducible
builds, has a different method of storing binaries. Instead of
binaries living in `/usr/bin`, binaries actually live in
`/nix/store/<hash>-pkg-ver/bin`. Thereafter, the binaries are linked
into `/run/current-system/sw/bin`. My `PATH` (from a login shell)
looks like the following:
```
[pratham@sentinel] $ echo $PATH
/home/pratham/.local/bin:/home/pratham/bin:/run/wrappers/bin:/home/pratham/.local/share/flatpak/exports/bin:/var/lib/flatpak/exports/bin:/home/pratham/.nix-profile/bin:/etc/profiles/per-user/pratham/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
```
Because NixOS is an OS that you build from configuration files (i.e.
almost zero bash code to install, except for formatting and mounting),
there is a way to declare your Podman containers much as you would in a
compose.yaml, and those containers are then automatically created as
systemd services [0]. This is great! But those service files are placed
in `/etc/systemd/user`. This has an issue: the Podman container now
runs as root. I checked this by **logging in as root** and checking
the output of `podman ps` (not just `sudo podman ps`). If I wanted
rootful containers, I wouldn't be using Podman...
So, for the time being, I have resorted to writing a systemd unit file
by hand (which is stored in `$HOME/.config/systemd/user`). But the
path `/run/current-system/sw/bin` is missing from the unit's PATH. No
biggie, I can just add it using the following line under the
`[Service]` section:
```
Environment="PATH=/run/current-system/sw/bin:$PATH"
```
(This is a temporary hack and is strongly advised against, but I did
this as a troubleshooting measure, not as a solution.)
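For illustration, a hand-written user unit of this kind could be generated roughly as follows. This is a minimal sketch, not the exact unit in use here: the helper name `write_user_unit`, the `ExecStart`/`ExecStop` commands, and the location of the `podman` binary are assumptions. systemd does not expand `$PATH` inside `Environment=`, so the sketch takes the full PATH value as an argument and writes it out literally.

```shell
#!/bin/sh
# Sketch: write a user unit for a rootless Podman container.
# Hypothetical helper; the Exec* lines and the podman path are assumptions.
write_user_unit() {
    unit_dir=$1      # e.g. "$HOME/.config/systemd/user"
    name=$2          # container (and unit) name
    path_value=$3    # colon-separated directories for the unit's PATH
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/$name.service" <<EOF
[Unit]
Description=Rootless Podman container $name

[Service]
# systemd does not expand \$PATH in Environment=, so the value is literal.
Environment="PATH=$path_value"
ExecStart=/run/current-system/sw/bin/podman start --attach $name
ExecStop=/run/current-system/sw/bin/podman stop $name

[Install]
WantedBy=default.target
EOF
}
```

A call such as `write_user_unit "$HOME/.config/systemd/user" my-container /run/current-system/sw/bin` (with `my-container` as a placeholder name) would write the file; the third argument can equally be a longer list copied from the login shell's `PATH`.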
But the service fails with the following log entries in journalctl:
```
Jul 11 10:46:47 sentinel podman[36673]: time="2023-07-11T10:46:47+05:30" level=error msg="running `/run/current-system/sw/bin/newuidmap 36686 0 1000 1 1 10000 65536`: newuidmap: write to uid_map failed: Operation not permitted\n"
Jul 11 10:46:47 sentinel podman[36673]: Error: cannot set up namespace using "/run/current-system/sw/bin/newuidmap": should have setuid or have filecaps setuid: exit status 1
Jul 11 10:46:47 sentinel systemd[1317]: testing-env.service: Main process exited, code=exited, status=125/n/a
```
I never encountered this error on Fedora or RHEL. While experimenting,
I noticed one thing: **If I run _any_ Podman command (even `podman
ps`) from my _login shell_ and then restart the Podman container's
systemd service, the service runs cleanly.**
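One thing that may be worth ruling out is that the login shell and the unit resolve different `newuidmap` binaries, since the error above complains about a missing setuid bit and the login `PATH` lists `/run/wrappers/bin` ahead of `/run/current-system/sw/bin`. The sketch below is a hypothetical helper (`describe_idmap_helper` is not part of Podman or NixOS) that walks a given PATH and reports whether the first match is setuid. It only checks the setuid bit, not file capabilities.

```shell
#!/bin/sh
# Sketch: report which newuidmap a given PATH resolves to and whether
# that file carries the setuid bit. Hypothetical helper name.
# The body runs in a subshell so IFS and globbing changes stay local.
describe_idmap_helper() (
    search_path=$1
    helper=${2:-newuidmap}
    IFS=:
    set -f                          # no globbing while splitting the PATH
    for dir in $search_path; do
        candidate=${dir:-.}/$helper
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            if [ -u "$candidate" ]; then
                echo "$candidate: setuid"
            else
                echo "$candidate: not setuid"
            fi
            exit 0
        fi
    done
    echo "$helper: not found in $search_path"
    exit 1
)
```

Running `describe_idmap_helper "$PATH"` from the login shell and `describe_idmap_helper /run/current-system/sw/bin` for the unit's PATH would show whether the two environments pick up the same binary.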
From the _Why can't I use sudo with rootless Podman_ article [1]:
> One of the core reasons Podman requires a temporary files directory
> is for detecting if the system has rebooted. After a reboot, all containers are no longer
> running, all container filesystems are unmounted, and all network interfaces need to be
> recreated (among many other things). Podman needs to update its database to reflect this
> and perform some per-boot setup to ensure it is ready to launch containers. This is called
> "refreshing the state."
>
> This is necessary because Podman is not a daemon. Each Podman command is run as a new
> process and doesn't initially know what state containers are in. You can look in the
> database for an accurate picture of all your current containers and their states.
> Refreshing the state after a reboot is essential to making sure this picture continues to
> be accurate.
>
> To perform the refresh, you need a reliable way of detecting a system reboot, and early
> in development, the Podman team settled on using a sentinel file on a tmpfs filesystem. A
> tmpfs is an in-memory filesystem that is not saved after a reboot—every time the system
> starts, a tmpfs mount will be empty. By checking for the existence of a file on such a
> filesystem and creating it if it does not exist, Podman can know if it's the first
> time it has run since the system rebooted.
>
> The problem becomes in determining where you should put your temporary files directory.
> The obvious answer is /tmp, but this is not guaranteed to be a tmpfs filesystem (though it
> often is). Instead, by default, Podman will use /run, which is guaranteed to be a tmpfs.
> Unfortunately, /run is only writable by root, so rootless Podman must look elsewhere. Our
> team settled on the /run/user/$UID directories, a per-user temporary files directory.
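The sentinel-file technique described in that excerpt can be sketched in a few lines of shell. This only illustrates the idea and is not Podman's actual code. The function name `first_run_since_boot` is hypothetical, and the file name `alive` is borrowed from the `inotifywait` output later in this mail.

```shell
#!/bin/sh
# Sketch of the sentinel-file technique: a file on a tmpfs disappears on
# reboot, so its absence means "first run since boot".
# Hypothetical function; not taken from Podman.
# Returns 0 on the first run, 1 on later runs, 2 if setup failed.
first_run_since_boot() {
    tmp_dir=$1                      # should live on a tmpfs, e.g. under /run/user/$UID
    sentinel=$tmp_dir/alive
    if [ -e "$sentinel" ]; then
        return 1                    # sentinel survived: no reboot since the last run
    fi
    mkdir -p "$tmp_dir" && : > "$sentinel" || return 2
    return 0                        # sentinel was absent and has now been created
}
```

A caller would run its per-boot setup only when the function returns 0, for example `first_run_since_boot "$XDG_RUNTIME_DIR/example/tmp" && echo "refreshing state"` (the `example` directory name is a placeholder).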
This means that Podman needs some sort of "initialization" after the
system has rebooted. Apparently, due to NixOS' nature, this
"initialization" doesn't occur when Podman is invoked from a systemd
service (something is missing but I can't figure out _what_). So I
rebooted and set up an `inotifywait` watch (logged in as `root`, not
using the `sudo` prefix) with the command
`inotifywait /run/user/1000/ --recursive --monitor`
(`XDG_RUNTIME_DIR` for user `pratham` is `/run/user/1000`), then ran
`podman ps` while logged in as user `pratham`. It generated the
following output:
```
/run/user/1000/ ATTRIB,ISDIR libpod
/run/user/1000/libpod/ ATTRIB,ISDIR
/run/user/1000/libpod/tmp/ OPEN alive.lck
/run/user/1000/libpod/tmp/ CLOSE_WRITE,CLOSE alive.lck
/run/user/1000/libpod/tmp/ OPEN alive.lck
/run/user/1000/libpod/tmp/ CLOSE_WRITE,CLOSE alive.lck
/run/user/1000/libpod/tmp/ CREATE pause.pid.NjPiqQ
/run/user/1000/libpod/tmp/ OPEN pause.pid.NjPiqQ
/run/user/1000/libpod/tmp/ MODIFY pause.pid.NjPiqQ
/run/user/1000/libpod/tmp/ CLOSE_WRITE,CLOSE pause.pid.NjPiqQ
/run/user/1000/libpod/tmp/ MOVED_FROM pause.pid.NjPiqQ
/run/user/1000/libpod/tmp/ MOVED_TO pause.pid
/run/user/1000/ ATTRIB,ISDIR libpod
/run/user/1000/libpod/ ATTRIB,ISDIR
/run/user/1000/containers/ CREATE,ISDIR overlay
/run/user/1000/containers/ OPEN,ISDIR overlay
/run/user/1000/containers/ ACCESS,ISDIR overlay
/run/user/1000/containers/ CLOSE_NOWRITE,CLOSE,ISDIR overlay
/run/user/1000/containers/overlay/ CREATE overlay-true
/run/user/1000/containers/overlay/ OPEN overlay-true
/run/user/1000/containers/overlay/ CLOSE_WRITE,CLOSE overlay-true
/run/user/1000/containers/overlay/ OPEN overlay-true
/run/user/1000/containers/overlay/ CLOSE_NOWRITE,CLOSE overlay-true
/run/user/1000/containers/overlay/ CREATE metacopy()-false
/run/user/1000/containers/overlay/ OPEN metacopy()-false
/run/user/1000/containers/overlay/ CLOSE_WRITE,CLOSE metacopy()-false
/run/user/1000/containers/overlay/ CREATE native-diff()-true
/run/user/1000/containers/overlay/ OPEN native-diff()-true
/run/user/1000/containers/overlay/ CLOSE_WRITE,CLOSE native-diff()-true
/run/user/1000/containers/ CREATE,ISDIR overlay-containers
/run/user/1000/containers/ OPEN,ISDIR overlay-containers
/run/user/1000/containers/ ACCESS,ISDIR overlay-containers
/run/user/1000/containers/ CLOSE_NOWRITE,CLOSE,ISDIR overlay-containers
/run/user/1000/containers/ CREATE,ISDIR overlay-locks
/run/user/1000/containers/ OPEN,ISDIR overlay-locks
/run/user/1000/containers/ ACCESS,ISDIR overlay-locks
/run/user/1000/containers/ CLOSE_NOWRITE,CLOSE,ISDIR overlay-locks
/run/user/1000/containers/ CREATE,ISDIR networks
/run/user/1000/libpod/tmp/ OPEN alive.lck
/run/user/1000/libpod/tmp/ CLOSE_WRITE,CLOSE alive.lck
/run/user/1000/libpod/tmp/ OPEN alive.lck
/run/user/1000/containers/ OPEN,ISDIR networks
/run/user/1000/containers/ ACCESS,ISDIR networks
/run/user/1000/containers/ CLOSE_NOWRITE,CLOSE,ISDIR networks
/run/user/1000/libpod/tmp/ CREATE alive
/run/user/1000/libpod/tmp/ OPEN alive
/run/user/1000/libpod/tmp/ CLOSE_NOWRITE,CLOSE alive
/run/user/1000/libpod/tmp/ CLOSE_WRITE,CLOSE alive.lck
/run/user/1000/systemd/units/ CREATE .#invocation:dbus.serviced739c18053185984
/run/user/1000/systemd/units/ MOVED_FROM .#invocation:dbus.serviced739c18053185984
/run/user/1000/systemd/units/ MOVED_TO invocation:dbus.service
/run/user/1000/ CREATE,ISDIR dbus-1
/run/user/1000/ OPEN,ISDIR dbus-1
/run/user/1000/ ACCESS,ISDIR dbus-1
/run/user/1000/ CLOSE_NOWRITE,CLOSE,ISDIR dbus-1
/run/user/1000/dbus-1/ OPEN,ISDIR services
/run/user/1000/dbus-1/services/ OPEN,ISDIR
/run/user/1000/dbus-1/ ACCESS,ISDIR services
/run/user/1000/dbus-1/services/ ACCESS,ISDIR
/run/user/1000/dbus-1/ ACCESS,ISDIR services
/run/user/1000/dbus-1/services/ ACCESS,ISDIR
/run/user/1000/dbus-1/ CLOSE_NOWRITE,CLOSE,ISDIR services
/run/user/1000/dbus-1/services/ CLOSE_NOWRITE,CLOSE,ISDIR
/run/user/1000/systemd/ CREATE,ISDIR transient
/run/user/1000/systemd/ OPEN,ISDIR transient
/run/user/1000/systemd/ ACCESS,ISDIR transient
/run/user/1000/systemd/ CLOSE_NOWRITE,CLOSE,ISDIR transient
/run/user/1000/systemd/transient/ CREATE podman-2894.scope
/run/user/1000/systemd/transient/ OPEN podman-2894.scope
/run/user/1000/systemd/transient/ MODIFY podman-2894.scope
/run/user/1000/systemd/transient/ CLOSE_WRITE,CLOSE podman-2894.scope
/run/user/1000/systemd/units/ CREATE .#invocation:podman-2894.scopeb6be723b1ec13b95
/run/user/1000/systemd/units/ MOVED_FROM .#invocation:podman-2894.scopeb6be723b1ec13b95
/run/user/1000/systemd/units/ MOVED_TO invocation:podman-2894.scope
/run/user/1000/containers/ CREATE,ISDIR overlay-layers
/run/user/1000/containers/ OPEN,ISDIR overlay-layers
/run/user/1000/containers/ ACCESS,ISDIR overlay-layers
/run/user/1000/containers/ CLOSE_NOWRITE,CLOSE,ISDIR overlay-layers
/run/user/1000/containers/overlay-layers/ CREATE mountpoints.lock
/run/user/1000/containers/overlay-layers/ OPEN mountpoints.lock
/run/user/1000/containers/overlay-layers/ CLOSE_WRITE,CLOSE mountpoints.lock
/run/user/1000/containers/overlay-layers/ OPEN mountpoints.lock
/run/user/1000/containers/overlay-layers/ CLOSE_WRITE,CLOSE mountpoints.lock
/run/user/1000/containers/overlay-layers/ OPEN mountpoints.lock
/run/user/1000/containers/overlay-layers/ CLOSE_WRITE,CLOSE mountpoints.lock
/run/user/1000/systemd/units/ DELETE invocation:podman-2894.scope
/run/user/1000/systemd/transient/ DELETE podman-2894.scope
/run/user/1000/libpod/tmp/ OPEN pause.pid
/run/user/1000/libpod/tmp/ ACCESS pause.pid
/run/user/1000/libpod/tmp/ CLOSE_NOWRITE,CLOSE pause.pid
/run/user/1000/systemd/transient/ CREATE podman-pause-f50834a6.scope
/run/user/1000/systemd/transient/ OPEN podman-pause-f50834a6.scope
/run/user/1000/systemd/transient/ MODIFY podman-pause-f50834a6.scope
/run/user/1000/systemd/transient/ CLOSE_WRITE,CLOSE podman-pause-f50834a6.scope
/run/user/1000/systemd/units/ CREATE .#invocation:podman-pause-f50834a6.scope03db5d0ea8888975
/run/user/1000/systemd/units/ MOVED_FROM .#invocation:podman-pause-f50834a6.scope03db5d0ea8888975
/run/user/1000/systemd/units/ MOVED_TO invocation:podman-pause-f50834a6.scope
```
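To pick out only the files and directories that this first `podman ps` created, the event stream can be filtered. The following is a small sketch: `created_paths` is a hypothetical helper, and it assumes the default `inotifywait` line format (`directory EVENTS name`) with one event per line and no spaces in names.

```shell
#!/bin/sh
# Sketch: read inotifywait output on stdin and print the full path of
# every entry reported with a CREATE event. Hypothetical helper.
# Lines without a name field (e.g. wrapped by a mail client) are skipped.
created_paths() {
    awk '$2 ~ /(^|,)CREATE(,|$)/ && NF >= 3 { print $1 $3 }'
}
```

For example, `inotifywait /run/user/1000/ --recursive --monitor | created_paths` would print one created path per line while the watch is running.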
Following is the output of `podman info` on my computer:
```
[pratham@sentinel] $ podman info
host:
  arch: arm64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /run/current-system/sw/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 81.03
    systemPercent: 3.02
    userPercent: 15.94
  cpus: 4
  databaseBackend: boltdb
  distribution:
    codename: stoat
    distribution: nixos
    version: "23.05"
  eventLogger: journald
  hostname: sentinel
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 994
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 6.1.38
  linkmode: dynamic
  logDriver: journald
  memFree: 3040059392
  memTotal: 3944181760
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: Unknown
    path: /run/current-system/sw/bin/crun
    version: |-
      crun version 1.8.4
      commit: 1.8.4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /nix/store/n8lbxja2hd766pnz89qki90na2b3g815-slirp4netns-1.2.0/bin/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 2957766656
  swapTotal: 2957766656
  uptime: 0h 5m 34.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/pratham/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/pratham/.local/share/containers/storage
  graphRootAllocated: 13539516416
  graphRootUsed: 7770832896
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/pratham/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.0
  Built: 315532800
  BuiltTime: Tue Jan 1 05:30:00 1980
  GitCommit: ""
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/arm64
  Version: 4.5.0
```
So my current question is: how do I perform this initial setup
manually? I don't want to have to log into `pratham`'s login shell
after every reboot just so that the Podman containers can start.
[0]: https://nixos.wiki/wiki/Podman#Run_Podman_containers_as_systemd_services
[1]: https://www.redhat.com/sysadmin/sudo-rootless-podman
- Pratham Patel