We've got an Ubuntu 22.04 server running a rootless container as a systemd user service
on Podman 4.4.2, using Quadlet. Podman and its dependencies are installed from binary
releases on GitHub and/or built from source.
Last night the server required a reboot for some security updates, but the
service/container failed to come back up after the boot. In the morning I was able to
start it manually with `systemctl --user start cms_backend`, with no configuration
changes in the meantime.
Here's the full journalctl output for this unit from last night after the reboot
(timestamps removed for brevity):
systemd[746]: cms_backend.service: unit configures an IP firewall, but not running as root.
systemd[746]: (This warning is only shown for the first unit using IP firewalling.)
systemd[746]: Starting CMS Backend...
cms_backend[787]: time="2023-05-25T03:01:58+03:00" level=error msg="Refreshing container 8f03c9c90e6f8aab02344284ba760fe8ddbf52becb7ba95c383fb80c3bd04405: retrieving temporary directory for container 8f03c9c90e6f8aab02344284ba760fe8ddbf52becb7ba95c383fb80c3bd04405: no such container"
podman[787]: 2023-05-25 03:01:59.000295722 +0300 EEST m=+0.107418901 system refresh
cms_backend[787]: time="2023-05-25T03:01:59+03:00" level=warning msg="Found incomplete layer \"b767c3f7da27350a451d15f1e972cdd874ea0e6f4c38fad1f644686f0831786f\", deleting it"
cms_backend[787]: time="2023-05-25T03:01:59+03:00" level=error msg="Free container lock: no such file or directory"
podman[787]: 2023-05-25 03:01:59.037410032 +0300 EEST m=+0.144533221 container remove 8f03c9c90e6f8aab02344284ba760fe8ddbf52becb7ba95c383fb80c3bd04405 (image=<redacted>:latest, name=cms_backend, com.<redacted>.cms.commit_sha=99d36804, com.<redacted>.cms.component=server, com.<redacted>.cms.pipeline_id=25445, PODMAN_SYSTEMD_UNIT=cms_backend.service)
cms_backend[787]: Error: remove /run/user/1001/cms_backend.cid: no such file or directory
podman[787]: 2023-05-25 03:01:59.00150107 +0300 EEST m=+0.108624239 image pull <redacted>:latest
systemd[746]: cms_backend.service: Main process exited, code=exited, status=125/n/a
systemd[746]: cms_backend.service: Failed with result 'exit-code'.
systemd[746]: Failed to start CMS Backend.
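The `remove /run/user/1001/cms_backend.cid: no such file or directory` line makes me suspect the per-user runtime directory (a tmpfs that is recreated on every boot) wasn't fully set up when the unit started. This is the rough check I've been running by hand; the UID 1001 comes from the `.cid` path in the log, and the user name `cms` is an assumption based on our EnvironmentFile path:

```shell
#!/bin/sh
# Sketch: verify the rootless-Podman prerequisites hinted at by the log.
# Assumptions: UID 1001 (from the .cid path above) and service user "cms".
uid=1001
user=cms

# Is the per-user runtime dir present? Podman keeps transient state
# (including the .cid file from the log) under it.
if [ -d "/run/user/${uid}" ]; then
    echo "runtime dir present"
else
    echo "runtime dir missing"
fi

# Is lingering enabled, so the user manager starts at boot without a login?
loginctl show-user "${user}" --property=Linger 2>/dev/null || echo "Linger=unknown"
```

Both checks pass now, of course; the question is what state they were in at 03:01.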
Here's the Quadlet unit file we use:
[Unit]
Description=CMS Backend
Wants=network-online.target
After=network-online.target
[Container]
Image=<redacted>:latest
ContainerName=cms_backend
Exec=/bin/bash -c "pip install -q -e . \
&& python project/manage.py migrate -v 1 \
&& python project/manage.py tailwind install \
&& python project/manage.py tailwind build \
&& python project/manage.py collectstatic --noinput \
&& python -Wd project/manage.py check \
&& exec gunicorn --workers 8 --threads 4 --worker-class gthread \
--worker-tmp-dir /dev/shm --error-logfile - --bind 0.0.0.0:8000 --pythonpath project \
cms_site.wsgi:application"
EnvironmentFile=/home/cms/.cms_backend.env
RemapUsers=manual
RemapUid=0:0:1
RemapUid=100:1:1
RemapGid=0:0:1
RemapGid=65534:1:1
Network=pasta:-t,auto,-T,auto
# TODO change to native format after Podman 4.5 upgrade
PodmanArgs=--log-driver=journald \
--mount type=bind,source=/var/log/cms,target=/logs \
--mount type=bind,source=/srv/cms/staticfiles,target=/staticfiles
[Install]
WantedBy=default.target
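For what it's worth, since the unit fails exactly once right after boot and starts fine later, we're considering papering over the apparent race with a restart policy while we debug. This is only a sketch; the drop-in path assumes the generated unit is named cms_backend.service (I believe the same keys could also go in a [Service] section of the .container file, since Quadlet copies that section into the generated unit):

```ini
# ~/.config/systemd/user/cms_backend.service.d/10-retry.conf
# Hypothetical drop-in: retry the unit a few times instead of failing once.
[Service]
Restart=on-failure
RestartSec=15
```

We'd still like to understand the root cause rather than just mask it.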
- - - - -
The container itself is a lightly customized official Python 3.11 image running a Django
application, practically identical to others that we're running with no similar
issues. The entry point of the image is ["/usr/bin/dumb-init", "--"].
Like I said, the service came up manually just fine several hours after the initial
failure, so this doesn't seem likely to be a problem in our application code.
I can't make sense of these error messages; can anyone help decipher them?