Reliable service starts
by Mark Raynsford
Hello!
I'm using podman on Fedora CoreOS. The standard setup for a
podman-based service tends to look like this (according to the
documentation):
---
[Unit]
Description=looseleaf
After=network-online.target
Wants=network-online.target
[Service]
Type=exec
TimeoutStartSec=60
User=_looseleaf
Group=_looseleaf
Restart=on-failure
RestartSec=10s
Environment="_JAVA_OPTIONS=-XX:+UseSerialGC -Xmx64m -Xms64m"
ExecStartPre=-/bin/podman kill looseleaf
ExecStartPre=-/bin/podman rm looseleaf
ExecStartPre=/bin/podman pull docker.io/io7m/looseleaf:0.0.4
ExecStart=/bin/podman run \
  --name looseleaf \
  --volume /var/storage/looseleaf/etc:/looseleaf/etc:Z,ro \
  --volume /var/storage/looseleaf/var:/looseleaf/var:Z,rw \
  --publish 20000:20000/tcp \
  --memory=128m \
  --memory-reservation=80m \
  docker.io/io7m/looseleaf:0.0.4 \
  /looseleaf/bin/looseleaf server --file /looseleaf/etc/config.json
ExecStop=/bin/podman stop looseleaf
[Install]
WantedBy=multi-user.target
---
The important line is this one:
/bin/podman pull docker.io/io7m/looseleaf:0.0.4
Unfortunately, this line can fail. That in itself isn't a problem; the
service will be restarted and the pull will run again. The real problem
is that it can fail in ways that break all subsequent executions.
On new Fedora CoreOS deployments, there's often a lot of network
traffic happening on first boot as the rest of the system updates
itself, and it's not unusual for `podman pull` to fail and leave the
services permanently broken (unless someone goes in and fixes them).
This is what will typically happen:
Feb 02 20:31:05 control1.io7m.com podman[1934]: Trying to pull docker.io/io7m/looseleaf:0.0.4...
Feb 02 20:31:48 control1.io7m.com podman[1934]: time="2023-02-02T20:31:48Z" level=warning msg="Failed, retrying in 1s ... (1/3). Error: initializing source docker://io7m/looseleaf:0.0.4: pinging container registry registry-1.docker.io: Get \"https://regist>
Feb 02 20:31:50 control1.io7m.com podman[1934]: Getting image source signatures
Feb 02 20:31:50 control1.io7m.com podman[1934]: Copying blob sha256:9794579c486abc6811cea048073584c869db02a4d9b615eeaa1d29e9c75738b9
Feb 02 20:31:50 control1.io7m.com podman[1934]: Copying blob sha256:8921db27df2831fa6eaa85321205a2470c669b855f3ec95d5a3c2b46de0442c9
Feb 02 20:31:50 control1.io7m.com podman[1934]: Copying blob sha256:846e3b32ee5a149e3ccb99051cdb52e96e11488293cdf72ee88168c88dd335c7
Feb 02 20:31:50 control1.io7m.com podman[1934]: Copying blob sha256:7f516ed68e97f9655d26ae3312c2aeede3dfda2dd3d19d2f9c9c118027543e87
Feb 02 20:31:50 control1.io7m.com podman[1934]: Copying blob sha256:e88daf71a034bed777eda8657762faad07639a9e27c7afb719b9a117946d1b8a
Feb 02 20:32:03 control1.io7m.com systemd[1]: looseleaf.service: start-pre operation timed out. Terminating.
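Note the last line: the pull was still copying blobs when the 60-second
TimeoutStartSec expired, so systemd terminated podman mid-download, which
is very likely what leaves the corrupted partial image behind. A drop-in
that raises the start timeout (a sketch; the 15-minute figure is an
arbitrary assumption, and it only shrinks the window rather than closing
it) would look like:
---
# Hypothetical drop-in: /etc/systemd/system/looseleaf.service.d/timeout.conf
[Service]
# Allow slow first-boot networks more time before systemd kills the pull.
TimeoutStartSec=900
---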
It'll usually happen again on the next service restart. Then, this will
tend to happen:
Feb 02 20:34:13 control1.io7m.com podman[2745]: time="2023-02-02T20:34:13Z" level=error msg="Image docker.io/io7m/looseleaf:0.0.4 exists in local storage but may be corrupted (remove the image to resolve the issue): size for layer \"13cfed814d5b083572142bc>
Feb 02 20:34:13 control1.io7m.com podman[2745]: Trying to pull docker.io/io7m/looseleaf:0.0.4...
Feb 02 20:34:14 control1.io7m.com podman[2745]: Getting image source signatures
Feb 02 20:34:14 control1.io7m.com podman[2745]: Copying blob sha256:9794579c486abc6811cea048073584c869db02a4d9b615eeaa1d29e9c75738b9
Feb 02 20:34:14 control1.io7m.com podman[2745]: Copying blob sha256:8921db27df2831fa6eaa85321205a2470c669b855f3ec95d5a3c2b46de0442c9
Feb 02 20:34:14 control1.io7m.com podman[2745]: Copying blob sha256:846e3b32ee5a149e3ccb99051cdb52e96e11488293cdf72ee88168c88dd335c7
Feb 02 20:34:14 control1.io7m.com podman[2745]: Copying blob sha256:7f516ed68e97f9655d26ae3312c2aeede3dfda2dd3d19d2f9c9c118027543e87
Feb 02 20:34:14 control1.io7m.com podman[2745]: Copying blob sha256:e88daf71a034bed777eda8657762faad07639a9e27c7afb719b9a117946d1b8a
Feb 02 20:34:18 control1.io7m.com podman[2745]: Copying config sha256:cce9701f3b6e34e3fc26332da58edcba85bbf4f625bdb5f508805d2fa5e62e3e
Feb 02 20:34:18 control1.io7m.com podman[2745]: Writing manifest to image destination
Feb 02 20:34:18 control1.io7m.com podman[2745]: Storing signatures
Feb 02 20:34:18 control1.io7m.com podman[2745]: Error: checking platform of image cce9701f3b6e34e3fc26332da58edcba85bbf4f625bdb5f508805d2fa5e62e3e: inspecting image: size for layer "13cfed814d5b083572142bc068ae7f890f323258135f0cffe87b04cb62c3742e" is unkno>
Feb 02 20:34:18 control1.io7m.com systemd[1]: looseleaf.service: Control process exited, code=exited, status=125/n/a
At this point, there's really nothing to be done aside from having a
human log in and run something like "podman system reset".
These systems are supposed to be as immutable as possible, and
deployments are supposed to be automated. As it stands, I can't deploy
a machine without having it immediately break and require manual
intervention.
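The closest thing to a workaround I can sketch (the helper path and
retry count below are invented, and it assumes a failed pull leaves
behind an image that "podman rmi" can actually remove, as the error
message above suggests) is to replace the bare pull with a wrapper that
discards any partially-stored image between attempts:
---
#!/bin/sh
# Hypothetical /usr/local/bin/pull-retry.sh: retry the image pull,
# cleaning up partially-stored images between attempts.
IMAGE="docker.io/io7m/looseleaf:0.0.4"
attempt=0
while [ "${attempt}" -lt 5 ]; do
  if /bin/podman pull "${IMAGE}"; then
    exit 0
  fi
  # A failed pull can leave corrupted layers in local storage; remove
  # the image (ignoring errors if nothing was stored) so the next
  # attempt starts clean instead of failing forever.
  /bin/podman rmi --force "${IMAGE}" || true
  attempt=$((attempt + 1))
  sleep 10
done
exit 1
---
The unit would then use ExecStartPre=/usr/local/bin/pull-retry.sh in
place of the bare "podman pull" line.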
Is there some better way to handle this?
--
Mark Raynsford | https://www.io7m.com
I am not able to install gpgme on RHEL 9
by adhish meena
Hi team,
I am currently using RHEL 9.
I have been trying to set up a podman development environment on my PC,
but I am not able to install gpgme-devel.
Could you please suggest how I can resolve this?
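A likely cause (an assumption; it depends on which repositories the
machine has enabled): on RHEL 9, gpgme-devel ships in the CodeReady
Builder (CRB) repository, which is disabled by default. Assuming an
x86_64, subscription-managed host, enabling it would look like:
---
# Enable the CodeReady Builder repository (repo id assumes x86_64 RHEL 9),
# then install the gpgme development headers the podman build needs.
sudo subscription-manager repos --enable codeready-builder-for-rhel-9-x86_64-rpms
sudo dnf install gpgme-devel
---
Alternatively, podman can be built without the gpgme dependency by
adding the containers_image_openpgp build tag.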
Regards
Adhish Meena
How does podman auto create runRoot directory
by GHui Wu
I have set XDG_RUNTIME_DIR as follows:
export XDG_RUNTIME_DIR=/.sllocal/log/containers-user-$UID/containers
But when I execute "podman info", it reports an error:
$ podman info
ERRO[0000] stat /.sllocal/log/containers-user-3088/containers: no such file or directory
How does podman auto create runRoot directory?
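A likely explanation (an assumption, not confirmed in this thread):
podman does not create this directory itself, as the stat error above
shows. It expects XDG_RUNTIME_DIR to already exist; on a normal login,
systemd-logind provides /run/user/$UID, and podman derives its default
runRoot from that. With a custom location like the one above, creating
the directory by hand with user-only permissions before invoking podman
should avoid the error:
---
# Create the custom runtime directory before running podman; it must
# exist and be accessible only by the owning user.
mkdir -p "/.sllocal/log/containers-user-$UID/containers"
chmod 700 "/.sllocal/log/containers-user-$UID/containers"
---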