[Podman] Re: Please give feedback on: Add support to auto-update containers running in systemd units as

Saturday, 14 March 2020

Hi Karl,

thanks a lot for sharing feedback and for the details about how you tackle
auto-updates!

I can't really argue against your points as they all seem valid to me, but
I have a couple of thoughts that might clarify a bit where we're currently
positioning the feature:

   - We want to add rollbacks in a future PR, which will make auto updates
   more sophisticated than they are now; especially in combination with health
   checks.
   - As Podman targets single nodes only, we can't guarantee seamless
   updates without service downtime. That's something for multi-node
   environments such as OpenShift.
   - Our goal is to provide a solid and reliable infrastructure to build
   complex systems. The proposed auto-update mechanism provides such
   infrastructure, but it can easily be made smarter to meet more (but
   certainly far from all) complex requirements:
      - Updates need to be done faster? No problem. The auto-update timer
      can run every minute, if needed. Or another service might just run
      `podman auto-update` when an update is needed.
      - Updates need to happen in a certain order? Systemd has
      sophisticated dependency management and can restart services in
      a specified order.
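As a rough sketch of the two points above (not part of the proposed PR): a systemd drop-in can shorten the timer interval, and another can order one container unit after another. The timer name `podman-auto-update.timer`, the drop-in file names, and the example service names are assumptions for illustration. The target directory is a parameter so the functions can be tried outside `/etc`.

```shell
#!/bin/sh
# Sketch only: unit and file names below are assumptions, not taken from the PR.

# write_dropin DIR UNIT NAME: write stdin to DIR/UNIT.d/NAME
write_dropin() {
    mkdir -p "$1/$2.d" && cat > "$1/$2.d/$3"
}

# tune_timer DIR: make the auto-update timer fire every minute.
# The empty OnCalendar= clears the schedule inherited from the shipped unit.
tune_timer() {
    write_dropin "$1" podman-auto-update.timer 10-interval.conf <<'EOF'
[Timer]
OnCalendar=
OnCalendar=minutely
EOF
}

# order_after DIR UNIT DEP: order UNIT after DEP and pull DEP in with it.
order_after() {
    write_dropin "$1" "$2" 10-order.conf <<EOF
[Unit]
Requires=$3
After=$3
EOF
}

# Example (as root), followed by `systemctl daemon-reload`:
#   tune_timer /etc/systemd/system
#   order_after /etc/systemd/system web.service db.service
```

Drop-ins are used here so that the units shipped by the package stay untouched and the overrides can be removed again by deleting a single file.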

Kind regards,
 Valentin

On Fri, Mar 13, 2020 at 11:55 PM Karl Quinsland <karl(a)touchpoint.io> wrote:

...
 I am of two minds on this.

 I am happy to see the functionality come to podman, but am concerned that
 there's no way to make this feature robust enough for all but the simplest
 of use cases without sinking a *ton* of time into it.

 TL;DR: "reload this service when there's a new version" is a lot more
 complicated than it appears unless the service in question is low stakes or
 otherwise purposefully designed to be highly stateless and all consumers of
 the service are equally well equipped to deal with a service that may
 suddenly speak a slightly updated version of the protocol... etc. If this
 is a feature that is in demand, then please do keep building it!

 As implemented now, I can think of a few common scenarios where it will be
 immediately useful, but beyond them, I see quite a few things that'll need
 to be added to make it useful in more sophisticated/legacy environments.  I
 would use this auto-update functionality on a few containers that I deploy
 around the house because those containers all run on systemd hosts and the
 workloads they run are not sensitive to (slightly) out-of-date
 containers. Nor is a manual rollback of any container the end of the
 world.  I can't use this at work, though, because various workloads have
 elaborate gates around their rollout or otherwise need to be rolled out as
 soon as a new release is available... not (up to) 24h later.

 ---

 I've implemented something similar internally that does not suffer from
 some of the same drawbacks. It is quite a bit more flexible, but at the
 cost of some additional overhead/infrastructure. Chiefly:

 - Would work with any init system that supports some form of "additional
 configuration" faculty. In my case, though, we're primarily - but not
 exclusively - a systemd shop.
 - Is not limited to daily checks for updates. Within seconds of the
 "switch being flipped" - so to speak - the new version of the container can
 be running.
 - Supports rollbacks and other release gates

 Internally, we use the *excellent* Consul Key/Value storage system to
 manage which workloads use which versions of a container, but any key/value
 storage system that allows a daemon to monitor or 'learn' about a change to
 a value for a given key will work. That is: I use Consul to pull this off,
 but you could absolutely make etcd or ZooKeeper work here, too.

 Through a process that's not relevant here, a key/value path is updated.
 E.G.:

 path: /service/in-field-c/version
 value: 1.28

 where in-field-hardware-controller is an illustrative example, as is the
 value stored @ that key.

 On every container host,  there's a daemon that watches the
 /service/in-field-hardware-controller/version path in consul. Depending
 on the workload, we use the simple but powerful consul-template program
 or a more sophisticated internal daemon. Consul-Template is a small
 golang based binary that can be run as a daemon to watch a specific consul
 key, but the consul API is open and there are a variety of daemons out
 there that support monitoring a given path. The critical bit here is that
 the daemon has the ability to execute system commands when a change is
 observed: When the monitoring daemon notices a change to the value @ the
 key, it renders out a file that is then read by systemd and "exposed" to
 the ExecStart= directive as an environment variable. The file that is
 rendered out would be placed in:

 /etc/systemd/system/in-field-hardware-controller.service.d/10-version.conf

 and would look like this:

 [Service]
 Environment=WORKLOAD_IMAGE_VERSION=1.28
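A minimal sketch of the rendering step described above, written as a shell function that a watching daemon could call as its handler. This is not the internal tooling from this message; the function name, the version-string check, and the write-then-rename step are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: a hypothetical handler, not the daemon described in this thread.

# render_version_dropin DIR UNIT VERSION
# Writes DIR/UNIT.d/10-version.conf so systemd exposes the version to ExecStart=.
render_version_dropin() {
    dir="$1/$2.d"
    # Reject empty values and characters that could break the Environment= line.
    case "$3" in
        ''|*[!A-Za-z0-9._-]*) echo "invalid version: $3" >&2; return 1 ;;
    esac
    mkdir -p "$dir" || return 1
    # Write to a temp name first; systemd only reads drop-ins ending in .conf,
    # and a rename within one filesystem replaces the old file atomically.
    tmp="$dir/.10-version.conf.$$"
    printf '[Service]\nEnvironment=WORKLOAD_IMAGE_VERSION=%s\n' "$3" > "$tmp" &&
        mv "$tmp" "$dir/10-version.conf"
}

# Example (as root):
#   render_version_dropin /etc/systemd/system hardware-controller.service 1.28
```

With consul-template, the same file could instead come from a two-line template whose second line reads `Environment=WORKLOAD_IMAGE_VERSION={{ key "service/in-field-hardware-controller/version" }}`.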

 The daemon that writes out the file then consults some internal logic to
 see when to *apply* this change. In simple cases, the daemon
 (consul-template) will immediately run

 systemctl daemon-reload; systemctl restart hardware-controller.service

 which will immediately apply the change. In other cases, the daemon (not
 consul-template) will run additional scripts to sanity check other
 dependencies and provide additional 'gates' on the roll out. These scripts
 check up and down-stream dependencies,  database/stateful data versions and
 - in some cases - require an engineer to be the "second man" (see 'two man
 rule' on wikipedia) in a version roll out. If the updated container does
 not start to publish an expected payload to a pre-defined endpoint, we
 consider the container to be unhealthy and consult additional internal
 logic about whether to revert or back off exponentially on the restart
 attempts.
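The restart-then-verify loop described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the internal daemon: the health probe is whatever command the caller passes in, and the `SYSTEMCTL` and `BACKOFF_BASE` variables are hypothetical hooks added so the function can be exercised without touching real units.

```shell
#!/bin/sh
# Sketch only: the health probe and the revert step are left to the caller.

# restart_with_backoff UNIT MAX_TRIES HEALTH_CMD...
# Reload systemd, restart UNIT, then run HEALTH_CMD. On failure, wait
# 1s, 2s, 4s, ... (scaled by BACKOFF_BASE) and try again.
# Returns 0 once healthy, 1 if every attempt failed.
restart_with_backoff() {
    unit=$1
    tries=$2
    shift 2
    delay=${BACKOFF_BASE:-1}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        ${SYSTEMCTL:-systemctl} daemon-reload &&
            ${SYSTEMCTL:-systemctl} restart "$unit" &&
            "$@" && return 0
        # No point sleeping after the final failed attempt.
        [ "$attempt" -lt "$tries" ] && sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, where check-payload is a hypothetical probe script:
#   restart_with_backoff hardware-controller.service 5 /usr/local/bin/check-payload \
#       || echo "still unhealthy: revert to the previous version" >&2
```

A non-zero return is the point at which a caller would decide whether to re-render the previous version and restart once more, which is the "revert" branch mentioned above.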

 The portion of the hardware-controller.service file that plugs the
 env-var into the run command looks like this:

 ExecStart=/usr/bin/podman run --name=hardware-controller <...snip...>
 some-registry/hw-controller:${WORKLOAD_IMAGE_VERSION}

 I will be the first to acknowledge that our solution has many knobs and
 sliders that increase the complexity of our "dynamic" version configuration
 setup. Some of these knobs are necessary to support features that are
 absolutely critical for our needs: rollouts within seconds (unless
 additional gates apply) and relatively painless rollbacks (where
 possible). For my personal/at-home workloads, those needs are not critical
 and so the many knobs/sliders are not needed.

 Happy to clarify anything!

 -K

 _______________________________________________
 Podman mailing list -- podman(a)lists.podman.io
 To unsubscribe send an email to podman-leave(a)lists.podman.io
