Like Valentin, I don't disagree with any of your points. That said, I do want to provide a bit of clarity on WHY and WHEN we are recommending the use of this feature. Inline....

On Fri, Mar 13, 2020 at 6:55 PM Karl Quinsland <karl@touchpoint.io> wrote:
I am of two minds on this.

I am happy to see the functionality come to podman, but am concerned that there's no way to make this feature robust enough for all but the simplest of use cases without sinking a *ton* of time into it.

TL;DR: "reload this service when there's a new version" is a lot more complicated than it appears, unless the service in question is low stakes or otherwise purposefully designed to be highly stateless, and all consumers of the service are equally well equipped to deal with a service that may suddenly speak a slightly updated version of the protocol... etc. If this is a feature that is in demand, then please do keep building it!

That's exactly right. We will ONLY be recommending this feature be used when:

1. The end user has the infrastructure to test the upcoming change. The use case driving this is IoT. Imagine a company with 100,000 sensors in the field, and then imagine how they would update those sensors. They must have: A. a test environment (say, a couple of hundred devices), B. a known good starting point (the existing container image/layer), and C. a known good destination (the new container image/layer).
2. The ability to push this new image layer to a highly trusted registry that only ever holds known good versions. You would never push to this registry unless the image has passed all tests from #1.
3. The ability to roll out en masse to 100,000 devices. This can only happen after #1 and #2 are designed and built.

This feature is really targeting users that are mature enough to have #1, #2, and #3 above. We are providing #3 with podman. They would need to build and design #1 and #2. Some other contributing factors might also be:

1. Users who have good service governance software in the container and in the consuming software, something like 3scale.
2. Users who consume very stable Linux distros that really only provide security updates (a common use case for Red Hat Enterprise Linux users with Extended Update Support). This is EXTREMELY stable: you can pretty much run yum updates to your heart's content without ever breaking anything, and you can do it for years and years.

 

As implemented now, I can think of a few common scenarios where it will be immediately useful, but beyond them, I see quite a few things that will need to be added to make it useful in more sophisticated/legacy environments. I would use this auto-update functionality on a few containers that I deploy around the house, because those containers all run on systemd hosts and their workloads are not sensitive to (slightly) out-of-date containers. Nor is a manual rollback of any container the end of the world. I can't use this at work, though, because various workloads have elaborate gates around their rollout or otherwise need to be rolled out as soon as a new release is available... not (up to) 24h later.

---

I've implemented something similar internally that does not suffer from some of the same drawbacks. It is quite a bit more flexible, but at the cost of some additional overhead/infrastructure. Chiefly:

- Would work with any init system that supports some form of "additional configuration" faculty. In my case, though, we're primarily - but not exclusively - a systemd shop.
- Is not limited to daily checks for updates. Within seconds of the "switch being flipped" - so to speak - the new version of the container can be running.
- Supports rollbacks and other release gates


Internally, we use the *excellent* Consul Key/Value storage system to manage which workloads use which versions of a container, but any key/value storage system that allows a daemon to monitor or 'learn' about a change to a value for a given key will work. That is: I use Consul to pull this off, but you could absolutely make etcd or ZooKeeper work here, too.

Through a process that's not relevant here, a key/value path is updated. E.G.:

path: /service/in-field-c/version
value: 1.28

where in-field-hardware-controller is an illustrative example, as is the value stored @ that key.

On every container host, there's a daemon that watches the /service/in-field-hardware-controller/version path in Consul. Depending on the workload, we use the simple but powerful consul-template program or a more sophisticated internal daemon. consul-template is a small Go-based binary that can be run as a daemon to watch a specific Consul key, but the Consul API is open and there are a variety of daemons out there that support monitoring a given path. The critical bit here is that the daemon has the ability to execute system commands when a change is observed. When the monitoring daemon notices a change to the value @ the key, it renders out a file that is then read by systemd and "exposed" to the ExecStart= directive as an environment variable. The file that is rendered out would be placed in:

/etc/systemd/system/in-field-hardware-controller.service.d/10-version.conf

and would look like this:

[Service]
Environment=WORKLOAD_IMAGE_VERSION=1.28
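As a rough illustration of the render step described above, here is a minimal Python sketch. It is not consul-template and not the internal daemon. The function names (render_dropin, write_atomically, on_change), the whitespace check, and the temp-file-plus-rename approach are all assumptions made for the example. It only shows the idea: rewrite the drop-in when the watched value changes, and never leave a half-written file for systemd to read.

```python
import os
import tempfile


def render_dropin(version: str) -> str:
    """Return the drop-in body that exposes the version to ExecStart=."""
    # Hypothetical sanity check: refuse empty values or values with whitespace.
    if not version or any(ch.isspace() for ch in version):
        raise ValueError(f"refusing to render suspicious version: {version!r}")
    return f"[Service]\nEnvironment=WORKLOAD_IMAGE_VERSION={version}\n"


def write_atomically(path: str, body: str) -> None:
    """Write body to path via a temp file and rename, so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(body)
        os.chmod(tmp, 0o644)   # mkstemp creates the file as 0600
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise


def on_change(last_seen, current, path):
    """Rewrite the drop-in only if the watched value changed. Returns True if rewritten."""
    if current == last_seen:
        return False
    write_atomically(path, render_dropin(current))
    return True
```

A watcher would call on_change(previous_value, new_value, dropin_path) each time it observes the key, and only move on to the "apply" step when it returns True.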


The daemon that writes out the file then consults some internal logic to see when to *apply* this change. In simple cases, the daemon (consul-template) will immediately run 

systemctl daemon-reload; systemctl restart hardware-controller.service 

which will immediately apply the change. In other cases, the daemon (not consul-template) will run additional scripts to sanity check other dependencies and provide additional 'gates' on the rollout. These scripts check upstream and downstream dependencies, database/stateful data versions and - in some cases - require an engineer to be the "second man" (see 'two-man rule' on Wikipedia) in a version rollout. If the updated container does not start to publish an expected payload to a pre-defined endpoint, we consider the container to be unhealthy and consult additional internal logic about whether to revert or exponentially back off on the restart attempts.
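To make the "revert or exponentially back off" decision a little more concrete, here is a small Python sketch of one possible policy. It is not the internal logic described above. The function names, the default timings, and the rule "revert after a fixed number of unhealthy restarts" are assumptions chosen for illustration. The restart, health check, revert and sleep steps are passed in as callables, so the policy can be tried without touching systemd.

```python
def backoff_delays(base_seconds, max_attempts, cap_seconds):
    """Yield the wait after each failed attempt: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(max_attempts):
        yield min(base_seconds * (2 ** attempt), cap_seconds)


def roll_out(restart, is_healthy, revert, sleep,
             base_seconds=1.0, max_attempts=4, cap_seconds=60.0):
    """Restart until healthy, backing off between tries; revert when attempts run out.

    Returns "healthy" or "reverted". All four callables are supplied by the
    caller (hypothetical hooks, e.g. wrappers around systemctl and an HTTP probe).
    """
    for delay in backoff_delays(base_seconds, max_attempts, cap_seconds):
        restart()
        if is_healthy():
            return "healthy"
        sleep(delay)  # wait longer after every failed attempt
    revert()
    return "reverted"
```

In a real deployment is_healthy would be the check for the expected payload at the pre-defined endpoint, and revert would write the previous version back to the drop-in before restarting.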

The portion of the hardware-controller.service file that plugs the env-var into the run command looks like this:

ExecStart=/usr/bin/podman run --name=hardware-controller <...snip...> some-registry/hw-controller:${WORKLOAD_IMAGE_VERSION}
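One thing worth guarding against with this pattern: as far as I know, systemd replaces an unset variable in ExecStart= with an empty string, which would leave an image reference ending in a bare ":". A gate script could check for that before restarting. The Python sketch below is only an illustration of such a pre-flight check. The function name resolve_image and the decision to treat a blank value as an error are assumptions; the image name is the one from the ExecStart= line above.

```python
from string import Template

# Same placeholder syntax as the ExecStart= line above.
IMAGE_TEMPLATE = Template("some-registry/hw-controller:${WORKLOAD_IMAGE_VERSION}")


def resolve_image(env):
    """Return the image reference the unit would run, or raise if the version is missing."""
    version = env.get("WORKLOAD_IMAGE_VERSION", "").strip()
    if not version:
        # Assumption: a missing or blank version is treated as a hard failure.
        raise ValueError("WORKLOAD_IMAGE_VERSION is unset or empty")
    return IMAGE_TEMPLATE.substitute(WORKLOAD_IMAGE_VERSION=version)
```

Calling resolve_image with the parsed contents of the drop-in, before running systemctl restart, turns a silent bad tag into an explicit error.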


I will be the first to acknowledge that our solution has many knobs and sliders that increase the complexity of our "dynamic" version configuration setup. Some of these knobs are necessary to support features that are absolutely critical for our needs: rollouts within seconds unless additional gates apply, and relatively painless rollbacks (where possible). For my personal/at-home workloads, those needs are not critical and so the many knobs/sliders are not needed.

I think, in my mind, you are describing a very good pattern that I call embedding the configuration in the environment. With containers, I often talk about how people need to separate:

1. Code
2. Configuration
3. Data

Only the code (#1) should be in the container. Everything else should come from the environment. I would agree, your environment is like Configuration with dynamic capabilities. We are definitely NOT trying to go that far with this new podman feature.

 


Happy to clarify anything!

-K








-- 
Scott McCarty, RHCA
Product Management - Containers, Red Hat Enterprise Linux & OpenShift
Email: smccarty@redhat.com
Phone: 312-660-3535
Cell: 330-807-1043
Web: http://crunchtools.com
Using Azure Pipelines with Red Hat Universal Base Image and Quay.io: https://red.ht/2TvYo3Y