Dan,
Thanks, and apologies for the delay; I have been away.
We mainly use Podman for building the Go and Java artifact images, which get deployed to
production, during the package stage of our Jenkins pipelines.
Currently I do the image caching on the worker nodes of our EKS cluster, and when I look
at the size of the overlay directory on a worker node after a couple of days, it's over
50 GB. So I think the NFS concept is worth exploring, to see how much it would speed up
the builds (or slow them down, due to reads and writes going over the network). I am also
concerned about network latency, as currently there is no network involved: the images
sit on the same nodes that Podman runs on.
I will also test the UID issue that you talked about.
Thanks again for your help and advice, much appreciated.
Puvi Ganeshar | @pg925u
Principal, Platform Engineer
CICD - Pipeline Express | Toronto
From: Daniel Walsh <dwalsh(a)redhat.com>
Date: Friday, October 25, 2024 at 8:33 AM
To: Ganeshar, Puvi <puvi.ganeshar(a)directv.com>, podman(a)lists.podman.io
<podman(a)lists.podman.io>
Subject: Re: [Podman] Re: Speeding up podman by using cache
On 10/24/24 11:25, Ganeshar, Puvi wrote:
Dan,
Thanks for coming back to me on this.
If I use an NFS store (read & write) as Podman's storage, do you anticipate any race
conditions when multiple Podman processes read and write at the same time? Do I need to
implement any locking mechanisms like those in relational databases?
Yum and DNF should not be a big issue, as we don't build those images every day and we
use distroless for the Go microservices; the Java apps are built on a custom base image
with all dependencies already included.
Thanks again.
I don't think so. We already have locking built into the Podman database and NFS storage.
Once a container is running, Podman is not going to do anything; in detach mode, Podman
exits.
Podman is only writing to storage and working with locks when the container is created and
when images are pulled.
The key issue with NFS and storage is that if Podman needs to create a file with a UID
other than the user's UID, the NFS server will not allow Podman to do the chown.
From its point of view, it sees dwalsh chowning a file to a non-dwalsh UID.
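A tiny local sketch of that restriction (the target UID 100000 is an arbitrary example,
e.g. the start of a rootless user's subordinate UID range; on NFS with root squashing,
even root typically gets the same refusal):

```python
# Sketch: an unprivileged process may not chown a file to a UID it does
# not own. This is the same rule an NFS server enforces on the client's
# chown requests, which is what breaks rootless Podman storage on NFS.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    os.chown(path, 100000, 100000)  # hypothetical "other" UID
    result = "allowed"   # only with CAP_CHOWN (e.g. running as root)
except PermissionError:
    result = "EPERM"     # the usual outcome for an unprivileged user
finally:
    os.remove(path)
print(result)
```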
From: Daniel Walsh <dwalsh@redhat.com>
Date: Wednesday, October 23, 2024 at 11:10 AM
To: podman@lists.podman.io <podman@lists.podman.io>
Subject: [Podman] Re: Speeding up podman by using cache
On 10/22/24 11:04, Ganeshar, Puvi wrote:
Hello Podman team,
I am about to explore this option, so I just wanted to check with you all first, as I
might be wasting my time.
I am on the Platform Engineering team at DirecTV, and we run Go and Java pipelines on
Jenkins using Amazon EKS for the workers. The process is that when a Jenkins build runs,
it asks EKS for a worker (a Kubernetes pod), the cluster spawns one, and the new pod
communicates back to the Jenkins controller. We use the Jenkins Kubernetes pod template
to configure that communication. We are currently running the latest LTS of Podman,
v5.2.2, but still using cgroups v1 for now; we plan to migrate in early 2025 by upgrading
the cluster to Amazon Linux 2023, which uses cgroups v2 by default. Here are the Podman
configuration details that we use:
host:
  arch: arm64
  buildahVersion: 1.37.2
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.12-1.el9.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: f174c390e4760883511ab6b5c146dcb244aeb647'
  cpuUtilization:
    idlePercent: 99.22
    systemPercent: 0.37
    userPercent: 0.41
  cpus: 16
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: file
  freeLocks: 2048
  hostname: podmanv5-arm
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.225-213.878.amzn2.aarch64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 8531066880
  memTotal: 33023348736
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.el9.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.2-1.el9.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.aarch64
    version: |
      pasta 0^20240806.gee36266-2.el9.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
      https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.aarch64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 144h 6m 15.00s (Approximately 6.00 days)
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 107352141824
  graphRootUsed: 23986397184
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1724331496
  BuiltTime: Thu Aug 22 12:58:16 2024
  GitCommit: ""
  GoVersion: go1.22.5 (Red Hat 1.22.5-2.el9)
  Os: linux
  OsArch: linux/arm64
  Version: 5.2.2
We migrated to Podman when Kubernetes deprecated Docker and have been using it for the
last two years or so. It's working well; however, since we run over 500 builds a day, I
am trying to explore whether I can speed up the Podman build process by using image
caching. I wanted to see whether using an NFS file system (Amazon FSx) as Podman's
overlay storage would improve performance, with builds completing much faster because the
required images are already on the NFS share. Currently, Podman in each pod on the EKS
cluster downloads all the required images every time, so it does not take advantage of
cached images.
These are my concerns:
1. Any race conditions: Podman processes colliding with each other during reads and
writes.
2. The performance of I/O operations, since NFS communication goes over the network.
Have any of you tried this method before? If so, can you share any pitfalls that you’ve
faced?
Any comments or advice would be beneficial, as I need to weigh the pros and cons before
spending time on this. Also, a storage failure would cause an outage that blocks all our
developers, so I will have to design this in a way that lets us recover quickly.
Thanks very much in advance and have a great day.
_______________________________________________
Podman mailing list -- podman@lists.podman.io
To unsubscribe send an email to podman-leave@lists.podman.io
You can set up an additional store on an NFS share, preloaded with images, which should
work fine.
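For reference, that additional-store setup can be sketched in storage.conf on each
worker. This is an untested sketch; the FSx mount point below is a hypothetical path:

```toml
# /etc/containers/storage.conf -- a sketch, not a tested configuration.
# /mnt/fsx/containers/storage is an assumed mount point for the NFS/FSx share.
[storage]
driver = "overlay"
graphroot = "/var/lib/containers/storage"
runroot = "/run/containers/storage"

[storage.options]
# Read-only additional image store, preloaded with the common base images
# (e.g. via `podman --root /mnt/fsx/containers/storage pull ...`).
additionalimagestores = [ "/mnt/fsx/containers/storage" ]
```

Images found in an additional store are used read-only in place, so each build avoids
pulling them again; new layers are still written to the local graphroot.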
Whether this improves performance or not is probably something you need to discover.
If you are dealing with YUM and DNF, you might also want to experiment with sharing the
rpm database with the build system.
https://www.redhat.com/en/blog/speeding-container-buildah
https://www.youtube.com/watch?v=qsh7NL8H4GQ
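One way to sketch the package-cache idea from the blog post, assuming a recent
Podman/Buildah that supports cache mounts in Containerfiles (the base image and package
here are placeholders):

```dockerfile
# Hypothetical Containerfile step: keep the dnf metadata/package cache on a
# shared build-cache volume so repeated builds on the same node do not
# re-download it on every run.
FROM registry.access.redhat.com/ubi9/ubi
RUN --mount=type=cache,target=/var/cache/dnf,sharing=locked \
    dnf -y install gcc
```

The cache mount lives outside the image layers, so it persists across builds without
bloating the resulting image.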