I got this to work. It's clear that I still don't understand UID mapping and user
namespaces well enough, even after reading a bunch of man pages and articles about
them. Still, after a lot more experimentation and trial and error, I succeeded.
I'll post my findings here in case someone else struggles with this (and, judging by
Google results for the error messages in my original post, I'm not alone).
TL;DR: here are the parameters I had to add:
--uidmap 0:0:1
--uidmap 100:1:1
--gidmap 0:0:1
--gidmap 65534:1:1
Inside the container, the corresponding /proc entries match:
root@c4a7043b2e10:/# cat /proc/self/uid_map
0 0 1
100 1 1
root@c4a7043b2e10:/# cat /proc/self/gid_map
0 0 1
65534 1 1
After this, 'apt update' ran fine. The mapping of the container's root to the
host's non-root user when writing files to bind mounted volumes was also preserved.
More details, in particular about the "magic numbers" 100 and 65534, follow for
anyone interested or needing to apply this to their own situation.
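To make the three columns of those /proc entries concrete, here is a minimal Python
sketch that reads them the way 'man user_namespaces' describes: first ID inside the
namespace, first ID in the parent namespace, length of the range. The function names
are made up for illustration, and the map text is copied from the output above.

```python
from typing import List, Optional, Tuple

# Each line of /proc/self/uid_map (and gid_map) is:
#   <first ID inside the namespace> <first ID in the parent namespace> <length>
UID_MAP = """\
0 0 1
100 1 1
"""
GID_MAP = """\
0 0 1
65534 1 1
"""

def parse_id_map(text: str) -> List[Tuple[int, int, int]]:
    """Turn uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries

def to_parent_id(entries: List[Tuple[int, int, int]], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the parent namespace (None if unmapped)."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None

uid_entries = parse_id_map(UID_MAP)
gid_entries = parse_id_map(GID_MAP)
print(to_parent_id(uid_entries, 0))      # container root -> 0
print(to_parent_id(uid_entries, 100))    # _apt           -> 1
print(to_parent_id(gid_entries, 65534))  # nogroup        -> 1
print(to_parent_id(uid_entries, 1000))   # not covered by any line -> None
```

Any ID that no line covers simply has no counterpart outside the container, which
would fit the errors apt gave before UID 100 and GID 65534 were mapped.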
- - - - -
Here's what's on the host side in both /etc/subuid and /etc/subgid, covering
all five non-root users on the host (names obscured):
vagrant:100000:65536
user2:165536:65536 <-- this one is the container-running service user
user3:231072:65536
user4:296608:65536
user5:362144:65536
(The corresponding host UIDs and GIDs for the listed users are 1000–1004, though
that's not relevant for this case.)
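Each line there reads name:first-subordinate-ID:count. As a rough sketch (the helper
name is hypothetical, the "<--" annotation is stripped from the sample, and one line
per user is assumed), this is how the ranges work out:

```python
SUBUID = """\
vagrant:100000:65536
user2:165536:65536
user3:231072:65536
user4:296608:65536
user5:362144:65536
"""

def parse_subid(text):
    """Return {name: (first_id, count)} from /etc/subuid- or /etc/subgid-style text.

    Simplification: assumes a single line per user, as in the listing above.
    """
    ranges = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, first, count = line.split(":")
        ranges[name] = (int(first), int(count))
    return ranges

ranges = parse_subid(SUBUID)
first, count = ranges["user2"]
last = first + count - 1
print(f"user2 owns subordinate IDs {first}..{last}")  # 165536..231071

# The ranges sit back to back: each one starts where the previous one ends.
ordered = sorted(ranges.values())
contiguous = all(a_first + a_count == b_first
                 for (a_first, a_count), (b_first, _) in zip(ordered, ordered[1:]))
print(contiguous)  # True
```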
Inside the container, the user is just root with UID/GID 0. There, both /etc/subuid and
/etc/subgid are blank.
While researching for this write-up, I realized where the UID 100 and GID 65534 in the
error message come from. The container's /etc/passwd contains this:
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
UIDs 100-999 are "dynamically allocated system users and groups" according to
https://www.debian.org/doc/debian-policy/ch-opersys.html#uid-and-gid-classes. Fortunately,
in this case, the _apt user always seems to be allocated UID 100, so I can just
hardcode that into my --uidmap invocation. GID 65534 is "nogroup", the group
counterpart of the "nobody" user, which has the same numeric ID.
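If the _apt UID ever turned out not to be 100 in some image, the flags could be derived
from the passwd entry rather than hardcoded. A small sketch of that idea follows; the
helper name is made up, and the passwd line is the one quoted above.

```python
PASSWD_LINE = "_apt:x:100:65534::/nonexistent:/usr/sbin/nologin"

def passwd_ids(line):
    """Return (name, uid, gid) from one /etc/passwd line (seven colon-separated fields)."""
    name, _password, uid, gid, _gecos, _home, _shell = line.strip().split(":")
    return name, int(uid), int(gid)

name, uid, gid = passwd_ids(PASSWD_LINE)

# Same shape as the flags in the TL;DR: container root stays on intermediate ID 0,
# and the _apt UID and its GID each go to intermediate ID 1.
flags = [
    "--uidmap", "0:0:1",
    "--uidmap", f"{uid}:1:1",
    "--gidmap", "0:0:1",
    "--gidmap", f"{gid}:1:1",
]
print(" ".join(flags))
# --uidmap 0:0:1 --uidmap 100:1:1 --gidmap 0:0:1 --gidmap 65534:1:1
```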
_apt is an unprivileged user that owns some of the files/dirs that the 'apt'
command uses when dealing with packages. When 'apt' runs as root, it drops
privileges to become this user as a security measure; see
https://askubuntu.com/a/810213/1265622
To summarize my goals:
1. When the container's root user writes into volumes bind mounted from the host, it
should have the privileges of user2, who is running the container. When it creates new
files in those volumes, they should appear on the host side with the same
owner/group/mode as they would have if user2 had written them to the bind mount
source directly on the host.
2. Inside the container, the root user should have full root privileges to do what it
wants – within the limits of what's even possible to do in rootless containers. (And,
of course, limited to the privileges of user2 whenever touching something on the host.) In
this case, it needs to be able to drop privileges and become the _apt user within the
container.
The first goal was doable with just --uidmap 0:0:1. Interestingly, when I added --uidmap
100:1:1 and --gidmap 65534:1:1 for the second goal, the container refused to start,
complaining that UID mapping is used but GID 0 is not mapped. This is why --gidmap 0:0:1
was added as the last piece of the puzzle. I have no idea why it wasn't initially
required for the first goal.
- - - - -
After all this, I still don't understand what the second number ("intermediate
UID/GID") of the --uidmap and --gidmap parameters actually is in rootless mode.
I've read and re-read
https://docs.podman.io/en/latest/markdown/podman-run.1.html#uidmap-contai...
a dozen times to no avail, as well as 'man user_namespaces', and I almost feel
like I know less than when I started. At this point I guess I'd need to understand
something about the Linux kernel for any explanation to make sense.
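For what it's worth, the podman-run page seems to describe the intermediate IDs in
rootless mode like this: intermediate 0 is the user running the container, and
intermediate 1 and up are that user's subordinate IDs from /etc/subuid (or
/etc/subgid), in order. The sketch below works through the numbers under that
reading. It is an interpretation, not something I've verified against the source.
The function name is hypothetical, and user2's host UID of 1001 is an assumption
based on the 1000-1004 range mentioned earlier.

```python
def intermediate_to_host(intermediate_id, own_id, sub_start, sub_count):
    """Map a rootless-mode intermediate ID to a host ID (None if out of range).

    Assumed reading of the docs: 0 -> the invoking user's own ID,
    1..sub_count -> the user's subordinate range, in order.
    """
    if intermediate_id == 0:
        return own_id
    if 1 <= intermediate_id <= sub_count:
        return sub_start + intermediate_id - 1
    return None

USER2_UID = 1001                     # assumption: second of the host UIDs 1000-1004
SUB_START, SUB_COUNT = 165536, 65536  # user2's line in /etc/subuid

# --uidmap 0:0:1   -> container root uses intermediate 0
print(intermediate_to_host(0, USER2_UID, SUB_START, SUB_COUNT))  # 1001
# --uidmap 100:1:1 -> container _apt uses intermediate 1
print(intermediate_to_host(1, USER2_UID, SUB_START, SUB_COUNT))  # 165536
```

If that reading is right, container root ends up as user2 on the host, which matches
what I see on the bind mounts, and _apt ends up as the first of user2's subordinate
UIDs.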
Still, I hope this helps someone else!