The problem with rootful containers.

In a nutshell, compromise:

$ sudo mkdir /var/lib/foobar
$ docker run --rm -it -v /var/lib/foobar:/mnt/foobar:Z docker.io/library/httpd:latest \
sh -c 'apt-get update && apt-get install -y zsh-static && cp /usr/bin/zsh-static /mnt/foobar/zsh && chmod u+s,g+s /mnt/foobar/zsh'
$ sudo -u nobody /var/lib/foobar/zsh -c 'id' && sudo rm /var/lib/foobar/zsh && sudo rmdir /var/lib/foobar

foobar indeed.

With apologies to team zsh, whose shell I have abused to illustrate my point.
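
Why it works: the container process is real root, so the copy it drops into the bind mount is owned by root on the host, and the setuid bit means anyone who executes it runs as root. A minimal sketch for spotting such files follows. It uses a throwaway temp directory as a stand-in for /var/lib/foobar, and find_setid is a made-up helper name, not part of any tool. Nothing here needs docker or root.

```shell
#!/bin/sh
# find_setid is a hypothetical helper: list regular files under a
# directory that carry the setuid or setgid bit.
find_setid() {
    find "$1" -type f \( -perm -4000 -o -perm -2000 \) -print
}

# Throwaway directory standing in for /var/lib/foobar.
demo_dir=$(mktemp -d)
touch "$demo_dir/shell"       # stand-in for the copied zsh
chmod u+s "$demo_dir/shell"   # same bit the container set (here owned by us, not root)
touch "$demo_dir/harmless"    # ordinary file, should not be reported

found=$(find_setid "$demo_dir")
echo "$found"                 # prints only the path ending in /shell

rm -rf "$demo_dir"
```

Pointing find_setid at any directory you bind mount into a rootful container is a cheap sanity check, though it only detects the problem after the fact.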

The solution? Don't do it. Or at least, don't do it without taking the necessary precautions. But how?

Conventional wisdom dictates that you run the container with the user option, for example '--user=www-data' for the httpd container, so that the processes inside the container run as www-data instead of root.

Problem solved then?

No. This doesn't automagically change the files in the container that are owned by root into files owned by www-data. So the container is probably not going to function internally as seamlessly as it normally would.

What now then?

Well, you could always build the container just the way that you need it, changing all the file permissions the way you need them etc. Yay... and welcome to container management hell. What started off as 'cool, just pull containers from docker and off we go...' turns into 'pull the container from docker, customise and build the container just the way we need it. Later, check if it has changed. Yes? Oh well, now we gotta pull the container from docker, rebuild the container with its security updates, apply our customisations again etc. etc.'

Rinse, recycle, repeat.

So what's the real solution?

Run the container as intended and in such a manner that root is root inside the container and not root outside the container. Aka rootless containers.
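
What 'root inside, not root outside' means in practice: a user namespace maps ranges of container UIDs onto unprivileged host UIDs. Here is a rough sketch of the arithmetic. It assumes a hypothetical host user with UID 1000 and a subordinate range starting at 100000 (the kind of entry you would find in /etc/subuid). The map lines use the same three-column layout as /proc/self/uid_map (inside start, outside start, length), and to_host_uid is a made-up helper for illustration.

```shell
#!/bin/sh
# Hypothetical mapping, for illustration only:
#   container UID 0        -> host UID 1000 (the unprivileged user)
#   container UIDs 1-65536 -> host UIDs 100000-165535
uid_map='0 1000 1
1 100000 65536'

# to_host_uid: translate a container UID to the host UID it maps to.
# Prints nothing and returns 1 if the UID is not mapped.
to_host_uid() {
    cuid=$1
    while read -r inside outside length; do
        if [ "$cuid" -ge "$inside" ] && [ "$cuid" -lt $((inside + length)) ]; then
            echo $((outside + cuid - inside))
            return 0
        fi
    done <<EOF
$uid_map
EOF
    return 1
}

to_host_uid 0     # container root -> prints 1000
to_host_uid 33    # www-data (typically UID 33 on Debian-based images) -> prints 100032
```

So a setuid-root file written by such a container lands on the host owned by UID 1000, not by root, and the trick from the top of this post goes nowhere.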

Problem solved then?

No. That's all fine and well, until you need those rootless containers to intercommunicate or share a common network.

Historically, we run services as separate users. User bind runs named, user www-data runs httpd etc. So when we want to extend these services to running in containers, logically we want the bind container running as user bind on the host, the httpd container running as user www-data on the host, no? Yes.

But what happens when we want our httpd container to talk to our bind container, when they're both running rootless as independent users? The less security-minded amongst us might just expose the container ports on the host IP address and whoop de do, we're off. But of course that exposes our containers to the wild west that is the Internet of today and bang, sooner or later we're done for. Containers should be treated like intimacy. No glove, no love. We want them isolated as much as possible to avoid all those ITDs that the kids throw at us...

So what the F* is the solution then?

podman
uidmaps
shared network