An in-depth guide to overcoming the challenges of migrating from Docker to a more secure, high-performance, and supported container image builder
To provide a seamless recruiting experience to millions of job seekers and employers, ZipRecruiter relies on a particular tech stack for orchestrating, running, and building containerized images.
This includes AWS EC2 clusters for compute, Kubernetes for container orchestration, the Flatcar operating system, Docker to build images and run containers for our apps, and Jenkins to help streamline deployment operations. One of the ways this all comes together, for example, is by mounting the Docker socket from the Kubernetes node into each Jenkins agent container.
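As a rough sketch (names and paths are illustrative, not our actual manifests), that socket mount looks like this in a pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent            # hypothetical name
spec:
  containers:
  - name: agent
    image: jenkins-agent:latest  # hypothetical image
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
      type: Socket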
During the second quarter of 2022, we encountered a critical technical failure in Docker itself. Investigations uncovered inherent security vulnerabilities and throttling of parallel builds, which prompted an urgent search for an alternative way to build and push images to the registry.
The latest docs on these issues are from 2019 and 2021. This article can hopefully save you a few months of trial and error.
The file system driver error and COPY failure that prompted a Docker investigation
Like many issues in our field, a version update kicked off a chain reaction.
We use the Flatcar operating system on nodes in our K8s infrastructure on AWS EC2 machines. Every few weeks, we upgrade the Flatcar AMI to the latest version, which also comes with an upgrade to the Docker version.
After upgrading to Docker 19.xx we encountered build failures in some of our applications, related to a bug outlined in this GitHub issue #403. In essence, Docker 18.xx has its Native Overlay Diff parameter set to true by default, but in later Docker 19.xx and 20.xx the default is false. This affects the performance of the Overlay driver, which I'll explain more about later.
In addition, a COPY failure issue, outlined here, pertaining to Docker versions later than 17.06 popped up.
These bugs meant we couldn't build properly, forcing us to revert to the previous Docker and Flatcar versions.
The implications of this failure to upgrade were two-pronged.
First, we weren't able to leverage new features and functionality. More importantly still, by not upgrading to newer versions we were increasing our security risk exposure over time, as new vulnerabilities are discovered and exploited every day.
As we devised a solution (which I share below), we uncovered larger issues with Docker that merited dropping it altogether for a better build system.
Three (more) reasons to drop the Docker daemon
1. Security Risks
In today's world, running as root is bad practice, not to mention dangerous. The Docker daemon, unfortunately, runs as root with the highest level of access, in conflict with the security paradigm of containerized processes. A hacker who gains access to the pod could gain access to all root node data and inflict serious damage.
In addition, mounting and setting up the Docker socket is not straightforward. The Docker architecture requires you to mount a Unix socket on a pod to build images, but failure to configure it correctly can result in vulnerabilities. Granted, there are alternatives like docker-in-docker, but that has its own problems, such as nested containerization and resource isolation issues. Docker also later released a rootless mode, but you still have to mount the Docker socket because the Docker daemon is running on the node.
2. The Docker daemon throttles parallel builds
Docker runs a single daemon on each node. No matter how many pods Kubernetes spins up on a node, all the pods use the same Docker daemon. This limited our build system's ability to build images in parallel even when we picked large EC2 instances; by sticking with it, we were throttling our capacity on large AWS instances.
3. Lack of maintenance and support
As Kubernetes adopted a more modular and standardized approach for interacting with container runtimes, the project dropped dockershim as an intermediary for interacting with Docker. Continuing to use an unsupported tool would be risky, and would prevent us from using the latest versions of K8s.
Although this is more related to the runtime, and not directly related to the build system, it served as yet another reason to find a replacement for Docker. We eventually migrated to containerd, but that's a story for another day.
Evaluating alternative build tools
In light of the bugs and the aforementioned drawbacks, we set out to evaluate our options for building images. We explored and tested four different options: Kaniko (by Google), Buildah (open source), s2i, and img.
Ultimately we chose Buildah, as it posed the path of least resistance without compromising on security or performance.
Four criteria were of key consideration during evaluation: security, performance, compatibility, and stability.
Security
Which additional privileges, if any, would pods running the new builder need?
Would having builder images not built from our base images present a problem for our security posture?
Buildah can run inside a Kubernetes container with far fewer privileges than root on the node.
Performance
How fast are builds under the new builder, compared to Docker build?
How well does the new builder leverage build/image caches?
The new solution had to achieve, at the very least, what Docker was already doing. By using the native OverlayFS storage driver (explained below) we achieved the desired performance.
Compatibility
How easy is it to integrate the new builder with what we already have? How much change, if any, would each app need to undergo to make use of the new builder (including documentation and re-training)?
How well does the new builder integrate directly with GitLab?
Does using this tool limit our choices for integrating a more comprehensive OSS CI system in the future?
With over 1,300 ZipRecruiter apps using Dockerfiles, and for the sake of maintaining backward compatibility, we wanted to continue using Dockerfile syntax and the files we already had.
Buildah seamlessly ingests Dockerfiles, requiring almost no re-writing or re-training.
Stability
How well maintained is the project, especially if it is open source?
Buildah is backed by Red Hat and has a very active open source community. Long-term stability seemed like a safe bet.
Key technical issues when transitioning from Docker to Buildah
Transitioning from Docker to Buildah is not straightforward. The solutions below will shorten the time it takes you to get up and running.
1. Loading the Native Overlay Diff driver correctly
Overlay is a storage driver that both Docker and Buildah use. Thus, even after migrating to Buildah, we had to avoid the aforementioned bug (#403) by making sure that the Flatcar AMI's 'Native Overlay Diff' parameter is TRUE so that Overlay works as desired.
To achieve this, you need to load the Overlay driver in the Flatcar AMI with an adjustment to the boot options as follows:
options overlay metacopy=off redirect_dir=off
If you are using EC2 you can do this in a boot script. In a Flatcar AMI you can also do this via a vendor config. No matter which option you use, this config must be set so that Native Overlay Diff comes out true.
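For illustration, here is a minimal boot-script sketch (the path and mechanism are assumed, not our exact setup) that persists the module options and reloads overlay:

#!/bin/bash
# Persist the overlay module options; a Flatcar vendor config or Ignition
# file can write the same content declaratively
cat <<'EOF' > /etc/modprobe.d/overlay.conf
options overlay metacopy=off redirect_dir=off
EOF
# Reload the module so the options take effect; only safe early in boot,
# before any overlay mounts exist
modprobe -r overlay
modprobe overlay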
To check this, run buildah info and look for Native Overlay Diff: true.
buildah info
{
  "host": {
    "CgroupVersion": "v2",
    "OCIRuntime": "runc",
    "kernel": "5.15.142-flatcar",
    "os": "linux",
    "rootless": true
  },
  "store": {
    "GraphDriverName": "overlay",
    "GraphOptions": [
      "overlay.ignore_chown_errors=true"
    ],
    "GraphStatus": {
      "Backing Filesystem": "extfs",
      "Native Overlay Diff": "true",
      "Supports d_type": "true",
      "Using metacopy": "false"
    }
  }
}
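For a scriptable check, a Go-template query along these lines should also work (the key path is assumed from the JSON above):

buildah info --format '{{ index .store.GraphStatus "Native Overlay Diff" }}'
# expected output: true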
2. Choosing the best performing storage driver
Initially, we tried using the VFS storage driver with Buildah, but it was very slow. The VFS backend is a very simple fallback that has no copy-on-write support. Each layer is just a separate directory. Creating a new layer based on another layer is done by making a deep copy of the base layer into a new directory.
Fuse-overlay is another option. In this 2019 Red Hat blog it is written: "Fuse-overlay works quite well and gives us better performance than using the VFS storage driver." However, compared to Docker and our benchmark expectations, fuse-overlay is also too slow, and its performance was not acceptable for Buildah to qualify as a Docker replacement.
While this 2021 article theorized about implementing 'native OverlayFS', we put it to the test. We found that it performed as well as Docker, finally enabling our move to Buildah.
For anyone focused on performance, OverlayFS is probably the best option.
To implement it, you must use Linux kernel v5.13 or later. And to be able to update to the v5.13 kernel, you need a Flatcar AMI with kernel version 5.13 or above. Serendipitously, just as we figured out how to solve the issue, v5.13 was released.
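A quick sanity check on a node before switching drivers:

# The node kernel must be 5.13+ for native overlay in rootless mode
uname -r
# e.g. 5.15.142-flatcar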
3. Performance
After dealing with the functional issues, we eventually set out to build images with Buildah in a controlled environment to assess performance. There we noticed it was inconsistently slower to pull images.
Debugging Buildah and inspecting the underlying Golang and io.Copy code [copying from ECR to local storage] all came up clean. Then, while monitoring network traffic, we noticed a high rate of TCP re-transmissions. Eventually an upgrade from Flatcar OS version 3227.2.4 to 3510.2.0 fixed this issue.
This was one of the last issues we had during the Docker to Buildah migration, and once it was fixed, Buildah performance was on par with Docker, allowing full migration to Buildah.
4. Configuring authentication
Similarly to Docker, after Buildah builds the images, our build system pushes them to cloud-based AWS ECR for storage. To do so, you have to provide authentication. In addition, in some instances we also need to access third-party images, to which we need to authenticate ourselves. We authenticate at the start, once the pod is running, using the buildah login command, and specify the buildah config to be used.
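As a hedged sketch, assuming the AWS CLI v2 and a placeholder account ID and region (not our actual registry), the login step looks roughly like:

# Pipe a short-lived ECR token into buildah login
aws ecr get-login-password --region us-east-1 | \
  buildah login --username AWS --password-stdin \
  123456789012.dkr.ecr.us-east-1.amazonaws.com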
This is an example storage configuration (storage.conf) that needs to be set up for every pod:
[storage]
# Default storage driver. Must be set for proper operation.
driver = "overlay"
[storage.options.overlay]
ignore_chown_errors = "true"
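Where the file lives matters for rootless runs. A rough provisioning snippet, assuming the containers-storage default search paths (our pod tooling differs):

# Rootless Buildah reads the per-user file first,
# then falls back to /etc/containers/storage.conf
mkdir -p "$HOME/.config/containers"
cp storage.conf "$HOME/.config/containers/storage.conf"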
5. Multistage builds
We were building multiple images for a single app, and that needed to continue working. Buildah, however, had an optimization feature and produced only a single image.
To resolve this issue we reached out to the Buildah team and told them it was breaking our system. They added new functionality in Buildah to 'Skip unneeded stages from multi-stage builds.' You can read the feature request here.
If you need this behavior, use the following flag (available from v1.27.2 onwards):
--skip-unused-stages=false
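For example, as part of a full build command (the image tag is hypothetical):

buildah build --skip-unused-stages=false -f Dockerfile -t my-app:latest .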
6. Bazel builds
We have some scripting on top of Makefiles that uses Bazel to build packages.
With Docker, Bazel shut down properly after each build, and when the next RUN command came through, Bazel would start a new server. That is the desired behavior.
When we tested building images with Buildah, the Bazel local server process didn't shut down properly and left some state files behind, such that when the next RUN directive started, Bazel didn't start the server and assumed it was already running from the previous image layer, thereby failing the entire build.
To fix this, you either have to force Bazel to start a local server at each layer by deleting the state files from the previous layer, or you can combine multiple make commands into a single RUN directive, as sketched below.
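A sketch of the combined-RUN approach in a Dockerfile (the make targets are hypothetical):

# Before: each RUN leaves Bazel server state behind in its layer
#   RUN make build-api
#   RUN make build-assets
# After: Bazel starts and shuts down within a single layer
RUN make build-api && make build-assets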
7. Unable to resolve hostname
There was a bug in Buildah resulting in Java builds failing because the intermediate container gets its containerID as hostname when using the host network, and it was not resolvable since the entry was not present in /etc/hosts. Details here and fix here.
To resolve this, we used an internal patch for a few months, and once migration was complete, we got it fixed upstream by the Buildah team. Today, as long as you use Buildah version v1.31.0 or later, you shouldn't have any issues.
8. Setting rootless privileges for pods on a node
By running every pod on a node as rootless, and following the principle of least privilege, bad actors won't be able to access actual node data.
To run Buildah inside a Kubernetes container without root privileges, set the following:
# Set default security profile strategies
runAsUser:
  rule: MustRunAsNonRoot
allowedCapabilities:
  # Required for Buildah to run in a non-privileged container. See
  # https://github.com/containers/buildah/issues/4049
  - SETUID
  - SETGID
  # "Since Linux 5.12, this capability is also needed to map
  # user ID 0 in a new user namespace" from:
  # - https://man7.org/linux/man-pages/man7/capabilities.7.html
  # See also (search for "If updating /proc/pid/uid_map"):
  # - https://man7.org/linux/man-pages/man7/user_namespaces.7.html
  - SETFCAP
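The snippet above is written against PodSecurityPolicy-style fields; as a rough sketch of the same idea at the container level (the image name is hypothetical, not our actual builder):

containers:
- name: builder
  image: buildah-builder:latest  # hypothetical image
  securityContext:
    runAsNonRoot: true
    capabilities:
      add: ["SETUID", "SETGID", "SETFCAP"]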
9. Chown errors
We had to set ignore_chown_errors = "true" [see above in storage.conf] to fix some of the apps we build.
Here's some documentation for this flag on GitHub and Podman.
"This will allow non-privileged users running with a single UID within a user namespace to run containers. The user can pull and use any image, even those with multiple uids. Note multiple UIDs will be squashed down to the default uid in the container. These images will have no separation between the users in the container. Only supported for the overlay and vfs drivers." – GitHub.
10. Mknod requires root privileges
While attempting to build the open-source ingress-nginx app with rootless Buildah, we encountered the following error:
Fail to run mknod - Operation not permitted
As explained here, the error occurs because when an unprivileged (rootless) user doesn't have enough privileges to use mknod, the kernel blocks the operation. No matter how many capabilities are retained in the user namespace, it won't be possible.
This is a limitation (documented here) we had to live with. The workaround is to use an already built image from the github repository.
11. Too Many Open Files
Too Many Open Files is the error that Linux returns when open() or connect() or anything else that allocates a file descriptor fails because we hit the upper limit on open files.
The buildah-build manual page gave us a hint on what number to choose for the maximum number of open files:
"nofile": maximum number of open files (ulimit -n)
"nofile": maximum number of open files (1048576); when run by root
Initially we started with half of what Buildah uses when run as root (i.e. 1048576 / 2 = 524288) and this did the trick.
The final config we used was as follows:
"--ulimit", "nofile=524288"
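Passed on the command line, that config looks roughly like this (the image tag is hypothetical):

buildah build --ulimit nofile=524288 -f Dockerfile -t my-app:latest .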
You will want to experiment based on your requirements and set nofile accordingly.
12. Inability to access the root file system
When building images, Buildah runs rootless and doesn't have permissions on root-owned directories. For example, I saw an issue where /var/run was symlinked to /run, and /run was owned by root, so Buildah was unable to access it.
Make sure you are not accessing files owned by root when building images.
Keep in mind that many other Buildah build configuration arguments may be useful depending on your project, for example: --network, --layers, --format, --build-context, --memory, and so on.
Go exploring!
In summary, we determined that migrating from Docker to Buildah was necessary to maintain ZipRecruiter's high standards for performance and data security. Despite many challenges along the way, we persevered, went to the source, and ultimately achieved our goal while helping create fixes for the whole community.
If you're interested in working on solutions like these, visit our Careers page to see open roles.
* * *
About the Author
Saurabh Ahuja is a Staff Software Engineer at ZipRecruiter. As a key member of the Core Systems team, Saurabh builds and maintains the framework and tools that power our technological development and online services. After 18+ years at some of the most successful tech companies in the world, he still likes to get his hands deep in code. ZipRecruiter offers him precisely that, as well as the opportunity to influence the company as a leading IC, and the flexibility to take part in family life at home and train for intense Ultraman triathlons.