Cloud Native: Thinking About Data
Wanna see the code? It’s in this GitHub repository.
The transition from IaaS to Cloud Native computing means rethinking our data. Categorizing data is the first step in conceptualizing how applications should run and, for oldsters like me, there’s a need to rethink where things belong. To understand data you need to ask some key questions:
Who owns the data?
What is the lifecycle of the data? (When is it created, updated, and destroyed?)
Is the data secret or sensitive?
Is the data precious?
These considerations, among others, are how you know where to put things. Now, what data is a part of a traditional VM?
The source and binary application code (i.e. the distro and all its packages)
The set of installed packages that are needed
System configuration (e.g. users, hostname, IP address, cron jobs, settings)
System secrets (SSH keys)
Home directories
System runtime data (e.g. files in /tmp and /var/run)
Let’s consider those items in the table below:
| Data | Owner | Lifecycle | Secret | Precious | Place |
|---|---|---|---|---|---|
| Applications | The distro (Ubuntu) | Regular updates | no | no | The container image |
| Installed packages | Me | Whenever I want | no | no | The container image |
| System configuration | Me | Applied at startup | no | no | Kubernetes |
| System secrets | Me | Applied at startup | yes | no | Kubernetes |
| Home directories | Individual users | The semester (or longer for my home) | no | yes | Kubernetes |
| Runtime data | The system | Same as the pod | no | no | Kubernetes |
So far, this is the examination we would do with any Kubernetes conversion. However, there’s a problem.
The Problem With /etc/passwd
UNIX wasn’t designed to be cloud native, and this causes a conflict with /etc/passwd and related files. User accounts are listed in /etc/passwd, and users are system configuration, so they should be specified in a ConfigMap (or a Secret for /etc/shadow). However, installing packages after the pod is deployed may add system users, which modifies /etc/passwd, and a ConfigMap mounted as a file cannot be modified from inside the pod. Therefore, the list of registered users must be treated as runtime data.
To resolve the /etc/passwd problem, the user list has to be applied at system start time using standard tools (e.g. Ansible). The base Docker image has no registered users, and on boot the following things are done by the /etc/rc.local script:
Apply the desired hostname.
Create the administrative user.
Configure sudo for the administrative user.
Import SSH keys for the administrative user.
Run an arbitrary command as the administrative user to further customize the container.
System Startup
A decade after Linux system startup moved away from shell scripts, one thing still works: if you put a script at /etc/rc.local, it is still run by systemd. Hooray! That gives me an easy way to have the container execute the specialization it needs. I created three ConfigMaps that contain Bash scripts. The first, /etc/rc.env, sets environment variables during specialization. It looks basically like this:
```shell
export DEFAULT_USER=human
export DEFAULT_KEY_IMPORT=gh:you-user
export SET_HOSTNAME=myhost
```
It’s sourced by /etc/rc.local, which looks a bit like this:
```shell
#! /usr/bin/bash
set -e
. /etc/rc.env

# System setup
echo "127.0.1.1 ${SET_HOSTNAME}" | tee -a /etc/hosts
echo "${SET_HOSTNAME}" | tee /etc/hostname
hostname "${SET_HOSTNAME}" || true # Doesn't work in unprivileged containers.

# User setup
useradd "${DEFAULT_USER}" -u 1000 -U -G adm -m -s /usr/bin/bash
echo "${DEFAULT_USER} ALL=(ALL) NOPASSWD:ALL" | tee "/etc/sudoers.d/${DEFAULT_USER}"
chown "${DEFAULT_USER}:${DEFAULT_USER}" "/home/${DEFAULT_USER}"

# User customization
su "${DEFAULT_USER}" -c "/etc/rc.user" || true

touch /ready
```
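How the script reaches the container is outside the listing above. As one hedged sketch (the ConfigMap and container names are assumptions, and the fragment is written to /tmp only for reference), the pod spec could mount a ConfigMap item over /etc/rc.local:

```shell
# Hypothetical pod-spec fragment: mounts one item of an "rc-scripts"
# ConfigMap onto the path the container runs at boot.
cat <<'EOF' > /tmp/rc-scripts-fragment.yaml
volumes:
  - name: rc-scripts
    configMap:
      name: rc-scripts
      defaultMode: 0755
containers:
  - name: system
    volumeMounts:
      - name: rc-scripts
        mountPath: /etc/rc.local
        subPath: rc.local
EOF
```

The subPath mount overlays just the one file, and defaultMode 0755 keeps the script executable, which systemd’s rc.local compatibility requires.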
The user’s customization runs as the administrative user, and the presence of the /ready file signals to Kubernetes that the container is fully configured. The last piece is /etc/rc.user. I’ve wrestled with what the right thing for that script to do is, and I decided on a two-step process:
Check out a git repository and run a command in it.
Run some other arbitrary command.
These are specified as variables in /etc/rc.env. You can see the full contents of the files in the GitHub repository linked at the top of the post.
Conclusion
Now that we have a startup process we can run our operating system as a Kubernetes container. There are a few things here that the denizens of Kubernetes would frown upon. It’s possible that the arbitrary configuration command runs a script in the user’s home directory, which would make container startup non-repeatable because it depends on stored volume state. Also, depending on what you do, startup may be a bit slow. But with this we have a working system.
In the next post I’ll talk about privilege and how it affects my containers.