Image Credit: Lisa Hornung/TechRepublic

Cloud Native: Thinking About Data

Wanna see the code? It’s in this GitHub repository.

The transition from IaaS to Cloud Native computing means rethinking our data. Categorizing data is the first step in conceptualizing how applications should run and, for oldsters like me, there’s a need to rethink where things belong. To understand data you need to ask some key questions:

  1. Who owns the data?

  2. What is the lifecycle of the data? (When is it created, updated and destroyed?)

  3. Is the data secret or sensitive?

  4. Is the data precious?

These considerations, among others, are how you know where to put things. Now, what data is a part of a traditional VM?

  1. The source and binary application code (i.e. the distro and all its packages)

  2. The set of installed packages that are needed

  3. System configuration (e.g. users, hostname, IP address, cron jobs, settings)

  4. System secrets (SSH keys)

  5. Home directories

  6. System runtime data (e.g. files in /tmp and /var/run)

Let’s consider those items in the table below:

| Data | Owner | Lifecycle | Secret | Precious | Place |
|---|---|---|---|---|---|
| Applications | The Distro (Ubuntu) | Regular Updates | no | no | The container image. |
| Installed Packages | Me | Whenever I want | no | no | The container image. |
| System configuration | Me | Applied at startup | no | no | Kubernetes ConfigMaps and Helm values.yaml |
| System secrets | Me | Applied at startup | yes | no | Kubernetes Secrets and Helm values.yaml |
| Home Directories | Individual users | The semester (or longer for my home) | no | yes | Kubernetes PersistentVolume |
| Runtime data | The system | Same as the Pod | no | no | Kubernetes EmptyDir or the container writable layer. |

So far, this is the examination we would do with any Kubernetes conversion. However, there’s a problem.

The Problem With /etc/passwd

UNIX wasn’t designed to be cloud native, and this causes a conflict over /etc/passwd and related files. User accounts are listed in /etc/passwd, and users are system configuration, so they should be specified in a ConfigMap (or a Secret for /etc/shadow). However, installing packages after the pod is deployed may add system users, which modifies /etc/passwd, and a file mounted from a ConfigMap is read-only inside the pod. Therefore, the list of registered users must be considered runtime data.
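
To see how a package install changes the user list, one option is to compare /etc/passwd against a snapshot taken beforehand. The sketch below is only an illustration: `new_users` is a hypothetical helper, and the snapshot path in the usage comment is an assumption rather than anything from the repository.

```shell
#!/usr/bin/env bash
# Sketch: list account names that appear in one passwd file but not in an
# earlier snapshot. new_users is a hypothetical helper, not from the post.
new_users() {
    local before="$1" after="$2"
    awk -F: '
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # names in the snapshot
        !($1 in seen)       { print $1 }            # names added since
    ' "${before}" "${after}"
}

# Example (run as root inside the container; <package> is a placeholder):
#   cp /etc/passwd /tmp/passwd.before
#   apt-get install -y <package>
#   new_users /tmp/passwd.before /etc/passwd
```

Any name it prints is an account that a static, read-only copy of /etc/passwd would have been missing.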

To resolve the /etc/passwd problem, the user list has to be applied at system start time using standard tools (e.g. Ansible). The base Docker image has no registered users, so on boot the /etc/rc.local script does the following:

  1. Apply the desired hostname

  2. Create the administrative user

  3. Configure sudo for the administrative user

  4. Import SSH keys for the administrative user

  5. Run an arbitrary command as the administrative user to further customize the container

System Startup

A decade after Linux system startup moved away from shell scripts, one thing still works: if you put an executable script at /etc/rc.local, systemd still runs it at boot. Hooray! That gives me an easy way to have the container execute the specialization that it needs. I created three ConfigMaps that contain Bash scripts. The first, /etc/rc.env, sets environment variables used during specialization. It looks basically like this:

export DEFAULT_USER=human
export DEFAULT_KEY_IMPORT=gh:you-user
export SET_HOSTNAME=myhost
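
Since these three values drive the rest of the specialization, a script that sources the file might first confirm they are all set. The following is a minimal sketch of that idea; `check_rc_env` is a hypothetical helper and is not part of the scripts in the repository.

```shell
#!/usr/bin/env bash
# Sketch: confirm the variables from /etc/rc.env are set before using them.
# check_rc_env is a hypothetical helper, not part of the original scripts.
check_rc_env() {
    local name value missing=0
    for name in DEFAULT_USER DEFAULT_KEY_IMPORT SET_HOSTNAME; do
        eval "value=\${${name}:-}"   # indirect lookup of a fixed, trusted name
        if [ -z "${value}" ]; then
            echo "rc.env: ${name} is not set" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Called right after the file is sourced, it returns non-zero and names each missing variable on stderr, which is easier to diagnose than a later command failing on an empty argument.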

It’s sourced by /etc/rc.local which looks a bit like this:

#!/usr/bin/bash
set -e
. /etc/rc.env

# System setup
echo "127.0.1.1 ${SET_HOSTNAME}" | tee -a /etc/hosts
echo "${SET_HOSTNAME}" | tee /etc/hostname
hostname "${SET_HOSTNAME}" || true  # Doesn't work in unprivileged containers.

# User setup: create the administrative user with passwordless sudo
useradd -u 1000 -U -G adm -m -s /usr/bin/bash "${DEFAULT_USER}"
echo "${DEFAULT_USER} ALL=(ALL) NOPASSWD:ALL" | tee "/etc/sudoers.d/${DEFAULT_USER}"
chown "${DEFAULT_USER}:${DEFAULT_USER}" "/home/${DEFAULT_USER}"

# User customization: runs as the administrative user; a failure does not block startup
su "${DEFAULT_USER}" -c "/etc/rc.user" || true

# Marker file showing that the container is fully configured
touch /ready
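
The marker file created on the last line is only useful if something checks for it. One way that could work, assuming the pod spec defines an exec-style readiness probe (the probe itself is not shown in this post), is a command that exits zero only once the file exists. `is_ready` is a hypothetical name used for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a probe command: succeeds only when the marker file exists.
# is_ready is a hypothetical helper; the default path matches the
# "touch /ready" line in /etc/rc.local.
is_ready() {
    local marker="${1:-/ready}"
    [ -f "${marker}" ]
}
```

Until /etc/rc.local reaches its final line the check fails, so a probe built on it would keep reporting the container as not ready.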

The user’s customization runs as the administrative user. The presence of the /ready file signals to Kubernetes that the container is fully configured. The last piece is /etc/rc.user. I’ve wrestled with what that script should do, and I decided on a two-step process:

  1. Check out a git repository and run a command in it.

  2. Run some other arbitrary command.

These are specified as variables in /etc/rc.env. You can see the full contents of the files in the GitHub repository linked at the top of the post.
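
The two steps above could be sketched roughly as follows. This is not the script from the repository: the variable names `USER_REPO`, `USER_REPO_COMMAND`, `USER_COMMAND` and `USER_CHECKOUT_DIR` are assumptions made up for this example, and the real names in /etc/rc.env may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of /etc/rc.user. All four variable names below are
# assumptions for illustration; they are expected in the environment.
run_user_customization() {
    local repo="${USER_REPO:-}"
    local repo_cmd="${USER_REPO_COMMAND:-}"
    local extra_cmd="${USER_COMMAND:-}"
    local checkout="${USER_CHECKOUT_DIR:-${HOME}/rc-checkout}"

    # Step 1: check out a git repository and run a command in it.
    if [ -n "${repo}" ]; then
        if [ ! -d "${checkout}/.git" ]; then
            git clone --depth 1 "${repo}" "${checkout}" || return 1
        fi
        if [ -n "${repo_cmd}" ]; then
            ( cd "${checkout}" && sh -c "${repo_cmd}" ) || return 1
        fi
    fi

    # Step 2: run some other arbitrary command.
    if [ -n "${extra_cmd}" ]; then
        sh -c "${extra_cmd}" || return 1
    fi
}
```

A script built this way would end by calling `run_user_customization`. Because /etc/rc.local invokes /etc/rc.user with `|| true`, a failure in either step would not stop the /ready file from being created.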

Conclusion

Now that we have a startup process, we can run our operating system as a Kubernetes container. There are a few things that would be frowned upon by the denizens of Kubernetes. It’s possible that the arbitrary configuration command runs a script in the user’s home directory. This would make container startup non-repeatable because it depends on stored volume state. Also, depending on what you do, startup may be a bit slow. But with this we have a working system.

In the next post I’ll talk about privilege and how it affects my containers.