
Configuration Management and the Golden Image

When operations first became a thing, system administrators stood up servers using a base image from their favourite distribution. Things were done manually. Some administrators created their own distros; others wrote customised shell scripts, meant to be run once and only once, to provision software and settings. This method worked, but it was slow and manual, and the human element introduced defects. Then the request came in to stand up 100 servers the exact same way.

System admins resolved this large request by coming up with a Standard Operating Environment (SOE) image, which would end up becoming the "golden image". This golden image was the source of truth, and we built hundreds, even thousands, of machines this way. All systems were the same, or rather started out the same, but it didn't take long for deviations to occur.

The golden image by itself was a flawed idea. Creating one was difficult and required a lot of work. The work wasn't only technical; there was also a lot of politics involved in deciding what was worthy of inclusion, because we didn't want to add cruft to every machine. The golden image was also only updated every couple of years, so it quickly became outdated, and it still required provisioning scripts. Systems already in production didn't get the configuration updates that had recently been added to the golden image. There had to be a better way, and there was...

Some inventive system administrators who also knew how to program decided that they didn't want to maintain a golden image and deal with all the headaches involved. They opted to start from a lightweight distribution image and build sophisticated remote execution software to manage and maintain each server's configuration. Later on they built configuration management systems on top of this remote execution layer and were finally able to keep each system up to date, regardless of when it was deployed. They were geniuses, and everyone who knew anything quickly rushed to implement remote execution and configuration management. It was the right way to manage servers, until the cloud drifted in.
(SaltStack, Ansible, Puppet, Chef, CFEngine)
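To make the "up to date, regardless of when it was deployed" idea concrete, here is a minimal sketch of state convergence in Python. It is not how any of the tools listed above are implemented. The Server class, the manifest layout, and the converge function are all hypothetical names made up for illustration, and the "servers" are just in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for a real machine."""
    name: str
    packages: set = field(default_factory=set)
    settings: dict = field(default_factory=dict)


def converge(server, manifest):
    """Bring a server in line with the manifest and return the changes made."""
    changes = []
    # Only act on what is missing or different, so repeat runs are no-ops.
    for pkg in sorted(manifest["packages"]):
        if pkg not in server.packages:
            server.packages.add(pkg)
            changes.append(f"install {pkg}")
    for key, value in sorted(manifest["settings"].items()):
        if server.settings.get(key) != value:
            server.settings[key] = value
            changes.append(f"set {key}={value}")
    return changes


# One manifest describes the desired state for every machine.
manifest = {"packages": {"nginx", "ntp"}, "settings": {"timezone": "UTC"}}

# An old server that has drifted, and a freshly booted one.
old = Server("web01", packages={"nginx"}, settings={"timezone": "EST"})
new = Server("web02")

first_old = converge(old, manifest)
first_new = converge(new, manifest)
second_old = converge(old, manifest)  # nothing left to do
```

Both machines end up in the same state no matter where they started, and the second run against web01 makes no changes. That property, rather than the one-shot provisioning script, is what let configuration management keep long-lived servers current.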

In the age of cloud computing, we needed to scale servers up and down in seconds. A complex configuration manifest could take hours to run from start to finish. Configuration management was too slow. Each server needed to download, install, and configure the software stack in real time. Sure, we could spin up multiple machines in parallel, but it was still slow. We started to look back at the golden age of the golden image, when a server was built and booted in moments. How could we pair the speed of the golden image with the flexibility of configuration management?

In the future, could we use configuration management manifests and revision control to document how to build the golden image, and then take a snapshot? Could we overlay multiple layers or datasets of images on top of each other? Perhaps we take a big step back and create golden images with a combination of a highly customizable distribution like Gentoo and configuration management.
(Joyent SmartOS zone datasets, Docker container images, Vagrant box files, AWS AMIs, DigitalOcean snapshots, etc.)
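The overlay question can be pictured with a toy model in Python. This is only a sketch: it treats each layer as a dictionary of file paths to contents, which is an assumption made for illustration and not how any of the tools listed above actually store images. The overlay and image_id functions and the sample layers are hypothetical.

```python
import hashlib
import json


def overlay(*layers):
    """Merge file layers in order; later layers win on conflicting paths."""
    image = {}
    for layer in layers:
        image.update(layer)
    return image


def image_id(image):
    """Derive a stable identifier from the image contents."""
    payload = json.dumps(image, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical layers, each of which could be produced by a manifest
# kept in revision control.
base = {"/etc/hostname": "localhost", "/etc/motd": "welcome"}
web = {"/etc/nginx/nginx.conf": "worker_processes 2;"}
site = {"/etc/motd": "web tier", "/srv/www/index.html": "<h1>hi</h1>"}

# The "golden image" is the snapshot of all layers stacked together.
golden = overlay(base, web, site)
```

Because the identifier is derived from the contents, rebuilding from the same layers yields the same image, while changing any layer yields a new one. The slow configuration work happens once at build time, and every server booted from the snapshot starts fast.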

© Russell Ballestrini.