"What has changed?":
Using Git To Document System Configurations

BayLisa, Mountain View
March 20, 2014

The slides are available at: http://harker.com/Talks/gitfp/index.xhtml

What has changed: Using Git To Document System Configurations by Robert Harker is licensed under a Creative Commons Attribution 3.0 Unported License.
Based on a work at http://harker.com/Talks/gitfp/index.xhtml


Overview

This talk is about using git as a version control system for:

Pet server configurations
Auditing of the cloud cattle herd

The talk is also git 101 for systems administrators

The goal of this talk is to give you some useful tools you can use

Robert Harker:
"The sendmail guy": not for a long time
Interested in OpenStack. Built my own stack: cloud5
Interested in DevOps. Worked in Yahoo Sports as they automated

Pets -vs- Cattle

Cattle servers are multiple servers doing the same thing

If a cattle server dies, you spin up a new one
Automation does everything

Pet servers are the "one off" servers that every network has:
dhcp, dns, smtp relay, yum repo, git repo, monitoring, database

These servers quite often:

Predate configuration automation
Are only partially managed by automation
Are easier to manage "by hand"

Pet Server Configuration Management

How do you manage and document your pet server configuration?

Install notes and changes tracked in bugs or change requests
Readme files in various directories

Most typically the answer is a guilty:

"We don't really track the changes."
"We 'know' what changes we have made"

Is there a better way?

Manage System Configuration With Git

Why git?

Your development environment already uses git
Easy remote storage of changes
Simple rollback to earlier revisions
Rich tools for viewing history: gitk
Useful git branching model

Git commit management model:

Verify local git repo is up to date
Make changes
Check changes into git
Push changes to the github repo server

Note: in this talk github refers to a local private git repository server

Overview Of Using Git On A New Server

To start using git you need to create your base git repository

Immediately after installing the system:

Fingerprint the system:
List software, enabled services, other information
Create a git repository for the OS
Use git to track changes to the system as you make them
Use standard git tools like gitk to view history of changes

Shortcomings:
Git can not track configuration changes stored in databases
Git does not track user and group ownership; of the file permissions it records only the executable bit

Fingerprint The System

System files do not tell the full story of the system configuration

What to track:

Network configuration
Packages installed
Services enabled
Network ports listened to
Hardware/kernel information

Collecting this information is not a git feature, but the output can be saved to files and tracked in git

I have an S3 (Stupid Shell Script) that does this:
http://harker.com/Talks/gitfp/Files/Fingerprint.sh
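
A minimal sketch of what such a script might collect. This is not the Fingerprint.sh linked above: the function name, the default output directory, and every file name except packages.installed.list are assumptions, and the commands assume a CentOS 6 era system. Any tool that is not installed is skipped.

```shell
#!/bin/sh
# Hypothetical fingerprint sketch: write one sorted text file per topic
# into a directory that git tracks.
fingerprint() (
    out=${1:-/etc/fingerprint}      # assumed location, not from the talk
    mkdir -p "$out" || exit 1

    # run NAME CMD [ARGS...]: save the sorted output of CMD in $out/NAME,
    # or do nothing when CMD is not installed.
    run() {
        name=$1; shift
        if command -v "$1" >/dev/null 2>&1; then
            "$@" 2>/dev/null | sort > "$out/$name"
        fi
    }

    run packages.installed.list rpm -qa           # packages installed
    run services.enabled.list   chkconfig --list  # services enabled
    run ports.listening.list    netstat -tuln     # network ports listened to
    run network.addresses.list  ip -o addr show   # network configuration
    run kernel.info             uname -srm        # hardware/kernel information
)
```

Sorting each file keeps later diffs stable when a tool changes its output order.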

Creating The Initial Git Repo

What to track

System configuration files
System specific /var/lib files
Configuration files outside of /etc
/usr/share/wordpress, /usr/share/tomcat, /var/www/html
Site specific application files
System fingerprints

What not to track

Files with critical passwords in them
System binary files
Script/text files managed by a package manager like yum
Non-text database files
System transient files:
Cache files, log files

Create The Initial Git Repo

Set up an ssh based repo server:
1. On your designated git repository server
```
# Make a user git with a locked password
useradd -m -r -p '!!' git
```
2. Add remote users ssh public keys to ~git/.ssh/authorized_keys
3. Secure as you see fit to prohibit these users from ssh'ing into git repository server
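
One possible way to handle step 3, sketched under assumptions: prefix every key with sshd options that disable forwarding and terminals, and give the git user git-shell as its login shell. The helper name restrict_key and the file alice.pub are hypothetical.

```shell
# restrict_key: read ssh public keys on stdin, one per line, and print each
# with sshd options that disable forwarding and terminal allocation.
restrict_key() {
    while IFS= read -r key; do
        [ -n "$key" ] || continue
        printf '%s %s\n' \
            'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty' \
            "$key"
    done
}

# Usage on the repository server, as root (not run here):
#   restrict_key < alice.pub >> ~git/.ssh/authorized_keys
#   usermod -s "$(command -v git-shell)" git   # git commands only, no login
```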

Set yourself up as a git user:

git config --global user.name "First Last"
git config --global user.email "username@example.com"
git config --global core.autocrlf input
git config --global core.safecrlf true
git config --global push.default simple

Verify you can ssh into the git server with your private ssh key

Some useful aliases I use

# GIT aliases
alias gita='git add'
alias gitb='git branch'
alias gitc='git commit'
alias gitd='git diff'
alias gito='git checkout'
alias gitpl='git pull origin'
alias gitpsh='git push origin'
alias gits='git status'
alias gitv='git status > /tmp/git; vi /tmp/git'
alias gitk='gitk --all&'

Initialize local git repository with git init:

cd /
git init
git remote add origin git@github.example.com:reponame.git
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
git symbolic-ref -m "linking HEAD to master" HEAD refs/heads/master

Setup /.gitignore

Simple .gitignore syntax:
Exclude all files matching:
*.gif
Exclude a directory but allow a sub-directory:
usr/*/*
!usr/share/wordpress

Sample .gitignore file:

/bin
/boot
/cgroup
/dev
/etc/selinux/targeted/policy
/etc/selinux/targeted/modules/active
/etc/shadow
/etc/shadow-
/etc/ssh/ssh_host_dsa_key
/etc/ssh/ssh_host_key
/etc/ssh/ssh_host_rsa_key
/home
/lib
[. . .]
/tmp
/usr/*/*
!/usr/share/wordpress
/var/*
!/var/lib
/var/lib/dhclient/dhclient*
/var/lib/logrotate.status
/var/lib/mlocate/mlocate.db
/var/lib/mysql
/var/lib/ntp/drift
/var/lib/random-seed
/var/lib/rpm
/var/lib/yum

I have sample .gitignore files I use:
http://harker.com/Talks/gitfp/Files/slash/.gitignore.centos
You will need to adapt these to your needs
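
Exclude and re-include patterns are easy to get wrong, so it can help to try them in a scratch repository first. The sketch below uses a few patterns from the sample above; the function name and the sample files it creates are assumptions made for illustration.

```shell
# demo_ignore: build a scratch repository that uses patterns from the
# sample .gitignore and print the files git would still add.
demo_ignore() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" && git init -q . || exit 1
    printf '%s\n' '/usr/*/*' '!/usr/share/wordpress' \
                  '/var/*' '!/var/lib' '/var/lib/mysql' > .gitignore
    mkdir -p usr/share/doc usr/share/wordpress \
             var/lib/mysql var/lib/dhcpd var/log
    for f in usr/share/doc/README usr/share/wordpress/wp-config.php \
             var/lib/mysql/ibdata1 var/lib/dhcpd/dhcpd.leases var/log/messages
    do
        echo sample > "$f"
    done
    # Untracked files that survive the ignore rules, one per line.
    git ls-files --others --exclude-standard | sort
    cd / && rm -rf "$repo"
)
```

Running demo_ignore should list only .gitignore, the wordpress file, and the dhcpd lease file: /usr/share/doc and /var/log are excluded by the wildcard patterns, and /var/lib/mysql by its own entry.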

Setup /.gitattributes:
- /.gitattributes allows you to override git's default behavior
- Sample .gitattributes file:
```
/etc/trusted-key.key -crlf -diff
*.xls -crlf -diff
*.qif -crlf -diff
```
Look at what git will add with git status:
- git status will show you what git will add
  Piping this to a file so you can edit it is useful
  See gitv alias
- Tune .gitignore to exclude/include local conventions
Make initial git commit:
- Add all selected system files to local repo:
  git add /
- Commit your changes to local git repo:
  git commit -m 'First centos6.4 minimal install commit'
Create a remote repository on git repository server
- As user git on git repo server:
```
cd ~git
mkdir reponame.git
cd reponame.git
git init --bare
```
  That's all
Push this repo to the github server:
- git push origin master
- This will populate the bare git repo on the remote server
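
Before the initial commit, it can be useful to see which candidate files are largest, since those are often binaries, caches, or logs that .gitignore still lets through. A small sketch, assuming GNU xargs (for -r) and a hypothetical helper name:

```shell
# largest_candidates [N]: list the N largest untracked, non-ignored files
# in the current repository, with their size in KiB.
largest_candidates() {
    git ls-files --others --exclude-standard -z |
        xargs -0 -r du -k 2>/dev/null |
        sort -rn |
        head -n "${1:-20}"
}
```

Anything surprising in the output is a candidate for another .gitignore line.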


The Git update/commit model

I use git to track changes in the system as changes are made

I recommend individual changes be wrapped in a git commit cycle:

Sync local repo
Make your changes
Git commit your changes

Advantages:

Simple to see the files that changed
Commit can be commented with a bug or change control number
Commit history can be viewed with gitk
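
A sketch of a commit step that puts the tracking number at the start of the message. The function name and the ticket ID CHG-1234 are hypothetical; run it from the repository root.

```shell
# commit_change TICKET SUMMARY: stage every change below the current
# directory and commit it with the tracking number in the message.
commit_change() {
    git add -A . &&
    git commit -q -m "$1: $2"
}

# Later, find every commit made for one ticket:
#   git log --oneline --grep='^CHG-1234:'
```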

Before you start a change/update cycle:

Make sure all local changes are committed:
- git status
- git commit
Make sure your local repo is in sync with the master:
- git pull origin [branch]
- You may need to git merge conflicting changes
Any local commits have been pushed to the master:
- git push origin [branch]
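
These checks could be wrapped in one function that refuses to proceed until the tree is clean and in sync. A minimal sketch; the function name is an assumption, and the remote is assumed to be called origin as elsewhere in this talk.

```shell
# ready_for_change [BRANCH]: succeed only if nothing is uncommitted and the
# local branch points at the same commit as origin after a fetch.
ready_for_change() (
    branch=${1:-master}
    if [ -n "$(git status --porcelain)" ]; then
        echo "uncommitted changes; commit them first" >&2
        exit 1
    fi
    git fetch -q origin || exit 1
    local_rev=$(git rev-parse "$branch") || exit 1
    remote_rev=$(git rev-parse "origin/$branch") || exit 1
    if [ "$local_rev" != "$remote_rev" ]; then
        echo "$branch differs from origin/$branch; pull or push first" >&2
        exit 1
    fi
)
```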


Tracking Package Initial Installs

When new packages are installed new configuration files may be created
Capturing their default configuration is useful for upgrades

Files that change due to normal system operation may be added
Add these files to your /.gitignore file

System configuration may change:

New services started
New ports opened
Firewall rules changed

Make running your fingerprinting tool part of your commit cycle

Tracking Changes With Git Topic Branches

Topic branches are a standard method used in many open source projects

The bug fix branch model makes a separate branch for each fix:

Sync repo
Create topic branch you will do your work in:
git checkout -b topic
topic can be: bug number, change ID, any other tracking info
Do a normal git commit cycle
Merge topic branch changes into master branch
Push topic branch to github for tracking

Topic branches are not permanent
Periodically purge old topic branches by deleting them

The commits from merged topic branches are still available in git history.
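
The topic branch life cycle can be sketched as three small functions. The names are hypothetical, and the use of --no-ff is a choice made here rather than something the talk prescribes.

```shell
# start_topic NAME: branch off an up-to-date master.
start_topic() {
    git checkout -q master &&
    git pull -q origin master &&
    git checkout -q -b "$1"
}

# finish_topic NAME: merge the topic into master and push both branches.
# --no-ff keeps a merge commit, so the topic stays visible in history
# after the branch name is deleted.
finish_topic() {
    git checkout -q master &&
    git merge -q --no-ff -m "Merge topic $1" "$1" &&
    git push -q origin master "$1"
}

# purge_topic NAME: delete a merged topic branch locally and on the server.
purge_topic() {
    git branch -q -d "$1" &&
    git push -q origin --delete "$1"
}
```

A typical sequence would be start_topic bug42, a normal commit cycle, finish_topic bug42, and some time later purge_topic bug42.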

Tracking Changes Across Multiple Hosts Using Git "Host" Branches

Many sites use a hybrid pet model:

A configuration management system is used to configure site wide defaults

Configuration files specific to the local service are edited by hand

Git branches can be used to track differences between these servers

Installing a git host branch on a server:

Clone the git repo.
- Note: this will create a new directory
- git clone git@github:reponame.git
- cd reponame
- mv .git* /
- cd ..; rm -rf reponame
- cd /
Create the host branch by checking it out:
git checkout -b host_branch_name
Do a git commit cycle to check in initial differences:
Host specific hardware/system differences
Push local branch to github server
git push origin host_branch_name
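
An alternative to the clone-and-move steps above, sketched under assumptions: clone without checking out any files, move only the .git directory into place, and rebuild the index from HEAD so that existing files are never overwritten. The function name and its arguments are hypothetical.

```shell
# adopt_repo URL ROOT BRANCH: attach ROOT to the repository at URL without
# overwriting any file already in ROOT, then create the host branch.
adopt_repo() (
    url=$1; root=$2; branch=$3
    tmp=$(mktemp -d) || exit 1
    # --no-checkout fetches history only; no working files are written.
    git clone -q --no-checkout --branch master "$url" "$tmp/clone" || exit 1
    mv "$tmp/clone/.git" "$root/.git" || exit 1
    rm -rf "$tmp"
    cd "$root" || exit 1
    git reset -q                    # fill the index from HEAD, leave files alone
    # Restore the ignore rules if the host does not have them yet.
    [ -e .gitignore ] || git checkout -q -- .gitignore 2>/dev/null || true
    git checkout -q -b "$branch"    # local differences carry over to the branch
    git status --short              # host-specific differences
)
```

For a real server the call would look like adopt_repo git@github:reponame.git / host_branch_name, followed by the commit cycle and push described above.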

Tracking Changes Across Multiple Hosts Using Git "Host" Branches (cont.)

Configuration management changes:
These changes are pushed to all servers

Git updates:
Update master branch

Merge changes with each host branch

Conflicts indicate clashes between automatic and manual changes

Merging configuration management changes with the host branches probably can be automated
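
A sketch of what that automation might look like, assuming every host branch name starts with host_ (a naming convention chosen here for illustration). Branches that conflict are left untouched and reported for a manual merge.

```shell
# merge_master_into_hosts: merge master into every local branch whose name
# starts with host_ and report the ones that need a manual merge.
merge_master_into_hosts() (
    failed=""
    for b in $(git for-each-ref --format='%(refname:short)' 'refs/heads/host_*')
    do
        git checkout -q "$b" || exit 1
        if ! git merge -q --no-edit master >/dev/null 2>&1; then
            git merge --abort 2>/dev/null   # undo the half-finished merge
            failed="$failed $b"
        fi
    done
    git checkout -q master
    [ -z "$failed" ] || { echo "manual merge needed:$failed" >&2; exit 1; }
)
```

Each reported branch marks a clash between an automated change and a manual one.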

Figuring Out Changes On An Existing Server Using Git

Problem: you have been asked to upgrade a critical server
Nobody recalls exactly how it was configured
"What was changed?"

Create a generic git repo for the server:

Find original install configuration
anaconda-ks.cfg can be a useful starting point
Do a fresh install on a spare/virtual server using this config
Fingerprint the system; fingerprint the system to be audited
git diff packages.installed.list to see what packages are missing
On generic system install missing packages
Install a /.gitignore file to exclude non-interesting directories
Commit all remaining files
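
The package comparison step can also be done with comm instead of git diff, which yields a plain list of names that can be fed to the package manager. A sketch; the function name and the directory layout in the usage comment are assumptions.

```shell
# missing_packages AUDITED GENERIC: print packages listed in the AUDITED
# fingerprint but absent from the GENERIC one, one name per line.
missing_packages() (
    tmp=$(mktemp -d) || exit 1
    sort -u "$1" > "$tmp/audited"
    sort -u "$2" > "$tmp/generic"
    comm -23 "$tmp/audited" "$tmp/generic"   # lines unique to the first file
    rm -rf "$tmp"
)

# Possible use on the generic host (package names only, versions stripped):
#   rpm -qa --qf '%{NAME}\n' > packages.installed.list
#   missing_packages audited/packages.installed.list packages.installed.list |
#       xargs -r yum -y install
```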

Figuring Out Changes On An Existing Server Using Git (cont.)

Use git to find and review differences

Copy /.git and .gitignore to the server to be audited
Fingerprint the system to be audited
git status will show all the changes:
Files that have been added/deleted
Files that have been changed
git diff file will show changes between original and current config files
Diffing the fingerprint files will show hardware/system config changes
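
On a long-lived server the git status output can be large, so a summary count is a useful first look. A minimal sketch with a hypothetical function name, run from the repository root on the audited server:

```shell
# audit_summary: count changed, deleted, and added files on the audited
# host relative to the generic install recorded in HEAD.
audit_summary() {
    git status --porcelain --untracked-files=all | awk '
        { code = substr($0, 1, 2) }
        code == "??" { added++;   next }
        code ~ /D/   { deleted++; next }
        code ~ /M/   { changed++ }
        END { printf "changed=%d deleted=%d added=%d\n", changed, deleted, added }'
}
```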

Build new server adding missing software packages

Merge changes by hand

Using Git To Track Changes In Your Herds

In a cattle herd we do not need to track changes because:

"They are the same across all the instances"
"The build automation system manages the configuration"

The view is that each time I deploy a release I create a new "gold" image

The truth is that while each image is new, its overall configuration is mostly static

Tracking system changes in your herds can be problematical:

What made the change?
New package install/upgrade?
Puppet change?
Which puppet module?

Using Git To Track Changes In Your Herds (cont.)

From a production/operations point of view:
It is much simpler to look at cumulative changes to the configuration files.

Each production push gets a git commit of changes labeled with the push ID

The push ID becomes the link between:

The package oriented view of development
The host or server-type view of operations

All of this can be automated

gitk is a really cool way to view configuration changes through time
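
A sketch of the per-push commit, assuming it runs from the repository root on the host after each production push. The function name, the tag naming scheme, and the push IDs in the usage comment are hypothetical; tagging is an addition made here so two pushes can be compared by name.

```shell
# record_push PUSH_ID: commit the current configuration under the push ID
# and tag it. --allow-empty records a push even when no file changed.
record_push() {
    git add -A . &&
    git commit -q --allow-empty -m "push $1" &&
    git tag "push-$1"
}

# Cumulative configuration change between two pushes:
#   git diff --stat push-101 push-102
```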

Integrating OS package updates into your build process

"Dependency hell"

It is hard to keep all of your systems synced with the upstream OS:

Fear that an upstream change will break your code
The version in use is behind upstream, and upgrading would require a major rewrite

How do you pick and choose?

Integrating OS package updates into your build process (cont.)

Solution: make patching the OS part of the DevOps cycle:

When the main code is branched update your private OS repos

Benefits:

A small number of updates are included in each release cycle
Code changes should be minimal
A full development cycle is available for testing

Exceptions:

Updates requiring significant code changes or other problems
Kernel updates for non-instance hosts, which require a reboot

The release engineer / branch manager is responsible for updating OS repos

All devel Linux boxes should also update
