clone

Source: https://github.com/plockaby/clone

What is the clone tool?

This is a pretty simple program that assembles multiple directory trees onto remote servers using rsync. It makes it easy to build something once on a build server and then deploy it to multiple servers. For example, it can be used to control the contents of /usr/local or /srv or even /etc. This makes it easy to keep configuration files in one location, deploy them to lots of servers, and then verify that they haven’t changed on the remote servers since you last deployed them.

For example, say you want to compile Perl once and just deploy it to your servers, but man are RPMs and debs annoying. So you compile it on a build host, tar it up, then plop it into /clone/sources/software/perl-5.20.3. Alongside that you have /clone/sources/software/perl-support which has things like symlinks for /usr/local/bin/perl so that if you build a new version of Perl you don’t need to remember to recreate all the symlinks and generic configuration files. So you write a configuration file for clone that looks like this:

PERL = {
    software/perl-support
    PERL-5.20.3 = {
        software/perl-5.20.3
    }
}

And then you add to the same configuration file a list of your hosts, like this:

MYHOST = {
    $PERL-5.20.3
}

_HOSTS_ = {
    myhost/r~ [debian9 @myhost.example.org] = $MYHOST
}

Now, using this tool, you can deploy Perl to any number of hosts and it will be configured exactly the same way. If someone on one of your hosts goes around messing with Perl this tool will revert those changes the next time it is run. In this way you can guarantee what you’re deploying and you can validate that your host matches what you expect it to be.

Finally, after putting all of the software on your host, you can make it trigger actions on the remote host based on which files changed. Did you deploy a new configuration for Apache? Have the host automatically restart Apache. And so on.

This is a system created by the University of Washington in the 1990s and is still in use to this day, in 2019, to manage the software installed on hundreds of hosts. (The original version used rdist instead of rsync. I do not wish rdist on my worst enemy.) The version currently running at the University of Washington was written by me in 2014 based on the original version. The version you see here is a slightly modified version of what runs at the University but does not stray very far from it.

What pieces make up this tool?

There are three commands:

  • clone
    This tool is the center of the system. It assembles the directory trees into host representations, shows you a verification log, and then actually deploys the software with rsync.
  • findsrc
    It’s often useful to know which directory tree is the source for a file on a host. This tool will tell you that.
  • deadsrc
    It’s often useful to know if there are directory trees that are no longer in use. This tool will tell you that.

This document will only focus on the first command.

Configuring Clone

There are two files that control the operation of the clone system. One is called config.ini and the other is called hosts. The former controls the configuration of the system and the latter the configuration of the hosts that will receive the directory trees.

There are no required options in the config.ini file as all the options have sensible defaults. This is a pretty standard configuration file:

user = ref
home = /usr/local/ref

[runner]
ssh = /usr/bin/ssh
rsync = /usr/local/bin/rsync
sudo = /usr/bin/sudo

[paths]
sources = /clone/sources
builds = /clone/builds
logs = /clone/logs
tools = /clone/tools

But if you don’t like the defaults then these are the options that you can change:

  • user
    The clone system must connect to remote hosts over SSH. This is the user to use to make the connection. This user should probably not be root. By default the user is called ref.
  • home
    This is the path to the unprivileged user’s home directory on the remote systems. The default is /usr/local/<user> where <user> is whatever is defined in the user option.
  • key
    This is the path to the SSH key that the unprivileged user will use to log in to remote systems. The default is <home>/.ssh/id_rsa.
  • ssh
    This is the path to the SSH command. The default is /usr/bin/ssh.
  • rsync
    This is the path to the rsync command. This applies to both the source and target systems so hopefully it’s in the same place on all of your systems. The default is /usr/bin/rsync.
  • sudo
    This is the path to the sudo command. This applies to both the source and target systems so hopefully it’s in the same place on all of your systems. The default is /usr/bin/sudo.
  • sources
    This is the path to the source trees. The default is to look in the sources directory found in $FindBin::RealBin/../.
  • builds
    This is the path where all of the trees will be assembled for each host. The default is to use the builds directory found in $FindBin::RealBin/../.
  • logs
    This is the path where log files will be stored on each run. The default is to write logs to the logs directory found in $FindBin::RealBin/../.
  • tools
    This is the path to tools that need to be deployed to each host. Anything found in the path defined here will be deployed to <home>/tools. The default is to use the tools directory found in $FindBin::RealBin/../.

The hosts configuration file needs to exist. This configuration file defines what the directory trees are and which directory trees go to which hosts. It is full of variables defining those trees and each tree can reference other trees to build a composite. Variables are expanded lazily so order does not matter in this file. There is only one required variable: _HOSTS_. This variable contains a definition of all of the hosts and which directory trees they will receive. For example:

_HOSTS_ = {
    foo/r [debian8 @foo.example.com] = $FOO_DEBIAN8_NS
    bar/r [debian7 @bar.example.com] = $BAR_DEBIAN7_NS
}

The format of each line in the hosts definition can be read:

name/flags [platform @fqdn] = $VAR

There are several parts to this definition:

  • name
    This is the name that will be used to refer to the host. This is used when calling clone, like this: clone name build.
  • flags
    These are flags for the host. The list of flags may be empty. Two flags in particular affect how hosts are updated (an example combining them appears after this list):
    • X
      If this flag is set then the host is disabled and neither a verify nor an update is allowed without the use of the --force flag.
    • ~
      Normally this program will effectively take control of the entire file system for a host and any file that is neither excepted nor controlled by clone will be removed from the host. If this flag is set then only an “overlay” will be done and no files will be removed from the remote host by default. If you set this flag but want to selectively choose certain directories that are controlled entirely by clone then you can use the directory option in a filter for the host. You almost certainly want to use this flag.
  • platform
    This is the type of platform on which the host runs. This can be any arbitrary string but it is used to automatically load a source tree based on the platform and host name. For example, if the platform is debian8 and the hostname is foo then the source tree debian8/foo would automatically be added to the list of source trees.
  • fqdn
    This is the fully qualified domain name of the host. This will be looked up in DNS during compilation of the hosts configuration file so it had better resolve to something or your hosts file will not compile correctly.
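
For example, a hypothetical host entry in _HOSTS_ that is both disabled and only overlaid might look like this (the name, platform, fqdn, and variable are all made up for illustration):

baz/X~ [debian9 @baz.example.com] = $BAZ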

Finally, $VAR is the variable that defines the sources that will comprise the host. To create variables to build a host one can start simply with something like:

FOO_DEBIAN7_NS = foo/bar

A host that uses this variable will receive files from the directory tree found in foo/bar. To create a more complex definition one can include multiple values in a variable with something like this:

FOO_DEBIAN7_NS = {
    foo/bar
    common/asdf
    common/test
}

This would combine all files from each foo/bar, common/asdf, and common/test. One could also include variables instead. For example, the following is functionally equivalent to the previous example:

FOO = foo/bar
FOO_DEBIAN7_NS = {
    $FOO
    common/asdf
    common/test
}

As a shortcut, a parenthetical can be added after the variable name and whatever is in the parenthetical will be appended to any listed value that ends with a slash. Again, this example would be functionally equivalent to the two previous examples:

FOO = foo/bar
FOO_DEBIAN7_NS (asdf) = {
    $FOO
    common/
    common/test
}

Multiple values can be provided in the parenthetical as well, and each one is appended separately to produce a separate source. The next example would be almost functionally equivalent to the three previous examples:

FOO = foo/bar
FOO_DEBIAN7_NS (asdf test) = {
    $FOO
    common/
}

However, while this example will search for common/asdf and common/test, $FOO will not be expanded to include either asdf or test; it remains foo/bar.

Variables can also be embedded in other variables. For example:

FOO = {
    common/asdf
    BAR = {
        common/test
    }
    BAZ = common/fdsa
}

The previous example would be functionally equivalent to:

FOO = {
    common/asdf
}

BAR = {
    $FOO
    common/test
}

BAZ = {
    $FOO
    common/fdsa
}

Finally, the parenthetical directory expansion also applies to embedded variables. For example:

FOO (bar) = {
    common/
    FOO_BAZ = baz/
    FOO_BAT = bat/
}

The previous example would be functionally equivalent to:

FOO = {
    common/bar
}

FOO_BAZ = {
    common/bar
    baz/bar
}

FOO_BAT = {
    common/bar
    bat/bar
}

Filter Files

The default configuration is to copy all files to the remote host and remove any files that aren’t controlled by clone. You can configure a host such that files that aren’t controlled by clone aren’t removed by setting the ~ flag in the hosts configuration file as described above. You can also control which files are managed by clone more finely by putting filter files into the root of directory trees.

The filter file should be named with the format filter.foo where foo can be anything as long as it is unique among all source directories that make up a host. An example file name might be /clone/sources/common/asdf/filter.foo.

There are five sections that can be used in a filter file.

  • =directory
    This defines directory paths that will be kept synchronized on the host. That means that when a host is updated, anything under the directory that is not found in the host’s sources on the master will be removed from the host. This section means nothing unless the host has the ~ flag set because the default configuration for clone is such that every directory will be kept fully synchronized.
  • =except
    This defines things to exclude from being synced on both the source and the destination. That means that paths in this section won’t be removed from the destination and they won’t be sent from the master to the host. The pattern should match the acceptable formats for rsync. There is no guaranteed order to how these will be put into the filters.
  • =perishable
    This defines things that are excluded from removal, except that they will be removed if they are the last thing in a directory and are preventing the directory from being removed. The same rules that apply to =except apply here: the pattern should match the acceptable formats for rsync and there is no guaranteed order to how these will be put into the filters. (See the example after this list.)
  • =command keyword
    For all of the paths under this section, if any of them change, the script named keyword will be run at the end of the update. An environment variable named FILES will be passed to the script containing a list of all files that triggered the script, delimited with a colon. The actual programs live in the scripts directory under the tools path (for example, /clone/tools/scripts) and everything under the tools path will be deployed to all remote hosts under <home>/tools. As an example, if this were configured in your filter file:
    =command foobar
    /foo/bar/.*
    Then anything under the path /foo/bar/ on the remote host that changes will cause the program named <home>/tools/scripts/foobar to be called when updates are finished.
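
As a hypothetical illustration of the =perishable section described above (the pattern here is made up), an entry like this would protect stray placeholder files from deletion while still allowing an otherwise emptied directory to be removed:

=perishable
    .*/.keep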

An example filter file might look like this:

=directory
    /srv

=except
    /srv/foobar
    .*/.toprc

=command bind
    /etc/bind/.*

The above example will synchronize all files in /srv and remove files on the host that weren’t found in the configured sources, except /srv/foobar and any .toprc files, which will be left in place if found. Additionally, if any file is matched by the regular expression /etc/bind/.*, such as /etc/bind/named.conf.local or /etc/bind/named.conf.options for example, then the program called bind found in <home>/tools/scripts will be run. You might make the bind script restart named, for example, to load the new configurations.
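
As a sketch of what such a trigger script might look like (the script body below is an assumption for illustration; only the script location and the colon-delimited FILES environment variable are described above, and the restart command will depend on your platform):

#!/bin/sh
# hypothetical <home>/tools/scripts/bind trigger script
# FILES contains the paths that changed, delimited with colons
echo "bind configuration changed:"
echo "${FILES}" | tr ':' '\n'
# restart named however your platform does it, for example:
systemctl restart bind9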

Installation

Installation should be done as root. If it’s not then the clone program will not be able to do things like change file ownership. Thus, all the below commands should be run as root.

It is assumed that your hosts restrict SSH connections from the root user. Based on that assumption, a user needs to be created on all systems to initiate the SSH connection over which rsync will run. If this is not a valid assumption and you are able to use the root user to ssh into remote hosts using an SSH key then you can skip this section.

The unprivileged user should be the same on all systems and should have the same user id and group id on all systems. For example:

addgroup --gid 123 --system ref
adduser --uid 123 --system --home /usr/local/ref --disabled-password --shell /bin/bash --ingroup ref ref

The user needs to be able to run rsync on the remote host through sudo. This is complicated by the way rsync works when it calls itself on the remote host. To work around this we are going to set an effective alias for rsync to force it to run through sudo. To that end, set the user’s /usr/local/ref/.bashrc file like this:

# force rsync, when invoked over SSH, to run through sudo
rsync() (
    /usr/bin/sudo /usr/bin/rsync "$@"
)

The user should be allowed to use sudo to run rsync and the post-processing program. Allow this by creating /etc/sudoers.d/ref and putting this in it:

Cmnd_Alias REF=/usr/bin/rsync, /usr/local/bin/rsync, /usr/local/ref/tools/process-updates
ref ALL=NOPASSWD:REF

Ensure some file permissions are updated:

chown ref:ref /usr/local/ref/.bashrc
chmod 0440 /etc/sudoers.d/ref

Once you have a server from which to run clone and a user set up to SSH into your remote systems, you can install clone by simply putting it anywhere on your file system, exactly as it exists in the source repository. There is no installation process. Just clone the source git repository and start running the system. A nice place to put clone, though, might be /clone.
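
For example, assuming you want clone to live at /clone:

git clone https://github.com/plockaby/clone /clone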

Once you’ve installed the clone system you can start writing tools in the /clone/tools directory. Those will be deployed to all remote hosts. Make sure that those files are executable or they won’t work.

After you’ve finished installation and writing some tools scripts, create the files /clone/conf/hosts and /clone/conf/config.ini. Once those are created you can run this command:

clone myhost verify -c

This will spit out what would change on the remote system. Change verify to update to actually apply the changes, as shown below.
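
For example, assuming the same hypothetical host name as above:

clone myhost update -c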

Getting Started

Once you’ve got a config.ini configuration file and a relatively empty hosts configuration file, it’s time to start building directory trees. Assuming that you’ve installed clone into /clone and kept the configuration files roughly the same as shown in the examples, you can start putting directory trees under /clone/sources.

Directory trees have two components: a group and a subgroup. A group starts like this:

/clone/sources/group

And then a subgroup is like this:

/clone/sources/group/subgroup

Under the subgroup is where you start putting files. So this file:

/clone/sources/group/subgroup/myfile.txt

Will be deployed to a host at this path:

/myfile.txt

And this file:

/clone/sources/group/subgroup/etc/myfile.txt

Will be deployed to a host at this path:

/etc/myfile.txt

The same file can’t exist in two directory trees that go to the same host without causing a conflict that will prevent the host from being updated.
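
For example, these two hypothetical sources would conflict if both were assigned to the same host, because each of them provides /etc/myfile.txt:

/clone/sources/groupa/subgroup/etc/myfile.txt
/clone/sources/groupb/subgroup/etc/myfile.txt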

Now you know how to start building directory trees and how to configure those trees to go to different hosts. The clone system is very handy for deploying lots of files to a remote system and writing rudimentary post processing scripts. It’s also great for validating a host’s file system against changes made by bad developers or thieves who have broken into your system.

It’s worth noting that this system, while handy, has drawbacks. While files are being synchronized it’s possible for a host to be in an inconsistent state where some files have been copied but others have not yet been copied. That is the biggest drawback of this system. So be aware that that problem exists. If you do not like that problem then be aware that neither do I. This is just what we’re using at the University of Washington until we get our act together with Docker containers to guarantee atomic deployments, better testing before deployment, and consistent development and production environments. Until then, this is what we have.