isconf(8)
ISconf 4.2.8.250
10/14/2014
NAME
isconf - infrastructure build and configuration manager
SYNOPSIS
isconf [-Dhrq] [-c config] [-m message] verb [verb_args] ...
QUICK START
First, follow the short installation instructions in the INSTALL file
that came with this package. It's best to do this on whatever you're
using as a golden master image, then deploy that image to all of your
machines. If you're only setting up a few machines and have no image
server, then you might be able to get away with installing each one
manually from the vendor CD, provided you carefully install them all
the same way.
Later, to install the latest version of package 'foo' on ten thousand
hosts, including any hosts that are currently down or not yet built,
you can log into any host and say this:
cd /tmp
wget http://example.com/foo-1.2.tar.gz
isconf start
isconf lock just a comment about installing foo
isconf snap foo-1.2.tar.gz
isconf exec tar -xzvf foo-1.2.tar.gz
isconf exec make -C foo-1.2 install
isconf exec rm -rf foo-1.2.tar.gz foo-1.2
isconf ci
...then, on the other 9,999 hosts, run this during boot or from cron:
isconf start
isconf up
If you're only managing a few machines, you can probably get away with
not starting the isconf daemon at boot -- as above, just start it
manually when you need it by saying 'isconf start'. If you're turning
these machines over to someone else for long-term support, don't want
to teach them isconf, and expect them to manually make a mess anyway,
then it makes sense to leave the daemon off when you're done. See
BUGS/RESTRICTIONS for security reasons to leave it off as well.
DESCRIPTION
See the GLOSSARY below for terms and concepts.
ISconf can be thought of as a cross between sudo(8) and a
distributed version control tool like Git or BitKeeper. Changes you
make via ISconf are journaled and added to a distributed repository,
queuing them for execution on other target machines. Those other
target machines do not need to be running, or even be built, at the
time you check in changes. As you turn on, build, reboot, and/or run
'isconf up' on other machines, ISconf consults the journal and
executes the same changes, in the same order, on each machine.
The ISconf architecture is completely peer-to-peer; there are no
central servers or other single points of failure, and it is designed
for use in partially-partitioned networks such as DMZ environments.
The command-line client talks to a daemon which runs on each
machine. The daemon, usually started at boot, handles distributed
file storage, locking, and network communications.
ISconf is not intended for use in environments where you want to make
manual, ad-hoc, or other out-of-band changes to machines. If you
don't have the will to rebuild all of your machines from scratch so
you know what's on their disks, don't care about disaster recovery,
don't need to keep any of your machines in lock-step with each other,
don't need to test O/S changes before deploying them to production,
aren't as interested in O/S patch management, or still want to log in
as root on target machines and make arbitrary untracked changes, then
you don't want this package.
BACKGROUND
One hundred years ago, automobiles were built by hand. Each vehicle
was unique, composed of parts which were often crafted on the spot.
Repairs were expensive and frequent, owners needed to be mechanics,
and fleetwide engineering changes were non-existent.
Then came mass production. Today, a single automotive assembly line
produces vehicles of varying colors and options, all built from the
same basic design and tooling. Replacement parts are interchangeable;
technicians bolt engineering changes onto existing vehicles with a
reasonable expectation that the parts will fit. Economies of scale
have led to highly optimized designs, performance, and usability.
Drivers turn the key and go.
Most IT departments are nowhere near that sort of capability; they
still install and maintain operating systems and applications by hand.
Each machine is unique, reliability is elusive, users become
technicians, fixes often require re-engineering, most outages are
caused by other fixes, and infrastructure-wide changes are fraught
with peril if they are possible at all. Even basic security patches
are, as a rule, applied sporadically.
ISconf provides some of the standardized tooling needed for
deterministic, reproducible management of UNIX machines -- the kind of
reproducibility you can count on for consistency, disaster recovery,
reliability, security, and auditability. ISconf manages hosts over
their entire lifecycle following initial install, allowing you to
continue to test and deploy both major and minor changes well after
the target hosts have been placed into production service. With this
tool you can safely replace kernels and bootloaders, install new
patches, packages, and tarballs, run arbitrary commands, and even
re-install the entire operating system under program control, and do
it all in a way that can be consistently reproduced on other current
or future machines.
Over the last decade, users of earlier versions of ISconf have found
that this consistency gives systems administrators enough breathing
room to "get ahead of the ticket curve", reclaim more of their nights
and weekends, and to, finally, begin to do more engineering and less
firefighting.
PREREQUISITES
To do deterministic and repeatable host management, there are some
things you need to do in addition to just installing and using ISconf.
Above all, you need to maintain a reasonable level of control over the
root-owned bits which you place on your disks, both during initial
install as well as throughout their lifetime.
Automated systems administration is all about making self-modifying
code behave consistently. If you don't start from a known state and
keep it that way, then you can make no assertions about how your
machines will behave in comparison with each other -- a change which
works on one host may not work on others. Once you've destroyed this
consistency, you can no longer count on QA, disaster recovery, load
balancers, distributed applications, HA clusters, new deployments, or
even single machine rebuilds to work correctly.
- If two or more hosts are supposed to act the same, then you need to
install them from the same disk image. This applies to rebuilds of a
single host as well as multiple installs of identical hosts. See
base image in the glossary.
- Your host install tool needs to be able to capture an image of an
existing machine, save it on an install server, then dump that image
onto subsequent machines verbatim, altering only those things which
are supposed to be unique, such as IP address and hostname. Among
Linux installers, for example, systemimager meets this requirement;
kickstart does not. Under Solaris, you'll need to use Flash Archives,
not Jumpstart. See checkpoint image in the glossary.
- After initial install, you need to manage hosts exclusively with
ISconf -- no manual or other out-of-band changes. There is one
semi-exception to this rule: You might want to use another tool to
manage environmentally-influenced configuration files. You'll want to
manage the binaries of that tool using ISconf, and take care to ensure
that the external tool manages only those files which it must. See
environmental data in the glossary for more discussion.
FLAGS
Flags appear only after the isconf command name, not after
subcommands (as opposed to e.g. CVS).
- -c config
-
Top-level configuration file. Defaults to /etc/is/main.cf.
- -D
-
Show debug info on stderr.
- -m message
-
Message -- human-readable comment describing the change. Required
only when locking. This flag is deprecated and is likely to be
removed; see the 'lock' verb below for how to provide the message
in a forward-compatible way.
- -r
-
Allow reboot if needed. Used only with the 'up' verb below. Also
see the 'reboot' verb.
Ordinarily you would execute 'isconf -r up' from an rc script,
which is a relatively safe time to allow reboots.
This flag has no effect unless there is a 'reboot' operation
pending in the journal. If there is a 'reboot' pending, then this
flag allows the reboot to take place. You only want to provide
this flag at times when it's safe to reboot the local machine.
Without this flag, if 'isconf up' encounters a 'reboot' operation
during journal replay, the replay stops, an error message is
issued, and subsequent changes are not applied. You'll need to
run 'isconf -r up' to continue past this point -- we cannot assume
that the later changes will work without the reboot.
- -q
-
Quiet -- don't show verbose output.
- -V
-
Version -- show ISconf version.
SUBCOMMANDS
Subcommands are often called 'verbs' in ISconf documentation and
usage. They can be grouped into the following categories:
Changing disk state
lock, unlock, snap, exec, reboot, ci, up
Branch management
fork, migrate
Daemon management
start, stop, restart
The following is a detailed description of all subcommands, in
alphabetical order. In these descriptions, the origin host is the
host where a user executes lock, snap, exec, reboot,
or ci, and the target host is where a user executes
up(date).
- ci
-
Check in local changes, such as snap or exec, and release the
branch lock.
Run on origin.
- exec command args ...
-
Execute an arbitrary command. Causes the command to be executed
immediately on the local machine, and queued for
execution on target machines after ci.
Example:
isconf lock "permanently shut down apache"
isconf exec /etc/rc2.d/S85apache stop
isconf exec rm /etc/rc2.d/S85apache
isconf ci
If you want to embed shell redirects or pipes in the exec
arguments, then you'll need to wrap the arguments in a shell
invocation. For example, this *won't* do what you want -- it will
only change /etc/motd on the origin machine:
isconf exec echo web server down > /etc/motd
Here's what you really want instead:
isconf exec sh -c "echo web server down > /etc/motd"
- fork newbranch
-
Create a new branch from the current branch, and migrate the local
host onto the new branch. The
original branch is the "parent" branch, and the new branch is the
"child" branch.
If host A executes a fork, then A is the only host moved to the
new branch; hosts B and C do not change. If you want B or C to move
to the new branch as well, see migrate.
Low-level implementation: Since a journal describes the details
of a branch, a fork essentially just copies the entire
journal contents from the parent branch into a new journal named
after the child branch, then runs the migrate code path.
- lock message ...
-
Lock the branch. Required before snap, exec, reboot,
or ci, and recommended before fork and migrate. The
message will be recorded in the journal for each subsequent
transaction until the next ci.
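The lock, snap/exec/reboot, ci cycle can be modeled roughly as follows. This is a minimal in-memory sketch, not ISconf's implementation: the Branch class, its method names, and the (op, arg, message) entry format are hypothetical, and the real daemon persists work in progress (journal.wip) and distributes the journal across hosts.

```python
class BranchLockedError(RuntimeError):
    """Raised when a host tries to change a branch it has not locked."""


class Branch:
    """Hypothetical in-memory model of one branch's journal and lock."""

    def __init__(self):
        self.journal = []    # checked-in transactions, replayed by 'isconf up'
        self.wip = []        # uncommitted work, like journal.wip on the locker
        self.holder = None   # host currently holding the lock
        self.message = None  # lock message, recorded with each transaction

    def lock(self, host, message):
        if self.holder is not None:
            raise BranchLockedError(f"branch already locked by {self.holder}")
        self.holder, self.message, self.wip = host, message, []

    def record(self, host, op, arg):
        """Queue a snap, exec, or reboot; only the lock holder may do this."""
        if self.holder != host:
            raise BranchLockedError("lock the branch before snap/exec/reboot")
        self.wip.append((op, arg, self.message))

    def ci(self, host):
        """Append the work in progress to the journal and release the lock."""
        if self.holder != host:
            raise BranchLockedError("only the lock holder can check in")
        self.journal.extend(self.wip)
        self.holder, self.message, self.wip = None, None, []

    def unlock(self):
        """Break the lock, discarding the holder's uncommitted work."""
        self.holder, self.message, self.wip = None, None, []
```

Each queued transaction carries a copy of the lock message, mirroring the rule that the message is recorded for every transaction until the next ci.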
- migrate branchname
-
Migrates the local host onto a new branch. In human language
this means the host is going to change roles.
Switching a host to a new branch is only possible if the new
branch is a child of the host's old branch, and if there have been
no transactions executed on the host since the new branch was
forked off -- in other words, the new branch's journal content
needs to be a contiguous superset of the old branch's journal
content. If these conditions aren't met, migrate will
exit with a non-zero return code.
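The migrate precondition amounts to a prefix test: everything this host has already applied must appear, in order, at the start of the target branch's journal. The sketch below models journals as Python lists of transaction IDs (a hypothetical representation) and shows fork as the journal copy described under fork, followed by the same check. It checks journal contents only; the requirement that the new branch be a child of the host's current branch is not modeled.

```python
def can_migrate(applied, target_journal):
    """True if the target journal starts with exactly the host's applied history.

    applied: transaction IDs this host has executed, in order.
    target_journal: the full journal of the branch the host wants to join.
    """
    applied = list(applied)
    return list(target_journal[: len(applied)]) == applied


def fork(journals, parent, child, applied):
    """Copy the parent's journal to a new child branch, then check migration.

    journals: hypothetical mapping of branch name -> list of transaction IDs.
    Returns the branch name the host should now be on.
    """
    if child in journals:
        raise ValueError(f"branch {child!r} already exists")
    journals[child] = list(journals[parent])  # fork copies the whole journal
    if not can_migrate(applied, journals[child]):
        raise RuntimeError("host has history the child branch lacks")
    return child
```

A host that has run a transaction on the parent after the fork fails the test, because that transaction is not in the child's journal.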
- reboot
-
Reboots the machine. Before reboot, adds a journal entry which
will cause all target machines on this branch to reboot at the same
point in their build. For example, this is what you might do to
install and boot a new kernel:
isconf lock "upgrade to 2.6.20"
isconf snap kernel-2.6.20-1.i686.rpm
isconf exec rpm -ivh kernel-2.6.20-1.i686.rpm
isconf reboot
isconf ci
# on other machines
isconf -r up
Think carefully before using this verb; 'isconf up' (without the -r)
won't finish if there is a 'reboot' pending as the next action in
the journal. You need 'isconf -r up' -- and you don't want to put
that in crontab unless you really don't mind your
machines rebooting at that time. See the -r flag for details.
Never say 'isconf exec reboot' -- that will only reboot the local
machine, and will never create any sort of journal entry; the
reboot kills isconf itself before the journal entry can be made.
Always say 'isconf reboot' instead.
By default, ISconf runs 'shutdown -r now' to cause the reboot. If
you want or need to use a different command, see the IS_REBOOT_CMD
environment variable below.
- restart
-
Restart the daemon. Equivalent to a stop followed by a
start.
- snap filename
-
Snapshot a file for install on target machines. Preserves the current
contents, permissions, and mode bits of the file.
After ci, any target host on the same branch can
run 'isconf up', which will cause ISconf to install the file on
the target host.
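To make "preserves contents, permissions, and mode bits" concrete, here is a minimal sketch that captures a file's bytes and mode with the standard os and stat modules and restores both elsewhere. The snapshot dictionary and the snapshot/install function names are hypothetical; the sketch does not cover ownership, transport between hosts, or how ISconf actually stores snapshots in its cache.

```python
import os
import stat


def snapshot(path):
    """Capture a file's contents and its permission/mode bits."""
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    # S_IMODE keeps the permission bits plus setuid/setgid/sticky.
    return {"path": path, "data": data, "mode": stat.S_IMODE(st.st_mode)}


def install(snap, dest=None):
    """Write a snapshot to dest (default: its original path) with its mode."""
    dest = dest or snap["path"]
    with open(dest, "wb") as f:
        f.write(snap["data"])
    os.chmod(dest, snap["mode"])  # chmod ignores the umask, so the mode is exact
```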
- start
-
- stop
-
Start or stop the daemon.
- unlock
-
Break the lock on the local branch.
Use with great care.
This reverses the effect of a
lock, invalidates the work stored in journal.wip on
the locking machine, and will likely require the person who set
the lock to discard their work and/or rebuild the machine where
the lock was made.
Generally speaking, it's better to pick up the telephone and call
the person who set the lock, asking them politely to finish
whatever they were doing and check it in, rather than use this
subcommand.
- up
-
Update. Causes the isconf daemon to attempt execution of any new
transactions in the
journal. Errors and messages are copied to stderr and stdout of
isconf as well as to syslog. Exits with a non-zero return
code in case of error.
If used with -r, and if a pending reboot entry is
encountered in the journal, then the host will reboot.
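The interaction between 'isconf up', the -r flag, and a pending reboot entry can be sketched as a replay loop. This is an illustration under stated assumptions, not ISconf's code: the journal is modeled as a list of (op, arg) pairs, and execute, save_progress, and reboot are hypothetical callbacks standing in for running a transaction, recording how far this host has gotten, and invoking IS_REBOOT_CMD.

```python
class RebootPending(Exception):
    """Replay reached a 'reboot' entry and reboots are not allowed."""


def replay(journal, done, allow_reboot, execute, save_progress, reboot):
    """Apply journal[done:] in order and return how many entries are now done.

    journal: list of (op, arg) pairs, e.g. ("exec", "make install")
    done: number of leading entries this host has already applied
    allow_reboot: True when run as 'isconf -r up'
    """
    for index in range(done, len(journal)):
        op, arg = journal[index]
        if op == "reboot":
            if not allow_reboot:
                # Later entries may depend on the reboot, so stop here.
                raise RebootPending(f"entry {index}: reboot pending; run 'isconf -r up'")
            save_progress(index + 1)  # record first, or we'd reboot again on every boot
            reboot()
            return index + 1
        execute(op, arg)  # assumed to raise on failure, which halts the replay
        save_progress(index + 1)
    return len(journal)
```

Without -r the loop stops at the reboot entry and leaves later entries unapplied. With -r it marks the reboot as done before rebooting, so the next 'isconf -r up' after boot, for example from an rc script, continues with the following entry.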
ENVIRONMENT
ISconf behavior is controlled predominantly by environment
variables. These can be set and exported before starting or
restarting the isconf daemon, or can be set in configuration
files, usually main.cf. Any
variables set in the environment will be overridden by those set
in the configuration file.
- IS_DOMAIN
-
ISconf domain name -- more or less equivalent to an AFS cell name
or a Kerberos realm name; all of the machines sharing this name
will share in the distributed cache that makes up the ISconf
repository. Normally you'd want all of the machines in a given
legal entity (the same corporation, for instance) to use the
same domain name. This is an arbitrary string, but by convention it
is usually based on the DNS domain name.
Rather than set this in an environment variable, you're better off
populating the /var/is/conf/domain file, below.
See the domain glossary entry.
- IS_HOME
-
The base directory which ISconf uses for data storage. Defaults
to /var/is.
- IS_HMAC_KEYS
-
The name of a file which contains a list of HMAC keys.
See the hmac_keys file below.
- IS_HTTP_PORT
-
The port number which each ISconf HTTP server listens on. Used only for
file fetches between machines, and is likely to be deprecated in a
near-future release. Defaults to port 65028.
- IS_NETS
-
The name of a file which contains a list of broadcast and/or host
addresses which ISconf should advertise file updates to. See the
nets file below. Likely to change in a future release.
- IS_NOBROADCAST
-
Boolean. If set, do not send UDP broadcast packets; only send
UDP point-to-point packets to the addresses listed in the nets
file. Likely to change in a future release.
- IS_PORT
-
The port number which ISconf daemons use to communicate between each
other. Right now this is UDP only, but TCP will be added in
4.2.7, and UDP is likely to be deprecated. Defaults to port 65027.
- IS_REBOOT_CMD
-
The command which ISconf uses to reboot the machine in response to
an 'isconf reboot' request. Defaults to "shutdown -r now".
FILES
- /etc/is/main.cf
-
Top-level configuration file for ISconf. See CONFIGURATION for
details. As of this writing, ISconf does not distribute this file for
you. In earlier versions, we used to simply rsync it from a
central server at the beginning of each execution. In a near-future
version, look for it to be managed by the distributed cache.
- /var/is
-
See IS_HOME above.
- /var/is/conf/domain
-
Single-line file, newline optional, containing only the string
which is to be used for the ISconf domain name. See IS_DOMAIN
above.
- hmac_keys
-
HMAC key list, one key per line. See IS_HMAC_KEYS. If this
file exists and contains properly-formatted keys, then RFC 2104 HMAC
authentication is enabled; wire messages which are not properly
authenticated will be ignored.
The first key in the list is used for generating authentication
codes on all outgoing messages, and is the first key tried when
authenticating inbound messages. If the first key fails to
authenticate an inbound message, and if more than one key is
listed in the file, then the second and subsequent keys are tried,
in order. This mechanism enables you to update the primary key
while preserving backward compatibility with older keys, allowing
for a transition period.
When updating keys, it's a good idea to first add the new key as a
secondary key to the hmac_keys file, and deploy that to all
machines. Once you're sure that all of your machines (and
install images) have the new key, then move the new key up to the
primary position in the file, leaving any old key(s) in the file
as secondaries, then deploy that. Finally, once you're again sure
that all of your machines (and install images) are using the
new primary key, then (and only then) should you think about
retiring any old key(s).
Take care when deploying this file for the first time on hosts
which are already running ISconf; those ISconf daemons which get
it first will refuse to listen to any which don't yet have the
file; this will prevent further deployment if you're using ISconf
to deploy the file. To prevent this from happening, you can
include the special key +ANY+ at the end of the file. If
encountered in the file, this special key disables HMAC
authentication of received messages, but does not prevent
generation of authentication codes on transmitted messages. What
you want to do is deploy the file with one or more real keys
listed in it, followed by the +ANY+ key. The file might look
like this when first deployed:
someauthenticationkey
+ANY+
As you deploy the above file, hosts will begin sending
authenticated messages to each other using the
someauthenticationkey key, but will ignore the authentication
codes they receive. Once you are sure that all of your hosts have
that copy of the file, then deploy the file again, this time with
the +ANY+ key removed. This will cause hosts to begin
checking received authentication codes against
someauthenticationkey, while discarding any messages not
properly authenticated.
For best security, each key should be about 20 bytes long; see RFC
2104. Keys can include any ASCII character except space,
newline, or the pound (hash) (#) sign. Lines beginning with pound
signs are comments. Blank lines are ignored. If no keys are
found in the file, then the entire file is ignored, and HMAC
authentication is disabled.
ISconf checks for new versions of this file every 10 seconds when
it is processing inbound packets -- there is no need to restart
the ISconf daemon.
The hash function used internally is SHA-1, with Python's
hmac module doing the real work.
You should ensure that this file is only readable by root.
This entire mechanism is likely to change and/or be replaced by
PGP key signatures in a future release.
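The key-file rules above (comments, blank lines, the primary key for signing, fallback to secondary keys, and the +ANY+ transition key) can be sketched with Python's standard hmac and hashlib modules using SHA-1, as the text describes. The function names are hypothetical and skipping malformed lines is an assumption; this is not ISconf's wire format or code.

```python
import hashlib
import hmac

ANY = "+ANY+"


def parse_hmac_keys(text):
    """Return (keys, accept_any) from the contents of an hmac_keys file."""
    keys, accept_any = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line == ANY:
            accept_any = True  # keep signing, but stop checking inbound codes
            continue
        if not line.isascii() or " " in line or "#" in line:
            continue  # assumption: malformed keys are skipped, not fatal
        keys.append(line.encode("ascii"))
    return keys, accept_any


def sign(keys, message):
    """MAC an outgoing message with the primary key; None if HMAC is disabled."""
    if not keys:
        return None
    return hmac.new(keys[0], message, hashlib.sha1).digest()


def verify(keys, accept_any, message, mac):
    """Check an inbound MAC against each key in file order."""
    if not keys or accept_any:
        return True  # no keys: file ignored; +ANY+: inbound checks disabled
    for key in keys:  # primary first, then secondaries, in order
        expected = hmac.new(key, message, hashlib.sha1).digest()
        if mac is not None and hmac.compare_digest(expected, mac):
            return True
    return False
```

During a rotation, a file listing the new key first and the old key second still accepts messages from hosts signing with the old key, which is what makes the staged rollout described above work.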
- nets
-
Network broadcast list -- see IS_NETS above. See t/nets for
an example. Likely to change.
CONFIGURATION
ISconf uses environment variables for its configuration, and these
variables are in turn passed on to any executables ISconf calls -- see
ENVIRONMENT. These environment variables can be set in
/etc/is/main.cf. The format of this file is similar to a makefile,
but whitespace is whitespace -- tabs aren't required. Each stanza
looks like this:
target: optional includes
var1 = value
var2 = value
The 'target' string above is matched against the hostname; case is
significant. If it contains dots, it's matched against the FQDN. If
it starts with a caret (^) it is a regex matched against the FQDN.
The first matching target is the only one used; however, the special
target named 'DEFAULT' is always matched. Variables set in DEFAULT,
earlier includes, or earlier in the same stanza are overridden by
identically-named variables which appear later in matched stanzas.
Comments are any text following a hash (#) on any line.
You can see the resulting environment by using the -D flag.
Here's an example /etc/is/main.cf:
DEFAULT:
NTPSERVERS = ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov
IS_NETS=/etc/is/nets
NET1:
GATEWAY = 10.10.1.1
NET2:
GATEWAY = 10.10.2.1
# The host 'scotty' will end up with these environment variables
# set during the ISconf run:
#
# NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
# GATEWAY=10.10.1.1
# building=23
# floor=2
# IS_NETS=/etc/is/nets.scotty
#
scotty: NET1
building = this value is ignored
building = 23
floor = 2
IS_NETS=/etc/is/nets.scotty
# kirk will get:
#
# NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
# IS_NETS=/etc/is/nets
# GATEWAY=10.10.2.1
# building=52
# floor=12
#
kirk: NET2
building = 52
floor = 12
LOST:
building = unknown
floor = unknown
# any other host in example.com:
#
# NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
# IS_NETS=/etc/is/nets
# building=unknown
# floor=unknown
# GATEWAY=10.2.3.1
#
^.*\.example\.com: LOST
GATEWAY = 10.2.3.1
# any other host not in example.com:
#
# NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
# IS_NETS=/etc/is/nets
# building=unknown
# floor=unknown
# GATEWAY=10.0.0.1
#
^.*: LOST
GATEWAY = 10.0.0.1
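The stanza-matching rules can be sketched in Python as follows. This is a minimal illustration under several assumptions, not ISconf's parser: any line containing '=' is treated as an assignment, includes are resolved recursively by stanza name, regex targets are matched with re.search, and the dictionary passed as environ stands in for the inherited process environment, which the configuration file overrides (see ENVIRONMENT).

```python
import re


def parse_config(text):
    """Parse main.cf text into an ordered list of (target, includes, assignments)."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # a comment runs from '#' to end of line
        if not line.strip():
            continue
        if "=" in line:  # assumption: any line with '=' is an assignment
            if not stanzas:
                raise ValueError(f"assignment outside a stanza: {raw!r}")
            name, value = line.split("=", 1)
            stanzas[-1][2].append((name.strip(), value.strip()))
        else:  # "target: optional includes"
            target, _, includes = line.partition(":")
            stanzas.append((target.strip(), includes.split(), []))
    return stanzas


def target_matches(target, hostname, fqdn):
    """Apply the matching rules; comparisons are case-sensitive."""
    if target.startswith("^"):
        return re.search(target, fqdn) is not None  # regex against the FQDN
    if "." in target:
        return target == fqdn  # dotted target: compare with the FQDN
    return target == hostname  # plain target: compare with the hostname


def resolve(stanzas, hostname, fqdn, environ=None):
    """Return the variables a host ends up with; later assignments win."""
    by_name = {target: (includes, assigns) for target, includes, assigns in stanzas}
    result = dict(environ or {})  # the config file overrides the environment

    def apply(name, seen=()):
        if name in seen:
            raise ValueError(f"include loop through {name!r}")
        includes, assigns = by_name[name]
        for included in includes:  # includes first, so the stanza's own vars win
            apply(included, seen + (name,))
        for var, value in assigns:
            result[var] = value

    if "DEFAULT" in by_name:
        apply("DEFAULT")  # DEFAULT always matches and is applied first
    for target, _, _ in stanzas:
        if target != "DEFAULT" and target_matches(target, hostname, fqdn):
            apply(target)
            break  # only the first matching target is used
    return result
```

Run against the example above, with the hostname and FQDN supplied by the caller (for instance from socket.gethostname() and socket.getfqdn()), this yields the variable sets shown in the comments for scotty, kirk, and the two fallback stanzas.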
GLOSSARY
- base image
-
An image which was created directly from vendor CD or another
external source, and which contains an empty journal. Normally as
simple as possible, with only a management tool (such as ISconf)
and its prerequisites added. See image glossary entry.
You will usually create only one base image per platform -- see
one-base. You will create at least one checkpoint image per
branch.
- branch
-
Host model or type. Similar usage as in software version control.
A different branch is normally used for each set of hosts that
need their own disk image and that do wildly different or
conflicting things. For example, a DNS server and a database
server would tend to be on different branches.
A branch is described by the sequence of transactions in a
journal. A new branch is created by forking an existing branch,
then creating a checkpoint image.
Branch names must match this regular expression:
\w+[-\w\.]+
See also class.
For more discussion of what branches are, and how they contrast
with domains, see
http://trac.t7a.org/isconf/wiki/DomainsVsBranches.
- categories of data
-
There appear to be three categories of data or executables on the
disk of a typical UNIX machine:
- evolvable data -- this includes binaries and executable
scripts, as well as most configuration files (see glossary entry)
- environmental data -- that set of configuration data which
must match external conditions (see glossary entry)
- user or business data
- checkpoint image
-
An offline copy of the disk image of a given branch at a given
revision, used to differentiate branches and for speedier
installs. A checkpoint image is made by installing a host from an
ancestor checkpoint or base image, allowing its branch's journal
entries to execute, then capturing the resulting disk content.
See image glossary entry.
- class
-
This is an anti-definition: the word "class" should not be used to
describe anything related to deterministic host management. It
brings with it misconceptions, such as "hosts can be subclassed",
"changes in the parent class can be automatically and safely
propagated to subclasses", and so on; most of these misconceptions
imply that editing history is a safe thing to do.
- congruent
-
Remaining in compliance with a fully-descriptive specification.
If a configuration management tool is congruent, the machines it
manages will remain in lock-step with the desired state. This
makes it easier to maintain a representative test environment, and
allows for more predictable disaster recovery. ISconf is
congruent. Also see the convergent glossary entry, and:
http://www.infrastructures.org/papers/turing/turing.html#methods/congruence
- convergent
-
Tending to converge towards a desired state. If a configuration
management tool is convergent, the machines it manages will trend
towards each other in disk state, but for practical reasons they
will rarely reach congruence. It will be difficult to maintain a
representative test environment, and changes will tend to be made
first, and tested first, in production. Predictable disaster
recovery will remain elusive. Also see the congruent glossary
entry. For more in-depth information about convergence, see:
http://www.infrastructures.org/papers/turing/turing.html#methods/convergence
- domain
-
An ISconf domain name is more or less equivalent to a NIS domain
name, an AFS cell name, or a Kerberos realm name. This name is an
arbitrary string, but by convention it is usually based on the DNS
domain name.
ISconf domains are a security mechanism, primarily in regards to
information hiding. All of the machines sharing the same ISconf
domain name will share the same distributed cache, so root users
on all of these machines will be able to read the contents of the
cache. Likewise, machines that are in different domains will not
share the same cache, so root users of these machines will not
have access to the cache contents of the other domain. This
becomes important if there is any proprietary or sensitive
information stored in the ISconf cache, for example via a 'snap'
or 'exec' command.
Normally you'd want all of the machines in a given legal entity
(the same corporation, for instance) to use the same domain name.
For example, a small company using ISconf might use an ISconf
domain name of 'example.com' on all of their machines. A larger
company might have multiple divisions or subsidiaries and legal or
security reasons for segregating machines. The large company
might put most of their machines in 'example.com', but for
regulatory or security reasons might isolate a subsidiary into
'foo.example.com', and might put their bastion and firewall
machines into 'security.example.com'. Note again that there
doesn't need to be a 'security.example.com' DNS domain for this to
work.
The idea of ISconf domains is to completely isolate legal entities
from each other when sharing the same net. Machines in different
domains refuse to cache each other's data, answer each other's
queries, and so on. Domains really come into play in the TCP
crypto and user auth code (ISconf 4.3 and later), where each
domain has its own PGP keyring and its own database of hosts and
users, and all of the wire traffic is encrypted accordingly.
Establishing two machines in different domains means "I don't want
these machines to ever cooperate at all. I will never merge their
branches, I don't want them to be able to share or see each
other's packages, cache space, or wire traffic."
For more discussion of what domains are, and how they contrast
with branches, see
http://trac.t7a.org/isconf/wiki/DomainsVsBranches.
Domain names must match this regular expression:
\w+[-\w\.]+
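Because branch and domain names share this pattern, one check covers both. The sketch below also reads the single-line /var/is/conf/domain file described under FILES. Requiring the whole name to match (re.fullmatch) is an assumption, since the text gives only the pattern; note that the pattern implies a minimum length of two characters, and that Python's \w also accepts non-ASCII letters, which ISconf may or may not allow. Function names are hypothetical.

```python
import re

# Branch and domain names share this pattern (see the branch and domain entries).
NAME_RE = re.compile(r"\w+[-\w\.]+")


def valid_name(name):
    """True if the whole string matches the documented name pattern."""
    return NAME_RE.fullmatch(name) is not None


def read_domain(path="/var/is/conf/domain"):
    """Read the single-line domain file; the trailing newline is optional."""
    with open(path) as f:
        domain = f.read().strip()
    if not valid_name(domain):
        raise ValueError(f"invalid ISconf domain name: {domain!r}")
    return domain
```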
- editing history
-
"Editing history" is what happens when you build a machine based
on a set of instructions, then alter the instructions that you
used to build the machine. Once you've done this, there is no
mathematically provable way to ensure that your new
instructions will still build the same machine, short of building
the new machine and then comparing the entire disk content to the
old one.
In ISconf, editing history would mean editing the journal file
itself -- while there's nothing (currently) which would stop you
from doing that, and while the resulting file would be dutifully
distributed and applied to the target machines, it's highly
discouraged and may be a lot more difficult to do in the future,
as we add things like digital signatures and checksums to the mix.
Editing history can create major outages when:
- you're trying to deploy changes which worked in QA (using the
old instructions) to production (using the new instructions)
- you're trying to execute a disaster recovery, or even a single
host rebuild, and you no longer have the old disk content available
- you're trying to add a new server to an existing farm and don't
have time to resort to backups or run rsync across both disks
- environmental data
-
Configuration data (usually files) whose content is predominantly
influenced by external business, political, procedural, or
economic factors, and whose function is critical to the integrity
of business data or to the operation of ISconf. Examples include
files containing IP addresses, domain names, and other information
which, if out of date, will break the ability of ISconf to
continue journal replay. See also categories of data.
This version of ISconf does not attempt to manage environmental
data natively. In earlier versions of ISconf, we would simply
rsync environmental configuration files (such as /etc/hosts and
resolv.conf) from a per-environment server at the beginning of
each execution. We weren't very happy with the limited
flexibility that gave us, but this method might work for you. If
you want to do this, either modify or wrap the main isconf script
to call rsync, and then set up an rsync server somewhere. See
http://www.infrastructures.org/bootstrap/gold.shtml for more
details. (If demand is there, we can add an executable hook that
makes this easier.)
If a file meets the description of evolvable data, then it is
not environmental data, and it should be managed via a simple
isconf snap, rather than the means described below. For
instance, /etc/passwd and /etc/resolv.conf are usually
environmental, while /etc/services and /etc/inittab are much more
influenced by local applications, and in most cases should be
managed via isconf snap.
A better way to manage environmental data is to store the raw data
(or pathnames pointing to the raw data) in /etc/is/main.cf and
then generate the configuration files during boot and/or cron.
(Look for an isconf verb in a near-future release which lets you
export the content of /etc/is/main.cf as a shell script. In the
meantime you can do this the other way around -- call ISconf from
a wrapper script which sets up the environment you want.)
Your goal should be to keep the set of environmental data as small
as possible, via architectural decisions in both infrastructure
and applications.
You need to be able to examine each bit of environmental data to
try to predict its behavior during deployment. Your ability to do
this will always be flawed -- you cannot possibly imagine all of
the permutations that might be encountered during future
operations. Keeping the environmental data set small reduces your
workload and the risk caused by a flawed analysis.
You need to be able to test each bit of environmental data after
deployment. Any change in environmental data, by definition,
cannot be tested anywhere except in its native environment. If
this environment is production, then we can only test these
changes after deploying them to production -- this is bad, but
unless you have completely duplicate networks, down to the details
of IP addresses and hostnames, there's not much you can do about
it. Keeping the environmental dataset small reduces the
variations between environments; ideally, IP addresses and/or
hostnames might be the only differences you need to analyze and
test for.
The classic case of what not to do involves hardcoding IP
addresses in executables -- we all know this is bad, but here's
why: Embedding an IP address in a larger executable taints the
entire executable, requiring that we manage the whole file as
environmental data. It's better to move that IP address to a
separate configuration file, to shrink the size of the
environmental data set.
Executables aren't the only thing that can be tainted. Embedding
an IP address into a larger configuration file of
non-environmental data also taints the rest of the configuration
file. If you have ever generated configuration files by merging
IP addresses into templates of other data, then you have already
dealt with this case. The template itself stays non-environmental,
and only the merged-in values need to be managed as environmental
data. By using templates, you prevent taint from spreading.
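Here is a minimal Python sketch of the template approach, using resolv.conf as the example file. It relies only on the standard string.Template class. The template text and the host_data values are made up for illustration (192.0.2.53 is a documentation address). The template is the part you would check in and replay. host_data stands in for the small set of environmental data kept per host.

```python
from string import Template

# Evolvable part: identical on every host of the branch, so it can be
# checked in and replayed like any other file.
RESOLV_TEMPLATE = Template(
    "search $domain\n"
    "nameserver $nameserver\n"
)


def render(template, env_data):
    """Merge per-host values into the template.

    Template.substitute raises KeyError if a placeholder has no value,
    so a missing piece of environmental data fails loudly instead of
    producing a half-filled file.
    """
    return template.substitute(env_data)


# Environmental part: the only values that differ between hosts.
host_data = {"domain": "example.com", "nameserver": "192.0.2.53"}
print(render(RESOLV_TEMPLATE, host_data), end="")
```

Using substitute() rather than safe_substitute() is a deliberate choice. A host with incomplete environmental data should fail visibly instead of silently getting a broken file.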
Taken to an extreme, tainting of files and packages can cause an
explosion in the size of the environmental data set, and an
explosion of risk, to the point where all data on disk must be
considered to be environmental, and all changes must be considered
untested prior to production rollout. If you find yourself in
this situation, your best bet might be to go with a convergent
tool such as cfengine; you'll lose congruence, though, until
you're able to fix the original problems and rebuild your
machines. See convergent and congruent.
- evolvable data
-
Data which can be managed via journal replay. This includes
successive versions of executables, packages, kernels, patches,
and configuration data which is not dependent on external
environment. See also environmental data.
Examples of evolvable data include /bin/ls, /etc/mailcap, and
libc.
It's usually safe to assume that all data is evolvable until
proven otherwise. It's relatively easy to later begin managing a
particular data item as environmental data if it proves necessary.
- image
-
The bits placed on disk during installation; this will be either
the base image or a checkpoint image taken from a child branch.
This version of ISconf does not do image management (it's in the
release plan). Images need to be managed and installed using a
certain category of host install tool. See PREREQUISITES.
- one-base
-
One-base is an axiom of ISconf (and probably deterministic host
management in general) -- it says that a host of any branch can be
created by installing the base image for that platform and then
replaying that branch's journal. This means you may need only one
base image for any given platform -- starting from there you can
use journal replay to morph the image into any other image which
is described by a branch's journal.
"One base to start them all, one base to gild them, one base to
boot them all and in the darkness build them."
Sorry.
- journal
-
The transaction log of all changes made to a branch, starting from
the base image. Used for replay on other hosts of the same
branch.
INTERNALS
The basic algorithm that ISconf uses is roughly:
- Journal the changes that are going to be made.
- Preserve all entries in the journal over the lifetime of the
infrastructure.
- Only append entries to the journal -- never delete, never
alter or re-order.
- Apply changes to one or more test machines by reading the
journal.
- Maintain a history of changes that have been applied to each
host. The master copy of this history should reside on the
local disk of that host, and must be destroyed if the disk
becomes corrupt or the host is rebuilt.
- Later, apply the same changes in the same order on other
machines, by reading the same journal, using the same code path,
consulting their local histories to see what is yet to be done.
- (This step is not yet implemented in 4.2.X.)
Keep track of those files which a human explicitly
says do not need to be versioned, and in those cases (only),
refer only to the last journal entry for those files. An
example is resolv.conf; in this case, you want only the most
recent version to be applied, to ensure the host will
function at all. (But consider new, edited,
and deleted configuration files; these three operations might
actually benefit from distinct handling.)
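The core of the replay steps above can be sketched in a few lines of Python. This is not ISconf's actual code, and it differs from the real system in three ways:

- The Entry and Journal classes and the replay() function are hypothetical.
- The per-host history is an in-memory set rather than a file on local disk.
- The not-yet-implemented resolv.conf special case is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Entry:
    entry_id: str  # unique within the branch; never reused
    command: str   # e.g. the arguments given to "isconf exec"


@dataclass
class Journal:
    entries: List[Entry] = field(default_factory=list)

    def append(self, entry: Entry) -> None:
        # The only mutator: entries are never deleted, altered, or reordered.
        if any(e.entry_id == entry.entry_id for e in self.entries):
            raise ValueError("duplicate entry id: " + entry.entry_id)
        self.entries.append(entry)


def replay(journal: Journal, history: Set[str],
           execute: Callable[[Entry], None]) -> List[str]:
    """Apply, in journal order, each entry not yet recorded in history.

    If execute raises, the exception propagates and no later entry runs,
    so a host never applies changes out of order.
    """
    applied = []
    for entry in journal.entries:
        if entry.entry_id in history:
            continue  # already applied on this host
        execute(entry)
        history.add(entry.entry_id)  # record only after success
        applied.append(entry.entry_id)
    return applied


# Usage: a host whose local history shows entry "1" already applied.
journal = Journal()
journal.append(Entry("1", "tar -xzvf foo-1.2.tar.gz"))
journal.append(Entry("2", "make -C foo-1.2 install"))
history = {"1"}
ran = []
replay(journal, history, lambda e: ran.append(e.command))
print(ran)
```

Because every host walks the same journal in the same order and skips only what its own history records, a host that was down or not yet built simply catches up the next time it runs.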
BUGS/RESTRICTIONS
See http://trac.t7a.org/isconf/report for bugs, and see notes for a
given release at http://trac.t7a.org/isconf/roadmap?show=all.
This version of ISconf was assembled with the features most requested
by early adopters, and does not pretend to be secure or scalable. It
is intended for use in small deployments, trusted internal networks,
and evaluation. If you do install this version in a production
environment, you should plan to upgrade as newer versions become
available.
Having said that, we do use this version of ISconf ourselves.
Because we'll need to change wire protocols to add in the security
bits, the next upgrade is likely to be a tricky procedure; you may
need to keep an old machine around for a while as a cache server
until you're sure you've upgraded all of your existing machines and
updated your checkpoint images. Keep your rollouts small for now.
Known flaws in this release include:
- Files are transported via cleartext HTTP. Any file checked into
ISconf is visible to anyone with a web browser. HTTP in general is
a poor protocol for ISconf; it is being used at the suggestion of an
early adopter, and we plan to deprecate it as soon as there is
consensus that it's the wrong direction.
- Control messages are transported via UDP and/or UDP broadcast, for
expediency. This protocol is going to be deprecated in favor of a
TCP mesh which will do both control messages and file transport.
- No authentication or encryption is performed for any operation on
the wire. A properly-formatted packet can be forged to insert
unsafe content into the journal for an entire branch. We plan to
add HMAC authentication as soon as possible, and later PGP signatures
and either PGP or SSL transport encryption as part of the TCP mesh layer.
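For illustration only, HMAC authentication of a control message might look something like this Python sketch. It uses the standard hmac and hashlib modules. The packet layout (a 32-byte HMAC-SHA256 tag followed by the payload) and the function names are assumptions, not ISconf's wire format. Three limits apply:

- A shared key still has to be distributed to every host.
- HMAC alone provides no encryption.
- Without a sequence number or timestamp in the payload, a captured packet could simply be resent.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(key: bytes, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over payload, followed by payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_sealed(key: bytes, packet: bytes) -> bytes:
    """Return the payload if its tag verifies; otherwise raise ValueError."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short")
    tag, payload = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad HMAC: packet rejected")
    return payload
```

A receiver that calls open_sealed() before touching the journal would drop a forged packet instead of appending its content to the branch.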
- Each machine stores a complete copy of all files in the cache. If
you snap hundreds of megabytes of files, you will use hundreds
of megabytes of disk space on each node. Once the TCP mesh is up,
we'll have a protocol capable of quorum counting. This will let us
starve the cache on ordinary nodes, while allowing designated
"master" nodes to store a copy of everything -- the cache on these
can then be backed up for safe-keeping as well.
- We don't pretend to handle a certain subset of configuration files
right now -- see the environmental data glossary entry.
- Logging is rudimentary right now; everything gets dumped into
various files in /tmp. This all needs to be migrated to syslog
and/or files in /var/log.
SEE ALSO
Most ISconf developers and users can be found on the infrastructures
mailing list at
http://mailman.terraluna.org/mailman/listinfo/infrastructures
AUTHOR
Steve Traugott -- http://www.stevegt.com