.FP palatino
.TL
Backup \(em the second release
.AU
Andrew Hume
.AI
.MH
.I research!andrew
.AB
The second release of the backup software is quite different from the first.
This report provides details of the changes and motivations for the changes.
.AE
.SH
Introduction
.PP
The backup system described here is an incremental system;
small files that have been changed recently are saved at a central
system (the \f2backup machine\fP).
Explicit commands recover files and search for file versions.
.SH
Architecture
.PP
The main elements in the architecture are the backup machine
and the client systems.
The backup machine has a database for managing file versions and
two areas for file storage: a holding area (where files live for a few days)
and a permanent storage area.
The latter includes optical disk and magnetic disk that is periodically
copied to tape.
The backup system does not care which medium is used; it is only responsible for copying
files into the permanent storage area.
The process is described as two (nearly) independent parts:
copying files from clients into the holding area,
and transferring files from the holding area into the permanent storage area.
.SH
Client File to Holding Area
.PP
This processing is done mostly on the client and typically
runs (via \f2crontab\fP) every morning around 2-3am.
There are three steps:
generating candidates for backup,
checking to see if they have been backed up already,
and copying the necessary files into the holding area on the backup machine.
.PP
Candidates for backup are generated by the file \f(CW/usr/lib/backup/sel\fP.
A sample is shown below.
Most filenames are generated by descending users' login directories
(or any other directories of interest) and pruning out old files,
files that are too big,
or files that are detectable as garbage.
Certain precious files (for example, \f(CW/etc/passwd\fP)
are always candidates.
The size of the file is carried along with the filename to cope with files
that grow between name generation and file copy.
.DS
.ft CW
/usr/lib/backup/fcheck 512 7 /usr/* | file -f /dev/stdin | \e
sed -e '/\e.o:	/d
/a\e.out:	/d
/\e/core:	/d
/\e/foo:	/d
/\e/usr\e/guest\e//d
/\e/usr\e/adm\e/messages/d
/\e/usr\e/adm\e/trimlog/d
/\e/usr\e/adm\e/wtmp/d
/\e/usr\e/spool\e/[^m][^a][^i]/d
/:	troff/d
s/:	.*//'
cat <<'EOF'
/etc/passwd
/etc/ttys
/etc/crontab
/etc/rc
/etc/fstab
/etc/aculist
/etc/candest
/etc/sysinfo
/usr/lib/backup/sel
EOF
.ft P
.DE
.CW /usr/lib/backup/fcheck
takes a maximum size (in 1024-byte blocks),
a maximum age (days before now) and a list of names.
Symbolic links are followed only if explicitly mentioned;
otherwise, they are saved as symbolic links.
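.PP
The selection that
.CW fcheck
performs can be approximated with the standard
.I find (1)
command.
The following is a minimal sketch, not the real
.CW fcheck :
the function name \f(CWcandidates\fP is hypothetical,
the age test is assumed to use \f(CWctime\fP,
and symbolic links are simply skipped.
.DS
.ft CW
```shell
# candidates MAXKB MAXDAYS DIR...
# Print regular files of at most MAXKB kilobytes whose status
# changed within the last MAXDAYS days.
# find -size counts 512-byte blocks (rounded up), so MAXKB
# kilobytes is 2*MAXKB blocks; "-size -N" means fewer than N.
candidates() {
    maxkb=$1
    maxdays=$2
    shift 2
    find "$@" -type f -size "-$((2 * maxkb + 1))" -ctime "-$maxdays" -print
}
```
.ft P
.DE
For example, \f(CWcandidates 512 7 /usr/*\fP would produce names comparable to
those fed to \f(CWfile\fP in the sample, before any pruning by \f(CWsed\fP.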
.PP
The list of filenames is augmented by the date
(\f(CWctime\fP) and size of each file.
This size is carried along throughout the following procedure;
it represents the maximum number of bytes to be saved.
The list is run against the database by the program
.CW /usr/lib/backup/notdone
which emits only the filenames that should be backed up.
These names are then fed to
.CW /usr/lib/backup/bpush
which places the corresponding files in (more or less) arbitrary files
in the holding area on the backup machine.
In the process, the files become \f2backup copies\fP because of a 1KB header
prepended to the file.
The header contains
.DS
.nf
version number
the file's inode (\f(CWstruct stat\fP)
checksum
owner UID (in ASCII)
owner GID (in ASCII)
original pathname
backup copy name
.DE
All the files in the holding area are owned by the special user \f2daemon\fP.
While copying is happening, files are mode zero;
when the copy has completed successfully, the mode is set to 600.
The backup copy name is initialised to the empty string.
The holding area is defined as a set of directories listed in a file.
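.PP
The mode convention (zero while copying, 600 on success) can be illustrated
in shell.
This is only a sketch under stated assumptions: the function name
\f(CWreceive\fP is hypothetical, and the header, checksum and ownership
handling done by the real
.CW bpush
are omitted.
.DS
.ft CW
```shell
# receive DEST
# Copy standard input to DEST. DEST has mode 000 while the copy
# is in progress and mode 600 once it has completed successfully.
receive() {
    dest=$1
    rm -f "$dest" || return 1
    # umask 777 makes the new file mode 000; the descriptor opened
    # at creation time stays writable, so cat can still fill it.
    (umask 777; cat > "$dest") || return 1
    chmod 600 "$dest"
}
```
.ft P
.DE
A copy that is interrupted is left at mode zero, which is how a later pass
can tell it from a complete file.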
.PP
Note that none of this process changes the backup database
(this happens later).
The checking performed by
.CW notdone
is conservative; errors simply mean a little unnecessary copying.
This is deliberate;
most problems with the older backup system involved getting files from
the clients to the backup machine.
The new scheme gracefully handles breakdowns in the inter-machine connections,
files disappearing between nomination and copying,
hardware problems in the network (use of checksums)
and permission problems (by using one special user).
The most important result is that the following steps,
which involve changing the database,
all occur on the backup machine.
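.PP
The conservative behaviour of
.CW notdone
can be sketched as follows.
Everything here is an assumption made for illustration: the function name
\f(CWnotdone_sketch\fP, the input format (name, \f(CWctime\fP and size
separated by spaces), and a database represented as a text file with one
name and \f(CWctime\fP per line.
Names are assumed to contain no white space or backslashes.
.DS
.ft CW
```shell
# notdone_sketch DB
# Read "name ctime size" lines on standard input and print the
# name and size of every file with no "name ctime" line in DB.
# If DB cannot be read, grep fails and every file is printed:
# the error costs extra copying but never skips a file.
notdone_sketch() {
    db=$1
    while read -r name ctime size; do
        grep -qxF -e "$name $ctime" "$db" 2>/dev/null ||
            echo "$name $size"
    done
}
```
.ft P
.DE
The point of the sketch is the direction of failure: any doubt about the
database results in a file being copied again rather than missed.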
.SH
Holding Area to Permanent Backup
.PP
The transfer from holding area to permanent backup takes three passes:
assigning backup names,
doing the backup copies,
and recording the copies in the database.
.PP
Assigning backup names is done by
.CW /usr/lib/backup/sweep
which acts on a list of filenames one at a time.
There are three cases for the mode of a file.
A mode of 000 means that the file is being received.
If the file is suspiciously old
(it has not been modified in six hours),
it is presumed to be a victim of networking and is deleted.
A mode of 400 means that the file has already been given a backup name.
If the file is suspiciously old
(it has not been modified in eight days),
it is presumed to be a dreg and is deleted.
A mode of 600 means the file is waiting for a backup name.
If this copy is still needed
(it may have been superseded while waiting),
a backup name is assigned and the file mode changed to 400.
Otherwise the file is simply deleted.
Backup names are generated by the
.I printf (3)
format
.CW v/v%d/%d .
The directories are constrained to be at most 10,000K bytes
for easy management.
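.PP
The three cases handled by
.CW sweep
amount to a small decision table.
The sketch below restates it in shell; the function names
\f(CWsweep_action\fP and \f(CWbackup_name\fP are hypothetical,
ages are passed in as whole hours,
and keeping files of any other mode untouched is an assumption.
.DS
.ft CW
```shell
# sweep_action MODE AGE_HOURS NEEDED
# MODE is the file mode (000, 400 or 600), AGE_HOURS the time
# since last modification, NEEDED "yes" if the copy has not been
# superseded. Prints delete, keep or assign.
sweep_action() {
    case $1 in
    000) if [ "$2" -ge 6 ]; then echo delete; else echo keep; fi ;;
    400) if [ "$2" -ge 192 ]; then echo delete; else echo keep; fi ;;
    600) if [ "$3" = yes ]; then echo assign; else echo delete; fi ;;
    *)   echo keep ;;
    esac
}

# backup_name DIR FILE: format a backup name as v/vDIR/FILE.
backup_name() {
    printf 'v/v%d/%d' "$1" "$2"
    echo
}
```
.ft P
.DE
Here 192 hours is the eight-day limit, and \f(CWbackup_name 3 17\fP
prints \f(CWv/v3/17\fP.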
.PP
The second pass involves copying the appropriate files (mode 400)
from the holding area to the permanent area.
This step is arbitrary; each backup machine could do something
different.
.PP
The third pass,
.CW /usr/lib/backup/complete ,
takes files and records the backup name in the database.
This naturally should only be done if the previous step worked.
.SH
An Example
.PP
This section describes how a backup machine might be set up.
We will assume that the database and related files live in
.CW /backup
and the holding area is in
.CW /backup/pen .
.IP \(em
All the client machines (and the backup machine too probably)
need a line in
.CW /etc/crontab
doing the transfer to holding area.
To ease the load on the network and the database, the times should be staggered by
a few (say five) minutes.
Error output from this step typically comes from two sources:
files that disappear between the selection process and transfer
(ignore these errors), and problems with the holding area
(running out of space).
The latter errors are more serious; after fixing the problem on the
backup machine, the transfer should be run again by
.CW "/usr/lib/backup/bpush < /usr/adm/bkp.Wed"
(assuming the attempt that failed occurred on Wednesday).
.IP \(em
Sometime after all the transfers to the holding area have been made
(no harm is done if the transfers are not complete),
backup names should be assigned by
.CW sweep .
.DS
.CW
cd /backup/pen
ls | /usr/lib/backup/sweep
.DE
.IP \(em
The
.I copies
command reads filenames from standard input and
prints tuples on its standard output.
Each line has the filename and its backup name separated by a tab.
Thus the copies could be made by
.DS
.CW
ls | /usr/lib/backup/copies > /tmp/x
sed 's:^:/usr/lib/backup/bcp -r :' /tmp/x | sh
.DE
.IP \(em
Finally, the backup names are recorded by
.CW complete .
.DS
.CW
cut -f2 /tmp/x | /usr/lib/backup/complete
.DE
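.PP
The three passes above can be chained so that
.CW complete
runs only if the copies succeeded.
This is a sketch, not a supplied script: the function name
\f(CWrun_passes\fP and its arguments are hypothetical,
and it assumes that each program exits with a non-zero status on failure.
.DS
.ft CW
```shell
# run_passes LIB PEN LIST
# LIB holds sweep, copies, bcp and complete; PEN is the holding
# area; LIST is a scratch file outside PEN.
# The body runs in a subshell so the caller's directory is kept.
run_passes() (
    lib=$1 pen=$2 list=$3
    cd "$pen" || exit 1
    ls | "$lib/sweep" || exit 1
    ls | "$lib/copies" > "$list" || exit 1
    # sh -e stops at the first bcp that fails.
    sed "s:^:$lib/bcp -r :" "$list" | sh -e || exit 1
    cut -f2 "$list" | "$lib/complete"
)
```
.ft P
.DE
With the layout assumed in this section it would be called as
\f(CWrun_passes /usr/lib/backup /backup/pen /tmp/x\fP.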
.SH
Security and Reliability Issues
.PP
The main security issues are access to the backup copies and what programs
need to be setuid.
The backup copies in the holding area are only readable by \f2daemon\fP.
The permanent copies have only read permissions, set by these rules
(owner and group are determined by string rather than numeric id):
.IP
\(em\^if the owner exists on the backup machine and had read permission on the
original file, set the owner and owner read bit.
Otherwise, set the owner to \f2daemon\fP and clear the owner read bit.
.IP
\(em\^if the group exists on the backup machine and had read permission on the
original file, set the group and group read bit.
Otherwise, set the group to \f20\fP and clear the group read bit.
.IP
\(em\^copy the other read bit from the original file.
.PP
The intent is that the normal file system permissions will preserve the
original permissions as much as possible.
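.PP
The three rules can be restated as a short shell function.
This is an illustrative sketch only: the name \f(CWperm_rules\fP is
hypothetical, and whether the owner and group exist on the backup machine
is passed in as an argument rather than looked up.
.DS
.ft CW
```shell
# perm_rules MODE OWNER OWNER_OK GROUP GROUP_OK
# MODE is the original file's octal mode; OWNER_OK and GROUP_OK
# are "yes" if that name exists on the backup machine.
# Prints the owner, group and octal mode for the permanent copy.
perm_rules() {
    orig=$((0$1))
    owner=daemon
    group=0
    # Rule 3: the other read bit is copied unchanged.
    mode=$((orig & 0004))
    # Rule 1: owner and owner read bit.
    if [ "$3" = yes ] && [ $((orig & 0400)) -ne 0 ]; then
        owner=$2
        mode=$((mode | 0400))
    fi
    # Rule 2: group and group read bit.
    if [ "$5" = yes ] && [ $((orig & 0040)) -ne 0 ]; then
        group=$4
        mode=$((mode | 0040))
    fi
    echo "$owner $group $(printf '%o' "$mode")"
}
```
.ft P
.DE
For a file of mode 640 whose owner is known but whose group is not,
\f(CWperm_rules 640 andrew yes staff no\fP prints \f(CWandrew 0 400\fP
(the owner and group names here are made up).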
.PP
This scheme requires two programs be run as root.
The first (run on the client) is the selection process.
It is simply searching the file tree to generate filenames.
The second program (run on the backup machine)
copies the holding area files into permanent storage.
It needs to be root so the files end up with the right mode, owner and group.
The old backup system used the selection process (running on the client)
and the network file system to set the mode, owner and group.
This proved unworkable in practice as the network file system is
fairly unreliable with respect to administering permissions.
The other problem is that the semantics of creating files have been changed
such that the group of a new file is that of the directory it lives in,
rather than that of the process that created it.
.PP
The second program seems to be the only security hole, as it actually
creates new files.
To prevent such things as creating a new \f(CW/etc/passwd\fP,
this process will only create files with a particular style of name
(\f(CW.*/v[0-9]+/[0-9]+\fP)
and unlinks each file before creating it, to break any links.
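.PP
The two safeguards (name check, then unlink before create) can be sketched
in shell.
The function names \f(CWsafe_name\fP and \f(CWcreate_copy\fP are
hypothetical, the pattern is anchored at both ends as an assumption about
the intent, and the real program also sets mode, owner and group.
.DS
.ft CW
```shell
# safe_name PATH: succeed only if PATH ends in v<digits>/<digits>.
safe_name() {
    echo "$1" | grep -Eq '^.*/v[0-9]+/[0-9]+$'
}

# create_copy PATH
# Refuse names of the wrong style, unlink any existing file so
# that a link to another file is broken, then create PATH from
# standard input.
create_copy() {
    safe_name "$1" || return 1
    rm -f "$1" || return 1
    cat > "$1"
}
```
.ft P
.DE
With these definitions \f(CWcreate_copy /etc/passwd\fP fails without
touching the file.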