cleanup
script. The requirements for this are:
cleanup
is this number or 5% of the total available space,
whichever is larger.
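The "whichever is larger" rule can be illustrated with a minimal sketch. The function name and the `requested_bytes` parameter are hypothetical, and the sketch assumes that "total available space" means the total capacity of the file system holding the data.

```python
import shutil

MIN_FREE_PERCENT = 5  # the 5% floor mentioned in the text


def target_free_space(requested_bytes: int, total_bytes: int) -> int:
    """Return the larger of the requested amount and 5% of the total capacity."""
    floor_bytes = total_bytes * MIN_FREE_PERCENT // 100
    return max(requested_bytes, floor_bytes)


if __name__ == "__main__":
    # Hypothetical usage: ask for 2 GiB free on the current file system.
    usage = shutil.disk_usage(".")
    print(target_free_space(2 * 1024**3, usage.total))
```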
The requirements are met by progressively compressing and removing old files (where "old" refers to the time of data-taking, not the time a file arrived at or was created on the central machine). The main logic of the cleanup script is:
    compress files older than Z days
    purge files older than P days
    while (freespace < target) AND (Z > minimal-uncompressed-days) do
        Z = Z - 1
        compress files older than Z days
        determine new freespace
    done
    while (freespace < target) AND (P > minimal-keep-days) do
        P = P - 1
        purge files older than P days
        determine new freespace
    done

where minimal-uncompressed-days is the number of days for which files should remain stored in uncompressed format and minimal-keep-days is the number of days the files must be kept on disk.
These two parameters form hard boundary conditions for cleanup. If the targeted amount of free space cannot be reached within these conditions, the script returns error code 2.
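The control flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual script: the three callables (`compress_older_than`, `purge_older_than`, `free_space`) are hypothetical stand-ins for the real file operations, and the return value 0 for success is assumed; only error code 2 is given in the text.

```python
from typing import Callable

EXIT_OK = 0                  # assumed success code, not stated in the text
EXIT_TARGET_NOT_REACHED = 2  # error code named in the text


def run_cleanup(
    compress_older_than: Callable[[int], None],
    purge_older_than: Callable[[int], None],
    free_space: Callable[[], int],
    target: int,
    z_days: int,
    p_days: int,
    min_uncompressed_days: int,
    min_keep_days: int,
) -> int:
    """Compress, then purge, progressively younger files until target is met."""
    compress_older_than(z_days)
    purge_older_than(p_days)
    free = free_space()

    # Phase 1: lower the compression threshold one day at a time.
    while free < target and z_days > min_uncompressed_days:
        z_days -= 1
        compress_older_than(z_days)
        free = free_space()

    # Phase 2: only then lower the purge threshold one day at a time.
    while free < target and p_days > min_keep_days:
        p_days -= 1
        purge_older_than(p_days)
        free = free_space()

    return EXIT_OK if free >= target else EXIT_TARGET_NOT_REACHED
```

Passing the file operations in as callables keeps the loop logic separate from the file system, so the two hard boundaries can be exercised with simple fakes.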
output and return codes
While traversing the (raw or processed) data hierarchy, cleanup
collects a list of files and directories whose names do not match
the expected patterns for TTM production files; i.e. they do not
start with any of the keywords GENE, PCKB, RCDP, RVEC or SNDP.
Most often these unexpected files are left-overs from a previous
run of the collect_data
program. On Feb 2, 2000, for example, cleanup reported:
Files/directories left after cleanup; check out manually:
    /ncc/ttpro/raw_data/2000/01/28/old.GENE.tt37.ripe.net.20000128
    /ncc/ttpro/raw_data/2000/01/28/old.RCDP.tt37.ripe.net.20000128-000005
    /ncc/ttpro/raw_data/2000/01/28/old.SNDP.tt36.ripe.net.20000128
    /ncc/ttpro/raw_data/2000/01/29/old.GENE.tt37.ripe.net.20000129
    /ncc/ttpro/raw_data/2000/01/29/old.RCDP.tt37.ripe.net.20000129-000005
    /ncc/ttpro/raw_data/2000/01/29/old.RCDP.tt37.ripe.net.20000129-010154
    /ncc/ttpro/raw_data/2000/01/29/old.RCDP.tt37.ripe.net.20000129-030155
    /ncc/ttpro/raw_data/2000/01/29/old.RCDP.tt37.ripe.net.20000129-050153
    /ncc/ttpro/raw_data/2000/01/29/old.SNDP.tt36.ripe.net.20000129
    /ncc/ttpro/raw_data/2000/01/30/old.GENE.tt37.ripe.net.20000130
    /ncc/ttpro/raw_data/2000/01/30/old.RCDP.tt37.ripe.net.20000130-000005
    /ncc/ttpro/raw_data/2000/01/30/old.RCDP.tt37.ripe.net.20000130-010154
    /ncc/ttpro/raw_data/2000/01/30/old.RCDP.tt37.ripe.net.20000130-030155
    /ncc/ttpro/raw_data/2000/01/30/old.RCDP.tt37.ripe.net.20000130-050154
    /ncc/ttpro/raw_data/2000/01/30/old.SNDP.tt36.ripe.net.20000130

The
manual check
involves:
old.*
test-traffic files can all be purged, but in some problematic
cases (e.g. bad connectivity) the old version might be better.
Also, other important files or directories could be created by
accident within the TTM data hierarchy. Therefore, instead of
blindly removing every file that does not match an expected pattern,
cleanup
leaves that judgment to a human operator.
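The name check that produces such a list can be sketched as below. The five keywords come from the text; the function names are hypothetical, and the sketch only filters a list of paths that has already been collected. How cleanup itself walks the directory tree is not described here.

```python
import os.path
from typing import Iterable, List

# Keywords that TTM production file names are expected to start with.
EXPECTED_PREFIXES = ("GENE", "PCKB", "RCDP", "RVEC", "SNDP")


def is_expected(path: str) -> bool:
    """True if the last path component starts with one of the keywords."""
    return os.path.basename(path).startswith(EXPECTED_PREFIXES)


def unexpected_entries(paths: Iterable[str]) -> List[str]:
    """Return, sorted, the paths that should be reported for a manual check."""
    return sorted(p for p in paths if not is_expected(p))
```

With this check, a name such as old.GENE.tt37.ripe.net.20000128 is reported because it starts with "old." rather than with one of the keywords.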
Note: if it is decided that all files can be removed,
one can use the xargs
command in combination with cut and paste
to get this done quickly.
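As an alternative to xargs (a different technique from the one the note recommends), the pasted list can also be processed by a short script. The sketch below is hypothetical: `remove_listed` and its `required_root` argument are names introduced for illustration, and the restriction to paths below a given root is an added safety assumption.

```python
import os
import shutil
import sys


def remove_listed(lines, required_root):
    """Remove each listed path that lies below required_root; return those removed."""
    root = os.path.abspath(required_root)
    removed = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the pasted list
        full = os.path.abspath(path)
        # Refuse to touch the root itself or anything outside it.
        if full == root or os.path.commonpath([root, full]) != root:
            continue
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        elif os.path.lexists(full):
            os.remove(full)
        else:
            continue  # already gone
        removed.append(full)
    return removed


if __name__ == "__main__":
    # Hypothetical usage: paste the reported list on standard input.
    for gone in remove_listed(sys.stdin, sys.argv[1]):
        print("removed", gone)
```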