Subject: [ripe-ttraffic #3066] TestTraffic Tape Handling Model
Date: Wed, 26 May 1999 17:34:00 +0200
From: Rene Wilhelm 

[ao-backups: this is intended as FYI, but do feel free to comment]


In the Test-Traffic (TT) Project we collect large amounts of data
which will be stored on tape. A DAT autochanger ("jukebox") 
connected to ginkgo.ripe.net should ease semi-automatic data
storage and retrieval. This note presents the tape handling model
and associated procedures/scripts, which I'm about to set up.
Comments are welcome.

-- Rene

================================
TestTraffic Tape Handling Model
================================

requirements
------------

- need to be able to uniquely identify a tape both online from
  within a program and offline with human eyesight.
  reasons:
    - moving tapes from/to an archive  (e.g. the safe downstairs)
    - integrity check by software to ensure the right tape
      is loaded from the jukebox for data storage or retrieval
    - cope with situation where due to handling error all tapes
      fall out of the magazine

- contents of all tapes (in jukebox or not) must be accessible online
  reason:
    - much faster to find the tape which holds data file(s) 
      that need retrieval.


implementation
--------------

- start every tape with a short file that uniquely identifies it.
  Basically, a one-liner like: "RIPE Test-Traffic Project tape0001"
  We'll call this file the "tape label". Tapes are expected to
  be filled in several cycles, so it will be hard to know in advance
  which files will end up on a particular tape; an on-line index at
  the start (like the one on Ops' backup tapes) is not feasible.

- when a new magazine with tapes is loaded in the jukebox one 
  has to run the 'index-jukebox' program which for every tape:
    - checks if a label is present
    - if not, checks if the tape is empty ("dd after rewind yields
      I/O read error" seems to be the only way to test this?)
    - if tape is empty, labels it automatically, using the 'next'
      tape number (a file on the system will store the last used
      number)
  when the scan of all tapes is completed, an index file will be
  created which links the jukebox slots to the tape labels.

- the 'label-tape' tool creates a directory 'tapeXXXX' where XXXX
  is the unique tape number. Each tape will hold several
  (probably 'tar') files, the indexes of which will be stored in
  the tapeXXXX directory upon successful writing.  When it comes
  to appending new data to a tape, the contents of the tapeXXXX
  directory are taken as authoritative; i.e. if the directory has
  index files 'file001' - 'file005', the tools will, after rewind,
  skip six files (label + 5) before starting to write data.

- the 'load-tape' tool will load the tape with the specified
  label. It consults the 'jukebox-index' file to see if the tape
  is present in the loaded magazine. 'load-tape' will be used by
  higher level tools that locate a file in the online indexes, get
  the associated tape label, load the tape and retrieve the file.
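
As a rough sketch of the bookkeeping described above (Python is used
purely for illustration; the real scripts are not written yet). Assumed
here and not settled in this note: tape numbers are four digits, the
last used number lives in a small counter file, index files are named
fileNNN, and the jukebox index is held as a slot -> label mapping. All
function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional

LABEL_PREFIX = "RIPE Test-Traffic Project tape"  # wording of the example label
LABEL_RE = re.compile(re.escape(LABEL_PREFIX) + r"(\d{4})$")  # assumes 4 digits


def make_label(number: int) -> str:
    """Return the one-line label text for a tape number."""
    return f"{LABEL_PREFIX}{number:04d}"


def parse_label(text: str) -> Optional[int]:
    """Return the tape number from a label line, or None if it is no label."""
    match = LABEL_RE.match(text.strip())
    return int(match.group(1)) if match else None


def next_tape_number(counter_file: Path) -> int:
    """Read the last used number, store the next one and return it."""
    last = int(counter_file.read_text().strip()) if counter_file.exists() else 0
    counter_file.write_text(f"{last + 1}\n")
    return last + 1


def files_to_skip(tape_dir: Path) -> int:
    """Tape files to skip after rewind: the label plus one per index file."""
    index_files = list(tape_dir.glob("file[0-9][0-9][0-9]"))
    return 1 + len(index_files)


def find_slot(jukebox_index: Dict[int, str], label: str) -> Optional[int]:
    """Return the jukebox slot holding the tape with this label, if loaded."""
    for slot, slot_label in jukebox_index.items():
        if slot_label == label:
            return slot
    return None
```

With index files file001 - file005 in a tapeXXXX directory,
files_to_skip() returns 6, matching the "label + 5" rule above.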


coding
------

writing the scripts will be relatively straightforward (~ 1-2 days),
however one should take care of proper exception handling: consider 
what can go wrong, then add code to prevent such situations from
causing trouble. This may increase the time needed to get the
scripts 99.9% OK  (there's always a chance you overlook something,
so I won't claim scripts are 100% OK after initial coding&testing :-)


OPEN ISSUE
----------

- physically labeling the tapes

  tape numbers are assigned automatically, but when tapes are archived
  outside the jukebox, a physical label ('sticker') is needed
  for human identification of tapes. This process is very much
  open to errors. Should it be based on a complete tape magazine
  or on individual tapes? Also the question is when to do it: ASAP after
  putting in new tapes? (sequence being: insert new magazine, wait for
  all tapes to be labeled, unload magazine, put stickers on tapes,
  reload magazine). Or only when tapes are exchanged?

===========================================================================
X-Request-Action: wilhelm notified by gerard@ripe.net.
X-Request-Acted: Thu May 27  9:20:09 1999 (927789609)

Date: Thu, 27 May 1999 09:20:01 +0200 (CEST)
From: Gerard Leurs 
Message-Id: <199905270720.JAA00749@birch.ripe.net>
To: tt-ops@ripe.net, wilhelm@ripe.net
Subject: Re: [ripe-ttraffic #3066] TestTraffic Tape Handling Model
Cc: ao-backups@ripe.net

Hi Rene.
A nice and very handy overview of tape-handling. I think this
should not only be applied to the test-traffic-jukebox, but also
to the ops-jukebox.

Some remarks and questions.
	> contents of all tapes (in jukebox or not) must be accessible
	> online.
True and handy. It will however take lots of diskspace [but tt
has lots of that].
This is one of the things that is not implemented in the
ops-backups, because of lack of diskspace. Even if I/we used
rotating logfiles (only for the incrementals (level 8 + 9))
it would take more than 10MB per day (a rough estimate).

	> Tapes are expected to be filled in several cycles, so it will
	> be hard to know in advance which files will end up on a particular
	> tape. An on-line index at the start is not feasible.
But you already have these indexfiles (per tape and per write-session-
per-tape [tapeXXX/fileYYY]). They are your guideline.

	> physically labeling the tapes.
I would strongly advise to label them before inserting them in the
jukebox. This will keep you aware of which tapelabels are in use.
And it will prevent you from asking "are these old/used tapes [and
just not labeled] or are these newly inserted tapes [partially
written]". When you see a label you can check online whether there
were any write-sessions to it (by checking tapeXXX).

Q. Are your tapejobs dramatically different from the ops-tapejobs ?
If they need only minor adjustments I would really like us to keep
just one file for it. Maybe we can build in a check "on which host
does this run"!? But on the other hand, if ginkgo ends up as a
stand-alone host: no amd => no /ncc/bsdmgr/dump accessible.
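
A minimal sketch of what such a host check could look like, purely
as an illustration (in Python, not the language of the existing
tapejobs). Only "ginkgo" and "/ncc/bsdmgr/dump" come from this thread;
what lives at that path is not stated here, and the local path used
for ginkgo and all names below are made-up placeholders.

```python
import socket
from typing import Optional

# Path used on hosts that have amd; taken from this thread.
DEFAULT_DUMP_PATH = "/ncc/bsdmgr/dump"
# Hypothetical local replacement for hosts without amd.
HOST_OVERRIDES = {"ginkgo": "/usr/local/tt/dump"}


def short_hostname(fqdn: Optional[str] = None) -> str:
    """Return the host part of a name such as 'ginkgo.ripe.net'."""
    name = fqdn if fqdn is not None else socket.gethostname()
    return name.split(".")[0].lower()


def dump_path_for(fqdn: Optional[str] = None) -> str:
    """Pick the path to use on the host this script runs on."""
    return HOST_OVERRIDES.get(short_hostname(fqdn), DEFAULT_DUMP_PATH)
```

Called without an argument, dump_path_for() looks at the local
hostname, so one shared file could serve both jukeboxes.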

	Gerard.