Subject: [ripe-ttraffic #3066] TestTraffic Tape Handling Model
Date: Wed, 26 May 1999 17:34:00 +0200
From: Rene Wilhelm

[ao-backups: this is intended as FYI, but do feel free to comment]

In the Test-Traffic (TT) Project we collect large amounts of data which
will be stored on tape. A DAT autochanger ("jukebox") connected to
ginkgo.ripe.net should ease semi-automatic data storage and retrieval.

This note presents the tape handling model and associated
procedures/scripts, which I'm about to set up. Comments are welcome.

-- Rene

================================
TestTraffic Tape Handling Model
================================

requirements
------------

- need to be able to uniquely identify a tape both online from within
  a program and offline with human eyesight.
  reasons:
    - moving tapes from/to an archive (e.g. the safe downstairs)
    - integrity check by software to ensure the right tape is loaded
      from the jukebox for data storage or retrieval
    - cope with the situation where, due to a handling error, all tapes
      fall out of the magazine

- contents of all tapes (in jukebox or not) must be accessible online
  reason:
    - much faster to find the tape which holds data file(s)
      that need retrieval.

implementation
--------------

- start every tape with a short file that uniquely identifies it.
  Basically, a one-liner like:

      "RIPE Test-Traffic Project tape0001"

  We'll call this file the "tape label".

  Tapes are expected to be filled in several cycles, so it will be hard
  to know in advance which files will end up on a particular tape;
  an on-line index at the start (like the one on Ops' backup tapes)
  is not feasible.

- when a new magazine with tapes is loaded in the jukebox one has
  to run the 'index-jukebox' program which for every tape:
    - checks if a label is present
    - if not, checks if the tape is empty
      ("dd after rewind yields I/O read error" seems to be the only
      way to test this?)
    - if tape is empty, labels it automatically, using the 'next'
      tape number (a file on the system will store the last used number)

  when the scan of all tapes is completed, an index file will be
  created which links the jukebox slots to the tape labels.

- the 'label-tape' tool creates a directory 'tapeXXXX' where XXXX is
  the unique tape number. Each tape will hold several (probably 'tar')
  files, the indexes of which will be stored in the tapeXXXX directory
  upon successful writing.

  When it comes to appending new data to a tape, the contents of the
  tapeXXXX directory are taken as authoritative; i.e. if the directory
  has index files 'file001' - 'file005', the tools will, after rewind,
  skip six files (label + 5) before starting to write data.

- the 'load-tape' tool will load the tape with the specified label.
  It consults the 'jukebox-index' file to see if the tape is present
  in the loaded magazine. 'load-tape' will be used by higher level
  tools that locate a file in the online indexes, get the associated
  tape label, load the tape and retrieve the file.

coding
------

writing the scripts will be relatively straightforward (~ 1-2 days),
however one should take care of proper exception handling: consider
what can go wrong, then add code to prevent such situations from
causing trouble. This may increase the time needed to get the scripts
99.9% OK (there's always a chance you overlook something, so I won't
claim scripts are 100% OK after initial coding&testing :-)

OPEN ISSUE
----------

- physically labeling the tapes

  tape numbers are assigned automatically, but when tapes are archived
  outside the jukebox, a physical label ('sticker') is needed for human
  identification of tapes. This process is very much open to errors.
  Should it be based on a complete tape magazine or on individual tapes?
  Also the question is when to do it: ASAP after putting in new tapes?
  (sequence being: insert new magazine, wait for all tapes to be
  labeled, unload magazine, put stickers on tapes, reload magazine).
  Or only when tapes are exchanged?

===========================================================================

X-Request-Action: wilhelm notified by gerard@ripe.net.
X-Request-Acted: Thu May 27 9:20:09 1999 (927789609)
Received: from postman.ripe.net (postman.ripe.net [193.0.0.199])
        by office.ripe.net (8.8.8/8.8.5) with SMTP id JAA03689
        for ; Thu, 27 May 1999 09:20:08 +0200 (CEST)
Received: (qmail 26387 invoked by uid 0); 27 May 1999 07:20:07 -0000
Received: from birch.ripe.net (193.0.1.96)
        by postman.ripe.net with SMTP; 27 May 1999 07:20:07 -0000
Received: (from gerard@localhost)
        by birch.ripe.net (8.8.8/8.8.8) id JAA00749;
        Thu, 27 May 1999 09:20:01 +0200 (CEST)
Date: Thu, 27 May 1999 09:20:01 +0200 (CEST)
From: Gerard Leurs
Message-Id: <199905270720.JAA00749@birch.ripe.net>
To: tt-ops@ripe.net, wilhelm@ripe.net
Subject: Re: [ripe-ttraffic #3066] TestTraffic Tape Handling Model
Cc: ao-backups@ripe.net

Hi Rene.

A nice and very handy overview of tape-handling. I think this should
not only be applied to the test-traffic-jukebox, but also to the
ops-jukebox.

Some remarks and questions.

> contents of all tapes (in jukebox or not) must be accessible
> online.

True and handy. It will however take lots of diskspace [but tt has
lots of that]. This is one of the things that are not implemented in
the ops-backups, because of lack of diskspace. Even if I/we would use
rotating logfiles (only for the incrementals (level 8 + 9)) it would
take more than 10MB per day (estimated guess).

> Tapes are expected to be filled in several cycles, so it will
> be hard to know in advance which files will end up on a particular
> tape. An on-line index at the start is not feasible.

But you already have these indexfiles (per tape and per
write-session-per-tape [tapeXXX/fileYYY]). They are your guideline.

> physically labeling the tapes.
I would strongly advise labeling them before inserting them in the
jukebox. This will keep you aware of which tape labels are in use. And
it will prevent you from asking "are these old/used tapes [and just
not labeled] or are these newly inserted tapes [partially written]".
When you see a label you can check online whether there were any
write-sessions to it (by checking tapeXXX).

Q. Are your tapejobs dramatically different from the ops-tapejobs?
   If they need only minor adjustments I would really like it if we
   keep just one file for it. Maybe we can build in a check "on which
   host does this run" !?
   But on the other hand, if ginkgo ends up as a stand-alone host:
   no amd => no /ncc/bsdmgr/dump accessible.

Gerard.
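
For reference, the bookkeeping rules in Rene's note (label format, skip
count when appending, slot lookup for 'load-tape') could be sketched in
Python roughly as follows. This is only an illustration under
assumptions, not the actual scripts: all function names are
hypothetical, the four-digit zero padding is inferred from the single
'tape0001' example, and the jukebox index is assumed to be a simple
slot -> tape name mapping.

```python
import re

# Label text taken from the example in the note; the exact format of
# real labels is an assumption.
LABEL_PREFIX = "RIPE Test-Traffic Project "


def make_label(number):
    """Build the one-line tape label for a tape number (hypothetical helper)."""
    return f"{LABEL_PREFIX}tape{number:04d}"


def parse_label(text):
    """Return 'tapeXXXX' if text is a valid label, else None."""
    match = re.fullmatch(re.escape(LABEL_PREFIX) + r"(tape\d{4})", text.strip())
    return match.group(1) if match else None


def files_to_skip(index_names):
    """Number of tape files to skip after rewind before appending.

    The tapeXXXX directory is authoritative: one file for the label,
    plus one per index file named 'fileNNN'.
    """
    index_files = [n for n in index_names if re.fullmatch(r"file\d{3}", n)]
    return 1 + len(index_files)


def find_slot(jukebox_index, tape):
    """Return the slot holding the tape, or None if it is not in the magazine."""
    for slot, name in jukebox_index.items():
        if name == tape:
            return slot
    return None
```

With index files 'file001' - 'file005' present, files_to_skip() gives
six (label + 5), matching the rule in the note. In practice the names
would come from listing the tapeXXXX directory.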