TestTraffic Operations / Data processing FAQ

Q. The daily data processing job reports:
Error in : error writing all requested bytes to file /ncc/ttpro/root_data/2000/04/03/tt45.ripe.net.20000403.root, wrote 4748 of 27956
SysError in : error writing to file /ncc/ttpro/root_data/2000/04/03/tt45.ripe.net.20000403.root (No space left on device)
SysError in : error writing to file /ncc/ttpro/root_data/2000/04/03/tt45.ripe.net.20000403.root (No space left on device)
...
A. This indicates that the filesystem holding the current month of ROOT data has filled up. Please consult the documentation of the make_root script for details on how to recover.
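
Before starting the recovery it can help to confirm how much space is actually left. The following is a minimal sketch, not part of the make_root documentation; the function name free_kb is hypothetical, and the path in the usage comment is taken from the error message above.

```shell
# Print the free space, in kilobytes, of the filesystem holding a path.
# -P forces the portable one-line-per-filesystem output, -k reports in KB.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage (path taken from the error message above):
#   free_kb /ncc/ttpro/root_data
```

The sketch only reports the available space; how to free space is described in the make_root documentation.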
 
Q. The daily data processing job reports:
stc: Can't move volume: I/O error
What should we do?
A. The problem is caused by a recent reboot, which somehow leaves the tape changer or its driver software in the OS in an inconsistent state. It can only be fixed by unloading and reloading the tape magazine, which requires physical access to the device.
  • If no tape has been loaded from the magazine:
    • Press the unload button.
    • Wait for the magazine to be unloaded.
    • Press the load button.

  • If a tape was loaded into the tape drive:
    • Remove the tapes from the magazine.
    • Unload and reload the (now empty) magazine.
    • Eject the tape from the drive (the 'robot' arm, which at first blocked the tape drive, now leaves enough room to get the tape out manually).
    • Unload the magazine.
    • Put all tapes back (in the ttraffic case, order is _not_ important!).
    • Load the magazine.
    • Update the file which matches the tape 'labels' to slot numbers in the autochanger:

      ssh kauri
      su ttraffic
      index-jukebox

       

Q. When do we switch to new tapes for data storage?
A. There is no clear schedule yet. On the one hand we want to make good use of the available tape space; on the other hand it should not take too long to find a file on a given tape. A workable compromise (with ~40 testboxes) is to switch tapes every 4 months. The other option is to wait for a 'store-data' job to fail and take corrective action the day after.
 
Q. What is the procedure for switching to new tapes?
A. This is still manual work, since it is needed only infrequently. The procedure below assumes that both the raw data and root data tapes are switched; it is trivial to distill the steps required if only one tape needs switching.
  1. Log in to KAURI, the master machine for /ncc/ttpro/data/tapes

  2. Find labels of next two tapes:

    su ttraffic
    cd /ncc/ttpro/data/tapes
    du -k -s tape*

    The first two directories reporting a usage of 2 are candidates; we will refer to these as tapeXXXXX and tapeYYYYY (for example: tape00004 and tape00005). If no empty tapes can be identified in the online tape database, new tapes will have to be configured (i.e. labeled and inserted in the tape jukebox).

    You can also use the output of describe-jukebox:

    | tape00017: TYPE: ROOT (8057 files)
    | tape00035: TYPE: RAW (8282 files)
    | tape00009: TYPE: ROOT (5365 files)
    > tape00036: TYPE: ROOT (412 files)
    | tape00011: TYPE: ROOT (9236 files)
    | tape00023: TYPE: ROOT (6123 files)
    | tape00020: TYPE: ROOT (6906 files)
    | tape00034: TYPE: ROOT (2305 files)
    | tape00032: TYPE: ROOT (3510 files)
    | tape00014: TYPE: ROOT (7064 files)
    > tape00037: TYPE: RAW (839 files)
    | tape00029: TYPE: ROOT (4880 files)
        ">" marks active tapes used for RAW/ROOT data
    
    describe-jukebox uses the tape* directories to analyze the content and marks each tape as either ROOT or RAW. If it reports "Both", this is an error and you should investigate the current-* files.

    describe-jukebox does no tape access itself, so it relies on the output of index-jukebox.

  3. Confirm that the two candidate tapes have no data stored:

    ls tapeXXXXX
    ls tapeYYYYY

    Each command should list only the file 'labeldate'. If either of them lists more than this file, choose another tape number and check again.
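
The check above can be sketched as a small helper that scans all tape directories at once. This is an illustration only: the function name list_empty_tapes is hypothetical, and it assumes, as described in this step, that an unused tape directory contains exactly one entry named 'labeldate'.

```shell
# List the tape directories under $1 whose only entry is 'labeldate'.
list_empty_tapes() {
    for d in "$1"/tape*/; do
        # Skip the unexpanded pattern if no tape directory exists.
        [ -d "$d" ] || continue
        # ls -A lists every entry except . and ..
        if [ "$(ls -A "$d")" = "labeldate" ]; then
            basename "$d"
        fi
    done
}

# Usage:
#   list_empty_tapes /ncc/ttpro/data/tapes
```

The output is a list of candidates only; whether a tape is actually present in the jukebox still has to be confirmed in the next step.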

  4. Confirm that both tapes are present in the jukebox magazine:

    grep XXXXX jukebox-index
    grep YYYYY jukebox-index

    (jukebox-index contains a table matching jukebox slot number to tape number.)

    If either tape is not found, choose another tape and go back to step 2.
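
As a sketch, the two grep commands can be wrapped so that a missing tape is reported explicitly. The function name tapes_in_index is hypothetical; the helper makes no assumption about the layout of jukebox-index beyond the tape number appearing somewhere in it.

```shell
# Succeed only if every given tape number appears in the index file.
# Usage: tapes_in_index INDEXFILE TAPENUMBER...
tapes_in_index() {
    index=$1
    shift
    for t in "$@"; do
        if ! grep -q -- "$t" "$index"; then
            echo "tape $t not found in $index" >&2
            return 1
        fi
    done
}

# Usage (XXXXX and YYYYY are the candidate tape numbers):
#   tapes_in_index jukebox-index XXXXX YYYYY
```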

  5. When the checks above yield a positive result, retrieve the slot number (position in magazine) and label of the current raw data tape:

    grep `cat current-rawdata-tape` jukebox-index
    (remember or write down the result)
    Then update the current raw data and root data tape numbers:

    echo XXXXX >current-rawdata-tape
    echo YYYYY >current-rootdata-tape

  6. Recover (partially) failed store-data jobs

    If tape(s) are switched after running into problems in daily processing, the failed job(s) have to be recovered. First make sure the tape drive in the jukebox is empty by executing the command

    empty-drive

    Now for each day that needs recovery start a store-data job; i.e. if not all raw data could be stored and today is Tuesday, run

    store-data -raw /ncc/ttpro/data/tapes/collected_data.Tue
    Similarly, if it was the ROOT data that failed to fit on tape run

    store-data -root /ncc/ttpro/data/tapes/root_data.lastday.Tue
    possibly followed (if that file was modified today) by

    store-data -root /ncc/ttpro/data/tapes/root_data.latefiles.Tue
    Note: if you start working on this first thing in the morning, chances are the index file with collected files is still named /ncc/ttpro/data/tapes/collected_data. It's best to use your own judgement and have a look at the output of
    ls -ltr /ncc/ttpro/data/tapes/ | tail
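
As an aid to that judgement, the most recently modified index file for a given prefix can be printed directly. This is only a sketch: the function name newest_index is hypothetical, and it assumes the names matching the prefix are plain files.

```shell
# Print the most recently modified file in directory $1 whose name starts with $2.
newest_index() {
    # ls -t sorts by modification time, newest first.
    ls -t "$1"/"$2"* 2>/dev/null | head -n 1
}

# Usage:
#   newest_index /ncc/ttpro/data/tapes collected_data
```

The result should still be checked by eye before it is passed to store-data.
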
  7. Archive the raw data tape and configure a new one

    Because old raw data are only needed in the rare case that the merging of sent/received data has to be redone, there is no need to keep these tapes online in the jukebox. Therefore we archive the old tape in the tape safe downstairs. First make a copy of the tape:

    • Log in to KAURI, the machine to which the tape drives are physically connected
    • Insert a fresh tape into the stand-alone tape drive (located directly on top of the machine)
    • Load and copy the old tape (where ZZZZZ is the old tape's label, grepped in step 5 above):

      empty-drive
      load-tape ZZZZZ
      copy-tape

    • Verify that the number of files copied covers all of the files stored in /ncc/ttpro/data/tapes/tapeZZZZZ
    • Eject the tape and physically label it with the following text:

      TTM tapeZZZZZ (duplicate)

    • Get the original tape out of the jukebox:

      empty-drive
      load-tape YYYYY      
      (this moves the jukebox arm away from our tape)

      Open the cover and take the tape from the correct slot, i.e. the one noted in step 5 above (slots are numbered 0..11, starting from the left); replace it with a new blank tape and close the cover.

    • Verify that you have the correct tape by inserting it in the standalone drive and reading the first file:

      mt -t /dev/nrst37 rewind
      dd if=/dev/nrst37
    • Eject this original tape and physically label it with the text:

      TTM tapeZZZZZ
    • Update the status of tape-number <-> jukebox-slot index (this will automatically assign a new number to the newly inserted tape):

      ssh kauri
      su ttraffic
      index-jukebox

    • Archive the old tapes: the original is stored in the safe in the basement (top drawer), the duplicate is stored off-site (for now at René's home). At this point switch the white tabs on the tapes to the 'read-only' position.