HOWTO: Run the pipeline

These instructions are based on what's needed to run the CASAVA 1.7 variant of the BaseCaller+ELAND pipeline. This uses components from htsworkflow as well as the Illumina-provided CASAVA 1.7 and BCLConverter 1.7.1.

There may be a copy of BCLConverter in OLB-1.8 as well.

Logically the steps are as follows:

  1. If needed, clean up the oldest runfolders.
  2. Wait for RTA to complete.
    • When RTA finishes, it should create RunInfo.xml slightly after creating Basecalling_Netcopy_complete_SINGLEREAD.txt or Basecalling_Netcopy_READ2.txt.

  3. Create the config file using retrieve_config.
  4. Create the Makefiles using setupBclToQseq.py.
  5. Go into Data/Intensities/BaseCalls and run make recursive in a way that won't terminate when you log out.
  6. Make sure it finishes.
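Step 2 can be scripted instead of checked by hand. This is a minimal sketch that assumes the marker files named above appear at the top level of the runfolder (their exact location is an assumption). `wait_for_rta` and its timeout and polling-interval arguments are hypothetical, not part of htsworkflow or CASAVA.

```shell
# Hypothetical helper: block until RTA looks finished in a runfolder.
# Assumption: RunInfo.xml and the Basecalling_Netcopy_* markers are
# written at the top level of the runfolder.
wait_for_rta() {
    local runfolder=$1
    local timeout=${2:-86400}   # give up after this many seconds
    local interval=${3:-60}     # seconds between checks
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # One of the two netcopy markers, plus RunInfo.xml shortly after it.
        if { [ -e "$runfolder/Basecalling_Netcopy_complete_SINGLEREAD.txt" ] ||
             [ -e "$runfolder/Basecalling_Netcopy_READ2.txt" ]; } &&
           [ -e "$runfolder/RunInfo.xml" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "RTA did not finish in $runfolder within ${timeout}s" >&2
    return 1
}
```

For example, `wait_for_rta "$RUNFOLDER" 86400 300 && echo "RTA done"` checks every five minutes for up to a day.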

The commands I'm running:

screen -dR
cd $RUNFOLDER
retrieve_config -r . -f $FLOWCELLID
~/proj/BclConverter-1.7.1/bin/setupBclToQseq.py -i Data/Intensities/BaseCalls/ -p Data/Intensities/ -o Data/Intensities/BaseCalls/ --in-place --overwrite --GERALD config-auto.txt
cd Data/Intensities/BaseCalls
make -j6 recursive
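The same sequence can be wrapped in a function so that a failing step stops the run instead of carrying on into make. This is a sketch that assumes the tool locations shown above. The variables `RETRIEVE_CONFIG`, `SETUP_BCL`, `MAKE` and `MAKE_JOBS` are hypothetical and only exist so the paths can be overridden or stubbed out for a dry run.

```shell
# Hypothetical wrapper around the CASAVA 1.7 commands above.
# Defaults are the paths used on this page; override them via the environment.
RETRIEVE_CONFIG=${RETRIEVE_CONFIG:-retrieve_config}
SETUP_BCL=${SETUP_BCL:-$HOME/proj/BclConverter-1.7.1/bin/setupBclToQseq.py}
MAKE=${MAKE:-make}
MAKE_JOBS=${MAKE_JOBS:-6}

run_casava17() {
    local runfolder=$1 flowcell=$2
    # Subshell, so the cd calls do not change the caller's directory
    # and "exit" only aborts this run.
    (
        cd "$runfolder" || exit 1
        "$RETRIEVE_CONFIG" -r . -f "$flowcell" || exit 1
        "$SETUP_BCL" -i Data/Intensities/BaseCalls/ -p Data/Intensities/ \
            -o Data/Intensities/BaseCalls/ --in-place --overwrite \
            --GERALD config-auto.txt || exit 1
        cd Data/Intensities/BaseCalls || exit 1
        "$MAKE" -j"$MAKE_JOBS" recursive
    )
}
```

Run it inside the `screen -dR` session as before, e.g. `run_casava17 "$RUNFOLDER" "$FLOWCELLID"`.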

RTA 1.9 and 1.10 (HiSeq) require the version of setupBclToQseq.py that comes with OLB 1.9. The updated commands are:

/home/diane/.local/bin/htsw-get-config -r . -f $FLOWCELLID
/home/diane/proj/OLB-1.9.0/bin/setupBclToQseq.py -b Data/Intensities/BaseCalls/ --in-place --overwrite --GERALD config-auto.txt
(The --in-place option makes Data/Intensities/BaseCalls/ the output directory; the Intensities directory need not be specified explicitly if it is the parent of BaseCalls.)
cd Data/Intensities/BaseCalls
make -j6 recursive
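Which setupBclToQseq.py to call depends on the RTA version. This is a hedged sketch of that choice: `pick_setup_bcl` is a hypothetical helper, the RTA version has to be supplied by hand (this page does not say where to read it from), and treating every RTA release before 1.9 as compatible with BclConverter 1.7.1 is an assumption. Versions after 1.10 are not covered by these instructions, so the helper refuses them.

```shell
# Hypothetical helper: print the setupBclToQseq.py matching an RTA version.
pick_setup_bcl() {
    local rta=$1
    local major=${rta%%.*}      # "1" from "1.10.15"
    local rest=${rta#*.}
    local minor=${rest%%.*}     # "10" from "1.10.15"; numeric compare below
    if [ "$major" -eq 1 ] && [ "$minor" -lt 9 ]; then
        # Assumption: pre-1.9 RTA output works with BclConverter 1.7.1.
        echo "$HOME/proj/BclConverter-1.7.1/bin/setupBclToQseq.py"
    elif [ "$major" -eq 1 ] && [ "$minor" -le 10 ]; then
        # RTA 1.9 and 1.10 (HiSeq) need the OLB 1.9 version.
        echo /home/diane/proj/OLB-1.9.0/bin/setupBclToQseq.py
    else
        echo "RTA $rta is not covered by these instructions" >&2
        return 1
    fi
}
```

The minor version is compared numerically so that 1.10 sorts after 1.9, which a plain string comparison would get wrong.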

I'm using -j6 because -j8 seemed to overload the disks, though that was when the analysis ran over NFS on jumpgate. -j8 may be OK now.
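If screen is not available, nohup can also keep make running after you log out, with the job count kept in one place so that trying -j8 is a one-word change. This sketch assumes you are already in Data/Intensities/BaseCalls; `start_make`, the `MAKE_JOBS` variable and the make.log file name are hypothetical, and the default of 6 jobs simply follows the note above.

```shell
# Hypothetical helper: start "make -jN recursive" in the current directory
# (meant to be Data/Intensities/BaseCalls) so that it survives logout.
# All output goes to make.log; the background PID is printed.
start_make() {
    local jobs=${MAKE_JOBS:-6}   # -j6 by default; raise it if the disks cope
    nohup "${MAKE:-make}" -j"$jobs" recursive > make.log 2>&1 &
    echo "$!"
}
```

For example, `MAKE_JOBS=8 start_make`, then `tail -f make.log` to watch progress. After logging back in, the log is the easiest way to see whether the build finished or stopped with an error.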

When it completes, the archived qseq files and the export.txt or sequence.txt files are in /woldlab/loxcyc/data00/solexa-sequence/flowcells/$FLOWCELLID.
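A quick way to confirm that the results actually landed in the flowcell directory. This is only a sketch: `check_archive` is a hypothetical helper, and the name patterns (`*qseq*`, `*export.txt*`, `*sequence.txt*`) are guesses at the archived file names, which may carry compression suffixes.

```shell
# Hypothetical helper: report whether a flowcell directory holds archived
# qseq files plus export.txt or sequence.txt files.
check_archive() {
    local dir=$1
    local qseq results
    qseq=$(find "$dir" -type f -name '*qseq*' | wc -l)
    results=$(find "$dir" -type f \( -name '*export.txt*' -o -name '*sequence.txt*' \) | wc -l)
    echo "$dir: $qseq qseq file(s), $results export/sequence file(s)"
    # Succeed only if both kinds of file are present.
    [ "$qseq" -gt 0 ] && [ "$results" -gt 0 ]
}
```

For example, `check_archive /woldlab/loxcyc/data00/solexa-sequence/flowcells/$FLOWCELLID`.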

Software

On tardigrade/rotifer.

  1. CASAVA 1.7's bin directory is /usr/local/casava-1.7.0/bin; add it to your PATH.
  2. htsworkflow is in ~diane/htsworkflow

export PYTHONPATH=~diane/proj/htsworkflow
export PATH=$PATH:~diane/proj/htsworkflow/scripts
  3. BCLConverter is in ~/proj/BclConverter-1.7.1/bin/setupBclToQseq.py
    • I don't know whether you need to add anything from this package to your PATH or PYTHONPATH.
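The environment settings above can go in a shell startup file such as ~/.bashrc. This sketch assumes that setup; `path_append` is a hypothetical helper that only adds a directory to PATH once, so re-sourcing the file does not keep growing PATH.

```shell
# Hypothetical helper: append a directory to PATH unless it is already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already on PATH, nothing to do
        *) PATH=${PATH:+$PATH:}$1 ;;  # append, no leading colon if PATH is empty
    esac
    export PATH
}

export PYTHONPATH=~diane/proj/htsworkflow
path_append /usr/local/casava-1.7.0/bin
path_append ~diane/proj/htsworkflow/scripts
```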

Archive Old Runfolder

  1. runfolder --clean $RUNFOLDER
  2. cd done, then edit the compress program to list the runfolders to compress.
  3. nohup ./compress &

  4. (This depends on XMPP/Jabber messaging to notify you when it's done.)
  5. Run make to get md5sums of the compressed files in done (or other scratch space).
  6. rsync $RUNFOLDER.tgz rsync://packrat/sd${space}
  7. Drop the md5 Makefile into the target drive.
  8. Run make there to get md5s of the copied files.
  9. Make the md5s on both the archival disk and the scratch space and compare them.
  10. Once both copies match, remove the runfolder and the compressed copy from the scratch space.
  11. TODO: I need to fix the disk inventory control system so it tracks which disk has which runfolder.
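The md5 steps boil down to checking that the archived copy matches the scratch copy before deleting anything. The md5 Makefile mentioned above is not shown on this page, so this is a stand-in sketch that calls md5sum directly; `verify_copy` is a hypothetical helper and assumes GNU coreutils.

```shell
# Hypothetical helper: succeed only if two files have identical md5 sums.
verify_copy() {
    local scratch=$1 archive=$2
    local a b
    # Read via stdin so both lines look like "<hash>  -" and compare directly.
    a=$(md5sum < "$scratch") || return 1
    b=$(md5sum < "$archive") || return 1
    if [ "$a" = "$b" ]; then
        echo "OK: $scratch matches $archive"
    else
        echo "MISMATCH: $scratch vs $archive" >&2
        return 1
    fi
}
```

For example, `verify_copy "$RUNFOLDER.tgz" "$ARCHIVE_DIR/$RUNFOLDER.tgz"`, where `ARCHIVE_DIR` is a hypothetical mount point for the archival disk; only remove the scratch copy once this succeeds.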



RunPipeline (last edited 2011-05-03 23:22:42 by diane)