Progressive Cactus

Progressive Cactus (link)

Typical Usage

/exports/software/progressiveCactus/bin/runProgressiveCactus.sh \
    /path/to/align.txt \
    /path/to/data/directory \
    /path/to/output/align.hal \
    --database kyoto_tycoon \
    --maxThreads 64

Input format

align.txt:

(tax1,(tax2,tax3));
tax1 /path/to/tax1.fa
tax2* /path/to/tax2.fa
tax3 /path/to/tax3.fa

Contains a newick tree on line 1 (don’t forget the semicolon) followed by a set of mappings from taxon tames to (soft-masked) fasta files.

A taxon name with and asterisk is marked as reference quality (if no asterisks, all sequences will be treated as reference quality)

Gridengine wrapper

progressivecactus.sh (Progressive Cactus alignment followed by conversion to an Assembly Hub and a MAF file):

#!/bin/bash

#$ -V
#$ -cwd
#$ -j y
#$ -o $JOB_ID.log
#$ -pe smp 64

SCRATCH=/scratch/$USER/wga/$JOB_ID
mkdir -p $SCRATCH/$DIR
cd $SCRATCH

export PYTHONPATH=$PYTHONPATH:/exports/software/progressiveCactus/submodules/biopython
. /exports/software/progressiveCactus/environment

nice -n 10 /exports/software/progressiveCactus/bin/runProgressiveCactus.sh $DATA/$INFILE ./$DIR ./$DIR/$HALFILE --database kyoto_tycoon --maxThreads $NSLOTS

if ! [ -z $HUBFILE ]; then
nice -n 10 /exports/software/progressiveCactus/submodules/hal/bin/hal2assemblyHub.py --gcContent --maxThreads=$NSLOTS --defaultMemory=10000000000 ./$DIR/$HALFILE ./$DIR/$HUBFILE
fi

if ! [ -z $MAFFILE ]; then
nice -n 10 /exports/software/progressiveCactus/submodules/hal/bin/hal2mafMP.py ./$DIR/$HALFILE $MAFFILE --numProc $NSLOTS --refGenome $REFGENOME
fi

rsync -av --remove-source-files ./$DIR $DATA/output

find ./$DIR -depth -type d -empty -delete

qsub command

qsub -v SEQFILE=seq.fa \
     -v DIR=input \
     -v DATA=/path/to/data \
     -v INFILE=align.txt \
     -v HALFILE=align.hal \
     -v HUBFILE=align.hub \
     -v MAFFILE=align.maf \
     -v REFGENOME=reference_genome_name \
     -pe smp 32 \
     /path/to/repeatmasker.sh