.\"manual page for xsact .\"don't leave home without it .\" .TH XSACT 1 "FEBRUARY 2006" .SH NAME xsact \- cluster EST sequences .SH SYNOPSIS .B xsact -k .I kval .B -n .I threshold .B [options] [file..] .SH DESCRIPTION .B xsact Clusters ESTs in a FASTA-formatted file by building a suffix identifying exactly matching blocs, and measuring the maximal colinear set of matches between pairs of sequences. .SH OPTIONS .IP -k Specify the word size .IP -n Specify threshold for clustering .IP -p Prefix length. Increase this to reduce memory consumption. .IP -u Upper case input only, lower case characters are treated as 'N's. .IP -s Add singleton clusters to output. .IP -i Don't ignore matches in simple (low complexity) repeats. .PP Output options .IP -o Specify output file. .IP -L Each cluster is output as one line, containing its sequence labels. .IP -N Output a set of newick-formatted trees. .IP -U Output full sequences, each cluster separated by a comment line (similar to UniGene). .IP -P Output all sequence pairs. .PP Parallelism options .IP -m Specify a prefix for which to generate intermediate files. .IP -c Construct clustering from intermediate files with the specified prefix length. .SH EXAMPLES Typical usage: xsact -k 20 -n 70 -s -x -p 2 input.seq -L -o input.clusters Good values for k and n are in the ranges 18..30 and 50..90, respectively. For anything beyond trivial data sets, use .B -x and set .B -p between 1 and 3. Here's a script that clusters the file using four parallel processes: for a in A C G T; do xsact -k 20 -p 2 -m $a input.seq & done wait xsact -k 20 -n 70 -c 1 -s -L input.seq -o input.clusters .SH BUGS Large intermediate files are generated. Keep an eye on free disk space! .SH AUTHOR Ketil Malde