FlowSim is a simulator for pyrosequencing. It currently supports Roche 454 sequencing, and attempts to accurately model flowgram generation, including base and quality calling. The output is written as an SFF file. Currently, only the original GS20 generation is supported, but the newer FLX and Titanium generations should be easy to add as soon as we have reasonably accurate statistical models for the flow distributions, and a better idea of how to do quality calling.

Installation

The quick and dirty way to acquire FlowSim is to download the statically linked Linux executable from

    http://malde.org/~ketil/biohaskelll/linux_binaries/

If your system is similar enough to mine, this should work, but please read the next paragraph as well!

Only slightly more involved, and a lot less dirty, is to install the cabal-install package, which provides the 'cabal' utility. You will also need a working GHC (on Windows, this probably includes MinGW or something similar). This should work on most current operating systems. You then only need to run 'cabal install flowsim' to download the sources for FlowSim and its dependencies from HackageDB, compile them, and install them. The same command works for any other Haskell program on HackageDB.

If you want the cutting edge, you can get the latest sources using darcs: 'darcs get http://malde.org/~ketil/biohaskell/flowsim'. If you have everything else installed (i.e. the GHC compiler and the various libraries), you should be able to compile and install FlowSim by typing, in the flowsim directory:

    chmod +x Setup.hs
    ./Setup.hs configure
    ./Setup.hs build
    ./Setup.hs install

You probably want to check out the options these commands accept first (e.g. './Setup.hs configure --help'). If configuration fails because of missing dependencies, you need to install them, either with 'cabal install' or by downloading them manually from HackageDB (http://hackage.haskell.org/) and installing them.
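To illustrate the flowgram generation mentioned in the introduction, here is a minimal sketch of an ideal, noise-free flowgram and a naive base caller. It is not FlowSim's code: the functions `idealFlowgram` and `callBases` are hypothetical, the TACG flow cycle is assumed (it is the order commonly used by 454 instruments), and the rounding caller ignores the flow-signal noise that FlowSim actually sets out to model.

```haskell
-- Hypothetical sketch, not FlowSim's code: ideal (noise-free) flowgrams
-- and a naive rounding base caller for a cyclic flow order such as "TACG".
module Main (main) where

-- | Ideal flowgram of a read: for each flow, cycling through the
-- nucleotides in the flow order, the length of the homopolymer run that
-- would be incorporated. Returns Nothing if the flow order is empty or the
-- read contains a base that is never flowed (it could never be incorporated).
idealFlowgram :: String -> String -> Maybe [Int]
idealFlowgram flowOrder bases
  | null flowOrder || any (`notElem` flowOrder) bases = Nothing
  | otherwise = Just (go (cycle flowOrder) bases)
  where
    go _ [] = []              -- read exhausted: stop flowing
    go [] _ = []              -- unreachable: the flow cycle is infinite
    go (f : fs) rest =
      let (run, rest') = span (== f) rest
       in length run : go fs rest'

-- | Naive base caller: round each flow signal to the nearest integer and
-- emit that many copies of the flowed nucleotide (negative signals give none).
callBases :: String -> [Double] -> String
callBases [] _ = []
callBases flowOrder signals = concat (zipWith emit signals (cycle flowOrder))
  where
    emit s f = replicate (max 0 (round s)) f

main :: IO ()
main = do
  print (idealFlowgram "TACG" "TTAG")               -- Just [2,1,0,1]
  putStrLn (callBases "TACG" [2.1, 0.9, 0.2, 1.05]) -- TTAG
```

The read TTAG yields the signals 2, 1, 0, 1 for the flows T, A, C, G: the homopolymer TT is incorporated in a single flow, and the C flow produces no signal. Rounding noisy signals back to integers is the simplest possible base caller; homopolymer length errors arise exactly when a signal lands on the wrong side of a rounding boundary.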
Using FlowSim

Usage is, in brief:

    clonesim [-c #] [] | flowsim [-G gen] [-o ]

Clonesim¹ generates a set of clones from an input file (essentially, fragments of random lengths and orientations taken from random locations in the input), and flowsim simulates the pyrosequencing of the clones. The 'gen' parameter may currently be GS20, Titanium or EmpTitanium. Both clonesim and flowsim accept the --help option for more extensive usage information.

For example, to generate an SFF file containing 100,000 simulated shotgun reads from genome.fasta, using the empirical flow distributions:

    clonesim -c 100000 genome.fasta | flowsim -G Emp -o sim.sff

If you are interested in using this, or in further development, please contact me at ketil dot malde at imr dot no.

¹ Note that clonesim was added (or rather, factored out from flowsim) in version 0.2.6,
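The clone generation that clonesim performs (fragments of random lengths and orientations from random locations in the input) can be sketched as below. This is a hypothetical illustration, not clonesim's implementation: the names `clones`, `randomClone` and `uniform`, the uniform length range, and the clipping of fragments at the end of the sequence are all assumptions. A tiny linear congruential generator stands in for a proper random number library so that the sketch needs only `base`.

```haskell
-- Hypothetical sketch of clonesim-style fragment sampling, not its actual code.
-- A tiny linear congruential generator keeps it dependency-free (base only).
module Main (main) where

import Data.Bits (shiftR)
import Data.List (isInfixOf)
import Data.Word (Word64)

-- | One step of Knuth's MMIX linear congruential generator.
nextState :: Word64 -> Word64
nextState s = s * 6364136223846793005 + 1442695040888963407

-- | Draw an Int in [lo, hi] (requires lo <= hi) and the next state.
-- Uses the high bits of the state; the small modulo bias is ignored here.
uniform :: Int -> Int -> Word64 -> (Int, Word64)
uniform lo hi s =
  let s' = nextState s
      range = fromIntegral (hi - lo + 1) :: Word64
   in (lo + fromIntegral ((s' `shiftR` 33) `mod` range), s')

-- | Reverse complement of a DNA string; other characters are kept as-is.
revComp :: String -> String
revComp = reverse . map comp
  where
    comp 'A' = 'T'
    comp 'T' = 'A'
    comp 'C' = 'G'
    comp 'G' = 'C'
    comp c = c

-- | One clone: a random start position, a random length in [minLen, maxLen]
-- (clipped at the end of the genome, so it may be shorter), and a random
-- orientation.
randomClone :: Int -> Int -> String -> Word64 -> (String, Word64)
randomClone minLen maxLen genome s0 =
  let (start, s1) = uniform 0 (length genome - 1) s0
      (len, s2) = uniform minLen maxLen s1
      (strand, s3) = uniform 0 1 s2
      frag = take len (drop start genome)
   in (if strand == 0 then frag else revComp frag, s3)

-- | 'count' clones from a seed; empty if the parameters make no sense.
clones :: Int -> Int -> Int -> String -> Word64 -> [String]
clones count minLen maxLen genome seed
  | null genome || minLen < 1 || maxLen < minLen = []
  | otherwise = go count seed
  where
    go k s
      | k <= 0 = []
      | otherwise =
          let (c, s') = randomClone minLen maxLen genome s
           in c : go (k - 1) s'

main :: IO ()
main = do
  let genome = "ACGTTGCAAGGCTTACGATCGGATCCATGCAAT"
      cs = clones 5 8 15 genome 2024
  mapM_ putStrLn cs
  -- Every clone comes from the forward or the reverse strand, so this prints True
  print (all (\c -> c `isInfixOf` genome || c `isInfixOf` revComp genome) cs)
```

Here `count` plays roughly the role of clonesim's `-c` option. Threading the generator state explicitly keeps the output reproducible for a given seed; a real tool would more likely use the `random` package and read its sequences from a FASTA file.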