List of sequence alignment software

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.

Database search only

Name	Description	Sequence Type*	Link	Authors	Year
BLAST	local search with fast k-tuple heuristic (Basic Local Alignment Search Tool)	Both	NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only)	Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ^[1]	1990
CS-BLAST	sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. Its position-specific iterative version, CSI-BLAST, is more sensitive than PSI-BLAST	Protein	CS-BLAST server download	Biegert A, Söding J^[2]	2009
FASTA	local search with fast k-tuple heuristic, slower but more sensitive than BLAST	Both	EBI DDBJ GenomeNet PIR (protein only)
GGSEARCH / GLSEARCH	Global:Global (GG), Global:Local (GL) alignment with statistics	Protein	FASTA server
HMMER	local and global search with profile hidden Markov models, more sensitive than PSI-BLAST	Both	download	Durbin R, Eddy SR, Krogh A, Mitchison G^[3]	1998
HHpred / HHsearch	pairwise comparison of profile hidden Markov models; very sensitive, but can only search alignment databases (Pfam, PDB, InterPro...)	Protein	server download	Söding J^[4]	2005
IDF	Inverse Document Frequency	Both	download
Infernal	profile SCFG (stochastic context-free grammar) search	RNA	download	Eddy S
PSI-BLAST	position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLAST	Protein	NCBI PSI-BLAST	Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ^[5]	1997
SAM	local and global search with profile hidden Markov models, more sensitive than PSI-BLAST	Both	SAM	Karplus K, Krogh A^[6]	1999
SSEARCH	Smith-Waterman search, slower but more sensitive than FASTA	Both	EBI DDBJ
*Sequence Type: Protein or nucleotide
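
BLAST and FASTA are both described above as using a fast k-tuple heuristic: they first look for short exact word matches between the query and a database sequence and only then extend promising hits. The sketch below illustrates only that seeding step, under simplifying assumptions: exact k-word matches and a plain count of hits per diagonal, loosely in the spirit of FASTA's diagonal scan. It omits neighbourhood words, substitution matrices, gapped extension and E-value statistics, and the function names are hypothetical, not taken from either program.

```python
from collections import defaultdict

# Illustrative sketch of k-tuple seeding; names are hypothetical.


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_seeds(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs where both sequences share an exact k-word."""
    index = kmer_index(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits.append((i, j))
    return hits


def best_diagonal(hits: list[tuple[int, int]]) -> tuple[int, int]:
    """Group hits by diagonal (target_pos - query_pos) and return
    (diagonal, hit_count) for the most populated diagonal."""
    if not hits:
        raise ValueError("no seed hits")
    counts: dict[int, int] = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return max(counts.items(), key=lambda item: item[1])
```

For example, find_seeds("GATTACA", "CCGATTACAGG", 3) finds five 3-word hits that all lie on diagonal 2, which is where a real tool would concentrate its extension effort.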

Pairwise alignment

Name	Description	Sequence Type*	Alignment Type**	Link	Author	Year
Bioconductor Biostrings::pairwiseAlignment	Dynamic programming	Both	Both + Ends-free	site	P. Aboyoun	2008
BioPerl dpAlign	Dynamic programming	Both	Both + Ends-free	site	Y. M. Chan	2003
BLASTZ, LASTZ	Seeded pattern-matching	Nucleotide	Local	download, download	Schwartz et al.	2004, 2009
DNADot	Web-based dot-plot tool	Nucleotide	Global	server	R. Bowen	1998
DOTLET	Java-based dot-plot tool	Both	Global	applet	M. Pagni and T. Junier	1998
GGSEARCH, GLSEARCH	Global:Global (GG), Global:Local (GL) alignment with statistics	Protein	Global in query	FASTA server	W. Pearson	2007
JAligner	Open source Java implementation of Smith-Waterman	Both	Local	JWS	A. Moustafa	2005
LALIGN	Multiple, non-overlapping, local similarity (same algorithm as SIM)	Both	Local non-overlapping	server FASTA server	W. Pearson	1991 (algorithm)
mAlign	modelling alignment; models the information content of the sequences	Nucleotide	Both	[1] [2]	D. Powell, L. Allison and T. I. Dix	2004
matcher	Waterman-Eggert local alignment (based on LALIGN)	Both	Local	Pasteur	I. Longden (modified from W. Pearson)	1999
MCALIGN2	explicit models of indel evolution	DNA	Global	server	J. Wang et al.	2006
MUMmer	suffix tree based	Nucleotide	Global	download	S. Kurtz et al.	2004
needle	Needleman-Wunsch dynamic programming	Both	SemiGlobal	EBI Pasteur	A. Bleasby	1999
Ngila	logarithmic and affine gap costs and explicit models of indel evolution	Both	Global	download	R. Cartwright	2007
Path	Smith-Waterman on protein back-translation graph (detects frameshifts at protein level)	Protein	Local	server download	M. Gîrdea et al.	2009
PatternHunter	Seeded pattern-matching	Nucleotide	Local	download	B. Ma et al.	2002–2004
ProbA (also propA)	Stochastic partition function sampling via dynamic programming	Both	Global	download	U. Mückstein	2002
PyMOL	"align" command aligns sequence & applies it to structure	Protein	Global (by selection)	site	W. L. DeLano	2007
REPuter	suffix tree based	Nucleotide	Local	download	S. Kurtz et al.	2001
SABERTOOTH	Alignment using predicted Connectivity Profiles	Protein	Global	download on request	F. Teichert, J. Minning, U. Bastolla, and M. Porto	2009
SEQALN	Various dynamic programming	Both	Local or Global	server	M.S. Waterman and P. Hardy	1996
SIM, GAP, NAP, LAP	Local similarity with varying gap treatments	Both	Local or global	server	X. Huang and W. Miller	1990–1996
SIM	Local similarity	Both	Local	servers	X. Huang and W. Miller	1991
SLIM Search	Ultra-fast blocked alignment	Both	Both	site	S. Inglis, J. Cleary, S. Irvine, L. Trigg, L. Bloksberg et al.	2004
SSEARCH	Local (Smith-Waterman) alignment with statistics	Protein	Local	EBI FASTA server	W. Pearson	1981 (Algorithm)
Sequences Studio	Java applet demonstrating various algorithms from ^[7]	Generic sequence	Local and global	code applet	A. Meskauskas	1997 (reference book)
SWIFT suite	Fast Local Alignment Searching	DNA	Local	site	K. Rasmussen, W. Gerlach	2005, 2008
stretcher	Memory-optimized but slow dynamic programming	Both	Global	Pasteur	I. Longden (modified from G. Myers and W. Miller)	1999
tranalign	Aligns nucleic acid sequences given a protein alignment	Nucleotide	NA	Pasteur	G. Williams (modified from B. Pearson)	2002
water	Smith-Waterman dynamic programming	Both	Local	EBI Pasteur	A. Bleasby	1999
wordmatch	k-tuple pairwise match	Both	NA	Pasteur	I. Longden	1998
YASS	Seeded pattern-matching	Nucleotide	Local	server download	L. Noe and G. Kucherov	2003–2007
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
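
Several pairwise tools in the table (needle, water, SSEARCH, JAligner) are built on the Needleman-Wunsch (global) and Smith-Waterman (local) dynamic-programming recurrences, which is the distinction the Alignment Type column draws. A minimal sketch of both recurrences, returning only the optimal score, under stated assumptions: a simple match/mismatch score and a linear gap penalty with arbitrary illustrative values. The listed tools typically use substitution matrices (such as BLOSUM62) and affine gap costs, and also perform a traceback to recover the alignment itself.

```python
# Illustrative sketch: scoring values and the function name are assumptions.


def align_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch (global) recurrence.
    local=True:  Smith-Waterman (local) recurrence; cell scores are floored
                 at 0 and the best cell anywhere in the matrix is reported.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised along the first row/column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]
```

With these parameters, aligning "ACGT" with "ACG" scores 4 globally (the unmatched T costs one gap) but 6 locally, because the local recurrence simply leaves the T out.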

Multiple sequence alignment

Name	Description	Sequence Type*	Alignment Type**	Link	Author	Year
ABA	A-Bruijn alignment	Protein	Global	download	B. Raphael et al.	2004
ALE	Manual alignment; some software assistance	Nucleotides	Local	download	J. Blandy and K. Fogel	1994 (latest version 2007)
AMAP	Sequence annealing	Both	Global	server	A. Schwartz and L. Pachter	2006
anon.	fast, optimal alignment of three sequences using linear gap costs	Nucleotides	Global	paper software	D. Powell, L. Allison and T. I. Dix	2000
BAli-Phy	Tree + multiple alignment; probabilistic/Bayesian; joint estimation	Both	Global	WWW+download	BD Redelings and MA Suchard	2005 (latest version 2010)
CHAOS/DIALIGN	Iterative alignment	Both	Local (preferred)	server	M. Brudno and B. Morgenstern	2003
ClustalW	Progressive alignment	Both	Local or Global	download EBI DDBJ PBIL EMBNet GenomeNet	Thompson et al.	1994
CodonCode Aligner	Multi alignment; ClustalW & Phrap support	Nucleotides	Local or Global	download	P. Richterich et al.	2003 (latest version 2009)
DIALIGN-TX and DIALIGN-T	Segment-based method	Both	Local (preferred) or Global	download and server	A.R.Subramanian	2005 (latest version 2008)
DNA Alignment	Segment-based method for intraspecific alignments	Both	Local (preferred) or Global	server	A.Roehl	2005 (latest version 2008)
FSA	Sequence annealing	Both	Global	download and server	R. K. Bradley et al.	2008
Geneious	Progressive/Iterative alignment; ClustalW plugin	Both	Local or Global	download	A.J. Drummond et al.	2005 (latest version 2009)
Kalign	Progressive alignment	Both	Global	server EBI MPItoolkit	T. Lassmann	2005
MAFFT	Progressive/iterative alignment	Both	Local or Global	GenomeNet MAFFT	K. Katoh et al.	2005
MARNA	Multiple Alignment of RNAs	RNA	Local	server download	S. Siebert et al.	2005
MAVID	Progressive alignment	Both	Global	server	N. Bray and L. Pachter	2004
MSA	Dynamic programming	Both	Local or Global	download	D.J. Lipman et al.	1989 (modified 1995)
MSAProbs	Dynamic programming	Protein	Global	download	Y. Liu, B. Schmidt, D. Maskell	2010
MULTALIN	Dynamic programming/clustering	Both	Local or Global	server	F. Corpet	1988
Multi-LAGAN	Progressive dynamic programming alignment	Both	Global	server	M. Brudno et al.	2003
MUSCLE	Progressive/iterative alignment	Both	Local or Global	server	R. Edgar	2004
Opal	Progressive/iterative alignment	Both	Local or Global	download	T. Wheeler and J. Kececioglu	2007
Pecan	Probabilistic/consistency	DNA	Global	download	B. Paten et al.	2008
POA	Partial order/hidden Markov model	Protein	Local or Global	download	C. Lee	2002
Probalign	Probabilistic/consistency with partition function probabilities	Protein	Global	server	Roshan and Livesay	2006
ProbCons	Probabilistic/consistency	Protein	Local or Global	server	C. Do et al.	2005
PROMALS3D	Progressive alignment/hidden Markov model/Secondary structure/3D structure	Protein	Global	server	J. Pei et al.	2008
PRRN/PRRP	Iterative alignment (especially refinement)	Protein	Local or Global	PRRP PRRN	Y. Totoki (based on O. Gotoh)	1991 and later
PSAlign	Alignment preserving non-heuristic	Both	Local or Global	download	S.H. Sze, Y. Lu, Q. Yang	2006
RevTrans	Combines DNA and protein alignment by back-translating the protein alignment to DNA	DNA/Protein (special)	Local or Global	server	Wernersson and Pedersen	2003 (newest version 2005)
SAGA	Sequence alignment by genetic algorithm	Protein	Local or Global	download	C. Notredame et al.	1996 (new version 1998)
SAM	Hidden Markov model	Protein	Local or Global	server	A. Krogh et al.	1994 (most recent version 2002)
StatAlign	Bayesian co-estimation of alignment and phylogeny (MCMC)	Both	Global	download	A. Novak et al.	2008
Stemloc	Multiple alignment and secondary structure prediction	RNA	Local or Global	download	I. Holmes	2005
T-Coffee	More sensitive progressive alignment	Both	Local or Global	server download	C. Notredame et al.	2000 (newest version 2008)
UGENE	Supports multiple alignment with MUSCLE and Kalign plugins, and local sequence alignment with the Smith-Waterman algorithm	Both	Local or Global	download	UGENE team	2009
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
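
The multiple-alignment programs above differ in how they build an alignment (progressive, iterative, consistency-based, Bayesian), but a common way to score a finished alignment is the sum-of-pairs (SP) objective, which adds up a pairwise score for every pair of sequences in every column. A hedged sketch with illustrative assumptions: a plain match/mismatch score, a linear gap cost, '-' as the only gap character, and gap-against-gap pairs scored as zero. Real programs generally use substitution matrices and affine gap costs; the function names here are hypothetical.

```python
from itertools import combinations

# Illustrative sum-of-pairs scoring; scores and names are assumptions.


def pair_score(x: str, y: str, match: int, mismatch: int, gap: int) -> int:
    """Score two characters from the same column; gap vs gap scores 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum the pairwise score of every pair of rows over every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, match, mismatch, gap)
    return total
```

For the three-row alignment AC-T / ACGT / A-GT, the two fully conserved columns contribute +3 each and the two gapped columns -3 each, so the SP score is 0.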

Genomics analysis

Name	Description	Sequence Type*	Link
SLAM	Gene finding, alignment, annotation (human-mouse homology identification)	Nucleotide	server
Mauve	Multiple alignment of rearranged genomes	Nucleotide	download
MGA	Multiple Genome Aligner	Nucleotide	download
Mulan	Local multiple alignments of genome-length sequences	Nucleotide	server
Multiz	Multiple alignment of genomes	Nucleotide	download
PLAST-ncRNA	Search for ncRNAs in genomes by partition function local alignment	Nucleotide	server
Sequerome	Profiling sequence alignment data with major servers/services	Nucleotide/peptide	server
AVID	Pairwise global alignment of whole genomes	Nucleotide	server
SIBsim4 / Sim4	A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns	Nucleotide	download
Shuffle-LAGAN	Pairwise glocal alignment of completed genome regions	Nucleotide	server
ACT (Artemis Comparison Tool)	Synteny and comparative genomics	Nucleotide	server
*Sequence Type: Protein or nucleotide

Motif finding

Name	Description	Sequence Type*	Link
MEME/MAST	Motif discovery and search	Both	server
BLOCKS	Ungapped motif identification from BLOCKS database	Both	server
eMOTIF	Extraction and identification of shorter motifs	Both	servers
Gibbs motif sampler	Stochastic motif extraction by statistical likelihood	Both	server (one of many implementations)
HMMTOP	Prediction of transmembrane helices and topology of proteins	Protein	homepage & download
JCoils	Prediction of coiled coils and leucine zippers	Protein	homepage & download
TEIRESIAS	Motif extraction and database search	Both	server
PRATT	Pattern generation for use with ScanProsite	Protein	server
ScanProsite	Motif database search tool	Protein	server
PHI-Blast	Motif search and alignment tool	Both	Pasteur
I-sites	Local structure motif library	Protein	server
*Sequence Type: Protein or nucleotide
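
ScanProsite and PRATT work with PROSITE-style patterns, a compact motif notation that maps naturally onto regular expressions. The sketch below translates a subset of that notation into a Python regular expression, assuming only these elements: '-' separators, 'x' for any residue, [ABC] for any of the listed residues, {ABC} for none of them, repetition counts e(n) or e(n,m), and the '<' / '>' sequence-terminus anchors. Other constructs (for example an anchor inside brackets) are rejected rather than guessed. It is not ScanProsite's own implementation, and the function name is hypothetical.

```python
import re

# Illustrative PROSITE-subset translator; the function name is hypothetical.
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    parts = []
    for element in pattern.split("-"):
        n_term = element.startswith("<")
        c_term = element.endswith(">")
        core = element.lstrip("<").rstrip(">")
        m = ELEMENT.fullmatch(core)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            piece = "."  # any residue
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            piece = residue  # single residue or [..] class is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + piece + ("$" if c_term else ""))
    return "".join(parts)
```

For instance, the pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. becomes C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H, which can then be passed to re.search to scan a protein sequence.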

Benchmarking

Name	Link	Authors
BAliBASE	download	Thompson, Plewniak, Poch
HOMSTRAD	download	Stebbings, Mizuguchi
Oxbench	download	Raghava, Searle, Audley, Barber, Barton
PFAM	download
PREFAB	download	Edgar
SABmark	download	Van Walle, Lasters, Wyns
SMART	download	Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork
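
Benchmark suites such as BAliBASE and PREFAB compare a program's output against a curated reference alignment. One common accuracy measure is the fraction of residue pairs aligned in the reference that the test alignment also aligns. A hedged sketch of that measure, assuming both alignments contain the same sequences in the same row order and use '-' as the only gap character. Individual benchmarks add their own refinements (for example, scoring only trusted core regions), which are not reproduced here, and the function names are hypothetical.

```python
# Illustrative pair-based accuracy measure; names are hypothetical.


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Every residue pair the alignment places in the same column, as
    (row_a, residue_index_a, row_b, residue_index_b) with row_a < row_b."""
    counters = [0] * len(alignment)  # next residue index within each row
    pairs = set()
    for column in zip(*alignment):
        indices = []
        for row, ch in enumerate(column):
            if ch == "-":
                indices.append(None)
            else:
                indices.append(counters[row])
                counters[row] += 1
        for a in range(len(column)):
            for b in range(a + 1, len(column)):
                if indices[a] is not None and indices[b] is not None:
                    pairs.add((a, indices[a], b, indices[b]))
    return pairs


def pair_accuracy(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

If the reference aligns AC-GT with ACTGT and a program instead produces ACG-T against ACTGT, three of the four reference pairs are recovered, giving an accuracy of 0.75.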

Alignment Viewers/Editors
Please see the List of alignment visualization software.

Short-Read Sequence Alignment

Name	Description	Multi-threaded	License	Link
BFAST	Explicit time and accuracy tradeoff with an a priori accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color-space reads). Performs a full Smith-Waterman alignment.	Yes (POSIX)	GPL	link
BLASTN	BLAST's nucleotide alignment program; slow and not accurate for short reads, and searches a sequence database (EST, Sanger sequences) rather than a reference genome.			link
BLAT	Made by Jim Kent. Can handle one mismatch in initial alignment step.	Yes (client/server).	Free for academic and non-commercial use.	link
Bowtie	Uses the Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for the human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies.	Yes (POSIX)	Artistic License	link
CASHX	Quantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together or as independent modules on their own. This algorithm is very accurate for perfect hits to a reference genome.	No	Free for academic and non-commercial use.	link
CUDA-EC	Short-read alignment error correction using GPUs.	Yes (GPU enabled)		CUDA-EC-
ELAND	Implemented by Illumina. Includes ungapped alignment with a finite read length.
GMAP and GSNAP	Robust, fast, short-read alignment. GMAP: singleton reads; GSNAP: paired reads. Useful for digital gene expression, SNP and indel genotyping. Developed by Tom Wu at Genentech. Implemented by NCGR in Alpheus.	Yes	Free for academic and non-commercial use.	[3]
LAST				link
MAQ	Ungapped alignment that takes into account quality scores for each base.		GPL	link
MOM	MOM (maximum oligonucleotide mapping) is a query matching tool that captures a maximal-length match within the short read.	Yes		[4]
MOSAIK	Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long.	Yes		link
Novoalign	Gapped alignment of single end and paired end Illumina GA I & II reads and reads from the new Helicos Heliscope Genome Analyzer. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and option to report multiple alignments per read.	Multi-threading and MPI versions available with paid license.	Single threaded version free for academic and non-commercial use.	Novocraft
PALMapper	PALMapper efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on a single CPU. It refines the originally proposed QPALMA approach.	Yes	GPL	[5]
PerM	Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths.	Yes	GPL	link
QPalma	Takes advantage of quality scores, intron lengths, and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version.)	Yes (client/server)	GPLv2	link
RazerS	No read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping.		LGPL	link
RMAP	Read lengths can range from 20 bp to at most 64 bp. Uses the "exclusion principle" to allow for mismatches and looks up reads in an index.			link
SeqMap	Up to 5 mixed substitutions and insertions/deletions. Various tuning options and input/output formats.		Free for academic and non-commercial use.	link
Shrec	Short-read error correction with a suffix trie data structure.	Yes (Java)		link
SHRiMP	Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color space reads.		BSD derivative	link
SLIDER	Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences.			link
SLIM Search	Extremely fast, tolerant to high indel and substitution counts. Includes full read alignment.	Yes	Commercial/High-use customers can join beta programme (Q2 '09)	link
SOAP	Robust with a small (1–3) number of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. SOAP2 is much faster than the first version.	Yes (multi-threaded)	GPL; SOAP2 source is currently unavailable	link
SOCS	For ABI SOLiD technologies. Mapping time increases significantly for reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm.	Yes		link
SSAHA and SSAHA2	Fast for a small number of variants.		Free for academic and non-commercial use.	link
Taipan	De novo assembler for Illumina reads		Free for academic and non-commercial use.	link
ZOOM	100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina & SOLiD instruments, not 454.	Yes (GUI), no (CLI)	Commercial	link
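
Bowtie's entry mentions an index built with the Burrows-Wheeler transform. The sketch below illustrates the general idea behind such indexes, an FM-index backward search that counts exact occurrences of a read in a reference, under heavy simplifying assumptions: the transform is built by sorting all rotations (quadratic memory, toy inputs only), '$' is assumed absent from the text and used as the terminator, occurrence counts are recomputed by scanning instead of using sampled checkpoints, and mismatches, quality scores and reporting of positions are omitted. It is not Bowtie's implementation, and the function names are hypothetical.

```python
# Illustrative FM-index sketch; names are hypothetical, toy inputs only.


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations.
    Assumes '$' does not occur in text; it sorts before A-Z in ASCII."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Backward search: count exact occurrences of pattern in the original text."""
    # C[c] = number of characters in the text that sort strictly before c.
    C: dict[str, int] = {}
    for i, ch in enumerate(sorted(bwt_str)):
        C.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real indexes use sampled checkpoints.
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open interval of matching sorted rotations
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, bwt("banana") gives "annb$aa", and searching that string for "ana" narrows the interval to two rows, matching the two overlapping occurrences in "banana". Because the index is built once and each read is matched right to left in time proportional to its length, this style of index suits mapping millions of short reads to one genome.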

See also
* List of open source bioinformatics software

References
1. ^ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology 215 (3): 403–10. doi:10.1006/jmbi.1990.9999. PMID 2231712.
2. ^ Biegert A, Söding J (March 2009). "Sequence context-specific profiles for homology searching". Proceedings of the National Academy of Sciences of the United States of America 106 (10): 3770–5. doi:10.1073/pnas.0810767106. PMID 19234132.
3. ^ Durbin R, Eddy SR, Krogh A, Mitchison G (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-62971-3.
4. ^ Söding J (April 2005). "Protein homology detection by HMM-HMM comparison". Bioinformatics 21 (7): 951–60. doi:10.1093/bioinformatics/bti125. PMID 15531603.
5. ^ Altschul SF, Madden TL, Schäffer AA, et al. (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMID 9254694.
6. ^ Hughey R, Karplus K, Krogh A (1999). "SAM: sequence alignment and modeling software system". Technical report UCSC-CRL 99-11. University of California, Santa Cruz, CA.
7. ^ Gusfield D (1997). Algorithms on strings, trees and sequences. Cambridge University Press. ISBN 0-521-58519-8.

External links
* Pollard et al. (2004) (PubMed Central free fulltext): The authors discuss LAGAN, CHAOS, and Dialign as the most effective tools tested for certain uses.

Retrieved from "http://en.wikipedia.org/"
All text is available under the terms of the GNU Free Documentation License
