This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.
Database search only
Name | Description | Sequence Type* | Link | Authors | Year |
---|---|---|---|---|---|
BLAST | local search with fast k-tuple heuristic (Basic Local Alignment Search Tool) | Both | NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only) | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ[1] | 1990 |
CS-BLAST | sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. Position-specific iterative version CSI-BLAST more sensitive than PSI-BLAST | Protein | CS-BLAST server download | Biegert A, Söding J[2] | 2009 |
FASTA | local search with fast k-tuple heuristic, slower but more sensitive than BLAST | Both | EBI DDBJ GenomeNet PIR (protein only) | ||
GGSEARCH / GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | FASTA server | ||
HMMER | local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | download | Durbin R, Eddy SR, Krogh A, Mitchison G[3] | 1998 |
HHpred / HHsearch | pairwise comparison of profile Hidden Markov models; very sensitive, but can only search alignment databases (Pfam, PDB, InterPro...) | Protein | server download | Söding J[4] | 2005 |
IDF | Inverse Document Frequency | Both | download | ||
Infernal | profile SCFG search | RNA | download | Eddy S | |
PSI-BLAST | position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLAST | Protein | NCBI PSI-BLAST | Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ[5] | 1997 |
SAM | local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | SAM | Karplus K, Krogh A[6] | 1999 |
SSEARCH | Smith-Waterman search, slower but more sensitive than FASTA | Both | EBI DDBJ | ||
*Sequence Type: Protein or nucleotide |
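The "fast k-tuple heuristic" listed for BLAST and FASTA can be illustrated with a minimal sketch. The code below is not the implementation used by either program; the function names (`build_kmer_index`, `find_seeds`, `best_diagonal`) and the default word length are assumptions chosen for illustration. It indexes every length-k word of a subject sequence, looks up the query's words to find exact seed matches, and reports the diagonal with the most seeds, which is where a real tool would then attempt a more careful alignment.

```python
from collections import defaultdict


def build_kmer_index(subject, k):
    """Map every length-k word in `subject` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = build_kmer_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        # .get() avoids creating empty entries in the defaultdict
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


def best_diagonal(seeds):
    """Group seeds by diagonal (subject_pos - query_pos); return (diagonal, count)."""
    counts = defaultdict(int)
    for q, s in seeds:
        counts[s - q] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    hits = find_seeds("ACGTAC", "TTACGTACGG", k=3)
    print(hits)
    print(best_diagonal(hits))
```

Seeds that share a diagonal correspond to an ungapped region of similarity, so counting seeds per diagonal is a cheap way to decide where slower dynamic programming is worth running.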
Pairwise alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year |
---|---|---|---|---|---|---|
Bioconductor Biostrings::pairwiseAlignment | Dynamic programming | Both | Both + Ends-free | site | P. Aboyoun | 2008 |
BioPerl dpAlign | Dynamic programming | Both | Both + Ends-free | site | Y. M. Chan | 2003 |
BLASTZ, LASTZ | Seeded pattern-matching | Nucleotide | Local | download, download | Schwartz et al. | 2004, 2009 |
DNADot | Web-based dot-plot tool | Nucleotide | Global | server | R. Bowen | 1998 |
DOTLET | Java-based dot-plot tool | Both | Global | applet | M. Pagni and T. Junier | 1998 |
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | Global in query | FASTA server | W. Pearson | 2007 |
JAligner | Open source Java implementation of Smith-Waterman | Both | Local | JWS | A. Moustafa | 2005 |
LALIGN | Multiple, non-overlapping, local similarity (same algorithm as SIM) | Both | Local non-overlapping | server FASTA server | W. Pearson | 1991 (algorithm) |
mAlign | modelling alignment; models the information content of the sequences | Nucleotide | Both | [1] [2] | D. Powell, L. Allison and T. I. Dix | 2004 |
matcher | Memory-optimized local dynamic programming (Waterman-Eggert algorithm, based on LALIGN) | Both | Local | Pasteur | I. Longden (modified from W. Pearson) | 1999 |
MCALIGN2 | explicit models of indel evolution | DNA | Global | server | J. Wang et al. | 2006 |
MUMmer | suffix tree based | Nucleotide | Global | download | S. Kurtz et al. | 2004 |
needle | Needleman-Wunsch dynamic programming | Both | SemiGlobal | EBI Pasteur | A. Bleasby | 1999 |
Ngila | logarithmic and affine gap costs and explicit models of indel evolution | Both | Global | download | R. Cartwright | 2007 |
Path | Smith-Waterman on protein back-translation graph (detects frameshifts at protein level) | Protein | Local | server download | M. Gîrdea et al. | 2009 |
PatternHunter | Seeded pattern-matching | Nucleotide | Local | download | B. Ma et al. | 2002–2004 |
ProbA (also propA) | Stochastic partition function sampling via dynamic programming | Both | Global | download | U. Mückstein | 2002 |
PyMOL | "align" command aligns sequence & applies it to structure | Protein | Global (by selection) | site | W. L. DeLano | 2007 |
REPuter | suffix tree based | Nucleotide | Local | download | S. Kurtz et al. | 2001 |
SABERTOOTH | Alignment using predicted Connectivity Profiles | Protein | Global | download on request | F. Teichert, J. Minning, U. Bastolla, and M. Porto | 2009 |
SEQALN | Various dynamic programming | Both | Local or Global | server | M.S. Waterman and P. Hardy | 1996 |
SIM, GAP, NAP, LAP | Local similarity with varying gap treatments | Both | Local or global | server | X. Huang and W. Miller | 1990-6 |
SIM | Local similarity | Both | Local | servers | X. Huang and W. Miller | 1991 |
SLIM Search | Ultra-fast blocked alignment | Both | Both | site | S. Inglis, J. Cleary, S. Irvine, L. Trigg, L. Bloksberg et al. | 2004 |
SSEARCH | Local (Smith-Waterman) alignment with statistics | Protein | Local | EBI FASTA server | W. Pearson | 1981 (Algorithm) |
Sequences Studio | Java applet demonstrating various algorithms from [7] | Generic sequence | Local and global | code applet | A. Meskauskas | 1997 (reference book) |
SWIFT suite | Fast Local Alignment Searching | DNA | Local | site | K. Rasmussen, W. Gerlach | 2005, 2008 |
stretcher | Memory-optimized but slow dynamic programming | Both | Global | Pasteur | I. Longden (modified from G. Myers and W. Miller) | 1999 |
tranalign | Aligns nucleic acid sequences given a protein alignment | Nucleotide | NA | Pasteur | G. Williams (modified from B. Pearson) | 2002 |
water | Smith-Waterman dynamic programming | Both | Local | EBI Pasteur | A. Bleasby | 1999 |
wordmatch | k-tuple pairwise match | Both | NA | Pasteur | I. Longden | 1998 |
YASS | Seeded pattern-matching | Nucleotide | Local | server download | L. Noe and G. Kucherov | 2003–2007 |
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global |
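The difference between the "Global" (Needleman-Wunsch) and "Local" (Smith-Waterman) alignment types in the table above can be sketched in a few lines of dynamic programming. This is an illustrative sketch only, not the code of any listed tool: the function name `alignment_score`, the scoring values, and the use of a simple linear gap penalty are all assumptions, and only the score (not the alignment itself) is computed.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Score the best alignment of `a` and `b` with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch recurrence).
    local=True:  local alignment (Smith-Waterman recurrence).
    """
    # First row: leading gaps cost nothing locally, j * gap globally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of aligning both sequences end to end.
    # Local: best score found anywhere in the matrix.
    return best if local else prev[-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))              # global
    print(alignment_score("ACGT", "AGT", local=True))  # local
```

The only changes between the two modes are the boundary initialisation, the floor at zero, and where the final score is read from. Tools listed as "Ends-free" or "SemiGlobal" use a boundary treatment that lies between these two extremes.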
Multiple sequence alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year |
---|---|---|---|---|---|---|
ABA | A-Bruijn alignment | Protein | Global | download | B. Raphael et al. | 2004 |
ALE | manual alignment; some software assistance | Nucleotides | Local | download | J. Blandy and K. Fogel | 1994 (latest version 2007) |
AMAP | Sequence annealing | Both | Global | server | A. Schwartz and L. Pachter | 2006 |
anon. | fast, optimal alignment of three sequences using linear gap costs | Nucleotides | Global | paper software | D. Powell, L. Allison and T. I. Dix | 2000 |
BAli-Phy | Tree+Multi alignment; Probabilistic/Bayesian; Joint Estimation | Both | Global | WWW+download | BD Redelings and MA Suchard | 2005 (latest version 2010) |
CHAOS/DIALIGN | Iterative alignment | Both | Local (preferred) | server | M. Brudno and B. Morgenstern | 2003 |
ClustalW | Progressive alignment | Both | Local or Global | download EBI DDBJ PBIL EMBNet GenomeNet | Thompson et al. | 1994 |
CodonCode Aligner | Multi alignment; ClustalW & Phrap support | Nucleotides | Local or Global | download | P. Richterich et al. | 2003 (latest version 2009) |
DIALIGN-TX and DIALIGN-T | Segment-based method | Both | Local (preferred) or Global | download and server | A. R. Subramanian | 2005 (latest version 2008) |
DNA Alignment | Segment-based method for intraspecific alignments | Both | Local (preferred) or Global | server | A. Roehl | 2005 (latest version 2008) |
FSA | Sequence annealing | Both | Global | download and server | R. K. Bradley et al. | 2008 |
Geneious | Progressive/Iterative alignment; ClustalW plugin | Both | Local or Global | download | A.J. Drummond et al. | 2005 (latest version 2009) |
Kalign | Progressive alignment | Both | Global | server EBI MPItoolkit | T. Lassmann | 2005 |
MAFFT | Progressive/iterative alignment | Both | Local or Global | GenomeNet MAFFT | K. Katoh et al. | 2005 |
MARNA | Multiple Alignment of RNAs | RNA | Local | server download | S. Siebert et al. | 2005 |
MAVID | Progressive alignment | Both | Global | server | N. Bray and L. Pachter | 2004 |
MSA | Dynamic programming | Both | Local or Global | download | D.J. Lipman et al. | 1989 (modified 1995) |
MSAProbs | Dynamic programming | Protein | Global | download | Y. Liu, B. Schmidt, D. Maskell | 2010 |
MULTALIN | Dynamic programming/clustering | Both | Local or Global | server | F. Corpet | 1988 |
Multi-LAGAN | Progressive dynamic programming alignment | Both | Global | server | M. Brudno et al. | 2003 |
MUSCLE | Progressive/iterative alignment | Both | Local or Global | server | R. Edgar | 2004 |
Opal | Progressive/iterative alignment | Both | Local or Global | download | T. Wheeler and J. Kececioglu | 2007 |
Pecan | Probabilistic/consistency | DNA | Global | download | B. Paten et al. | 2008 |
POA | Partial order/hidden Markov model | Protein | Local or Global | download | C. Lee | 2002 |
Probalign | Probabilistic/consistency with partition function probabilities | Protein | Global | server | Roshan and Livesay | 2006 |
ProbCons | Probabilistic/consistency | Protein | Local or Global | server | C. Do et al. | 2005 |
PROMALS3D | Progressive alignment/hidden Markov model/Secondary structure/3D structure | Protein | Global | server | J. Pei et al. | 2008 |
PRRN/PRRP | Iterative alignment (especially refinement) | Protein | Local or Global | PRRP PRRN | Y. Totoki (based on O. Gotoh) | 1991 and later |
PSAlign | Alignment preserving non-heuristic | Both | Local or Global | download | S.H. Sze, Y. Lu, Q. Yang | 2006 |
RevTrans | Combines DNA and Protein alignment, by back translating the protein alignment to DNA. | DNA/Protein (special) | Local or Global | server | Wernersson and Pedersen | 2003 (newest version 2005) |
SAGA | Sequence alignment by genetic algorithm | Protein | Local or Global | download | C. Notredame et al. | 1996 (new version 1998) |
SAM | Hidden Markov model | Protein | Local or Global | server | A. Krogh et al. | 1994 (most recent version 2002) |
StatAlign | Bayesian co-estimation of alignment and phylogeny (MCMC) | Both | Global | download | A. Novak et al. | 2008 |
Stemloc | Multiple alignment and secondary structure prediction | RNA | Local or Global | download | I. Holmes | 2005 |
T-Coffee | More sensitive progressive alignment | Both | Local or Global | server download | C. Notredame et al. | 2000 (newest version 2008) |
UGENE | Supports multiple alignment with MUSCLE and Kalign plugins, and local sequence alignment with the Smith-Waterman algorithm. | Both | Local or Global | download | UGENE team | 2009 |
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global |
Genomics analysis
Name | Description | Sequence Type* | Link |
---|---|---|---|
SLAM | Gene finding, alignment, annotation (human-mouse homology identification) | Nucleotide | server |
Mauve | Multiple alignment of rearranged genomes | Nucleotide | download |
MGA | Multiple Genome Aligner | Nucleotide | download |
Mulan | Local multiple alignments of genome-length sequences | Nucleotide | server |
Multiz | Multiple alignment of genomes | Nucleotide | download |
PLAST-ncRNA | Search for ncRNAs in genomes by partition function local alignment | Nucleotide | server |
Sequerome | Profiling sequence alignment data with major servers/services | Nucleotide/peptide | server |
AVID | Pairwise global alignment with whole genomes | Nucleotide | server |
SIBsim4 / Sim4 | A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns | Nucleotide | download |
Shuffle-LAGAN | Pairwise glocal alignment of completed genome regions | Nucleotide | server |
ACT (Artemis Comparison Tool) | Synteny and comparative genomics | Nucleotide | server |
*Sequence Type: Protein or nucleotide |
Motif finding
Name | Description | Sequence Type* | Link |
---|---|---|---|
MEME/MAST | Motif discovery and search | Both | server |
BLOCKS | Ungapped motif identification from BLOCKS database | Both | server |
eMOTIF | Extraction and identification of shorter motifs | Both | servers |
Gibbs motif sampler | Stochastic motif extraction by statistical likelihood | Both | server (one of many implementations) |
HMMTOP | Prediction of transmembrane helices and topology of proteins | Protein | homepage & download |
JCoils | Prediction of Coiled coil and Leucine Zipper | Protein | homepage & download |
TEIRESIAS | Motif extraction and database search | Both | server |
PRATT | Pattern generation for use with ScanProsite | Protein | server |
ScanProsite | Motif database search tool | Protein | server |
PHI-Blast | Motif search and alignment tool | Both | Pasteur |
I-sites | Local structure motif library | Protein | server |
*Sequence Type: Protein or nucleotide |
Benchmarking
Name | Link | Authors |
---|---|---|
BAliBASE | download | Thompson, Plewniak, Poch |
HOMSTRAD | download | Stebbings, Mizuguchi |
Oxbench | download | Raghava, Searle, Audley, Barber, Barton |
PFAM | download | |
PREFAB | download | Edgar |
SABmark | download | Van Walle, Lasters, Wyns |
SMART | download | Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork |
Alignment Viewers/Editors
Please see the List of alignment visualization software.
Short-Read Sequence Alignment
Name | Description | Multi-threaded | License | Link | |
---|---|---|---|---|---|
BFAST | Explicit time and accuracy tradeoff with a priori accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment. | Yes (POSIX) | GPL | link | |
BLASTN | BLAST's nucleotide alignment program; slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome. | | | link | |
BLAT | Made by Jim Kent. Can handle one mismatch in initial alignment step. | Yes (client/server). | Free for academic and non-commercial use. | link | |
Bowtie | Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies. | Yes (POSIX) | Artistic License | link | |
CASHX | Quantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together or as independent modules on their own. This algorithm is very accurate for perfect hits to a reference genome. | No | Free for academic and non-commercial use. | link | |
CUDA-EC | Short-read alignment error correction using GPUs. | Yes (GPU enabled) | CUDA-EC- | ||
ELAND | Implemented by Illumina. Includes ungapped alignment with a finite read length. | ||||
GMAP and GSNAP | Robust, fast, short-read alignment. GMAP: singleton reads; GSNAP: paired reads. Useful for digital gene expression, SNP and indel genotyping. Developed by Tom Wu at Genentech. Implemented by NCGR in Alpheus. | Yes | Free for academic and non-commercial use. | [3] | |
LAST | | | | link | |
MAQ | Ungapped alignment that takes into account quality scores for each base. | | GPL | link | |
MOM | MOM or maximum oligonucleotide mapping is a query matching tool that captures a maximal length match within the short read. | Yes | | [4] | |
MOSAIK | Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long. | Yes | | link | |
Novoalign | Gapped alignment of single end and paired end Illumina GA I & II reads and reads from the new Helicos Heliscope Genome Analyzer. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and option to report multiple alignments per read. | Multi-threading and MPI versions available with paid license. | Single threaded version free for academic and non-commercial use. | Novocraft | |
PALMapper | PALMapper efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with a fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on a single CPU. It refines the originally proposed QPALMA approach. | Yes | GPL | [5] | |
PerM | Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths. | Yes | GPL | link | |
QPalma | Takes advantage of quality scores, intron lengths, and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version.) | Yes (client/server) | GPLv2 | link | |
RazerS | No read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping. | | LGPL | link | |
RMAP | Read lengths can range from 20 bp to at most 64 bp. Uses the "exclusion principle" to allow for mismatches and look up reads in an index. | | | link | |
SeqMap | Up to 5 mixed substitutions and insertions/deletions. Various tuning options and input/output formats. | | Free for academic and non-commercial use. | link | |
Shrec | Short read error correction with a suffix trie data structure. | Yes (Java) | | link | |
SHRiMP | Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color space reads. | | BSD derivative | link | |
SLIDER | Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. | | | link | |
SLIM Search | Extremely fast, tolerant to high indel and substitution counts. Includes full read alignment. | Yes | Commercial/High-use customers can join beta programme (Q2 '09) | link | |
SOAP | Robust with a small number (1-3) of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. SOAP2 is much faster than the first version. | Yes (multi-threaded) | GPL; SOAP2 source is currently unavailable | link | |
SOCS | For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm. | Yes | | link | |
SSAHA and SSAHA2 | Fast for a small number of variants. | | Free for academic and non-commercial use. | link | |
Taipan | De novo assembler for Illumina reads | | Free for academic and non-commercial use. | link | |
ZOOM | 100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina & SOLiD instruments, not 454. | Yes (GUI) No (CLI). | Commercial | link | |
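Several of the short-read aligners above rely on a precomputed index of the reference; Bowtie, for example, is described as using a Burrows-Wheeler index. The sketch below shows the underlying idea on a toy scale and is not Bowtie's implementation: the function names `bwt` and `count_occurrences` are assumptions, the transform is built naively from sorted rotations (impractical for genomes), and only exact matches are counted, with no mismatches, quality scores, or position reporting.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; '$' marks the end of the text."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact occurrences of `pattern` by backward search on the BWT."""
    # first[c] = index of the first row (in sorted order) that starts with c
    first = {}
    for i, c in enumerate(sorted(bwt_string)):
        first.setdefault(c, i)

    def occ(c, i):
        # Number of times c appears in the first i characters of the BWT.
        # A real index stores sampled counts instead of rescanning.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of matching rows, [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACAGATTACA")
    print(index)
    print(count_occurrences(index, "TTAC"))
```

Because the search consumes the read one character at a time from its end and only narrows a row range, the work per read depends on the read length rather than on the size of the reference, which is what makes this style of index attractive for mapping millions of short reads.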
See also
* List of open source bioinformatics software
References
1. ^ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology 215 (3): 403–10. doi:10.1006/jmbi.1990.9999. PMID 2231712.
2. ^ Biegert A, Söding J (March 2009). "Sequence context-specific profiles for homology searching". Proceedings of the National Academy of Sciences of the United States of America 106 (10): 3770–5. doi:10.1073/pnas.0810767106. PMID 19234132.
3. ^ Durbin, Richard; Eddy, Sean R.; Krogh, Anders et al., eds (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-62971-3.
4. ^ Söding J (April 2005). "Protein homology detection by HMM-HMM comparison". Bioinformatics 21 (7): 951–60. doi:10.1093/bioinformatics/bti125. PMID 15531603.
5. ^ Altschul SF, Madden TL, Schäffer AA, et al. (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMID 9254694.
6. ^ Hughey, R, Karplus, K., and Krogh, A. (1999) SAM: sequence alignment and modeling software system. Technical report UCSC-CRL 99-11. University of California, Santa Cruz, CA.
7. ^ Dan Gusfield (1997). Algorithms on strings, trees and sequences. Cambridge University Press. ISBN 0-521-58519-8.
External links
* Pollard et al. (2004) (PubMed Central free fulltext): The authors discuss LAGAN, CHAOS, and Dialign as the most effective tools tested for certain uses.