The guts of BLAST2


BLAST2

Only the new version of BLAST (from 1997) is discussed here, because it offers superior speed, more flexibility, and better detection of biological relatedness than the old BLAST (from 1990) or the frequently used WU-BLAST.

What do the different BLAST programs do?

The BLAST family of programs allows all combinations of DNA or protein query sequences with searches against DNA or protein databases:

blastp compares an amino acid query sequence against a protein sequence database.

blastn compares a nucleotide query sequence against a nucleotide sequence database.

blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.

tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).

tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
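
The combinations listed above can be summarized as a small lookup table. The following Python sketch is only an illustration: the names PROGRAMS and choose_program are hypothetical helpers and are not part of the BLAST distribution. Because blastn and tblastx both take a nucleotide query and a nucleotide database, the sketch assumes an extra flag to say whether the comparison should be made at the protein level.

```python
# Query and database types for each BLAST program, as listed above.
# "translated" names the side(s) that are conceptually translated in six frames.
PROGRAMS = {
    "blastp":  {"query": "protein",    "database": "protein",    "translated": ()},
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translated": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translated": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translated": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide",
                "translated": ("query", "database")},
}


def choose_program(query, database, compare_as_protein=False):
    """Return the program name for the given sequence types (hypothetical helper)."""
    for name, spec in PROGRAMS.items():
        if spec["query"] != query or spec["database"] != database:
            continue
        # Nucleotide vs. nucleotide is ambiguous: blastn compares the DNA
        # directly, tblastx compares the six-frame translations.
        if query == database == "nucleotide":
            if bool(spec["translated"]) != compare_as_protein:
                continue
        return name
    raise ValueError(f"no BLAST program for {query} query vs {database} database")


print(choose_program("nucleotide", "protein"))           # blastx
print(choose_program("nucleotide", "nucleotide", True))  # tblastx
```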

blastn and blastp offer fully gapped alignments. blastx and tblastn produce 'in-frame' gapped alignments and use sum statistics to link alignments from different frames. tblastx provides only ungapped alignments.

blastx and tblastx might be useful for finding frameshifts in newly sequenced genes.
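
To make the "six-frame conceptual translation" used by blastx, tblastn and tblastx concrete, here is a minimal Python sketch. It assumes the standard genetic code and plain A/C/G/T input. The function names are hypothetical, and this is not how BLAST itself implements translation. A frameshift in a gene typically shows up as the matching protein segments moving from one of these frames to another.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```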

The PSI-BLAST program is an extension of blastp that allows a more sensitive database search. It runs iteratively, which means that several consecutive BLAST searches are needed to achieve this sensitivity. Briefly, a simple multiple alignment is built from the sequences returned by the first blastp search, and a position-specific scoring matrix is then derived from that alignment. This new, more sensitive matrix is used for the following rounds of searching. The process is repeated until no improvement is found.
PSI-BLAST can be run on the NCBI server.
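
The idea of deriving a position-specific scoring matrix from an alignment can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the PSI-BLAST method itself: it assumes a uniform background frequency, a fixed pseudocount and equally weighted sequences, whereas the real program uses more elaborate weighting and pseudocount schemes. The names build_pssm and score are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a toy log-odds matrix from equal-length aligned sequences.

    Returns one dict per alignment column, mapping residue -> score in bits.
    """
    length = len(aligned[0])
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Ignore gaps and any other non-standard characters in this column.
        residues = [seq[col] for seq in aligned if seq[col] in ALPHABET]
        total = len(residues) + pseudocount * len(ALPHABET)
        column = {}
        for aa in ALPHABET:
            freq = (residues.count(aa) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Score an ungapped segment of the same length as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Toy alignment; in PSI-BLAST it would come from the hits of the previous round.
pssm = build_pssm(["MKV", "MKI", "MRV"])
print(round(score(pssm, "MKV"), 2), round(score(pssm, "AAA"), 2))
```

Residues seen often in a column score above zero and unseen residues score below zero, so a segment resembling the alignment outscores an unrelated one. That is what makes the matrix more sensitive than a general-purpose substitution matrix in later rounds.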

The PHI-BLAST program combines the usual BLAST search with a pattern search. Patterns are defined as regular expressions (complex combinations of letters).
PHI-BLAST may be preferable to simply searching for pattern occurrences because it also uses information from the sequence surrounding the pattern.
If your pattern occurs very frequently in sequences, PHI-BLAST might be helpful because it filters out those cases where the pattern occurrence is probably random and not indicative of homology.
PHI-BLAST may be preferable to other flavors of BLAST because it is faster and because it allows the user to express a rigid pattern occurrence requirement. Only matches in which your pattern occurs are reported.
PHI-BLAST can be run on the NCBI server.
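
To show what a pattern written as a regular expression looks like, the sketch below converts a PROSITE-style pattern (the convention PHI-BLAST follows for its patterns) into a Python regular expression and lists where it occurs in a sequence. It covers the pattern-search half only, not the BLAST extension around each occurrence. The function names, the example pattern and the example sequence are made up for illustration, and the anchors '<' and '>' are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'G-x(2)-[ST]-{P}-K' to a regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        # Split each element into its residue part and an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{"):         # {ABC}: any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping occurrence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("G-x(2)-[ST]-{P}-K"))
print(find_pattern("G-x(2)-[ST]-{P}-K", "MAGAASAKLLGQQTPK"))
```

In the example sequence the second G is followed by a proline in the excluded position, so only the first occurrence is reported.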

Sometimes you do not want to search a database, but only want to align two sequences with each other (for example, two homologous cosmids from closely related organisms). In that case you can use the NCBI server for two-sequence BLAST. You can do the same on your own machine by converting the second sequence to BLAST database format and running the first sequence against this "database". Using the NCBI server is simply much faster and more convenient. Aligning sequences longer than 150 kb is discouraged on this server.
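
The local variant of this procedure can be sketched as two commands: formatdb turns the second sequence into a database, and blastall runs the first sequence against it. The Python helper below only assembles the argument lists. The file names are hypothetical, and the flags shown (-i and -p for formatdb; -p, -d, -i and -o for blastall) are those of the NCBI BLAST 2 command-line programs as I understand them, so check them against the README shipped with your copy before relying on them.

```python
import subprocess


def two_sequence_blast_commands(query_fasta, subject_fasta, out_path, protein=False):
    """Build the formatdb and blastall command lines for a two-sequence comparison."""
    program = "blastp" if protein else "blastn"
    # formatdb: -i input FASTA file, -p T for protein or F for nucleotide.
    format_cmd = ["formatdb", "-i", subject_fasta, "-p", "T" if protein else "F"]
    # blastall: -p program, -d database, -i query file, -o output file.
    search_cmd = ["blastall", "-p", program, "-d", subject_fasta,
                  "-i", query_fasta, "-o", out_path]
    return [format_cmd, search_cmd]


def run(commands):
    """Run the commands in order; requires the BLAST executables on the PATH."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical file names for two cosmid sequences.
for cmd in two_sequence_blast_commands("cosmid1.fasta", "cosmid2.fasta", "cosmid1_vs_2.txt"):
    print(" ".join(cmd))
```

Calling run() on the returned list would execute both steps. It is kept separate so that the commands can be inspected first.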

How do I run BLAST from the command line?

Sometimes you may want to run BLAST on your own computer. BLAST executables are available for:

Win32/DOS

Linux

Sun Solaris

SGI IRIX6

DEC Alpha

Download executables from NCBI.

There are only three programs: blastall, blastpgp and formatdb. formatdb creates BLAST databases, blastpgp runs PSI-BLAST and PHI-BLAST searches, and blastall runs all the other programs.

What information should I use from the BLAST output?

How does the BLAST2 algorithm work?