Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, D J Lipman

Abstract

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
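The abstract names "a new criterion for triggering the extension of word hits" without describing it. In the full paper this is the "two-hit" method: an ungapped extension is started only when two non-overlapping word hits fall on the same diagonal within a window of A residues of each other. The sketch below is a minimal illustration of that trigger rule, not the NCBI implementation. It assumes word hits are already available as (query offset, subject offset) pairs and measures the window start-to-start. The function name `two_hit_triggers` and the defaults `word_len=3` and `window=40` are illustrative assumptions. It also omits the extension step itself and the skipping of hits that fall inside an already-extended region.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, int]  # (query_offset, subject_offset) where a word hit starts


def two_hit_triggers(
    hits: Iterable[Hit], word_len: int = 3, window: int = 40
) -> List[Tuple[Hit, Hit]]:
    """Return the pairs of word hits that would trigger an ungapped extension.

    Hypothetical sketch of the two-hit rule: a pair triggers when both hits
    lie on the same diagonal (subject - query), do not overlap, and the
    second hit starts at most `window` residues after the first.
    """
    last_q: Dict[int, int] = {}  # diagonal -> query offset of the latest kept hit
    triggers: List[Tuple[Hit, Hit]] = []
    # Scan hits in subject order, as a pass over a database sequence would.
    for q, s in sorted(hits, key=lambda h: (h[1], h[0])):
        diag = s - q
        prev = last_q.get(diag)
        if prev is not None:
            dist = q - prev
            if dist < word_len:
                # Overlaps the previous hit: ignore it and keep the earlier one.
                continue
            if dist <= window:
                triggers.append(((prev, prev + diag), (q, s)))
        last_q[diag] = q
    return triggers


hits = [(0, 10), (2, 12), (5, 15), (100, 110)]  # all on diagonal 10
print(two_hit_triggers(hits))  # [((0, 10), (5, 15))]
```

Here (2, 12) is skipped because it overlaps (0, 10), (5, 15) pairs with (0, 10) inside the window, and (100, 110) is too far from (5, 15). Because a single hit no longer triggers an extension, far fewer extensions are attempted. The paper uses this to lower the word-score threshold T while keeping sensitivity, and that is a large part of the reported speedup.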

Source: PubMed