Blast Search PDB

This example demonstrates how to use Protein Data Bank blast search function, blastPDB().

blastPDB() is a utility function which can be used to check if structures matching a sequence exist in PDB or to identify a set of related structures for Ensemble Analysis.

We will used amino acid sequence of a protein, e.g. ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLF...YDIVKMKKSNISPNFNFMGQLLDFERTL

The blastPDB() function accepts sequence as a Python str().

Output will be PDBBlastRecord instance that stores PDB hits and returns to the user those sharing sequence identity above a user specified value.

Best match

To get the best match, PDBBlastRecord.getBest() method can be used:

In [6]: best = blast_record.getBest()

In [7]: best['pdb_id']
Out[7]: '1mkp'

In [8]: best['percent_identity']
Out[8]: 100.0

PDB hits

In [9]: hits = blast_record.getHits(percent_identity=90, percent_overlap=70)

In [10]: list(hits)
Out[10]: ['1mkp']

This results in only MKP-3 itself, since percent_identity argument was set to 90:

In [11]: hits = blast_record.getHits(percent_identity=50)

In [12]: list(hits)
Out[12]: ['1m3g', '2hxp', '3lj8', '3ezz', '1mkp']

In [13]: hits = blast_record.getHits(percent_identity=40)

In [14]: list(hits)
Out[14]: ['3lj8', '1mkp', '1zzw', '2g6z', '2hxp', '3ezz', '1m3g', '2oud']

This resulted in more hits, including structures of MKP-2, MKP-4, and MKP-5 More information on a hit can be obtained as follows:

In [15]: hits['1zzw']['percent_identity']
Out[15]: 49.27536231884058

In [16]: hits['1zzw']['align-len']
Out[16]: 138

In [17]: hits['1zzw']['identity']
Out[17]: 68

To obtain all hits, simply run the function without specifying parameters:

In [18]: all_hits = blast_record.getHits()

Download hits

PDB hits can be downloaded using fetchPDB() function:

filenames = fetchPDB(hits.keys())
filenames