Blast Search PDB¶
This example demonstrates how to use Protein Data Bank blast search function,
blastPDB()
.
blastPDB()
is a utility function which can be used to check if
structures matching a sequence exist in PDB or to identify a set of related
structures for Ensemble Analysis.
We will used amino acid sequence of a protein, e.g.
ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLF...YDIVKMKKSNISPNFNFMGQLLDFERTL
The blastPDB()
function accepts sequence as a Python str()
.
Output will be PDBBlastRecord
instance that stores PDB hits and
returns to the user those sharing sequence identity above a user specified
value.
Blast search¶
We start by importing everything from the ProDy package:
In [1]: from prody import *
Let’s search for structures similar to that of MKP-3, using its sequence:
In [2]: blast_record = blastPDB('''ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNL
...: PNLFENAGEFKYKQIPISDHWSQNLSQFFPEAISFIDEAR
...: GKNCGVLVHSLAGISRSVTVTVAYLMQKLNLSMNDAYDIV
...: KMKKSNISPNFNFMGQLLDFERTL''')
...:
blastPDB()
function returns a PDBBlastRecord
. It is a good
practice to save this record on disk, as NCBI may not respond to repeated
searches for the same sequence. We can do this using Python standard library
pickle
as follows:
In [3]: import pickle
Record is save using dump()
function into an open file:
In [4]: pickle.dump(blast_record, open('mkp3_blast_record.pkl', 'wb'))
Then, it can be loaded using load()
function:
In [5]: blast_record = pickle.load(open('mkp3_blast_record.pkl', 'rb'))
Best match¶
To get the best match, PDBBlastRecord.getBest()
method can be used:
In [6]: best = blast_record.getBest()
In [7]: best['pdb_id']
Out[7]: '1mkp'
In [8]: best['percent_identity']
Out[8]: 100.0
PDB hits¶
In [9]: hits = blast_record.getHits(percent_identity=90, percent_overlap=70)
In [10]: list(hits)
Out[10]: ['1mkp']
This results in only MKP-3 itself, since percent_identity argument was set to 90:
In [11]: hits = blast_record.getHits(percent_identity=50)
In [12]: list(hits)
Out[12]: ['1m3g', '2hxp', '3lj8', '3ezz', '1mkp']
In [13]: hits = blast_record.getHits(percent_identity=40)
In [14]: list(hits)
Out[14]: ['3lj8', '1mkp', '1zzw', '2g6z', '2hxp', '3ezz', '1m3g', '2oud']
This resulted in more hits, including structures of MKP-2, MKP-4, and MKP-5 More information on a hit can be obtained as follows:
In [15]: hits['1zzw']['percent_identity']
Out[15]: 49.27536231884058
In [16]: hits['1zzw']['align-len']
Out[16]: 138
In [17]: hits['1zzw']['identity']
Out[17]: 68
To obtain all hits, simply run the function without specifying parameters:
In [18]: all_hits = blast_record.getHits()
Download hits¶
PDB hits can be downloaded using fetchPDB()
function:
filenames = fetchPDB(hits.keys())
filenames