PDB Blast Search

This module defines functions for BLAST searching the Protein Data Bank.

class PDBBlastRecord(xml, sequence=None)

A class to store results from a Protein Data Bank BLAST search.

Instantiate a PDBBlastRecord instance.

Parameters:
  • xml (str) – BLAST search results in XML format, or the name of an XML file that contains the results
  • sequence (str) – query sequence
getBest()

Returns a dictionary containing structure and alignment information for the hit with the highest sequence identity.

getHits(percent_identity=90.0, percent_overlap=70.0, chain=False)

Returns a dictionary in which PDB identifiers are mapped to structure and alignment information.

Parameters:
  • percent_identity (float) – PDB hits with percent sequence identity equal to or higher than this value will be returned; default is 90.0
  • percent_overlap (float) – PDB hits with percent coverage of the query sequence equal to or higher than this value will be returned; default is 70.0
  • chain (bool) – if True, individual chains in a PDB file will be considered as separate hits; default is False
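
The threshold and chain semantics described above can be illustrated with a minimal sketch. It is not ProDy's implementation: the function name filter_hits, the list-of-dictionaries input, and the keys pdb_id, chain_id, percent_identity and percent_overlap are all assumptions made for illustration, as is the rule of keeping the first qualifying entry per key.

```python
def filter_hits(hits, percent_identity=90.0, percent_overlap=70.0, chain=False):
    """Select hits that meet both thresholds (illustrative only).

    `hits` is assumed to be a list of dictionaries sorted best-first,
    each with hypothetical keys 'pdb_id', 'chain_id',
    'percent_identity' and 'percent_overlap'.
    """
    selected = {}
    for hit in hits:
        # Both thresholds are inclusive: "equal to or higher than".
        if hit['percent_identity'] < percent_identity:
            continue
        if hit['percent_overlap'] < percent_overlap:
            continue
        # With chain=True each chain is its own hit; otherwise hits
        # are grouped by PDB identifier.
        key = (hit['pdb_id'], hit['chain_id']) if chain else hit['pdb_id']
        # Keep only the first (best-ranked) qualifying entry per key.
        selected.setdefault(key, hit)
    return selected
```

With chain=False, two qualifying chains of the same structure collapse into one entry in this sketch; with chain=True, each is reported under its own (identifier, chain) key.
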
getParameters()

Returns the parameters used in the BLAST search.

getSequence()

Returns the query sequence that was used in the search.
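
As a rough illustration of how the accessor methods fit together, the sketch below gathers their outputs into one dictionary. The helper summarize_record is hypothetical and not part of ProDy; it relies only on the method names and keyword arguments listed above, and it assumes that the keys of the dictionary returned by getHits() can be sorted.

```python
def summarize_record(record, percent_identity=90.0, percent_overlap=70.0):
    """Collect the main outputs of a PDBBlastRecord-like object.

    `record` is any object exposing getSequence(), getParameters(),
    getBest() and getHits() as documented above.
    """
    hits = record.getHits(percent_identity=percent_identity,
                          percent_overlap=percent_overlap)
    return {
        'query': record.getSequence(),
        'parameters': record.getParameters(),
        'best': record.getBest(),
        # Iterating a dictionary yields its keys (the PDB identifiers).
        'hit_ids': sorted(hits),
    }

# With ProDy installed and network access, a record might be obtained as:
#     from prody import blastPDB
#     record = blastPDB(sequence)
#     summary = summarize_record(record)
```
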

blastPDB(sequence, filename=None, **kwargs)

Returns a PDBBlastRecord instance that contains results from a BLAST search of Protein Data Bank sequences using NCBI blastp.

Parameters:
  • sequence (str) – single-letter amino acid sequence of the protein, without any gap characters; all whitespace will be removed
  • filename (str) – name of a file in which to save the results in XML format

The hitlist_size (default is 250) and expect (default is 1e-10) search parameters can be adjusted by the user. The sleep keyword argument (default is 2 seconds) sets how long to wait before reconnecting to check for results; the sleep time is doubled each time the results are not ready. The timeout keyword argument (default is 120 seconds) sets how long to keep waiting for results before giving up.
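
The waiting behaviour described above (an initial sleep that doubles on each unsuccessful check, bounded by a timeout) can be sketched as follows. This is a minimal illustration under stated assumptions, not ProDy's code: poll_with_backoff and its fetch callable are hypothetical, fetch is assumed to return None while results are not ready, and raising TimeoutError is one possible way to "give up".

```python
import time


def poll_with_backoff(fetch, sleep=2.0, timeout=120.0,
                      wait=time.sleep, clock=time.monotonic):
    """Call `fetch` until it returns a result or `timeout` seconds pass.

    `fetch` is a hypothetical callable that returns None while results
    are not ready. `wait` and `clock` are injectable so the schedule
    can be checked without actually sleeping.
    """
    start = clock()
    while True:
        wait(sleep)            # wait before (re)connecting
        result = fetch()
        if result is not None:
            return result
        if clock() - start > timeout:
            # Assumption: signal giving up with an exception.
            raise TimeoutError('results not ready after %.0f s' % timeout)
        sleep *= 2             # results not ready: double the sleep time
```

With the defaults, the waits in this sketch run 2, 4, 8, 16, 32 and 64 seconds; after the sixth unsuccessful check the elapsed time (126 seconds) exceeds the 120-second timeout and the function gives up.
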