PDB Blast Search

This module defines functions for blast searching the Protein Data Bank.

class PDBBlastRecord(xml, sequence=None)

A class to store results from blast searches.

Instantiate a PDBBlastRecord object.

Parameters:
  • xml (str) – blast search results in XML format or an XML file that contains the results
  • sequence (str) – query sequence
getBest()

Returns a dictionary containing structure and alignment information for the hit with the highest sequence identity.

getHits(percent_identity=0.0, percent_overlap=0.0, chain=False)

Returns a dictionary in which PDB identifiers are mapped to structure and alignment information.

Parameters:
  • percent_identity (float) – PDB hits with percent sequence identity equal to or higher than this value will be returned, default is 0.0
  • percent_overlap (float) – PDB hits with percent coverage of the query sequence equal to or higher than this value will be returned, default is 0.0
  • chain (bool) – if True, individual chains in a PDB file will be considered as separate hits, default is False
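
The two thresholds described above can be illustrated with a small, self-contained sketch. It does not call ProDy; it only mimics the filtering on a plain dictionary. The helper names filter_hits and best_hit, the dictionary keys "percent_identity" and "percent_overlap", and the PDB identifiers are assumptions made for illustration and may not match the actual structure returned by getHits.

```python
def filter_hits(hits, percent_identity=0.0, percent_overlap=0.0):
    """Keep hits whose identity and overlap are both at or above the thresholds.

    `hits` maps an identifier to a dict of alignment information.
    The key names used here are assumed for this sketch.
    """
    return {
        pdb_id: info
        for pdb_id, info in hits.items()
        if info["percent_identity"] >= percent_identity
        and info["percent_overlap"] >= percent_overlap
    }


def best_hit(hits):
    """Return the (identifier, info) pair with the highest sequence identity."""
    return max(hits.items(), key=lambda item: item[1]["percent_identity"])


# Hypothetical hits; identifiers and numbers are made up for the example.
example_hits = {
    "1abc": {"percent_identity": 98.5, "percent_overlap": 100.0},
    "2xyz": {"percent_identity": 45.0, "percent_overlap": 80.0},
    "3def": {"percent_identity": 91.0, "percent_overlap": 60.0},
}

selected = filter_hits(example_hits, percent_identity=90.0, percent_overlap=70.0)
top_id, top_info = best_hit(example_hits)
```

With the default thresholds of 0.0 every hit passes, which matches the documented defaults; raising either threshold narrows the result.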
getParameters()

Returns parameters used in blast search.

getSequence()

Returns the query sequence that was used in the search.

writeSequences(filename, **kwargs)

Writes the sequences of the hits to the file filename.

Parameters:filename (str) – name of the output file

Arguments of getHits can be passed as kwargs.

blastPDB(sequence, filename=None, **kwargs)

Returns a PDBBlastRecord instance that contains results from blast searching sequence against the PDB using NCBI blastp.

Parameters:
  • sequence (Atomic, Sequence, or str) – an object with an associated sequence string or a sequence string itself
  • filename (str) – a filename to save the results in XML format

The search parameters hitlist_size (default is 250) and expect (default is 1e-10) can be adjusted by the user. The sleep keyword argument (default is 2 seconds) determines how long to wait before reconnecting for results; the sleep time is multiplied by 1.5 each time the results are not ready. The timeout keyword argument (default is 120 seconds) determines when to give up waiting for the results.
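
To make the interaction of sleep and timeout concrete, the following sketch computes the sequence of waits that such a growing-delay schedule would produce. It is an illustration under stated assumptions, not ProDy's implementation: the function name polling_schedule is hypothetical, only time spent sleeping is counted toward the timeout, and any rounding of the sleep time that the library may apply is ignored.

```python
def polling_schedule(sleep=2.0, timeout=120.0, factor=1.5):
    """Return the successive wait times (in seconds) used while polling.

    Each wait is `factor` times the previous one. Polling stops once the
    accumulated waiting time reaches `timeout`. Network time is ignored,
    which is a simplification made for this sketch.
    """
    waits = []
    elapsed = 0.0
    while elapsed < timeout:
        waits.append(sleep)
        elapsed += sleep
        sleep *= factor  # back off: wait longer before the next attempt
    return waits


waits = polling_schedule()
total_wait = sum(waits)
```

Under these assumptions the first waits are 2, 3, and 4.5 seconds. A larger sleep reduces the number of reconnection attempts, while a larger timeout allows more attempts before the search is abandoned.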