PDB Sequence Clusters

This module defines functions for handling PDB sequence clusters.

fetchPDBClusters(sqid=None)

Retrieve PDB sequence clusters. PDB sequence clusters are the results of the weekly clustering of protein chains in the PDB, generated by blastclust. They are available at https://cdn.rcsb.org/resources/sequence/clusters/

This function downloads about 10 MB of data, compresses it, and saves it in the .prody/pdbclusters directory under your home directory. The compressed files are less than 4 MB in size. Cluster data can be loaded using the loadPDBClusters() function and accessed using listPDBCluster().

loadPDBClusters(sqid=None)

Load previously fetched PDB sequence clusters from disk to memory.

listPDBCluster(pdb, ch, sqid=95)

Return the PDB sequence cluster that contains chain ch in structure pdb at sequence identity level sqid. The cluster is returned as a list of tuples, e.g. [('1XXX', 'A'), ]. Note that the PDB clusters individual chains, so the same PDB identifier may appear more than once in the same cluster if the corresponding chain is present in the structure more than once.
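To make the shape of the returned value concrete, here is a minimal, self-contained sketch of a cluster lookup. It is not ProDy's implementation: the helper names parse_clusters and find_cluster are hypothetical, and the input format (one cluster per line, members written as PDBID_CHAIN and separated by whitespace) is an assumption made for illustration. The sample data uses placeholder identifiers.

```python
def parse_clusters(text):
    """Parse cluster text into a list of clusters.

    Assumed format (hypothetical): one cluster per line, members written
    as PDBID_CHAIN and separated by whitespace.
    """
    clusters = []
    for line in text.splitlines():
        members = []
        for token in line.split():
            pdb, _, chain = token.partition('_')
            members.append((pdb.upper(), chain))
        if members:
            clusters.append(members)
    return clusters


def find_cluster(clusters, pdb, ch):
    """Return the cluster (a list of (pdb, chain) tuples) containing the chain, or None."""
    key = (pdb.upper(), ch)
    for members in clusters:
        if key in members:
            return members
    return None


# Placeholder data: 1XXX contributes two chains to the first cluster.
SAMPLE = """\
1XXX_A 1XXX_B 2YYY_A
3ZZZ_A
"""

clusters = parse_clusters(SAMPLE)
print(find_cluster(clusters, '1xxx', 'B'))
# prints [('1XXX', 'A'), ('1XXX', 'B'), ('2YYY', 'A')]
```

The identifier 1XXX appears twice in the result because chains A and B of that structure fall in the same cluster, which is the situation described in the note above.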

Before this function is used, fetchPDBClusters() needs to be called. This function will load the PDB sequence clusters for sqid automatically using loadPDBClusters().
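The fetch-then-list workflow can be sketched as a small wrapper. This is an illustrative sketch, not part of ProDy: the wrapper name similar_chains and its download flag are hypothetical, and it assumes that the three functions documented here are importable from the top-level prody package. Calling it with download=True needs network access.

```python
def similar_chains(pdb, ch, sqid=95, download=False):
    """Return the sequence cluster containing chain `ch` of structure `pdb`.

    Hypothetical convenience wrapper. Set download=True on first use (or to
    refresh the local copy), since listPDBCluster() relies on cluster files
    previously saved by fetchPDBClusters().
    """
    # Imported here so that defining the wrapper does not require ProDy;
    # assumes these names are exported by the top-level prody package.
    from prody import fetchPDBClusters, listPDBCluster

    if download:
        # Downloads the cluster data and stores it compressed on disk.
        fetchPDBClusters(sqid=sqid)

    # listPDBCluster() loads the clusters for sqid automatically.
    return listPDBCluster(pdb, ch, sqid=sqid)
```

A first call might look like similar_chains('1XXX', 'A', download=True), where 1XXX is a placeholder identifier; later calls can leave download at its default and reuse the files already on disk.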