Supporting Functions

This module defines functions for handling conformational ensembles.

saveEnsemble(ensemble, filename=None, **kwargs)

Save ensemble model data as filename.ens.npz. If filename is None, the title of the ensemble is used as the filename, with whitespace in the title replaced by underscores. The extension is .ens.npz. On success, the filename is returned. This function uses numpy.savez().
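The naming rule above can be illustrated with a small sketch. The helper name default_ensemble_filename is hypothetical and is not part of the module; the check that avoids appending the extension twice is an assumption rather than documented behavior.

```python
def default_ensemble_filename(title, filename=None):
    """Hypothetical helper mirroring the naming rule described above."""
    if filename is None:
        # Fall back to the ensemble title, with spaces turned into underscores.
        filename = title.replace(' ', '_')
    # Assumption: the extension is not added a second time if already present.
    if not filename.endswith('.ens.npz'):
        filename += '.ens.npz'
    return filename


print(default_ensemble_filename('p38 X-ray ensemble'))
print(default_ensemble_filename('ignored title', 'my_ensemble'))
```

The first call derives the name from the title; the second shows that an explicit filename takes precedence over the title.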

loadEnsemble(filename)

Returns the ensemble instance loaded from filename. This function uses numpy.load(). See also saveEnsemble().

trimPDBEnsemble(pdb_ensemble, **kwargs)

Returns a new PDB ensemble obtained by trimming the given pdb_ensemble. This function selects atoms in a PDB ensemble based on one of the following criteria and returns them in a new PDBEnsemble instance.

Occupancy

The resulting PDB ensemble will contain atoms whose occupancies are greater than or equal to the occupancy keyword argument. Occupancies for atoms are calculated using calcOccupancies(pdb_ensemble, normed=True).

Parameters: occupancy (float) – occupancy for selecting atoms, must satisfy 0 < occupancy <= 1

calcOccupancies(pdb_ensemble, normed=False)

Returns occupancies calculated from the weights of a PDBEnsemble. Any non-zero weight is treated as one, and the occupancy of each atom is the sum of these binary weights over the conformations in the ensemble. When normed is True, the totals are divided by the number of conformations, which gives the fraction of conformations in which each atom is present. This function can be used to see how many times a residue is resolved when analyzing an ensemble of X-ray structures.
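The calculation, and the way an occupancy threshold selects atoms in trimPDBEnsemble(), can be sketched in plain Python. This is an illustration and not the library code: both function names are hypothetical, the weights are made-up values laid out with one row per conformation, and the normalization divides by the number of conformations so that values fall in the 0 to 1 range that the occupancy threshold expects.

```python
def occupancies_from_weights(weights, normed=False):
    """weights: one row per conformation, one value per atom (assumed layout)."""
    n_confs = len(weights)
    n_atoms = len(weights[0])
    # Any non-zero weight counts as one; sum these binary values per atom.
    counts = [sum(1 for row in weights if row[i] != 0) for i in range(n_atoms)]
    if normed:
        # Assumption: normalize by the number of conformations.
        return [c / n_confs for c in counts]
    return [float(c) for c in counts]


def atoms_meeting_occupancy(weights, occupancy):
    """Indices of atoms whose normalized occupancy is >= the threshold."""
    if not 0 < occupancy <= 1:
        raise ValueError('occupancy must satisfy 0 < occupancy <= 1')
    normed = occupancies_from_weights(weights, normed=True)
    return [i for i, occ in enumerate(normed) if occ >= occupancy]


# Made-up weights: 4 conformations, 3 atoms.
weights = [
    [1.0, 1.0, 0.0],
    [1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
]
print(occupancies_from_weights(weights))               # [4.0, 3.0, 1.0]
print(occupancies_from_weights(weights, normed=True))  # [1.0, 0.75, 0.25]
print(atoms_meeting_occupancy(weights, 0.75))          # [0, 1]
```

Note that the weight of 0.5 still counts as a full observation, because only the distinction between zero and non-zero matters.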

showOccupancies(pdbensemble, *args, **kwargs)

Show occupancies for the PDB ensemble using plot(). Occupancies are calculated using calcOccupancies().

alignPDBEnsemble(ensemble, suffix='_aligned', outdir='.', gzip=False)

Align PDB files using transformations from ensemble, which may be a PDBEnsemble or a PDBConformation instance. The label of the conformation (see getLabel()) is used to determine the PDB structure and model number: the first four characters of the label are expected to be the PDB identifier, and the trailing digits the model number. For example, the transformation from the conformation with label 2k39_ca_selection_'resnum_<_71'_m116 will be applied to the 116th model of structure 2k39. After the applicable transformations are made, the structure is written into outdir as 2k39_aligned.pdb. If gzip is True, output files are compressed. The return value is the output filename or a list of filenames, in the order the files are processed. Note that if multiple models from a file are aligned, that filename appears in the list multiple times.
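The label convention can be made concrete with a short sketch. Both helpers below are hypothetical and only mimic the rules stated above; returning None when a label has no trailing digits and appending '.gz' to compressed output are assumptions, not documented behavior.

```python
import os
import re


def parse_conformation_label(label):
    """Return (pdb_id, model); model is None if the label has no trailing digits."""
    pdb_id = label[:4]
    # Look for trailing digits only after the four-character identifier,
    # so that digits inside the identifier itself are not mistaken for a model.
    match = re.search(r'(\d+)$', label[4:])
    model = int(match.group(1)) if match else None
    return pdb_id, model


def aligned_filename(pdb_id, suffix='_aligned', outdir='.', gzip=False):
    # Assumption: compressed output gets a '.gz' extension appended.
    name = pdb_id + suffix + '.pdb' + ('.gz' if gzip else '')
    return os.path.join(outdir, name)


label = "2k39_ca_selection_'resnum_<_71'_m116"
pdb_id, model = parse_conformation_label(label)
print(pdb_id, model)              # 2k39 116
print(aligned_filename(pdb_id))
```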

calcTree(ensemble, distance_matrix)

Given a distance matrix for an ensemble, creates and returns a tree structure.

Parameters:
  • ensemble (prody.ensemble.Ensemble or prody.ensemble.PDBEnsemble) – an ensemble with labels
  • distance_matrix (numpy.ndarray) – a square matrix whose size equals the length of the ensemble. If the sizes do not match, an error is raised.
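The size requirement on the distance matrix can be sketched as follows. This is not the library's implementation: the helper to_lower_triangle is hypothetical, and the conversion step assumes the tree would be built with Biopython, whose Bio.Phylo.TreeConstruction.DistanceMatrix takes a lower-triangular list of rows including the diagonal.

```python
def to_lower_triangle(labels, distance_matrix):
    """Validate a square matrix against the labels and keep its lower triangle."""
    n = len(labels)
    if len(distance_matrix) != n or any(len(row) != n for row in distance_matrix):
        raise ValueError('distance_matrix must be square and match the number of labels')
    # Row i keeps entries 0..i, i.e. the lower triangle including the diagonal.
    return [list(row[:i + 1]) for i, row in enumerate(distance_matrix)]


# Made-up labels and distances for three conformations.
labels = ['conf_a', 'conf_b', 'conf_c']
matrix = [
    [0.0, 1.5, 2.0],
    [1.5, 0.0, 0.5],
    [2.0, 0.5, 0.0],
]
print(to_lower_triangle(labels, matrix))  # [[0.0], [1.5, 0.0], [2.0, 0.5, 0.0]]
```

Checking the shape before building anything keeps the error close to its cause, which is the behavior the parameter description asks for.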

showTree(tree, **kwargs)

Given a tree, creates a visualization in one of several formats.

Parameters:
  • tree (Bio.Phylo.BaseTree.Tree) – the tree, which needs to be unrooted and should be generated by a tree constructor from Phylo in Biopython
  • format (str) – the form in which the tree is shown. Acceptable formats are plt and ascii.
  • font_size (float) – font size for branch labels
  • line_width (float) – the line width for each branch

buildPDBEnsemble(refpdb, PDBs, title='Unknown', labels=None, seqid=94, coverage=85, occupancy=None, unmapped=None)

Builds a PDB ensemble from a given reference structure and a list of PDB structures. Note that the reference structure should be included in the list as well.

Parameters:
  • refpdb (Chain, Selection, or AtomGroup) – Reference structure
  • PDBs (iterable) – A list of PDB structures
  • title (str) – The title of the ensemble
  • labels (list) – labels of the conformations
  • seqid (int) – Minimal sequence identity (percent)
  • coverage (int) – Minimal sequence overlap (percent)
  • occupancy (float) – Minimal occupancy of columns (range from 0 to 1). Columns whose occupancy is below this value will be trimmed.
  • unmapped (list) – A list of PDB IDs that cannot be included in the ensemble. This is an output argument.
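Because unmapped is an output argument, the caller passes in a list and inspects it after the call. The sketch below shows only that calling pattern together with the seqid and coverage thresholds. The function select_mappable, the identifiers, and the percentages are made up for illustration, and treating both thresholds as inclusive minimums is an assumption.

```python
def select_mappable(candidates, seqid=94, coverage=85, unmapped=None):
    """candidates: (pdb_id, seqid_percent, coverage_percent) tuples (made-up input)."""
    kept = []
    for pdb_id, sid, cov in candidates:
        if sid >= seqid and cov >= coverage:
            kept.append(pdb_id)
        elif unmapped is not None:
            # Output argument: record rejected structures in the caller's list.
            unmapped.append(pdb_id)
    return kept


# Placeholder identifiers and percentages, not real mapping results.
candidates = [('1aaa', 99.0, 97.0), ('2bbb', 60.0, 90.0), ('3ccc', 96.0, 50.0)]
failed = []
kept = select_mappable(candidates, unmapped=failed)
print(kept)    # ['1aaa']
print(failed)  # ['2bbb', '3ccc']
```

Passing a fresh list each time avoids carrying rejected entries over from an earlier call.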