Supporting Functions

This module defines functions for handling conformational ensembles.

saveEnsemble(ensemble, filename=None, **kwargs)[source]

Save ensemble model data as filename.ens.npz. If filename is None, the title of the ensemble is used as the filename, after white spaces in the title are replaced with underscores. The extension is .ens.npz. Upon successful completion, filename is returned. This function makes use of the savez() function.
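The naming rule above can be illustrated with a small sketch. The helper default_ensemble_filename is hypothetical and not part of this module; it assumes that each space in the title becomes one underscore and that the extension is appended only when it is missing.

```python
def default_ensemble_filename(title, filename=None):
    """Hypothetical helper mirroring the naming rule described above."""
    if filename is None:
        # Fall back to the ensemble title, replacing each space with an underscore.
        filename = title.replace(' ', '_')
    # Assumption: the extension is appended only if it is not already present.
    if not filename.endswith('.ens.npz'):
        filename += '.ens.npz'
    return filename


name_from_title = default_ensemble_filename('p38 xray')
name_from_argument = default_ensemble_filename('p38 xray', 'results')
```

Here name_from_title is built from the title and name_from_argument from the explicit filename; both end with .ens.npz.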

loadEnsemble(filename, **kwargs)[source]

Returns the ensemble instance loaded from filename. This function makes use of the load() function. See also saveEnsemble().

trimPDBEnsemble(pdb_ensemble, occupancy=None, **kwargs)[source]

Returns a new PDB ensemble obtained by trimming the given pdb_ensemble. This function helps select atoms in a PDB ensemble based on the following criterion and returns them in a new PDBEnsemble instance.

The resulting PDB ensemble will contain atoms whose occupancies are greater than or equal to the occupancy keyword argument. Occupancies for atoms will be calculated using calcOccupancies(pdb_ensemble, normed=True).
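The selection criterion can be sketched in plain Python. This is only an illustration of the threshold test, not the library's implementation; select_by_occupancy and the sample values are hypothetical, and the occupancies are assumed to be already normalized to the range 0 to 1.

```python
def select_by_occupancy(occupancies, threshold):
    """Return indices of atoms whose normalized occupancy meets the threshold."""
    if not 0 < threshold <= 1:
        raise ValueError('threshold must satisfy 0 < threshold <= 1')
    # Keep an atom when its occupancy is greater than or equal to the threshold.
    return [i for i, occ in enumerate(occupancies) if occ >= threshold]


# Hypothetical normalized occupancies for four atoms.
occupancies = [1.0, 0.75, 0.5, 1.0]
kept = select_by_occupancy(occupancies, 0.75)  # [0, 1, 3]
```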

Parameters:
  • occupancy (float) – occupancy for selecting atoms, must satisfy 0 < occupancy <= 1. If set to None then hard trimming will be performed.
  • hard (bool) – Whether to perform hard trimming. Default is False. If set to True, atoms will be completely removed from pdb_ensemble. If set to False, a soft trimming of pdb_ensemble will be done, where atoms will be removed from the active selection. This is useful, for example, when one uses calcEnsembleENMs() together with sliceModel() or reduceModel() to calculate the modes from the remaining part while still taking the removed part into consideration (e.g. as the environment).

calcOccupancies(pdb_ensemble, normed=False)[source]

Returns occupancy calculated from weights of a PDBEnsemble. Any non-zero weight will be considered equal to one. Occupancies are calculated by summing these binary weights for each atom over the conformations in the ensemble. When normed is True, the totals will be divided by the number of conformations, so that each occupancy falls between 0 and 1. This function can be used to see how many times a residue is resolved when analyzing an ensemble of X-ray structures.
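The calculation can be sketched without the library as follows. The function binary_occupancies and the sample weights are hypothetical. The weights are assumed to be arranged with one row per conformation and one column per atom, and the normalization divides by the number of conformations, an assumption chosen so that the result lies between 0 and 1 as trimPDBEnsemble() expects.

```python
def binary_occupancies(weights, normed=False):
    """Count, per atom, the conformations in which its weight is non-zero."""
    n_confs = len(weights)
    n_atoms = len(weights[0])
    counts = [0.0] * n_atoms
    for row in weights:
        for j, w in enumerate(row):
            # Any non-zero weight counts as one, whatever its magnitude.
            if w != 0:
                counts[j] += 1.0
    if normed:
        # Assumption: normalize by the number of conformations.
        counts = [c / n_confs for c in counts]
    return counts


# Hypothetical weights: four conformations (rows) of three atoms (columns).
weights = [
    [1.0, 1.0, 0.0],
    [1.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 2.0, 0.0],
]
raw = binary_occupancies(weights)                  # [4.0, 3.0, 1.0]
fractions = binary_occupancies(weights, normed=True)  # [1.0, 0.75, 0.25]
```

In this example the third atom is resolved in only one of four conformations, so an occupancy threshold of 0.5 would exclude it.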

showOccupancies(pdbensemble, *args, **kwargs)[source]

Show occupancies for the PDB ensemble using plot(). Occupancies are calculated using calcOccupancies().

alignPDBEnsemble(ensemble, suffix='_aligned', outdir='.', gzip=False)[source]

Align PDB files using transformations from ensemble, which may be a PDBEnsemble or a PDBConformation instance. The label of the conformation (see getLabel()) will be used to determine the PDB structure and model number. The first four characters of the label are expected to be the PDB identifier, and the ending numbers to be the model number. For example, the transformation from the conformation with label 2k39_ca_selection_'resnum_<_71'_m116 will be applied to the 116th model of structure 2k39. After applicable transformations are made, the structure will be written into outdir as 2k39_aligned.pdb. If gzip=True, output files will be compressed. The return value is the output filename or a list of filenames, in the order files are processed. Note that if multiple models from a file are aligned, that filename will appear in the list multiple times.
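The label convention can be illustrated with a short sketch. The helper parse_conformation_label is hypothetical and is not the parsing code used by the library; it assumes that trailing digits are searched for only after the four-character identifier, so that a bare identifier such as 1p38 yields no model number.

```python
import re


def parse_conformation_label(label):
    """Hypothetical helper: split a label into (PDB identifier, model number)."""
    pdb_id = label[:4]
    # Look for trailing digits only in the part after the identifier.
    match = re.search(r'(\d+)$', label[4:])
    model = int(match.group(1)) if match else None
    return pdb_id, model


pdb_id, model = parse_conformation_label("2k39_ca_selection_'resnum_<_71'_m116")
# Output name built from the identifier and the default suffix.
output_name = pdb_id + '_aligned' + '.pdb'
```

With the label from the example above, pdb_id is 2k39, model is 116, and output_name is 2k39_aligned.pdb.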

buildPDBEnsemble(PDBs, ref=None, title='Unknown', labels=None, seqid=94, coverage=85, mapping_func=<function mapOntoChain>, unmapped=None, **kwargs)[source]

Builds a PDB ensemble from a given reference structure and a list of PDB structures. Note that the reference structure should be included in the list as well.
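How the reference is chosen (see the ref parameter) can be sketched as follows. The function resolve_reference is hypothetical and only mirrors the documented rule; plain strings stand in for structure objects.

```python
from numbers import Integral


def resolve_reference(structures, ref=None):
    """Hypothetical helper mirroring the documented rule for choosing a reference."""
    if ref is None:
        # No reference given: the first structure is used.
        return structures[0]
    if isinstance(ref, Integral):
        # An integer is interpreted as an index into the list.
        return structures[ref]
    # Anything else is taken to be the reference structure itself.
    return ref


structures = ['structure_a', 'structure_b', 'structure_c']  # stand-ins
default_ref = resolve_reference(structures)
indexed_ref = resolve_reference(structures, 2)
explicit_ref = resolve_reference(structures, 'structure_b')
```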

Parameters:
  • PDBs (iterable) – A list of PDB structures
  • ref (int, Chain, Selection, or AtomGroup) – Reference structure or the index to the reference in PDBs. If None, then the first item in PDBs will be considered as the reference. Default is None.
  • title (str) – The title of the ensemble
  • labels (list) – labels of the conformations
  • seqid (int) – Minimal sequence identity (percent)
  • coverage (int) – Minimal sequence overlap (percent)
  • occupancy (float) – Minimal occupancy of columns (range from 0 to 1). Columns whose occupancy is below this value will be trimmed.
  • unmapped (list) – A list of PDB IDs that cannot be included in the ensemble. This is an output argument.

addPDBEnsemble(ensemble, PDBs, refpdb=None, labels=None, seqid=94, coverage=85, mapping_func=<function mapOntoChain>, occupancy=None, unmapped=None, **kwargs)[source]

Adds extra structures to a given PDB ensemble.

Parameters:
  • ensemble (PDBEnsemble) – The ensemble to which the PDBs are added.
  • refpdb (Chain, Selection, or AtomGroup) – Reference structure. If set to None, it will be set to ensemble.getAtoms() automatically.
  • PDBs (iterable) – A list of PDB structures
  • title (str) – The title of the ensemble
  • labels (list) – labels of the conformations
  • seqid (int) – Minimal sequence identity (percent)
  • coverage (int) – Minimal sequence overlap (percent)
  • occupancy (float) – Minimal occupancy of columns (range from 0 to 1). Columns whose occupancy is below this value will be trimmed.
  • unmapped (list) – A list of PDB IDs that cannot be included in the ensemble. This is an output argument.

refineEnsemble(ens, lower=0.5, upper=10.0)[source]

Refine a PDB ensemble based on RMSD criteria.