PDB Structure Ensemble

This module defines a class for handling ensembles of PDB conformations.

class PDBEnsemble(title='Unknown')

This class enables handling coordinates for heterogeneous structural datasets and stores identifiers for individual conformations.

See usage in Heterogeneous X-ray Structures, Multimeric Structures, and Homologous Proteins.

Note

This class is designed to handle conformations with missing coordinates, e.g. atoms that are not resolved in an X-ray structure. For unresolved atoms, the coordinates of the reference structure are assumed in RMSD calculations and superpositions.

addCoordset(coords, weights=None, label=None, **kwargs)

Add coordinate set(s) to the ensemble. coords must be a NumPy array with suitable shape and dimensionality, or an object with a getCoordsets() method. weights is optional. If provided, its length must match the number of atoms. Weights of missing (unresolved) atoms must be 0, and weights of resolved atoms may be any value greater than 0. If not provided, the weights of all atoms in this coordinate set are set to 1. label, which may be a PDB identifier or a list of identifiers, is used to label conformations.
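The weight convention above can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that unresolved atoms are marked with NaN coordinates in the input array. The helper name weights_from_coords is hypothetical and is not part of this class.

```python
import numpy as np

def weights_from_coords(coords):
    """Return per-atom weights for one (n_atoms, 3) coordinate set.

    Hypothetical helper: atoms with any NaN coordinate are treated as
    unresolved and get weight 0; all other atoms get weight 1.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("coords must have shape (n_atoms, 3)")
    resolved = ~np.isnan(coords).any(axis=1)
    return resolved.astype(float)

coords = np.array([[0.0, 0.0, 0.0],
                   [np.nan, np.nan, np.nan],  # unresolved atom
                   [1.5, 0.0, 0.0]])
weights = weights_from_coords(coords)
print(weights)  # [1. 0. 1.]
```

An array built this way has one entry per atom, which matches the length requirement stated for weights.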

delCoordset(index)

Delete a coordinate set from the ensemble.

delData(label)

Return data associated with label and remove it from the instance. If data associated with label is not found, return None.

deselect()

Undoes the selection.

getAtoms(selected=True)

Returns associated/selected atoms.

getConformation(index)

Returns conformation at given index.

getCoords(selected=True)

Returns a copy of reference coordinates for selected atoms.

getCoordsets(indices=None, selected=True)

Returns a copy of coordinate set(s) at given indices for selected atoms. indices may be an integer, a list of integers or None. None returns all coordinate sets.

Warning

When there are atoms with weights equal to zero (0), their coordinates will be replaced with the coordinates of the ensemble reference coordinate set.
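The replacement rule in the warning above, together with the accepted forms of indices, can be sketched in plain NumPy. This is an illustration under assumed array layouts, (n_confs, n_atoms, 3) for coordinates and (n_confs, n_atoms) for weights, and not the class's actual implementation. The function name get_coordsets is reused here only for readability.

```python
import numpy as np

def get_coordsets(confs, weights, reference, indices=None):
    """Return copies of coordinate sets with zero-weight atoms filled in.

    confs:     (n_confs, n_atoms, 3) array
    weights:   (n_confs, n_atoms) array, 0 for unresolved atoms
    reference: (n_atoms, 3) array
    indices:   int, list of ints, or None (all coordinate sets)
    """
    if indices is None:
        indices = np.arange(confs.shape[0])
    elif isinstance(indices, (int, np.integer)):
        indices = np.array([indices])
    else:
        indices = np.asarray(indices)
    selected = confs[indices].copy()      # work on a copy; confs stays untouched
    missing = weights[indices] == 0       # (k, n_atoms) boolean mask
    # Replace coordinates of unresolved atoms with the reference coordinates.
    selected[missing] = np.broadcast_to(reference, selected.shape)[missing]
    return selected

reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
confs = np.array([[[0.1, 0.0, 0.0], [9.0, 9.0, 9.0]],   # atom 1 unresolved
                  [[0.2, 0.0, 0.0], [1.1, 0.0, 0.0]]])
weights = np.array([[1.0, 0.0],
                    [1.0, 1.0]])

filled = get_coordsets(confs, weights, reference, indices=0)
all_sets = get_coordsets(confs, weights, reference)
```

In this sketch a single integer index still yields a three-dimensional array with one coordinate set; that shape convention is an assumption of the example.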

getData(label)

Returns a copy of the data array associated with label, or None if such data is not present.

getDataLabels(which=None)

Returns data labels. For which='user', return only labels of user provided data.

getDataType(label)

Returns type of the data (i.e. data.dtype) associated with label, or None if label is not used.

getDefvecs(pairwise=False)

Calculate and return deformation vectors (defvecs). Note that you might need to align the conformations using superpose() or iterpose() before calculating defvecs.

Parameters: pairwise (bool) – if True, return pairwise defvecs as an n-by-n matrix, where n is the number of conformations.

getDeviations()

Returns deviations from reference coordinates for selected atoms. Conformations can be aligned using one of superpose() or iterpose() methods prior to calculating deviations.
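As a rough illustration of deviations and pairwise deformation vectors, the following NumPy sketch subtracts a reference from already-aligned conformations. The array layout and the sign convention of the pairwise result (conformation i minus conformation j) are assumptions made for this example.

```python
import numpy as np

reference = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0]])
confs = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
                  [[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]])

# Broadcasting subtracts the (n_atoms, 3) reference from every conformation.
deviations = confs - reference
print(deviations.shape)  # (2, 2, 3)

# Pairwise differences between conformations, indexed as [i, j, atom, xyz].
pairwise = confs[:, None] - confs[None, :]
```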

getIndices()

Returns a copy of indices of selected columns.

getLabels()

Returns identifiers of the conformations in the ensemble.

getMSA(indices=None, selected=True)

Returns an MSA of selected atoms.

getMSFs()

Calculate and return mean square fluctuations (MSFs). Note that you might need to align the conformations using superpose() or iterpose() before calculating MSFs.
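A minimal sketch of a per-atom MSF calculation that skips unresolved atoms is given below. It assumes aligned coordinates in an (n_confs, n_atoms, 3) array, weights in an (n_confs, n_atoms) array, and finite placeholder coordinates for unresolved atoms. The function name msfs is hypothetical, and the class itself may compute this differently.

```python
import numpy as np

def msfs(confs, weights):
    """Mean square fluctuation per atom over resolved conformations.

    confs:   (n_confs, n_atoms, 3) array of aligned coordinates
    weights: (n_confs, n_atoms) array; zero marks an unresolved atom
    Assumes every atom is resolved in at least one conformation.
    """
    resolved = (weights > 0).astype(float)[:, :, None]   # (n_confs, n_atoms, 1)
    counts = resolved.sum(axis=0)                        # (n_atoms, 1)
    mean = (confs * resolved).sum(axis=0) / counts       # mean over resolved copies
    sq = ((confs - mean) * resolved) ** 2                # squared fluctuations
    return sq.sum(axis=(0, 2)) / counts[:, 0]

confs = np.array([[[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                  [[1.0, 0.0, 0.0], [5.0, 5.0, 5.0]]])   # atom 1 unresolved here
weights = np.array([[1.0, 1.0],
                    [1.0, 0.0]])

msf = msfs(confs, weights)
rmsf = np.sqrt(msf)   # root mean square fluctuations follow directly
```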

getRMSDs(pairwise=False)

Calculate and return root mean square deviations (RMSDs). Note that you might need to align the conformations using superpose() or iterpose() before calculating RMSDs.
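The calculation can be sketched with a weighted RMSD in NumPy, covering both the default and the pairwise case. The function names, the array layouts, and the choice to multiply the two weight vectors in the pairwise case (so that only atoms resolved in both conformations contribute) are assumptions of this sketch.

```python
import numpy as np

def rmsd(a, b, weights=None):
    """Weighted RMSD between two (n_atoms, 3) coordinate sets."""
    if weights is None:
        weights = np.ones(a.shape[0])
    sq_dist = ((a - b) ** 2).sum(axis=1)
    return np.sqrt((weights * sq_dist).sum() / weights.sum())

def rmsds(confs, reference, weights, pairwise=False):
    """RMSDs to the reference, or an n-by-n matrix if pairwise is True."""
    n = confs.shape[0]
    if not pairwise:
        return np.array([rmsd(confs[i], reference, weights[i]) for i in range(n)])
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Only atoms resolved in both conformations contribute.
            w = weights[i] * weights[j]
            matrix[i, j] = matrix[j, i] = rmsd(confs[i], confs[j], w)
    return matrix

reference = np.zeros((2, 3))
confs = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                  [[2.0, 0.0, 0.0], [9.0, 9.0, 9.0]]])   # atom 1 unresolved here
weights = np.array([[1.0, 1.0],
                    [1.0, 0.0]])

to_reference = rmsds(confs, reference, weights)
pair_matrix = rmsds(confs, reference, weights, pairwise=True)
```

The sketch does no superposition itself, so the conformations are taken as already aligned.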

Parameters: pairwise (bool) – if True, return pairwise RMSDs as an n-by-n matrix, where n is the number of conformations.

getRMSFs()

Returns root mean square fluctuations (RMSFs) for selected atoms. Conformations can be aligned using one of superpose() or iterpose() methods prior to RMSF calculation.

getTitle()

Returns title of the ensemble.

getTransformations()

Returns the Transformation instances used to superpose the conformations onto the reference coordinates. Each transformation can be used to superpose the original PDB file onto the reference PDB file.
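To show what applying such a transformation to original coordinates amounts to, here is a small NumPy sketch. It assumes the transformation is available as a 4x4 homogeneous matrix (rotation in the upper-left 3x3 block, translation in the last column). The helper apply_transformation is hypothetical and does not use the Transformation class.

```python
import numpy as np

def apply_transformation(matrix, coords):
    """Apply a 4x4 homogeneous transformation to (n_atoms, 3) coordinates."""
    rotation = matrix[:3, :3]
    translation = matrix[:3, 3]
    return coords @ rotation.T + translation

# 90 degree rotation about the z axis followed by a shift of 1 along x.
matrix = np.array([[0.0, -1.0, 0.0, 1.0],
                   [1.0,  0.0, 0.0, 0.0],
                   [0.0,  0.0, 1.0, 0.0],
                   [0.0,  0.0, 0.0, 1.0]])
coords = np.array([[1.0, 0.0, 0.0]])

moved = apply_transformation(matrix, coords)
```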

getWeights(selected=True)

Returns a copy of weights of selected atoms.

isDataLabel(label)

Returns True if data associated with label is present.

isSelected()

Returns True if a subset of atoms is selected.

iterCoordsets()

Iterate over coordinate sets. A copy of each coordinate set for selected atoms is returned. Reference coordinates are not included.

iterpose(rmsd=0.0001)

Iteratively superpose the ensemble until convergence. Initially, all conformations are aligned with the reference coordinates. Then mean coordinates are calculated and set as the new reference coordinates. This is repeated until the reference coordinates no longer change, as determined by the RMSD between the new and old reference coordinates. Note that at the end of the iterative procedure the reference coordinate set will be the average of the conformations in the ensemble.
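The iterative procedure can be sketched in NumPy using the Kabsch algorithm for each superposition. This is an unweighted illustration that ignores missing atoms and is not the class's implementation. The helper kabsch_superpose and the max_steps safety cap are additions made for this sketch.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Return mobile optimally rotated and translated onto target."""
    target_center = target.mean(axis=0)
    m = mobile - mobile.mean(axis=0)
    t = target - target_center
    u, _, vt = np.linalg.svd(m.T @ t)
    # Flip the last axis if needed so the result is a proper rotation.
    d = -1.0 if np.linalg.det(u @ vt) < 0 else 1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return m @ rotation + target_center

def iterpose(confs, reference, rmsd=0.0001, max_steps=100):
    """Iteratively superpose confs; return (aligned, final_reference)."""
    confs = np.array(confs, dtype=float)
    reference = np.array(reference, dtype=float)
    for _ in range(max_steps):
        confs = np.array([kabsch_superpose(c, reference) for c in confs])
        new_reference = confs.mean(axis=0)
        # RMSD between the old and the new reference coordinates.
        change = np.sqrt(((new_reference - reference) ** 2).sum(axis=1).mean())
        reference = new_reference
        if change <= rmsd:
            break
    return confs, reference

rng = np.random.default_rng(0)
base = rng.normal(size=(6, 3))
rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
# Three rigid copies of the same structure in different poses.
confs = np.array([base,
                  base @ rot_z.T + np.array([3.0, 0.0, 0.0]),
                  base + np.array([0.0, -2.0, 1.0])])

aligned, final_reference = iterpose(confs, reference=confs[0])
```

Because the three inputs are rigid copies of one structure, the aligned conformations coincide and the final reference equals their mean.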

Parameters: rmsd (float) – change in reference coordinates used to determine convergence; default is 0.0001 Å RMSD

numAtoms(selected=True)

Returns number of atoms.

numConfs()

Returns number of conformations.

numCoordsets()

Returns number of conformations.

numSelected()

Returns number of selected atoms. Number of all atoms will be returned if a selection is not made. A subset of atoms can be selected by passing a selection to setAtoms().

select(selection)

Selects columns corresponding to a part of the atoms.

setAtoms(atoms)

Set atoms or specify a selection of atoms to be considered in calculations and coordinate requests. When a selection is set, the corresponding subset of coordinates will be considered in, for example, alignments and RMSD calculations. Setting atoms also allows some functions to access atomic data when needed. For example, Ensemble and Conformation instances become suitable arguments for writePDB(). Passing None as the atoms argument will deselect atoms.
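Conceptually, a selection restricts calculations to a subset of columns along the atom axis. The NumPy sketch below illustrates that idea with a hypothetical index array and an assumed (n_confs, n_atoms, 3) layout. It does not use the selection machinery of this class.

```python
import numpy as np

# 2 conformations, 4 atoms, 3 coordinates each.
confs = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
indices = np.array([0, 2])      # hypothetical selection: atoms 0 and 2

# All conformations, selected atoms only.
selected = confs[:, indices]
print(selected.shape)  # (2, 2, 3)
```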

setCoords(coords)

Set coords as the ensemble reference coordinate set. coords may be an array with suitable data type, shape, and dimensionality, or an object with getCoords() method.

setData(label, data)

Store atomic data under label, which must:

  • start with a letter
  • contain only alphanumeric characters and underscore
  • not be a reserved word (see listReservedWords())

data must be a list or an ndarray, and its length must be equal to the number of atoms. If the data array is one-dimensional, i.e. data.ndim==1, label may be used to make atom selections, e.g. "label 1 to 10" or "label C1 C2". Note that if data is already stored under label, it will be overwritten.
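The three label rules can be checked with a short sketch like the one below. The reserved-word set is a placeholder for illustration only, since the real list comes from listReservedWords(). Restricting labels to ASCII letters and digits is also an assumption of this example, and is_valid_label is a hypothetical helper.

```python
import re

# Placeholder reserved words for illustration only.
RESERVED = {"and", "or", "not"}

# A letter first, then letters, digits, or underscores.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_label(label, reserved=RESERVED):
    """Check the three label rules: first letter, allowed characters, not reserved."""
    return (isinstance(label, str)
            and LABEL_PATTERN.fullmatch(label) is not None
            and label not in reserved)

checks = {name: is_valid_label(name)
          for name in ["flexibility_1", "1abc", "my-label", "and"]}
```

Only the first of the four example labels passes: the others start with a digit, contain a hyphen, or collide with a placeholder reserved word.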

setTitle(title)

Set title of the ensemble.

setWeights(weights)

Set atomic weights.

superpose(**kwargs)

Superpose the ensemble onto the reference coordinates obtained by getCoords().