Principal Component Analysis

This module defines classes for principal component analysis (PCA) and essential dynamics analysis (EDA) calculations.

class PCA(name='Unknown')

A class for Principal Component Analysis (PCA) of conformational ensembles. See examples in Ensemble Analysis.

addEigenpair(eigenvector, eigenvalue=None)

Add eigenvector and eigenvalue pair(s) to the instance. If the eigenvalue is omitted, it is set to 1. Eigenvalues are stored as variances.

buildCovariance(coordsets, **kwargs)

Build a covariance matrix for coordsets using mean coordinates as the reference. Supported coordsets types include ensemble and trajectory objects, which are handled as described below.

For ensemble and trajectory objects, pass update_coords=True to set the mean coordinates as the coordinates of the object.

When coordsets is a trajectory object, such as DCDFile, the covariance is built by superposing frames onto the reference coordinate set (see Frame.superpose()). If the frames are already aligned, pass aligned=True to skip this step.
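The covariance calculation described above can be sketched with NumPy alone. This is a minimal illustration for already aligned coordinates, not the library's own code: the function name build_covariance is hypothetical, and normalizing by the number of coordinate sets is an assumption.

```python
import numpy as np

def build_covariance(coordsets):
    """Covariance of (n_csets, n_atoms, 3) coordinates about their mean.

    Hypothetical helper for illustration; assumes pre-aligned coordinates.
    """
    coordsets = np.asarray(coordsets, dtype=float)
    n_csets, n_atoms, _ = coordsets.shape
    # Flatten each coordinate set to a vector of length 3 * n_atoms.
    flat = coordsets.reshape(n_csets, n_atoms * 3)
    # Deviations from the mean coordinates, which act as the reference.
    dev = flat - flat.mean(axis=0)
    # Assumption: normalize by the number of coordinate sets.
    return dev.T @ dev / n_csets

rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 4, 3))  # 10 synthetic coordinate sets, 4 atoms
cov = build_covariance(coords)
print(cov.shape)  # (12, 12)
```

The result is a symmetric (3 * n_atoms) x (3 * n_atoms) matrix in which each 3 x 3 block couples one pair of atoms.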

Note

If coordsets is a PDBEnsemble instance, coordinates are treated specially. Let C_ij be the super element of the covariance matrix that corresponds to atoms i and j. This super element is divided by the number of coordinate sets (PDB models or structures) in which both atoms are observed together.
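The pairwise normalization in this note can be illustrated as follows. This is a sketch under stated assumptions rather than the library's implementation: the function name build_weighted_covariance and the 0/1 observed array are hypothetical, and every atom is assumed to be observed in at least one coordinate set.

```python
import numpy as np

def build_weighted_covariance(coordsets, observed):
    """Covariance with per-pair normalization for missing atoms.

    coordsets: (n_csets, n_atoms, 3) array of aligned coordinates.
    observed:  (n_csets, n_atoms) array of 0/1 flags (hypothetical input).
    """
    coordsets = np.asarray(coordsets, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_csets, n_atoms, _ = coordsets.shape
    # Expand per-atom flags to per-coordinate flags (x, y, z of each atom).
    w = np.repeat(observed, 3, axis=1)
    flat = coordsets.reshape(n_csets, n_atoms * 3)
    # Mean over the coordinate sets in which each atom is observed.
    mean = (flat * w).sum(axis=0) / w.sum(axis=0)
    # Missing atoms contribute zero deviation.
    dev = (flat - mean) * w
    # counts[a, b]: number of sets in which coordinates a and b are both observed.
    counts = w.T @ w
    counts[counts == 0] = 1.0  # guard against pairs that never co-occur
    return dev.T @ dev / counts

rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 4, 3))
observed = np.ones((10, 4))
observed[0, 1] = 0  # atom 1 missing in set 0
observed[3, 2] = 0  # atom 2 missing in set 3
cov_missing = build_weighted_covariance(coords, observed)
cov_complete = build_weighted_covariance(coords, np.ones((10, 4)))
```

When every atom is observed in every set, all counts equal the number of coordinate sets and the result reduces to the ordinary covariance matrix.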

calcModes(n_modes=20, turbo=True)

Calculate principal (or essential) modes. This method uses the scipy.linalg.eigh() or numpy.linalg.eigh() function to diagonalize the covariance matrix.
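As a rough illustration of the diagonalization step, the sketch below uses numpy.linalg.eigh() and keeps the modes with the largest non-zero eigenvalues. The function name calc_modes and the zero_tol cutoff are assumptions made for this example, not part of the documented API.

```python
import numpy as np

def calc_modes(cov, n_modes=20, zero_tol=1e-8):
    """Return the leading non-zero eigenvalues and eigenvectors of cov.

    Hypothetical helper; zero_tol is an assumed cutoff for "non-zero".
    """
    values, vectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Reorder so that the mode with the largest variance comes first.
    values, vectors = values[::-1], vectors[:, ::-1]
    # Discard modes whose eigenvalues are numerically zero.
    keep = values > zero_tol
    values, vectors = values[keep], vectors[:, keep]
    # None or 'all' keeps every non-zero mode.
    if n_modes is not None and n_modes != 'all':
        values, vectors = values[:n_modes], vectors[:, :n_modes]
    return values, vectors

rng = np.random.default_rng(1)
dev = rng.normal(size=(6, 12))
dev -= dev.mean(axis=0)              # 6 centered samples in 12 dimensions
cov = dev.T @ dev / dev.shape[0]

top_values, top_vectors = calc_modes(cov, n_modes=3)
all_values, all_vectors = calc_modes(cov, n_modes=None)
```

Because six centered samples span at most five directions, asking for all modes here returns five of them rather than twelve.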

Parameters:
  • n_modes (int) – number of non-zero eigenvalues/vectors to calculate; default is 20. If None or 'all' is given, all modes are calculated.
  • turbo (bool) – when available, use a memory-intensive but faster way to calculate modes; default is True.

getArray()

Returns a copy of eigenvectors array.

getCovariance()

Returns the covariance matrix. If the covariance matrix has not been set or calculated yet, it is calculated from the available modes.
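Rebuilding a covariance matrix from modes amounts to summing variance-weighted outer products of the eigenvectors. The sketch below shows that relation in NumPy; covariance_from_modes is a hypothetical name, and the code is an illustration rather than the method's actual implementation.

```python
import numpy as np

def covariance_from_modes(eigvecs, variances):
    """Rebuild a covariance matrix as V diag(variances) V^T.

    eigvecs: (n_entries, n_modes) array, one mode per column.
    variances: (n_modes,) array of variances along the modes.
    """
    eigvecs = np.asarray(eigvecs, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Scale each column by its variance, then project back.
    return (eigvecs * variances) @ eigvecs.T

rng = np.random.default_rng(4)
a = rng.normal(size=(9, 9))
cov = a @ a.T / 9                      # synthetic symmetric matrix
values, vectors = np.linalg.eigh(cov)  # ascending eigenvalues

full = covariance_from_modes(vectors, values)
# Using only the three largest modes gives a low-rank approximation.
partial = covariance_from_modes(vectors[:, -3:], values[-3:])
```

With all modes the original matrix is recovered; with a subset, only the variance carried by those modes is retained.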

getEigvals()

Returns eigenvalues. For PCA and EDA models built using coordinate data in Å, the unit of eigenvalues is Å². For ANM, GNM, and RTB, on the other hand, eigenvalues are in arbitrary or relative units, but they correlate with the stiffness of the motion along the associated eigenvector.

getEigvecs()

Returns a copy of eigenvectors array.

getModel()

Returns self.

getTitle()

Returns title of the model.

getVariances()

Returns variances. For PCA and EDA models built using coordinate data in Å, the unit of variance is Å². For ANM, GNM, and RTB, on the other hand, variance is the inverse of the eigenvalue, so it has arbitrary or relative units.
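The relation between eigenvalues and variances stated above can be written out in a few lines. The helper name variances_from_eigvals and its model argument are hypothetical and serve only to contrast the two cases.

```python
import numpy as np

def variances_from_eigvals(eigvals, model='pca'):
    """Map eigenvalues to variances (hypothetical helper).

    For 'pca' and 'eda' the eigenvalues are the variances themselves;
    for elastic network models the variance is the inverse eigenvalue.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    if model in ('pca', 'eda'):
        return eigvals.copy()
    return 1.0 / eigvals

pca_var = variances_from_eigvals([4.0, 2.0], model='pca')
anm_var = variances_from_eigvals([4.0, 2.0], model='anm')
# Fraction of the total variance carried by each mode.
pca_fraction = pca_var / pca_var.sum()
```

For PCA the first mode has the largest variance, whereas in the elastic network case the mode with the smallest eigenvalue does.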

is3d()

Returns True if model is 3-dimensional.

numAtoms()

Returns number of atoms.

numDOF()

Returns number of degrees of freedom.

numEntries()

Returns number of entries in one eigenvector.

numModes()

Returns number of modes in the instance (not necessarily maximum number of possible modes).

performSVD(coordsets)

Calculate principal modes using singular value decomposition (SVD). The coordsets argument may be an Atomic, Ensemble, or numpy.ndarray instance. If coordsets is a numpy array, its shape must be (n_csets, n_atoms, 3). Note that coordinate sets must be aligned prior to SVD calculations.

This is a considerably faster way of performing PCA calculations than eigenvalue decomposition of the covariance matrix, but it is an approximate method when heterogeneous datasets are analyzed. The covariance method should be preferred for analysis of ensembles with missing atomic data. See the Calculations example for a comparison of results from the SVD and covariance methods.
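For complete data (no missing atoms), the SVD and covariance routes give the same modes. The following NumPy sketch checks this on synthetic, pre-aligned coordinates; dividing the squared singular values by the number of coordinate sets is an assumption chosen to match a covariance normalized the same way.

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.normal(size=(8, 5, 3))   # 8 synthetic coordinate sets, 5 atoms
n_csets = coords.shape[0]

# Deviations from the mean, flattened to (n_csets, 3 * n_atoms).
dev = coords.reshape(n_csets, -1)
dev = dev - dev.mean(axis=0)

# SVD route: squared singular values give the variances along the modes.
_, s, vt = np.linalg.svd(dev, full_matrices=False)
svd_variances = s ** 2 / n_csets

# Covariance route: eigenvalues of the covariance matrix, largest first.
cov = dev.T @ dev / n_csets
eig_values, eig_vectors = np.linalg.eigh(cov)
eig_variances = eig_values[::-1][:len(s)]

# The first modes agree up to an arbitrary sign.
overlap = abs(vt[0] @ eig_vectors[:, -1])
```

The SVD route never forms the (3 * n_atoms) x (3 * n_atoms) covariance matrix, which is where the speed advantage comes from when there are many atoms and few coordinate sets.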

setCovariance(covariance, is3d=True)

Set covariance matrix.

setEigens(vectors, values=None)

Set eigenvectors and eigenvalues. If eigenvalues are omitted, they are set to 1. Eigenvalues are stored as variances.

setTitle(title)

Set title of the model.

class EDA(name='Unknown')

A class for Essential Dynamics Analysis (EDA) [AA93]. See examples in Essential Dynamics Analysis.

[AA93] Amadei A, Linssen AB, Berendsen HJ. Essential dynamics of proteins. Proteins 1993 17(4):412-25.

addEigenpair(eigenvector, eigenvalue=None)

Add eigenvector and eigenvalue pair(s) to the instance. If the eigenvalue is omitted, it is set to 1. Eigenvalues are stored as variances.

buildCovariance(coordsets, **kwargs)

Build a covariance matrix for coordsets using mean coordinates as the reference. Supported coordsets types include ensemble and trajectory objects, which are handled as described below.

For ensemble and trajectory objects, pass update_coords=True to set the mean coordinates as the coordinates of the object.

When coordsets is a trajectory object, such as DCDFile, the covariance is built by superposing frames onto the reference coordinate set (see Frame.superpose()). If the frames are already aligned, pass aligned=True to skip this step.
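Superposing a frame onto a reference is a least-squares rigid-body fit. The sketch below uses the Kabsch algorithm in NumPy as one common way to do this; it is not taken from Frame.superpose(), whose internals are not described here, and the function name superpose is hypothetical.

```python
import numpy as np

def superpose(mobile, target):
    """Least-squares fit of mobile onto target, both (n_atoms, 3) arrays.

    Kabsch algorithm, used here as an assumed stand-in for frame superposition.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_center = mobile.mean(axis=0)
    tar_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tar_center
    # Cross-covariance between the centered coordinate sets.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile coordinates and move them onto the target.
    return p @ rotation.T + tar_center

rng = np.random.default_rng(3)
target = rng.normal(size=(6, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
# A rotated and translated copy of the target plays the role of a frame.
mobile = target @ rot_z.T + np.array([1.0, -2.0, 0.5])
fitted = superpose(mobile, target)
```

After the fit, the remaining deviations from the reference reflect internal motion only, which is what the covariance matrix is meant to capture.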

Note

If coordsets is a PDBEnsemble instance, coordinates are treated specially. Let C_ij be the super element of the covariance matrix that corresponds to atoms i and j. This super element is divided by the number of coordinate sets (PDB models or structures) in which both atoms are observed together.

calcModes(n_modes=20, turbo=True)

Calculate principal (or essential) modes. This method uses the scipy.linalg.eigh() or numpy.linalg.eigh() function to diagonalize the covariance matrix.

Parameters:
  • n_modes (int) – number of non-zero eigenvalues/vectors to calculate; default is 20. If None or 'all' is given, all modes are calculated.
  • turbo (bool) – when available, use a memory-intensive but faster way to calculate modes; default is True.

getArray()

Returns a copy of eigenvectors array.

getCovariance()

Returns the covariance matrix. If the covariance matrix has not been set or calculated yet, it is calculated from the available modes.

getEigvals()

Returns eigenvalues. For PCA and EDA models built using coordinate data in Å, the unit of eigenvalues is Å². For ANM, GNM, and RTB, on the other hand, eigenvalues are in arbitrary or relative units, but they correlate with the stiffness of the motion along the associated eigenvector.

getEigvecs()

Returns a copy of eigenvectors array.

getModel()

Returns self.

getTitle()

Returns title of the model.

getVariances()

Returns variances. For PCA and EDA models built using coordinate data in Å, the unit of variance is Å². For ANM, GNM, and RTB, on the other hand, variance is the inverse of the eigenvalue, so it has arbitrary or relative units.

is3d()

Returns True if model is 3-dimensional.

numAtoms()

Returns number of atoms.

numDOF()

Returns number of degrees of freedom.

numEntries()

Returns number of entries in one eigenvector.

numModes()

Returns number of modes in the instance (not necessarily maximum number of possible modes).

performSVD(coordsets)

Calculate principal modes using singular value decomposition (SVD). The coordsets argument may be an Atomic, Ensemble, or numpy.ndarray instance. If coordsets is a numpy array, its shape must be (n_csets, n_atoms, 3). Note that coordinate sets must be aligned prior to SVD calculations.

This is a considerably faster way of performing PCA calculations than eigenvalue decomposition of the covariance matrix, but it is an approximate method when heterogeneous datasets are analyzed. The covariance method should be preferred for analysis of ensembles with missing atomic data. See the Calculations example for a comparison of results from the SVD and covariance methods.

setCovariance(covariance, is3d=True)

Set covariance matrix.

setEigens(vectors, values=None)

Set eigenvectors and eigenvalues. If eigenvalues are omitted, they are set to 1. Eigenvalues are stored as variances.

setTitle(title)

Set title of the model.