Principal Component Analysis
This module defines classes for principal component analysis (PCA) and essential dynamics analysis (EDA) calculations.
-
class
PCA
(name='Unknown') A class for Principal Component Analysis (PCA) of conformational ensembles. See examples in Ensemble Analysis.
-
addEigenpair
(eigenvector, eigenvalue=None) Add eigenvector and eigenvalue pair(s) to the instance. If the eigenvalue is omitted, it is set to 1. Eigenvalues are set as variances.
-
buildCovariance
(coordsets, **kwargs) Build a covariance matrix for coordsets using mean coordinates as the reference. coordsets argument may be one of the following:
For ensemble and trajectory objects, the update_coords=True argument can be used to set the mean coordinates as the coordinates of the object.
When coordsets is a trajectory object, such as DCDFile, the covariance is built by superposing frames onto the reference coordinate set (see Frame.superpose()). If frames are already aligned, use the aligned=True argument to skip this step.
Note
If coordsets is a PDBEnsemble instance, coordinates are treated specially. Let C_ij be the element of the covariance matrix that corresponds to atoms i and j. This super element is divided by the number of coordinate sets (PDB models or structures) in which both of these atoms are observed together.
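The covariance construction described for buildCovariance can be sketched with plain NumPy. This is only an illustration, not the library's implementation: the helper name `build_covariance` is hypothetical, the input is assumed to be an already aligned array of shape (n_csets, n_atoms, 3), and dividing by the number of coordinate sets (rather than n_csets - 1) is an assumption.

```python
import numpy as np

def build_covariance(coordsets):
    """Covariance of flattened coordinates about the mean structure.

    Hypothetical helper; coordsets has shape (n_csets, n_atoms, 3)
    and is assumed to be aligned already.
    """
    coordsets = np.asarray(coordsets, dtype=float)
    n_csets, n_atoms, _ = coordsets.shape
    # Mean coordinates serve as the reference structure.
    mean = coordsets.mean(axis=0)
    # One row of 3 * n_atoms deviations per coordinate set.
    deviations = (coordsets - mean).reshape(n_csets, n_atoms * 3)
    # Assumption: normalize by n_csets (population covariance).
    covariance = deviations.T @ deviations / n_csets
    return covariance, mean

rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 4, 3))   # synthetic data, 50 sets of 4 atoms
cov, mean = build_covariance(coords)
```

The result is a symmetric (3 * n_atoms) x (3 * n_atoms) matrix, and `mean` is what an option such as update_coords=True would store as the object's coordinates.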
-
calcModes
(n_modes=20, turbo=True) Calculate principal (or essential) modes. This method uses the scipy.linalg.eigh() or numpy.linalg.eigh() function to diagonalize the covariance matrix.
Parameters: n_modes (default 20) and turbo (default True).
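As a rough illustration of the diagonalization step that calcModes describes, the sketch below applies numpy.linalg.eigh() to a small synthetic covariance matrix and keeps the n_modes eigenpairs with the largest eigenvalues. The helper name `calc_modes` and the test data are hypothetical, and the behavior of the turbo option is not modeled.

```python
import numpy as np

def calc_modes(covariance, n_modes=20):
    """Return the n_modes largest eigenvalues and their eigenvectors.

    Hypothetical helper, not the library's code.
    """
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    values, vectors = np.linalg.eigh(covariance)
    # Reorder so the largest-variance mode comes first.
    order = np.argsort(values)[::-1][:n_modes]
    return values[order], vectors[:, order]

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 6))
data -= data.mean(axis=0)
cov = data.T @ data / len(data)

variances, modes = calc_modes(cov, n_modes=3)
```

Each column of `modes` is one eigenvector, so `cov @ modes` equals `modes * variances` up to rounding.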
-
getArray
() Returns a copy of the eigenvectors array.
-
getCovariance
() Returns the covariance matrix. If the covariance matrix has not been set or calculated yet, it is calculated using the available modes.
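Rebuilding a covariance matrix from modes follows from the eigendecomposition C = V diag(λ) Vᵀ. The sketch below is a minimal NumPy illustration under that assumption; `covariance_from_modes` is a hypothetical name, and the library may compute this differently.

```python
import numpy as np

def covariance_from_modes(eigvecs, variances):
    """Rebuild a covariance matrix as V diag(variances) V^T.

    Hypothetical helper; eigvecs holds one mode per column.
    """
    eigvecs = np.asarray(eigvecs, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Scaling the columns by the variances is V @ diag(variances).
    return (eigvecs * variances) @ eigvecs.T

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 5))
data -= data.mean(axis=0)
cov = data.T @ data / len(data)
values, vectors = np.linalg.eigh(cov)

# All modes reproduce the covariance matrix.
full = covariance_from_modes(vectors, values)
# Only the two largest modes give a rank-2 approximation.
partial = covariance_from_modes(vectors[:, -2:], values[-2:])
```

With fewer modes than degrees of freedom, the result is a low-rank approximation rather than the original matrix.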
-
getEigvals
() Returns eigenvalues. For PCA and EDA models built using coordinate data in Å, the unit of eigenvalues is Å². For ANM, GNM, and RTB, on the other hand, eigenvalues are in arbitrary or relative units, but they correlate with the stiffness of the motion along the associated eigenvector.
-
getEigvecs
() Returns a copy of the eigenvectors array.
-
getModel
() Returns self.
-
getTitle
() Returns the title of the model.
-
getVariances
() Returns variances. For PCA and EDA models built using coordinate data in Å, the unit of variance is Å². For ANM, GNM, and RTB, on the other hand, variance is the inverse of the eigenvalue, so it has arbitrary or relative units.
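The two conventions for variances can be shown with a few made-up numbers. The eigenvalues below are hypothetical and serve only to illustrate the relations stated above: for PCA and EDA the variance equals the eigenvalue, while for ANM, GNM, and RTB it is the inverse of the eigenvalue.

```python
import numpy as np

# Hypothetical PCA/EDA eigenvalues in Å^2: variance equals eigenvalue.
pca_eigvals = np.array([4.0, 2.0, 1.0, 1.0])
pca_variances = pca_eigvals.copy()
# Share of the variance carried by each of the listed modes.
fraction = pca_variances / pca_variances.sum()

# Hypothetical ANM-type eigenvalues in arbitrary units:
# variance is the inverse of the eigenvalue, so stiffer modes vary less.
anm_eigvals = np.array([0.5, 2.0, 4.0])
anm_variances = 1.0 / anm_eigvals
```

Note that `fraction` is relative to the modes listed here, not to the total variance of a full model.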
-
is3d
() Returns True if the model is 3-dimensional.
-
numAtoms
() Returns the number of atoms.
-
numDOF
() Returns the number of degrees of freedom.
-
numEntries
() Returns the number of entries in one eigenvector.
-
numModes
() Returns the number of modes in the instance (not necessarily the maximum number of possible modes).
-
performSVD
(coordsets) Calculate principal modes using singular value decomposition (SVD). coordsets argument may be an Atomic, Ensemble, or numpy.ndarray instance. If coordsets is a numpy array, its shape must be (n_csets, n_atoms, 3). Note that coordinate sets must be aligned prior to SVD calculations.
This is a considerably faster way of performing PCA calculations compared to eigenvalue decomposition of the covariance matrix, but it is an approximate method when heterogeneous datasets are analyzed. The covariance method should be preferred over this one for analysis of ensembles with missing atomic data. See the Calculations example for a comparison of results from the SVD and covariance methods.
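For complete data (no missing atoms), the SVD route and the covariance route give the same variances and mode directions. The sketch below checks this on synthetic, hypothetical data with plain NumPy; normalizing by the number of coordinate sets is an assumption, and the code is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
coordsets = rng.normal(size=(30, 5, 3))   # assumed aligned, no missing atoms
n_csets = coordsets.shape[0]

# Deviations from the mean structure, one flattened row per coordinate set.
dev = (coordsets - coordsets.mean(axis=0)).reshape(n_csets, -1)

# Covariance route: diagonalize the (3N x 3N) covariance matrix.
cov = dev.T @ dev / n_csets
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals = eigvals[::-1]            # largest first
top_vec = eigvecs[:, -1]           # eigenvector of the largest eigenvalue

# SVD route: decompose the deviation matrix directly.
_, singular, vt = np.linalg.svd(dev, full_matrices=False)
svd_variances = singular ** 2 / n_csets
```

The squared singular values divided by n_csets match the covariance eigenvalues, and the first right singular vector matches the top eigenvector up to sign.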
-
setEigens
(vectors, values=None) Set eigenvectors and eigenvalues. If eigenvalues are omitted, they are set to 1. Eigenvalues are set as variances.
-
setTitle
(title) Set the title of the model.
-
-
class
EDA
(name='Unknown') A class for Essential Dynamics Analysis (EDA) [AA93]. See examples in Essential Dynamics Analysis.
[AA93] Amadei A, Linssen AB, Berendsen HJ. Essential dynamics of proteins. Proteins 1993 17(4):412-25.
-
addEigenpair
(eigenvector, eigenvalue=None) Add eigenvector and eigenvalue pair(s) to the instance. If the eigenvalue is omitted, it is set to 1. Eigenvalues are set as variances.
-
buildCovariance
(coordsets, **kwargs) Build a covariance matrix for coordsets using mean coordinates as the reference. coordsets argument may be one of the following:
For ensemble and trajectory objects, the update_coords=True argument can be used to set the mean coordinates as the coordinates of the object.
When coordsets is a trajectory object, such as DCDFile, the covariance is built by superposing frames onto the reference coordinate set (see Frame.superpose()). If frames are already aligned, use the aligned=True argument to skip this step.
Note
If coordsets is a PDBEnsemble instance, coordinates are treated specially. Let C_ij be the element of the covariance matrix that corresponds to atoms i and j. This super element is divided by the number of coordinate sets (PDB models or structures) in which both of these atoms are observed together.
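The pair-count normalization described in the note can be sketched as follows. This is a hedged NumPy illustration, not the library's code: the helper name `masked_covariance` and the boolean `observed` array of shape (n_csets, n_atoms) are hypothetical, and the sketch assumes every pair of atoms is observed together in at least one coordinate set.

```python
import numpy as np

def masked_covariance(coordsets, observed):
    """Covariance where each 3x3 block C_ij is divided by the number
    of coordinate sets in which atoms i and j are both observed.

    Hypothetical helper. Assumes every atom pair co-occurs at least once.
    """
    coordsets = np.asarray(coordsets, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    n_csets, n_atoms, _ = coordsets.shape
    mask = observed[:, :, None]

    # Per-atom mean over the sets in which the atom is observed.
    counts = observed.sum(axis=0)[:, None]
    mean = np.where(mask, coordsets, 0.0).sum(axis=0) / counts

    # Missing atoms contribute zero deviation.
    dev = np.where(mask, coordsets - mean, 0.0).reshape(n_csets, n_atoms * 3)
    summed = dev.T @ dev

    # pair_counts[i, j]: number of sets with both atoms i and j observed.
    obs = observed.astype(float)
    pair_counts = obs.T @ obs
    # Expand each count to cover the matching 3x3 block.
    divisor = np.kron(pair_counts, np.ones((3, 3)))
    return summed / divisor

rng = np.random.default_rng(4)
coords = rng.normal(size=(6, 3, 3))
observed = np.ones((6, 3), dtype=bool)
observed[:2, 0] = False            # atom 0 is missing in the first two sets
cov = masked_covariance(coords, observed)
```

When every atom is observed in every set, this reduces to the ordinary covariance about the mean.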
-
calcModes
(n_modes=20, turbo=True) Calculate principal (or essential) modes. This method uses the scipy.linalg.eigh() or numpy.linalg.eigh() function to diagonalize the covariance matrix.
Parameters: n_modes (default 20) and turbo (default True).
-
getArray
() Returns a copy of the eigenvectors array.
-
getCovariance
() Returns the covariance matrix. If the covariance matrix has not been set or calculated yet, it is calculated using the available modes.
-
getEigvals
() Returns eigenvalues. For PCA and EDA models built using coordinate data in Å, the unit of eigenvalues is Å². For ANM, GNM, and RTB, on the other hand, eigenvalues are in arbitrary or relative units, but they correlate with the stiffness of the motion along the associated eigenvector.
-
getEigvecs
() Returns a copy of the eigenvectors array.
-
getModel
() Returns self.
-
getTitle
() Returns the title of the model.
-
getVariances
() Returns variances. For PCA and EDA models built using coordinate data in Å, the unit of variance is Å². For ANM, GNM, and RTB, on the other hand, variance is the inverse of the eigenvalue, so it has arbitrary or relative units.
-
is3d
() Returns True if the model is 3-dimensional.
-
numAtoms
() Returns the number of atoms.
-
numDOF
() Returns the number of degrees of freedom.
-
numEntries
() Returns the number of entries in one eigenvector.
-
numModes
() Returns the number of modes in the instance (not necessarily the maximum number of possible modes).
-
performSVD
(coordsets) Calculate principal modes using singular value decomposition (SVD). coordsets argument may be an Atomic, Ensemble, or numpy.ndarray instance. If coordsets is a numpy array, its shape must be (n_csets, n_atoms, 3). Note that coordinate sets must be aligned prior to SVD calculations.
This is a considerably faster way of performing PCA calculations compared to eigenvalue decomposition of the covariance matrix, but it is an approximate method when heterogeneous datasets are analyzed. The covariance method should be preferred over this one for analysis of ensembles with missing atomic data. See the Calculations example for a comparison of results from the SVD and covariance methods.
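Because coordinate sets must be aligned before SVD, it may help to see what one alignment step looks like. The sketch below uses the standard Kabsch algorithm to superpose one coordinate set onto a reference; it is a generic technique shown for illustration, the helper name `superpose` is hypothetical, and nothing here is claimed to match how the library aligns structures.

```python
import numpy as np

def superpose(mobile, reference):
    """Least-squares superposition of mobile onto reference (Kabsch).

    Hypothetical helper; both arrays have shape (n_atoms, 3).
    """
    ref_center = reference.mean(axis=0)
    mob = mobile - mobile.mean(axis=0)
    ref = reference - ref_center
    # SVD of the 3x3 correlation matrix between the centered sets.
    u, _, vt = np.linalg.svd(mob.T @ ref)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob @ rotation + ref_center

rng = np.random.default_rng(5)
reference = rng.normal(size=(10, 3))
angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
# A rotated and shifted copy of the reference.
mobile = reference @ rot.T + np.array([1.0, -2.0, 0.5])
fitted = superpose(mobile, reference)
```

Applying such a step to every coordinate set, with a common reference, yields an aligned array of shape (n_csets, n_atoms, 3).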
-
setCovariance
(covariance, is3d=True) Set the covariance matrix.
-
setEigens
(vectors, values=None) Set eigenvectors and eigenvalues. If eigenvalues are omitted, they are set to 1. Eigenvalues are set as variances.
-
setTitle
(title) Set the title of the model.
-