The potential energy of a Gaussian Network Model is defined as the summation of harmonic potentials over all unique $(i,j)$-pairs and is a function of only the square of inter-residue distance vector, $\Delta \mathbf{R}_{ij} = \mathbf{R}_i - \mathbf{R}_j^0$. It can be given in terms of the Kirchhoff matrix for inter-residue contacts, $\mathbf{\Gamma}$,
$$ V^{GNM} = \frac{\gamma}{2} \Delta \mathbf{R}^T \left( \mathbf{\Gamma} \otimes \mathbf{I}_{3\times 3} \right) \Delta \mathbf{R} \,, $$where $\mathbf{\Gamma}$ is an $(N \times N)$-matrix and is defined as:
$$ \mathbf{\Gamma}_{ij} = \left\{\begin{matrix} -1 & \text{if } i \ne j \text{ and }R_{ij} \le R_{cut} \,, \\ 0 & \text{if } i \ne j \text{ and }R_{ij} > R_{cut} \,, \\ -\sum_{j,j \ne i}^{N} \mathbf{\Gamma}_{ij} & \text{if } i = j \,. \end{matrix}\right. $$First, let's import all the required packages!
from numpy import *
from matplotlib.pyplot import *
from prody import *
confProDy(auto_show=False)
confProDy(auto_secondary=True)
Tip: how to access the documentation
parsePDB?
Parse the PDB structure for adenylate kinase.
ake = parsePDB('4ake', compressed=False)
ake
figure(dpi=100)
showProtein(ake);
This PDB file contains 2 chains. If we only want to use $C^{\alpha}$ atoms from chain A, then we can select it as:
calphas = ake.select('calpha and chain A')
calphas
calphas.numAtoms()
figure(dpi=100)
showProtein(calphas);
See also the online tutorial.
First, let's instantiate an object of GNM class.
gnm = GNM('AKE') # assign a name
Now, we can build the Kirchhoff matrix by the method buildKirchhoff
.
What are the parameters of this method?
gnm.buildKirchhoff?
Let's build Kirchhoff matrix based on the coordinate of $C^{\alpha}$-atoms already selected.
gnm.buildKirchhoff(calphas)
We can get the Kirchhoff matrix by the following method:
gnm.getKirchhoff()
Check that every row sum is equal to 0 (since it is a Kirchhoff matrix!):
apply_along_axis(sum, 1, gnm.getKirchhoff())
Degrees of residues:
diag(gnm.getKirchhoff())
We can calculate normal modes from the Kirchhoff matrix by the method calcModes. Many parameters in this function can be adjusted, e.g. the number of modes: calcModes( ) doc
gnm.calcModes?
gnm.calcModes(n_modes=None) # or: (n_modes='all')
gnm.calcModes(n_modes=10)
gnm.calcModes() # n_modes=20 (default)
gnm.calcModes(n_modes=None, zeros=True)
The algebraic multiplicity of zero eigenvalues determines separate connected components of your graph.
If it is equal to one, then your graph does not have a disconnected piece!
Get eigenvalues and eigenvectors:
gnm.getEigvals().round(3)
gnm.getEigvecs()
NB: The eigenvector corresponding to 0 eigenvalue is always constant.
gnm.getEigvecs()[:, 0]
How about the variances:
gnm.getVariances()[1:].round(3)
NB: A more compact way to compute GNM is to use the function calcGNM( ) doc
gnm.calcModes()
The contact map can be visualized with the command:
figure(dpi=100)
showContactMap(gnm, cmap='Greys');
... while the shape of normal modes can be visualized by typing:
figure(dpi=100)
showMode(gnm[0]);
We can show possible hinge sites on that plot as well.
figure(dpi=100)
showMode(gnm[0], hinges=True);
Furthermore, you can also access hinge sites identified from multiple modes (e.g. 2 modes) by:
calcHinges(gnm[:2])
Many other visualization tools can be found online at the page Plotting Functions.
Let's get the covariance matrix first, $C_{ij} \propto \sum_{i=1}^{N-1} \lambda_i^{-1} \mathbf{u}_i \mathbf{u}_i^T$:
cov = gnm.getCovariance()
cov.shape
cov
Alternative ways to compute the covariance and (normalized/orientational) cross-correlation matrix, $C_{ij}^{(n)} = C_{ij} \big/ \left(C_{ii} C_{jj} \right)^{1/2}$:
C1 = calcCovariance(gnm)
# again covariance matrix for GNM, not for ANM!
C2 = calcCrossCorr(gnm, norm=False)
# normalized/orientational cross-correlation matrix
C3 = calcCrossCorr(gnm)
all(C1 == cov)
all(C2 == cov)
all(C3 == cov)
C3
We can visualize the normalized/orientational cross-correlations between residue fluctuations with the command:
figure(dpi=100)
showCrossCorr(gnm);
The residues' mean square fluctuations (MSF) are found on the diagonal of the covariance matrix and describe the residue mobility or flexibility.
# MSFs from the "computed" modes,
# and here the total number of calculated modes is 20.
figure(dpi=100)
showSqFlucts(gnm);
Moreover, you can specify the number of modes to compute the MSFs to see either their indivudual or cumulative contributions visually as well!
# square fluctuations computed from 2 slowest (lowest-energy) modes
figure(dpi=100)
showSqFlucts(gnm[:2])
grid();
We can also see how well the computed MSF correlates with the experimental B-factors (Debye-Waller) extracted from the PDB file. It is better to use all modes for this aim.
gnm.calcModes(None)
# rescale MSFs
bfactors = calphas.getBetas()
msfs = calcSqFlucts(gnm)
msfs = msfs / mean(msfs) * mean(bfactors)
figure(figsize=(9, 5), dpi=300)
plot(bfactors, 'orange', label='Experimental')
plot(msfs, 'g', lw=1., label='GNM')
grid()
legend()
tight_layout();
Let's calculate the effective force constant $\gamma$ of GNM springs in $k_B T \mathring{A}^{-2}$:
rescaling_prefactor = (bfactors.mean() / calcSqFlucts(gnm).mean())
rescaling_prefactor.round(5)
gamma = 8 * pi**2 / rescaling_prefactor
gamma.round(5)
Let's evaluate Pearson's correlation coefficient between experimental B-factors and GNM predicted MSFs:
corrcoef(calphas.getBetas(), calcSqFlucts(gnm))[0, 1].round(3)
The values above are equal to those that DynOmics server gives!
figure(dpi=100)
showProtein(calphas, mode=gnm[0]);
GNM modes and square fluctuations can also be saved in .nmd format and visualized with the VMD plugin NMWizard:
This plugin can be found by navigating from Extensions in VMD main menu --> Analysis --> Normal Mode Wizard.
writeNMD('AKE_gnm.nmd', gnm, calphas)
We can now run VMD from jupyter notebook using an exclamation mark to tell the jupyter notebook that it is a command run by the operating system not by the python kernel.
Alternatively, MSFs can be stored in the tempFactor column of a PDB file and again visualized in VMD:
msf_10 = calcSqFlucts(gnm[:10])
writePDB('AKE_sqFlucts_on_beta.pdb', calphas, beta=msf_10)
ubi = parsePDB('1aar', chain='A', subset='calpha', compressed=False)
ubi_gnm, ubi_calphas = calcGNM(ubi, cutoff=10)
figure(dpi=100)
showMode(ubi_gnm[0]);
figure(dpi=100)
showProtein(ubi_calphas, mode=ubi_gnm[0]);
The unfolded C-terminal (residues 71-76) of ubiquitin is particularly flexible and its contribution to total residue fluctuations is dominant. Therefore, we need to exclude it from the analysis.
ubi_gnm_fixed, ubi_calphas_fixed = calcGNM(ubi.select('resnum < 71'), cutoff=10)
figure(dpi=100)
showMode(ubi_gnm_fixed[0])
grid();
figure(dpi=100)
showProtein(ubi_calphas_fixed, mode=ubi_gnm_fixed[0]);
Let's look at the effect of those residues on the collectivity of a particular mode, whose degree basically reflects the number of atoms which are significantly affected by that mode. The collectivity is defined as the exponential of the information entropy of that mode: calcCollectivity
calcCollectivity(ubi_gnm[0])
calcCollectivity(ubi_gnm_fixed[0])
We can easily compute the collectivity of a set of modes at once!
figure(figsize=(9, 5), dpi=100)
bar(arange(10)-0.1, calcCollectivity(ubi_gnm[:10]), width=.2, color='r', label='ubi')
bar(arange(10)+0.1, calcCollectivity(ubi_gnm_fixed[:10]), width=.2, color='b', label='ubi_fixed')
xlabel('mode number')
ylabel('collectivity')
legend();