NMR Models¶
This example shows how to perform principal component analysis (PCA) of an ensemble of NMR models. The protein of interest is ubiquitin, and for illustration purposes, we will repeat the calculations for the ensemble of ubiquitin models that were analyzed in [AB09].
A PCA object that stores the covariance matrix and the principal modes that
describe the dominant changes in the dataset will be obtained. PCA
and principal modes (Mode) can be used as input to functions in
the dynamics module for further analysis.
Notes¶
Note that this example is slightly different from that in the ProDy Tutorial.
This example uses Ensemble, which has a method for performing
iterative superposition.
Also, note that this example applies to any PDB file that contains multiple models.
Prepare ensemble¶
We start by importing everything from the ProDy package, along with pylab in interactive mode:
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
We parse only Cα atoms using parsePDB() (note that it is possible to
repeat this calculation for all atoms):
In [4]: ubi = parsePDB('2k39', subset='calpha')
We use residues 1 to 70, as residues 71 to 76 are very mobile and including them skews the results.
In [5]: ubi = ubi.select('resnum < 71').copy()
In [6]: ensemble = Ensemble('Ubiquitin NMR ensemble')
In [7]: ensemble.setCoords( ubi.getCoords() )
Then, we add all of the coordinate sets to the ensemble, and perform an iterative superposition:
In [8]: ensemble.addCoordset( ubi.getCoordsets() )
In [9]: ensemble.iterpose()
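The idea behind iterative superposition can be illustrated with a minimal NumPy sketch. This is not ProDy's implementation: the helper names (`kabsch_superpose`, `iterative_superpose`, `random_rotation`), the tolerance, the iteration cap, and the synthetic test data are all assumptions made for illustration. Each model is fitted onto a reference, the reference is replaced by the mean of the fitted models, and the process repeats until the mean stops changing.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Return `mobile` (N x 3) optimally rotated and translated onto `target`."""
    mob = mobile - mobile.mean(axis=0)
    tar_center = target.mean(axis=0)
    tar = target - tar_center
    # SVD of the 3 x 3 correlation matrix yields the least-squares rotation
    u, _, vt = np.linalg.svd(mob.T @ tar)
    # Flip the last axis if needed so the result is a proper rotation
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob @ rotation + tar_center

def iterative_superpose(coordsets, tol=1e-4, max_iter=50):
    """Fit all models onto their mean until the mean moves less than `tol`."""
    coordsets = np.array(coordsets, dtype=float)
    reference = coordsets[0].copy()
    for _ in range(max_iter):
        coordsets = np.array([kabsch_superpose(c, reference) for c in coordsets])
        mean = coordsets.mean(axis=0)
        change = np.sqrt(((mean - reference) ** 2).sum(axis=1).mean())
        reference = mean
        if change < tol:
            break
    return coordsets, reference

def random_rotation(rng):
    """Random proper rotation matrix (hypothetical helper for the demo data)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

# Synthetic "ensemble": one 70-atom structure, randomly moved and perturbed
rng = np.random.default_rng(0)
base = rng.normal(scale=10.0, size=(70, 3))
models = np.array([
    base @ random_rotation(rng)
    + rng.normal(scale=5.0, size=3)
    + rng.normal(scale=0.1, size=base.shape)
    for _ in range(10)
])

fitted, mean = iterative_superpose(models)
rmsds = np.sqrt(((fitted - mean) ** 2).sum(axis=2).mean(axis=1))
print(rmsds.max())
```

Fitting to the running mean rather than to a single model avoids biasing the ensemble toward whichever model happens to be listed first.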
PCA calculations¶
Performing PCA is only three lines of code:
In [10]: pca = PCA('Ubiquitin')
In [11]: pca.buildCovariance(ensemble)
In [12]: pca.calcModes()
In [13]: repr(pca)
Out[13]: '<PCA: Ubiquitin (20 modes; 70 atoms)>'
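To make these three lines less of a black box, the following NumPy sketch shows what covariance-based PCA amounts to for a set of already superposed models. It is an illustration under stated assumptions, not ProDy's code: the function name `pca_from_covariance` is hypothetical, the covariance is normalized by the number of models, and the input is random data rather than the ubiquitin ensemble.

```python
import numpy as np

def pca_from_covariance(coordsets, n_modes=20):
    """PCA of superposed coordinate sets with shape (n_models, n_atoms, 3)."""
    coordsets = np.asarray(coordsets, dtype=float)
    n_models, n_atoms, _ = coordsets.shape
    # Flatten each model to a 3N vector and subtract the mean structure
    flat = coordsets.reshape(n_models, n_atoms * 3)
    deviations = flat - flat.mean(axis=0)
    # 3N x 3N covariance matrix (normalization by n_models is an assumption)
    covariance = deviations.T @ deviations / n_models
    # eigh returns eigenvalues in ascending order; keep the largest n_modes
    eigvals, eigvecs = np.linalg.eigh(covariance)
    order = np.argsort(eigvals)[::-1][:n_modes]
    return eigvals[order], eigvecs[:, order], covariance

# Random stand-in data: 30 models of a 10-atom structure
rng = np.random.default_rng(0)
coordsets = rng.normal(size=(30, 10, 3))
eigvals, modes, covariance = pca_from_covariance(coordsets, n_modes=5)
print(eigvals.shape, modes.shape)
```

Each column of `modes` is a unit vector of length 3N, and the matching entry of `eigvals` is the variance of the ensemble along that direction.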
Faster method
Principal modes can be calculated faster using singular value decomposition:
In [14]: svd = PCA('Ubiquitin')
In [15]: svd.performSVD(ensemble)
For heterogeneous NMR datasets, both methods yield results that are identical to within numerical precision:
In [16]: abs(svd.getEigvals()[:20] - pca.getEigvals()).max()
Out[16]: 9.7699626167013776e-15
In [17]: abs(calcOverlap(pca, svd).diagonal()[:20]).min()
Out[17]: 0.99999999999999944
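The agreement follows from a standard linear algebra relation: the squared singular values of the mean-centered deviation matrix, divided by the number of models, are the eigenvalues of the covariance matrix, and the left singular vectors are its eigenvectors. The sketch below checks this with NumPy on random data. It assumes the covariance is normalized by the number of models and does not claim to reproduce the internals of performSVD().

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_atoms = 40, 12
coordsets = rng.normal(size=(n_models, n_atoms, 3))

# Mean-centered deviations, one flattened 3N vector per model
flat = coordsets.reshape(n_models, -1)
dev = flat - flat.mean(axis=0)

# Route 1: eigendecomposition of the 3N x 3N covariance matrix
cov_vals, cov_vecs = np.linalg.eigh(dev.T @ dev / n_models)
cov_vals, cov_vecs = cov_vals[::-1], cov_vecs[:, ::-1]  # descending order

# Route 2: SVD of the 3N x n_models deviation matrix
u, s, _ = np.linalg.svd(dev.T, full_matrices=False)
svd_vals = s ** 2 / n_models

# Compare the top 10 modes: eigenvalue differences and absolute overlaps
max_diff = np.abs(svd_vals[:10] - cov_vals[:10]).max()
overlaps = np.abs((u[:, :10] * cov_vecs[:, :10]).sum(axis=0))
print(max_diff, overlaps.min())
```

The SVD route avoids forming the 3N x 3N covariance matrix, which is why it tends to be faster when the number of atoms is large relative to the number of models.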
Write NMD file¶
Write the principal modes into an NMD Format file for NMWiz using
the writeNMD() function:
In [18]: writeNMD('ubi_pca.nmd', pca[:3], ubi)
Out[18]: 'ubi_pca.nmd'
Print data¶
Let’s print the fraction of variance for the four top-ranking PCs (listed in Table S3):
In [19]: for mode in pca[:4]:
....: print(calcFractVariance(mode).round(3))
....:
0.134
0.094
0.083
0.065
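The fraction of variance of a mode is its eigenvalue divided by the total variance of the ensemble. A minimal sketch with made-up eigenvalues is given below. The function name `fraction_of_variance` and its `total_variance` parameter are hypothetical and are not part of ProDy.

```python
import numpy as np

def fraction_of_variance(eigvals, total_variance=None):
    """Eigenvalues divided by the total variance.

    If `total_variance` is omitted, the sum of the supplied eigenvalues is
    used, which is only correct when all modes are included.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    if total_variance is None:
        total_variance = eigvals.sum()
    return eigvals / total_variance

# Made-up eigenvalues, all modes included
print(fraction_of_variance([4.0, 3.0, 2.0, 1.0]))

# Only the top two modes kept: pass the trace of the covariance matrix
print(fraction_of_variance([4.0, 3.0], total_variance=20.0))
```

When only a subset of modes is calculated, as with the 20 modes above, the denominator has to be the trace of the covariance matrix. Dividing by the sum of the retained eigenvalues alone would overestimate every fraction.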
Compare with ANM results¶
We set the active coordinate set to model 79, which is the one that is closest
to the mean structure (its index is 78, since indices start from 0 in Python).
Then, we perform ANM calculations using calcANM() for the active
coordset:
In [20]: ubi.setACSIndex(78)
In [21]: anm, temp = calcANM(ubi)
In [22]: anm.setTitle('Ubiquitin')
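One way to identify the model closest to the mean structure is to compute the RMSD of every superposed model from the mean and take the smallest. The NumPy sketch below shows this on toy data. The function name `closest_to_mean` is hypothetical, and the input is assumed to be superposed already.

```python
import numpy as np

def closest_to_mean(coordsets):
    """Return the 0-based index of the model nearest the mean, plus all RMSDs."""
    coordsets = np.asarray(coordsets, dtype=float)
    mean = coordsets.mean(axis=0)
    # RMSD per model: square root of the mean squared atomic displacement
    rmsds = np.sqrt(((coordsets - mean) ** 2).sum(axis=2).mean(axis=1))
    return int(rmsds.argmin()), rmsds

# Toy ensemble: five copies of a 5-atom structure shifted by -2 ... +2
base = np.arange(15, dtype=float).reshape(5, 3)
shifts = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
coordsets = base + shifts[:, None, None]

index, rmsds = closest_to_mean(coordsets)
print(index, index + 1)  # 0-based index and 1-based model number
```

The returned index is 0-based, so it can be passed to a setter such as setACSIndex() directly, while the model number quoted in the text is one higher.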
We calculate overlaps between ANM and PCA modes (presented in Table 1).
The printOverlapTable() function is handy for printing a formatted overlap
table:
In [23]: printOverlapTable(pca[:4], anm[:4])
Overlap Table
                      ANM Ubiquitin
                     #1     #2     #3     #4
PCA Ubiquitin #1   -0.19  -0.30  +0.22  -0.62
PCA Ubiquitin #2   +0.09  -0.72  -0.16  +0.16
PCA Ubiquitin #3   +0.31  -0.06  -0.23   0.00
PCA Ubiquitin #4   +0.11  +0.02  +0.16  -0.31
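Each entry in such a table is the overlap between two mode vectors, that is, the cosine of the angle between them. A minimal NumPy sketch follows. The names `overlap` and `overlap_table` are hypothetical, and the three-component vectors are toy stand-ins for 3N-dimensional modes.

```python
import numpy as np

def overlap(a, b):
    """Cosine of the angle between two mode vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def overlap_table(rows, cols):
    """Matrix of pairwise overlaps between two lists of mode vectors."""
    return np.array([[overlap(r, c) for c in cols] for r in rows])

# Toy vectors standing in for PCA and ANM modes
pca_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
anm_modes = [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0], [-3.0, 0.0, 0.0]]
table = overlap_table(pca_modes, anm_modes)
print(table.round(2))
```

Because an eigenvector and its negative describe the same mode, the sign of an overlap is arbitrary, and it is the magnitude that indicates how similar two modes are.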