Trajectory Analysis¶
This example shows how to analyze a trajectory in DCD format. RMSD, RMSF, radius of gyration, and distance will be calculated from trajectory frames.
Input files¶
Currently, ProDy supports only DCD format files. Two DCD trajectory files and corresponding PDB structure file are needed for this example:
Setup environment¶
We start by importing everything from ProDy:
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
Parse structure¶
The PDB file provided with this example contains an X-ray structure which will be useful in a number of places, so let’s start with parsing this file first:
In [4]: structure = parsePDB('mdm2.pdb')
In [5]: repr(structure)
Out[5]: '<AtomGroup: mdm2 (1449 atoms)>'
This function returned a AtomGroup
instance that
stores all atomic data parsed from the PDB file.
Parse all frames¶
Using parseDCD()
function all coordinate data in the DCD file can
be parsed at once. This function returns an Ensemble
instance:
In [6]: ensemble = parseDCD('mdm2.dcd')
In [7]: repr(ensemble)
Out[7]: '<Ensemble: mdm2 (0:500:1) (500 conformations; 1449 atoms)>'
Note
When parsing large DCD files at once, memory may become an issue. If the size of the DCD file is larger than half of the RAM in your machine, consider parsing DCD files frame-by-frame. See the following subsection for details.
Let’s associate this ensemble with the structure we parsed from the PDB file:
In [8]: ensemble.setAtoms(structure)
In [9]: ensemble.setCoords(structure)
This operation set the coordinates of the structure as the reference
coordinates of the ensemble. Now we can Ensemble.superpose()
the ensemble onto the coordinates of the structure.
In [10]: ensemble.superpose()
Now, we can calculate RMSDs and RMSFs as follows:
In [11]: rmsd = ensemble.getRMSDs()
In [12]: rmsd[:10]
Out[12]:
array([0.95948706, 1.37571761, 1.86486719, 1.67050981, 1.81932183,
1.99846088, 1.83607975, 1.85453853, 1.72450348, 1.99616932])
In [13]: rmsf = ensemble.getRMSFs()
In [14]: rmsf
Out[14]:
array([2.16987058, 2.50997848, 2.54671469, ..., 2.40085027, 2.35998639,
2.36161442])
Preceding calculations used all atoms in the structure. When we are interested in a subset of atoms, let’s say Cα atoms, we can make a selection before performing calculations:
In [15]: ensemble.setAtoms(structure.calpha)
In [16]: repr(ensemble)
Out[16]: '<Ensemble: mdm2 (0:500:1) (500 conformations; selected 85 of 1449 atoms)>'
In [17]: ensemble.superpose()
In this case, superposition was based on Cα atom coordinates.
In [18]: rmsd = ensemble.getRMSDs()
In [19]: rmsd[:10]
Out[19]:
array([0.57036271, 0.65563502, 1.07986844, 0.87149569, 1.00810079,
1.08325961, 0.97407076, 0.97066068, 0.71017147, 0.98885846])
In [20]: rmsf = ensemble.getRMSFs()
In [21]: rmsf
Out[21]:
array([1.6330461 , 1.23173403, 0.80235834, 0.5958921 , 0.50945672,
0.45622768, 0.4464998 , 0.56223584, 0.54737841, 0.43845661,
0.49619213, 0.55505813, 0.5036435 , 0.56143506, 0.68243978,
0.63595968, 0.65944352, 0.79793022, 0.616282 , 0.84929821,
0.87003225, 1.36270963, 1.02894487, 0.58924426, 0.75705043,
0.75241394, 0.80858253, 0.63143782, 0.50065952, 0.62382299,
0.59612102, 0.4654029 , 0.55655523, 0.62571376, 0.56086838,
0.50027906, 0.47802536, 0.62719066, 0.69342567, 0.6609691 ,
0.63199592, 0.45820843, 0.55381 , 0.81272202, 1.429093 ,
1.61484996, 1.3844535 , 1.449554 , 1.009925 , 0.56484553,
0.44251167, 0.46495854, 0.51425355, 0.66156275, 0.78570958,
0.51504623, 0.53755642, 0.48657396, 0.51736938, 0.59835881,
0.6156337 , 0.67893898, 0.66840022, 0.70192971, 0.64153721,
0.56550165, 0.60452546, 0.66527624, 0.85178562, 0.89024996,
0.93976546, 0.99426097, 0.90499485, 0.7658395 , 0.64180648,
0.60419211, 0.67995234, 0.68370779, 0.53503999, 0.64867237,
0.79336428, 0.58609028, 0.54535261, 0.73394662, 1.55134084])
The Ensemble
instance can also be used in PCA
calculations. See the examples in Ensemble Analysis for more information.
Parse frames one-by-one¶
In [22]: dcd = DCDFile('mdm2.dcd')
In [23]: repr(dcd)
Out[23]: '<DCDFile: mdm2 (next 0 of 500 frames; 1449 atoms)>'
In [24]: structure = parsePDB('mdm2.pdb')
In [25]: dcd.setCoords(structure)
In [26]: dcd.link(structure)
In [27]: dcd.nextIndex()
Out[27]: 0
In [28]: frame = dcd.next()
In [29]: repr(frame)
Out[29]: '<Frame: 0 from mdm2 (1449 atoms)>'
In [30]: dcd.nextIndex()
Out[30]: 1
In [31]: frame.getRMSD()
Out[31]: 1.0965813503989288
In [32]: frame.superpose()
In [33]: frame.getRMSD()
Out[33]: 0.9594870434519088
In [34]: calcGyradius(frame)
Out[34]: 12.950192748991222
We can perform these calculations for all frames in a for loop. Let’s reset dcd to return to the 0th frame:
In [35]: dcd.reset()
In [36]: rgyr = zeros(len(dcd))
In [37]: rmsd = zeros(len(dcd))
In [38]: for i, frame in enumerate(dcd):
....: rgyr[i] = calcGyradius(frame)
....: frame.superpose()
....: rmsd[i] = frame.getRMSD()
....:
In [39]: rmsd[:10]
Out[39]:
array([0.95948704, 1.37571759, 1.86486716, 1.6705098 , 1.81932183,
1.99846087, 1.83607974, 1.85453852, 1.72450347, 1.99616931])
In [40]: rgyr[:10]
Out[40]:
array([12.95019275, 13.07770448, 12.93054754, 13.02506153, 12.95834748,
13.01555473, 12.86652426, 12.93371279, 12.89667858, 12.86328841])
Handling multiple files¶
Trajectory
is designed for handling multiple trajectory files:
In [41]: traj = Trajectory('mdm2.dcd')
In [42]: repr(traj)
Out[42]: '<Trajectory: mdm2 (next 0 of 500 frames; 1449 atoms)>'
In [43]: traj.addFile('mdm2sim2.dcd')
In [44]: repr(traj)
Out[44]: '<Trajectory: mdm2 (2 files; next 0 of 1000 frames; 1449 atoms)>'
Instances of this class are also suitable for previous calculations:
In [45]: structure = parsePDB('mdm2.pdb')
In [46]: traj.link(structure)
In [47]: traj.setCoords(structure)
In [48]: rgyr = zeros(len(traj))
In [49]: rmsd = zeros(len(traj))
In [50]: for i, frame in enumerate(traj):
....: rgyr[i] = calcGyradius( frame )
....: frame.superpose()
....: rmsd[i] = frame.getRMSD()
....:
In [51]: rmsd[:10]
Out[51]:
array([0.95948704, 1.37571759, 1.86486716, 1.6705098 , 1.81932183,
1.99846087, 1.83607974, 1.85453852, 1.72450347, 1.99616931])
In [52]: rgyr[:10]
Out[52]:
array([12.95019275, 13.07770448, 12.93054754, 13.02506153, 12.95834748,
13.01555473, 12.86652426, 12.93371279, 12.89667858, 12.86328841])