Trajectory Analysis

This example shows how to analyze a trajectory in DCD format. RMSD, RMSF, radius of gyration, and distance will be calculated from trajectory frames.

Input files

Currently, ProDy supports only DCD format files. Two DCD trajectory files and corresponding PDB structure file are needed for this example:

Setup environment

We start by importing everything from ProDy:

In [1]: from prody import *

In [2]: from pylab import *

In [3]: ion()

Parse structure

The PDB file provided with this example contains an X-ray structure which will be useful in a number of places, so let’s start with parsing this file first:

In [4]: structure = parsePDB('mdm2.pdb')

In [5]: repr(structure)
Out[5]: '<AtomGroup: mdm2 (1449 atoms)>'

This function returned a AtomGroup instance that stores all atomic data parsed from the PDB file.

Parse all frames

Using parseDCD() function all coordinate data in the DCD file can be parsed at once. This function returns an Ensemble instance:

In [6]: ensemble = parseDCD('mdm2.dcd')

In [7]: repr(ensemble)
Out[7]: '<Ensemble: mdm2 (0:500:1) (500 conformations; 1449 atoms)>'

Note

When parsing large DCD files at once, memory may become an issue. If the size of the DCD file is larger than half of the RAM in your machine, consider parsing DCD files frame-by-frame. See the following subsection for details.

Let’s associate this ensemble with the structure we parsed from the PDB file:

In [8]: ensemble.setAtoms(structure)

In [9]: ensemble.setCoords(structure)

This operation set the coordinates of the structure as the reference coordinates of the ensemble. Now we can Ensemble.superpose() the ensemble onto the coordinates of the structure.

In [10]: ensemble.superpose()

Now, we can calculate RMSDs and RMSFs as follows:

In [11]: rmsd = ensemble.getRMSDs()

In [12]: rmsd[:10]
Out[12]: 
array([0.95948706, 1.37571761, 1.86486719, 1.67050981, 1.81932183,
       1.99846088, 1.83607975, 1.85453853, 1.72450348, 1.99616932])

In [13]: rmsf = ensemble.getRMSFs()

In [14]: rmsf
Out[14]: 
array([2.16987058, 2.50997848, 2.54671469, ..., 2.40085027, 2.35998639,
       2.36161442])

Preceding calculations used all atoms in the structure. When we are interested in a subset of atoms, let’s say Cα atoms, we can make a selection before performing calculations:

In [15]: ensemble.setAtoms(structure.calpha)

In [16]: repr(ensemble)
Out[16]: '<Ensemble: mdm2 (0:500:1) (500 conformations; selected 85 of 1449 atoms)>'

In [17]: ensemble.superpose()

In this case, superposition was based on Cα atom coordinates.

In [18]: rmsd = ensemble.getRMSDs()

In [19]: rmsd[:10]
Out[19]: 
array([0.57036271, 0.65563502, 1.07986844, 0.87149569, 1.00810079,
       1.08325961, 0.97407076, 0.97066068, 0.71017147, 0.98885846])

In [20]: rmsf = ensemble.getRMSFs()

In [21]: rmsf
Out[21]: 
array([1.6330461 , 1.23173403, 0.80235834, 0.5958921 , 0.50945672,
       0.45622768, 0.4464998 , 0.56223584, 0.54737841, 0.43845661,
       0.49619213, 0.55505813, 0.5036435 , 0.56143506, 0.68243978,
       0.63595968, 0.65944352, 0.79793022, 0.616282  , 0.84929821,
       0.87003225, 1.36270963, 1.02894487, 0.58924426, 0.75705043,
       0.75241394, 0.80858253, 0.63143782, 0.50065952, 0.62382299,
       0.59612102, 0.4654029 , 0.55655523, 0.62571376, 0.56086838,
       0.50027906, 0.47802536, 0.62719066, 0.69342567, 0.6609691 ,
       0.63199592, 0.45820843, 0.55381   , 0.81272202, 1.429093  ,
       1.61484996, 1.3844535 , 1.449554  , 1.009925  , 0.56484553,
       0.44251167, 0.46495854, 0.51425355, 0.66156275, 0.78570958,
       0.51504623, 0.53755642, 0.48657396, 0.51736938, 0.59835881,
       0.6156337 , 0.67893898, 0.66840022, 0.70192971, 0.64153721,
       0.56550165, 0.60452546, 0.66527624, 0.85178562, 0.89024996,
       0.93976546, 0.99426097, 0.90499485, 0.7658395 , 0.64180648,
       0.60419211, 0.67995234, 0.68370779, 0.53503999, 0.64867237,
       0.79336428, 0.58609028, 0.54535261, 0.73394662, 1.55134084])

The Ensemble instance can also be used in PCA calculations. See the examples in Ensemble Analysis for more information.

Parse frames one-by-one

In [22]: dcd = DCDFile('mdm2.dcd')

In [23]: repr(dcd)
Out[23]: '<DCDFile: mdm2 (next 0 of 500 frames; 1449 atoms)>'
In [24]: structure = parsePDB('mdm2.pdb')

In [25]: dcd.setCoords(structure)

In [26]: dcd.link(structure)

In [27]: dcd.nextIndex()
Out[27]: 0

In [28]: frame = dcd.next()

In [29]: repr(frame)
Out[29]: '<Frame: 0 from mdm2 (1449 atoms)>'

In [30]: dcd.nextIndex()
Out[30]: 1
In [31]: frame.getRMSD()
Out[31]: 1.0965813503989288

In [32]: frame.superpose()

In [33]: frame.getRMSD()
Out[33]: 0.9594870434519088

In [34]: calcGyradius(frame)
Out[34]: 12.950192748991222

We can perform these calculations for all frames in a for loop. Let’s reset dcd to return to the 0th frame:

In [35]: dcd.reset()

In [36]: rgyr = zeros(len(dcd))

In [37]: rmsd = zeros(len(dcd))

In [38]: for i, frame in enumerate(dcd):
   ....:     rgyr[i] = calcGyradius(frame)
   ....:     frame.superpose()
   ....:     rmsd[i] = frame.getRMSD()
   ....: 

In [39]: rmsd[:10]
Out[39]: 
array([0.95948704, 1.37571759, 1.86486716, 1.6705098 , 1.81932183,
       1.99846087, 1.83607974, 1.85453852, 1.72450347, 1.99616931])

In [40]: rgyr[:10]
Out[40]: 
array([12.95019275, 13.07770448, 12.93054754, 13.02506153, 12.95834748,
       13.01555473, 12.86652426, 12.93371279, 12.89667858, 12.86328841])

Handling multiple files

Trajectory is designed for handling multiple trajectory files:

In [41]: traj = Trajectory('mdm2.dcd')

In [42]: repr(traj)
Out[42]: '<Trajectory: mdm2 (next 0 of 500 frames; 1449 atoms)>'

In [43]: traj.addFile('mdm2sim2.dcd')

In [44]: repr(traj)
Out[44]: '<Trajectory: mdm2 (2 files; next 0 of 1000 frames; 1449 atoms)>'

Instances of this class are also suitable for previous calculations:

In [45]: structure = parsePDB('mdm2.pdb')

In [46]: traj.link(structure)

In [47]: traj.setCoords(structure)

In [48]: rgyr = zeros(len(traj))

In [49]: rmsd = zeros(len(traj))

In [50]: for i, frame in enumerate(traj):
   ....:     rgyr[i] = calcGyradius( frame )
   ....:     frame.superpose()
   ....:     rmsd[i] = frame.getRMSD()
   ....: 

In [51]: rmsd[:10]
Out[51]: 
array([0.95948704, 1.37571759, 1.86486716, 1.6705098 , 1.81932183,
       1.99846087, 1.83607974, 1.85453852, 1.72450347, 1.99616931])

In [52]: rgyr[:10]
Out[52]: 
array([12.95019275, 13.07770448, 12.93054754, 13.02506153, 12.95834748,
       13.01555473, 12.86652426, 12.93371279, 12.89667858, 12.86328841])