Calculations

This is the first part of a lengthy example. In this part, we perform the calculations using a p38 MAP kinase (MAPK) structural dataset. This will reproduce the calculations for p38 that were published in [AB09].

We will obtain a PCA instance that stores the covariance matrix and principal modes describing the dominant changes in the dataset. The PCA instance and principal modes (Mode) can be used as input for the functions in dynamics module.

Retrieve dataset

We start by importing everything from the ProDy package:

In [1]: from prody import *

In [2]: from pylab import *

In [3]: ion()

We use a list of PDB identifiers for the structures that we want to include in our analysis.

In [4]: pdbids = ['1A9U', '1BL6', '1BL7', '1BMK', '1DI9', '1IAN', '1KV1', '1KV2',
   ...:           '1LEW', '1LEZ', '1M7Q', '1OUK', '1OUY', '1OVE', '1OZ1', '1P38',
   ...:           '1R39', '1R3C', '1W7H', '1W82', '1W83', '1W84', '1WBN', '1WBO',
   ...:           '1WBS', '1WBT', '1WBV', '1WBW', '1WFC', '1YQJ', '1YW2', '1YWR',
   ...:           '1ZYJ', '1ZZ2', '1ZZL', '2BAJ', '2BAK', '2BAL', '2BAQ', '2EWA',
   ...:           '2FSL', '2FSM', '2FSO', '2FST', '2GFS', '2GHL', '2GHM', '2GTM',
   ...:           '2GTN', '2I0H', '2NPQ', '2OKR', '2OZA', '3HVC', '3MH0', '3MH3',
   ...:           '3MH2', '2PUU', '3MGY', '3MH1', '2QD9', '2RG5', '2RG6', '2ZAZ',
   ...:           '2ZB0', '2ZB1', '3BV2', '3BV3', '3BX5', '3C5U', '3L8X', '3CTQ',
   ...:           '3D7Z', '3D83', '2ONL']
   ...: 

Note that we used a list of identifiers that are different from what was listed in the supporting material of [AB09]. Some structures have been refined and their identifier have been changed by the Protein Data Bank. These changes are reflected in the above list.

Also note that it is possible to update this list to include all of the p38 structures currently available in the PDB using the blastPDB() function as follows:

In [5]: p38_sequence = '''GLVPRGSHMSQERPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGHRV
   ...: AVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDH
   ...: VQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQ
   ...: TVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLAQMPKMNFANVFIGAN
   ...: PLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPPLD
   ...: QEEMES'''
   ...: 

To update list of p38 MAPK PDB files, you can make a blast search as follows:

In [6]: blast_record = blastPDB(p38_sequence)

In [7]: pdbids = blast_record.getHits()

We use the same set of structures to reproduce the results. After we listed the PDB identifiers, we obtain them using fetchPDB() function as follows:

In [8]: pdbfiles = fetchPDB(*pdbids, compressed=False)

pdbfiles variable contains a list of PDB filenames.

Set reference chain

The next step is setting one of the p38 structures as the reference structure. We use 1p38 chain A. Note that we won’t use all of the resolved residues in this structure. We select only those residues which are resolved in at least 90% of the dataset.

In [9]: ref_structure = parsePDB('1p38')

In [10]: ref_selection = ref_structure.select('resnum 5 to 31 36 to 114 122 to '
   ....:                                      '169 185 to 351 and calpha')
   ....: 

Retrieve protein chain A from the reference selection:

In [11]: ref_chain = ref_selection.getHierView().getChain('A')

In [12]: repr(ref_chain)
Out[12]: '<Chain: A from 1p38 (321 residues, 321 atoms)>'

We use the parsePDB() function to parse a PDB file. This returns a AtomGroup instance. We make a copy of α-carbon atoms of select residues for analysis.

See Atom Selections for making selections.

Prepare ensemble

X-ray structural ensembles are heterogenenous, i.e. different structures have different sets of unresolved residues. Hence, it is not straightforward to analyzed them as it would be for NMR models (see NMR Models).

ProDy has special functions and classes for facilitating efficient analysis of the PDB X-ray data. In this example we use mapOntoChain() function which returns an AtomMap instance.

See How AtomMap’s work for more details.

Start a logfile to save screen output:

In [13]: startLogfile('p38_pca')

Instantiate an PDBEnsemble object:

In [14]: ensemble = PDBEnsemble('p38 X-ray')

Set atoms and reference coordinate set of the ensemble:

In [15]: ensemble.setAtoms(ref_chain)

In [16]: ensemble.setCoords(ref_chain)

For each PDB file, we find the matching chain and add it to the ensemble:

In [17]: for pdbfile in pdbfiles:
   ....:     structure = parsePDB(pdbfile, subset='calpha')
   ....:     mappings = mapOntoChain(structure, ref_chain)
   ....:     atommap = mappings[0][0]
   ....:     ensemble.addCoordset(atommap, weights=atommap.getFlags('mapped'))
   ....: 
In [18]: repr(ensemble)
Out[18]: '<PDBEnsemble: p38 X-ray (75 conformations; 321 atoms)>'

In [19]: len(ensemble) == len(pdbfiles)
Out[19]: True

Perform an iterative superimposition:

In [20]: ensemble.iterpose()

Close the logfile (file content shows how chains were paired/mapped):

In [21]: closeLogfile('p38_pca')

Save coordinates

We use PDBEnsemble to store coordinates of the X-ray structures. The PDBEnsemble instances do not store any other atomic data. If we want to write aligned coordinates into a file, we need to pass the coordinates to an AtomGroup instance. Then we use writePDB() function to save coordinates:

In [22]: writePDB('p38_xray_ensemble.pdb', ensemble)
Out[22]: 'p38_xray_ensemble.pdb'

PCA calculations

Once the coordinate data are prepared, it is straightforward to perform the PCA calculations:

In [23]: pca = PCA('p38 xray')           # Instantiate a PCA instance

In [24]: pca.buildCovariance(ensemble)   # Build covariance for the ensemble

In [25]: pca.calcModes()                 # Calculate modes (20 of the by default)

Approximate method

In the following we are using singular value decomposition for faster and more memory efficient calculation of principal modes:

In [26]: pca_svd = PCA('p38 svd')

In [27]: pca_svd.performSVD(ensemble)

The resulting eigenvalues and eigenvectors may show small differences due to missing atoms in the datasets:

In [28]: abs(pca_svd.getEigvals()[:20] - pca.getEigvals()).max()
Out[28]: 0.40185220465119187

In [29]: abs(calcOverlap(pca, pca_svd).diagonal()[:20]).min()
Out[29]: 0.99801084515680039

Note that building and diagonalizing the covariance matrix is the preferred method for heterogeneous ensembles. For NMR models or MD trajectories SVD method may be preferred over covariance method.

ANM calculations

To perform ANM calculations:

In [30]: anm = ANM('1p38')             # Instantiate a ANM instance

In [31]: anm.buildHessian(ref_chain)   # Build Hessian for the reference chain

In [32]: anm.calcModes()               # Calculate slowest non-trivial 20 modes

Save your work

Calculated data can be saved in a ProDy internal format to use in a later session or to share it with others.

If you are in an interactive Python session, and wish to continue without leaving your session, you do not need to save the data. Saving data is useful if you want to use it in another session or at a later time, or if you want to share it with others.

In [33]: saveModel(pca)
Out[33]: 'p38_xray.pca.npz'

In [34]: saveModel(anm)
Out[34]: '1p38.anm.npz'

In [35]: saveEnsemble(ensemble)
Out[35]: 'p38_X-ray.ens.npz'

In [36]: writePDB('p38_ref_chain.pdb', ref_chain)
Out[36]: 'p38_ref_chain.pdb'

We use the saveModel() and saveEnsemble() functions to save calculated data. In Analysis, we will use the loadModel() and loadEnsemble() functions to load the data.