# Calculations¶

This is the first part of a lengthy example. In this part, we perform the calculations using a p38 MAP kinase (MAPK) structural dataset. This will reproduce the calculations for p38 that were published in [AB09].

We will obtain a `PCA`

instance that stores the covariance matrix and
principal modes describing the dominant changes in the dataset. The
`PCA`

instance and principal modes (`Mode`

) can be used as
input for the functions in `dynamics`

module.

## Retrieve dataset¶

We start by importing everything from the ProDy package:

```
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
```

We use a list of PDB identifiers for the structures that we want to include in our analysis.

```
In [4]: pdbids = ['1A9U', '1BL6', '1BL7', '1BMK', '1DI9', '1IAN', '1KV1', '1KV2',
...: '1LEW', '1LEZ', '1M7Q', '1OUK', '1OUY', '1OVE', '1OZ1', '1P38',
...: '1R39', '1R3C', '1W7H', '1W82', '1W83', '1W84', '1WBN', '1WBO',
...: '1WBS', '1WBT', '1WBV', '1WBW', '1WFC', '1YQJ', '1YW2', '1YWR',
...: '1ZYJ', '1ZZ2', '1ZZL', '2BAJ', '2BAK', '2BAL', '2BAQ', '2EWA',
...: '2FSL', '2FSM', '2FSO', '2FST', '2GFS', '2GHL', '2GHM', '2GTM',
...: '2GTN', '2I0H', '2NPQ', '2OKR', '2OZA', '3HVC', '3MH0', '3MH3',
...: '3MH2', '2PUU', '3MGY', '3MH1', '2QD9', '2RG5', '2RG6', '2ZAZ',
...: '2ZB0', '2ZB1', '3BV2', '3BV3', '3BX5', '3C5U', '3L8X', '3CTQ',
...: '3D7Z', '3D83', '2ONL']
...:
```

Note that we used a list of identifiers that are different from what was listed in the supporting material of [AB09]. Some structures have been refined and their identifier have been changed by the Protein Data Bank. These changes are reflected in the above list.

Also note that it is possible to update this list to include all of the p38
structures currently available in the PDB using the
`blastPDB()`

function as follows:

```
In [5]: p38_sequence = '''GLVPRGSHMSQERPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGHRV
...: AVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDH
...: VQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQ
...: TVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLAQMPKMNFANVFIGAN
...: PLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPPLD
...: QEEMES'''
...:
```

To update list of p38 MAPK PDB files, you can make a blast search as follows:

```
In [6]: blast_record = blastPDB(p38_sequence)
In [7]: pdbids = blast_record.getHits()
```

We use the same set of structures to reproduce the results.
After we listed the PDB identifiers, we obtain them using
`fetchPDB()`

function as follows:

```
In [8]: pdbfiles = fetchPDB(*pdbids, compressed=False)
```

`pdbfiles`

variable contains a list of PDB filenames.

## Set reference chain¶

The next step is setting one of the p38 structures as the reference structure. We use 1p38 chain A. Note that we won’t use all of the resolved residues in this structure. We select only those residues which are resolved in at least 90% of the dataset.

```
In [9]: ref_structure = parsePDB('1p38')
In [10]: ref_selection = ref_structure.select('resnum 5 to 31 36 to 114 122 to '
....: '169 185 to 351 and calpha')
....:
```

Retrieve protein chain A from the reference selection:

```
In [11]: ref_chain = ref_selection.getHierView().getChain('A')
In [12]: repr(ref_chain)
Out[12]: '<Chain: A from 1p38 (321 residues, 321 atoms)>'
```

We use the `parsePDB()`

function to parse a PDB file.
This returns a `AtomGroup`

instance. We make a copy
of α-carbon atoms of select residues for analysis.

See Atom Selections for making selections.

## Prepare ensemble¶

X-ray structural ensembles are heterogenenous, i.e. different structures have different sets of unresolved residues. Hence, it is not straightforward to analyzed them as it would be for NMR models (see NMR Models).

ProDy has special functions and classes for facilitating efficient analysis
of the PDB X-ray data. In this example we use `mapOntoChain()`

function which returns an `AtomMap`

instance.

See How AtomMap’s work for more details.

Start a logfile to save screen output:

```
In [13]: startLogfile('p38_pca')
```

Instantiate an `PDBEnsemble`

object:

```
In [14]: ensemble = PDBEnsemble('p38 X-ray')
```

Set atoms and reference coordinate set of the ensemble:

```
In [15]: ensemble.setAtoms(ref_chain)
In [16]: ensemble.setCoords(ref_chain)
```

For each PDB file, we find the matching chain and add it to the ensemble:

```
In [17]: for pdbfile in pdbfiles:
....: structure = parsePDB(pdbfile, subset='calpha')
....: mappings = mapOntoChain(structure, ref_chain)
....: atommap = mappings[0][0]
....: ensemble.addCoordset(atommap, weights=atommap.getFlags('mapped'))
....:
```

```
In [18]: repr(ensemble)
Out[18]: '<PDBEnsemble: p38 X-ray (75 conformations; 321 atoms)>'
In [19]: len(ensemble) == len(pdbfiles)
Out[19]: True
```

Perform an iterative superimposition:

```
In [20]: ensemble.iterpose()
```

Close the logfile (file content shows how chains were paired/mapped):

```
In [21]: closeLogfile('p38_pca')
```

## Save coordinates¶

We use `PDBEnsemble`

to store coordinates of the X-ray
structures. The `PDBEnsemble`

instances do not store any
other atomic data. If we want to write aligned coordinates into a file, we
need to pass the coordinates to an `AtomGroup`

instance.
Then we use `writePDB()`

function to save coordinates:

```
In [22]: writePDB('p38_xray_ensemble.pdb', ensemble)
Out[22]: 'p38_xray_ensemble.pdb'
```

## PCA calculations¶

Once the coordinate data are prepared, it is straightforward to perform the
`PCA`

calculations:

```
In [23]: pca = PCA('p38 xray') # Instantiate a PCA instance
In [24]: pca.buildCovariance(ensemble) # Build covariance for the ensemble
In [25]: pca.calcModes() # Calculate modes (20 of the by default)
```

**Approximate method**

In the following we are using singular value decomposition for faster and more memory efficient calculation of principal modes:

```
In [26]: pca_svd = PCA('p38 svd')
In [27]: pca_svd.performSVD(ensemble)
```

The resulting eigenvalues and eigenvectors may show small differences due to missing atoms in the datasets:

```
In [28]: abs(pca_svd.getEigvals()[:20] - pca.getEigvals()).max()
Out[28]: 0.401852204651167
In [29]: abs(calcOverlap(pca, pca_svd).diagonal()[:20]).min()
Out[29]: 0.99801084515679939
```

Note that building and diagonalizing the covariance matrix is the preferred method for heterogeneous ensembles. For NMR models or MD trajectories SVD method may be preferred over covariance method.

## ANM calculations¶

To perform `ANM`

calculations:

```
In [30]: anm = ANM('1p38') # Instantiate a ANM instance
In [31]: anm.buildHessian(ref_chain) # Build Hessian for the reference chain
In [32]: anm.calcModes() # Calculate slowest non-trivial 20 modes
```

## Save your work¶

Calculated data can be saved in a ProDy internal format to use in a later session or to share it with others.

If you are in an interactive Python session, and wish to continue without leaving your session, you do not need to save the data. Saving data is useful if you want to use it in another session or at a later time, or if you want to share it with others.

```
In [33]: saveModel(pca)
Out[33]: 'p38_xray.pca.npz'
In [34]: saveModel(anm)
Out[34]: '1p38.anm.npz'
In [35]: saveEnsemble(ensemble)
Out[35]: 'p38_X-ray.ens.npz'
In [36]: writePDB('p38_ref_chain.pdb', ref_chain)
Out[36]: 'p38_ref_chain.pdb'
```

We use the `saveModel()`

and `saveEnsemble()`

functions to save
calculated data. In Analysis, we will use the
`loadModel()`

and `loadEnsemble()`

functions to load the data.