Atom Selections¶
This part gives more information on the properties of AtomGroup objects.
We start by making the necessary imports. Note that every documentation page
contains them so that the code within each page can be executed independently.
You can skip them if you have already run them in your Python session.
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
Atom Selections¶
AtomGroup instances have a plain view of atoms for efficiency,
but they are coupled with a powerful atom selection engine. You can get
well-defined atom subsets by passing simple keywords, or make rather sophisticated
selections using composite statements. Selection keywords and grammar are very
similar to those found in VMD. Some examples are shown here:
Keyword selections¶
Now, we parse a structure. This could be any structure, one that you know well from your research, for example.
In [4]: structure = parsePDB('1p38')
In [5]: protein = structure.select('protein')
In [6]: protein
Out[6]: <Selection: 'protein' from 1p38 (2833 atoms)>
Using the "protein" keyword, we selected 2833 of the 2962 atoms.
The Atomic.select() method returned a Selection instance.
Note that all get and set methods defined for AtomGroup
objects are also defined for Selection objects. For example:
In [7]: protein.getResnames()
Out[7]: array(['GLU', 'GLU', 'GLU', ..., 'ASP', 'ASP', 'ASP'], dtype='|S6')
Select by name/type¶
We can select backbone atoms by passing atom names following the "name" keyword:
In [8]: backbone = structure.select('protein and name N CA C O')
In [9]: backbone
Out[9]: <Selection: 'protein and name N CA C O' from 1p38 (1404 atoms)>
Alternatively, we can use the "backbone" keyword to make the same selection:
In [10]: backbone = structure.select('backbone')
We select acidic and basic residues by using residue names with the
"resname" keyword:
In [11]: charged = structure.select('resname ARG LYS HIS ASP GLU')
In [12]: charged
Out[12]: <Selection: 'resname ARG LYS HIS ASP GLU' from 1p38 (906 atoms)>
Alternatively, we can use the predefined keywords “acidic” and “basic”:
In [13]: charged = structure.select('acidic or basic')
In [14]: charged
Out[14]: <Selection: 'acidic or basic' from 1p38 (906 atoms)>
In [15]: set(charged.getResnames())
Out[15]: {'ARG', 'ASP', 'GLU', 'HIS', 'LYS'}
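To make the keyword semantics concrete, here is a minimal plain-Python sketch of what selections such as 'protein and name N CA C O' or 'acidic or basic' amount to: a filter over per-atom attributes. It does not use ProDy and is not how ProDy is implemented. The toy atom list, the `select_by` helper, and the `ACIDIC`, `BASIC`, and `WATER` sets are hypothetical names introduced for illustration, and "protein" is simplified here to "not water".

```python
# Toy model of keyword selections; NOT ProDy's implementation.
# Each atom is a (name, resname) pair; all data below is made up.
ATOMS = [
    ("N", "GLU"), ("CA", "GLU"), ("C", "GLU"), ("O", "GLU"), ("CB", "GLU"),
    ("N", "ALA"), ("CA", "ALA"), ("C", "ALA"), ("O", "ALA"), ("CB", "ALA"),
    ("N", "LYS"), ("CA", "LYS"), ("C", "LYS"), ("O", "LYS"), ("CB", "LYS"),
    ("O", "HOH"),
]

# Assumed residue classes, consistent with 'resname ARG LYS HIS ASP GLU'.
ACIDIC = {"ASP", "GLU"}
BASIC = {"ARG", "LYS", "HIS"}
WATER = {"HOH"}


def select_by(atoms, predicate):
    """Return indices of atoms for which predicate(name, resname) is true."""
    return [i for i, (name, resname) in enumerate(atoms)
            if predicate(name, resname)]


# Analogue of 'protein and name N CA C O'.
backbone = select_by(
    ATOMS, lambda name, res: res not in WATER and name in {"N", "CA", "C", "O"}
)

# Analogue of 'acidic or basic'.
charged = select_by(ATOMS, lambda name, res: res in ACIDIC or res in BASIC)
charged_resnames = sorted({ATOMS[i][1] for i in charged})

print(len(backbone), len(charged), charged_resnames)
```

The water oxygen shares the atom name O with backbone oxygens, which is why the backbone analogue combines the name test with a protein test.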
Composite selections¶
Let’s try a more sophisticated selection. We first calculate the geometric
center of the protein atoms using the calcCenter() function. Then, we
select the Cα and Cβ atoms of residues that have at least one atom within
10 Å of the geometric center.
In [16]: center = calcCenter(protein).round(3)
In [17]: center
Out[17]: array([ 1.005, 17.533, 40.052])
In [18]: sel = structure.select('protein and name CA CB and same residue as '
....: '((x-1)**2 + (y-17.5)**2 + (z-40.0)**2)**0.5 < 10')
....:
In [19]: sel
Out[19]: <Selection: 'protein and nam...)**2)**0.5 < 10' from 1p38 (66 atoms)>
Alternatively, this selection could be done as follows:
In [20]: sel = structure.select('protein and name CA CB and same residue as '
....: 'within 10 of center', center=center)
....:
In [21]: sel
Out[21]: <Selection: 'index 576 579 5... 1687 1707 1710' from 1p38 (66 atoms)>
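The logic of this composite selection can be sketched in plain Python, without ProDy: first find every residue that has at least one atom within the cutoff of a point, then keep only the CA and CB atoms of those residues. This is an illustrative model under stated assumptions, not ProDy's implementation; the toy coordinates and the `select_near` helper are hypothetical.

```python
import math

# Toy atoms: (residue number, atom name, (x, y, z)). Coordinates are made up.
ATOMS = [
    (1, "N",  (0.0, 0.0, 0.0)),
    (1, "CA", (1.5, 0.0, 0.0)),
    (1, "CB", (2.0, 1.4, 0.0)),
    (2, "N",  (9.0, 0.0, 0.0)),    # the only atom of residue 2 inside the cutoff
    (2, "CA", (10.5, 0.0, 0.0)),
    (2, "CB", (11.0, 1.4, 0.0)),
    (3, "N",  (30.0, 0.0, 0.0)),
    (3, "CA", (31.5, 0.0, 0.0)),
    (3, "CB", (32.0, 1.4, 0.0)),
]


def select_near(atoms, center, cutoff, names=("CA", "CB")):
    """Indices of atoms named in `names` whose residue has at least one
    atom closer than `cutoff` to `center` (the 'same residue as' step)."""
    near_residues = {
        resnum for resnum, _, xyz in atoms if math.dist(xyz, center) < cutoff
    }
    return [
        i for i, (resnum, name, _) in enumerate(atoms)
        if resnum in near_residues and name in names
    ]


sel = select_near(ATOMS, center=(0.0, 0.0, 0.0), cutoff=10.0)
print(sel)
```

In this toy data, the CA and CB atoms of residue 2 lie outside the cutoff themselves but are still selected, because the residue's N atom lies inside it.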
Selections simplified¶
In interactive sessions, an alternative to typing in .select('protein')
or .select('backbone') is using the dot operator:
In [22]: protein = structure.protein
In [23]: protein
Out[23]: <Selection: 'protein' from 1p38 (2833 atoms)>
You can use the dot operator multiple times:
In [24]: bb = structure.protein.backbone
In [25]: bb
Out[25]: <Selection: '(backbone) and (protein)' from 1p38 (1404 atoms)>
This may go on and on:
In [26]: ala_ca = structure.protein.backbone.resname_ALA.calpha
In [27]: ala_ca
Out[27]: <Selection: '(calpha) and ((...and (protein)))' from 1p38 (26 atoms)>
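One way to picture how chained attributes turn into a nested selection string is the toy class below. It only tracks the string and does not select any atoms. This is a sketch, not ProDy's implementation: the `ToyChain` class is hypothetical, and the rule that an underscore separates a keyword from its argument is an assumption inferred from `resname_ALA`.

```python
class ToyChain:
    """Toy stand-in that records the selection string a dot chain builds."""

    def __init__(self, selstr=None):
        self.selstr = selstr

    def __getattr__(self, keyword):
        # __getattr__ runs only when normal attribute lookup fails.
        if keyword.startswith("_"):
            raise AttributeError(keyword)
        # Assumption: 'resname_ALA' stands for the selection 'resname ALA'.
        new = keyword.replace("_", " ")
        if self.selstr is None:
            return ToyChain(new)
        # Each step wraps the previous string, newest keyword first.
        return ToyChain("({0}) and ({1})".format(new, self.selstr))


structure = ToyChain()
bb = structure.protein.backbone
ala_ca = structure.protein.backbone.resname_ALA.calpha

print(bb.selstr)
print(ala_ca.selstr)
```

The nesting produced here has the same shape as the abbreviated selection strings shown in the outputs above.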
More examples¶
There is much more that you can do with this flexible and fast atom selection engine, without writing nested loops with comparisons or changing the source code. See the following pages:
- Atom Selections for description of all selection keywords
- Intermolecular Contacts for selecting interacting atoms
Operations on Selections¶
Selection objects can be used with bitwise operators:
Union¶
Let’s select β-carbon atoms for non-GLY amino acid residues, and α-carbons for GLYs in two steps:
In [28]: betas = structure.select('name CB and protein')
In [29]: len(betas)
Out[29]: 336
In [30]: gly_alphas = structure.select('name CA and resname GLY')
In [31]: len(gly_alphas)
Out[31]: 15
The above shows that the p38 structure contains 15 GLY residues.
These two selections can be combined as follows:
In [32]: betas_gly_alphas = betas | gly_alphas
In [33]: betas_gly_alphas
Out[33]: <Selection: '(name CB and pr...nd resname GLY)' from 1p38 (351 atoms)>
In [34]: len(betas_gly_alphas)
Out[34]: 351
The selection string for the union of selections becomes:
In [35]: betas_gly_alphas.getSelstr()
Out[35]: '(name CB and protein) or (name CA and resname GLY)'
Note that the same selection can also be made directly with the selection
string (name CB and protein) or (name CA and resname GLY).
Intersection¶
It is just as easy to get the intersection of two selections. Let’s find charged and medium-sized residues in a protein:
In [36]: charged = structure.select('charged')
In [37]: charged
Out[37]: <Selection: 'charged' from 1p38 (906 atoms)>
In [38]: medium = structure.select('medium')
In [39]: medium
Out[39]: <Selection: 'medium' from 1p38 (751 atoms)>
In [40]: medium_charged = medium & charged
In [41]: medium_charged
Out[41]: <Selection: '(medium) and (charged)' from 1p38 (216 atoms)>
In [42]: medium_charged.getSelstr()
Out[42]: '(medium) and (charged)'
Let’s see which amino acids are considered charged and medium:
In [43]: set(medium_charged.getResnames())
Out[43]: {'ASP'}
What about amino acids that are medium or charged?
In [44]: set((medium | charged).getResnames())
Out[44]: {'ARG', 'ASN', 'ASP', 'CYS', 'GLU', 'HIS', 'LYS', 'PRO', 'THR', 'VAL'}
Inversion¶
It is also possible to invert a selection:
In [45]: only_protein = structure.select('protein')
In [46]: only_protein
Out[46]: <Selection: 'protein' from 1p38 (2833 atoms)>
In [47]: only_non_protein = ~only_protein
In [48]: only_non_protein
Out[48]: <Selection: 'not (protein)' from 1p38 (129 atoms)>
In [49]: water = structure.select('water')
In [50]: water
Out[50]: <Selection: 'water' from 1p38 (129 atoms)>
The above shows that 1p38 does not contain any non-water hetero atoms.
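The three operators above behave like set operations on atom indices, with the selection strings composed alongside. The following plain-Python sketch models that behaviour on a toy structure of 10 atoms. It is not ProDy's implementation; the `ToySelection` class and the index assignments are hypothetical.

```python
class ToySelection:
    """Toy selection: sorted unique atom indices plus a selection string."""

    def __init__(self, n_atoms, indices, selstr):
        self.n_atoms = n_atoms              # size of the parent structure
        self.indices = sorted(set(indices))
        self.selstr = selstr

    def __len__(self):
        return len(self.indices)

    def __or__(self, other):                # union: a | b
        merged = set(self.indices) | set(other.indices)
        selstr = "({0}) or ({1})".format(self.selstr, other.selstr)
        return ToySelection(self.n_atoms, merged, selstr)

    def __and__(self, other):               # intersection: a & b
        common = set(self.indices) & set(other.indices)
        selstr = "({0}) and ({1})".format(self.selstr, other.selstr)
        return ToySelection(self.n_atoms, common, selstr)

    def __invert__(self):                   # inversion: ~a
        rest = set(range(self.n_atoms)) - set(self.indices)
        return ToySelection(self.n_atoms, rest, "not ({0})".format(self.selstr))


# Made-up index assignments for a 10-atom toy structure.
protein = ToySelection(10, range(8), "protein")
betas = ToySelection(10, [2, 5], "name CB and protein")
gly_alphas = ToySelection(10, [7], "name CA and resname GLY")
medium = ToySelection(10, [0, 1, 2, 3], "medium")
charged = ToySelection(10, [2, 3, 4, 5], "charged")

union = betas | gly_alphas
both = medium & charged
non_protein = ~protein

print(union.selstr, union.indices)
print(both.selstr, both.indices)
print(non_protein.selstr, non_protein.indices)
```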
Addition¶
Another operation defined on Selection objects is addition
(it is also defined on other AtomPointer derived classes).
This may be useful if you want to yield atoms in an AtomGroup
in a specific order.
Let’s think of a simple case, where we want to output atoms in 1p38 in a
specific order:
In [51]: protein = structure.select('protein')
In [52]: water = structure.select('water')
In [53]: water_protein = water + protein
In [54]: writePDB('1p38_water_protein.pdb', water_protein)
Out[54]: '1p38_water_protein.pdb'
In the resulting file, the water atoms will precede the protein atoms.
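The difference between addition and union comes down to ordering. The plain-Python sketch below contrasts the two on lists of atom indices. It does not use ProDy; the index values are made up, and treating the union as returning atoms in their original structure order is an assumption of this sketch.

```python
# Atom indices in the order they appear in the toy structure.
protein = [0, 1, 2, 3]
water = [4, 5]

# Addition: concatenate, left operand first, so water comes before protein.
added_order = water + protein

# Union (assumed here to follow the original structure order).
union_order = sorted(set(water) | set(protein))

print(added_order)
print(union_order)
```

Both results refer to the same six atoms; only the order in which they would be written out differs.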
Membership¶
Selections also allow membership test operations:
In [55]: backbone = structure.select('backbone')
In [56]: calpha = structure.select('calpha')
Is calpha a subset of backbone?
In [57]: calpha in backbone
Out[57]: True
Or, is water in the protein selection?
In [58]: water in protein
Out[58]: False
Other tests include:
In [59]: protein in structure
Out[59]: True
In [60]: backbone in structure
Out[60]: True
In [61]: structure in structure
Out[61]: True
In [62]: calpha in calpha
Out[62]: True
Equality¶
You can also check the equality of selections. Comparison will return
True if both selections refer to the same atoms.
In [63]: calpha = structure.select('protein and name CA')
In [64]: calpha2 = structure.select('calpha')
In [65]: calpha == calpha2
Out[65]: True
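Membership and equality can be pictured as a subset test and a same-atoms test on atom indices. The following plain-Python sketch models both. It is not ProDy's implementation; the `contains` and `same_atoms` helpers and the toy indices are hypothetical.

```python
def contains(outer, inner):
    """Model of `inner in outer`: every atom of inner is also in outer."""
    return set(inner) <= set(outer)


def same_atoms(a, b):
    """Model of `a == b`: both selections refer to the same atoms."""
    return set(a) == set(b)


# Made-up atom indices for a 10-atom toy structure.
structure = list(range(10))
protein = list(range(8))
backbone = [0, 1, 2, 4, 5, 6]
calpha = [1, 5]            # e.g. obtained with 'protein and name CA'
calpha2 = [1, 5]           # e.g. obtained with 'calpha'
water = [8, 9]

print(contains(backbone, calpha))      # calpha is a subset of backbone
print(contains(protein, water))        # water shares no atoms with protein
print(contains(structure, structure))  # a set always contains itself
print(same_atoms(calpha, calpha2))     # different strings, same atoms
```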