Pfam Access Functions

This module defines functions for interfacing with the Pfam database.

searchPfam(query, **kwargs)

Returns Pfam search results in a dictionary. Keys are the matching Pfam accessions; each maps to the e-value and the alignment start and end residue positions.

Parameters:
  • query (str) – UniProt ID, PDB identifier, a protein sequence, or a sequence file. Sequence queries must not contain gaps and must be at least 16 characters long
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60

When query is a PDB identifier, it may be given with or without a chain identifier, e.g. '1mkp' or '1mkpA'. The UniProt ID of the specified chain, or of the first protein chain if none is specified, is used to search the Pfam database.
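
The query rules above can be checked before a search is sent. The following is a minimal sketch, not part of the documented module: check_sequence_query and split_pdb_query are hypothetical helper names, and treating '-' and '.' as gap characters is an assumption.

```python
# Hypothetical helpers, not part of the documented API.
MIN_SEQUENCE_LENGTH = 16   # "at least 16 characters long"
GAP_CHARACTERS = "-."      # assumption: these two characters mark gaps


def check_sequence_query(sequence):
    """Return the sequence unchanged, or raise ValueError if it breaks a rule."""
    if any(char in GAP_CHARACTERS for char in sequence):
        raise ValueError("sequence queries must not contain gaps")
    if len(sequence) < MIN_SEQUENCE_LENGTH:
        raise ValueError("sequence queries must be at least 16 characters long")
    return sequence


def split_pdb_query(query):
    """Split '1mkp' or '1mkpA' into (PDB identifier, chain identifier or None)."""
    if len(query) == 4:
        return query, None
    if len(query) == 5:
        return query[:4], query[4]
    raise ValueError("expected a 4-character PDB identifier, optionally with a chain")


print(split_pdb_query("1mkpA"))
```

A query that passes these checks would then be handed to searchPfam as the query argument.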

fetchPfamMSA(acc, alignment='full', compressed=False, **kwargs)

Returns a path to the downloaded Pfam MSA file.

Parameters:
  • acc (str) – Pfam ID or accession code
  • alignment – alignment type, one of 'full' (default), 'seed', 'ncbi', 'metagenomics', 'rp15', 'rp35', 'rp55', 'rp75' or 'uniprot', where rp stands for representative proteomes
  • compressed – gzip the downloaded MSA file, default is False

Alignment Options

Parameters:
  • format – a Pfam supported MSA file format, one of 'selex' (default), 'stockholm' or 'fasta'
  • order – ordering of sequences, 'tree' (default) or 'alphabetical'
  • inserts – letter case for inserts, 'upper' (default) or 'lower'
  • gaps – gap character, one of 'dashes' (default), 'dots', 'mixed' or None for unaligned

Other Options

Parameters:
  • timeout – timeout for blocking connection attempt in seconds, default is 60
  • outname – output filename, default is 'acc_alignment.format', built from the input acc, alignment and format
  • folder – output folder, default is '.'
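
The alignment options listed above each accept a fixed set of values, so they can be validated up front. This is a rough sketch under that reading of the lists; check_msa_options, ALLOWED and DEFAULTS are hypothetical names introduced for illustration and are not part of the module.

```python
# Hypothetical validator; the allowed values and defaults are copied from the lists above.
ALLOWED = {
    "alignment": ("full", "seed", "ncbi", "metagenomics",
                  "rp15", "rp35", "rp55", "rp75", "uniprot"),
    "format": ("selex", "stockholm", "fasta"),
    "order": ("tree", "alphabetical"),
    "inserts": ("upper", "lower"),
    "gaps": ("dashes", "dots", "mixed", None),  # None means unaligned
}

DEFAULTS = {
    "alignment": "full",
    "format": "selex",
    "order": "tree",
    "inserts": "upper",
    "gaps": "dashes",
}


def check_msa_options(**options):
    """Merge options with the documented defaults, rejecting unknown values."""
    merged = dict(DEFAULTS)
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {ALLOWED[name]!r}, got {value!r}")
        merged[name] = value
    return merged


print(check_msa_options(alignment="seed", format="fasta"))
```

Given the signature shown above, the returned dictionary could then be passed on as keyword arguments, for example fetchPfamMSA(acc, **options).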

searchUniprotID(query, search_b=False, skip_a=False, **kwargs)

Returns Pfam search results in a dictionary. Keys are the matching Pfam accessions; each maps to the e-value and the alignment start and end residue positions.

Parameters:
  • query (str) – UniProt ID, PDB identifier, protein sequence, or a sequence file. Sequence queries must not contain gaps and must be at least 16 characters long
  • search_b (bool) – search Pfam-B families when True
  • skip_a (bool) – do not search Pfam-A families when True
  • ga (bool) – use gathering threshold when True
  • evalue (float) – user-specified e-value cutoff, must be smaller than 10.0
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60

When query is a PDB identifier, it may be given with or without a chain identifier, e.g. '1mkp' or '1mkpA'. The UniProt ID of the specified chain, or of the first protein chain if none is specified, is used to search the Pfam database.
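
Once a result dictionary is in hand, hits can be filtered by e-value and ordered along the sequence. The sketch below assumes each accession maps to a dictionary with 'evalue', 'start' and 'end' keys; that layout, the filter_hits name, and the sample hits (placeholder accessions with made-up numbers) are all assumptions for illustration, so the real return value should be inspected first.

```python
# Hypothetical post-processing of a search result; the layout is assumed, not documented.
def filter_hits(hits, max_evalue):
    """Keep hits with e-value <= max_evalue, ordered by alignment start."""
    kept = {
        accession: hit
        for accession, hit in hits.items()
        if float(hit["evalue"]) <= max_evalue
    }
    return dict(sorted(kept.items(), key=lambda item: int(item[1]["start"])))


# Made-up sample data with placeholder accessions, for illustration only.
sample_hits = {
    "PF_EXAMPLE_2": {"evalue": 1e-5, "start": 150, "end": 290},
    "PF_EXAMPLE_1": {"evalue": 2.5, "start": 10, "end": 90},
    "PF_EXAMPLE_3": {"evalue": 3e-20, "start": 40, "end": 120},
}

for accession, hit in filter_hits(sample_hits, max_evalue=1e-3).items():
    print(accession, hit["start"], hit["end"], hit["evalue"])
```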

parsePfamPDBs(query, data=[], **kwargs)

Returns a list of AtomGroups containing sections of chains that correspond to a particular Pfam domain family. The sections are defined by alignment start and end residue numbers.

Parameters:
  • query (str) – UniProt ID or PDB ID. If a PDB ID is provided, the corresponding UniProt ID is used. If this returns multiple matches, then start or end must also be provided. This query is also used for label refinement of the Pfam domain MSA.
  • data (list) – If given, the data list from the Pfam mapping table is output through this argument.
  • start (int) – Residue number for defining the start of the domain. The Pfam domain that starts closest to this will be selected. Default is 1.
  • end (int) – Residue number for defining the end of the domain. The Pfam domain that ends closest to this will be selected.
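
The start and end rules above amount to a nearest-boundary selection. The following sketch illustrates that rule on plain dictionaries; pick_domain and the record layout are hypothetical, the sample domains are made up, and letting start take precedence when both values are given is an assumption, since the text above does not say how the two are combined.

```python
# Hypothetical illustration of "closest start" / "closest end" selection.
def pick_domain(domains, start=None, end=None):
    """Return the domain whose start (or end) lies closest to the given residue."""
    if not domains:
        raise ValueError("no domains to choose from")
    if start is None and end is None:
        raise ValueError("start or end must be provided")
    if start is not None:
        # Assumption: start takes precedence when both are given.
        return min(domains, key=lambda domain: abs(domain["start"] - start))
    return min(domains, key=lambda domain: abs(domain["end"] - end))


# Made-up domain records, for illustration only.
domains = [
    {"name": "domain_1", "start": 5, "end": 120},
    {"name": "domain_2", "start": 140, "end": 300},
]

print(pick_domain(domains, start=130)["name"])
print(pick_domain(domains, end=110)["name"])
```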