Pfam Access Functions

This module defines functions for interfacing with the Pfam database.

searchPfam(query, **kwargs)

Returns Pfam search results in a dictionary. Each key is a matching Pfam accession, and its value holds the e-value and the start and end residue positions of the alignment.

Parameters:
  • query (str) – UniProt ID, PDB identifier, protein sequence, or a sequence file; sequence queries must not contain gaps and must be at least 16 characters long
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60

query can also be a PDB identifier, e.g. '1mkp', or '1mkpA' with a chain identifier. The UniProt ID of the specified chain, or of the first protein chain when no chain is given, will be used for searching the Pfam database.
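
The sequence-query constraints listed above can be checked locally before contacting Pfam. The following is a minimal sketch, not part of ProDy: is_valid_sequence_query and search_if_valid are hypothetical helpers, and treating '-' and '.' as gap characters is an assumption. Only the searchPfam call and its timeout keyword come from this page.

```python
GAP_CHARACTERS = "-."  # assumption: the usual alignment gap symbols
MIN_QUERY_LENGTH = 16  # documented minimum length for sequence queries


def is_valid_sequence_query(sequence):
    """Return True if the sequence is long enough and contains no gaps."""
    if len(sequence) < MIN_QUERY_LENGTH:
        return False
    return not any(char in GAP_CHARACTERS for char in sequence)


def search_if_valid(sequence, timeout=60):
    """Run searchPfam only for sequences that pass the local check."""
    if not is_valid_sequence_query(sequence):
        raise ValueError(
            "sequence must be gap-free and at least 16 characters long"
        )
    # Imported here so the local check works without ProDy installed.
    from prody import searchPfam
    return searchPfam(sequence, timeout=timeout)
```

Rejecting a malformed sequence locally avoids a network round trip that would otherwise wait up to the timeout. UniProt IDs, PDB identifiers and file names are not sequences and should be passed to searchPfam directly.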

fetchPfamMSA(acc, alignment='full', compressed=False, **kwargs)

Returns a path to the downloaded Pfam MSA file.

Parameters:
  • acc (str) – Pfam ID or Accession Code
  • alignment – alignment type, one of 'full' (default), 'seed', 'ncbi', 'metagenomics', 'rp15', 'rp35', 'rp55', 'rp75' or 'uniprot' where rp stands for representative proteomes
  • compressed – gzip the downloaded MSA file, default is False

Alignment Options

Parameters:
  • format – a Pfam supported MSA file format, one of 'selex' (default), 'stockholm' or 'fasta'
  • order – ordering of sequences, 'tree' (default) or 'alphabetical'
  • inserts – letter case for inserts, 'upper' (default) or 'lower'
  • gaps – gap character, one of 'dashes' (default), 'dots', 'mixed' or None for unaligned

Other Options

Parameters:
  • timeout – timeout for blocking connection attempt in seconds, default is 60
  • outname – output filename, default is 'acc_alignment.format', built from the input values
  • folder – output folder, default is '.'
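
Because fetchPfamMSA accepts many string-valued options, a typo in one of them is easy to make. The sketch below checks option values against the choices listed on this page before downloading. check_msa_options and fetch_checked_msa are hypothetical helpers, not ProDy functions, and the allowed values are copied from the parameter lists above rather than from the ProDy source.

```python
ALLOWED_OPTIONS = {
    "alignment": ("full", "seed", "ncbi", "metagenomics",
                  "rp15", "rp35", "rp55", "rp75", "uniprot"),
    "format": ("selex", "stockholm", "fasta"),
    "order": ("tree", "alphabetical"),
    "inserts": ("upper", "lower"),
    "gaps": ("dashes", "dots", "mixed", None),
}


def check_msa_options(**options):
    """Raise ValueError for any option value not listed in the documentation.

    Options without a fixed set of choices (timeout, outname, folder,
    compressed) are passed through unchecked.
    """
    for name, value in options.items():
        allowed = ALLOWED_OPTIONS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return options


def fetch_checked_msa(acc, **options):
    """Validate the options, then download the alignment with fetchPfamMSA."""
    check_msa_options(**options)
    # Imported here so the validation works without ProDy installed.
    from prody import fetchPfamMSA
    return fetchPfamMSA(acc, **options)
```

A call such as fetch_checked_msa(acc, alignment='seed', format='fasta', folder='.') would then fail early on a misspelled value instead of after a request to the server.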

searchUniprotID(query, search_b=False, skip_a=False, **kwargs)

Returns Pfam search results in a dictionary. Each key is a matching Pfam accession, and its value holds the e-value and the start and end residue positions of the alignment.

Parameters:
  • query (str) – UniProt ID, PDB identifier, protein sequence, or a sequence file; sequence queries must not contain gaps and must be at least 16 characters long
  • search_b (bool) – search Pfam-B families when True
  • skip_a (bool) – do not search Pfam-A families when True
  • ga (bool) – use gathering threshold when True
  • evalue (float) – user-specified e-value cutoff, must be smaller than 10.0
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60

query can also be a PDB identifier, e.g. '1mkp', or '1mkpA' with a chain identifier. The UniProt ID of the specified chain, or of the first protein chain when no chain is given, will be used for searching the Pfam database.
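
The keyword arguments listed above can be collected and checked in one place before a search. This is a sketch under stated assumptions: build_search_options is a hypothetical helper, and leaving ga and evalue out of the result when they are not given (so the function's own defaults apply) is a design choice of this example, not documented behavior.

```python
def build_search_options(search_b=False, skip_a=False, ga=None,
                         evalue=None, timeout=60):
    """Collect search keyword arguments, rejecting e-values of 10.0 or more."""
    options = {"search_b": search_b, "skip_a": skip_a, "timeout": timeout}
    if ga is not None:
        options["ga"] = bool(ga)
    if evalue is not None:
        # The documentation requires the cutoff to be smaller than 10.0.
        if not evalue < 10.0:
            raise ValueError("evalue must be smaller than 10.0")
        options["evalue"] = float(evalue)
    return options
```

Assuming the installed ProDy version provides searchUniprotID as documented here, the result could be passed on as searchUniprotID(query, **build_search_options(evalue=1e-5)).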