Gene Ontology Annotation (GOA) Server Functions

This module defines functions for interfacing with the EBI’s Gene Ontology Annotation (GOA) database for analysing gene/protein functions through the Gene Ontology (GO).

This module is based on the tutorial notebook at

class GOADictList(parsingList, title='unnamed', **kwargs)

A class for handling the list of GOA dictionaries returned by queryGOA().


Pop the dataBlock with the given index from the list of dataBlocks in the GOADictList.


Parse a GO OBO file containing the Gene Ontology itself. See OBO for more information on the file format.
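
The OBO format stores the ontology as text stanzas such as [Term], each made of "tag: value" lines. The sketch below is a minimal, hypothetical reader (parse_obo_terms is not part of ProDy) that keeps only the id, name, namespace and is_a tags of [Term] stanzas; parseOBO() is documented here as producing the goatools GODag used by the other functions, which covers far more of the format.

```python
from typing import Dict, Optional


def parse_obo_terms(text: str) -> Dict[str, dict]:
    """Hypothetical minimal OBO reader: id, name, namespace and is_a parents
    of every [Term] stanza. Other stanzas and tags are skipped."""
    terms: Dict[str, dict] = {}
    current: Optional[dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"parents": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, ignored stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # "is_a: PARENT_ID ! parent name" -> keep only PARENT_ID
            current["parents"].append(value.split("!")[0].strip())
    return terms


# Toy OBO text with made-up IDs (not real GO terms).
example = """format-version: 1.2

[Term]
id: TOY:0001
name: toy root
namespace: molecular_function

[Term]
id: TOY:0002
name: toy child
namespace: molecular_function
is_a: TOY:0001 ! toy root
"""

terms = parse_obo_terms(example)
print(terms["TOY:0002"]["parents"])  # ['TOY:0001']
```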

parseGAF(database='PDB', **kwargs)

Parse a GO Association File (GAF) corresponding to a particular database collection into a dictionary for ease of querying.

See GAF for more information on the file format.

Parameters:
  • database (str) – name of the database of interest; default is PDB. Others include UNIPROT and common names of many organisms.
  • filename (str) – filename for the GAF of interest; default is goa_ followed by the database name in lower case and .gaf.gz (e.g. goa_pdb.gaf.gz for PDB)
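
A GAF file is tab-separated text: lines starting with ! are header comments, and in GAF 2.x each annotation row has 17 columns, including the DB Object ID (column 2), the GO ID (column 5), the evidence code (column 7) and the aspect F, P or C (column 9). The hypothetical parse_gaf_lines below sketches the grouping-by-identifier idea behind parseGAF(); it is not ProDy's implementation, and the annotation row is made up. For a real goa_*.gaf.gz file, the lines could be read with gzip.open(filename, 'rt').

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_gaf_lines(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group GAF annotation rows by DB Object ID (column 2)."""
    annotations: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        annotations[cols[1]].append({
            "db": cols[0],
            "symbol": cols[2],
            "qualifier": cols[3],
            "go_id": cols[4],
            "evidence": cols[6],
            "aspect": cols[8],  # F (function), P (process) or C (component)
        })
    return dict(annotations)


# One made-up annotation row (17 tab-separated columns) plus a header line.
row = "\t".join([
    "PDB", "1ABC_A", "1ABC_A", "enables", "GO:0003824", "GO_REF:0000002",
    "IEA", "", "F", "", "", "protein", "taxon:9606", "20240101",
    "InterPro", "", "",
])
gaf = parse_gaf_lines(["!gaf-version: 2.2\n", row + "\n"])
print(gaf["1ABC_A"][0]["go_id"])  # GO:0003824
```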

queryGOA(*ids, **kwargs)

Query a GOA database by identifier.

Parameters:
  • ids (str, tuple, list, ndarray) – an identifier or a list-like of identifiers
  • database (str) – name of the database of interest; default is PDB. Others include UNIPROT and common names of many organisms.
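
queryGOA() accepts either a single identifier or a list-like of identifiers. The hypothetical helper below sketches that input handling over an annotation dictionary keyed by identifier, such as one built from a GAF file; the real function returns a GOADictList rather than plain lists, and the identifiers used here are made up.

```python
from typing import Dict, Iterable, List, Union


def query_annotations(
    annotations: Dict[str, list], *ids: Union[str, Iterable[str]]
) -> List[list]:
    """Look up each identifier; unknown identifiers give an empty list."""
    flat: List[str] = []
    for item in ids:
        if isinstance(item, str):
            flat.append(item)  # a single identifier
        else:
            flat.extend(item)  # tuple, list, array or other iterable of IDs
    return [annotations.get(i, []) for i in flat]


# Toy annotation dictionary with made-up identifiers.
toy = {"1ABC_A": [{"go_id": "GO:0003824"}]}
print(query_annotations(toy, "1ABC_A", ["2XYZ_B"]))
# [[{'go_id': 'GO:0003824'}], []]
```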

showGoLineage(go_term, **kwargs)

Use pygraphviz and an IPython (Jupyter) notebook to show the lineage of a GO term.

Parameters:
  • go (~goatools.obo_parser.GODag) – object containing a gene ontology (GO) directed acyclic graph (DAG); default is to parse with parseOBO()
  • out_format (str) – format for output. Currently only output to a file is supported; the file will be displayed in the Jupyter notebook.
  • filename (str) – filename for output; default behaviour is to use the GO term ID and append ‘_lineage.png’
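
The lineage of a GO term is the set of its ancestors together with the is_a edges that connect them. The sketch below collects those edges from a toy child-to-parents mapping and writes them as Graphviz DOT text; lineage_edges and lineage_dot are hypothetical names, and rendering the text to a PNG (for example with pygraphviz or the dot command-line tool) is left out.

```python
from typing import Dict, List, Set, Tuple


def lineage_edges(term: str, parents: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Every (child, parent) is_a edge on the paths from term up to the root(s)."""
    edges: Set[Tuple[str, str]] = set()
    stack = [term]
    seen = {term}
    while stack:
        child = stack.pop()
        for parent in parents.get(child, []):
            edges.add((child, parent))
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return edges


def lineage_dot(term: str, parents: Dict[str, List[str]]) -> str:
    """Graphviz DOT text for the lineage, drawn with parents above children."""
    lines = ["digraph lineage {", "  rankdir=BT;"]
    for child, parent in sorted(lineage_edges(term, parents)):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


# Toy DAG (child -> parents) with made-up term names.
toy_parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["c"]}
print(lineage_dot("c", toy_parents))
```

Descendants such as "d" are not part of the lineage of "c", so they do not appear in the output.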

calcGoOverlap(*go_terms, **kwargs)

Calculate overlap between GO terms based on their distance in the graph. GO terms in different namespaces (molecular function, cellular component, and biological process) have undefined distances.

Parameters:
  • go_terms (list, tuple, ~numpy.ndarray) – a list of GO terms or GO IDs
  • pairwise (bool) – whether to calculate a matrix of pairwise overlaps; default is False
  • distance (bool) – whether to return distances rather than calculating overlaps; default is False
  • go (~goatools.obo_parser.GODag) – GO graph; default behaviour is to parse it with parseOBO()
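
With pairwise=True, calcGoOverlap() is documented to produce a matrix of pairwise values, with distances between terms in different namespaces undefined. The sketch below shows one way to lay out such a matrix, using NaN for cross-namespace pairs. The distance function is passed in and the toy distances are made up; pairwise_distance_matrix is a hypothetical name, and how ProDy turns distances into overlaps is not covered.

```python
import math
from typing import Callable, Dict, List, Sequence


def pairwise_distance_matrix(
    terms: Sequence[str],
    namespace: Dict[str, str],
    distance: Callable[[str, str], float],
) -> List[List[float]]:
    """Symmetric distance matrix; NaN where two terms are in different namespaces."""
    n = len(terms)
    matrix = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if namespace[terms[i]] == namespace[terms[j]]:
                matrix[i][j] = matrix[j][i] = distance(terms[i], terms[j])
    return matrix


# Made-up namespaces and distances for three toy terms.
ns = {"x": "molecular_function", "y": "molecular_function", "z": "biological_process"}
toy_distances = {frozenset(("x", "y")): 2}


def toy_distance(a: str, b: str) -> float:
    return 0 if a == b else toy_distances[frozenset((a, b))]


m = pairwise_distance_matrix(["x", "y", "z"], ns, toy_distance)
print(m[0][1], m[0][2])  # 2 nan
```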

calcDeepFunctionOverlaps(*goa_data, **kwargs)

Calculate function overlaps between two sets of GO terms, considering only their deepest (most detailed) molecular functions.

Parameters:
  • goa1 (tuple, list, ndarray) – the first set of GO terms
  • goa2 (tuple, list, ndarray) – the second set of GO terms

calcEnsembleFunctionOverlaps(ens, **kwargs)

Calculate function overlaps for an ensemble as the mean of the value from calcDeepFunctionOverlaps().

Parameters: ens (Ensemble) – an ensemble with labels

findDeepestFunctions(go_terms, **kwargs)

Find the deepest (most detailed) molecular functions in a list of GO terms.

Parameters: go_terms (GOADictList) – a list of GO terms
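
The sketch below takes "deepest" to mean greatest depth, with depth defined as the longest is_a path up to a root, and keeps only molecular-function terms. Both the depth definition and the helper names (term_depth, deepest_functions) are assumptions for illustration, and the toy mappings stand in for the GOADictList and GO graph the documented function works with.

```python
from typing import Dict, List, Sequence


def term_depth(term: str, parents: Dict[str, List[str]], memo: Dict[str, int]) -> int:
    """Longest is_a path from term up to a root (a term with no parents)."""
    if term not in memo:
        ps = parents.get(term, [])
        memo[term] = (1 + max(term_depth(p, parents, memo) for p in ps)) if ps else 0
    return memo[term]


def deepest_functions(
    terms: Sequence[str],
    parents: Dict[str, List[str]],
    namespace: Dict[str, str],
) -> List[str]:
    """Molecular-function terms of maximal depth among the given terms."""
    mf = [t for t in terms if namespace.get(t) == "molecular_function"]
    if not mf:
        return []
    memo: Dict[str, int] = {}
    best = max(term_depth(t, parents, memo) for t in mf)
    return [t for t in mf if term_depth(t, parents, memo) == best]


# Toy DAG (child -> parents) with made-up term names.
parents = {"a": ["mf_root"], "b": ["a"], "c": ["mf_root"],
           "p": ["q"], "q": ["r"], "r": ["bp_root"]}
namespace = {"a": "molecular_function", "b": "molecular_function",
             "c": "molecular_function", "p": "biological_process"}
print(deepest_functions(["a", "b", "c", "p"], parents, namespace))  # ['b']
```

The process term "p" is deeper than "b" but is ignored because only molecular functions are considered.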

findDeepestCommonAncestor(terms, go)

Find the nearest common ancestor of the given terms. Only the single most specific ancestor is returned; a unique one is assumed to exist.

Parameters:
  • terms (tuple, list, ndarray) – a list of GO terms
  • go (~goatools.obo_parser.GODag) – object containing a gene ontology (GO) directed acyclic graph (DAG)
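
Finding the deepest common ancestor can be sketched in two steps: intersect the ancestor sets of all terms, then keep the member with the greatest depth. The helper names are hypothetical, each term is counted among its own ancestors, and depth is taken as the longest is_a path to a root; these are assumptions for illustration rather than a description of ProDy's code.

```python
from typing import Dict, Iterable, List, Optional, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from term via is_a edges, including term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(term: str, parents: Dict[str, List[str]], memo: Dict[str, int]) -> int:
    """Longest is_a path from term up to a root (a term with no parents)."""
    if term not in memo:
        ps = parents.get(term, [])
        memo[term] = (1 + max(depth(p, parents, memo) for p in ps)) if ps else 0
    return memo[term]


def deepest_common_ancestor(terms: Iterable[str],
                            parents: Dict[str, List[str]]) -> Optional[str]:
    """Deepest term that is an ancestor of (or equal to) every input term."""
    ancestor_sets = [ancestors(t, parents) for t in terms]
    if not ancestor_sets:
        return None
    common = set.intersection(*ancestor_sets)
    if not common:
        return None
    memo: Dict[str, int] = {}
    # Ties are broken alphabetically; the documented function assumes a unique answer.
    return max(sorted(common), key=lambda t: depth(t, parents, memo))


# Toy DAG (child -> parents) with made-up term names.
parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["a"]}
print(deepest_common_ancestor(["c", "d"], parents))  # a
```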

calcMinBranchLength(go_id1, go_id2, go)

Find the minimum branch length between two terms in the GO DAG.

Parameters:
  • go_id1 (str) – the first GO ID
  • go_id2 (str) – the second GO ID
  • go (~goatools.obo_parser.GODag) – object containing a gene ontology (GO) directed acyclic graph (DAG)
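
One straightforward reading of "minimum branch length" is the number of edges on the shortest path between the two terms when is_a edges are treated as undirected, which a breadth-first search finds. The sketch below implements that reading on a toy child-to-parents mapping; it is an assumption, not necessarily ProDy's definition (another possibility is to sum each term's depth below their deepest common ancestor).

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


def min_branch_length(go_id1: str, go_id2: str,
                      parents: Dict[str, List[str]]) -> Optional[int]:
    """Edges on the shortest path between two terms, ignoring edge direction;
    None when no path exists."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            neighbours[child].add(p)
            neighbours[p].add(child)
    queue = deque([(go_id1, 0)])
    seen = {go_id1}
    while queue:
        node, dist = queue.popleft()
        if node == go_id2:
            return dist
        for nb in neighbours[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


# Toy DAG (child -> parents) with made-up term names.
parents = {"a": ["root"], "b": ["root"], "c": ["a"], "d": ["b"]}
print(min_branch_length("c", "d", parents))  # 4
```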

findCommonParentGoIds(terms, go)

Find the common ancestors of the input terms in the GO DAG.

Parameters:
  • terms (tuple, list, ndarray) – a list of GO terms
  • go (~goatools.obo_parser.GODag) – object containing a gene ontology (GO) directed acyclic graph (DAG)
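
The common ancestors of several terms are the intersection of their ancestor sets. In the hypothetical sketch below, each term counts as one of its own ancestors (an assumption), and the ontology is a toy child-to-parents mapping rather than a goatools GODag.

```python
from typing import Dict, List, Sequence, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from term via is_a edges, including term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_parent_ids(terms: Sequence[str],
                      parents: Dict[str, List[str]]) -> Set[str]:
    """Terms that are ancestors of (or equal to) every input term."""
    if not terms:
        return set()
    common = ancestors(terms[0], parents)
    for term in terms[1:]:
        common &= ancestors(term, parents)
    return common


# Toy DAG (child -> parents) with made-up term names.
parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["a"]}
print(sorted(common_parent_ids(["c", "d"], parents)))  # ['a', 'root']
```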