Protein Complex Prediction via Dense Subgraphs and False Positive Analysis
Cecilia Hernandez, Carlos Mella, Gonzalo Navarro, Alvaro Olivera-Napa, and
Jaime Araya
Many proteins work together with others in groups called complexes in order to
achieve
a specific function. Discovering protein complexes is important for
understanding
biological processes and predicting protein functions in living organisms.
Large-scale, high-throughput techniques have made it possible to compile
protein-protein interaction
networks (PPI networks), which have been used in several computational
approaches for
detecting protein complexes. These predictions might guide future biological
experimental
research. Some approaches are topology-based, predicting highly connected proteins
to be complexes; some propose clustering algorithms based on partitioning or on
overlapping clusters, for networks modeled as unweighted or weighted graphs; and
others use cluster density together with information on protein function.
However, some schemes still require long processing times, or the quality of their
results can be improved. Furthermore, most results obtained with computational
tools are not accompanied by an analysis of false positives.
We propose an effective and efficient mining algorithm for discovering highly
connected subgraphs, which form our basis for defining protein complexes. Our
representation is based on transforming the PPI network into a directed
acyclic graph
that reduces the number of represented edges and the search space for
discovering
subgraphs. Our approach considers weighted and unweighted PPI networks. We
compare our best alternative with state-of-the-art approaches on PPI networks from
Saccharomyces cerevisiae (yeast) and Homo sapiens (human), in terms of clustering
metrics, biological metrics, and execution times, using three gold standards for
yeast and two for human. Furthermore, we analyze false-positive predicted complexes
by searching the
PDBe (Protein Data Bank in Europe) database in order to identify matching
protein
complexes that have been purified and structurally characterized. Our analysis
shows
that more than 50 yeast and more than 300 human protein complexes that our method
predicts and that count as false positives, i.e., that are not described in the
gold standard complex databases, in fact contain protein complexes
that have been characterized structurally and documented in PDBe. We also
found that
some of these protein complexes have recently been classified as part of a
Periodic Table
of Protein Complexes. The latest version of our software is publicly
available at
http://www.inf.udec.cl/~chernand/sources/dapg/.
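
The notion of a false positive used above (a predicted complex that is not described in a gold standard) can be illustrated with a minimal sketch. It is not the authors' implementation: the overlap score |A ∩ B|^2 / (|A| |B|), the 0.2 matching threshold, the function names, and the protein identifiers are all assumptions chosen for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A complex is modeled as a set of protein identifiers (hypothetical IDs below).
Complex = FrozenSet[str]


def overlap_score(a: Complex, b: Complex) -> float:
    """Assumed matching score: |A ∩ B|^2 / (|A| * |B|), in the range [0, 1]."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def split_predictions(
    predicted: Iterable[Complex],
    gold: List[Complex],
    threshold: float = 0.2,  # assumed cutoff, not taken from the source
) -> Tuple[List[Complex], List[Complex]]:
    """Separate predictions that match some gold standard complex from those
    that match none; the latter are the false positive candidates."""
    matched: List[Complex] = []
    false_positives: List[Complex] = []
    for candidate in predicted:
        if any(overlap_score(candidate, g) >= threshold for g in gold):
            matched.append(candidate)
        else:
            false_positives.append(candidate)
    return matched, false_positives


# Toy data with made-up protein identifiers.
gold_standard = [frozenset({"P1", "P2", "P3", "P4"})]
predictions = [frozenset({"P1", "P2", "P3"}), frozenset({"P8", "P9"})]
matched, false_positives = split_predictions(predictions, gold_standard)
```

In this toy run the first prediction shares three proteins with the gold standard complex and is matched, while the second shares none and is left as a false positive candidate. Under the analysis described above, such candidates would then be looked up in a structural database such as PDBe.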