Research
The research of the Theoretical Systems Biology Group focuses on the statistical, mathematical, evolutionary and functional analysis of molecular interaction networks and dynamical processes in biological systems. We are particularly interested in protein-protein, regulatory, signalling and stress response networks in pathogens and their hosts.
In order to address these problems, we develop and use a diverse range of mathematical, statistical and computational tools. The overarching interests of the group’s members are the inference and comparison of the mechanisms underlying biological systems, in order to understand their function and evolution. But we are also interested in further development and deeper understanding of the mathematical structures and approaches used in the modern life- and biomedical sciences in their own right.
The main areas of our research are:
Bayesian Reverse Engineering of Biological Systems
For the vast majority of biological systems, we lack precise knowledge regarding the structure of their molecular interaction networks. Even where network structures are known, we typically lack the parameters required to model such systems reliably. In a Bayesian framework, we can integrate prior information using rigorous statistical inferential procedures. We are developing and applying tools for the inference of networks and dynamical systems from biological high-throughput data.
Approximate Bayesian Computation for Biological Systems
Obtaining reliable parameter estimates for dynamical models of biological systems is fraught with difficulties: data are notoriously noisy and sparse, and have often been collected under varying conditions. Even if we have a good idea as to the structure of the mechanistic model that generated the data, estimating the corresponding parameters is far from trivial. Conventional approaches for fitting models to such data – using, for example, non-linear optimization routines – routinely fail to capture this complexity by (often grossly) underestimating the uncertainty in the fitted parameters.
Bayesian approaches yield the full posterior probability distribution over the parameters (given the data), which allows us to appreciate the reliability of parameter estimates. At the same time, analysis of the posterior distribution also provides information regarding parameter sensitivity and model robustness. In practice, however, the posterior distribution is often intractable, especially for stochastic systems. In such cases, approximate Bayesian computation (ABC) can provide a practical alternative. We have developed sequential Monte Carlo implementations of ABC, ABC-SMC, which can be applied to parameter inference, and – more importantly – model selection in systems biology.
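The core idea of ABC can be illustrated with a minimal sketch. The example below uses the simplest variant, plain rejection sampling, rather than the ABC-SMC scheme developed by the group. The toy model (exponential waiting times), the uniform prior, the summary statistic and the tolerance `epsilon` are all assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate(rate, n_obs, rng):
    """Toy stochastic model (an assumption): n_obs exponential waiting times."""
    return [rng.expovariate(rate) for _ in range(n_obs)]


def distance(simulated, observed):
    """Compare data sets through one summary statistic, the sample mean."""
    return abs(statistics.fmean(simulated) - statistics.fmean(observed))


def abc_rejection(observed, n_accept, epsilon, rng, prior_low=0.1, prior_high=10.0):
    """Keep prior draws whose simulated data lie within epsilon of the observed data.

    No likelihood is evaluated; the accepted values approximate the posterior.
    """
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(prior_low, prior_high)      # draw from the prior
        simulated = simulate(theta, len(observed), rng)  # forward-simulate the model
        if distance(simulated, observed) <= epsilon:     # accept if close enough
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 200, rng)  # synthetic "data" generated with rate 2.0
posterior_sample = abc_rejection(observed, n_accept=200, epsilon=0.05, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
print(f"approximate posterior mean of the rate: {posterior_mean:.2f}")
```

The spread of `posterior_sample` gives a direct impression of the uncertainty in the estimate, which a single optimized point estimate would hide. Sequential Monte Carlo variants improve on this sketch by lowering the tolerance through a series of intermediate populations instead of sampling from the prior each time.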
We are continuing to develop ABC-SMC further, and are employing it in an increasing number of biological systems. These range from bacterial stress response mechanisms to the signalling and regulatory processes that underlie human disease. Having recourse to the full (if approximate) posterior probability distribution over the parameters also allows a more consistent and global analysis of recurring issues in reverse engineering such as identifiability, sensitivity and “sloppiness” of models and model parameters than would be possible if only point estimates were available.
Network Inference
In general, we lack good starting network models for dynamic analysis. In order to deal with such situations, we are working with two complementary approaches that allow us to infer the structure of molecular interaction networks from either high-throughput functional data or from evolutionary comparisons.
Dynamical Bayesian Networks (DBNs) allow us to infer regulatory interactions among genes from transcriptomic data. We are not only interested in the structure of these networks, but also in how they change in response to external and physiological cues or over the life-cycle of an organism. In order to make best use of sparse and frequently disparate data, we are developing a version of the DBN formalism that integrates time-course gene expression data with other, time-independent data. Although these different experimental set-ups often probe only subtly different aspects of molecular systems, a combined analysis can give more highly resolved insights into the workings of transcriptional networks and gene regulatory processes. We are using these tools in order to understand regulatory processes in Escherichia coli and a range of Neisseria species.
Evolutionary Inference Methods use the fact that all life on earth has a single common ancestor. Coupled with a suitable inferential framework, comparative analysis allows us to predict molecular interactions in our target species based on observations made in different species. We are using this approach, coupled to extensive data-integration efforts, to predict protein-protein interactions – especially in the pathogenic fungus Candida glabrata – and to understand signalling and stress response processes across medically and industrially important bacteria and other human pathogens.
Functional Analysis of Biological Systems
We are collaborating actively with experimental biologists to study the interplay between physical, regulatory and metabolic interactions in stress response and signalling networks. Most of this work either deals with pathogenic bacteria and fungi, or is related to human disease, in particular cancer.
We are working on inference-based modelling techniques in which we combine large-scale simulation studies with parameter inference or model selection. Such an approach is particularly suited for situations in which the precise mechanisms of biological processes are not completely known. Both qualitative and quantitative aspects of a system’s responses can be assessed in this way. Furthermore, formal model selection and ranking procedures can be applied in order to improve mechanistic models. This approach is being jointly developed with the groups of Professor Mark Girolami (Glasgow University) and Professor Vincent Jansen (Royal Holloway, University of London).
We are currently especially interested in combinatorial stress responses of human pathogens; species of interest include the bacteria Escherichia coli and Neisseria, as well as Saccharomyces cerevisiae and the related fungal pathogens Candida glabrata and Candida albicans, and the obligate pathogen of barley, powdery mildew (Blumeria graminis).
We are also applying the same set of tools to human signalling networks which underlie the development of healthy and cancerous cells.
Evolutionary Systems Biology
Evolutionary and comparative methods are ubiquitous in bioinformatics, as they enable more reliable annotation and prediction of e.g. protein function and protein structure. At a practical level, we are interested in evolutionary methods in the context of predicting protein interaction, metabolic and signalling networks in biomedically or industrially important microbes and humans.
On a more fundamental level, the current influx of comprehensive, if often not very reliable, biological data allows us to address classical evolutionary questions in a new light. We are particularly interested in the dynamics underlying the evolution of networks, and the co-evolution of interacting genes and proteins. To this end, we use a combination of modelling and data-driven analyses which can be combined using suitable statistical and inferential tools that are being developed by the group.
At the moment, relevant data are only available at the species level. In the absence of population level data, we are also studying mathematical and computational evolutionary models of biological systems. This allows us to assess the effects of epistatic interactions, robustness and the evolution of molecular networks more generally in population genetics and dynamics frameworks.
Data-Integration and Statistical Bioinformatics
Modern systems biology data are plentiful but often still sparse and plagued by high levels of noise. Until recently these data had been studied largely in isolation from one another. In reality, however, protein interaction networks, metabolic networks and transcriptional regulation networks are intricately interwoven and need to be considered as such.
Thus, in order to understand the function of biological systems, it is important to combine these different data-types. To do so consistently and coherently is a major statistical challenge, and the need for data-integration arises in all aspects of our work. We are maintaining extensive data resources which capture published data on the model organisms studied in the Theoretical Systems Biology Group and are continually revising these.
Data-integration is particularly important at the system level and in evolutionary problems considered by the group: here, both the data and intrinsic system dynamics are highly variable - the variance frequently “overwhelms” the mean behaviour - and we are actively engaged in developing and applying error models to handle noise and uncertainty in physical interaction data.
Mathematical Models of Complex Systems and Networks
Many of the systems studied in our group have equivalents in other areas - although apparent similarities between different types of networks, for example, have often been exaggerated in the literature. There are two areas of our research that are also giving rise to more theoretical work, some of which has implications beyond the life- and biomedical sciences:
Statistical Analysis of Complex Networks
Most network data collected to date are incomplete in the sense that not all nodes and edges have been observed, or can be observed given present experimental procedures. It is possible to show that the properties of incomplete networks will generally differ from those of the complete networks, sometimes quite considerably. A statistical perspective, however, allows us to relate properties of such noisy and incomplete networks to those of the “true” network. Such situations also arise in the social sciences, engineering and physical sciences.
Related to this problem is the frequently recurring need to compare networks. These can be networks collected either at different times or under different circumstances. This is an area of longstanding and continuing interest in the group.
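A small simulation makes the effect of incomplete sampling concrete. The sketch below is an illustration under stated assumptions, not the group's analysis: the "true" network is taken to be a simple random graph, and nodes are observed uniformly at random with a known sampling fraction. The network size, edge probability and sampling fraction are hypothetical values.

```python
import random


def random_graph(n_nodes, p_edge, rng):
    """Build an undirected random graph as a set of (i, j) edges with i < j."""
    return {
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < p_edge
    }


def induced_subgraph(edges, kept_nodes):
    """Keep only the edges whose two end points were both observed."""
    kept = set(kept_nodes)
    return {(u, v) for (u, v) in edges if u in kept and v in kept}


def mean_degree(edges, nodes):
    """Average number of edges per node (each edge contributes to two nodes)."""
    return 2 * len(edges) / len(nodes)


rng = random.Random(1)
nodes = list(range(300))
true_edges = random_graph(len(nodes), 0.05, rng)

sampling_fraction = 0.5  # assumed known; in practice it must be estimated
sampled_nodes = rng.sample(nodes, int(sampling_fraction * len(nodes)))
observed_edges = induced_subgraph(true_edges, sampled_nodes)

true_mean = mean_degree(true_edges, nodes)
observed_mean = mean_degree(observed_edges, sampled_nodes)
corrected_mean = observed_mean / sampling_fraction  # simple rescaling estimate

print(f"true: {true_mean:.2f}  observed: {observed_mean:.2f}  corrected: {corrected_mean:.2f}")
```

Under uniform node sampling, the mean degree of the observed subnetwork shrinks roughly in proportion to the sampling fraction, so a simple rescaling recovers the value for the full network. Other properties, such as the shape of the degree distribution, are not recovered this easily, which is where a more careful statistical treatment is needed.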
Statistical Analysis of Non-Linear (Stochastic) Systems
The biological systems studied in the group are characterized by complex feedback mechanisms, intrinsic noise, and – where data are available – experimental noise as well. The ABC-SMC approach developed in the group allows us to analyze such systems in a Bayesian framework in a computationally affordable manner.
Very generally, we are interested in what qualitative aspects of dynamical systems affect our ability to infer (or reverse-engineer) such systems. We use a combination of inferential and modelling tools in order to address these questions for very general examples of dynamical systems.
Our work is funded by The Wellcome Trust, BBSRC, MRC, EMBO, The Royal Society and The Carlsberg Foundation.
The members of the Theoretical Systems Biology Group come from a variety of different backgrounds, ranging from Biology to Theoretical Physics. We are involved in the new BBSRC/EPSRC Centre for Integrative Systems Biology and the Institute of Mathematical Sciences at Imperial College London. For further information about our research or opportunities in our group please contact m.stumpf@imperial.ac.uk.

