Inferring condition specific regulatory networks with small sample sizes: a case study in bacillus subtilis and infection of mus musculus by the parasite Toxoplasma gondii
Thesis
Modelling interactions between genes and their regulators is fundamental to understanding how, for example a disease progresses, or the impact of inserting a synthetic circuit into a cell. We use an existing method to infer regulatory networks under multiple conditions: the Joint Graphical Lasso (JGL), a shrinkage based Gaussian graphical model. We apply this method to two data sets: one, a publicly available set of microarray experiments perturbing the gram-positive bacteria Bacillus subtilis under multiple experimental conditions; the second, a set of RNA-seq samples of Mouse (Mus musculus) embryonic fibroblasts (MEFs) infected with different strains of the parasite Toxoplasma gondii. In both cases we infer a subset of the regulatory networks using relatively small sample sizes. For the Bacillus subtilis analysis we focused on the use of these regulatory networks in synthetic biology and found examples of transcriptional units active only under a subset of conditions, this information can be useful when designing circuits to have condition dependent behaviour. We developed methods for large network decomposition that made use of the condition information and showed a greater specificity of identifying single transcriptional units from the larger network using our method. Through annotating these results with known information we were able to identify novel connections and found supporting evidence for a selection of these from publicly available experimental results. Biological data collection is typically expensive and due to the relatively small sample sizes of our MEF data set we developed a novel empirical Bayes method for reducing the false discovery rate when estimating block diagonal covariance matrices. Using these methods we were able to infer regulatory networks for the host infected with either the ME49 or RH strain of the parasite. This enabled the identification of known and novel regulatory mechanisms. The Toxoplasma gondii parasite has shown to subvert host function using similar mechanisms as cancers and through our analysis we were able to identify genes, networks and ontologies associated with cancer, including connections that have not previously been associated with T. gondii infection. Finally a Shiny application was developed as an online resource giving access to the Bacillus subtilis inferred networks with interactive methods for exploring the networks including expansion of sub networks and large network decomposition.