# Network Infusion to Infer Information Sources in Networks

Several models exist for diffusion of signals across biological, social, or engineered networks. However, the inverse problem of identifying the source of such propagated information appears more difficult even in the presence of multiple network snapshots, and especially for the single-snapshot case, given the many alternative, often similar, progression of diffusion that may lead to the same observed snapshots. Mathematically, this problem can be undertaken using a diffusion kernel that represents diffusion processes in a given network, but computing this kernel is computationally challenging in general. Here, we propose a path-based network diffusion kernel which considers edge-disjoint shortest paths among pairs of nodes in the network and can be computed efficiently for both homogeneous and heterogeneous continuous-time diffusion models. We use this network diffusion kernel to solve the inverse diffusion problem, which we term Network Infusion (NI), using both likelihood maximization and error minimization. The minimum error NI algorithm is based on an asymmetric Hamming premetric function and can balance between false positive and false negative error types. We apply this framework for both single-source and multi-source diffusion, for both single-snapshot and multi-snapshot observations, and using both uninformative and informative prior probabilities for candidate source nodes. We also provide proofs that under a standard susceptible-infected diffusion model, (1) the maximum-likelihood NI is mean-field optimal for tree structures or sufficiently sparse Erdos-Renyi graphs, (2) the minimum-error algorithm is mean-field optimal for regular tree structures, and (3) for sufficiently-distant sources, the multi-source solution is mean-field optimal in the regular tree structure. Moreover, we provide techniques to learn diffusion model parameters such as observation times. We apply NI to several synthetic networks and compare its performance to centrality-based and distance-based methods for Erdos-Renyi graphs, power-law networks, symmetric and asymmetric grids. Moreover, we use NI in two real-world applications. First, we identify the news sources for 3,553 stories in the Digg social news network, and validate our results based on annotated information, that was not provided to our algorithm. Second, we use NI to identify infusion hubs of human diseases, defined as gene candidates that can explain the connectivity pattern of disease-related genes in the human regulatory network. NI identifies infusion hubs of several human diseases including T1D, Parkinson, MS, SLE, Psoriasis and Schizophrenia. We show that, the inferred infusion hubs are biologically relevant and often not identifiable using the raw p-values.