Pau Bellot, Catharina Olsen, Patrick Meyer


This package contains a large set of gene expression data generated by various simulators, collected in what we call a "Datasource".

The data generated by the simulators are noise-free. Noise can be added afterwards, which makes it possible to control its properties independently of the simulators and to provide fully reproducible tests. This study involves data generated by three different gene regulatory network (GRN) simulators:
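As a minimal sketch of why adding noise after simulation helps reproducibility, the snippet below perturbs a small noise-free matrix with seeded Gaussian noise. The helper `add_gaussian_noise`, the toy matrix, and the choice of additive Gaussian noise are assumptions made for illustration; they are not the package's API or its noise model.

```python
import random


def add_gaussian_noise(data, sigma, seed):
    """Return a noisy copy of a noise-free expression matrix (list of rows).

    Hypothetical helper: additive Gaussian noise with standard deviation
    `sigma`, drawn from a generator seeded with `seed`.
    """
    rng = random.Random(seed)  # local generator, global random state untouched
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in data]


# Toy noise-free data: 2 experiments (rows) x 3 genes (columns).
clean = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# The same seed yields the same noisy dataset, so a test can be repeated exactly.
noisy_a = add_gaussian_noise(clean, sigma=0.1, seed=42)
noisy_b = add_gaussian_noise(clean, sigma=0.1, seed=42)
```

Because the noise level and the seed are explicit arguments, the same noise-free datasource can be reused under several noise settings without rerunning a simulator.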


The GNW simulator (Schaffter, Marbach, and Floreano 2011) generates network structures by extracting parts of known real GRN structures, thereby capturing several of their important structural properties. To produce gene expression data, the simulator relies on a system of non-linear ordinary differential equations (ODEs).


The SynTReN simulator (Van den Bulcke et al. 2006) generates the underlying networks by selecting sub-networks from E. coli and yeast organisms. The experiments are then obtained by simulating equations based on Michaelis-Menten and Hill kinetics under different conditions.
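For readers unfamiliar with these kinetics, the following sketch gives the textbook forms of the Hill and Michaelis-Menten response functions. It is not SynTReN's actual equation system, which is not given here; the function names and the parameters `k` (half-saturation constant) and `n` (Hill coefficient) are assumptions chosen for illustration.

```python
def hill_activation(x, k, n):
    """Textbook Hill activation: x^n / (k^n + x^n), a value in [0, 1)."""
    xn = x ** n
    return xn / (k ** n + xn)


def hill_repression(x, k, n):
    """Textbook Hill repression: k^n / (k^n + x^n), the complement of activation."""
    return 1.0 - hill_activation(x, k, n)


def michaelis_menten(x, k):
    """Michaelis-Menten saturation, i.e. the Hill form with coefficient n = 1."""
    return hill_activation(x, k, 1)


# Response of a target gene to increasing regulator concentration.
regulator_levels = [0.0, 0.5, 1.0, 2.0, 4.0]
activation_curve = [hill_activation(x, k=1.0, n=2) for x in regulator_levels]
```

When the regulator concentration equals `k`, the response is exactly half of its maximum, and larger values of `n` make the transition around `k` steeper.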


The data generator described by Rogers and Girolami (2005), referred to here as Rogers, relies on a power-law distribution on the number of connections of the genes to generate the underlying network. The steady state of the system is obtained by integrating a system of differential equations simulating only knockout data.
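The power-law assumption on the number of connections can be sketched as follows. This only illustrates drawing per-gene connection counts with probability proportional to k^(-gamma); it is not the generator of Rogers and Girolami, and the function name, the exponent `gamma = 2.5`, and the cutoff `k_max` are hypothetical values chosen for the example.

```python
import random


def sample_power_law_degrees(n_genes, gamma, k_max, seed):
    """Draw one connection count per gene with P(k) proportional to k^(-gamma).

    Counts are restricted to 1..k_max; all parameters are illustrative.
    """
    rng = random.Random(seed)
    ks = list(range(1, k_max + 1))
    weights = [k ** -gamma for k in ks]  # unnormalised power-law weights
    return rng.choices(ks, weights=weights, k=n_genes)


degrees = sample_power_law_degrees(n_genes=1000, gamma=2.5, k_max=50, seed=0)

# Most genes get very few connections, while a handful act as hubs.
n_single = degrees.count(1)
largest_hub = max(degrees)
```

A distribution of this kind produces many sparsely connected genes and a few highly connected ones, which is the structural property the power-law tail topology refers to.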


Using these simulators, five large datasources involving many noise-free experiments have been generated. The characteristics of these datasources are detailed in the following table:

| Datasource | Topology | Experiments | Genes | Edges |
|---|---|---|---|---|
| \(Rogers_{1000}\) | Power-law tail topology | 1000 | 1000 | 1350 |
| \(SynTReN_{300}\) | E. coli | 800 | 300 | 468 |
| \(SynTReN_{1000}\) | E. coli | 1000 | 1000 | 4695 |
| \(GNW_{1565}\) | E. coli | 1565 | 1565 | 7264 |
| \(GNW_{2000}\) | Yeast | 2000 | 2000 | 10392 |
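To put these sizes in perspective, the sketch below derives the average number of edges per gene and the network density from the figures in the table. The dictionary keys are plain-text labels introduced here, and the density formula assumes directed networks without self-loops, which the table itself does not state.

```python
# (experiments, genes, edges) copied from the table above.
datasources = {
    "Rogers_1000": (1000, 1000, 1350),
    "SynTReN_300": (800, 300, 468),
    "SynTReN_1000": (1000, 1000, 4695),
    "GNW_1565": (1565, 1565, 7264),
    "GNW_2000": (2000, 2000, 10392),
}

# Average number of edges per gene.
edges_per_gene = {
    name: edges / genes for name, (_, genes, edges) in datasources.items()
}

# Fraction of possible edges that are present, assuming a directed
# network without self-loops: genes * (genes - 1) possible edges.
density = {
    name: edges / (genes * (genes - 1))
    for name, (_, genes, edges) in datasources.items()
}

for name in datasources:
    print(f"{name}: {edges_per_gene[name]:.2f} edges/gene, density {density[name]:.4f}")
```

Under that assumption every network is sparse: fewer than 1% of the possible edges are present in each datasource.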

To generate these datasources, we simulated multifactorial data with SynTReN and GNW, which is a less informative type of data (Marbach et al. 2010).


