1 About the template

This section gives a general description of this cheminformatics workflow and explains how to use it. In the actual analysis report, this section is usually removed.

This cheminformatics workflow template is based on the ChemmineR package, which should be installed from Bioconductor before running the workflow. The workflow:

  1. Loads molecules from an SDF file.
  2. Helps users visualize selected molecules.
  3. Converts molecules to AP (atom pair) and FP (fingerprint) formats.
  4. Computes the similarity and distance between molecules.
  5. Plots the distance matrix.
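Steps 4 and 5 are handled by ChemmineR functions (for example, `fpSim()` computes fingerprint similarities). To make the idea concrete, here is a minimal base-R sketch, not ChemmineR's implementation, that computes Tanimoto similarity on hypothetical binary fingerprints, converts it to a distance matrix (1 - similarity) and clusters it. The fingerprints `fps`, the helper `tanimoto()` and the choice of average-linkage clustering are illustrative assumptions.

```r
# Hypothetical 8-bit fingerprints for three molecules (toy data, not from the sample SDF)
fps <- rbind(
  mol1 = c(1, 1, 0, 1, 0, 0, 1, 0),
  mol2 = c(1, 1, 0, 0, 0, 0, 1, 0),
  mol3 = c(0, 0, 1, 0, 1, 1, 0, 1)
)

# Tanimoto similarity: bits set in both fingerprints / bits set in either
tanimoto <- function(a, b) {
  either <- sum(a | b)
  if (either == 0) return(0)
  sum(a & b) / either
}

# Pairwise similarity matrix and the corresponding distance matrix
n <- nrow(fps)
sim <- matrix(0, n, n, dimnames = list(rownames(fps), rownames(fps)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- tanimoto(fps[i, ], fps[j, ])
  }
}
dist_mat <- 1 - sim

# Average-linkage clustering of the distances, cut into two groups
hc <- hclust(as.dist(dist_mat), method = "average")
groups <- cutree(hc, k = 2)
print(round(sim, 2))
print(groups)
```

Here `mol1` and `mol2` share three of four set bits (similarity 0.75) and end up in the same cluster, while `mol3` shares none and forms its own group.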

No command-line software other than R is required by this workflow; all steps are R code wrapped in LineWise steps.

2 Introduction

Users should provide background information about the design of their cheminformatics project here.

This report describes the analysis of a cheminformatics project studying drug …

2.1 Experimental design

Typically, users specify here all information relevant to the analysis of their cheminformatics study. This includes detailed descriptions of files, experimental design, reference genome, gene annotations, etc.

3 Workflow environment

systemPipeR workflows can be designed and built from start to finish either with a single command, by importing them from an R Markdown file, or step by step in interactive mode from the R console.

This tutorial demonstrates how to build the workflow interactively, appending one step at a time with the appendStep method. Each SYSargsList instance contains the instructions needed to process a set of input files with a specific command-line or R software, together with the paths to the output files generated by that tool/step.

To create a workflow within systemPipeR, start by defining an empty container and checking the directory structure:

library(systemPipeR)
sal <- SPRproject()
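Each step's `dependency` argument links it to an earlier step, so the steps of this template form a dependency graph. systemPipeR tracks these dependencies itself; the base-R sketch below is only an illustration (not systemPipeR's code) of how the step names and dependencies declared later in this template yield a valid execution order. The helper `order_steps()` is hypothetical.

```r
# Step names and dependencies as declared in this template's appendStep() calls
deps <- list(
  load_packages  = character(0),
  load_data      = "load_packages",
  vis_mol        = "load_data",
  basic_mol_info = "load_data",
  mol_info_plot  = "basic_mol_info"
)

# Kahn-style ordering: repeatedly schedule steps whose dependencies are all scheduled
order_steps <- function(deps) {
  done <- character(0)
  remaining <- names(deps)
  while (length(remaining) > 0) {
    is_ready <- vapply(remaining, function(s) all(deps[[s]] %in% done), logical(1))
    ready <- remaining[is_ready]
    if (length(ready) == 0) stop("cyclic or missing dependency")
    done <- c(done, ready)
    remaining <- setdiff(remaining, ready)
  }
  done
}

run_order <- order_steps(deps)
print(run_order)
```

Steps with no unmet dependency (here `vis_mol` and `basic_mol_info`) become runnable in the same round, which is why both optional steps can follow `load_data` directly.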

3.1 Load packages

This step loads the R packages required by the workflow. Refer to our website for how to add more steps. If you prefer a more enriched template, read this page for other pre-configured templates.

cat(crayon::blue$bold("To use this workflow, the following R packages are expected:\n"))
cat(c("'ChemmineR", "ggplot2", "tibble", "readr", "ggpubr", "gplots'\n"),
    sep = "', '")
### pre-end
appendStep(sal) <- LineWise(code = {
    # Attach the packages used by the following steps
    library(systemPipeR)
    library(ChemmineR)
}, step_name = "load_packages")
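Before running the workflow, it can help to check that the expected packages are installed. A minimal sketch using base R's `requireNamespace()`, which tests whether each package can be loaded without attaching it; the commented installation line assumes the BiocManager package is available, since ChemmineR is distributed through Bioconductor.

```r
# Packages the load_packages step expects
expected <- c("ChemmineR", "ggplot2", "tibble", "readr", "ggpubr", "gplots")

# TRUE if a package's namespace can be loaded; nothing is attached to the search path
available <- vapply(expected, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(available)[!available]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumes BiocManager is installed:
  # BiocManager::install(missing_pkgs)
} else {
  message("All expected packages are available.")
}
```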

3.2 Load SDF dataset

Molecules can be loaded from a local file or downloaded from a URL. The example dataset used here contains 100 molecules.

# Here, the dataset is downloaded. If you already have the
# data locally, replace the URL with the local file path.
appendStep(sal) <- LineWise(code = {
    sdfset <- read.SDFset("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/Samples/sdfsample.sdf")
    # Rename molecules using the IDs in their SDF headers. If
    # your headers do not contain IDs, or the IDs are not
    # unique, remove the following line to keep the default IDs.
    cid(sdfset) <- makeUnique(sdfid(sdfset))
}, step_name = "load_data", dependency = "load_packages")
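`sdfid()` returns the first header field of each record (the molecule name line), which is why it can serve as a molecule ID. The sketch below mimics this on a schematic, heavily truncated SDF text (not a valid SDF file; only the `$$$$` record terminator is realistic) and uses base R's `make.unique()` to show how duplicated IDs are disambiguated. ChemmineR's `makeUnique()` follows the same idea but may use a different suffix format, so treat the output as illustrative.

```r
# Schematic, truncated SDF text (not a valid SDF file): each record ends with "$$$$"
# and its first line holds the molecule name used as an ID
sdf_lines <- c(
  "CMP_A", "  program line", "  comment line", "M  END", "$$$$",
  "CMP_B", "  program line", "  comment line", "M  END", "$$$$",
  "CMP_A", "  program line", "  comment line", "M  END", "$$$$"
)

# Record boundaries: each record starts on the line after the previous "$$$$"
ends <- which(sdf_lines == "$$$$")
starts <- c(1, head(ends, -1) + 1)
ids <- sdf_lines[starts]

# Disambiguate duplicated IDs by appending suffixes (similar in spirit to makeUnique())
unique_ids <- make.unique(ids)
print(unique_ids)
```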

3.3 Visualize molecules

appendStep(sal) <- LineWise(code = {
    png("results/mols_plot.png", 700, 600)
    # Here only the first 4 molecules are plotted. Choose the
    # ones you want to plot.
    plot(sdfset[1:4], print = FALSE)
    dev.off()
}, step_name = "vis_mol", dependency = "load_data", run_step = "optional")

3.4 Basic molecule information

Compute basic molecular properties, such as the atom frequency matrix, molecular weight and molecular formula, and save them to a file.

appendStep(sal) <- LineWise(code = {
    # One row per molecule: formula (MF), weight (MW), then atom counts per element
    propma <- data.frame(MF = MF(sdfset), MW = MW(sdfset), atomcountMA(sdfset))
    readr::write_csv(propma, "results/basic_mol_info.csv")
}, step_name = "basic_mol_info", dependency = "load_data", run_step = "optional")
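`atomcountMA()` returns one row per molecule and one column per element. To show the shape of that matrix, here is a base-R sketch that builds a similar table from hypothetical molecular formula strings. This is not how ChemmineR computes it (it derives counts from the parsed structures, not from formula strings), and the hypothetical helper `count_atoms()` ignores brackets, charges and isotopes.

```r
# Hypothetical molecular formulas, similar to what MF() returns
formulas <- c(mol1 = "C6H6O", mol2 = "C2H6O", mol3 = "CH4N2O")

# Count atoms per element in a simple formula (no brackets, charges or isotopes)
count_atoms <- function(f) {
  tokens <- regmatches(f, gregexpr("[A-Z][a-z]?[0-9]*", f))[[1]]
  elems <- sub("[0-9]+$", "", tokens)
  nums <- sub("^[A-Z][a-z]?", "", tokens)
  nums[nums == ""] <- "1"  # an element written without a number occurs once
  tapply(as.integer(nums), elems, sum)
}

counts <- lapply(formulas, count_atoms)
elements <- sort(unique(unlist(lapply(counts, names))))

# One row per molecule, one column per element; absent elements count as 0
atom_ma <- t(vapply(counts, function(x) {
  row <- setNames(integer(length(elements)), elements)
  row[names(x)] <- as.integer(x)
  row
}, integer(length(elements))))
print(atom_ma)
```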

This information can be visualized, for example as a boxplot of atom frequencies.

appendStep(sal) <- LineWise(code = {
    png("results/atom_req.png", 700, 700)
    # Columns 3 onward of propma hold the per-element atom counts
    boxplot(propma[, 3:ncol(propma)], col = "#6cabfa", main = "Atom Frequency")
    dev.off()
}, step_name = "mol_info_plot", dependency = "basic_mol_info",
    run_step = "optional")