
1 Introduction

canceR is a user-friendly graphical interface to explore, compare, and analyse all available cancer data (clinical data, gene mutation, gene methylation, gene expression, protein phosphorylation, copy number alteration) hosted by the Computational Biology Center (cBio) at Memorial Sloan Kettering Cancer Center (MSKCC). canceR implements functions from various packages:

  1. to access, explore, and extract data from the cancer genomics database of MSKCC (cgdsr, (Cerami et al. 2012; Gao et al. 2013)),

  2. to associate phenotypes with gene expression (phenoTest, (Planet 2013)),

  3. to predict which biological processes, pathways, or immune system signatures differ significantly between phenotypes, and which genes are associated with them (GSEA-R, (Subramanian et al. 2005, 2007)),

  4. to predict the most up- or down-regulated gene sets belonging to one of the MSigDB collections (Subramanian et al. 2005) (GSEAlm, (Oron, Jiang, and Gentleman 2008)),

  5. to classify genes by disease (geNetClassifier, (Aibar et al. 2013)),

  6. to classify genes by variable or phenotype (rpart, (Therneau, Atkinson, and Ripley 2014)),

  7. to plot gene correlations,

  8. to plot survival curves, and

  9. to plot multi-omics data in Circos style (circlize, (Gu et al. 2014)).

2 Installation

2.1 Supplementary libraries (not R packages)

For Debian distribution (GNU/Linux)

sudo apt-get install r-cran-tcltk2 r-cran-tkrplot
sudo apt-get install tk-table bwidget
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install r-cran-xml

For Windows distribution

LibXml2: parser for XML

For OS X distribution

XQuartz: graphics device

Run R and enter these lines in the console to install the dependencies.

install.packages(c("RCurl", "XML"))
install.packages(c("cgdsr", "tkrplot", "Formula", "RCurl"))

2.2 Dependencies from Bioconductor

library(BiocManager)
BiocManager::install(c("GSEABase", "GSEAlm", "geNetClassifier", "Biobase", "phenoTest"))
BiocManager::install("canceR")

Get the development version from GitHub

library(devtools)
devtools::install_github("kmezhoud/canceR")
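
Before running the install commands, it can help to check which dependencies are already present. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper introduced here for illustration and is not part of canceR.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# CRAN dependencies named in this section.
cran_deps <- c("RCurl", "XML", "cgdsr", "tkrplot", "Formula")
to_install <- missing_packages(cran_deps)

# Install only what is missing (uncomment to run):
# if (length(to_install) > 0) install.packages(to_install)
```

The same helper could be applied to the Bioconductor dependencies, passing the result to BiocManager::install instead.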

3 Starting Window

Run R and enter these lines in the console to start the canceR package.

library(canceR)
canceR()

The starting window (Figure @ref(fig:starting.png)) either loads all available cancer studies (Figure @ref(fig:starting.png), item 3) or searches for studies by keyword (item 4). Before getting the cancer data (item 7), it is important to set the workspace for output files (item 1). The starting window also provides a Help menu, where the user can open this vignette (item 2).

(#fig:starting.png) Starting window. 1, File menu; 2, Help menu; 3, button to get all available studies; 4, button to get only the studies that match the keywords; 5, list box that displays the number of studies listed in 6; 6, list box that displays the result of querying the Cancer Genomics Data Server, where the user can select one or multiple studies; 7, button to get Genetic Profiles and Clinical data for the selected studies.
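
Conceptually, the keyword search keeps the studies whose name matches the keyword. The sketch below illustrates this on a toy table; the rows, the column names, and the helper `match_studies` are assumptions made for the example, and canceR may implement the search differently.

```r
# Toy stand-in for the table of available studies (made-up rows).
studies <- data.frame(
  cancer_study_id = c("study_a", "study_b", "study_c"),
  name = c("Breast Invasive Carcinoma", "Glioblastoma", "Breast Cancer Xenografts"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep studies whose name contains the keyword,
# ignoring upper/lower case.
match_studies <- function(studies, keyword) {
  hits <- grepl(keyword, studies$name, ignore.case = TRUE)
  studies[hits, , drop = FALSE]
}

matched <- match_studies(studies, "breast")
n_matched <- nrow(matched)  # the kind of count shown in list box 5
```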

3.1 Setting Workspace

The canceR package uses input files to compute models and generates output files that carry the biological results. It is therefore important to set the workspace and to know where the input files and results are located. The Set Workspace button lets the user set the workspace easily (Figure @ref(fig:setWorkspace.jpeg)). The user only needs to browse to a workspace folder or create a new one. The other necessary folders are created by simply pressing the Set buttons.

(#fig:setWorkspace.jpeg) Setting Workspace
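
For readers who prefer to prepare the folders from the console, the following sketch creates a workspace with a Results subfolder (the folder name appears later in this vignette as Results/ProfilesData). The helper `set_workspace` and the temporary location are illustrative assumptions, not the code behind the Set Workspace button.

```r
# Hypothetical helper: create a workspace folder with a "Results" subfolder
# and return the path of that subfolder.
set_workspace <- function(path) {
  results_dir <- file.path(path, "Results")
  dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)
  invisible(results_dir)
}

# Illustrative location inside the session's temporary directory.
workspace <- file.path(tempdir(), "canceR_workspace")
results_dir <- set_workspace(workspace)
```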

4 Main Window

After selecting studies and pressing the Get Cases and Genetic Profiles button, the main window appears (Figure @ref(fig:mainWindow.png)) and displays the progress of loading the data of the selected studies. The main window has a toolbar with menus (see the following paragraphs). It is subdivided into two columns. The first column lists the Cases for all selected studies. The first line of every study indicates its index and its short description. The remaining lines enumerate the Cases with a short description of the data type and the number of samples. The second list box shows the selected Cases. Similarly, the second column displays information about the Genetic Profiles. The user can select a single line or multiple lines, taking care to match each Case with the appropriate Genetic Profile.

(#fig:mainWindow.png) Main window of the canceR package. 1, toolbar; 2, list box of loaded Cases; 3, list box of selected Cases; 4, list box of loaded Genetic Profiles; 5, list box of selected Genetic Profiles.

4.1 Gene List

The first step in getting genomics data is to specify the genes of interest. The Gene List button browses folders to load a gene list file or displays example gene lists. The genes should be in a text file (.txt) with one gene per line, using HUGO gene symbols. The function automatically removes duplicate genes.
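
The expected file format can be illustrated with a short base R sketch that reads one symbol per line and drops duplicates. The helper `read_gene_list` and the example file are hypothetical; they mirror the behaviour described above rather than reproduce the canceR function.

```r
# Hypothetical helper: read one HUGO symbol per line and drop duplicates.
read_gene_list <- function(path) {
  genes <- trimws(readLines(path, warn = FALSE))
  genes <- genes[nzchar(genes)]  # skip blank lines
  unique(genes)
}

# Illustrative gene list file containing a duplicated symbol and a blank line.
gene_file <- tempfile(fileext = ".txt")
writeLines(c("TP53", "BRCA1", "TP53", "", "PTEN"), gene_file)

genes <- read_gene_list(gene_file)
```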

4.2 Clinical Data

The Multiple Cases button displays the selected Cases one after another. The results are returned in a table with one row for each case and one column for each clinical attribute (Figure @ref(fig:clinicalData.png)B). The user can select all or some of the clinical attributes by ticking them in the dialog check box (Figure @ref(fig:clinicalData.png)A). For example, we select the following clinical attributes:

  • Overall Survival months: Overall survival, in months.

  • Overall Survival Status: Overall survival status, usually indicated as LIVING or DECEASED.

  • Disease Free Survival months: Disease free survival, in months.

  • Disease Free Survival status: Disease free survival status, usually indicated as DiseaseFree or Recurred/Progressed.

  • Age at diagnosis: Age at diagnosis.

(#fig:clinicalData.png) Getting clinical data for Breast Invasive Carcinoma. A, dialog check box to select clinical data; B, results of querying the clinical data of Breast Invasive Carcinoma (TCGA, Nature 2012).
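
Selecting attributes in the check box amounts to keeping a subset of columns of the clinical table. The sketch below shows this on a toy table; the column names and all values are illustrative assumptions and do not come from a real study.

```r
# Toy clinical table: one row per case, one column per clinical attribute.
# Column names and values are made up for illustration.
clinical <- data.frame(
  OS_MONTHS = c(12.5, 40.1, 7.3),
  OS_STATUS = c("LIVING", "DECEASED", "LIVING"),
  DFS_MONTHS = c(10.0, 22.4, 7.3),
  DFS_STATUS = c("DiseaseFree", "Recurred/Progressed", "DiseaseFree"),
  AGE = c(54, 61, 47),
  TUMOR_STAGE = c("II", "III", "I"),
  row.names = c("case_1", "case_2", "case_3"),
  stringsAsFactors = FALSE
)

# Attributes ticked in the dialog check box (hypothetical choice).
selected <- c("OS_MONTHS", "OS_STATUS", "AGE")

# Keep only the selected attributes that exist in the table.
clinical_subset <- clinical[, intersect(selected, colnames(clinical)), drop = FALSE]
```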

4.3 Mutation

The user can search for all mutations of the genes in the gene list across all selected studies. To get mutations, select All tumor samples in Cases and Mutations in Genetic Profiles (Figure @ref(fig:Mutation1.png)).

(#fig:Mutation1.png) Selecting mutation Cases

The Mutation function allows the user to select from about 15 items of information describing each mutation (Figure 1A). The result is a table with one row for each sample/case and one column for each item checked in the mutation dialog check box (Figure 1B).

Figure 1: Select Clinical data with Mutation


The user can filter the mutation results to keep only a specific amino acid change (Figure @ref(fig:specificMutation.png)).

(#fig:specificMutation.png) Specific Mutation
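
Filtering by amino acid change can be pictured as a pattern match on one column of the mutation table. In the sketch below, the table, its column names, and the helper `filter_mutations` are assumptions chosen for illustration; the pattern syntax accepted by canceR may differ.

```r
# Toy mutation table (illustrative rows and column names).
mutations <- data.frame(
  gene_symbol = c("TP53", "TP53", "PIK3CA", "BRCA1"),
  case_id = c("case_1", "case_2", "case_2", "case_3"),
  amino_acid_change = c("R175H", "R273C", "H1047R", "E23fs"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose amino acid change matches a
# regular expression.
filter_mutations <- function(mutations, pattern) {
  mutations[grepl(pattern, mutations$amino_acid_change), , drop = FALSE]
}

# Keep changes whose reference residue is arginine (labels starting with "R").
arg_changes <- filter_mutations(mutations, "^R")
```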

4.4 Methylation

The user can search for gene methylation and its correlation with mRNA expression. The Cases and Genetic Profiles must be selected with the same methylation assay (HM450 or HM27) for the same study. Multiple Cases can be selected for one gene list (Figure @ref(fig:methylation.png)).

(#fig:methylation.png) Selecting methylation data

The dialog box of the methylation function allows the user to specify the threshold of the correlation rate (Figure @ref(fig:Met_rate)A). cBioPortal (Cerami et al. 2012; Gao et al. 2013) includes only methylation data from the probe with the strongest negative correlation between the methylation signal and the gene’s expression. The result table (Figure @ref(fig:Met_rate)B) lists genes with a median rate greater than 0.8.

(#fig:Met_rate) A, Methylation dialog box; B, Methylation results. Correlation of the silencing of gene expression by methylation (HM27), with correlation rate r > 0.8, for the gene list in Breast Invasive Carcinoma (TCGA, Nature 2012).
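
The threshold step can be sketched as a median filter over genes. This example assumes that the "rate" is the per-sample value returned for each gene and that the median is taken across samples; this is one reading of the description above, not a statement of how canceR computes it. All values are made up.

```r
# Toy table of per-sample rates (illustrative values):
# samples in rows, genes in columns.
met <- data.frame(
  GENE_A = c(0.91, 0.85, 0.88, 0.79),
  GENE_B = c(0.10, 0.25, 0.15, 0.30),
  GENE_C = c(0.95, 0.60, 0.82, 0.90),
  row.names = paste0("sample_", 1:4)
)

threshold <- 0.8  # value entered in the dialog box

# Median per gene across samples, ignoring missing values.
gene_median <- vapply(met, median, numeric(1), na.rm = TRUE)

# Genes whose median exceeds the threshold.
selected_genes <- names(gene_median)[gene_median > threshold]
```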

4.5 Profiles

The Get Profile Data function depends on the gene list, the Cases, and the Genetic Profiles. If the Single gene option is selected, a dialog box appears to specify the gene symbol (Figure @ref(fig:singleProfile.png)). The dialog check box that follows allows the user to choose some or all of the profile data (Figure @ref(fig:singleProfile.png)B). The resulting table lists the chosen genetic profile data in columns (CNA, Met, Mut, mRNA, RPPA) and all available samples in rows (Figure @ref(fig:singleProfile.png)C). Conversely, if the Multiple genes option is selected, the returned table displays gene expression with the genes of the gene list in columns and all samples in rows. In the case of multiple genes, the tables are saved in the Results/ProfilesData folder (Figure @ref(fig:multipleProfiles.png)).
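
As an illustration of the multiple-genes output, the sketch below writes a toy table (samples in rows, genes in columns) to a Results/ProfilesData folder as a tab-separated text file. The helper `save_profile`, the file name, and the file format are assumptions for the example; the files written by canceR may be named and formatted differently.

```r
# Toy expression table (illustrative values): samples in rows, genes in columns.
profile <- data.frame(
  TP53 = c(1.2, -0.4, 0.8),
  PTEN = c(-1.1, 0.3, 0.0),
  row.names = c("sample_1", "sample_2", "sample_3")
)

# Hypothetical helper: write a table under <workspace>/Results/ProfilesData
# and return the path of the written file.
save_profile <- function(profile, workspace, name) {
  out_dir <- file.path(workspace, "Results", "ProfilesData")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, paste0(name, ".txt"))
  # col.names = NA writes an empty header cell above the row names.
  write.table(profile, out_file, sep = "\t", quote = FALSE, col.names = NA)
  out_file
}

# Illustrative workspace inside the session's temporary directory.
out_file <- save_profile(profile, tempdir(), "example_mRNA")
```

The table can be read back with read.delim(out_file, row.names = 1).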