1 Overview

This document describes how to create a Bioconductor probe package from the reporter sequence information of a particular chip. Probe packages are a convenient way for distributing and storing the probe sequences and related information.

First, let us load the AnnotationForge package.


1.1 For Affymetrix genechips

In this section we see how a probe package can be created for Affymetrix genechips from the tabulator-separated sequence files that can be obtained from the vendor (at As an example, the file HG-U95Av2_probe_tab.gz is provided in the extdata subdirectory of the AnnotationForge package.

filename <- system.file("extdata", "HG-U95Av2_probe_tab.gz", 
outdir   <- tempdir()
me       <- "Wolfgang Huber <>"
species  <- "Homo_sapiens"
                 datafile   = gzfile(filename, open="r"),
                 outdir     = outdir,
                 maintainer = me,
                 species    = species,
                 version    = "0.0.1")
## Importing the data.
## Creating package in /tmp/RtmpG2hsm1/hgu95av2probe 
## Writing the data.
## Checking the package.
## *** WARNINGS ***
##  * checking data for ASCII and uncompressed saves ... WARNING Status: 1 WARNING, 1 NOTEBuilding the package.
## [1] "hgu95av2probe"
## [1] "BiocStyle"     "hgu95av2probe"

1.2 For other chiptypes

To deal with different file formats and additional types of probe annotation data from public or in-house databases, the function makeProbePackage offers a great deal of flexibility. The user can specify her own import function through the importfun argument. By default, its value is getProbeDataAffy, a function that reads tabular Affymetrix genechip sequence files. Import functions for other types of arrays can be adapted from this prototype.

The help pages and R code contained in the produced packages are derived from a template directory that obeys the usual R package conventions (1999). The input parameters of an import function are A prototype for such a directory is provided within the package AnnotationForge. To facilitate the automated production of large numbers of similar packages, we provide a text substitution mechanism similar to the one used in the GNU configure system.

  • a character string naming the array type

  • the input data files

  • a character string with the directory name of a package template directory, containing at least a file DESCRIPTION, a directory man with a file package.Rd, a directory data, and possible other directories and files, conforming to the usual R package conventions.

  • … any sort of further parameters, as necessary.

The output of an import function is a named list with elements

  • pkgname: a character string with the package name.

  • dataEnv: an environment containing an arbitrary number of data objects; these make the core of the package. Among these, there should be a data frame whose name is the value of pkgname with a column sequence. This data frame may have other columns such as the \(x\)- and \(y\)-position of the probes on the array, identifiers linking a probe to genomic databases, or its length and relative position within the gene it represents. The objects in the environment will be saved as individual .rda files of the same name into the data directory of the produced package. Documentation needs to be provided for the columns of the data frame, as well as for the other objects.

  • symVals: a named list, containing the symbol-value substitutions that are used in the text processing. It must at least contain the elements

    • `ARRAYTYPE`: name of the array

    • DATASOURCE: a textual description of how the data were obtained. It should contain the URL, or the name of the company / person.

For more details, please refer to the help files for the functions makeProbePackage and getProbeDataAffy. For an example, refer to the source code of getProbeDataAffy.


R Foundation. 1999. “Writing R Extensions.”