A Friendly Statistics Toolbox for Microarray Analysis

The friendly statistics toolbox for microarray analysis (FSPMA) is an R-library that is controlled by a definition file. FSPMA is available under the GPL 2 license. It is free software that comes with NO WARRANTY.

Functionality

FSPMA (Sykacek et al. 2005) is an R-library for analysing microarray data. Its concept is to base microarray analysis on a definition file that describes the experiment and the analysis steps to be carried out. The definition file allows analysis without adapting or writing R-scripts. In addition, it serves as documentation of the analysis run. FSPMA can be used with data from different platforms (single and two colour arrays), with optional preprocessing steps done before the data are loaded into FSPMA. The main restriction of FSPMA is that the experiment must be a balanced reference design. Analysis includes handling of samples flagged as bad quality, conventional normalization and normalization with spike RNA, calculation of ANOVA tables and variance components, and finally gene ranking based on within-ANOVA contrasts and on per-gene ANOVA models. FSPMA is wrapped around YASMA (Wernisch et al. 2003), which it extends by some preprocessing and normalization options and by more general contrasts that allow, e.g., analysis of longitudinal studies. To find out more about FSPMA's functionality, we recommend inspecting FSPMA's tutorial (Sykacek & Furlong 2005), which is part of FSPMA's documentation files.

Installation

R 2.1 (and later) under MS Windows 32

The easiest way to install FSPMA for R under Windows is to download the binary distribution that corresponds to your R version. If your R version is not supported, you have to download the source distribution found below and build it with the development tools for Windows. Details of the latter can be found at the Rtools page.
R version    Download link
R 2.1        fspmawin_R.2.1.zip
R 2.2        fspmawin_R.2.2.zip
R 2.3        fspmawin_R.2.3.zip

Either approach will install FSPMA and a slightly modified version of the YASMA library (Wernisch et al. 2003), on which FSPMA depends. Some minor modifications of the original YASMA library were necessary for compatibility with R version 2.1 and with the R Win32 tools, which do not support drand48() random number generation. Instead, the Win32 port uses R's internal uniform random number generator.

R 2.1 (and later) with other operating systems, source distribution

For all other operating systems, one has to download the source distribution fspmax_062008.zip. After unzipping the file, running the command "fspmaxinstall.sh" will install FSPMA and a modified version of the YASMA library (Wernisch et al. 2003). The source distribution of the library is known to work with Linux, Apple Mac OS X and MS Windows.
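
The two steps of the source installation could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name install_fspma_source and the working directory fspma-src are made up for illustration, the layout of the archive is not documented here (so the script is searched for), and it is assumed that "fspmaxinstall.sh" can be run with sh and that R is already on the search path.

```shell
# Sketch only: install_fspma_source is a hypothetical helper, not part of FSPMA.
# Usage: install_fspma_source [archive] [workdir]
install_fspma_source() {
    archive="${1:-fspmax_062008.zip}"
    workdir="${2:-fspma-src}"

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    # Unpack into a separate directory so that the sources stay together.
    mkdir -p "$workdir" || return 1
    unzip -q -o "$archive" -d "$workdir" || return 1

    # Assumption: the position of the script inside the archive is unknown,
    # so look for it instead of hard-coding a path.
    script=$(find "$workdir" -type f -name fspmaxinstall.sh | head -n 1)
    if [ -z "$script" ]; then
        echo "fspmaxinstall.sh not found in $archive" >&2
        return 1
    fi

    # Run the installer from its own directory. The subshell keeps the
    # caller's working directory unchanged.
    # Assumption: the script is plain sh; use another shell if it is not.
    (
        cd "$(dirname "$script")" || exit 1
        sh ./fspmaxinstall.sh
    )
}
```

After downloading the archive into the current directory, calling install_fspma_source without arguments would unpack it into fspma-src and start the installer from there.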

Testing the installation and learning about FSPMA

To test the installation, one should use the examples provided in FSPMA's online help. See the package overview for details. There are five zip archives containing definition files and the corresponding data files. These examples are meant for evaluation purposes and contain a small number of genes from a larger study done by R. Furlong of the Dept. of Pathology, University of Cambridge. The run time of each example is thus rather small. By downloading and extracting fspma-tutorial.zip from the package overview page in FSPMA's online help, one obtains the "fspma.Rnw" Sweave file (see the R help on how to use Sweave), which together with the experiments will generate the LaTeX sources of the FSPMA tutorial (Sykacek & Furlong 2005). This step runs all code fragments in the tutorial and requires that all experimental data and the Sweave file reside unzipped in the same directory. Individual experiments can be run by downloading and extracting the relevant archive into a local directory. Analysis is started by invoking "fspma.wrapper" on the R command line with the name of the definition file as parameter, exactly as shown in the tutorial. Refer to (Sykacek & Furlong 2005) for further details on the output of such analysis runs and on how to produce different visualisations of the data and the analysis results.

A real world dataset

We provide here an additional documented definition file for a public Affymetrix dataset. This file must be unzipped (e.g. with gunzip) and stored in a directory of your choice. The microarray data that will be analysed by this file have been published as (Small et al. 2005) and can be downloaded from the NCBI GEO Datasets server under reference GDS660. These data files must be stored in the same directory as the definition file. Unlike the examples provided with the library, this definition file provides an analysis of realistic size. In particular, evaluating base level comparisons, which are shortcuts for several pairwise comparisons, and k nearest neighbour imputation can be computationally quite demanding. The definition file of this example is discussed in (Sykacek 2005), which is also part of FSPMA's online help.

To run the analysis, start R in that directory and type the following commands at the command line.

> library(fspma)
> ret <- fspma.wrapper('tstsgd_A.def')

As soon as the script terminates, there will be several additional files in that directory. These files contain the normalized raw data and a corresponding effects description, a file with an ANOVA table and variance components (although the latter will not show up in this analysis, since there is only one random effect which is captured by the residual noise) and several files that contain the rank lists that correspond to different tests.
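
One way to see which files an analysis run has added is to write a time stamp file before starting R and to list everything newer than it afterwards. The following is a sketch only: the helper names and the marker file name .fspma_run_start are made up for illustration and are not part of FSPMA, and the sketch assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
# Sketch only: both helpers are hypothetical names, not part of FSPMA.
RUN_MARKER=".fspma_run_start"

# Call this immediately before starting the analysis in R.
mark_run_start() {
    : > "$RUN_MARKER"
}

# Call this after the analysis has finished. It prints every regular file in
# the current directory that was modified after the marker was written.
list_new_files() {
    if [ ! -f "$RUN_MARKER" ]; then
        echo "no marker found; call mark_run_start first" >&2
        return 1
    fi
    find . -maxdepth 1 -type f -newer "$RUN_MARKER" ! -name "$RUN_MARKER" | sort
}
```

Calling mark_run_start in the analysis directory, running the two R commands and then calling list_new_files should print the files written by the run. Files written within the time stamp resolution of the file system directly after the marker may be missed, and input files that are modified during the run are listed as well.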

Further Information

FSPMA comes with extensive documentation. There are two tutorial-like technical reports: the first provides an overview and the second a detailed discussion of definition files. In addition, all user level functions of FSPMA are described in detail in the online help.

Acknowledgements

This work was done at the Department of Pathology and the Department of Genetics, University of Cambridge and funded by the BBSRC's Exploiting Genomics initiative under ref. 8/EGH16106, "Shared Genetic Pathways in Cell Number Control". FSPMA is joint work with Gos Micklem and Rob Furlong and relies heavily on Lorenz Wernisch's YASMA package.

References

(Small et al. 2005)
C. L. Small, J. E. Shima, M. Uzumcu, M. K. Skinner, and M. D. Griswold. Profiling gene expression during the differentiation and development of the murine embryonic gonad. Biol Reprod., 72(2):492–501, 2005.
(Sykacek et al. 2005)
P. Sykacek, R. Furlong, and G. Micklem. A Friendly Statistics Package for Microarray Analysis. Abstract and PDF available from Bioinformatics Advance Access. An early preprint is available here as pdf and gzipped postscript.
(Sykacek & Furlong 2005)
P. Sykacek and R. Furlong. A FSPMA tutorial. Available in pdf and as gzipped postscript.
(Sykacek 2005)
P. Sykacek. A reference to FSPMA definition files. Available in pdf and as gzipped postscript.
(Wernisch et al. 2003)
L. Wernisch, S. L. Kendall, S. Soneji, A. Wietzorrek, T. Parish, J. Hinds, P. G. Butcher, and N. G. Stoker. Analysis of whole-genome microarray replicates using mixed models. Bioinformatics, 19(1):53–61, 2003.

Peter's FSPMA Lecture

Here you find all material required for a successful completion of my FSPMA lecture. Lecture notes are available in pdf or gzipped postscript format. For the practical you need to download the latest source distribution of FSPMA, fspmax_062008.zip. Installation instructions are provided here. Instructions and data for the hands-on experiments can be obtained as a gzipped tar archive (a login will be provided during the lecture). Copy the file into an empty directory, untar it (tar -xzf fspmaexercise.tar.gz) and follow the instructions in exercise.txt.

The practical will contribute to your lecture marks. It is thus essential that you work out all problems in exercise.txt, pack your completed version of the file together with all def files into a gzipped tar archive (tar -czf lecture-result.tar.gz exercise.txt affy.def cdna.def) and send it as an attachment to my BOKU email address (peter.sykacek@boku.ac.at) using the subject fspma lecture. The deadline for sending these results in winter term 07/08 is Monday the 11th of February 2008, 8:00 am. The deadline for summer term 08 is Monday the 7th of July 2008.

After inspection of your solutions, I will send out emails with proposed marks. If you feel that you deserve a better assessment, we can arrange an oral examination. The exam is based on a discussion, in which you will be asked to comment on certain aspects of the problems you solved during the practical part of the lecture. For the exam it is thus vital that you understand how you obtained the results you handed in. Note that I will invite all course participants to an oral examination if I get the impression that solutions were not obtained independently. You may ask your colleagues for advice, but handing in plagiarised work will not be tolerated.
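
The packing step of the practical can be wrapped in a small check so that an incomplete archive is not sent by accident. This is a sketch only: pack_lecture_results is a made-up helper name, while the file names and the tar call are the ones given in the instructions above.

```shell
# Sketch only: pack_lecture_results is a hypothetical helper name.
pack_lecture_results() {
    archive="lecture-result.tar.gz"

    # Refuse to build the archive if one of the required files is missing.
    for f in exercise.txt affy.def cdna.def; do
        if [ ! -f "$f" ]; then
            echo "missing required file: $f" >&2
            return 1
        fi
    done

    # Same tar call as in the lecture instructions.
    tar -czf "$archive" exercise.txt affy.def cdna.def || return 1

    # List the archive contents so that the result can be checked before sending.
    tar -tzf "$archive"
}
```

Calling pack_lecture_results in the exercise directory should print the three file names if packing succeeded, and an error message naming the missing file otherwise.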

Bayesian Introduction during the Computational Mathematics and Bioinformatics Lecture (851.305)

You can download the lecture notes of the Bayesian Introduction as a pdf file. Three prototypical questions covering the Bayesian part of the Computational Mathematics and Bioinformatics Lecture (851.305) can be downloaded here as well: examquestions.pdf. Such questions could be part of the statistics exam of the Computational Mathematics and Bioinformatics Lecture (851.305). Note that I cannot yet answer any questions about when the statistics exams will take place and how they will be handled.