|
The statistical analysis of quantitative post-genomic data can
present a number of technical challenges to the entry-level
scientist. For example, using tools
such as R and MATLAB to analyse numerical
data requires knowledge of their programming languages in order
to implement the underlying analysis algorithm. In
addition, there may be problems combining the use of two or more of
these tools as the transfer of data may require manual copying and
pasting between different user interfaces. Furthermore, these
intermediate data may require a transformation step because they are
incompatible as input to the next tool.
The Taverna workflow system can be used to construct pipelines
integrating the use of computational tools for the statistical
analysis of quantitative data. These workflows can automate the
transfer of data between these tools as well as their invocation
with appropriate parameters. The example workflow shown below uses R
to analyse microarray data, an example of post-genomic data, stored
in a maxdload2 database.
|
|
Note: This workflow requires access to R
deployed as a service using RServe, either on a remote server or on
the PC where your Taverna workbench is installed. See here
for further information.
This example workflow analyses microarray data from
Castrillo
et al. (2006). A series of t-tests
is performed on gene expression levels in case and control data
sets in order to identify differentially expressed genes. How these
genes may relate to changes in biological processes in the cell is
then investigated by identifying common terms associated with the
genes from the Gene Ontology.
|
|
|
There are three parts to this workflow. The first part involves the
retrieval of control and case microarray data from the maxdload2
database using a web service interface generated by maxdBrowse.
Two Beanshell scripts are
used to allow the workflow user to select the control and case data sets
for analysis from within Taverna.
|
|
|
The expression levels of each gene in the control and case data
sets are then compared by t-tests. These t-tests are defined in a
script written in the R programming language, which is executed by R
deployed as a service using RServe.
The R service is invoked using the RShell processor from Taverna,
which is available from the service palette.
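
To make the per-gene comparison concrete, the following is a minimal sketch of the statistic behind such a test. It is written in Python for illustration only; it is not the R script used by the workflow. It assumes Welch's unequal-variance form of the t-test (the default of R's t.test function), and the gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, case):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(control), len(case)
    # Squared standard errors of the two group means
    v1, v2 = variance(control) / n1, variance(case) / n2
    t = (mean(case) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression levels: gene -> (control replicates, case replicates)
expression = {
    "geneA": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "geneB": ([1.0, 1.1, 0.9], [5.0, 5.2, 4.8]),
}

results = {gene: welch_t(ctrl, case) for gene, (ctrl, case) in expression.items()}

# List genes from the largest to the smallest absolute t statistic
for gene, (t, df) in sorted(results.items(), key=lambda kv: -abs(kv[1][0])):
    print(f"{gene}: t = {t:.2f}, df = {df:.1f}")
```

Converting each t statistic into a p-value requires the cumulative distribution function of the t distribution, which is omitted here; in the workflow that step is left to R.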
|
|
|
The RShell processor in Taverna is configured with an R script which
implements the t-test analysis. The processor makes each input
port available to the script as a variable named after the port, and
each output port reads the variable of the same name after the script
has executed. The last value assigned to that variable is the one
returned from the processor.
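
The port-to-variable convention described above can be illustrated with a toy model. This is a Python analogy only, not Taverna's implementation; the function run_script, the port names and the script body are all hypothetical.

```python
def run_script(script, inputs, output_ports):
    """Toy model of a script processor: ports map to variables by name."""
    namespace = dict(inputs)       # each input port becomes a variable
    exec(script, {}, namespace)    # run the script body
    # each output port reads the variable that shares its name
    return {port: namespace[port] for port in output_ports}


# Hypothetical script: 'threshold' is an input port, 'pvalue' an output port
script = """
pvalue = 1.0
pvalue = min(pvalue, threshold / 2)  # reassigned: this later value is returned
"""

outputs = run_script(script, {"threshold": 0.05}, ["pvalue"])
print(outputs)
```

Because the output port is read only after the whole script has run, the first assignment to pvalue is never seen outside the script.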
If using a remote R server, the RShell processor
must be configured with its domain name, and with a username and
password if required for access.
It is also possible to use Bioconductor packages for microarray data
analysis in your R scripts if they have been installed on the R
server accessed by the Taverna workflow. A version of the workflow
described on this web page that instead uses the LIMMA Bioconductor
package to identify differentially expressed genes can be downloaded
here.
|
|
|
The list of significant genes that are
differentially expressed between the selected
control and case data sets is then analysed by the
'analyseGenesPDFOutput' task, which invokes
the GoTermFinder
tool. This tool identifies common terms from the biological process
sub-ontology of the Gene Ontology that are associated with the list
of genes.
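
As a rough illustration of how a term can be judged "common" to a gene list, the sketch below scores one term with a hypergeometric tail probability, a test commonly used by term-enrichment tools. Whether the GoTermFinder service used by this workflow is configured in exactly this way is an assumption, and the counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 20 genes in the background, 5 annotated with the term,
# 4 genes in the significant list, 3 of which carry the term.
p = hypergeom_tail(k=3, n=4, K=5, N=20)
print(f"enrichment p-value: {p:.4f}")
```

A small value suggests that the term occurs in the gene list more often than would be expected by chance; a real analysis would also correct for the number of terms tested.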
|
|
|
There are two outputs generated from this workflow. Firstly, a report
in the form of a PDF file is generated showing the association of
the genes with common terms from the Gene Ontology which were
identified by the GoTermFinder tool. The PDF report can be viewed
using the PDF renderer plugin by right-clicking on the PDF results
file and selecting view as PDF.
|
|
|
Secondly, a text
file containing comma-separated values is produced, consisting of
the list of significant genes, each annotated with its gene name, ORF
number, a description of its function and its t-test p-value. To
view it in tabular format, right-click on the CSV file and select view
as comma separated values.
The PDFRenderer and CSVRenderer plugins have to be installed to
display PDF and CSV files in the Taverna workbench. Instructions
for installing these renderer plugins are available here.
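
For readers who want to process the comma-separated output outside Taverna, the following sketch shows the general shape of such a file. The column names and rows are hypothetical and are not taken from the workflow's actual results.

```python
import csv
import io

# Hypothetical rows: column names and values are illustrative only
rows = [
    {"gene_name": "GENE1", "orf": "ORF0001",
     "description": "example function, with a comma", "p_value": 0.0004},
    {"gene_name": "GENE2", "orf": "ORF0002",
     "description": "another example", "p_value": 0.012},
]

# Write the table as comma-separated values into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gene_name", "orf", "description", "p_value"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers one dictionary per gene
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Using a CSV library rather than splitting on commas matters here because gene descriptions may themselves contain commas, which the writer quotes automatically.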
|
|
|
The results from the carbon and nitrogen t-test comparisons described
in the paper can be downloaded here.
|