----------------------------------------------------------------------
{{{
Copyright (C) 2008

This file is part of Illumina plug-in package for BASE.
Available at http://baseplugins.thep.lu.se/
BASE main site: http://base.thep.lu.se/

This is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 3
of the License, or (at your option) any later version.

The software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with BASE. If not, see <http://www.gnu.org/licenses/>.
}}}
----------------------------------------------------------------------

== Introduction ==

This file contains only information that is specific to Illumina SNP
data. For general information, or information about expression data,
see the README file.

== Illumina SNP raw data files ==

The SNP raw data files are exported from BeadStudio and may contain multiple
samples. The file should be saved as a tab-separated text file. The first
line is the header line, which contains the column names. Some of the columns
are specific to each sample; others are common columns valid for all samples.
Sample-specific columns are prefixed with the sample name, followed by a dot
and then a generic column name. For example, in UC199_B.GType,
UC199_B is the sample name and GType is the generic column name. The following
table lists the columns that are required by the plug-ins in this package.

|| '''Column''' || '''Column type''' || '''Example value''' ||
|| Address || Common || 830575 ||
|| GenTrain Score || Common || 0,8607027 ||
|| GType || Sample || BB ||
|| Log R Ratio || Sample || 0,1801754 ||
|| B Allele Freq || Sample || 1 ||

If any of these columns are missing, the plug-ins may not function correctly.
Additional columns, both common and sample-specific, may be present in the data
file. When the import plug-in parses the input file, it splits it into
one file per sample. Each new file includes all common columns and
all sample columns for that sample. The column headers in the new files
include only the generic column name, without the sample name prefix.
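
The per-sample split described above can be sketched in a few lines of Python. This is only an illustration, not the code used by the import plug-in. It assumes that a header containing a dot is sample-specific and that the sample name is the part before the first dot. The function name split_by_sample, the second sample UC200_A, and its values are made up for the example.

```python
import csv
import io
from collections import defaultdict


def split_by_sample(text):
    """Split a multi-sample tab-separated table into one table per sample.

    Assumption (not taken from the plug-in): a header with a dot is
    sample-specific, and the text before the first dot is the sample name.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]

    common = []                     # indexes of the common columns
    per_sample = defaultdict(list)  # sample name -> [(index, generic name)]
    for idx, name in enumerate(header):
        sample, dot, generic = name.partition(".")
        if dot:
            per_sample[sample].append((idx, generic))
        else:
            common.append(idx)

    tables = {}
    for sample, cols in per_sample.items():
        # Common columns first, then this sample's columns without the prefix.
        new_header = [header[i] for i in common] + [g for _, g in cols]
        new_rows = [
            [row[i] for i in common] + [row[i] for i, _ in cols]
            for row in data
        ]
        tables[sample] = [new_header] + new_rows
    return tables


# Hypothetical two-sample input; UC200_A and its values are invented.
example = "\n".join([
    "\t".join(["Address", "GenTrain Score",
               "UC199_B.GType", "UC199_B.Log R Ratio",
               "UC200_A.GType", "UC200_A.Log R Ratio"]),
    "\t".join(["830575", "0,8607027", "BB", "0,1801754", "AB", "-0,0312"]),
])
tables = split_by_sample(example)
for sample, table in tables.items():
    print(sample, table[0])
```

Each entry in the result holds the header and rows for one sample, with the common columns repeated in every table.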

== Illumina SNP manifest files ==

The SNP manifest files are comma-separated text files that contain
information about the probes on a specific SNP array, including gene symbol,
probe sequence, and so on. In BASE, the manifest files are used to create
array designs that describe the probe content of a specific SNP array.

A manifest file is composed of two sections named Heading
and Assay. The first section is the Heading section. It is preceded by a row containing the
text [Heading]. The Heading section presents some general information, including the number
of SNPs described in the file. See below for an example of the Heading section.
{{{
[Heading]
Descriptor File Name(s),HumanCNV370v1_C.bpm
Assay Format,Infinium
SNP Count,370404
}}}
Following the Heading section is the Assay section, which is preceded by a row
containing the text [Assay]. The first row of the Assay section, i.e., the row
after [Assay], contains the header for the Assay section.
See below for an example of the Assay header and how information
in the manifest file is mapped to BASE.
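
As a rough illustration of the two-section layout, the following Python sketch reads the [Heading] rows into a dictionary and the [Assay] rows into a list of dictionaries keyed by the Assay header. It is not the parser used by the plug-ins. The function name parse_manifest is hypothetical, and the Assay header in the sample text is abbreviated to a handful of columns to keep the example short.

```python
import csv
import io


def parse_manifest(text):
    """Return (heading, assay_rows) for a comma-separated manifest text."""
    heading = {}
    assay_header = None
    assay_rows = []
    section = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip empty rows
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            section = first[1:-1]  # "[Heading]" -> "Heading"
            continue
        if section == "Heading":
            heading[first] = row[1].strip() if len(row) > 1 else ""
        elif section == "Assay":
            if assay_header is None:
                assay_header = row  # first row after [Assay] is the header
            else:
                assay_rows.append(dict(zip(assay_header, row)))
        # Rows in any other section are ignored in this sketch.
    return heading, assay_rows


# Abbreviated sample: only a subset of the Assay columns is included.
manifest_text = "\n".join([
    "[Heading]",
    "Descriptor File Name(s),HumanCNV370v1_C.bpm",
    "Assay Format,Infinium",
    "SNP Count,370404",
    "[Assay]",
    "IlmnID,Name,IlmnStrand,SNP,AddressA_ID,Chr,MapInfo",
    "rs10000010-126_B_F_IFB1153208421:0,rs10000010,Bot,[T/C],900010475,4,21227772",
])
heading, assay_rows = parse_manifest(manifest_text)
print(heading["SNP Count"], assay_rows[0]["Name"])
```

Only the first cell of a row is checked for a section marker, so a value such as [T/C] in the SNP column is not mistaken for one.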

== Mapping reporter/control annotations from SNP manifest files to BASE ==

The table below shows how the columns in the [Assay] section of the manifest file are mapped to
reporter annotations in BASE. Annotations in <brackets> are new annotations
defined in the illumina-extended-properties.xml file. Columns marked
with - are not mapped to BASE.

|| '''Manifest column''' || '''BASE reporter annotation''' || '''Example value''' ||
|| IlmnID || External ID || rs10000010-126_B_F_IFB1153208421:0 ||
|| Name || Name || rs10000010 ||
|| IlmnStrand || <Ilmn strand> || Bot ||
|| SNP || <SNP> || [T/C] ||
|| AddressA_ID * || - || 900010475 ||
|| AlleleA_ProbeSeq || Sequence || ||
|| AddressB_ID || - || ||
|| AlleleB_ProbeSeq || - || ||
|| Chr || Chromosome || 4 ||
|| MapInfo || <Start position> || 21227772 ||
|| Ploidy || - || 2 ||
|| Species || Species || Homo sapiens ||
|| CustomerStrand || - || BOT ||
|| IllumicodeSeq || - || ||
|| TopGenomicSeq || - || ||

* The AddressA_ID is not a reporter annotation. It is used to identify the
probe on an array design. Its value is found in the Address column in the
raw data files and is used to find the reporter.

The column mappings for the [Assay] section can be changed by modifying
the existing import configuration or creating a new configuration.
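
To make the default mapping concrete, the sketch below expresses the table above as a Python dictionary and applies it to one Assay row. It only restates the table. It is not the format of a BASE import configuration, and the names ASSAY_TO_REPORTER and to_reporter_annotations are hypothetical.

```python
# Manifest column -> BASE reporter annotation, copied from the table above.
# Names in <brackets> are kept exactly as the table writes them.
# Columns that the table marks with "-" are left out.
ASSAY_TO_REPORTER = {
    "IlmnID": "External ID",
    "Name": "Name",
    "IlmnStrand": "<Ilmn strand>",
    "SNP": "<SNP>",
    "AlleleA_ProbeSeq": "Sequence",
    "Chr": "Chromosome",
    "MapInfo": "<Start position>",
    "Species": "Species",
}


def to_reporter_annotations(assay_row):
    """Keep only the mapped columns and rename them to annotation names."""
    return {
        annotation: assay_row[column]
        for column, annotation in ASSAY_TO_REPORTER.items()
        if column in assay_row
    }


# One Assay row built from the example values in the table.
assay_row = {
    "IlmnID": "rs10000010-126_B_F_IFB1153208421:0",
    "Name": "rs10000010",
    "IlmnStrand": "Bot",
    "SNP": "[T/C]",
    "AddressA_ID": "900010475",
    "Chr": "4",
    "MapInfo": "21227772",
    "Ploidy": "2",
    "Species": "Homo sapiens",
}
annotations = to_reporter_annotations(assay_row)
print(annotations)
```

AddressA_ID and Ploidy are dropped from the result because the table does not map them to a reporter annotation.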

== Getting started ==

 1. Install this package as described in the INSTALL file.
 2. Import reporter annotations. You will need one or more SNP manifest files for this.
   * Upload the manifest file(s) to BASE.
   * Go to the View -> Reporters menu.
   * Click on the Import button.
   * Use the auto-detect function or select the Illumina SNP reporter importer plug-in.
   * Select the manifest file.
   * Finish the job registration and wait for the plug-in to complete.
   * Repeat this one time for each manifest file.
 3. Create array designs. You will need one array design for each SNP manifest file.
   * Go to the Array LIMS -> Array designs menu.
   * Click on the New button.
   * Choose the Illumina/SNP platform.
   * We recommend that you give the array design the same name as the manifest file.
   * Switch to the Data files tab and select the manifest file.
   * Click on Save.
   * Repeat this for each manifest file.
 4. Import raw data. You will need a SNP raw data file.
   * Upload the file to BASE.
   * Go to the View -> Experiments page and create a new Experiment.
   * Select the SNP platform for the experiment.
   * Save the experiment and then click on the newly created experiment in the list.
   * Click on the Import button.
   * Use the auto-detect function or select the Illumina SNP raw data importer plug-in.
   * Select the manifest file.
   * Select one of the array designs created in step 3.
   * Finish the job registration and wait for the plug-in to complete.
   * Repeat this if you have more raw data files.

Tip! Steps 1-3 only need to be done once for a BASE installation. If more than
one user is going to use the Illumina package, we recommend that the
array designs created in step 3, and the associated manifest files,
are shared with the appropriate users, for example, the Everyone group.

== Analyzing SNP data ==

The first step is to create a root bioassayset. To do this:

 1. Go to the "Bioassay sets" tab of your experiment.
 2. Click on the "New root bioassayset" button.
 3. This should start the "Illumina SNP root bioassayset creator" plug-in.
 4. You must tell it which raw data sets to use.
 5. You may also have to specify the character set and/or the decimal separator
    used in your data files.
 6. Finish the job registration and wait for the plug-in to complete.

The above procedure creates a root bioassayset, which means that data from the files
is imported into the database. BASE can only store data as numeric values in a
predetermined number of "channels". The number of channels for SNP data is 3, which
means that 3 data columns can be imported. In addition, the Address column is
imported as the 'position' value. This means that plug-ins used later in
the analysis can extract other columns directly from the data
files, simply by finding the row that has the same Address value as the position.

Note! This position->Address relation is guaranteed to be correct only for
bioassay sets living in the same "data cube" as the root bioassay set.
During the analysis, other plug-ins may decide to create a new "data cube",
re-arrange the position numbers and break the mapping.
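
The position->Address lookup can be illustrated with a short Python sketch that indexes a per-sample data file by its Address column and then fetches a column that is not stored in any channel. It is an illustration only and does not use the BASE API. The function name index_by_address is hypothetical, and the second data row is invented.

```python
import csv
import io


def index_by_address(text):
    """Return {Address as int: row dict} for a per-sample tab-separated file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["Address"]): row for row in reader}


# Hypothetical per-sample file; the second data row is invented.
sample_file = "\n".join([
    "\t".join(["Address", "GenTrain Score", "GType", "Log R Ratio", "B Allele Freq"]),
    "\t".join(["830575", "0,8607027", "BB", "0,1801754", "1"]),
    "\t".join(["900010475", "0,7912", "AB", "-0,0312", "0,5"]),
])
rows = index_by_address(sample_file)

# A plug-in that knows the position of a spot can look up any other column,
# as long as the bioassay set is in the same data cube as the root.
position = 830575
gentrain_score = rows[position]["GenTrain Score"]
print(gentrain_score)
```

The lookup returns the value as text, exactly as it appears in the file, so any decimal separator conversion is left to the caller.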

The table below shows how data from the file is imported into the database.

|| '''Column''' || '''Imported to''' ||
|| Address || position ||
|| GType || ch(1): AA=1.0, AB=0.0, BB=-1.0, Other values=null ||
|| Log R Ratio || ch(2) ||
|| B Allele Freq || ch(3) ||
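
The conversion in the table can be written out as a small Python sketch. It is not the plug-in's implementation. The names GTYPE_TO_CH1, parse_decimal, and to_channels are hypothetical, the comma default for the decimal separator simply matches the example values in this file, and treating an empty cell as null is an assumption.

```python
# GType -> ch(1) value, as listed in the table; other values become None.
GTYPE_TO_CH1 = {"AA": 1.0, "AB": 0.0, "BB": -1.0}


def parse_decimal(value, decimal_separator=","):
    """Parse a number written with the given decimal separator.

    Assumption: an empty cell is treated as a missing value (None).
    """
    value = value.strip()
    if not value:
        return None
    return float(value.replace(decimal_separator, "."))


def to_channels(row, decimal_separator=","):
    """Convert one per-sample data row into position and channel values."""
    return {
        "position": int(row["Address"]),
        "ch1": GTYPE_TO_CH1.get(row["GType"]),
        "ch2": parse_decimal(row["Log R Ratio"], decimal_separator),
        "ch3": parse_decimal(row["B Allele Freq"], decimal_separator),
    }


row = {
    "Address": "830575",
    "GType": "BB",
    "Log R Ratio": "0,1801754",
    "B Allele Freq": "1",
}
channels = to_channels(row)
print(channels)

# Any GType outside AA/AB/BB maps to None; "NC" is used here only as an example.
no_call = to_channels(dict(row, GType="NC"))
print(no_call["ch1"])
```

The decimal separator is a parameter because, as noted in the steps above, it may have to be specified when the root bioassayset is created.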

Tip! The installation program has created 3 formulas: GType=ch(1),
Log R Ratio=ch(2) and B Allele Freq=ch(3). The formulas can be used
instead of the channel numbers when displaying or plotting data. This makes no
real difference, except that the formula names will be used in column
headers, etc. instead of the generic channel numbers.