<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE appendix PUBLIC
  "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
  "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
<!--
  $Id $

  Copyright (C) 2010 Nicklas Nordborg

  This file is part of BASE - BioArray Software Environment.
  Available at http://base.thep.lu.se/

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <http://www.gnu.org/licenses/>.
-->
<sect1 id="appendix.fileformats.bfs" >
  <?dbhtml filename="bfs.html" ?>
  <title>The BFS (BASE File Set) format</title>

  <para>
    The BASE File Set (BFS) format is a collection of file formats that can be used
    together to transport all kinds of data. The major use is to send spot data
    to a plug-in for analysis and then to import the analyzed results. We
    have tried to keep the format generic and extensible, so the BFS format
    may well be used for other applications in the future.
  </para>

  <sect2 id="fileformats.bfs.basics" >
    <title>The basics of BFS</title>

    <para>
      The idea is to use simple, plain-text files with data organised into
      rows and columns. A single type of file may not be able to hold all
      kinds of data, so to begin with we have defined three types of files:
    </para>

    <itemizedlist>
    <listitem>
      <para>
      Metadata files: Hold information about the data that is found
      in the other files in the file set.
      </para>
    </listitem>
    <listitem>
      <para>
      Annotation files: Column-based files that hold one record per
      line. The first line is a header line. The remaining lines are
      data lines identified by a unique positive ID value in
      the first column.
      </para>
    </listitem>
    <listitem>
      <para>
      Data files: Pure matrix data files without header lines or
      ID columns. Data is usually identified by matching it line-by-line
      with data in annotation files, or with information in the metadata file.
      </para>
    </listitem>
    </itemizedlist>

    <sect3 id="fileformats.bfs.encoding">
      <title>Character encoding</title>

      <para>
        All files are text-based and should use the UTF-8 character encoding.
        A newline (<code>\n</code>) is used as a record separator and a tab
        (<code>\t</code>) is used as a column separator. Data that contains newline
        or tab characters needs to be escaped. A backslash (<code>\</code>) is used
        to indicate the start of an escape sequence. This means that the backslash
        character must also be escaped. Since some editors include a carriage
        return (<code>\r</code>) in line breaks, carriage returns should also be
        escaped.
      </para>

      <table frame="all" id="fileformats.bfs.escape_table">
        <title>Escaped characters in the BFS format</title>
        <tgroup cols="2" align="center">
          <colspec colname="character" />
          <colspec colname="escape" />
          <thead>
            <row>
              <entry>Character</entry>
              <entry>Escape sequence</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry><code>&lt;backslash&gt;</code></entry>
              <entry><code>\\</code></entry>
            </row>
            <row>
              <entry><code>&lt;newline&gt;</code></entry>
              <entry><code>\n</code></entry>
            </row>
            <row>
              <entry><code>&lt;carriage return&gt;</code></entry>
              <entry><code>\r</code></entry>
            </row>
            <row>
              <entry><code>&lt;tab&gt;</code></entry>
              <entry><code>\t</code></entry>
            </row>
          </tbody>
        </tgroup>
      </table>

      <para>
        It is recommended that parsers are forgiving: if an invalid escape
        sequence is found, e.g. a backslash followed by anything other than
        <code>\</code>, <code>n</code>, <code>r</code> or <code>t</code>,
        the input is taken literally. Strict parsers may throw
        exceptions or log warning messages.
      </para>
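
      <para>
        As an illustration, the following sketch shows how escaped values could
        look in a metadata file. The section name, keys and values are
        hypothetical, and each key is separated from its value by a single tab
        character. The first value decodes to <code>C:\data\run1</code> and the
        second to two lines of text. The third contains the invalid sequence
        <code>\x</code>, which a forgiving parser would presumably keep as the
        literal characters <code>\x</code>.
      </para>

      <programlisting>
[escaping]
path	C:\\data\\run1
note	first line\nsecond line
odd	abc\xdef
</programlisting>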
    </sect3>

    <sect3 id="fileformats.bfs.numbers">
      <title>Numerical values</title>
      <para>
        Numeric values should use a dot (<code>.</code>) as the decimal point. Scientific
        notation is accepted. Null, NaN, Infinity, and other special values should all
        be represented by empty string values. It is recommended that parsers
        are forgiving and treat invalid numerical data as empty values.
      </para>
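
      <para>
        A small, made-up fragment of a data file with three tab-separated
        columns may help to illustrate these rules. The values use a dot as
        the decimal point, the second value on the first line uses scientific
        notation, and the missing second value on the second line (for example
        a NaN) is written as an empty string between two tab characters.
      </para>

      <programlisting>
0.5	1.25e-3	12
3.0		7
</programlisting>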
    </sect3>

    <sect3 id="fileformats.bfs.comments">
      <title>Comments and white-space</title>
      <para>
        Lines starting with <code>#</code> are comment lines and should be ignored. Empty
        lines should also be ignored. A line that contains only white-space is
        considered empty. White-space means spaces, tabs and other characters that
        match <code>\s</code> in regular expressions.
      </para>

      <note>
        <para>
          Comments and empty lines can only be used in metadata files.
          Annotation files and data files don't allow
          comments or empty lines.
        </para>
      </note>
    </sect3>

    <sect3 id="fileformats.bfs.metadata">
      <title>Metadata files</title>

      <para>
        A BASE File Set usually contains one metadata file. This file contains
        information about the other files that make up the file set. The
        metadata file can also hold information that is specific to a
        use case.
      </para>

      <para>
        A metadata file always starts with the beginning-of-file (BOF) marker
        <code>BFSformat</code>, optionally followed by a tab and a value indicating the
        subtype of the file. This must be the first line of the file. Comments
        or empty lines are not allowed before the beginning-of-file marker.
      </para>

      <para>
        All data in a metadata file must be inside a section. A section is
        started by surrounding a value in brackets on a line of its own,
        for example, <code>[my section]</code>. There is no restriction on the name of the
        section as long as it is escaped using the normal rules. Note that
        there is no need to escape brackets in the name. For example,
        <code>[[a\\b]]</code> is a valid section with the name <code>[a\b]</code>.
        Trailing white-space after the closing bracket is ignored.
      </para>

      <para>
        Multiple sections may have the same name, and the order of the
        sections is usually of no concern. However, this may be restricted
        in specific cases if there is a need to, for example, require unique
        section names or enforce a specific order.
        Parsers are recommended to provide access to sections by name
        and by ordinal number, starting at 0, and writers are recommended
        to write sections in the order they are added.
      </para>

      <para>
        Each section contains data in the form of tab-separated
        key-value pairs. Keys may not start with <code>#</code> or <code>[</code>
        since this would interfere with comments and sections. Otherwise, the
        normal escape rules should be used for both keys and values.
        Values are allowed to use non-escaped tab characters, which makes
        it possible to use vector-type values.
      </para>

      <para>
        A key doesn't have to be unique within a section, but specific use
        cases may require this globally or on a section-by-section basis.
        The order of the keys is usually not important, unless the
        use case requires it.
        Parser implementations are recommended to provide access to
        keys by name and by ordinal number, starting at 0. Generic writer
        implementations are recommended to write keys and values in the order
        they are added to each section.
      </para>

      <para>
        If the file set includes more files than the metadata file, those
        files should be listed in the <code>[files]</code> section. Keys should be
        unique, but there are no other restrictions. The value is the file name
        without path information. The files are expected to be located in the same
        container as the current metadata file. A container could, for example, be a
        folder in the file system, a zip file, or any other logical item
        that groups files. Metadata about the files and file content is not
        part of the generic BFS specification. This is left to specific use cases.
      </para>

      <note>
        <para>
          Files don't have to be other BFS file types. They can be any type
          of file, such as PDF files, images, etc.
        </para>
      </note>

      <example id="fileformats.bfs.metadata_example">
        <title>Example BFS metadata file</title>
        <programlisting>
BFSformat	subtype
# The 'BFSformat' must be on the first line, subtype is optional
# A comment line starts with '#'. Empty lines are ignored

# A section is started by enclosing the section name in brackets
# Section entries are key/value pairs separated by tab
# Vector-type values are allowed. Duplicate keys may or may
# not be allowed depending on the use case.
[settings]
key-1	value1
key-2	value2a	value2b

# The 'files' section points to additional files in the file set
# Keys should be unique
[files]
report	report.txt
table	table-data.txt
plot	plotted-data.png
</programlisting>
      </example>

    </sect3>

    <sect3 id="fileformats.bfs.annotations">
      <title>Annotation files</title>

      <para>
        The first line is a header line containing the column names for each column.
        The first column is required and must always be <code>ID</code>. Other columns
        are optional, but must have unique names. Column names are separated with
        tabs and are encoded using the normal rules. All other lines are data lines.
        Each line must have <emphasis>exactly the same number of columns</emphasis>
        as the header line. Comment lines and empty lines are not supported, but
        a column may have an empty value.
      </para>

      <para>
        The ID column holds a unique identifier used internally by BASE. A given ID
        should only be used once and may not be repeated later in the file. The ID
        is a positive integer value. Zero, negative or empty values are not
        allowed. There is no special ordering (unless a specific use case requires this).
        Note that the ID values are not indexes. They don't have to start at 1 and
        there may be "holes" in the range of values used. Some use cases may use ID
        values with some specific meaning, other use cases may simply enumerate the
        rows using a counter.
      </para>
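
      <para>
        As a sketch, an annotation file following these rules could look like
        the listing below. The <code>Name</code> and <code>Description</code>
        columns and all values are hypothetical, and columns are separated by
        a tab character. The ID values are unique positive integers that are
        neither sorted nor consecutive. The second data line has an empty
        <code>Description</code> value, so it still ends with a tab character
        to keep the same number of columns as the header line.
      </para>

      <programlisting>
ID	Name	Description
5	reporter A	first reporter
1	reporter B	
12	reporter C	third reporter
</programlisting>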

    </sect3>

    <sect3 id="fileformats.bfs.matrixdata">
      <title>Data files</title>

      <para>
        A data file is a matrix containing one data value for each row-column
        element. Data starts on the first line. There is no header line.
        All data lines <emphasis>must have the same number of columns</emphasis>.
        The number of rows and columns and their order are defined by other,
        use-case specific, information in the metadata file or in annotation file(s).
        Comment lines and empty lines are not supported, but a column may hold an
        empty value.
      </para>
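
      <para>
        For illustration, a data file with three rows and two tab-separated
        columns could look like the listing below. The values are made up.
        Note that there is no header line and no ID column; what each row and
        column represents has to be found in the metadata file or in an
        annotation file.
      </para>

      <programlisting>
120.5	98.1
87.0	64.3
45.2	51.7
</programlisting>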

    </sect3>
  </sect2>

  <sect2 id="fileformats.bfs.spotdata">
    <title>Using BFS for spot data to and from external plug-ins</title>

    <para>
      The use case is to use BFS to transport data to and from
      an external analysis plug-in. The general outline is:
    </para>

    <orderedlist>
    <listitem>
      <para>Export bioassay set data to BFS.</para>
    </listitem>
    <listitem>
      <para>
      Execute the external plug-in, which processes the data
      and generates a new BFS.
      </para>
    </listitem>
    <listitem>
      <para>Import the transformed data to BASE.</para>
    </listitem>
    </orderedlist>

    <para>
      The export will generate at least two files: one metadata file
      and one data file. It is also possible to export reporter and
      assay annotations if the plug-in needs them. Note that reporter
      and assay annotation files are always needed if new spot data
      is going to be imported, so in most cases at least four files
      will be created.
    </para>

    <sect3 id="fileformats.bfs.spotdata.metadata">
      <title>The metadata file</title>

      <para>
        There are two subtypes:
      </para>

      <itemizedlist>
      <listitem>
        <para>
        serial: One data file is required for each assay. The columns
        in the data files represent different spot data values, e.g.
        first column = Ch 1, second column = Ch 2, etc.
        </para>
      </listitem>
      <listitem>
        <para>
        matrix: One data file is required for each spot data value. The
        columns in the data files represent assays.
        </para>
      </listitem>
      </itemizedlist>

      <para>
        For both subtypes the <code>[files]</code> section is used
        to name the files holding data and annotations. The following
        entries should be used:
      </para>

      <itemizedlist>
      <listitem>
        <para>
        rdata: The filename of the file containing reporter annotations
        </para>
      </listitem>
      <listitem>
        <para>
        pdata: The filename of the file containing assay annotations
        </para>
      </listitem>
      <listitem>
        <para>
        sdata1, sdata2, ..., sdataN: N entries, numbered from 1 to N,
        with the filenames of the files containing spot data. If the
        serial subtype is used, there should be one file for each assay
        in the bioassay set. If the matrix subtype is used, there should
        be one file for each entry in the <code>[sdata]</code> section.
        </para>
      </listitem>
      </itemizedlist>

      <para>
        Other files may be included if their keys use <code>x-</code> as a prefix.
      </para>

      <para>Example (each key is separated from its file name by a tab character):</para>

      <programlisting>
BFSformat	serial
[files]
rdata	reporters.txt
pdata	assays.txt
sdata1	Assay 1.txt
sdata2	Assay 2.txt
x-custom	custom.txt
</programlisting>

      <para>
        The <code>[sdata]</code> section contains information
        about the spot data that is found in the <code>sdataX</code>
        files. The key of each entry is the name or title of the data
        that is exported. The value describes the data type and can be
        either <code>text</code>, <code>float</code> or <code>int</code>.
      </para>

      <para>
        The order in this section is important. If the matrix
        subtype is used, the entries in this section must match the
        <code>sdataX</code> entries in the <code>[files]</code> section.
        For example, the data that corresponds to the first entry in this section
        is found in the <code>sdata1</code> file. The number of entries
        in this section must be the same as the number of <code>sdataX</code>
        entries in the <code>[files]</code> section.
      </para>

      <para>
        If the serial subtype is used, the entries in this section must
        match the column order in each of the <code>sdataX</code> files.
        For example, the data that corresponds to the first entry in this section
        is found in the first column in all <code>sdataX</code>
        files. The number of entries in this section must match the number of
        columns in the <code>sdataX</code> files.
      </para>

      <para>Example (each key is separated from its data type by a tab character):</para>

      <programlisting>
[sdata]
Ch 1	float
Ch 2	float
Weight	float
Flag	int
</programlisting>
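
      <para>
        To make the column order concrete: with the serial subtype and the
        <code>[sdata]</code> section shown above, each <code>sdataX</code>
        file would hold four tab-separated columns in the order Ch 1, Ch 2,
        Weight and Flag, with one line per row in the reporter annotation
        file. The sketch below shows what such a file could look like for
        three spots. All values are made up, and the first value on the last
        line is empty. With the matrix subtype the same section would instead
        call for four files, <code>sdata1</code> to <code>sdata4</code>, one
        for each entry.
      </para>

      <programlisting>
1250.5	980.2	0.95	0
310.0	295.7	1.0	0
	45.1	0.2	1
</programlisting>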

      <para>
        The <code>[parameters]</code> section contains extra parameters
        needed by the plug-in. Keys and values are defined by the plug-in
        and/or job configuration. Duplicate keys are not allowed, and order
        is not important. Multiple values for the same parameter are separated
        with a tab character.
      </para>

      <para>Example (the <code>vector</code> parameter has three tab-separated values):</para>

      <programlisting>
[parameters]
beta	0.5
length	100
vector	10	10.3	23
median	true
</programlisting>


    </sect3>

    <sect3 id="fileformats.bfs.spotdata.annotations">
      <title>Reporter and assay annotations</title>

      <para>
        The file used for reporter annotations is given by the <code>rdata</code>
        entry in the <code>[files]</code> section. This file is optional when exporting
        but required when importing. The only required column is the <code>ID</code>
        column, which holds the internal spot position values. All <code>sdataX</code>
        files must have the same number of rows as this file (not counting the
        header line) and data should be sorted in the same order. Additional columns may
        be included in the export.
      </para>

      <para>
        Note that the same underlying reporter may be assigned to more than one
        position. If the plug-in needs to operate on merged-per-reporter data,
        the export should include either the internal or external reporter id in
        an additional column so that the plug-in can use this information to
        determine what should be merged. The exporter has no support for exporting
        merged data.
      </para>

<para>
The file used for assay annotations is given by the <code>pdata</code>
entry in the <code>[files]</code> section. This file is optional when
exporting but required when importing. The only required
column is the <code>ID</code> column, which holds the internal bioassay id values.
If the matrix subtype is used, the columns in the <code>sdataX</code>
files must be in the same order as the assays appear in this file. The
number of columns in the data files must be the same as the number of rows
in this file (not counting the header line).
</para>

<para>
If the serial subtype is used, the <code>sdata1</code> file has data
for the assay that is described in the first line in this file, the
<code>sdata2</code> file has data for the second assay, etc. The number
of data files must match the number of lines in this file (again not
counting the header line).
</para>
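<para>
As an illustration of the serial subtype, a <code>[files]</code> section
for two assays might look like the sketch below. The file names are
hypothetical, and the key and file name are assumed to be separated by a
tab character. With these entries, the assay annotations file would need
exactly two lines after the header: the first describing the assay in
<code>sdata1</code> and the second the assay in <code>sdata2</code>.
</para>

<programlisting>
[files]
rdata	rdata.txt
pdata	pdata.txt
sdata1	sdata1.txt
sdata2	sdata2.txt
</programlisting>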

</sect3>
<sect3 id="fileformats.bfs.spotdata.data">
<title>Data files</title>

<para>
Data files contain data in matrix format. More than one data file may be
required. The organisation of the data depends on the BFS subtype. In
both subtypes the number and order of the rows must match the number
and order of rows in the reporter annotations file.
</para>

<para>
If the matrix subtype is used, the columns in the data files correspond
to assays. The number of columns and their order must match the lines
in the assay annotations file. The number of data files and their content
are defined by the entries in the <code>[sdata]</code> section.
</para>

<para>
If the serial subtype is used, the number of columns and their order
must match the entries in the <code>[sdata]</code> section. Each data
file has data from one assay. The number of <code>sdataX</code> files in the
<code>[files]</code> section must match the number of lines in the
assay annotations file.
</para>

</sect3>

<sect3 id="fileformats.bfs.spotdata.import">
<title>Importing spot data</title>

<para>
The above information is mostly true for both export and import, but
there are a few additional things that a plug-in should know about when
generating data that is going to be imported. The most important
thing is that both reporter and assay annotation files are required
for importing spot data. If the program only generates extra files,
the <code>[sdata]</code> section should not be included and no
data or annotation files are needed.
All files are specified in the <code>[files]</code> section in the
same way as for the export. File entries starting with <code>x-</code>
will be uploaded to BASE and linked with the new bioassay set.
</para>
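<para>
As a rough sketch, a plug-in that only produces an extra file (and
therefore omits the <code>[sdata]</code> section and all data and
annotation files) might list it as shown below. The entry name
<code>x-plot</code> and the file name are hypothetical; only the
<code>x-</code> prefix comes from the description above. The key and
file name are assumed to be separated by a tab character.
</para>

<programlisting>
[files]
x-plot	plot.png
</programlisting>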

<note>
<para>
The importer currently supports importing spot data intensity
values and extra files. Position/reporter mapping and child/parent
assay mapping may remain the same or they may be changed. The importer can
also upload additional files generated by the plug-in, for
example plots. The importer has no support for importing
extra values, reporter lists or annotations.
</para>
</note>

<para>
In the metadata file, a <code>[settings]</code> section may be included
to control certain aspects of the import. The following entries can be
used:
</para>

<itemizedlist>
<listitem>
<para>
<code>new-data-cube</code>: If this is set, the data is imported into a new
data cube. A new data cube is needed whenever the position/reporter
mapping has changed or when parent assays have been merged. This
setting requires that the reporter annotations file contains
information about the new mapping. It needs to include either an
<code>Internal ID</code> or an <code>External ID</code> column so
that the importer can map the new position to the correct reporter.
The reporter must already exist in the database. The position values
have no relation to the position values in the old bioassay set. We
recommend that a plug-in simply enumerates the lines starting at
1.
</para>
</listitem>

<listitem>
<para>
<code>multi-assay-parents</code>: If this is set, a child assay may have
more than one parent assay (for example, due to a merge). A new data
cube is needed and this setting is ignored unless
<code>new-data-cube</code> is also set. This setting requires that the
assay annotations file has a <code>Parent ID</code> column which
holds a comma-separated list with the IDs of the parent assays.
</para>
</listitem>

<listitem>
<para>
<code>transform</code>: If not specified, the child spot data is
assumed to use the same intensity transform as the parent data. To force
a specific intensity transform for the child bioassay set,
include this setting and choose one of the values:
<code>none</code>, <code>log2</code>, <code>log10</code>.
</para>
</listitem>

</itemizedlist>
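<para>
As a minimal sketch, a <code>[settings]</code> section that requests a
new data cube, allows merged parent assays and forces a log2 transform
might look like the listing below. The value <code>1</code> for the two
flag entries is an assumption made for illustration, since the
description above only says that the entries must be set. Keys and values
are assumed to be separated by a tab character.
</para>

<programlisting>
[settings]
new-data-cube	1
multi-assay-parents	1
transform	log2
</programlisting>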

<para>
In the metadata file, the presence of an <code>[sdata]</code> section
indicates that spot data should be imported. If this section is not
present, only extra files are uploaded to BASE and they are attached to
the transformation instead of a child bioassay set. If the <code>[sdata]</code>
section is present, it must include one entry for each channel, with names like
<code>Ch 1</code>, <code>Ch 2</code>, and so on. The value is always
<code>float</code>. All other entries in this section are ignored.
</para>
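<para>
For example, a two-channel import could declare its spot data as in the
sketch below. The key and value are assumed to be separated by a tab
character.
</para>

<programlisting>
[sdata]
Ch 1	float
Ch 2	float
</programlisting>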

<para>
In the reporter annotations file, the <code>ID</code> column should hold
the position values. Values must be positive integers and
duplicates are not allowed. The order of the values doesn't
matter. If importing data to a new data cube, the reporter annotations
file also needs either an <code>Internal ID</code> or an <code>External ID</code>
column.
</para>
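<para>
A reporter annotations file for import into a new data cube might then
look like the sketch below. The reporter ids are hypothetical and the
columns are assumed to be separated by a tab character. The positions
are simply enumerated from 1, and the same reporter appears at two
positions.
</para>

<programlisting>
ID	External ID
1	REP-A
2	REP-B
3	REP-A
</programlisting>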

<para>
In the assay annotations file, the <code>ID</code> column usually holds the
internal assay id of the parent assay. The exception is if the
<code>multi-assay-parents</code> option has been enabled. In this
case the id values have no special meaning, but the <code>Parent ID</code>
column must have a comma-separated list with the parent assay id values instead.
</para>
<para>
The assay annotations file may optionally have a <code>Name</code> column.
If present, the values in this column are used as names for the child assays.
Otherwise, they are given default names (usually the same name as the
parent assay).
</para>
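<para>
As an illustration, an assay annotations file used together with
<code>multi-assay-parents</code> might look like the sketch below. All id
values and names are hypothetical, and the columns are assumed to be
separated by a tab character. Each child assay lists the ids of the
parent assays it was merged from.
</para>

<programlisting>
ID	Name	Parent ID
1	Merged A	101,102
2	Merged B	103,104
</programlisting>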

</sect3>
</sect2>

</sect1>
