svndigest - svndigest

doc/src/docbook/appendix/bfs.xml

Rev	Date	Author	Line
5244	12 Feb 10	nicklas	1	<?xml version="1.0" encoding="UTF-8"?>
5244	12 Feb 10	nicklas	2	<!DOCTYPE appendix PUBLIC
5244	12 Feb 10	nicklas	3	"-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
5244	12 Feb 10	nicklas	4	"../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
5244	12 Feb 10	nicklas	5	<!--
5244	12 Feb 10	nicklas	6	$Id $
5244	12 Feb 10	nicklas	7
5244	12 Feb 10	nicklas	8	Copyright (C) 2010 Nicklas Nordborg
5244	12 Feb 10	nicklas	9
5244	12 Feb 10	nicklas	10	This file is part of BASE - BioArray Software Environment.
5244	12 Feb 10	nicklas	11	Available at http://base.thep.lu.se/
5244	12 Feb 10	nicklas	12
5244	12 Feb 10	nicklas	13	BASE is free software; you can redistribute it and/or
5244	12 Feb 10	nicklas	14	modify it under the terms of the GNU General Public License
5244	12 Feb 10	nicklas	15	as published by the Free Software Foundation; either version 3
5244	12 Feb 10	nicklas	16	of the License, or (at your option) any later version.
5244	12 Feb 10	nicklas	17
5244	12 Feb 10	nicklas	18	BASE is distributed in the hope that it will be useful,
5244	12 Feb 10	nicklas	19	but WITHOUT ANY WARRANTY; without even the implied warranty of
5244	12 Feb 10	nicklas	20	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
5244	12 Feb 10	nicklas	21	GNU General Public License for more details.
5244	12 Feb 10	nicklas	22
5244	12 Feb 10	nicklas	23	You should have received a copy of the GNU General Public License
5244	12 Feb 10	nicklas	24	along with BASE. If not, see <http://www.gnu.org/licenses/>.
5244	12 Feb 10	nicklas	25	-->
5244	12 Feb 10	nicklas	26	<sect1 id="appendix.fileformats.bfs" >
5782	04 Oct 11	nicklas	27	<?dbhtml filename="bfs.html" ?>
5244	12 Feb 10	nicklas	28	<title>The BFS (BASE File Set) format</title>
5244	12 Feb 10	nicklas	29
5244	12 Feb 10	nicklas	30	<para>
5244	12 Feb 10	nicklas	31	The BASE File Set (BFS) format is a collection of file formats that can be used
5244	12 Feb 10	nicklas	32	together to transport all kinds of data. The major use is to send spot data
5245	15 Feb 10	nicklas	33	to a plug-in for analysis and then to import the analyzed results. We
5244	12 Feb 10	nicklas	34	have tried to keep the format generic and extendable so it is not unlikely
5244	12 Feb 10	nicklas	35	that the BFS format can be used for other applications in the future.
5244	12 Feb 10	nicklas	36	</para>
5244	12 Feb 10	nicklas	37
5244	12 Feb 10	nicklas	38	<sect2 id="fileformats.bfs.basics" >
5244	12 Feb 10	nicklas	39	<title>The basics of BFS</title>
5244	12 Feb 10	nicklas	40
5244	12 Feb 10	nicklas	41	<para>
5244	12 Feb 10	nicklas	42	The idea is to use simple, plain-text files with data organised into
5245	15 Feb 10	nicklas	43	rows and columns. A single type of file may not be able to hold all
5244	12 Feb 10	nicklas	44	kinds of data, so to begin with we have defined three types of files:
5244	12 Feb 10	nicklas	45	</para>
5244	12 Feb 10	nicklas	46
5244	12 Feb 10	nicklas	47	<itemizedlist>
5244	12 Feb 10	nicklas	48	<listitem>
5244	12 Feb 10	nicklas	49	<para>
5244	12 Feb 10	nicklas	50	Metadata files: Hold information about the data that is found
5244	12 Feb 10	nicklas	51	in the other files in the file set.
5244	12 Feb 10	nicklas	52	</para>
5244	12 Feb 10	nicklas	53	</listitem>
5244	12 Feb 10	nicklas	54	<listitem>
5244	12 Feb 10	nicklas	55	<para>
5244	12 Feb 10	nicklas	56	Annotation files: Column-based files that hold one record per
5244	12 Feb 10	nicklas	57	line. The first line is a header line. The remaining lines are
5244	12 Feb 10	nicklas	58	data lines identified by a unique positive ID value in
5244	12 Feb 10	nicklas	59	the first column.
5244	12 Feb 10	nicklas	60	</para>
5244	12 Feb 10	nicklas	61	</listitem>
5244	12 Feb 10	nicklas	62	<listitem>
5244	12 Feb 10	nicklas	63	<para>
5244	12 Feb 10	nicklas	64	Data files: Pure matrix data files without header lines or
5244	12 Feb 10	nicklas	65	ID columns. Data is usually identified by matching it line-by-line
5244	12 Feb 10	nicklas	66	with data in annotation files, or with information in the metadata file.
5244	12 Feb 10	nicklas	67	</para>
5244	12 Feb 10	nicklas	68	</listitem>
5244	12 Feb 10	nicklas	69	</itemizedlist>
5244	12 Feb 10	nicklas	70
5244	12 Feb 10	nicklas	71	<sect3 id="fileformats.bfs.encoding">
5244	12 Feb 10	nicklas	72	<title>Character encoding</title>
5244	12 Feb 10	nicklas	73
5244	12 Feb 10	nicklas	74	<para>
5244	12 Feb 10	nicklas	75	All files are text-based and should use the UTF-8 character encoding.
5244	12 Feb 10	nicklas	76	A newline (<code>\n</code>) is used as a record separator and a tab
5244	12 Feb 10	nicklas	77	(<code>\t</code>) is used as a column separator. Data that contains newline
5244	12 Feb 10	nicklas	78	or tab characters needs to be escaped. A backslash (<code>\</code>) is used
5244	12 Feb 10	nicklas	79	to indicate the start of an escaped sequence. This means that the backslash
5244	12 Feb 10	nicklas	80	character must also be escaped. Since some editors include a carriage
5245	15 Feb 10	nicklas	81	return (<code>\r</code>) in line breaks, we should also escape
5244	12 Feb 10	nicklas	82	carriage return.
5244	12 Feb 10	nicklas	83	</para>
5244	12 Feb 10	nicklas	84
5244	12 Feb 10	nicklas	85	<table frame="all" id="fileformats.bfs.escape_table">
5244	12 Feb 10	nicklas	86	<title>Escaped characters in the BFS format</title>
5244	12 Feb 10	nicklas	87	<tgroup cols="2" align="center">
5244	12 Feb 10	nicklas	88	<colspec colname="character" />
5244	12 Feb 10	nicklas	89	<colspec colname="escape" />
5244	12 Feb 10	nicklas	90	<thead>
5244	12 Feb 10	nicklas	91	<row>
5244	12 Feb 10	nicklas	92	<entry>Character</entry>
5244	12 Feb 10	nicklas	93	<entry>Escape sequence</entry>
5244	12 Feb 10	nicklas	94	</row>
5244	12 Feb 10	nicklas	95	</thead>
5244	12 Feb 10	nicklas	96	<tbody>
5244	12 Feb 10	nicklas	97	<row>
5244	12 Feb 10	nicklas	98	<entry><code>&lt;backslash&gt;</code></entry>
5244	12 Feb 10	nicklas	99	<entry><code>\\</code></entry>
5244	12 Feb 10	nicklas	100	</row>
5244	12 Feb 10	nicklas	101	<row>
5244	12 Feb 10	nicklas	102	<entry><code>&lt;newline&gt;</code></entry>
5244	12 Feb 10	nicklas	103	<entry><code>\n</code></entry>
5244	12 Feb 10	nicklas	104	</row>
5244	12 Feb 10	nicklas	105	<row>
5244	12 Feb 10	nicklas	106	<entry><code>&lt;carriage return&gt;</code></entry>
5244	12 Feb 10	nicklas	107	<entry><code>\r</code></entry>
5244	12 Feb 10	nicklas	108	</row>
5244	12 Feb 10	nicklas	109	<row>
5244	12 Feb 10	nicklas	110	<entry><code>&lt;tab&gt;</code></entry>
5244	12 Feb 10	nicklas	111	<entry><code>\t</code></entry>
5244	12 Feb 10	nicklas	112	</row>
5244	12 Feb 10	nicklas	113	</tbody>
5244	12 Feb 10	nicklas	114	</tgroup>
5244	12 Feb 10	nicklas	115	</table>
5244	12 Feb 10	nicklas	116
5244	12 Feb 10	nicklas	117	<para>
5244	12 Feb 10	nicklas	118	It is recommended that parsers are forgiving and if an invalid escape
5244	12 Feb 10	nicklas	119	sequence is found, e.g. a backslash followed by anything other than
5244	12 Feb 10	nicklas	120	<code>\</code>, <code>n</code>, <code>r</code> or <code>t</code>,
5244	12 Feb 10	nicklas	121	the input is taken literally. Strict parsers may throw
5244	12 Feb 10	nicklas	122	exceptions or log warning messages.
5244	12 Feb 10	nicklas	123	</para>
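<para>
As a purely illustrative sketch (the value is made up for this
example), a field holding the text <code>C:\temp</code>, a line break and
then the word <code>done</code> would be written on a single line as shown
below. A forgiving parser that meets an unknown sequence such as
<code>\x</code> would keep it as the two literal characters.
</para>
<programlisting>
C:\\temp\ndone
</programlisting>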
5244	12 Feb 10	nicklas	124	</sect3>
5244	12 Feb 10	nicklas	125
5244	12 Feb 10	nicklas	126	<sect3 id="fileformats.bfs.numbers">
5244	12 Feb 10	nicklas	127	<title>Numerical values</title>
5244	12 Feb 10	nicklas	128	<para>
5244	12 Feb 10	nicklas	129	Numeric values should use dot (<code>.</code>) as decimal point. Scientific
5244	12 Feb 10	nicklas	130	notation is accepted. Null, NaN, Infinity, and other special values should all
5244	12 Feb 10	nicklas	131	be represented by empty string values. It is recommended that parsers
5245	15 Feb 10	nicklas	132	are forgiving and treat invalid numerical data as empty values.
5244	12 Feb 10	nicklas	133	</para>
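<para>
For illustration only, the hypothetical data line below holds three
numeric columns, with <code>&lt;tab&gt;</code> standing in for a single tab
character. The second column is empty, which is how a null, NaN or
infinite value would be written, and the third uses scientific notation.
</para>
<programlisting>
0.75&lt;tab&gt;&lt;tab&gt;2.5e-3
</programlisting>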
5244	12 Feb 10	nicklas	134	</sect3>
5244	12 Feb 10	nicklas	135
5244	12 Feb 10	nicklas	136	<sect3 id="fileformats.bfs.comments">
5244	12 Feb 10	nicklas	137	<title>Comments and white-space</title>
5244	12 Feb 10	nicklas	138	<para>
5244	12 Feb 10	nicklas	139	Lines starting with <code>#</code> are comment lines and should be ignored. Empty
5244	12 Feb 10	nicklas	140	lines should also be ignored. A line that contains only white-space is
5245	15 Feb 10	nicklas	141	considered as empty. White-space means spaces, tabs and other characters that
5244	12 Feb 10	nicklas	142	match <code>\s</code> in regular expressions.
5244	12 Feb 10	nicklas	143	</para>
5244	12 Feb 10	nicklas	144
5244	12 Feb 10	nicklas	145	<note>
5244	12 Feb 10	nicklas	146	<para>
5244	12 Feb 10	nicklas	147	Comments and empty lines can only be used in metadata files.
5244	12 Feb 10	nicklas	148	Annotation files and data files don't allow
5244	12 Feb 10	nicklas	149	comments or empty lines.
5244	12 Feb 10	nicklas	150	</para>
5244	12 Feb 10	nicklas	151	</note>
5244	12 Feb 10	nicklas	152	</sect3>
5244	12 Feb 10	nicklas	153
5244	12 Feb 10	nicklas	154	<sect3 id="fileformats.bfs.metadata">
5244	12 Feb 10	nicklas	155	<title>Metadata files</title>
5244	12 Feb 10	nicklas	156
5244	12 Feb 10	nicklas	157	<para>
5244	12 Feb 10	nicklas	158	A BASE File Set usually contains one metadata file. This file contains
5244	12 Feb 10	nicklas	159	information about the other files that make up the file set. The
5245	15 Feb 10	nicklas	160	metadata file can also hold information that is specific to a
5244	12 Feb 10	nicklas	161	use case.
5244	12 Feb 10	nicklas	162	</para>
5244	12 Feb 10	nicklas	163
5244	12 Feb 10	nicklas	164	<para>
5244	12 Feb 10	nicklas	165	A metadata file always starts with the beginning-of-file (BOF) marker
5244	12 Feb 10	nicklas	166	<code>BFSformat</code>, optionally followed by a tab and a value indicating the
5244	12 Feb 10	nicklas	167	subtype of the file. This must be the first line of the file. Comments
5244	12 Feb 10	nicklas	168	or empty lines are not allowed before the beginning-of-file marker.
5244	12 Feb 10	nicklas	169	</para>
5244	12 Feb 10	nicklas	170
5244	12 Feb 10	nicklas	171	<para>
5244	12 Feb 10	nicklas	172	All data in a metadata file must be inside a section. A section is
5244	12 Feb 10	nicklas	173	started by surrounding a value in brackets on a line of its own,
5244	12 Feb 10	nicklas	174	for example, <code>[my section]</code>. There is no restriction on the name of the
5244	12 Feb 10	nicklas	175	section as long as it is escaped using the normal rules. Note that
5244	12 Feb 10	nicklas	176	there is no need to escape brackets in the name. For example,
5244	12 Feb 10	nicklas	177	<code>[[a\\b]]</code> is a valid section with the name <code>[a\b]</code>.
5244	12 Feb 10	nicklas	178	Trailing white-space after the closing bracket is ignored.
5244	12 Feb 10	nicklas	179	</para>
5244	12 Feb 10	nicklas	180
5244	12 Feb 10	nicklas	181	<para>
5244	12 Feb 10	nicklas	182	Multiple sections may have the same name, and the order of the
5244	12 Feb 10	nicklas	183	sections is usually of no concern. However, this may be restricted
5244	12 Feb 10	nicklas	184	in specific cases if there is need to, for example, require unique
5244	12 Feb 10	nicklas	185	section names or enforce a specific order.
5244	12 Feb 10	nicklas	186	Parsers are recommended to provide access to sections by name
5244	12 Feb 10	nicklas	187	and by ordinal number, starting at 0, and writers are recommended
5244	12 Feb 10	nicklas	188	to write sections in the order they are added.
5244	12 Feb 10	nicklas	189	</para>
5244	12 Feb 10	nicklas	190
5244	12 Feb 10	nicklas	191	<para>
5244	12 Feb 10	nicklas	192	Each section contains data in the form of tab-separated
5244	12 Feb 10	nicklas	193	key-value pairs. Keys may not start with <code>#</code> or <code>[</code>
5244	12 Feb 10	nicklas	194	since this would interfere with comments and sections. Otherwise, the
5244	12 Feb 10	nicklas	195	normal escape rules should be used for both keys and values.
5244	12 Feb 10	nicklas	196	Values are allowed to use non-escaped tab characters, which makes
5244	12 Feb 10	nicklas	197	it possible to use vector-type values.
5244	12 Feb 10	nicklas	198	</para>
5244	12 Feb 10	nicklas	199
5244	12 Feb 10	nicklas	200	<para>
5244	12 Feb 10	nicklas	201	A key doesn't have to be unique within a section, but specific use
5244	12 Feb 10	nicklas	202	cases may require this globally or on a section-by-section basis.
5244	12 Feb 10	nicklas	203	The order of the keys is usually not important, except if the
5244	12 Feb 10	nicklas	204	use case requires it.
5244	12 Feb 10	nicklas	205	Parser implementations are recommended to provide access to
5244	12 Feb 10	nicklas	206	keys by name and by ordinal number, starting at 0. Generic writer
5244	12 Feb 10	nicklas	207	implementations are recommended to write keys and values in the order
5244	12 Feb 10	nicklas	208	they are added to each section.
5244	12 Feb 10	nicklas	209	</para>
5244	12 Feb 10	nicklas	210
5244	12 Feb 10	nicklas	211	<para>
5244	12 Feb 10	nicklas	212	If the file set includes more files than the metadata file, those
5244	12 Feb 10	nicklas	213	files should be listed in the <code>[files]</code> section. Keys should be
5245	15 Feb 10	nicklas	214	unique, but there are no other restrictions. The value is the file name
5245	15 Feb 10	nicklas	215	without path information. The files are expected to be located in the same
5245	15 Feb 10	nicklas	216	container as the current metadata file. A container could for example be a
5244	12 Feb 10	nicklas	217	folder in the file system, a zip-file, or any other logical item
5244	12 Feb 10	nicklas	218	that groups files. Metadata about the files and file content is not
5244	12 Feb 10	nicklas	219	part of the generic BFS specification. This is left to specific use cases.
5244	12 Feb 10	nicklas	220	</para>
5244	12 Feb 10	nicklas	221
5244	12 Feb 10	nicklas	222	<note>
5244	12 Feb 10	nicklas	223	<para>The files don't have to be other BFS file types. They can be any type
5244	12 Feb 10	nicklas	224	of file, such as PDF files or images.</para>
5244	12 Feb 10	nicklas	225	</note>
5244	12 Feb 10	nicklas	226
5244	12 Feb 10	nicklas	227	<example id="fileformats.bfs.metadata_example">
5244	12 Feb 10	nicklas	228	<title>Example BFS metadata file</title>
5244	12 Feb 10	nicklas	229	<programlisting>
5245	15 Feb 10	nicklas	230	BFSformat subtype
5245	15 Feb 10	nicklas	231	# The 'BFSformat' must be on the first line, subtype is optional
5244	12 Feb 10	nicklas	232	# A comment line starts with '#'. Empty lines are ignored
5244	12 Feb 10	nicklas	233
5244	12 Feb 10	nicklas	234	# A section is started by enclosing the section name in brackets
5244	12 Feb 10	nicklas	235	# Section entries are key/value pairs separated by tab
5244	12 Feb 10	nicklas	236	# Vector-type values are allowed. Duplicate keys may or may
5244	12 Feb 10	nicklas	237	# not be allowed depending on the use case.
5244	12 Feb 10	nicklas	238	[settings]
5244	12 Feb 10	nicklas	239	key-1 value1
5244	12 Feb 10	nicklas	240	key-2 value2a value2b
5244	12 Feb 10	nicklas	241
5244	12 Feb 10	nicklas	242	# The 'files' section points to additional files in the file set
5244	12 Feb 10	nicklas	243	# Keys should be unique
5244	12 Feb 10	nicklas	244	[files]
5245	15 Feb 10	nicklas	245	report report.txt
5244	12 Feb 10	nicklas	246	table table-data.txt
5244	12 Feb 10	nicklas	247	plot plotted-data.png
5244	12 Feb 10	nicklas	248	</programlisting>
5244	12 Feb 10	nicklas	249	</example>
5244	12 Feb 10	nicklas	250
5244	12 Feb 10	nicklas	251	</sect3>
5244	12 Feb 10	nicklas	252
5244	12 Feb 10	nicklas	253	<sect3 id="fileformats.bfs.annotations">
5244	12 Feb 10	nicklas	254	<title>Annotation files</title>
5244	12 Feb 10	nicklas	255
5244	12 Feb 10	nicklas	256	<para>
5244	12 Feb 10	nicklas	257	The first line is a header line containing the column names for each column.
5245	15 Feb 10	nicklas	258	The first column is required and must always be <code>ID</code>. Other columns
5244	12 Feb 10	nicklas	259	are optional, but must have unique names. Column names are separated with
5244	12 Feb 10	nicklas	260	tabs and are encoded using the normal rules. All other lines are data lines.
5244	12 Feb 10	nicklas	261	Each line must have <emphasis>exactly the same number of columns</emphasis>
5244	12 Feb 10	nicklas	262	as the header line. Comment lines and empty lines are not supported, but
5244	12 Feb 10	nicklas	263	a column may have an empty value.
5244	12 Feb 10	nicklas	264	</para>
5244	12 Feb 10	nicklas	265
5244	12 Feb 10	nicklas	266	<para>
5244	12 Feb 10	nicklas	267	The ID column holds a unique identifier used internally by BASE. A given ID
5244	12 Feb 10	nicklas	268	should only be used once and may not be repeated later in the file. The ID
5244	12 Feb 10	nicklas	269	is a numeric positive integer value. Zero, negative or empty values are not
5244	12 Feb 10	nicklas	270	allowed. There is no special ordering (unless a specific use-case requires this).
5244	12 Feb 10	nicklas	271	Note that the ID values are not indexes. They don't have to start at 1 and
5244	12 Feb 10	nicklas	272	there may be "holes" in the range of values used. Some use-cases may use ID
5244	12 Feb 10	nicklas	273	values with some specific meaning, other use-cases may simply enumerate the
5244	12 Feb 10	nicklas	274	rows using a counter.
5244	12 Feb 10	nicklas	275	</para>
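<para>
A minimal sketch of an annotation file is shown below. The
<code>Name</code> and <code>Type</code> columns and all values are
hypothetical and only serve to illustrate the layout; columns are
separated by tab characters. The ID values neither start at 1 nor form
a continuous range.
</para>
<programlisting>
ID	Name	Type
3	Reporter A	control
7	Reporter B	sample
12	Reporter C	sample
</programlisting>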
5244	12 Feb 10	nicklas	276
5244	12 Feb 10	nicklas	277	</sect3>
5244	12 Feb 10	nicklas	278
5244	12 Feb 10	nicklas	279	<sect3 id="fileformats.bfs.matrixdata">
5245	15 Feb 10	nicklas	280	<title>Data files</title>
5244	12 Feb 10	nicklas	281
5244	12 Feb 10	nicklas	282	<para>
5244	12 Feb 10	nicklas	283	A data file is a matrix containing one data value for each row-column
5244	12 Feb 10	nicklas	284	element. Data starts on the first line. There is no header line.
5244	12 Feb 10	nicklas	285	All data lines <emphasis>must have the same number of columns</emphasis>.
5244	12 Feb 10	nicklas	286	The number of rows and columns and their order are defined by other,
5244	12 Feb 10	nicklas	287	use-case specific, information in the metadata file or in annotation file(s).
5244	12 Feb 10	nicklas	288	Comment lines and empty lines are not supported, but a column may hold an
5244	12 Feb 10	nicklas	289	empty value.
5244	12 Feb 10	nicklas	290	</para>
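<para>
As an illustration with made-up values, a data file with three rows and
two tab-separated columns could look as follows. What the rows and columns
mean is not stated in the file itself; it follows from the metadata or
annotation files.
</para>
<programlisting>
1.25	0.98
2.50	1.02
0.75	3.10
</programlisting>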
5244	12 Feb 10	nicklas	291
5244	12 Feb 10	nicklas	292	</sect3>
5244	12 Feb 10	nicklas	293	</sect2>
5244	12 Feb 10	nicklas	294
5244	12 Feb 10	nicklas	295	<sect2 id="fileformats.bfs.spotdata">
5245	15 Feb 10	nicklas	296	<title>Using BFS to transport spot data to and from external plug-ins</title>
5244	12 Feb 10	nicklas	297
5245	15 Feb 10	nicklas	298	<para>
5245	15 Feb 10	nicklas	299	The use case is to use BFS to transport data to and from
5245	15 Feb 10	nicklas	300	an external analysis plug-in. The general outline is:
5245	15 Feb 10	nicklas	301	</para>
5244	12 Feb 10	nicklas	302
5245	15 Feb 10	nicklas	303	<orderedlist>
5245	15 Feb 10	nicklas	304	<listitem>
5245	15 Feb 10	nicklas	305	<para>Export bioassay set data to BFS.</para>
5245	15 Feb 10	nicklas	306	</listitem>
5245	15 Feb 10	nicklas	307	<listitem>
5245	15 Feb 10	nicklas	308	<para>
5245	15 Feb 10	nicklas	309	Execute the external plug-in, which processes the data
5245	15 Feb 10	nicklas	310	and generates a new BFS.
5245	15 Feb 10	nicklas	311	</para>
5245	15 Feb 10	nicklas	312	</listitem>
5245	15 Feb 10	nicklas	313	<listitem>
5245	15 Feb 10	nicklas	314	<para>Import the transformed data to BASE.</para>
5245	15 Feb 10	nicklas	315	</listitem>
5245	15 Feb 10	nicklas	316	</orderedlist>
5244	12 Feb 10	nicklas	317
5245	15 Feb 10	nicklas	318	<para>
5245	15 Feb 10	nicklas	319	The export will generate at least two files: one metadata file
5245	15 Feb 10	nicklas	320	and one data file. It is also possible to export reporter and
5245	15 Feb 10	nicklas	321	assay annotations if the plug-in needs them. Note that reporter
5245	15 Feb 10	nicklas	322	and assay annotation files are always needed if new spot data
5245	15 Feb 10	nicklas	323	is going to be imported, so in most cases at least four files
5245	15 Feb 10	nicklas	324	will be created.
5245	15 Feb 10	nicklas	325	</para>
5245	15 Feb 10	nicklas	326
5245	15 Feb 10	nicklas	327	<sect3 id="fileformats.bfs.spotdata.metadata">
5245	15 Feb 10	nicklas	328	<title>The metadata file</title>
5245	15 Feb 10	nicklas	329
5245	15 Feb 10	nicklas	330	<para>
5245	15 Feb 10	nicklas	331	There are two subtypes:
5245	15 Feb 10	nicklas	332	</para>
5245	15 Feb 10	nicklas	333
5245	15 Feb 10	nicklas	334	<itemizedlist>
5245	15 Feb 10	nicklas	335	<listitem>
5245	15 Feb 10	nicklas	336	<para>
5245	15 Feb 10	nicklas	337	serial: One data file is required for each assay. The columns
5245	15 Feb 10	nicklas	338	in the data files represent different spot data values, e.g.
5245	15 Feb 10	nicklas	339	first column = Ch 1, second column = Ch 2, etc.
5245	15 Feb 10	nicklas	340	</para>
5245	15 Feb 10	nicklas	341	</listitem>
5245	15 Feb 10	nicklas	342	<listitem>
5245	15 Feb 10	nicklas	343	<para>
5245	15 Feb 10	nicklas	344	matrix: One data file is required for each spot data value. The
5245	15 Feb 10	nicklas	345	columns in the data files represent assays.
5245	15 Feb 10	nicklas	346	</para>
5245	15 Feb 10	nicklas	347	</listitem>
5245	15 Feb 10	nicklas	348	</itemizedlist>
5245	15 Feb 10	nicklas	349
5245	15 Feb 10	nicklas	350	<para>
5245	15 Feb 10	nicklas	351	For both subtypes the <code>[files]</code> section is used
5245	15 Feb 10	nicklas	352	to name the files holding data and annotations. The following
5245	15 Feb 10	nicklas	353	entries should be used:
5245	15 Feb 10	nicklas	354	</para>
5245	15 Feb 10	nicklas	355
5245	15 Feb 10	nicklas	356	<itemizedlist>
5245	15 Feb 10	nicklas	357	<listitem>
5245	15 Feb 10	nicklas	358	<para>
5245	15 Feb 10	nicklas	359	rdata: The filename of the file containing reporter annotations
5245	15 Feb 10	nicklas	360	</para>
5245	15 Feb 10	nicklas	361	</listitem>
5245	15 Feb 10	nicklas	362	<listitem>
5245	15 Feb 10	nicklas	363	<para>
5245	15 Feb 10	nicklas	364	pdata: The filename of the file containing assay annotations
5245	15 Feb 10	nicklas	365	</para>
5245	15 Feb 10	nicklas	366	</listitem>
5245	15 Feb 10	nicklas	367	<listitem>
5245	15 Feb 10	nicklas	368	<para>
5245	15 Feb 10	nicklas	369	sdata1, sdata2, ..., sdataN: N entries, numbered from 1 to N,
5245	15 Feb 10	nicklas	370	with the filenames of the files containing spot data. If the
5245	15 Feb 10	nicklas	371	serial subtype is used there should be one file for each assay
5245	15 Feb 10	nicklas	372	in the bioassayset. If the matrix subtype is used, there should
5245	15 Feb 10	nicklas	373	be one file for each entry in the <code>[sdata]</code> section.
5245	15 Feb 10	nicklas	374	</para>
5245	15 Feb 10	nicklas	375	</listitem>
5245	15 Feb 10	nicklas	376	</itemizedlist>
5245	15 Feb 10	nicklas	377
5245	15 Feb 10	nicklas	378	<para>
5245	15 Feb 10	nicklas	379	Other files may be included if they use <code>x-</code> as a prefix.
5245	15 Feb 10	nicklas	380	</para>
5245	15 Feb 10	nicklas	381
5245	15 Feb 10	nicklas	382	<para>Example:</para>
5245	15 Feb 10	nicklas	383
5245	15 Feb 10	nicklas	384	<programlisting>
5245	15 Feb 10	nicklas	385	BFSformat serial
5245	15 Feb 10	nicklas	386	[files]
5245	15 Feb 10	nicklas	387	rdata reporters.txt
5245	15 Feb 10	nicklas	388	pdata assays.txt
5245	15 Feb 10	nicklas	389	sdata1 Assay 1.txt
5245	15 Feb 10	nicklas	390	sdata2 Assay 2.txt
5245	15 Feb 10	nicklas	391	x-custom custom.txt
5245	15 Feb 10	nicklas	392	</programlisting>
5245	15 Feb 10	nicklas	393
5245	15 Feb 10	nicklas	394	<para>
5245	15 Feb 10	nicklas	395	The <code>[sdata]</code> section contains information
5245	15 Feb 10	nicklas	396	about the spot data that is found in the <code>sdataX</code>
5245	15 Feb 10	nicklas	397	files. The key of each entry is the name or title of the data
5245	15 Feb 10	nicklas	398	that is exported. The value describes the data type and can be
5245	15 Feb 10	nicklas	399	either <code>text</code>, <code>float</code> or <code>int</code>.
5245	15 Feb 10	nicklas	400	</para>
5245	15 Feb 10	nicklas	401
5245	15 Feb 10	nicklas	402	<para>
5245	15 Feb 10	nicklas	403	The order in this section is important. If the matrix
5245	15 Feb 10	nicklas	404	subtype is used, the entries in this section must match the
5245	15 Feb 10	nicklas	405	<code>sdataX</code> entries in the <code>[files]</code> section.
5245	15 Feb 10	nicklas	406	For example, the data that corresponds to the first entry in this section
5245	15 Feb 10	nicklas	407	is found in the <code>sdata1</code> file. The number of entries
5245	15 Feb 10	nicklas	408	in this section must be the same as the number of <code>sdataX</code>
5245	15 Feb 10	nicklas	409	entries in the <code>[files]</code> section.
5245	15 Feb 10	nicklas	410	</para>
5245	15 Feb 10	nicklas	411
5245	15 Feb 10	nicklas	412	<para>
5245	15 Feb 10	nicklas	413	If the serial subtype is used the entries in this section must
5245	15 Feb 10	nicklas	414	match the column order in each of the <code>sdataX</code> files.
5245	15 Feb 10	nicklas	415	For example, the data that corresponds to the first entry in this section
5245	15 Feb 10	nicklas	416	is found in the first column in all <code>sdataX</code>
5245	15 Feb 10	nicklas	417	files. The number of entries in this section must match the number of
5245	15 Feb 10	nicklas	418	columns in the <code>sdataX</code> files.
5245	15 Feb 10	nicklas	419	</para>
5245	15 Feb 10	nicklas	420
5245	15 Feb 10	nicklas	421	<para>Example:</para>
5245	15 Feb 10	nicklas	422
5245	15 Feb 10	nicklas	423	<programlisting>
5245	15 Feb 10	nicklas	424	[sdata]
5245	15 Feb 10	nicklas	425	Ch 1 float
5245	15 Feb 10	nicklas	426	Ch 2 float
5245	15 Feb 10	nicklas	427	Weight float
5245	15 Feb 10	nicklas	428	Flag int
5245	15 Feb 10	nicklas	429	</programlisting>
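<para>
To show how the two sections fit together, the following sketch uses
the matrix subtype with hypothetical file names. Each of the four
<code>[sdata]</code> entries is matched, in order, by one
<code>sdataX</code> entry in the <code>[files]</code> section, so
<code>sdata3</code> holds the <code>Weight</code> values.
</para>
<programlisting>
BFSformat	matrix
[files]
rdata	reporters.txt
pdata	assays.txt
sdata1	ch1.txt
sdata2	ch2.txt
sdata3	weight.txt
sdata4	flag.txt
[sdata]
Ch 1	float
Ch 2	float
Weight	float
Flag	int
</programlisting>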
5245	15 Feb 10	nicklas	430
5245	15 Feb 10	nicklas	431	<para>
5245	15 Feb 10	nicklas	432	The <code>[parameters]</code> section contains extra parameters
5245	15 Feb 10	nicklas	433	needed by the plug-in. Keys and values are defined by the plug-in
5245	15 Feb 10	nicklas	434	and/or job configuration. Duplicate keys are not allowed, and order
5245	15 Feb 10	nicklas	435	is not important. Multiple values for the same parameter are separated
5245	15 Feb 10	nicklas	436	with a tab character.
5245	15 Feb 10	nicklas	437	</para>
5245	15 Feb 10	nicklas	438
5245	15 Feb 10	nicklas	439	<para>Example:</para>
5245	15 Feb 10	nicklas	440
5245	15 Feb 10	nicklas	441	<programlisting>
5245	15 Feb 10	nicklas	442	[parameters]
5245	15 Feb 10	nicklas	443	beta 0.5
5245	15 Feb 10	nicklas	444	length 100
5245	15 Feb 10	nicklas	445	vector 10 10.3 23
5245	15 Feb 10	nicklas	446	median true
5245	15 Feb 10	nicklas	447	</programlisting>
5245	15 Feb 10	nicklas	448
5245	15 Feb 10	nicklas	449
5245	15 Feb 10	nicklas	450	</sect3>
5245	15 Feb 10	nicklas	451
5245	15 Feb 10	nicklas	452	<sect3 id="fileformats.bfs.spotdata.annotations">
5245	15 Feb 10	nicklas	453	<title>Reporter and assay annotations</title>
5245	15 Feb 10	nicklas	454
5245	15 Feb 10	nicklas	455	<para>
5245	15 Feb 10	nicklas	456	The file used for reporter annotations is given by the <code>rdata</code>
5245	15 Feb 10	nicklas	457	entry in the <code>[files]</code> section. This file is optional when exporting
5245	15 Feb 10	nicklas	458	but required when importing. The only required column is the <code>ID</code>
5245	15 Feb 10	nicklas	459	column, which holds the internal spot position values. All <code>sdataX</code>
5245	15 Feb 10	nicklas	460	files must have the same number of rows as this file (not counting the
5245	15 Feb 10	nicklas	461	header line) and data should be sorted in the same order. Additional columns may
5245	15 Feb 10	nicklas	462	be included in the export.
5245	15 Feb 10	nicklas	463	</para>
5245	15 Feb 10	nicklas	464
5245	15 Feb 10	nicklas	465	<para>
5245	15 Feb 10	nicklas	466	Note that the same underlying reporter may be assigned to more than one
5245	15 Feb 10	nicklas	467	position. If the plug-in needs to operate on merged-per-reporter data
5245	15 Feb 10	nicklas	468	the export should include either the internal or external reporter id in
5245	15 Feb 10	nicklas	469	an additional column so that the plug-in can use this information to
5245	15 Feb 10	nicklas	470	determine what should be merged. The exporter has no support for exporting
5245	15 Feb 10	nicklas	471	merged data.
5245	15 Feb 10	nicklas	472	</para>
5245	15 Feb 10	nicklas	473
5245	15 Feb 10	nicklas	474	<para>
5245	15 Feb 10	nicklas	475	The file used for assay annotations is given by the <code>pdata</code>
5245	15 Feb 10	nicklas	476	entry in the <code>[files]</code> section. This file is optional when
5245	15 Feb 10	nicklas	477	exporting but required when importing. The only required
5245	15 Feb 10	nicklas	478	column is the ID column, which holds the internal bioassay id values.
5245	15 Feb 10	nicklas	479	If the matrix subtype is used the columns in the <code>sdataX</code>
5245	15 Feb 10	nicklas	480	files must be in the same order as the assays appear in this file. The
5245	15 Feb 10	nicklas	481	number of columns in the data files must be the same as the number of rows
5245	15 Feb 10	nicklas	482	in this file (not counting the header line).
5245	15 Feb 10	nicklas	483	</para>
5245	15 Feb 10	nicklas	484
5245	15 Feb 10	nicklas	485	<para>
5245	15 Feb 10	nicklas	486	If the serial subtype is used, the <code>sdata1</code> file has data
5245	15 Feb 10	nicklas	487	for the assay that is described in the first line in this file, the
5245	15 Feb 10	nicklas	488	<code>sdata2</code> file has data for the second assay, etc. The number
5245	15 Feb 10	nicklas	489	of data files must match the number of lines in this file.
5245	15 Feb 10	nicklas	490	</para>
5245	15 Feb 10	nicklas	491
5245	15 Feb 10	nicklas	492	</sect3>
5245	15 Feb 10	nicklas	493	<sect3 id="fileformats.bfs.spotdata.data">
5245	15 Feb 10	nicklas	494	<title>Data files</title>
5245	15 Feb 10	nicklas	495
5245	15 Feb 10	nicklas	496	<para>
5245	15 Feb 10	nicklas	497	Data files contain data in matrix format. More than one data file may be
5245	15 Feb 10	nicklas	498	required. The organisation of the data depends on the BFS subtype. In
5245	15 Feb 10	nicklas	499	both subtypes the number and order of the rows must match the number
5245	15 Feb 10	nicklas	500	and order of rows in the reporter annotations file.
5245	15 Feb 10	nicklas	501
5245	15 Feb 10	nicklas	502	</para>
5245	15 Feb 10	nicklas	503
5245	15 Feb 10	nicklas	504	<para>
5245	15 Feb 10	nicklas	505	If the matrix subtype is used, the columns in the data files correspond
5245	15 Feb 10	nicklas	506	to assays. The number of columns and their order must match the lines
5245	15 Feb 10	nicklas	507	in the assay annotations file. The number of data files and their content
5245	15 Feb 10	nicklas	508	is defined by the entries in the <code>[sdata]</code> section.
5245	15 Feb 10	nicklas	509	</para>
5245	15 Feb 10	nicklas	510
5245	15 Feb 10	nicklas	511	<para>
5245	15 Feb 10	nicklas	512	If the serial subtype is used, the number of columns and their order
5245	15 Feb 10	nicklas	513	must match the entries in the <code>[sdata]</code> section. Each data
5245	15 Feb 10	nicklas	514	file has data from one assay. The number of sdata files in the
5245	15 Feb 10	nicklas	515	<code>[files]</code> section must match the number of lines in the
5245	15 Feb 10	nicklas	516	assay annotations file.
5245	15 Feb 10	nicklas	517	</para>
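<para>
As a sketch with made-up values, a serial data file for the
<code>[sdata]</code> example given earlier (<code>Ch 1</code>,
<code>Ch 2</code>, <code>Weight</code> and <code>Flag</code>) would have
four tab-separated columns and one row per line in the reporter
annotations file:
</para>
<programlisting>
120.5	98.2	1.0	0
87.0	101.3	0.5	1
</programlisting>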
5245	15 Feb 10	nicklas	518
5245	15 Feb 10	nicklas	519	</sect3>
5245	15 Feb 10	nicklas	520
5245	15 Feb 10	nicklas	521	<sect3 id="fileformats.bfs.spotdata.import">
5245	15 Feb 10	nicklas	522	<title>Importing spot data</title>
5245	15 Feb 10	nicklas	523
5245	15 Feb 10	nicklas	524	<para>
5245	15 Feb 10	nicklas	525	The above information is mostly true for both export and import, but
5245	15 Feb 10	nicklas	526	there are a few additional things that a plug-in should know about when
5245	15 Feb 10	nicklas	527	generating data that is going to be imported. The most important
5319	20 Apr 10	nicklas	528	thing is that both reporter and assay annotation files are required
5319	20 Apr 10	nicklas	529	for importing spot data. If the program only generates extra files
5319	20 Apr 10	nicklas	530	the <code>[sdata]</code> section should not be included and no
5319	20 Apr 10	nicklas	531	data or annotation files are needed.
5245	15 Feb 10	nicklas	532	All files are specified in the <code>[files]</code> section in the
5245	15 Feb 10	nicklas	533	same way as for the export. File entries starting with <code>x-</code>
5245	15 Feb 10	nicklas	534	will be uploaded to BASE and linked with the new bioassay set.
5245	15 Feb 10	nicklas	535	</para>
5245	15 Feb 10	nicklas	536
5245	15 Feb 10	nicklas	537	<note>
5245	15 Feb 10	nicklas	538	<para>
5245	15 Feb 10	nicklas	539	The importer currently supports importing spot data intensity
5319	20 Apr 10	nicklas	540	values and extra files. Position/reporter mapping and child/parent
5319	20 Apr 10	nicklas	541	assay mapping may remain the same or they may be changed. The importer can
5245	15 Feb 10	nicklas	542	also upload additional files generated by the plug-in, for
5245	15 Feb 10	nicklas	543	example plots. The importer has no support for importing
5245	15 Feb 10	nicklas	544	extra values, reporter lists or annotations.
5245	15 Feb 10	nicklas	545	</para>
5245	15 Feb 10	nicklas	546	</note>
5245	15 Feb 10	nicklas	547
5245	15 Feb 10	nicklas	548	<para>
5245	15 Feb 10	nicklas	549	In the metadata file, a <code>[settings]</code> section may be included
5245	15 Feb 10	nicklas	550	to control certain aspects of the import. The following entries can be
5245	15 Feb 10	nicklas	551	used:
5245	15 Feb 10	nicklas	552	</para>
5245	15 Feb 10	nicklas	553
5245	15 Feb 10	nicklas	554	<itemizedlist>
5245	15 Feb 10	nicklas	555	<listitem>
5245	15 Feb 10	nicklas	556	<para>
5245	15 Feb 10	nicklas	557	<code>new-data-cube</code>: If this is set, the data is imported into a new
5245	15 Feb 10	nicklas	558	data cube. A new data cube is needed whenever the position/reporter
5245	15 Feb 10	nicklas	559	mapping has changed or when parent assays have been merged. This
5245	15 Feb 10	nicklas	560	setting requires that the reporter annotations file contains
5245	15 Feb 10	nicklas	561	information about the new mapping. It needs to include either
5245	15 Feb 10	nicklas	562	<code>Internal ID</code> or <code>External ID</code> columns so
5245	15 Feb 10	nicklas	563	that the importer can map the new position to the correct reporter.
5245	15 Feb 10	nicklas	564	The reporter must already exist in the database. The position values
5245	15 Feb 10	nicklas	565	have no relation to the position values in the old bioassay set. We
5245	15 Feb 10	nicklas	566	recommend that a plug-in simply enumerates the lines, starting at
5245	15 Feb 10	nicklas	567	1.
5245	15 Feb 10	nicklas	568	</para>
5245	15 Feb 10	nicklas	569	</listitem>
5245	15 Feb 10	nicklas	570
5245	15 Feb 10	nicklas	571	<listitem>
5245	15 Feb 10	nicklas	572	<para>
5245	15 Feb 10	nicklas	573	<code>multi-assay-parents</code>: If this is set, a child assay may have
5245	15 Feb 10	nicklas	574	more than one parent assay (for example, due to a merge). A new data
5245	15 Feb 10	nicklas	575	cube is needed and this setting is ignored unless
5245	15 Feb 10	nicklas	576	<code>new-data-cube</code> is also set. This setting requires that the
5245	15 Feb 10	nicklas	577	assay annotations file has a <code>Parent ID</code> column which
5245	15 Feb 10	nicklas	578	holds a comma-separated list with the IDs of the parent assays.
5245	15 Feb 10	nicklas	579	</para>
5245	15 Feb 10	nicklas	580	</listitem>
5245	15 Feb 10	nicklas	581
5245	15 Feb 10	nicklas	582	<listitem>
5245	15 Feb 10	nicklas	583	<para>
5245	15 Feb 10	nicklas	584	<code>transform</code>: If not specified, the child spot data is
5245	15 Feb 10	nicklas	585	assumed to use the same intensity transform as the parent data. To force
5245	15 Feb 10	nicklas	586	a specific intensity transform for the child bioassay set,
5245	15 Feb 10	nicklas	587	include this setting and choose one of the values: <code>none</code>, <code>log2</code>, <code>log10</code>.
5245	15 Feb 10	nicklas	588	</para>
5245	15 Feb 10	nicklas	589	</listitem>
5245	15 Feb 10	nicklas	590
5245	15 Feb 10	nicklas	591	</itemizedlist>
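<para>
How the three settings interact can be sketched as follows. This is an
illustration only, not the importer's actual code: the function name
<code>effective_settings</code> is hypothetical, and it assumes that the
<code>[settings]</code> section has already been parsed into a dictionary in
which the two flags are booleans.
</para>

```python
ALLOWED_TRANSFORMS = ("none", "log2", "log10")


def effective_settings(settings, parent_transform):
    """Resolve [settings] entries into the values an import would act on.

    settings         -- dict parsed from the [settings] section (assumed form)
    parent_transform -- intensity transform of the parent data
    """
    new_data_cube = bool(settings.get("new-data-cube", False))

    # multi-assay-parents is ignored unless new-data-cube is also set.
    multi_assay_parents = (
        bool(settings.get("multi-assay-parents", False)) and new_data_cube
    )

    # Without an explicit transform, the child uses the parent's transform.
    transform = settings.get("transform", parent_transform)
    if transform not in ALLOWED_TRANSFORMS:
        raise ValueError("unsupported transform: %r" % (transform,))

    return {
        "new-data-cube": new_data_cube,
        "multi-assay-parents": multi_assay_parents,
        "transform": transform,
    }
```

<para>
For example, a settings section that only enables
<code>multi-assay-parents</code> resolves to both flags being off, because
the setting has no effect without <code>new-data-cube</code>.
</para>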
5245	15 Feb 10	nicklas	592
5245	15 Feb 10	nicklas	593	<para>
5319	20 Apr 10	nicklas	594	In the metadata file, the presence of an <code>[sdata]</code> section
5319	20 Apr 10	nicklas	595	indicates that spot data should be imported. If this section is not
5319	20 Apr 10	nicklas	596	present, only extra files are uploaded to BASE and they are attached to
5319	20 Apr 10	nicklas	597	the transformation instead of a child bioassay set. If the <code>[sdata]</code>
5319	20 Apr 10	nicklas	598	section is present, it must include one entry for each channel, with names like
5319	20 Apr 10	nicklas	599	<code>Ch 1</code>, <code>Ch 2</code>, and so on. The value is always
5319	20 Apr 10	nicklas	600	<code>float</code>. All other entries in this section are ignored.
5245	15 Feb 10	nicklas	601	</para>
5245	15 Feb 10	nicklas	602
5245	15 Feb 10	nicklas	603	<para>
5245	15 Feb 10	nicklas	604	In the reporter annotations file, the <code>ID</code> column should hold
5245	15 Feb 10	nicklas	605	the position values. Values must be positive integers and
5245	15 Feb 10	nicklas	606	duplicates are not allowed. The order of the values doesn't
5245	15 Feb 10	nicklas	607	matter. If importing data to a new data cube, the reporter annotations
5245	15 Feb 10	nicklas	608	file also needs either <code>Internal ID</code> or <code>External ID</code>
5245	15 Feb 10	nicklas	609	columns.
5245	15 Feb 10	nicklas	610	</para>
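<para>
A plug-in may want to verify these rules before handing its files to the
importer. The sketch below shows one way to do so; the function name
<code>check_reporter_annotations</code> is hypothetical, and it assumes the
file has already been read into a list of column headers and a list of raw
<code>ID</code> strings.
</para>

```python
def check_reporter_annotations(headers, id_values, new_data_cube=False):
    """Return a list of problems found in the reporter annotations data.

    headers       -- column names of the reporter annotations file
    id_values     -- raw strings from the ID column, in any order
    new_data_cube -- True if the data is imported into a new data cube
    """
    problems = []
    seen = set()
    for raw in id_values:
        try:
            position = int(raw)
        except ValueError:
            problems.append("not an integer: %r" % (raw,))
            continue
        if position <= 0:
            problems.append("not a positive integer: %d" % position)
        elif position in seen:
            problems.append("duplicate position: %d" % position)
        seen.add(position)

    # A new data cube needs a column that maps positions to reporters.
    if new_data_cube and not {"Internal ID", "External ID"} & set(headers):
        problems.append("missing Internal ID or External ID column")
    return problems
```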
5245	15 Feb 10	nicklas	611
5245	15 Feb 10	nicklas	612	<para>
5245	15 Feb 10	nicklas	613	In the assay annotations file, the <code>ID</code> column usually holds the
5245	15 Feb 10	nicklas	614	internal assay id of the parent assay. The exception is if the
5245	15 Feb 10	nicklas	615	<code>multi-assay-parents</code> option has been enabled. In this
5245	15 Feb 10	nicklas	616	case the id values have no special meaning, but the <code>Parent ID</code>
5245	15 Feb 10	nicklas	617	column must have a comma-separated list with id values instead.
5245	15 Feb 10	nicklas	618	</para>
5245	15 Feb 10	nicklas	619	<para>
5245	15 Feb 10	nicklas	620	The assay annotations file may optionally have a <code>Name</code> column.
5245	15 Feb 10	nicklas	621	If present, the values in this column are used as names for the child assays.
5245	15 Feb 10	nicklas	622	Otherwise, they are given default names (usually the same name as the
5245	15 Feb 10	nicklas	623	parent assay).
5245	15 Feb 10	nicklas	624	</para>
5245	15 Feb 10	nicklas	625
5245	15 Feb 10	nicklas	626	</sect3>
5244	12 Feb 10	nicklas	627	</sect2>
5244	12 Feb 10	nicklas	628
5244	12 Feb 10	nicklas	629	</sect1>
5244	12 Feb 10	nicklas	630