Changes

643 bytes added , 17:36, 20 February 2017

Line 1: Line 1:

== TSG suggested file format for experiment data ==

−

The TSG suggests a common file format for storing experimental data. Adhering to this format whenever practical makes it easier to re-use files and tools. The file format is a tab-separated values (tsv) file with the following specifications:

+

The TSG suggests a common file format for storing experimental data. Adhering to this format whenever practical makes it easier to re-use files and tools. The file is plain text, so it is easy to inspect and manipulate. The format is a tab-separated values (TSV) file with the following specifications:

===== File =====

* File encoding is ASCII or UTF-8.

−

* The file contains no byte order mark (BOM) or other magic number.

+

* The file contains no byte order mark (BOM) or other magic number. A file that holds only ASCII characters therefore remains ASCII compatible.
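
The File rules above could be checked with a minimal sketch like the one below. The function name <code>check_encoding</code> and the set of byte order marks it tests for are assumptions made for illustration, not part of any TSG tool.

```python
import codecs

# Byte order marks of the encodings listed as alternatives (assumed set).
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def check_encoding(raw):
    """Return True if the bytes carry no BOM and decode as UTF-8.

    ASCII is a subset of UTF-8, so plain ASCII input also passes.
    """
    if raw.startswith(_BOMS):
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The check works on raw bytes, so the file would have to be opened in binary mode before passing its content in.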

===== Lines =====

* Lines are separated by the '''\r\n''' line delimiter for better compatibility between operating systems.

−

* The line delimiter should also be added after the last line~~, because~~...

+

* The line delimiter should also be added after the last line. Because every record (line) is then terminated, stream reading is simpler and a readline() function can be used to acquire each line.

* The first line contains a header with column/field names.
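
As a rough illustration of the Lines rules above, the sketch below reads records with <code>readline()</code> and treats the first one as the header. The helper name <code>read_records</code> and the in-memory sample data are hypothetical; a real file opened in binary mode would behave the same way.

```python
import io


def read_records(stream):
    """Yield each line of a binary stream without its CR LF terminator."""
    while True:
        line = stream.readline()
        if not line:  # empty bytes object: end of stream
            break
        if not line.endswith(b"\r\n"):
            raise ValueError("line is not terminated by CR LF")
        yield line[:-2].decode("utf-8")


# Sample data held in memory (assumption, for illustration only).
data = b"User ID\tHair color\r\n1\tbrown\r\n"
header, *rows = read_records(io.BytesIO(data))
```

Because the last line is terminated as well, a missing terminator can be reported as a truncated file instead of being silently accepted.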

===== Fields =====

−

* ~~Field~~ are separated by the '''tab''' field delimiter, because they rarely occur in texts and ~~therefore require no escaping~~.

+

* Fields are separated by the '''tab''' field delimiter, because tabs rarely occur in text. This allows commas and semicolons to be used in sentences without an escape character.

−

* The field delimiter should ~~also~~ be added after each line's last field~~, because.~~..

+

* The field delimiter should '''not''' be added after each line's last field. This allows for the use of a split() function for parsing a line.

−

* The last field in a line must not be empty, because~~... if there is no value, wat do..~~.

+

* The last field in a line must not be empty, because a line that ends in a field delimiter could not be distinguished from one that breaks the previous rule.

−

* Fields are ~~not~~ surrounded by a quoting character.

+

* Fields are never surrounded by a quoting character.

−

* White space ~~between~~ field delimiters are considered part of ~~the~~ field.

+

* White space before or after a field delimiter is considered part of a field.

−

* There is no defined escape character. If your data can contain tabs, use a different field delimiter or file format.

+

* There is no defined escape character. If your data can contain tabs or newlines, use a different field delimiter or file format.

===== Data =====

* For numbers the decimal separator is a dot, not a comma. There is no thousands separator.
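
The Fields and Data rules above can be sketched as a small parser built on <code>split()</code>. The function name <code>parse_line</code> is hypothetical, and the sample values are taken from the example on this page.

```python
def parse_line(line):
    """Split one record (without its line delimiter) into fields.

    White space is kept, because it is part of the field. An empty last
    field is rejected, since it cannot be told apart from a trailing
    field delimiter.
    """
    fields = line.split("\t")
    if fields[-1] == "":
        raise ValueError("empty last field or trailing field delimiter")
    return fields


header = parse_line("User ID\tHair color\tResponse time")
row = parse_line("2\tblond\t1230.434")
record = dict(zip(header, row))

# Dot as decimal separator and no thousands separator, so float() suffices.
response_time = float(record["Response time"])
```

No quoting or escape handling is needed, which is the main reason the format forbids tabs and newlines inside fields.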

−

== ~~File Format~~ ==

+

== Example ==

−

~~Data can be saved in~~ a ~~lot of~~ file ~~formats. If there is no reason to do otherwise, we prefer delimited files with the options shown~~ in ~~bold. Alternative options are also shown.~~

+

−

~~{| class="wikitable"~~

+

An example of what a file in this format may look like:

−

|-

+

<pre>

−

~~| file extension || '''tsv''' || csv || '''dat''' || txt~~

+

User ID	Hair color	Response time

−

|-

+

1	brown	1.4

−

~~| file extension || '''ascii''' || '''UTF-8''' || UTF-16BE || UTF-16LE || UCS-4/UTF-32~~

+

2	blond	1230.434

−

|-

+

3	brown	0.399

−

~~| magic number || '''None''' ||~~ <~~BOM~~>

+

−

|-

+

</pre>

−

~~| line delimiter || \n || \r || '''\r\n'''~~

+

An example file can be downloaded here [[File:Example.zip|thumb]] (sorry, it is zipped).

−

|-

−

~~| line delimiter after last line || no || '''yes'''~~

−

|-

−

~~| field delimiter || '''~~<~~tab~~>~~''' || , || ;~~

−

|-

−

~~| field delimiter after last field || '''no''' || yes~~

−

|-

−

~~| quoting character || '''None''' || " || '~~

−

|-

−

~~| escape qc by doubling || no || yes~~

−

|-

−

~~| escape character || '''none''' || \~~

−

|-

−

~~| first line || '''contains header''' || contains data~~

−

|-

−

~~| last field in line || '''must not~~ be ~~empty''' || may be empty~~

−

|-

−

~~| whitespace following delimiter || '''part of field''' || not part of field~~

−

|-

−

~~| decimal separator || '''~~.~~''' |~~| ,

−

|-

−

~~| thousands separator || '''none''' || . || ␣ || U+2009~~

−

|}

−

~~Note that tab characters and newlines cannot be present in field content~~.

== Parsing ==

−

~~Here is an example [[File:Example.zip|thumb]] file. Sorry, it is zipped.~~ Importing such files can be done in many languages:

+

Importing such files can be done in many languages:

=== Python Standard Library ===

Line 90: Line 66:

d <- read.csv("example.tsv", header = TRUE, sep = "\t")

</nowiki>

+

== Alternatives ==

+

Data can be saved in many file formats. If there is no reason to do otherwise, we prefer delimited files with the options shown in bold. Alternative options are also shown.

+

{| class="wikitable"

+

|-

+

| File Extension || '''tsv''', csv, '''dat''', txt

+

|-

+

| File Encoding || '''ASCII''', '''UTF-8''', UTF-16BE, UTF-16LE, UCS-4/UTF-32

+

|-

+

| [[wikipedia:Magic_number_(programming)|Magic Number]] || '''None''', [[wikipedia:Byte_order_mark|BOM]]

+

|-

+

| Line Delimiter || \n, \r, '''\r\n'''

+

|-

+

| Line Delimiter after Last Line || '''Yes''', No

+

|-

+

| Field Delimiter || '''<tab>''', <comma> , <semicolon>

+

|-

+

| Field Delimiter after Last Field || Yes, '''No'''

+

|-

+

| Quoting Character || '''None''', ', "

+

|-

+

| Escape Quoting Character by Doubling || Yes, No

+

|-

+

| Escape Character || '''None''', \

+

|-

+

| First Line Contains: || '''Header''', Data

+

|-

+

| Empty Last Field in Line || Allowed, '''Not Allowed'''

+

|-

+

| Whitespace Following Delimiter || '''Part of Field''', Excluded

+

|-

+

| Decimal Separator || '''<dot>''', <comma>

+

|-

+

| Thousands Separator || '''None''', <dot>, <space>, U+2009

+

|}

+

Note that tab characters and newlines cannot be present in field content.
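
Since the format has no escape character, a writer has to refuse field content that contains tabs or newlines. A minimal sketch, assuming a hypothetical helper named <code>format_record</code>:

```python
def format_record(values):
    """Join values into one record terminated by CR LF.

    Raises ValueError when a field contains a tab or newline, or when
    the last field is empty, because the format cannot represent these.
    """
    fields = [str(v) for v in values]
    for field in fields:
        if "\t" in field or "\r" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % field)
    if not fields or fields[-1] == "":
        raise ValueError("a record needs a non-empty last field")
    # No field delimiter after the last field, line delimiter after every line.
    return "\t".join(fields) + "\r\n"
```

When writing the result to a text file in Python, opening the file with <code>newline=""</code> keeps the line delimiter from being translated by the platform.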

E.vandenberge


Data Files (view source)

Revision as of 17:36, 20 February 2017
