Changes

204 bytes added, 12:59, 20 February 2017
no edit summary
Line 22: Line 22:  
* For numbers the decimal separator is a dot, not a comma. There is no thousands separator.
 
   −
== File Format ==
+
== Example ==
 +
 
 +
This is what the file format looks like:
 +
<pre>
 +
User ID&#9;Hair color&#9;Response time&#9;
 +
1&#9;brown&#9;1.4&#9;
 +
2&#9;blond&#9;1230.434&#9;
 +
3&#9;brown&#9;0.399&#9;
 +
 
 +
</pre>
 +
An example file can be downloaded here [[File:Example.zip|thumb]] (sorry, it is zipped).
 +
 
 +
== Parsing ==
 +
Importing such files can be done in many languages:
 +
=== Python Standard Library ===
 +
<nowiki>
 +
import csv
 +
with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3; in Python 2 use open('example.tsv', 'rb')
 +
    reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
 +
    for row in reader:
 +
        print(', '.join(row))
 +
</nowiki>
 +
Or, with header extraction:
 +
<nowiki>
 +
import csv
 +
with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3; in Python 2 use open('example.tsv', 'rb')
 +
    reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
 +
    print(', '.join(reader.fieldnames)) # print header
 +
    for row in reader:
 +
        print(', '.join([row[key] for key in reader.fieldnames]))
 +
</nowiki>
 +
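Note that in Python 2 the file must be opened in binary mode (<code>'rb'</code>) and the field content remains UTF-8 encoded (type <code>str</code>). In Python 3 the file must be opened in text mode, with an empty <code>newline</code> argument and UTF-8 encoding, and the fields are Unicode strings (type <code>str</code>).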
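Both readers return every field as a string, so numeric columns have to be converted by hand. The following is a minimal sketch for Python 3, assuming the three columns of the example above and a file without trailing tabs. The sample text, the function name <code>read_typed</code> and the chosen types are illustrative assumptions, not part of the format definition.

```python
import csv
import io

# Rows taken from the example above, written without trailing tabs (an assumption).
SAMPLE = (
    "User ID\tHair color\tResponse time\n"
    "1\tbrown\t1.4\n"
    "2\tblond\t1230.434\n"
    "3\tbrown\t0.399\n"
)


def read_typed(stream):
    """Yield one dict per data row, with hardcoded column types."""
    reader = csv.DictReader(stream, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {
            'User ID': int(row['User ID']),
            'Hair color': row['Hair color'],
            # float() expects a dot as decimal separator and no thousands separator.
            'Response time': float(row['Response time']),
        }


# io.StringIO stands in for an opened example.tsv file.
rows = list(read_typed(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

With a real file, pass the file object opened as described in the note above instead of the <code>io.StringIO</code> stand-in.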
 +
 
 +
=== Python Pandas ===
 +
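Pandas infers the column types from the data. The file itself does not store type information, so if the inferred types are not what you need, you will have to store them separately or hardcode them (for example with the <code>dtype</code> argument).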
 +
<nowiki>
 +
import csv
 +
import pandas as pd
 +
 
 +
d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)
 +
</nowiki>
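Hardcoding the column types could look like the sketch below. It assumes the three columns of the example above and a file without trailing tabs. The sample text and the <code>COLUMN_TYPES</code> mapping are illustrative assumptions.

```python
import csv
import io

import pandas as pd

# Rows taken from the example above, written without trailing tabs (an assumption).
SAMPLE = (
    "User ID\tHair color\tResponse time\n"
    "1\tbrown\t1.4\n"
    "2\tblond\t1230.434\n"
    "3\tbrown\t0.399\n"
)

# Hardcoded column types; pandas would otherwise infer them from the data.
COLUMN_TYPES = {
    'User ID': 'int64',
    'Hair color': str,
    'Response time': 'float64',
}

# io.StringIO stands in for the path 'example.tsv'.
d = pd.read_csv(
    io.StringIO(SAMPLE),
    sep='\t',
    quoting=csv.QUOTE_NONE,
    dtype=COLUMN_TYPES,
)
print(d.dtypes)
```

If a column contains a value that does not fit the declared type, <code>read_csv</code> raises an error instead of silently falling back to another type, which can help catch malformed files early.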
 +
=== GNU R ===
 +
<nowiki>
 +
d <- read.csv("example.tsv", header = TRUE, sep = "\t", quote = "")
 +
</nowiki>
 +
 
 +
== Alternatives ==
 
Data can be saved in a lot of file formats. If there is no reason to do otherwise, we prefer delimited files with the options shown in bold. Alternative options are also shown.
 
 
{| class="wikitable"
 
Line 57: Line 102:
|}
 
Note that tab characters and newlines cannot be present in field content.
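When writing such files, this restriction can be checked before a row is written. The sketch below is one possible way to do that in Python 3. The function name <code>format_row</code> is a hypothetical helper, not an existing API.

```python
def format_row(values):
    """Return one tab-delimited line, rejecting tabs and newlines in fields."""
    # str() on a float uses a dot as decimal separator and no thousands separator.
    fields = [str(value) for value in values]
    for field in fields:
        if any(ch in field for ch in '\t\r\n'):
            raise ValueError('tab or newline in field: %r' % field)
    return '\t'.join(fields) + '\n'


header = format_row(['User ID', 'Hair color', 'Response time'])
line = format_row([2, 'blond', 1230.434])
print(header + line, end='')
```

Raising an error is a design choice made for this sketch. Replacing the offending characters with spaces would also produce a valid file, but it changes the data silently.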
 
  −
== Parsing ==
  −
Here is an example [[File:Example.zip|thumb]] file. Sorry, it is zipped. Importing such files can be done in many languages:
  −
=== Python Standard Library ===
  −
<nowiki>
  −
import csv
  −
with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3; in Python 2 use open('example.tsv', 'rb')
  −
    reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
  −
    for row in reader:
  −
        print(', '.join(row))
  −
</nowiki>
  −
Or, with header extraction:
  −
<nowiki>
  −
import csv
  −
with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3; in Python 2 use open('example.tsv', 'rb')
  −
    reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
  −
    print(', '.join(reader.fieldnames)) # print header
  −
    for row in reader:
  −
        print(', '.join([row[key] for key in reader.fieldnames]))
  −
</nowiki>
  −
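Note that in Python 2 the file must be opened in binary mode (<code>'rb'</code>) and the field content remains UTF-8 encoded (type <code>str</code>). In Python 3 the file must be opened in text mode, with an empty <code>newline</code> argument and UTF-8 encoding, and the fields are Unicode strings (type <code>str</code>).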
  −
  −
=== Python Pandas ===
  −
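Pandas infers the column types from the data. The file itself does not store type information, so if the inferred types are not what you need, you will have to store them separately or hardcode them (for example with the <code>dtype</code> argument).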
  −
<nowiki>
  −
import csv
  −
import pandas as pd
  −
  −
d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)
  −
</nowiki>
  −
=== GNU R ===
  −
<nowiki>
  −
d <- read.csv("example.tsv", header = TRUE, sep = "\t", quote = "")
  −
</nowiki>
 