Changes

Data Files (view source)

Revision as of 12:59, 20 February 2017

204 bytes added, 12:59, 20 February 2017

no edit summary

Line 22: Line 22:

* For numbers the decimal separator is a dot, not a comma. There is no thousands separator.

−

== File ~~Format~~ ==

+

== Example ==

+

This is what the file format looks like:

+

<pre>

+

User ID	Hair color	Response time	

+

1	brown	1.4	

+

2	blond	1230.434	

+

3	brown	0.399	

+

</pre>

+

An example file can be downloaded here [[File:Example.zip|thumb]] (sorry, it is zipped).

+

== Parsing ==

+

Importing such files can be done in many languages:

+

=== Python Standard Library ===

+

<nowiki>

+

import csv

+

with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3 text mode; Python 2 used 'rb'

+

    reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)

+

    for row in reader:

+

        print(', '.join(row))

+

</nowiki>

+

Or, with header extraction:

+

<nowiki>

+

import csv

+

with open('example.tsv', newline='', encoding='utf-8') as csvfile:  # Python 3 text mode; Python 2 used 'rb'

+

    reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)

+

    print(', '.join(reader.fieldnames))  # print header

+

    for row in reader:

+

        print(', '.join([row[key] for key in reader.fieldnames]))

+

</nowiki>

+

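Note that in Python 2 the csv module works on bytes, so the file is opened with mode 'rb' and the field content remains UTF-8 encoded (type str). In Python 3 the file must be opened in text mode, for example open('example.tsv', newline='', encoding='utf-8'), and the fields are Unicode strings (also type str).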
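
The csv module returns every field as a string, so numeric columns have to be converted by hand. The following is a minimal Python 3 sketch of one way to do that, not a definitive implementation. It assumes the three columns of the example table; the names SAMPLE and read_typed_rows are made up for illustration, and the inline sample leaves out the trailing tabs so that it parses into exactly three columns.

```python
import csv
import io

# Inline copy of the example table (tab-delimited, no quoting).
# Assumption: trailing tabs are omitted here.
SAMPLE = (
    "User ID\tHair color\tResponse time\n"
    "1\tbrown\t1.4\n"
    "2\tblond\t1230.434\n"
    "3\tbrown\t0.399\n"
)


def read_typed_rows(handle):
    """Yield one dict per data row with hand-converted column types."""
    reader = csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {
            'User ID': int(row['User ID']),
            'Hair color': row['Hair color'],
            # float() expects a dot as the decimal separator and no
            # thousands separator, which matches the rules for these files.
            'Response time': float(row['Response time']),
        }


rows = list(read_typed_rows(io.StringIO(SAMPLE)))
print(rows[1]['Response time'])
```

To read a real file, pass an open file handle (opened in text mode as described above) instead of the io.StringIO object.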

+

=== Python Pandas ===

+

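Pandas infers the column types from the data. The file itself does not record them, so if you need specific types you will have to store them separately or hardcode them (for example through the dtype argument of read_csv).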

+

<nowiki>

+

import csv

+

import pandas as pd

+

d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)

+

</nowiki>

+

=== GNU R ===

+

<nowiki>

+

d <- read.csv("example.tsv", header = TRUE, sep = "\t", quote = "")

+

</nowiki>

+

== Alternatives ==

Data can be saved in a lot of file formats. If there is no reason to do otherwise, we prefer delimited files with the options shown in bold. Alternative options are also shown.

{| class="wikitable"

Line 57: Line 102:

|}

Note that tab characters and newlines cannot be present in field content.
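
Because fields are neither quoted nor escaped, a writer can only check for these characters before writing. The sketch below shows one possible check in Python 3; the names FORBIDDEN and format_row are hypothetical, and treating a carriage return as a newline is an assumption made here, not something the format description states.

```python
import io

# Characters that may not appear inside a field.
# Assumption: '\r' is treated as a newline character as well.
FORBIDDEN = ('\t', '\n', '\r')


def format_row(values):
    """Return one tab-delimited line, or raise ValueError for an unsafe field."""
    fields = [str(value) for value in values]
    for field in fields:
        if any(ch in field for ch in FORBIDDEN):
            raise ValueError('tab or newline in field: %r' % (field,))
    return '\t'.join(fields) + '\n'


out = io.StringIO()
out.write(format_row(['User ID', 'Hair color', 'Response time']))
out.write(format_row([2, 'blond', 1230.434]))
text = out.getvalue()
```

Raising an error rather than silently replacing the offending characters keeps the written file faithful to the data; whether to replace them instead is a choice for the person producing the file.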

−

~~== Parsing ==~~

−

~~Here is an example [[File:Example.zip|thumb]] file. Sorry, it is zipped. Importing such files can be done in many languages:~~

−

~~=== Python Standard Library===~~

−

~~<nowiki>~~

−

~~import csv~~

−

~~with open('example.tsv', 'rb') as csvfile:~~

−

~~reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)~~

−

~~for row in reader:~~

−

~~print(', '.join(row))~~

−

~~</nowiki>~~

−

~~or with header extraction~~

−

~~<nowiki>~~

−

~~import csv~~

−

~~with open('example.tsv', 'rb') as csvfile:~~

−

~~reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)~~

−

~~print(', '.join(reader.fieldnames)) # print header~~

−

~~for row in reader:~~

−

~~print(', '.join([row[key] for key in reader.fieldnames]))~~

−

~~</nowiki>~~

−

~~Note that when using Python 2 the field content will remain UTF-8 encoded (type=str). In Python3 strings will unicode (type=string).~~

−

~~=== Python Pandas ===~~

−

~~Pandas can interpret column type. You will have to store it separately or hardcode it.~~

−

~~<nowiki>~~

−

~~import pandas as pd~~

−

~~d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)~~

−

~~</nowiki>~~

−

~~=== GNU R ===~~

−

~~<nowiki>~~

−

~~d <- read.csv("example.tsv", head=TRUE, sep = "\t")~~

−

~~</nowiki>~~

A.datadien

14

edits
