Line 22: |
Line 22: |
| * For numbers the decimal separator is a dot, not a comma. There is no thousands separator. | | * For numbers the decimal separator is a dot, not a comma. There is no thousands separator. |
| | | |
− | == File Format == | + | == Example == |
| + | |
| + | This is what the file format looks like: |
| + | <pre> |
| + | User ID	Hair color	Response time	 |
| + | 1	brown	1.4	 |
| + | 2	blond	1230.434	 |
| + | 3	brown	0.399	 |
| + | |
| + | </pre> |
| + | An example file can be downloaded here [[File:Example.zip|thumb]] (sorry, it is zipped). |
| + | |
| + | == Parsing == |
| + | Importing such files can be done in many languages: |
| + | === Python Standard Library === |
| + | <nowiki> |
| + | import csv |
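| + | # Python 3: open in text mode with newline=''. Under Python 2 use open('example.tsv', 'rb') instead. |
| + | with open('example.tsv', newline='', encoding='utf-8') as csvfile: |
| + |     reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE) |
| + |     for row in reader: |
| + |         print(', '.join(row)) |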
| + | </nowiki> |
| + | Or, with header extraction: |
| + | <nowiki> |
| + | import csv |
| + | # Python 3: open in text mode with newline=''. Under Python 2 use open('example.tsv', 'rb') instead. |
| + | with open('example.tsv', newline='', encoding='utf-8') as csvfile: |
| + |     reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE) |
| + |     print(', '.join(reader.fieldnames))  # print header |
| + |     for row in reader: |
| + |         print(', '.join([row[key] for key in reader.fieldnames])) |
| + | </nowiki> |
| + | Note that in Python 2 the field content remains a UTF-8 encoded byte string (type=str). In Python 3 the fields are Unicode strings (also type=str), provided the file is opened in text mode. |
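Because every field arrives as a string, numeric columns have to be converted explicitly. The following is a minimal Python 3 sketch, not a definitive implementation: the mapping <code>COLUMN_TYPES</code> and the helper <code>read_typed_rows</code> are hypothetical names introduced here, and the column names are taken from the example above. It relies on the dot as decimal separator and the absence of a thousands separator, so plain <code>int()</code> and <code>float()</code> are enough.

```python
import csv
import io

# Hypothetical column-to-type mapping for the example file shown above.
COLUMN_TYPES = {"User ID": int, "Hair color": str, "Response time": float}


def read_typed_rows(handle, column_types):
    """Yield one dict per data row, converting each field with its column's type."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {name: convert(row[name]) for name, convert in column_types.items()}


# In-memory copy of the example data, so the sketch runs without a file on disk.
SAMPLE = (
    "User ID\tHair color\tResponse time\n"
    "1\tbrown\t1.4\n"
    "2\tblond\t1230.434\n"
    "3\tbrown\t0.399\n"
)

rows = list(read_typed_rows(io.StringIO(SAMPLE), COLUMN_TYPES))
print(rows[1])
```

To read a real file, pass a handle opened with <code>open('example.tsv', newline='', encoding='utf-8')</code> instead of the in-memory sample.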
| + | |
| + | === Python Pandas === |
| + | Pandas infers column types from the data. The file itself carries no type information, so if you need specific types you will have to store them separately or hardcode them. |
| + | <nowiki> |
| + | import csv  # needed for csv.QUOTE_NONE |
| + | import pandas as pd |
| + | |
| + | d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE) |
| + | </nowiki> |
| + | === GNU R === |
| + | <nowiki> |
| + | d <- read.csv("example.tsv", header = TRUE, sep = "\t", quote = "") |
| + | </nowiki> |
| + | |
| + | == Alternatives == |
| Data can be saved in a lot of file formats. If there is no reason to do otherwise, we prefer delimited files with the options shown in bold. Alternative options are also shown. | | Data can be saved in a lot of file formats. If there is no reason to do otherwise, we prefer delimited files with the options shown in bold. Alternative options are also shown. |
| {| class="wikitable" | | {| class="wikitable" |
Line 57: |
Line 102: |
| |} | | |} |
| Note that tab characters and newlines cannot be present in field content. | | Note that tab characters and newlines cannot be present in field content. |
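Since the format has no quoting or escaping, this restriction has to be enforced when a file is written. A minimal Python 3 sketch of such a check, under the assumption that a plain tab join is sufficient; <code>check_field</code> and <code>format_row</code> are hypothetical helper names, not part of any library:

```python
def check_field(value):
    """Return the field as text, or raise if it contains a tab or a line break."""
    text = str(value)
    if any(ch in text for ch in "\t\r\n"):
        raise ValueError("field contains a tab or newline: %r" % text)
    return text


def format_row(values):
    """Join already-checked fields with tabs and end the line with a newline."""
    return "\t".join(check_field(v) for v in values) + "\n"


# str() on a float uses a dot as decimal separator and no thousands separator.
line = format_row([2, "blond", 1230.434])
print(repr(line))
```

A field such as <code>"a\tb"</code> makes <code>format_row</code> raise <code>ValueError</code> rather than silently producing a file with a shifted column.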
− |
| |
− | == Parsing ==
| |
− | Here is an example [[File:Example.zip|thumb]] file. Sorry, it is zipped. Importing such files can be done in many languages:
| |
− | === Python Standard Library===
| |
− | <nowiki>
| |
− | import csv
| |
− | with open('example.tsv', 'rb') as csvfile:
| |
− | reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
| |
− | for row in reader:
| |
− | print(', '.join(row))
| |
− | </nowiki>
| |
− | or with header extraction
| |
− | <nowiki>
| |
− | import csv
| |
− | with open('example.tsv', 'rb') as csvfile:
| |
− | reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
| |
− | print(', '.join(reader.fieldnames)) # print header
| |
− | for row in reader:
| |
− | print(', '.join([row[key] for key in reader.fieldnames]))
| |
− | </nowiki>
| |
− | Note that when using Python 2 the field content will remain UTF-8 encoded (type=str). In Python3 strings will unicode (type=string).
| |
− |
| |
− | === Python Pandas ===
| |
− | Pandas can interpret column type. You will have to store it separately or hardcode it.
| |
− | <nowiki>
| |
− | import pandas as pd
| |
− |
| |
− | d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)
| |
− | </nowiki>
| |
− | === GNU R ===
| |
− | <nowiki>
| |
− | d <- read.csv("example.tsv", head=TRUE, sep = "\t")
| |
− | </nowiki>
| |