Data Files

Revision as of 17:32, 13 February 2017

== File Format ==

Data can be saved in many file formats. Unless there is a reason to do otherwise, we prefer delimited text files with the options shown in bold. The alternatives for each option are listed next to it.

{| class="wikitable"

| field delimiter after last field || '''no''' || yes

|-

| quoting character || '''none''' || " || '
|-
| escape quoting character by doubling || no || yes
|-
| escape character || '''none''' || \
|-
| first line || '''contains header''' || contains data
|-
| last field in line || '''must not be empty''' || may be empty
|-
| whitespace following delimiter || '''part of field''' || not part of field
|-
| decimal separator || '''.''' || ,
|-
| thousands separator || '''none''' || . || ␣ || U+2009

|}

Because the preferred format uses neither a quoting character nor an escape character, tab characters and newlines cannot be present in field content.
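The restrictions above can be checked when a file is written. The following is a minimal sketch, not a reference implementation: the helper name write_tsv is hypothetical, the tab is assumed as field delimiter (as in the parsing examples), and '\n' is assumed as line delimiter.

```python
import io


def write_tsv(out, header, rows):
    """Write a header and data rows as tab-delimited lines (hypothetical helper)."""
    for record in [list(header)] + [list(row) for row in rows]:
        fields = [str(field) for field in record]
        for field in fields:
            # No quoting or escaping is used, so these characters cannot be stored.
            if '\t' in field or '\n' in field or '\r' in field:
                raise ValueError('tab or newline in field: %r' % field)
        if fields and fields[-1] == '':
            raise ValueError('last field in line must not be empty')
        # Assumption: '\n' as line delimiter; no delimiter after the last field.
        out.write('\t'.join(fields) + '\n')


buffer = io.StringIO()
write_tsv(buffer, ['name', 'value'], [['alpha', 1.5], ['beta', 2]])
print(buffer.getvalue(), end='')
```

Rejecting offending fields instead of silently replacing characters keeps the written file loadable with the parsers described in the next section.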

== Parsing ==

These files can be imported in many languages:

=== Python Standard Library ===

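 <nowiki>import csv

# Python 3: open in text mode; newline='' lets the csv module handle line endings
with open('example.tsv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(', '.join(row))
</nowiki>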

Or, with header extraction:

 <nowiki>import csv

with open('example.tsv', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t', quoting=csv.QUOTE_NONE)
    print(', '.join(reader.fieldnames))  # print header
    for row in reader:
        print(', '.join(row[key] for key in reader.fieldnames))
</nowiki>

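Note that in Python 2 the file has to be opened in binary mode ('rb') and the field content remains UTF-8 encoded (type str). In Python 3 the file has to be opened in text mode with newline='' and the fields are Unicode strings (type str).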
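The csv module always returns fields as strings, so numeric columns have to be converted by the caller. The following is a minimal Python 3 sketch, assuming a hardcoded mapping from column name to type. The column names, the sample data and the helper read_typed are hypothetical and serve only as illustration.

```python
import csv
import io

# Hypothetical column types; the file itself does not record them.
COLUMN_TYPES = {'name': str, 'count': int, 'mass': float}


def read_typed(infile, column_types):
    """Yield one dict per data line with every field converted to its type."""
    reader = csv.DictReader(infile, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: column_types[key](value) for key, value in row.items()}


# In-memory sample standing in for a file such as example.tsv.
sample = io.StringIO('name\tcount\tmass\nalpha\t3\t1.5\nbeta\t7\t0.25\n')
records = list(read_typed(sample, COLUMN_TYPES))
print(records)
```

The conversion with float only works for the preferred decimal separator '.' without a thousands separator. Files using the alternative options need extra handling before conversion.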

=== Python Pandas ===

Pandas can assign a type to each column. The file itself does not record column types, so you will have to store them separately or hardcode them.

 <nowiki>import csv  # provides the QUOTE_NONE constant
import pandas as pd

d = pd.read_csv('example.tsv', delimiter='\t', skip_blank_lines=False, quoting=csv.QUOTE_NONE)
</nowiki>
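Hardcoding the column types could look like the sketch below, which passes a mapping through the dtype parameter of read_csv. The column names and the in-memory sample data are hypothetical; a real call would pass the file name instead of the StringIO object.

```python
import csv
import io

import pandas as pd

# Hypothetical column types, hardcoded because the file does not record them.
COLUMN_TYPES = {'name': str, 'count': 'int64', 'mass': 'float64'}

# In-memory sample standing in for a file such as example.tsv.
sample = io.StringIO('name\tcount\tmass\nalpha\t3\t1.5\nbeta\t7\t0.25\n')
d = pd.read_csv(sample, sep='\t', quoting=csv.QUOTE_NONE, dtype=COLUMN_TYPES)
print(d.dtypes)
```

If a field cannot be converted to the requested type, read_csv raises an error instead of silently falling back to another type, which makes format violations visible early.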

=== GNU R ===

 <nowiki>d <- read.csv("example.tsv", header = TRUE, sep = "\t", quote = "")
</nowiki>

Wilbert.vanham
