Participant Identification Code

From TSG Doc
Latest revision as of 13:05, 29 April 2020

To comply with the GDPR, experimental data is preferably stored anonymously. This poses a problem if a participant in the experiment later wishes to identify themselves, for instance because they want to withdraw the data they submitted. If you want to make this possible, you can follow the approved procedure below.

Procedure

  • Make up an Experiment Secret (ES). This is a random string that you store with the experiment. Keep it secret from your participants.
  • Store an anonymous Participant Number (PPN) with the data that is related to a certain participant. This participant number can for instance be the token that you use in Limesurvey. It must be unique to the participant and it must not contain information that you cannot give to the participant. The participant number can contain letters. It is OK if the PPN is just the participant serial number (1, 2, 3, ...).
  • Calculate a Participant Identification Code (PIC) checksum for each participant. If you give the participant your contact information, the name of the experiment, the PPN and their PIC, they will be able to prove that they participated in your experiment and you can identify the data that they supplied. If your PPN has a fixed length, you can give them a single code that is the concatenation of PPN and PIC. If for instance the PPN is 1234 and the PIC is A3D4, then you simply send them the following text:

Dear Participant,

Thank you for participating in my experiment 'The Role of Squares and Circles in modern Society'. Your data was stored anonymously. If you ever want to contact me about the data you supplied, please use the code 1234A3D4. I myself have no way of linking you to your data without this code.

Kind regards,

dr. Rudolph Everest Searcher

R.E.Searcher@socsci.ru.nl

If a participant later comes to you with a PPN and a PIC, you look up your ES. Based on your ES and the PPN given to you by the participant, you recalculate the PIC. If this is identical to the PIC supplied by the participant, then the participant is indeed the person identified by the given PPN in your data file.
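The check described above can be sketched in Python as follows. This is a minimal sketch, not part of the approved procedure: the function names `make_pic` and `check_code` and the `ppn_length` parameter are made up for illustration, and it assumes the participant received a single code consisting of a fixed-length PPN followed by the PIC. The PIC itself is calculated as in the Python 3 section of this page.

```python
import hashlib
import hmac


def make_pic(secret, ppn):
    """Return the four-character PIC for an Experiment Secret and a PPN (both strings)."""
    data = secret.encode("utf-8") + ppn.encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:4].upper()


def check_code(secret, code, ppn_length):
    """Split a combined code into PPN and PIC and recalculate the PIC.

    Assumption: the PPN has a fixed length (ppn_length), so everything
    after it is the PIC supplied by the participant.
    """
    ppn, supplied_pic = code[:ppn_length], code[ppn_length:]
    expected_pic = make_pic(secret, ppn)
    # The supplied PIC is upper-cased so that 'a3d4' and 'A3D4' are treated alike.
    # compare_digest compares in constant time; a plain == would also work.
    return hmac.compare_digest(
        expected_pic.encode("utf-8"), supplied_pic.upper().encode("utf-8")
    )


secret = "mySecret123!"                      # example Experiment Secret
code = "1234" + make_pic(secret, "1234")     # what the participant was sent
print(check_code(secret, code, ppn_length=4))  # prints True
```

A code whose PIC does not belong to the given PPN makes `check_code` return `False`, in which case the participant has not shown that the data with that PPN is theirs.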

Example

Figure: Calculating Participant Identification Code in LibreOffice

Online

Try this calculator (https://www.socsci.ru.nl/wilberth/computer/pic.html) to make these checksums yourself.

OpenOffice / LibreOffice

If you install the cryptographic hash extension for OpenOffice/LibreOffice, you can use this document to calculate the PIC. You may have to enable macros for it to work.

Google Sheets

You can also calculate the PICs with this Google Sheet. Use File -> Make a copy if you want to alter the document.

Python 3

In Python 3 you can easily calculate the PIC. Note that both the secret and the PPN must be strings.

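 #!/usr/bin/env python3
 import hashlib
 secret = "mySecret123!"  # Experiment Secret (ES), a string
 ppn = "0"                # Participant Number (PPN), a string and not an int
 # SHA-256 of ES followed by PPN (both UTF-8 encoded); keep the first four hex digits in upper case
 pic = hashlib.sha256(secret.encode('utf-8') + ppn.encode('utf-8')).hexdigest()[0:4].upper()
 print(pic)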
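If you need the codes for a whole list of participants, the calculation above can be wrapped in a function. The following is a sketch only: `make_pic` and `make_codes` are hypothetical helper names, and the PPNs are assumed to be serial numbers zero-padded to four digits so that PPN and PIC can be concatenated into one fixed-length code.

```python
import hashlib


def make_pic(secret, ppn):
    """Same calculation as above: first four hex digits of SHA-256(ES + PPN), upper case."""
    data = secret.encode("utf-8") + ppn.encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:4].upper()


def make_codes(secret, ppns):
    """Map each PPN to the combined code (PPN followed by its PIC)."""
    return {ppn: ppn + make_pic(secret, ppn) for ppn in ppns}


secret = "mySecret123!"
# Assumed PPNs: serial numbers 1..5, zero-padded to four digits ("0001", "0002", ...).
ppns = ["{:04d}".format(n) for n in range(1, 6)]
for ppn, code in make_codes(secret, ppns).items():
    print(ppn, code)
```

Note that "1" and "0001" are different strings and therefore give different PICs, so the PPN must be stored with the data in exactly the form that was used here.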

Rationale

If you simply give the anonymous PPN to your participants, they can also identify themselves. The PPN would then have to be sufficiently long and random to make sure that a participant cannot guess someone else's PPN. Moreover, if you generate PPNs the same way for every experiment, then anyone who knows how you do it for one experiment can do it for another and pretend to be a participant. The PIC avoids both problems, because it cannot be calculated without the Experiment Secret.

Technical Details

The PIC is the upper-case, four-character hexadecimal representation of the first two bytes of the SHA-256 hash of the concatenation of the UTF-8 representations of the Experiment Secret and the Participant Number, in that order.
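The definition can be followed step by step in Python. This is an illustrative sketch using the example secret and PPN from the Python 3 section; the variable names are chosen here for readability. It also shows that taking the first two bytes of the raw hash gives the same result as taking the first four characters of the hexadecimal digest.

```python
import hashlib

secret = "mySecret123!"   # Experiment Secret (example value)
ppn = "0"                 # Participant Number (example value)

# 1. UTF-8 representation of the ES followed by that of the PPN
message = secret.encode("utf-8") + ppn.encode("utf-8")

# 2. SHA-256 hash: 32 raw bytes
digest = hashlib.sha256(message).digest()

# 3. First two bytes, written as four upper-case hexadecimal characters
pic = digest[:2].hex().upper()

# The shortcut used in the Python 3 section gives the same result
assert pic == hashlib.sha256(message).hexdigest()[:4].upper()

print(len(digest))  # 32
print(len(pic))     # 4
print(16 ** 4)      # 65536 possible PICs
```

Because a PIC is four hexadecimal characters, there are 65536 possible values, so someone who does not know the Experiment Secret and guesses a PIC for a given PPN succeeds roughly once in 65536 attempts.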