Difference between revisions of "Participant Identification Code"

From TSG Doc

Latest revision as of 12:05, 29 April 2020

To comply with the GDPR, experimental data is preferably stored anonymously. This poses a problem if a participant in the experiment later wishes to identify themselves, for instance because they want to withdraw the data they submitted. If you want to make this possible, you can follow the approved procedure below.

Procedure

  • Make up an Experiment Secret (ES): a random string that you store with the experiment. Keep it secret from your participants.
  • Store an anonymous Participant Number (PPN) with the data that is related to a certain participant. This participant number can, for instance, be the token that you use in LimeSurvey. It must be unique to the participant and it must not contain information that you cannot give to the participant. The participant number can contain letters. It is fine if the PPN is just the participant serial number (1, 2, 3, ...).
  • Calculate a Participant Identification Code (PIC) checksum for each participant. If you give the participant your contact information, the name of the experiment, the PPN and their PIC, they will be able to prove that they participated in your experiment and you can identify the data that they supplied. Especially if your PPN has a fixed length, you can give them a concatenation of PPN and PIC. If, for instance, the PPN is 1234 and the PIC is A3D4, then you simply send them the following text:

Dear Participant,

Thank you for participating in my experiment 'The Role of Squares and Circles in modern Society'. Your data was stored anonymously. If you ever want to contact me about the data you supplied, please use the code 1234A3D4. I myself have no way of linking you to your data without this code.

Kind regards,

dr. Rudolph Everest Searcher

R.E.Searcher@socsci.ru.nl
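
The steps above can be sketched in Python as follows. This is a minimal sketch, not part of the approved procedure: the function names make_experiment_secret, compute_pic and participant_code are hypothetical, and generating the ES with Python's secrets module is an assumption (any sufficiently random string that you keep private will do). The PIC calculation itself follows the Python 3 example and the Technical Details on this page.

```python
import hashlib
import secrets


def make_experiment_secret(n_bytes=16):
    """Return a random Experiment Secret (ES) as a URL-safe string (hypothetical helper)."""
    return secrets.token_urlsafe(n_bytes)


def compute_pic(secret, ppn):
    """PIC: first four hex characters of SHA-256(ES + PPN), in upper case."""
    digest = hashlib.sha256(secret.encode("utf-8") + ppn.encode("utf-8")).hexdigest()
    return digest[:4].upper()


def participant_code(secret, ppn):
    """Concatenate PPN and PIC into the single code that is sent to the participant."""
    return ppn + compute_pic(secret, ppn)


if __name__ == "__main__":
    es = make_experiment_secret()         # store this with the experiment, keep it private
    for ppn in ["0001", "0002", "0003"]:  # fixed-length PPNs, given as strings
        print(ppn, participant_code(es, ppn))
```

Each printed line contains a PPN and the code (PPN followed by the four-character PIC) that could be pasted into the letter to that participant.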

If a participant later comes to you with a PPN and a PIC, look up your ES. Based on your ES and the PPN given to you by the participant, recalculate the PIC. If it is identical to the PIC supplied by the participant, then the participant is indeed the person identified by the given PPN in your data file.
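
The check described above could look like the sketch below. The function name verify_code is hypothetical, and the sketch assumes the participant hands in the concatenated form (PPN followed by the four-character PIC), so the PIC is taken from the last four characters. Using hmac.compare_digest for the comparison is a design choice of this sketch, not something the procedure requires.

```python
import hashlib
import hmac


def compute_pic(secret, ppn):
    """PIC: first four hex characters of SHA-256(ES + PPN), in upper case."""
    digest = hashlib.sha256(secret.encode("utf-8") + ppn.encode("utf-8")).hexdigest()
    return digest[:4].upper()


def verify_code(secret, code, pic_length=4):
    """Split a concatenated PPN+PIC code and check the PIC.

    Returns the PPN if the recalculated PIC matches, otherwise None.
    """
    code = code.strip()
    if len(code) <= pic_length:
        return None  # too short to contain both a PPN and a PIC
    ppn, pic = code[:-pic_length], code[-pic_length:].upper()
    expected = compute_pic(secret, ppn)
    # compare_digest compares in constant time, so timing reveals nothing about the PIC
    if hmac.compare_digest(expected.encode("utf-8"), pic.encode("utf-8")):
        return ppn
    return None
```

A return value of None means the code does not belong to this experiment or was mistyped; any other return value is the PPN under which the participant's data is stored.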

Example

Figure: Calculating the Participant Identification Code in LibreOffice

Online

Try this calculator (https://www.socsci.ru.nl/wilberth/computer/pic.html) to make these checksums yourself.

OpenOffice / LibreOffice

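If you install the cryptographic hash extension (https://extensions.openoffice.org/en/project/cryptographic-hash-functions-uno-component-openofficeorg) in OpenOffice / LibreOffice, you can use this document (https://www.socsci.ru.nl/wilberth/nocms/computer/pic.ods) to calculate the PIC. You may have to enable macros for it to work.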

Google Sheets

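You can also calculate the PICs with this Google Sheet (https://docs.google.com/spreadsheets/d/18jmxpui2rr3dShySQiyn06kX36cCHHGRgan2MGGXjlo/edit?usp=sharing). Use File -> Make a copy if you want to alter the document.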

Python 3

In Python 3 you can easily calculate the PIC. Note that both secret and ppn must be strings.

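 #!/usr/bin/env python3
 import hashlib
 
 secret = "mySecret123!"  # Experiment Secret (ES), keep it private
 ppn = "0"                # Participant Number (PPN), as a string
 # PIC: first four hex characters (two bytes) of SHA-256(ES + PPN), in upper case
 pic = hashlib.sha256(secret.encode('utf-8') + ppn.encode('utf-8')).hexdigest()[0:4].upper()
 print(pic)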

Rationale

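If you simply give the anonymous PPN to your participants, they can also identify themselves, but the PPN will have to be sufficiently long and random to make sure that a participant cannot guess someone else's PPN as well. If you generate PPNs the same way for every experiment, then anyone who knows how you do it for one experiment can do it for another and pretend to be a participant. Because the PIC depends on the Experiment Secret, the PPN itself can stay short and predictable: without the ES, nobody can calculate the PIC that belongs to another PPN or to another experiment.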
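
To put a number on how hard guessing is, the short sketch below counts the possible PIC values. It assumes that a person without the ES can do no better than guess, so that every four-character hexadecimal PIC is equally likely; the function name guess_success_probability is hypothetical.

```python
PIC_HEX_CHARS = 4                    # length of the PIC in hexadecimal characters
possible_pics = 16 ** PIC_HEX_CHARS  # 65536 different PIC values


def guess_success_probability(attempts):
    """Chance that at least one of `attempts` distinct guesses is the right PIC."""
    return min(attempts, possible_pics) / possible_pics


print(possible_pics)                 # 65536
print(guess_success_probability(1))  # chance of a single blind guess
```

A single blind guess for a given PPN therefore succeeds about once in 65536 tries, which is why a short PPN combined with a PIC is harder to forge than a short PPN alone.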

Technical Details

The PIC is the upper-case, four-character hexadecimal representation of the first two bytes of the SHA-256 hash of the concatenation of the UTF-8 encodings of the Experiment Secret and the Participant Number, in that order.
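
The definition above can be checked against the shorter form used in the Python 3 example. The sketch below computes the PIC literally from the first two bytes of the digest and compares it with the first four characters of the hexadecimal digest; the function names pic_from_bytes and pic_from_hexdigest are hypothetical.

```python
import hashlib


def pic_from_bytes(secret, ppn):
    """Follow the definition literally: hex of the first two bytes of the hash."""
    data = secret.encode("utf-8") + ppn.encode("utf-8")  # ES first, then PPN
    first_two_bytes = hashlib.sha256(data).digest()[:2]
    return first_two_bytes.hex().upper()


def pic_from_hexdigest(secret, ppn):
    """Shortcut: first four characters of the hexadecimal digest."""
    data = secret.encode("utf-8") + ppn.encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:4].upper()


if __name__ == "__main__":
    # Both forms give the same result, because one byte is two hex characters.
    print(pic_from_bytes("mySecret123!", "0") == pic_from_hexdigest("mySecret123!", "0"))
```

Both functions return the same string for any input, since each byte of the hash corresponds to exactly two hexadecimal characters.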