What is data pseudonymization and how does it relate to the GDPR?

Data pseudonymization is one method of data protection that helps organisations comply with the General Data Protection Regulation (GDPR). It protects the rights of individual data subjects and gives organisations some flexibility to balance these privacy rights against legitimate business goals.

The reasoning behind the GDPR is to protect the rights of data subjects in the European Union, where privacy and the protection of personal data are recognised as fundamental rights. Data pseudonymization makes it hard to identify individuals by replacing identifiable characteristics with artificial identifiers (pseudonyms). The purpose is to ensure that the data can no longer be attributed to a specific person without additional information.

Why pseudonymize data?

This data protection approach is helpful if an organisation is likely to face situations such as the following:

  • Where third parties often need access to IT equipment, for repairs for example, or where employees need to take equipment away from their place of work, with the related threat of loss or theft of data, or of “shoulder surfing” in public spaces.
  • Where pseudonymization forms part of a management strategy to protect users, or of an internal data policy that reduces the risk of a data breach when sharing data with third-party data processors or external data controllers.
  • Where personnel regularly require access to personal data.
  • Where pseudonymization can assist in the practice of data housekeeping.

Many high-profile organisations, such as Google, Apple, and Uber, already use this practice regularly so that they can analyze data with less risk of repercussions from regulators. In this respect, pseudonymization is a welcome approach that enhances data protection in the event of a security breach and provides a higher level of privacy protection for employees and users.

Is pseudonymization risk-free?

Pseudonymization used alone still carries the risk of a breach because the original, identifying data is still present in some form and, as such, may be re-identified by linkage or inference, revealing the data subject. Here, it is vital to differentiate between pseudonymization and anonymization. Pseudonymization provides enhanced but still limited protection: it obscures the data so that individuals cannot be identified from it alone, but the obfuscated data can be re-identified when linked with additional data, a bit like a key that unlocks a door.

Pseudonymized data must be kept entirely separate from the linking data that would attribute it to an individual’s identity. Robust organisational measures need to be put in place to prevent any third party from linking the data. Conversely, anonymization strips away all personal information, deleting all original data, and is a permanent method of protecting privacy. Organisations usually use this method to collect relevant data for specific purposes, such as testing new systems, analyzing patterns in surveys or any other instance where the data does not need to be tied to a particular individual. In this respect, the information is collected as needed, then stripped back and the original data deleted, to provide a snapshot of “what” is happening rather than “who” the data refers to.

Pseudonymization is subject to tests intended to reduce the risk of breaches. To pass the reasonableness test, pseudonymized data needs to withstand a “motivated intruder” analysis, which examines closely how likely stolen data is to be re-identified. As already mentioned, robust measures need to be in place to keep that likelihood low.
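
The linkage risk that a “motivated intruder” analysis looks for can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the records, the field names and the link function are all invented for this example, and a real assessment would consider far more than an exact match on two fields.

```python
from collections import Counter

# Hypothetical data: every name and value below is invented for illustration.
# The name has been replaced by a pseudonym, but quasi-identifiers
# (postcode, birth year) are still present in the pseudonymized records.
pseudonymized = [
    {"pid": "P-001", "postcode": "AB1 2CD", "birth_year": 1980, "diagnosis": "asthma"},
    {"pid": "P-002", "postcode": "EF3 4GH", "birth_year": 1975, "diagnosis": "diabetes"},
    {"pid": "P-003", "postcode": "EF3 4GH", "birth_year": 1975, "diagnosis": "migraine"},
]

# Auxiliary data that an intruder might obtain from another source.
auxiliary = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1980},
    {"name": "Bob Example", "postcode": "EF3 4GH", "birth_year": 1975},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year")


def combo(row, keys):
    """Return the tuple of quasi-identifier values for one row."""
    return tuple(row[k] for k in keys)


def link(pseudonymized_rows, auxiliary_rows, keys=QUASI_IDENTIFIERS):
    """Return {pseudonym: name} for rows that can be linked unambiguously.

    A row counts as linked only if its quasi-identifier combination is
    unique among the pseudonymized rows and matches exactly one auxiliary row.
    """
    counts = Counter(combo(r, keys) for r in pseudonymized_rows)
    names_by_combo = {}
    for r in auxiliary_rows:
        names_by_combo.setdefault(combo(r, keys), []).append(r["name"])

    matches = {}
    for r in pseudonymized_rows:
        c = combo(r, keys)
        names = names_by_combo.get(c, [])
        if counts[c] == 1 and len(names) == 1:
            matches[r["pid"]] = names[0]
    return matches


print(link(pseudonymized, auxiliary))  # {'P-001': 'Alice Example'}
```

In this sketch one record is re-identified even though no name was stored with it, while the two records that share a postcode and birth year cannot be told apart. Generalising or removing such fields is one way to lower the likelihood of re-identification.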

However, even with robust safeguards against re-identification, pseudonymized data still falls within the remit of personal data and does not benefit from the exemption from the GDPR that applies to anonymized data.

How can pseudonymization be implemented within an organisation?

There are several ways to pseudonymize personal data. The right choice will depend upon the organisation and on its data protection impact assessment (DPIA). An organisation should consider the following approaches:

  1. Privacy by design is an effective and popular strategy. It allows for improved privacy protection because pseudonymization is implemented at the outset of any new project.
  2. Scrambling of data uses software techniques to mix up or obfuscate letters so that the data becomes unrecognizable.
  3. Encryption renders the original data unintelligible, and the process cannot be reversed without access to the correct decryption key. Under the GDPR, additional information (such as decryption keys) must be stored separately from the pseudonymized data.
  4. Masking is a technique that hides certain parts of the data behind random characters or other data. Banking organisations are very familiar with this type of pseudonymization, for example the masking of credit card numbers so that they are stored as “XXXX XXXX XXXX 0001”.
  5. Tokenisation is considered best practice and is used by many payment providers such as credit card companies. It protects data by replacing sensitive data with tokens while keeping it in a format that can still be processed. For many organisations, it is the preferred method of pseudonymization. It can suit legacy systems because it preserves the data that is specifically required for processing and analysis while keeping the sensitive data hidden.
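  6. Hashing transforms data into a fixed-length value called a hash value, which acts as a summary of the original data. A cryptographic hash is designed to be one-way, so the original data cannot feasibly be computed back from the hash value. Hash functions are generally public, however, so predictable inputs such as names or telephone numbers can be recovered by hashing guesses and comparing the results. A keyed hash, with the secret key stored separately, makes this far harder.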
  7. Restricting access to encryption keys means that only authorized users can decrypt encrypted data, which prevents third parties from reading it.
  8. Public key cryptography allows one user to encrypt data with a second user’s public key so that only the second user’s private key can decrypt it. Neither user has to share a secret key with the other.
  9. Physical partitioning means that systems are managed independently by separate teams and run on separate hardware and resources that cannot be shared.
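
Three of the techniques listed above (masking, keyed hashing and tokenisation) can be sketched in a few lines of Python using only the standard library. This is a minimal sketch rather than a production design: the function and class names are hypothetical, the key is generated in memory only for the demonstration, and the token vault is a plain dictionary standing in for a separately secured store.

```python
import hashlib
import hmac
import secrets


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'X', keeping spacing."""
    digits_total = sum(ch.isdigit() for ch in card_number)
    to_hide = max(digits_total - visible, 0)
    out = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            out.append("X")
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


def keyed_hash(value: str, key: bytes) -> str:
    """Return an HMAC-SHA256 pseudonym; the key must be stored separately."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Hypothetical in-memory token vault mapping random tokens to values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._token_to_value[token]


# Example usage with invented values.
print(mask_card_number("4111 1111 1111 0001"))  # XXXX XXXX XXXX 0001

key = secrets.token_bytes(32)  # assumption: a real key would live in a key store
print(keyed_hash("alice@example.com", key))

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 0001")
print(token, "->", vault.detokenize(token))
```

The three functions trade off differently: masking is irreversible but keeps part of the value readable, a keyed hash gives a stable pseudonym that cannot be reversed without guessing inputs and knowing the key, and a token can be exchanged for the original value by anyone with access to the vault.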

Benefits of pseudonymization

The GDPR names data pseudonymization as an appropriate measure that organisations may implement as a data protection method. The practical benefits highlighted below may not be as far-reaching as those of anonymization, but they still help organisations to meet their data privacy obligations:

  • Organisations can be confident in continuing with existing policies and processes that would otherwise be impossible under the strict data privacy regulations.
  • Data can be kept, albeit not intact, and costs are limited to those of protecting the data rather than managing the entire data set.
  • Pseudonymization allows personal data to be reattached to the data subject whenever the GDPR requires it, for example to honour the right to be forgotten or to answer a data subject access request (DSAR). This is not possible with anonymization unless the original data is retained, which is a paradox of the concept of anonymization. Data pseudonymization is therefore the data protection method that works better for dealing effectively with DSARs.
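
The last benefit can be made concrete with a short Python sketch of how a separately stored linking table might be used to answer an access request or an erasure request. Everything here is an assumption made for illustration: the PseudonymRegistry class, the two request functions and the sample records are hypothetical, and whether a given erasure procedure satisfies the GDPR is a legal question that this sketch does not settle.

```python
import secrets


class PseudonymRegistry:
    """Hypothetical linking table, kept apart from the pseudonymized data."""

    def __init__(self):
        self._subject_to_pseudonym = {}

    def pseudonym_for(self, subject_id: str) -> str:
        # Create a random pseudonym on first use and reuse it afterwards.
        if subject_id not in self._subject_to_pseudonym:
            self._subject_to_pseudonym[subject_id] = "P-" + secrets.token_hex(6)
        return self._subject_to_pseudonym[subject_id]

    def lookup(self, subject_id: str):
        return self._subject_to_pseudonym.get(subject_id)

    def forget(self, subject_id: str):
        # Remove the link and return the pseudonym it pointed to, if any.
        return self._subject_to_pseudonym.pop(subject_id, None)


def access_request(registry, records, subject_id):
    """Return every pseudonymized record that belongs to the data subject."""
    pseudonym = registry.lookup(subject_id)
    if pseudonym is None:
        return []
    return [r for r in records if r["pid"] == pseudonym]


def erasure_request(registry, records, subject_id):
    """Delete the link and the subject's records; return the remaining records."""
    pseudonym = registry.forget(subject_id)
    if pseudonym is None:
        return records
    return [r for r in records if r["pid"] != pseudonym]


# Example usage with invented subjects and purchases.
registry = PseudonymRegistry()
records = [
    {"pid": registry.pseudonym_for("alice@example.com"), "purchase": "book"},
    {"pid": registry.pseudonym_for("bob@example.com"), "purchase": "lamp"},
    {"pid": registry.pseudonym_for("alice@example.com"), "purchase": "pen"},
]

print(len(access_request(registry, records, "alice@example.com")))  # 2
records = erasure_request(registry, records, "alice@example.com")
print(len(records))  # 1
```

With anonymized data there would be no registry to consult, so neither request could be matched to a person's records.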