Blinding Identity Taxonomy
Reducing the risk of identifying governing entities in blinded datasets
With the EU’s General Data Protection Regulation (GDPR) becoming enforceable on May 25th, 2018, followed almost exactly a month later by the signing of the California Consumer Privacy Act (CCPA) into law on June 28th, civil society witnessed the dawn of a new era of data protection and privacy legislation that continues to shape today’s digital landscape. Broadly, these laws have shifted control over data governance from corporate entities towards citizens, strengthening an individual’s right to self-determination over their personal data. Since mid-2018, the rapid rise of decentralisation projects and initiatives in data governance, distributed ledger technologies (DLT) and contextual trust has continued to drive the legal and technical innovations needed to empower a dynamic data economy (DDE).
An equal measure of human accountability and cryptographic assurance is required to establish trust in digital systems. Although regulations such as the GDPR and CCPA provide a strong legal framework for human and corporate accountability in the handling of personal data, the word “personal” becomes ambiguous when it comes to technical implementation and to the level of cryptographic assurance required to protect the identity of a governing entity.
Article 4 of the GDPR defines “personal data” as follows:
‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
The above definition is not granular enough to enforce the flagging of, for example, a transaction date or an unstructured data input, either of which could later be used in a re-identification attack on a dataset to extract personally identifiable information (PII) about a data principal for purposes the principal has not consented to. Cryptographic assurance can only be established if attributes that require cryptographic encoding to reduce the risk of identifying a governing entity are flagged at the time of schema creation. To do this consistently, software developers need a community-defined list of elements that can act as a common standard for deciding which attributes to flag.
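To make the idea of flagging at schema-creation time concrete, here is a minimal sketch in which a schema issuer assigns each attribute a category and any attribute whose category appears in an element list is flagged. The element list, the `Attribute` type and the `flag_attributes` function are hypothetical illustrations, not the official BIT elements or any published API.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated element list for illustration only;
# it is NOT the official BIT list of elements.
IDENTIFYING_ELEMENTS = {"name", "date", "phone", "email", "address", "ip_address", "free_text"}


@dataclass
class Attribute:
    name: str
    element: str  # category the schema issuer assigns to this attribute
    flagged: bool = False


def flag_attributes(attributes: list) -> list:
    """Flag every attribute whose category appears in the element list."""
    for attr in attributes:
        attr.flagged = attr.element in IDENTIFYING_ELEMENTS
    return attributes


# A transaction date and an unstructured comment field are flagged alongside the name.
schema = flag_attributes([
    Attribute("full_name", "name"),
    Attribute("transaction_date", "date"),
    Attribute("comments", "free_text"),
    Attribute("amount", "currency"),
])

print([a.name for a in schema if a.flagged])  # ['full_name', 'transaction_date', 'comments']
```

Flagging happens once, when the schema is defined, so every record later captured against that schema inherits the same markings.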
The Blinding Identity Taxonomy (BIT) is a defensive tool against re-identification attacks, created to reduce the risk of identifying data subjects within blinded datasets. BIT contains a list of elements that schema issuers and data controllers can refer to when flagging attributes that may contain identifying information about governing entities. Once attributes have been flagged, any marked data can be removed or encrypted during the data lifecycle; once that has been done, the dataset is ‘blinded’. A dataset may be said to be successfully ‘blinded’ when an adversary with access to it cannot identify a significant number of the data principals it contains.
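The blinding step can be sketched as a function that takes a record and the set of flagged attributes and either drops the flagged values or transforms them. This is a minimal sketch under stated assumptions: `blind_record`, `FLAGGED` and the key are hypothetical, and, to keep the example to the Python standard library, a keyed HMAC-SHA256 digest stands in for encryption. That substitution is pseudonymisation, not reversible encryption.

```python
import hashlib
import hmac

# Hypothetical set of flagged attributes, e.g. produced when the schema was created.
FLAGGED = {"full_name", "transaction_date", "comments"}


def blind_record(record: dict, flagged: set, key: bytes, mode: str = "remove") -> dict:
    """Return a copy of `record` with every flagged attribute removed or transformed.

    mode="remove" drops flagged attributes entirely.
    mode="hmac" replaces each flagged value with a keyed HMAC-SHA256 hex digest,
    used here as a stand-in for encryption (it pseudonymises; it cannot be decrypted).
    """
    if mode not in ("remove", "hmac"):
        raise ValueError(f"unknown mode: {mode}")
    blinded = {}
    for attr, value in record.items():
        if attr not in flagged:
            blinded[attr] = value  # unflagged data passes through unchanged
        elif mode == "hmac":
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            blinded[attr] = digest.hexdigest()
        # mode == "remove": the flagged attribute is simply omitted
    return blinded


record = {"full_name": "Jane Doe", "transaction_date": "2020-06-16",
          "comments": "Paid at the Zurich branch", "amount": 42.0}
key = b"replace-with-a-securely-stored-secret"  # hypothetical key

print(blind_record(record, FLAGGED, key))  # {'amount': 42.0}
pseudonymised = blind_record(record, FLAGGED, key, mode="hmac")
print(sorted(pseudonymised))  # ['amount', 'comments', 'full_name', 'transaction_date']
```

Removal is the stronger option when downstream users never need the flagged values. A keyed hash is deterministic, so equal inputs produce equal outputs and records can still be linked to one another; whether the result counts as successfully blinded still depends on what an adversary could infer from the remaining attributes.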
The Human Colossus Foundation is proud to cite Paul Knowles and Jan Lindquist, two of the Foundation’s original instigators, as the primary driving forces behind the BIT. Paul and Jan donated the intellectual property rights of the BIT to Kantara Initiative on January 22nd, 2020. Although the BIT is not referenced as an official appendix in the GDPR, the CCPA or similar national regulations, its publication as an official Kantara Initiative report on June 16th, 2020, established it as the most comprehensive taxonomy of its kind in open circulation.
The BIT is one of those critical pieces of behind-the-scenes plumbing that is expected to fundamentally improve the online protection of personal data as deployment rates rise in both traditional and DLT domains. Data-centric innovation points to a future society in which new values and services evolve to make people’s lives more comfortable and sustainable. In line with technological advances in Artificial Intelligence (AI) and Internet of Things (IoT) solutions, the list of BIT elements is expected to mature continually alongside the rapid growth in data points captured across digital systems. Kantara Initiative will publish future versions of the BIT in a concerted effort to combat the misuse of personal data. We hope that the BIT will prove to be a useful and practical guide for implementers, both as a de-identification technique and as a resource for providing stakeholders with assurances about their datasets.