Overlays Capture Architecture
Providing a standardised global solution for deterministic semantics
The post-millennial generation has witnessed an explosion of captured data points, which has sparked profound possibilities in both Artificial Intelligence (AI) and Internet of Things (IoT) solutions. It has also brought a collective realisation: society’s current technological infrastructure is simply not equipped to fully protect personally identifiable information (PII), to entice corporations to break down internal data silos and streamline data harmonisation processes, or, ultimately, to resolve worldwide data duplication and storage resource issues.
The FAIR Data Principles (Wilkinson et al., 2016), a set of guiding principles to make data findable, accessible, interoperable and reusable, provide guidance for scientific data management and stewardship and are relevant to all stakeholders in the digital economy. In light of current data management deficiencies, Overlays Capture Architecture (OCA) was born: built in accordance with the FAIR principles and informed by practical knowledge and experience gained from managing data within the complex realm of clinical trials. This blog post provides an introduction to OCA, an architecture purposefully designed to act as a catalyst for data language unification and a seed component in the enablement of a new dynamic data economy (DDE).
The cognitive framework available for data management requirements across all industry sectors continues to be hampered by limitations in the foundational structure of currently deployed electronic capture solutions. In terms of schema design, OCA represents a schema as a multi-dimensional object consisting of a stable capture base and interoperable overlays. By introducing overlays as task-oriented linked data objects within the schema stack, OCA offers an optimal level of both efficiency and interoperability in alignment with the FAIR principles. The architecture provides a stable infrastructure to facilitate seamless data harmonisation and interoperability processes, not only between internal departments and functions, but also between external companies working under a multi-layered governance framework as defined by, for instance, an industry consortium.
OCA: a cornerstone in a new Dynamic Data Economy
In terms of the creation of a fully-fledged DDE, we are still in the early, turbulent stages of business, legal, technological and social innovation. To support a new paradigm of decentralised interactions, processes and practices, such as de-identification, dissolution of corporate “walled-garden” storage, worldwide data harmonisation and single-source master data retrieval, new data-centric innovation is required. Despite these required revisions, one thing is certain: civil society is heading into an era where siloed data ownership will be superseded by consented access to subsets of decentralised data. A new and finer understanding of how data is entered and captured on an ever-expanding and intrusive Internet continues to fuel a decentralisation movement, one consisting of active and passive identifiers that enable the interaction of authentic inputs and deterministic semantics beyond the confines of siloed data management solutions. The concept of self-determination regarding personal data is rapidly evolving in technological circles with the advancement of decentralised digital identity solutions.
Where decentralised digital identity establishes whether data inputs can be trusted as having come from an authentic source under the governance of an entity, decentralised semantics ensure that the contextual meaning of inputted data remains intelligible to all interacting actors. It is within the context of these two domains that informed consent can be defined for auditing data portability in a DDE, and OCA is the catalyst that enables consensual data flows.
OCA is the cornerstone for a safe and secure data sharing economy. The architecture provides an interoperable data capture solution for both centralised and decentralised networks. Blockchains and other distributed ledger technologies (DLTs) have the potential to drive uniform data processing mechanisms, verifiable proof of consent, secure data portability and decentralised digital identity. With OCA providing a standardised global solution for data capture, community-driven data standards, interoperable data capture objects and sensitive attribute flagging capability can also be realised. These core public utility technologies provide an ideal evolutionary bridge between the current data economy and a new DDE where data-centric and user-centric characteristics become synonymous whether the user is a human or an active thing.
Data-centric innovation points to a future society where new values and services will be created continuously, making people’s lives more comfortable and sustainable. Developing and deploying the right data capture architecture will improve the quality of externally pooled data for future AI and IoT solutions. OCA was conceived for this purpose.
Interoperable Schema
OCA harmonises data models: it is a solution for semantic harmonisation between data models and data representation formats. Primarily devised for data object interoperability and privacy-compliant data sharing, OCA is a standardised global solution for data capture that promises to significantly enhance the ability to pool data more effectively in terms of simplicity, accuracy and allocation of resources.
A schema, a machine-readable definition of the semantics of a data structure, is typically created as a single data object. However, OCA represents a schema as a multi-dimensional object consisting of a stable capture base and linked overlays, data objects that provide colouration to the base object. Any issuer can use a pre-existing capture base and build their own suite of overlays to add extra context and transform how information is displayed to a viewer.
The degree of separation between capture bases and overlays allows multiple parties to use the same base object for similar data capture requirements, thus providing a standard base from which to decentralise data. The ingredients provided by these interoperable data objects enable an ontology-driven, dynamic approach to data management.
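To make the capture-base/overlay split concrete, here is a minimal sketch in Python using plain dictionaries. The field names and object shapes are illustrative assumptions, not the normative OCA serialisation: two issuers reuse the same capture base while each supplies their own label colouration.

```python
# Illustrative sketch of the capture base / overlay split.
# Field names are simplified assumptions, not the normative OCA format.

# A stable capture base: attribute names and types only.
capture_base = {
    "type": "capture_base",
    "attributes": {
        "first_name": "Text",
        "date_of_birth": "Date",
        "blood_type": "Text",
    },
}

# Two issuers reuse the same base, each adding their own colouration.
label_overlay_en = {
    "type": "label_overlay",
    "language": "en",
    "attribute_labels": {
        "first_name": "First name",
        "date_of_birth": "Date of birth",
        "blood_type": "Blood type",
    },
}

label_overlay_fr = {
    "type": "label_overlay",
    "language": "fr",
    "attribute_labels": {
        "first_name": "Prénom",
        "date_of_birth": "Date de naissance",
        "blood_type": "Groupe sanguin",
    },
}

def labels_for(locale):
    """Pick presentation labels without ever touching the capture base."""
    overlays = {"en": label_overlay_en, "fr": label_overlay_fr}
    return overlays[locale]["attribute_labels"]
```

Note that swapping the French overlay for the English one changes only presentation; the capture base, and therefore the captured data, stays identical for both parties.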
OCA offers many advantages
Simplified data pooling. Decoupling can occur at any time as overlays are linked objects. With all colouration stored in the overlays, combining data from related sources becomes much easier. Overlays can be removed from the base objects before the data merging process begins and reapplied to ensure consistent colouration post data pooling.
Stable capture bases. Most schema updates tend to be done at the application stage. In the case of OCA, all extension, colouration, and functionality definitions are applied in the overlays. This enables issuers to edit one or more of the linked objects to create simple updates rather than having to reissue capture bases on an ongoing basis.
Flagged attributes for encryption. By referencing the Blinding Identity Taxonomy (BIT), issuers can flag attributes in the capture base that could potentially unblind the identity of a governing entity. With attributes flagged at the base object layer, all corresponding data can be treated as sensitive throughout the data lifecycle and encrypted or removed at any stage, greatly reducing the risk of identifying the associated governing entity.
Data decentralisation. Capture base definitions can remain in their purest form, thus providing a standard base from which to decentralise data. Once the data holder has given adequate consent, data controllers can contribute anonymous data to decentralised data sharing hubs, upon which third parties can run accurate, criteria-based searches for matching data. This eliminates the need for data silos and encourages consented data sharing. The data holder is empowered by self-determination regarding secondary use of their personal data.
Internationalisation. As character set encoding definitions are captured in a separate linked data object, a single report definition can contain different attribute forms for different languages available to users, based on a user’s locale and other language preferences.
Method and Defined Data Objects
Rather than a schema being defined as a single data object, OCA represents a schema as a multi-dimensional object consisting of a capture base and linked overlays. Each of these data objects serves a specific function in the overall schema definition; when amalgamated, they provide a set of metadata that adequately describes a single set of data.
Each data object contains its own decentralised resource identifier (DRI), a type of passive identifier containing a cryptographic hash of content which is globally unique, resolvable with high availability, and cryptographically verifiable. In order for an overlay to be linked to a capture base, the DRI of the base object must be referenced in the metadata block of the overlay.
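A minimal sketch of this linking mechanism, assuming the DRI is a digest over the canonicalised object content. The hash algorithm (SHA-256) and the JSON serialisation used here are illustrative assumptions, not a statement of how OCA derives its identifiers.

```python
# Sketch of DRI-style content addressing and overlay linking.
# Hash choice and canonicalisation are assumptions for illustration.
import hashlib
import json

def dri(obj):
    """Derive a content-based identifier from a canonicalised object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

capture_base = {"type": "capture_base", "attributes": {"dob": "Date"}}

overlay = {
    "type": "format_overlay",
    "capture_base": dri(capture_base),   # link via the base object's DRI
    "attribute_formats": {"dob": "YYYY-MM-DD"},
}

# Any consumer can verify the link by recomputing the digest:
# the overlay is bound to exactly one version of the base object.
assert overlay["capture_base"] == dri(capture_base)
```

Because the identifier is derived from content, any change to the capture base produces a different DRI, so previously issued overlays can no longer silently point at a modified base.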
CAPTURE BASE
A capture base is a stable base object that defines a single set of data in its purest form thus providing a standard base from which to decentralise data.
Attribute names and types are defined in the capture base. The construct also contains a blinding block which allows the issuer to flag any attributes that could potentially unblind the identity of a governing entity. With these attributes flagged at the base layer, all corresponding data can be treated as sensitive throughout the data lifecycle and encrypted or removed at any stage thus reducing the risk of identifying governing entities in blinded datasets.
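The blinding block can be pictured as follows. This is a hedged sketch, not the normative format: a list of flagged attribute names travels with the capture base, and any processor can blind the corresponding values, here with a salted hash, before data leaves the governed environment.

```python
# Sketch of sensitive-attribute handling via a blinding block.
# The field names and the redaction strategy are illustrative assumptions.
import hashlib

capture_base = {
    "type": "capture_base",
    "attributes": {"full_name": "Text", "blood_type": "Text"},
    "flagged_attributes": ["full_name"],  # could unblind a governing entity
}

def blind(record, base, salt="example-salt"):
    """Replace flagged values with a salted hash; pass the rest through."""
    out = {}
    for attr, value in record.items():
        if attr in base["flagged_attributes"]:
            digest = hashlib.sha256((salt + ":" + value).encode("utf-8"))
            out[attr] = digest.hexdigest()
        else:
            out[attr] = value
    return out

blinded = blind({"full_name": "Ada Lovelace", "blood_type": "O+"}, capture_base)
```

Because the flags live in the base object rather than in application code, every downstream consumer of the schema sees the same attributes as sensitive.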
META OVERLAY
A meta overlay is a core linked object that can be used to add contextual meta-information about the schema, including schema name, description and broad classification schemes.
CHARACTER ENCODING OVERLAY
A character encoding overlay is a core linked object that can be used to define the character set encoding (e.g. UTF-8, ISO-8859-1, Windows-1251, Base58Check, etc.). This overlay type is useful when implementing solutions that facilitate data inputs across multiple languages.
FORMAT OVERLAY
A format overlay is a core linked object that can be used to add formats, field lengths, or dictionary coding schemes to schema attributes.
ENTRY OVERLAY
An entry overlay is a core linked object that can be used to add predefined field values in a specified language to schema attributes. To minimise the risk of capturing unforeseen PII, the implementation of free-form text fields is best avoided. This overlay type enables structured data to be entered, thereby mitigating the risk of capturing and subsequently storing dangerous data.
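As a sketch, an entry overlay can be thought of as a per-attribute list of permitted values against which inputs are validated before capture (the object shape here is an illustrative assumption):

```python
# Sketch of input validation against an entry overlay's predefined values.
# Object shape is an illustrative assumption, not the normative format.
entry_overlay = {
    "type": "entry_overlay",
    "language": "en",
    "attribute_entries": {"blood_type": ["A", "B", "AB", "O"]},
}

def validate(attr, value):
    """Accept only predefined entries; attributes without entries pass through."""
    allowed = entry_overlay["attribute_entries"].get(attr)
    return allowed is None or value in allowed

validate("blood_type", "AB")   # structured, predefined entry: accepted
validate("blood_type", "ABX")  # free-form value: rejected, never stored
```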
LABEL OVERLAY
A label overlay is a core linked object that can be used to add labels in a specified language to schema attributes and categories. This overlay type enables labels to be displayed in a specific language at the presentation layer for better comprehensibility to the end user.
INFORMATION OVERLAY
An information overlay is a core linked object that can be used to add instructional, informational or legal prose to assist the data entry process.
SENSITIVE OVERLAY (HOLDER ONLY)
In contrast to other overlay types, which are assigned by an issuer, a sensitive overlay is an optional object assigned by the data holder that can be used to flag user-defined sensitive attributes. For example, gender is not defined as a PII element in its most common presentation of male or female as, in isolation, it cannot identify an individual. However, Thailand has 18 gender identities that are recognised in the local lexicon which, due to the narrower subgroups, may be deemed sensitive by a Thai citizen. In this case, a sensitive overlay could be coupled to a data vault on a personal device or a data repository held by a trusted agent to flag the element.
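Bringing the defined data objects together, a complete schema can be pictured as a bundle of one capture base plus its linked overlays, each resolving back to the base via its DRI. The object shapes and the digest construction below are simplified assumptions for illustration, not the normative OCA serialisation.

```python
# Sketch of a schema bundle: one capture base plus linked overlays.
# Shapes and digest construction are simplified assumptions.
import hashlib
import json

def dri(obj):
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

capture_base = {
    "type": "capture_base",
    "attributes": {"first_name": "Text", "blood_type": "Text"},
    "flagged_attributes": ["first_name"],
}
base_dri = dri(capture_base)

bundle = {
    "capture_base": capture_base,
    "overlays": [
        {"type": "meta_overlay", "capture_base": base_dri,
         "name": "Donor intake form", "description": "Example schema"},
        {"type": "label_overlay", "capture_base": base_dri, "language": "en",
         "attribute_labels": {"first_name": "First name",
                              "blood_type": "Blood type"}},
        {"type": "entry_overlay", "capture_base": base_dri, "language": "en",
         "attribute_entries": {"blood_type": ["A", "B", "AB", "O"]}},
    ],
}

# Every overlay resolves back to the same stable base object,
# so overlays can be added, swapped or removed without reissuing the base.
assert all(o["capture_base"] == base_dri for o in bundle["overlays"])
```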
Conclusion and Current Developments
Primarily devised for data object interoperability and privacy compliant data sharing, OCA significantly enhances the ability to pool data more effectively in terms of simplicity, accuracy, and allocation of resources. The degree of separation between capture bases and overlays allows multiple parties to use the same base objects for similar data capture requirements thus providing a standard base from which to decentralise data.
OCA enables the unification of data languages and aims to provide a standardised global solution for data capture which can facilitate decentralisation of non-sensitive data for societal benefit.
Within the Trust over IP Foundation, a Linux Foundation project, trustees of the Human Colossus Foundation have convened an Inputs and Semantics Working Group (ISWG), subdivided into an Inputs Domain Group (ISWG-I) and a Semantic Domain Group (ISWG-S). The mission of the ISWG-S is to define an Internet-scale data capture architecture consisting of stable capture bases and interoperable overlays where multiple parties can interact with and contribute to the schema structure without having to change the capture base definition. OCA is the core public utility technology of choice in the ISWG-S. Health Care, Notice & Consent, Privacy & Risk, and Storage & Portability task forces have also been convened under the ISWG to integrate the technology into a Dynamic Data Economy (DDE).
Check out Part 3: Semantic Domain of the webinar “Core public utility technologies for a ‘next generation’ internet” where Paul Knowles, Head of the Advisory Council at The Human Colossus Foundation, explains how OCA is set to become a breakthrough solution for deterministic semantics and brings with it the promise of unifying a data language through semantic harmonisation across a digital network.
To learn more about the advantages of interoperable schema design, including any use cases that may benefit from the architecture, follow our upcoming blog posts, join the Human Colossus Foundation Matrix room, or email us directly at info@humancolossus.org.
Official OCA website: oca.colossi.network
This work is licensed under a Creative Commons Attribution 4.0 International License.
Note: The Human Colossus Foundation will be introducing key components of a Dynamic Data Economy (DDE) through a series of blog posts over the course of 2020.