
Health Data For Precision Medicine

Developing technology for patient data collection, protection and sharing.

The Challenge

Skein worked with the University of Oxford’s Digitally Enabled Preventative Health research group to define the requirements and develop technology for patient data collection and analysis. The project aimed to address a critical challenge: the quality of the data used in evaluating clinical outcomes, developing new drugs and vaccines, and delivering precision medicine and personalised health treatments.

The research was funded by a European Institute of Innovation and Technology grant.

Results
Background

Widespread adoption of precision medicine depends on an understanding of the implications of individual variations on drug type, dose, and response in various diseases, and access to high-quality patient data.

Herpes Simplex Virus (HSV) was used as an example of a disease that affects a very large population. Because knowledge of the virus is insufficient, there are still no vaccines for it despite the large number of infections globally. Although there are many studies, researchers are unable to control data provenance and quality. The World Health Organisation identified better collection and management of epidemiologic data as the key first step towards an improved understanding of the virus and thus towards advancing research.

Data heterogeneity is a key problem in standardising epidemiologic data. By developing standardised patient data registry tools, we aimed to facilitate the collection of better-quality data for a systematic study of HSV epidemiology. As part of this project, we also developed an innovative machine learning algorithm for identifying risk groups among people potentially infected with HSV.

 

AI for real-life medical evidence and personal data insights

A comprehensive understanding of patterns requires robust genomic and demographic data, including extended data such as family history, ancestry, biomarker and imaging information.

Anonymisation is one of the critical instruments for providing a secure environment for data sharing. Pseudonymisation techniques provide tools for designing a data collection system that users can access without revealing their identity. This requires a robust cryptographic hash function to anonymise information related to the patient’s identity, together with solutions for reversible pseudonym generation.
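The combination of a cryptographic hash and a reversible pseudonym can be sketched as follows. This is a minimal illustration and not the project's implementation: the choice of HMAC-SHA256 as the keyed hash, the `Pseudonymiser` class, the in-memory lookup table used for re-identification and the sample identifier are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class Pseudonymiser:
    """Keyed-hash pseudonyms plus a lookup table for controlled re-identification."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        # Hypothetical in-memory vault. A real system would keep this mapping
        # in separately secured storage with strict access control.
        self._vault: Dict[str, str] = {}

    def pseudonym(self, patient_id: str) -> str:
        """Return a stable pseudonym; the same input always gives the same token."""
        token = hmac.new(
            self._key, patient_id.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        self._vault[token] = patient_id  # enables reversal by the vault holder only
        return token

    def reidentify(self, token: str) -> Optional[str]:
        """Map a pseudonym back to the identifier, or None if it is unknown."""
        return self._vault.get(token)


# Assumed to be generated once and held in a key-management service.
key = secrets.token_bytes(32)
pseudonymiser = Pseudonymiser(key)

# "patient-0001" is a made-up identifier used only for illustration.
token = pseudonymiser.pseudonym("patient-0001")
```

Because the hash is keyed, someone who sees only the tokens cannot rebuild them from a list of candidate identifiers without the secret key, while the vault holder can still reverse a pseudonym when there is a legitimate need.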

An increasing proportion of older people among users means that the technology implementation needs to take that into account and avoid, for example, a purely app-based implementation. 

To resolve these challenges, we created a patient registry engineering solution using machine learning and data science methods. The source database contained over 600 variables on demographic, socioeconomic, dietary, and health-related information collected by interviews and physical examinations.

Agile delivery

The process followed the UK Government Agile Service Design framework. At the first stage, Discovery, we learned about the users and their contexts and the technological constraints, and defined design requirements and user stories. After completing Discovery, we moved into Alpha with a set of user-focused requirements and design specifications. During the Alpha phase, a technological blueprint was developed.

Data integration and security

We researched the health data exchange standards in operation and development:

  • FHIR (Fast Healthcare Interoperability Resources), a standard developed by Health Level Seven (HL7) that functions as an API for developers to access clinical information from electronic medical records (EMR)
  • openEHR
  • EN/ISO 13606 – Electronic Health Record Communication
  • Extensible Markup Language (XML) 
  • The Resource Description Framework (RDF) and RDF-Schema (RDFS)
  • Simple Knowledge Organization System (SKOS)
  • Common Terminology Services, Release 2 (CTS2) 

FHIR and openEHR are the two most recent, robust and complete healthcare data persistence and exchange specifications.
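To give a sense of what a FHIR payload looks like, the sketch below builds a minimal FHIR `Patient` resource and serialises it to JSON. The field names (`resourceType`, `identifier`, `gender`, `birthDate`) come from the FHIR specification; the helper function, the identifier system URL and the sample values are hypothetical and are not taken from the project.

```python
import json


def build_patient_resource(pseudonym: str, gender: str, birth_year: int) -> dict:
    """Build a minimal FHIR Patient resource keyed by a pseudonym, not a name."""
    return {
        "resourceType": "Patient",
        "identifier": [
            {
                # Hypothetical naming system for registry pseudonyms.
                "system": "https://example.org/registry/pseudonym",
                "value": pseudonym,
            }
        ],
        # FHIR administrative gender codes: male, female, other, unknown.
        "gender": gender,
        # FHIR dates may be partial; a year alone reveals less than a full date.
        "birthDate": str(birth_year),
    }


# Sample values are invented for illustration.
resource = build_patient_resource("a1b2c3", "female", 1950)
payload = json.dumps(resource, indent=2)
```

Recording only a pseudonymous identifier and a birth year is one way to keep such a record useful for epidemiology while limiting the personal data it carries.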

Based on the data standards requirements, including openEHR and FHIR, we planned a database architecture that keeps a history of updates, supports the JSON, XML and RDF formats, and provides OAuth authorisation. The core database is a relational PostgreSQL database, with additional NoSQL storage for unstructured data.
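One common way to keep a history of updates is an append-only table in which every change is stored as a new version rather than overwriting the previous one. The sketch below illustrates the idea under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table name, columns and helper functions are hypothetical rather than the project's actual schema.

```python
import json
import sqlite3
from typing import List, Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with an append-only history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE record_history (
            record_id   TEXT    NOT NULL,
            version     INTEGER NOT NULL,
            payload     TEXT    NOT NULL,  -- JSON document
            recorded_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (record_id, version)
        )
        """
    )
    return conn


def save_version(conn: sqlite3.Connection, record_id: str, payload: dict) -> int:
    """Append a new version of a record; earlier versions are never modified."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM record_history WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO record_history (record_id, version, payload) VALUES (?, ?, ?)",
        (record_id, version, json.dumps(payload)),
    )
    conn.commit()
    return version


def latest(conn: sqlite3.Connection, record_id: str) -> Optional[dict]:
    """Return the most recent version of a record, or None if it does not exist."""
    row = conn.execute(
        "SELECT payload FROM record_history WHERE record_id = ? "
        "ORDER BY version DESC LIMIT 1",
        (record_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


def history(conn: sqlite3.Connection, record_id: str) -> List[dict]:
    """Return every stored version of a record, oldest first."""
    rows = conn.execute(
        "SELECT payload FROM record_history WHERE record_id = ? ORDER BY version",
        (record_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]
```

A caller would invoke `save_version` for every change and read the current state with `latest`, while `history` provides the audit trail of earlier values.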

 

Machine Learning module

We explored and prototyped solutions based on Natural Language Processing (NLP), XGBoost models and Classification and Regression Trees (CART) approaches to reducing informational entropy. Ultimately, a CART Random Forest (RF) model was used for generating questions for users in the HSV Diagnostic Tool implementation.

An anonymous questionnaire based on lifestyle data was devised in Python using a Random Forest algorithm. The algorithm was optimised to reduce the number of questions and to identify risk groups for HSV. We split the data set into training and validation subsets, which were used for training the model and testing its performance.
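The idea of choosing questions that reduce informational entropy can be illustrated with a small information-gain calculation: the question whose answers best separate the risk groups is the most useful one to ask next. This is a sketch under assumptions, not the project's algorithm. The question names, answers and labels are invented toy data, and CART implementations often use Gini impurity rather than entropy as the split criterion.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(answers: Sequence[str], labels: Sequence[str]) -> float:
    """Entropy reduction obtained by splitting the labels on one question's answers."""
    total = len(labels)
    groups: Dict[str, List[str]] = {}
    for answer, label in zip(answers, labels):
        groups.setdefault(answer, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_question(responses: Dict[str, Sequence[str]], labels: Sequence[str]) -> str:
    """Pick the question whose answers reduce label entropy the most."""
    return max(responses, key=lambda q: information_gain(responses[q], labels))


# Toy data, invented for illustration: four respondents, two questions.
labels = ["high", "high", "low", "low"]
responses = {
    "question_a": ["yes", "yes", "no", "no"],  # separates the groups perfectly
    "question_b": ["yes", "no", "yes", "no"],  # carries no information
}
chosen = best_question(responses, labels)
```

In this toy data `question_a` removes all uncertainty about the label while `question_b` removes none, so a questionnaire optimised in this way would ask `question_a` and could drop `question_b`.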
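The training and validation workflow described above can be sketched with scikit-learn. This example does not reproduce the project's model: it assumes scikit-learn and NumPy are installed, uses synthetic yes/no answers in place of the real data set, and the data sizes, labelling rule and hyperparameters are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic questionnaire: 400 respondents answering 12 yes/no questions.
rng = np.random.default_rng(0)
n_respondents, n_questions = 400, 12
X = rng.integers(0, 2, size=(n_respondents, n_questions))

# Invented rule: a respondent is "at risk" only if questions 0 and 3 are both "yes".
y = (X[:, 0] & X[:, 3]).astype(int)

# Hold out a quarter of the data for validation, preserving the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Performance is measured only on data the model has not seen.
accuracy = accuracy_score(y_val, model.predict(X_val))

# Rank questions by importance; low-ranked ones are candidates for removal.
ranked_questions = np.argsort(model.feature_importances_)[::-1]
```

Ranking questions by feature importance is one possible way to shorten a questionnaire: questions that contribute little to the prediction can be removed and the model retrained and validated again.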

See more detail in the published academic paper.

Technology Stack
  • Security: SSL, encryption for data at rest and in transit
  • REST API
  • ReactJS
  • Python
  • CART Random Forest machine learning models
About

About Skein

Skein brings together digital technology businesses that develop innovative solutions for the data economy.

https://skein.co

 

About EIT Health 

EIT Health is a ‘knowledge and innovation community’ (KIC) of the European Institute of Innovation and Technology. It works across borders with partner organisations, bringing together the brightest minds in healthcare to answer some of the biggest health challenges facing Europe. It is headquartered in Munich, Germany, with pan-EU representation via six regional Innovation Hubs.

https://eithealth.eu/

 

About Digitally Enabled Preventative Health Research Group

The group focuses on high-impact research in digital health: developing and evaluating digital solutions to health-related issues, integrated health data ecosystems and enabling infrastructure.

Oxford DEPTH website
