The future

Electronic Health Records

Developing technology for patient data collection, protection and sharing.

Personal data
The Challenge

Skein worked with the University of Oxford’s Digitally Enabled Preventative Health research group to define the requirements and develop technology for patient data collection and analysis. The project aimed to address a critical challenge: the quality of the data used in evaluating clinical outcomes, developing new drugs and vaccines, and delivering precision medicine and personalised health treatments.

The research was funded by a European Institute of Innovation and Technology grant.


Widespread adoption of precision medicine depends on understanding how individual variation affects drug type, dose, and response across diseases, and on access to high-quality patient data.

Herpes Simplex Virus (HSV) was used as an example of a disease that affects a huge population. Because knowledge of the virus is insufficient, there is still no vaccine for it despite the large number of infections globally. Although there are many studies, researchers are unable to control data provenance and quality. The World Health Organisation identified better collection and management of epidemiologic data as the key first step towards an improved understanding of the virus, and thus towards advancing research.

Data heterogeneity is a key problem in standardising epidemiologic data. By developing standardised patient data registry tools, we aimed to facilitate the collection of better-quality data for a systematic study of HSV epidemiology. As part of this project, we also developed an innovative machine learning algorithm for identifying risk groups among people potentially infected with HSV.


AI for real-life medical evidence and personal data insights

A comprehensive understanding of patterns requires robust genomic and demographic data, including extended data such as family history, ancestry, biomarker and imaging information.

Anonymisation is one of the critical instruments for providing a secure environment for data sharing. Pseudonymisation techniques make it possible to design a data collection system that users can access without revealing their identity. This requires a robust cryptographic hash function to anonymise information related to the patient’s identity, and solutions for reversible pseudonym generation.
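As an illustration only, the combination of a cryptographic hash and a reversible pseudonym could be sketched as below. This is a minimal sketch under stated assumptions, not the project's implementation: the `Pseudonymiser` class, the escrow table and the example identifier are hypothetical, and HMAC-SHA256 is just one reasonable choice of keyed hash.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class Pseudonymiser:
    """Hypothetical helper: keyed-hash pseudonyms plus a re-identification table."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        # Assumed escrow table (pseudonym -> original identifier). In a real
        # system it would be held by a trusted party, apart from the registry.
        self._escrow: Dict[str, str] = {}

    def pseudonym(self, patient_id: str) -> str:
        # HMAC-SHA256 is deterministic for a given key, so one patient always
        # maps to the same pseudonym; without the key it cannot be recomputed.
        digest = hmac.new(
            self._key, patient_id.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        self._escrow[digest] = patient_id
        return digest

    def reidentify(self, pseudonym: str) -> Optional[str]:
        # The hash itself is one-way; reversal is only a lookup in the escrow table.
        return self._escrow.get(pseudonym)


key = secrets.token_bytes(32)          # random 256-bit secret
p = Pseudonymiser(key)
alias = p.pseudonym("patient-0001")    # made-up identifier
```

The registry would store only `alias`; linking it back to a person is possible solely for whoever holds the escrow table.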

The proportion of older people among users is increasing, so the implementation needs to take this into account and avoid, for example, a purely app-based solution.

To resolve these challenges, we created a patient registry engineering solution using machine learning and data science methods. The source database contained over 600 variables on demographic, socioeconomic, dietary, and health-related information collected by interviews and physical examinations.

Agile delivery

The process followed the UK Government Agile Service Design framework. In the first stage, Discovery, we learned about the users and their contexts and the technological constraints, and defined design requirements and user stories. After completing Discovery, we moved into Alpha with a set of user-focused requirements and design specifications. In the Alpha phase, a technological blueprint was developed.

Data integration and security

We researched the health data exchange standards in operation and development:

  • FHIR (Fast Healthcare Interoperability Resources), a standard developed by Health Level Seven (HL7) that functions as an API through which developers can access the clinical information they need from the electronic medical record (EMR)
  • openEHR
  • EN/ISO 13606 – Electronic Health Record Communication
  • Extensible Markup Language (XML) 
  • The Resource Description Framework (RDF) and RDF-Schema (RDFS)
  • Simple Knowledge Organization System (SKOS)
  • Common Terminology Services, Release 2 (CTS2) 

FHIR and openEHR are the two most recent, robust and complete healthcare data persistence and exchange specifications.

Based on the data standards requirements, including openEHR and FHIR, we planned a database architecture that keeps a history of updates, supports the JSON, XML and RDF formats, and provides OAuth authorisation. The core database is relational (PostgreSQL), with additional NoSQL storage for unstructured data.
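The idea of keeping a history of updates can be sketched with an append-only store in which every write creates a new version rather than overwriting the previous one. The sketch below is an in-memory illustration under stated assumptions, not the project's PostgreSQL schema: the `VersionedStore` class and the patient values are hypothetical, while `resourceType`, `id`, `meta.versionId`, `meta.lastUpdated`, `gender` and `birthDate` follow the element names of the FHIR Patient resource.

```python
import json
from datetime import datetime, timezone


class VersionedStore:
    """Hypothetical append-only store: updates add versions, nothing is overwritten."""

    def __init__(self):
        self._history = {}  # resource id -> list of versions, oldest first

    def put(self, resource: dict) -> dict:
        versions = self._history.setdefault(resource["id"], [])
        stored = dict(resource)  # shallow copy so the caller's dict is untouched
        stored["meta"] = {
            "versionId": str(len(versions) + 1),
            "lastUpdated": datetime.now(timezone.utc).isoformat(),
        }
        versions.append(stored)
        return stored

    def latest(self, resource_id: str) -> dict:
        return self._history[resource_id][-1]

    def history(self, resource_id: str) -> list:
        return list(self._history[resource_id])


store = VersionedStore()
# Made-up example data, not taken from the project.
store.put({"resourceType": "Patient", "id": "example-1", "gender": "female"})
store.put({"resourceType": "Patient", "id": "example-1", "gender": "female",
           "birthDate": "1950-01-01"})

latest_json = json.dumps(store.latest("example-1"), indent=2)
```

Serialising the latest version with `json.dumps` gives the JSON view of the record; earlier versions stay available through `history`.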


Machine Learning module

We explored and prototyped solutions based on Natural Language Processing (NLP), XGBoost models and Classification and Regression Trees (CART) approaches to reducing informational entropy. Ultimately, a CART Random Forest (RF) model was used to generate questions for users in the HSV Diagnostic Tool implementation.
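To make "reducing informational entropy" concrete, the sketch below ranks questionnaire items by information gain, that is, by how much knowing the answer reduces the entropy of the risk label. It illustrates the split criterion behind a single decision tree, not the Random Forest model used in the project; the question names (`q_a`, `q_b`), answers and labels are made up, and CART implementations often use Gini impurity rather than entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, question):
    """Entropy reduction obtained by splitting the rows on one question's answers."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[question], []).append(label)
    # Weighted average entropy of the groups that remain after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up answers to two hypothetical questions, with a made-up risk label.
rows = [
    {"q_a": "yes", "q_b": "no"},
    {"q_a": "yes", "q_b": "yes"},
    {"q_a": "no", "q_b": "no"},
    {"q_a": "no", "q_b": "yes"},
]
labels = ["high", "high", "low", "low"]

# Ask the most informative question first.
ranked = sorted(rows[0], key=lambda q: information_gain(rows, labels, q),
                reverse=True)
```

In this toy data `q_a` separates the two risk groups perfectly and `q_b` carries no information, so a tool that asks questions in order of `ranked` could stop after the first one.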

An anonymous questionnaire based on lifestyle data, driven by a Random Forest algorithm, was implemented in Python. The algorithm was optimised to reduce the number of questions and to identify risk groups for HSV. We split the data set into training and validation subsets, which were used for training the model and testing its performance.
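The split into training and validation subsets can be sketched in a few lines of standard-library Python. The function names, the 80/20 ratio and the fixed seed are assumptions made for illustration; the source does not state the proportions or the tooling actually used.

```python
import random


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the records and cut it into two disjoint subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predicted, actual):
    """Share of validation records whose predicted label matches the true one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


records = list(range(100))  # stand-in for questionnaire records
train, validation = train_validation_split(records)
```

The model is fitted on `train` only; `validation` is held back so that a score such as `accuracy` reflects performance on records the model has not seen.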

See more detail in the published academic paper.

Technology Stack
  • Security: SSL, encryption for data at rest and in transit
  • REST API
  • ReactJS
  • Python
  • CART Random Forest machine learning models

About Skein

Skein brings together digital technology businesses that develop innovative solutions for the data economy.


About EIT Health 

EIT Health is a ‘knowledge and innovation community’ (KIC) of the European Institute of Innovation and Technology. It works across borders with approximately partner organisations, bringing together the brightest minds in healthcare to answer some of the biggest health challenges facing Europe. It is headquartered in Munich, Germany, with pan-EU representation via six regional Innovation Hubs.


About Digitally Enabled Preventative Health Research Group

The group focuses on high-impact research in digital health: developing and evaluating digital solutions to health-related issues, integrated health data ecosystems and enabling infrastructure.

