Overcoming Data Ingestion Challenges for a Health Tech Startup

Major challenges threaten delivery for a San Francisco health tech startup

The Challenge for the Company ⛔

The health tech startup I worked for, Verana Health, faced critical contractual and board-level obligations that had to be met within the first quarter of 2022. Failure to meet these objectives would jeopardize contractual commitments with major health registries and risk the loss of our sole proprietary access to some of the world’s largest clinical databases. That outcome could fundamentally undermine our business model of providing near real-time data and actionable insights to Life Science organizations to accelerate drug development and clinical trials.

After a major re-org and leadership shakeup, it was clear we had our backs against the wall. We had to come up with a plan to hit our goals.

The Goals We Had to Hit 🎯

After weeks of brainstorming and numerous meetings, it became clear that achieving the following goals was imperative. Failure to meet these objectives would severely impact our fiduciary responsibilities and result in a breach of contract.

  1. Hit the 40% American Academy of Neurology (AAN) registry target goal

  2. Achieve delivery of Merit-based Incentive Payment System (MIPS) reporting for 3,000 participating clinicians in the American Academy of Ophthalmology IRISⓇ Registry (Intelligent Research in Sight) and the American Academy of Neurology Axon RegistryⓇ by Q1 2022.

  3. Hit our Q1 Ingestion Roadmap Goals

How the Heck Are We Going to Do This⁉️

We figured out that achieving our goals would take a herculean effort. Without a clear path forward we would exhaust our funding and resources.

Working with product, operations, and engineering leadership, we devised a path forward that would allow us to achieve our goals. It broke down into the following:

  1. Increase ingestion by a minimum of 380 providers to hit 40% of American Academy of Neurology (AAN) registry by Q1 2022.

  2. Fix several critical ETL and operational issues. ETL failures and data extraction problems were causing major delays in ingesting data. This involved:

    1. Identifying and fixing ETL failures

    2. Stabilizing and scaling data extraction by rolling out a new version of our remote agent to practices. This would be a large batch rollout.

    3. Removing manual operational steps that were bottlenecks.

  3. Map 8 additional EHR systems

  4. Hire new FTEs to handle the workload

This Is Where I Come In 🚀

Right before the pandemic, I was the first TPM hired at Verana Health to support product development of the veraQ Enterprise data platform, a population health data engine for life science research and development with over 90 million de-identified patients and 20 thousand healthcare providers.

This platform leverages ML/AI technology to deliver high-quality data insights for drug lifecycle management and healthcare analytics to major pharmaceutical and life science organizations for rare diseases and feasibility studies.

I was the TPM for the ingestion of electronic healthcare data, where I managed cross-functional programs under product development to build the roadmap capabilities for data lake acquisition and a structured data store for normalization in an AWS data lake. The work focused on data extraction, data operations, platform infrastructure, and ETL processes, including the mapping and normalization of patient data for downstream handoff to curation, data quality, and data science teams.

Specifically, I provided program management for the development of data pipelines, ingestion services, and deployment solutions, ensuring scalability, resiliency, and robust monitoring.

I was brought in to bring alignment, accountability, and structure to our teams in order to hit our Q1 2022 target goals.

Specifically, this meant doing the following:

  • Map out the priorities from operations, product, and engineering leadership and break them into actionable, bite-size pieces of work that could be tracked and managed in Jira and prioritized in team backlogs.

  • Provide Agile leadership and structure teams to deliver in two-week sprint cycles.

  • Incorporate release management across the teams to ensure successful downstream handoffs.

  • Establish operational feedback loops from production operations services to product and engineering to fix broken issues.

  • Facilitate and negotiate tradeoffs between priorities, scope, and delivery by effectively managing schedules, resources, and downstream handoffs.

  • Increase transparency, ownership, and accountability. Transform the teams to deliver technical solutions that were scalable, stable, consistent, and sustainable.

The Results 🙌

Success! After months and months of hard work, many iterations, team building, and building trust, we were finally able to call it a successful outcome.

We were able to hit our target goals by implementing a new remote agent and rolling it out successfully to all of our target practices. This was a game-changer for our operations teams, empowering them to execute large batch deployments efficiently. The added stability, scalability, and observability let operations increase data ingestion across different EHR platforms and land that data in the pipeline.

Also, we were able to stabilize our ETL processes. This involved a multi-pronged approach:

  1. Fix Data Quality Issues: Incomplete, inconsistent, or duplicate data in the source.

  2. Fix Transformation Logic Errors: Incorrect rules for normalizing or mapping data.

  3. Fix Performance Bottlenecks: ETL processes failing due to resource constraints.

Lastly, we were able to map out the remaining AAN EHR systems with the help of new FTE hires. This includes the following EHR systems:

  1. AllScripts Pro

  2. Amazing Charts

  3. eCW

  4. GE Centricity

  5. Greenway Health Primesuite

  6. NextGen

  7. AllScripts Touchworks

  8. Greenway Intergy

Overall, this was a massive success, and it led us to hit our major Q1 2022 target goals. Our executive team, board of directors, and senior management were all exhilarated that we were able to collectively hit our target goals.

News Flash!

“Verana Health Marks Milestone with 2022 MIPS Submissions for IRIS Registry and Axon Registry”

Press Release May 1, 2023

Verana Health achieved a significant milestone which has been several years in the making. In Q1, we submitted data to the Centers for Medicare and Medicaid Services (CMS) Merit-based Incentive Payment System (MIPS) for the 2022 reporting year on behalf of nearly 3,000 participating clinicians in the American Academy of Ophthalmology IRISⓇ Registry (Intelligent Research in Sight) and the American Academy of Neurology Axon RegistryⓇ. These submissions were powered by the Verana Quality Measures (VQM) Dashboard solution and represent a significant milestone in Verana Health’s history as we curate data from over 90 million de-identified patients, normalizing across more than 50 different electronic health record systems and among multiple therapeutic areas.

MIPS is a program implemented by CMS that incentivizes clinicians to report on performance that should lead to improved quality and value in healthcare. Clinicians can earn a payment adjustment based on their performance in four categories: 

  1. Quality

  2. Promoting Interoperability

  3. Improvement Activities

  4. Cost

On behalf of Qualified Clinical Data Registries (QCDRs), such as the IRIS Registry and the Axon Registry, Verana Health enables clinicians to report on a range of measures beyond what is required by traditional MIPS reporting. The clinical quality measures (CQMs) and eCQMs available for MIPS reporting include QPP measures (from CMS) and QCDR measures. The quality measures include the following measure types: process, outcome, patient reported outcomes, clinical quality, and performance, which can provide a more comprehensive view of the quality of care provided to patients.

The QCDRs also provide the ability to track practice and clinician performance during the reporting period, allowing clinicians to better identify areas for improvement and support more data-driven decisions to enhance their patient care. More meaningful data and insights into quality measure performance have great potential to improve patient care and outcomes. 

See full article:
https://veranahealth.com/verana-health-marks-milestone-with-2022-mips-submissions-for-iris-registry-and-axon-registry/


Want to Know More About Data Ingestion? 🔍

Data ingestion is the process of collecting, integrating, and preparing large amounts of patient data from various sources to enable analysis and decision-making. This process begins with gathering data from diverse sources such as electronic health records (EHRs), medical devices, insurance claims, hospital operations, public health systems, and patient-generated inputs. These sources provide data in multiple formats, including structured formats like lab results or diagnosis codes, unstructured formats such as clinical notes and imaging files, and semi-structured formats like HL7 or FHIR used for healthcare interoperability.

The collected data is brought into the system through methods like APIs and/or secure file transfers (SFTP). Once acquired, the data is integrated to ensure coherence and usability. This involves mapping data fields across sources, performing extract, transform, and load (ETL) processes, and leveraging standards like HL7, FHIR, or DICOM to facilitate interoperability between different systems. The integrated data is then stored in repositories such as data lakes for raw storage, data warehouses for structured data, or cloud platforms designed for healthcare scalability and security.

Maintaining data quality is critical, and immense effort is taken to validate accuracy, remove duplicates, and address errors or missing fields. Security and compliance are also paramount due to the sensitive nature of healthcare data. Encryption ensures secure data transfer and storage, while access controls and adherence to regulations such as HIPAA restrict who can handle sensitive patient data.

Once ingested, the data requires transformation to prepare it for analysis. This includes normalizing data into consistent formats, enriching it with derived metrics such as risk scores, and anonymizing patient identifiers to preserve privacy.
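To make the transformation step a bit more concrete, here is a minimal Python sketch of normalizing a field, deriving a metric, and replacing a patient identifier with a keyed hash. The field names, the BMI calculation (standing in for a derived metric), and the `SECRET_KEY` value are all assumptions for illustration, not how any particular platform does it.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a patient identifier."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def transform(record: dict) -> dict:
    """Normalize, enrich, and pseudonymize one raw record (illustrative fields)."""
    weight_kg = float(record["weight_kg"])
    height_m = float(record["height_cm"]) / 100.0
    return {
        # The raw identifier is dropped; only the keyed hash is kept.
        "patient_key": pseudonymize(record["patient_id"]),
        # Normalize values such as " female" or "F" to a single letter.
        "sex": record["sex"].strip().upper()[:1],
        # Derived metric: body mass index, rounded to one decimal place.
        "bmi": round(weight_kg / height_m ** 2, 1),
    }


raw = {"patient_id": "MRN-001", "sex": " female", "weight_kg": "70", "height_cm": "175"}
clean = transform(raw)
print(clean["sex"], clean["bmi"])
```

A keyed hash keeps the same patient linkable across loads without storing the original identifier. On its own this is pseudonymization of one field; a real de-identification process would also have to deal with names, dates, free text, and other identifying fields.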

Finally, the processed data is delivered to analytics platforms for use in dashboards, predictive models, or reporting systems. This enables actionable clinical insights that improve patient care, optimize operational efficiency, and support data-driven decision-making, all while ensuring the security and integrity of the data throughout the process.

Ingestion of EHR Data Step by Step

1. Data Sources

EHRs ingest data from a variety of internal and external sources:

  • Clinical Systems: Laboratory Information Systems (LIS), Radiology Information Systems (RIS), Pharmacy Management Systems.

  • Medical Devices: Vital sign monitors, infusion pumps, imaging devices.

  • Patient-Generated Data: Data from wearables, mobile health apps, and remote patient monitoring devices.

  • External Systems: Health Information Exchanges (HIEs), payer systems, and third-party platforms.

  • Manual Inputs: Notes from clinicians, scanned documents, and forms.

2. Data Types

  • Structured: Lab results, prescriptions, coded diagnosis data (e.g., ICD-10, SNOMED).

  • Unstructured: Physician notes, imaging files, or free-text fields.

  • Semi-structured: HL7 messages, FHIR resources, or XML files.
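As a small illustration of turning semi-structured data into a structured row, the sketch below flattens a FHIR Patient resource (JSON) into a flat dictionary. The sample resource and the output column names are made up for this example, and only a handful of Patient fields are handled.

```python
import json

# Made-up FHIR Patient resource for illustration only.
fhir_patient_json = """
{
  "resourceType": "Patient",
  "id": "example-123",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""


def flatten_patient(resource: dict) -> dict:
    """Flatten a few fields of a FHIR Patient resource into one flat row."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")
    # "name" is a list of name entries; this sketch uses only the first one.
    names = resource.get("name") or [{}]
    first = names[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": first.get("family"),
        "given_names": " ".join(first.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


row = flatten_patient(json.loads(fhir_patient_json))
print(row)
```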

3. Methods of Ingestion

  • Batch Processing: Periodic updates from external systems or archives.

  • Real-Time Streaming: Continuous ingestion of real-time data from devices and systems (e.g., vital signs monitoring).

  • APIs and Connectors: Use of standard APIs (e.g., FHIR, HL7) to pull and push data into the EHR system.

  • File Transfers: Importing CSV, XML, or JSON files through secure FTP or other methods.
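For the batch and file-transfer methods above, a simple pattern is to read a delivered file in fixed-size batches rather than loading it all at once. This is a hedged sketch using only the Python standard library; the column names and batch size are hypothetical, and an in-memory string stands in for a file received over SFTP.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List


def read_batches(handle, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of up to batch_size rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# In-memory stand-in for a delivered CSV file.
sample = io.StringIO(
    "patient_id,test,value\n"
    "P1,glucose,95\n"
    "P2,glucose,110\n"
    "P3,hgb,13.5\n"
)

batches = list(read_batches(sample, batch_size=2))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} rows")
```

Working in batches keeps memory use bounded and gives a natural unit for retries and progress tracking when a large file fails partway through.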

4. Data Cleaning and Transformation

  • Normalization: Standardizing data formats (e.g., date-time, units of measurement).

  • Validation: Ensuring data integrity and completeness (e.g., checking mandatory fields).

  • Mapping: Aligning data with EHR schemas using terminologies like LOINC or SNOMED CT.
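The three steps above can be sketched together for a single lab row. This is an assumption-heavy example: the local test codes (`GLU`, `HGB`), the required fields, and the accepted date formats are hypothetical, and the small LOINC lookup is only an illustration of what a mapping table might contain.

```python
from datetime import datetime

# Illustrative mapping from hypothetical local test codes to LOINC codes.
LOCAL_TO_LOINC = {
    "GLU": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
    "HGB": "718-7",   # Hemoglobin [Mass/volume] in Blood
}

REQUIRED_FIELDS = ("patient_id", "test_code", "value", "collected_at")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text: str) -> str:
    """Normalization: convert a date in any accepted format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean_lab_row(row: dict) -> dict:
    """Validate, map, and normalize one raw lab row (values are strings)."""
    # Validation: every mandatory field must be present and non-blank.
    missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Mapping: translate the local code to a standard terminology.
    code = row["test_code"].strip().upper()
    if code not in LOCAL_TO_LOINC:
        raise ValueError(f"unmapped test code: {code}")
    return {
        "patient_id": row["patient_id"].strip(),
        "loinc": LOCAL_TO_LOINC[code],
        "value": float(row["value"]),
        "collected_at": normalize_date(row["collected_at"]),
    }


cleaned = clean_lab_row(
    {"patient_id": "P1", "test_code": "glu", "value": "95", "collected_at": "03/15/2022"}
)
print(cleaned)
```

Raising on bad rows is only one option. A pipeline could just as well route them to a quarantine table for operations to review.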

5. Integration Standards

  • HL7 (Health Level 7): Widely used for structured clinical messaging.

  • FHIR (Fast Healthcare Interoperability Resources): A modern standard designed for easy integration with web technologies.

  • DICOM: For medical imaging data.

  • CCDA (Consolidated Clinical Document Architecture): For exchanging patient summaries.
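To give a feel for what HL7 structured messaging looks like, here is a rough sketch that pulls a few patient fields out of the PID segment of an HL7 v2 message. The sample message is invented, and the parser assumes the default `|` and `^` delimiters and that the fields it reads are present. A production pipeline would more likely use a dedicated HL7 parsing library.

```python
# Invented HL7 v2 message; segments are separated by carriage returns.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|SENDER|CLINIC|RECEIVER|REGISTRY|202201150830||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^CLINIC^MR||DOE^JANE||19800412|F",
])


def parse_pid(message: str) -> dict:
    """Extract a few patient fields from the PID segment of an HL7 v2 message."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            # PID-5 is the patient name: family^given^...
            family, _, rest = fields[5].partition("^")
            return {
                "patient_id": fields[3].split("^")[0],  # PID-3, first component
                "family_name": family,
                "given_name": rest.split("^")[0],
                "birth_date": fields[7],                # PID-7, YYYYMMDD
                "sex": fields[8],                       # PID-8
            }
    raise ValueError("no PID segment found")


patient = parse_pid(SAMPLE_MESSAGE)
print(patient)
```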

6. Security and Compliance

  • Data must be encrypted during transmission and at rest.

  • Access controls must ensure only authorized personnel can handle sensitive patient data.

  • Compliance with regulations like HIPAA (in the U.S.) and GDPR (in Europe).

What Are Some Other Challenges You Have to Deal With? 🫤

Data ingestion in healthcare is challenging due to the complexity of the data, regulatory requirements, and the diverse systems and stakeholders involved. Here are the primary reasons:

Healthcare data comes from multiple sources, such as hospitals, clinics, labs, and practices, each using different systems. Data can exist in structured (e.g., EHR databases), semi-structured (e.g., HL7 messages), and unstructured forms (e.g., clinical notes, PDFs, or imaging files). Many healthcare organizations still rely on older systems that lack modern data integration capabilities.

Proprietary Systems: Vendors of healthcare software sometimes use proprietary formats, creating silos of data.

Regulatory Compliance: Healthcare data ingestion must comply with strict regulations like HIPAA (in the U.S.) and GDPR (in the EU), which govern data access, storage, and sharing.

Sensitive Data: The highly sensitive nature of patient data requires robust encryption, authentication, and access control mechanisms.

Audit and Traceability: Organizations must track and log data ingestion activities to ensure accountability and compliance.

Incomplete or Inaccurate Data: Patient records may have missing or erroneous information, making integration unreliable. Duplicate records or inconsistent patient identifiers across systems lead to difficulties in merging datasets. Ensuring data integrity and consistency is resource-intensive.
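As a simplified picture of the duplicate-record problem, the sketch below merges records that share a normalized name and birth date and fills blank fields from the duplicates. The field names and the exact-match rule are assumptions for illustration. Real patient matching usually needs more careful logic, since exact matching on name and birth date can both miss true duplicates and merge different people.

```python
def match_key(record: dict) -> tuple:
    """Build a naive matching key from normalized name and birth date."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["birth_date"],
    )


def deduplicate(records: list) -> list:
    """Merge records with the same key, keeping the first and filling blanks."""
    merged = {}
    for record in records:
        key = match_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        # Fill empty fields on the kept record from the duplicate.
        for field, value in record.items():
            if not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


records = [
    {"first_name": "Jane", "last_name": "Doe", "birth_date": "1980-04-12", "phone": ""},
    {"first_name": " jane", "last_name": "DOE ", "birth_date": "1980-04-12", "phone": "555-0100"},
    {"first_name": "John", "last_name": "Roe", "birth_date": "1975-01-02", "phone": ""},
]

unique = deduplicate(records)
print(len(unique), unique[0]["phone"])
```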

Large Data Volumes: Healthcare generates vast amounts of data, from EHR entries to imaging studies, requiring scalable infrastructure.

Vendor Lock-In: Organizations may face resistance from EHR vendors reluctant to support integration with third-party systems.

Cross-Organizational Collaboration: Ingesting data across multiple organizations (e.g., for health information exchanges) adds legal, technical, and operational challenges.

Infrastructure Investment: Setting up secure, scalable infrastructure for data ingestion and storage is very expensive. Healthcare data integration often requires skilled professionals with knowledge of healthcare standards, data engineering, and compliance requirements.

Data refresh and connectivity issues were also difficult to manage at scale.
