Life sciences and RWE at scale

Build repeatable cohort and feature pipelines from FHIR on Apache Spark

Pontegra helps life sciences and real-world evidence (RWE) teams turn longitudinal clinical data into analysis-ready datasets quickly, repeatably, and at scale.

With Spark-on-FHIR you can define cohorts, create sampling timepoints, compute features and endpoints, and deliver curated tables for HEOR, epidemiology, and AI workflows.

Powered by: Spark-on-FHIR, Pontegra’s FHIR-native Spark toolkit.

Accelerate time-to-dataset

Move from data access to analysis-ready tables without hand-built flattening scripts.

Make studies reproducible

Define cohort logic, sampling rules, and feature definitions as versioned pipelines that can be rerun and refreshed.

Scale to real-world volumes

Run the same logic across millions of patients and billions of records with Spark-native execution.

Support iterative study development

Start with a cohort sample for quick validation, then scale out to full production runs.

Why Spark-on-FHIR for RWE

Most RWE pipelines spend disproportionate time on:

  • extracting and normalizing longitudinal events,
  • aligning timelines to an index date,
  • generating covariates/outcomes consistently,
  • keeping refresh runs stable as data grows.

Spark-on-FHIR is built specifically to reduce that burden by combining:

  • FHIR-native query semantics (FHIR search + FHIRPath),
  • Spark scale-out compute, and
  • higher-level building blocks for cohorts, sampling, and feature extraction.

Core workflow

Define the cohort

Model cohorts with reproducible logic:

  • index event definition (e.g., first diagnosis, first prescription, procedure date)
  • inclusion/exclusion criteria
  • entry/exit timepoints (e.g., diagnosis to death/discharge)
  • optional sampling mode to test logic on large datasets quickly
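
To make the cohort logic above concrete, here is a minimal sketch in plain Python. It is not the Spark-on-FHIR API: the record types, the build_cohort function, and the example codes and dates are all hypothetical, chosen only to show an index event (first matching diagnosis), one inclusion criterion (minimum age at index), and entry/exit timepoints (index date to death or study end).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical, simplified records; real FHIR resources carry far more detail.
@dataclass(frozen=True)
class Patient:
    patient_id: str
    birth_date: date
    death_date: Optional[date] = None


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str
    recorded: date


@dataclass(frozen=True)
class CohortEntry:
    patient_id: str
    index_date: date  # cohort entry
    exit_date: date   # death or study end, whichever comes first


def build_cohort(patients, diagnoses, index_code, min_age, study_end):
    # Index event: earliest diagnosis with the target code, per patient.
    first_dx = {}
    for dx in diagnoses:
        if dx.code == index_code:
            prev = first_dx.get(dx.patient_id)
            if prev is None or dx.recorded < prev:
                first_dx[dx.patient_id] = dx.recorded

    cohort = []
    for p in patients:
        index_date = first_dx.get(p.patient_id)
        if index_date is None:
            continue  # no index event
        # Age in whole years at the index date.
        had_birthday = (index_date.month, index_date.day) >= (
            p.birth_date.month, p.birth_date.day)
        age = index_date.year - p.birth_date.year - (0 if had_birthday else 1)
        if age < min_age:
            continue  # inclusion criterion not met
        exit_date = min(p.death_date, study_end) if p.death_date else study_end
        if exit_date < index_date:
            continue  # index falls outside the observable period
        cohort.append(CohortEntry(p.patient_id, index_date, exit_date))
    return sorted(cohort, key=lambda e: e.patient_id)


patients = [
    Patient("p1", date(1960, 5, 1), death_date=date(2022, 3, 10)),
    Patient("p2", date(2010, 1, 1)),   # too young at index
    Patient("p3", date(1975, 7, 15)),  # never receives the index diagnosis
]
diagnoses = [
    Diagnosis("p1", "E11", date(2020, 6, 1)),
    Diagnosis("p1", "E11", date(2019, 2, 15)),
    Diagnosis("p2", "E11", date(2021, 1, 1)),
    Diagnosis("p3", "I10", date(2018, 1, 1)),
]
cohort = build_cohort(patients, diagnoses, "E11", 18, date(2023, 12, 31))
```

Because the rules live in one function with explicit parameters, the same definition can be rerun unchanged on a small sample or on the full population.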

Sample the timeline

Create timepoints for feature computation:

  • Periodic sampling (e.g., monthly/quarterly)
  • Event-aligned sampling (e.g., pre/post ICU admission, surgery, treatment start)
  • “Since last timepoint” and “since index event” lookback windows
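
The sampling options above can be sketched as follows. This is an illustrative plain-Python example, not the toolkit's own interface: periodic_timepoints, lookback_windows, and the mode names "since_last" and "since_index" are assumptions made for this sketch. Each window is returned as a (start, end) pair, intended to be read as start-exclusive and end-inclusive.

```python
import calendar
from datetime import date


def add_months(d, months):
    # Shift a date by whole months, clamping to the last day of short months.
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def periodic_timepoints(index_date, exit_date, step_months=1):
    # Timepoints at index, index + step, index + 2*step, ... up to exit_date.
    # Offsets are always computed from the index date so that month-end
    # clamping does not drift (Jan 31 -> Feb 28 -> Mar 31, not Mar 28).
    if step_months < 1:
        raise ValueError("step_months must be at least 1")
    points = []
    k = 0
    while True:
        t = add_months(index_date, k * step_months)
        if t > exit_date:
            break
        points.append(t)
        k += 1
    return points


def lookback_windows(timepoints, mode):
    # One (start, end) window per timepoint after the first.
    if mode not in ("since_last", "since_index"):
        raise ValueError("unknown mode: " + mode)
    if not timepoints:
        return []
    index_date = timepoints[0]
    windows = []
    for prev, cur in zip(timepoints, timepoints[1:]):
        start = prev if mode == "since_last" else index_date
        windows.append((start, cur))
    return windows


timepoints = periodic_timepoints(date(2021, 1, 31), date(2021, 4, 30))
since_last = lookback_windows(timepoints, "since_last")
since_index = lookback_windows(timepoints, "since_index")
```

Event-aligned sampling follows the same pattern, with the list of timepoints taken from event dates (for example, an admission date and fixed offsets around it) instead of a fixed calendar step.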

Compute features and endpoints

Generate analysis datasets:

  • covariates (labs, vitals, diagnoses, meds, utilization)
  • endpoints/outcomes (events, counts, time-to-event derivations)
  • feature aggregations (avg/max/min/last/count; windowed variants)
  • export tidy tables to Parquet/Delta/CSV for downstream analysis
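
As a rough illustration of the windowed aggregations and tidy export listed above, the sketch below computes count/avg/min/max/last for one lab value per window and writes one row per patient and timepoint as CSV. It uses only the Python standard library; the function name aggregate_window, the column names, and the sample values are hypothetical, and a production pipeline would express the same logic as Spark aggregations and write Parquet or Delta instead.

```python
import csv
import io
from datetime import date
from statistics import mean


def aggregate_window(observations, start, end):
    # observations: list of (date, value); the window is (start, end].
    in_window = sorted(
        ((d, v) for d, v in observations if start < d <= end),
        key=lambda pair: pair[0],
    )
    if not in_window:
        # Keep the row so that missingness stays visible downstream.
        return {"count": 0, "avg": None, "min": None, "max": None, "last": None}
    values = [v for _, v in in_window]
    return {
        "count": len(values),
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "last": values[-1],  # value at the most recent date in the window
    }


# Hypothetical HbA1c measurements for one patient.
hba1c = [
    (date(2021, 2, 10), 7.5),
    (date(2021, 2, 25), 8.0),
    (date(2021, 3, 15), 7.0),
]
windows = [
    (date(2021, 1, 31), date(2021, 2, 28)),
    (date(2021, 2, 28), date(2021, 3, 31)),
    (date(2021, 3, 31), date(2021, 4, 30)),
]

# Tidy layout: one row per (patient, timepoint), one column per feature.
rows = []
for start, end in windows:
    stats = aggregate_window(hba1c, start, end)
    row = {"patient_id": "p1", "timepoint": end.isoformat()}
    row.update({"hba1c_" + name: value for name, value in stats.items()})
    rows.append(row)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)  # None values are written as empty fields
csv_text = buffer.getvalue()
```

Emitting a row even for empty windows is a deliberate choice in this sketch: it keeps the timepoint grid complete, so downstream models can distinguish "no measurement" from "patient not observed".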