Secondary Use Starter

  • A production-ready pipeline covering cohort -> sampling -> features -> dataset.
  • Executable, rerunnable definitions for cohort and dataset logic.
  • Run-level traceability with manifest outputs.
  • Governed output pattern suitable for secure environments.
  • Handover assets for repeatable team operation.

Timeline (Phase A)

4-6 weeks

Typical investment (Phase A)

EUR 30k-EUR 45k (excluding infrastructure)

Best for

Research platforms, life sciences/RWE teams, and hospital AI teams that need governed, repeatable dataset production.

Teams ready to implement one concrete cohort-to-dataset use case with measurable delivery.

Package Content (Phase A)

Discovery and scope confirmation with acceptance criteria

Data quality checks aligned to target dataset outputs

One cohort definition and one dataset specification

Run documentation package (manifest references, parameters, outputs)

Environment template for your stack (Databricks, Spark on Kubernetes, or on-prem Spark/YARN)

Handover session (code walkthrough, runbook, and team enablement)

Notebook deliverable (Scala): cohort, sampling, feature, and outcome pipeline implementation with runnable cells and parameterized execution

Phase A supports FHIR and/or OMOP sources, depending on the agreed pilot scope

Includes runbook notes for rerun, refresh, and result inspection (entries/stats/history)

Deliverables at Handover

Notebook(s) with runnable, parameterized pipeline cells.

Runbook for rerun, refresh, and operational checks.

Pipeline implementation package/repository for agreed cohort and dataset scope.

Sample run manifest/output package from pilot execution.

Cohort and dataset specification documents with agreed schema/logic.

Data quality and validation summary report.

Customer Prerequisites

  • Approved data access to at least one pilot source (FHIR and/or OMOP)
  • Target Spark execution environment available for pilot runs
  • Named clinical and engineering contacts for requirement and validation cycles

Acceptance Criteria (Examples)

  • Cohort logic and dataset specification accepted by study stakeholders
  • At least one successful end-to-end run using the agreed pilot scope
  • Rerun of the same pipeline produces traceable, manifest-backed output
  • Output tables and quality checks match agreed acceptance criteria
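
To illustrate what "traceable, manifest-backed output" can mean in practice, here is a minimal sketch of a run manifest. It is shown in Python for brevity, while the package notebooks themselves are Scala. All field names, the `build_manifest` helper, and the sample parameters are assumptions made for illustration; the actual manifest format is defined in the delivered run documentation package.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 over a JSON-serializable object (keys sorted)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(pipeline_version: str, parameters: dict, output_rows: list) -> dict:
    """Assemble a hypothetical manifest tying one run's outputs to its inputs."""
    params_hash = fingerprint(parameters)
    return {
        "pipeline_version": pipeline_version,
        "parameters": parameters,
        "parameters_sha256": params_hash,
        "output_row_count": len(output_rows),
        # Depends on row order, so rows should be sorted deterministically upstream.
        "output_sha256": fingerprint(output_rows),
        # Derived from version + parameters, so identical reruns share the same id.
        "run_id": fingerprint([pipeline_version, params_hash])[:16],
    }


# Illustrative values only; not taken from a real pilot run.
params = {"cohort": "example_cohort", "sample_fraction": 0.1, "seed": 42}
rows = [{"patient_id": "p1", "age": 64}, {"patient_id": "p2", "age": 71}]

first = build_manifest("0.1.0", params, rows)
rerun = build_manifest("0.1.0", params, rows)
changed = build_manifest("0.1.0", {**params, "seed": 7}, rows)
```

In this sketch, a rerun with the same version, parameters, and data yields an identical manifest, while changing any parameter yields a different `run_id`. That difference is what lets a reviewer tell a true rerun apart from a modified one.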

Package Boundaries (Phase A)

  • No full managed platform hosting unless explicitly scoped as an add-on
  • Additional cohorts/datasets are out of base scope and handled as incremental extensions
  • Clinical model training/validation studies are not included in this package

Planned Extensions (Brief)

This package is delivered first as a Phase A OSS-aligned starter.

  • Phase B (planned): API-based governance flow demonstration (request/approval/publish-export) once the Studyfyr MVP API is available.
  • Phase C (planned): End-to-end UI governance flow demonstration once the Studyfyr UI is available.

Extension scope, timeline, and pricing are confirmed at activation.

Optional Add-Ons

  • Secure processing pattern workshop and reference architecture
  • Quarterly refresh and monitoring support

The Engagement Path

Starter (Current Step)

Deliver first governed cohort-to-dataset pipeline and validate operational fit.

Scale-Up (3-6 months)

Add cohorts/datasets, schedule refresh/recompute cycles, and harden multi-study governance.

Operate (Ongoing, Optional)

Establish recurring operational support, quality monitoring, and optimization cadence.