One cohort definition, one dataset specification, pipeline implementation, run manifest/audit outputs, handover.
STARTER PACK
Secondary Use Starter
- A production-ready pipeline for: cohort -> sampling -> features -> dataset
- Executable, rerunnable definitions for cohort and dataset logic.
- Run-level traceability with manifest outputs.
- Governed output pattern suitable for secure environments.
- Handover assets for repeatable team operation.
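As an illustration of the cohort -> sampling -> features -> dataset flow, the stages can be sketched in plain Scala as below. This is a minimal sketch and not the delivered implementation: the record types, inclusion rule, and function names are all hypothetical, and in-memory collections stand in for the Spark tables a real run would use.

```scala
import scala.util.Random

// Hypothetical record types, for illustration only.
final case class Patient(id: String, age: Int, diagnosisCodes: Set[String])
final case class FeatureRow(patientId: String, age: Int, codeCount: Int)

object PipelineSketch {
  // Cohort: keep patients that satisfy the (assumed) inclusion rule.
  def cohort(patients: Seq[Patient], minAge: Int, requiredCode: String): Seq[Patient] =
    patients.filter(p => p.age >= minAge && p.diagnosisCodes.contains(requiredCode))

  // Sampling: a stable sort plus a fixed seed makes the sample repeatable.
  def sample(members: Seq[Patient], size: Int, seed: Long): Seq[Patient] =
    new Random(seed).shuffle(members.sortBy(_.id)).take(size)

  // Features: derive one row per sampled patient.
  def features(sampled: Seq[Patient]): Seq[FeatureRow] =
    sampled.map(p => FeatureRow(p.id, p.age, p.diagnosisCodes.size))

  // Dataset: compose the stages and order rows so output is stable.
  def dataset(
      patients: Seq[Patient],
      minAge: Int,
      requiredCode: String,
      size: Int,
      seed: Long
  ): Seq[FeatureRow] =
    features(sample(cohort(patients, minAge, requiredCode), size, seed))
      .sortBy(_.patientId)
}
```

Passing the seed and the inclusion parameters explicitly is what lets the same definition be rerun with the same result. A real implementation would take these values from the agreed cohort and dataset specifications.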
Timeline (Phase A)
4-6 weeks
Typical investment (Phase A)
EUR 30k-EUR 45k (excluding infrastructure)
Best for
Research platforms, life sciences/RWE teams, and hospital AI teams that need governed, repeatable dataset production.
Teams ready to implement one concrete cohort-to-dataset use case with measurable delivery.
WHAT IS INCLUDED
Package Content (Phase A)
Discovery and scope confirmation with acceptance criteria
Data quality checks aligned to target dataset outputs
One cohort definition and one dataset specification
Run documentation package (manifest references, parameters, outputs)
Environment template for your stack (Databricks, Spark on Kubernetes, or on-prem Spark/YARN)
Handover session (code walkthrough, runbook, and team enablement)
Notebook deliverable (Scala): cohort, sampling, feature, and outcome pipeline implementation with runnable cells and parameterized execution
Phase A supports FHIR and/or OMOP based on agreed pilot scope
Includes runbook notes for rerun, refresh, and result inspection (entries/stats/history)
Deliverables at Handover
Notebook(s) with runnable, parameterized pipeline cells.
Runbook for rerun, refresh, and operational checks.
Pipeline implementation package/repository for agreed cohort and dataset scope.
Sample run manifest/output package from pilot execution.
Cohort and dataset specification documents with agreed schema/logic.
Data quality and validation summary report.
Customer Prerequisites
- Approved data access to at least one pilot source (FHIR and/or OMOP)
- Target Spark execution environment available for pilot runs
- Named clinical and engineering contacts for requirement and validation cycles
Acceptance Criteria (Examples)
- Cohort logic and dataset specification accepted by study stakeholders
- At least one successful end-to-end run using agreed pilot scope
- A rerun of the same pipeline produces traceable, manifest-backed output
- Output tables and quality checks match agreed acceptance criteria
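One way to make the rerun criterion above checkable is to give each run manifest a fingerprint that depends only on its declared inputs. The sketch below is an assumption about how such a manifest could look, not the package's actual schema: the `RunManifest` fields and the canonical text form are hypothetical. Only `java.security.MessageDigest` from the JDK is used.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical manifest shape; field names are illustrative only.
final case class RunManifest(
    pipelineVersion: String,
    cohortSpec: String,
    datasetSpec: String,
    parameters: Map[String, String],
    outputTables: Seq[String]
) {
  // Canonical text form: parameters and tables are sorted so that
  // declaration order cannot change the fingerprint.
  def canonical: String = {
    val params = parameters.toSeq
      .sortBy(_._1)
      .map { case (k, v) => s"$k=$v" }
      .mkString(";")
    Seq(pipelineVersion, cohortSpec, datasetSpec, params, outputTables.sorted.mkString(","))
      .mkString("|")
  }

  // SHA-256 of the canonical form, hex encoded (64 characters).
  def fingerprint: String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(canonical.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b))
      .mkString
}

object ManifestExample {
  val first: RunManifest = RunManifest(
    pipelineVersion = "1.0.0",
    cohortSpec = "cohort-v1",
    datasetSpec = "dataset-v1",
    parameters = Map("seed" -> "42", "asOf" -> "2024-01-01"),
    outputTables = Seq("features", "outcomes")
  )

  // Same inputs declared in a different order: the fingerprint is unchanged.
  val rerun: RunManifest = first.copy(outputTables = Seq("outcomes", "features"))

  // A changed parameter yields a different fingerprint.
  val changed: RunManifest = first.copy(parameters = first.parameters.updated("seed", "43"))
}
```

Comparing fingerprints between the pilot run and a rerun gives reviewers a quick check that both runs used the same versions, specifications, and parameters. It does not verify the output data itself, which would need separate checks such as row counts or table checksums.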
Package Boundaries (Phase A)
- No full managed platform hosting unless explicitly scoped as add-on
- Additional cohorts/datasets are out of base scope and handled as incremental extensions
- Clinical model training/validation studies are not included in this package
Planned Extensions (Brief)
This package is delivered first as a Phase A OSS-aligned starter.
- Phase B (planned): API-based governance flow demonstration (request/approval/publish-export) when Studyfyr MVP API is available.
- Phase C (planned): End-to-end UI governance flow demonstration when Studyfyr UI is available.
Extension scope, timeline, and pricing are confirmed at activation.
Optional Add-Ons
- Secure processing pattern workshop and reference architecture
- Quarterly refresh and monitoring support
WHAT IS NEXT
The Engagement Path
1. Starter (Current Step)
Deliver first governed cohort-to-dataset pipeline and validate operational fit.
2. Scale-Up (3-6 months)
Add cohorts/datasets, schedule refresh/recompute cycles, and harden multi-study governance.
3. Operate (Ongoing, Optional)
Establish recurring operational support, quality monitoring, and optimization cadence.