Draft v0.1.0

Open Experiment Standard

A vendor-neutral standard for documenting, exchanging, archiving, and presenting online experiments, with documents that stay portable between vendors and products.


Why a standard?

Experiment data lives in many tools — GrowthBook, Optimizely, Statsig, internal platforms — but most teams want the same things from it: an archive of past decisions, a way to present results to executives, the ability to migrate between platforms, and a learning repository that outlives any single vendor.

Existing tools standardize how flags are evaluated. The Open Experiment Standard (OES) standardizes how experiments are documented and exchanged: design, metrics, results, decisions, and the trust checks that justify them.

Top-level envelope

A short, predictable structure makes documents safe to parse, even when fields are extended.

{
  "schemaVersion": "0.1.0",
  "objectType": "experiment",
  "experiment": {},
  "design": {},
  "variants": [],
  "metrics": [],
  "analysis": {},
  "results": {},
  "scorecard": {},
  "decision": {},
  "qualityChecks": [],
  "artifacts": [],
  "provenance": {},
  "extensions": {}
}
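
A minimal sketch of how a consumer might check the envelope before reading anything else. The key list mirrors the example above. Treating every key as required, and the function name check_envelope, are assumptions made for illustration; the spec itself decides what is mandatory.

```python
import json

# Top-level keys copied from the envelope example. Whether each one is
# required is an assumption of this sketch, not a rule taken from the spec.
ENVELOPE_KEYS = (
    "schemaVersion", "objectType", "experiment", "design", "variants",
    "metrics", "analysis", "results", "scorecard", "decision",
    "qualityChecks", "artifacts", "provenance", "extensions",
)
LIST_KEYS = {"variants", "metrics", "qualityChecks", "artifacts"}
STRING_KEYS = {"schemaVersion", "objectType"}


def check_envelope(text: str) -> list[str]:
    """Return a list of envelope problems (empty when none are found)."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        return ["document root must be a JSON object"]
    problems = []
    for key in ENVELOPE_KEYS:
        if key not in doc:
            problems.append(f"missing key: {key}")
            continue
        # Expected container type follows the shapes shown in the example.
        if key in STRING_KEYS:
            expected = str
        elif key in LIST_KEYS:
            expected = list
        else:
            expected = dict
        if not isinstance(doc[key], expected):
            problems.append(f"{key} should be a {expected.__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an importer report everything wrong with a document in a single pass.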

Design principles

Separate planning from outcomes

The standard distinguishes what was planned, what happened, how it was analyzed, what was concluded, and what should be shown to humans.

Snapshot, don't reference

Metric definitions, code versions, and warehouse queries are captured into the document at analysis time — they may have changed by the time you reread it.
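
One way to picture a snapshot, as a rough sketch: copy the query text and code version into the document and stamp the capture time, instead of storing a link to a definition that may drift. The field names (definition, sql, sqlSha256, codeVersion, capturedAt) and the helper snapshot_metric are hypothetical and not taken from the spec.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_metric(metric_id: str, name: str, sql: str, code_version: str) -> dict:
    """Capture a metric definition by value so it survives later edits."""
    return {
        "id": metric_id,
        "name": name,
        "definition": {
            # The full query text is embedded, not a pointer to it.
            "sql": sql,
            # A digest makes it cheap to tell whether two snapshots differ.
            "sqlSha256": hashlib.sha256(sql.encode("utf-8")).hexdigest(),
            "codeVersion": code_version,
        },
        # Timezone-aware UTC timestamp of when the snapshot was taken.
        "capturedAt": datetime.now(timezone.utc).isoformat(),
    }
```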

Extensible by namespace

Vendor-specific fields live under extensions.*. Importers must safely ignore unknown extensions instead of rejecting documents.
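
A small sketch of the "ignore, don't reject" rule for importers. It assumes extensions is an object keyed by namespace; the helper name read_extensions and the namespaces in the usage note are hypothetical.

```python
def read_extensions(doc: dict, known: set[str]) -> tuple[dict, list[str]]:
    """Split extensions into understood namespaces and skipped names.

    Unknown namespaces are reported, never treated as an error.
    """
    extensions = doc.get("extensions") or {}
    understood = {ns: body for ns, body in extensions.items() if ns in known}
    skipped = sorted(ns for ns in extensions if ns not in known)
    return understood, skipped
```

For example, an importer that only knows a hypothetical "acme" namespace would keep extensions.acme, note that it skipped "other_vendor", and still accept the document.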

Trust is first-class

Sample ratio mismatch, exposure health, invariant metrics, and other quality checks are part of the standard, not an afterthought.
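
Sample ratio mismatch is commonly detected with a chi-square goodness-of-fit test on assignment counts. The sketch below covers the two-variant case only, using the standard library. The 0.001 threshold is a convention assumed here, and the output field names (check, statistic, pValue, status) are hypothetical, not the spec's.

```python
import math


def srm_check(control_n: int, treatment_n: int,
              expected_control_share: float = 0.5,
              alpha: float = 0.001) -> dict:
    """Chi-square goodness-of-fit test for a two-variant traffic split."""
    total = control_n + treatment_n
    if total <= 0 or not 0 < expected_control_share < 1:
        raise ValueError("need positive counts and a share strictly between 0 and 1")
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "check": "srm",
        "statistic": chi2,
        "pValue": p_value,
        "status": "fail" if p_value < alpha else "pass",
    }
```

With more than two variants the degrees of freedom change and the closed form above no longer applies, so a statistics library would be the safer choice.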

Decisions, not just results

Experiments fail as institutional memory when the result exists but the decision does not. OES makes the decision a first-class object.

Bundle, don't fragment

JSON is the canonical manifest. Charts, CSVs, SQL, notebooks, and HTML reports travel alongside it as a research object.
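
As a rough illustration of bundling, the sketch below packs the JSON manifest and its companion files into one archive and lists each file in artifacts. Using a zip file, the manifest name experiment.json, and the path field are all assumptions made for this example; the source does not prescribe a packaging format.

```python
import io
import json
import zipfile


def build_bundle(document: dict, files: dict[str, bytes]) -> bytes:
    """Pack the JSON manifest and its companion files into one zip archive."""
    manifest = dict(document)
    # List every bundled file in `artifacts` so the manifest stays the index.
    manifest["artifacts"] = list(document.get("artifacts", [])) + [
        {"path": name} for name in sorted(files)
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The manifest is written first so readers can find it quickly.
        archive.writestr("experiment.json", json.dumps(manifest, indent=2))
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()
```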

What's in the spec

Twelve sections, ordered from envelope to artifacts. The MVP covers the fields needed for the 80% of online A/B tests teams run today.

Envelope

Portability and version safety for the document itself.

Identity

The human-facing context that makes a result meaningful later.

Design

The intended experimental design — not just what happened to be measured.

Variants

Each variant — independently identifiable, documentable, and presentable.

Metrics

Snapshot metric identity and calculation, because definitions change.

Analysis

How the scorecard was computed — reproducibility starts here.

Results

Machine-readable, granular, and per-metric — not just an overall summary.

Scorecard

A curated, presentation-ready view — not a dump of every metric.

Decision

The outcome (ship, do-not-ship, iterate, or rerun) and the rationale behind it.

Quality

Trust checks as a first-class part of the standard.

Provenance

What allows another system to trust — or reproduce — the result.

Artifacts

Charts, CSVs, SQL, notebooks — bundled or linked.

MVP at a glance

For v0.1, we don't try to model every possible statistical method. We start with the fields needed for the 80% case:

Experiment identity: ID, title, status, owner, hypothesis, dates
Design: type, randomization unit, population, allocation, variants
Metrics: definitions with role, direction, window, data source
Results: sample sizes, deltas, intervals, p-values or Bayesian probabilities
Scorecard: primary outcome, guardrails, overall result, recommended action
Decision: ship / do-not-ship / iterate / rerun, with rationale
Quality checks: SRM, exposure health, invariants, data freshness
Artifacts and provenance: links to charts, SQL, dashboards, commits, source system
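
To make the decision item in the list above concrete, here is a minimal sketch of a check a producer could run before publishing. The four outcomes come from the list; their exact string spellings, the field names outcome and rationale, and the helper check_decision are assumptions.

```python
# Outcomes named in the MVP list; the exact spellings are assumed.
DECISION_OUTCOMES = {"ship", "do-not-ship", "iterate", "rerun"}


def check_decision(decision: dict) -> list[str]:
    """Return problems with a decision object (empty when it looks complete)."""
    problems = []
    outcome = decision.get("outcome")
    if outcome not in DECISION_OUTCOMES:
        problems.append(f"unknown outcome: {outcome!r}")
    rationale = decision.get("rationale")
    # A decision without a written rationale loses most of its archival value.
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("rationale is missing")
    return problems
```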

How it relates to OpenFeature

OpenFeature standardizes how applications evaluate feature flags and associate those evaluations with downstream outcomes. OES standardizes how experiment plans, metrics, results, scorecards, and decisions are exchanged during or after analysis. The two are complementary, not competitive.