This document is the top-level benchmark guide for the repository.
Current policy is intentionally split:
- `benchmark/source_semantic_benchmark/README.md`
- `benchmark/full_benchmark/README.md`
- `cargo bench`, `scripts/benchmark/*`

The canonical validation surface for release-quality decompiler work is:

- `python3 benchmark/source_semantic_benchmark/run_source_semantic_benchmark.py ...`
- `benchmark/source_semantic_benchmark/manifests/`
- `benchmark/artifacts/source_semantic_benchmark/`

Use the source semantic benchmark for decompilation-quality work. It compares Fission output against checked-in original source code and does not use Ghidra as the quality oracle.
Primary operator guide:
Current scope:
- `benchmark/binary/`

Recommended loop:
CLI-first sample automation lives under scripts/corpus/
and scripts/benchmark/. The pipeline is intentionally
split so it can be run locally or in CI without a browser session:
```bash
# 1. Collect/hash local samples and emit a full-benchmark-compatible manifest.
python3 scripts/corpus/hash_and_manifest.py \
  --input benchmark/binary/x86-64/window/small/binary \
  --copy-to benchmark/binary/realworld/local \
  --repo-relative \
  --output benchmark/config/benchmark_corpus/realworld_local.json

# Optional: download URLs listed one per line before hashing.
python3 scripts/corpus/hash_and_manifest.py \
  --url-list /path/to/urls.txt \
  --copy-to benchmark/binary/realworld/downloaded \
  --repo-relative \
  --output benchmark/config/benchmark_corpus/realworld_downloaded.json

# Optional: collect GitHub release asset URLs and, with --download, store them
# under an ignored local corpus directory before writing a manifest.
python3 scripts/corpus/collect_github_release_samples.py \
  --source-config benchmark/config/benchmark_corpus/github_release_sources.example.json \
  --download \
  --store benchmark/binary/realworld/github \
  --output benchmark/config/benchmark_corpus/github_release_samples.json

# 2. Run loader-only smoke over the manifest.
python3 scripts/benchmark/run_loader_smoke.py \
  --manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --fission-bin target/release/fission_cli \
  --output-dir benchmark/artifacts/realworld_suite/loader_smoke_latest

# 3. Orchestrate loader smoke plus optional raw p-code and full benchmark lanes.
python3 scripts/benchmark/run_realworld_suite.py \
  --build-cli \
  --loader-manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --raw-pcode-manifest benchmark/raw_p_code_benchmark/canonical_rows.json \
  --full-corpus-manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --ghidra-dir vendor/ghidra/ghidra_12.0.4_PUBLIC \
  --fission-release \
  --output-dir benchmark/artifacts/realworld_suite/latest

# 4. Diff any two JSON reports from loader/raw/full/suite runs.
python3 scripts/benchmark/compare_reports.py \
  --baseline benchmark/artifacts/realworld_suite/baseline/loader_smoke_report.json \
  --current benchmark/artifacts/realworld_suite/latest/loader_smoke/loader_smoke_report.json \
  --output-dir benchmark/artifacts/realworld_suite/latest/diff
```
Downloaded samples are not automatically trusted. The manifest records SHA-256,
size, source provenance, and a magic-byte format guess; loader semantics still
come from Fission parsers and typed loader failures, not benchmark-side repair.
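The manifest fields described above (SHA-256, size, source provenance, and a magic-byte format guess) can be illustrated with a minimal Python sketch. It is not the implementation of `scripts/corpus/hash_and_manifest.py`; the function names, the dict keys, and the deliberately small magic-byte table are assumptions for illustration only.

```python
import hashlib
from pathlib import Path

# Hypothetical, deliberately small magic-byte table. A match is only a guess:
# 0xCAFEBABE, for example, is shared by Mach-O fat binaries and Java class files.
MAGIC_PREFIXES = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),
    (b"\xcf\xfa\xed\xfe", "macho64"),   # 0xFEEDFACF, little-endian
    (b"\xce\xfa\xed\xfe", "macho32"),   # 0xFEEDFACE, little-endian
    (b"\xca\xfe\xba\xbe", "macho-fat"),
]


def guess_format(head: bytes) -> str:
    """Return a coarse format guess from the first bytes of a file."""
    for prefix, name in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return name
    return "unknown"


def manifest_entry(path: Path, source: str) -> dict:
    """Build a hypothetical manifest row: hash, size, provenance, format guess."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        head = fh.read(16)          # keep the header bytes for the magic check
        digest.update(head)
        # Stream the rest in 1 MiB chunks so large samples are not fully loaded.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": path.as_posix(),
        "sha256": digest.hexdigest(),
        "size": path.stat().st_size,
        "source": source,
        "format_guess": guess_format(head),
    }
```

The guess only labels the sample; whether it actually loads is still decided by Fission's parsers and typed loader failures.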
Downloaded GitHub release binaries under benchmark/binary/realworld/ are local
corpus artifacts and must not be staged or pushed.
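One way to enforce the rule above locally is a pre-commit check that rejects staged paths under the real-world corpus root. This is a hypothetical helper, not something the repository ships; only the `benchmark/binary/realworld/` prefix comes from this guide, and it relies on `git diff --cached --name-only`, which lists staged paths relative to the repository root.

```python
import subprocess
import sys

# Local corpus root that must never be committed (taken from this guide).
FORBIDDEN_PREFIXES = ("benchmark/binary/realworld/",)


def blocked_paths(paths):
    """Return the staged paths that fall under a forbidden corpus root."""
    return [p for p in paths if p.startswith(FORBIDDEN_PREFIXES)]


def staged_paths():
    """List paths currently staged in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


if __name__ == "__main__":
    bad = blocked_paths(staged_paths())
    for p in bad:
        print(f"refusing to commit local corpus artifact: {p}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```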
Config roots:
- `benchmark/source_semantic_benchmark/manifests/smoke_windows_small_c.json`
- `benchmark/source_semantic_benchmark/manifests/source_owned_all.json`
- `benchmark/config/benchmark_corpus/smoke_corpus.json`
- `benchmark/config/benchmark_corpus/release_corpus.json`
- `benchmark/config/benchmark_corpus/parity_corpus.json`
- `crates/fission-automation/config/sentinel_sets.toml` (default `nir-check` manifest)
- `benchmark/config/automation/README.md` (pointer / layout note)

Artifact roots:

- `benchmark/artifacts/source_semantic_benchmark/`
- `benchmark/artifacts/full_benchmark/`
- `benchmark/artifacts/automation/`

Naming contract:

- `benchmark/artifacts/source_semantic_benchmark/<suite>-latest`
- `benchmark/artifacts/full_benchmark/<target>-<profile>-latest`
- `benchmark/artifacts/full_benchmark/<target>-<profile>-baseline`
- `benchmark/artifacts/full_benchmark/<target>-<profile>-<YYYYmmdd-HHMMSS>`
- `benchmark/artifacts/automation/<lane>-<run-profile>-<unix_run_id>`
- `benchmark/artifacts/automation/latest/<lane>/`

The canonical source semantic suites are source-owned. Every source-defined function produces a row; mapping, decompilation, candidate-compilation, and behavior failures are recorded as failures rather than skipped cases.
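The artifact naming contract above maps directly onto small path builders. The sketch below is illustrative only: the function names are hypothetical, and whether timestamps are taken in local time or UTC is an assumption left to the caller.

```python
from datetime import datetime


def source_semantic_latest_dir(suite: str) -> str:
    """benchmark/artifacts/source_semantic_benchmark/<suite>-latest"""
    return f"benchmark/artifacts/source_semantic_benchmark/{suite}-latest"


def full_benchmark_dir(target: str, profile: str, tag: str) -> str:
    """<target>-<profile>-<tag>, where tag is latest, baseline, or a timestamp."""
    return f"benchmark/artifacts/full_benchmark/{target}-{profile}-{tag}"


def timestamp_tag(when: datetime) -> str:
    """Format a run time as YYYYmmdd-HHMMSS (timezone choice is an assumption)."""
    return when.strftime("%Y%m%d-%H%M%S")


def automation_run_dir(lane: str, run_profile: str, unix_run_id: int) -> str:
    """benchmark/artifacts/automation/<lane>-<run-profile>-<unix_run_id>"""
    return f"benchmark/artifacts/automation/{lane}-{run_profile}-{unix_run_id}"


def automation_latest_dir(lane: str) -> str:
    """benchmark/artifacts/automation/latest/<lane>/"""
    return f"benchmark/artifacts/automation/latest/{lane}/"


if __name__ == "__main__":
    # Illustrative target/profile values only.
    print(full_benchmark_dir("putty", "release", timestamp_tag(datetime(2025, 1, 31, 9, 5, 0))))
    # benchmark/artifacts/full_benchmark/putty-release-20250131-090500
```

Because a target or profile may itself contain hyphens, these names are easy to build but not safely parseable back into fields; tooling is better off treating them as opaque and reading metadata from the reports inside.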
- `smoke`: deterministic source-owned C fixture validation
- `source-owned-all`: auto-discovered checked-in source/binary pairs

`putty` remains the primary canary, but it is not the only release narrative.
Cross-binary degraded rows, x86/x64 split reporting, owner metrics, and shape-drift
proxies are part of the canonical corpus summary.
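The source-owned rule (one row per source-defined function, with failures recorded rather than skipped) and its roll-up into a summary can be sketched as follows. The `Row` type, the stage identifiers, and the choice to treat a function with no result as a mapping failure are assumptions for illustration; they are not the schema of `source_semantic_rows.json`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Failure stages named in this guide; the identifiers are hypothetical.
STAGES = ("mapping", "decompilation", "candidate_compilation", "behavior")


@dataclass
class Row:
    function: str
    failed_stage: Optional[str] = None  # None means the row passed

    @property
    def passed(self) -> bool:
        return self.failed_stage is None


def build_rows(source_functions: List[str], results: Dict[str, Optional[str]]) -> List[Row]:
    """Emit exactly one row per source-defined function, never skipping any.

    `results` maps a function name to its failed stage, or None for a pass.
    A function with no result at all is recorded as a mapping failure.
    """
    rows = []
    for name in source_functions:
        stage = results[name] if name in results else "mapping"
        if stage is not None and stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r} for {name}")
        rows.append(Row(name, stage))
    return rows


def summarize(rows: List[Row]) -> dict:
    """Aggregate rows into pass counts and per-stage failure counts."""
    failures = Counter(r.failed_stage for r in rows if not r.passed)
    return {
        "total": len(rows),
        "passed": sum(r.passed for r in rows),
        "failures_by_stage": {s: failures.get(s, 0) for s in STAGES},
    }
```

Since every function yields a row, `total` always equals the number of source-defined functions, so a pass rate cannot improve by silently dropping hard cases.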
Current rollout stays advisory-first.
- `gate_mode=advisory` still computes regressions and promotion blockers.
- `release_promotion_allowed=false` is expected until a suite is intentionally promoted.

When a same-axis baseline is present, summaries should expose:

- benchmark status
- gate mode
- release promotion eligibility

The preferred machine-readable artifacts are:

- `source_semantic_rows.json`
- `source_semantic_summary.json`

Use them for first-pass review, AI tooling, and advisory summarization. Keep the Markdown artifact for operator-facing triage.
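The advisory-first policy above can be made concrete with a small gate function over per-row pass/fail maps. This is a hypothetical sketch: `gate_mode` and `release_promotion_allowed` come from this guide, while the other keys, the `"enforcing"` mode name, and the rule that a missing baseline blocks promotion are assumptions.

```python
from typing import Dict, Optional


def gate_summary(
    current: Dict[str, bool],
    baseline: Optional[Dict[str, bool]],
    gate_mode: str = "advisory",
    suite_promoted: bool = False,
) -> dict:
    """Hypothetical advisory-first gate over row-id -> passed maps."""
    regressions = []
    if baseline is not None:
        # A regression: the row passed in the baseline but not now (or vanished).
        regressions = sorted(
            row for row, ok in baseline.items() if ok and not current.get(row, False)
        )
    blockers = [f"regression: {row}" for row in regressions]
    if baseline is None:
        blockers.append("no same-axis baseline")
    if not suite_promoted:
        blockers.append("suite not intentionally promoted")
    if baseline is None:
        status = "no_baseline"
    else:
        status = "regressed" if regressions else "ok"
    return {
        "benchmark_status": status,
        "gate_mode": gate_mode,
        # Blockers are always computed; advisory mode just never fails the run.
        "release_promotion_allowed": not blockers,
        "promotion_blockers": blockers,
        "gate_failed": gate_mode != "advisory" and bool(regressions),
    }
```

Keeping blocker computation identical in both modes means switching a suite from advisory to enforcing changes only whether the run fails, not what the summary reports.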
Criterion / perf-benchmark helpers remain in the repository, but they are no longer the canonical decompilation-quality surface.
Still valid for targeted throughput/perf work:
- `cargo bench ...`
- `scripts/benchmark/analyze_benchmark.py`
- `scripts/benchmark/update_history.py`
- `scripts/benchmark/setup.sh`

Use those only when the question is microbenchmark throughput or performance history. Do not treat them as the release-quality oracle for Ghidra-parity work.