Fission

Benchmark Guide

This document is the top-level benchmark guide for the repository.

Current policy is intentionally split:

The canonical validation surface for release-quality decompiler work is:

Canonical Workflow

Use the source semantic benchmark for decompilation quality work. It compares Fission output against checked-in original source code and does not use Ghidra as the quality oracle.

Primary operator guide:

Current scope:

Recommended loop:

  1. targeted trace or targeted crate validation
  2. source semantic smoke benchmark
  3. source-owned corpus benchmark
  4. Ghidra reference/comparison benchmark only when investigating parity against local Ghidra code

Real-World Sample Automation

CLI-first sample automation lives under scripts/corpus/ and scripts/benchmark/. The pipeline is intentionally split into separate steps so that each one can be run locally or in CI without a browser session:

# 1. Collect/hash local samples and emit a full-benchmark-compatible manifest.
python3 scripts/corpus/hash_and_manifest.py \
  --input benchmark/binary/x86-64/window/small/binary \
  --copy-to benchmark/binary/realworld/local \
  --repo-relative \
  --output benchmark/config/benchmark_corpus/realworld_local.json

# Optional: download URLs listed one per line before hashing.
python3 scripts/corpus/hash_and_manifest.py \
  --url-list /path/to/urls.txt \
  --copy-to benchmark/binary/realworld/downloaded \
  --repo-relative \
  --output benchmark/config/benchmark_corpus/realworld_downloaded.json

# Optional: collect GitHub release asset URLs and, with --download, store them
# under an ignored local corpus directory before writing a manifest.
python3 scripts/corpus/collect_github_release_samples.py \
  --source-config benchmark/config/benchmark_corpus/github_release_sources.example.json \
  --download \
  --store benchmark/binary/realworld/github \
  --output benchmark/config/benchmark_corpus/github_release_samples.json

# 2. Run loader-only smoke over the manifest.
python3 scripts/benchmark/run_loader_smoke.py \
  --manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --fission-bin target/release/fission_cli \
  --output-dir benchmark/artifacts/realworld_suite/loader_smoke_latest

# 3. Orchestrate loader smoke plus optional raw p-code and full benchmark lanes.
python3 scripts/benchmark/run_realworld_suite.py \
  --build-cli \
  --loader-manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --raw-pcode-manifest benchmark/raw_p_code_benchmark/canonical_rows.json \
  --full-corpus-manifest benchmark/config/benchmark_corpus/realworld_local.json \
  --ghidra-dir vendor/ghidra/ghidra_12.0.4_PUBLIC \
  --fission-release \
  --output-dir benchmark/artifacts/realworld_suite/latest

# 4. Diff any two JSON reports from loader/raw/full/suite runs.
python3 scripts/benchmark/compare_reports.py \
  --baseline benchmark/artifacts/realworld_suite/baseline/loader_smoke_report.json \
  --current benchmark/artifacts/realworld_suite/latest/loader_smoke/loader_smoke_report.json \
  --output-dir benchmark/artifacts/realworld_suite/latest/diff
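
Step 4 diffs two JSON reports. The internals of compare_reports.py and the report schema are not described in this guide, so the following is only a minimal sketch of that kind of diff. It assumes each report is a nested JSON object; the helper names `flatten` and `diff_reports` and the sample keys are hypothetical.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into {"a.b": leaf} pairs; lists stay as leaves."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, child_prefix))
        return flat
    return {prefix: value}


def diff_reports(baseline, current):
    """Return keys added, removed, or changed between two parsed reports."""
    old, new = flatten(baseline), flatten(current)
    shared = sorted(old.keys() & new.keys())
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {k: (old[k], new[k]) for k in shared if old[k] != new[k]},
    }


# Hypothetical report fragments, not the real loader smoke schema.
baseline = {"summary": {"loaded": 10, "failed": 2}, "tool": "x"}
current = {"summary": {"loaded": 11, "failed": 2, "skipped": 0}}
result = diff_reports(baseline, current)
```

Flattening to dotted keys keeps the diff independent of how deeply a given report nests its counters, which suits a tool that has to accept loader, raw, full, and suite reports alike.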

Downloaded samples are not automatically trusted. The manifest records SHA-256, size, source provenance, and a magic-byte format guess; loader semantics still come from Fission parsers and typed loader failures, not benchmark-side repair. Downloaded GitHub release binaries under benchmark/binary/realworld/ are local corpus artifacts and must not be staged or pushed.
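
To illustrate what such a manifest entry can carry, here is a minimal sketch that records SHA-256, size, provenance, and a magic-byte format guess for one file. The field names, the `describe_sample` and `guess_format` helpers, and the magic table are assumptions made for illustration; they are not the actual output schema of hash_and_manifest.py.

```python
import hashlib
from pathlib import Path

# Hypothetical magic-byte table; a real manifest tool may know more formats.
MAGIC_PREFIXES = [
    (b"MZ", "pe"),                   # DOS/PE executables
    (b"\x7fELF", "elf"),             # ELF objects and executables
    (b"\xcf\xfa\xed\xfe", "macho"),  # 64-bit little-endian Mach-O
    (b"\xce\xfa\xed\xfe", "macho"),  # 32-bit little-endian Mach-O
]


def guess_format(header: bytes) -> str:
    """Guess a container format from leading bytes; this is only a hint."""
    for prefix, name in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return name
    return "unknown"


def describe_sample(path: Path, source: str) -> dict:
    """Build one manifest-style entry (hypothetical field names)."""
    data = path.read_bytes()
    return {
        "path": str(path),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "source": source,
        "format_guess": guess_format(data[:8]),
    }
```

The guess is deliberately shallow: it only inspects the first bytes, so a file that starts with `MZ` but is otherwise malformed still gets the `pe` hint and must fail in the loader itself, in line with the rule that the benchmark side does no repair.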

Canonical Paths

Config roots:

Artifact roots:

Naming contract:

Corpus Semantics

The canonical source semantic suites are source-owned. Every source-defined function produces a row; mapping, decompilation, candidate compilation, and behavior failures are recorded as failures rather than skipped cases.
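
The "failures are rows, not skips" rule can be sketched as follows. The stage names mirror the failure kinds listed above, but the `Row` type, the status strings, and the `build_rows` and `summarize` helpers are hypothetical and are not taken from the benchmark code.

```python
from collections import Counter
from dataclasses import dataclass

# Stage names follow the failure kinds named in the text (spelling assumed).
STAGES = ("mapping", "decompilation", "candidate_compilation", "behavior")


@dataclass
class Row:
    function: str
    status: str  # "pass" or "fail:<stage>"


def build_rows(source_functions, failures):
    """Emit one row per source-defined function.

    `failures` maps a function name to the stage at which it failed.
    Functions that failed still get a row; nothing is dropped.
    """
    rows = []
    for name in source_functions:
        stage = failures.get(name)
        if stage is not None and stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        rows.append(Row(name, "pass" if stage is None else f"fail:{stage}"))
    return rows


def summarize(rows):
    """Count rows per status so failures stay visible in the total."""
    return Counter(row.status for row in rows)


rows = build_rows(["parse", "emit", "hash"], {"emit": "mapping"})
summary = summarize(rows)
```

Because the row count always equals the number of source-defined functions, a mapping or compilation regression lowers the pass rate instead of silently shrinking the denominator.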

putty remains the primary canary, but it is not the only release narrative. Cross-binary degraded rows, x86/x64 split reporting, owner metrics, and shape-drift proxies are part of the canonical corpus summary.

Advisory vs Promotion

Current rollout stays advisory-first.

When a same-axis baseline is present, summaries should expose:

Compact Summary

The preferred machine-readable artifacts are:

Use them for first-pass review, AI tooling, and advisory summarization. Keep the Markdown artifact for operator-facing triage.

Legacy Microbenchmarks

Criterion / perf-benchmark helpers remain in the repository, but they are no longer the canonical decompilation-quality surface.

Still valid for targeted throughput/perf work:

Use those only when the question is microbenchmark throughput or performance history. Do not treat them as the release-quality oracle for Ghidra-parity work.