Fission

Full Benchmark Interpretation

This guide explains how to run the canonical Python benchmark harness and how to read high-level signals from its outputs.

Normative runner documentation lives in benchmark/full_benchmark/README.md; start there for the exact commands.

Runner responsibilities

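The full_decomp_benchmark.py entry point compares Fission's decompilation output (from the binary passed as --fission-bin) against Ghidra's (from the installation passed as --ghidra-dir).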

Historical compatibility probing can use compare_legacy_preview.py instead; see the README for its seed-address workflows.

Typical local invocation sketch

After installing the benchmark Python dependencies (requirements-benchmark.txt), either set GHIDRA_INSTALL_DIR or rely on the vendored Ghidra paths documented in the README. A typical invocation looks like this:

python3 benchmark/full_benchmark/full_decomp_benchmark.py \
  samples/windows/x64/example.exe \
  --fission-bin target/release/fission_cli \
  --ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build \
  --output-dir benchmark/artifacts/full_benchmark/run-latest
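For repeated runs, it can help to build that command in Python. The sketch below is a minimal, hypothetical wrapper. It uses only the positional binary and the --fission-bin, --ghidra-dir, and --output-dir flags shown above. It also assumes GHIDRA_INSTALL_DIR should take precedence over the vendored path, and it passes whichever path it picks through --ghidra-dir. The helper names resolve_ghidra_dir and build_command are illustrative, not part of the harness; the README remains the authority on how the runner itself consumes GHIDRA_INSTALL_DIR.

```python
import os
import shlex
from pathlib import Path

# Paths copied from the example invocation; adjust to your checkout.
RUNNER = Path("benchmark/full_benchmark/full_decomp_benchmark.py")
VENDORED_GHIDRA = Path("vendor/ghidra/ghidra-Ghidra_12.0.4_build")


def resolve_ghidra_dir(env=None) -> Path:
    """Hypothetical policy: prefer GHIDRA_INSTALL_DIR, else the vendored path."""
    env = os.environ if env is None else env
    configured = env.get("GHIDRA_INSTALL_DIR")
    return Path(configured) if configured else VENDORED_GHIDRA


def build_command(binary, output_dir, fission_bin="target/release/fission_cli", env=None):
    """Assemble argv using only the flags shown in the example invocation."""
    return [
        "python3",
        str(RUNNER),
        str(binary),
        "--fission-bin", str(fission_bin),
        "--ghidra-dir", str(resolve_ghidra_dir(env)),
        "--output-dir", str(output_dir),
    ]


if __name__ == "__main__":
    argv = build_command(
        "samples/windows/x64/example.exe",
        "benchmark/artifacts/full_benchmark/run-latest",
    )
    # Print a copy-pasteable command; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Returning an argv list rather than a single shell string keeps paths containing spaces intact when handed to subprocess.run.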

Corpus manifests live under benchmark/config/benchmark_corpus/ (smoke_corpus.json, parity_corpus.json, …).
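Before choosing a manifest, it can be useful to list what is available. The loader below is a sketch that assumes only that each manifest is a JSON file in benchmark/config/benchmark_corpus/. It deliberately does not guess at field names inside the manifests. load_manifests and describe are hypothetical helpers.

```python
import json
from pathlib import Path

# Directory named in this guide. The manifests' internal schema is NOT assumed.
CORPUS_DIR = Path("benchmark/config/benchmark_corpus")


def load_manifests(corpus_dir=CORPUS_DIR) -> dict:
    """Return {manifest stem: parsed JSON} for every *.json file in corpus_dir."""
    manifests = {}
    for path in sorted(Path(corpus_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            manifests[path.stem] = json.load(fh)
    return manifests


def describe(data) -> str:
    """Summarize only the top-level JSON shape, without interpreting fields."""
    if isinstance(data, dict):
        return f"object with keys {sorted(data)}"
    if isinstance(data, list):
        return f"array of {len(data)} entries"
    return type(data).__name__


if __name__ == "__main__" and CORPUS_DIR.is_dir():
    for name, data in load_manifests().items():
        print(f"{name}: {describe(data)}")
```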

Reading results

Benchmark outputs bundle logs, structured summaries, and sometimes rendered comparisons. Exact filenames change between runner versions.
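Because filenames shift between runner versions, one defensive approach is to inventory an output directory by file extension instead of opening hard-coded names. The inventory helper below is hypothetical. It assumes nothing about the layout beyond the files living somewhere under the directory passed as --output-dir.

```python
from collections import defaultdict
from pathlib import Path


def inventory(output_dir) -> dict:
    """Group every file under output_dir by suffix, e.g. {'.log': [...], '.json': [...]}."""
    root = Path(output_dir)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            groups[path.suffix or "<none>"].append(path.relative_to(root).as_posix())
    return dict(groups)


if __name__ == "__main__":
    run_dir = Path("benchmark/artifacts/full_benchmark/run-latest")
    if run_dir.is_dir():
        for suffix, names in sorted(inventory(run_dir).items()):
            print(f"{suffix}: {len(names)} file(s)")
```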

General interpretation guidelines:

  1. Treat regressions as hypotheses: confirm that the toolchain version (fission_cli hash), Ghidra directory version, seeds, and timeouts match the baseline expectations.
  2. Separate infrastructure failures from semantic differences: a missing Ghidra install and divergent structuring output belong to different remediation paths.
  3. Prefer manifests: corpus-driven runs encode binary lists and constraints reproducibly, so use them instead of one-off paths when chasing systemic parity gaps.
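Guidelines 1 and 2 can be turned into a rough triage step. The sketch below is illustrative only. The RunConfig fields, the INFRA_MARKERS strings, and the three labels are assumptions chosen for the example; they are not fields or log messages the runner is known to emit. The sketch checks for configuration drift first, then for infrastructure symptoms in the log, and only then treats the regression as a candidate semantic difference.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings guideline 1 says to compare."""
    fission_cli_hash: str
    ghidra_version: str
    seeds: tuple  # seed values or seed addresses, whichever the run uses
    timeout_s: int


# Hypothetical log substrings that indicate an infrastructure problem.
# Replace these with messages actually observed in your runner logs.
INFRA_MARKERS = ("ghidra install not found", "no such file or directory")


def config_drift(run: RunConfig, baseline: RunConfig) -> dict:
    """Return {field: (run_value, baseline_value)} for every setting that differs."""
    return {
        f.name: (getattr(run, f.name), getattr(baseline, f.name))
        for f in fields(RunConfig)
        if getattr(run, f.name) != getattr(baseline, f.name)
    }


def triage(run: RunConfig, baseline: RunConfig, log_text: str):
    """Label a regression: config drift first, then infra, else candidate semantic."""
    drift = config_drift(run, baseline)
    if drift:
        # Guideline 1: mismatched inputs make the comparison meaningless.
        return "config-drift", drift
    lowered = log_text.lower()
    if any(marker in lowered for marker in INFRA_MARKERS):
        # Guideline 2: environment problems need infra fixes, not decompiler fixes.
        return "infra", {}
    return "semantic", {}


if __name__ == "__main__":
    base = RunConfig("abc123", "12.0.4", (0,), 600)
    print(triage(RunConfig("def456", "12.0.4", (0,), 600), base, ""))
    # → ('config-drift', {'fission_cli_hash': ('def456', 'abc123')})
    print(triage(base, base, "ERROR: Ghidra install not found"))
    # → ('infra', {})
```

The ordering is the point of the sketch: checking drift first keeps a changed fission_cli hash from being misread as a structuring regression.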

For conceptual parity framing, see docs/architecture/GHIDRA_PARITY_GAP_AUDIT.md where applicable.