This is the canonical operator-facing benchmark guide for decompilation quality
and Ghidra-parity work. Criterion/perf helpers under scripts/benchmark/ are
non-canonical and should be treated as microbenchmark tooling only.
Canonical benchmark script root:
Default artifact root:
Fission keeps two benchmark entrypoints:
- compare_legacy_preview.py: historical fixed-seed compatibility benchmark
- full_decomp_benchmark.py: whole-binary 2-way benchmark

full_decomp_benchmark.py compares:
- pyghidra: Python-host baseline
- fission: Rust decompiler pipeline

The runner now supports both:
- a single binary path, passed as the positional argument
- corpus runs via --corpus-manifest

Requirements:
- pyghidra
- psutil for RSS / CPU metrics
- GHIDRA_INSTALL_DIR configured, or vendor/ghidra/ghidra-Ghidra_12.0.4_build
- fission_cli binary

python3 -m pip install -r benchmark/full_benchmark/requirements-benchmark.txt

# Historical fixed-seed comparison
python3 benchmark/full_benchmark/compare_legacy_preview.py \
samples/windows/x64/putty.exe \
--addresses 0x140006380 \
--with-ghidra \
--repeat 3 \
--fission-bin target/release/fission_cli \
--output-dir benchmark/artifacts/full_benchmark/compare_legacy_preview/putty-fixed
# Whole-binary benchmark
python3 benchmark/full_benchmark/full_decomp_benchmark.py \
samples/windows/x64/putty.exe \
--fission-bin target/release/fission_cli \
--ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build \
--output-dir benchmark/artifacts/full_benchmark/putty-balanced-latest
# Faster validation: first N canonical seed functions
python3 benchmark/full_benchmark/full_decomp_benchmark.py \
samples/windows/x64/test_control_flow_x64_O0.exe \
--limit 30 \
--timeout 300
# Smoke corpus benchmark
python3 benchmark/full_benchmark/full_decomp_benchmark.py \
--corpus-manifest benchmark/config/benchmark_corpus/smoke_corpus.json \
--fission-bin target/release/fission_cli \
--ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build \
--output-dir benchmark/artifacts/full_benchmark/fission-smoke-windows-samples-balanced-latest
# Parity corpus benchmark for Ghidra-reference work
python3 benchmark/full_benchmark/full_decomp_benchmark.py \
--corpus-manifest benchmark/config/benchmark_corpus/parity_corpus.json \
--fission-bin target/release/fission_cli \
--ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build \
--output-dir benchmark/artifacts/full_benchmark/fission-ghidra-parity-windows-workbench-balanced-latest \
--baseline-dir benchmark/artifacts/full_benchmark/fission-ghidra-parity-windows-workbench-balanced-baseline
# Release corpus benchmark against a previously accepted corpus baseline
python3 benchmark/full_benchmark/full_decomp_benchmark.py \
--corpus-manifest benchmark/config/benchmark_corpus/release_corpus.json \
--fission-bin target/release/fission_cli \
--ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build \
--output-dir benchmark/artifacts/full_benchmark/fission-release-windows-samples-balanced-latest \
--baseline-dir benchmark/artifacts/full_benchmark/fission-release-windows-samples-balanced-baseline
Default output directories are split by workflow:
- benchmark/artifacts/full_benchmark/<binary-or-suite>-<profile>-latest
- benchmark/artifacts/full_benchmark/<binary-or-suite>-<profile>-<YYYYmmdd-HHMMSS> when --timestamped-output is set
- benchmark/artifacts/automation/<lane>-<run-profile>-<unix_run_id>
- benchmark/artifacts/automation/latest/<lane>/ for the synced latest view

compare_legacy_preview.py artifacts:
- *_legacy_vs_preview.json
- *_legacy_vs_preview.md

Single-binary full_decomp_benchmark.py artifacts:
- fission_full.json
- ghidra_full.json
- benchmark_summary.json
- benchmark_summary.md
- benchmark_compact_summary.json
- fission_stdout.log, fission_stderr.log

Corpus full_decomp_benchmark.py artifacts:
- benchmark_summary.json / .md: corpus-global assessment
- benchmark_compact_summary.json: AI-facing compact sidecar
- <binary-id>/fission_full.json
- <binary-id>/ghidra_full.json
- <binary-id>/benchmark_summary.json
- <binary-id>/benchmark_summary.md

The helper below runs full_decomp_benchmark.py twice, with --limit 2 and --limit 20, then validates the results:
python3 benchmark/full_benchmark/validate_limit_regression.py \
samples/windows/x64/test_control_flow_x64_O0.exe \
--fission-bin target/debug/fission_cli \
--ghidra-dir vendor/ghidra/ghidra-Ghidra_12.0.4_build
The corpus manifest is a JSON file with an entries array. Each entry keeps the benchmark contract minimal:
- id
- binary_path
- ghidra_project_key
- tags
- seed_limit
- role

Optional:
- row_fidelity_targets: fixed row watchlist for that binary
- weight: override the default corpus weight (primary_canary=2, others=1)

Checked-in defaults:
- benchmark/config/benchmark_corpus/smoke_corpus.json
- benchmark/config/benchmark_corpus/release_corpus.json
- benchmark/config/benchmark_corpus/parity_corpus.json

Current checked-in suites are intentionally constrained to samples/windows so Ghidra-parity work stays on Windows x86/x64 binaries only.
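The entry contract above can be sanity-checked before a long corpus run. The following is a minimal sketch, not the runner's own validation: the function names, the example manifest, and the `secondary` role value are hypothetical, and it assumes that `primary_canary` is a value of the `role` field when applying the documented default weights.

```python
import json

# Required per-entry keys, as listed in this guide.
REQUIRED_ENTRY_KEYS = (
    "id", "binary_path", "ghidra_project_key", "tags", "seed_limit", "role",
)


def effective_weight(entry):
    """Return the explicit weight, else the documented default.

    Assumption: primary_canary is a value of the "role" field.
    """
    if "weight" in entry:
        return entry["weight"]
    return 2 if entry.get("role") == "primary_canary" else 1


def check_manifest(manifest):
    """Return a list of problems; an empty list means every entry has the required keys."""
    entries = manifest.get("entries")
    if not isinstance(entries, list):
        return ["manifest has no 'entries' array"]
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_ENTRY_KEYS if key not in entry]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
    return problems


# Hypothetical manifest, used only for illustration.
example = json.loads("""
{
  "name": "example-suite",
  "entries": [
    {"id": "putty-x64", "binary_path": "samples/windows/x64/putty.exe",
     "ghidra_project_key": "putty-x64", "tags": ["x64"], "seed_limit": 20,
     "role": "primary_canary"},
    {"id": "control-flow",
     "binary_path": "samples/windows/x64/test_control_flow_x64_O0.exe",
     "tags": ["x64"], "seed_limit": 30, "role": "secondary", "weight": 3}
  ]
}
""")

problems = check_manifest(example)
weights = [effective_weight(entry) for entry in example["entries"]]
print(problems)
print(weights)
```

Here the second entry is reported because it lacks ghidra_project_key, and its explicit weight overrides the default.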
Top-level manifest metadata:
- name
- suite_tier: smoke | release | parity
- gate_mode: advisory | blocking
- dynamic_watchlist_limit
- notes

- goto_count
- top_level_label_count
- must_emit_label_count
- empty_if_count
- constant_if_count
- residue_score

- NirBuildStats
- preview_build_stats

Corpus and per-binary summaries now surface stable owner-facing counters already present in
summary.engines.fission / preview_build_stats.
Current owner metrics:
- alias_unsafe
- missing_merge
- representative_root
- temp_only_lifecycle
- dead_temp
- representative_downgrade
- downgrade_no_aliassafe_source
- downgrade_join_conflict
- materialization_stabilized

These appear in:
- owner_metrics
- owner_metric_totals
- owner_metric_totals_per_binary

The benchmark also carries presentation-oriented proxy metrics so semantic owner drift and surface-shape drift are easier to separate.
Current proxy set:
- goto_total
- top_level_label_total
- generic_local_name_sum
- generic_param_name_sum
- unknown_type_var_total
- ptr_offset_total
- index_expr_total
- text_avg_line_length_mean
- text_max_brace_nesting_mean
- synthetic_helper_call_total

synthetic_helper_call_total is derived conservatively from __fission_*( call sites only.
These metrics are not semantic truth; they are there to highlight surfacing drift.
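As an illustration of what counting `__fission_*(` call sites could look like, here is a small sketch. It is not the benchmark's actual implementation: the regular expression, the function name, and the helper names in the sample (`__fission_carry`, `__fission_sext`) are assumptions made for the example.

```python
import re

# An identifier that starts with __fission_ and is followed directly by "(".
# Whitespace before "(" is deliberately not accepted, to keep the count conservative.
HELPER_CALL = re.compile(r"\b__fission_\w*\(")


def count_synthetic_helper_calls(text):
    """Count __fission_*( call sites in decompiled text."""
    return len(HELPER_CALL.findall(text))


# Hypothetical decompiler output, used only for illustration.
sample = """
int f(int a) {
    int t = __fission_carry(a, 1);
    return __fission_sext(t) + helper(a);
}
"""

helper_calls = count_synthetic_helper_calls(sample)
print(helper_calls)
```

A bare mention of a helper name without a call, or an ordinary function such as `helper(`, does not contribute to the count.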
- min/avg/median/p95 wall-clock timing per engine
- init_sec
- total_decomp_sec
- total_postprocess_sec
- wall_clock_sec
- max_rss_mb
- avg_rss_mb
- avg_cpu_pct
- max_cpu_pct

The release owner is no longer putty.exe alone.
- putty remains the primary canary and keeps a larger default weight
- smoke: fast local validation across a small Windows x86/x64 suite
- parity: Ghidra-reference workbench for owner-focused parity experiments on Windows x86/x64 samples
- release: broader advisory Windows x86/x64 corpus for promotion candidates

Row fidelity is no longer intended to be putty-only, but the checked-in suites are still Windows-only.
- row_fidelity_targets are treated as bootstrap hints
- watchlist_source
- dynamic_watchlist_rows
- bootstrap_row_targets
- watchlist_diagnostics.selected_because_counts

Dynamic row selection reasons are explicit and stable:
- bootstrap_explicit
- baseline_degraded
- baseline_low_similarity
- dynamic_low_similarity

Corpus suites currently default to gate_mode=advisory.
- release_promotion_allowed=false is expected until a suite is intentionally promoted

Recommended workflow:
If --limit 20 results in a 900-second timeout:
# Identify the culprit function by testing each one individually
python3 benchmark/full_benchmark/find_timeout_culprit.py samples/windows/x64/putty.exe --limit 20 --timeout 120 --verbose
For the full procedure, see docs/debug/TIMEOUT_DEBUG_GUIDE.md.
test_control_flow_x64_O0.exe --limit 30
- init 0.183s, decomp 4.470s, post 0.027s, success 25/30
- init 1.412s, decomp 0.170s, success 30/303, explicit error 2

putty.exe --limit 100
- init 0.260s, decomp 157.037s, success 50/100
- init 1.767s, decomp 3.140s, success 100/100