Generated: 2026-04-21
Scope: repository root
Fission is a Rust-first reverse-engineering/decompilation workspace. Ghidra-native lifting feeds Rust-owned NIR/HIR normalization, structuring, rendering, and automation quality lanes.
Repository layout (directories + workspace crates): see docs/PROJECT_MAP.md. Use this file plus the tree below when navigating; avoid duplicating full crate lists in multiple docs.
Fission/
├── benchmark/
│ ├── config/ # Benchmark corpus + automation manifests
│ ├── source_semantic_benchmark/ # Canonical source-vs-Fission semantic benchmark
│ ├── full_benchmark/ # Benchmark runner, support modules, rendering
│ ├── artifacts/ # Benchmark outputs (automation/, source_semantic_benchmark/, full_benchmark/)
│ └── binary/ # Curated benchmark binaries and fixtures
├── crates/
│ ├── fission-pcode/ # Canonical IR, NIR/HIR, structuring, CFG, printer
│ ├── fission-decompiler/ # Orchestration + Rust-Sleigh bridge (re-exports IR crate)
│ ├── fission-sleigh/ # Sleigh decode/lift runtime
│ ├── fission-static/ # Static facts, native preparation, analysis services
│ ├── fission-automation/ # Quality lanes, deltas, go/stop signals, artifacts
│ ├── fission-loader/ # Binary parsing, symbols, sections, strings
│ ├── fission-signatures/ # FID/signature data and lookup
│ ├── fission-cli/ # CLI surface
│ └── fission-tauri/ # Desktop surface
├── utils/ # Checked-in signatures, type info, benchmark support data
├── vendor/ # Ghidra, RetDec, other reference code
├── scripts/benchmark/ # Benchmark setup / history helpers
├── scripts/test/ # Smoke / fuzz / automation helpers
└── .github/workflows/ # CI/CD source of truth
- crates/fission-pcode/src/nir/AGENTS.md
- crates/fission-pcode/src/nir/structuring/AGENTS.md
- crates/fission-automation/AGENTS.md
- crates/fission-cli/AGENTS.md
- crates/fission-loader/AGENTS.md

Read the nearest child file before editing those areas.

| Task | Location | Notes |
|---|---|---|
| NIR structuring / canonicalization | crates/fission-pcode/src/nir/structuring/ | Core algorithmic decompiler work lives here |
| NIR telemetry contract | crates/fission-pcode/src/nir/types.rs | NirBuildStats is canonical |
| Decompilation orchestration / Rust-Sleigh | crates/fission-decompiler/ | Routing, workers, type-context assembly; consumes fission-pcode + fission-static facts |
| Quality lanes / automation summaries | crates/fission-automation/ | nir-check, reports; must stay aligned with NirBuildStats |
| Automation summaries / deltas (implementation) | crates/fission-automation/src/report/ | Markdown/JSON pipeline; must stay aligned with NirBuildStats |
| Source semantic benchmark / corpus reports | benchmark/source_semantic_benchmark/ | Canonical source-vs-Fission semantic quality surface; Ghidra is not used as the oracle |
| Ghidra reference benchmark | benchmark/full_benchmark/ | Reference/comparison lane only; keep reporting/gating additive |
| Benchmark manifests / automation manifests | benchmark/config/ | Corpus manifests and sentinel sets live here now |
| CLI one-shot parsing / command ownership | crates/fission-cli/src/cli/ | Keep subcommand UX and legacy shims separate from semantics |
| Runtime resource paths (signatures, DiE, FID, patterns, typeinfo) | crates/fission-core/src/core/path_config.rs, resource_roots.rs | PATHS / PathConfig::detect; overrides: CLI --resource-root, FISSION_RESOURCE_ROOT; operator docs: docs/CLI.md § Runtime resource bundle |
| Checked-in utility resources | /Users/sjkim1127/Fission/utils | Prefer existing resource/path config and utility loaders over hardcoded paths; use this tree when reusable signatures, type info, benchmark support data, or other checked-in resources already cover the need |
| Loader identity / binary provenance hints | crates/fission-loader/src/loader/identity/ | Evidence-backed BinaryIdentityReport on LoadedBinary; not an IR/decompiler repair layer |
| Binary loaders (PE, ELF, Mach-O, TE, COFF, etc.) | crates/fission-loader/src/loader/ | Format-specific byte parsing, section mapping, relocation, and symbol resolution |
| Static facts and binary-derived analysis services | crates/fission-static/src/analysis/ | Xrefs, discovery, patches, strings; fact extraction, not decompiler orchestration |
| Decomp-facing facts / native prep surface | crates/fission-static/src/analysis/decomp/ | FactStore and related helpers consumed by fission-decompiler |
| Reference algorithms | /Users/sjkim1127/Fission/vendor, especially /Users/sjkim1127/Fission/vendor/ghidra/ and /Users/sjkim1127/Fission/vendor/retdec-5.0/ | Reference these often for invariants and behavior, but do not add runtime/build dependencies, bindings, or copied implementation shortcuts |
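
The runtime resource row above names two overrides (CLI `--resource-root` and `FISSION_RESOURCE_ROOT`) on top of detection. A minimal sketch of how such a lookup could be ordered follows. The function name `resolve_resource_root` and the precedence (CLI flag, then environment variable, then the detected default) are assumptions for illustration; the real logic lives behind `PathConfig::detect` and may differ.

```rust
use std::path::PathBuf;

/// Hypothetical helper, not Fission's actual API: pick a resource root.
/// Assumed precedence: CLI flag, then environment variable, then detection.
fn resolve_resource_root(
    cli_override: Option<PathBuf>,
    env_override: Option<String>,
    detected: PathBuf,
) -> PathBuf {
    // An explicit CLI value always wins in this sketch.
    if let Some(path) = cli_override {
        return path;
    }
    // Treat an empty or whitespace-only environment value as unset.
    if let Some(value) = env_override.filter(|v| !v.trim().is_empty()) {
        return PathBuf::from(value);
    }
    // Fall back to whatever auto-detection produced.
    detected
}
```

A caller would pass `std::env::var("FISSION_RESOURCE_ROOT").ok()` as the second argument, which keeps the function pure and easy to test without touching the process environment.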
- Keep telemetry and automation summaries aligned with `NirBuildStats`.
- When `/Users/sjkim1127/Fission/utils` has reusable signatures, type info, benchmark support data, or other checked-in resources, route access through existing `PathConfig`, `PATHS`, `resource_roots`, or utility loaders instead of embedding absolute paths.
- Treat `/Users/sjkim1127/Fission/vendor` as a reference corpus only: consult it often for algorithms, invariants, and expected behavior, but keep Fission-owned Rust implementations dependency-free from that tree.
- Do not use `fission-cli` or `fission-tauri` as semantic repair layers.
- Do not bypass `PathConfig`, `PATHS`, `resource_roots`, or related helpers by embedding `/Users/sjkim1127/Fission/utils` directly in implementation logic.
- Do not use `/Users/sjkim1127/Fission/vendor` code in production paths.

# CLI
cargo build -p fission-cli --release
# Common decompiler validation
cargo nextest run -p fission-pcode
cargo check -p fission-pcode
cargo check -p fission-decompiler
cargo check -p fission-automation
# Quality lane
cargo run -p fission-automation -- nir-check --lane nir
# Canonical benchmark runner
python3 benchmark/source_semantic_benchmark/run_source_semantic_benchmark.py --help
- Use `cargo nextest run` as the default local Rust test runner. Use `cargo test` only when checking doctests, harness-specific behavior, or when nextest is unavailable.
- `cargo nextest run -p fission-pcode` → `cargo check -p fission-pcode`.
- `cargo check -p fission-decompiler` (and CLI/Tauri surfaces as needed).
- `cargo nextest run -p fission-core` and smoke `fission_cli resources status`.
- `cargo check -p fission-automation`.
- Run benchmarks from `benchmark/source_semantic_benchmark/` and keep artifacts under `benchmark/artifacts/`.
- Treat `.github/workflows/ci.yml` and `ci-heavy.yml` as CI source of truth.
- Tag releases through `.github/workflows/release-tag.yml` (Release Tag (CI green)): it only tags a commit after `ci.yml` has a successful push run for that SHA, then `cd.yml` builds assets.

Use this loop for source-semantic or pseudocode-quality work, especially when a concrete row/function motivated the change.
[…] `vendor/` reference tools.

Use this prompt as the standing operating model for decompiler-quality cycles:
Current priority order:
1. x86 / x86-64 decompilation correctness and readable pseudocode quality.
SLEIGH lift coverage is now good enough that day-to-day quality work should focus on source-semantic correctness and human-readable pseudocode for x86/x86-64 sample binaries.
The goal is not only mechanically correct p-code/NIR, but final output that reads like useful C pseudocode.
Focus areas:
- control-flow recovery
- if / else / switch / loop / break / continue structuring
- pointer, array, struct, and field-access expressions
- calling convention, parameter, and local-variable recovery
- removal of unnecessary temporaries
- C-friendly recovery of pointer arithmetic into array/index/field forms
- return-value, accumulator, and loop-induction-variable cleanup
- function-level pseudocode readability compared with Ghidra
2. Type and data abstraction.
Improve struct, pointer, array, field access, calling convention, parameter, and local recovery at the NIR/HIR semantic layer, not by output-only substitution.
3. Large and hard function structuring.
Improve small sample functions first, then extend to complex x86/x86-64 functions using CFG, dominance, post-dominance, SCC, dataflow, and fixed-point analysis.
4. Maintain SLEIGH lift correctness and prevent regressions.
Do not add manual mappings in the SLEIGH engine. Keep `.sla` ConstructTpl execution as the success source, and do not grow legacy token cursor, BoundOperand fallback, or compatibility-classifier debt.
When SLEIGH changes are necessary, validate row-level raw p-code parity first, then the canonical gate, then benchmarks.
5. FID/name recovery.
Gradually improve packed `.fidb`, exact hash inputs, and program seeker coverage relative to Ghidra Function ID / signature / symbol ecosystems.
6. Architecture and file-format breadth.
Expand to ARM, MIPS, PPC, ELF, and Mach-O advanced cases only after x86/x86-64 quality is strong enough.
Required principles:
1. Improve sample binary quality before real-world binary breadth.
2. Treat Ghidra as a cleanroom reference for algorithms, invariants, and edge cases.
3. Default to zero new production dependencies.
4. Prefer CFG, dominance, dataflow, fixed-point, and constraint-based reasoning over brittle pattern matching or temporary heuristics.
5. Avoid overfitting to one ISA/compiler, while keeping x86/x86-64 as the current optimization target.
6. Consider Rust libraries only when a confirmed long-term bottleneck cannot be solved internally. Do not add C++ bindings.
7. Prefer long-term maintainability and generalizable architecture over short-term output patches.
8. Make proposals and implementations valid across multiple future quality cycles, with explicit observability and verification.
9. Do not use estimates as evidence. Base claims on measured, reproducible data.
10. The final success criterion is actual improvement in `benchmark/source_semantic_benchmark` semantic correctness and pseudocode quality.
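
As an illustration of the fixed-point reasoning that principle 4 prefers over pattern matching, here is a minimal sketch of the classic iterative dominator computation. It is not taken from Fission's structuring code. The CFG encoding (predecessor lists indexed by block number, block 0 as the entry, every block reachable) is an assumption made for the example.

```rust
use std::collections::BTreeSet;

/// Compute dominator sets for a CFG given as predecessor lists.
/// Assumptions: node 0 is the entry block and every node is reachable.
fn dominators(preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = preds.len();
    let all: BTreeSet<usize> = (0..n).collect();
    // The entry dominates only itself; every other node starts at "all nodes".
    let mut dom: Vec<BTreeSet<usize>> = (0..n)
        .map(|i| if i == 0 { BTreeSet::from([0]) } else { all.clone() })
        .collect();

    // Iterate until no dominator set changes (the fixed point).
    let mut changed = true;
    while changed {
        changed = false;
        for node in 1..n {
            // Intersect the dominator sets of all predecessors.
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &preds[node] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(set) => set.intersection(&dom[p]).copied().collect(),
                });
            }
            // A node always dominates itself.
            let mut new_set = acc.unwrap_or_default();
            new_set.insert(node);
            if new_set != dom[node] {
                dom[node] = new_set;
                changed = true;
            }
        }
    }
    dom
}
```

For a diamond CFG (0 → 1, 0 → 2, 1 → 3, 2 → 3), the join block 3 ends up dominated only by the entry and itself, which is the invariant an if/else structurer would rely on. Set-based iteration is chosen here for clarity; a production pass would more likely compute immediate dominators directly.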
Resource rules:
- `/Users/sjkim1127/Fission/utils` contains reusable Fission resources, type information, signatures, and benchmark support data.
- Prefer existing resource loaders, `PathConfig`, `PATHS`, and resource-root mechanisms over hardcoded paths or duplicate implementations.
- Use `utils` only when it supports a maintainable semantic-layer design, not as a workaround.
- `/Users/sjkim1127/Fission/vendor` and especially `/Users/sjkim1127/Fission/vendor/ghidra/ghidra-Ghidra_12.0.4_build` are reference-only sources.
- Do not add runtime/build dependencies on vendor code, do not copy implementations, and do not add C++ bindings.
Regression-prevention workflow:
1. Start from a concrete source-semantic row or small sample function.
2. Record baseline behavior, case pass count, semantic/static score, stdout/stderr, line/byte size, and top feature gaps.
3. Diagnose the canonical owner before editing: SLEIGH/raw p-code, NIR materialization, type/data recovery, structuring, cleanup, printer, benchmark, or automation.
4. Make the smallest invariant-based production change at that owner.
5. Add targeted coverage for the invariant before or with the fix.
6. Run the targeted test, relevant crate checks/tests, and release CLI build when benchmark validation needs it.
7. Run the focused source-semantic benchmark with stale decompilation and behavior caches disabled for semantic changes.
8. Compare against the baseline and inspect candidate artifacts, not just aggregate scores.
9. Run smoke or automation regression checks after focused improvement.
10. Report whether quality improved, stayed unchanged, or regressed. Separate test/telemetry changes from semantic fixes.
11. Commit and push intermittently only from `main`, and stage only intended hunks in a dirty worktree.
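
Steps 2, 8, and 10 of the workflow above amount to comparing a recorded baseline with a candidate run and reporting one of three outcomes. The sketch below shows one way to encode that decision. The `RowMetrics` fields, the `tolerance` parameter, and the rule that any lost passing case counts as a regression are all illustrative assumptions, not the benchmark's actual schema or gating policy.

```rust
/// Hypothetical per-row measurements; field names are illustrative,
/// not the benchmark's actual schema.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowMetrics {
    cases_passed: u32,
    semantic_score: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Unchanged,
    Regressed,
}

/// Classify a candidate run against its baseline.
/// Assumed policy: a lost passing case or a score drop beyond `tolerance`
/// is a regression even if the other metric improved.
fn classify(baseline: RowMetrics, candidate: RowMetrics, tolerance: f64) -> Verdict {
    let score_delta = candidate.semantic_score - baseline.semantic_score;
    if candidate.cases_passed < baseline.cases_passed || score_delta < -tolerance {
        Verdict::Regressed
    } else if candidate.cases_passed > baseline.cases_passed || score_delta > tolerance {
        Verdict::Improved
    } else {
        Verdict::Unchanged
    }
}
```

Checking for regressions first keeps an aggregate score gain from hiding a lost passing case, which matches the instruction to inspect candidate artifacts rather than rely on aggregate scores alone.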
- docs/architecture/ARCHITECTURE.md
- docs/adr/: architectural decisions (ADR index lives alongside numbered entries)
- docs/build/BUILD.md
- README.md
- .github/workflows/ci.yml
- .github/workflows/ci-heavy.yml
- .github/workflows/release-tag.yml