# Fuzz Testing CFGPack includes six [libFuzzer](https://llvm.org/docs/LibFuzzer.html) harnesses that exercise the parsers and decode paths with randomized input. All harnesses are compiled with AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) to catch memory errors and undefined behavior at runtime. ## Why Fuzz the Parsers? CFGPack's schema parsers (`.map`, JSON, and MessagePack binary) accept external input that may be corrupted, truncated, or adversarial — especially when loading configuration from flash storage or receiving it over a network. The `cfgpack_pagein_buf()` path also deserializes untrusted MessagePack data. Fuzzing these entry points provides confidence that malformed input is rejected cleanly rather than triggering buffer overflows, out-of-bounds reads, or undefined behavior. ## Fuzz Targets Six harness files live in `tests/fuzz/`: | Harness | Source | What it exercises | |---------|--------|-------------------| | `fuzz_parse_map` | `fuzz_parse_map.c` | `.map` text schema parser (`cfgpack_parse_schema`) | | `fuzz_parse_json` | `fuzz_parse_json.c` | JSON schema parser (`cfgpack_schema_parse_json`) | | `fuzz_parse_msgpack` | `fuzz_parse_msgpack.c` | MessagePack binary schema parser (`cfgpack_schema_parse_msgpack`) | | `fuzz_parse_msgpack_mutator` | `fuzz_parse_msgpack_mutator.c` | Structure-aware msgpack schema fuzzer using `LLVMFuzzerCustomMutator` — generates valid msgpack schema blobs with targeted corruption to reach deeper parser paths; on successful parse, exercises init and pageout/pagein roundtrip | | `fuzz_pagein` | `fuzz_pagein.c` | `cfgpack_pagein_buf()` and `cfgpack_pagein_remap()` against a fixed schema — wraps fuzzer input with a valid CRC-32C trailer to exercise decoder paths, and also feeds raw input to exercise the CRC rejection path | | `fuzz_msgpack_decode` | `fuzz_msgpack_decode.c` | All low-level msgpack decode functions (`cfgpack_msgpack_decode_uint64`, `_int64`, `_f32`, `_f64`, `_str`, `_map_header`, `_skip_value`) | Each harness implements libFuzzer's `LLVMFuzzerTestOneInput` entry point, allocates a stack-based `cfgpack_ctx_t`, and feeds the fuzzer-provided data directly to the target function. The `fuzz_parse_msgpack_mutator` harness additionally implements `LLVMFuzzerCustomMutator` to generate structurally valid msgpack schema blobs with 16 corruption modes (truncation, bitflips, wrong counts, type mismatches, duplicate names/indices, etc.), enabling coverage of parser code paths that random bytes alone are unlikely to reach. When parsing succeeds, the harness also initializes a runtime context and performs a `cfgpack_pageout`/`cfgpack_pagein_buf` roundtrip, exercising the encode and decode I/O paths with fuzzer-derived schema data. All harnesses are self-contained and do not use the heap. ## Prerequisites ### Linux Any recent Clang (11+) ships libFuzzer. No extra setup needed. ```bash sudo apt install clang # Debian/Ubuntu ``` ### macOS Apple Clang does **not** ship libFuzzer. Install the full LLVM toolchain via Homebrew: ```bash brew install llvm ``` The build system auto-detects Homebrew LLVM when the system `clang` lacks libFuzzer support. You do not need to manually set `CC`. ## Building From the project root: ```bash make fuzz ``` This delegates to the sub-makefile at `tests/fuzz/Makefile`, which: 1. Detects whether `CC` has libFuzzer support. On macOS, if Apple Clang is detected, it automatically switches to Homebrew LLVM. 2. Builds the seed corpus generator (`gen_seeds`) and runs it to populate the corpus directories. 3. Compiles all six fuzz harnesses with `-fsanitize=fuzzer,address,undefined`. Binaries are placed in `build/out/`: ``` build/out/fuzz_parse_map build/out/fuzz_parse_json build/out/fuzz_parse_msgpack build/out/fuzz_parse_msgpack_mutator build/out/fuzz_pagein build/out/fuzz_msgpack_decode build/out/gen_seeds ``` ### Why harnesses compile sources directly Fuzz harnesses compile the library source files directly (`$(LIBSRC)`) rather than linking against `libcfgpack.a`. This is required because AddressSanitizer and UBSan instrument code at compile time — both the harness and the library code must be compiled with `-fsanitize=...` flags for the sanitizers to detect issues in library code. ## Seed Corpus The `gen_seeds.c` program generates valid seed files across six corpus directories: | Directory | Seeds | Description | |-----------|------:|-------------| | `tests/fuzz/corpus_map/` | 1 | A valid `.map` schema file | | `tests/fuzz/corpus_json/` | 3 | Valid JSON schemas (minimal, typical, all types) | | `tests/fuzz/corpus_msgpack/` | 1 | A valid msgpack binary schema | | `tests/fuzz/corpus_msgpack_mutator/` | 4 | Small random byte sequences that parameterize the custom mutator | | `tests/fuzz/corpus_pagein/` | 2 | Valid serialized config blobs (empty + populated) | | `tests/fuzz/corpus_decode/` | 10 | Individual msgpack-encoded values (uint, int, float, string, map, etc.) | Seeds are regenerated automatically every time `make fuzz` runs (the `fuzz` target depends on `gen-seeds`). Starting from valid inputs helps the fuzzer reach deeper code paths faster. ## Running ### Using the runner script The `scripts/run-fuzz.sh` script runs all six targets sequentially with colored output: ```bash scripts/run-fuzz.sh # 60s per target (default) scripts/run-fuzz.sh 300 # 300s per target scripts/run-fuzz.sh 0 # run indefinitely (Ctrl-C to stop) ``` The script sets `-max_len=4096` and `-print_final_stats=1` for each target. Exit code is non-zero if any target crashes. ### Running a single target directly You can run any harness directly with libFuzzer flags: ```bash build/out/fuzz_parse_msgpack tests/fuzz/corpus_msgpack/ \ -max_total_time=120 \ -max_len=4096 \ -print_final_stats=1 ``` Useful libFuzzer flags: | Flag | Description | |------|-------------| | `-max_total_time=N` | Stop after N seconds (0 = indefinite) | | `-max_len=N` | Maximum input size in bytes | | `-jobs=N` | Run N fuzzing jobs in parallel | | `-workers=N` | Number of parallel worker processes | | `-print_final_stats=1` | Print coverage and execution stats at exit | | `-artifact_prefix=crashes/` | Save crash files to a directory | See [libFuzzer documentation](https://llvm.org/docs/LibFuzzer.html) for the full list. ## Investigating Crashes When libFuzzer finds a crash, it writes a reproducer file (e.g., `crash-`) to the current directory (or the path set by `-artifact_prefix`). ### Reproducing a crash Run the harness with the crash file as an argument (not a directory): ```bash build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4 ``` ASan will print a detailed report showing the crash type (heap-buffer-overflow, stack-buffer-overflow, use-after-free, etc.), the exact source location, and a stack trace. ### Debugging with lldb ```bash lldb -- build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4 (lldb) run ``` ASan stops at the exact point of the memory error. Use `bt` for a backtrace. ### Minimizing a crash input libFuzzer can shrink a crash reproducer to its minimal triggering input: ```bash build/out/fuzz_parse_msgpack -minimize_crash=1 -max_total_time=60 crash- ``` ## Architecture ### Sub-makefile design Fuzz build logic lives in `tests/fuzz/Makefile`, invoked by the root makefile via: ```makefile fuzz: @$(MAKE) -C tests/fuzz fuzz ROOT=$(CURDIR) BUILD=$(CURDIR)/$(BUILD) OUT=$(CURDIR)/$(OUT) CC=$(CC) ``` This keeps the default `make` / `make tests` path free of fuzz-related overhead (no Homebrew detection, no LLVM checks). The sub-makefile receives absolute paths for `ROOT`, `BUILD`, and `OUT` so all paths resolve correctly from its working directory (`tests/fuzz/`). On macOS, the sub-makefile uses `override CC` to replace Apple Clang with Homebrew LLVM. The `override` is necessary because the parent passes `CC=clang` on the command line, which takes precedence over regular variable assignments in the sub-makefile.