Fuzz Testing

CFGPack includes six libFuzzer harnesses that exercise the parsers and decode paths with randomized input. All harnesses are compiled with AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) to catch memory errors and undefined behavior at runtime.

Why Fuzz the Parsers?

CFGPack’s schema parsers (.map, JSON, and MessagePack binary) accept external input that may be corrupted, truncated, or adversarial — especially when loading configuration from flash storage or receiving it over a network. The cfgpack_pagein_buf() path also deserializes untrusted MessagePack data. Fuzzing these entry points provides confidence that malformed input is rejected cleanly rather than triggering buffer overflows, out-of-bounds reads, or undefined behavior.

Fuzz Targets

Six harness files live in tests/fuzz/:

Harness	Source	What it exercises
`fuzz_parse_map`	`fuzz_parse_map.c`	`.map` text schema parser (`cfgpack_parse_schema`)
`fuzz_parse_json`	`fuzz_parse_json.c`	JSON schema parser (`cfgpack_schema_parse_json`)
`fuzz_parse_msgpack`	`fuzz_parse_msgpack.c`	MessagePack binary schema parser (`cfgpack_schema_parse_msgpack`)
`fuzz_parse_msgpack_mutator`	`fuzz_parse_msgpack_mutator.c`	Structure-aware msgpack schema fuzzer using `LLVMFuzzerCustomMutator` — generates valid msgpack schema blobs with targeted corruption to reach deeper parser paths; on successful parse, exercises init and pageout/pagein roundtrip
`fuzz_pagein`	`fuzz_pagein.c`	`cfgpack_pagein_buf()` and `cfgpack_pagein_remap()` against a fixed schema — wraps fuzzer input with a valid CRC-32C trailer to exercise decoder paths, and also feeds raw input to exercise the CRC rejection path
`fuzz_msgpack_decode`	`fuzz_msgpack_decode.c`	All low-level msgpack decode functions (`cfgpack_msgpack_decode_uint64`, `_int64`, `_f32`, `_f64`, `_str`, `_map_header`, `_skip_value`)
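The `fuzz_pagein` row mentions wrapping fuzzer input with a valid CRC-32C trailer so that it gets past the checksum gate and reaches the decoder. The sketch below illustrates that idea only; it is not CFGPack's code. The helper names `wrap_with_crc` and `crc_trailer_ok` are hypothetical, and the 4-byte little-endian trailer layout is an assumption, since the trailer format is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: copy payload into out and append its CRC-32C.
 * ASSUMPTION: 4-byte little-endian trailer; CFGPack's real layout may differ.
 * Returns the total length, or 0 if out is too small. */
static size_t wrap_with_crc(const uint8_t *payload, size_t len,
                            uint8_t *out, size_t out_cap) {
    if (out_cap < 4 || len > out_cap - 4) return 0;
    if (len > 0) memcpy(out, payload, len);
    uint32_t crc = crc32c(out, len);
    for (int i = 0; i < 4; i++) out[len + i] = (uint8_t)(crc >> (8 * i));
    return len + 4;
}

/* Hypothetical check mirroring what a decoder might do before parsing. */
static int crc_trailer_ok(const uint8_t *buf, size_t len) {
    if (len < 4) return 0;
    uint32_t stored = 0;
    for (int i = 0; i < 4; i++) stored |= (uint32_t)buf[len - 4 + i] << (8 * i);
    return crc32c(buf, len - 4) == stored;
}
```

A harness built this way can call the target twice per input: once with the wrapped buffer, which passes the checksum and exercises the decoder, and once with the raw bytes, which will almost always fail the checksum and exercise the rejection path.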

Each harness implements libFuzzer’s LLVMFuzzerTestOneInput entry point, allocates a stack-based cfgpack_ctx_t, and feeds the fuzzer-provided data directly to the target function. The fuzz_parse_msgpack_mutator harness additionally implements LLVMFuzzerCustomMutator to generate structurally valid msgpack schema blobs with 16 corruption modes (truncation, bitflips, wrong counts, type mismatches, duplicate names/indices, etc.), enabling coverage of parser code paths that random bytes alone are unlikely to reach. When parsing succeeds, the harness also initializes a runtime context and performs a cfgpack_pageout/cfgpack_pagein_buf roundtrip, exercising the encode and decode I/O paths with fuzzer-derived schema data. All harnesses are self-contained and do not use the heap.
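As a rough illustration of the two entry points described above, the following sketch pairs `LLVMFuzzerTestOneInput` with a small `LLVMFuzzerCustomMutator`. It is a toy, not the CFGPack harness: `toy_parse` is a hypothetical stand-in for a real parser, the blob format (a msgpack fixmap of positive fixints) is chosen only for brevity, and the sketch has four corruption modes driven by the seed rather than the 16 modes of the real mutator.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in target (hypothetical, NOT a CFGPack function): accepts a msgpack
 * fixmap header (0x80..0x8f) followed by exactly that many key/value pairs
 * of positive fixints. Returns 0 on success, -1 on rejection. */
static int toy_parse(const uint8_t *data, size_t size) {
    if (size < 1 || (data[0] & 0xf0) != 0x80) return -1;
    size_t count = data[0] & 0x0f;
    if (size != 1 + 2 * count) return -1;
    for (size_t i = 1; i < size; i++) {
        if (data[i] > 0x7f) return -1; /* not a positive fixint */
    }
    return 0;
}

/* libFuzzer entry point: a clean rejection is fine; sanitizers flag real bugs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)toy_parse(data, size);
    return 0;
}

/* Custom mutator: build a valid blob from the seed, then corrupt it in one
 * of four seed-selected ways. Returns the new size (never above max_size). */
size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size, size_t max_size,
                               unsigned int seed) {
    (void)size; /* this sketch regenerates the input from scratch */
    if (max_size < 1) return 0;

    size_t count = (seed >> 4) % 8;        /* number of key/value pairs */
    size_t max_pairs = (max_size - 1) / 2; /* what fits in the buffer */
    if (count > max_pairs) count = max_pairs;

    data[0] = (uint8_t)(0x80 | count);
    for (size_t i = 0; i < count; i++) {
        data[1 + 2 * i] = (uint8_t)i;                   /* key */
        data[2 + 2 * i] = (uint8_t)((seed + i) & 0x7f); /* value */
    }
    size_t len = 1 + 2 * count;

    switch (seed % 4) {
    case 0: /* leave the blob valid */
        break;
    case 1: /* truncation */
        if (len > 1) len--;
        break;
    case 2: /* single bitflip at a seed-chosen offset */
        data[(seed >> 8) % len] ^= (uint8_t)(1u << ((seed >> 2) % 8));
        break;
    case 3: /* header count disagrees with the payload */
        data[0] = (uint8_t)(0x80 | ((count + 1) & 0x0f));
        break;
    }
    return len;
}
```

Compiled with `-fsanitize=fuzzer,address,undefined`, libFuzzer supplies `main` and calls both functions. Because most generated blobs are valid or nearly valid, the target gets past its early header checks far more often than it would with random bytes.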

Prerequisites

Linux

Any recent Clang (11+) ships libFuzzer. No extra setup needed.

sudo apt install clang   # Debian/Ubuntu

macOS

Apple Clang does not ship libFuzzer. Install the full LLVM toolchain via Homebrew:

brew install llvm

The build system auto-detects Homebrew LLVM when the system clang lacks libFuzzer support. You do not need to manually set CC.

Building

From the project root:

make fuzz

This delegates to the sub-makefile at tests/fuzz/Makefile, which:

1. Detects whether CC has libFuzzer support. On macOS, if Apple Clang is detected, it automatically switches to Homebrew LLVM.
2. Builds the seed corpus generator (gen_seeds) and runs it to populate the corpus directories.
3. Compiles all six fuzz harnesses with -fsanitize=fuzzer,address,undefined.

Binaries are placed in build/out/:

build/out/fuzz_parse_map
build/out/fuzz_parse_json
build/out/fuzz_parse_msgpack
build/out/fuzz_parse_msgpack_mutator
build/out/fuzz_pagein
build/out/fuzz_msgpack_decode
build/out/gen_seeds

Why harnesses compile sources directly

Fuzz harnesses compile the library source files directly ($(LIBSRC)) rather than linking against libcfgpack.a. This is required because AddressSanitizer and UBSan insert their checks at compile time: only code built with -fsanitize=... is instrumented, so both the harness and the library code must be compiled with those flags for the sanitizers to detect issues inside the library.

Seed Corpus

The gen_seeds.c program generates valid seed files across six corpus directories:

Directory	Seeds	Description
`tests/fuzz/corpus_map/`	1	A valid `.map` schema file
`tests/fuzz/corpus_json/`	3	Valid JSON schemas (minimal, typical, all types)
`tests/fuzz/corpus_msgpack/`	1	A valid msgpack binary schema
`tests/fuzz/corpus_msgpack_mutator/`	4	Small random byte sequences that parameterize the custom mutator
`tests/fuzz/corpus_pagein/`	2	Valid serialized config blobs (empty + populated)
`tests/fuzz/corpus_decode/`	10	Individual msgpack-encoded values (uint, int, float, string, map, etc.)

Seeds are regenerated automatically every time make fuzz runs (the fuzz target depends on gen-seeds). Starting from valid inputs helps the fuzzer reach deeper code paths faster.
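To give a sense of what a seed generator for the decode corpus might look like, here is a minimal sketch that writes a handful of msgpack-encoded values to files. It is not the actual gen_seeds.c: the function names and file names are hypothetical, and the sketch assumes the target directory already exists.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write one seed file. Returns 0 on success, -1 on any I/O error. */
static int write_seed(const char *dir, const char *name,
                      const uint8_t *data, size_t len) {
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return -1; /* path too long */

    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_rc = fclose(f);
    return (written == len && close_rc == 0) ? 0 : -1;
}

/* Hypothetical generator: emits a few msgpack-encoded values into dir,
 * which must already exist. File names are illustrative only. */
static int gen_decode_seeds(const char *dir) {
    static const uint8_t u8[]  = { 0xcc, 0xff };                   /* uint8 255     */
    static const uint8_t i8[]  = { 0xd0, 0x80 };                   /* int8 -128     */
    static const uint8_t f32[] = { 0xca, 0x3f, 0x80, 0x00, 0x00 }; /* float32 1.0   */
    static const uint8_t str[] = { 0xa2, 'h', 'i' };               /* fixstr "hi"   */
    static const uint8_t map[] = { 0x81, 0x01, 0x02 };             /* fixmap {1: 2} */

    if (write_seed(dir, "seed_uint8.bin", u8, sizeof u8) != 0) return -1;
    if (write_seed(dir, "seed_int8.bin", i8, sizeof i8) != 0) return -1;
    if (write_seed(dir, "seed_f32.bin", f32, sizeof f32) != 0) return -1;
    if (write_seed(dir, "seed_str.bin", str, sizeof str) != 0) return -1;
    if (write_seed(dir, "seed_map.bin", map, sizeof map) != 0) return -1;
    return 0;
}
```

Each file holds exactly one well-formed value, so the fuzzer starts from inputs that every decode function accepts and mutates outward from there.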

Running

Using the runner script

The scripts/run-fuzz.sh script runs all six targets sequentially with colored output:

scripts/run-fuzz.sh          # 60s per target (default)
scripts/run-fuzz.sh 300      # 300s per target
scripts/run-fuzz.sh 0        # run indefinitely (Ctrl-C to stop)

The script passes -max_len=4096 and -print_final_stats=1 to each target and exits with a non-zero status if any target crashes.

Running a single target directly

You can run any harness directly with libFuzzer flags:

build/out/fuzz_parse_msgpack tests/fuzz/corpus_msgpack/ \
    -max_total_time=120 \
    -max_len=4096 \
    -print_final_stats=1

Useful libFuzzer flags:

Flag	Description
`-max_total_time=N`	Stop after N seconds (0 = indefinite)
`-max_len=N`	Maximum input size in bytes
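`-jobs=N`	Run N fuzzing jobs to completion, each in a separate process with output redirected to `fuzz-<JOB>.log`
`-workers=N`	Number of worker processes that run those jobs concurrently (default: min(jobs, CPU cores / 2))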
`-print_final_stats=1`	Print coverage and execution stats at exit
`-artifact_prefix=crashes/`	Write crash reproducers and other artifacts under this prefix (here, the `crashes/` directory, which must already exist)

See the libFuzzer documentation for the full list.

Investigating Crashes

When libFuzzer finds a crash, it writes a reproducer file (e.g., crash-<hash>) to the current directory (or the path set by -artifact_prefix).

Reproducing a crash

Run the harness with the crash file as an argument (not a directory):

build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4

ASan will print a detailed report showing the crash type (heap-buffer-overflow, stack-buffer-overflow, use-after-free, etc.), the exact source location, and a stack trace.

Debugging with lldb

lldb -- build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4
(lldb) run

ASan stops at the exact point of the memory error. Use bt for a backtrace.

Minimizing a crash input

libFuzzer can shrink a crash reproducer to its minimal triggering input:

build/out/fuzz_parse_msgpack -minimize_crash=1 -max_total_time=60 crash-<hash>

Architecture

Sub-makefile design

Fuzz build logic lives in tests/fuzz/Makefile, invoked by the root makefile via:

fuzz:
	@$(MAKE) -C tests/fuzz fuzz ROOT=$(CURDIR) BUILD=$(CURDIR)/$(BUILD) OUT=$(CURDIR)/$(OUT) CC=$(CC)

This keeps the default make / make tests path free of fuzz-related overhead (no Homebrew detection, no LLVM checks). The sub-makefile receives absolute paths for ROOT, BUILD, and OUT so all paths resolve correctly from its working directory (tests/fuzz/).

On macOS, the sub-makefile uses override CC to replace Apple Clang with Homebrew LLVM. The override directive is necessary because the parent passes CC=$(CC) on the command line (for example, CC=clang), and command-line variables take precedence over ordinary assignments in the sub-makefile.