Fuzz Testing

CFGPack includes six libFuzzer harnesses that exercise the parsers and decode paths with randomized input. All harnesses are compiled with AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) to catch memory errors and undefined behavior at runtime.

Why Fuzz the Parsers?

CFGPack’s schema parsers (.map, JSON, and MessagePack binary) accept external input that may be corrupted, truncated, or adversarial — especially when loading configuration from flash storage or receiving it over a network. The cfgpack_pagein_buf() path also deserializes untrusted MessagePack data. Fuzzing these entry points provides confidence that malformed input is rejected cleanly rather than triggering buffer overflows, out-of-bounds reads, or undefined behavior.

Fuzz Targets

Six harness files live in tests/fuzz/:

Harness

Source

What it exercises

fuzz_parse_map

fuzz_parse_map.c

.map text schema parser (cfgpack_parse_schema)

fuzz_parse_json

fuzz_parse_json.c

JSON schema parser (cfgpack_schema_parse_json)

fuzz_parse_msgpack

fuzz_parse_msgpack.c

MessagePack binary schema parser (cfgpack_schema_parse_msgpack)

fuzz_parse_msgpack_mutator

fuzz_parse_msgpack_mutator.c

Structure-aware msgpack schema fuzzer using LLVMFuzzerCustomMutator — generates valid msgpack schema blobs with targeted corruption to reach deeper parser paths; on successful parse, exercises init and pageout/pagein roundtrip

fuzz_pagein

fuzz_pagein.c

cfgpack_pagein_buf() and cfgpack_pagein_remap() against a fixed schema — wraps fuzzer input with a valid CRC-32C trailer to exercise decoder paths, and also feeds raw input to exercise the CRC rejection path

fuzz_msgpack_decode

fuzz_msgpack_decode.c

All low-level msgpack decode functions (cfgpack_msgpack_decode_uint64, _int64, _f32, _f64, _str, _map_header, _skip_value)

Each harness implements libFuzzer’s LLVMFuzzerTestOneInput entry point, allocates a stack-based cfgpack_ctx_t, and feeds the fuzzer-provided data directly to the target function. The fuzz_parse_msgpack_mutator harness additionally implements LLVMFuzzerCustomMutator to generate structurally valid msgpack schema blobs with 16 corruption modes (truncation, bitflips, wrong counts, type mismatches, duplicate names/indices, etc.), enabling coverage of parser code paths that random bytes alone are unlikely to reach. When parsing succeeds, the harness also initializes a runtime context and performs a cfgpack_pageout/cfgpack_pagein_buf roundtrip, exercising the encode and decode I/O paths with fuzzer-derived schema data. All harnesses are self-contained and do not use the heap.

Prerequisites

Linux

Any recent Clang (11+) ships libFuzzer. No extra setup needed.

sudo apt install clang   # Debian/Ubuntu

macOS

Apple Clang does not ship libFuzzer. Install the full LLVM toolchain via Homebrew:

brew install llvm

The build system auto-detects Homebrew LLVM when the system clang lacks libFuzzer support. You do not need to manually set CC.

Building

From the project root:

make fuzz

This delegates to the sub-makefile at tests/fuzz/Makefile, which:

  1. Detects whether CC has libFuzzer support. On macOS, if Apple Clang is detected, it automatically switches to Homebrew LLVM.

  2. Builds the seed corpus generator (gen_seeds) and runs it to populate the corpus directories.

  3. Compiles all six fuzz harnesses with -fsanitize=fuzzer,address,undefined.

Binaries are placed in build/out/:

build/out/fuzz_parse_map
build/out/fuzz_parse_json
build/out/fuzz_parse_msgpack
build/out/fuzz_parse_msgpack_mutator
build/out/fuzz_pagein
build/out/fuzz_msgpack_decode
build/out/gen_seeds

Why harnesses compile sources directly

Fuzz harnesses compile the library source files directly ($(LIBSRC)) rather than linking against libcfgpack.a. This is required because AddressSanitizer and UBSan instrument code at compile time — both the harness and the library code must be compiled with -fsanitize=... flags for the sanitizers to detect issues in library code.

Seed Corpus

The gen_seeds.c program generates valid seed files across six corpus directories:

Directory

Seeds

Description

tests/fuzz/corpus_map/

1

A valid .map schema file

tests/fuzz/corpus_json/

3

Valid JSON schemas (minimal, typical, all types)

tests/fuzz/corpus_msgpack/

1

A valid msgpack binary schema

tests/fuzz/corpus_msgpack_mutator/

4

Small random byte sequences that parameterize the custom mutator

tests/fuzz/corpus_pagein/

2

Valid serialized config blobs (empty + populated)

tests/fuzz/corpus_decode/

10

Individual msgpack-encoded values (uint, int, float, string, map, etc.)

Seeds are regenerated automatically every time make fuzz runs (the fuzz target depends on gen-seeds). Starting from valid inputs helps the fuzzer reach deeper code paths faster.

Running

Using the runner script

The scripts/run-fuzz.sh script runs all six targets sequentially with colored output:

scripts/run-fuzz.sh          # 60s per target (default)
scripts/run-fuzz.sh 300      # 300s per target
scripts/run-fuzz.sh 0        # run indefinitely (Ctrl-C to stop)

The script sets -max_len=4096 and -print_final_stats=1 for each target. Exit code is non-zero if any target crashes.

Running a single target directly

You can run any harness directly with libFuzzer flags:

build/out/fuzz_parse_msgpack tests/fuzz/corpus_msgpack/ \
    -max_total_time=120 \
    -max_len=4096 \
    -print_final_stats=1

Useful libFuzzer flags:

Flag

Description

-max_total_time=N

Stop after N seconds (0 = indefinite)

-max_len=N

Maximum input size in bytes

-jobs=N

Run N fuzzing jobs in parallel

-workers=N

Number of parallel worker processes

-print_final_stats=1

Print coverage and execution stats at exit

-artifact_prefix=crashes/

Save crash files to a directory

See libFuzzer documentation for the full list.

Investigating Crashes

When libFuzzer finds a crash, it writes a reproducer file (e.g., crash-<hash>) to the current directory (or the path set by -artifact_prefix).

Reproducing a crash

Run the harness with the crash file as an argument (not a directory):

build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4

ASan will print a detailed report showing the crash type (heap-buffer-overflow, stack-buffer-overflow, use-after-free, etc.), the exact source location, and a stack trace.

Debugging with lldb

lldb -- build/out/fuzz_parse_msgpack crash-435c5524aa57e3619dd857148000af58d295e4f4
(lldb) run

ASan stops at the exact point of the memory error. Use bt for a backtrace.

Minimizing a crash input

libFuzzer can shrink a crash reproducer to its minimal triggering input:

build/out/fuzz_parse_msgpack -minimize_crash=1 -max_total_time=60 crash-<hash>

Architecture

Sub-makefile design

Fuzz build logic lives in tests/fuzz/Makefile, invoked by the root makefile via:

fuzz:
	@$(MAKE) -C tests/fuzz fuzz ROOT=$(CURDIR) BUILD=$(CURDIR)/$(BUILD) OUT=$(CURDIR)/$(OUT) CC=$(CC)

This keeps the default make / make tests path free of fuzz-related overhead (no Homebrew detection, no LLVM checks). The sub-makefile receives absolute paths for ROOT, BUILD, and OUT so all paths resolve correctly from its working directory (tests/fuzz/).

On macOS, the sub-makefile uses override CC to replace Apple Clang with Homebrew LLVM. The override is necessary because the parent passes CC=clang on the command line, which takes precedence over regular variable assignments in the sub-makefile.