AFL experiments, or please eat your brötli

Post Syndicated from Unknown original https://lcamtuf.blogspot.com/2017/04/afl-experiments-or-please-eat-your.html

When messing around with AFL, you sometimes stumble upon something unexpected or amusing. Say,
having the fuzzer spontaneously synthesize JPEG files,
come up with non-trivial XML syntax,
or discover SQL semantics.

It is also fun to challenge yourself to employ fuzzers in non-conventional ways. Two canonical examples are having your fuzzing target call abort() whenever two libraries that are supposed to implement the same algorithm produce different outputs when given identical input data; or when a library produces different outputs when asked to encode or decode the same data several times in a row.

Such tricks may sound fanciful, but they actually find interesting bugs. In one case, AFL-based equivalence fuzzing revealed a
bunch of fairly rudimentary flaws in common bignum libraries,
with some theoretical implications for crypto apps. Another time, output stability checks revealed long-lived issues in
IJG jpeg and other widely-used image processing libraries, leaking
data across web origins.

In one of my recent experiments, I decided to fuzz
brotli, an innovative compression library used in Chrome. But since it’s been
already fuzzed for many CPU-years, I wanted to do it with a twist:
stress-test the compression routines, rather than the usually targeted decompression side. The latter is a far more fruitful
target for security research, because decompression normally involves dealing with well-formed inputs, whereas compression code is meant to
accept arbitrary data and not think about it too hard. That said, the low likelihood of flaws also means that the compression bits are a relatively unexplored surface that may be worth
poking with a stick every now and then.

In this case, the library held up admirably – spare for a handful of computationally intensive plaintext inputs
(that are now easy to spot due to the recent improvements to AFL).
But the output corpus synthesized by AFL, after being seeded just with a single file containing just “0”, featured quite a few peculiar finds:

  • Strings that looked like viable bits of HTML or XML:
    <META HTTP-AAA IDEAAAA,
    DATA="IIA DATA="IIA DATA="IIADATA="IIA,
    </TD>.

  • Non-trivial numerical constants:
    1000,1000,0000000e+000000,
    0,000 0,000 0,0000 0x600,
    0000,$000: 0000,$000:00000000000000.

  • Nonsensical but undeniably English sentences:
    them with them m with them with themselves,
    in the fix the in the pin th in the tin,
    amassize the the in the in the inhe@massive in,
    he the themes where there the where there,
    size at size at the tie.

  • Bogus but semi-legible URLs:
    CcCdc.com/.com/m/ /00.com/.com/m/ /00(0(000000CcCdc.com/.com/.com

  • Snippets of Lisp code:
    )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))).

The results are quite unexpected, given that they are just a product of randomly mutating a single-byte input file and observing the code coverage in a simple compression tool. The explanation is that brotli, in addition to more familiar binary coding methods, uses a static dictionary constructed by analyzing common types of web content. Somehow, by observing the behavior of the program, AFL was able to incrementally reconstruct quite a few of these hardcoded keywords – and then put them together in various semi-interesting ways. Not bad.