Automatically inferring file syntax with afl-analyze

Post Syndicated from Unknown original https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html

The nice thing about the control flow instrumentation used by American Fuzzy Lop is that it allows you to do much more than just, well, fuzzing stuff. For example, the suite has long shipped with a standalone tool called afl-tmin, capable of automatically shrinking test cases while still making sure that they exercise the same functionality in the targeted binary (or that they trigger the same crash). Another similar tool, afl-cmin, employed a similar trick to eliminate redundant files in any large testing corpora.

The latest release of AFL features another nifty new addition along these lines: afl-analyze. The tool takes an input file, sequentially flips bytes in this data stream, and then observes the behavior of the targeted binary after every flip. From this information, it can infer several things:

  • Classify some content as no-op blocks that do not elicit any changes to control flow (say, comments, pixel data, etc).
  • Checksums, magic values, and other short, atomically compared tokens where any bit flip causes the same change to program execution.
  • Longer blobs exhibiting this property – almost certainly corresponding to checksummed or encrypted data.
  • “Pure” data sections, where analyzer-injected changes consistently elicit differing changes to control flow.

This gives us some remarkable and quick insights into the syntax of the file and the behavior of the underlying parser. It may sound too good to be true, but actually seems to work in practice. For a quick demo, let’s see what afl-analyze has to say about running cut -d ‘ ‘ -f1 on a text file:

We see that cut really only cares about spaces and newlines. Interestingly, it also appears that the tool always tokenizes the entire line, even if it’s just asked to return the first token. Neat, right?

Of course, the value of afl-analyze is greater for incomprehensible binary formats than for simple text utilities; perhaps even more so when dealing with black-box parsers (which can be analyzed thanks to the runtime QEMU instrumentation supported in AFL). To try out the tool’s ability to deal with binaries, let’s check out libpng:

This looks pretty damn good: we have two four-byte signatures, followed by chunk length, four-byte chunk name, chunk length, some image metadata, and then a comment section. Neat, right? All in a matter of seconds: no configuration needed and no knobs to turn.

Of course, the tool shipped just moments ago and is still very much experimental; expect some kinks. Field testing and feedback welcome!