Binspector’s format grammar centers around two fundamental data types: the structure and the atom. While the introductory post provided an overview of these two types, in this post I will explain each in more fine-grained detail, along with a couple examples.
Predicated Arrays
It is normal for formats to specify an array of data that is not length-prefixed. Instead, the format usually mandates some kind of marker to signify the end of the array. The canonical example of this is a null-terminated C string. We call these arrays predicated because there is a check- a predicate- that when met denotes the end of the array. In this post we’ll take a look at some of the tools available within Binspector to handle these kinds of arrays.
Sentries
Binspector can analyze a binary file and report to the user if the file is well-formed or not, that is, if the file passes analysis. While true
is a straightforward answer, false
comes with a host of complications. Specifically, what was it about the file that caused the analysis to fail? Was there some invariant violated, a read that went off into the weeds… what? Validation works best when it fails as fast as it can, because the closer one halts to the actual point of failure, the more information can be gathered about it.
Sentries are one way to facilitate failing as fast as possible during file validation. So how do they work?
A Hairbrained Approach to Security Testing
Let’s go back to a structure defined in the introductory post:
1 2 3 4 5 |
|
If you were testing an application that read a pascal_t
, what kind of data would you feed it in an attempt to break it? One strategy might be to throw random data at the program: this is called fuzzing. Because reading string
is dependent upon the value of length
, it would follow that fuzzing length
is more likely to surface vulnerabilities than fuzzing string
. This is a classic buffer overflow exploitation.
Binspector knows a lot about a file being analyzed. As it turns out, the knowledge it collects makes this kind of intelligent fuzzing heuristic pretty straightforward to implement.
Big Trouble in Little Endian
Exif is a body of specifications related to the embedding of metadata inside images. (Interestingly enough, when embedded inside a JPEG it uses a structure based on TIFF, making for a well-formed TIFF within a JPEG.) One of the features, for better or for worse, is that Exif can be stored either big- or little-endian. It is the responsibility of the input code to detect which mode the incoming Exif is in, and to interpret ensuing values correctly.
Binspector: A Binary Format Analysis Tool
Binary formats and files are inescapable. Although optimal for computers to read, sussing them manually requires exacting patience. Every developer has a moment in their career with a hex editor open, staring blankly at screenfuls of 0xDEADBEEF
or UTF-8 encoded multibyte unicode. Binspector1 was born when I found myself scouring JPEGs to make sure their Exif and IPTC/IIM metadata blocks were telling consistent stories. The tool has evolved into something genuinely useful, and I am excited to share it in the hopes others will benefit from it as well.