Binspector

A Binary Format Analysis Tool

Sentries


Binspector can analyze a binary file and report whether it is well-formed, that is, whether it passes analysis. While "true" is a straightforward answer, "false" comes with a host of complications. Specifically, what was it about the file that caused the analysis to fail? Was some invariant violated, did a read go off into the weeds… what? Validation works best when it fails as fast as it can, because the closer one halts to the actual point of failure, the more information can be gathered about it.

Sentries are one way to facilitate failing as fast as possible during file validation. So how do they work?

File formats such as PNG and TIFF contain data wrapped in length-prefixed blocks. Sometimes the format is completely block-based; sometimes only some substructures are. For our purposes, let’s modify our original sample format grammar to be length-prefixed:

struct pascal_t
{
  unsigned 8 big length;
  unsigned 8 big string[length];

  summary str(@string);
}

struct user_name_t
{
  unsigned 16 big length;
  pascal_t        first;
  pascal_t        last;

  summary summaryof(first), " ", summaryof(last);
}

To keep our binary file up to speed with the grammar, we prefix file.bin with two bytes that indicate the length of the block:
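As a rough illustration, here is a minimal Python sketch of how a file matching this grammar could be built. The names "Ada" and "Lovelace" and the helper functions `pascal` and `user_name` are placeholders invented for this example, not the contents of the real file.bin. The sketch also assumes the 16-bit length counts only the bytes that follow it.

```python
import struct

def pascal(text: str) -> bytes:
    """Encode text as a pascal_t: one length byte, then the raw bytes."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("pascal_t length must fit in 8 bits")
    return struct.pack(">B", len(data)) + data

def user_name(first: str, last: str) -> bytes:
    """Encode a user_name_t: 16-bit big-endian block length, then two pascal_t."""
    body = pascal(first) + pascal(last)
    return struct.pack(">H", len(body)) + body

# Placeholder names; the two-byte prefix holds the size of everything after it.
blob = user_name("Ada", "Lovelace")
print(blob.hex(" "))
```

Writing `blob` to disk in binary mode would give a file that could be fed to the grammar above.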

If, in the course of analyzing one of the pascal_ts, a length is larger or smaller than it should be, we won’t find out about it until the parse is completed. Given a malformed binary file:

The analysis result doesn’t give us much to go on:

$ binspector -t format.bfft -i file.bin -s user_name_t
error: EOF reached. Consider using the eof slot.
in file: format.bfft:3
$main$

The key piece of information we need to leverage is main.length. If we know the scope to which that length applies, we could inform Binspector of a boundary that must be met exactly by the time that scope ends. The boundary is specified with the sentry declaration:

struct pascal_t
{
  unsigned 8 big length;
  unsigned 8 big string[length];

  summary str(@string);
}

struct user_name_t
{
  unsigned 16 big length;

  sentry (length)
  {
    pascal_t first;
    pascal_t last;
  }

  summary summaryof(first), " ", summaryof(last);
}

And the Binspector output is more informative:

$ binspector -t format.bfft -i file.bin -s user_name_t
main sentry barrier breach
main sentry barrier breach
error: EOF reached. Consider using the eof slot.
while analyzing: main.length
in file: format.bfft:3
$main$

I’ll be the first to admit the sentry error reporting needs to be cleaned up, but let me break down what Binspector is trying to say. The two key bits of information are main sentry barrier breach and the point the grammar failed, namely format.bfft:3. Binspector was in the process of executing the line found at format.bfft:3, namely, the length of a pascal_t, when the sentry established by main.length was overrun.
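To make the idea concrete, here is a minimal Python sketch of a sentry as a read limit. It illustrates the concept only and is not Binspector's implementation: the `Reader` class, the `SentryBreach` exception, and the sample bytes are all hypothetical, and the malformed input is not the same file as the one analyzed above.

```python
import struct

class SentryBreach(Exception):
    """Raised when a read would cross the active sentry boundary."""

class Reader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.limit = len(data)  # without a sentry, only EOF stops a read

    def read(self, count: int, what: str) -> bytes:
        end = self.pos + count
        if end > self.limit:
            raise SentryBreach(f"breach while analyzing {what} at offset {self.pos}")
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

def read_pascal(r: Reader, name: str) -> bytes:
    (length,) = struct.unpack(">B", r.read(1, f"{name}.length"))
    return r.read(length, f"{name}.string")

def read_user_name(r: Reader):
    (length,) = struct.unpack(">H", r.read(2, "main.length"))
    r.limit = min(r.limit, r.pos + length)  # the sentry: no read may pass this offset
    return read_pascal(r, "main.first"), read_pascal(r, "main.last")

# The block claims 13 bytes, but first.length is corrupted from 3 to 200.
# The trailing padding means a parser without a sentry would not hit EOF here.
bad = b"\x00\x0d\xc8Ada\x08Lovelace" + b"\x00" * 300
try:
    read_user_name(Reader(bad))
    outcome = "ok"
except SentryBreach as exc:
    outcome = str(exc)
print(outcome)
```

The point of the design is that the failure is raised by the very read that crosses the boundary, so the error names the field being analyzed at that moment rather than some later symptom.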

If the length value is malformed and specifies a larger block than actual data:

We get notified of that in turn:

$ binspector -t format.bfft -i file.bin -s user_name_t
WARNING: After  sentry, read position should be 34 but instead is 18.
$main$
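The opposite check, that the scope ends exactly on the boundary, can be sketched the same way. This is again a hypothetical Python illustration rather than Binspector's code. The function name `parse_block` and the sample bytes are assumptions, and the offsets are counted in bytes, so they are not meant to reproduce the numbers in the warning above.

```python
import struct
import warnings

def parse_block(data: bytes):
    """Parse a user_name_t and warn if the block is not consumed exactly."""
    (length,) = struct.unpack_from(">H", data, 0)
    pos = 2
    expected_end = pos + length  # where the sentry says the scope must end
    names = []
    for _ in range(2):  # first, then last
        n = data[pos]  # pascal_t length byte
        names.append(data[pos + 1:pos + 1 + n])
        pos += 1 + n
    if pos != expected_end:
        warnings.warn(
            f"after sentry, read position should be {expected_end} but instead is {pos}"
        )
    return names, pos, expected_end

# The block length claims 29 bytes, but the two pascal_t values occupy only 13.
names, pos, expected_end = parse_block(b"\x00\x1d\x03Ada\x08Lovelace" + b"\x00" * 16)
```

Here the parse itself succeeds, which is why a warning is a better fit than a hard error: the fields are readable, but the declared block size disagrees with what was actually consumed.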

Notice that in both cases, Binspector still drops you into a command-line interface. This gives the user the ability to navigate the analysis up to the point of failure in an attempt to discern where things went wrong.
