fhtr: December 2009

Background

A few days ago, I was fiddling around with loading 3D models for use with WebGL (demos here). I had an OBJ parser and some test models lying around from an earlier project, and the parser was only about two hundred lines, so I ported it over to JavaScript. But the OBJ files were pretty huge: a 40-thousand-quad model without normals or texcoords was 2.4 megs. Not really something you want to send over the internet.

The OBJ file format stores coordinates and face indexes as text. While it makes the model files human-readable and easy to parse, it also increases the file size. Uncompressed, a single coord takes 9 bytes, plus an optional sign byte.

My first instinct was to just use gzip compression on the server side. And it did manage to bring the file size down to 880k for the above-mentioned 2.4 meg OBJ. Still, maybe there would be more gains to be had by storing the coords and face indexes as binary data.

I went for [0,1]-normalized 16-bit fixed-point numbers for the coordinates and 16-bit unsigned ints for the face indexes. I also had to store the vertex and face counts, for which I used 32-bit unsigned ints (which, come to think of it, is a bit silly, as using 16-bit face indexes limits the vert count to 65536). Finally, I added two three-float vectors for coord scaling and translation, in order to inflate the normalized coords back to something close to their original values.
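As a rough sketch of that quantization step: the function names below, and deriving the scale and translation from each axis's min and max, are assumptions for illustration rather than the exporter's actual code.

```javascript
// Quantize one axis of coordinates to 16-bit fixed point in [0,1].
// Assumption: translate = min and scale = (max - min) for that axis.
function quantizeAxis(values) {
  var min = Infinity, max = -Infinity, i;
  for (i = 0; i < values.length; i++) {
    if (values[i] < min) min = values[i];
    if (values[i] > max) max = values[i];
  }
  var scale = (max - min) || 1; // avoid dividing by zero on a flat axis
  var quantized = [];
  for (i = 0; i < values.length; i++) {
    var normalized = (values[i] - min) / scale;     // 0..1
    // Rounds to nearest; a plain uint16 cast would truncate instead.
    quantized.push(Math.round(normalized * 65535)); // 0..65535
  }
  return { quantized: quantized, scale: scale, translate: min };
}

// Inflate a quantized value back to (approximately) its original value.
function restoreCoord(q, scale, translate) {
  return (q / 65535) * scale + translate;
}
```

With rounding, the restored value is off by at most half a quantization step, that is scale / 131070 on that axis.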

Result: 560k for the uncompressed binary file, 470k gzipped. Nice. Though it could be brought down further by using triangle strips instead of separate quads and tris. And there are fancy 3D mesh compression papers that talk about compressing down to less than 2 bits per triangle (for the example model: 40k quads -> 80k tris, at 2 bits each = 160 kbit = 20 kB). But enough of that for now. Next up, how to parse the binary data.
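The back-of-the-envelope figure works out as follows. The function name is made up for this example, and the estimate only counts the per-triangle bits quoted above.

```javascript
// Size estimate for a quad mesh at a given bit budget per triangle.
function estimateCompressedBytes(quadCount, bitsPerTriangle) {
  var triangleCount = quadCount * 2;               // each quad splits into two tris
  var totalBits = triangleCount * bitsPerTriangle; // 80000 * 2 = 160000 bits
  return totalBits / 8;                            // bits -> bytes
}

estimateCompressedBytes(40000, 2); // 20000 bytes, i.e. about 20 kB
```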

Parsing binary data

To parse a binary string with JavaScript, you use charCodeAt and the usual bit-munging operators (&, |, <<, >>). First, you need the byte value at some point in the string. As charCodeAt operates on characters and not bytes, you need to tell the browser to hand over the data as an unparsed string by calling overrideMimeType("text/plain; charset=x-user-defined") on your XMLHttpRequest before sending it. Then mask away the high byte of each char code to get the low byte value. The following works at least on Firefox and WebKit; I don't know about IE and Opera.


  var byteValue = str.charCodeAt(index) & 0xFF;
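Put together, the loading step could look something like the sketch below. loadBinaryString and binaryStringToBytes are hypothetical helper names, and the error handling is deliberately minimal.

```javascript
// Fetch a URL as a "binary string": one character per byte of the response.
function loadBinaryString(url, onLoad, onError) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  // Must be called before send(); keeps the browser from decoding the bytes as text.
  xhr.overrideMimeType("text/plain; charset=x-user-defined");
  xhr.onreadystatechange = function() {
    if (xhr.readyState !== 4) return; // 4 = request finished
    if (xhr.status === 200) {
      onLoad(xhr.responseText);
    } else if (onError) {
      onError(xhr.status);
    }
  };
  xhr.send(null);
}

// Convert the binary string to an array of byte values (0..255).
function binaryStringToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i) & 0xFF); // drop the high byte
  }
  return bytes;
}
```

The mask matters because the char codes for bytes above 0x7F may come back with a non-zero high byte, so only the low eight bits can be trusted.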

Loading unsigned ints is simple enough. Just load the bytes for the int and shift them to their places:


  // read big-endian (network byte order) unsigned 32-bit int from data, at offset
  var readUInt32 = function(data, offset) {
    // << yields a signed 32-bit result, so a top byte >= 0x80 would come out
    // negative; >>> 0 reinterprets the sum as unsigned
    return (((data.charCodeAt(offset) & 0xFF) << 24) +
            ((data.charCodeAt(offset+1) & 0xFF) << 16) +
            ((data.charCodeAt(offset+2) & 0xFF) << 8) +
            (data.charCodeAt(offset+3) & 0xFF)) >>> 0;
  };
  // read big-endian (network byte order) unsigned 16-bit int from data, at offset
  var readUInt16 = function(data, offset) {
    return ((data.charCodeAt(offset) & 0xFF) << 8) +
           (data.charCodeAt(offset+1) & 0xFF);
  };

Reading in normalized fixed-point numbers as floats isn't much harder either, as they're generated with uint16(normalized_float * 65535):


  // read big-endian 16-bit normalized fixed-point number as a float in [0,1]
  var readNormalizedUFixedPoint16 = function(data, offset) {
    return readUInt16(data, offset) / 65535.0;
  };
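To get from the file back to usable vertex positions, the normalized values still need the scale and translation applied. A minimal sketch, assuming the vertices are stored interleaved (x, y, z, x, y, z, ...) and that the scale and translate vectors have already been read into plain 3-element arrays; readVertices is a made-up name:

```javascript
// Read `count` xyz vertices stored as big-endian 16-bit normalized fixed point,
// then undo the normalization with per-axis scale and translate vectors.
function readVertices(data, offset, count, scale, translate) {
  var vertices = [];
  for (var i = 0; i < count * 3; i++) {
    var pos = offset + i * 2; // 2 bytes per coordinate
    var raw = ((data.charCodeAt(pos) & 0xFF) << 8) +
              (data.charCodeAt(pos + 1) & 0xFF);
    var axis = i % 3;         // 0 = x, 1 = y, 2 = z
    vertices.push((raw / 65535.0) * scale[axis] + translate[axis]);
  }
  return vertices; // flat array: [x0, y0, z0, x1, y1, z1, ...]
}
```

A flat array like this can be handed to WebGL as vertex buffer data without further reshuffling.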

Floats, on the other hand, are a bit more involved. Writing the float parser was educational: now I have a better understanding of the buggers! The following code doesn't handle the special numbers (denormals, infinities, NaN) and likely loses precision, is slow, etc., so if you have a better way of parsing binary floats, please share.


  // read big-endian (network byte order) 32-bit float
  var readFloat32 = function(data, offset) {
    var b1 = data.charCodeAt(offset) & 0xFF,
        b2 = data.charCodeAt(offset+1) & 0xFF,
        b3 = data.charCodeAt(offset+2) & 0xFF,
        b4 = data.charCodeAt(offset+3) & 0xFF;
    var sign = 1 - (2*(b1 >> 7));                     // sign = bit 0 (most significant)
    var exp = (((b1 << 1) & 0xff) | (b2 >> 7)) - 127; // exponent = bits 1..8, minus the bias
    var sig = ((b2 & 0x7f) << 16) | (b3 << 8) | b4;   // significand = bits 9..31
    // all-zero exponent and significand bits encode zero
    if (sig === 0 && exp === -127)
      return 0.0;
    // normal numbers: implicit leading 1 followed by 23 fraction bits
    return sign * (1 + sig * Math.pow(2, -23)) * Math.pow(2, exp);
  };
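In environments that provide typed arrays, DataView can do the IEEE 754 decoding natively, special values included. A minimal sketch, assuming the data still arrives as a binary string (readFloat32DataView is a name made up for this example):

```javascript
// Decode a big-endian 32-bit float by copying its four bytes into an
// ArrayBuffer and letting DataView interpret them.
function readFloat32DataView(data, offset) {
  var view = new DataView(new ArrayBuffer(4));
  for (var i = 0; i < 4; i++) {
    view.setUint8(i, data.charCodeAt(offset + i) & 0xFF);
  }
  return view.getFloat32(0, false); // false = big-endian
}

readFloat32DataView("\x3F\x80\x00\x00", 0); // 1
readFloat32DataView("\xC0\x20\x00\x00", 0); // -2.5
```

If the whole file is fetched as an ArrayBuffer in the first place, a single DataView over the response would avoid the string detour entirely.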

And there you have it. I haven't gotten around to parsing signed integers, though for two's-complement values, subtracting 2^bits from the unsigned value whenever it is 1 << (bits-1) or larger should do the trick.
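For two's-complement data, a sketch of the conversion could look like the following. Values with the top bit set have 2^bits subtracted and the rest pass through unchanged. toSigned and readInt16 are names made up for this example.

```javascript
// Convert an unsigned value of the given bit width to two's-complement signed.
// Math.pow is used instead of 1 << bits because JavaScript shift counts wrap
// at 32, so 1 << 32 evaluates to 1.
function toSigned(unsignedValue, bits) {
  var signBit = Math.pow(2, bits - 1);
  return unsignedValue >= signBit
    ? unsignedValue - Math.pow(2, bits)
    : unsignedValue;
}

// read big-endian (network byte order) signed 16-bit int from data, at offset
function readInt16(data, offset) {
  var u = ((data.charCodeAt(offset) & 0xFF) << 8) +
          (data.charCodeAt(offset + 1) & 0xFF);
  return toSigned(u, 16);
}

readInt16("\xFF\xFF", 0); // -1
readInt16("\x7F\xFF", 0); // 32767
```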

2009-12-17

3D models and parsing binary data with JavaScript
