fhtr: January 2012

2012-01-29

Shooting high ISO in broad daylight

DSLRs these days get usable results even at super-high ISO (the sensor's sensitivity to light: the higher it is, the less light you need). But, um, what's the use of ISO 25600 if most of your shooting happens in bright daylight (or in anything other than pitch-black darkness)? Let's think!

What you get from cranking up ISO is the ability to use faster shutter speeds and smaller apertures. That means photos with less motion blur and more of the image in focus, so you can shoot while moving and skip focusing. This is starting to sound useful: shoot while walking, without having to stop.
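Here is a small JavaScript sketch that puts numbers on that trade-off. It uses the standard exposure value formula, normalised to ISO 100, and the function names are made up for illustration. Every doubling of ISO buys one stop. The six stops between ISO 100 and 6400 are exactly what it takes to go from f/4 at 1/250 s (an arbitrary example setting) to f/16 at 1/1000 s at the same brightness:

```javascript
// Exposure value normalised to ISO 100:
//   EV100 = log2(N^2 / t) - log2(ISO / 100)
// where N is the f-number and t the shutter time in seconds.
// Two settings with the same EV100 produce the same image brightness.
function ev100(fNumber, shutterSeconds, iso) {
  return Math.log2((fNumber * fNumber) / shutterSeconds) - Math.log2(iso / 100);
}

// Each doubling of ISO is one stop you can spend on shutter speed or aperture.
function stopsGained(fromIso, toIso) {
  return Math.log2(toIso / fromIso);
}

// Example settings (illustrative, not from the post):
const slow = ev100(4, 1 / 250, 100);    // f/4, 1/250 s, ISO 100
const fast = ev100(16, 1 / 1000, 6400); // f/16, 1/1000 s, ISO 6400
console.log(stopsGained(100, 6400).toFixed(1)); // 6.0
console.log(slow.toFixed(2), fast.toFixed(2));  // 11.97 11.97
```

Of those six stops, four go to the aperture (f/4 to f/16) and two go to the shutter (1/250 to 1/1000).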

How about framing, though? Having your camera up to your eye when you're walking up stairs sounds like a recipe for broken bones. Shooting from the hip would be nicer, but then you can't see what's in the frame. Wide-angle lenses to the rescue! Get enough of the scene in the image that your subject is likely to be in the frame, and do the framing in post.

It's really fast to take photos if you don't have to worry about focus or framing. Point the camera at what you're interested in, press the shutter, done.

High ISO looks like crap in color, though. Go black and white and you'll get smooth, usable results at 6400 and noisy results at 25600 on a Nikon D5100 / D7000 / Sony NEX-5N (AFAIK they all have the same sensor; I have a D5100). I'd kinda like to try a NEX-5N with a pancake lens for a small setup.

To recap: set ISO to 6400 or 25600, shoot in black and white, use manual focus (set to near-infinity, or 20 meters or so), set the shutter speed to 1/1000 or 1/500 and the aperture to f/16, use a 24mm lens or wider, and snap away while walking!
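A fixed far focus works because a wide lens stopped down to f/16 keeps everything from under two metres to infinity acceptably sharp. Below is a sketch of the standard hyperfocal-distance and depth-of-field formulas in JavaScript. The 0.02 mm circle of confusion is an assumed value that is commonly used for APS-C sensors such as the D5100's, and the function names are illustrative:

```javascript
// Hyperfocal distance H = f^2 / (N * c) + f, all lengths in millimetres.
// f = focal length, N = f-number, c = circle of confusion.
function hyperfocalMm(focalMm, fNumber, cocMm) {
  return (focalMm * focalMm) / (fNumber * cocMm) + focalMm;
}

// Depth-of-field limits when focused at distance s (mm):
//   near = s(H - f) / (H + s - 2f)
//   far  = s(H - f) / (H - s), or infinity once s >= H
function dofLimitsMm(focalMm, fNumber, cocMm, focusMm) {
  const h = hyperfocalMm(focalMm, fNumber, cocMm);
  const near = (focusMm * (h - focalMm)) / (h + focusMm - 2 * focalMm);
  const far = focusMm >= h ? Infinity : (focusMm * (h - focalMm)) / (h - focusMm);
  return { hyperfocal: h, near, far };
}

// 24mm at f/16, assumed APS-C circle of confusion of 0.02 mm, focused at 20 m.
const limits = dofLimitsMm(24, 16, 0.02, 20000);
console.log((limits.hyperfocal / 1000).toFixed(2)); // 1.82
console.log((limits.near / 1000).toFixed(2));       // 1.65
console.log(limits.far);                            // Infinity
```

Focusing anywhere beyond the hyperfocal distance (about 1.8 m here) keeps infinity sharp, so "20 meters or something" is a safe setting.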

Here's a gallery of my results from yesterday. They're not all pure examples of this technique; for some I brought the camera to my eye to frame the shot.

2012-01-27

Animating a million letters with WebGL

Here's a WebGL animated book demo! It has just 150,000 letters, but it scales up to two million.

Writing efficient WebGL is a bit funny. The basic idea is to collect as many operations as possible into a single draw call, because changing the state of the WebGL state machine and making WebGL calls are relatively expensive. If you want to draw more than a couple thousand objects at once, you need to adopt quite a different drawing strategy.

The usual way of drawing with WebGL is to set up your uniforms, buffers and shaders for each object, followed by a call to draw the object. Unless your object is very complex, the time spent drawing this way is dominated by the state setup. To draw faster, you can edit a buffer in JavaScript, re-upload it and draw everything with a single call. If you need to go even faster, you can push more of the computation into the shaders.

My goal in this article is to draw a million animated letters on the screen at a smooth framerate. This task should be quite possible with modern GPUs. Each letter consists of two textured triangles, so we're only talking about two million triangles per frame.

OK, let's start. First I'm going to create a texture with the letter bitmaps on it. I'm using the 2D canvas for this. The resulting texture has all the letters I want to draw. Then I'm going to create a buffer with texture coordinates into the letter sprite sheet. This is an easy and straightforward way of setting up the letters, but it's a bit wasteful because it uses two floats per vertex for the texcoords. A more compact way would be to pack the letter index and corner index into one number and convert that back to texture coordinates in the vertex shader.
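Here is a sketch of that packing, with the shader's decode step written in JavaScript so it can be checked. It assumes a sprite sheet of equal-sized cells in a 16x16 grid, with v running from the top of the sheet, and six non-indexed vertices per letter. GRID_COLS, GRID_ROWS and the function names are hypothetical, not taken from the demo:

```javascript
// Assumed sprite sheet layout (hypothetical): glyphs in a GRID_COLS x GRID_ROWS
// grid of equal cells, glyph i at column i % GRID_COLS, row floor(i / GRID_COLS).
const GRID_COLS = 16;
const GRID_ROWS = 16;

// Two triangles per letter without an index buffer: 6 vertices, using
// quad corners 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
const TRIANGLE_CORNERS = [0, 1, 2, 2, 1, 3];

// Pack glyph index and corner index into one number per vertex.
// Exact in a 32-bit float attribute while the result stays below 2^24.
function packVertex(glyphIndex, corner) {
  return glyphIndex * 4 + corner;
}

// The decode the vertex shader would do (floor/mod in GLSL), written in JS.
function unpackToTexCoord(packed) {
  const glyph = Math.floor(packed / 4);
  const corner = packed - glyph * 4;
  const col = glyph % GRID_COLS;
  const row = Math.floor(glyph / GRID_COLS);
  const u = (col + (corner & 1)) / GRID_COLS;  // right corners: +1 cell width
  const v = (row + (corner >> 1)) / GRID_ROWS; // bottom corners: +1 cell height
  return [u, v];
}

// One float per vertex instead of two texcoord floats.
function buildPackedAttribute(glyphIndices) {
  const data = new Float32Array(glyphIndices.length * TRIANGLE_CORNERS.length);
  let o = 0;
  for (const g of glyphIndices) {
    for (const corner of TRIANGLE_CORNERS) data[o++] = packVertex(g, corner);
  }
  return data;
}

console.log(unpackToTexCoord(packVertex(17, 3))); // [ 0.125, 0.125 ]
```

This halves the texcoord attribute size, at the cost of a few extra ALU operations per vertex in the shader.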

I also upload a two-million-triangle array to the GPU. These vertices are used by the vertex shader to put the letters on the screen. The vertices are set to the letter positions in the text, so if you render the triangle array as-is, you get a basic layout rendering of the text.
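Here is a sketch of building such a position array in JavaScript. It assumes a monospaced layout on a grid of 1x1 cells, wrapped at 80 columns, with y growing downwards and each glyph quad covering 0.9 of its cell. LINE_LENGTH, GLYPH_SIZE and layoutPositions are illustrative names; the demo's actual layout code may differ:

```javascript
// Hypothetical layout: monospaced text on a grid of 1x1 cells, wrapped at
// LINE_LENGTH columns, y growing downwards. Each glyph quad covers GLYPH_SIZE
// of its cell, so every corner of a letter stays inside that letter's cell.
const LINE_LENGTH = 80;
const GLYPH_SIZE = 0.9;
// Two triangles per letter, non-indexed: corners TL, TR, BL, BL, TR, BR.
const CORNERS = [[0, 0], [1, 0], [0, 1], [0, 1], [1, 0], [1, 1]];

// Returns x,y pairs for every vertex; newlines take no vertices.
function layoutPositions(text) {
  const letterCount = text.replace(/\n/g, "").length;
  const positions = new Float32Array(letterCount * CORNERS.length * 2);
  let col = 0;
  let row = 0;
  let o = 0;
  for (let i = 0; i < text.length; i++) {
    const isNewline = text[i] === "\n";
    if (isNewline || col === LINE_LENGTH) {
      col = 0;
      row++;
      if (isNewline) continue;
    }
    for (const [cx, cy] of CORNERS) {
      positions[o++] = col + cx * GLYPH_SIZE;
      positions[o++] = row + cy * GLYPH_SIZE;
    }
    col++;
  }
  return positions;
}

// Upload once, for example:
// gl.bufferData(gl.ARRAY_BUFFER, layoutPositions(text), gl.STATIC_DRAW);
```

The vertex shader would then scale these grid coordinates to clip space. Keeping each quad strictly inside its cell also makes it possible to recover a per-letter ID by flooring a vertex coordinate.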

With a simple vertex shader, I get a flat view of the text. Nothing fancy. It runs well, but if I want to animate it, I need to do the animation in JavaScript. And JavaScript is kinda slow for animating the six million vertices involved, especially if you want to do it on every frame. Is there a faster way?

Why yes, we can do procedural animation. That means we do all our position and rotation math in the vertex shader, so I don't need to run any JavaScript to update the positions of the vertices. The vertex shader runs very fast, and I get a smooth framerate even with a million triangles being individually animated every frame. To address the individual letters, I round down the vertex coordinates so that all four points of a letter quad map to a single unique coordinate. I can then use this coordinate to set the animation parameters for that letter.
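Here is a JavaScript stand-in for that vertex shader math; the real thing would be GLSL. It assumes each letter's quad lies inside one unit grid cell, so flooring any of its vertices gives the same cell coordinate. The per-letter parameters come from the well-known fract(sin(dot(...))) hash. The specific rotate-and-bob animation and all the names are made up for illustration:

```javascript
// Flooring any vertex of a letter gives that letter's cell: a per-letter ID.
// Assumes quad corners never leave their unit cell.
function letterCell(x, y) {
  return [Math.floor(x), Math.floor(y)];
}

// Deterministic pseudo-random value in [0, 1) per cell, modelled on the common
// GLSL fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453) hash.
function hash2(ix, iy) {
  const s = Math.sin(ix * 12.9898 + iy * 78.233) * 43758.5453;
  return s - Math.floor(s);
}

// Animated position of one vertex at time t (seconds). All vertices of a letter
// share the same parameters, so each letter moves as a rigid quad.
function animateVertex(x, y, t) {
  const [ix, iy] = letterCell(x, y);
  const h1 = hash2(ix, iy);
  const h2 = hash2(iy, ix);
  // Rotate around the cell centre by a per-letter oscillating angle (radians).
  const angle = 0.5 * Math.sin(t + h1 * 2 * Math.PI);
  const cx = ix + 0.5;
  const cy = iy + 0.5;
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  const rx = cx + (x - cx) * c - (y - cy) * s;
  const ry = cy + (x - cx) * s + (y - cy) * c;
  // Bob vertically with a per-letter amplitude (0.5 to 1.5 cells) and phase.
  const dy = (0.5 + h2) * Math.sin(2 * t + h1 * 2 * Math.PI);
  return [rx, ry + dy];
}
```

Because the parameters depend only on the floored cell, no per-letter data needs to be uploaded or updated from JavaScript.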

The only problem now is that JavaScript doesn’t know about the particle positions. If you really need to know where your particles are, you could duplicate the vertex shader logic in JavaScript and update them in, say, a web worker every time you need the positions. That way your rendering thread doesn’t have to wait for the math and you can continue animating at a smooth frame rate.
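Here is a sketch of such a worker. It assumes the shader rotates each letter about its cell centre and bobs it vertically with a per-cell hash; that animation and every name below are hypothetical. Rotation about the centre doesn't move the centre, so only the bob is needed to track letter positions. GPU float math won't match JavaScript doubles bit for bit, so treat the results as close approximations:

```javascript
// Same per-cell hash the shader is assumed to use: fract(sin(dot(...)) * k).
function hash2(ix, iy) {
  const s = Math.sin(ix * 12.9898 + iy * 78.233) * 43758.5453;
  return s - Math.floor(s);
}

// Letter centre positions at time t, written as x,y pairs into `out`.
// cellX/cellY hold each letter's integer grid cell.
function letterCenters(cellX, cellY, t, out) {
  for (let i = 0; i < cellX.length; i++) {
    const ix = cellX[i];
    const iy = cellY[i];
    const h1 = hash2(ix, iy);
    const h2 = hash2(iy, ix);
    const dy = (0.5 + h2) * Math.sin(2 * t + h1 * 2 * Math.PI);
    out[2 * i] = ix + 0.5;
    out[2 * i + 1] = iy + 0.5 + dy;
  }
  return out;
}

// Worker wiring: only active when this file is loaded with new Worker(...).
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.onmessage = (event) => {
    const { cellX, cellY, t } = event.data;
    const out = letterCenters(cellX, cellY, t, new Float32Array(cellX.length * 2));
    // Transfer the buffer back to the main thread instead of copying it.
    self.postMessage(out, [out.buffer]);
  };
}
```

The main thread posts { cellX, cellY, t } when it needs positions and keeps rendering while the worker computes.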

For more controllable animation, we could use render-to-texture to tween between the JavaScript-provided positions and the current positions. First we render the current positions to a texture. Then, on each frame, we tween them towards the positions in the JS array and write the result back into the texture. The nice thing about this is that we can update a small fraction of the JS positions per frame and still keep animating all the letters every frame. The vertex shader is tweening the positions.
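Here is a CPU simulation of that tween, to show the data flow. Plain Float32Arrays stand in for two float textures used ping-pong style. In WebGL, a shader would sample `current` and `target`, write the blend into `next`, and the two textures would swap roles each frame. The names and the fixed blend rate are assumptions for illustration:

```javascript
// Simulated textures (illustrative): x,y per letter.
function makeTweenState(letterCount) {
  return {
    current: new Float32Array(letterCount * 2),
    next: new Float32Array(letterCount * 2),
    target: new Float32Array(letterCount * 2), // the JS-provided positions
  };
}

// One frame: every letter moves a fraction `rate` of the way to its target.
function tweenStep(state, rate) {
  const { current, next, target } = state;
  for (let i = 0; i < current.length; i++) {
    next[i] = current[i] + (target[i] - current[i]) * rate;
  }
  // Swap, like swapping the render-target texture and the sampled texture.
  state.current = next;
  state.next = current;
}

// JS updates only a slice of the targets per frame; tweenStep still moves all letters.
function updateTargets(state, first, count, positionOf) {
  const end = Math.min(first + count, state.target.length / 2);
  for (let i = first; i < end; i++) {
    const [x, y] = positionOf(i);
    state.target[2 * i] = x;
    state.target[2 * i + 1] = y;
  }
}
```

With a rate of 0.2, each letter covers 20% of its remaining distance per frame, which gives an exponential ease-out towards wherever JavaScript last put its target.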

Using MediaStream API

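Using the MediaStream API to access the webcam from JavaScript:

    // Prefixed, pre-standard API from early 2012. Current browsers use
    // navigator.mediaDevices.getUserMedia() instead.
    // videoTag is assumed to be an existing <video autoplay> element.
    navigator.webkitGetUserMedia("video,audio",
      function(stream) {
        // Wrap the stream in a URL the <video> element can play.
        var url = webkitURL.createObjectURL(stream);
        videoTag.src = url;
        videoTag.onerror = function() {
          stream.stop();
          alert('camera error');
        };
      },
      function(error) {
        // Called if access is refused or no camera is available.
        alert(error.code);
      }
    );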
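The webkit-prefixed calls above are deprecated or gone in current browsers. Below is a rough modern equivalent using navigator.mediaDevices.getUserMedia, which takes a constraints object and returns a Promise, and HTMLMediaElement.srcObject. startCamera is a made-up helper name. The media devices object and the video element are parameters so the function can be exercised with stubs:

```javascript
// Sketch of the same thing with today's standard API.
async function startCamera(mediaDevices, video) {
  // Resolves with a MediaStream, rejects (e.g. NotAllowedError) if access is refused.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
  // Attach the stream directly; no object URL needed.
  video.srcObject = stream;
  // Don't play the microphone back through the speakers.
  video.muted = true;
  // MediaStream.stop() is no longer available; stopping every track releases the camera.
  const stop = () => stream.getTracks().forEach((track) => track.stop());
  video.onerror = () => {
    stop();
    console.error("camera error");
  };
  await video.play();
  return stop;
}
```

In a page, startCamera(navigator.mediaDevices, document.querySelector("video")).catch((err) => alert(err.name)) behaves roughly like the original snippet. getUserMedia is only available in secure contexts (HTTPS or localhost).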

Very basic math

I was playing around with the idea of presenting fractions in the same way as negative numbers. Instead of 1/x, you'd write /x, just as you write -x instead of 0-x. And since multiplication of single-letter symbols is often written by putting the symbols next to each other, marking the inverse with /x looks quite natural: A x /B = A/B, 9 x /7 = 9/7.

It also makes you think of the inverse in less magical terms. Consider the addition rule for fractions:

$$\frac{A}{B} + \frac{C}{D} = \frac{AD}{BD} + \frac{BC}{BD} = \frac{AD + BC}{BD}$$

There's some crazy magic happening right there. The literal meaning is (A x D x 1/B x 1/D) + (C x B x 1/D x 1/B), but you wouldn't know that from looking at the formula. And it gets even more confusing when you start multiplying and dividing by fractions. Think about the following for a moment:

$$\frac{A}{B} \Big/ \frac{C}{D} = \frac{AD}{BC}$$

Right?

In linear notation with /B and /D and suchlike, this all actually sort of makes sense in a non-magical way. Here's the first of the above two examples (with the intermediate steps written out):

(A x /B) + (C x /D)
= [1 x (A x /B)] + [1 x (C x /D)]
= [(D x /D) x (A x /B)] + [(B x /B) x (C x /D)]
= [(A x D) x (/B x /D)] + [(B x C) x (/B x /D)]
= (/B x /D) x [(A x D) + (B x C)]

[here's where you go: "oh right, /7 x /4 = /28", analogous to 7 x 4 = 28]
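You can check the same rules with concrete numbers. Here is a minimal JavaScript sketch of exact rationals. Besides multiplication, the only primitive is the inverse /x, and addition is implemented literally as the derivation above. Q and its method names are illustrative:

```javascript
// Minimal exact rationals on BigInt, where division only ever appears as
// multiplication by an inverse: the /x of the notation.
function gcd(a, b) {
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

class Q {
  constructor(num, den = 1n) {
    if (den === 0n) throw new RangeError("zero denominator");
    if (den < 0n) {
      num = -num;
      den = -den;
    }
    const g = gcd(num < 0n ? -num : num, den);
    this.num = num / g;
    this.den = den / g;
  }
  mul(o) {
    return new Q(this.num * o.num, this.den * o.den);
  }
  // /x: x.mul(x.inv()) is 1 for any non-zero x
  inv() {
    return new Q(this.den, this.num);
  }
  // (A x /B) + (C x /D) = (/B x /D) x [(A x D) + (B x C)]
  add(o) {
    const invBD = new Q(1n, this.den).mul(new Q(1n, o.den)); // /B x /D
    const sum = new Q(this.num * o.den + this.den * o.num);  // (A x D) + (B x C)
    return invBD.mul(sum);
  }
  toString() {
    return this.den === 1n ? `${this.num}` : `${this.num}/${this.den}`;
  }
}

console.log(new Q(7n).inv().mul(new Q(4n).inv()).toString()); // 1/28
console.log(new Q(1n, 2n).add(new Q(1n, 3n)).toString());     // 5/6
```

The first line is the "/7 x /4 = /28" moment in code.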

And the second one:

A x /B x /(C x /D)
= A x /B x /C x D
= (A x D) x (/B x /C)

Note the similarity with addition:

A + -B + -(C + -D)
= A + -B + -C + D
= (A + D) + (-B + -C)

Now, you might notice that there is a bit of magic there. How does /(C x /D) magically turn into (/C x D)? Or -(C + -D) to (-C + D) for that matter. Let's find out! Here's how it works:

/(C x /D)
= 1 x /(C x /D)
= [(/C x D) x /(/C x D)] x /(C x /D)
= (/C x D) x /(/C x C x D x /D)
= (/C x D) x /(1 x 1)
= (/C x D) x /1 -- Remember the axioms 1 x N = N and N x /N = 1. Since 1 x /1 = 1 we get /1 = 1.
= (/C x D) x 1 = (/C x D)

For the -(C + -D) case, replace / with -, x with + and 1 with 0.

And there you have it: my small thought experiment, plus derivations for some basic arithmetic rules. I kinda like how breaking the magic bits down into the basic field axioms makes things clearer.

[edit]

Why is /A x /B = /(A x B)?

/(A x B) x (A x B) = 1

1 x (/A x /B) = (/A x /B)
/(A x B) x (A x B) x (/A x /B) = (/A x /B)
/(A x B) x (A x /A) x (B x /B) = (/A x /B)
/(A x B) x 1 x 1 = (/A x /B)
/(A x B) = (/A x /B)

2012-01-21

Fast code

I was thinking about the characteristics of high-performance language runtimes (read: execution times close to optimal for the hardware) and came up with this list:

- Flat data structures (err, like an array of structs where struct N is in memory right before struct N+1)
    - streaming memory reads are prefetcher-friendly, and you spend less time chasing pointers
- Tightly-packed data
    - memory fetches happen in cache-line-sized chunks, so tightly-packed data gives you more payload per memory fetch
    - more payload fits into cache, so subsequent memory accesses are faster
- Reusable memory
    - keeps more of the working set of data in cache
- Unboxed values
    - less time spent chasing pointers
    - tight code for data manipulation, because the data type is known (float/double/int/short/byte/vector)
- Vector instructions
    - more bytes manipulated per instruction, so data moves faster through an execution unit
- Parallel execution
    - more bytes manipulated per clock cycle, so data moves faster through the processor
- Keep data in registers when possible
    - less time spent waiting for caches
- Keep data in cache when possible
    - less time spent waiting for memory
    - instead of going over the full data set several times end-to-end, split it into cache-sized chunks and process each one fully before moving on to the next one
- Minimize unnecessary data movement between processing units
    - keep data close to the processor until you're done with it; this matters even more with GPUs
- Flat code layout
    - few jumps per byte processed
- Tight code
    - keeps more of the program in cache
- Interleaved I/O
    - work on already-loaded data while loading in new data
    - minimal time spent waiting for I/O
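JavaScript only gets you part of the way down this list, but typed arrays do give you flat, tightly-packed, unboxed storage. Here is a sketch of an array of structs in a single ArrayBuffer. The 16-byte record layout and the function names are hypothetical:

```javascript
// Hypothetical record layout: x, y, z as float32 plus a uint32 id = 16 bytes.
// All records live in one ArrayBuffer, record N right before record N+1, and the
// fields are unboxed 32-bit values instead of properties on separate heap objects.
const STRIDE = 4; // 32-bit slots per record

function makeRecords(count) {
  const buffer = new ArrayBuffer(count * STRIDE * 4);
  return {
    count,
    f32: new Float32Array(buffer), // view for x, y, z
    u32: new Uint32Array(buffer),  // view for id, over the same bytes
  };
}

function setRecord(r, i, x, y, z, id) {
  const o = i * STRIDE;
  r.f32[o] = x;
  r.f32[o + 1] = y;
  r.f32[o + 2] = z;
  r.u32[o + 3] = id;
}

// A streaming pass: sequential addresses and no pointer chasing. The same buffer
// can be reused for the next batch instead of allocating new objects.
function translateAll(r, dx, dy, dz) {
  const end = r.count * STRIDE;
  for (let o = 0; o < end; o += STRIDE) {
    r.f32[o] += dx;
    r.f32[o + 1] += dy;
    r.f32[o + 2] += dz;
  }
}
```

Compared with an array of {x, y, z, id} objects, each 64-byte cache line here holds exactly four records and nothing else.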

You might notice that the major theme is optimizing memory use. I started thinking of program execution as a way to read in the input data set and write out the output data set. The sizes of the input and output data give you a nice lower bound on execution time: divide the data set size by the memory bandwidth (or by the I/O bandwidth if you're working on big things). The flow of the program then becomes pushing this river of data through the CPU.
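That bound is simple arithmetic. The sketch below counts input and output bytes together, assuming both go over the same memory bus. The 10 GB/s bandwidth figure is only an example; measure your own machine. Function names are illustrative:

```javascript
// Lower bound on run time for a job that must read all of its input and write
// all of its output once: total bytes moved divided by the available bandwidth.
function minSeconds(inputBytes, outputBytes, bytesPerSecond) {
  return (inputBytes + outputBytes) / bytesPerSecond;
}

// How close a measured run gets to that bound (1 = as fast as the memory allows).
function efficiency(measuredSeconds, inputBytes, outputBytes, bytesPerSecond) {
  return minSeconds(inputBytes, outputBytes, bytesPerSecond) / measuredSeconds;
}

// Hypothetical numbers: 1 GB in, 1 GB out, 10 GB/s of memory bandwidth.
console.log(minSeconds(1e9, 1e9, 10e9)); // 0.2
```

A run that takes 0.5 s on that workload would be at 40% of the bound.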

Suck in the data in cache-line-sized chunks, process an entire cache line before moving to the next, and preload the next cache line while processing the current one. Use vector instructions to manipulate several bytes of data at the same time, and use parallelism to manipulate several streams of data at the same time. Make your processing kernel fit into the L1 instruction cache.
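Here is a JavaScript sketch of the "process each cache-sized chunk fully before moving on" idea, fusing two passes (scale, then sum) into one walk over the data. JavaScript gives no control over prefetching or vector instructions, so this only shows the loop structure. CHUNK is an assumed size (32 KiB of float32s, roughly a typical L1 data cache) and would need tuning:

```javascript
// Two-pass version: walks the whole array twice. If the data is much bigger
// than the cache, the second pass streams everything from memory again.
function twoPass(data, gain) {
  for (let i = 0; i < data.length; i++) data[i] *= gain;
  let sum = 0;
  for (let i = 0; i < data.length; i++) sum += data[i];
  return sum;
}

// Chunked version: finish both steps on one chunk before moving on, so the
// second step reads data that is likely still in cache.
const CHUNK = 8192; // float32 elements = 32 KiB (assumed, tune per machine)

function chunked(data, gain) {
  let sum = 0;
  for (let start = 0; start < data.length; start += CHUNK) {
    const end = Math.min(start + CHUNK, data.length);
    for (let i = start; i < end; i++) data[i] *= gain;
    for (let i = start; i < end; i++) sum += data[i];
  }
  return sum;
}
```

Both versions add the elements in the same order, so they return identical results. Only the memory traffic differs.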

Gnnnn ok, back to writing JavaScript.
