art with code

2008-08-30

Compiling upwards

This is a bit dense, sorry about that. If something needs clarification, please say so.

A compiler translates a programming language into machine language; a programmer translates ideas into a programming language. And just as a compiler may first compile into an intermediate language, so may a programmer: a programmer takes an idea, finds the mathematics in it, derives algorithms from the mathematics, and translates the algorithms into statements in the programming language.

For want of a more concrete explanation, consider the following:
"I want to know which of my pages are popular" (The Need)
-> "Show the top ten visited pages" (UI Design - how to fulfill the need)
-> "Page is URL, visited is count of page instances in log, top ten visited is the ten first pages when they are sorted by visited, show is print." (Mathematics - theoretical basis)
-> "Scan the log line by line, insert/increment each page in a max-heap keyed by visit count, pop ten when done, print each." (Algorithm - theoretical implementation)
-> "recurseLines (fun h l -> Heap.incr h (getURL l)) (Heap.empty max) log |> Heap.take 10 |> iter (puts . showInt)" (Programming - concrete implementation)
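The last step above is pseudocode; a runnable sketch of the same pipeline might look like the following Python. The log format and the URL-extracting regex are assumptions (a common Apache-style access log), and `heapq.nlargest` stands in for the heap of the pseudocode:

```python
import heapq
import re
from collections import Counter

def top_pages(log_lines, n=10):
    """Count page visits in access-log lines, return the n most visited.

    Assumes each line contains a request like "GET /path HTTP/1.1";
    the regex is a guess at a common log format, not a general parser.
    """
    counts = Counter()
    for line in log_lines:
        m = re.search(r'"(?:GET|POST) (\S+)', line)
        if m:
            counts[m.group(1)] += 1
    # nlargest plays the role of popping ten off the max-heap
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

log = [
    '1.2.3.4 - - [30/Aug/2008] "GET /index.html HTTP/1.1" 200',
    '1.2.3.5 - - [30/Aug/2008] "GET /about.html HTTP/1.1" 200',
    '1.2.3.6 - - [30/Aug/2008] "GET /index.html HTTP/1.1" 200',
]
for url, visits in top_pages(log, 2):
    print(url, visits)
```

Note how each line of the Python maps back to a clause of the algorithm sentence: the scan is the loop, the insert/increment is the Counter update, the pop-ten is `nlargest`, the print is the final loop.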

The closer a programming language is to the algorithmic language of a programmer, the easier it is to write programs in it. The closer the programming language is to the machine language, the easier it is to compile. Note that these are independent of each other: it is possible to have a programming language that is close to both the algorithmic language and the machine language. Consider a programmer whose algorithmic language is the machine language: now the two are the same. Usually, however, this is not the case, and a language divides its attention between the two goals.

For measuring the productivity of a programming language, an (overly) simple quantitative measure is to count the number of tokens and characters required to write a program. By optimizing the character count of each token (e.g. by Huffman coding, so that the most frequently used names are the shortest), the amount of mechanical typing can be minimized. But this is a form of micro-optimization, and should not be the only one: there are larger gains to be had in optimizing the structure of the program (and thus the token count.)
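To make the measure concrete, here is a small sketch that computes token and character counts for Python snippets using the standard `tokenize` module. The two snippets compared (a manual loop vs. a folded `sum`) are illustrative choices, not from the original post:

```python
import io
import tokenize

def measure(source):
    """Return (token_count, char_count) for a Python snippet.

    A crude proxy for the 'mechanical typing' cost of a program:
    layout tokens (newlines etc.) are filtered out before counting.
    """
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in (tokenize.NEWLINE, tokenize.NL,
                              tokenize.ENDMARKER)]
    return len(toks), len(source)

loopy = "total = 0\nfor x in xs:\n    total = total + x\n"
folded = "total = sum(xs)\n"
print(measure(loopy), measure(folded))
```

The folded version wins on both counts, which is the point of the paragraph above: the big reduction comes from abstracting the loop away, not from shortening the identifiers.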

Token count optimization works by abstracting out common patterns in the program; it is like seeking Mad Libs templates in the source code: recursion, iteration, splits, joins, scans. A hypothetical algorithm for token count optimization would find common partial trees in the AST and replace them with higher-order functions.
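The first half of that hypothetical algorithm, finding common partial trees, can be sketched with Python's `ast` module. This only detects structurally identical expression subtrees; actually rewriting them into higher-order functions is left out:

```python
import ast
from collections import Counter

def common_subtrees(source, min_count=2):
    """Count structurally identical expression subtrees in a program's AST.

    Repeated shapes are candidates for factoring out into a helper
    or higher-order function, in the spirit of the paragraph above.
    """
    shapes = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.expr):
            # ast.dump ignores source positions, so two occurrences of
            # the same expression shape produce the same string
            shapes[ast.dump(node)] += 1
    return {s: n for s, n in shapes.items() if n >= min_count}

src = "a = x * x + 1\nb = y * y + 1\n"
for shape, n in common_subtrees(src).items():
    print(n, shape)
```

A real optimizer would compare trees up to renaming (so `x * x + 1` and `y * y + 1` unify into one template with a hole), but even this literal-match version shows the madlibs idea: count the shapes, then name the frequent ones.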

A keen-minded reader may have noticed the similarities between optimizing towards the algorithmic language and optimizing towards the machine language: both aim to minimize the work required. Optimizing towards the machine language seeks to execute the program in the least amount of time, optimizing towards the algorithmic language seeks to write the program in the least amount of time.
