
2008-09-08

Non-copying forked workers using Bigarrays

Prelude.ml: Wrote some utils for Bigarrays and a parallel Bigarray init, pbainit, which uses mmapped /dev/zero as shared memory (thanks to mfp for the pointer). The forked worker processes write their results directly into the shared array, which eliminates the result-copying overhead.
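
pbainit itself isn't reproduced here, but the underlying trick goes roughly like this: map /dev/zero as shared memory, fork a few workers, and have each one fill its own slice of the shared array before exiting. This is only a minimal sketch under those assumptions; the name par_ba_init and the fixed worker count are made up for illustration, and the real Prelude.ml code differs:

let par_ba_init kind f n =
  (* map /dev/zero MAP_SHARED so the forked children write into
     the same pages that the parent sees *)
  let fd = Unix.openfile "/dev/zero" [Unix.O_RDWR] 0 in
  let ba = Bigarray.Array1.map_file fd kind Bigarray.c_layout true n in
  Unix.close fd;
  let workers = 4 in (* illustrative fixed worker count *)
  let chunk = (n + workers - 1) / workers in
  let pids = Array.init workers (fun w ->
    match Unix.fork () with
    | 0 ->
      (* child: fill this worker's slice of the shared array, then exit *)
      let lo = w * chunk and hi = min n ((w + 1) * chunk) in
      for i = lo to hi - 1 do ba.{i} <- f i done;
      exit 0
    | pid -> pid) in
  Array.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  ba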

Here's a small program that averages two string arrays:

open Prelude

let () =
  let sz = parseInt Sys.argv.(1) in
  let s1 = String.create sz in
  let s2 = String.create sz in
  (* each element of the shared Bigarray is the average of the
     corresponding bytes of s1 and s2 *)
  let ba = pbainit Bigarray.char
             (fun i -> char ((ord (suget s1 i) + ord (suget s2 i)) / 2))
             sz in
  puts (showInt (balen ba))

I tested it with sz=1G, which creates two 1GB strings plus a third 1GB string/array produced by averaging the two together. I have 4GB of RAM, so any copying will easily push the machine into swapping to disk. I timed versions using sequential and parallel Bigarray and String inits:
  • bainit: 24.5s (sequential Bigarray init)

  • pbainit: 16.1s (parallel Bigarray init)

  • sinit: 9.5s (sequential String init)

  • psinit: started swapping, killed it after 80s (parallel String init)

As you can see from the above, mmapped Bigarrays aren't faster than native strings for a trivial kernel like this, but they can share their results between processes without copying. To tip the balance towards Bigarrays, let's make the compute kernel do more work:

open Prelude

let () =
  let sz = parseInt Sys.argv.(1) in
  let s1 = String.create sz in
  let s2 = String.create sz in
  let ba = pbainit Bigarray.char
             begin fun i -> (* average of 11-wide averages *)
               let a = saverageSlice (max 0 (i-5)) (min (sz-1) (i+5)) s1 in
               let b = saverageSlice (max 0 (i-5)) (min (sz-1) (i+5)) s2 in
               char ((a+b) / 2)
             end
             sz in
  ignore ba
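
saverageSlice is another Prelude.ml helper that isn't shown above; for this example's purposes it just averages the byte values of a slice of a string, roughly like the sketch below. saverage_slice is an illustrative stand-in, and the real helper may differ in details such as rounding:

(* average of the byte values s.[i..j], inclusive *)
let saverage_slice i j s =
  let sum = ref 0 in
  for k = i to j do
    sum := !sum + Char.code s.[k]
  done;
  !sum / (j - i + 1)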

And the results with 1G strings:
  • bainit: 213s

  • pbainit: 111s

  • sinit: 188s

  • psinit: Killed after 240s as it had pushed swapfile use to 1G.

Now the parallel Bigarray init is actually around 70% faster than the sequential String init (188s / 111s ≈ 1.7).

Conclusions: Bigarrays are helpful when you need to process large in-memory structures in parallel, provided that you do enough computation to mask the parallelisation overhead. Bigarrays are also good for communicating with libraries written in C or FORTRAN.
