art with code

2018-03-26

Infinibanding, pt 4. Almost there

Got my PCIe-M.2 adapters and plugged 'em in. One of them runs at PCIe 1.0 lane speeds instead of PCIe 3.0, capping perf at 850 MB/s, and causes a Critical Interrupt #0x18 | Bus Fatal Error that resets the machine. Then the thing overheats, melts its connection solder, shorts, releases the magic smoke and makes the SSD PCB look like wet plastic. Yeah, it's dead. The SSD's dead too.
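For a sense of how much that link downgrade costs, here's a rough back-of-the-envelope sketch. It assumes the adapter is a x4 link (typical for M.2 NVMe, but not something I measured) and only accounts for line encoding, not packet overhead, so real-world numbers land a bit lower. The function and table names are made up for illustration.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
# PCIe 1.0/2.0 use 8b/10b encoding, PCIe 3.0 uses 128b/130b.
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
}


def link_bandwidth_mb_s(generation: str, lanes: int = 4) -> float:
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead.

    lanes=4 is an assumption about the adapter, not a measured value.
    """
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s * efficiency = usable Gbit/s per lane; / 8 -> GB/s; * 1000 -> MB/s.
    return gt_per_s * efficiency / 8 * 1000 * lanes


for gen in ("1.0", "3.0"):
    print(f"PCIe {gen} x4: {link_bandwidth_mb_s(gen):.0f} MB/s")
```

Under those assumptions a PCIe 1.0 x4 link tops out around 1000 MB/s before packet overhead, so seeing 850 MB/s is about what you'd expect from a link stuck at 1.0 speeds.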

[Edit: OR.. IS IT? Wiped the melted goop off the SSD and tried it in the other adapter and it seems to be fine. Just has high controller temps, apparently a thing with the 960s. Above 90 C after 80 GB of writes. It did handle 800 GB of /dev/zero written to it fine and read everything back in order as well. Soooo, maybe my two-NVMe pool lives to ride once more? Shock it, burn it, melt it, it just keeps on truckin'. Disk label: "The Terminator"]

I only noticed this because the server rebooted and one of the mirrors in the ZFS pool was offline. Hooray for redundancy?

Anyway, the fast NVMe work pool is online, and it can kinda saturate the connection. It's got one Samsung 960 EVO in it, which is fast for reads, if maybe not the best for synced writes.

I also got a 280 gig Optane 900p. It feels like Flash Done Right. Low queue depths, low parallelism, whatever, it just burns through everything. Optane also survives many more writes than flash, it's really quite something. And it hasn't caught on fire yet. I set up two small partitions (10 GB) as ZFS slog devices for the two pools and now the pools can handle five-second write bursts at 2 GB/s.
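The five-second figure is just partition size divided by write rate. A tiny sketch of that arithmetic, treating the slog as a plain buffer that fills at the incoming write rate (a simplification of how ZFS actually uses it; the function names are hypothetical):

```python
def slog_burst_seconds(slog_size_gb: float, write_rate_gb_s: float) -> float:
    """How long a slog of the given size can absorb writes at the given rate.

    Simplifying assumption: the slog fills linearly and nothing drains
    to the main pool during the burst.
    """
    return slog_size_gb / write_rate_gb_s


def slog_size_for_burst(write_rate_gb_s: float, burst_seconds: float) -> float:
    """Inverse of the above: slog size in GB needed to absorb a burst."""
    return write_rate_gb_s * burst_seconds


# The setup from the post: a 10 GB partition taking writes at 2 GB/s.
print(slog_burst_seconds(10, 2))   # seconds of burst a 10 GB slog covers
print(slog_size_for_burst(2, 5))   # GB needed for a 5 s burst at 2 GB/s
```

Same model, other direction: if you wanted to ride out a longer burst at the same rate, the partition would need to grow proportionally.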

Played with BeeGFS over the weekend. Easyish to get going, sort of resilient to nodes going down if you tune it that way, good performance with RDMA (netbench mode dd went at 2-3 GB/s). The main thing lacking seems to be the "snapshots, backups and rebuild from nodes gone bye" story.

Samba and NFS get to 1.4 to 1.8 GB/s per client, around 2.5 GB/s aggregate, high CPU usage on the server somehow, even with NFSoRDMA. I'll see if I can hit 2 GB/s on a single client. Not really native-level random access perf though. The NVMe drive can do two 1 GB/s clients. Fast filestore experiment mostly successful, if not completely.

Next up, wait for a bit of a lull in work, and switch my workstation to a machine that's got more than one PCIe slot. And actually hook it up to the fast network to take advantage of all this bandwidth. Then plonk the backup disks into one box, take it home, off-site backups yaaay.

Somehow I've got nine cables, four network cards, three computers, and a 36-port switch. Worry about that in Q3, time to wrap up with this build.
