
Things to make sure of when choosing your distributed storage:

1) Are you _really_ sure you need it distributed, or can you shard it yourself? (Hint: anything distributed sucks up at least one, if not two, innovation tokens. If you're spending other innovation tokens as well, you're going to have a very bad time.)

2) Do you need to modify blobs, or can you get away with read/modify/replace? (S3 doesn't support partial writes; a one-bit change requires rewriting the whole file.)
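To make the read/modify/replace point concrete, here's a toy sketch (the store, key names, and sizes are all made up, not any real API) of whole-object semantics: even a one-bit change rewrites every byte.

```python
# Toy blob store with S3-like whole-object semantics: there are no
# partial writes, so changing a single bit means re-uploading the
# entire object.
store: dict[str, bytes] = {}

def put(key: str, body: bytes) -> None:
    store[key] = body                # whole-object write only

def get(key: str) -> bytes:
    return store[key]

put("data.bin", b"\x00" * 1024)      # 1 KiB object
blob = bytearray(get("data.bin"))    # read...
blob[0] ^= 0x01                      # ...modify a single bit...
put("data.bin", bytes(blob))         # ...replace all 1024 bytes
```

On real object storage, that last `put` is a full network upload, which is why write amplification matters when blobs get large.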

3) What's your ratio of reads to writes? (Do you need local caches, or local pools in GPFS parlance?)

4) How much are you going to change the metadata? (If there's POSIX somewhere, it'll be a lot.)

5) Are you going to try to write to the same object at the same time from two different locations? (How do you manage locking and concurrency?)
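One common answer to the same-object-two-writers problem, if your store exposes ETags or versions, is optimistic concurrency instead of a lock server. A hypothetical sketch (the store and API here are illustrative, not any particular product's interface):

```python
import itertools

# Optimistic concurrency for "two writers, one object": a write only
# succeeds if the caller's etag matches the current one (the shape of
# HTTP If-Match semantics), so the slower writer fails loudly instead
# of silently clobbering the faster one.
_etag = itertools.count(1)
store: dict[str, tuple[int, bytes]] = {}

class PreconditionFailed(Exception):
    pass

def get(key: str) -> tuple[int, bytes]:
    return store[key]

def put_if_match(key: str, body: bytes, expected_etag: int) -> None:
    current_etag, _ = store[key]
    if current_etag != expected_etag:
        raise PreconditionFailed(key)      # lost the race, retry or give up
    store[key] = (next(_etag), body)

store["obj"] = (next(_etag), b"v1")
etag, _ = get("obj")
put_if_match("obj", b"v2", etag)           # first writer wins
try:
    put_if_match("obj", b"v3", etag)       # stale etag from before the write
    clobbered = True
except PreconditionFailed:
    clobbered = False                      # second writer told to back off
```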

6) Do you care about availability, consistency, or speed? (Pick one, maybe one and a half.)

7) How are you going to recover from the distributed storage shitting itself all at the same time?

8) how are you going to control access?



1) only if it removes a "janitor" token of nannying the servers. Right now I just have one big server with a big 160TB ZFS pool, but it's running out.

2) No modifications, just new files and the occasional deletion request.

3) Almost just 1 write and 1 read per file, this is a backing storage for the source files, and they are cached in front.

4) Never

5) Files are written only by one other server, and there will be no parallel writes.

6) I pick consistency and as the half, availability.

7) This happened something like 15 years ago with MogileFS and thus scared us away. (Hence the single-server ZFS setup).

8) Reads are public, writes restricted to one other service that may write.


GPFS is pretty sexy nowadays, although it's really expensive: https://www.ibm.com/products/storage-scale


Sounds like you are talking from experience. Are you a storage specialist? How did you learn so much about this?


VFX engineer, I have suffered through:

_early_ Lustre (it's much better now)

GPFS

Gluster (fuck that)

clustered XFS (double fuck that)

Isilon

Nowadays, a single 2U server can realistically drive 2x 100-gig NICs at full bore, so the biggest barrier is density. You can probably get 1PB in a rack now, and linking a bunch of JBODs (well, NVMe shelves) is probably easy to do now.
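For scale, "2x 100-gig NICs at full bore" works out to roughly 25 GB/s of payload, ignoring protocol overhead:

```python
# Line-rate ceiling for two 100 Gbit/s NICs, ignoring framing and
# protocol overhead: 200 Gbit/s of wire is 25 GB/s of bytes, tops.
nics, gbit_per_nic = 2, 100
total_gbit = nics * gbit_per_nic     # 200 Gbit/s
total_gbyte = total_gbit / 8         # 25.0 GB/s
```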


"1PB in a rack"? You can apparently already buy 2.5PB in a single 4U server:

https://www.techradar.com/pro/seagate-has-launched-a-massive...


Sorry, I should have added a caveat of 1PB _at decent performance_.

That Seagate array will be fine for streaming (so long as you spread the data properly), but as soon as you start mixing read/write loads on it, it'll start to chug. You can expect 70-150 IOPS out of each drive, and that's a 60-drive array (at a guess; you can get 72 drives in a 4U, but they are less maintainable, or at least used to be; things might have improved recently).
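The back-of-envelope arithmetic for that 60-drive estimate:

```python
# Aggregate random IOPS for a 60-drive spinning array, using the
# 70-150 IOPS-per-drive figure above. A few thousand IOPS total is
# why mixed read/write loads make a big disk box chug.
drives = 60
per_drive_low, per_drive_high = 70, 150
total_low = drives * per_drive_low    # 4200 IOPS
total_high = drives * per_drive_high  # 9000 IOPS
```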

When I was using Lustre with Ultra SCSI (yes, that long ago), it took a good 10-20 racks to get to 100TB that could sustain a gigabyte a second.


Agreed, it depends on the use case. For some "more storage" is all that matters, for others you don't want to be bottlenecked on getting it into / out of the machine or through processing.



