The GitHub repo has 5 million lines of C++ code (headers included), 1.6 million ...

gritukan · on March 22, 2023

Historically, the master server of YTsaurus was a single RSM (replicated state machine) that contained all the meta-information about the cluster. This included the tree of the distributed filesystem, transactions, information about users and tables, placement of chunks, and much more.

However, this approach did not scale: the master server's memory and throughput soon became insufficient. To address this, we implemented the Multicell technology. With Multicell, there are multiple RSMs, called secondary masters, that store information about table chunks and their placement. The primary master still stores information about the distributed filesystem and transactions, but it remains a single, non-sharded RSM.
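To make the split concrete, here is a toy Python model of Multicell as described above, under loudly hypothetical assumptions: the primary keeps only the path-to-table mapping, each table is pinned to one secondary cell that holds its chunk list, and the round-robin placement policy and all class and method names are invented for illustration. Replication and transactions are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryCell:
    """One secondary master RSM (toy; replication is not modeled)."""
    cell_id: int
    # table node id -> ids of the chunks that make up the table
    chunks: dict = field(default_factory=dict)


class MulticellMaster:
    """Hypothetical model: the primary holds the tree, secondaries hold chunks."""

    def __init__(self, secondary_count):
        self.secondaries = [SecondaryCell(i + 1) for i in range(secondary_count)]
        # Primary state: path -> (table node id, id of the cell holding its chunks)
        self.primary_tree = {}
        self._next_node_id = 1

    def create_table(self, path):
        node_id = self._next_node_id
        self._next_node_id += 1
        # Hypothetical placement policy: round-robin over secondary cells.
        cell = self.secondaries[(node_id - 1) % len(self.secondaries)]
        cell.chunks[node_id] = []
        self.primary_tree[path] = (node_id, cell.cell_id)
        return node_id

    def _cell_for(self, path):
        # The primary only answers "which cell?"; the chunk data lives elsewhere.
        node_id, cell_id = self.primary_tree[path]
        return node_id, self.secondaries[cell_id - 1]

    def add_chunk(self, path, chunk_id):
        node_id, cell = self._cell_for(path)
        cell.chunks[node_id].append(chunk_id)

    def chunks_of(self, path):
        node_id, cell = self._cell_for(path)
        return list(cell.chunks[node_id])
```

In this toy, adding secondary cells spreads chunk bookkeeping further, but every path lookup still goes through the single `primary_tree`, which mirrors the remaining bottleneck the comment describes.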

A few years later, the masters became overloaded again, and we implemented Portals. With Portals, one can select a subtree of Cypress and place it on one of the secondary masters. This technology is in use today; for example, the home directories of some active users are hosted on secondary masters.
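A minimal sketch of how a request might be routed once Portals exist, assuming a hypothetical map from portal entrance paths to cell ids: a path is served by the cell of the deepest entrance that is a prefix of it on a path-component boundary, and by the primary otherwise. The function name, the map and the cell ids are illustrative, not YTsaurus APIs.

```python
def resolve_cell(path, portals, primary_cell=0):
    """Return the id of the cell hosting `path`.

    `portals` is a hypothetical dict: portal entrance path -> cell id.
    """
    best_cell, best_len = primary_cell, -1
    for entrance, cell_id in portals.items():
        # A portal covers its entrance node and everything below it;
        # the "/" check stops "//home/alicex" from matching "//home/alice".
        covers = path == entrance or path.startswith(entrance + "/")
        if covers and len(entrance) > best_len:
            best_cell, best_len = cell_id, len(entrance)
    return best_cell
```

For example, with `{"//home/alice": 11}`, the path `//home/alice/data` resolves to cell 11, while `//tmp/x` stays on the primary (cell 0).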

We anticipate, however, that this approach will also become insufficient in a few years. Therefore, we are currently working on a new technology called Sequoia, which stores information about the shape of the Cypress tree in horizontally scalable dynamic tables.
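One way to picture "tree shape in dynamic tables" is two sorted tables: one mapping a path to a node id, and another keyed by (parent id, child name), so that listing a directory becomes a key-range scan. The sketch below is a single-process toy under that assumption; the table layout, the `SortedTable` and `TreeShape` classes and all other names are hypothetical, and a real dynamic table would be distributed and transactional.

```python
import bisect


class SortedTable:
    """Toy stand-in for a sorted dynamic table: rows kept in key order."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._rows = {}   # key -> value

    def insert(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def lookup(self, key):
        return self._rows.get(key)

    def select_range(self, lower, upper):
        """Return (key, value) pairs with lower <= key < upper, in key order."""
        start = bisect.bisect_left(self._keys, lower)
        end = bisect.bisect_left(self._keys, upper)
        return [(k, self._rows[k]) for k in self._keys[start:end]]


class TreeShape:
    """Hypothetical schema: two tables together describe the tree shape."""

    def __init__(self, root_id=0):
        self.path_to_id = SortedTable()   # path -> node id
        self.child_node = SortedTable()   # (parent id, child name) -> child id
        self.path_to_id.insert("/", root_id)

    def create(self, parent_path, name, node_id):
        parent_id = self.path_to_id.lookup(parent_path)
        if parent_id is None:
            raise KeyError(parent_path)
        # With root "/", children get Cypress-style paths such as "//home".
        self.path_to_id.insert(parent_path + "/" + name, node_id)
        self.child_node.insert((parent_id, name), node_id)

    def list_children(self, path):
        parent_id = self.path_to_id.lookup(path)
        # Every (parent_id, name) key sorts in [(parent_id, ""), (parent_id + 1, "")).
        rows = self.child_node.select_range((parent_id, ""), (parent_id + 1, ""))
        return [name for (_, name), _ in rows]
```

Because the rows are sorted by key, a directory listing touches only that directory's contiguous key range, which is what lets such tables be split across many tablets.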

It is hard to describe all aspects of master server internals in one comment. Therefore, feel free to join our chat at t.me/ytsaurus for further discussion!

ddorian43 · on March 24, 2023

> Therefore, we are currently working on a new technology called Sequoia, which stores information about the Cypress tree shape in horizontally scalable dynamic tables.

Why not just use a database for the metadata? Something that can be sharded and supports transactions, like YugabyteDB/Yandex-ydb/etc.?

gritukan · on March 25, 2023

Our objects have complex mutation semantics, so implementing the whole of Cypress on top of k-v storage seems challenging.

Storing Cypress nodes in the ad hoc RSMs and the information about the tree in k-v storage seems like a good compromise: it is scalable and still lets us implement any functionality for objects efficiently.
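The compromise can be sketched as a two-step operation: resolve the path in the key-value store, then send the operation to the RSM that owns the node, where arbitrary object-specific logic (here, a hypothetical revision counter) can run. Everything below, including the modulo placement and the class names, is an illustrative assumption, and it skips the hard part of keeping the two stores consistent under concurrent changes.

```python
class NodeCell:
    """Ad hoc RSM owning full node objects (toy; replication is not modeled)."""

    def __init__(self):
        self.nodes = {}  # node id -> {"attributes": {...}, "revision": int}

    def set_attribute(self, node_id, key, value):
        # Object-specific logic lives in the RSM, e.g. bumping a revision.
        node = self.nodes[node_id]
        node["attributes"][key] = value
        node["revision"] += 1
        return node["revision"]


class HybridCypress:
    """Tree shape in a key-value map; node state in per-cell RSMs."""

    def __init__(self, cell_count):
        self.cells = [NodeCell() for _ in range(cell_count)]
        self.shape = {}  # path -> (node id, cell index); stand-in for k-v storage

    def create(self, path, node_id):
        cell_index = node_id % len(self.cells)  # hypothetical placement policy
        self.cells[cell_index].nodes[node_id] = {"attributes": {}, "revision": 0}
        self.shape[path] = (node_id, cell_index)

    def set_attribute(self, path, key, value):
        node_id, cell_index = self.shape[path]  # step 1: resolve in k-v storage
        # step 2: apply the change inside the owning RSM
        return self.cells[cell_index].set_attribute(node_id, key, value)

    def get_attribute(self, path, key):
        node_id, cell_index = self.shape[path]
        return self.cells[cell_index].nodes[node_id]["attributes"][key]
```

The k-v side only answers "where is this node?", so it can be sharded freely, while each RSM keeps full control over how its objects change.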

karsinkk · on March 22, 2023

I was going over some of the code in the core folder for concurrency, threading and compression, and what surprised me is that there are absolutely no comments whatsoever. I agree that unless there's excellent documentation, open source maintenance might be challenging.

Having said that, this definitely does look to be an impressive feat of engineering!