> Yes, but such situations are incredibly rare IMO. They happen all the time if ...

extr · on Dec 16, 2024

Maybe "rare" was the wrong choice of word. Maybe I should have said such situations are "highly contextual". It's one of those things where if you have to ask the question "Do I need this?" you almost certainly don't. And even in situations where you actually do need it, it's strongly preferred to "find a way not to need it" for day to day things (aka downsample, or be judicious about what you load into the analytics environment). I can only speak to my experience in FAANG, but even though we had the infra and budget to run huge distributed queries, it was highly discouraged unless absolutely needed!

This is my problem with databricks. It seems like in the course of selling their product they have taken the received wisdom of "do not run expensive and complex compute clusters unless absolutely necessary" and turned it into "it's fun and easy to run distributed compute clusters - everyone's doing it and you should too" regardless of how contextually appropriate it is.