It’s about scale. Of course the compute nodes need some local storage for when they spill to disk, but otherwise they compute what is asked of them and submit their work. A coordinator can split a job into pieces, send each piece to a worker, and fan that out across N nodes for a parallel speedup; the partial results are then reconstituted into the final answer.
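Roughly, the coordinator/worker split looks like this. A minimal sketch, with threads standing in for nodes; the `coordinator`/`worker` names and the partial-sum job are just illustrative, not any real system's API:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Each "node" computes only what it was asked to: here, a partial sum.
    return sum(chunk)

def coordinator(data, n_workers=4):
    # Split the work, fan it out to N workers in parallel, then
    # reconstitute the partial results into the final answer.
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(worker, chunks)
    return sum(partials)
```

In a real engine the workers would be separate machines and the "chunks" would be data partitions, but the shape of the dataflow is the same.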
As you mention, there’s an overhead, a coordination tax, to this, sure, but the overall speedups are there.
If you’re talking about one big node with gobs of fast storage and CPUs for compute, then that’s more your traditional scale-up rather than scale-out approach, and for many years that was the conventional wisdom in databases: when things got slow, just move your DB to a bigger and bigger machine. But that has its limits. So scaling horizontally was the answer, and with it came all of its complexity, sure, but it allowed for much better scaling.
If a single node contains the relevant partition(s), i.e. has all the data, then the compute can happen right there and only the result is sent back; no map-reduce work needed.
I envision it as many nodes storing many partitions: initial compute happens locally, and then results are re-partitioned if needed for the next step.
The trivial case is predicate pushdown, where the predicate is applied on the storage backend rather than on a compute node.
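To make that concrete, here's a toy sketch of "filter at storage, then re-partition by key for the next step." The partitions, the predicate, and the hash-based shuffle are all made up for illustration; no real engine's API is assumed:

```python
from collections import defaultdict

# Each inner list stands in for one storage node's partition of (key, value) rows.
partitions = [
    [("us", 3), ("eu", 1), ("us", 7)],
    [("eu", 4), ("us", 2), ("eu", 9)],
]

def scan(partition, predicate):
    # Predicate pushdown: filter on the storage side so only matching
    # rows ever cross the network.
    return [row for row in partition if predicate(row)]

def repartition(local_results, n):
    # Re-shuffle the surviving rows by key so the next stage (say, a
    # GROUP BY) finds all rows for a key in one bucket.
    buckets = defaultdict(list)
    for rows in local_results:
        for key, val in rows:
            buckets[hash(key) % n].append((key, val))
    return buckets

local = [scan(p, lambda r: r[1] > 2) for p in partitions]  # runs per node
shuffled = repartition(local, n=2)                         # network step
```

The point is that the expensive part (scanning) stays local, and only the filtered intermediate rows pay the network tax.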
I think ClickHouse is an example of such an architecture, and they don't have dedicated compute nodes.
Yeah, there's no perfect way to do this, but there are advantages to being flexible, and the use case you mention is a subset of what I am describing.
Imagine a workload that is write heavy and handles very few reads, say 20:1 writes to reads. Then having many storage nodes could speed up the writes, and if you have, say, a fixed set of nodes, allocating 20x more storage nodes than compute nodes might make sense.
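The back-of-envelope arithmetic for that 20:1 split, as a sketch (the proportional-allocation rule here is my own simplification, not a real sizing formula):

```python
def allocate(total_nodes, write_read_ratio):
    # Split a fixed pool of nodes in proportion to the workload mix:
    # with 20:1 writes to reads, 20/21 of the nodes go to storage.
    storage = round(total_nodes * write_read_ratio / (write_read_ratio + 1))
    compute = total_nodes - storage
    return storage, compute

allocate(21, 20)  # → (20, 1): 20 storage nodes per compute node
```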
The whole thing is a science and I do like the flexibility.
I am wondering why the opposite wouldn't make sense; you could greatly reduce over-the-network traffic for heavy queries.