IIRC a few of their blog posts go into detail on cases where, for some operations, the data layout and the access pattern were the main bottleneck rather than raw computation speed.
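To make the layout point concrete, here is a rough sketch of my own (not taken from those posts). It counts how many cache lines get touched when reading one field out of every record, first with an array-of-structs layout and then with a struct-of-arrays layout. The 64-byte line size, the 8-byte fields, and all the function names are assumptions for illustration only.

```python
CACHE_LINE = 64  # bytes; assumed line size, common but not universal


def lines_touched(addresses, line_size=CACHE_LINE):
    """Count the distinct cache lines covered by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})


def aos_addresses(n_records, n_fields, field, elem_size=8):
    """Address of one field in each record, array-of-structs layout.

    Records are stored back to back, so consecutive reads of the same
    field are a whole record apart.
    """
    stride = n_fields * elem_size
    return [i * stride + field * elem_size for i in range(n_records)]


def soa_addresses(n_records, n_fields, field, elem_size=8):
    """Address of the same field, struct-of-arrays layout.

    Each field lives in its own contiguous array, so consecutive reads
    are only one element apart.
    """
    base = field * n_records * elem_size
    return [base + i * elem_size for i in range(n_records)]


n, fields = 1024, 8
aos = lines_touched(aos_addresses(n, fields, field=0))
soa = lines_touched(soa_addresses(n, fields, field=0))
print(f"AoS touches {aos} lines, SoA touches {soa} lines")
```

Both layouts do the same 1024 reads, so the arithmetic is identical, but under these assumptions the array-of-structs version pulls in 8 times as many cache lines. That is the kind of gap where memory access, not compute, sets the speed.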