OK, that makes sense. So does this add to the case for disabling HT when you have a compute-intensive task? Is HT better suited as another level of latency hiding, where the other thread can work while one thread is stalled?
It's pretty rare for a thread to stall completely on modern CPUs, the way you might imagine an in-order CPU doing, but it is common to achieve a throughput of less than 1 instruction/cycle out of a theoretical ~3-4 instructions/cycle in the frontend and 6 µops/cycle in the backend (Sandy Bridge). So another thread helps feed the execution units, even if the first thread has no classical stalls.
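To make that concrete, here is a toy model, not a real CPU simulator. It assumes each thread is a single chain of dependent operations with a fixed 3-cycle latency and that the core can issue up to 4 operations per cycle; both numbers and the name `simulate_ipc` are made up for illustration. Nothing ever "stalls" in the cache-miss sense, yet one thread alone cannot get above 1/3 instruction/cycle, and a second thread fills some of the idle issue slots.

```python
def simulate_ipc(num_threads, latency=3, issue_width=4, cycles=3000):
    """Toy model of SMT: return instructions issued per cycle.

    Assumptions (illustrative only):
    - each thread is one chain of dependent ops, so its next op can
      issue only `latency` cycles after the previous one issued;
    - the core issues at most `issue_width` ops per cycle in total.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        slots = issue_width
        for t in range(num_threads):
            if slots == 0:
                break  # no issue slots left this cycle
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # result is ready `latency` cycles later
                issued += 1
                slots -= 1
    return issued / cycles


print(f"1 thread:  {simulate_ipc(1):.2f} instructions/cycle")  # 0.33
print(f"2 threads: {simulate_ipc(2):.2f} instructions/cycle")  # 0.67
```

Real code has a mix of dependent and independent instructions, so the gain from a second thread is usually much smaller than the 2x this model shows, and the two threads also compete for caches and other shared resources.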