Okay, so the real problem here is that there was effectively a yield() inside a buffer-fill loop with a small block size, and Android decided to make each of those yields cost 40ms. The article made it sound like you were always waiting 15ms, but you were actually waiting 0ms when there was more work to do.
I still think the block size should've been larger for various reasons in this case (it's still a trade-off: larger block sizes are usually mildly more CPU-efficient, besides preventing pathological scheduling cases like this one), but this explanation makes more sense.
Yes, that's all correct. The reason for the small block size has to do with design decisions made in the rest of the streaming stack, which is shared across all TV devices.