1) It’s not — there are lots of procedures called “the bootstrap” that act differently.
2) The fact that “substitute the data for the population distribution” both works and is sometimes provably better than other, seemingly more sensible approaches is a little mind-blowing.
Most things called the bootstrap feel like cheating, i.e., “this part seems hard, let’s do the easiest thing possible instead and hope it works.”
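For anyone who hasn't seen it spelled out, here is a rough sketch of what “substitute the data for the population distribution” means in the simplest (nonparametric) case: resample the observed data with replacement and recompute the statistic each time. The function name `bootstrap_se`, the defaults, and the simulated normal sample are all made up for illustration, and this is only one of the many procedures that go by the name.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement.

    Hypothetical helper for illustration: the observed sample stands in
    for the unknown population distribution.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic computed on a same-size resample.
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)

# Simulated data (an assumption for the demo, not from the thread).
rng = random.Random(1)
sample = [rng.gauss(0, 1) for _ in range(200)]

se_boot = bootstrap_se(sample)
# Textbook standard error of the mean, for comparison.
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(se_boot, se_formula)
```

For the mean, the two numbers should land close to each other; the appeal is that the same few lines work unchanged for statistics that have no convenient formula, for example by passing `stat=statistics.median`.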
I think it's brilliant as an idea but not particularly mysterious after the fact. I love it, but think it's brilliant in part because it's so transparent and simple.
I agree "bootstrap" has expanded a bit in meaning but I think it's basically the same idea.
I used to think it was cheating but have realized there is a cost to it, which is replication: you have to recompute the statistic on many resamples. For many things it's just impractical in terms of computation time. So although it's great, it requires a lot of computation.
Yes. It only tells you about variability within the sampled values. If you don't sample outliers, or get unlucky and sample many non-representative values, it can't tell you what you're missing out on (obviously).
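A quick way to see this limitation, as a toy sketch with simulated exponential data (the sample size and number of resamples are arbitrary choices): every bootstrap resample is drawn from the observed values, so no resample can ever contain anything larger than the largest value you happened to observe.

```python
import random

rng = random.Random(0)

# Simulated skewed data (an assumption for the demo).
sample = [rng.expovariate(1.0) for _ in range(50)]
sample_max = max(sample)

# Maximum of each of 1000 bootstrap resamples (drawn with replacement).
resample_maxes = [
    max(rng.choices(sample, k=len(sample))) for _ in range(1000)
]
largest_seen = max(resample_maxes)

# The resampled maxima are always values from the original sample,
# so they can never exceed the observed maximum.
print(largest_seen <= sample_max)
```

The population here has no upper bound, but the bootstrap distribution of the maximum is capped at `sample_max`, which is exactly the "can't tell you what you're missing" problem.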