Curiously enough, today, such a hack is completely pointless (at least on x86): the rsqrtps instruction can do 4 of these in just 3 clock cycles, with higher accuracy to boot.
Most modern instruction sets with remotely decent floating point support have a similar instruction, in large part because of the prevalence of hacks like this in the past.
Yes, but what a beautiful little hack it is. The fact that modern processors implement it in hardware is actually a tribute to this tricks ingenuity and elegance.
After all, to have major chip manufacturers implement your software hack in hardware is quite a compliment, assuming they use the same mechanism.
This particular hack, yes, but the general point stands. Sometimes hacks make for great code. The point is that this is good code:
float h = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
return x*(1.5f - h*x*x);
That's 3 local variables, 5 short statements... makes me wonder how quickly one could search/test all combinations of sensible 5-statement functions to find other such "hacks"...
Yeah, though at that point I'd be start to worry about whether that's a fair L1 cache tradeoff, given that it only saves a single (integer!) instruction and requires 512 bytes of memory.
Actually it is not completely pointless today: Some months ago I used it to speed up an force-directed algorithm which spent 80% of its time in calculating normalized vectors between two points - and the fast inverse sqrt actually helped to speed up the program by a factor of 2.5 to 3.0. The reason why i used it is quite simple: In (unsafe) .NET code you can execute all the necessary arithmetic operations (pointer casts, bit shifts) but you are not allowed to execute arbitrary x86 operations.
Most modern instruction sets with remotely decent floating point support have a similar instruction, in large part because of the prevalence of hacks like this in the past.