
> but it has nothing to do with big O notation

If this has nothing to do with Big-O, then Big-O is a useless concept and should be abandoned. You may be interested in measuring the number of CPU instructions it takes to traverse a linked list, in which case O(N) is accurate.

Personally I care about how much wall-clock time it takes, in which case a linked list is very much not O(N), and the larger the list the further from O(N) it gets. There is a rather large delta between a 10 KB linked list and a 100 MB one. You can go test this for yourself right now.
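A minimal sketch of that test (Python, with the linked list simulated as an array of shuffled next-indices; in CPython, interpreter overhead can partially mask the cache effect, so a C version shows it more starkly, and exact numbers vary by machine):

```python
import random
import time

def traverse_time_per_node(n):
    """Build a linked list whose nodes are scattered pseudo-randomly
    (simulated as an array of next-indices), then time a full traversal."""
    order = list(range(n))
    random.shuffle(order)
    nxt = [0] * n
    for a, b in zip(order, order[1:]):
        nxt[a] = b
    nxt[order[-1]] = -1  # sentinel marking the end of the list

    start = time.perf_counter()
    i, visited = order[0], 0
    while i != -1:
        visited += 1
        i = nxt[i]
    elapsed = time.perf_counter() - start
    assert visited == n  # every node was reached exactly once
    return elapsed / n  # seconds per node

# Per-node cost tends to climb once the list outgrows the caches,
# even though the node count itself scales linearly.
for n in (10_000, 1_000_000):
    print(n, traverse_time_per_node(n))
```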

> Random access in memory is still in O(1)

No it isn't. Random access only looks like O(1) at certain plateaus, like size of L1, size of L2, size of L3, size of RAM on the local NUMA node, size of RAM on neighboring NUMA nodes, size of RAM on distant NUMA nodes, size of SSD, size of HDD, size of NAS locally connected, size of NAS distantly connected, size of cloud storage.

Constants can matter in Big-O analysis, especially if they are very large. Compared to the number of instructions a 3 GHz CPU must execute to do a hash lookup, accessing RAM has a relatively large constant factor, and execution time can be massively impacted by sub-optimal RAM access patterns.



Except Big-O was never intended to describe wall clock time. As soon as you try to use Big-O to describe a number of concrete units you'll fail. Big-O is purely a comparative measure, this algorithm vs. that algorithm. As it stands, it is fairly meaningless outside of purely academic exercises.

EDIT: I mean, consider that the entire basis for Big-O notation assumes that every "operation" costs the same. Adding is always constant. Big-O is very hand-wavy, and claiming the constants matter is to invent an entirely new notation.


Big-O isn't hand-wavy, it was absolutely intended to describe (among other things!) wall-clock time, and the article isn't just talking about constants. Also, many operations (like adding) are constant-time on modern computers, outside of memory hierarchy effects, ALU stalls, etc., since they're bound to finish within a single clock cycle.


Big-O is mathematically rigorously defined, but has nothing to do with wall-clock time. It's about the asymptotic behaviour of functions. How you interpret the values of these functions is up to you - of course you can model wall-clock time, but then you'd probably be interested in all the constant factors and lower-order terms that asymptotics hide from you, so Big-O analysis is the wrong tool.


If wall-clock time is being affected by non-constant factors (as the article convincingly asserts), why on earth wouldn't you want to model those factors with big-O notation? Moreover, why do you think that people uninterested in wall-clock time wouldn't care about constant factors, but people interested in wall-clock time unilaterally would?


If your program does two steps, one of which takes time O(n) and the other O(√n), then your program takes time O(n) because that dominates. That's an example of lower order terms. If you measure wall-clock time, then you'd be interested to know about that second step. But if you're looking at asymptotic analysis, then it's completely irrelevant because the linear term dominates it.
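The dominance claim is easy to check numerically; a quick sketch showing that the √n step's share of the total running time vanishes as n grows:

```python
import math

# f(n) = n + sqrt(n): as n grows, the sqrt(n) term's fraction of the
# total shrinks toward zero, which is why O(n + sqrt(n)) = O(n).
for n in (10**2, 10**4, 10**8):
    total = n + math.sqrt(n)
    print(n, math.sqrt(n) / total)
```

At n = 100 the second step is still about 9% of the total; by n = 10^8 it is about 0.01%.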

Your other questions are all best answered by: Big-O notation is not the right tool if you want to make predictions about wall-clock time. I don't think I need to justify why someone interested in wall-clock time would be very interested in the constant factors involved - the difference between n and 20n is a factor of twenty, and whether a program takes half a second or ten does make quite the difference.

But asymptotically, they're the same. Someone who is interested in the asymptotic behaviour wants to know how fast the function grows - is it linear? linearithmic? quadratic? cubic? exponential? (with which base - 1.1^n, 2^n, 10^n?). For them, a constant factor isn't interesting.

Of course, neither extreme is particularly helpful when trying to compare algorithms in practice. Modelling the hardware extensively and including all the little constant factors is exceedingly tedious, even if you can actually get all the information required to do so. Pure asymptotics also don't help much if they hide huge constant factors. That's why Algorithm Engineering is a thing (and a very good thing!). Basically, it tries to bridge the gap between algorithms that theorists like, and those that practitioners use. To that end, design, analysis, implementation, and experimentation must form a circle and influence each other. Wikipedia probably does a better job of explaining it, though.


The article points out that accessing n words of RAM takes O(n√n), which is not dominated by O(n), which is my entire point here. And in the cache-oblivious model (for example), big-O notation is still alive and well; there are just additional variables introduced to account for the various constants.
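A quick sketch of why that difference matters in practice (this is just the article's cost model evaluated at two sizes, not a measurement):

```python
import math

# Flat model: touching n random words costs ~ c * n.
# The article's model: it costs ~ c * n * sqrt(n).
def flat(n):
    return n

def sqrt_model(n):
    return n * math.sqrt(n)

# Growing the working set 100x predicts a 100x slowdown under the
# flat model, but a 1000x slowdown under the n*sqrt(n) model.
n1, n2 = 10**4, 10**6
print("flat model scales by", flat(n2) / flat(n1))
print("sqrt model scales by", sqrt_model(n2) / sqrt_model(n1))
```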



