Heh, I'm usually on the ideas "side", but since I've publicly shot down the data centers in space idea, the shoe is on the other foot.
I'll just say that if there are some obvious, enormous drawbacks to an idea, then you are responsible for addressing them when you present it. "Do X by seemingly breaking the laws of physics" had better be accompanied by at least some mention of how you are not, in fact, breaking the laws of physics.
There are two sides, the one presenting an idea and the one receiving it, and both have responsibilities. The article is about one way that one side often fails to meet one of its responsibilities. No more than that.
I read it as "shooting down" implying low-effort "I'm going to kill this because it makes me uncomfortable" type responses, not legitimate critique. Well-calculated rejection is indeed a valuable skill, but also hopefully won't come across as "shooting down".
It's tough to write any article about any part of this topic, because there's so much nuance and the nuance matters. Yet none of us would read a post that captured the nuance, it'd be way too long and probably cover too much that is obvious. (But maybe now I'm shooting down someone who might attempt to write such a thing... please do, it's worth a try!)
I wrote a couple of books about testing. Yes, writing nuance is really hard. One of my readers noticed I contradicted myself across two different sections, due to a single missing word.
Mercurial was written entirely in Python for quite a while.
But more to the point, I doubt there are many ideas for which the choice of implementation language is core to the idea. Maybe that's how it was presented, but that's usually because you need a concrete realization of an idea in order for people to even get what you're talking about.
It's not a plan, it's an idea. You're shooting down an idea for not being a plan. The best person for coming up with the idea will probably also come up with some of the pieces of the plan, but they're unlikely to be the best person to figure out all of it. That's why you have a company, not a sole proprietorship.
> You're shooting down an idea for not being a plan.
If you are pitching an idea out of nowhere, then I think it had better have a semblance of a plan; otherwise you are just wasting everyone's time.
Like maybe it's a bit different if you are brainstorming for an acknowledged problem, but that is not what the article made it sound like.
The article made it sound like the idea was being pitched unsolicited, with no clear problem it was trying to solve and no clear plan on how to do it. After all, two of the so-called cheap criticisms were people asking why we want to do this ("the customers aren't asking for it") and how we are going to do it when it has dependencies on stakeholders who have not bought in ("devops doesn't like it").
Why would anyone care about such an idea? If you want to work on something by yourself, you don't have to convince anyone, but if you want other people on board, you are going to have to answer basic questions. Questions like: what benefit would implementing this idea bring me, and will my effort on this idea be a waste because necessary stakeholders aren't on board?
There are a lot of details that can be sorted out on the way. Things like, why would we even want to do this in the first place, is not one of them.
And shooting down shit is also valuable. It is fine to have ideas without thinking them through, and it is also fine to criticize those ideas without thinking through the criticism. That is how we figure out how the ideas could work.
The problem with this is that the article literally says:
> The person proposing has been thinking about this for weeks or months. They've tested pieces of it in their head or even built proofs of concept. They understand things about the idea that aren't obvious yet. And they're trying to explain all of this to a room full of people encountering it for the first time.
If they did that much upfront work, it's more than an idea. And if it's that easily shot down, they should have done even more upfront work and probably slowly gotten others involved.
Honestly, it sounds like someone so desperate for credit, so worried that someone will steal the idea, that they feel compelled to unveil it in a large gathering that was convened for some other purpose. And that never goes well.
Ideas truly are a dime a dozen. If one gets shot down, then you can reflect whether that was warranted, and try again with the same idea if not.
If you're really emotionally invested in it, as the guy writing the article seems to be, then you damn well better have more than just an idea, and you should understand enough about human nature to slowly try to bring individuals onboard to help before you put it out in front of a big crowd.
Perf is not normally distributed. Or rather, it is very very common for it to not be. Where I work, we often see multimodal (usually mostly bimodal) distributions, and we'll get performance alerts that when you look at them are purely a result of more samples happening to shift from one mode to another.
It's easy to construct ways that this could happen. Maybe you're running a benchmark that does a garbage collection or three. It's easy for a GC to be a little earlier or a little later, and sneak in or out of the timed portion of the test.
Warm starts vs cold starts can also do this. If you don't tear everything down and flush before beginning a test, you might have some amount of stuff cached.
The law of large numbers says you can still make it normal by running enough times and adding them up (or running each iteration long enough), but (1) that takes much longer and (2) smushed together data is often less actionable. You kind of want to know about fast paths and slow paths and that you're falling off the fast path more often than intended.
As usual you can probably cover your eyes, stick your fingers in your ears, and proceed as if everything were Gaussian. It'll probably work well enough!
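To make the bimodal case concrete, here's a toy sketch (the mode locations, mixture weights, and sample counts are all made up for illustration) of a benchmark where neither the fast path nor the slow path gets any slower, yet the mean shifts enough to trip a mean-based alert:

```python
import random
import statistics

random.seed(0)

def sample_latency(fast_fraction):
    """Draw one latency sample from a bimodal distribution:
    a fast path (~10 ms) and a slow path (~50 ms)."""
    if random.random() < fast_fraction:
        return random.gauss(10.0, 1.0)   # fast path
    return random.gauss(50.0, 2.0)       # slow path

# Two runs of the "same" benchmark; only the fast/slow mix differs.
baseline = [sample_latency(0.90) for _ in range(10_000)]
regressed = [sample_latency(0.85) for _ in range(10_000)]

# The mean shifts noticeably even though neither mode moved.
print(f"baseline mean:  {statistics.mean(baseline):.1f} ms")
print(f"regressed mean: {statistics.mean(regressed):.1f} ms")
```

The mean jumps roughly 2 ms here purely because samples shifted between modes, which is exactly the kind of alert that looks like a regression but isn't one in either code path.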
> We ran benchmarks comparing bisect vs bayesect across flakiness levels. At 90/10, bisect drops to ~44% accuracy while bayesect holds at ~96%. At 70/30 it's 9% vs 67%.
I don't understand what you're comparing. Can't you increase bayesect accuracy arbitrarily by running it longer? When are you choosing to terminate? Perhaps I don't understand this after all.
Yes, bayesect accuracy increases with more iterations. The comparison was at a fixed budget (300 test runs). Sorry, I should have clarified that.
You're right, at 300 tests bayesect converges to ~97-100% across the board. I reran with calibration.py and confirmed.
Went a step further and tested graph-weighted priors (per-commit weight proportional to transitive dependents, Pareto-distributed). The prior helps in the budget-constrained regime:
128 commits, 500 trials:
Budget=50, 70/30: uniform 22% → graph 33%
Budget=50, 80/20: uniform 71% → graph 77%
Budget=100, 70/30: uniform 56% → graph 65%
At 300 tests the gap disappears since there's enough data to converge anyway. The prior is worth a few bits, which matters when bits are scarce.
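For anyone wondering what the Bayesian updating looks like, here's a toy sketch. The 128 commits, 70/30 flakiness, and 300-run budget are taken from the numbers above; the likelihood model and the "test the posterior median commit" rule are my own simplifications, not necessarily what git_bayesect actually does:

```python
import random

random.seed(1)

N = 128                 # commits; exactly one introduced the bug
P_FAIL_BAD = 0.70       # failure rate once the bug is present (70/30 flakiness)
P_FAIL_GOOD = 0.10      # spurious failure rate before the bug
culprit = 42            # hidden ground truth for the simulation

def run_test(i):
    """Flaky test at commit i: fails probabilistically."""
    p = P_FAIL_BAD if i >= culprit else P_FAIL_GOOD
    return random.random() < p

# Uniform prior over which commit is the culprit.
posterior = [1.0 / N] * N

for _ in range(300):    # fixed budget of test runs
    # Test the posterior median commit (a cheap stand-in for
    # picking the most informative commit).
    acc = 0.0
    for i, p in enumerate(posterior):
        acc += p
        if acc >= 0.5:
            break
    failed = run_test(i)
    # Bayes update: likelihood of this outcome under each hypothesis c.
    for c in range(N):
        p_fail = P_FAIL_BAD if i >= c else P_FAIL_GOOD
        posterior[c] *= p_fail if failed else (1.0 - p_fail)
    total = sum(posterior)
    posterior = [p / total for p in posterior]

best_guess = max(range(N), key=lambda c: posterior[c])
print("best guess:", best_guess)
```

The key difference from plain bisect is that a single flaky failure only nudges the posterior instead of irreversibly discarding half the range, which is why the accuracy gap shows up at high flakiness.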
This doesn't sound quite right, but I'm not sure why.
Perhaps: a reasonable objective would be to say that for N bits of information, I would like to pick the test schedule that requires the least total elapsed time. If you have two candidate commits and a slow recompile time, it seems like your algorithm would do many repeats of commit A until the gain in information per run drops below the expected gain from B divided by the recompile time, then it would do many repeats of B, then go back to A, etc. So there are long runs, but you're still switching back and forth. You would get the same number of bits by doing the same number of test runs for each commit, but batching all of the A runs before all of the B runs.
Then again: you wouldn't know how many times to run each in advance, and "run A an infinite number of times, then run B an infinite number of times" is clearly not a winning strategy. Even with a fixed N, I don't think you could figure it out without knowing the results of the runs in advance. So perhaps your algorithm is optimal?
It still feels off. You're normalizing everything to bits/sec and choosing the maximum. But comparing an initial test run divided by the rebuild time vs a subsequent test run divided by a much faster time seems like you're pretending a discrete thing is continuous.
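To pin down what "normalizing everything to bits/sec" means here, this is a sketch of the greedy rule as I understand it from the discussion. The TEST_TIME and REBUILD_TIME values are made up, and the flaky-test likelihood model is an assumption; this is not claimed to be git_bayesect's actual policy:

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def posterior_after(posterior, i, failed, p_bad=0.7, p_good=0.1):
    """Bayes update for one flaky-test outcome at commit i."""
    post = []
    for c, p in enumerate(posterior):
        p_fail = p_bad if i >= c else p_good
        post.append(p * (p_fail if failed else 1.0 - p_fail))
    z = sum(post)
    return [p / z for p in post]

def expected_gain(posterior, i, p_bad=0.7, p_good=0.1):
    """Expected entropy reduction from one test run at commit i."""
    p_fail = sum(p * (p_bad if i >= c else p_good)
                 for c, p in enumerate(posterior))
    h_after = (p_fail * entropy(posterior_after(posterior, i, True)) +
               (1.0 - p_fail) * entropy(posterior_after(posterior, i, False)))
    return entropy(posterior) - h_after

TEST_TIME, REBUILD_TIME = 1.0, 30.0   # assumed costs, in seconds

def next_commit(posterior, current):
    """Greedy bits-per-second choice: switching commits pays the rebuild."""
    def rate(i):
        cost = TEST_TIME + (REBUILD_TIME if i != current else 0.0)
        return expected_gain(posterior, i) / cost
    return max(range(len(posterior)), key=rate)
```

With a large REBUILD_TIME this rule keeps rerunning the current commit until its marginal information drops below another commit's gain amortized over the rebuild, which produces exactly the long alternating A/B runs described above, and the discontinuity at the switch is where the continuous bits/sec framing gets shaky.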
The general requirement for this approach to be optimal is called "dynamic consistency". A good description is in [1]. It is the situation where, suppose you have a budget B, and you search until your budget is exhausted. Then you are informed that there is an additional budget, B2, and you can continue searching until that is exhausted. A situation is dynamically consistent if, for any B, B2, the optimal strategy is such that you would make the same choices whether or not you know that you will get B2.
So you are correct that discreteness is a problem, because if you are nearing the end of the budget you may optimally prefer to get more dice rolls than take bigger bets. But the optimal solution is then often analytically intractable (or at least it was - I last read about this a while back), and the entropy approach is often reasonable anyway. (For cases where search effort is significant, a good search plan can be found by simulation).
Note that "pick the commit with best expected information gain" in git_bayesect isn't optimal even in the no overhead regime. I provide a counterexample in the writeup, which implies ajb's heuristic is also not optimal. I don't see a tractable way to compute the optimal policy.
One idea: if you always spend time testing equal to your constant overhead, I think you're guaranteed to be not more than 2x off optimal.
(and agreed with ajb on "just use ccache" in practice!)
> My idea was to make X-to-safe-lang translators, X initially being Python and Javascript.
Both of those languages are already safe. Then you talk about translating to C, so you're actually doing a safe-to-unsafe translation. I'm not sure what properties you're checking with the static analysis at that point. I think what would be more important is that your translator maintains safety.
I hastily wrote that. I probably should've said high-performance systems languages that can be made safe and turned into a single executable. Preferably with good support for parallelism and concurrency. That's mostly Rust or safe subsets of C and C++ with static analysis.
Python can do the algorithms. It's quick to develop and debug. There's tons of existing code in data science and ML fields. It's worse in the other areas I mentioned, though.
So, a transpiler that generated Rust or safe C/C++ from legacy and AI-generated Python could be a potent combination. What do you think about that?