
I was a fly on the wall as this work was being done and it was super interesting to see the discussions. I was also surprised that Jepsen didn’t find critical bugs. Clarifying the docs and unusual (intentional) behaviors was a very useful outcome. It was a very worthwhile confidence building exercise given that we’re running a bank on Datomic…


Given that Rich Hickey designed this database, the outcome is perhaps unsurprising. What a fabulous read - anytime I feel like I’m reasonably smart, it’s always good to be humbled by a Jepsen analysis


A good design does not guarantee the absence of implementation bugs. But a good design can make introducing bugs harder / less probable. That must be what happened here, and it's a case worth studying and maybe emulating.


What bank is that, if I may ask?



The first fully digital Brazilian bank; it got pretty big in a decade.

I'd love to hear the story from the first engineers, how they got support for this, etc. They never did tech blog posts though...


Ed Wible is a founder of Nubank and chose Datomic. He and Lucas Cavalcanti gave a talk on it at Clojure/conj 2014.

https://www.youtube.com/watch?v=7lm3K8zVOdY


Ed's post when Cognitect joined Nubank is still a great read: https://building.nubank.com.br/welcoming-cognitect-nubank/


There are some videos, both of the start and of their progress. Some of the most impressive work I have ever seen, remarkable.


> I was also surprised that Jepsen didn’t find critical bugs.

From the report: “...we can prove the presence of bugs, but not their absence...”


In practical terms, if you are a database and Jepsen doesn't find any bugs, that's as much assurance as you are going to get in 2024 short of formal verification.


Formal verification is very powerful but still not full assurance. Fun fact: Testing and monitoring of Datomic has sometimes uncovered design flaws in underlying storages that formal verification missed.


Is there anything I can read about what capabilities Datomic requires of the underlying storages it uses?


What kind of flaws? I would expect performance problems.


To start with, you usually perform verification on a behavioral model rather than on the code itself. That opens up the possibility of behavioral differences between the code and the model that wouldn't be caught.


Could you offer an example?


If you work through the TLA+ tutorial[1] it will help you get a good idea of the benefits and limitations of verification.

[1] https://learntla.com/
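To make the gap concrete, here's a minimal, hypothetical sketch (plain Python, nothing from TLA+ or Datomic): a model that treats an increment as one atomic step can be proven correct, while an implementation that splits it into a read and a write admits an interleaving the model never considers.

    # Hypothetical illustration of a model/implementation gap: the verified
    # model says "increment happens atomically", but the code below performs
    # a separate read and write. Interleaving two increments by hand at the
    # step level shows a lost update that the atomic model cannot exhibit.

    counter = 0

    def read():
        return counter

    def write(value):
        global counter
        counter = value

    # Two "concurrent" increments, interleaved at the read/write level.
    a = read()    # thread A reads 0
    b = read()    # thread B reads 0
    write(a + 1)  # thread A writes 1
    write(b + 1)  # thread B also writes 1 -- A's update is lost

    print(counter)  # prints 1, not 2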


The work Antithesis has been doing here has me really excited as well.


That's consistent with the usual definition of "finding" anything.


"Absence of evidence is not evidence of absence."


Thank you. I've updated my initial guess of p(critical bugs | did not find critical bugs) from 0.5 to 0.82 given my estimate of likelihood and base rates.


If you've looked, it is. The more and the better you look, the better evidence it is.


If you run it through Bayes' theorem, it adjusts the posterior very little.


If a test almost always finds something, then the failure of that test to find something is strong evidence.


I'd be happy to see your numbers for estimated likelihood, prior, and marginal probability if you have them. I'm curious what you get.


s/evidence/proof/.

Evidence of absence (“we searched really carefully and nothing came up”) does update the Bayesian estimate significantly, so the probability that bugs are absent can now be judged much higher.
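For concreteness, a toy Bayes calculation (the numbers are made up purely for illustration, not estimates about Datomic or Jepsen): if a thorough test would miss existing critical bugs only 20% of the time, a clean result pulls a 50% prior down to roughly 17%.

    # Toy Bayes update with made-up numbers -- illustration only,
    # not an estimate about Datomic or Jepsen's actual miss rate.
    p_bugs = 0.5              # prior: critical bugs exist
    p_pass_given_bugs = 0.2   # test finds nothing despite bugs existing
    p_pass_given_clean = 1.0  # no bugs, so nothing to find

    # Marginal probability of the test finding nothing.
    p_pass = p_pass_given_bugs * p_bugs + p_pass_given_clean * (1 - p_bugs)

    # Posterior: bugs exist given the test found nothing.
    posterior = p_pass_given_bugs * p_bugs / p_pass
    print(round(posterior, 3))  # 0.167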


Did you not do this work yourself before you started running the bank on it?


I doubt any organization that isn't directly putting lives on the line is testing database technology as thoroughly and competently as Jepsen. Banks' job is to be banks, not to be Jepsen.


I would have thought they would be more rigorous, since mistakes could threaten the very viability of the business. Which is why I assume most are still on mainframes. (I've never worked at a bank.)


Banks have existed since long before computers, and thus have ways to detect and correct errors that are not purely technological (double-entry bookkeeping, backups, supporting documentation, separate processes). So a bank can survive a database doing nasty things at a low enough frequency that it isn't caught beforehand, and they don’t need to “prove in Coq” that everything is correct.


Anyone who has worked in a bank and is glad of its solutions is either a fool, clueless, or a politician.

Banks have to answer to regulation, and they do so by doing the bare minimum they can get away with.


Mistakes don't threaten them that much. When Equifax (admittedly not a bank) can make massive, negligent fuckups and still be a going concern, there isn't much heat there. Most fuckups a bank makes can be unwound.


Mainframe systems aren't tested to the Jepsen standard just because they were built on mainframes in the 70s/80s. In fact, quite the opposite.


Banks are not usually run by people who go for the first fad.js they see; they can usually also think further ahead than 5 minutes.

Also, I'm sure they engineer their systems so that every operation and action is logged multiple times and has multiple layers of redundancy.

A main transaction DB will not be a "single source of truth" for any event. It will be the main source of truth, but the ledger you see in your online bank is only a simplified view into it.



