Legend has it that one of the early ESS systems ran into something like this.

Telephone switching systems have functionally zero downtime. They're designed to be fully modular and entirely hot-swappable: the kind of thing Erlang was built for, and they give Z-series a run for their money. So you have this hulking room-sized[1] brute of a telephone switch which cannot ever fail and, if it does, must always do so gracefully and with plenty of warning.

And one day it just falls over. No warning, no graceful failover to a redundant system, just poof, gone. After much wailing and gnashing of teeth the root cause is identified: the system could tolerate n drives failing without issue, and n+1 failed simultaneously.

At this point, stories differ: this was either the beginning of a "no two drives from the same manufacturer" policy or the end of the career of a PHB who vetoed said policy on grounds of excessive cost.

[1] http://www.montagar.com/~patj/phone-switches.htm


