Yes, equivalent. Did endure, repeatedly. Demonstrated to auditors to maintain compliance. They would pick the zone to cut off. We couldn't bias the test. Literal clockwork.
I'll let people guess for the sport of it; here's a hint: there were at least 30 of them, composed of Real Datacenters. Thanks for the doubt, though. Implied or otherwise.
Just letting you know how this response looks to other people -- Anon1096 raises legitimate objections, and their post seems very measured in its concerns, not even directly criticizing you. But your response here is very defensive and a bit snarky. Really, I don't think you even respond directly to their concerns: they say they'd want to see scale equivalent to AWS because that's the best way to see the wide variety of failure modes, but you mostly emphasize the auditors, which is good but not a replacement for massive real load and the issues that come along with it. It feels miscalibrated to Anon's comment. As a result, I actually trust you less. If you can respond to Anon's comment without being quite as sassy, I think you'd convince more people.
I appreciate the feedback, truly. Defensive and snarky are both fair, though I'm not trying to convince. The business and practices exist, today.
At the risk of more snark [well-intentioned]: Clouds aren't the Death Star; they don't have to have an exhaust port. It's fair that the first one does... for a while.
Ya, I totally believe that cloud platforms don't need a single point of failure. In fact, seeing the vulnerability makes me excited, because I realize there is _still_ potential for innovation in this area! To be fair it's not my area of expertise, so I'm very unlikely to be involved, but it's still exciting to see more change on the horizon :)
What company did you do it with, can you say? Definitely, they may have been an early mover, but they can (and I'll say will!) still be displaced eventually; that's how business goes.
It's fine if someone guesses the well-known company, but I can't confirm/deny; I like privacy a bit too much and the post is a bit too spicy. This wasn't a VC darling, to be fair. I overstated my involvement with 'made' for effect. A lot of us did the building and testing.
Definitely, that makes sense. Ya, no worries at all. I think we all know these kinds of things involve 100+ human work-years, so at best each of us just has some contribution to them.
> think we all know these kinds of things involve 100+ human work-years
No kidding! The customers differ (business/finance/governments), but the volume [systems/time/effort] was comparable to Amazon's. The people involved in the audits were consumed for practically a whole quarter, if memory serves. Not necessarily by the testing itself: first planning, then sharing the plan, then dreading the plan.
Anyway, I don't miss doing this at all. Didn't mean to imply mitigation is trivial, just feasible :) 'AWS scale' is all the more reason to do business continuity/disaster recovery testing! I guess I find it surprising that this is surprising.
Competitors have an easier time avoiding the creation of a Gordian Knot with their services... when they aren't making a new one every week. There are significant degrees to PaaS; a little focus [not bound to a promotion packet] goes a long way.
Yes, it was something we would do to maintain certain contracts. Sounds crazy, but it isn't: they used a significant portion of the capacity anyway. They brought the auditors.
Real People would notice/care, but financially, it didn't matter. The contract said the edge had to be lost for a moment and then restored. I've played both Incident Manager and SRE in this routine.
edit: Less often we'd do a more thorough test: power loss/full recovery. We'd do the disconnect more regularly, given its simplicity.
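For anyone curious what that kind of drill looks like in the abstract, here's a minimal sketch. Everything in it is hypothetical (the zone name, the disconnect/restore helpers, the health probe); it only shows the shape of the exercise, an auditor-picked edge cut with verified recovery, not our actual tooling:

```python
# Minimal sketch of an edge-disconnect drill, in the spirit of the one described
# above. Every name here is hypothetical: cut one zone off at the edge, watch the
# rest of the region from outside, restore the edge, and verify recovery.

import time

def disconnect_edge(zone: str) -> None:
    """Hypothetical: withdraw the zone's routes / drop its uplinks."""
    print(f"[drill] disconnecting edge for {zone}")

def restore_edge(zone: str) -> None:
    """Hypothetical: re-advertise routes and bring the uplinks back."""
    print(f"[drill] restoring edge for {zone}")

def region_healthy(excluding: str) -> bool:
    """Hypothetical: probe the region's public endpoints from outside,
    expecting the surviving zones to keep serving."""
    return True

def run_drill(target_zone: str, outage_seconds: int = 300) -> bool:
    disconnect_edge(target_zone)
    try:
        deadline = time.time() + outage_seconds
        while time.time() < deadline:
            if not region_healthy(excluding=target_zone):
                return False                  # failover didn't hold
            time.sleep(10)
    finally:
        restore_edge(target_zone)             # edge always comes back, pass or fail
    return region_healthy(excluding=target_zone)

if __name__ == "__main__":
    # The auditors pick the zone; we just run it.
    print("drill passed" if run_drill("zone-b", outage_seconds=30) else "drill failed")
```

The only part that matters is the shape: the cut is externally chosen, the check is done from outside the zone, and the restore happens whether or not the drill passes.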
If you go far enough up the pyramid, there is always a single point of failure. Also, it's unlikely that 1) all regions have the same power company, 2) all of them are on the same payment schedule, and 3) all of them would actually shut off a major customer at the same time without warning. So, in your specific example, things are probably fine.
No. It’s just that in my entire career when anyone claims that they have the perfect solution to a tough problem, it means either that they are selling something, or that they haven’t done their homework. Sometimes it’s both.
For what's left of your career: sometimes it's neither. You're confused; perfection? Where? A past employer, which I've deliberately not named, is selling something; I've moved on. Their cloud was designed with multi-zone regions and, importantly, realizes the benefit: it respects the boundaries. Amazon, and you, apparently have not.
Yes, everything has a weakness. Not every weakness is comparable to 'us-east-1'. Ours was billing/IAM. Guess what? They lived in several places with effective and routinely exercised redundancy. No single zone held this much influence. Service? Yes, that's why they span zones.
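To make "no single zone held this much influence" concrete, here's a toy sketch of a quorum-style control-plane read. The per-zone replicas and the majority rule are illustrative assumptions, not the actual billing/IAM design:

```python
# Toy sketch of a control-plane lookup that doesn't depend on any single zone.
# The per-zone "replicas" and the majority rule are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor

REPLICAS = {
    "zone-a": lambda key: ("alice", 7),   # (value, version) as seen by that zone
    "zone-b": lambda key: ("alice", 7),
    "zone-c": lambda key: None,           # this zone is cut off; no answer
}

def quorum_read(key: str, needed: int = 2):
    """Answer once a majority of zones respond; newest version wins."""
    with ThreadPoolExecutor(max_workers=len(REPLICAS)) as pool:
        answers = [a for a in pool.map(lambda read: read(key), REPLICAS.values()) if a]
    if len(answers) < needed:
        raise RuntimeError("quorum lost: more than one zone unavailable")
    return max(answers, key=lambda pair: pair[1])[0]

print(quorum_read("account-1234"))   # still answers with zone-c dark
```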
Said in the absolute kindest way: please fuck off. I have nothing to prove or, worse, sell. The businesses have done enough.
Yea, let's play along. Our CEO is personally choosing not to pay an entire class of partners across the planet. Are we even still in business? I'm so much more worried about being paid than about this line of questioning.
A Cloud with multiple regions, or zones for that matter, that all depend on one is a poorly designed Cloud; mine didn't, AWS does. So, let's revisit what brought 'whatever1' here:
> Your experiment proves nothing. Anyone can pull it off.
Fine, our overseas offices are different companies and bills are paid for by different people.
Not that "forgot to pay" is going to result in a cut off - that doesn't happen with the multi-megawatt supplies from multiple suppliers that go into a dedicated data centre. It's far more likely that the receivers will have taken over and will pay the bill by that point.