> I, personally, have seen many projects burning thousands a month in database costs because they prefer replicating the environment for testing branches.
If this is the premise for a serverless database, it's a weak start. If you really need a lightweight DBMS for testing, just run MySQL or PostgreSQL in Docker. If you really need access to production-like data (e.g., a lot of it, so you get realistic distributions), run the same DBMS on cheap hardware or cheap instances. In both cases you can use persistent volumes and shut things down when they're not in use. Few people really care if it takes a few minutes to spin up a test environment.
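As a minimal sketch of that setup, here's how the throwaway-Postgres-in-Docker approach with a persistent named volume might look. Everything here (the container name, volume name, password, port, and the postgres:15 tag) is a placeholder I chose for illustration, not something from the comment:

```python
# Sketch: a throwaway Postgres for testing, run in Docker with a named
# volume so the data persists across stop/start. All names here
# (pgtest, pgtest-data, the password, the port) are placeholders.

def start_cmd(name="pgtest", volume="pgtest-data",
              password="secret", host_port=5433):
    """Build the `docker run` argv; pass it to subprocess.run()."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"POSTGRES_PASSWORD={password}",
        # named volume: the data survives `docker stop` / `docker rm`
        "-v", f"{volume}:/var/lib/postgresql/data",
        "-p", f"{host_port}:5432",
        "postgres:15",
    ]

def stop_cmd(name="pgtest"):
    """Stop the container when the test environment is idle."""
    return ["docker", "stop", name]
```

The point is the `-v` flag: because the volume lives outside the container, you can stop the whole thing outside office hours and pay nothing extra, then bring it back in minutes.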
As for the main point, Aurora offering a "serverless" architecture: it looks as if what they've really done is enable the DBMS compute layer to scale up and down quickly. I wonder whether this optimization fell out of pushing redo-log management down into the storage layer. (See Section 3.1 of https://www.allthingsdistributed.com/files/p1041-verbitski.p... for details.)
Exactly! It's also ridiculous that the "serverless" label is being used to push this agenda.
"Serverless" DBs are also very pricey once you consider everything. It isn't actually that hard to run some workloads at huge scale for cheap; there are two good articles on this (one from Discord, one of ours).
Thanks for the references! Just read the blog article. It was kind of obvious from the first paragraph that it would be a Cassandra deployment story. ;)
Gun is new to me. You have a nice way of handling distributed consistency. It's intriguing that the system can still reach consistency even if every node temporarily loses network connectivity. Is there any academic work behind this?
Thank you! Yes, there is currently just a whitepaper we're publishing with Stanford, which was also reviewed by colleagues at MIT. It isn't up to snuff yet to be posted publicly, though, so shoot me an email at mark@gunDB.io and I'll send you the link.
For non-academics though, I did a comic strip explainer for the layperson (as distributed systems are often hyped up with elitist jargon) here: http://gun.js.org/distributed/matters.html !
The prototype shown in the video was specific to append-only data and was done last year. We recently rewrote the system to be more generalizable using a radix-trie structure, and we've released an alpha of it as the Radix Storage Engine (RSE) in the main repo: https://github.com/amark/gun
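For anyone unfamiliar with the structure being referenced: a radix trie stores keys so that shared prefixes share a path, which keeps lookups and prefix scans cheap. Here's a minimal illustrative sketch of insert and lookup; this is my own toy example, not gun's actual RSE code:

```python
# Toy radix trie: edges are labeled with string fragments, and keys
# that share a prefix share the path for that prefix. Not gun's code,
# just an illustration of the structure.

class RadixNode:
    def __init__(self):
        self.children = {}  # edge label (str) -> RadixNode
        self.value = None   # payload stored at the end of a key

def insert(node, key, value):
    for label in list(node.children):
        # length of the longest common prefix of key and this edge
        n = 0
        while n < min(len(label), len(key)) and label[n] == key[n]:
            n += 1
        if n == 0:
            continue
        if n < len(label):  # partial match: split the edge in two
            child = node.children.pop(label)
            mid = RadixNode()
            mid.children[label[n:]] = child
            node.children[label[:n]] = mid
        target = node.children[label[:n]]
        if n == len(key):
            target.value = value
        else:
            insert(target, key[n:], value)  # recurse on the remainder
        return
    # no edge shares a prefix with key: add a fresh leaf
    leaf = RadixNode()
    leaf.value = value
    node.children[key] = leaf

def lookup(node, key):
    if key == "":
        return node.value
    for label, child in node.children.items():
        if key.startswith(label):
            return lookup(child, key[len(label):])
    return None
```

For example, inserting "test", "team", and "toast" produces a shared "t" edge, with "te" splitting further into "st" and "am"; the animated GIF linked downthread shows the same idea visually.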
Let me know if you need help with any/all of that, or more links/resources (docs are somewhat scarce on it currently, sadly), and I'll do what I can!
Although, I can't resist leaving one last animated explainer GIF showing what a radix trie looks like: http://gun.js.org/see/radix.gif
I think his main point was cost savings in environments that don't need constant uptime, like dev and QA DBs in a 9-to-5-ish shop.
You can also throw a light DB (MySQL, SQL Server Express, Postgres) on a VM that's already running 24/7, which makes the additional cost zero.
Of course, you're still going to need to curate your dev/QA data (through replication or periodic backup/restore), because shit data is hard to code against, not to mention debug against.
We're deploying thousands of tenants to their own Postgres RDS instances, which are complete overkill for most scenarios. On average we use about 3% CPU... We still do this for three reasons: security isolation, performance isolation, and monitoring. I think Aurora Serverless will be a game changer for us, but we'd still need per-tenant monitoring.
I do this with a side project, except it's SQLite databases in S3 buckets. That's about as easy as it gets, and it doesn't require nearly as much configuration overhead as Aurora does.
I wouldn't trust it for write concurrency, but it's been great for my use case (reads are multiple orders of magnitude more frequent than writes, and the writes can be queued). I'm using s3sqlite for this, from the Zappa project: https://github.com/Miserlou/zappa-django-utils/blob/master/z....
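The read-mostly/queued-writes pattern described here can be sketched roughly as follows. This is my own illustration of the pattern, not s3sqlite's actual implementation; in the real setup the .db file would also be pulled from and pushed back to S3 around each write batch, which I've elided:

```python
import queue
import sqlite3

# Sketch: many cheap readers, one writer that drains a queue in a
# single transaction. SQLite allows only one writer at a time, so
# funneling writes through a queue sidesteps write contention.
# (The S3 download/upload step around flush_writes is omitted.)

write_q = queue.Queue()

def enqueue_write(sql, params=()):
    """Callers never touch the DB directly for writes."""
    write_q.put((sql, params))

def flush_writes(db_path):
    """The single writer applies all queued writes in one transaction."""
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        while not write_q.empty():
            sql, params = write_q.get()
            con.execute(sql, params)
    con.close()

def read(db_path, sql, params=()):
    """Reads open their own short-lived connection."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()
```

Since reads vastly outnumber writes here, the occasional latency of a flush (and of the S3 round trip) is paid rarely, which is what makes this viable at all.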
Event-driven and serverless are related, but distinct ideas. Event-driven architectures can be built in a datacenter, on virts, in containers, or on serverless. On the other hand, serverless architectures don't necessarily need to be event driven. These patterns do have natural affinity, so they are often referred to together.
Serverless is a set of financial, scaling and operational properties of an architecture. One of those common properties is phrased as "scaled per request", which is particularly interesting in event-driven architectures.
I understand your point; my issue is that "serverless" as a term is a misnomer. You cannot have a serverless architecture without some servers to support it.
The "serverless" distinction is a useful one to make, because it implies something about the architecture as you mention—my only point is that we could have done much better with the name we use to reference said architecture.
It seems like a pretty good name to me. The point of the term "serverless" is that you aren't presented with any abstraction of a server; operations like "spin up more servers to handle the load" or "SSH to the server to figure out what's going on" or "reboot the server because it's acting weird" don't exist for you.
The term does get abused to refer to any server cluster with a bit of autoscaling logic, and in that sense it is a misnomer. But I don't think that's what it originally meant.
In many simple cases, Heroku is effectively serverless. But there are situations (in particular monitoring and billing) where you're forced to think about the individual dynos your app is running on.
No, it's a terrible name, because by your definition VMware or EC2 instances would be servers, since you can SSH into both. A significant portion of ops people would never refer to a VM as a server, so you've already run into big problems.
Then there's the problem of who it's "serverless" for. If developers use a Lambda-like service hosted by their own org in their own datacenter (operated by separate ops people), is it still serverless? If so, then it has nothing to do with actual servers, and it's really about making operating-system and runtime specifics transparent. If not, then it's just a marketing term for letting Amazon or Google run your company's hardware.
This split of compute and storage into distinct layers is starting to become more common, and it has some pretty neat advantages like much better scalability and efficiency.
Google's BigQuery and Snowflake are examples in data warehousing, similar to DIY Presto/Drill/Spark on S3. Apache Pulsar brings the same split to messaging and distributed logs. It'll be interesting to see how it applies to OLTP database engines, though there are examples like TiDB that seem to work well enough.
BigQuery specifically pioneered many of these concepts since its release in 2012: "serverless manageability," pay-per-query consumption-based pricing, pure separation of compute and storage (no intermediate SSH mesh), and, more recently, separation of compute and intermediate state (which does wonders for scalability and complex-query performance).
Separation of compute and storage has been around since before Google was even relevant as a company (see NFS).
If you think BigQuery pioneered all of these concepts, then your team did a very poor job of researching prior art. Maybe that was intentional for a clean green-field design, but it's certainly not pioneering at that point.
It makes me nervous when cloud companies release new features to select pilot groups.
The Aurora Serverless signup form asks no questions related to DB scale or capacity, so we're left to assume they're accepting pilot customers based on region or company size?
I still haven't been able to find someone who can explain what makes these elaborate layers of indirection, running on servers in datacenters, exactly "serverless".