> I, personally, have seen many projects burning thousands a month in database costs because they prefer replicating the environment for testing branches.
If this is the premise for a serverless database, it's a weak start. If you really need a lightweight DBMS for testing, just run MySQL or PostgreSQL in Docker. If you really need access to production-like data (e.g., a lot of it, so you get realistic distributions), run the same DBMS on cheap hardware or cheap instances. In both cases you can use persistent volumes and shut things down when they're not in use. Few people really care if it takes a few minutes to spin up a test environment.
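As a minimal sketch of that setup, here's how the throwaway-Postgres-in-Docker approach with a persistent named volume might look. Everything here (the container name, volume name, password, port, and the postgres:15 tag) is a placeholder I chose for illustration, not something from the comment:

```python
# Sketch: a throwaway Postgres for testing, run in Docker with a named
# volume so the data persists across stop/start. All names here
# (pgtest, pgtest-data, the password, the port) are placeholders.

def start_cmd(name="pgtest", volume="pgtest-data",
              password="secret", host_port=5433):
    """Build the `docker run` argv; pass it to subprocess.run()."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"POSTGRES_PASSWORD={password}",
        # named volume: the data survives `docker stop` / `docker rm`
        "-v", f"{volume}:/var/lib/postgresql/data",
        "-p", f"{host_port}:5432",
        "postgres:15",
    ]

def stop_cmd(name="pgtest"):
    """Stop the container when the test environment is idle."""
    return ["docker", "stop", name]
```

The point is the `-v` flag: because the volume lives outside the container, you can stop the whole thing outside office hours and pay nothing extra, then bring it back in minutes.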
As for the main point, Aurora offering a "serverless" architecture: it looks as if what they've really done is enable the DBMS compute layer to scale up and down quickly. I wonder whether this optimization fell out of pushing redo-log management down into the storage layer. (See Section 3.1 of https://www.allthingsdistributed.com/files/p1041-verbitski.p... for details.)
Exactly! It's also ridiculous that the "serverless" label is being used to push this agenda.
"Serverless" DBs are also very pricey once you consider everything. It isn't actually that hard to run some workloads at huge scale for cheap; there are two good articles on this (one from Discord, one of ours).
Thanks for the references! Just read the blog article. It was kind of obvious from the first paragraph that it would be a Cassandra deployment story. ;)
Gun is new to me. You have a nice way of handling distributed consistency. It's intriguing that the system can still reach consistency even if every node temporarily loses network connectivity. Is there any academic work behind this?
Thank you! Yes, there is currently just a whitepaper we're publishing with Stanford, which was also reviewed by colleagues at MIT. It isn't up to snuff yet to be posted publicly, though, so shoot me an email at mark@gunDB.io and I'll send you the link.
For non-academics though, I did a comic strip explainer for the layperson (as distributed systems are often hyped up with elitist jargon) here: http://gun.js.org/distributed/matters.html !
The prototype shown in the video was specific to append-only data and was done last year. We recently rewrote the system to be more generalizable using a radix-trie structure, and we've released an alpha of it as the Radix Storage Engine (RSE) in the main repo: https://github.com/amark/gun
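For anyone unfamiliar with the structure being referenced: a radix trie stores keys so that shared prefixes share a path, which keeps lookups and prefix scans cheap. Here's a minimal illustrative sketch of insert and lookup; this is my own toy example, not gun's actual RSE code:

```python
# Toy radix trie: edges are labeled with string fragments, and keys
# that share a prefix share the path for that prefix. Not gun's code,
# just an illustration of the structure.

class RadixNode:
    def __init__(self):
        self.children = {}  # edge label (str) -> RadixNode
        self.value = None   # payload stored at the end of a key

def insert(node, key, value):
    for label in list(node.children):
        # length of the longest common prefix of key and this edge
        n = 0
        while n < min(len(label), len(key)) and label[n] == key[n]:
            n += 1
        if n == 0:
            continue
        if n < len(label):  # partial match: split the edge in two
            child = node.children.pop(label)
            mid = RadixNode()
            mid.children[label[n:]] = child
            node.children[label[:n]] = mid
        target = node.children[label[:n]]
        if n == len(key):
            target.value = value
        else:
            insert(target, key[n:], value)  # recurse on the remainder
        return
    # no edge shares a prefix with key: add a fresh leaf
    leaf = RadixNode()
    leaf.value = value
    node.children[key] = leaf

def lookup(node, key):
    if key == "":
        return node.value
    for label, child in node.children.items():
        if key.startswith(label):
            return lookup(child, key[len(label):])
    return None
```

For example, inserting "test", "team", and "toast" produces a shared "t" edge, with "te" splitting further into "st" and "am"; the animated GIF linked downthread shows the same idea visually.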
Let me know if you need help with any/all of that, or more links/resources (docs are somewhat scarce on it currently, sadly), and I'll do what I can!
Although, I can't resist leaving one last animated explainer GIF showing what a radix trie looks like: http://gun.js.org/see/radix.gif
I think his main point was cost savings in environments that don't need constant uptime, like dev and QA DBs in a 9-to-5-ish shop.
You can also throw a light DB (MySQL, SQL Server Express, Postgres) on a VM that's already running 24/7, which makes the additional cost zero.
Of course, you're still going to need to curate your dev/QA data (through replication or periodic backup/restore), because shit data is hard to code against, not to mention debug against.
We're deploying thousands of tenants to their own Postgres RDS instances, which are complete overkill for most scenarios. On average we use about 3% CPU... We still do this for three reasons: security isolation, performance isolation, and monitoring. I think Aurora Serverless will be a game changer for us, but we'd still need per-tenant monitoring.
I do this with a side project, except it's SQLite databases in S3 buckets. That's about as easy as it gets, and it doesn't require nearly as much configuration overhead as Aurora does.
I wouldn't trust it for write concurrency, but it's been great for my use case (reads are multiple orders of magnitude more frequent than writes, and the writes can be queued). I'm using s3sqlite for this, from the Zappa project: https://github.com/Miserlou/zappa-django-utils/blob/master/z....
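The read-mostly/queued-writes pattern described here can be sketched roughly as follows. This is my own illustration of the pattern, not s3sqlite's actual implementation; in the real setup the .db file would also be pulled from and pushed back to S3 around each write batch, which I've elided:

```python
import queue
import sqlite3

# Sketch: many cheap readers, one writer that drains a queue in a
# single transaction. SQLite allows only one writer at a time, so
# funneling writes through a queue sidesteps write contention.
# (The S3 download/upload step around flush_writes is omitted.)

write_q = queue.Queue()

def enqueue_write(sql, params=()):
    """Callers never touch the DB directly for writes."""
    write_q.put((sql, params))

def flush_writes(db_path):
    """The single writer applies all queued writes in one transaction."""
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        while not write_q.empty():
            sql, params = write_q.get()
            con.execute(sql, params)
    con.close()

def read(db_path, sql, params=()):
    """Reads open their own short-lived connection."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()
```

Since reads vastly outnumber writes here, the occasional latency of a flush (and of the S3 round trip) is paid rarely, which is what makes this viable at all.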
Event-driven and serverless are related, but distinct ideas. Event-driven architectures can be built in a datacenter, on virts, in containers, or on serverless. On the other hand, serverless architectures don't necessarily need to be event driven. These patterns do have natural affinity, so they are often referred to together.
Serverless is a set of financial, scaling and operational properties of an architecture. One of those common properties is phrased as "scaled per request", which is particularly interesting in event-driven architectures.
I understand your point; my issue is that "serverless" as a term is a misnomer. You cannot have a serverless architecture without some servers to support it.
The "serverless" distinction is a useful one to make, because it implies something about the architecture as you mention—my only point is that we could have done much better with the name we use to reference said architecture.
It seems like a pretty good name to me. The point of the term "serverless" is that you aren't presented with any abstraction of a server; operations like "spin up more servers to handle the load" or "SSH to the server to figure out what's going on" or "reboot the server because it's acting weird" don't exist for you.
The term does get abused to refer to any server cluster with a bit of autoscaling logic, and in that sense it is a misnomer. But I don't think that's what it originally meant.
In many simple cases, Heroku is effectively serverless. But there are situations (in particular monitoring and billing) where you're forced to think about the individual dynos your app is running on.
No, it's a terrible name, because by your definition VMware or EC2 instances would be servers, since you can SSH into both. A significant portion of ops people would never refer to a VM as a server, so you've already run into big problems.
Then there's the problem of who it's "serverless" for. If developers use a Lambda-like service hosted by their own org in their own datacenter (operated by separate ops people), is it still serverless? If so, then it has nothing to do with actual servers, and it's really about making operating-system and runtime specifics transparent. If not, then it's just a marketing term for letting Amazon or Google run your company's hardware.
This split of compute and storage into distinct layers is starting to become more common, and it has some pretty neat advantages like much better scalability and efficiency.
Google's BigQuery and Snowflake are examples in data warehousing, similar to DIY Presto/Drill/Spark on S3. Apache Pulsar brings the same split to messaging and distributed logs. It'll be interesting to see how it applies to OLTP database engines, though there are examples like TiDB that seem to work well enough.
BigQuery specifically pioneered many of these concepts since its release in 2012: "serverless manageability," pay-per-query consumption-based pricing, pure separation of compute and storage (no intermediate SSH mesh), and, more recently, separation of compute and intermediate state (which does wonders for scalability and complex-query performance).
Separation of compute and storage has been around since before Google was even relevant as a company (see NFS).
If you think BigQuery pioneered all of these concepts, then your team did a very poor job of researching prior art. Maybe that was intentional for a clean green-field design, but it's certainly not pioneering at that point.
It makes me nervous when cloud companies release new features to select pilot groups.
The Aurora Serverless signup form asks no questions related to DB scale or capacity, so we're left to assume they're accepting pilot customers based on region or company size?
I still haven't been able to find someone who can explain what makes these elaborate layers of indirection, running on servers in datacenters, exactly "serverless".