To answer one of the questions in the post: I do believe that MongoDB's transactions will only be used in some applications, and only in few but critical places within those applications. That is because with a document database, data that belongs together will often already be together in the same document, rather than distributed over multiple tables as with a traditional database. And if it's together in the same document, you can already modify that atomically in MongoDB without multi-document transactions. (Disclosure: I work for MongoDB.)
Blog author here. Thanks for reading and responding. I understand that modifications can be made within a single MongoDB document atomically. I was thinking more of transactions that would include multiple documents, say "store" and "customer", and doing a transaction across them. That could be done in a single shard. But if it were sharded across a few data centers for data locality, that would have to wait for v4.2, right?
There may be a misunderstanding here about sharding vs replication. Having the same data distributed across a few data centers for data locality is done using replication. Sharding means that data is partitioned horizontally to achieve higher throughput, or other goals. See here for a good discussion: https://stackoverflow.com/questions/11571273/in-mongo-what-i...
MongoDB's multi-document transactions work with replication today, and will work with sharding in 4.2.
Got that -- sorry for the terminology error. I was thinking of the same shard but in different data centers, i.e. replicas of the same shard. Transactions within a single-shard replica set are possible in v4.0. Transactions across multiple shards will be possible in v4.2. Reading the docs now to see the performance impact on transactions when a single shard has multiple replicas.
I am one of the team of MongoDB engineers working with Epic on this issue, and I can assure you that the situation is under control and we have everything in place to scale this application to much higher numbers. However, we're not publishing details about our support cases, especially while they are in progress. That is something for Epic to decide, and I do assume they will eventually say in public just how well MongoDB is, in fact, performing for them.
I beg to differ. (Disclosure: I work for MongoDB.) Using JSON as your data model, rather than relational tables, lets you build applications that don't need multi-document transactions as often, because the data is already together in a single document. But when you do need multi-document transactions (a small percentage of applications do, and only a few use cases inside those applications), they are now available. There is no performance impact when you don't use them. And most of the time you shouldn't use them, otherwise you wouldn't be capitalizing on the advantages of JSON. I think that's a game changer, but then again: I do work for MongoDB.
It's usually only after a while that you realize almost every piece of meaningful data is relational. It just didn't look that way when the project started. But by then you're committed to the wrong database, and it's very costly to switch back to SQL.
Literally every project I've seen using MongoDB ended up going back to SQL within the first two years, after realizing the data is indeed very much relational and there's no clean way to model it using documents.
You always end up with either tons of duplication across documents, which is hell to maintain, or tons of multi-document queries with hacks to look ACID, which is also hell to maintain.
Sure, Mongo makes it easy to prototype applications, but it makes it very complex to build robust and maintainable software. It's especially bad if you think your data isn't relational, because it almost certainly is.
Disclaimer: I believe Datomic to be the game-changing database, because it values simplicity and composition, and those attributes drive the entire design.
Video games storing player data are a great example of nonrelational data. I intend to write a blog post after I finish my game detailing the structure of the data I store and why it was such a good fit for MongoDB.
On the surface it sounds like you might have a case for Mongo, but look out for scenarios like:
* Trading in-game items between two users (needs multi-document atomic locks if you don't want duplicated or lost items), assuming your "schema" is a document per user
* You want to rename or restructure an attribute in the future; with no schema it's not possible to migrate data easily without writing ad hoc code (maybe you can use third-party tools) or changing queries to expect data in multiple "schemas", which quickly gets painful
> You want to rename or restructure an attribute in the future; with no schema it's not possible to migrate data easily without writing ad hoc code (maybe you can use third-party tools) or changing queries to expect data in multiple "schemas", which quickly gets painful
You can have schemas with MongoDB. There are various libraries to facilitate database design by schema specification.
Also, renaming or restructuring your data is not necessarily an easy task with SQL either. The nature of a database dictates that how well it works for your application depends on how well thought-out your schema is. Having to change your schema around is taxing. One of the reported advantages of document stores when they were becoming trendy was that it was easy to change your schema, since your schema is essentially determined and regulated at the application layer.
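That "schema at the application layer" idea can be sketched in a few lines of plain Python (the `CUSTOMER_SCHEMA` spec and `validate` helper below are hypothetical, not any particular library's API -- real libraries like Mongoose do essentially this with far more features):

```python
# Hypothetical application-layer "schema": required fields and their
# types, enforced in code before a document is ever written to the store.
CUSTOMER_SCHEMA = {"name": str, "email": str}

def validate(doc, schema):
    """Raise if a required field is missing or has the wrong type."""
    for field, expected in schema.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    return doc

validate({"name": "Ada", "email": "ada@example.com"}, CUSTOMER_SCHEMA)
```

The point being: the database happily accepts anything, so the discipline (and the flexibility to relax it) lives entirely in code like this.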
Also, MongoDB has ACID transactions now (freaking finally), so if they work as advertised then I feel like half of your argument is not really a strong one any more.
Yes, players can sell items to other players, so that's the one place so far where I've needed to worry about atomicity, but even the MongoDB docs give an example of how to deal with something like that: https://docs.mongodb.com/manual/tutorial/perform-two-phase-c...
So yes, it's annoying for a very small % of what I'm doing, but 99% of my updates/writes are within a single document, so I find it very nice for development.
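The shape of that two-phase-commit pattern can be sketched in plain Python, with in-memory dicts standing in for MongoDB documents (this is only the idea, not a driver example, and the document fields are made up):

```python
# Two-phase-commit shape: record the intent first, apply both sides,
# then mark the transfer done. If a crash happens mid-way, the pending
# entry in the log is the evidence a recovery job uses to finish or
# roll back the transfer.
def transfer_item(seller, buyer, item, price, txn_log):
    txn = {"state": "pending", "item": item, "price": price}
    txn_log.append(txn)              # phase 1: durable record of intent
    seller["items"].remove(item)     # apply to both "documents"
    seller["gold"] += price
    buyer["items"].append(item)
    buyer["gold"] -= price
    txn["state"] = "done"            # phase 2: commit marker

seller = {"gold": 0, "items": ["sword"]}
buyer = {"gold": 100, "items": []}
log = []
transfer_item(seller, buyer, "sword", 30, log)
```

The cost, as the parent comments note, is that you now own the recovery logic yourself instead of the database owning it.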
You still need ACID transactions over multiple entries even if your data is nonrelational; otherwise there is the potential for item- and money-duping bugs.
A simple example would be a marketplace.
Player buys item X with Y gold from another player.
1. Server checks that item X exists.
2. Server checks that the player has at least Y gold.
3. Server removes the gold from the player.
4. Server gives gold to the seller.
5. Server removes item from marketplace.
6. Server adds item to inventory.
What if someone maliciously crafts two requests in a way that step 2 of the second request happens before step 3 of the first request?
The money is deducted properly but the account can now have a negative balance and there are now two instances of the item.
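One standard defense, even without full transactions, is to make the check part of the write itself, so steps 2 and 3 become a single atomic operation. MongoDB-style conditional updates do this server-side (a filter like `{"gold": {"$gte": price}}` combined with `$inc` in one `update_one` call matches nothing if the balance is too low); here is a plain-Python sketch of that shape, where the lock merely simulates the server applying one update at a time:

```python
import threading

_lock = threading.Lock()  # stands in for the server serializing updates

def try_deduct(account, price):
    """Atomic check-and-deduct: the balance check and the write happen
    as one step, so two concurrent buyers can't both pass the check."""
    with _lock:
        if account["gold"] >= price:
            account["gold"] -= price
            return True
        return False

account = {"gold": 50}
first = try_deduct(account, 40)   # succeeds, 10 gold left
second = try_deduct(account, 40)  # fails: no negative balance possible
```

This closes the negative-balance hole, but note it only protects a single document; the two-instances-of-the-item half of the bug still spans multiple documents and needs a transaction or two-phase commit.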
Most meaningful data is in fact not relational. When you consider machine-generated data, logs, metrics, network event data, and all other data like this, you find there are no relations in it.
This data is meaningful because it allows you to analyze what's going on across massive systems, detect when problems will happen, and find bottlenecks in applications and infrastructure, among many other use cases.
Application domain data tends to be relational, I'd agree. But in general this makes up a very small percentage of the meaningful data in the world.
> When you consider machine-generated data, logs, metrics, network event data, and all other data like this, you find there are no relations in it.
I find that hard to believe. Maybe the raw data isn't stored relationally, but it's hard to argue there are no relations, real or implied.
If logs contain info about 'things' and any of those things can be considered to be the 'same thing' for multiple entries, then there's a relation right there – entry to thing.
And even metrics and network event data I'd expect to be full of cryptic IDs that reference some 'thing', i.e. a typical 'code' for which it's really nice to have a table with at least a friendly description.
Admittedly some of this data – or maybe even most of this data – isn't very 'deeply relational', but it definitely seems that claiming that "there are [no] relations in it" isn't strictly true.
Well, it's all a point of view, really. The data is in the form of an "event": an occurrence of a fact at a particular time, with data associated with it. So a "relation" as constructed in a relational database isn't appropriate. You aren't truly denormalizing the data when you repeat IDs or tags or labels in this type of data, because at that time, that was in fact the associated ID, tag, or label. Changing a field associated with a stored event would make the event false, since at the time of the event that field did not have that value.
But it's all semantics really at that point.
Anyway, a relational database is a poor solution for this type of data. The stored data gains little to nothing from relational storage, which may even hurt its integrity (at time t, the event DID have this ID; it DID have this label). Each event is discrete, and there will be many of them, which optimizes better for scale than relational organization.
I guess my point was that there is vastly more useful data suited to a non-relational database than to a relational one. You might say it still has a "relation" in an abstract sense, but this data does not need relational semantics within the database it resides in.
As a developer: What? Almost everything is relational. I do appreciate Mongo's query language and ease of use (it was the first DB I learned), but your statement is ludicrous. Think about a basic blog system. You'll have relations between authors, posts, categories, and comments.
In my experience, Mongo is most often used with ORMs that emulate joins, like Mongoose. And the possibility of data inconsistency due to lack of transactions is ignored, or patched over with cleanup scripts after the fact.
How does MongoDB handle schema changes? For example, let's say I want to add a mobile phone field to a customer record type. How would I go about doing that in MongoDB?
The short answer is: just do it. You can add any field to any document at any time; that's the beauty of JSON documents without schema constraints. Then, of course, your application needs to understand that. But it turns out it's almost trivial to make an application display a phone number field if it finds one, and not display a phone number if there isn't one in the document.
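That display logic really can be that small; a sketch in plain Python, with a hypothetical `mobile_phone` field name and dicts standing in for documents:

```python
def render_contact(customer):
    """Render a customer document that may or may not carry the new field."""
    lines = [customer["name"]]
    phone = customer.get("mobile_phone")  # simply absent in older documents
    if phone:
        lines.append(f"Mobile: {phone}")
    return "\n".join(lines)

render_contact({"name": "Ada"})                               # old document
render_contact({"name": "Ada", "mobile_phone": "555-0100"})   # new document
```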
It hurts when the next requirement comes along, something like:
"As a user I want to have a home, work, and mobile phone number"
Now you have three "versions" of your implicit "schema" to contend with:
1) No phoneNumber
2) phoneNumber and mapping it into / out of one of the three phone numbers in the UI
3) objects with three properties homePhoneNumber, workPhoneNumber, mobilePhoneNumber etc
Then the business comes up with "As a user I want arbitrary phone numbers that I can label", and the developers start to squeal.
RDBMS + SQL is no panacea, but having DDL operations like the following (roughly Postgres-flavored; "user" is quoted because it's a reserved word) out of the box is incredibly powerful.
ALTER TABLE "user" RENAME COLUMN phone_number TO home_phone_number;
ALTER TABLE "user" ADD COLUMN work_phone VARCHAR(32) NOT NULL DEFAULT '';
CREATE TABLE phone_number (id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES "user" (id), name VARCHAR(64) NOT NULL, phone_number VARCHAR(32) NOT NULL);
I have had reasonable success using MongoDB as a store of "things that happened" that will never change.
And I would still claim that this is easier in MongoDB, because several versions of the phone number field(s) can happily coexist in the same collection. Those variants are usually trivial to understand for anyone who just looks at the data, and the application can be written to either accept the different formats or adjust the format on the fly when it encounters a document that still uses an old schema. Or you could indeed write a batch job that bumps all your phone numbers to the new format, and you could put a JSON schema constraint on your collection that enforces the new schema for every future document. All those possibilities exist, and I truly see that as a big advantage.
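The "accept the different formats" option amounts to a small normalizer at read time. A sketch, using the three field shapes from the phone-number example above (mapping the bare `phoneNumber` string to "home" is an assumption for illustration):

```python
def phone_numbers(doc):
    """Normalize the historical document shapes onto one view:
    a dict of labeled phone numbers."""
    labeled = {label: doc[key]
               for label, key in (("home", "homePhoneNumber"),
                                  ("work", "workPhoneNumber"),
                                  ("mobile", "mobilePhoneNumber"))
               if key in doc}
    if labeled:                      # newest shape: separate labeled fields
        return labeled
    if "phoneNumber" in doc:         # middle shape: one untyped string;
        return {"home": doc["phoneNumber"]}  # assume it was a home number
    return {}                        # oldest shape: no phone number at all
```

Every read path then goes through this one function, which is exactly the "large burden for all time" the sibling comment warns about, concentrated in one place.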
For any "real" system that is going to be in production for a long time this becomes a real problem
There are tools to "migrate" data, but they come with all the limitations of the Mongo isolation model.
Typically you either
* Write ad hoc code (possibly using some tooling) to iterate over your old data, adding or mutating the field(s) in question
* Write queries such that they can handle the data being present, absent, or in different forms, for all time. As you'd expect, this is a large burden
WiredTiger shows a lot of potential, but it would be irresponsible to make such a radically different engine the default for everyone, even for new databases, without giving it some time to mature.
Six minutes sounds about right to me, if the food is not complex. Try using a stopwatch next time you're in a restaurant. The service industry is pulling some pretty amazing stuff, all the time.