
Most meaningful data is in fact not relational. When you consider machine generated data, logs, metrics, network event data and all other types of data like this you find there are not relations in it.

This data is meaningful because it lets you analyze what's going on across massive systems, predict when problems will happen, and find bottlenecks in applications and infrastructure, among many other use cases.

Application domain data tends to be relational, I'd agree. But in general it makes up a very small percentage of the meaningful data in the world.



> When you consider machine generated data, logs, metrics, network event data and all other types of data like this you find there are not relations in it.

I find that hard to believe. I can accept that the raw data isn't already in relational form, but not that there are no relations in it, real or implied.

If logs contain info about 'things' and any of those things can be considered to be the 'same thing' for multiple entries, then there's a relation right there – entry to thing.

And even metrics and network event data I'd expect to be full of cryptic IDs that reference some 'thing', i.e. a typical 'code' for which it's really nice to have a table with at least a friendly description.
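To make the "table with a friendly description" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, event codes, and hosts are all made up for illustration; nothing here comes from a real logging system.

```python
import sqlite3

# In-memory database; all tables and values below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_codes (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE events (ts INTEGER, host TEXT, code TEXT);
    INSERT INTO event_codes VALUES
        ('E17', 'disk latency high'),
        ('E42', 'link flap');
    INSERT INTO events VALUES
        (1000, 'web-1', 'E17'),
        (1005, 'web-2', 'E42'),
        (1010, 'web-1', 'E17');
""")

# The "entry to thing" relation: each event row references a code,
# and the lookup table turns the cryptic code into something readable.
rows = conn.execute("""
    SELECT e.ts, e.host, c.description
    FROM events AS e
    JOIN event_codes AS c ON c.code = e.code
    ORDER BY e.ts
""").fetchall()

for ts, host, description in rows:
    print(ts, host, description)
```

Two of the three events share the same code, which is exactly the "same thing for multiple entries" case.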

Admittedly some of this data – or maybe even most of this data – isn't very 'deeply relational', but it definitely seems that claiming that "there are [no] relations in it" isn't strictly true.


Well, it's all a point of view really. The data is in the form of an "event": a fact that occurred at a particular time and has data associated with it. So a "relation" as constructed in a relational database isn't appropriate. You aren't truly denormalizing the data when you repeat IDs, tags, or labels in this type of data, because at that time that was in fact the associated ID, tag, label, etc. Changing a field on a stored event later would make the event false, since at the time of the event that field did not have the new value.

But it's all semantics really at that point.

Anyway, a relational database is a poor solution for this type of data. The data gains little to nothing from being stored relationally, and doing so may even hurt its integrity (at time t, the event DID have this ID; it DID have this label). Each event is discrete and there will be many of them, a layout that optimizes better for scale than relational organization does.
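A rough sketch of the integrity point, in plain Python with made-up data: if the event only stores a reference and the label lives in a separate mutable table, a later change to that table silently changes what the old event appears to say. The host ID, labels, and helper function are hypothetical.

```python
# Hypothetical mutable lookup table: host ID -> current label.
host_labels = {"h1": "staging"}

# Normalized style: the event stores only a reference to the host.
normalized_event = {"ts": 1000, "host_id": "h1"}

# Event style: the event stores the label as it was at event time.
snapshot_event = {"ts": 1000, "host_id": "h1", "label": host_labels["h1"]}

# Later, the host is relabeled.
host_labels["h1"] = "production"


def label_via_join(event, labels):
    """Resolve the label the way a join against the lookup table would."""
    return labels[event["host_id"]]


joined_label = label_via_join(normalized_event, host_labels)
stored_label = snapshot_event["label"]

print(joined_label)  # the label as of now, not as of ts=1000
print(stored_label)  # the label the host actually had at ts=1000
```

A relational schema can avoid this with versioned or effective-dated lookup rows, but that is extra machinery that the self-contained event gets for free.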

I guess my point was that there is vastly more useful data suited to a non-relational database than to a relational one. You might say it still has a "relation" in an abstract sense, but this data does not need relational semantics within the database it resides in.



