
60 events per second and it's pretty but basically unfollowable.

It's something I always wonder about with visualizations - no human can follow the amount of data in even a small system. How much of it is hype and feel-good, and how much is genuinely useful?



In finance this is a classic problem, and it's why you see traders with proliferations of monitors trying to track everything - and they're outmatched by the machines anyway. The irony is that the vast majority of tick data points carry little incremental information. They're highly correlated: think of how most stocks move in lockstep with the index.

We use principal component analysis heavily to cut thousands of feeds down to the "big picture" in usually 4-6 "global" variables, and then use PCA regression to find the "outliers" in the rest of the data and show those. That gives us at least 2-3 orders of magnitude less data - big picture plus outliers - which mere humans can actually interpret, and it's very rare that this simple technique misses much. It can literally cut thousands of feeds down to a couple of dozen. We've found this to be much more effective than animated "dot swarms", which look beautiful but are very poor at conveying rich information.
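A minimal sketch of that idea (my own illustration, not the poster's actual pipeline - all names and thresholds are made up): take the top few principal components as the "global" variables, reconstruct each feed from them, and flag feeds whose residual variance is unusually high.

```python
import numpy as np

# Hypothetical sketch: reduce many correlated feeds to a few "global"
# components, then flag feeds the components fail to explain.
rng = np.random.default_rng(0)

n_ticks, n_feeds, n_components = 500, 200, 5

# Simulated data: most feeds follow a few common factors plus small noise.
factors = rng.normal(size=(n_ticks, n_components))
loadings = rng.normal(size=(n_components, n_feeds))
X = factors @ loadings + 0.1 * rng.normal(size=(n_ticks, n_feeds))
X[:, 7] += 3.0 * rng.normal(size=n_ticks)  # one idiosyncratic "outlier" feed

# PCA via SVD on centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# "Big picture": scores on the top components (the 4-6 global variables).
global_vars = U[:, :n_components] * S[:n_components]

# Reconstruct each feed from the components; the residual is what PCA misses.
recon = global_vars @ Vt[:n_components]
resid_var = ((Xc - recon) ** 2).mean(axis=0)

# Outliers: feeds with residual variance far above the typical level.
threshold = resid_var.mean() + 3 * resid_var.std()
outliers = np.flatnonzero(resid_var > threshold)
print(outliers)
```

A human then watches ~5 global series plus whatever lands in `outliers`, instead of all 200 raw feeds.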


That's actually a challenge we tried to overcome - getting the big picture on the one hand, while still being able to catch and filter a single event.


I think visualisations need to be a seriously terse summary, leaving out most of the detail apart from one or two critical dimensions where you're looking for the pattern or trend. A latency distribution heatmap can make a lot of data points visually useful, but you can't cram in five other dimensions and still make it work.
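For a concrete sense of the compression a latency heatmap buys, here's a small sketch (illustrative numbers and bucket choices, not from the thread): bin events into a time x latency grid, with log-spaced latency buckets so the long tail stays visible.

```python
import numpy as np

# Hypothetical sketch: compress 100k events into a 30x60 grid, i.e. a
# latency heatmap a human can scan at a glance.
rng = np.random.default_rng(1)

n_events = 100_000
timestamps = rng.uniform(0, 60, n_events)                       # seconds
latencies = rng.lognormal(mean=3.0, sigma=0.5, size=n_events)   # milliseconds

# 60 one-second columns; 30 logarithmic latency rows from 1 ms to 1000 ms
# (log bins keep the tail from being crushed into one row).
time_edges = np.linspace(0, 60, 61)
lat_edges = np.logspace(0, 3, 31)

heatmap, _, _ = np.histogram2d(latencies, timestamps,
                               bins=[lat_edges, time_edges])

print(heatmap.shape)  # one cell per (latency bucket, time bucket)
```

Each cell count maps to a color, and that's the whole display: two dimensions plus intensity, nothing else crammed in.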



