
Can you list the reasons that you're thinking of not storing data on Github? I would assume that you have a (paid) user or organization account and keep the repo private, unless it was for a published paper.

I was thinking more along the lines of data that you'd want to graph or present for analysis, as in a notebook or a paper. I would (hopefully) assume that your simulation data would be generated and used in such a way that it wouldn't need to be stored permanently. At least, when I was doing simulation-based analysis, I wasn't necessarily concerned about any individual run, but rather a combination of a bunch of simulations (all of which were ephemeral).



Because there are data use agreements for health data with personal identifiers, and the vast majority of them aren't going to accept "We're keeping a copy on a private Github repo".

Nor, to be blunt, should they.

My particular simulation work is rather interested in both individual runs (and indeed, individuals within those runs) and summarization.

Beyond that, what use is there in putting just the "summary" data online when the underlying data that produced it still exists only as "you're just going to have to trust me"? Being able to replicate my figure code doesn't get people very far.



