Wow, this looks really nice. One thing that isn't clear is how I get the information out. So, for example, if I'm building a recommendation engine, is there some kind of API that my webapp can use to get the information? (Sorry, new to all this.)
Right now Sense is best for ad-hoc interactive analysis and batch/scheduled jobs. You can run long-running services that expose something like a REST endpoint, but we have plans to make exposing services much easier, so I'd probably hold off until we have an "official" solution.
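A long-running service like the one described above could be as simple as a small HTTP endpoint the webapp polls. Here's a minimal stdlib sketch; the recommendation table and URL scheme are invented for illustration, not anything Sense provides:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up recommendation table; a real engine would compute this.
RECS = {"alice": ["book-1", "book-7"], "bob": ["book-3"]}

class RecHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A path like /recommend/alice returns recommendations for "alice".
        user = self.path.rsplit("/", 1)[-1]
        body = json.dumps(RECS.get(user, [])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet

server = HTTPServer(("127.0.0.1", 0), RecHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

The webapp would then just GET `http://host:port/recommend/<user>` and parse the JSON response.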
Is it possible to move data around between languages / engines?
E.g. would it be possible to run a query against Redshift (step 1), then cleanse the result in Python (step 2), run R scripts over the Python output (step 3), then dump the results back to Redshift (step 4)?
If I then decide I need to change the Redshift query (step 1), can I re-run the whole pipeline?
Munging data between different tools is what I seem to spend most of my time doing, so anything that helps that would be a big productivity boost.
At the moment, not really. In your scenario, you could have a Python dashboard launch a Redshift dashboard with startup code that runs the initial query, clean the result in Python, launch an R dashboard and pass the clean data to it over the shared filesystem or a messaging system such as 0MQ or Redis, and finally save the results to S3 for consumption by Redshift.
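The shared-filesystem hand-off above can be sketched roughly like this (a toy illustration: the rows, column names, and file path are all made up, and the hard-coded list stands in for a real query result):

```python
import json
import os
import tempfile

# Step 2 (Python dashboard): clean the raw query result.
raw_rows = [
    {"user_id": "1", "score": " 0.90 "},
    {"user_id": "2", "score": ""},        # missing value, will be dropped
    {"user_id": "3", "score": "0.75"},
]

def clean(rows):
    """Drop rows with missing scores and normalize types."""
    out = []
    for row in rows:
        score = row["score"].strip()
        if score:
            out.append({"user_id": int(row["user_id"]), "score": float(score)})
    return out

clean_rows = clean(raw_rows)

# Hand off via the shared filesystem: write JSON that the R dashboard
# can pick up (e.g. with jsonlite::fromJSON on the R side).
handoff_path = os.path.join(tempfile.gettempdir(), "clean_rows.json")
with open(handoff_path, "w") as f:
    json.dump(clean_rows, f)
```

The messaging-system variant would be the same shape, with the JSON pushed to a Redis key or 0MQ socket instead of a file.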
You're probably looking for something smoother, though. We definitely intend to have a good solution for workflows like you describe in the future.
I think you may want to think about treating the analyses like an ETL pipeline (think dependent jobs in Chronos or some such) using some intermediary (S3, whatevs). That would probably be useful to a lot of analysts.
Looks awesome, and the multi-engine support is great. I'd love it if you could open-source your two-pane approach for IPython... I really like IPython (and have been working with IHaskell lately), but I find the RStudio approach much better: having my code on the left, moving up and down and executing lines with Cmd+Enter, running entire cells (knitr Rmd style), seeing graphs and documentation while you're working, etc.
Glad you like it. The IPython engine isn't open source at the moment, but that may change in the future. Out of curiosity, if it were open source, what might you use it for?
For my own work, maybe help integrate it with IPython as an alternative front-end. I don't have a cloud project, I just do my own data analysis/learning with IPython and IHaskell, and think a multi-pane approach would be much more powerful. (I applied for an account with Sense, looking forward to playing with it and providing feedback).
We're planning to charge per core for usage. Each core will be a true physical core, with 5.5 ECU. The container will get 3.75GB of RAM and a slice of the host's bandwidth per core. We'll also have a very inexpensive micro tier, and eventually some kind of long-lived services tier, as Tristan mentioned in another comment.
I forgot to mention storage. We're planning on about 1GB of disk per project. Because we live on AWS, we get quick access to all of Amazon's storage services as well.
Not at the moment. Right now, the biggest single dashboard is 60GB. You can launch as many dashboards as your plan allows if your application can be distributed, but I'm guessing that isn't the case for you.
This has some really interesting adjacencies to a project that we currently have in limited beta and are getting ready to roll out widely very soon. I'd love to chat about some ideas I have for working together that could work out really nicely for both of us. If you're interested, drop me a line: jason@applieddatalabs.com
Tristan, quick question here: who did all the coding? Do you still code? Does Anand code?
Since everyone on your team is very high profile (Stanford, Harvard), I'm wondering how much you've all stayed close to the ground-level work after rising to that level. Thanks for your answer.
Scaling up from 10 to 100 to 1,000 cores is fast (about 3 seconds per engine, launched in parallel). However, something like 1,000 cores would currently require spinning up new instances (1-2 minutes) if deployed in the cloud.
The distributed filesystem is meant for easily sharing code and medium-sized data across containers. In the cloud, it's best to use S3 directly for large data and local disks for high-IO tasks. For on-premises deployments, there are more options.
I'd be interested to hear about your use case. Feel free to drop me a line at tristan@senseplatform.com
Yes, we're fans of how sharing and collaboration works on GitHub. The goal is to make Sense the center of gravity for data scientists the way GitHub is for developers.
We're not trying to replicate GitHub's features. The core of Sense is a better way to work with data: the compute infrastructure, engines, and analytics workflow. Advanced users who use git will likely use GitHub in addition to Sense.
The Python engine is IPython under the hood. Any code or visualizations that work in IPython notebooks should work in Sense.
There is a difference, though. In our experience, notebook-style development, with code inline, is awkward for serious analytics: it's harder to use version control, editors, etc. We have opted for the dual-pane experience common in R and MATLAB. The output, however, can be rich and interactive just like an IPython notebook, and it is always saved.
Sense supports R, Python, JavaScript and SQL out of the box, but is fully extensible to new languages and tools:
https://github.com/SensePlatform/sense-engine
We have Julia, Hive, and Spark engines in development.