GitHub and Jupyter IPython Notebooks (github.com/blog)
199 points by benn_88 on May 7, 2015 | 51 comments


I am utterly giddy about this. For the book I am writing, I've spent a lot of time generating PDFs, creating TOCs of links to nbviewer, and managing all these links and the production process every time I made a change. Now people can just view it directly on GitHub. Life is as it should be.

There is just so much innovation and hard work coming out of this group, which is quite tiny. Any issue I've raised, even if it turned out to be my own dumb fault, has gotten the immediate attention and support of a core developer.

If you want to communicate with programmers there really isn't a better platform out there. The old 'write code, run code, save results, write LaTeX, gen document, find error, repeat, oops, code is out of sync with results...' workflow is pining for the fjords.


Yes! A million times, yes! This lowers the barrier to sharing research results stored in a private repo with collaborators. Also, for those who have been looking for ways to display equations in markdown[1], you can embed the equations in markdown cells within the notebook (IPython uses MathJax).

[1] https://github.com/github/markup/issues/274
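As a small illustration (the formula is an arbitrary example, not taken from any notebook mentioned here), a Markdown cell containing something like the following should render as a display equation in the notebook, since MathJax picks up the `$$ ... $$` delimiters:

```latex
$$
\hat{\beta} = (X^\top X)^{-1} X^\top y
$$
```

Inline math works the same way with single dollar signs, e.g. `$\hat{\beta}$` in the middle of a sentence.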


This is great! We use IPython Notebook a lot in our group, for exploratory data analysis, teaching, sharing experimental results, and as an electronic lab notebook. Inline GitHub rendering is very nice for all our public work.

If you run a private GitLab server, you could have a look at my patch which adds similar functionality to GitLab (in a rather ad-hoc way): https://gist.github.com/martijnvermaat/6926070


GitLab CEO here, thanks for maintaining your patch for GitLab. If you want to discuss contributing jupyter support to GitLab itself I'd be happy to discuss.


This is amazing. I think ipython notebook is fantastic in all regards. However, I feel there's a bit of an elephant in the room. The main tools I use as a programmer are shell and a text editor. I'm not about to start writing code in a web browser[1]. Is there any hope that there will be a good workflow for somehow editing .ipynb from a text editor or is that antithetical to the design? Could markdown perhaps be used as a primary editing format to target ipynb?

Looks like there's at least one project doing that. https://github.com/aaren/notedown

And there's the emacs-specific project https://github.com/tkf/emacs-ipython-notebook

[1] EDIT: I mean, not right now anyway. I've got a feeling that statement could seem a bit dated later in my life...


It is an interesting point. In my usage, I don't view it as a one-or-the-other proposition. I use Julia, so if I'm doing some exploratory work and experimentation, or making something to present results, then the IJulia notebook is great. If I'm writing some serious longer-running stuff, or a package, I'm in an editor. Sometimes I'll write code in a separate file and call it from a notebook just to keep the notebook focussed on communicating something, and "hiding" the details.


Yeah, notebooks are great for experimentation but not so much for anything that needs to be reused.

I tend to write prototype code in the notebook then copy/paste it into a package as I become satisfied that the code is correct.


I wrote notedown precisely so that I could edit notebooks in my text editor and version control them as plain text.

It would be good to be able to transparently edit a .ipynb in a text editor - I've been meaning to wrap notedown in a vim plugin but haven't had time. This still leaves the version control problem though.

Note that I recently added the reverse (editing markdown in the browser as if it were a notebook), which is enabled by setting the following in ~/.ipython/profile_default/ipython_notebook_config.py (or similar):

    c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'

This is useful for more interactive work, e.g. iterating on plots, whilst still having everything stored in markdown. See [1] for more info.

[1]: https://github.com/aaren/notedown/issues/22


I've been a big fan of tkf's EIN. Unfortunately, tkf no longer maintains the project and I have not had good luck with the various EIN forks. (Anyone else having better experiences?) I still get by with the original EIN by using older versions of ipython/IPyNB. Moreover, the IPyNB technology is getting somewhat bloated with widgets and whatnot and I don't know how that will integrate with an emacs environment. Nevertheless, I still hope emacs modes similar to EIN will be forthcoming in this area. I was hoping to propose a code sprint on this topic at SciPy this year, but I won't be going to Austin, alas.



So weird, I was playing with IPython Notebook with PyCharm yesterday. Then I went on to think about how it could be integrated into GitHub without building the compiled files into the repo. :)

Really there's no solution for extending GitHub except to wait for them to integrate hooks. It would be nice if they made this more generic and created a file render hook that would allow devs to render whatever types of files they want into GitHub Markdown.


For the first time today I put an IPython notebook into a github repo, and thought "cool, they render on GitHub", only to read that they rolled it out today on HN.

Super neat.


How does IPython integrate with PyCharm?


PyCharm allows you to edit and run ipynb files directly – see here: http://blog.jetbrains.com/pycharm/2014/12/feature-spotlight-...


Just a general comment, but in my opinion IPython/Jupyter is worth learning Python for. It is such a great platform for tinkering with data or really any idea you might play with in code.


The notebooks support a very large number of kernels, including Julia, R, bash, Lua, Erlang, Perl, OCaml, and so many more[1].

[1] https://github.com/ipython/ipython/wiki/IPython-kernels-for-...


And in case you were wondering, GitHub does render notebooks using other language kernels. For example, here's a notebook in Haskell: https://github.com/gibiansky/IHaskell/blob/master/notebooks/...


I would also like to note that you can use any of the other languages that Jupyter supports. I am using R right now and it works, but it isn't as useful as RStudio.

I am guessing that the Jupyter notebook format might get traction with other projects besides JetBrains.


Hey this is great! I just took a look and saw that a notebook [1] I was tracking in github is now rendering beautifully - and I didn't have to do anything!

[1] https://github.com/rcompton/ml_cheat_sheet/blob/master/super...


They've really relaxed the file size restrictions for these - I don't think a 7 MB PDF will render on GitHub but this notebook will.


There's a new problem which still hasn't been resolved w.r.t. IPython notebooks and reproducible results, and that's the problem of private, ephemeral, or mutable data. At the very least, IPython theoretically helps with the audit trail for private and mutable data, but it doesn't necessarily foster reproducible results or alternative analysis.

This is a more general problem overall, but potentially something like magnet URIs and bittorrent could really help with part of the problem. (I don't really believe git as a system nor GitHub as a platform to be the appropriate place to solve this either).


Are you saying that people aren't storing data on GitHub? How large are the datasets you're talking about?

It's a shame that GitHub's Large File Storage (LFS) mechanism isn't publicly released yet, because I am sure it would address your problems. This is assuming your datasets don't fit into a CSV; note that the current hard limit for GitHub files is 100MB, and a 100MB CSV file is a lot of data.


It should be noted that there are reasons beyond file size not to store data on GitHub, though I do regularly generate simulation data that's way larger than 100 MB CSV files.


Can you list the reasons you have in mind for not storing data on GitHub? I would assume that you have a (paid) user or organization account and keep the repo private, unless it was for a published paper.

I was thinking more along the lines of data that you'd want to graph or present for analysis, as in a notebook or a paper. I would (hopefully) assume that your simulation data would be generated and used in such a way that it wouldn't need to be stored permanently. At least, when I was doing simulation-based analysis, I wasn't necessarily concerned about any individual run, but rather a combination of a bunch of simulations (all of which were ephemeral).


Because there are data use agreements for health data with personal identifiers, and the vast majority of them aren't going to accept "We're keeping a copy on a private Github repo".

Nor, to be blunt, should they.

My particular simulation work is rather interested both in individual runs (and indeed, individuals within those runs) as well as summarization.

Beyond that, what use is there in putting just the "summary" data online, when the underlying data that produced it still exists only as "you're just going to have to trust me"? Being able to replicate my figure code doesn't get people very far.


Just curious: why not just provide the code that generates the simulation data?


Because it's an absolutely massive code base that requires access to some serious HPC resources, which basically makes it difficult if not impossible to reproduce for most people. Putting something out there like that also implies something of an obligation to maintain and support it, which isn't what we do.

There's also a weird data rabbit hole. Is the code that generates the simulation data enough, without the underlying data that code uses? Some of that is either protected or proprietary, so even with the code, it's utterly useless.


Simulations may run for weeks or months on large computing clusters. People wanting the resulting data may not have suitable access and/or resource allocations to repeat the runs.


It helps with ephemeral data, unless I misunderstand you. Notebooks document how to generate the ephemeral data.

Surely, it does not help with private or mutable data, but how would magnet URIs and bittorrent help? If it is private, it cannot be shared with bittorrent either and sharing mutable data via bittorrent is not its primary use case.

Bittorrent might help with big data, where big means 100MB+ in the case of Github. There are other approaches for big files in git (git annex, Github's LFS, git-bigfiles, etc).


This also means that, as GitHub repos can get DOIs via FigShare, you can have a DOI for an IPython Notebook, which is awesome.


Okay, what is the best format to store images in? My notebooks contain lots of graphs and output but I'd rather avoid saving the raw pixels. So far I've been using SVG with mixed success. Does anyone use anything else to save space?


I just tried out https://pypi.python.org/pypi/ipynbcompress/0.3.0 and it worked pretty well. Need to use 'png' to get nice rendering on github.

The last cell here has my use https://github.com/rcompton/ml_cheat_sheet/blob/master/super...
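Before picking a format, it can help to see which outputs actually take up the space. A minimal sketch, assuming the notebook is in the nbformat 4 JSON layout (a top-level "cells" list whose code cells carry "outputs" with a MIME-keyed "data" dict); the function name and the inlined example notebook are made up for illustration:

```python
import json
from collections import Counter


def output_bytes_by_mime(notebook):
    """Sum the size in bytes of every rich output, grouped by MIME type."""
    totals = Counter()
    for cell in notebook.get("cells", []):
        # Markdown cells have no "outputs" key, so default to an empty list.
        for output in cell.get("outputs", []):
            # display_data / execute_result outputs keep a MIME bundle in "data".
            for mime, payload in output.get("data", {}).items():
                if isinstance(payload, list):
                    payload = "".join(payload)
                elif not isinstance(payload, str):
                    payload = json.dumps(payload)
                totals[mime] += len(payload.encode("utf-8"))
    return totals


# Hypothetical notebook, inlined so the sketch runs without any files.
example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "A plot:"},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "plot()",
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {
                        "image/png": "iVBORw0KGgo=",
                        "text/plain": ["<Figure>"],
                    },
                }
            ],
        },
    ],
}

totals = output_bytes_by_mime(example)
for mime, size in totals.most_common():
    print(mime, size)
```

For a real file, load it with `json.load(open(path))` and pass the result in; PNG payloads are base64 text, so the count reflects what is stored in the .ipynb rather than the decoded image size.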


Great idea! Although it appears that the Audio widget does not display correctly. Compare:

http://nbviewer.ipython.org/github/stevetjoa/stanford-mir/bl...

https://github.com/stevetjoa/stanford-mir/blob/master/notebo...


Hope support for R-markdown is next!


I was also a bit disappointed that there was no mention of R :( Anyway, I think it's possible to get a notebook for R using this: https://github.com/IRkernel/IRkernel

Look at this demo: https://github.com/IRkernel/IRkernel/blob/master/Demo.ipynb


So they're only displaying the notebooks, but not actually allowing you to work in them, right? Because for a minute there I was pretty excited!


Excited about IPython notebooks generally or possibility of editing them on GitHub?

If you want one to edit, try here: https://cloud.sagemath.com


There are a lot of tools for editing notebooks, and they're complex. I also don't use their online code editor, because mine is better. Through my personal experiences, my assumption is they'll never take on that project.


You cannot work on it yet.


It sounds like you expect this to change. If you could actually run the commands, where would the computing power come from?


If you go to https://try.jupyter.org/ , you'll get a temporary notebook server in a Docker container, courtesy of Rackspace. It's automatically destroyed after a few minutes of inactivity.


Editing does not necessarily imply running code either. Markdown cells could easily be edited without a running kernel.
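To make that concrete: a notebook file is plain JSON, so changing the text of a Markdown cell needs nothing more than a JSON parser. A rough sketch, assuming the nbformat 4 layout (top-level "cells", each with "cell_type" and "source"); `replace_markdown` is a hypothetical helper, not part of IPython:

```python
import json


def replace_markdown(notebook_json, cell_index, new_text):
    """Return notebook JSON text with one Markdown cell's source replaced."""
    notebook = json.loads(notebook_json)
    cell = notebook["cells"][cell_index]
    if cell["cell_type"] != "markdown":
        raise ValueError("cell %d is not a Markdown cell" % cell_index)
    cell["source"] = new_text
    return json.dumps(notebook, indent=1)


# Hypothetical one-cell notebook, inlined so no kernel or file is needed.
original = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": "Old text"}],
})

edited = replace_markdown(original, 0, "New *text*")
print(json.loads(edited)["cells"][0]["source"])
```

Code cells could be edited the same way; it is only re-running them to refresh their outputs that requires a kernel.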


Very thoughtful and awesome. Now, is there an easy way to save notebooks directly on GitHub (git as a storage system)? That would make the whole flow seamless and make sharing and versioning easy :)


Screen has both 'c' and 'C-c' set to new window by default. tmux only has 'c' by default.

From https://www.gnu.org/software/screen/manual/html_node/Default...:

  C-a c
  C-a C-c
  (screen)
  Create a new window with a shell and switch to that window. See Screen Command.

Edit: d'oh. I had to reread the original comment. You already know this, but by adding 'C-c' I don't have to release the Ctrl button.

You can add this to tmux.conf to get screen-like behavior:

  # New Window
  bind C-c new-window


The wires are crossed on HN or your tabs. Your tmux response is in the github-ipython notebook thread.


It would still be nice if IPython Notebook had an option to save the code separate from the output, for example as two files in the same directory. This would make source controlling easier.


Running `ipython notebook` with the additional option `--script` may be what you are looking for. This saves a `.py` file with the same name next to your `.ipynb` file (the non-code sections are present, but they are commented out).
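If you would rather do the split yourself, the notebook's JSON structure makes it fairly simple. This is only a sketch under the assumption of the nbformat 4 layout ("cells" with "source", "outputs" and "execution_count"); `strip_outputs` and `code_only` are hypothetical helper names, not IPython functions:

```python
import copy
import json


def strip_outputs(notebook):
    """Return a deep copy of the notebook dict with all code outputs removed."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def code_only(notebook):
    """Join the source of every code cell into one script-like string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # nbformat allows source to be a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            chunks.append(source)
    return "\n\n".join(chunks)


# Hypothetical notebook, inlined so the sketch runs without any files.
example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["x = 1 + 1\n", "print(x)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

clean = strip_outputs(example)
script = code_only(example)
print(json.dumps(clean, indent=1))
print(script)
```

Committing the stripped notebook (or the extracted script) keeps diffs limited to the code, while the version with outputs can live wherever results are shared.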


I just started using these for my data science work and it has been absolutely awesome. Even better now that they display on GitHub and apparently (based on a comment below) GitLab too.


Would be nice to have .ipynb preview on Gist also!


This is awesome! Similar to nbviewer.ipython.org functionality.


Awesome, thanks for sharing.



