Hacker News
OpenCV in the browser using WebAssembly and web workers (aralroca.com)
316 points by aralroca on May 8, 2020 | 38 comments


Here's a link to a demo you can try in your browser (not my demo): https://huningxin.github.io/opencv.js/samples/video-processi...


This app lets you play around with OpenCV, like jsfiddle or codepen: https://cloudvision.app/


That's a... very misleading name. It's called "Cloud" vision, but then says:

> Cloud Vision does not track, record or send any images, videos, or information provided by the user to any server. All image processing is performed within the application.


Wow I am impressed at how smoothly it runs in the browser. Was not expecting it to reach 30fps, I thought WASM builds were still pretty unoptimized.


That's a much better demo, but depending on your webcam size, it'll overlap on the controls and block you from changing the filters.


Interesting, this is powerful. UIVision is using OpenCV/WebAssembly for their image recognition features: "...run automated visual UI tests inside the web browser and on the desktop, powered by WebAssembly." (https://ui.vision/rpa/docs/visual-ui-testing)

You can install the free Chrome/Firefox extension to test it. In general, I continue to be amazed at how powerful the WebAssembly concept is.



Good stuff! It's nice to see your port of OpenCV to the browser succeed, and a lot of people would be very interested in adopting this! But you may not see more than 2x speedup over raw pixel manipulation of the 2D canvas.

WebGPU holds a lot of promise for fast image processing on the client. 10x boost is not uncommon for RTX 2000 devices ;)
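
For reference, "raw pixel manipulation of the 2D canvas" usually means looping over the RGBA bytes of an ImageData buffer in plain JavaScript. A minimal sketch of that baseline, assuming a grayscale filter and the common Rec. 601 luma weights (the function name toGrayscale is illustrative, not taken from the article):

```typescript
// Converts an RGBA pixel buffer (the layout used by ImageData.data) to
// grayscale in place. Alpha bytes are left untouched.
function toGrayscale(pixels: Uint8ClampedArray): void {
  for (let i = 0; i < pixels.length; i += 4) {
    // Weighted sum of R, G, B; Uint8ClampedArray rounds and clamps on write.
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    pixels[i] = luma;
    pixels[i + 1] = luma;
    pixels[i + 2] = luma;
  }
}
```

In a browser this would typically be applied to the buffer returned by ctx.getImageData(...) and written back with ctx.putImageData(...).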


Is there a WebGPU-based alternative that has the functionality of OpenCV?


Not sure of the status, but there is a project to get WebGPU working as a backend for OpenCV.js https://summerofcode.withgoogle.com/projects/#53528827207352...


How would one benchmark or compare running OpenCV from native code and non-native embedded code?

I ask because the intent of the code seems to be not embedding OpenCV in a browser as such, but offloading the computation workload for one or more users from a server to the end user's computer.

Might be wrong, but for a single user, this setup would likely not be optimal.
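
One hedged way to approach the comparison is to time the same operation on the same input in each environment and compare medians rather than single runs. The helper below is a sketch for the browser/WASM side only; the name medianRuntimeMs and the default warm-up count are assumptions, and the native side would need an equivalent loop in its own language:

```typescript
// Runs `task` a few times to warm up (JIT, caches), then `runs` more times,
// and returns the median wall-clock duration in milliseconds.
function medianRuntimeMs(task: () => void, runs: number, warmupRuns: number = 3): number {
  if (runs < 1) {
    throw new Error("runs must be at least 1");
  }
  for (let i = 0; i < warmupRuns; i++) {
    task();
  }
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    task();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}
```

The median is used because a single garbage-collection pause or scheduler hiccup can skew a mean badly on short tasks.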


This is probably the first time I am excited to hear about "X in webassembly".

While it'll likely be a lot slower than native implementation, the benefits of an image not leaving your computer could unearth some interesting applications (for example, I am very hesitant to use online OCR services and use them only for data which is public anyway).


Some time ago I built an OpenCV-based real-time masks plugin for my videoconference tool, but unfortunately had to limit it to a single-threaded WASM version because of browser support. That resulted in 320x240 videos when the mask was on. However, as an experiment I also ran an 8-threaded version locally, and its performance on a laptop from 2015 was more than enough for an almost 30fps stream with a standard video size.

If anyone is interested you may check it out on https://xroom.app or even contribute with ideas and commits: https://github.com/punarinta/xroom-plugins/tree/master/nisdo.... For the masks I'd still recommend a desktop browser though.

If you're curious but don't want to bother clicking too much, here's what it all looks like: https://imgur.com/93YR87e


Does anyone have a good recommendation for resources to learn a bit about computer vision? I don't want to go too in depth, but I'd like to learn the basics.


OpenCV (demos) are a good way to get started with "old-school" computer vision. For machine-learning kind of computer vision see for example https://experiments.withgoogle.com/collection/ai


If you know Python, this is an excellent resource:

https://www.pyimagesearch.com/


In general, I like Adrian's site but over the years he has been locking more and more content (even blog posts that used to be available in the site) behind his courses. Not that there's anything wrong with charging for knowledge (you know, teaching and getting paid for it), but it's a shame that things that used to be available, and of great quality, have been "pay-walled" somehow.

Another good resource is https://learnopencv.com/ . Again, good quality stuff; if you know your way around it tends to be enough[0], but it's also a big funnel to get you to buy one of their courses.

[0] though Satya Mallick does hide a lot of complexity from the readers, and that bites you if you try to implement things on your own


This is really cool! WebAssembly is such a great concept and I can’t wait for a better way to manage and preload all these wasm libs.

On my iPhone 11, it requested access to the camera and showed the image, says it’s running at 60fps and is using the camera, but it only captured a single frame.


On an iPhone 11 it is such a waste though :-) Ported today some image processing code from Swift to Metal. Speedup factor: 1000.


It's not just on iPhone, that demo is not very good, I think it just takes a single frame and makes it grayscale. Try these: https://huningxin.github.io/opencv.js/samples/



Where did you find the demo? I can't see any links in the article. Edit: others posted links


Is this an iOS dependent thing? I can’t get the demo working on my iPhone 6+.


Does anyone know how to implement a virtual webcam in Python on macOS? I want to implement something like Zoom's background replacement, but I can't find a way to represent the output as a webcam that can be used as an input by various conferencing apps.


CamTwist lets you make custom effects and has an SDK[0], you may be able to interface it with Python using PyObjC[1].

It also has direct script functionality but I think the SDK is more powerful.

[0]: http://camtwiststudio.com/developers-objective-c/ [1]: https://pyobjc.readthedocs.io/en/latest/


A while ago I used OBS and Python+OpenCV to do a goofy webcam where I'd had my face replaced by David Bowie's -- not using deep learning though, just "plain old" face detection+landmarking+morphing.

OBS: https://obsproject.com/es

So I know it can be done, but I'm not sure how things have changed (this was some 3 or 4 years ago), so I can't really give you many details other than that it's possible.


This is definitely not optimal and would be an overkill setup, but OBS Studio + the VirtualCam plugin lets you basically screen-capture anything and turn it into a webcam device. So if your Python app can display a video feed, you can capture the window and show it as a webcam (with OBS as overhead).


VirtualCam is not available on macOS.


Don’t know about Python, but Snap Camera lets you develop your own filters, and takes care of all the virtual device stuff for you.


Why does this use a polling routine to check if the worker is available/finished? Can't this be done more "elegantly"?
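
One common alternative to polling is to tag each request with an id and settle a Promise when the worker posts back a message carrying the same id. A minimal sketch, assuming the worker echoes the id in its reply; the WorkerClient class, the PortLike interface, and the message shape are all hypothetical and not the article's code:

```typescript
// Minimal shape shared by a browser Worker and test doubles.
interface PortLike {
  postMessage(message: unknown): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Assumed reply format: the worker sends back the id it received.
interface Reply {
  id: number;
  payload: unknown;
}

class WorkerClient {
  private port: PortLike;
  private nextId = 0;
  private pending = new Map<number, (payload: unknown) => void>();

  constructor(port: PortLike) {
    this.port = port;
    // One listener routes every reply to the Promise waiting on its id.
    port.onmessage = (event) => {
      const { id, payload } = event.data as Reply;
      const resolve = this.pending.get(id);
      if (resolve) {
        this.pending.delete(id);
        resolve(payload);
      }
    };
  }

  // Sends a request and returns a Promise for the matching reply.
  request(msg: string, payload: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.port.postMessage({ id, msg, payload });
    });
  }

  // Number of requests still waiting for a reply.
  pendingCount(): number {
    return this.pending.size;
  }
}
```

Calling code can then simply await client.request(...) instead of checking a status flag on a timer. A production version would also want to reject on worker errors and time out stale requests.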


Sounds interesting but it's a little unclear what's running at what layer - it sounds like the JS code from OpenCV.js is now running in webassembly? And how much of OpenCV is still running in native code, e.g. in prebuilt OpenCV libraries?


From what I can tell, the OpenCV C++ code is compiled directly to wasm bytecode; the only JS part is some helper code to let you easily call wasm functions from your own code.
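
To illustrate the general shape of "helper code that calls wasm functions", here is a toy sketch that is unrelated to OpenCV's actual glue code. It loads a tiny hand-assembled module exporting a single add function and wraps it in a typed helper; the module bytes and the names wasmAdd and add are assumptions made for the example:

```typescript
// Hand-assembled WebAssembly module exporting add(i32, i32) -> i32.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Synchronous compile and instantiate is acceptable for a module this small.
const wasmInstance = new WebAssembly.Instance(new WebAssembly.Module(addModuleBytes));
const wasmAdd = wasmInstance.exports.add as (a: number, b: number) => number;

// Thin JS-side helper: validate arguments, then call into wasm.
function add(a: number, b: number): number {
  if (!Number.isInteger(a) || !Number.isInteger(b)) {
    throw new TypeError("add expects integer arguments");
  }
  return wasmAdd(a, b);
}
```

Real glue code (for example, what Emscripten generates) additionally handles memory allocation and copying data such as image buffers into and out of the wasm heap.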


I used OpenCV in WASM to create a RoughJS version of an image a couple of years ago. https://pshihn.github.io/rough-draw/


Love this, OpenCV and all WebAssembly projects. I also use next-translate now and then so kudos for that!

I have been putting AR in the browser since Java applets with JOGL were a thing! I've been nominated twice this year for the Webby awards for AI and AR in the browser (1). We are a small innovative team who have been using Emscripten and like-minded technologies for a few years, from when Emscripten and WebRTC were starting to be a thing.

I wanted to share some pain points taking this tech to production.

- Bandwidth

This is huge with OpenCV: ~4.5 MB+ just to take a picture is a difficult bandwidth cost to accept, especially since the clients I worked with have millions of views per day. The total binary for Max Factor VMUA (2) is the same size, and that includes a large data set needed for a neural network for skin tone analysis and face feature detection.

Learning: Do not include all of OpenCV. You probably don't need it all; cherry-pick only the parts you use. I also recommend writing the simple parts yourself (this is for those of you who just use cv::Mat!).

- Speed

If you want a 60 FPS AR effect / AI algo on an Android device OpenCV isn't always the fastest approach. Do not rely on a framework, you will need to get your hands dirty and optimise/rewrite the slow areas. WebAssembly is fast, but not as fast as the desktops and native environments you normally create this code on.

- Market

Not everyone has an iPhone in London. Bandwidth means seconds, and JS and WebAssembly execution adds to this. In a world where m-commerce is king, this matters. Think Poland, the middle of nowhere in Ohio, Brazil, etc. If it takes 60 seconds for a web app to load on 3G, another 20 for the executable to start, and the experience is then sluggish, it won't be commercially successful.

- UX

When you put this into a large site, most traffic will come via Instagram and Facebook. On iOS this is typically within a WKWebView, which does not support getUserMedia. Make sure you have some nice hints on how to open the page in iOS Safari (or Android Chrome if the parent app has not enabled permissions).
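
For the UX point above, a page can feature-detect camera support before showing the camera UI and fall back to an "open in Safari/Chrome" hint otherwise. A minimal sketch, assuming the check is done against a navigator-like object; the NavigatorLike interface and the canUseCamera name are hypothetical:

```typescript
// Only the part of `navigator` this check needs, so it can be tested without a browser.
interface NavigatorLike {
  mediaDevices?: { getUserMedia?: unknown };
}

// Returns true when the environment exposes navigator.mediaDevices.getUserMedia.
function canUseCamera(nav: NavigatorLike | undefined): boolean {
  return typeof nav?.mediaDevices?.getUserMedia === "function";
}
```

In a page this would be called as canUseCamera(navigator). Note that a true result only means the API exists; the user or the embedding app can still deny permission when getUserMedia is actually called.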

Nevertheless, I wish this blog post had existed when I started out, and I regret not writing something similar myself. In this post I especially love the simplicity of the Emscripten pipeline. It is a fantastic demo post, and I hope it inspires many to play with this innovative stack.

1) https://twitter.com/Holition/status/1258068773623431177 2) https://www.maxfactor.com/vmua/


Nice tutorial. Can you do one with vanilla JS, as I don't use React?


I wonder what the filesize of opencv.js is..


It is linked at the end of the article:

https://github.com/vinissimus/opencv-js-webworker/blob/maste...

7.75MB.
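
To put that size in the context of the bandwidth concerns raised elsewhere in the thread, transfer time is roughly size × 8 / throughput. A small sketch; the throughput figure in the usage note is an illustrative assumption rather than a measurement, and the estimate ignores latency and any HTTP compression:

```typescript
// Estimated transfer time in seconds for `megabytes` of data (1 MB = 10^6 bytes)
// over a link with the given throughput in megabits per second.
function downloadSeconds(megabytes: number, megabitsPerSecond: number): number {
  if (megabitsPerSecond <= 0) {
    throw new Error("throughput must be positive");
  }
  return (megabytes * 8) / megabitsPerSecond;
}
```

For example, downloadSeconds(7.75, 1.5) comes to about 41 seconds on a hypothetical 1.5 Mbit/s connection.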


Actually not bad



