That's a... very misleading name. It's called "Cloud" vision, but then says:
> Cloud Vision does not track, record or send any images, videos, or information provided by the user to any server. All image processing is performed within the application.
Interesting, this is powerful. UI.Vision is using OpenCV/WebAssembly for their image recognition features: "...run automated visual UI tests inside the web browser and on the desktop, powered by WebAssembly." (https://ui.vision/rpa/docs/visual-ui-testing)
You can install the free Chrome/Firefox extension to test it. In general, I continue to be amazed by how powerful the WebAssembly concept is.
Good stuff! It's nice to see your port of OpenCV to the browser succeed, and a lot of people would be very interested in adopting this! But you may not see more than a 2x speedup over raw pixel manipulation of the 2D canvas.
WebGPU holds a lot of promise for fast image processing on the client. A 10x boost is not uncommon on RTX 2000-series devices ;)
How would one benchmark or compare OpenCV running as native code versus as non-native embedded code?
Asking since the intent of the code is not about embedding OpenCV in a browser, but about offloading the computation workload of many users from a server to each end user's computer.
Might be wrong, but for a single user, this setup would likely not be optimal.
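One low-tech way to compare the two builds is to time the same operation at the same input size in each environment and compare medians. Here's a minimal sketch of a timing harness for the native side, using a hand-rolled NumPy box filter as a stand-in for an OpenCV kernel (an assumption for illustration; on the browser side you'd time the equivalent call through OpenCV.js):

```python
import time
import numpy as np

def box_blur_3x3(img: np.ndarray) -> np.ndarray:
    """3x3 box filter via shifted slices -- a stand-in for an OpenCV blur."""
    padded = np.pad(img, 1, mode="edge")
    acc = np.zeros_like(img, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return acc / 9.0

def bench(fn, arg, warmup=3, runs=20):
    """Median wall-clock time of fn(arg) in milliseconds."""
    for _ in range(warmup):
        fn(arg)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(arg)
        times.append((time.perf_counter() - t0) * 1000)
    return sorted(times)[len(times) // 2]

frame = np.random.default_rng(0).random((480, 640))
print(f"box_blur_3x3 on 640x480: {bench(box_blur_3x3, frame):.2f} ms")
```

The median (rather than the mean) keeps one-off GC or JIT pauses from skewing the comparison, which matters more on the WASM side.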
This is probably the first time I am excited to hear about "X in webassembly".
While it'll likely be a lot slower than a native implementation, the benefit of an image never leaving your computer could unearth some interesting applications (for example, I am very hesitant to use online OCR services and use them only for data which is public anyway).
Some time ago I built an OpenCV-based real-time masks plugin for my videoconference tool, but unfortunately had to limit it to the single-threaded WASM version because of browser support. That resulted in 320x240 video when the mask was on. However, as an experiment I also ran an 8-threaded version locally, and its performance on a laptop from 2015 was more than enough for a nearly 30fps stream at a standard video size.
Does anyone have a good recommendation for resources to learn a bit about computer vision? I don't want to go too in depth, but I'd like to learn the basics.
In general, I like Adrian's site, but over the years he has been locking more and more content (even blog posts that used to be available on the site) behind his courses. Not that there's anything wrong with charging for knowledge (you know, teaching and getting paid for it), but it's a shame that things that used to be available, and of great quality, have been "pay-walled" somehow.
Another good resource is https://learnopencv.com/ . Again, good quality stuff; if you know your way around, it tends to be enough[0], but it's also a big funnel to get you to buy one of their courses.
[0] though Satya Mallick does hide a lot of complexity from the readers, and that bites you if you try to implement things on your own
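To get a feel for the basics before committing to a course, it can help to implement one or two classic operations by hand. A sketch of grayscale conversion plus Sobel edge detection in plain NumPy (these are the textbook formulas, not any particular tutorial's code):

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Luma-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Gradient magnitude from the horizontal/vertical Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    p = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w), dtype=np.float64)
    gy = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            window = p[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)

# A vertical step edge should produce a strong response at the boundary.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
print(f"max gradient magnitude: {edges.max():.1f}")
```

Once the hand-rolled version makes sense, the OpenCV equivalents (cvtColor, Sobel) are the same ideas with far better performance.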
This is really cool! WebAssembly is such a great concept and I can’t wait for a better way to manage and preload all these wasm libs.
On my iPhone 11, it requested access to the camera and showed the image; it says it's running at 60fps and using the camera, but it only captured a single frame.
Does anyone know how to implement a virtual webcam in python on MacOS? I want to implement something like zoom's background replacement but I can't find a way to represent the output as a webcam that can be used as an input by various conferencing apps.
A while ago I used OBS and Python+OpenCV to make a goofy webcam where I had my face replaced by David Bowie's -- not using deep learning though, just "plain old" face detection+landmarking+morphing.
So I know it can be done, but I'm not sure how things have changed (this was some 3 or 4 years ago), so I can't really give you many details other than that it's possible.
This is definitely not optimal and would be an overkill setup, but OBS Studio + the VirtualCam plugin lets you basically screen-capture anything and turn it into a webcam device. So if your Python app can display a video feed, you can capture the window and expose it as a webcam (with OBS as overhead).
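Another route worth a look is the third-party pyvirtualcam package, which feeds frames to the OBS virtual camera device (on macOS that device must be installed separately). A rough sketch, where make_frame is a hypothetical placeholder for your actual background-replacement pipeline:

```python
import numpy as np

def make_frame(t: float, width: int = 640, height: int = 480) -> np.ndarray:
    """Placeholder frame: an RGB gradient that shifts over time.
    In a real background-replacement pipeline this is where you'd
    composite the segmented foreground over your replacement image."""
    x = np.linspace(0, 255, width, dtype=np.uint8)
    frame = np.zeros((height, width, 3), dtype=np.uint8)
    frame[:, :, 0] = x                  # red ramp, left to right
    frame[:, :, 1] = int(t * 50) % 256  # green cycles over time
    return frame

def main():
    # pyvirtualcam is a third-party package; on macOS it talks to the
    # OBS virtual camera, so conferencing apps see a normal webcam.
    import pyvirtualcam
    with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
        t = 0.0
        while True:
            cam.send(make_frame(t))     # expects RGB uint8 frames
            cam.sleep_until_next_frame()
            t += 1 / 30

# Call main() to start streaming to the virtual camera.
```

The upside over the OBS screen-capture route is that frames go straight from your NumPy/OpenCV pipeline to the virtual device, with no window capture in between.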
Sounds interesting but it's a little unclear what's running at what layer - it sounds like the JS code from OpenCV.js is now running in webassembly? And how much of OpenCV is still running in native code, e.g. in prebuilt OpenCV libraries?
From what I can tell, the OpenCV C++ code is compiled directly to wasm bytecode; the only JS part is some helper code to let you easily call wasm functions from your own code.
Love this, OpenCV and all WebAssembly projects. I also use next-translate now and then so kudos for that!
I was putting AR in the browser back when Java applets with JOGL were a thing! I've been nominated twice this year for the Webby awards for AI and AR in the browser (1). We're a small, innovative team that has been utilising Emscripten and like-minded technologies for a few years, from when Emscripten and WebRTC were starting to be a thing.
I wanted to share some pain points taking this tech to production.
- Bandwidth
This is huge with OpenCV: ~4.5 MB+ just to take a picture is quite a difficult bandwidth cost to accept, especially as the clients I worked with have millions of views per day. The total binary for Max Factor VMUA (2) is the same size, and that includes a large data set needed for a neural network for skin tone analysis and face feature detection.
Learning: do not include all of OpenCV. You don't need it all; cherry-pick the parts you need. I do recommend writing the simple parts yourself (this is for those of you who just use cv::Mat!).
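For reference, OpenCV's JS build already supports this kind of cherry-picking: the whitelist in platforms/js/opencv_js.config.py controls which functions get bindings, so unlisted modules can be stripped from the binary. A trimmed configuration might look roughly like this (the function lists here are illustrative, not a recommendation):

```python
# opencv_js.config.py (excerpt): only whitelisted functions get JS bindings
core = {'': ['absdiff', 'add', 'convertScaleAbs', 'minMaxLoc', 'split']}
imgproc = {'': ['Canny', 'GaussianBlur', 'cvtColor', 'resize', 'threshold']}

# makeWhiteList is defined in the real config file; dropping a module
# from this list keeps its bindings out of the wasm binary entirely.
white_list = makeWhiteList([core, imgproc])
```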
- Speed
If you want a 60 FPS AR effect / AI algo on an Android device, OpenCV isn't always the fastest approach. Do not rely on a framework; you will need to get your hands dirty and optimise/rewrite the slow areas. WebAssembly is fast, but not as fast as the desktop and native environments you normally develop this code in.
- Market
Not everyone has an iPhone in London. Bandwidth means seconds, and JS and WebAssembly execution adds to this. In a world where m-commerce is king, this does matter. Think Poland, the middle of nowhere in Ohio, Brazil, etc. If it takes 60 seconds for a web app to load on 3G, then another 20 for the executable to start, and the experience is then sluggish, it won't be commercially successful.
- UX
When you put this on a large site, most traffic will come via Instagram and Facebook. On iOS this is typically within a WKWebView, which does not support getUserMedia. Make sure you have some nice hints on how to open the page in iOS Safari (or Android Chrome, if the parent app has not enabled permissions).
Nevertheless, I wish this blog post had existed when I started out; I regret not writing something similar. I especially love the simplicity of the Emscripten pipeline. It's a fantastic demo post, and I hope it inspires many to play with this innovative stack.