I have been playing with the Cloud Vision API: I did some label and face detection. During this Google I/O there was a session where they talked about Mobile Vision. I understand both APIs are related to Google's machine learning offerings.
Can anyone explain (with use cases) when to use one over the other?
What sorts of applications can we build by using each?
There are many different sets of application requirements and use cases suited to one API or the other, so the decision is best made on a case-by-case basis.
It is worth noting that Mobile Vision offers real-time, on-device face detection and tracking, which the Cloud Vision API does not. Mobile Vision is geared towards the use cases most likely to be encountered on a mobile device, and comprises the Face, Barcode Scanner and Text Recognition APIs.
A use case for combining Mobile Vision with the Cloud Vision API would be one that requires on-device face detection as well as a feature specific to the Cloud Vision API, such as detecting inappropriate content in an image.
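As an illustration of the Cloud Vision side of such a combination, requesting face detection and safe-search screening together is just a matter of listing both features in one REST request. Here is a minimal Python sketch of the request body (the payload shape and feature names follow the public `images:annotate` REST reference; the actual authenticated POST to `https://vision.googleapis.com/v1/images:annotate` is omitted):

```python
import base64
import json

def build_annotate_request(image_bytes, features):
    # Cloud Vision's v1 images:annotate endpoint expects the image content
    # base64-encoded alongside a list of feature requests.
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": f, "maxResults": 10} for f in features],
            }
        ]
    }

# Combine face detection with safe-search (inappropriate content) screening.
payload = build_annotate_request(b"\x89PNG...", ["FACE_DETECTION", "SAFE_SEARCH_DETECTION"])
print(json.dumps(payload)[:60])
```

The on-device face detection half would be handled separately by the Mobile Vision Face API in the app itself.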
As I understand it, Google is planning to "wind down" Mobile Vision. From the documentation:
The Mobile Vision API is now a part of ML Kit. We strongly encourage you to try it out, as it comes with new capabilities like on-device image labeling! Also, note that we ultimately plan to wind down the Mobile Vision API, with all new on-device ML capabilities released via ML Kit. Feel free to reach out to Firebase support for help.
https://developers.google.com/vision/introduction?hl=en
I'm trying to build an app that does image recognition using the phone camera. I've seen a lot of videos where, using the camera, an app identifies where a person is, or what feelings they show, or similar things in real time.
I need to build an app like this. I know it's not an easy task, but I need to know which technologies can be used to achieve this in a mobile app.
Is it TensorFlow?
Are there libraries that help achieve this?
Or do I need to build a full machine learning / AI app?
Sorry for such a general question, but I need some insights.
Regards
If you are trying to do this for the iOS platform, you could use the starter kit here: https://developer.ibm.com/patterns/build-an-ios-game-powered-by-core-ml-and-watson-visual-recognition/ for step-by-step instructions.
https://github.com/IBM/rainbow is the repo it references.
You train your vision model on the IBM Cloud using Watson Visual Recognition, which only needs example images to learn from. You then download the model into your iOS app and deploy it with Xcode. The app will "scan" the live camera feed for the classes defined in your model.
I see you tagged TensorFlow (which is not part of this starter kit), but if you're open to other technologies, I think this would be very helpful.
Being a novice, I need advice on how to solve the following problem.
Say I have obtained a point cloud of part of my room with photogrammetry. I then upload this point cloud to an Android phone and want it to track its camera pose relative to this point cloud in real time.
As far as I know, there can be problems with differing camera intrinsics (a simple camera or another phone's camera vs. my phone's camera) that can affect the precision of localisation, right?
It's supposed to be an AR app, so I've tried existing SDKs - Vuforia, Wikitude, Placenote (I haven't tried ARCore yet because my device most likely won't support it). The problem is that they all use their own clouds for their services, and I don't want to depend on them. Ideally it's my own PC where I perform the 3D reconstruction and from which my phone downloads the point cloud.
I need SLAM (with IMU fusion) or VIO on my phone, don't I? Are there any ready-to-go implementations in libraries like ARToolKit or maybe PCL? Will an existing SLAM system pick up a map reconstructed with other algorithms, or should I use one and the same SLAM for both mapping and localisation?
So the main question is how to do everything ARCore and Vuforia do without using third-party servers. (I suspect the answer is to devise the same underlying layer that Vuforia and the other SDKs use to employ all available hardware.)
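On the intrinsics point raised above, a minimal pinhole-projection sketch shows why a map built with one camera does not line up pixel-for-pixel when observed through another (the two intrinsic matrices below are hypothetical values, just to illustrate the effect):

```python
import numpy as np

def project(K, point_cam):
    # Pinhole model: pixel = K @ (X/Z, Y/Z, 1), taking the first two components.
    x, y, z = point_cam
    uvw = K @ np.array([x / z, y / z, 1.0])
    return uvw[:2]

# Hypothetical intrinsics for two different phones (focal lengths differ).
K_a = np.array([[1200.0, 0.0, 640.0], [0.0, 1200.0, 360.0], [0.0, 0.0, 1.0]])
K_b = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])

p = (0.2, -0.1, 1.5)      # a 3D point 1.5 m in front of the camera
print(project(K_a, p))    # ≈ [800., 280.]
print(project(K_b, p))    # a noticeably different pixel for the same point
```

This is why localisation against a point cloud captured with a different device needs the tracking camera's own calibration, not the mapping camera's.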
I am very new to Kinect programming and have been tasked with understanding several methods for 3D point cloud stitching using Kinect and OpenCV. While waiting for the Kinect sensor to be shipped over, I am trying to run the SDK samples on some datasets.
I am really clueless as to where to start, so I downloaded some datasets here, and I don't understand how I am supposed to view/parse them. I tried running the Kinect SDK sample (DepthBasic-D2D) in Visual Studio, but the only thing that appears is a white screen with a screenshot button.
There seems to be very little documentation on how all these things work, so I would appreciate it if anyone could point me to the right resources on how to obtain and parse depth maps, or how to get the SDK samples working.
The Point Cloud Library (PCL) is a good starting point for handling point cloud data obtained using a Kinect and the OpenNI driver.
OpenNI is, among other things, open-source software that provides an API for communicating with vision and audio sensor devices (such as the Kinect). Using OpenNI you can access the raw data acquired with your Kinect and use it as input for PCL software that processes the data. In other words, OpenNI is an alternative to the official Kinect SDK that is compatible with many more devices and has great support and tutorials!
There are plenty of tutorials out there like this, this and these.
Also, this question is highly related.
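To make "parsing a depth map" concrete: a depth frame is just a per-pixel distance image, and turning it into a point cloud is a back-projection through the camera intrinsics. A minimal Python/NumPy sketch (the focal length 525 px is a commonly cited approximate value for the Kinect v1 depth camera; treat it and the toy frame as assumptions):

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    # Back-project each pixel's depth (millimetres) to a 3D point in metres
    # using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    v, u = np.indices(depth_mm.shape)
    z = depth_mm / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack([x, y, z])  # shape (rows, cols, 3)

# Toy 2x2 "depth frame" at 1000 mm everywhere, with hypothetical intrinsics.
depth = np.full((2, 2), 1000, dtype=np.uint16)
pts = depth_to_points(depth, fx=525.0, fy=525.0, cx=0.5, cy=0.5)
print(pts.shape)  # (2, 2, 3)
```

Stitching then amounts to estimating the rigid transform between two such clouds (e.g. with ICP, which PCL implements) and concatenating the transformed points.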
I just bought a Sony A7 and am blown away by the incredible pictures it takes, but now I would like to interact with and automate this camera using the Sony Remote Camera API. I consider myself a maker and would like to do some fun stuff: add a laser trigger with an Arduino, do some computer-controlled light painting, and some long-term (on the order of weeks) time-lapse photography. One reason I purchased this Sony camera over models from other famous brands such as Canon, Nikon, or Samsung is the ingenious Sony Remote Camera API. However, after reading through the API reference, it seems that many of the features cannot be accessed. Is this true? Does anyone know a workaround?
Specifically, I am interested in changing many of the manual settings that you can change through the camera's menu system, such as ISO, shutter speed, and aperture. I am also interested in taking HDR images in a time-lapse manner, and it would be nice to change this setting through the API as well. If anyone knows: why wasn't the API opened up to the whole menu system in the first place?
Finally, if any employee of Sony is reading this, I would like to make this plea: PLEASE PLEASE PLEASE keep supporting the Remote Camera API and improve upon an already amazing idea! I think the more control you offer to makers and developers, the more popular your cameras will become. You could create a cult following if you manage to capture the imagination of makers across the world and get just one cool project to go viral on the internet. Using HTTP POST commands is super awesome, because it is OS-agnostic and makes communication a breeze. Did I mention that it's awesome?! Sony's cameras will integrate nicely into the Internet of Things.
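For anyone curious what those HTTP POST commands look like: the Camera Remote API is JSON-RPC over HTTP. A minimal Python sketch of building the request body for triggering the shutter (the method name follows the public API reference; the service URL, typically something like `http://<camera-ip>:8080/sony/camera`, varies by model, so the actual POST is left out):

```python
import json

def camera_rpc(method, params=None, call_id=1):
    # Each Camera Remote API call is a JSON-RPC body naming a method
    # plus its parameters; the camera replies with a JSON "result".
    return json.dumps({
        "method": method,
        "params": params or [],
        "id": call_id,
        "version": "1.0",
    })

# Trigger the shutter remotely.
body = camera_rpc("actTakePicture")
print(body)
```

The same pattern, with a different method name and parameters, is how settings such as shutter speed would be driven once exposed by the API.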
I think the Remote Camera API strategy is better than those of Sony's competitors. Nikon and Canon have nothing comparable. The closest thing is Samsung gluing Android onto the Galaxy NX, but that is a completely unnecessary cost since most people already own a smartphone; all that needs to exist is a link that lets the camera talk to the phone, like the Sony API. Sony gets it. Please don't abandon this direction or the Remote Camera API, because I love where it is heading.
Thanks!
New API features for the lens-style cameras DSC-QX100 and DSC-QX10 will be rolled out during the spring of 2014. Shutter speed functionality, white balance, ISO settings and more will be included! Check out the official announcement here: https://developer.sony.com/2014/02/24/new-cameras-now-support-camera-remote-api-beta-new-api-features-coming-this-spring-to-selected-cameras/
Thanks a lot for your valuable feedback. It's great to hear that the APIs are being used, and we look forward to seeing nice implementations!
Peter
I am trying to do some work using Kinect and the Kinect SDK.
I was wondering whether it is possible to detect facial expressions (e.g. a wink or a smile) using the Kinect SDK, or to get raw data that can help in recognising them.
Can anyone kindly suggest any links for this? Thanks.
I am also working on this, and I considered two options:
Face.com API:
there is a C# client library and there are a lot of examples in their documentation
EmguCV
This guy talks about basic face detection using EmguCV and the Kinect SDK, and you can use this to recognise faces
I have currently stopped developing this, but if you complete it, please post a link to your code.
This is currently not featured in the Kinect for Windows SDK, due to the Kinect's limitations in producing high-resolution images. That said, libraries such as OpenCV and AForge.NET have been successfully used for finger and face detection, both on the raw images returned by the Kinect and on RGB video streams from webcams. I would use these computer vision libraries as a starting point.
Just a note: MS is releasing the "Kinect for PC" along with a new SDK version in February. It has a new "Near Mode" which will offer better resolution for close-up images. Face and finger recognition might be possible with this. You can read an MS press release here, for example:
T3.com
The new Kinect SDK 1.5 has been released and includes face tracking and detection.
You can download the latest SDK here, and check this website for more details about Kinect face tracking.