Microsoft Speaker Identification Algorithm - voice-recognition

I've been developing application that used Speaker Recognition API by Microsoft especially Speaker Identification.
Is there any paper or journal that related to how Microsoft Speaker Identification works?
Thank you.

If you are interested in algorithms it is better to read modern research on the subject, not specifically Microsoft papers. You can start with
DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION by David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel, Sanjeev Khudanpur
Which describes state of the art approach. Moreover, the code and models for the above is available in Kaldi project.

Related

What is the difference between Google Cloud Vision API and Mobile Vision?

I have been playing with cloud vision API. I did some label and facial detection. During this Google I/O, there is a session where they talked about mobile vision. I understand both the APIs are related to machine learning in Google Cloud.
Can anyone explain (use-cases) when to use one over another?
what sort of applications that we can build by using both ?
There can be many various sets of application requirements and use cases befitting one of the APIs or the other, so the best decision is to be taken on an individual basis.
It is worthwhile noting that the Mobile Vision offers facial recognition, whereas Cloud Vision API does not. Mobile Vision is geared towards the likeliest use-cases to be encountered in a mobile device environment, and encompasses the Face, Barcode Scanner and Text Recognition APIs.
A use case for both the Mobile Vision set and the Cloud Vision API would require face recognition as well as one of the features specific to the Cloud Vision API, such as the detection of inappropriate content in an image.
Google is planning to "wind down" Mobile Vision as I understood.
The Mobile Vision API is now a part of ML Kit. We strongly encourage you to try it out, as it comes with new capabilities like on-device image labeling! Also, note that we ultimately plan to wind down the Mobile Vision API, with all new on-device ML capabilities released via ML Kit. Feel free to reach out to Firebase support for help.
https://developers.google.com/vision/introduction?hl=en

nuance : Speech Recognition ,algorithm for answering the user's queries

For voice activated apps(virtual assistants) like (aivc for android ), which uses nuance speech recognition, which API is used to to get the answer to the user's question. FOr example, if the user types "what is your name" the ap gives an answer.
Do we have a standard algorithm which understands the user's query and gets a probable answer.
Nuance has several speech recognition platforms/products, it depends on which one you use. For your situation, probably start out with Nuance Mix, it'a self-service speech recognition platform targeted for mobile developers. There's also Nina too which is a more full service offering from Nuance.

Good speech recognition engine for Mac, not iOS?

Sorry if this is a repeat question, but I didn't see it anywhere.
I'm working on a Mac program that will take voice commands, and NSSpeechRecognizer isn't quite doing it for me.
I want something a little more dynamic so I can set alarms, make dates, give more natural commands, etc.
Every open source speech engine I've found is tailored toward iOS. Do openears/vocalkit etc. still work just as fine for Mac programs?
Speech recognition is exceptionally non-trivial. The engines that are free are free for a reason. If you expect dictation in any amount (like an alarm label), you're out of luck. There are reasons Siri requires an entire data center. The open source packages available won't get you much further than simple telephone auto-attendants.
Unless you have an extensive statistics background and free time, I'd recommend that you pursue licensing a commercial library or server implementation.
pocket sphinx from Carnegie Melon is about the only option
http://cmusphinx.sourceforge.net/

Official Kinect SDK vs. Open-source alternatives

Where do they differ?
What are the advantages of choosing libfreenect or OpenNI+SensorKinect, for example, over the Official SDK, and vice-versa?
What are the disadvantages?
Please note that the below answer is per date and some facts may very well be outdated in the near future. Current state of the Official Kinect SDK is beta 1.00.12.
The first obvious difference is that the official SDK is maintained by the Microsoft Research team while OpenKinect is an open source SDK maintained by the open source community. Both has its cons and pros.
The Official SDK is developed by Microsoft which also develops the hardware and therefore should know internal information about the device that the open source society must reverse engineer. Obviously this is to Microsoft's advantage.
Microsoft is pouring a lot of money into this device and I am sure that they will do what they feel is necessary to keep their SDK up to par. Having economy behind it gives many advantages.
On the other hand, never underestimate the force of the open source society: "The OpenKinect community consists of over 2000 members contributing their time and code to the Project. Our members have joined this Project with the mission of creating the best possible suite of applications for the Kinect. OpenKinect is a true "open source" community!" - http://openkinect.org/wiki/Main_Page.
OpenKinect was released long before the official SDK as the kinect device was hacked on the first or second day of its release. Kudos to OpenKinect!
Programming languages supported:
Official SDK: C++, C#, or Visual Basic by using Microsoft Visual Studio 2010.
OpenKinect: Python, C, C++, C#, Java, Lisp and more! Obviously not requiring Visual Studio.
Operating systems support:
Official SDK: only installs on Windows 7.
OpenKinect: runs on Linux, OS X and Windows
Clearly advantage OpenKinect.
License:
The Official SDK is in its current beta state only for testing. The SDK has been developed specifically to encourage wide exploration and experimentation by academic, research and enthusiast communities. commercial applications are not permitted. Note however that this will probably change in future releases of the SDK. Visit the FAQ for more information
OpenKinect appers to be open for commercial usage, but online sources state that it may not be that simple. I would take a good look at the terms before releasing any commercial apps with it. Read Kinect – Licensing implications of open hardware projects for more info.
Documentation and support:
Official SDK: well documented and provides a support forum
OpenKinect: appears to have a mailing list, twitter and irc. but no official forum/QA? Documentation on website is not as rich as I would like it to be.
Device calibration:
Different Kinect devices may differ slightly depending on the batch that they were produced in. Thus device calibration is sometimes required. But:
the Official SDK does not provide any calibration settings but I have so far not had to calibrate the device I am working on. According to something I read online (link lost) at production time the calibration parameters are written to the kinect device, so with the Official SDK calibration is not needed.
OpenKinect features device calibration: http://openkinect.org/wiki/Calibration. Thus I believe that you should calibrate your device if you go with OpenKinect.
If its true that calibration is only needed for OpenKinect that is a big advantage for the official SDK as it is easier to distribute and install applications without such need.
Personally, after a failed try with the OpenKinect SDK I went with the official SDK, which
came with drivers that installed out of the box
came with examples and code for easy getting into business
All-in-all: I could start my own development within 15 minutes or so.
Now, after working with the Kinect for a few months, I have to say that I am quite satisfied with the API provided. I cannot however compare it to the OpenKinect SDK as I in fact never got it working (but perhaps it didn't give it a fair try).
UPDATE: As of February 1st 2012 there is a commercial license for the official SDK:
"The commercial license for this release authorizes development and distribution of commercial applications. The prior SDK was a beta, and as a result was appropriate only for research, testing and experimentation, and was not suitable for use with a final, commercial product. The new license will enable developers to create and sell their Kinect for Windows applications to end user customers using Kinect for Windows hardware on Windows platforms."
Developer Frequently Asked Questions
As explained by Avada Kedavra in his/her answer, these are some interesting differences:
supported operating systems: you can only use Microsoft SDK on Windows, while open source solutions are usually able to work on other operating systems;
programming languages: you have a wider choice with open source solutions, while Microsoft only supports C++ and C# (Visual Basic is no more supported with SDK 2.0);
documentation and support: Microsoft offer a good forum and a well done documentation (with a lot of samples); but there are several open source solution well documented;
license: Microsoft is less or more proprietary, open source is less or more free. Consider also that open source ideas have sometimes been bought by big companies, and transformed in something that is no more open. Probably yours will not be the case, but keep in mind this additional eventuality.
In my personal opinion, the most significant difference between open source solutions and Microsoft SDKs is strictly related to the skeletal tracking algorithm.
While depth and RGB data can be effectively provided by both open/free APIs and Microsoft SDKs, implementing skeletal tracking capabilities is not only a matter of reverse engineering.
To implement such an algorithm, developers must have strong competences in pattern recognition and machine learning areas, and I am quite sure that such kind of knowledge is available among the open source community. But the implementation of skeletal tracking is based on a "trained" algorithm, that requires a lot of experiments to collect very large amount of data. These data are then used to "train" the algorithm, that can recognize the skeletal joints.
Getting enough data, but also adjusting and properly using them, requires a lot of time and money. Microsoft researchers and developers are in the best conditions to work on this kind of stuff, simply because it is their job.
In my previous experiences, I noticed that open source solutions provide good skeletal tracking capabilities, but they are not at the same level of what Microsoft offers with its SDK.
Remember also that Microsoft SDK provide a lot of additional capabilities, like facial recognition or joint orientation, and several widgets very useful if you want to fastly build a gestural GUI.
So what I suggest is: if you are working on a project in which you simply need depth and/or RGB data, or if you have the necessity to use a programming language that is not supported by Microsoft SDK, then you should opt for open source solution. Otherwise, Microsoft SDK would be my best choice.
I would strongly recommend the Cinder framework. (libcinder.org)
It supports both OpenNI and Kinect develoment, if you're using C++. It now supports Kinect SDK 1.7 and OpenNI 2, via these Cinderblocks:
MS Kinect SDK 1.7 (stable)
https://github.com/BanTheRewind/Cinder-MsKinect
OpenNI 2 / NITE 2.2 (alpha)
https://github.com/wieden-kennedy/Cinder-OpenNI
Both can do skeletal tracking out of the boz, OpenNI being capable of tracking up to six skeletons simultaneously. OpenNI 2 is gaining rapidly on the Kinect, although the new Kinect will probably change that when it comes out next month. However the basic underlying principles are unlikely to change.
The main drawback with the initial release of OpenNI was that it required a full body activation pose to recognise a user, which was a deal breaker for a lot of applications - however this seems to have been solved in the newer versions and OpenNI 2 also supports robust hand tracking at close range, although it still requires a focus gesture initially. If you work on Mac or Linux, it's pretty much your only choice.

Where should I begin with HDLs?

I am a self-taught embedded developer. I mostly use AVRs programmed in C and ASM, but I have dabbled with other systems. I am looking to move onto more complex devices like CPLDs and FPGAs, but I have no idea where to start. So my one and a half questions are:
Do you prefer VHDL or Verilog and why?
What is a good way for one with no prior experience in HDLs get started in learning such a beast?
Buy a cheap starter kit from Xilinx or Altera (the two big FPGA players). A Xilinx Spartan3 starter kit is $200.
I personally prefer VHDL. It is strongly typed and has more advanced features than Verilog. VHDL is more popular in Europe and Verilog is dominating in the US.
Buy a book (e.g. Peter Ashendens The Designers Guide to VHDL) and start simulating your designs in a free simulator. ModelSim from Mentor Graphis is a good one and there are free versions available (with crippled simulation speed).
Make up some interesting project (mini cpu, vga graphics, synthesizer) and start designing. Always simulate and make sure your design works before putting your design into the hardware ...
If you have no background in digital electronics buy a book in that subject as well.
Back in the day when I worked on ASIC design, it was in verilog. In many cases as a designer you don't get to choose: the ASIC synthesis tools for an HDL cost a substantial amount of money, and companies only purchase the full toolchain for one "blessed" language. My employer had standardized on verilog, so that is what we used.
FPGA synthesis tools are substantially cheaper, so you have more freedom as an FPGA designer to pick your favored language and tools.
There are also free verilog simulators available at verilog.net.
As #kris mentioned, an FPGA starter board is also a good way to go. Having your verilog code light up an LED on a board is infinitely more satisfying than a simulator waveform on the screen.
Also check out opencores.org - There are some articles and a lot of open source code in both Verilog and VHDL you can learn from.
As far as I can tell, VHDL vs Verilog gets just as religious as Ruby vs Python or Java vs C#. Different people have their own favourites.
Check out this site:
http://www.fpga4fun.com/
Nice simple projects using simple tools. I used one of these boards a few years ago to build a small VGA display system for use as a notice board.
Looking at the site again I'm thinking of getting a Xylo-LM board as it has an ARM processor as well as SDRAM and a Xilinx Spartan 3e.
Another board I used before was the XPort 2 from Charmed Labs. This plugs into a Gameboy Advance which is well supported with open source development tools.
Check out:
http://www.charmedlabs.com/index.php?option=com_virtuemart&page=shop.browse&category_id=6&Itemid=43
One additional thing to think about is whether you should start by learning an HDL, or by learning boolean logic, Karnaugh maps, DeMorgan's theorem, gates, implementing arithmetic in gates, etc. It's easy to write non-synthesizable HDL if you don't have an accurate mental model of what the underlying hardware will look like.
This book is the Verilog version of the one I used in undergrad, and it did a pretty good job in my opinion. It starts you out with the material mentioned above, as well as some basic, basic info on the transistor-level implementation of gates, then introduces you to an HDL, and has you build progressively more complex structural and behavioral hardware blocks. Yes, I know it's ungodly expensive, as are most college textbooks, but this is one of those things for which the information I've been able to find online, at least, has been woefully inadequate.
Once you're ready to choose an HDL, I heartily recommend Verilog (having learned VHDL first). Yes, VHDL was once much more feature-rich than Verilog, but later revisions of the language (Verilog 2001, Verilog 2005, SystemVerilog, etc..) have cherry-picked most of the interesting features, and there is far more robust toolchain support for Verilog and its variant these days, in addition to it being the dominant language in use in the US (in my experience, VHDL is only used here when dealing with extreme legacy blocks, and in academic contexts, partially due to the tools support mentioned previously). Finally, once you've learned the HDL, you have a hardware verification language (HVL) in SystemVerilog with strict-superset syntax, saving you a good bit of the learning curve. Not so for VHDL, to my knowledge.
Altera and Xilinx have simulators build into their free tool sets. They are limited versions of the very popular Mentor ModelSim tools. They will handle the size of designs you are likely to get to fit in a < $500 (US) board.
For HDL choice Verilog is to C as VHDL is to ADA. So Verilog is easier to get started with, but you can make mistakes more easily. Check your simulation and compilation warnings to avoid those problems.
Verilog 2.http://www.opensparc.net/
HTH
Verilog is much easier to learn and simpler syntax. Its also a newer language. Secondly, most people use verilog. VHDL has many datatypes which give it a learning curve. Once you know verilog it will be easier to bridge the gap to VHDL. Oh and theres also macros in verilog which are very neet. I invented a language with it. Finally, you will eventually be able to do mixed language HW design. I started out with VHDL, then learned verilog and am now pro verilog.
I was in the same boat as you are now a semester ago. My preferred book was this one, since it talked about FPGAs by reviewing digital logic. It also shows side-by-side comparisons of VHDL and Verilog code so that, instead of choosing one that people may push you to, you can learn the one that you like stylistically.
As for an FPGA itself, use Xilinx's ISE webpack to do your programming (it's free), and start off with FPGAs like the Basys2 FPGA board. It's a very small FPGA that should get you started for a small price, but has the added advantage that you learn resource and memory management very early. You can use Digilent's Adept (also free) to make life easy in uploading your "compiled" code to the board.
Good luck!
Before plunging into Verilog/VHDL or buying an FPGA dev kit I'd recommend taking an introductory class on digital design. There are good online OpenCourseWare MIT classes.
Good luck.