I need to automatically transcribe some short MP3s as part of a proof of concept I am working on. I am currently looking into cloud solutions or web API services to which I can send the MP3 in a simple HTTP request and get a transcription back.
The only free/open-source solution I have found is here, but the demos don't seem to work (at least not on the files I need to transcribe). I have also found some enterprise solutions for call centers, but so far nothing I can simply integrate into a project.
Are there any web-based speech recognition services available? One that is able to filter out light background noise would be a plus.
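For reference, the "simple HTTP request" being asked about would look roughly like this from the client side. This is a minimal sketch using only the Python standard library; the endpoint URL, the bearer-token header and the plain-text response are all assumptions (hypothetical placeholders), since each real service defines its own URL, authentication and response format.

```python
import urllib.request
from pathlib import Path
from typing import Optional

# HYPOTHETICAL endpoint: a placeholder, not a real transcription service.
TRANSCRIBE_URL = "https://speech.example.com/v1/transcribe"


def build_transcription_request(mp3_bytes: bytes, url: str = TRANSCRIBE_URL,
                                api_key: Optional[str] = None) -> urllib.request.Request:
    """Wrap raw MP3 bytes in an HTTP POST request (nothing is sent yet)."""
    headers = {"Content-Type": "audio/mpeg"}  # standard MIME type for MP3
    if api_key is not None:
        # Assumed auth scheme; check what the actual service expects.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=mp3_bytes, headers=headers, method="POST")


def transcribe_file(path: str, url: str = TRANSCRIBE_URL, timeout: float = 30.0) -> str:
    """Send the file and return the raw response body (its format depends on the service)."""
    request = build_transcription_request(Path(path).read_bytes(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to swap in whichever service turns out to work, and to check the request without hitting the network.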
Here is an unofficial way to access Google's ASR capability. I tested it yesterday and it still works: you can get JSON-style ASR output with words and their associated confidence scores from FLAC audio sampled at 16 kHz.
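Since the input has to be 16 kHz FLAC rather than MP3, the files need converting first, and the JSON output then needs picking apart. Below is a rough sketch of both steps. The ffmpeg flags (`-ar` for sample rate, `-ac` for channel count) are standard, but downmixing to mono is my own assumption. The response shape in `best_transcript` (newline-separated objects with `result` → `alternative` → `transcript`/`confidence`) is also an assumption about what this unofficial endpoint returns, and it only looks at whole-transcript confidence, not per-word scores.

```python
import json
from typing import List, Optional, Tuple


def ffmpeg_to_flac_16k(src: str, dst: str) -> List[str]:
    """Command line converting e.g. an MP3 into 16 kHz mono FLAC.

    Run it with subprocess.run(cmd, check=True); ffmpeg must be installed.
    ffmpeg picks the FLAC encoder from the .flac extension of dst.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def best_transcript(raw: str) -> Optional[Tuple[str, float]]:
    """Return (transcript, confidence) of the highest-scoring alternative, or None.

    ASSUMED input shape, one JSON object per line:
      {"result":[{"alternative":[{"transcript":"...","confidence":0.9}, ...]}]}
    Alternatives without a confidence score are treated as 0.0.
    """
    best: Optional[Tuple[str, float]] = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        for result in json.loads(line).get("result", []):
            for alt in result.get("alternative", []):
                score = float(alt.get("confidence", 0.0))
                if best is None or score > best[1]:
                    best = (alt.get("transcript", ""), score)
    return best
```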
You can also try the Windows 7 speech recognition engine to produce subtitles. Here is a tool for that.
This may be a good match. Their TechCrunch profile (see this) also lists these competitors: SimulScribe, SpinVox, Vlingo, Nuance, Microsoft, and Google.
Some of these links may be helpful.
Vlingo, Bing and Google have recognizers in the cloud, but I don't think they make them publicly programmable. I believe they are accessible only from their authorized clients.
For a proof of concept (and low volume), have you considered just using the desktop speech engines that come with Windows 7? The question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?" may be helpful. The MS desktop recognizers ship with a dictation grammar, and it sounds like that is what you need.
In my application I am already using Google TTS, but I am impressed by Microsoft TTS because it provides many more useful attributes than Google does. Since I am more familiar with Google, I would like to keep my implementation but still be able to use MS attributes such as:
<mstts:express-as style="cheerful">
That'd be just amazing!
</mstts:express-as>
Is that possible?
There are no style attributes in Google Text-to-Speech, but you can change the Standard voice to a WaveNet voice[1].
The WaveNet voice synthesizes speech with more human-like emphasis and inflection on syllables, phonemes, and words. You can see all the supported voices in Google Text-to-Speech[2].
[1]https://cloud.google.com/text-to-speech/docs/wavenet#wavenet_voices
[2]https://cloud.google.com/text-to-speech/docs/voices
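If you want to push a WaveNet voice a little further, Google's SSML support includes `<prosody>` (rate, pitch), which can give a rough approximation of a brighter delivery. It is not equivalent to `mstts:express-as`. The sketch below builds a JSON body for the Cloud Text-to-Speech v1 `text:synthesize` REST method; the rate/pitch values and the `en-US-Wavenet-D` voice are illustrative choices, not tuned settings, and the rate/pitch arguments are inserted unescaped, so only pass trusted values.

```python
import json
from xml.sax.saxutils import escape


def cheerful_ish_ssml(text: str, rate: str = "105%", pitch: str = "+2st") -> str:
    """Wrap text in SSML <prosody>; a rough stand-in for a 'cheerful' style.

    The text is XML-escaped; rate and pitch are inserted as-is.
    """
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody></speak>'


def synthesize_request_body(ssml: str, voice_name: str = "en-US-Wavenet-D",
                            language_code: str = "en-US") -> str:
    """JSON body to POST to https://texttospeech.googleapis.com/v1/text:synthesize."""
    body = {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return json.dumps(body)
```

The response carries the audio as base64 in `audioContent`; authentication (API key or OAuth token) is left out here.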
I am trying to develop a custom, self-hosted embeddable player (just YouTube embeds for now, with the option of adding other media later) with social media buttons, clickable overlays, post-roll options, etc., that can be shared on a Facebook timeline and works cross-browser. I have narrowed the frameworks down to:
Mediaelement.js
Kaltura Community Edition
Webshim's mediaelement
I'm new to JavaScript, so ease of use, user base, and documentation are all important. Is there any reason why one of these might be a dead end for my purposes, or why one might be easier to develop for?
I'm just trying to get some perspective before drilling down into the development details. In the meantime, I am experimenting with mediaelement.js.
Well, I think it won't be easy.
You will also need a server side in order to create your own player.
I know that Kaltura is working on a solution.
But I don't see any better choices.
Since I haven't gotten any response on the Unity3D or Evernote forums, I'll try here.
Over the last year I have worked a lot with Unity3D, mostly because of its good integration with the Vuforia augmented reality library and because publishing for multiple platforms is a piece of cake.
Now I want to show notes in an AR setting and am looking at the Evernote API for this. I couldn't find anything about using it with Unity, and I can see why this is not the most common combination.
My question is: do you think I can access the Evernote API through Unity? If so, how should I do this? Or would it be wiser for this purpose to build (parts of) the application with Eclipse/Xcode?
Hope to hear from you!
Link to Evernote API: http://dev.evernote.com/doc/
The Evernote API has a C# SDK, which you should be able to call from Unity. As for how to do it, you will probably need to download the SDK and follow its instructions yourself. Their GitHub repository seems like a good starting point.
One thing to note is that Unity's .NET libraries for mobile clients are quite limited, and with the web player you will need to deal with sandbox security issues. Start with a standalone build first and see how it goes.
I have a USB camera with its own drivers and a DLL that exposes some functions for using the camera in my own solutions. I want to use it in widespread applications too, so that I can simply select it in Skype, for instance. So I want to develop something that will let me use this device as an ordinary webcam.
I've heard about technologies such as "Upper-Level Filter Drivers" and "user-mode DirectShow source filters". These look like something that could help.
So the question is: what technologies exist for such tasks? Which technology should I choose to solve my problem if I have no driver development experience?
Skype still uses DirectShow for video capture, so a user-mode filter will do the job. However, Skype makes certain unreasonable assumptions that limit which source filters are compatible, as if its developers stopped development and testing as soon as their favorite USB cam worked and ignored all the other devices users might want to attach.
One of the options suggested to you (in Russian: 1, 2) was to develop a kernel-mode driver so that your device is visible to apps through the standard WDM Video Capture Filter. This is possible and would work, though in my opinion it is huge overkill.
Getting a custom source filter to fit is not easy, because Skype does not like having a debugger attached; however, driver development is a completely different story.
The Skype Forum link you refer to is clearly misleading. The poster complains that a Skype update broke compatibility with video sources, and the admin's response is about audio devices, so it is irrelevant.
Does anyone here know of any resources on how to get started writing a plugin for Google's Picasa? I love it for photo management, but I have some ideas for how it could be better.
Riya-esque facial search: given a large enough corpus of faces and pictures (people such as family and friends tend to appear repeatedly in individuals' albums), I would think a semi-workable version of this could be done. With 13+ GB (7 years) of photos, it would be very nice for search.
Upload to Facebook (EDIT: someone has already made a very nice version)
Upload to any non-Google property, actually.
I know there are certain APIs and a Picasa2Flickr plugin out there, and I was wondering if anyone had seen any resources on this topic or had any experience with it.
There is an open-source project that created an "Upload To FlickR" plugin. Maybe you could use it as a starting point.
I thought about facial recognition many years ago, but my search only turned up a web API, not a plugin API. My idea was to use an external facial recognition program to slowly index my entire catalogue of pictures and reliably tag them according to who was in them. It wouldn't need to be 100% accurate, but anything over 85% would be acceptable.
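The "slowly index the whole catalogue" idea can be sketched independently of any Picasa API: process a bounded batch of not-yet-seen photos per run, keep only matches above a confidence cutoff, and persist progress so the next run resumes. The recognizer here is a hypothetical callback (plug in whatever external face recognition program you use). Treating the 85% figure as a per-match confidence threshold is my assumption; accuracy and confidence are not the same thing.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# HYPOTHETICAL recognizer: image path -> list of (person_name, confidence in 0..1).
Recognizer = Callable[[Path], List[Tuple[str, float]]]


def index_batch(photos: Iterable[Path], recognize: Recognizer,
                index: Dict[str, List[str]], batch_size: int = 100,
                min_confidence: float = 0.85) -> int:
    """Tag up to batch_size photos not yet in index; return how many were processed."""
    processed = 0
    for photo in photos:
        key = str(photo)
        if key in index:
            continue  # handled in an earlier run
        # Store [] when nobody clears the cutoff so the photo is not re-scanned.
        index[key] = sorted({name for name, conf in recognize(photo)
                             if conf >= min_confidence})
        processed += 1
        if processed >= batch_size:
            break
    return processed


def load_index(path: Path) -> Dict[str, List[str]]:
    """Read the saved index, or start empty on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_index(path: Path, index: Dict[str, List[str]]) -> None:
    path.write_text(json.dumps(index, indent=2))
```

Run it from a scheduled job (load, index a batch, save) and the catalogue gets tagged a little at a time.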
I would start with the Picasa API:
Picasa API