Why does the Affectiva SDK give more than one feature per second? - affdex-sdk

I downloaded the sample Affectiva C++ SDK examples from https://github.com/Affectiva/cpp-sdk-samples
I am using video-demo, which produces a CSV file.
I would like to understand how the TimeStamp column changes in this CSV file. Why does it give more than one feature per second? What is the logic behind it? How does it work?
Could you please explain?
Thanks!

Output from the Affectiva SDK is unique by timestamp and face ID, so if there are multiple identical timestamps, the SDK probably saw multiple faces at that moment.
You should also get output for each of the metrics, for each face, for each frame. Since the video is analysed frame by frame, you naturally get many rows per second (e.g. roughly 30 per second per face for a 30 fps video).
https://developer.affectiva.com/metrics/
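For illustration, a hypothetical excerpt of such a CSV (column names are simplified; the real file from video-demo has many more metric columns):

    TimeStamp,faceId,joy,anger,smile
    0.0000,0,0.00,0.00,1.23
    0.0333,0,0.02,0.00,1.80
    0.0667,0,0.05,0.00,2.40
    0.0667,1,0.00,0.10,0.00

The first three rows are one face tracked across consecutive frames of a 30 fps video; the last row shares a timestamp with the third because a second face was detected in that frame.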

Related

"Live" data capable alternative for Google Earth KML

I'm currently using Google Earth + KML files to visualize aircraft flight paths in 3D. It works perfectly and also looks fine, but the big disadvantage is that there seems to be no way to feed "live" data to Google Earth and draw the flight paths in real time.
Is there an alternative that is capable of displaying live data without manually reloading a file or anything like that? A satellite-imagery surface would be an absolute MUST.
Maybe someone out there knows a proper solution for my project.
Thanks
The KML NetworkLink tag provides several ways to automatically update/reload a KML file, which will let you provide "live" data. You can either make the NetworkLink update the KML every time the user stops moving the map (with a settable delay), or on a timer (e.g. every 10 seconds). Look at the KML Reference and developer tutorials for more info.
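A minimal sketch of the timer variant (the href is a placeholder for wherever your server publishes the generated KML):

    <NetworkLink>
      <name>Live flight paths</name>
      <Link>
        <href>http://example.com/flightpaths.kml</href>
        <!-- re-fetch on a fixed timer, every 10 seconds -->
        <refreshMode>onInterval</refreshMode>
        <refreshInterval>10</refreshInterval>
      </Link>
    </NetworkLink>

For the view-based variant, use <viewRefreshMode>onStop</viewRefreshMode> together with <viewRefreshTime> instead of the two refresh tags above.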

Tacotron2 TTS viseme generation

I am currently working on a project that uses Tacotron2 TTS to produce a human-like voice for a robot. I would also like to get the visemes from the TTS so I can synchronize the robot's face animation with the voice. How could I get the visemes, and the duration of each one, with Tacotron2?
Thanks
Can you get the phonemes out? You can refer to a phoneme-to-viseme table to do the conversion. You could try using espeak to do the text -> phoneme conversion. If you don't mind just a rough sync, you could compare the duration of the espeak output to your Tacotron2 output.
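A rough Python sketch of that idea, assuming espeak is installed with an MBROLA voice (espeak's --pho option then emits one phoneme per line with its duration in milliseconds). The viseme table below is a made-up stub; substitute a published phoneme-viseme mapping:

    import subprocess

    # Stub mapping; replace with a real phoneme -> viseme table
    PHONEME_TO_VISEME = {"p": "BMP", "b": "BMP", "m": "BMP",
                         "f": "FV", "v": "FV", "_": "REST"}

    def espeak_phonemes(text, voice="mb-en1"):
        # -q suppresses audio; with an MBROLA voice, --pho writes
        # one phoneme per line: name, duration in ms, optional pitch data
        out = subprocess.run(["espeak", "-q", "--pho", "-v", voice, text],
                             capture_output=True, text=True, check=True)
        phonemes = []
        for line in out.stdout.splitlines():
            parts = line.split()
            if len(parts) >= 2:
                phonemes.append((parts[0], float(parts[1])))
        return phonemes

    def viseme_track(text, tacotron_duration_s):
        # Rescale espeak's durations so they sum to the length of the
        # Tacotron2 clip -- the "rough sync" described above.
        phonemes = espeak_phonemes(text)
        total_ms = sum(d for _, d in phonemes) or 1.0
        scale = tacotron_duration_s * 1000.0 / total_ms
        return [(PHONEME_TO_VISEME.get(p, "REST"), d * scale)
                for p, d in phonemes]

    print(viseme_track("hello robot", tacotron_duration_s=1.2))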

Could someone explain to me how to train Tesseract OCR?

I'm trying to do the training process, but I don't even understand how to start. I would like to train it to read numbers. My images are from the real world, so the reading process didn't go well.
It says that I have to have a ".tif" image with the examples... is that a single image of every number (in this case), or an image with many different instances of each number (same font, though)?
And what about makebox? The command didn't work here.
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Could someone explain it to me better, at least how to start?
I saw a few programs that do this more quickly; I tried one (SunnyPage 1.8), but it isn't free. Does anyone know any free software that does this? Or a good tutorial?
Using Tesseract 3, Windows 8 (32-bit).
It is important to patiently follow the training wiki on the Google Code project site, multiple times if needed. It is an open-source library and is constantly evolving.
You will have to create a training image (TIFF) with many different samples of the numbers; it should probably contain all the numbers you wish the engine to recognize.
Please consider posting the exact error message you got with makebox.
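For reference, the usual Tesseract 3 sequence from the training wiki looks roughly like this ('eng' and 'myfont' are placeholders following the [lang].[fontname].exp[num] naming convention; font_properties is a small text file you create as described on the wiki):

    tesseract eng.myfont.exp0.tif eng.myfont.exp0 batch.nochop makebox
    REM edit eng.myfont.exp0.box so every character is labelled correctly, then:
    tesseract eng.myfont.exp0.tif eng.myfont.exp0 box.train
    unicharset_extractor eng.myfont.exp0.box
    mftraining -F font_properties -U unicharset -O eng.unicharset eng.myfont.exp0.tr
    cntraining eng.myfont.exp0.tr
    REM rename the output files with the language prefix (eng.inttemp etc.), then:
    combine_tessdata eng.

If makebox fails, it is often because the command is run outside the directory containing the .tif, or because the output name or extension doesn't match; the exact error message would tell us more.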
I think Tesseract is the best free solution available. You have to keep working at it and seek help from the community.
There is a very good post from Cédric here explaining the training process for Tesseract.
A good free OCR program is PDF OCR X, which is also based on Tesseract. I tried to copy my notes from German, which I had scanned at 1200 dpi, and the results were commendable but not perfect. I found that this website - http://onlineocr.net - is a lot more accurate. If you are not registered, it allows a maximum file size of 4 MB from most image formats (BMP, PNG, JPEG, etc.) and PDF. It can output them as a Word file, an Excel file or a txt file.
Hope this helps.

NAudio - Create software beat machine / sampler - General Strategy

I'm new to NAudio, but the goal of my project is to give the user the ability to listen to an MP3 and then select a "chunk" of that song as a sample, which could be saved to disk. These samples should be replayable at the same time (i.e. not merged, but played at the same time).
Could someone please let me know the overall strategy required to achieve this (not necessarily the specifics; almost like pseudocode)?
For example, would the samples/chunks of a song need to be saved as WAV files so that they could be played together, etc.?
I have seen a few small examples of implementations of some of the ideas I've mentioned above, but I don't have a good sense of the big picture just yet.
Thanks in advance,
Andrew
The chunks wouldn't need to be saved as WAV files unless you were keeping them for future use. You can store the PCM audio (Mp3FileReader automatically converts to PCM) in a byte array and use RawSourceWaveStream to play them.
As for mixing them, I'd recommend using the MixingSampleProvider. This does mean you need to convert your RawSourceWaveStream to IEEE float, but you can use Pcm16BitToSampleProvider to do this. This will give the advantage that you can adjust volumes (and do other DSP) easily on the samples you are mixing. MixingSampleProvider also auto-removes completed inputs, so you can just add new inputs whenever you want to trigger a sound.
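A minimal C# sketch of that pipeline (the NAudio types are real; the surrounding structure, file name and chunk boundaries are illustrative, assuming a 1.7-era NAudio API):

    using System;
    using System.IO;
    using NAudio.Wave;
    using NAudio.Wave.SampleProviders;

    class SamplerSketch
    {
        static void Main()
        {
            // Read a chunk of the MP3; Mp3FileReader delivers decoded PCM
            byte[] chunk;
            WaveFormat format;
            using (var reader = new Mp3FileReader("song.mp3"))
            {
                format = reader.WaveFormat;
                reader.CurrentTime = TimeSpan.FromSeconds(10);   // chunk start
                chunk = new byte[format.AverageBytesPerSecond * 2]; // ~2 seconds
                reader.Read(chunk, 0, chunk.Length);
            }

            // Mixer in IEEE float matching the source; finished inputs
            // are removed automatically
            var mixer = new MixingSampleProvider(
                WaveFormat.CreateIeeeFloatWaveFormat(format.SampleRate, format.Channels));
            mixer.ReadFully = true; // keep the output alive between triggers

            var output = new WaveOutEvent();
            output.Init(mixer);
            output.Play();

            // Trigger a sample: wrap the raw PCM bytes, convert to float, add
            var raw = new RawSourceWaveStream(new MemoryStream(chunk), format);
            mixer.AddMixerInput(new Pcm16BitToSampleProvider(raw));

            Console.ReadLine(); // play until Enter is pressed
        }
    }

Triggering the same chunk again (another RawSourceWaveStream over the same byte array) plays it simultaneously with the first, which is the "not merged, played at the same time" behaviour you described.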

How should I objectively test my program results?

I have developed two differing methods in MATLAB which aim to analyse a pop song and then automatically create a 30 second audio thumbnail (a preview clip) containing part of the chorus section.
Both methods have varying results:
The first method can create a thumbnail for each track, managing to find a chorus section in 40 out of 50 tested songs.
The second method only managed to work on 30 of the 50 songs, and it found the chorus section 21 times out of the 30.
Obviously I know which method is superior, but I need to describe and explain the results in a report which requires the demonstration of proper statistical testing.
Other academic papers have previously used an f-test to do this, but because their methods are vastly superior, their aims usually involve detecting chorus onset times with 100% accuracy.
My aim is more relaxed as I am just looking for the generated thumbnails to contain any part of the chorus, regardless of onset.
Can anyone suggest some objective tests that I could possibly explore with regards to my project? This is my first time conducting an investigation like this so my experience/knowledge is incredibly low.
Thank you!
Possibly the way for you is to annotate your song tracks with time cuts marking the relevant sound types (chorus, etc.). In a sound editor like CoolEdit, you can set time cuts and assign them names like 'chorus', 'pause', 'music'... Then you must extract the cut information to import it into MATLAB. On 32-bit Windows you can use the Wav2labs utility from http://www.pallier.org/ressources/wspot/sig2wav/toolswav.html (http://www.pallier.org/ressources/wspot/sig2wav/Wav2labs.exe). This program extracts the cuts to a text file, which you can read with MATLAB's textscan function.
After that, you only need to compute the segmentation accuracy, such as the percentage of time when the signal type (chorus/not chorus) was recognized correctly; a minimal sketch follows.
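A minimal MATLAB sketch of that comparison (the label-file name, its column layout and the thumbnail start are assumptions; adjust to whatever Wav2labs actually emits):

    % read the cuts; assumed columns: start time, end time, label
    fid = fopen('track01.lab');
    C = textscan(fid, '%f %f %s');
    fclose(fid);
    starts = C{1}; stops = C{2}; labels = C{3};

    % does the generated 30 s thumbnail [t0, t0+30] contain any chorus?
    t0 = 42.0;                                  % thumbnail start from your method
    isChorus = strcmp(labels, 'chorus');
    overlap = max(0, min(stops(isChorus), t0 + 30) - max(starts(isChorus), t0));
    chorusSeconds = sum(overlap);
    containsChorus = chorusSeconds > 0;         % success criterion per track

Counting containsChorus over all 50 tracks gives the success rates you quoted, and chorusSeconds/30 gives a finer-grained per-track score if you want more than a binary outcome.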
Or specify your question more precisely.