How to get face detection information from a Panasonic smartHD camera embedded in the video stream?

I've got a Panasonic WV SP-306 digital camera. It has a built-in face detection function, and the information can be sent via XML notifications or embedded in the video stream. I'm trying to figure out how to extract this information from the MJPEG stream.
My discoveries so far:
I've found official documentation and SDK here
There's a PDF document describing the JPEG header format (Panasonic Camera JPEG Format)
According to the document, the JPEG header after the FF FE marker and its two length bytes consists of sections. Each section has a 2-byte ID followed by 2 bytes indicating its length, then the section body. Three sections are described in the document: ID 0010 (related to motion detection), ID 0011 (time information), and ID 0012 (frame information; it contains something about the time of the frame, I'm not sure what it is for).
When I turn on the face detection feature, a fourth section appears. It has ID 000F and is not described in the documentation.
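As a starting point, here is a minimal Python sketch of how such a header could be walked (this is my own reading of the layout described above, not code from the SDK; big-endian byte order and the exact meaning of the length fields are assumptions):

    import struct

    def parse_panasonic_sections(jpeg_bytes):
        """Split the FF FE segment of one MJPEG frame into (ID, body) pairs.

        Layout assumed from the description above:
        FF FE | 2-byte segment length | repeated [2-byte ID | 2-byte length | body].
        The segment length is assumed to include its own two bytes (standard JPEG
        convention); the per-section length is assumed to count only the body.
        """
        pos = jpeg_bytes.find(b"\xff\xfe")
        if pos < 0:
            return []
        seg_len = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])[0]
        end = pos + 2 + seg_len
        cursor = pos + 4
        sections = []
        while cursor + 4 <= end:
            sec_id, sec_len = struct.unpack(">HH", jpeg_bytes[cursor:cursor + 4])
            sections.append((sec_id, jpeg_bytes[cursor + 4:cursor + 4 + sec_len]))
            cursor += 4 + sec_len
        return sections

    # Dump the undocumented face detection section (ID 0x000F) of a saved frame as hex:
    # with open("frame.jpg", "rb") as f:
    #     for sec_id, body in parse_panasonic_sections(f.read()):
    #         if sec_id == 0x000F:
    #             print(body.hex())

Even with the 000F payload dumped like this, its internal structure would still have to be reverse-engineered frame by frame.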
The sample programs and library reference were not useful either. All I can do with face detection is turn it on or off and set the color of the face detection rectangle; all processing of the face detection data in the stream seems to be done inside the library.
So, my question is: can anybody tell me how to get the face detection data provided by this camera from the stream?

Related

MPEG-2 vs AVC vs HEVC Inputs

I am trying to work with MediaLive, and I'm supposed to send an RTMP stream from my iOS/Android devices.
I was checking the MediaLive pricing model and I can see there are multiple kinds of input. As I am new to all this (media stuff), I am not sure what they are.
If I send an RTMP stream from my iOS device, what kind of input will it be?
MPEG-2 Inputs
AVC Inputs
HEVC Inputs
These are video compression standards, listed from the oldest (MPEG-2, from the mid-1990s) to the newest (HEVC, 2013).
They have different features and specifications. Most importantly, the bitrate they need for the same quality level differs significantly: HEVC is the best in terms of bitrate savings, but also the most complex in terms of hardware/software.

Record raw data in LabVIEW

I have a VI in LabVIEW that streams video from a webcam (Logitech C300) and processes the colored layers of each image as arrays. I am trying to get raw Bayer data from the webcam using Logitech's program (http://web.archive.org/web/20100830135714/http://www.quickcamteam.net/documentation/how-to/how-to-enable-raw-streaming-on-logitech-webcams) and the Vision Acquisition tool, but I only get as much data as with regular capture, instead of four times more.
Basically, I get 1280x1024 24-bit pixels where I want 1280x1024 32-bit or 2560x2048 8-bit pixels.
Has anyone had experience with this and knows of a way for LabVIEW to process the camera's raw output, or how to actually record a raw file from the camera?
Thank you!
The driver flag you've enabled simply packs the raw pixel value (8/10 bpp) into the least significant bits of the 24-bit values. Assuming the 8 bpp mode is used, the raw values can be extracted from the blue color plane as in the following example; the result can then be debayered to obtain RGB values.
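(The example originally shown here was a LabVIEW snippet; as a rough equivalent, here is a NumPy sketch of the same idea. It assumes the driver hands you an 8-bit, RGB-ordered array; the function name is mine.)

    import numpy as np

    def extract_raw_bayer(rgb_frame):
        """Pull the raw Bayer mosaic back out of the 24-bit frame.

        With the Logitech raw-streaming flag set and the 8 bpp raw mode assumed,
        the raw sensor value sits in the blue plane of each 24-bit pixel.
        rgb_frame: uint8 array of shape (height, width, 3), e.g. (1024, 1280, 3).
        Returns a (height, width) uint8 array that can then be debayered.
        """
        return rgb_frame[:, :, 2].copy()  # index 2 = blue, assuming RGB channel order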
Unless you can improve on the debayering algorithm in the firmware, or have very specific needs, this is not very useful. Normally one can at least reduce the amount of data transferred by enabling raw mode, which is not the case here.
The above assumes that the raw video mode isn't being overridden by the LabVIEW IMAQdx driver. If it is, you might be able to enable raw mode from LabVIEW through property nodes. This requires manually configuring the acquisition, as the configurability of express VIs is limited. Use the EnumStrings property to get all possible attributes, and then see if there is something like the one specified outside of the diagram disable structure (this is from a different camera).

What characteristics should a .wav file produced by a TTS engine have in order to sound high quality?

I'm trying to generate a high-quality voice-over using the Microsoft Speech API. What values should I pass to this constructor to guarantee high-quality audio?
The .wav file will later be fed to FFmpeg, so the audio will be re-encoded into a more compact form. My main goal is to keep the voice as clear as I can, but I really don't know which values give the best quality as perceived by humans.
First of all, just to let you know: I haven't used this Speech API, so I'll give you an answer based on my audio processing work.
You can choose EncodingFormat.Pcm for Pulse Code Modulation
samplesPerSecond is the sampling frequency. Because it is voice, 16,000 Hz will certainly cover it. If you are a real perfectionist you can go with 22,050, for example. The higher the value, the larger the audio file. If file size isn't a problem you can even go with 32,000 or 44,100, but there won't be much noticeable difference.
bitsPerSample - go with 16 if possible
The channel count - 1 or 2 (mono or stereo); it won't affect the quality of the sound
averageBytesPerSecond - this would be samplesPerSecond * bytesPerSample * channels (for example, 22050 * 2 * 1 = 44100 for 16-bit mono)
blockAlign - this would be bytesPerSample * numberOfChannels (for example, with 16-bit PCM mono audio, 16 bits are 2 bytes and mono is 1 channel, so blockAlign is 2 * 1 = 2)
That last one, the byte array, doesn't say much by itself; I'm not sure what it is for. I believe the first six arguments are enough for the audio to be generated.
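To make the arithmetic concrete, here is a small sketch (Python, not the Speech API itself; the function name is made up) that computes the two derived fields for a 22,050 Hz, 16-bit, mono stream:

    def wav_format_fields(samples_per_second=22050, bits_per_sample=16, channels=1):
        """Compute the derived PCM format fields discussed above."""
        bytes_per_sample = bits_per_sample // 8                       # 16 bits -> 2 bytes
        block_align = bytes_per_sample * channels                     # 2 * 1 = 2
        average_bytes_per_second = samples_per_second * block_align   # 22050 * 2 = 44100
        return block_align, average_bytes_per_second

    print(wav_format_fields())  # prints (2, 44100)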
I hope this was helpful
Cheers

core audio: is zero equivalent to silence only for PCM audio?

I'm trying to create a basic packet loss concealment algorithm for Core Audio. I simply want to replace the missing data with silence. In the book Learning Core Audio, the author says that in lossless PCM, zeros mean silence. I was wondering: if I'm playing VBR (i.e. compressed) data, would putting in zeros suffice for silence as well?
In my existing code, when I plug zeros into the audio queue, it suddenly jams (i.e. it no longer frees up consumed data in the audio queue callback), and I'm wondering why.
PCM is the raw encoded sample data. All zeros (when using signed sample data) is indeed silence. (In fact, a constant run of any value is silence, but such a DC offset has the potential to damage your amplifier and/or speakers if it isn't filtered out.)
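For the linear PCM case, then, a replacement packet really is just a zero-filled buffer of the right size. A minimal sketch (Python, purely illustrative; the function name and sizes are not tied to any Core Audio structure):

    def silent_pcm_packet(frames, channels=2, bytes_per_sample=2):
        """Return a zero-filled buffer the same size as the missing packet.

        For signed linear PCM (e.g. 16-bit), all-zero samples are silence, so a
        buffer like this can be enqueued in place of the lost data. Note that
        unsigned 8-bit PCM is the exception: its midpoint (silence) is 0x80.
        """
        return bytes(frames * channels * bytes_per_sample)

    packet = silent_pcm_packet(frames=1024)  # 4096 zero bytes for 16-bit stereo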
When you compress with a lossy codec, you enter a format where it is not trivial to just add silence. Think of appending null bytes to a file stored inside a ZIP archive: it isn't as simple as inserting them arbitrarily into the ZIP file.
If you want to add silence to a compressed stream, you must encode it with the appropriate codec. Then you have to fit it into the bitstream, which is also not trivial. Usually the stream is broken up into frames, but in some formats you can't even split on those frames: MP3 and AAC use a bit reservoir, where unused space in earlier frames can be used to encode more complicated frames later on, making splitting the file very difficult.

Scheme to play video file in own container format on Mac OS X

I am planning to write an application (C/C++/Objective-C) that will play media files in my own (private) container format. The files will contain: multiple video streams, encoded with a video codec (such as Xvid or H.264; it is assumed that components capable of decoding these video formats are present in the system), and multiple audio streams in some compressed formats (it is assumed that decoding will be performed by a system component or by my own code).
So, it seems I need to implement the following scheme (a rough code sketch follows the list):
1) Implement the container demuxer (possibly in the form of a media handler component).
2) Pass video frames to a video decoder component, and mix the decompressed frames (using my own rules).
3) Pass audio data to an audio decoder component, or decompress the audio with my own code, and mix the decoded audio data.
4) Render video frames to a window.
5) Pass audio data to a selected audio output device.
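Here is a rough outline of that flow as code (Python pseudocode; every name below is a placeholder rather than any real framework's API):

    def play(demuxer, video_decoders, audio_decoders,
             video_mixer, audio_mixer, video_out, audio_out):
        """Single-threaded outline of steps 1-5 above.

        All objects are placeholders (duck-typed); a real player would need
        separate decode threads, buffering, and A/V synchronisation.
        """
        while True:
            packet = demuxer.read_packet()      # step 1: demux the private container
            if packet is None:                  # end of file
                break
            stream_id, payload = packet
            if stream_id in video_decoders:
                frame = video_decoders[stream_id].decode(payload)        # step 2
                if frame is not None:
                    video_out.render(video_mixer.mix(stream_id, frame))  # step 4
            elif stream_id in audio_decoders:
                samples = audio_decoders[stream_id].decode(payload)      # step 3
                if samples is not None:
                    audio_out.write(audio_mixer.mix(stream_id, samples)) # step 5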
Could anybody provide tips on any of the above stages, that is: toolkits I should use, useful samples, perhaps names of functions to be used, or perhaps improvements to the scheme?
I know I am quite late, so you might not need this anymore, but I just wanted to mention that the right way to do it is to write a QuickTime component.
Although it is pretty old school, it's the same mechanism Apple uses to support new formats and codecs.
Look at the Perian project as an orientation point.
Best