Canon EDSDK - Audio levels

Is there any way to retrieve the real-time audio levels through the Canon EDSDK?
I know with the Nikon SDK it is sent as part of each Live View frame, but couldn't find any references for Canon.
Looking to use this with an EOS 750D.
Thanks!

There isn't, no. Filming in general has quite poor support; you can basically only start and stop recording.
The live view frame metadata contains only information about the focus rectangle (position, size), the zoom (level, position) and the histogram (brightness and RGB).

Related

MP4 with single video track and multiple resolutions

I have an MP4 which concatenates video from two separate camera streams of different resolutions. This plays back correctly in VLC, but Chrome and Windows Media Player fail to handle the change in resolution: the second half of the video is totally distorted.
Tools which read an MP4 file and report technical data all show a single AVC1 video track at resolution #1.
ffprobe shows:
Duration: 00:00:34.93, start: 0.000000, bitrate: 825 kb/s
Stream #0:0(und): Video: h264 High avc1, yuv420p, 2688x1520 (Resolution #1) , 823 kb/s, SAR 189:190 DAR 15876:9025, 13.42 fps, 15 tbr, 90k tbn, 30 tbc
Deconstructing the MP4 (using onlinemp4parser.com and gpac.github.io/mp4box.js/test), it shows one track atom (trak) which contains one media atom (mdia). Further down, inside the sample description atom (stsd), there are two AVC1 items, which describe the two resolutions correctly.
e.g.
trak -> mdia -> minf -> stbl -> stsd
AVC1 (showing resolution #1)
AVC1 (showing resolution #2)
These tools also show resolution #1 under the track header (tkhd).
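For reference, those two AVC1 entries in stsd can also be listed without an online parser. Below is a rough pure-Python sketch that walks the box tree; it assumes 32-bit box sizes and only decodes width/height for video sample entries, so it is not a general-purpose MP4 parser:
import struct
import sys

# Container boxes whose payload is just a sequence of child boxes.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def walk(data, offset, end):
    while offset + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:
            break  # 64-bit or malformed sizes are not handled in this sketch
        if box_type in CONTAINERS:
            walk(data, offset + 8, offset + size)
        elif box_type == b"stsd":
            # stsd is a "full box": 4 bytes version/flags, then a 4-byte entry count.
            count = struct.unpack(">I", data[offset + 12:offset + 16])[0]
            pos = offset + 16
            for _ in range(count):
                esize, fmt = struct.unpack(">I4s", data[pos:pos + 8])
                if fmt in (b"avc1", b"hvc1", b"hev1"):
                    # In a VisualSampleEntry, width/height sit 32 and 34 bytes into the box.
                    width, height = struct.unpack(">HH", data[pos + 32:pos + 36])
                    print(f"{fmt.decode()}  {width}x{height}")
                else:
                    print(fmt.decode())
                pos += esize
        offset += size

if __name__ == "__main__":
    mp4 = open(sys.argv[1], "rb").read()
    walk(mp4, 0, len(mp4))
On a file like the one described above, this should print two avc1 lines, one per resolution.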
Can someone comment on how the final playback resolution is determined? Or why VLC is able to correctly read the second AVC1 block out of the sample description? Is there a different way to express the change in resolution between the samples that browsers will interpret correctly?
The best way I've found to do this is to ensure the SPS/PPS (and VPS for H.265) NAL units precede every I-frame; if anyone knows of a better way, please feel free to share. This still isn't always perfect, because I assume the players just don't expect to handle the video parameters changing mid-stream.
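As a rough way to verify that condition, the sketch below scans an H.264 Annex B elementary stream and reports whether each IDR frame was preceded by fresh SPS/PPS. It assumes the stream has first been extracted from the MP4, e.g. with ffmpeg -i input.mp4 -c:v copy -bsf:v h264_mp4toannexb stream.h264 (file names are placeholders); ffmpeg's dump_extra bitstream filter is one way to inject the parameter sets in front of keyframes if they are missing.
import sys

def iter_nal_units(data):
    """Yield NAL unit payloads by splitting on Annex B start codes (00 00 01)."""
    i = 0
    while True:
        j = data.find(b"\x00\x00\x01", i)
        if j < 0:
            return
        start = j + 3
        k = data.find(b"\x00\x00\x01", start)
        end = k if k >= 0 else len(data)
        yield data[start:end]
        i = end

def check_stream(path):
    data = open(path, "rb").read()
    seen_sps = seen_pps = False
    for idx, nal in enumerate(iter_nal_units(data)):
        if not nal:
            continue
        nal_type = nal[0] & 0x1F
        if nal_type == 7:        # SPS
            seen_sps = True
        elif nal_type == 8:      # PPS
            seen_pps = True
        elif nal_type == 5:      # IDR slice
            status = "ok" if (seen_sps and seen_pps) else "MISSING SPS/PPS"
            print(f"IDR at NAL #{idx}: {status}")
            # Require fresh parameter sets before the next IDR, which is the
            # condition described above for surviving mid-stream changes.
            seen_sps = seen_pps = False

if __name__ == "__main__":
    check_stream(sys.argv[1])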
Here's a quick reference of how some players handle this as of 12/5/2022:
Chrome:
Regular playback is good; seeking initially shows corruption, then resolves.
Firefox:
Regular playback does not resize the player on the transition; the right and bottom edge pixels of the video are stretched to fill the remainder. Seeking shows corruption that eventually resolves, but the player still doesn't resize.
Edge:
Regular playback shows only the top-left corner on the transition, but eventually corrects itself. Seeking shows corruption; the size corrects itself, then playback just jumps to the end and remains corrupted.
Media Player:
Regular playback is good; seeking shows corruption (note you must click the spot you want to jump to rather than dragging the slider).
VLC is good in pretty much all cases. It's just much more resilient than most other players.

Capture a live video of handwriting using pen and paper and replace the hand in video with some object or cursor

I want to process the captured video. I will capture video of handwriting/drawing on paper, but I do not want to show the hand or pen on the paper while live streaming via p5.js.
Can this be done using machine learning?
Any idea how to implement this?
If I understand you right, you want to detect where in the image the hand is and draw an overlay on that position, right?
If so, you can use YOLO to detect where the hand is.
There are some trained networks you can download; maybe they are good enough, or maybe you have to train your own just for hands.
There is also a library for YOLO in JS: https://github.com/ModelDepot/tfjs-yolo-tiny
You may not need to go the full ML object-segmentation route.
If the paper's position and illumination are constant (or at least known), you could try a simple heuristic: compare the pixels in the current frame with a short history and keep the most constant pixel values. There might be some lag while new parts of your drawing 'become constant', so you could modify the accumulation, for example only accepting a pixel once it has gone from white to black.
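As a minimal sketch of that heuristic (in Python with OpenCV/NumPy rather than p5.js, and with an arbitrary window size of 15 frames), a per-pixel temporal median over a short sliding window does roughly this: the moving hand and pen occupy any given pixel only briefly, so the median tends to show only paper and ink.
from collections import deque

import cv2
import numpy as np

WINDOW = 15  # number of recent frames to keep; an assumption to tune

def main():
    cap = cv2.VideoCapture(0)          # default webcam
    history = deque(maxlen=WINDOW)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        history.append(gray)

        # The median over the window suppresses anything that moves (hand, pen),
        # while strokes that stay on the paper survive.
        stable = np.median(np.stack(list(history)), axis=0).astype(np.uint8)

        cv2.imshow("hand removed (approx.)", stable)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
The same idea ports to p5.js by keeping the last N frames and taking the per-pixel median in the draw loop.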

How to know which direction a GPS device is pointing?

Currently I manage to get the direction in degrees using the code below:
' Initial bearing from point 1 to point 2; lat/long values must be in radians.
d = Math.Atan2(Math.Sin(long2 - long1) * Math.Cos(lat2), _
    Math.Cos(lat1) * Math.Sin(lat2) - Math.Sin(lat1) * Math.Cos(lat2) * Math.Cos(long2 - long1))
' Convert to degrees and normalise to the 0-360 range.
Dim direction As Double = (RadToDeg(d) + 360.0) Mod 360
which, in my case, let's say gives 250.65°.
I assign each direction value from 0 to 360 to a particular image from an imageList loaded in the pictureBox (currently I have 36 compass images with different arrow directions, each representing 10 degrees).
When my device is pointed north, the arrow image shows the correct direction, but when I rotate the device (so it points anywhere other than north), the arrow image doesn't change, meaning it no longer shows the correct direction.
So my question is: is it possible to know in which direction the GPS device is pointed?
Edit: I'm using a Honeywell Dolphin 6000 Scanphone device.
The Honeywell Dolphin 6000 documentation doesn't mention a magnetometer or compass, so you're probably SOL. But if it does have one, you should be able to find methods to access it in the SDK.
I recommend downloading and reviewing any APIs and documents that come with the SDK and looking for references to the magnetometer or compass. Microsoft does not have standard APIs to access those sensors in Windows Mobile, so you will need the SDK from Honeywell to get that information.
If I am reading your question correctly, it sounds like you are trying to determine a heading while your position is fixed and you are only rotating the device.
Unfortunately, what you are looking for is not possible with GPS.
Both the formula you are using and GetPosition.Heading give a calculated heading based on sampling your current latitude/longitude and your previous latitude/longitude. So if you aren't moving in a direction (or are moving extremely slowly), your current and previous latitude/longitude values will effectively be the same, which reduces the accuracy of the calculated heading.
The only reliable way to get a heading when your position is relatively fixed is to use a magnetic or gyroscopic compass, which some devices do have built in.
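To make that limitation concrete, here is a small Python sketch (not the Windows Mobile API) that only derives a course when the two fixes are farther apart than typical GPS noise; the 5 m threshold is an arbitrary assumption:
import math

def course_between(lat1, lon1, lat2, lon2):
    """Initial course in degrees (0 = north) from fix 1 to fix 2; inputs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    y = math.sin(lon2 - lon1) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def distance_m(lat1, lon1, lat2, lon2):
    """Rough equirectangular distance in metres; good enough for a noise check."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return math.hypot(x, y) * 6371000.0

def usable_course(lat1, lon1, lat2, lon2, min_move_m=5.0):
    """Return a course only if the device actually moved; otherwise None."""
    if distance_m(lat1, lon1, lat2, lon2) < min_move_m:
        return None  # stationary: the direction cannot be recovered from GPS alone
    return course_between(lat1, lon1, lat2, lon2)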
"how to know gps device point at which direction?"
by using GPS Intermediate Driver, GetPosition.Heading will give you the current direction you are heading.
As stated in the GPS_POSITION documentation:
"flHeading: Heading, in degrees. A heading of zero is true north."
You must distinguish between the direction you are moving, which is called the bearing or course, and the direction you are looking or holding your device, which is the heading. (Think of sitting in a bus that drives north (course = 0°) while you take a photo facing west: heading = 270°.)
A (consumer) GPS receiver always returns only the course (or bearing), although some APIs unfortunately call it heading.
To know the direction in which you are holding your device while standing still, you have to use a magnetometer. Some modern smartphones, like iPhones and Android devices, have one built in.
Additional hint:
If your device has GPS, do NOT calculate the heading via your own or other formulas; take the value from the GPS API instead. It is much more accurate. The GPS chip does not calculate the direction only from positional change; it may also use the physical Doppler shift.

QTKit capture: what frame size to use?

I am writing a simple video messenger-like application, and therefore I need to get frames of some compromise size: small enough to fit into the available bandwidth, but without distorting the captured image.
To retrieve frames I am using the QTCaptureVideoPreviewOutput class, and I am successfully getting frames in the didOutputVideoFrame callback. (I need raw frames, mostly because I am using a custom encoder, so I just want "raw bitmaps".)
The problem is that for these new iSight cameras I am getting literally huge frames.
Luckily, these classes for capturing raw frames (QTCaptureVideoPreviewOutput) provide the method setPixelBufferAttributes, which allows me to specify what kind of frames I would like to get. If I am lucky enough to guess a frame size that the camera supports, I can specify it and QTKit will switch the camera into this mode. If I am unlucky, I get a blurred image (because it was stretched or shrunk) and, most likely, a non-proportional one.
I have been searching through lists.apple.com and stackoverflow.com, and the answer is "Apple currently does not provide functionality to retrieve the camera's native frame sizes". Well, nothing I can do about that.
Maybe I should provide the most common frame sizes in the settings, and the user has to try them to see what works for them? But what are these common frame sizes? Where could I get a list of the frame dimensions that UVC cameras usually generate?
For testing my application I am using a UVC-compliant camera, but not an iSight. I assume not every user is using an iSight either, and I am sure even different iSight models have different frame dimensions.
Or maybe I should switch the camera to the default mode, generate a few frames, see what sizes it produces, and at least then I would have some proportions? This feels like a real hack and doesn't seem natural. And the image is most likely going to be blurred again.
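For what it's worth, here is a rough sketch of the "offer common sizes" idea. The list below is an assumption based on typical UVC camera modes, not anything QTKit reports, and the helper simply picks the listed size that best matches a requested target while preserving its aspect ratio:
from fractions import Fraction

# Non-exhaustive list of frame sizes UVC webcams commonly offer (assumption).
COMMON_SIZES = [
    (160, 120), (320, 240), (352, 288), (640, 480), (800, 600),
    (1024, 768), (1280, 720), (1280, 960), (1600, 1200), (1920, 1080),
]

def closest_size(target_width, target_height):
    """Pick the candidate with the same aspect ratio and the closest area."""
    aspect = Fraction(target_width, target_height)
    candidates = [s for s in COMMON_SIZES if Fraction(s[0], s[1]) == aspect]
    if not candidates:
        candidates = COMMON_SIZES  # fall back to any size rather than failing
    target_area = target_width * target_height
    return min(candidates, key=lambda s: abs(s[0] * s[1] - target_area))

print(closest_size(700, 525))   # 4:3 request -> (640, 480)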
Could you please help me: how have you dealt with this issue? I am sure I am not the first one to face it. What approach would you choose?
Thank you,
James
You are right, the iSight camera produces huge frames. However, I doubt you can switch the camera to a different mode by setting pixel buffer attributes. More likely you set the mode in which the frames are processed in the QTCaptureVideoPreviewOutput. Take a look at QTCaptureDecompressedVideoOutput if you have not done so yet.
We also use the sample buffer to get the frame size, so I would not say it's a hack.
A more natural way would be to make your own QuickTime component that implements your custom encoding algorithm. In that case QuickTime would be able to use it inside QTCaptureMovieFileOutput during the capture session. That would be the proper, but also the hard, way.

Finding significant images in a set of surveillance camera images

I've had theft problems outside my house, so I set up a simple webcam to capture a frame every second with Dorgem (http://dorgem.sf.net).
Dorgem does offer a motion detection feature to capture only frames where something is moving on screen. The problem is that the motion detection algorithm it uses is extremely sensitive. It goes off because of variations in color between successive shots on my cheap webcam, and it also goes off because the trees in front of the house are blowing in the wind. Additionally, the front of my house is a high-traffic area, so there is also a large number of legitimately captured frames.
I average 2800 out of 3600 captured frames every hour using Dorgem's motion detection. This is too much for me to search through to find the interesting activity.
I wish I could reposition the camera to a more optimal spot where it would only capture the areas I'm interested in, so that motion detection would be simpler; however, this is not an option for me.
I think that because my camera has a fixed position and each picture frames the same area in front of my house, I should be able to scan the images and figure out which ones have motion in some interesting region of the image, throwing out all other frames.
For example: if there's a change at pixel 320,240 then someone has stepped in front of my house and I want to see that frame, but if there's a change at pixel 1,1 then it's just the trees blowing in the wind and the frame can be discarded.
I've looked at pdiff, a tool for finding diffs in sets of pictures, but it seems to be focused on diffing the entire picture rather than a specific region of it:
http://pdiff.sourceforge.net/
I've also looked at phash, a tool for calculating a hash based on human perception of an image, but it seems too complex:
http://www.phash.org/
I suppose I could implement it as a shell script, using ImageMagick's mogrify -crop to cherry-pick the regions of the image I'm interested in and then running pdiff to pick out the interesting frames.
Any thoughts? ideas? existing tools?
Cropping and then using pdiff seems like the best choice to me.
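A rough sketch of that crop-then-diff idea in Python (Pillow + NumPy) instead of shelling out to mogrify/pdiff; the region box, the per-pixel change of 25 grey levels, and the 5% threshold are assumptions to tune for the scene:
from pathlib import Path

import numpy as np
from PIL import Image

REGION = (200, 150, 520, 420)   # (left, upper, right, lower) crop in pixels; an assumption
THRESHOLD = 0.05                # flag the frame if more than 5% of region pixels changed

def load_region(path):
    """Load a frame, convert to greyscale, and crop to the interesting region."""
    return np.asarray(Image.open(path).convert("L").crop(REGION), dtype=np.int16)

def interesting_frames(folder):
    prev = None
    for frame in sorted(Path(folder).glob("*.jpg")):
        cur = load_region(frame)
        if prev is not None:
            changed = np.mean(np.abs(cur - prev) > 25)  # fraction of noticeably changed pixels
            if changed > THRESHOLD:
                yield frame
        prev = cur

if __name__ == "__main__":
    for f in interesting_frames("captures"):   # "captures" is a placeholder folder name
        print(f)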