Trying to do real-time detection, but video is lagging behind - tensorflow

So I have done transfer learning on a TensorFlow model to detect Rubik's cubes. Since I don't have a webcam, I am using an app called IP Webcam to use my phone's camera and grab the live feed with cv2, like this:
import cv2 as cv

cap = cv.VideoCapture(0)
address = "http://{My IP}/video"  # stream URL exposed by the IP Webcam app
cap.open(address)
When I run the object detection in real time (this is running on a GTX 1060), the model understandably can't keep up with the camera's 30 fps. But instead of displaying the live detection at, say, 10 fps, it seems to want to display all 30 frames even if that takes longer, so the video feed is not real-time: if I move, it takes around 5-10 seconds for the movement to show up in the video.
I don't know if this is an issue with TensorFlow or cv2. Is the issue that I'm not using a connected webcam?
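A minimal sketch of one common workaround, assuming the delay comes from OpenCV's internal frame buffer filling up faster than the model can consume frames: read frames in a background thread and keep only the newest one, so the detector always works on the freshest frame and stale frames are dropped instead of queued (run_detector here stands in for your TensorFlow inference call and is hypothetical):

import threading
import cv2 as cv

class LatestFrameReader:
    """Continuously reads frames in the background and keeps only the newest one."""
    def __init__(self, source):
        self.cap = cv.VideoCapture(source)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            ok, frame = self.cap.read()
            if not ok:
                continue
            with self.lock:
                self.frame = frame  # older, unread frames are simply discarded

    def read(self):
        with self.lock:
            return None if self.frame is None else self.frame.copy()

# Usage: feed only the most recent frame to the detector
reader = LatestFrameReader("http://{My IP}/video")
while True:
    frame = reader.read()
    if frame is None:
        continue
    # detections = run_detector(frame)  # hypothetical TensorFlow inference call
    cv.imshow("detections", frame)
    if cv.waitKey(1) == ord("q"):
        break

With this pattern the display runs at whatever rate the model can sustain (e.g. 10 fps), but each shown frame is at most one capture interval old rather than several seconds behind.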

Related

ESP32-S3 TensorFlow Lite Model Issues

I worked with the new ESP32-S3 Development Board and an I2S microphone in order to create a simple voice recognition system. The software and firmware behind this are based on this project: Voice Controlled Robot - Atomic14 https://github.com/atomic14/voice-controlled-robot
This project worked flawlessly on the developer kit so I moved forward and created my own board. This schematic is inspired by the design made by Espressif: ESP32-s3 Dev Kit Schematics https://dl.espressif.com/dl/SCH_ESP32-S3-DEVKITC-1_V1_20210312C.pdf
The flashing works perfectly. I am able to use other code that accesses the microphone and records sounds; no issues with programming whatsoever. But when I upload the code that uses TFLite (from the first link above), it uploads, runs, and prints output, yet the neural network never processes or outputs anything.
Please note that there are no differences between the code uploaded to my board and the one uploaded to the dev kit (on which everything works as it should).
There are only small hardware differences between the two boards, and everything else (code-wise) works on my board, so I am not sure what hardware issue would affect only the neural network.
The ESP32-S3 used in my design is the same as the one on the dev kit; the only difference is that my design uses 16 MB of flash while the dev kit uses 8 MB (so the board with the smaller flash is the one where the NN works, and the board with the larger flash is the one where it doesn't).
-> Could it be a memory problem caused by an incorrectly used capacitor that could create small differences in the voltage that powers the memory?
My schematic design is attached below.
ESP32-S3 PCB design for the voice recognition module

ResourceExhausted error in Colab - action recognition using Kinetics labels

I tried to do action recognition using the Kinetics labels in Colab. I referred to this link.
When I give the model an input video under 2 MB, it works fine. But if the input video is larger than 2 MB, I get a ResourceExhausted error after a few minutes, along with a warning that GPU memory usage is close to the limit.
Even if I terminate the notebook and start a new one, I get the same error.
As the error says, the physical limits of your hardware have been reached: the model requires more GPU memory than is available.
You could prevent this by reducing the model's batch size, or by downscaling the resolution of your input video sequence (see the sketch below).
Alternatively, you could use Google's cloud training to gain additional hardware resources; however, it is not free.
https://cloud.google.com/tpu/
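If the video itself is what drives memory use, one option is to downscale each frame before the clip reaches the model. A minimal sketch with OpenCV, assuming hypothetical input/output file names and a target width of 224 px (a common input size for Kinetics-style models):

import cv2 as cv

def downscale_video(in_path, out_path, target_width=224):
    """Re-encode a video at a smaller resolution to reduce the GPU memory needed per clip."""
    cap = cv.VideoCapture(in_path)
    fps = cap.get(cv.CAP_PROP_FPS) or 25.0
    w = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    scale = target_width / float(w)
    size = (target_width, int(h * scale))
    writer = cv.VideoWriter(out_path, cv.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv.resize(frame, size))
    cap.release()
    writer.release()

# downscale_video("input_clip.mp4", "input_clip_small.mp4")  # hypothetical file names

Feeding the downscaled clip to the notebook keeps the per-frame tensors much smaller, which is usually enough to stay under the Colab GPU memory limit for longer videos.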

Wireless camera for Convolutional Neural Networks

Soon I will start working on a project that requires me to classify different objects using a CNN (convolutional neural network) and track them using a drone. The camera I need should stream live video in FHD at 60 fps. I have searched a lot, especially GoPro cameras, but I didn't find anything about how many frames per second they deliver during a live stream. I hope you can suggest some cameras.
You could use a GoPro camera with an HDMI-to-USB capture card; this will give you 1080p at 30 or 60 fps depending on the resolution and frame rate chosen. I'd also look at the OpenMV Cam H7 if possible, since that camera is designed for computer vision.
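If you go the capture-card route, the card typically shows up as a regular UVC webcam, so it can be opened like any other camera. A minimal sketch, assuming the card enumerates as device index 1 and actually supports 1920x1080 at 60 fps (whether you really get 60 fps depends on the card and driver; the printout shows what was negotiated):

import cv2 as cv

cap = cv.VideoCapture(1)  # capture card usually enumerates as an extra camera
cap.set(cv.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv.CAP_PROP_FPS, 60)

print("negotiated:",
      cap.get(cv.CAP_PROP_FRAME_WIDTH),
      cap.get(cv.CAP_PROP_FRAME_HEIGHT),
      cap.get(cv.CAP_PROP_FPS))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # frame can now be passed to your CNN classifier / tracker
    cv.imshow("feed", frame)
    if cv.waitKey(1) == ord("q"):
        break
cap.release()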

In case of multiple GPUs which one does actual rendering to all the monitors?

Can anyone explain, or point to an explanation (or at least some clues), of how rendering works in a multi-GPU/multi-monitor setup?
For example, I have 5 NVIDIA Quadro 4000 video cards installed and 9 displays connected to them. The displays are not grouped in any way, just arranged in Windows 7 so that the total resolution is 4098x2304. The cards are not connected with SLI either.
I have a Flash app which sees the 4096x2304 window as a single Stage3D context (using DX9) and can work with this quite unusual setup as though it were just one huge display driven by a single video card.
How does the rendering work internally? What are the video cards actually doing? Do they share resources? Which one renders all the content? And why do I get only 29.9 fps while doing almost nothing in the app?
Thank you.
I don't know about DirectX, but for OpenGL I've collected this information here: http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html
In short, on Windows with newer NVIDIA drivers, one GPU (typically the first) renders everything and the content is blitted to the others. If you enable SLI Mosaic Mode, the GL commands are sent to all GPUs, giving you scalability for the fill rate.

Windows PC camera image capture, not taking one frame from a video stream

I have a question about image capture with a PC camera (an integrated notebook camera or webcam). I am developing a computer vision system in which high-quality image capture is the key issue, and most current methods use VFW or DirectShow to capture a video stream and snap one frame as an image.
However, this method does not produce a high-quality image (i.e., it does not use the full capability of the camera). For example, I have a 5-megapixel webcam, but the video stream maxes out at 720p (a USB bandwidth problem?). Streaming video wastes some of the camera sensor's resolution.
Could I stream video and take pictures independently? For example, input a 640x480 video stream and render it, then take a 1280x720 picture from the same camera? I guess this might be a hardware question, like the new HTC One X camera?
In short, is there a way for a PC system to take a picture using the sensor's full capacity, rather than streaming video and capturing one frame? Is this a hardware-related problem, and do common webcams support this? Or is it a software problem, and I should learn DirectShow?
Thanks a lot.
I vaguely remember that (some) video sources offer both a capture pin and a still pin; the latter, I assume, would offer you higher quality. You can easily test this in GraphEdit. If it works, then yes, you'll have to learn DirectShow, or pay someone to code this for you.
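Before diving into DirectShow, it can also be worth checking what resolutions the driver will hand over through a simpler API. A quick sketch with OpenCV (this is not the still-pin approach from the answer above, just a way to see whether the webcam exposes a full-resolution mode at all; the 2592x1944 request assumes a 5-megapixel sensor, and the output file name is hypothetical):

import cv2 as cv

cap = cv.VideoCapture(0)
# Ask for the full 5 MP sensor resolution; the driver will fall back to the
# nearest mode it actually supports (often the 720p streaming mode).
cap.set(cv.CAP_PROP_FRAME_WIDTH, 2592)
cap.set(cv.CAP_PROP_FRAME_HEIGHT, 1944)

ok, frame = cap.read()
if ok:
    print("captured frame size:", frame.shape[1], "x", frame.shape[0])
    cv.imwrite("snapshot.png", frame)  # hypothetical output file
cap.release()

If the printed size stays at the streaming resolution, that points to the still pin (or the vendor's own capture software) being the only way to reach the sensor's full resolution.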