I've had theft problems outside my house so I setup a simple webcam to capture every second with Dorgem (http://dorgem.sf.net).
Dorgem does offer a feature to use motion detection to only capture frames where something is moving on the screen. The problem is that the motion detection algorithm it uses is extremely sensitive. It goes off because of variations in color between successive shots on my cheap webcam, and it also goes off because the trees in front of the house are blowing in the wind. Additionally, the front of my house is a high traffic area so there is also a large number of legitimately captured frames.
I average capturing 2800/3600 frames every second using Dorgem's motion detection. This is too much for me to search through to find out where the interesting activity is.
I wish I could re-position the camera to a more optimal position where it would only capture the areas I'm interested in, so that motion detection would be simpler, however this is not an option for me.
I think that because my camera has a fixed position and each picture frames the same area in front of my house, then I should be able to scan the images and figure out which ones have motion in some interesting region of that image, throwing out all other frames.
For example: if there's a change in pixel 320,240 then someone has stepped in front of my house and I want to see that frame, but if there's a change in pixel 1,1 then its just the trees blowing in the wind and the frame can be discarded.
I've looked at pdiff, a tool for finding diffs in sets of pictures, but it seems to be also focused on diffing the entire picture, rather than a specific region of it:
http://pdiff.sourceforge.net/
I've also looked at phash, a tool for calculating a hash based on human perception of an image, but it seems too complex:
http://www.phash.org/
I suppose I could implement it in a shell script using imagemagick's mogrify -crop to cherry pick the regions of the image I'm interested in, then running pdiff to find the interesting ones, and using that to pick out the interesting frames.
Any thoughts? ideas? existing tools?
cropping and then using pdiff seems like the best choice to me.
Related
I want to process the captured video. I will try to capture the video of handwriting on paper / drawing on paper. But I do not want to show the hand or pen on the paper while live streaming via p5.js.
Can this be done using machine learning?
Any idea how to implement this?
If I understand you right you want to detect where in the image the hand is a draw an overlay on this position right?
If so You can use YOLO more information to detect where the hand is.
There are some trained networks that you can download maybe they are good enough, maybe you have to train your own just for handy.
There are also some libery for yolo and JS https://github.com/ModelDepot/tfjs-yolo-tiny
You may not need to go the full ML object segmentation route.
If the paper's position and illumination are constant (or at least knowable) you could try some simple heuristic comparing the pixels in the current frame with a short history and using the most constant pixel values. There might be some lag as new parts of your drawing 'become constant' so maybe you could try some modification to the accumulation, such as if the pixel was white and is going black.
I have some problems with this assignment. Given an image of a street nameplate, like this one
I have to detect the nameplate and mark it on the image with a rectangle, obtaining something like this:
Nameplates can be rotated, scaled and in different lighting conditions. The procedure must be automatic.
What i have tried so far is to isolate the nameplate from the background. I've tried with different thresholding methods, but the problem is that i have different images and one single method doesn't work with all of them, due to different lighting condition and noise. What i've thought is to perform a pre-processing on the images, to reduce noise and normalize light, but, again, how to choose pre-processing steps that work with every image in my dataset? And what for images that don't need pre-processing?
Another problem is that there might be other signs in the image with writings on them and i have to ignore them. So i've thought i could isolate the nameplate by that blue outline, but i don't know if that can be done(or if it is convenient) with template matching, also considering that part of the outline could be cut off from the image.
So what i'm asking is: is there an automatic way to isolate/detect only that type of nameplates that have the blue outline on them, regardless of orientation, light conditions, shadows on them, noise in the image, etc? What steps would you follow?
Thank You
We have a system where people are being taken a face shot via a DSLR camera. We need the people's images with transparent background. What we're currently doing is taking the image and editing and cropping it in Photoshop, removing the background image with the Magic Eraser tool.
What I am looking for is a way to parse the image and automatically erase the semi-white background we have, along with the resizing and cropping. Is there some kind of library or code sample that does this without requiring manual intervention?
This is a real complex problem. Like the answer below suggested you'll need to do a fuzzy match on each pixel and set it to be transparent but you also need to detected other nearby pixels to make sure they are not close in color. A white tag on the shirt, white eyelids, hair, pale skin reflecting the flash. All are candidates to be removed by any greedy fuzzy logic.
Think about the Magic Wand tool in Photoshop. How good is it at detecting the edges of the person in the picture? Yeah, and that's the top standard of image editing software with thousands of engineering hours behind it.
This is not a feasible request for a Q&A format, and this is one of those things that humans just do better than machine. BUT, that doesn't mean it's not possible, and who knows, you might be the one to do it. Just don't do it in VB.NET please :)
Some pseudo-code to get an idea of what you need to do:
Bitmap faceShot = Bitmap.FromFile(filepath)
foreach pixel in faceShot
//the following line is where the magic happens, you can do any fuzzy match on the color that suits you
//figure out your color range and do a fuzzy match percentage wise
if (pixel between RGB(255,255,255) and RGB(250,235,215)) //white and antique white
pixel.setAlpha=0
endif
end foreach
You could start with this as a starting point for processing a single image,
http://www.java2s.com/Code/VB/2D/ProcessanImageinvertPixel.htm
Basically, if you have a constant background color (like the TV green-screen), it's just a matter of selecting pixels close to the color you are erasing and setting their Alpha level to 0 (transparent). Treating the RGB values like XYZ coordinates, you can do a 3d distance from your background color, and make everything within a certain threshold transparent.
As an improvement, you could also make everything within another threshold semi-transparent so the edges right around hair and stuff like that look softer and less harsh.
Alternatively, you could probably do the same exact thing with good results in Photoshop, as it should support batch processing.
Edit, thinking about it some more, you may want to use a green screen type background as well instead of an off-white one like you stated, as you may make people's eyes transparent. I would definitely try to batch it in Photoshop/Gimp/etc.
I am working with kinect in openframework using the ofxKinect addon, which is great and plenty fun!
Anyway I am looking for some pointers or a direction when dealing with multiple bodies on the screen. I was thinking of making a rect around each detected body and when the rects intersect something could happen, an effect or anything.
So what I am looking for are ideas or something that could point me to the right direction of detecting multiple bodies when using a kinect.
Right now based on the depth image I get from the kinect I go through each pixel and create a bunch of smaller rectangles with a padding and group them in a larger rectangle bound if they are separate from another rectangle group. This is not ideal as it only deals with the pixel values and is not really seperating bodies from eachother and is not giving me the results I am looking for.
So any ideas would be greatly appreciated!
If you want to use ofxKinect a quick solution would be to threshold on depth and assume bodies and no other objects will be within a depth range. This should make it easy to use the OpenCV's contour finder to isolate the outlines of the bodies and get the bounding rectangles. If the rectangles intersect(and ofRectangle already does the math you), trigger the reaction you need. Also don't forget to do that once if the effect isn't showing already, otherwise you will trigger the effect multiple times per second while the two bodies' bounding rectangles intersect.
You could try something a bit more hardcore and using ofxCv(not just ofxOpenCV) to tap into the HoG functionality. This is slow in itself and not ideal with the depth map, but hopefully you can run in every few seconds just to detect a person and the depth, then keep tracking that movement.
Personally, if you want to track people with the Kinect I recommend using ofxOpenNI as if already provides the scene segmentation feature and even if you don't track the skeletons you can still get useful information like the pixels pertaining to each body and they're centre of mass. I'm guessing Microsoft KinectSDK has a similar feature and there should be an oF addon, but it's windows only.
ofxKinect/libfreenect does not offer any people detection features, so you will need to roll your own.
This question is very similar to that posed here.
My problem is that I have a map, something like this:
This map is made using 2D Perlin noise, and then running through the created heightmap assigning types and color values to each element in the terrain based on the height or the slope of the corresponding element, so pretty standard. The map array is two dimensional and the exact dimensions of the screen size (pixel-per-pixel), so at 1200 by 800 generation takes about 2 seconds on my rig.
Now zooming in on the highlighted rectangle:
Obviously with increased size comes lost detail. And herein lies the problem. I want to create additional detail on the fly, and then write it to disk as the player moves around (the player would simply be a dot restricted to movement along the grid). I see two approaches for doing this, and the first one that came to mind I quickly implemented:
This is a zoomed-in view of a new biased local terrain created from a sampled element of the old terrain, which is highlighted by the yellow grid space (to the left of center) in the previous image. However this system would require a great deal of modification, as, for example, if you move one unit left and up of the yellow grid space, onto the beach tile, the terrain changes completely:
So for that to work properly you'd need to do an excessive amount of, I guess the word would be interpolation, to create a smooth transition as the player moved the 40 or so grid-spaces in the local world required to reach the next tile over in the over world. That seems complicated and very inelegant.
The second approach would be to break up the grid of the original map into smaller bits, maybe dividing each square by 4? I haven't implemented this and I'm not sure how I would in a way that would actually increase detail, but I think that would probably end up being the best solution.
Any ideas on how I could approach this? Keep in mind it has to be local and on-the-fly. Just increasing the resolution of the map is something I want to avoid at all costs.
Rewrite your Perlin noise to be a function of position. Then you can increase the octaves (and thus the detail level) and resample the area at a higher resolution.