Jython: Need to open two pictures and copy the 2nd picture to the 1st picture's bottom-right corner (sort of like a logo) - jython

I am trying to copy pic2 to pic1's bottom right corner (adding a logo to a picture). I'm pretty sure this is the part of the code that I'm having difficulty with, as I cannot figure out what comes next after the two getPixel statements.
for x in range(0, getWidth(pic2)):
    for y in range(0, getHeight(pic2)):
        p1 = getPixel(pic1, x, y)
        p2 = getPixel(pic2, x, y)
        setPixel = p1

Assuming one image is bigger than the other, you don't need to collect pixels from both picture objects, just the logo.
Seems like this might be homework so I'll just help you along.
for x in range(0, getWidth(pic2)):
    for y in range(0, getHeight(pic2)):
        p1 = getPixel(pic2, x, y)
        p1Col = getColor(p1)
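Also, your nested for loops start in the top left of the image. To apply the logo in the bottom right, we use some simple math after getting the logo pixel color to work out the target x coordinate in pic1:
imageWidth - logoWidth + x - 1
The -1 keeps the logo one pixel in from the right edge. With 0-based pixel coordinates, x never exceeds logoWidth - 1, so imageWidth - logoWidth + x on its own would also stay inside the image.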
Use that same style of formula for the height.
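The offset idea above can be sketched in plain Python. This is only an illustration, not JES code: the pictures are stood in for by nested lists indexed as image[y][x], and paste_bottom_right is a made-up helper name. It places the logo flush against the right and bottom edges; subtract 1 from each offset if you want the one-pixel margin that the formula above gives.

```python
def paste_bottom_right(image, logo):
    """Copy every pixel of `logo` into the bottom-right corner of `image`, in place.

    Both arguments are hypothetical stand-ins for pictures: lists of rows,
    indexed as picture[y][x].
    """
    image_h, image_w = len(image), len(image[0])
    logo_h, logo_w = len(logo), len(logo[0])
    if logo_w > image_w or logo_h > image_h:
        raise ValueError("logo must fit inside image")

    # Offsets that move the logo's top-left corner so its last pixel
    # lands on the image's last pixel.
    x_off = image_w - logo_w
    y_off = image_h - logo_h

    for y in range(logo_h):
        for x in range(logo_w):
            # Only the logo is read; the image is only written to.
            image[y_off + y][x_off + x] = logo[y][x]
    return image


image = [[0] * 5 for _ in range(4)]   # 5 wide, 4 high, all "black"
logo = [[1, 2], [3, 4]]               # 2 x 2 logo
paste_bottom_right(image, logo)
```

Note that only one pixel is fetched per iteration (from the logo), which matches the point made at the start of this answer.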


Draw a line at a canvas

Currently I have this code:
sample.moveTo(X1.x,Y1.x );
sample.arc(X2.x, Y2.y, 4, 0, 2 * Math.PI, false);
sample.lineWidth = 1;
This will create this :
This will point in any direction.
What I want is this:
1. There will only be one line, either Line A or B.
2. They will always point from left to right.
3. They are at 45 degrees.
Take a look at the following figure:
Point (x1,y1) is the starting point of the mouse. Assuming that the mouse has moved to the right (you'll have to handle the case when it moves left), the new mouse coordinates will be (x2,y2). However, we don't want to draw the line between (x1,y1) and (x2,y2), because the slope of this line (the angle) won't be the desired one. So we must calculate the coordinates of a new point P that lies on our line. Note: I assumed that the x-coordinate of this point will be equal to the new mouse x-coordinate x2!
With this assumption and with help of some basic 2D geometry we get:
a = x2 - x1
tan(alpha) = b / a => b = a * tan(alpha)
P.x = x2
The value of the P.y coordinate depends on whether the mouse has moved up or down from the start position.
IF (y1 > y2)
    P.y = y1 - b // Mouse has moved up (drawing shows this scenario)
ELSE
    P.y = y1 + b // Mouse has moved down (not shown in the drawing)
So we have a new point P, and now you can simply draw the line between (x1,y1) and P. You also have to handle some special cases, such as the mouse moving to the left of the starting point.
In order to get your point P, you should plug in your desired angle as well (it can be different from 45 degrees, but it has to be a positive angle; you could derive a formula that works with negative angles as well).
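The geometry above can be checked with a small Python sketch (the same arithmetic carries over directly to the JavaScript canvas code). The function name snapped_endpoint is made up for illustration, and using abs(a) so that b stays positive when the mouse is left of the start is my own assumption, not part of the answer.

```python
import math


def snapped_endpoint(x1, y1, x2, y2, angle_deg=45.0):
    """Return point P on the line through (x1, y1) at the given angle.

    P.x is taken to be the current mouse x (x2), as assumed in the answer.
    Canvas coordinates are used, so y grows downwards.
    """
    a = x2 - x1
    # b = a * tan(alpha); abs() is an assumption to keep b positive
    # even if the mouse is to the left of the start point.
    b = abs(a) * math.tan(math.radians(angle_deg))
    if y1 > y2:
        return (x2, y1 - b)   # mouse moved up
    return (x2, y1 + b)       # mouse moved down


p_up = snapped_endpoint(10, 50, 30, 20)     # mouse moved right and up
p_down = snapped_endpoint(10, 50, 30, 90)   # mouse moved right and down
```

With the default 45 degrees, b equals the horizontal distance, so the line rises or falls by exactly as much as it runs.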

Recursively change black to white in image with numpy

What's the best way to remove large shadowed regions from greyscale images? I'm struggling to write a method that takes a 2D NumPy array A and an entry (x,y) in A, and "crawls" through the array changing any (x',y') entry "connected" to (x,y) from 0 to 255. By connected I mean there's some path of 0-valued entries from (x,y) to (x',y'). Here's a picture of what I mean.
The black region at the bottom should all be set to grayscale 255. I'm almost positive this algorithm should be recursive. Is there a fast way to do this in NumPy, or using PIL?
OK, thanks for the advice. Here's what I've been able to come up with:
def creep(data, x, y):
    data[x, y]=255
    for (i,j) in [(1,0),(-1,0),(0,1),(0,-1)]:
        x, y = x + i, y + j
        if data[x, y]==0:
            return creep(data, x, y)
    return data
def crop_big_region(data):
    """ Looks for black regions in image and makes them white """
    n, m = data.shape
    r = int(0.012*min(n,m))
    num_samples = int(0.0001*n*m)
    for _ in xrange(0,num_samples):
        x, y = numpy.random.randint(r,n - r), numpy.random.randint(r,m -r)
        if numpy.all(data[x-r:x+r, y-r:y+r] == 0):
            data[x,y] = 255
            data = creep(data, x, y)
    return data
It seems to sort of work, except it just returns lines, instead of filling out the entire region.
Think I'm just too tired to figure out the recursive step here properly.
As @Boaz pointed out, this is more of an image processing question than a Python question. You can achieve the desired result using so-called adaptive thresholding. scikit-image has a nice implementation available, with a complete tutorial here:
You will need to tune it a bit, but it should work.
Well, this is not really a Python-related question; it is more of an image processing question.
I think you have two options:
1. Use edge detection, and then iterate to check that the area is big enough.
Look at "Image outline using python/PIL" for example.
2. Use OpenCV. I don't know it well, but this code seems to do something rather close to what you are trying to achieve:
Python: detect black squares
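For reference, the "crawl" described in the question is a flood fill. In the posted creep, the loop overwrites x and y on every step and returns from inside the loop, so only one neighbour is ever followed, which would explain the lines. Below is a minimal sketch of a fill that visits all four neighbours, using an explicit stack instead of recursion to avoid hitting the recursion limit on large regions. It is written against nested lists so it runs without dependencies; indexing with data[x][y] should also work on a 2D NumPy array. The name fill_connected is made up for illustration.

```python
def fill_connected(data, x, y, old=0, new=255):
    """Set every `old` entry connected to (x, y) through `old` entries to `new`."""
    rows, cols = len(data), len(data[0])
    if old == new or data[x][y] != old:
        return data

    stack = [(x, y)]
    data[x][y] = new
    while stack:
        cx, cy = stack.pop()
        # Look at all four neighbours of the current cell, without
        # modifying cx / cy while doing so.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = cx + dx, cy + dy
            if 0 <= nx < rows and 0 <= ny < cols and data[nx][ny] == old:
                data[nx][ny] = new        # mark before pushing to avoid revisits
                stack.append((nx, ny))
    return data


grid = [
    [9, 9, 9, 9],
    [0, 0, 9, 0],
    [0, 0, 0, 9],
]
fill_connected(grid, 2, 0)
```

In the example, the isolated 0 in the second row is left untouched because no path of 0-valued entries reaches it.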

Need deep explanation for viewport /perspective/frustum calculations

I have a lot of tutorials and books, but I'm unable to understand how my viewport, my near and far distances, etc. are used to calculate the perspective / frustum matrix.
I have the learningwebgl lessons, but I don't understand what viewport and 3D space adjustments are made. What is my initial window projection size? Why do I see the triangle and square placed at z = -7?
Another thing I don't understand: does a near plane of 0.001 create the projection window just in front of my nose? If so, what is my projection window dimension?
I need some deeper, more basic help.
Can anybody help me? Any really useful links? I need graphical examples showing and teaching how the frustum is calculated.
There's this
Imagine you're in 2D. You have a canvas that's 200x100 pixels. If you draw at x = 201 it will be off the canvas. Similarly at x = -1 it will be off the canvas.
WebGL works in a 3D space that goes from -1 to +1 in x, y and z. The perspective / frustum matrix is the matrix that takes your 3D scene and converts it to this -1 / +1 space. The near and far values define what range in world space gets converted to the -1 / +1 "clipspace". Anything outside that range will be clipped, just like in the 2D example. If you set near to 10 and far to 100, then something at Z = 9 will be clipped because it's too near, and something at Z = 101 will be clipped because it's too far. More specifically, the near and far settings form a matrix such that a point at Z = near becomes -1 when multiplied by the matrix (and divided by w), and a point at Z = far becomes +1.
The viewport setting tells WebGL how to convert from the -1 to +1 space back into pixels.
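To make the two steps concrete, here is a small Python sketch of the arithmetic (not WebGL code). It assumes the common OpenGL-style perspective matrix, written row by row for readability, with the camera looking down the negative Z axis, so "Z = near" means z = -near in eye space. The function names are made up for illustration.

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (assumed convention), as a list of rows."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_ndc(matrix, point):
    """Multiply (x, y, z, 1) by the matrix, then divide by w to reach the -1..+1 space."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(row[i] * v[i] for i in range(4)) for row in matrix]
    w = clip[3]
    return tuple(c / w for c in clip[:3])


def to_pixels(ndc, vp_x, vp_y, vp_width, vp_height):
    """What the viewport setting does: map -1..+1 back to pixel coordinates."""
    px = vp_x + (ndc[0] + 1.0) * 0.5 * vp_width
    py = vp_y + (ndc[1] + 1.0) * 0.5 * vp_height
    return px, py


proj = perspective(45.0, 2.0, 10.0, 100.0)       # near = 10, far = 100
ndc_near = to_ndc(proj, (0.0, 0.0, -10.0))       # a point on the near plane
ndc_far = to_ndc(proj, (0.0, 0.0, -100.0))       # a point on the far plane
center = to_pixels(ndc_near, 0, 0, 200, 100)     # 200x100 canvas
```

A point straight ahead on the near plane ends up with z = -1, one on the far plane with z = +1, and both map to the middle of the 200x100 canvas.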

Obtaining 3D location of an object being looked at by a camera with known position and orientation

I am building an augmented reality application and I have the yaw, pitch, and roll for the camera. I want to start placing objects in the 3D environment. I want to make it so that when the user clicks, a 3D point pops up right where the camera is pointed (center of the 2D screen) and when the user moves, the point moves accordingly in 3D space. The camera does not change position, only orientation. Is there a proper way to recover the 3D location of this point? We can assume that all points are equidistant from the camera location.
I am able to accomplish this independently for two axes (OpenGL default orientation). This works for changes in the vertical axis:
x = -sin(pitch)
y = cos(pitch)
z = 0
This also works for changes in the horizontal axis:
x = 0
y = -sin(yaw)
z = cos(yaw)
I was thinking that I should just combine them into:
x = -sin(pitch)
y = sin(yaw) * cos(pitch)
z = cos(yaw)
and that seems to be close, but not exactly correct. Any suggestions would be greatly appreciated!
It sounds like you just want to convert from a set of Euler angles (pitch, yaw, roll) to a rotation matrix. The conversion can be seen in the Wikipedia article on rotation matrices. The idea is that once you have constructed your matrix, you can transform any point simply by multiplying:
final_pose = rot_mat * initial_pose
where final_pose and initial_pose are 3x1 vectors and rot_mat is a 3x3 matrix.
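As a rough sketch of that idea in Python: the axis assignment and multiplication order below are assumptions (yaw about Y, pitch about X, roll about Z, combined as yaw * pitch * roll, with the camera's rest direction along -Z as in OpenGL). They may not match the conventions in the question, so signs or order may need adjusting.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Rotation matrix from Euler angles in radians (assumed order: yaw * pitch * roll)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    r_yaw = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]      # about Y
    r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]    # about X
    r_roll = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]     # about Z
    return matmul(r_yaw, matmul(r_pitch, r_roll))


def rotate(rot_mat, vec):
    """final = rot_mat * initial for a 3-vector."""
    return tuple(sum(rot_mat[i][k] * vec[k] for k in range(3)) for i in range(3))


forward = (0.0, 0.0, -1.0)   # where the camera points with all angles at zero
looking_rest = rotate(rotation_matrix(0.0, 0.0, 0.0), forward)
looking_left = rotate(rotation_matrix(math.pi / 2, 0.0, 0.0), forward)
looking_mixed = rotate(rotation_matrix(0.3, 0.2, 0.7), forward)
```

Scaling the resulting unit vector by the fixed distance from the camera gives the 3D point at the centre of the screen. Under these conventions roll spins the view around the forward axis and does not move that point.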

Extract transform and rotation matrices from homography?

I have 2 consecutive images from a camera and I want to estimate the change in camera pose:
I calculate the optical flow:
Const MAXFEATURES As Integer = 100
' Load the two consecutive frames and convert them to grayscale
imgA = New Image(Of [Structure].Bgr, Byte)("pic1.bmp")
imgB = New Image(Of [Structure].Bgr, Byte)("pic2.bmp")
grayA = imgA.Convert(Of Gray, Byte)()
grayB = imgB.Convert(Of Gray, Byte)()
imagesize = cvGetSize(grayA)
' Scratch buffers for the pyramidal Lucas-Kanade tracker
pyrBufferA = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
pyrBufferB = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
' Find corners in the first frame and refine them to sub-pixel accuracy
features = MAXFEATURES
featuresA = grayA.GoodFeaturesToTrack(features, 0.01, 25, 3)
grayA.FindCornerSubPix(featuresA, New System.Drawing.Size(10, 10),
    New System.Drawing.Size(-1, -1),
    New Emgu.CV.Structure.MCvTermCriteria(20, 0.03))
features = featuresA(0).Length
' Track the corners into the second frame
Emgu.CV.OpticalFlow.PyrLK(grayA, grayB, pyrBufferA, pyrBufferB, _
    featuresA(0), New Size(25, 25), 3, _
    New Emgu.CV.Structure.MCvTermCriteria(20, 0.03D),
    flags, featuresB(0), status, errors)
' Copy the matched points into N x 2 matrices
pointsA = New Matrix(Of Single)(features, 2)
pointsB = New Matrix(Of Single)(features, 2)
For i As Integer = 0 To features - 1
    pointsA(i, 0) = featuresA(0)(i).X
    pointsA(i, 1) = featuresA(0)(i).Y
    pointsB(i, 0) = featuresB(0)(i).X
    pointsB(i, 1) = featuresB(0)(i).Y
Next
' Estimate the homography between the two point sets
Dim Homography As New Matrix(Of Double)(3, 3)
cvFindHomography(pointsA.Ptr, pointsB.Ptr, Homography, HOMOGRAPHY_METHOD.RANSAC, 1, 0)
and it looks right, the camera moved leftwards and upwards:
Now I want to find out how much the camera moved and rotated. If I declare my camera position and what it's looking at:
' Create camera location at origin and lookat (straight ahead, 1 in the Z axis)
Location = New Matrix(Of Double)(2, 3)
location(0, 0) = 0 ' X location
location(0, 1) = 0 ' Y location
location(0, 2) = 0 ' Z location
location(1, 0) = 0 ' X lookat
location(1, 1) = 0 ' Y lookat
location(1, 2) = 1 ' Z lookat
How do I calculate the new position and lookat?
If I'm doing this all wrong or if there's a better method, any suggestions would be very welcome, thanks!
For pure camera rotation, R = A^-1 H A. To prove this, consider the image-to-plane homographies H1 = A and H2 = A R, where A is the camera intrinsic matrix. Then H12 = H2 * H1^-1 = A R A^-1, from which you can obtain R = A^-1 H12 A.
Camera translation is harder to estimate. If the camera translates, you have to find a fundamental matrix first (not a homography): x'^T F x = 0, and then convert it into an essential matrix E = A^T F A. Then you can decompose E into rotation and translation, E = [t]x R, where [t]x means the vector (cross) product matrix of t. The decomposition is not obvious, see this.
The rotation you get will be exact, while the translation vector can be found only up to scale. Intuitively, this scaling means that from the two images alone you cannot really say whether the objects are close and small or far away and large. To disambiguate, we may use objects of familiar size, a known distance between two points, etc.
Finally, note that the human visual system has a similar problem: though we "know" the distance between our eyes, when they are converged on the object the disparity is always zero, and from disparity alone we cannot say what the distance is. Human vision relies on triangulation from the eyes' vergence signal to figure out absolute distance.
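The pure-rotation case can be checked numerically with a short Python sketch. It assumes a simple pinhole intrinsic matrix A with focal lengths fx, fy and principal point (cx, cy) and no skew; the values below are made up for illustration. It builds H = A R A^-1 from a known rotation and recovers R = A^-1 H A.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix A (zero skew assumed)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def intrinsics_inverse(fx, fy, cx, cy):
    """Closed-form inverse of the matrix returned by intrinsics()."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def rotation_from_homography(h, fx, fy, cx, cy):
    """R = A^-1 H A, valid only when the camera purely rotated."""
    a = intrinsics(fx, fy, cx, cy)
    a_inv = intrinsics_inverse(fx, fy, cx, cy)
    return matmul(a_inv, matmul(h, a))


# Hypothetical camera and a known rotation of 0.1 rad about the Y axis
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
c, s = math.cos(0.1), math.sin(0.1)
r_true = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Homography induced by that rotation: H = A R A^-1
h = matmul(intrinsics(fx, fy, cx, cy),
           matmul(r_true, intrinsics_inverse(fx, fy, cx, cy)))
r_est = rotation_from_homography(h, fx, fy, cx, cy)
```

An estimated homography is only defined up to scale and contains noise, so with real data the result would need rescaling and would only be approximately a rotation matrix.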
Well, what you're looking at is, in simple terms, a Pythagorean theorem problem: a^2 + b^2 = c^2. However, when it comes to camera-based applications, things are not very easy to determine accurately. You have found half of the detail you need for "a"; however, finding "b" or "c" is much harder.
The Short Answer
Basically, it can't be done with a single camera. But it can be done with two cameras.
The Long Winded Answer (Thought I'd explain in more depth, no pun intended)
I'll try to explain. Say we select two points within our image and move the camera left. We know the distance of each point from the camera: point B1 is at 20mm and point B2 is at 40mm. Now let's assume that we process the image and our measurements are A1 = (0,2) and A2 = (0,4), which relate to B1 and B2 respectively. Note that A1 and A2 are not real-world measurements; they are pixels of movement.
What we now have to do is multiply the change in A1 and A2 by a calculated constant, which gives the real-world distance moved at B1 and B2. NOTE: each of these constants is different, depending on the distance B*. This all relates to the angle of view, more commonly called the field of view in photography, at different distances. You can accurately calculate the constant if you know the size of each pixel on the camera CCD and the focal length of the lens you have inside the camera.
I would expect this isn't the case, so at different distances you have to place an object whose length you know and see how many pixels it takes up. Close up, you can use a ruler to make things easier. You then take these measurements and form a curve with a line of best fit, where the X-axis is the distance of the object and the Y-axis is the pixel-to-distance ratio that you must multiply your movement by.
So how do we apply this curve? Well, it's guesswork. In theory, the larger the measured movement A*, the closer the object is to the camera. In our example, say the ratios for A1 and A2 are 5mm and 3mm per pixel respectively (A1 > A2); we would then know that point B1 has moved 10mm (2 x 5mm) and B2 has moved 12mm (4 x 3mm). But let's face it: we will never know B, and we will never be able to tell whether a movement of 20 pixels is an object close up not moving far or an object far away moving a much greater distance. This is why things like the Xbox Kinect use additional sensors to get depth information that can be tied to the objects within the image.
What you are attempting could be done with two cameras: since the distance between the cameras is known, the movement can be calculated more accurately (effectively without using a depth sensor). The maths behind this is extremely complex and I would suggest looking up some journal papers on the subject. If you would like me to explain the theory, I can attempt to.
All my experience comes from designing high-speed video acquisition and image processing for my PhD, so trust me, it can't be done with one camera, sorry. I hope some of this helps.
I was going to add a comment but this is easier due to the bulk of information:
Since it is the Kinect, I will assume you have some relevant depth information associated with each point; if not, you will need to figure out how to get this.
The equation you will need to start off with is for the field of view (FOV):
o/d = i/f
f is the focal length of the lens, usually given in mm (e.g. 18, 28, 30 and 50 are standard examples)
d is the object distance from the lens, gathered from the Kinect data
o is the object dimension (or "field of view" perpendicular to and bisected by the optical axis).
i is the image dimension (or "field stop" perpendicular to and bisected by the optical axis).
We need to calculate i first (o is our unknown). i is a diagonal measurement, so we work it out from its horizontal and vertical components.
We will need the size of a pixel on the CCD, which will be in micrometres (µm); you will need to find this information out. For now we will take it as being 14µm, which is standard for a midrange area scan camera.
So first we need to work out the horizontal dimension of i (ih), which is the width of the camera in pixels multiplied by the size of a CCD pixel (we will use 640 x 320):
so: ih = 640*14um = 8960um
= 8960/1000 = 8.96mm
Now we need i vertical dimension (iv) same process but height
so: iv = (320 * 14um) / 1000 = 4.48mm
Now i is found by the Pythagorean theorem, a^2 + b^2 = c^2
so: i = sqrt(ih^2 + iv^2)
= 10.02 mm
Now we will assume we have a 28mm lens. Again, this exact value will have to be found out. So our equation, rearranged to give us o, is:
o = (i * d) / f
Remember o will be diagonal (we will assume our object or point is 50mm away):
o = (10.02mm * 50mm) / 28mm = 17.89mm
Now we need to work out the horizontal dimension of o (oh) and the vertical dimension of o (ov), from which we get the distance per pixel that the object has moved. Since the field of view is proportional to the CCD size, i.e. i is directly proportional to o, we will work out a ratio k:
k = i/o
= 10.02 / 17.89
= 0.56
o horizontal dimension (oh):
oh = ih / k
= 8.96mm / 0.56 = 16mm across the 640-pixel width, i.e. 16mm / 640 = 0.025mm per pixel
o vertical dimension (ov):
ov = iv / k
= 4.48mm / 0.56 = 8mm across the 320-pixel height, i.e. 8mm / 320 = 0.025mm per pixel
Now we have the constants we require, let's use them in an example. If our object at 50mm moves from position (0,0) to (2,4) then the measurements in real life are:
(2*0.025mm , 4*0.025mm) = (0.05mm, 0.1mm)
Again, the Pythagorean theorem: a^2 + b^2 = c^2
Total distance = sqrt(0.05^2 + 0.1^2)
= 0.11mm
Complicated, I know, but once you have this in a program it's easier. For every point you will have to repeat at least half the process, as d will change, and therefore o, for every point you're examining.
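As a rough sketch of what such a program might look like in Python, following the steps above with the same example numbers (14µm pixels, 640 x 320 sensor, 28mm lens, object at 50mm). One assumption is made explicit here: oh and ov are treated as the full width and height of the scene at distance d, so they are divided by the pixel counts to get a size per pixel. The function names are made up for illustration.

```python
import math


def mm_per_pixel(width_px, height_px, pixel_um, focal_mm, distance_mm):
    """Real-world size covered by one pixel at the given distance (x and y)."""
    ih = width_px * pixel_um / 1000.0     # sensor width in mm
    iv = height_px * pixel_um / 1000.0    # sensor height in mm
    i = math.hypot(ih, iv)                # sensor diagonal, sqrt(ih^2 + iv^2)
    o = i * distance_mm / focal_mm        # from o/d = i/f
    k = i / o                             # ratio between image and object size
    oh = ih / k                           # scene width at this distance
    ov = iv / k                           # scene height at this distance
    # Assumption: divide by the pixel counts to get the size of one pixel.
    return oh / width_px, ov / height_px


def real_distance(dx_px, dy_px, scale_x, scale_y):
    """Convert a pixel displacement to a real-world distance in mm."""
    return math.hypot(dx_px * scale_x, dy_px * scale_y)


scale_x, scale_y = mm_per_pixel(640, 320, 14.0, 28.0, 50.0)
moved_mm = real_distance(2, 4, scale_x, scale_y)   # point moved from (0,0) to (2,4)
```

Since d differs per point, mm_per_pixel has to be called again for each point with its own depth value.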
Hope this gets you on your way,