Original Image:
OpenCV Processed Image:
The first image is the original.
The second image is OpenCV's processed result.
I want to achieve the same effect in HALCON.
Can someone give me advice on which method or HALCON operator to use?
According to Wikipedia (Image embossing) this can be achieved by a convolutional filter. In the example you provided it seems that the embossing direction is South-West.
In HALCON you can use the operator convol_image to calculate the correlation between an image and an arbitrary filter mask. The filter would be similar to this:
The embossing filter matrix (as used in the call below):
 1  0  0
 0  0  0
 0  0 -1
To apply such a filter matrix in HDevelop you can use the following line of code:
convol_image (OriginalImage, EmbossedImage, [3, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, -1], 0)
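For comparison, an OpenCV counterpart in Python might look like this (just a sketch; the file names are placeholders, and the kernel values are taken from the convol_image call above):

import cv2
import numpy as np

# Emboss kernel corresponding to the convol_image mask above (norm factor 1)
kernel = np.array([[1, 0, 0],
                   [0, 0, 0],
                   [0, 0, -1]], dtype=np.float32)

original = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
# cv2.filter2D computes a correlation, just like convol_image
embossed = cv2.filter2D(original, -1, kernel)
cv2.imwrite("embossed.png", embossed)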
I am currently working on a project using the draft for compute shaders in WebGL 2.0.
Nevertheless, I do not think my question is WebGL-specific; it is more of a general OpenGL problem. The goal is to build a pyramid of images to be used in a compute shader. Each level should be a square of 2^(n-level) values (down to 1x1) and contain simple integer/float values...
I already have the values needed for this pyramid, stored in separate OpenGL images, each with its corresponding size. My question is: how would you pass my image array to a compute shader? I have no problem restricting the number of images passed to, let's say, 14 or 16... but it needs to be at least 12. If there is a mechanism to store the images in a texture and then use textureLod, that would solve the problem, but I could not get my head around how to do that with images.
Thank you for your help. I am pretty sure there is an obvious way to do this, but I am relatively new to OpenGL in general.
Passing in a "pyramid of images" is a normal thing to do in WebGL. It is called a mipmap.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R32F, 128, 128, 0, gl.RED, gl.FLOAT, some128x128FloatData);
gl.texImage2D(gl.TEXTURE_2D, 1, gl.R32F, 64, 64, 0, gl.RED, gl.FLOAT, some64x64FloatData);
gl.texImage2D(gl.TEXTURE_2D, 2, gl.R32F, 32, 32, 0, gl.RED, gl.FLOAT, some32x32FloatData);
gl.texImage2D(gl.TEXTURE_2D, 3, gl.R32F, 16, 16, 0, gl.RED, gl.FLOAT, some16x16FloatData);
gl.texImage2D(gl.TEXTURE_2D, 4, gl.R32F, 8, 8, 0, gl.RED, gl.FLOAT, some8x8FloatData);
gl.texImage2D(gl.TEXTURE_2D, 5, gl.R32F, 4, 4, 0, gl.RED, gl.FLOAT, some4x4FloatData);
gl.texImage2D(gl.TEXTURE_2D, 6, gl.R32F, 2, 2, 0, gl.RED, gl.FLOAT, some2x2FloatData);
gl.texImage2D(gl.TEXTURE_2D, 7, gl.R32F, 1, 1, 0, gl.RED, gl.FLOAT, some1x1FloatData);
The second argument is the mip level, followed by the internal format, and then the width and height of that mip level.
You can access any individual texel from a shader with texelFetch(someSampler, integerTexelCoord, mipLevel) as in
uniform sampler2D someSampler;
...
ivec2 texelCoord = ivec2(17, 12);
int mipLevel = 2;
float r = texelFetch(someSampler, texelCoord, mipLevel).r;
But you said you need 12, 14, or 16 levels. 16 levels means a base level of 2^(16-1) = 32768, i.e. a 32768x32768 texture. At 4 bytes per R32F texel that is 4 GiB for the base level alone (about 5.3 GiB with the full mip chain), so you'll need a GPU with enough memory and you'll have to pray the browser lets you allocate that much.
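To make the memory estimate concrete, here is the arithmetic as back-of-the-envelope Python (not WebGL code):

bytes_per_texel = 4   # gl.R32F
levels = 16           # base level is 2**(levels - 1) = 32768
total = sum((2 ** (levels - 1 - lvl)) ** 2 * bytes_per_texel for lvl in range(levels))
print(total / 2 ** 30)   # ~5.33 GiB for the whole chain; the base level alone is 4 GiB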
You can save some memory by allocating your texture with texStorage2D and then uploading data with texSubImage2D.
You mentioned using images. If by that you mean <img> tags or Image, well, you can't get float or integer data from those; they are generally 8-bit-per-channel values.
Of course, rather than using mip levels you could also arrange your data into a texture atlas.
What is the difference between :
self.viewBG.transform = CGAffineTransformMake(1, 0, 0, -1, 0, 0);
and
self.viewBG.transform = CGAffineTransformMakeRotation(-M_PI);
I am using a table view in my chat application and I transform the table with the above method, so please help me out on this.
Which one is better, and why?
Thanks!
CGAffineTransformMake allows you to set the individual matrix values of the transform directly, whereas CGAffineTransformMakeRotation takes that work away from you and allows you to ask for a transform that rotates something by the amount you want, without you having to understand how the matrix works. The end result is the same.
The second option is much better - it is obvious what the transform is doing and by how much. Any readers that don't understand how the matrix maths of transforms work or what those individual unnamed parameters mean (which is going to be virtually all readers, myself included) are not going to know what the first line is doing.
Readability always wins.
CGAffineTransformMake lets you set the transform values yourself, while the others are basically convenience methods that do it for you.
For example, CGAffineTransformMakeRotation evaluates to this:
t' = [ a  b  c  d  tx  ty ] = [ cos(angle)  sin(angle)  -sin(angle)  cos(angle)  0  0 ]
Note that the result is not the same. Using CGAffineTransformMakeRotation will result in (rounded values) [-1, 0, 0, -1, 0, 0]. There is also an accuracy aspect: the trigonometric functions do not always evaluate to the exact value you expect. In this case, sin(-M_PI) actually becomes -0.00000000000000012246467991473532 instead of zero, because of the inaccuracy of M_PI, I presume.
Personally, I always use the convenience methods, as the code is much easier to understand. The inaccuracy is usually something you won't notice anyway.
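The rounding effect is easy to reproduce outside of iOS. Here is a quick numerical check in Python; the values correspond to the [a, b, c, d, tx, ty] components shown above:

import math

angle = -math.pi
# Components of a pure rotation transform: [a, b, c, d, tx, ty]
a, b = math.cos(angle), math.sin(angle)
c, d = -math.sin(angle), math.cos(angle)
print([a, b, c, d, 0, 0])
# [-1.0, -1.2246467991473532e-16, 1.2246467991473532e-16, -1.0, 0, 0]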
CGAffineTransformMake : Returns an affine transformation matrix constructed from values you provide.
CGAffineTransformMakeRotation : Returns an affine transformation matrix constructed from a rotation value you provide.
Source URL : http://mirror.informatimago.com/next/developer.apple.com/documentation/GraphicsImaging/Reference/CGAffineTransform/Reference/function_group_1.html
I have a 1-D numpy array v. I'd like to copy it to make a matrix with each row being a copy of v. That's easy: np.broadcast_to(v, desired_shape).
However, if I'd like to treat v as a column vector, and copy it to make a matrix with each column being a copy of v, I can't find a simple way to do it. Through trial and error, I'm able to do this:
np.broadcast_to(v.reshape(v.shape[0], 1), desired_shape)
While that works, I can't claim to understand it (even though I wrote it!).
Part of the problem is that numpy doesn't seem to have a concept of a column vector (hence the reshape hack instead of what in math would just be .T).
But, a deeper part of the problem seems to be that broadcasting only works vertically, not horizontally. Or perhaps a more correct way to say it would be: broadcasting only works on the higher dimensions, not the lower dimensions. I'm not even sure if that's correct.
In short, while I understand the concept of broadcasting in general, when I try to use it for particular applications, like copying the col vector to make a matrix, I get lost.
Can you help me understand or improve the readability of this code?
https://en.wikipedia.org/wiki/Transpose - this article on Transpose talks only of transposing a matrix.
https://en.wikipedia.org/wiki/Row_and_column_vectors -
a column vector or column matrix is an m × 1 matrix
a row vector or row matrix is a 1 × m matrix
You can easily create row or column vectors (matrices):
In [464]: np.array([[1],[2],[3]]) # column vector
Out[464]:
array([[1],
[2],
[3]])
In [465]: _.shape
Out[465]: (3, 1)
In [466]: np.array([[1,2,3]]) # row vector
Out[466]: array([[1, 2, 3]])
In [467]: _.shape
Out[467]: (1, 3)
But in numpy the basic structure is an array, not a vector or matrix.
[Array in Computer Science] - Generally, a collection of data items that can be selected by indices computed at run-time
A numpy array can have 0 or more dimensions. In contrast, a MATLAB matrix has 2 or more dimensions; originally, a 2d matrix was all that MATLAB had.
To talk meaningfully about a transpose you have to have at least 2 dimensions. One of them may have size one, and map onto a 1d vector, but it is still a matrix, a 2d object.
So adding a dimension to a 1d array, whether done with reshape or [:,None], is NOT a hack. It is a perfectly valid and normal numpy operation.
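For example, both spellings produce the same (3, 1) column matrix:

import numpy as np

v = np.arange(3)
print(v[:, None].shape)        # (3, 1)
print(v.reshape(-1, 1).shape)  # (3, 1)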
The basic broadcasting rules are:
a dimension of size 1 can be changed to match the corresponding dimension of the other array
a dimension of size 1 can be added automatically on the left (front) to match the number of dimensions.
In this example, both steps apply: (5,)=>(1,5)=>(3,5)
In [458]: np.broadcast_to(np.arange(5), (3,5))
Out[458]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In this, we have to explicitly add the size one dimension on the right (end):
In [459]: np.broadcast_to(np.arange(5)[:,None], (5,3))
Out[459]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
np.broadcast_arrays(np.arange(5)[:,None], np.arange(3)) produces two (5,3) arrays.
np.broadcast_arrays(np.arange(5), np.arange(3)[:,None]) makes (3,5).
np.broadcast_arrays(np.arange(5), np.arange(3)) produces an error because it has no way of determining whether you want (5,3) or (3,5) or something else.
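A quick check of the resulting shapes:

import numpy as np

a, b = np.broadcast_arrays(np.arange(5)[:, None], np.arange(3))
print(a.shape, b.shape)   # (5, 3) (5, 3)

c, d = np.broadcast_arrays(np.arange(5), np.arange(3)[:, None])
print(c.shape, d.shape)   # (3, 5) (3, 5)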
Broadcasting always adds new dimensions to the left because it'd be ambiguous and bug-prone to try to guess when you want new dimensions on the right. You can make a function to broadcast to the right by reversing the axes, broadcasting, and reversing back:
def broadcast_rightward(arr, shape):
return np.broadcast_to(arr.T, shape[::-1]).T
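Applied to the array used earlier, this reproduces the (5, 3) result shown above:

broadcast_rightward(np.arange(5), (5, 3))
# array([[0, 0, 0],
#        [1, 1, 1],
#        [2, 2, 2],
#        [3, 3, 3],
#        [4, 4, 4]])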
I'm really new to TensorFlow and to machine learning in general. I've been reading a lot and searching for days, but haven't really found anything useful, so..
My main problem is that every single tutorial/example uses images/words/etc., and the outcome is just a vector with numbers between 0 and 1 (yeah, that's the probability). Like that beginner tutorial where they want to identify numbers in images: the result set is just a "map" of every single digit's (0-9) probability (kind of). Here comes my problem: I have no idea what the result could be; it could be 1, 2, 999, anything.
So basically:
input: [1, 2, 3, 4, 5]
output: [2, 4, 6, 8, 10]
This really is just a simplified example. I would always have the same number of inputs and outputs, but how can I get the predicted values back with TensorFlow, not just something like [0.1, 0.1, 0.2, 0.2, 0.4]? I'm not really sure how clear my question is; if it's not, please ask.
I've read this question for which the accepted answer only mentions square odd-sized filters (1x1, 3x3), and I am intrigued about how tf.nn.conv2d() behaves when using a square even-sized filter (e.g. 2x2) given that none of its elements can be considered its centre.
If padding='VALID' then I assume that tf.nn.conv2d() will stride across the input in the same way as it would if the filter were odd-sized.
However, if padding='SAME' how does tf.nn.conv2d() choose to centre the even-sized filter over the input?
See the description here:
https://www.tensorflow.org/versions/r0.9/api_docs/python/nn.html#convolution
For VALID padding, you're exactly right. You simply walk the filter over the input without any padding, moving the filter by the stride each time.
For SAME padding, you do the same as VALID padding, but conceptually you pad the input with some zeros before and after each dimension before computing the convolution. If an odd number of padding elements must be added, the right/bottom side gets the extra element.
Use the pad_... formulae in the link above to work out how much padding to add. For simplicity, let's consider a 1D convolution as an example. For an input of size 7, a window of size 2, and a stride of 1, you would add 1 element of padding on the right and 0 elements of padding on the left.
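Worked out with those formulae in plain Python (the variable names are mine):

import math

in_size, filter_size, stride = 7, 2, 1
out_size = math.ceil(in_size / stride)
pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
pad_left = pad_total // 2             # the extra element goes to the right/bottom
pad_right = pad_total - pad_left
print(out_size, pad_left, pad_right)  # 7 0 1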
Hope that helps!
When you do the filtering in the conv layer with an averaging filter, you are simply taking the mean of each patch.
Take a 1D input of size 5, [ 1, 2, 3, 4, 5 ], a filter of size 2, and a stride of 1; use a weight matrix of [1, 1] and take the mean of each inner product (i.e. average each patch).
With VALID padding you get an output of size 4: [ (1+2)/2, (2+3)/2, (3+4)/2, (4+5)/2 ],
which is [ 1.5, 2.5, 3.5, 4.5 ].
With SAME padding and stride 1 you get an output of size 5: [ (1+2)/2, (2+3)/2, (3+4)/2, (4+5)/2, (5+0)/2 ], where the final 0 is the zero padding,
which is [ 1.5, 2.5, 3.5, 4.5, 2.5 ].
Likewise, if you have a 224x224 input and do 2x2 filtering with VALID padding, the output will be 223x223.
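To make the arithmetic concrete, here is a small numerical check (a sketch using TensorFlow 2's eager API rather than the r0.9 API referenced above):

import numpy as np
import tensorflow as tf

# 1D input [1..5] packed as an NHWC tensor of shape [1, 1, 5, 1]
x = tf.constant(np.arange(1, 6, dtype=np.float32).reshape(1, 1, 5, 1))
# 1x2 averaging filter, shape [filter_height, filter_width, in_channels, out_channels]
w = tf.constant(np.array([0.5, 0.5], dtype=np.float32).reshape(1, 2, 1, 1))

valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
print(valid.numpy().ravel())  # [1.5 2.5 3.5 4.5]
print(same.numpy().ravel())   # [1.5 2.5 3.5 4.5 2.5]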