Let's say we have layer0 with opacity 100%, and layer1 with opacity 50%, layer1 merged down with layer0 as new file.
Now we have this new file and the original layer0, any chance to work out the original layer1?
Thanks for any help.
The fact that layer0 has 100% opacity is not that meaningful (if it were less than 100%, what would we see through it?). An opacity of 50% usually means we see a weighted blend: 50% of this layer and 50% of the layer beneath it; at 70%, it's 70% of this layer and 30% of the one beneath.
For every pixel, new expressed in terms of layer0 and layer1:
new = layer0*(1-0.5) + layer1*0.5
rearranging terms:
layer1 = (new - layer0*(1-0.5)) / 0.5 = 2*new - layer0
Because of rounding (pixel values are usually stored in the range 0-255), you might not get the exact original layer back; every pixel could be off by 1.
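For illustration, here is a minimal NumPy sketch of that recovery, assuming 8-bit channels and a plain normal-mode merge (the function and variable names are just for the example):

import numpy as np

def recover_layer1(merged, layer0, opacity=0.5):
    # Invert a normal-mode merge: merged = layer0*(1-opacity) + layer1*opacity
    merged = merged.astype(np.float64)
    layer0 = layer0.astype(np.float64)
    layer1 = (merged - layer0 * (1.0 - opacity)) / opacity
    # Rounding in the original merge means recovered values can be off by about 1.
    return np.clip(np.round(layer1), 0, 255).astype(np.uint8)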
There are multiple pages (like this and this) that present examples of the effect of channel_shift_range on images. At first glance, it looks as if the images have only had a change in brightness applied.
This issue has multiple comments mentioning the same observation. So, if channel_shift_range and brightness_range do the same thing, why do they both exist?
After long hours of reverse engineering, I found that:
channel_shift_range: applies the (R + i, G + i, B + i) operation to all pixels in an image, where i is an integer value within the range [0, 255].
brightness_range: applies the (R * f, G * f, B * f) operation to all pixels in an image, where f is a float value around 1.0.
Both parameters are related to brightness; however, I found a very interesting difference: the operation applied by channel_shift_range roughly preserves the contrast of an image, while the operation applied by brightness_range roughly multiplies the contrast of an image by f and roughly preserves its saturation. Note that these conclusions no longer hold for large values of i and f, since the image becomes so bright that much of its information is lost.
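For reference, a rough NumPy sketch of the two operations as described above (an illustration of the idea, not Keras's actual implementation; img is assumed to be a uint8 RGB array):

import numpy as np

def channel_shift(img, i):
    # Additive shift: (R + i, G + i, B + i), clipped to the valid range.
    return np.clip(img.astype(np.int32) + i, 0, 255).astype(np.uint8)

def brightness(img, f):
    # Multiplicative scaling: (R * f, G * f, B * f), clipped to the valid range.
    return np.clip(img.astype(np.float64) * f, 0, 255).astype(np.uint8)

Adding a constant leaves pixel-to-pixel differences untouched until clipping kicks in, while multiplying by f scales those differences by f, which matches the contrast observation above.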
Channel shift and Brightness change are completely different.
Channel Shift: Channel shift changes the color saturation level (e.g. light red/dark red) of pixels by shifting the [R, G, B] channels of the input image. It is used to introduce color augmentation into the dataset so that the model learns color-based features irrespective of their saturation.
Below is the example of channel shift from the mentioned article:
In the above image, if you observe carefully, objects (especially the cloud region) are still clearly visible and distinguishable from their neighboring regions even after channel shift augmentation.
Brightness change: The brightness level describes the light intensity throughout the image, and brightness augmentation is used to add under-exposure and over-exposure examples to the dataset.
Below is the example of Brightness augmentation:
In the above image, at low brightness values objects (e.g. clouds) lose their visibility due to the low light intensity.
I am trying to detect very small objects (~25x25 pixels) in large images (~2040x1536 pixels) using the Faster R-CNN model from the object detection API here: https://github.com/tensorflow/models/tree/master/research/object_detection
I am very confused about the following configuration parameters (I have read the proto file and also tried modifying them and testing):
first_stage_anchor_generator {
  grid_anchor_generator {
    scales: [0.25, 0.5, 1.0, 2.0]
    aspect_ratios: [0.5, 1.0, 2.0]
    height_stride: 16
    width_stride: 16
  }
}
I am very new to this area; if someone could explain a bit about these parameters to me, it would be much appreciated.
My question is: how should I adjust the above (or other) parameters to account for the fact that I have very small, fixed-size objects to detect in large images?
Thanks
I don't know the actual answer, but I suspect that the way Faster R-CNN works in the TensorFlow Object Detection API is as follows:
this article says:
"Anchors play an important role in Faster R-CNN. An anchor is a box. In the default configuration of Faster R-CNN, there are 9 anchors at a position of an image. The following graph shows 9 anchors at the position (320, 320) of an image with size (600, 800)."
and the author gives an image showing overlapping boxes. Those are the candidate regions that might contain the object, based on the "CNN" part of the "R-CNN" model; next comes the "R" part, which is the region proposal. For that, there is another neural network that is trained alongside the CNN to figure out the best-fitting box. Based on all the boxes, there are a lot of "proposals" for where an object could be, but we still don't know exactly where it is.
This "region proposal" network's job is to find the correct region, and it is trained on the labels you provide, i.e. the coordinates of each object in the image.
Looking at this file, I noticed:
line 174: heights = scales / ratio_sqrts * base_anchor_size[0]
line 175: widths = scales * ratio_sqrts * base_anchor_size[1]
which seems to be the end result of the configuration values found in the config file (a list of sliding windows with known widths and heights), while base_anchor_size defaults to [256, 256]. In the comments, the author of the code wrote:
"For example, setting scales=[.1, .2, .2]
and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale
.1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2
and aspect ratio 1/2. Each box is multiplied by "base_anchor_size" before
placing it over its respective center."
which gives insight into how these boxes are created: the code builds a list of boxes from the scales = [...] and aspect_ratios = [...] parameters, and these boxes are slid over the image. The scale is fairly straightforward: it is how much the default 256-by-256 square box should be scaled before it is used. The aspect ratio is what turns the original square box into a rectangle that is closer to the (scaled) shape of the objects you expect to encounter.
Meaning, to configure the scales and aspect ratios optimally, you should find the "typical" sizes of the objects in the image, whatever they are (e.g. 20 by 30, 5 by 10, etc.), figure out how much the default 256-by-256 square box should be scaled to fit them, then find the "typical" aspect ratios of your objects (an aspect ratio is the ratio of the width to the height) and set those as your aspect ratio parameters.
Note: it seems that the number of elements in the scales and aspect_ratios lists in the config file should be the same, but I don't know for sure.
Also, I am not sure how to find the optimal stride, but if your objects are smaller than 16 by 16 pixels, the sliding window you created by setting the scales and aspect ratios might skip your objects altogether.
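To make the two quoted lines concrete, here is a small NumPy sketch of the height/width computation for a single (scale, aspect ratio) pair, assuming the default base_anchor_size of [256, 256]:

import numpy as np

def anchor_size(scale, aspect_ratio, base_anchor_size=(256, 256)):
    # Mirrors the two quoted lines for one (scale, aspect_ratio) pair.
    ratio_sqrt = np.sqrt(aspect_ratio)
    height = scale / ratio_sqrt * base_anchor_size[0]
    width = scale * ratio_sqrt * base_anchor_size[1]
    return height, width

# With the default 256x256 base anchor, a scale of 0.1 and aspect ratio 1.0
# gives a ~26x26 anchor, roughly the 25x25 px objects from the question.
print(anchor_size(0.1, 1.0))   # (25.6, 25.6)
print(anchor_size(0.25, 0.5))  # (~90.5, ~45.3)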
As far as I know, proposal anchors are generated only for Faster R-CNN model types. In this file you can see which parameters may be set for anchor generation through the config lines you mentioned.
I tried setting base_anchor_size, but I failed. However, this FasterRCNNTutorial mentions that:
[...] you also need to configure the anchor sizes and aspect ratios in the .config file. The base anchor size is 255,255.
The anchor ratios will multiply the x dimension and divide the y dimension, so if you have an aspect ratio of 0.5 your 255x255 anchor becomes 128x510. Each aspect ratio in the list is applied, then the results are multiplied by the scales. So the first step is to resize your images to the training/testing size, then manually check what the smallest and largest objects you expect are, and what the most extreme aspect ratios will be. Set up the config file with values that will cover these cases when the base anchor size is adjusted by the aspect ratios and multiplied by the scales.
I think it's pretty straightforward. I also used this 'workaround'.
I have a dataset of images which are half black in an upper-triangular fashion, i.e. all pixels below the main diagonal are black.
Is there a way in Tensorflow to give such an image to a conv2d layer and mask or limit the convolution to only the relevant pixels?
If the black translates to 0, then you don't need to do anything: the convolution will multiply the 0 by whatever weight it has, so it won't contribute to the result. If it doesn't, you can multiply the data by a binary mask to make those pixels 0.
For the all-black pixels you will still get the bias term, if you have one.
You could multiply the result by a binary mask to zero out the areas you don't want populated. This way you can also decide to drop outputs that cover too many black cells, like those around the diagonal.
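A minimal TensorFlow sketch of that masking idea (the shapes and the way the mask is built are illustrative assumptions):

import tensorflow as tf

# Illustrative batch of 128x128 single-channel images.
images = tf.random.normal([8, 128, 128, 1])

# Upper-triangular mask: 1 where pixels are valid, 0 below the main diagonal.
mask = tf.linalg.band_part(tf.ones([128, 128]), 0, -1)
mask = tf.reshape(mask, [1, 128, 128, 1])

filters = tf.random.normal([3, 3, 1, 16])
conv = tf.nn.conv2d(images * mask, filters, strides=1, padding="SAME")

# Zero out the outputs in the region you don't want populated (this is also
# where a bias term, if your conv layer had one, would otherwise leak through).
conv = conv * mask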
You can also write your own custom operation that does what you want, but I would recommend against it: you would get a speedup of at most 2x (and the other operations will lower that). You would probably gain more performance by running on a GPU.
Given a volume [2, 2W, C], after applying pooling with a 2x2 window and stride 2, I'm left with [1, W, C] (height = 1 px, width = half of what it was before, channels = unchanged).
What I want to do now is apply a convolution op with the sole purpose of reducing that width dimension. Is this even possible?
Yes this is possible (though because it's unusual, the solution is a bit hackish).
Conceptually, there's no issue here. This is frequently done in the depth/channel dimension rather than width, where people usually call it a 1x1 convolution. Again the sole purpose is dimensionality reduction. A nice blog post about it is http://iamaaditya.github.io/2016/03/one-by-one-convolution/ (to be clear, I am not the author of that blog). That is, a typical 1x1 conv layer is really a bank of D2 filters of size 1x1xD, and dimensionality reduction is achieved by D2 < D. Here you want the same thing but in width: 1xWx1 filter size, W2 times. Conceptually then, that's it; it should be easy.
Practically of course, this is not so easy, as CNN convention treats width and depth differently: one convolves over width, but filters always operate on the full depth stack, making a 1x1 convolution easy in depth but tricky in width. You have at least two options in TensorFlow:
Use a full width filter with no zero padding
tf.nn.conv2d(input, filter, strides, padding="VALID", ...)
such that filter_width = W (as in [filter_height, filter_width, in_channels, out_channels]). You then make several of these, which gets you the output information you want. Pro: this considers the full width of the stack, so it serves as dimensionality reduction in the same sense as a typical (depth) 1x1 convolution. Con: this moves your width information into the depth stack (you get a width of 1 for each filter, so your "reduced" dimension ends up in the depth, not the width). That's almost certainly not desirable. You could tf.reshape your way out of it, but yuck.
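A sketch of this first option, assuming an input of shape [N, 1, W, C] as in the question and illustrative sizes:

import tensorflow as tf

N, H, W, C = 4, 1, 64, 32
x = tf.random.normal([N, H, W, C])

# Filters spanning the full width: [filter_height, filter_width, in_channels, out_channels].
# With padding="VALID" the output width collapses to 1, so the W2 "reduced"
# values end up in the channel dimension, as described above.
W2 = 16
filters = tf.random.normal([1, W, C, W2])
y = tf.nn.conv2d(x, filters, strides=[1, 1, 1, 1], padding="VALID")
print(y.shape)  # (4, 1, 1, 16) -- the reduction lives in the last axis

# If you want the reduced dimension back in the width axis, you can
# tf.reshape, as noted above (but yuck).
y = tf.reshape(y, [N, H, W2, 1])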
Use strides to sort of accomplish this
tf.nn.conv2d(input, filter, [1, 1, 2, 1], padding="VALID", ...)
where strides is specified as [1, 1, 2, 1] and you give a filter with filter_width = 2. This reduces your width dimension by a factor of 2 (or 3, or any other factor that divides your width evenly), using a stride that matches your filter width (and, critically, zero padding that is effectively absent). Pro: this is clean and produces the data sizes you want without the reshaping annoyance above. Con: this isn't a 1x1 convolution / dimensionality reduction in the usual sense. It reduces the dimension pairwise (every two adjacent columns become one) rather than mixing all of them together. That is not a great dimensionality reduction method, so you might lose a lot of signal. You should probably try this one anyway because it's much cleaner, but be forewarned about that issue.
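And a sketch of the second option, with the same assumed input:

import tensorflow as tf

N, H, W, C = 4, 1, 64, 32
x = tf.random.normal([N, H, W, C])

# filter_width = 2 with a width stride of 2: every pair of adjacent columns
# is mixed into one output column, halving the width and keeping the channels.
filters = tf.random.normal([1, 2, C, C])
y = tf.nn.conv2d(x, filters, strides=[1, 1, 2, 1], padding="VALID")
print(y.shape)  # (4, 1, 32, 32)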
I want to draw tiled images and then transform them with the usual panning and zooming gestures. The problem that brings me here is that, whenever the scaling transformation has many decimal places, a thin line of pixels (1 or 2 wide) appears in the middle of the tiles. I managed to isolate the problem like this:
CGContextSaveGState(UIGraphicsGetCurrentContext());
CGContextSetFillColor(UIGraphicsGetCurrentContext(), CGColorGetComponents([UIColor redColor].CGColor));
CGContextFillRect(UIGraphicsGetCurrentContext(), rect);//rect from drawRect:
float scale = 0.7;
CGContextScaleCTM(UIGraphicsGetCurrentContext(), scale, scale);
CGContextDrawImage(UIGraphicsGetCurrentContext(), CGRectMake(50, 50, 100, 100), testImage);
CGContextDrawImage(UIGraphicsGetCurrentContext(), CGRectMake(150, 50, 100, 100), testImage);
CGContextRestoreGState(UIGraphicsGetCurrentContext());
With a 0.7 scale, the two images appear correctly tiled:
With a 0.777777 scale (changing the scale line to float scale = 0.777777;), the visual artifact appears:
Is there any way to avoid this problem? This happens with CGImage, CGLayer and primitive forms such as a rectangle. It also happens on MacOSx.
Thanks for the help!
Edit: this also happens with primitive forms, like CGContextFillRect.
Edit 2: it also happens on Mac OS X!
Quartz has a floating point coordinate system, so scaling may result in values that are not on pixel boundaries, resulting in visible antialiasing at the edges. If you don't want that, you have two options:
Adjust your scale factor so that all your scaled coordinates are integral. This may not always be possible, especially if you're drawing lots of things.
Disable anti-aliasing for your graphics context using CGContextSetShouldAntialias(UIGraphicsGetCurrentContext(), false);. This will result in crisp pixel boundaries, but anything but straight lines might not look very good.
When all is said and done, iOS is dealing with discrete pixels on integer boundaries. When your frames are scaled by 0.7, the 50 becomes 35, right on a pixel boundary. At 0.777777 it does not, so iOS adapts by moving/shrinking/blending as needed.
You really have two choices. If you want to scale the context, then round the desired scale up or down so that it produces integral scaled frame values (your code shows 50 as the value being multiplied).
Otherwise, don't scale the context; scale the content piece by piece instead, and use CGRectIntegral to round all dimensions up or down as needed.
EDIT: If my suspicion is right, there is yet another option for you. Let's say you want a scale factor of 0.777777 and a frame of (50, 50, 100, 100). Take the 50, multiply it by the scale, then round the result up or down. Then recompute the frame value by dividing that rounded result by 0.777777, giving a fractional value that comes back to an integer when scaled by 0.777777. Quartz is really good at figuring out that you mean an integral value, so small rounding errors are ignored. I'd bet anything this will work just fine for you.
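To make that arithmetic concrete, a quick sketch of the numbers (shown in Python for brevity; the 0.777777 scale and the 50-point origin come from the question):

scale = 0.777777
x = 50.0

print(x * scale)              # 38.88885 -> not on a pixel boundary, hence the artifact

# Round the scaled coordinate to a whole pixel, then work backwards to the
# fractional source coordinate that lands exactly on that boundary.
snapped = round(x * scale)    # 39
x_adjusted = snapped / scale  # 50.1428...
print(x_adjusted * scale)     # 39.0 (within floating-point error) -> pixel-aligned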