Search for images in Flickr with size not smaller than x by y pixels - api

I have read the Flickr API documentation and didn't find any dimension criteria for search, so I suppose it isn't supported. There is, however, a getSizes method that lets me get the available sizes of a specific image by its ID.
Assume I want to find images that are not smaller than 1920x1200 pixels. What is the best way to do this?

Try this (found in the address bar while searching on Flickr):
//...
'dimension_search_mode': 'min',
'height': 1024,
'width': 1024
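If those parameters are not honoured by the public API (they are not in the documented parameter list), a fallback is to search normally and filter client-side with flickr.photos.getSizes, roughly like this (YOUR_FLICKR_API_KEY is a placeholder):

import requests

API_KEY = "YOUR_FLICKR_API_KEY"  # placeholder, use your own key
REST = "https://api.flickr.com/services/rest/"

def search_photos(text, per_page=20):
    params = {
        "method": "flickr.photos.search",
        "api_key": API_KEY,
        "text": text,
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return requests.get(REST, params=params).json()["photos"]["photo"]

def largest_size(photo_id):
    params = {
        "method": "flickr.photos.getSizes",
        "api_key": API_KEY,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": 1,
    }
    sizes = requests.get(REST, params=params).json()["sizes"]["size"]
    # pick the biggest available version of this photo
    return max(sizes, key=lambda s: int(s["width"]) * int(s["height"]))

min_w, min_h = 1920, 1200
for photo in search_photos("landscape"):
    size = largest_size(photo["id"])
    if int(size["width"]) >= min_w and int(size["height"]) >= min_h:
        print(photo["id"], size["width"], size["height"], size["source"])

This costs one extra getSizes call per result, so it is best combined with other search filters to keep the result set small.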

Related

How to exclude pixels above a certain threshold?

I am a beginner in image processing.
Currently, I’m developing an image analysis tool.
One of its features is calculating Pearson's correlation coefficient between two images after excluding their oversaturated pixels.
The specific method I’m using is to set all pixel intensities above a certain threshold to 0.
For example:
The intensities of image1:
[0,0,0,50,180,210,80,60,45,18,25,250,247,150]
The intensities of image 2:
[0,0,60,0,0,240,96,70,65,8,55,232,238,110]
Let's say the custom thresholds are 170 and 220, respectively.
After excluding all oversaturated pixels:
The intensities of image1:
[0,0,0,50,0,0,80,60,45,18,25,0,0,150]
The intensities of image 2:
[0,0,60,0,0,0,96,70,65,8,55,0,0,110]
Then calculate Pearson’s correlation coefficient based on them.
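In NumPy/SciPy terms, the step I have in mind would look something like this (using the example values above):

import numpy as np
from scipy.stats import pearsonr

img1 = np.array([0, 0, 0, 50, 180, 210, 80, 60, 45, 18, 25, 250, 247, 150])
img2 = np.array([0, 0, 60, 0, 0, 240, 96, 70, 65, 8, 55, 232, 238, 110])

# set every pixel above each image's own threshold to 0
t1, t2 = 170, 220
img1_clean = np.where(img1 > t1, 0, img1)
img2_clean = np.where(img2 > t2, 0, img2)

r, p = pearsonr(img1_clean, img2_clean)
print(r, p)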
Here are my questions:
Is the above approach feasible? I saw a similar practice described here: https://forum.image.sc/t/how-to-remove-bright-aspecific-speckles/50672/2, and I just want to make sure.
What exactly are oversaturated pixels? Is this generally a subjective judgment? As far as I know, the maximum intensity value of an 8-bit image is 255 and the maximum intensity value of a 16-bit image is 65535. Can oversaturated pixels have intensities that exceed these values?
Are there any experts in this field who can answer my question? Thank you very much!

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract

image_path = 'captcha.png'  # path to one of the captcha samples

# binarize, then open to knock out small noise dots
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)

# repeated median blurs followed by a Gaussian blur to smooth what is left
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)

cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there any better transformation?
Final Update:
As #Neil suggested, I tried to remove the noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
And here are the new resulting images:
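A rough sketch of that kind of filter (not the exact code I used; the 20-pixel minimum area is just a starting value to tune):

import cv2
import numpy as np

img = cv2.imread('captcha.png', cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)  # text as white blobs

# label every connected group of white pixels; label 0 is the background
n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(bw, connectivity=8)

min_area = 20  # components smaller than this are treated as noise dots
clean = np.zeros_like(bw)
for label in range(1, n_labels):
    if stats[label, cv2.CC_STAT_AREA] >= min_area:
        clean[labels == label] = 255

cv2.imwrite('res.png', cv2.bitwise_not(clean))  # back to dark text on white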
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted (sketched in code after the list):
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using the minesweeper (flood-fill) algorithm, which is easy to search for.
Set some threshold value of pixels that all legitimate letters are expected to have. This will be dependent on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
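A rough sketch of that strategy (not my actual code; the 30-pixel threshold is only illustrative and depends on your resolution):

from collections import deque
from PIL import Image

img = Image.open('page.png').convert('L')
w, h = img.size
px = img.load()

# steps 1-2: binarize, True where the pixel is "ink"
ink = [[px[x, y] < 128 for x in range(w)] for y in range(h)]

# step 3: flood fill to collect each group of connected ink pixels
seen = [[False] * w for _ in range(h)]
min_pixels = 30  # step 4: groups smaller than this are splotches, not letters

for sy in range(h):
    for sx in range(w):
        if not ink[sy][sx] or seen[sy][sx]:
            continue
        group, queue = [], deque([(sx, sy)])
        seen[sy][sx] = True
        while queue:
            x, y = queue.popleft()
            group.append((x, y))
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < w and 0 <= ny < h and ink[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((nx, ny))
        # step 5: erase small groups by painting them white
        if len(group) < min_pixels:
            for x, y in group:
                px[x, y] = 255

img.save('cleaned.png')  # step 6: write the result back out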
Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there are a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
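For example, an unsharp-masking pass with OpenCV (the weights here are just a common starting point, not tuned for these captchas):

import cv2

img = cv2.imread('res.png', cv2.IMREAD_GRAYSCALE)

# unsharp masking: add back the difference between the image and a blurred copy
blurred = cv2.GaussianBlur(img, (5, 5), 0)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

cv2.imwrite('sharpened.png', sharpened)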
Rather than chaining blurs, blur once using either a Gaussian or a median blur and experiment with the parameters to get the blur amount you need. Perhaps try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects the characters. Save several images, apply the filter, and train an SVM; that may help you. I trained an algorithm with only a few images and the results were acceptable. Check this link.
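I can't vouch for that exact example, but with OpenCV's built-in SVM the general shape is roughly this (the training crops below are random placeholders; in practice they would be your labelled character images):

import cv2
import numpy as np

# placeholder training data: in practice these would be labelled 20x20
# character crops cut out of your captcha images
rng = np.random.default_rng(0)
char_images = [rng.integers(0, 256, (20, 20), dtype=np.uint8) for _ in range(40)]
char_labels = [i % 4 for i in range(40)]  # four pretend character classes

def to_features(images):
    # flatten each crop into one row of float32 features
    return np.array([img.flatten() for img in images], dtype=np.float32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(to_features(char_images), cv2.ml.ROW_SAMPLE,
          np.array(char_labels, dtype=np.int32))

# predict the class of one crop
_, result = svm.predict(to_features([char_images[0]]))
print(int(result[0][0]))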
Wish you luck
I know the post is a bit old, but I suggest you try this library I developed some time ago. If you have a set of labelled captchas, that service may suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section "Train model on external data" that you might be interested in.

Tuning first_stage_anchor_generator in faster rcnn model

I am trying to detect some very small objects (~25x25 pixels) in large images (~2040x1536 pixels) using the Faster R-CNN model from the Object Detection API here: https://github.com/tensorflow/models/tree/master/research/object_detection
I am very confused about the following configuration parameters (I have read the proto file and also tried modifying them and testing):
first_stage_anchor_generator {
  grid_anchor_generator {
    scales: [0.25, 0.5, 1.0, 2.0]
    aspect_ratios: [0.5, 1.0, 2.0]
    height_stride: 16
    width_stride: 16
  }
}
I am quite new to this area; if someone could explain a bit about these parameters to me, it would be much appreciated.
My question is: how should I adjust the above (or other) parameters to accommodate the fact that I have very small, fixed-size objects to detect in a large image?
Thanks
I don't know the actual answer, but I suspect that the way Faster RCNN works in Tensorflow object detection is as follows:
this article says:
"Anchors play an important role in Faster R-CNN. An anchor is a box. In the default configuration of Faster R-CNN, there are 9 anchors at a position of an image. The following graph shows 9 anchors at the position (320, 320) of an image with size (600, 800)."
and the author gives an image showing an overlapping of boxes; those are the proposed regions that contain the object based on the "CNN" part of the "RCNN" model. Next comes the "R" part of the "RCNN" model, which is the region proposal. To do that, there is another neural network that is trained alongside the CNN to figure out the best-fit box. There are a lot of "proposals" for where an object could be, based on all the boxes, but we still don't know where it is.
This "region proposal" neural net's job is to find the correct region, and it is trained based on the labels you provide with the coordinates of each object in the image.
Looking at this file, I noticed:
line 174: heights = scales / ratio_sqrts * base_anchor_size[0]
line 175: widths = scales * ratio_sqrts * base_anchor_size[1]
which seems to be the final goal of the configuration found in the config file (to generate a list of sliding windows with known widths and heights), while base_anchor_size defaults to [256, 256]. In the comments, the author of the code wrote:
"For example, setting scales=[.1, .2, .2]
and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale
.1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2
and aspect ratio 1/2. Each box is multiplied by "base_anchor_size" before
placing it over its respective center."
This gives insight into how the boxes are created: the code seems to build a list of boxes based on the scales = [stuff] and aspect_ratios = [stuff] parameters, and that list is then used to slide over the image. The scale is fairly straightforward: it is how much the default 256 by 256 square box should be scaled before it is used. The aspect ratio is what turns the original square box into a rectangle that is closer to the (scaled) shape of the objects you expect to encounter.
This means that, to optimally configure the scales and aspect ratios, you should find the "typical" sizes of the objects in the image, whatever they are (e.g. 20 by 30, 5 by 10, etc.), and figure out how much the default 256 by 256 square box should be scaled to optimally fit them. Then find the "typical" aspect ratios of your objects (according to Google, an aspect ratio is the ratio of the width to the height of an image or screen) and set those as your aspect ratio parameters.
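As a rough numeric illustration of those two lines (assuming, as the grid anchor generator appears to do, that every scale is combined with every aspect ratio, and using the default base_anchor_size of [256, 256]):

import numpy as np

scales = np.array([0.25, 0.5, 1.0, 2.0])
aspect_ratios = np.array([0.5, 1.0, 2.0])
base_anchor_size = [256.0, 256.0]

# one anchor box per (scale, aspect ratio) combination
scales_grid, ratios_grid = np.meshgrid(scales, aspect_ratios)
ratio_sqrts = np.sqrt(ratios_grid)

heights = scales_grid / ratio_sqrts * base_anchor_size[0]
widths = scales_grid * ratio_sqrts * base_anchor_size[1]

for h, w in zip(heights.ravel(), widths.ravel()):
    print(f"{h:6.1f} x {w:6.1f}")

With these defaults the smallest square anchor is 64x64 (0.25 x 256), so for ~25-pixel objects it would probably be worth adding a smaller scale such as 0.1 (0.1 x 256 ≈ 26).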
Note: it seems that the number of elements in the scales and aspect_ratios lists in the config file should be the same but I don't know for sure.
Also, I am not sure how to find the optimal stride, but if your objects are smaller than 16 by 16 pixels, the sliding window you created by setting the scales and aspect ratios might just skip your object altogether.
As far as I know, proposal anchors are generated only for Faster R-CNN model types. In this file you can see which parameters may be set for anchor generation within the lines you mentioned from the config.
I tried setting base_anchor_size, but I failed. However, this FasterRCNNTutorial mentions that:
[...] you also need to configure the anchor sizes and aspect ratios in the .config file. The base anchor size is 255,255.
The anchor ratios will multiply the x dimension and divide the y dimension, so if you have an aspect ratio of 0.5 your 255x255 anchor becomes 128x510. Each aspect ratio in the list is applied, then the results are multiplied by the scales. So the first step is to resize your images to the training/testing size, then manually check what the smallest and largest objects you expect are, and what the most extreme aspect ratios will be. Set up the config file with values that will cover these cases when the base anchor size is adjusted by the aspect ratios and multiplied by the scales.
I think it's pretty straightforward. I also used this 'workaround'.

How to get the default image of Wikipedia article?

Is it possible to get the (default) image associated with a Wikipedia article with an API call if only the URL of the article is known?
Is it possible to constrain the image size / resolution in the call (as an image is usually available in different resolutions)?
Is it possible to request the largest image version not larger than MAX_X px?
An example:
https://upload.wikimedia.org/wikipedia/commons/e/e9/Jaguar_%28Panthera_onca_palustris%29_female_Piquiri_River_2.JPG
is 4769x3179 pixels (1.5 ratio).
A request limiting the MAX_X size to 3000 would result in an image scaled to 3000x2000.
A request limiting the MAX_X size to 5000 would result in the image of the original size (4769x3179) because MAX_X is bigger than the image's x size.
Use the MediaWiki API with the pageimages property. For example, for the Wikipedia article Jaguar and a requested max image size of 500, the query is:
https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Jaguar&pithumbsize=500
From the response you can also get the largest image version: just remove the /thumb/ part of the URL and everything from the pixel-size segment (e.g. 500px-...) to the end of the link.
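With Python requests the same call looks roughly like this (format=json is added to get a machine-readable response):

import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "pageimages",
    "titles": "Jaguar",
    "pithumbsize": 500,
    "format": "json",
}
pages = requests.get(API, params=params).json()["query"]["pages"]

for page in pages.values():
    thumb = page.get("thumbnail")
    if thumb:
        print(thumb["source"], thumb["width"], thumb["height"])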

ImageMagick losing aspect ratio

Very simply, I have a script that calls ImageMagick on my photos.
The original image is 320 x 444, and I want to create a few scaled-down versions while keeping the same aspect ratio.
I call ImageMagick using
convert oldfile.png -resize 290x newfile.png
I want to scale it to my set widths and have the heights scale accordingly.
I do 80x, 160x and 290x in 3 separate commands.
The smallest two produce images with the original aspect ratio; the 290x one does not.
The size of the image it produces is 290 x 402
I have no idea why that one fails to keep the aspect ratio while the other two sizes maintain it.
Any ideas?
I think that the problem in the third command is the requested size itself:
in the first command both dimensions are divided by 4: 320/4=80 and 444/4=111
in the second command both dimensions are divided by 2: 320/2=160 and 444/2=222
444 and 320 have only two common divisors greater than 1: 2 and 4. You already used these divisors in your first two commands, so any other (width, height) pair will give you a slightly different aspect ratio: it is impossible to obtain the exact same aspect ratio with the width fixed at 290.
In fact, while your original image has an aspect ratio of 1.3875, with a 290x403 image you would obtain an aspect ratio of 1.389655172 and with a 290x402 image you would get a 1.386206897 ratio: with the width fixed at 290, there is no height that gives you the desired aspect ratio.
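A quick check of that arithmetic:

width, height = 320, 444
ratio = height / width  # 1.3875

for target_width in (80, 160, 290):
    exact_height = target_width * ratio
    print(target_width, exact_height, round(exact_height))
# 80  -> 111.0   -> 111 (exact)
# 160 -> 222.0   -> 222 (exact)
# 290 -> 402.375 -> 402 (must be rounded, so the ratio shifts slightly)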
In general, however, ImageMagick always tries to preserve the aspect ratio of the image, as you can read in the ImageMagick documentation:
The argument to the resize operator is the area into which the image
should be fitted. This area is not the final size of the image but the
maximum sizes for the image. that is because IM tries to preserve the
aspect ratio of the image more than the final (unless a '!' flag is
given), but at least one (if not both) of the final dimensions should
match the argument given image. So let me be clear... Resize will fit
the image into the requested size. It does NOT fill, the requested box
size.
For further reference see here