Is there a Wikipedia API to get the images of all resolutions given an image file name? - wikipedia-api

For example, given File:Wikipedia-logo-v2.svg as input, the output would include:
http://upload.wikimedia.org/wikipedia/en/thumb/8/80/Wikipedia-logo-v2.svg/200px-Wikipedia-logo-v2.svg.png
http://upload.wikimedia.org/wikipedia/en/thumb/8/80/Wikipedia-logo-v2.svg/500px-Wikipedia-logo-v2.svg.png
...

The original image is a vector graphic; the URLs you have provided point to the Wikipedia thumbnail generator.
It will generate an image at whatever width you specify: change the pixel value in the URL and it will produce an image of that size.
I don't know why you would want to use it, however. You could just get the original image and scale it yourself to suit your own needs.
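
As a rough illustration of "change the value in the URL", here is a minimal Python sketch. It assumes the thumbnail URL layout shown in the question (a final path segment that starts with `<width>px-`); the function name `thumbnail_url` is made up for this example and is not part of any Wikipedia API.

```python
import re


def thumbnail_url(url: str, width: int) -> str:
    """Return `url` with the '<N>px-' width in its last path segment replaced."""
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    # Only touch the '<N>px-' prefix of the final path segment.
    new_url, count = re.subn(r"/\d+px-(?=[^/]*$)", f"/{width}px-", url)
    if count != 1:
        raise ValueError("URL does not look like a thumbnail URL")
    return new_url


example = (
    "http://upload.wikimedia.org/wikipedia/en/thumb/8/80/"
    "Wikipedia-logo-v2.svg/200px-Wikipedia-logo-v2.svg.png"
)
print(thumbnail_url(example, 500))
```

With the 200px URL from the question as input, this prints the 500px URL that the question lists as the second example.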

Related

How do I re-use the same image payload in several pages without repeating it?

I'm using TCPDF to generate a report that contains a logo from an SVG vector image.
My goal is to re-use the same image payload efficiently throughout the report, rather than storing the logo as if it were a different image on each page.
Right now, with the current data, the report generates 32 pages. The file size increases considerably as new pages are added, which seems to be due to the logo being repeated on every page.
I don't have tools to analyze what is inside the PDF, but reports generated by other applications show a different pattern: for PDFs containing a repeated image, most of the file size is accounted for by the first page, and each subsequent page adds only slightly to it, indicating that the first logo is efficiently re-used.
How can I achieve that using TCPDF?
If I place the logo only on page 1 and omit it on pages 2 - 32, while still outputting all the text data, the file size is greatly reduced, just as in the examples I mentioned before. This indicates that the SVG data is repeated on every page.
Following example 009 in TCPDF's online documentation, I've tried loading the image from a file and also tried using a "data stream" (that is, encoding the SVG in Base64 and, instead of referencing the image from a file, passing the Base64 text as a stream that contains the image payload).
I thought that using the data stream would take care of it, but it didn't.
Is there a way to reference the same image over and over in tcpdf?

pdf2svg leads to blurry images

I'm trying to convert a PDF figure to SVG so I can edit some details with Inkscape. The problem is that the imported figure looks slightly different from the original, apparently because of some sort of smoothing.
In particular, this is the original figure:
And this is the figure after converting to SVG
This is the output of pdf2svg, which is exactly the same as what I get if I import the PDF into Inkscape directly.
I attach a link where you can get both files.
https://www.dropbox.com/s/domxcc8pncyouy6/images.tar.gz?dl=0
Do you know a workaround to this issue?
Without seeing the SVG it is hard to tell for sure. However it looks like the "heat map" portion of your PDF/SVG may be a low resolution bitmap that is being enlarged in the page.
By default, SVG renderers will use interpolation when enlarging an image. This gives the image a smoothed/blurry look at large scales.
You could try locating the <image> element in your SVG and adding the attribute image-rendering="pixelated" to the <image> tag. Some browsers support that option and will scale the image using the nearest-neighbour scaling method.
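
If you would rather not edit the file by hand, the attribute can be added with a small script. This is only a sketch: it uses a plain regular expression instead of an XML parser, and the function name `add_pixelated_rendering` is made up for this example.

```python
import re


def add_pixelated_rendering(svg_text: str) -> str:
    """Add image-rendering="pixelated" to every <image> tag that lacks it."""
    # The negative lookahead skips tags that already set image-rendering.
    pattern = r"<image\b(?![^>]*\bimage-rendering\s*=)"
    return re.sub(pattern, '<image image-rendering="pixelated"', svg_text)


svg = '<svg><image id="image5" width="10" height="10"/></svg>'
print(add_pixelated_rendering(svg))
```

Running it twice on the same text leaves the result unchanged, because tags that already carry the attribute are skipped.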
Otherwise you may need to extract the image from the PDF or SVG, resample it at a higher resolution (e.g. 4x or 8x), and then reinsert it into the file:
Find the image in the SVG file (<image id="image5" .../>).
Extract the Base64-encoded image from the data URI and decode it using a Base64 decoder.
Multiply the image resolution using an editor such as Photoshop or GIMP.
Encode the file back to Base64
Update that <image> element with the new Base64.
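
The extract, decode, re-encode and update steps above can be sketched in Python as follows. This is an illustration under assumptions, not a tested tool: it assumes the image is embedded as a double-quoted `data:image/...;base64,` URI in an `href` or `xlink:href` attribute, and the names `replace_embedded_image` and `resample` are hypothetical. The actual enlargement is left to the caller.

```python
import base64
import re
from typing import Callable

# Captures the Base64 payload of an embedded image data URI (group 2).
DATA_URI = re.compile(r'(href="data:image/[\w.+-]+;base64,)([^"]+)(")')


def replace_embedded_image(svg_text: str, resample: Callable[[bytes], bytes]) -> str:
    """Decode the first embedded image, hand it to `resample`, re-embed the result."""
    match = DATA_URI.search(svg_text)
    if match is None:
        raise ValueError("no Base64 data URI found in the SVG")
    raw = base64.b64decode(match.group(2))               # decode the payload
    enlarged = resample(raw)                             # caller enlarges the image
    encoded = base64.b64encode(enlarged).decode("ascii")  # encode back to Base64
    # Splice the new payload into the same attribute of the same element.
    return svg_text[:match.start(2)] + encoded + svg_text[match.end(2):]
```

The `resample` callback could write the bytes to a file for editing in GIMP, or enlarge them with an image library using nearest-neighbour scaling so the pixels stay sharp.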

Tesseract cannot recognize my image correctly

I am developing an Android app that needs to recognize captchas from a website.
I use tess-two to recognize the captchas and followed the TrainingTesseract3 instructions to train my own traineddata (using jTessBoxEditor to correct characters), but it recognizes them incorrectly, or sometimes not at all.
The TIFF image below is the one I use to train Tesseract; I collected many captchas and merged them into a single image.
TIFF image
The image that I want to recognize
For example, the expected result of the above image should be k8666, but the actual result is only 66.
Can anyone help me? Thanks.
I tried your images using a .NET wrapper for tesseract-ocr (Tesseract-ocr .Net Wrapper by Charliesw).
I got somewhat better results (K8EEE, K8656). I think you have to increase the font size and make the text bold. I saved the image in TIFF format at 96 DPI resolution; with those changes you may get better results than mine.

Gracenote API: Fetching Images

I have a few questions related to images in the Gracenote API.
Let's take this image for example: http://akamai-b.cdn.cddbp.net/cds/2.0/image/5899/C629/E091/E3A2_medium_front.jpg
After manipulating the image name a bit, I found that I could get another format of the same image: small.
Is there somewhere in the Gracenote documentation where I can get a list of the possible image formats?
Another thing I noticed in the image name is "front".
Does that mean there are other images for the same content that we can get?
Thanks to anyone who can help me with this.
You can find options for image sizes in the eyeQ Web API Reference Documentation https://developer.gracenote.com/sites/default/files/eyeq-webapi-ref.pdf
The image sizes available are:
THUMBNAIL (75)
SMALL (170)
MEDIUM (450)
LARGE (720)
XLARGE (1080)
(Note, all dimensions are max size - some images are not square)
You can select one or more image sizes through the IMAGE_SIZE option, e.g.:
<OPTION>
<PARAMETER>IMAGE_SIZE</PARAMETER>
<VALUE>XLARGE,SMALL</VALUE>
</OPTION>
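
If you build the request programmatically, the same OPTION element can be generated in code. The following Python sketch only produces the fragment shown above; the helper name `image_size_option` is made up for this example, and how the fragment fits into the rest of the query is left to the API reference.

```python
import xml.etree.ElementTree as ET

# Size names as listed above.
VALID_SIZES = ("THUMBNAIL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def image_size_option(*sizes: str) -> str:
    """Build the IMAGE_SIZE <OPTION> fragment for one or more size names."""
    normalized = [size.upper() for size in sizes]
    unknown = [size for size in normalized if size not in VALID_SIZES]
    if not normalized or unknown:
        raise ValueError(f"expected one or more of {VALID_SIZES}, got {sizes}")
    option = ET.Element("OPTION")
    ET.SubElement(option, "PARAMETER").text = "IMAGE_SIZE"
    ET.SubElement(option, "VALUE").text = ",".join(normalized)
    return ET.tostring(option, encoding="unicode")


print(image_size_option("xlarge", "small"))
```

The size names are validated against the list above so that a typo fails locally rather than in the request.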
The image format is .jpg but I assume you are asking about image sizes.
There are at most 5 sizes available: thumbnail, small, medium (default), large, xlarge.
Also there is only one image returned per content.

PDFBox : Converting to image : Quality loss when converting PDF containing scanned documents

My use case is pretty simple: I need to convert PDFs to images. I tried using Apache PDFBox, and I am having some trouble converting PDFs that contain scanned images. When I convert a scanned image, the image clarity is lost due to compression/scaling, so I was trying to extract the image data from the PDF and store it directly. The problem is that I may also get PDF files that contain both images and text, in which case I would need to fall back to image conversion mode. How can I differentiate between pages/documents that contain only an image and those with composite data? I was thinking I could use the ProcSet definition for this purpose, but it looks like it is marked as obsolete and unreliable according to the PDF specification. Another possibility is to check all the objects linked to the page and see whether it contains anything other than images. Please let me know if there is an easier way of doing this.
Thanks
If your intention is to convert a PDF to images, it is better to use ImageMagick for that. ImageMagick has a lot of options for controlling the quality of the output image, and converting a PDF to an image with it is pretty simple.
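
As a rough sketch of what such a conversion could look like when driven from a script, the Python snippet below builds and runs an ImageMagick command. It assumes ImageMagick is installed (as `magick`, or as `convert` in older versions) together with a PDF delegate such as Ghostscript. The function names, the 300 DPI default and the output pattern are illustrative choices, not recommendations from this answer.

```python
import shutil
import subprocess
from typing import List


def build_convert_command(pdf_path: str, out_pattern: str,
                          dpi: int = 300, executable: str = "magick") -> List[str]:
    """Return the ImageMagick argument list for rasterizing a PDF."""
    # -density must come before the input so the PDF is rasterized at that DPI.
    return [executable, "-density", str(dpi), pdf_path, out_pattern]


def convert_pdf(pdf_path: str, out_pattern: str = "page-%03d.png", dpi: int = 300) -> None:
    """Run ImageMagick, writing one numbered image per page."""
    executable = shutil.which("magick") or shutil.which("convert")
    if executable is None:
        raise RuntimeError("ImageMagick does not appear to be installed")
    command = build_convert_command(pdf_path, out_pattern, dpi, executable)
    subprocess.run(command, check=True)
```

Raising the density gives larger, sharper page images at the cost of file size, which is the main lever for the quality loss described in the question.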