A batch PNG processor for Windows to pass Google's Page Speed test - SEO

I have Google's Page Speed plugin installed: http://code.google.com/speed/page-speed/
It says that I have a lot of PNGs on my site that aren't compressed.
I tried using the RIOT image optimizer: http://luci.criosweb.ro/riot/
However, despite attempts with multiple settings, I couldn't get it to pass.
Any ideas? Thanks!

You could try pngcrush, but presumably you'll get much greater savings from converting to JPEG with quality slightly less than 100 (I usually find 92 pretty good). ImageMagick would be the tool of choice for bulk processing.
I never managed to create paletted PNGs, but in principle those should be pretty efficient when you're dealing with illustrations that only use a few colours.
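If you'd rather script the bulk conversion than drive ImageMagick from the command line, a minimal sketch using Pillow could look like this (the directory names are placeholders):

```python
from pathlib import Path
from PIL import Image

SRC = Path("images")       # hypothetical input directory of PNGs
DST = Path("images_jpeg")  # hypothetical output directory
DST.mkdir(exist_ok=True)

for png in SRC.glob("*.png"):
    # JPEG has no alpha channel, so flatten to RGB first.
    img = Image.open(png).convert("RGB")
    img.save(DST / (png.stem + ".jpg"), "JPEG", quality=92)
```

Note that any transparency is lost in the conversion, so keep PNG for images that actually need an alpha channel.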

The good PNG optimizers are:
pngout http://advsys.net/ken/utils.htm
pngcrush http://pmt.sourceforge.net/pngcrush/
optipng http://optipng.sourceforge.net/
advpng http://advancemame.sourceforge.net/comp-readme.html
For best results, run all four in that order (a batch sketch follows below).
You can also use pngnq http://pngnq.sourceforge.net/ to reduce the image even more at the cost of some quality. (And after using pngnq, run the image through the optimizers.)
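If you want to batch the whole chain, here is a rough Python sketch that shells out to each tool in the recommended order. It assumes all four binaries are on your PATH, and the flags shown are common ones that may differ between versions:

```python
import os
import subprocess
import sys
from pathlib import Path

def optimize(png: Path) -> None:
    # pngout (Windows build): /y answers yes to the overwrite prompt.
    # pngout exits nonzero when it can't shrink a file, so don't treat
    # that as fatal.
    subprocess.run(["pngout", str(png), "/y"], check=False)

    # pngcrush takes an input/output pair, so write to a temp file and
    # move the result back over the original.
    tmp = png.with_suffix(".crushed.png")
    subprocess.run(["pngcrush", "-brute", str(png), str(tmp)], check=True)
    os.replace(tmp, png)

    # optipng and advpng both optimize in place.
    subprocess.run(["optipng", "-o7", str(png)], check=True)
    subprocess.run(["advpng", "-z", "-4", str(png)], check=True)

if __name__ == "__main__":
    for path in Path(sys.argv[1]).glob("*.png"):
        optimize(path)
```

Expect it to be slow: -brute, -o7 and -4 all try many compression strategies per file.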

Thanks for all the suggestions, guys. I think in my case the easiest method is to grab the cache files that Google Page Speed produces. Here is the info: http://code.google.com/speed/page-speed/docs/using_firefox.html#savefiles
Also, you'll need to run it in Firefox, as Chrome doesn't produce the same files.

Related

Watermarking Plugin Performance - Is FastScaling an Option?

I want to use ImageResizer to serve thumbnails that are scaled and watermarked on the fly on a high-traffic website.
My testing has shown that the Watermarking plugin results in a significant decrease in throughput compared to just scaling them with FastScaling.
Scaled: 150+ images per second
Scaled & Watermarked: 35 images per second
I dug through the Watermark Plugin code and saw that it's using GDI+ for its image manipulations. Is it possible to make it use the more performant FastScaling plugin instead?
This is something we would like to improve. Currently, if either Watermarking or the DRM red dot is in use, performance reverts to GDI+ levels.
I would be happy to assist on a pull request for this, or discuss other options.

Could someone explain Tesseract OCR training to me?

I'm trying to do the training process, but I don't understand even how to start. I would like to train it to read numbers. My images are from the real world, so the reading process didn't go well.
It says that I have to have a ".tif" image with the examples... is that a single image of every number (in this case), or an image with many different instances of the numbers (same font, though)?
And what about makebox? The command didn't work here.
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Could someone explain this to me better, at least how to start?
I saw a few programs that do this more quickly; I tried one (SunnyPage 1.8), but it isn't free. Does anyone know of any free software that does this? Or a good tutorial?
Using Tesseract 3, Windows 8 (32-bit).
It is important to patiently follow the training wiki on the Google Code project site, multiple times if needed. It is an open-source library and is constantly evolving.
You will have to create a training image (TIFF) with many different instances of the numbers; it should probably contain all the numbers you wish the engine to recognize.
Please consider posting the exact error message you got with makebox; the standard invocation is sketched below.
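For reference, the box-file generation step from the TrainingTesseract3 wiki is normally invoked as below; this sketch just shells out to it from Python, and num.myfont.exp0 is a placeholder for your own lang.font.expN base name:

```python
import subprocess

# Generates num.myfont.exp0.box from num.myfont.exp0.tif, per the
# TrainingTesseract3 wiki. Assumes tesseract (v3) is on your PATH.
subprocess.run(
    ["tesseract", "num.myfont.exp0.tif", "num.myfont.exp0",
     "batch.nochop", "makebox"],
    check=True,
)
```

If this fails, the exact error message will tell you whether the problem is the image, the file naming, or the installation.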
I think Tesseract is the best free solution available. You have to keep working at it and seek help from the community.
There is a very good post from Cédric here explaining the training process for Tesseract.
A good free OCR program is PDF OCR X, which is also based on Tesseract. I tried to copy my notes from German, which I had scanned at 1200dpi, and the results were commendable but not perfect. I found that this website - http://onlineocr.net - is a lot more accurate. If you are not registered, it allows a maximum file size of 4MB from most image formats (BMP, PNG, JPEG, etc.) and PDF. It can output them as a Word file, an Excel file, or a TXT file.
Hope this helps.

Website optimization

How can I speed up the loading of images? Especially when I open the website for the first time, it takes some time for the images to load...
Is there anything I can do to improve this (HTML, CSS)?
link
Thanks to all for your answers.
Crop http://www.ursic-ei.si/datoteke/d4.jpg down to size! It's 900 pixels wide, and most of that (half?) is empty and white. Make the image smaller and then use background-position and background-color to compensate for anything you trimmed off the edges.
You have a lot of extra newlines in your HTML source. Not hugely significant, but theoretically - since in HTML there's no practical difference between one new line and two - you might want to remove some.
For images, you should consider a content delivery network (CDN), which will cache your images and other files and serve them faster than your web server.
This is a must for any high-traffic website.
On the client, you can enable multipart downloads; e.g. in Firefox there are a number of settings under network.http.pipelining that help speed up downloads.
On the server, there's little you can do (though you can gzip text-based files). The client just has to know how to cache.
Since in your question you only ask about the images, I guess you already know that the cost of PHP processing and/or JavaScript is minor. If you want to speed up the images, you can reduce their size or increase the compression rate... also try different formats. JPG is not always the best one.
Try GIF and/or PNG; with these you can also reduce the number of colors (a Pillow sketch for this follows below). Usually these formats are far better than JPG when you have simple pictures with few colors.
Also consider whether some of your images are a simple pattern that can be repeated several times. For example, if you have a background image with a side banner, you may only need one row of pixels repeated vertically.
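If you want to experiment with reducing colors, here is a small Pillow sketch; the file names and the 64-color count are arbitrary examples to tune per image:

```python
from PIL import Image

# Hypothetical input; flatten alpha first, since RGBA images need a
# different quantization method.
img = Image.open("banner.png").convert("RGB")
# Convert to a 64-color palette; fewer colors usually means a
# noticeably smaller PNG for flat, simple artwork.
img.quantize(colors=64).save("banner_small.png", optimize=True)
```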

Looking for a lossless compression api similar to smushit

Does anyone know of a lossless image compression API/service similar to Smush.it from Yahoo?
From their own FAQ:
WHAT TOOLS DOES SMUSH.IT USE TO SMUSH IMAGES?
We have found many good tools for reducing image size. Often times these tools are specific to particular image formats and work much better in certain circumstances than others. To "smush" really means to try many different image reduction algorithms and figure out which one gives the best result.
These are the algorithms currently in use:
ImageMagick: to identify the image type and to convert GIF files to PNG files.
pngcrush: to strip unneeded chunks from PNGs. We are also experimenting with other PNG reduction tools such as pngout, optipng, pngrewrite. Hopefully these tools will provide improved optimization of PNG files.
jpegtran: to strip all metadata from JPEGs (currently disabled) and try progressive JPEGs.
gifsicle: to optimize GIF animations by stripping repeating pixels in different frames.
More information about the smushing process is available at the Optimize Images section of Best Practices for High Performance Web pages.
It mentions several good tools. By the way, the very same FAQ mentions that Yahoo will make Smush.it a public API sooner or later so that you can run it on your own. Until then, you can just upload images separately to Smush.it here.
Try Kraken Image Optimizer: https://kraken.io/signup
The developer plan is free - but it only returns dummy results. You must subscribe to one of the paid plans to use the API; however, the web interface is free and unlimited for images of up to 1MB.
Find out more in the Kraken documentation.
See this:
http://github.com/thebeansgroup/smush.py
It's a Python implementation of smushit that can be run off-line to optimise your images without uploading them to Yahoo's service.
As far as I know, the best image compression for me is TinyPNG.
They also have an API: https://tinypng.com/developers
Once you retrieve your key, you can immediately start shrinking images. Official client libraries are available for Ruby, PHP, Node.js, Python and Java. You can also use the WordPress plugin, the Magento 1 extension or improved Magento 2 extension to compress your JPEG and PNG images.
And the first 500 images per month are free.
Tip: when using their API, there is no file-size limit (unlike the 5MB-per-image maximum of their online tool).
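Per their developer documentation, the official Python client (tinify) is used roughly like this; the key is a placeholder:

```python
import tinify  # pip install tinify

tinify.key = "YOUR_API_KEY"  # placeholder: get one at tinypng.com/developers

# Compress a PNG and write the optimized copy alongside it.
tinify.from_file("unoptimized.png").to_file("optimized.png")
```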

How to give best chance of success to an OCR software?

I am using Tesseract OCR (via pytesser) and PIL (Python Imaging Library) for automated testing of an application.
I check that the displayed text is OK by taking a screenshot and extracting the text with Tesseract.
I had some issues in the beginning, and it seems to work better since I increased the size of the screenshot using PIL's bicubic interpolation.
Unfortunately, I still have some mistakes, like confusion between '0' and 'O'. I can imagine that I will have other similar issues in the future.
I would like to know if there are some techniques to prepare an image in order to help the OCR. Any idea is welcome.
Thanks in advance
Shameless plug and disclaimer: my company packages Tesseract for use in .NET
Tesseract is an OK OCR engine. It can miss a lot and gets readily confused by non-text. The best thing you can do for it is to make sure it gets text only. The next best thing is to give it something sanely binarized (adaptive or dynamic threshold to get there) or grayscale and let it try to do binarization.
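As a starting point for that kind of preprocessing, here is a rough Pillow sketch: upscale, convert to grayscale, then apply a fixed threshold. The 3x factor and the 128 cutoff are arbitrary values to tune, not recommendations:

```python
from PIL import Image

img = Image.open("screenshot.png")  # hypothetical input
# Upscale before OCR; use Image.Resampling.BICUBIC on newer Pillow.
w, h = img.size
img = img.resize((w * 3, h * 3), Image.BICUBIC)
gray = img.convert("L")  # grayscale
# Crude fixed threshold; an adaptive threshold is usually more robust.
binary = gray.point(lambda p: 255 if p > 128 else 0)
binary.save("for_ocr.png")
```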
Train Tesseract to recognize your font.
Make the image extra clean, with enough free space around characters.
Profit :)
Here are a few real-world examples.
The first image is the original (cropped power-meter numbers).
The second image is slightly cleaned up in GIMP: around 50% OCR accuracy in Tesseract.
The third image is completely cleaned: 100% recognized without any training!
Even under the best conditions OCR variants will sneak up on you. Your best option will be to design your tests to be aware of them.
For distinguishing between 0 and O, one simple solution is to choose a font that distinguishes between the two (e.g., a 0 with a dash or dot in its middle). Would that be acceptable in your application?
Another solution is to apply a dictionary-based step after the character-by-character analysis of the text - feeding the recognized text into some form of spell-checker or validator to differentiate between difficult characters.
For instance, a round symbol followed by other numbers is most likely a zero, while the same symbol followed by letters is most likely a capital O. It's a trivial example, but it shows how context is necessary to make a more reliable OCR system.
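To make the idea concrete, a toy post-processing rule might look like this (purely illustrative, not a complete validator):

```python
import re

def fix_zero_vs_o(text: str) -> str:
    # An 'O' between digits is almost certainly a zero...
    text = re.sub(r"(?<=\d)O(?=\d)", "0", text)
    # ...and a '0' between letters is almost certainly a capital O.
    text = re.sub(r"(?<=[A-Za-z])0(?=[A-Za-z])", "O", text)
    return text

print(fix_zero_vs_o("C0DE 1O2"))  # -> "CODE 102"
```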