ImageMagick PDF to JPG bad quality

I am not able to get good color quality when using ImageMagick to convert a PDF to images.

MagickReadSettings settings = new MagickReadSettings();
settings.Verbose = true;
settings.Density = new Density(600, 600);

MagickImageCollection images = new MagickImageCollection();
// Pass the read settings so the density is actually applied when rasterizing the PDF
images.Read("C:\\" + Path.GetFileName(fileUrl), settings);

List<string> files = new List<string>();
for (var x = 0; x < images.Count; x++)
{
    images[x].Quality = 100;
    images[x].BitDepth(24);
    images[x].Contrast(true);
    images[x].Resize(3675, 2400);
    images[x].Write("C:\\websites\\FlyerEditor2\\FlyerEditor\\src\\assets\\" + Path.GetFileNameWithoutExtension(fileUrl) + "-" + (x + 1) + ".jpeg");
    files.Add("assets/" + Path.GetFileNameWithoutExtension(fileUrl) + "-" + (x + 1) + ".jpeg");
}
[cropped screenshot from the PDF]
[JPG from the PDF using ImageMagick]

If you use Python 3, you can try wand.

In the terminal:

brew install imagemagick@6
pip install wand

In Python:

from wand.image import Image

pdf_file = '.../example/a.pdf'

def convert_pdf_to_jpg(file_name, pic_file, resolution=120):
    with Image(filename=file_name, resolution=resolution) as img:
        print('pages = ', len(img.sequence))
        with img.convert('jpeg') as converted:
            converted.save(filename=pic_file)

convert_pdf_to_jpg(pdf_file, 'a.jpg')

I found major differences in how ImageMagick's convert handles colorspace between Linux and Windows.

Using the commands:

convert -density 300 -colorspace RGB my.pdf my.jpg
convert -density 300 -colorspace sRGB my.pdf my.jpg

On Linux, both -colorspace sRGB and -colorspace RGB generated images whose contrast and palette diverged badly from the original: contrast was increased and the colors were far from the original.
On Windows, both -colorspace sRGB and -colorspace RGB produced acceptable, if not perfect, results.

OK, the issue has nothing to do with ImageMagick. It is a simple issue with color palettes: converting a PDF to a JPEG uses CMYK by default, while the web standard is RGB.
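For illustration, the naive CMYK-to-RGB mapping looks like this. This is only a sketch of why the two color models disagree; it ignores ICC profiles, which a properly color-managed converter (like ImageMagick with the right profiles) would use:

```python
def cmyk_to_rgb(c, m, y, k):
    """Naively convert CMYK components (each 0.0-1.0) to 8-bit RGB.

    This ignores ICC color profiles, so results will differ from a
    properly color-managed conversion.
    """
    r = round(255 * (1 - c) * (1 - k))
    g = round(255 * (1 - m) * (1 - k))
    b = round(255 * (1 - y) * (1 - k))
    return r, g, b

print(cmyk_to_rgb(0, 0, 0, 0))  # no ink -> (255, 255, 255), white
print(cmyk_to_rgb(0, 0, 0, 1))  # full black ink -> (0, 0, 0)
```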

convert a .csv file to yolo darknet format

I have a few annotations that are originally in .csv format. I need to convert them to YOLO Darknet format in order to train my model with YOLOv4.
my .csv file:
YOLO format is: object-class x y width height
where object_class, width, and height are known from my .csv format, but finding x and y is confusing. Note that x and y are the center of the rectangle (not the top-left corner).
Any help would be appreciated :)
You can use this function to convert bounding boxes to the YOLO format. Of course, you will need to write some code to read the CSV; just use this function as a template for your needs.
This function was extracted from the labelImg app:
https://github.com/tzutalin/labelImg/blob/master/libs/yolo_io.py
def BndBox2YoloLine(self, box, classList=[]):
    xmin = box['xmin']
    xmax = box['xmax']
    ymin = box['ymin']
    ymax = box['ymax']

    # imgSize is (height, width): normalize x by width, y by height
    xcen = float((xmin + xmax)) / 2 / self.imgSize[1]
    ycen = float((ymin + ymax)) / 2 / self.imgSize[0]

    w = float((xmax - xmin)) / self.imgSize[1]
    h = float((ymax - ymin)) / self.imgSize[0]

    # PR387
    boxName = box['name']
    if boxName not in classList:
        classList.append(boxName)

    classIndex = classList.index(boxName)

    return classIndex, xcen, ycen, w, h
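A standalone version of the same math (the function name and the sample numbers are mine, for illustration): the YOLO x and y are simply the box center divided by the image width and height, which is what makes them different from the top-left corner in the CSV:

```python
def csv_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert corner coordinates to normalized YOLO (x_center, y_center, w, h)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_center, y_center, w, h

# A box with top-left corner (100, 100) and bottom-right (200, 300)
# in a 400x400 image:
print(csv_box_to_yolo(100, 100, 200, 300, 400, 400))
# -> (0.375, 0.5, 0.25, 0.5)
```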

Grayscale image using opencv from numpy array failed

I use the following NumPy array, which holds a black-and-white image with the following shape:

print(img.shape)
(28, 112)

When I try to grayscale the image, to use it to get contours with OpenCV, with the following steps:

# grayscale the image
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# threshold the image
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
I got the following error
<ipython-input-178-7ebff17d1c18> in get_digits(img)
6
7 #grayscale the image
----> 8 grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
9
10
error: C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:11073: error: (-215) depth == 0 || depth == 2 || depth == 5 in function cv::cvtColor
The OpenCV errors carry so little information that it's hard to tell what is wrong.
Here is the working code for how you were trying it:
img = np.stack((img,) * 3,-1)
img = img.astype(np.uint8)
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
A simpler way of getting the same result is to invert the image yourself:
img = (255-img)
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1]
As you discovered, as you perform different operations on images, the image is required to be in different formats.
cv2.THRESH_BINARY_INV and cv2.THRESH_BINARY are designed to take a color image (and convert it to grayscale) so you need a three channel representation.
cv2.THRESH_OTSU works with grayscale images so one channel is okay for that.
Since your image was already grayscale from the start, you weren't able to convert it from color to grayscale, nor did you really need to. I assume you were trying to invert the image, but that's easy enough on your own: (255-img).
At one point you tried to do a cv2.THRESH_OTSU with floating-point values, but cv2.THRESH_OTSU requires integers between 0 and 255.
If OpenCV had more user-friendly error messages, it would really help with issues like these.
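The dtype point is easy to check without OpenCV. A minimal NumPy sketch (the array values here are made up) of converting a float image into the uint8 range that Otsu thresholding expects:

```python
import numpy as np

# A made-up float image with values in [0.0, 1.0], like the one that failed
img = np.array([[0.0, 0.5], [1.0, 0.25]], dtype=np.float64)

# Scale to [0, 255] and convert to 8-bit integers before thresholding
img_u8 = (img * 255).astype(np.uint8)

print(img_u8.dtype)  # uint8
print(img_u8.max())  # 255

# Inverting is then just arithmetic, no color conversion needed
inverted = 255 - img_u8
```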

How to use crop from PDF to PNG tiles using ImageMagick

Good day,
I have a large issue cropping a PDF to PNG tiles.
The PDF is about 1.6 MB (2500x2500), and one process takes about 7-10 min and generates 700 MB of temporary files, e.g.:

exec("convert -density 400 'file.pdf' -resize 150% -crop 48x24# png32:'file_%d.png'");

One PDF must generate PNGs at sizes from 25% to 200%.
Here I generate the attributes: density, size for resizing in %, and the grid's column and row count:
$x = 0; $y = 0;
for ($i = 25; $i <= 200; $i += 25) {
    $x += 8; $y += 4;
    // Nested ternaries need explicit parentheses in PHP (they are left-associative otherwise)
    $convert[$i] = [
        'density' => (($i < 75) ? 200 : (($i < 150) ? 300 : (($i < 200) ? 400 : 500))),
        'tiles'   => implode("x", [$x, $y]),
    ];
}
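For reference, the same density/tile mapping sketched in Python (the function name is mine), which makes the intended thresholds easy to sanity-check against the exec example above:

```python
def convert_params(step=25, last=200):
    """Build the density and tile grid for each resize percentage."""
    params = {}
    x = y = 0
    for i in range(step, last + 1, step):
        x += 8
        y += 4
        if i < 75:
            density = 200
        elif i < 150:
            density = 300
        elif i < 200:
            density = 400
        else:
            density = 500
        params[i] = {'density': density, 'tiles': f'{x}x{y}'}
    return params

# At 150% the grid is 48x24 with density 400, matching the example command
print(convert_params()[150])  # {'density': 400, 'tiles': '48x24'}
```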
Then I launch the converters one after another, and it's extremely expensive in time:

$file_cropper = function($filename, $additional = '') use ($density, $size, $tiles) {
    $pid = exec("convert -density $density ".escapeshellarg($filename)." -resize $size% -crop $tiles# ".$additional." png32:".escapeshellarg(str_replace(".pdf", "_%d.png", $filename))." >/dev/null & echo $!");
    do {
        /* some really fast code */
    } while (file_exists("/proc/{$pid}"));
};

If I launch them simultaneously (8 processes), then ImageMagick eats all the space I have (40 GB) => ~35 GB of temporary files.
Where is my problem; what am I doing wrong?
I tried to pass the params below to the function's $additional var:
"-page 0x0+0+0"
"+repage"
"-page 0x0+0+0 +repage"
"+repage -page 0x0+0+0"
Nothing changes.
Version: ImageMagick 6.7.7-10 2016-06-01 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP
Ubuntu 14.04.4 LTS
2GB / 2CPU
EDITED
After a while I managed to replace ImageMagick with Ghostscript:

gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r240 -sOutputFile="file.png" file.pdf

but I can't understand how to scale the image and crop it. Cropping with ImageMagick generates ~35 GB of temporary files and takes more time than before.
I managed to resolve my problem this way:

$info = exec("identify -ping %w {$original_pdf_file}");
preg_match('/(\d+x\d+)/', $info, $matches);

"gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r{$r} -g{$dim} -dPDFFitPage -sOutputFile=\"{$png}\" {$filename}"
"convert ".escapeshellarg($png)." -gravity center -background none -extent {$ex}x{$ex} ".escapeshellarg($png)
"convert ".escapeshellarg($png)." -crop {$tiles}x{$tiles}! +repage ".escapeshellarg(str_replace(".png", "_%d.png", $png))

where:
$filename = file.pdf
$png = file.png
$r = 120
$ex = 4000
$dim = $matches[1]

Steps:
1. gives me the dimensions of the original file, after which I can play with the size of the PNG in the future
2. converts the PDF to a PNG of the size I need, keeping the aspect ratio
3. converts the PNG to the size I want with a 1:1 aspect ratio
4. crops everything

This process takes 27.59 s on my machine with an image resolution of 4000x4000, a file size of only 1.4 MB, and 0-30 MB of temporary files.

The size of PDF documents, how do I convert from millimeters to pixels using Spire.pdf?

The size of PDF documents, how do I convert from millimeters to pixels using Spire.pdf?
PdfDocument doc = new PdfDocument();
doc.PageScaling = PdfPrintPageScaling.ActualSize;
doc.LoadFromFile("myDocument.pdf");

foreach (PdfPageBase page in doc.Pages)
{
    // The result is returned in the default unit, but I want to show millimeters
    Console.WriteLine("PageSize: {0}X{1}", page.Size.Width, page.Size.Height);
}
The size of PDF pages is not expressed in pixels but in points.
1 inch = 72 points
1 inch = 25.4 mm
That leads to:
1 point = 0.352777778 mm
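Those two constants are all you need for the conversion; a small sketch (the function names are mine) that applies them:

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

def points_to_mm(points):
    """Convert PDF points (1/72 inch) to millimeters."""
    return points * MM_PER_INCH / POINTS_PER_INCH

def mm_to_points(mm):
    """Convert millimeters to PDF points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH

# An A4 page is 595 x 842 points, i.e. roughly 210 x 297 mm:
print(round(points_to_mm(595)), round(points_to_mm(842)))  # 210 297
```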

ImageMagick fails to remove alpha in pdf to png conversion but only on some pages

I'm trying to convert a PDF to PNG files, one per page, removing the 4th (alpha) channel. ImageMagick is behaving weirdly in that it removes alpha correctly for all but one page. Is there an error in my command?
Here is the pdf: http://papers.nips.cc/paper/3723-anomaly-detection-with-score-functions-based-on-nearest-neighbor-graphs.pdf
Command I'm executing:
convert -units PixelsPerInch -density 300 -alpha remove nips09_4.pdf nips09_4.png
Result:

$ identify -verbose nips09_4-2.png
  Format: PNG (Portable Network Graphics)
  Mime type: image/png
  Class: DirectClass
  Geometry: 2480x3508+0+0
  Resolution: 118.11x118.11
  Print size: 20.9974x29.7011
  Units: PixelsPerCentimeter
  Type: TrueColorAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16-bit
  Channel depth:
    red: 16-bit
    green: 16-bit
    blue: 16-bit
    alpha: 1-bit

$ identify -verbose nips09_4-1.png
  Format: PNG (Portable Network Graphics)
  Mime type: image/png
  Class: DirectClass
  Geometry: 2480x3508+0+0
  Resolution: 118.11x118.11
  Print size: 20.9974x29.7011
  Units: PixelsPerCentimeter
  Type: Palette
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16-bit
  Channel depth:
    red: 16-bit
    green: 16-bit
    blue: 16-bit
To reproduce:
cd ~/Downloads
wget http://papers.nips.cc/paper/3723-anomaly-detection-with-score-functions-based-on-nearest-neighbor-graphs.pdf
mv 3723-anomaly-detection-with-score-functions-based-on-nearest-neighbor-graphs.pdf nips09_4.pdf
convert -units PixelsPerInch -density 300 -alpha remove nips09_4.pdf nips09_4.png
Try using -alpha off instead, after loading the PDF, and see if that helps. Like this:

convert -density 300 some.pdf -alpha off nips%03d.png

I believe -alpha remove removes the effect of the alpha channel but actually leaves it present, though fully opaque, in the image, and that gets carried over to your PNG files.
In contrast, -alpha off removes the channel altogether, so it doesn't show up in the PNG files.
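To check which output pages ended up with an alpha channel without parsing identify output, you can read the color-type byte of the PNG's IHDR chunk directly; a stdlib-only sketch (the function name is mine):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_has_alpha(data: bytes) -> bool:
    """Check a PNG's IHDR color type for an alpha channel.

    Color type 4 is grayscale+alpha; 6 is truecolor+alpha (what
    identify reports as TrueColorAlpha).
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR",
    # then width(4), height(4), bit depth(1), color type(1), ...
    color_type = data[25]
    return color_type in (4, 6)

# Demo on hand-built IHDR chunks: color type 6 (RGBA) vs 2 (RGB)
rgba = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1, 1) + bytes([8, 6, 0, 0, 0])
rgb = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1, 1) + bytes([8, 2, 0, 0, 0])
print(png_has_alpha(rgba), png_has_alpha(rgb))  # True False
```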