I have around 10k images that I need to get the Hex colour from for each one. I can obviously do this manually with PS or other tools but I'm looking for a solution that would ideally:
Run against a folder full of JPG images.
Extract the Hex from dead center of the image.
Output the result to a text file, ideally a CSV, containing the file name and the resulting Hex code on each row.
Can anyone suggest something that will save my sanity please? Cheers!
I would suggest ImageMagick which is installed on most Linux distros and is available for OSX (via homebrew) and Windows.
So, just at the command-line, in a directory full of JPG images, you could run this:
convert *.jpg -gravity center -crop 1x1+0+0 -format "%f,%[fx:int(mean.r*255)],%[fx:int(mean.g*255)],%[fx:int(mean.b*255)]\n" info:
Sample Output
a.png,127,0,128
b.jpg,127,0,129
b.png,255,0,0
Notes:
If you have more files in a directory than your shell can glob, you may be better of letting ImageMagick do the globbing internally, rather than using the shell, with:
convert '*.jpg' ...
If your files are large, you may better off doing them one at a time in a loop rather than loading them all into memory:
for f in *.jpg; do convert "$f" ....... ; done
Related
Can someone give me the script I would need to run to batch convert many *.jpeg files to *.png in Script.Fu in GIMP?
Currently I am spending way too much time manually exporting every image and it's a waste of time.
I can't install anything right now so can't use alternative applications.
Alright, after a lot of trials and errors I finally figured out how to convert one file format to another using only GIMP.
This is the Script-Fu script for conversion to PNG:
(
let* ((filename "{{filename}}")
(output "{{output}}")
(image (car (gimp-file-load 1 filename filename)))
(drawable (car (gimp-image-get-active-layer image))))
(file-png-save-defaults 1 image drawable output output)
)
Where {{filename}} is input file that needs to be converted (a jpeg file, for example), {{output}} is the output file that you need (it can be simply the same file name but with PNG extension)
How to run it: it can probably be improved
gimp -i -n -f -d --batch "{{one-line script-fu}}"
More about command line options you can find in GIMP online documentation.
The place that needs to be changed is {{one-line script-fu}} and it has to be... one-line! You can probably do all of this in one file using cmd (in case if you use Windows), but for me it was easier to use Python, so here's the script for it:
import subprocess, os
def convert_to_png(file_dds):
#Loads the command to run gimp cli (second code block)
#Note: remove "{{one-line script-fu}}" and leave one space after the --batch
with open("gimp-convert.bat", "r") as f:
main_script = f.read()
#Prepares the Script-Fu script to be run, replacing necessary file names and makes it one-line (the firs code block)
with open("gimp-convert-png.fu", "r") as f:
script = f.read().replace("\n", " ").replace("{{filename}}", file_dds) \
.replace("{{output}}", file_dds[:-3]+"PNG").replace("\\", "\\\\").replace("\"", "\\\"")
subprocess.run(main_script + " \"" + script + "\" --batch \"(gimp-quit 1)\"",
cwd = os.getcwd(),
shell = True)
And you should get your file converted to PNG!
I needed this for my texture upscale project, all of the code below you can find here.
Tested with GIMP 2.10
The real solution is to use ImageMagicks convert, as simple as magick convert some.jpeg some.png. There must be a "portable" version somewhere that you can use off a USB key.
Otherwise with Gimp, a much less manual way that doesn't need for a new script, since it uses an existing script:
get/install ofn-export-layers
File>Open the first JPEG
File>Open as layers more Jpegs. You can select several/all jpegs in one call (actual number limited by available RAM mostly). Once this is done you have many Jpegs stacked in the same image
File>Export all layers, making sure the name pattern you use ends in .png (the doc that comes with the script explains how that works).
I'm using inkscape to produce vector figures, save them in SVG format to export them later as "PDF + Latex" much in the vein of TUG inkscape+pdflatex guide.
Trying to produce a simple figure, however, turns out to be extremely frustating.
The first figure
is an example of the figure I would like to export in the form of "PDF + Latex" (shown here in PNG format).
If I export this to a PDF figure without latex macros the PDF produced looks exactly the same, except for some minor differences with the fonts used to render the text.
When I try to export this using the "PDF + Latex" option the PDF file produced consists on a PDF document of 2 pages (again as .png here):
This, of course, does not looks good when compiling my latex document. So far the guide at TUG has been very helpful, but I still can't produce a working "PDF + Latex" export from inkscape.
What am I doing wrong?
I worked around this by putting all the text in my drawing at the top
select text and then Object -> Raise to top
Inkscape only generates the separate pages if the text is below another object.
I asked this question on the Inkscape online discussion page and got some very helpful guidance from one of the users there.
This is a known bug https://bugs.launchpad.net/ubuntu/+bug/1417470 which was inadvertently introduced in Inkscape 0.91 in an attempt to fix a previous bug https://bugs.launchpad.net/inkscape/+bug/771957.
It seems this bug does two things:
The *.pdf_tex file will have an extra \includegraphics statement which needs to be deleted manually as described in the link to the bug above.
The *.pdf file may be split into multiple pages, regardless of the size of the image. In my case the line objects were split off onto their own page. I worked around this by turning off the text objects (opacity to zero) and then doing a standard PDF export.
If you can execute linux commands, this works:
# Generate the .pdf and .pdf_tex files
inkscape -z -D --file="$SVGFILE" --export-pdf="$PDFFILE" --export-latex
# Fix the number of pages
sed -i 's/\\\\/\n/g' ${PDFFILE}_tex;
MAXPAGE=$(pdfinfo $PDFFILE | grep -oP "(?<=Pages:)\s*[0-9]+" | tr -d " ");
sed -i "/page=$(($MAXPAGE+1))/,\${/page=/d}" ${PDFFILE}_tex;
with:
$SVGFILE: path of the svg
$PDF_FILE: path of the pdf
It is possible to include these commands in a script and execute it automatically when compiling your tex file (so that you don't have to manually export from inkscape each time you modify your svg).
Try it with an illustration that is less wide.
Alternatively, use a wider paperwidth setting.
I wanted to try and manually code a PostScript file. Why? Why not. From Wikipedia, I copied and pasted their basic Hello World program for PostScript which is:
%!PS
/Courier % name the desired font
20 selectfont % choose the size in points and establish
% the font as the current one
72 500 moveto % position the current point at
% coordinates 72, 500 (the origin is at the
% lower-left corner of the page)
(Hello world!) show % stroke the text in parentheses
showpage % print all on the page
When I try to open it in GIMP, I get
Opening 'Hello World.ps' failed. Could not interpret file 'Hello World.ps'
I can use ImageMagick to convert the file
convert "Hello World.ps" "Hello World.pdf"
convert "Hello World.ps" "Hello World.eps"
The PDF opens successfully and displays 'Hello World' in Courier.
The EPS yields the same error as the PS.
Is there something wrong with the syntax of the PS file?
Are PS files just not meant to be viewed directly, and should instead be viewed in a containing format like PDF?
Is GIMP just not able to handle this particular format of PS file?
To answer your questions, one by one:
You PostScript file is completely OK.
PostScript files can be viewed directly if you use a PostScript-capable viewer. (BTW: PDF may be regarded as a 'container format' -- but it never embeds a PostScript file for 'viewing'...)
For Gimp to be able and handle PS/EPS files, you need a working Ghostscript (installation link) on your system.
The same as point '3.' is true for your convert command: ImageMagick cannot handle PS/EPS or PDF input files unless there is a functional Ghostscript installation available on the local system. This would work as a so-called 'delegate', employed by ImageMagick to handle file formats which it cannot handle itself. A delegate converts such a format into a raster file, which ImageMagick in turn can then take over for further processing.
To check for available ImageMagick delegates, run these commands:
convert -list delegate
convert -list delegate | grep -Ei --color '(eps|ps|pdf)'
I tried this:
for ((i=1; i<=10; i++)); do convert 100MB.pdf 10MB.pdf 100MB.pdf; done
to create 100MB file but very quickly run out of RAM.
The most simple tool: use pdftk (or pdftk.exe, if you are on Windows):
pdftk 10_MB.pdf 100_MB.pdf cat output 110_MB.pdf
This will be a valid PDF. Download pdftk here.
Update: if you want really large (and valid!), non-optimized PDFs, use this command:
pdftk 100MB.pdf 100MB.pdf 100MB.pdf 100MB.pdf 100MB.pdf cat output 500_MB.pdf
or even (if you are on Linux, Unix or Mac OS X):
pdftk $(for i in $(seq 1 100); do echo -n "100MB.pdf "; done) cat output 10_GB.pdf
Windows: fsutil
Usage:
fsutil file createnew [filename].[extension] [# of bytes]
Source: https://www.windows-commandline.com/how-to-create-large-dummy-file/
Linux: fallocate
Usage:
fallocate -l 10G [filename].[extension]
Source: Quickly create a large file on a Linux system?
For those using macOS mkfile might be a good alternative to fallocate or dd
mkfile 100m some100mfile.pdf
reference -
https://stackoverflow.com/a/33478049/711401
according to http://www.maketecheasier.com/combine-multiple-pdf-files-with-pdftk/ the command should be
pdftk file1.pdf file2.pdf file3.pdf cat output newfile.pdf
note that you should download windows version of pdftk
I had problems using pdftk with the cat parameter had a better success with output.
The following command worked for me:
pdftk file_1.pdf file_1.pdf file_1.pdf file_1.pdf cat output.pdf
Using cat produced the following error:
Error: Unexpected text in page range end, here:
output.pdf
Exiting.
Acceptable keywords, for example: "even" or "odd".
To rotate pages, use: "north" "south" "east"
"west" "left" "right" or "down"
Errors encountered. No output created.
Done. Input errors, so no output created.
http://www.pdflabs.com/docs/pdftk-cli-examples/.
I created a 172mb PDF is no time at all.
If you want a really big valid PDF file, then
take all the biggest valid pdf you can
With a tool like PDF24Creator make a fusion of pdfs
It works for me to create a big file (140MB) after some minutes.
Under Linux there is pdfunite (part of poppler) that can concatenate the same pdf files to get one large pdf file:
pdfunite in.pdf in.pdf in.pdf out.pdf
see manpage
Partly it depends on what you are trying to increase the size of... number of pages, number of images, size of a single image. In my experience, the vast bulk (90%+) of any given 'large' PDF file will be the images.
You could try using a pro product like Adobe InDesign to quickly build a large project and export it as a PDF.
Adobe Acrobat Pro has built-in tools to optimize PDF files -- you try using the tools to 'un-optimize' your file. :)
One possibility is, if you are familiar with PDF format:
Create some simply PDF with one page (Page should be contained within one object)
Copy object multiply times
Add references to the copied objects to the page catalog
Fix xref table
You get an valid document of any size, entire file will be processed by a reader.
Have you tried using cat to combine the files?
cat 10MB.pdf 10MB.pdf > 20MB.pdf
That should result in a 20MB file.
I want to convert PDF to SVG please suggest some libraries/executable that will be able to do this efficiently. I have written my own java program using the apache PDFBox and Batik libraries -
PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl =
GenericDOMImplementation.getDOMImplementation();
// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);
// Ask the test to render into the SVG Graphics2D implementation.
for(int i = 0 ; i < document.getNumberOfPages() ; i++){
String svgFName = svgDir+"page"+i+".svg";
(new File(svgFName)).createNewFile();
// Create an instance of the SVG Generator.
SVGGraphics2D svgGenerator = new SVGGraphics2D(ctx,false);
Printable page = document.getPrintable(i);
page.print(svgGenerator, document.getPageFormat(i), i);
svgGenerator.stream(svgFName);
}
This solution works great but the size of the resulting svg files in huge.(many times greater than the pdf). I have figured out where the problem is by looking at the svg in a text editor. it encloses every character in the original document in its own block even if the font properties of the characters is the same. For example the word hello will appear as 6 different text blocks. Is there a way to fix the above code? or please suggest another solution that will work more efficiently.
Inkscape can also be used to convert PDF to SVG. It's actually remarkably good at this, and although the code that it generates is a bit bloated, at the very least, it doesn't seem to have the particular issue that you are encountering in your program. I think it would be challenging to integrate it directly into Java, but inkscape provides a convenient command-line interface to this functionality, so probably the easiest way to access it would be via a system call.
To use Inkscape's command-line interface to convert a PDF to an SVG, use:
inkscape -l out.svg in.pdf
Which you can then probably call using:
Runtime.getRuntime().exec("inkscape -l out.svg in.pdf")
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Runtime.html#exec%28java.lang.String%29
I think exec() is synchronous and only returns after the process completes (although I'm not 100% sure on that), so you shoudl be able to just read "out.svg" after that. In any case, Googling "java system call" will yield more info on how to do that part correctly.
Take a look at pdf2svg (also on on github):
To use
pdf2svg <input.pdf> <output.svg> [<pdf page no. or "all" >]
When using all give a filename with %d in it (which will be replaced by the page number).
pdf2svg input.pdf output_page%d.svg all
And for some troubleshooting see:
http://www.calcmaster.net/personal_projects/pdf2svg/
pdftocairo can be used to convert pdf to svg. pdfcairo is part of poppler-utils.
For example to convert 2nd page of a pdf, following command can be run.
pdftocairo -svg -f 1 -l 1 input.pdf
pdftk 82page.pdf burst
sh to-svg.sh
contents of to-svg.sh
#!/bin/bash
FILES=burst/*
for f in $FILES
do
inkscape -l "$f.svg" "$f"
done
I have encountered issues with the suggested inkscape, pdf2svg, pdftocairo, as well as the not suggested convert and mutool when trying to convert large and complex PDFs such as some of the topographical maps from the USGS. Sometimes they would crash, other times they would produce massively inflated files. The only PDF to SVG conversion tool that was able to handle all of them correctly for my use case was dvisvgm. Using it is very simple:
dvisvgm --pdf --output=file.svg file.pdf
It has various extra options for handling how elements are converted, as well as for optimization. Its resulting files can further be compacted by svgcleaner if necessary without perceptual quality loss.
inkscape (#jbeard4) for me produced svgs with no text in them at all, but I was able to make it work by going to postscript as an intermediary using ghostscript.
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdf2ps -dFirstPage=$page -dLastPage=$page -dNoOutputFonts $1.pdf $1_$page.ps
inkscape -z -l $1_$page.svg $1_$page.ps
rm $1_$page.ps
done
However this is a bit cumbersome, and the winner for ease of use has to go to pdf2svg (#Koen.) since it has that all flag so you don't need to loop.
However, pdf2svg isn't available on CentOS 8, and to install it you need to do the following:
git clone https://github.com/dawbarton/pdf2svg.git && cd pdf2svg
#if you dont have development stuff specific to this project
sudo dnf config-manager --set-enabled powertools
sudo dnf install cairo-devel poppler-glib-devel
#git repo isn't quite ready to ./configure
touch README
autoreconf -f -i
./configure && make && sudo make install
It produces svgs that actually look nicer than the ghostscript-inkscape one above, the font seems to raster better.
pdf2svg $1.pdf $1_%d.svg all
But that installation is a bit much, too much even if you don't have sudo. On top of that, pdf2svg doesn't support stdin/stdout, so the readily available pdftocairo (#SuperNova) worked a treat in these regards, and here's an example of "advanced" use below:
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdftocairo -svg -f $page -l $page $1.pdf - | gzip -9 >$1_$page.svg.gz
done
Which produces files of the same quality and size (before compression) as pdf2svg, although not binary-identical (and even visually, jumping between output of the two some pixels of letters shift, but neither looks wrong/bad like inkscape did).
Inkscape does not work with the -l option any more. It said "Can't open file: /out.svg (doesn't exist)". The long form that option is in the man page as --export-plain-svg and works but shows a deprecation warning. I was able to fix and update the command by using the -o option on Inkscape 1.1.2-3ubuntu4:
inkscape in.pdf -o out.svg