Batch file to convert all pdf to text (with xpdf)

I would like to run a batch conversion in a folder full of PDF files. I am using xpdf, and this is the command I run at the command prompt for a single file:
c:\Test\pdftotext -layout firstpdftoconvert.pdf firstpdfconverted.txt
Could somebody please help me do this in one go (converting only the PDF files) using a batch file? Thanks in advance!

Combining your question with this answer iterating over files of a directory:
for /r %i in (*.pdf) do "c:\Test\pdftotext" -layout "%i"
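This will work on all PDF files in the current directory and, because of /r, in its subdirectories as well. Since no output name is given, pdftotext writes each result next to its source file with a .txt extension.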
Be sure to double the % signs if you run this from a batch file.
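If you would rather drive the same loop from Python, here is a minimal sketch. It assumes pdftotext lives at c:\Test\pdftotext as in the question; the function names build_commands and convert_all are made up for illustration. Like for /r, it walks subdirectories too.

```python
import subprocess
from pathlib import Path


def build_commands(folder, exe=r"c:\Test\pdftotext"):
    """Return one pdftotext argument list per PDF under folder (recursive, like for /r)."""
    commands = []
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # a.pdf -> a.txt, next to the source file
        commands.append([exe, "-layout", str(pdf), str(txt)])
    return commands


def convert_all(folder, exe=r"c:\Test\pdftotext"):
    """Run pdftotext on every PDF found; raises CalledProcessError if one run fails."""
    for cmd in build_commands(folder, exe):
        subprocess.run(cmd, check=True)
```

Calling build_commands first lets you print and inspect the commands before anything is actually converted.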

Related

GIMP Script.Fu script to batch convert JPEG to PNG

Can someone give me the script I would need to run to batch convert many *.jpeg files to *.png in Script.Fu in GIMP?
Currently I am spending way too much time manually exporting every image and it's a waste of time.
I can't install anything right now so can't use alternative applications.

Alright, after a lot of trials and errors I finally figured out how to convert one file format to another using only GIMP.
This is the Script-Fu script for conversion to PNG:
(let* ((filename "{{filename}}")
       (output "{{output}}")
       (image (car (gimp-file-load 1 filename filename)))
       (drawable (car (gimp-image-get-active-layer image))))
  (file-png-save-defaults 1 image drawable output output))
Here {{filename}} is the input file to convert (a JPEG file, for example) and {{output}} is the output file you want (it can simply be the same file name with a PNG extension).
How to run it (this can probably be improved):
gimp -i -n -f -d --batch "{{one-line script-fu}}"
You can find more about the command-line options in the GIMP online documentation.
The part that needs to be changed is {{one-line script-fu}}, and it has to be... one line! You can probably do all of this in one file using cmd (if you use Windows), but for me it was easier to use Python, so here's the script for it:
import subprocess, os

def convert_to_png(file_dds):
    # Load the command that runs the GIMP CLI (second code block).
    # Note: remove "{{one-line script-fu}}" and leave one space after --batch.
    with open("gimp-convert.bat", "r") as f:
        main_script = f.read()
    # Prepare the Script-Fu script (first code block): fill in the file names,
    # collapse it to one line, and escape backslashes and double quotes.
    with open("gimp-convert-png.fu", "r") as f:
        script = f.read().replace("\n", " ").replace("{{filename}}", file_dds) \
            .replace("{{output}}", file_dds[:-3]+"PNG").replace("\\", "\\\\").replace("\"", "\\\"")
    subprocess.run(main_script + " \"" + script + "\" --batch \"(gimp-quit 1)\"",
                   cwd = os.getcwd(),
                   shell = True)
And you should get your file converted to PNG!
I needed this for my texture upscale project, all of the code below you can find here.
Tested with GIMP 2.10
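The placeholder substitution is the fiddly part of the script above, so here it is in isolation as a sketch. The helper name one_line_script is hypothetical; the replacement order simply mirrors what the answer's code does, and nothing here calls GIMP.

```python
def one_line_script(template, src, dst):
    """Fill in the placeholders, flatten to one line, and escape for a double-quoted argument."""
    script = template.replace("\n", " ")
    script = script.replace("{{filename}}", src).replace("{{output}}", dst)
    # Same order as the answer's script: double the backslashes first,
    # then escape the double quotes.
    return script.replace("\\", "\\\\").replace('"', '\\"')


if __name__ == "__main__":
    template = '(let* ((filename "{{filename}}")\n(output "{{output}}")) ...)'
    print(one_line_script(template, r"C:\img\a.jpg", r"C:\img\a.png"))
```

Printing the result before handing it to gimp makes it much easier to spot a quoting mistake.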

The real solution is to use ImageMagick's convert, as simple as magick convert some.jpeg some.png. There must be a "portable" version somewhere that you can run from a USB key.
Otherwise, with GIMP, there is a much less manual way that doesn't need a new script, since it uses an existing one:
Get/install ofn-export-layers.
File>Open the first JPEG.
File>Open as layers for the other JPEGs. You can select several or all of them in one call (the actual number is limited mostly by available RAM). Once this is done, you have many JPEGs stacked in the same image.
File>Export all layers, making sure the name pattern you use ends in .png (the doc that comes with the script explains how that works).

Batch extract Hex colour from images to file

I have around 10k images that I need to get the Hex colour from for each one. I can obviously do this manually with PS or other tools but I'm looking for a solution that would ideally:
Run against a folder full of JPG images.
Extract the Hex from dead center of the image.
Output the result to a text file, ideally a CSV, containing the file name and the resulting Hex code on each row.
Can anyone suggest something that will save my sanity please? Cheers!

I would suggest ImageMagick which is installed on most Linux distros and is available for OSX (via homebrew) and Windows.
So, just at the command-line, in a directory full of JPG images, you could run this:
convert *.jpg -gravity center -crop 1x1+0+0 -format "%f,%[fx:int(mean.r*255)],%[fx:int(mean.g*255)],%[fx:int(mean.b*255)]\n" info:
Sample Output
a.png,127,0,128
b.jpg,127,0,129
b.png,255,0,0
Notes:
If you have more files in a directory than your shell can glob, you may be better off letting ImageMagick do the globbing internally, rather than using the shell, with:
convert '*.jpg' ...
If your files are large, you may be better off doing them one at a time in a loop rather than loading them all into memory:
for f in *.jpg; do convert "$f" ....... ; done
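The command above prints decimal R,G,B values, while the question asked for hex codes. A small post-processing sketch in Python could bridge that gap; the function name rgb_rows_to_hex is made up, and it assumes each line looks exactly like the sample output (name,r,g,b).

```python
import csv
import io


def rgb_rows_to_hex(text):
    """Turn 'name,r,g,b' lines into (name, '#RRGGBB') tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        name, r, g, b = row
        rows.append((name, "#{:02X}{:02X}{:02X}".format(int(r), int(g), int(b))))
    return rows


if __name__ == "__main__":
    sample = "a.png,127,0,128\nb.jpg,127,0,129\nb.png,255,0,0\n"
    for name, hex_code in rgb_rows_to_hex(sample):
        print(f"{name},{hex_code}")
```

Redirect the ImageMagick output to a file, read it in, and write the resulting rows back out to get the name,hex CSV the question describes.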

Hive output to xlsx

I am not able to open an .xlsx file. Is this the correct way to output the result to an .xlsx file?
hive -f hiveScript.hql > output.xlsx

hive -S -f hiveScript.hql > output.xls
This will work

There is no easy way to create an Excel (.xlsx) file directly from Hive. You could write your query's output to a file with the older Excel extension (.xls), as in the answers given above, and it would open in Excel properly (with an initial warning in the latest versions of Office), but in essence it is just a text file with an .xls extension. If you open this file with any text editor, you will see the contents of the query output.
Take any .xlsx file on your system, open it with a text editor, and see what you get. It will be all junk characters, since it is not a simple text file.
Having said that, many programming languages let you read a text file and create an .xlsx from it. Since no information is provided or requested on this, I will not go into details. However, you may use pandas in Python to create Excel files.

Output a CSV or TSV file instead, then convert it; I used Python (the pandas library) for the conversion.
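As a rough sketch of the first half of that idea, the snippet below rewrites tab-separated text as a proper CSV using only the standard library. It assumes the redirected Hive output is tab-separated, which as far as I know is the CLI default; the name tsv_to_csv is made up. If pandas is installed, pandas.read_csv with sep="\t" followed by DataFrame.to_excel (which needs an Excel writer package such as openpyxl) would take you to a real .xlsx instead.

```python
import csv


def tsv_to_csv(tsv_path, csv_path):
    """Rewrite tab-separated text as comma-separated, quoting fields where needed."""
    with open(tsv_path, newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_NONE: treat any quote characters in the input as ordinary text.
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        csv.writer(dst).writerows(reader)
```

Usage would be something like hive -S -f hiveScript.hql > output.tsv followed by tsv_to_csv("output.tsv", "output.csv"); Excel opens the resulting CSV without the format warning.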

I am away from my setup right now so really cannot test this. But you can give this a try in your hive shell:
hive -f hiveScript.hql >> output.xls

Inkscape "PDF + Latex" export

I'm using Inkscape to produce vector figures, saving them in SVG format and exporting them later as "PDF + Latex", much in the vein of the TUG inkscape+pdflatex guide.
Trying to produce a simple figure, however, turns out to be extremely frustrating.
The first figure is an example of what I would like to export in the form of "PDF + Latex" (shown here in PNG format).
If I export this to a PDF figure without LaTeX macros, the PDF produced looks exactly the same, except for some minor differences in the fonts used to render the text.
When I try to export this using the "PDF + Latex" option, the PDF file produced is a 2-page document (again shown as .png here):
This, of course, does not look good when compiling my LaTeX document. So far the guide at TUG has been very helpful, but I still can't produce a working "PDF + Latex" export from Inkscape.
What am I doing wrong?

I worked around this by putting all the text in my drawing at the top
select text and then Object -> Raise to top
Inkscape only generates the separate pages if the text is below another object.

I asked this question on the Inkscape online discussion page and got some very helpful guidance from one of the users there.
This is a known bug https://bugs.launchpad.net/ubuntu/+bug/1417470 which was inadvertently introduced in Inkscape 0.91 in an attempt to fix a previous bug https://bugs.launchpad.net/inkscape/+bug/771957.
It seems this bug does two things:
The *.pdf_tex file will have an extra \includegraphics statement which needs to be deleted manually as described in the link to the bug above.
The *.pdf file may be split into multiple pages, regardless of the size of the image. In my case the line objects were split off onto their own page. I worked around this by turning off the text objects (opacity to zero) and then doing a standard PDF export.

If you can execute linux commands, this works:
# Generate the .pdf and .pdf_tex files
inkscape -z -D --file="$SVGFILE" --export-pdf="$PDFFILE" --export-latex
# Fix the number of pages
sed -i 's/\\\\/\n/g' ${PDFFILE}_tex;
MAXPAGE=$(pdfinfo $PDFFILE | grep -oP "(?<=Pages:)\s*[0-9]+" | tr -d " ");
sed -i "/page=$(($MAXPAGE+1))/,\${/page=/d}" ${PDFFILE}_tex;
where:
$SVGFILE: path of the SVG
$PDFFILE: path of the PDF
It is possible to include these commands in a script and execute it automatically when compiling your tex file (so that you don't have to manually export from inkscape each time you modify your svg).
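If sed is not available, the page clean-up can be approximated in Python. This is only a sketch of the same idea, not a line-for-line port: it drops every line of the .pdf_tex text that refers to a page number beyond the real page count, and it skips the first sed step that splits lines on double backslashes. The name drop_extra_pages is hypothetical, and max_page is whatever pdfinfo reports for the PDF.

```python
import re


def drop_extra_pages(pdf_tex, max_page):
    """Remove lines that reference page=N where N is greater than max_page."""
    kept = []
    for line in pdf_tex.splitlines(keepends=True):
        match = re.search(r"page=(\d+)", line)
        if match and int(match.group(1)) > max_page:
            continue  # this line points at a page the PDF does not have
        kept.append(line)
    return "".join(kept)
```

Read the .pdf_tex file, pass its contents through the function, and write the result back to the same path.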

Try it with an illustration that is less wide.
Alternatively, use a wider paperwidth setting.

Error in Converting PDF to PostScript with GhostScript, Access is denied Unable to open command line file _.at

I installed ghostscript and updated the appropriate path variables ... however, I'm getting an error when I try to execute this command:
C:\PROGRA~1\gs\gs8.64\lib>pdf2ps mydocument.pdf mydocument.ps
Access is denied.
Unable to open command line file _.at
Is this the right command? Did I miss some configuration or path setting? Otherwise, is there a sane method of doing this conversion?

"Access is denied" suggests a problem with access to paths. I'd suggest rechecking the folder permissions (although I'm sure you've done that). Also, you might want to try running gswin32c.exe instead of pdf2ps to see if you still get the error; you might get something a little more specific.
gswin32c.exe ^
-dNOPAUSE ^
-dBATCH ^
-sDEVICE=pswrite ^
-sOutputFile=mydocument.ps ^
mydocument.pdf

Using pdf2ps runs a batch file, really named pdf2ps.bat or pdf2ps.cmd. You can easily look up and understand its "source code". If you do, you'll see it tries to write some of its command-line options into a temporary file named _.at, in order to overcome the 128-character limit on DOS/cmd command-line length that exists on some Win/DOS platforms.
Since you are invoking pdf2ps from the %programs% directory where Ghostscript is installed, you don't seem to be using an account that is permitted to write there, which would explain why _.at cannot be created. :-)

With Ghostscript 9.10 the pswrite device didn't work for me. I tried ps2write instead and it worked, so the command that worked for me is:
gswin32c.exe ^
-dNOPAUSE ^
-dBATCH ^
-sDEVICE=ps2write ^
-sOutputFile=mydocument.ps ^
mydocument.pdf
If even that doesn't work, type gswin32c.exe -h to get help; it lists all the available devices, as shown below:
Default output device: display
Available devices:
bbox bit bitcmyk bitrgb bj10e bj200 bjc600 bjc800 bmp16 bmp16m bmp256
bmp32b bmpgray bmpmono bmpsep1 bmpsep8 cdeskjet cdj550 cdjcolor cdjmono
cp50 declj250 deskjet devicen display djet500 djet500c eps9high eps9mid
epson epsonc epswrite ibmpro ijs inkcov jetp3852 jpeg jpegcmyk jpeggray
laserjet lbp8 lj250 ljet2p ljet3 ljet3d ljet4 ljet4d ljetplus m8510
mswindll mswinpr2 necp6 nullpage pamcmyk32 pamcmyk4 pbm pbmraw pcx16
pcx24b pcx256 pcxcmyk pcxgray pcxmono pdfwrite pgm pgmraw pgnm pgnmraw pj
pjxl pjxl300 pkmraw plan planc plang plank planm plib plibc plibg plibk
plibm png16 png16m png256 pngalpha pnggray pngmono pngmonod pnm pnmcmyk
pnmraw ppm ppmraw **ps2write** psdcmyk psdrgb pxlcolor pxlmono r4081 spotcmyk
st800 stcolor svg t4693d2 t4693d4 t4693d8 tek4696 tiff12nc tiff24nc
tiff32nc tiff48nc tiff64nc tiffcrle tiffg3 tiffg32d tiffg4 tiffgray
tifflzw tiffpack tiffscaled tiffscaled24 tiffscaled32 tiffscaled4
tiffscaled8 tiffsep tiffsep1 txtwrite uniprint xpswrite
Search path:
C:\Program Files (x86)\gs\gs9.10\bin ;
C:\Program Files (x86)\gs\gs9.10\lib ;
C:\Program Files (x86)\gs\gs9.10\fonts ; %rom%Resource/Init/ ;
%rom%lib/ ; c:/gs/gs9.10/Resource/Init ; c:/gs/gs9.10/lib ;
c:/gs/gs9.10/Resource/Font ; c:/gs/fonts
Initialization files are compiled into the executable.
For convenience only, I have placed stars (*) around ps2write in the list above.
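To avoid guessing which device your build supports, the help text can be checked programmatically. The sketch below is an illustration under assumptions: it expects the output of gswin32c.exe -h to have the "Available devices:" and "Search path:" headings shown above, and the names available_devices and gs_pdf_to_ps_args are made up.

```python
def available_devices(help_text):
    """Collect the device names listed between 'Available devices:' and 'Search path:'."""
    devices = []
    in_list = False
    for line in help_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Available devices:"):
            in_list = True
        elif stripped.startswith("Search path:"):
            break
        elif in_list:
            devices.extend(stripped.split())
    return devices


def gs_pdf_to_ps_args(pdf, ps, device="ps2write", exe="gswin32c.exe"):
    """Build the same argument list as the command above, ready for subprocess.run."""
    return [exe, "-dNOPAUSE", "-dBATCH", f"-sDEVICE={device}", f"-sOutputFile={ps}", pdf]
```

You could capture the help text with subprocess.run(["gswin32c.exe", "-h"], capture_output=True, text=True).stdout, pick ps2write if it is in the list, and then pass the built argument list to subprocess.run.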

Use GIMP to open the PDF file.
File -> Export -> PostScript.

If you want to use the gs executable, you have to change its permissions. In the command prompt, go to the location of the gs executable and run chmod 755 gs.

You are not running the command from the right place. First, find the Ghostscript executable, which by default is installed at
c:\Program Files (x86)\gs\gs9.20(your ghostscript version)\bin\gswin32c.exe
There are two executables:
1- gswin32.exe
2- gswin32c.exe
Use the second one, because it executes commands at cmd rather than in the gs command window.
Now write the command like this:
...bin\gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pswrite -sOutputFile=mydocument.ps mydocument.pdf
Note: please check the file path carefully. One more thing: a file path like
"D:\htmltopdf\document.ps"
should be written as
"D:/htmltopdf/document.ps"
That is, replace backslashes with forward slashes, but only in the file path.
The command line is also case-sensitive, so be careful with case.
