Batch Convert .xls files and add background image | header/footer - background

I am running the latest Arch Linux and tried to batch convert xls files to PDF, which works fine with the following command:
lowriter -convert-to pdf:writer_pdf_Export *.xls
But those xls files also need a background image, header, and footer with the company information. Don't ask why they have done it that way ;). I just need to figure out a neat and fast solution for this.
Any help is much appreciated.
Regards and thanks in advance
Sascha

I was able to figure out a simple and fast solution myself. I found a tool in the Arch AUR called pdftk, which works as follows:
pdftk fg.pdf background bg.pdf output new.pdf
I just wrote a simple PHP script to execute it in a loop.
Done, thanks anyway.
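The PHP script itself is not shown, so the following is only a rough Python sketch of the same loop. It assumes the converted PDFs sit in one directory and that a single bg.pdf holds the company background; the function names, directory names, and output layout are hypothetical. The pdftk arguments are the ones from the command above.

```python
import subprocess
from pathlib import Path


def build_pdftk_command(fg, bg, out_dir):
    """Return the pdftk argument list that puts `bg` behind `fg`.

    The output file keeps the foreground file's name inside `out_dir`
    (an assumed layout, not something the original post specifies).
    """
    fg = Path(fg)
    out = Path(out_dir) / fg.name
    return ["pdftk", str(fg), "background", str(bg), "output", str(out)]


def stamp_all(src_dir, bg, out_dir):
    """Run pdftk once for every PDF found in `src_dir`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for fg in sorted(Path(src_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if pdftk fails on a file
        subprocess.run(build_pdftk_command(fg, bg, out_dir), check=True)
```

Calling stamp_all("converted", "bg.pdf", "stamped") would then write one stamped copy per input file, provided pdftk is on the PATH.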

Related

PDF Table Lines Missing from GhostScript

I am trying to convert a PDF file to an image format (ideally PNG), but some of the table lines do not render in the output, which is an issue since the purpose of my conversion is to use computer vision on it.
I unfortunately do not have access to the file used to generate the PDF.
Thank you in advance for your help!
Attached is the ghostscript rendering vs the actual pdf:
Original
GhostScript
EDIT: Thanks for the answers. Here is what I had already tried:
Changing the scaling & Changing the Antialiasing (I doubt that any combination of this will work in Ghostscript at this point)
Converting to PostScript and then to PNG/PDF
Saving from a Browser
Saving from various virtual printers to PDF
Using Poppler to do the rendering
All to no avail. Digging deeper, I found some interesting things which may be helpful. Ghostscript does recognize the lines when using -sDEVICE=x11 and -sDEVICE=ps2write. That is, using Ghostscript to display the PDF works, but using it to convert the PDF into anything other than PostScript does not.
Also, printing into a PDF from Adobe Acrobat does fix my problem, however this is something that I need to be able to do from the command line on thousands of files.
Hope this helps!
EDIT2:
Link to a concerned file
https://transfer.sh/PuIF90/e176ad9824ddc6cb5e6aead2d389c131-filer.pdf
I thought that I would share the fix that I found. It turns out that many of the PDFs we need to process were generated with a specific HTML5-to-PDF conversion tool, which turns each line in the PDF into a rectangle with a width or height of 0. The solution for me has been to automate decompressing the PDFs and searching the resulting text for "A A A A re", where every "A" is a number. If the last or next-to-last A is zero, I change it to 1.
For instance (once again, after decompressing the PDF):
1000 2000 0 14 re
to
1000 2000 1 14 re
Hope this helps someone else out there. Let me know if there is a more elegant way of doing this; I am still a beginner at all things PDF.
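The search-and-replace described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the script actually used: it expects the already-decompressed content as a string (for a whole file, decoding the bytes as latin-1 and encoding them back keeps every byte intact), and the names NUM, RECT and fix_zero_rects are made up for the example. Each zero is replaced by "1" padded to the same length, so byte offsets inside the file do not shift.

```python
import re

# A PDF number: optional sign, then digits with an optional fraction.
NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)"

# Four numbers followed by the "re" (rectangle) operator: x y width height re
RECT = re.compile(
    rf"(?<![\w.+-])(?P<x>{NUM})(?P<s1>\s+)(?P<y>{NUM})(?P<s2>\s+)"
    rf"(?P<w>{NUM})(?P<s3>\s+)(?P<h>{NUM})(?P<s4>\s+)re\b"
)


def _widen(token):
    """Turn a zero width/height into 1, padded to the original token length."""
    return "1".ljust(len(token)) if float(token) == 0 else token


def fix_zero_rects(content):
    """Give every zero-width or zero-height rectangle a size of 1."""
    def repl(m):
        return (m["x"] + m["s1"] + m["y"] + m["s2"]
                + _widen(m["w"]) + m["s3"]
                + _widen(m["h"]) + m["s4"] + "re")
    return RECT.sub(repl, content)


print(fix_zero_rects("1000 2000 0 14 re"))  # prints: 1000 2000 1 14 re
```

The pattern is applied to the whole text, so in principle it could also touch a matching sequence outside a content stream; checking a few output files by eye before processing thousands would be prudent.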

Tesseract : Line detection too sensitive

I am trying to extract the text from .pdf files.
They are first converted to images, then given to Tesseract.
The detection is good, but it produces too many line breaks.
For example, if the page is tilted a bit to the right, the sentence:
"I like Tesseract for reading text"
becomes:
"text read for Tesseract like I"
And that is already after post-processing, because the raw text is:
"textreadforTesseractlikeI"
The bug has been occurring since the source .pdf files have been at 300 DPI. I understand that the problem comes from the resolution, but I cannot find out how to solve it.
Here is my Tesseract command:
Tesseract.exe dummy.pdf dumy-ocr.pdf --psm 12 --dpi 300 -l bvr+fra+eng+deu hocr pdf
First, I would like to solve the problem of too many lines.
Then I would like to find out how to make the image perfectly straight.
Thank you in advance for your help.
https://i.stack.imgur.com/crmdO.jpg
You seem to be working backwards.
The "many" lines and thus word reversal are due to the anti-clockwise rotation.
text"
reading
for
Tesseract
like
"I
Fix that first and then the words will naturally all be placed on the same lines.
If you use Leptonica in conjunction with Tesseract, it is supposed to help with the pre-processing, including deskew.
However, there is a very small but powerful open source GUI and command-line tool for Windows, Linux, and macOS that you could use from a shell; see https://galfar.vevb.net/wp/projects/deskew/. It is also available on GitHub as an AppVeyor CI artifact, so for the most up-to-date build (5 days old at the time of writing), follow the green tick at https://github.com/galfar/deskew

Batch plotting dwg & dxf files to pdf in AutoCAD

I have a problem with batch plotting files in AutoCAD.
A similar question is here, but it only solves the issue for a single file.
Convert dwg file to pdf
The main tutorial, in turn, doesn't explain it in enough detail.
https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/How-to-publish-multiple-drawings-into-PDF-in-AutoCAD.html
My problem is with creating the batch list itself.
It looks like this.
I get the message "Layout not initialized", as shown below:
As a result, nothing is plotted.
Is anyone able to help?
One solution is to run the PAGESETUP command and set the page layout for each document, as follows:
https://knowledge.autodesk.com/support/autocad/troubleshooting/caas/sfdcarticles/sfdcarticles/layout-not-initialized-when-publish.html

Use photoshop in command window hiddenly

Is there a way to use Photoshop to convert an image from the command line?
For example:
Photoshop.exe -convert c:/img1.tif c:/img1.png
I want to run this command from the command line, without opening the Photoshop application.
I don't want to see the Photoshop window.
Photoshop.exe -convert c:/img1.tif c:/img1.png
That command won't actually do anything. Photoshop scripts come in three flavours: Visual Basic, JavaScript and AppleScript. There are no commands to "convert" between file types. You can write a script to save a .tiff as a .png, BUT it will involve opening the Photoshop application.
I think you're actually after ImageMagick, which can do conversions like the one above.
You can use ImageMagick, which is available for Windows from here. The command you want is convert, like this:
convert c:\img1.tif c:\img1.png

Why is Texmaker not refreshing the pdf?

I am editing a LaTeX document in Texmaker, and the PDF does not get updated. I don't know why. My Quick Build is set to pdflatex + view pdf.
I am using Texmaker version 4.4.1 on Windows 7.
I previously used the same tex file in another version of Texmaker and it worked, but now that I am trying to change things in the new version, it doesn't.
No matter what I change in the LaTeX file, no errors or other messages appear while compiling, and when it finishes there is no change in the PDF at all.
Do you know what could be the cause for this issue?
Thank you very much to everyone!
Jorge