Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
Improve this question
I have spent the whole day trying to convert several .pdf files containing traffic-flow data for São Paulo into spreadsheets (MS Office Excel, or LibreOffice Calc on Ubuntu). When I open a .pdf file with LibreOffice Calc, it opens LibreOffice Draw instead, and I can't get a spreadsheet.
The most promising method I found uses pdftotext. It works, and I can get the tables into LibreOffice Calc, but only after adjusting the columns manually.
My problem is that I have so many .pdf files that this would take me a lot of time.
Does anyone know a better method?
Another option is to use Okular (http://okular.kde.org).
It has a table selection tool (Ctrl+5).
You can select a table, add lines for additional rows and columns, and copy the resulting table to the clipboard.
It works fine for me.
Tabula can work quite well. PDF is not an easy format to extract structured information from, so it's not always possible.
Maybe the -layout option would be useful for you. With this option set, pdftotext tries to keep the column layout in the resulting text file.
Then import the text file into LibreOffice Calc with the appropriate import settings. When you open a .txt file in Calc, you are asked how to parse its contents. Under Separator Options, choose "Separated by", then tick both Space and Merge delimiters. This way, Calc can restore the column structure (assuming the cell data doesn't contain spaces).
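With many files, the pdftotext step can be batched. Below is a minimal Python sketch, assuming poppler's `pdftotext` is on the PATH. It runs `pdftotext -layout` on each PDF and writes a .csv file next to it, which Calc or Excel can open directly. Instead of Calc's Space + Merge delimiters, it splits columns on runs of two or more spaces. That rule is an assumption, not a pdftotext feature: it keeps single spaces inside cells, but tables whose columns are separated by only one space will still need manual fixing. The script name `pdf2csv.py` is hypothetical.

```python
import csv
import re
import subprocess
import sys
from pathlib import Path

# Heuristic (an assumption, not a pdftotext feature): in -layout output, columns
# are separated by two or more spaces, so single spaces inside a cell survive.
COLUMN_GAP = re.compile(r"\s{2,}")


def layout_text_to_rows(text: str) -> list[list[str]]:
    """Split `pdftotext -layout` output into rows of cells, skipping blank lines."""
    rows = []
    for line in text.splitlines():  # also splits on the form feeds between pages
        stripped = line.strip()
        if stripped:
            rows.append(COLUMN_GAP.split(stripped))
    return rows


def pdf_to_csv(pdf_path: Path) -> Path:
    """Run pdftotext -layout on one PDF and write a .csv file next to it."""
    # An output name of "-" makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    csv_path = pdf_path.with_suffix(".csv")
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(layout_text_to_rows(result.stdout))
    return csv_path


if __name__ == "__main__":
    # Usage: python pdf2csv.py *.pdf
    for name in sys.argv[1:]:
        print("wrote", pdf_to_csv(Path(name)))
```

Splitting in a script rather than in Calc's import dialog means the same rule is applied to every file, so only the files where it fails need manual attention.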
A tool called Able2Extract can do exactly what you want, with minimal errors.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
In a nutshell: I want to take an existing PDF, read just the tool numbers, and add a barcode for each tool number to the PDF/Word document (Word can convert PDFs).
I need some ideas for getting data out of a PDF that is a printout from an Access database.
We fill out a few fields on the (Access) form, pull up the document, and print it. The database is not available for me to play around with, so I wanted to print to a PDF, read the "TOOL NUMBERS" using Tabula or something similar, and export them to Excel. Then I would turn them into either Code 39 Extended barcodes or QR codes, import the original PDF into Word, insert the barcode under each tool number, and print. Yes, crazy as it sounds, this is the only workaround I can come up with.
I have already written the Excel column of tool numbers out as QR codes (.png files, "toolnumber.png"). Alternatively, is there a way for me to find the MDB file and extract the data from it? The column in that data file should be "ToolNumber".
Since you ultimately want the output in Excel, there's no need to involve Word or PDFs in the process at all - simply query the Access DB directly from Excel and format the output as required.
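If you would rather script the query than set it up in Excel, the same idea (read the table directly instead of going through a PDF) can be sketched in Python. This is a minimal sketch under assumptions: only the `ToolNumber` column comes from the question, the table name `Tools` is hypothetical, and the function accepts any DB-API 2.0 connection. The demo uses an in-memory SQLite table as a stand-in for the Access file.

```python
import sqlite3


def fetch_tool_numbers(conn, table: str, column: str = "ToolNumber") -> list[str]:
    """Return the non-null values of `column` in `table` as strings.

    `conn` can be any DB-API 2.0 connection (pyodbc, sqlite3, ...). Identifiers are
    wrapped in [brackets], a quoting style that both Access and SQLite accept. They
    are interpolated into the SQL, so only pass trusted table/column names.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT [{column}] FROM [{table}] WHERE [{column}] IS NOT NULL")
    return [str(row[0]) for row in cur.fetchall()]


if __name__ == "__main__":
    # Demo: an in-memory SQLite table stands in for the Access database.
    # "Tools" is a hypothetical table name.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE Tools (ToolNumber TEXT)")
    demo.executemany("INSERT INTO Tools VALUES (?)", [("T-100",), (None,), ("T-200",)])
    print(fetch_tool_numbers(demo, "Tools"))
```

Against the real .mdb file on Windows, the connection would come from the third-party pyodbc package with the Microsoft Access ODBC driver installed, e.g. `pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\path\to\database.mdb")`. The driver and the path are assumptions about your setup. `cursor.tables()` in pyodbc can help you find the real table name.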
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
Improve this question
I have a PDF with the following text:
Localização
When I copy and paste this text, I get:
localizac¸ ˜ao
Any help is appreciated.
Thanks
For computer-generated documents (not OCR'd/scanned):
Some systems, such as LaTeX, generate composed characters because the font in use doesn't contain (or support) the needed glyph in the current encoding. As a consequence, these characters are generated on the fly using composed glyphs, making two glyphs look like one:
A + ´ -> Á
Because of this trick, the selectable text in the PDF contains two separate glyphs, even though graphically both are rendered at the same spot.
The quick solution:
Luckily, these generated character pairs do not occur naturally in well-written text (probably in any language), so it is quite safe to simply search and replace them using a case-sensitive method. You can do it manually in your favorite text editor, or with a Python script, etc. Automated or not, the principle of the solution is the same.
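A minimal Python sketch of the automated search/replace follows. It assumes the copied text uses the spacing accents seen in the question: the cedilla ¸ (U+00B8) appears after its base letter, and the tilde ˜ (U+02DC) appears before it. It treats ´, ˆ, and ¨ the same way as the tilde. Instead of listing every pair, the sketch rewrites each pair as letter + combining mark and lets Unicode NFC normalization produce the single precomposed character. Dropping one space after a cedilla is a heuristic aimed at the stray space in `localizac¸ ˜ao`. The grave accent is left out because it collides with ordinary backticks. Extend the table for whatever other accents your PDFs produce.

```python
import re
import unicodedata

# Spacing accents that the copied text places BEFORE the base letter, mapped to
# the equivalent combining marks (an assumed table; extend it as needed).
PREFIX_ACCENTS = {
    "\u02dc": "\u0303",  # ˜ SMALL TILDE -> COMBINING TILDE
    "\u00b4": "\u0301",  # ´ ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "\u02c6": "\u0302",  # ˆ MODIFIER LETTER CIRCUMFLEX -> COMBINING CIRCUMFLEX
    "\u00a8": "\u0308",  # ¨ DIAERESIS -> COMBINING DIAERESIS
}
PREFIX_RE = re.compile("([" + "".join(PREFIX_ACCENTS) + "])([A-Za-z])")

# The spacing cedilla (U+00B8) comes AFTER its base letter; one stray space
# following it is dropped as well (a heuristic).
CEDILLA_RE = re.compile("([A-Za-z])\u00b8 ?")


def fix_composed_glyphs(text: str) -> str:
    """Rejoin accent/letter pairs copied from LaTeX-style PDFs into single characters."""
    text = CEDILLA_RE.sub(lambda m: m.group(1) + "\u0327", text)  # combining cedilla
    text = PREFIX_RE.sub(lambda m: m.group(2) + PREFIX_ACCENTS[m.group(1)], text)
    # NFC composes letter + combining mark into one code point (c + U+0327 -> ç).
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(fix_composed_glyphs("localizac\u00b8 \u02dcao"))
```

Because the fix is a pure string function, it can be applied to text pasted from the clipboard or to the output of a text extractor.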
It is important to know how you are copying the text. If you are merely using a text editor to alter the underlying PDF code, you are going to have problems. PDF files are organized in a very complicated, non-human-readable way that requires specialized programs to alter successfully. If you want to make this change, you will need to use a PDF editor to either edit the document or generate a new one from scratch.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
I would like to analyze Excel files, especially those that contain VBA programs. Because I plan to run this analysis on lots of Excel files one by one, I don't want to open each file in Microsoft Excel to analyze it.
One difficulty is finding and parsing the VBA macros in an Excel file. We know that an Excel file can be opened as a .zip archive containing lots of .xml files and a vbaProject.bin, and it is pretty certain that the VBA macros are in vbaProject.bin. The problem is how to read it.
Does anyone know if there is any tool or API to find and parse the VBA macros?
Does anyone know if there is any tool or API to read vbaProject.bin?
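The first step described in the question, pulling vbaProject.bin out of the workbook's ZIP container without opening Excel, can be sketched with Python's standard zipfile module. This sketch assumes the usual Excel package layout, where the part is stored as `xl/vbaProject.bin`. It also checks the 8-byte OLE Compound File signature, because vbaProject.bin is itself an OLE container whose streams still have to be parsed, as the answers below describe.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union

# Every OLE / Compound File Binary file starts with these 8 bytes.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_vba_project(workbook: Union[str, bytes, BinaryIO]) -> Optional[bytes]:
    """Return xl/vbaProject.bin from an Excel workbook, or None if it has no VBA project.

    `workbook` may be a path, the raw file bytes, or a binary file object.
    """
    if isinstance(workbook, (bytes, bytearray)):
        workbook = io.BytesIO(workbook)
    with zipfile.ZipFile(workbook) as zf:
        try:
            data = zf.read("xl/vbaProject.bin")
        except KeyError:  # no such member: the workbook carries no macros
            return None
    if not data.startswith(OLE_MAGIC):
        raise ValueError("vbaProject.bin is not an OLE compound file")
    return data


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        blob = read_vba_project(path)
        print(path, "no macros" if blob is None else f"{len(blob)} bytes of vbaProject.bin")
```

This only gets you the container. Reading the module source out of it is the part the MS-OVBA specification covers.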
There is a very large PDF from Microsoft that documents how to extract functions from vbaProject.bin:
https://interoperability.blob.core.windows.net/files/MS-OVBA/%5bMS-OVBA%5d.pdf [Source]
This resource is current and available as of June 27, 2019. In the event that this link goes stale (Microsoft periodically changes its permalink structure or otherwise alters how it hosts its documentation), search for MS-OVBA.pdf.
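Inside vbaProject.bin, each module's source code is stored in a compressed form that MS-OVBA defines in its "Compression" section. Below is a minimal Python sketch of that decompression algorithm, written from the specification's description. It assumes you already have the compressed bytes. Getting them means reading the OLE streams and the (itself compressed) `dir` stream to find each module's source offset, which is not shown here.

```python
def decompress_ovba(data: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (signature byte 0x01, then chunks)."""
    if not data or data[0] != 0x01:
        raise ValueError("missing 0x01 signature byte")
    out = bytearray()
    pos = 1
    while pos < len(data):
        header = int.from_bytes(data[pos:pos + 2], "little")
        chunk_end = pos + (header & 0x0FFF) + 3  # chunk size includes the 2-byte header
        if ((header >> 12) & 0x7) != 0b011:
            raise ValueError("bad chunk signature")
        pos += 2
        chunk_start = len(out)  # decompressed offset where this chunk begins
        if not (header & 0x8000):
            # Flag bit clear: the chunk holds 4096 uncompressed bytes.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        while pos < chunk_end:
            flags = data[pos]  # one flag bit per following token, LSB first
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not ((flags >> bit) & 1):
                    out.append(data[pos])  # literal token: copy one byte
                    pos += 1
                    continue
                # Copy token: the split between offset and length bits depends
                # on how far into the current chunk we are.
                token = int.from_bytes(data[pos:pos + 2], "little")
                pos += 2
                difference = len(out) - chunk_start
                bit_count = max((difference - 1).bit_length(), 4)  # ceil(log2), min 4
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):  # byte by byte: source may overlap destination
                    out.append(out[-offset])
        pos = chunk_end
    return bytes(out)
```

For example, `decompress_ovba(bytes.fromhex("01 05 B0 08 61 62 63 03 20"))` expands three literal bytes plus one copy token (offset 3, length 6) into `b"abcabcabc"`.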
Some additional information that may or may not complement the above:
http://www.codeproject.com/Articles/15216/Office-2007-bin-file-format
Here's an article, updated in 2017, that lists several tools that help with this. I was able to extract the VBA code from a vbaProject.bin using the OfficeMalScanner tool. Edit: some months after I used this tool successfully, Windows started detecting malware in it. The link was www dot reconstructer dot org / code / OfficeMalScanner.zip. Use it at your own risk; it worked for me to extract a lot of needed VBA code from the project after the source was lost. Edit 2: per @HackSlash's comment below, this is probably a false positive.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I have a CV in PDF format that needs to be converted to LaTeX code. Is there a way to 'reverse engineer' the PDF so that I can get the LaTeX code?
Short answer: No
Slightly longer answer:
You may get the plain text back, but you can't restore the original LaTeX source.
You may be able to import the PDF into a word processor and export LaTeX from it (either AbiWord or KOffice can do that, if I remember correctly), but the result will not be pretty. This won't get you the original LaTeX, only a very poor approximation. I think recreating the CV from scratch in LaTeX will be easier.
No. Here is an explanation:
The job just can’t be done automatically: DVI, PostScript and PDF are “final” formats, supposedly not susceptible to further editing — information about where things came from has been discarded. So if you’ve lost your (La)TeX source (or never had the source of a document you need to work on) you’ve a serious job on your hands. In many circumstances, the best strategy is to retype the whole document, but this strategy is to be tempered by consideration of the size of the document and the potential typists’ skills.
Just as you can automatically reverse engineer C code from a compiled exe (though not very readable and with certain limitations), you should be able to reverse engineer the LaTeX code from a compiled PDF. There just don't seem to be any tools around that even attempt this. It would certainly be an interesting thing to implement.
There's some research going on in that area:
http://www.fi.muni.cz/~sojka/dml-2011-baker-sexton-sorge.pdf
The LaTeX file will have been printed to PDF, converting the contents into PostScript commands.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I'm looking for an easy-to-use, free source code comparison app for Windows that highlights differences side-by-side between two pieces of source code. Some apps get close to what I want, but they are too restrictive: they require you to load entire files and compare them in their entirety. Sometimes I just want to compare a section of my file, such as a single function, which may be in totally different locations in the two versions I'm comparing, making it hard to find in both panes in large files. Basically, I'd like to be able to simply edit/copy/paste the content in both panes rather than being restricted to files. That way I can paste one function into one pane and another into the other, editing/re-ordering as necessary.
(Note that I realize there are other comparison app recommendation threads out there, but I'm having a hard time finding a free app that isn't a strict file-to-file comparison app)
Thanks for any pointers or links!
I do this in Vim all the time. Here's what I do:
Run gvimdiff -O a b. This is Vim in GUI diff mode.
Paste "before" into left pane of Vim.
Paste "after" into right pane of Vim.
:diffupdate (not always necessary)
You can also edit either side of the diff, which can be handy.
I imagine that any text editor that supports side-by-side diff and copy and paste should be able to do something similar.
And before you object that Vim is file-based (which is true), the above procedure doesn't require creating any actual files in the file-system.
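If you'd rather script the comparison, Python's standard difflib module compares two in-memory strings directly. This is a different technique from the Vim workflow above. The sketch assumes you paste the two snippets into a script or REPL. `diff_snippets` returns a unified diff. `side_by_side_html` uses `difflib.HtmlDiff` to build a side-by-side HTML table with the changes highlighted.

```python
import difflib


def diff_snippets(before: str, after: str) -> str:
    """Unified diff of two pasted snippets; nothing is read from disk."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )


def side_by_side_html(before: str, after: str) -> str:
    """Side-by-side HTML page with changed lines and characters highlighted."""
    return difflib.HtmlDiff().make_file(
        before.splitlines(), after.splitlines(), "before", "after"
    )


if __name__ == "__main__":
    old = "def area(r):\n    return 3.14 * r * r\n"
    new = "def area(r):\n    return math.pi * r ** 2\n"
    print(diff_snippets(old, new))
```

To get the side-by-side view, write the string returned by `side_by_side_html` to an .html file and open it in a browser.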
Notepad++ has a nice diff function that will suit your needs also.
WinMerge can do what you outlined (i.e., edit/copy/paste snippets into two windows and then compare or merge them).