Parse PDF in .NET Core [closed] - pdf

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have an ASP.NET Core project. It references another library in which I need to extract information from a PDF. I was using iTextSharp, but it does not seem to be compatible with .NET Core.
Any idea how I could extract text from a PDF file?

If you want to write your own PDF parser, you will need to read up on all the different versions of the PDF file format. They are all officially documented here.
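To give a feel for what such a parser has to handle, here is a toy sketch in Python (standard library only). It assumes you already have the bytes of one page content stream and pulls out the literal strings shown by the `Tj` and `TJ` text operators. The function names are made up for illustration. Real files add font encodings, hex strings, object streams and other filters on top of this, so treat it as an illustration rather than a usable extractor.

```python
import re
import zlib

# A PDF literal string such as (Hello), allowing backslash escapes inside.
LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)")
# Tj shows one string; TJ shows an array of strings and kerning numbers.
SHOW_TEXT = re.compile(rb"(\((?:\\.|[^\\()])*\))\s*Tj|\[(.*?)\]\s*TJ", re.S)


def unescape(raw: bytes) -> str:
    """Decode one literal string, handling only the \\( \\) and \\\\ escapes."""
    body = re.sub(rb"\\([()\\])", rb"\1", raw[1:-1])
    return body.decode("latin-1")


def extract_text(stream: bytes, compressed: bool = False) -> str:
    """Collect the strings shown by Tj/TJ in a single content stream."""
    if compressed:
        stream = zlib.decompress(stream)  # FlateDecode streams are zlib data
    parts = []
    for match in SHOW_TEXT.finditer(stream):
        if match.group(1):
            parts.append(unescape(match.group(1)))
        else:
            pieces = LITERAL.findall(match.group(2))
            parts.append("".join(unescape(p) for p in pieces))
    return " ".join(parts)


# Hand-written sample stream, not taken from a real document.
sample = b"BT /F1 12 Tf 72 720 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET"
print(extract_text(sample))
print(extract_text(zlib.compress(sample), compressed=True))
```

Both calls print the same text, because the second one only adds the decompression step that most real content streams need.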

Text extraction from PDF is a complex task. I would not recommend doing this without a library.
For ASP.NET Core, I can recommend the Docotic.Pdf library (I work for the vendor). The library supports .NET Standard and can be used to extract not only text but paths and images too.
Here are some samples / tutorials:
Extract text
Extract text by words
Extract text, paths and images

Related

What documentation system to use for user guide [closed]

Closed 4 years ago.
I'm currently trying to find a documentation (user guide) system that would have the following features:
documentation files in text mode (so svn could diff/merge them)
possibility to use images, tables, cross-references and a table of contents
export to PDF (or .doc/.odt) that would support cross-references
I tried Markdown for documentation source files and pandoc for PDF export, but standard Markdown has no table syntax.
I really appreciate any help you can provide.
We use Sphinx for this scenario.
It can generate HTML, PDF and some other formats from reStructuredText files.
Also have a look at the list-table directive when you want to add complex tables.
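For reference, a Sphinx project is driven by a `conf.py` file, which is plain Python. The sketch below is a minimal configuration aimed at the requirements in the question (cross-references and PDF output). The project name, author and output file name are placeholders, and your own setup may need more options.

```python
# conf.py: minimal Sphinx configuration sketch (names below are placeholders)
project = "Example User Guide"
author = "Documentation Team"

extensions = [
    "sphinx.ext.autosectionlabel",  # lets :ref: point at section titles
]
# Prefix labels with the document name so equal titles do not clash.
autosectionlabel_prefix_document = True

# Number figures and tables so that :numref: cross-references work.
numfig = True

# (start document, output .tex file, title, author, LaTeX theme)
latex_documents = [
    ("index", "user-guide.tex", project, author, "manual"),
]
```

With a configuration like this, the PDF is typically built through the LaTeX builder (for example `make latexpdf` in a project created by `sphinx-quickstart`), and the reStructuredText sources stay diff-friendly in svn.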
I use TeX for electronic and printable documentation:
https://tex.stackexchange.com/
https://en.wikipedia.org/wiki/TeX
Probably the most commonly used solution set for documentation is XML in DocBook or DITA. You can certainly manage those in SVN as well as perform diffs. They both provide processing toolchains with many output types, including PDF through XSL-FO.

BPMN to Image converter [closed]

Closed 3 years ago.
Our client is using the Oryx editor to render BPMN in the browser. Now they have asked me to capture an image of the BPMN diagram and save it. Is there anything in Java or JavaScript that can convert BPMN to JPEG or SVG format?
Please tell me how I can do this.
Thanks in advance
The bpmn-to-image utility may help you out. It allows you to convert BPMN 2.0 diagrams to images from the command line.
Have you searched for browser plugins like this one?
https://chrome.google.com/webstore/detail/export-svg-with-style/dkjdcaddoplepioppogpckelchefhddi?hl=en-GB
This thread may be helpful: How to save svg canvas to local filesystem
Export the BPMN and load it with another (free) BPMN tool which has an export function, e.g.
http://www.bizagi.com/
If you are looking for an SVG representation of the BPMN diagram, you can use the recent version of Oryx called "Signavio core components". This project saves BPMN as an .xml file that contains an SVG representation, so you can extract it.
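A rough sketch of that extraction step in Python (standard library only) could look like the following. The element name `svg-representation` and the sample file layout are assumptions for illustration, so check a real saved file and adjust the name before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: the saved .xml keeps the SVG as text inside an element with
# this (hypothetical) name. Inspect a real export and change it if needed.
SVG_ELEMENT = "svg-representation"


def extract_svg(xml_text: str, element_name: str = SVG_ELEMENT) -> str:
    """Return the SVG markup embedded as text in the given XML document."""
    root = ET.fromstring(xml_text)
    node = root if root.tag == element_name else root.find(f".//{element_name}")
    if node is None or not (node.text or "").strip():
        raise ValueError(f"no <{element_name}> element with content found")
    svg = node.text.strip()
    ET.fromstring(svg)  # fail early if the payload is not well-formed XML
    return svg


# Hand-written sample, not a real Signavio file.
sample = """<model>
  <name>Order process</name>
  <svg-representation><![CDATA[<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"><rect width="10" height="10"/></svg>]]></svg-representation>
</model>"""

svg = extract_svg(sample)
print(len(svg), "characters of SVG extracted")
```

The returned string can be written to a `.svg` file as-is. Turning it into a JPEG would need a separate rasterizer, which this sketch does not cover.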
There is an SVG diagram generator available written by Martin Grofcik: https://github.com/gro-mar/activiti-crystalball/blob/8e7d3ed387a7a43c92d053a3407788c49c3c4ccb/image-builder/src/main/java/org/activiti/crystalball/diagram/svg/SVGCanvasFactory.java

Free library to read PDF files [closed]

Closed 7 years ago.
Is there a free way to read PDF files through VBA to extract basic text content? I need to automate a weekly data acquisition process at my company where data is contained in PDF files (which are updated weekly by the data provider). Also, is there a reference I can look into to understand the file structure (DOM?) of a PDF?
Adobe's PDF reference is online here: http://www.adobe.com/devnet/pdf/pdf_reference.html
I'm not sure about the best way to read PDFs from VBA directly, but if you can call an external Java or C# program, then I would recommend using iText for basic text extraction.
EDIT: I should maybe mention that Adobe's PDF reference is an 800-page beast. I found that it's good for looking up answers to particular questions (e.g., storing widths of embedded TrueType fonts), but it may not be a good place to start. For that, reading through the iText book helped me to get started on the format.
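As a very small taste of the file structure asked about in the question: a PDF is not a DOM but a set of numbered objects, with a `%PDF-x.y` header at the start and a `startxref` offset just before the final `%%EOF` that points at the cross-reference data. The Python sketch below (standard library only) reads just those pieces. The function name and the hand-built sample are illustrative, and the object count is a naive line-based guess that ignores compressed object streams.

```python
import re


def describe_pdf(data: bytes) -> dict:
    """Report the header version, the startxref offset and a rough object count."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("missing %PDF- header")
    # The last startxref near the end of the file is the one that counts.
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data[-1024:])
    if not offsets:
        raise ValueError("missing startxref / %%EOF trailer")
    return {
        "version": header.group(1).decode("ascii"),
        "xref_offset": int(offsets[-1]),
        # Naive: counts lines that look like "12 0 obj".
        "objects_seen": len(re.findall(rb"^\d+ \d+ obj\b", data, re.M)),
    }


# Tiny hand-built skeleton (one object, no pages) just to exercise the code.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % len(body)
sample = body + xref + trailer

info = describe_pdf(sample)
print(info)
```

From the reported offset, a real reader would go on to parse the cross-reference table and then look up each object by number, which is where the reference becomes useful.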
The iText book contains lots of worked examples for general PDF tasks and lots of background info to help you understand PDF files. It more than pays for itself very quickly!

tool or library for Google docs file upload [closed]

Closed 4 years ago.
I need a simple library or tool with which I can upload arbitrary files (other than the explicitly supported formats, like .doc, .docx, .xls, .pdf, .txt, .ppt etc.) to Google Docs. The Perl module WWW::Google::Docs::Upload doesn't work, I get an exception (Link not found at /usr/local/share/perl/5.8.8/WWW/Google/Docs/Upload.pm line 39; it's from 2008). Any programming language which is easy to run on Linux should be fine.
The responses to "How to programatically upload document on Google Docs?" suggest using the API directly. Is there a tool or library which is a convenient wrapper around the API?
You can upload arbitrary files by automating the Web UI.
See how to do this here: http://code.google.com/a/eclipselabs.org/p/googledocs-rse/wiki/UploadAnyFileToGoogleDocs
The project you want is called googlecl - see http://code.google.com/p/googlecl/wiki/Manual
The googlecl (google command line) tool allows you to upload docs.
http://code.google.com/p/googlecl/

Desktop search utility for PDF, CHM and DjVu files [closed]

Closed 8 years ago.
I want to write a tool that helps me search pdf/chm/djvu files on Linux. Any pointers on how to go about it?
The major problem is reading/importing data from all these files. Can this be done with C and shell scripting?
Tracker ships with Ubuntu 8.04 -- it was a significant switch from Beagle which users believed was too resource (CPU) intensive and didn't yield good enough results. It indexes both pdf and chm and according to this bug report it also indexes djvu.
Note that djvu is an image compression format (optimized to compress 'pictures of text', typically the results of scanning). As such, you won't be able to search for text, except in the metadata (this is what the link sent by cdleary refers to), or if you first use OCR on the document to convert it into text.
The same is true for PDFs whose content is scanned articles or books.
How about a plugin for Beagle?
It already searches PDFs but you can add other file types.
Here is the relevant Wikipedia page: http://en.wikipedia.org/wiki/Beagle_(software)
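If you do end up writing your own tool instead, the searching part is small once the text is out of the files. Below is a minimal Python sketch of an inverted index. It assumes the text has already been extracted per file, for instance with command-line tools such as `pdftotext` (Poppler) or `djvutxt` (DjVuLibre) if they are installed. The file names and contents are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of files that contain it."""
    index = defaultdict(set)
    for path, text in texts.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Made-up extracted text standing in for real documents.
texts = {
    "notes.pdf": "Kernel scheduling notes",
    "paper.djvu": "Scheduling in real-time systems",
    "manual.chm": "Kernel module API",
}
index = build_index(texts)
print(sorted(search(index, "kernel scheduling")))
print(sorted(search(index, "Scheduling")))
```

Scanned documents without a text layer would contribute nothing to such an index until they have been run through OCR, as noted above.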