Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Is there a way to convert a map PDF file to a KML file? How can I convert it, or are there any guidelines for doing so?
Apache PDFBox is an open source Java library that can parse PDF documents and extract their content. The API includes methods to extract text, metadata, and embedded files from PDF files, as well as to create PDF files from scratch. Apache PDFBox also includes several command-line utilities. One command-line tool called pdfbox-app has options to extract all text or images from PDF files.
There is also the Apache Tika library, which focuses on text extraction from a variety of file formats, including PDF.
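Neither library writes KML itself, so the extracted data still has to be turned into KML in a separate step. The following is only a minimal sketch of that last step, written in Python for brevity rather than Java. It assumes you have already obtained place names with WGS84 latitude/longitude values from the PDF by some other means; the function name `points_to_kml` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Build a KML document string from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML stores coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample data standing in for values extracted from the PDF.
    print(points_to_kml([("Sample point", 52.52, 13.405)]))
```

Note that a map PDF often holds only a picture of the map, in which case text extraction will not yield coordinates and this step has nothing to work with.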
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
We are trying to view the following file types in the Autodesk Forge Viewer in our
application:
DWF
DWFX
DWG
DXF
NWD
RVT
NWC
PDF
RCP
GBXML
IFC
According to the Autodesk documentation
(https://developer.autodesk.com/en/docs/model-derivative/v2/overview/supported-translations/),
these file formats are supported by the viewer.
However, on https://viewer.autodesk.com/ some file types, such as PDF and RCP,
produce a format error.
So my questions are:
Which file formats does the viewer support?
Can we open any PDF file in the viewer, or does it require a specific kind of
PDF file?
The file types that the Viewer supports are listed at the link you provided from the Model Derivative API, so PDF and RCP are supported. The A360 Viewer supports only a subset of file types; the list is available on that website.
As you can see, those two types are not mentioned there, but that doesn't mean the Forge platform does not support them. You will have to use the Model Derivative API to translate those file types and then use the Viewer API to visualize them.
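As a rough illustration of the translation step, here is a minimal Python sketch that only builds the request body for a Model Derivative translation job; it does not send anything. The endpoint and payload shape reflect my understanding of the v2 documentation and should be checked against the current reference. The object ID is a hypothetical placeholder, and a real request also needs an OAuth access token in the `Authorization` header.

```python
import base64
import json

# Assumed v2 endpoint for posting translation jobs; verify against the docs.
JOB_URL = "https://developer.api.autodesk.com/modelderivative/v2/designdata/job"


def encode_urn(object_id):
    """Encode an object ID as unpadded URL-safe Base64, as the API expects."""
    encoded = base64.urlsafe_b64encode(object_id.encode("utf-8"))
    return encoded.decode("ascii").rstrip("=")


def build_translation_job(object_id):
    """Return the JSON body asking for a viewable (SVF) derivative."""
    return {
        "input": {"urn": encode_urn(object_id)},
        "output": {"formats": [{"type": "svf", "views": ["2d", "3d"]}]},
    }


if __name__ == "__main__":
    # Hypothetical object ID of a PDF already uploaded to a bucket.
    job = build_translation_job("urn:adsk.objects:os.object:my-bucket/plan.pdf")
    print("POST", JOB_URL)
    print(json.dumps(job, indent=2))
```

Once the job finishes, the same encoded URN is what the Viewer loads.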
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Setup
XSL processor: Saxon
FO processor / PDF renderer: Antenna House Formatter V6.2
Is it possible to embed a 3D PDF, XVL or 3DU file into the current publication via an FO transformation / PDF rendering?
The source data would contain several 3D data nodes (XML, XVL or similar) that have to be processed into the generated PDF.
Thanks in advance.
You can embed a 3D PDF using AH Formatter V6.2 or V6.3. Use fo:external-graphic to refer to the PDF just as you would for any other external image.
In the AH Formatter GUI, you can select to embed 3D annotations in the 'PDF Option Setting' dialog box (see https://www.antennahouse.com/product/ahf60/docs/ahf-gui.html#others-page). On the AHFCmd (or run.sh on Linux/Unix) command line, you may need to specify -p3da (see https://www.antennahouse.com/product/ahf60/docs/ahf-xslcmd.html#keyIDAR1YD) and/or enable 3D annotations in the Option Setting File (see https://www.antennahouse.com/product/ahf60/docs/ahf-optset.html#keyIDAVUFU).
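To make this concrete, here is a minimal Python sketch that generates a tiny FO document containing an fo:external-graphic and assembles an AHFCmd command line without running it. In the real pipeline the FO would come from the Saxon transformation; the file names and helper functions are hypothetical. The `-d` (input) and `-o` (output) options are my assumption about the AHFCmd interface and should be checked against the command-line documentation linked above, as should whether `-p3da` is needed in your setup.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"


def build_fo(pdf_path):
    """Return a minimal FO document that references pdf_path as a graphic."""
    ET.register_namespace("fo", FO_NS)

    def q(tag):
        return f"{{{FO_NS}}}{tag}"

    root = ET.Element(q("root"))
    masters = ET.SubElement(root, q("layout-master-set"))
    page = ET.SubElement(masters, q("simple-page-master"), {"master-name": "page"})
    ET.SubElement(page, q("region-body"))
    seq = ET.SubElement(root, q("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(seq, q("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, q("block"))
    # The 3D PDF is referenced like any other external image.
    ET.SubElement(block, q("external-graphic"), {"src": f"url('{pdf_path}')"})
    return ET.tostring(root, encoding="unicode")


def build_ahfcmd_args(fo_path, out_path):
    """Assemble (but do not run) a command line; -d and -o are assumed options."""
    return ["AHFCmd", "-d", fo_path, "-o", out_path, "-p3da"]


if __name__ == "__main__":
    print(build_fo("model-3d.pdf"))  # hypothetical 3D PDF file name
    print(" ".join(build_ahfcmd_args("publication.fo", "publication.pdf")))
```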
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I need to extract the "articles" from this magazine, which contains both text and images. The images have to be extracted and stored separately from the text, and the text extracted (as far as possible) and stored on its own.
How do I go about doing this? Is there a commercial service or API that already does this? The only input to the program/service will be the file.
Example input: http://edition.pagesuite-professional.co.uk/pdfspool/rQBvRbttuPUWUoJlU6dBVSRnIlE=.pdf
(the actual file would be a normal PDF file, not a secured one)
The Docotic.Pdf library can extract images and text from PDF files for you.
Here are a couple of samples for your task:
Extract text from PDFs
Extract images from a PDF
Extracted images can be saved as JPEG or TIFF files. You can extract text from each page or from the whole document, and you can extract text chunks together with their coordinates.
Disclaimer: I work for Bit Miracle, vendor of the library.
Try this one:
http://asp.syncfusion.com/sfaspnetsamplebrowser/9.1.0.20/Web/Pdf.Web/samples/4.0/Importing/TextExtraction/CS/Default.aspx?args=7
The same component also has an image-extraction feature.
Give it a try!
If you can afford a commercial option, Amyuni PDF Creator lets you enumerate all components inside the PDF file (text, images, etc.). You can extract them as independent objects and create new PDF files from them.
You may use Aspose.Pdf.Kit to extract text and images separately from a PDF file. The API is quite simple. You can also find samples, tutorials and support on the Aspose website.
Note: I work as a Developer Evangelist at Aspose.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
In which document would a file specification belong? Perhaps this file is used as an input to a third-party system. Would it belong in its own document? Or would it be better to put it in the functional or design spec? Or somewhere else?
When I say file specification, I mean a description of the file's format (CSV, fixed width, etc.), columns, data types, and so on.
Also, where should you document how the file is generated, i.e. the business rules and algorithms used to generate the file?
I always document file formats and layouts in the design spec. Unless it is a simple file, I document the layout in an Excel document embedded in the design spec.
Regarding your second question, the rule of thumb is that the functional spec documents the "what" and the technical/design spec documents the "how". Business rules, triggering events, and algorithms should be documented in the functional spec. How your program implements those rules is documented in the technical design spec.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to create some diagrams for some papers.
The diagrams will contain some text, e.g. console output. I also need the images for use in HTML files.
There is TikZ, with which I can create images like these:
http://www.texample.net/tikz/examples/boxes-with-text-and-math/
http://www.texample.net/tikz/examples/rule-based-diagram/
http://www.texample.net/tikz/examples/scenario-tree/
but the result is PS/PDF files, not images.
What's more, I want to generate the pictures from text files because I want to track changes in a VCS; binary files are not suitable for that.
The program convert from the ImageMagick suite can convert PDF files to other formats, like PNG. In its simplest form:
convert diagram.pdf diagram.png
See the manual for additional options.
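If the conversion should be part of a build script, the call can be wrapped as in the following minimal Python sketch. It assumes ImageMagick's convert is on the PATH and that PDF rasterization works on your system (this usually requires Ghostscript). The helper names and the default of 300 DPI are illustrative choices, not requirements.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, png_path, density=300):
    """Return the convert command line as a list of arguments."""
    # -density sets the rasterization resolution (DPI); it is placed
    # before the input file so that it applies when the PDF is read.
    return ["convert", "-density", str(density), pdf_path, png_path]


def convert_pdf(pdf_path, png_path, density=300):
    """Run convert, failing early if ImageMagick is not installed."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on PATH")
    subprocess.run(build_convert_command(pdf_path, png_path, density), check=True)


if __name__ == "__main__":
    print(" ".join(build_convert_command("diagram.pdf", "diagram.png")))
```

With this approach only the TikZ source needs to be kept in the VCS; the PDF and PNG files can be regenerated from it at any time.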