I'm sure this is just a google search away, but I can't find the right search terms to find what I'm looking for.
I've created a DataPackage that has both HTML annd plain text content. I've used this in my copy and my sharing code and it works fine. I now want to create RTF output as some apps don't seem to accept HTML clipboard content.
I'm looking for a good guide to making RTF text that can be added to the DataPackage. I just need simple formatting including changing the font family, font size, font weight and adding newlines. The data comes from a list of objects taht I want to serialise as RTF, not from a text control on the screen.
WordPad outputs fairly clean RTF and some other text editors do as well. If that's not enough, you can download the RTF Specification 1.9.1 although like any specification that's probably overkill for what you're doing.
You can also use the SaveToStream method on the Document property of a RichEditBox from a Metro style app to share out as well.
Related
Using the following code to generate thumbnails from PDFs (ColdFusion 8):
<cfpdf
action="thumbnail"
source="#LOCAL.PathToMyPDF#"
destination="#LOCAL.ImageDestination#"
format="png"
scale="100"
resolution="high"
overwrite="true"
pages="1" />
Sometimes it works great and generates a beautiful PNG representation of the first page. However, many times, it ends up creating a PNG with none of the text that's in the PDF, or with the text mangled or background images out of arrangement.
Is there any way to prevent this? I'm open to using a non-commercial java library, if necessary.
Without looking into this too deep, I would think you are having a font problem.
Try to run that bit of code with this parameter nofonts = "true" (which removes font styling) and see if you get your text (not styled).
If that works then you may need to register your fonts in Coldfusion (so Coldfusion has access to the fonts library). If you are not sure what fonts your PDF uses then you can check file, properties and click the font tab to see the fonts your PDF uses.
Check this link for more explanation on Coldfusion and fonts.
Again, I am not sure about your server and font set up because it wasn't mentioned in your post, so this is my best guess for you...
:)
I'm developing a new function to "my" program. This function is able to write PDF files by the simple way, making a simple text file with some codes of PDF standard.
I'm trying to understand how it works yet, but my first problem is about how to apply bold on some line of my document.
I've already downloaded the PDF REFERENCES GUIDE, but I've not found nothing about it.
Any idea?
PDF is not like HTML where you can apply formatting tags for emphasis. As you've read in the PDF reference, all that you do in PDF is to setup a graphics environment (colours used, fonts used, etc) and then put text on the page.
If you want to have something show in bold, use a font that is bold. If you want to have something show in italic, use a font that is italic.
Older software used dirty tricks to create "bold-alike" text, but the good (and easy) way to do it is to make sure you select the correct font before you start drawing text.
I am trying to make an iOS app which would extract plain text from a pdf file and display it in a UITextView. Its simply not a pdf reader to view a pdf file but i would later wish to perform certain operations on that text.
I have already googled a lot but still not able to get an exact solution.
i already tried using https://github.com/zachron/pdfiphone
but the files are using ARMV6 architecture which seems obsolete with xcode 4.5
And if anyone can suggest some exact and non-confusing code using Quartz-2d framework of iOS then it would be great.
Here is An Sample code to Extract text from PDF Hope this Might Help You.
https://github.com/zachron/pdfiphone
This is a library to get the text out of a PDF for the iPhone.
Another Demo is there Which uses OCR technology find the link below
https://github.com/nolanbrown/Tesseract-iPhone-Demo
Also Check this page of the Quartz 2D Programming Guide, it covers everything you need to open and parse a PDF file in iOS. Note that it is not a simple task, since there's no method to extract the full text in one line. You have to work with the data as an input stream, using a CGPDFScanner
Two Other Libraries
https://github.com/KurtCode/PDFKitten/
https://github.com/mobfarm/FastPdfKit
This question comes up all the time. It is VERY hard to extract text from PDF in general. The PDF specification is not designed with text extraction in mind. There are many libraries that try to do the job, essentially by reconstructing the text from the geometric placement of the individual glyphs. These libraries have varying degrees of success, but will all fail on certain PDF documents. In fact, some PDF documents have Glyphs but no way to associate the glyph with a character. For these documents it is simply not possible to extract text, short of using some kind of OCR approach.
PDF is designed as a read-only format that is portable in the sense that a PDF document will be rendered identically on any platform. That is what it is best at, and what it should be used for.
If text is to be edited, do not use PDF.
Here (Extracting text from pdf using objective-c), I found an answer to your question and it works. But not so fine as i need it :(
it can extract only ascii
it return me only one paragraph
Good luck.
I have PDF files with text that should be replaced. More specificly, the text should be translated and replaced with the translated version.
It's important that the rest of the PDF structure stays intact. Note that the text is available in the PDFs and techniques like OCr are not needed. Also, it would be nice if font and other text attributes are kept.
Which libraries would you recommend for extracting the text to an easy to edit format (such as CSV) and put the new text back in again?
Assuming you are replacing text with a different language, you will have to choose a different font in most cases, and the font choice is non-trivial. I've used the Foxit libraries to change text or create PDFs with success.
My client wants us to build a custom document viewer for their app. (It really, truly needs to be custom, because there are a ton of application-specific features they need.)
We built one for them last year that took PDFs, generated page images, and backed the images using a hidden layer of text that could be selected and copied. We did it in Flex. It was a nightmare. PDF is horrid.
This year, we need to build one in HTML 5 with similar requirements, except that most of the documents now are in Word or HTML, that is, they have reflowable text, instead of the fixed layout and glyphs of PDF. But they still want to do PDF in the same viewer.
I'm thinking that we need to convert all documents to some common file format that can handle both reflowable text and also the fixed-position glyphs of PDF. (Each document would probably support one or the other, but not both). It would be nice if it were an XML-like markup language that would say:
<text>here's some text</text>
-- or --
<glyph letter="a" name="my_a_glyph" position="10,10"/>
<image src="my_image" position="20,20"/>
or something like that.
Is there any existing file format out there that can handle it? EPUB won't do the fixed-position text, and PDF sucks in too many ways to describe.
I think you can look at FB2 (FictionBook 2) format . That is an XML-based format, designed for publishing books. It includes images, though I am not sure if they can be aligned absolutely.
Also, you can simply go with HTML and do HTML-to-PDF rendering when needed (there exists various components and libraries for this). I don't see (or you have not listed) any reasons why this way doesn't work.
GROFF? Maybe build a macro library to customize it, as needed.
Groff/troff/nroff, the "run off" programs of Unix, can output to postscript or HTML. The jump from postscript to PDF is built in to some PDF viewers; there are also several existing programs for it, pstopdf, for example.
GROFF has some fixed layout options and some flow-like options. With GROFF, it's almost easier to base most of the printout on flowing text, within proscribed bounds.