I have an issue printing a PDF file from an applet. I get the input over HTTP, and the stream is constructed using PdfStamper. The problem is that I want to send the resulting stream to a printer, but I have not found out how to do that.
Unless the printer supports PDF, you cannot send it directly to the printer. You need to rasterize it. I wrote a blog article on printing PDFs from Java at http://www.jpedal.org/PDFblog/2010/01/printing-pdf-files-from-java/
PDFBox might manage it. I'm not aware of any other Java-specific PDF renderers, though I wouldn't be shocked to find there are a couple more out there.
Basically, any app that can convert a PDF to an image can probably act as a print driver.
GhostScript perhaps?
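Once the pages have been rasterized, the printing half can be done with the standard java.awt.print API. Below is a minimal sketch, not a definitive implementation: it assumes some renderer (PDFBox, JPedal, Ghostscript, or similar) has already turned each PDF page into a BufferedImage. The class name RasterPagePrinter and its helper methods are made up for illustration.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.util.ArrayList;
import java.util.List;

/** Prints pre-rendered page images, one image per printed page (hypothetical helper). */
final class RasterPagePrinter implements Printable {
    private final List<BufferedImage> pages;

    RasterPagePrinter(List<BufferedImage> pages) {
        this.pages = new ArrayList<>(pages); // defensive copy
    }

    /** Largest uniform scale at which the image still fits inside the printable area. */
    static double fitScale(double imgW, double imgH, double areaW, double areaH) {
        return Math.min(areaW / imgW, areaH / imgH);
    }

    @Override
    public int print(Graphics g, PageFormat pf, int pageIndex) {
        if (pageIndex < 0 || pageIndex >= pages.size()) {
            return NO_SUCH_PAGE; // tells the print system there are no more pages
        }
        BufferedImage img = pages.get(pageIndex);
        double scale = fitScale(img.getWidth(), img.getHeight(),
                pf.getImageableWidth(), pf.getImageableHeight());
        Graphics2D g2 = (Graphics2D) g.create();
        try {
            // Move to the top-left corner of the printable area, then scale to fit.
            g2.translate(pf.getImageableX(), pf.getImageableY());
            g2.scale(scale, scale);
            g2.drawImage(img, 0, 0, null);
        } finally {
            g2.dispose();
        }
        return PAGE_EXISTS;
    }

    /** Sends the pages to the default printer; throws if the job fails. */
    static void printAll(List<BufferedImage> pages) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintable(new RasterPagePrinter(pages));
        job.print();
    }
}
```

Calling RasterPagePrinter.printAll(images) would submit the job to the default printer. The resolution at which the pages were rasterized decides the print quality, since the images are only scaled here, not re-rendered.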
I'm trying to extract text from PDF files using the Google Cloud Vision API. It works most of the time, but I get gibberish in a few cases. I tried both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION, and I tried forcing the language in the languageHints, but it didn't help.
Then I tried with a screenshot saved as tiff and this did work, so I'm guessing that Google tries to use the text in the PDF if it's not just a picture. Indeed, when I select all "text" in the PDF, I get gibberish.
When I print the tiff back into PDF, text extraction works. So it's really something weird with the PDF. But other extraction software (such as ABBYY) works well with the original PDF.
Has anyone had the same kind of issues?
One thing that could help would be an option to force treat the PDF as an "image PDF". Is there such an option?
Thanks for your help!
FYI, I am unfortunately not allowed to show the PDF, and I use the dotnet library.
Edit:
The info on the PDF is:
Creator: "PScript5.dll Version 5.2.2"
Producer: Acrobat Distiller 10.1.16 (Windows)
I've been tinkering with Ghostscript with a port monitor (on an HP PCL 6 Universal driver) to convert print jobs into PDF. I've tested with a few applications such as Word, Excel, Adobe Reader, Microsoft Edge etc. and they are all working properly.
However upon testing Microsoft Powerpoint 2016, it seems like there are some graphics that are unable to be rendered properly through Ghostscript.
Actual Slide Below
Output From Ghostscript in PDF Below
I've tested this with some other PDF generators as well, such as BioPDF, CutePDF and AdobePDF, and they all produce the same output as above.
Has anyone faced similar issues before? If so, could someone point me in the right direction?
What you are doing isn't a single step PowerPoint to PDF and Ghostscript is not rendering the PowerPoint. In fact if you are creating a PDF file Ghostscript isn't (ideally) rendering anything.
What's actually happening is that you are asking PowerPoint to print to a canvas, which is then passed to the PostScript printer driver. That produces PostScript which is sent to the Port. Your (and others) Port Monitor then sends the PostScript to the 'Distiller' (in your case Ghostscript and the pdfwrite device). The Distiller reformats the vector drawing commands into a PDF format and builds a PDF file from them. It doesn't render (turn into a bitmap image) anything unless forced to.
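The last hop of that chain (PostScript in, PDF out via the pdfwrite device) can be reproduced by hand, which is useful for checking whether a problem is already present in the PostScript. Here is a small sketch of how that Ghostscript command line might be assembled from Java. The class name DistillCommand and the file names are placeholders, and the executable name (gswin64c on 64-bit Windows, gs elsewhere) depends on your installation.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds a Ghostscript command line for PostScript -> PDF (hypothetical helper). */
final class DistillCommand {
    static List<String> build(String gsExecutable, String inputPs, String outputPdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dBATCH");            // exit after processing the input
        cmd.add("-dNOPAUSE");          // do not wait for a key press between pages
        cmd.add("-sDEVICE=pdfwrite");  // the 'Distiller' device: emits PDF, does not render
        cmd.add("-sOutputFile=" + outputPdf);
        cmd.add(inputPs);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder file names; substitute the PostScript captured from the port.
        List<String> cmd = build("gswin64c", "job.ps", "job.pdf");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires Ghostscript on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

If the PDF produced this way already shows the defect, the problem is in the PostScript handed to Ghostscript, not in the port monitor setup.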
Obviously there are several places along that road where the problem could creep in. Given that you say that the Adobe product has the same problem (the others you mention all use Ghostscript), I think it's safe to assume that the problem isn't Ghostscript.
This also means that you aren't using the driver you think you are. Adobe can't handle PCL as an input medium as far as I'm aware, and nor can Ghostscript. GhostPCL will handle PCL as an input, but that's not what you say you are using.
Of course you haven't linked to an example file to demonstrate the problem, nor supplied an example command line, so this is all supposition.
Now if, somehow, you are using a PCL6 device, then the problem is most likely due to the presence of rasterOps in the output. RasterOps are part of the PCL imaging model which do not exist in PDF and are a form of transparency. There are three ways to handle such content for a PDF output device: firstly, render the whole page content to an image; secondly, ignore the rasterOps objects; thirdly, treat the rasterOps as opaque.
GhostPCL and the pdfwrite device take the third option. So it's just conceivable that your original content has some transparent objects which are being handled as rasterOps by the PCL printer driver, and then rendered as opaque by GhostPCL and the pdfwrite device.
If that's somehow the case then the solution is simple; don't use a PCL printer driver, use the PostScript one.
If you post a link to a (simple, eg single page) example of what you are sending to Ghostscript, and a command line, then I can look at it. Please don't send me the PowerPoint, I can't use it and even if I could, my print setup would not match yours. I need the data being sent to Ghostscript.
[EDIT after looking at files]
I don't mean to sound like I'm lecturing. The problem is that people find these results in Google searches and then try to apply them based on a poor understanding of what's happening. So I find it best to be really clear in my answers about what's going on. It saves questions later :-)
The first thing I see is that the PCL is indeed PCL, and if you try running that through Ghostscript it throws horrible errors and exits. So presumably you aren't doing that.
The PostScript file contains nothing except huge images (presumably rendered at 600 dpi). It contains 2 pages, and the two pages look like your images above. That is why the PostScript file is more than 20 times larger than the PCL file.
But.... If I open the .ppt file with OpenOffice (4.0.0 is what I have to hand) I see exactly the same thing. I don't, I'm afraid, have a copy of Microsoft PowerPoint, but from what I see here there are two conclusions;
firstly that the PDF I get looks pretty much like the PowerPoint when viewed with OpenOffice at least. So there's something 'interesting' about your PowerPoint.
secondly, even if that's not what you expect, it's what's in the PostScript program. That means that either PowerPoint rendered the slide to a bitmap or the Windows printing system/HP driver did.
Now, if I run the PCL through GhostPCL instead of Ghostscript (rendering, not producing a PDF), then the result is more like what I think you are expecting. However, when sent to a PDF file the result is horrible. That strongly suggests to me that there's some form of transparency involved: PostScript doesn't support transparency at all, and PCL does it through rasterOps.
I'm afraid that this means that the problem lies either in PowerPoint, the Windows print system or the PostScript printer driver you are using. Since the PCL is at least close to what you expect, I suspect that this means PowerPoint is doing the right thing, and it's the printer driver messing up. It appears you are using the Windows PostScript printer driver.
So there's no way you can 'fix' this for files like this, at least not with Ghostscript. You would need to 'fix' the Windows PostScript printer driver, or possibly the Windows print system. You could try reporting a bug to Microsoft, presumably these files print incorrectly when sent to physical PostScript printers too.
I know there are tons of threads about this "out there", but all I can find is bitmap to PDF and how to add images to a PDF.
I have a PDF which I would like to convert to JPEG. I've tried to use iTextSharp, but I can only find info about making a PDF, not the other way around. Any ideas or links to actual code?
ImageMagick uses Ghostscript to handle PDFs so if this is your only task I'd recommend just using Ghostscript. There's a managed wrapper here and you can get the Ghostscript binaries from here. They come in an installer but you can just extract them using 7-Zip. See this discussion on what you need to deploy in your app. You might have to play around with 32-bit vs 64-bit. Also, on the Ghostscript download page please read the "Which license is right for me?" section.
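For reference, the Ghostscript invocation itself is short, whichever wrapper ends up calling it. The following is a sketch of the command line for the jpeg device, written as a small Java helper that only builds the argument list. The class name PdfToJpegCommand, the file names and the 150 dpi value are illustrative assumptions; the executable name depends on which binaries you deploy (for example gswin32c or gswin64c on Windows).

```java
import java.util.Arrays;
import java.util.List;

/** Builds a Ghostscript command line for PDF -> JPEG, one file per page (hypothetical helper). */
final class PdfToJpegCommand {
    static List<String> build(String gsExecutable, String inputPdf, String outputPattern, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return Arrays.asList(
                gsExecutable,
                "-dBATCH",                       // exit when the input is done
                "-dNOPAUSE",                     // no prompt between pages
                "-sDEVICE=jpeg",                 // render each page to a JPEG image
                "-r" + dpi,                      // rendering resolution in dots per inch
                "-sOutputFile=" + outputPattern, // %03d in the pattern is replaced by the page number
                inputPdf);
    }

    public static void main(String[] args) {
        List<String> cmd = build("gswin64c", "input.pdf", "page-%03d.jpg", 150);
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires the Ghostscript binaries):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

A higher -r value gives sharper images at the cost of larger files and longer rendering time.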
We are using Tesseract's Java library, called Tess4J, to convert PDF files to text.
It works nicely with TIFF files as well as one-page PDF files. With multi-page PDFs it does generate the output file, but when it reaches the last page, control doesn't seem to come back to the application that invoked the doOCR call. It just hangs there without doing anything.
Is it an issue with the native call not returning? I have no clue.
Please let me know if there is a solution to this issue, as soon as possible.
Regards
Vish
Tess4J does support multi-page PDF and multi-page TIFF. Substitute with your PDF file in the unit test case and give it a try.
I've already referred to this SO post. I've been embedding images using an AlternateView for PNG files. Now I'm wondering how to do it with PDFs.
Should it work, for the LinkedResource, to just say:
Dim document As New LinkedResource(pdfFilePath, "image/pdf")
I'm just trying to figure out how to get the PDF to be embedded like I could with an image, or is that not possible and I'll have to do it as an attachment?
You can embed images since they can be rendered in place by an email client. PDFs cannot be rendered that way, so I'd recommend either embedding a thumbnail of the PDF that links to the actual PDF on your web site, or just attaching the PDF to the email message.
There are a few options that I know of.
1) Is the simplest way okay? The easiest by far would be to attach the PDF as a normal attachment. Then render the first page of the pdf as an image, embed it in the email and link it to open the PDF if you can. Entourage kind of does this on the Mac.
Alternatively, what I found was the following:
2) FLASHPAPER embedded in HTML displaying a PDF. Adobe has a technology called Flashpaper. It is a flash based file viewer. You can use flashpaper format documents that go into it, or PDFs as the source.
Check out some examples. That's really flash. http://www.adobe.com/products/flashpaper/examples/
Assuming you send an HTML email that will get through (images aren't turned off, etc.), you can embed the Flashpaper viewer right in your HTML code as a normal Flash object.
Most HTML email clients use Internet Explorer Bits, Webkit bits, or Gecko bits to render the html. Flash player is pretty well installed on everything, so it works well. A good example of this is when we open an email and it has video playing in it. It's almost always Flash.
I have had luck doing it this way -- the only thing you'd have to decide is if most of your clients can see this and how much (if any) today's software might block it.
What I ended up doing was a hybrid. 1) Attach it to the email, 2) Embed the Flashpaper viewer. They get it either way.
Flashpaper is available separately for $75. It has come in handy where the client was not able to install Adobe Acrobat on each computer and it had to be 100% web based.
I would imagine you should be able to do the same using any language with a little more effort and using something like Flashpaper.
Hope that helps
This is not possible--at least not in a way that will work with many clients. You'll need to just attach the file.
If you have only one client to worry about, it might be possible--but not likely without manually changing settings on each client.
The MIME type of a PDF is "application/pdf", not "image/pdf".