I'm trying to extract the text from a PDF document using PDFBox. The problem is that I'm getting the header and footer text as well. Does anyone know if there's a way to filter that out? Maybe via some settings in TextPosition?
I want to create a document from an existing Word 2010 document and convert it to PDF using docx4j 3.1.0. I've built upon the sample in
https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutPDF.java
The Word document already contains a header with text and an image that I do not modify in my processing. The resulting PDF document, however, doesn't contain the header.
Is this something that is supposed to work? If yes: how can I find out what I am missing?
Yes, if you can see the header when you "save as PDF" in Word, then you should also see the header in docx4j's PDF output.
To have it fixed, we'll need to see the docx.
Just for the curious reader: the missing header turned out to be caused by a wrong approach to setting page margins on the document. Instead of modifying the existing settings via body.getSectPr().getPgMar() (or, even simpler, setting them in the template right away), the code created new PageDimensions and set a new SectPr on the body. That removed the header, presumably because the header references are stored in the section properties that were being replaced.
I have a pdf file that I am putting on a website for a client. It is located here...
http://www.optiphysicaltherapy.com/dev/wp-content/uploads/2014/02/OPTI_NewPatientForms.pdf
The title should be OPTI New Patient Forms but if you look at the tab in the browser and the name at the top of the browser window it says "Coury And..."
Where can I go to change this?
The website is using Wordpress 3.8.1 and I am not sure if it is in Wordpress or in the actual pdf file.
Thank you,
Matt
OK, so I found out how to change the metadata in a PDF here: http://help.adobe.com/en_US/acrobat/X/pro/using/WS58a04a822e3e50102bd615109794195ff-7c63.w.html (the link is now dead)
Sure enough, the Title in the metadata within the PDF was "Coury And..."
Once I changed this, the tab and the window title in Firefox changed to the title that I wanted.
This shows that when Firefox displays a PDF in the browser, it uses the Title from the PDF's metadata as if it were the title of a web page.
Open the PDF with Notepad++ and search (CTRL+F) for /Title
Change the title between the parentheses (and leave the parentheses in place)
For instance:
Change "/Title (OLD TITLE)" into "/Title (This is my new title)"
Save the PDF and Voila
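The text-editor steps above can be sketched in a few lines of Python. This is only a rough illustration under several assumptions: the title is stored as a plain, uncompressed literal string, and you know the old title exactly. The function name `replace_pdf_title` is made up for this example. Be aware that bookmark entries also use a /Title key, which is why the sketch matches the old title explicitly, and that changing the length of the string shifts the bytes after it, so the file's cross-reference offsets become stale (many viewers recover from this, but it is not guaranteed).

```python
import re


def _pdf_escape(text: str) -> bytes:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return escaped.encode("latin-1")


def replace_pdf_title(pdf_bytes: bytes, old_title: str, new_title: str) -> bytes:
    """Replace '/Title (old_title)' with '/Title (new_title)' in raw PDF bytes.

    Hypothetical helper: it only finds titles stored as uncompressed
    literal strings, and it replaces the first exact match only.
    """
    pattern = re.compile(rb"/Title\s*\(" + re.escape(_pdf_escape(old_title)) + rb"\)")
    replacement = b"/Title (" + _pdf_escape(new_title) + b")"
    # A function is used as the replacement so that backslashes in the
    # new title are inserted literally instead of being read as regex escapes.
    new_bytes, count = pattern.subn(lambda m: replacement, pdf_bytes, count=1)
    if count == 0:
        raise ValueError("no matching /Title entry found")
    return new_bytes
```

To use it on a file, read the PDF in binary mode, pass the bytes through the function, and write the result to a new file so the original stays untouched.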
If you have access to the Word document on which the PDF is based, you can define the title when you save the file.
Whatever was on that link, I did it by opening the PDF with a hex editor (HxD) and searching for Title. I found /Title (untitled) somewhere and just edited it (changing the value between the parentheses, here untitled).
There is no need to change the PDF's metadata. Just make the following change to the iframe URL:
http://localhost:8080/getDataPDF//?patientId=145. Using // in the URL solves this problem, as it can hide the title.
Open the PDF document in Adobe Acrobat Pro: (OR use google chrome extension)
(1) Select File > Properties
(2) Select the Description tab to view the metadata in the document, including the document information dictionary
(3) Modify the Title field to add or change the document's Title entry
When you open the PDF in Chrome, you can hit Print and choose Save as PDF. As the file name, enter what you want as the title in the browser; the two should now be the same.
Open File > Properties, then in the box labeled 'Title', add your title.
Click on the 'Initial View' tab. Where it says 'Show:', make sure the drop-down says 'Document Title' instead of 'File Name'. This works for Chrome, but sadly not for IE yet.
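Inside the file, the 'Document Title' choice corresponds to a DisplayDocTitle entry set to true in the document's viewer preferences. A quick way to check whether a given PDF already has it is sketched below. The function name `shows_document_title` is made up, and the check assumes the entry is stored uncompressed, so a negative result does not prove the setting is absent.

```python
import re


def shows_document_title(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain '/DisplayDocTitle true'.

    Hypothetical helper: entries inside compressed object streams
    are not visible to this simple byte search.
    """
    return re.search(rb"/DisplayDocTitle\s+true\b", pdf_bytes) is not None
```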
To change my PDF's title, I just open it with nano in a terminal, or with any other text editor that shows the raw file, and edit the Title field.
The title can be changed inside MS Office or LibreOffice, if you have access to the source, by going to File > Properties > Description.
As another answer suggested, printing as a PDF works here if you have the source document. What the other answer perhaps missed is that the print dialog has an option to add a title.
You can also use this online pdf editor to change metadata of a pdf file.
The title does not come from the PDF. It comes from the Word file you exported it from.
Right-click on the Word file and go to Details. Change the title and export again.
Good luck
I have a CKEditor on my website, and I would like it so that when the user clicks on the printer icon, the headers don't print and only the text inside of the editor is printed on the page.
This is the site: http://strawberrycv.com/4.php
For example, if you go to the site and click print, the top will say "Rich Text Editor, editor1" (in the left corner) and the date/time (in the right corner). Is there a way to remove these?
These are browser-specific headers, not added by CKEditor. You can't really control the browser-specific print headers, I'm afraid. See this for further details and a weird workaround: Remove the default browser header and footer when printing HTML
I've worked on a requirement that allows me to show a PDF file inside a browser by setting Response.ContentType = "application/pdf".
The problem is that the default view of the PDF always shows the bookmarks panel on the left. Is there a way, using HTTP headers or something similar, to tell the PDF viewer not to show the bookmarks section?
Thanks in advance.
There are two ways that you can do it. The way that I would recommend is to actually open the PDF in Adobe Acrobat and go to File, Properties. On the Initial View tab you'll see a lot of options for how to display the PDF. The second way I haven't tested, but Adobe says you can pass various open parameters to the PDF after a # in the URL. The one you'd probably want is http://example.org/doc.pdf#pagemode=none
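If the links are generated on the server, the second approach amounts to appending a fragment to the PDF's URL. Here is a small sketch using only the Python standard library. The function name `with_pdf_open_params` is made up, and whether a given viewer honours the parameters is something you would need to test.

```python
from urllib.parse import urlsplit


def with_pdf_open_params(url: str, **params: str) -> str:
    """Append PDF open parameters (e.g. pagemode=none) as a URL fragment.

    Hypothetical helper: multiple parameters are joined with '&',
    and any fragment already on the URL is replaced.
    """
    fragment = "&".join(f"{key}={value}" for key, value in params.items())
    return urlsplit(url)._replace(fragment=fragment).geturl()


link = with_pdf_open_params("http://example.org/doc.pdf", pagemode="none")
print(link)
```

Because the part after # is never sent to the server, this only changes the link handed to the browser, not the response or its HTTP headers.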
How a PDF document is displayed can be configured inside the PDF document itself.
There are a lot of PDF editors that can modify the "viewer preferences", as they are usually called. One free example is BeCyPDFMetaEdit.
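For the bookmarks case specifically, the setting in question is the PageMode entry of the document catalog: /UseOutlines opens the bookmarks panel and /UseNone opens no side panel. If no editor is at hand, the change can be sketched as a raw byte replacement. This is a rough illustration only, assuming the entry is stored uncompressed; the function name `hide_bookmarks_panel` is made up. The replacement is padded with spaces so the file length and all byte offsets stay the same.

```python
import re

_PAGE_MODE = re.compile(rb"/PageMode\s*/UseOutlines")


def hide_bookmarks_panel(pdf_bytes: bytes) -> bytes:
    """Switch '/PageMode /UseOutlines' to '/PageMode /UseNone' in raw PDF bytes.

    Hypothetical helper: returns the input unchanged if the entry is
    not found (for example, because it sits in a compressed stream).
    """
    def _swap(match: re.Match) -> bytes:
        # Pad with spaces to the original length so offsets do not shift.
        return b"/PageMode /UseNone".ljust(len(match.group(0)))

    return _PAGE_MODE.sub(_swap, pdf_bytes, count=1)
```

If the PDF has no PageMode entry at all and still opens with bookmarks showing, the viewer is applying its own default, and this sketch will not help.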