import PyPDF2

# Open the PDF file
pdf_file = open(file, 'rb')
# Create a PDF reader object
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
# Get the number of pages in the PDF file
pages = pdf_reader.numPages
# Initialize a variable to store the extracted text
text = ''
# Loop through each page
for page in range(pages):
    # Get the current page
    pdf_page = pdf_reader.getPage(page)
    # Extract the text from the page
    page_text = pdf_page.extractText()
    # If the page contains text, add it to the overall text
    if page_text:
        text += page_text
# Close the PDF file
pdf_file.close()
# Print the extracted text
print(text)
**Error:**
TypeError: 'NumberObject' object is not subscriptable
I tried changing the PDF reader from WPF to Adobe Acrobat XI.
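To narrow down where the TypeError comes from, it can help to read the pages one at a time and record which ones fail instead of letting the first failure abort the loop. The following is only a debugging sketch, not a fix for the underlying cause: it assumes the exception is raised while reading an individual page (the full traceback would confirm this), and the helper name extract_text_by_page is made up for illustration. It accepts any reader object that exposes numPages and getPage, such as the pdf_reader created above.

```python
def extract_text_by_page(reader):
    """Collect text page by page and note the pages that raise TypeError.

    `reader` is any object with a `numPages` attribute and a `getPage(i)`
    method whose pages offer `extractText()`, e.g. the PdfFileReader
    instance from the question (an assumption about where the error occurs).
    """
    chunks = []
    failures = []
    for index in range(reader.numPages):
        try:
            page_text = reader.getPage(index).extractText()
        except TypeError as exc:
            # Remember the zero-based page index and the message, then move on
            failures.append((index, str(exc)))
            continue
        if page_text:
            chunks.append(page_text)
    return "".join(chunks), failures


# Hypothetical usage with the reader from the question:
# text, failures = extract_text_by_page(pdf_reader)
# print(failures)
```

If every page ends up in the failures list, the problem is more likely in the file itself or in the installed PyPDF2 version than in one particular page.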
Related
I am trying to generate a report from a Shiny app where the user can select either HTML or PDF. I am able to generate a PDF from my .Rmd file (LaTeX format), and the layout and formatting look good. However, the PDF opens (in Foxit PDF Reader) with a file name other than the one I specified in my downloadHandler, and I cannot save or print the PDF from the Foxit Reader window. The output file name is RStudio-randomletters.pdf (e.g. RStudio-FoZvSx.pdf).
Generating an HTML report works with no issues: it produces the file name I specified and opens a window for me to save or rename the file.
Only the PDF is affected, so I'm not sure whether the problem lies in how Foxit Reader opens the PDF or somewhere else.
Update
Using Adobe Acrobat instead of Foxit allows me to now save and print, but I am still having issues with the file name for the PDF.
Here is the code for my downloadHandler:
output$downloadReport <- downloadHandler(
  filename = function() {
    paste('Report', sep = '.', switch(
      input$format, PDF = 'pdf', HTML = 'html'))
  },
  content = function(file) {
    out <- if (input$format == 'HTML') {
      rmarkdown::render('report.Rmd',
        params = list(Name = input$Name,
                      Reference = input$Reference),
        switch(input$format,
               PDF = pdf_document(), HTML = html_document()),
        envir = new.env(parent = globalenv()))
    } else if (input$format == 'PDF') {
      rmarkdown::render('pdfreport2.Rmd',
        params = list(Name = input$Name,
                      Reference = input$Reference),
        switch(input$format,
               PDF = pdf_document(), HTML = html_document()),
        envir = new.env(parent = globalenv()))
    }
    file.rename(out, file)
  })
I save content in a database, then load it and create a PDF file using jsPDF.
The data is stored with its HTML tags included:
select ct_contents from contract where ct_id = 659;
RESULT : `<p style="text-align:justify"><span style="font-size:10.5pt"><span style="font-family:Century,serif"><span style="font-family:"MS Mincho"">氏 名</span></span></span></p>`
I have this JS code:
let pdfName = this.newTemplate.tp_title.trim()
var doc = new jsPDF();
doc.addFileToVFS('NotoSansCJKjp-Regular.ttf', VFS);
doc.addFont('NotoSansCJKjp-Regular.ttf', 'NotoSansCJKjp', 'Bold');
doc.setFont('NotoSansCJKjp', 'Bold');
doc.setFontSize(12);
var paragraph = this.contract.ct_contents;
var lines = doc.splitTextToSize(paragraph, 150);
doc.text(15, 60, lines);
doc.save(pdfName + '.pdf');
I added a font so that the text renders, but when I check the downloaded PDF, the HTML tags appear in it as well.
I want to remove the tags so that only the text appears.
The image shows the result of the PDF download.
Also, the content is three pages long in MS Word, but only one page ends up in the downloaded PDF.
How can I get the text to render with the font, without the HTML tags?
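One way to get rid of the tags is to reduce the stored HTML to its text nodes before passing it to splitTextToSize. The sketch below illustrates that idea in Python with the standard-library html.parser module; the names TextOnly and strip_tags are made up for illustration, and in the browser the same step would have to be done with a DOM-based or equivalent approach rather than with this code.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.parts.append(data)


def strip_tags(fragment):
    """Return the text content of `fragment` with the markup removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.parts)


# A shortened version of the stored value shown in the question
sample = '<p style="text-align:justify"><span style="font-size:10.5pt">氏 名</span></p>'
plain = strip_tags(sample)
print(plain)
```

This keeps the text only; paragraph breaks implied by tags such as p are not turned into newlines, so multi-paragraph content would need extra handling.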
I'm creating a web app and using a PHP FPDF script to create and print a PDF. The script also prints images that are uploaded to the app onto the PDF, which works without problems. However, it needs to do the same for an uploaded PDF, and that is where my problem lies. How can I print a PDF as an image inside a PDF using FPDF?
I have heard of ImageMagick, but I don't think it is an option because it is standalone software and I don't know where I would run it.
You can use FPDI to import a page of an existing PDF document and place it with FPDF onto a newly created page:
<?php
require_once('fpdf.php');
require_once('fpdi.php');
$pdf = new FPDI();
// load the document and get the total page count
$pageCount = $pdf->setSourceFile("the/document/you/want/to/import/a/page/from.pdf");
// import the first page
$tplIdx = $pdf->importPage(1);
// add a page
$pdf->AddPage();
// use the imported page on x=10, y=10 and a width of 90
$pdf->useTemplate($tplIdx, 10, 10, 90);
// output the PDF document
$pdf->Output();
Is it possible to create a thumbnail image from a PDF file using ColdFusion 8? (A thumbnail of a given page, defaulting to page 1.)
Generate thumbnails from pages in a PDF document
<cfpdf
    required
    action = "thumbnail"
    source = "absolute or relative pathname to a PDF file|PDF document variable|cfdocument variable"
    optional
    destination = "directory path where the thumbnail images are written"
    format = "png|jpeg|tiff"
    imagePrefix = "string used as a prefix in the output filename"
    overwrite = "yes|no"
    password = "PDF source file password"
    pages = "page or pages to make into thumbnails"
    resolution = "low|high"
    scale = "percentage between 1 and 100"
    transparent = "yes|no">
http://livedocs.adobe.com/coldfusion/8/htmldocs/Tags_p-q_02.html
http://cfquickdocs.com/cf8/?getDoc=cfpdf#cfpdf
Adobe added quite a lot of support for PDFs in ColdFusion after they took over Macromedia. So you can also print PDFs and manipulate them.
I pieced together some code to insert a dynamic image into a PDF using both ColdFusion and iText, while filling in some form fields as well. After I got it working and blogged about it, I couldn't help but think that there might be a better way to accomplish this. I'm using the basic idea in a production app right now, so any comments or suggestions would be most welcome.
<cfscript>
// full path to PDF you want to add image to
readPDF = expandpath("your.pdf");
// full path to the PDF we will output. Using createUUID() to create
// a unique file name so we can delete it afterwards
writePDF = expandpath("#createUUID()#.pdf");
// full path to the image you want to add
yourimage = expandpath("dynamic_image.jpg");
// JAVA STUFF!!!
// output buffer to write PDF
fileIO = createObject("java","java.io.FileOutputStream").init(writePDF);
// reader to read our PDF
reader = createObject("java","com.lowagie.text.pdf.PdfReader").init(readPDF);
// stamper so we can modify our existing PDF
stamper = createObject("java","com.lowagie.text.pdf.PdfStamper").init(reader, fileIO);
// get the content of our existing PDF
content = stamper.getOverContent(reader.getNumberOfPages());
// create an image object so we can add our dynamic image to our PDF
image = createobject("java", "com.lowagie.text.Image");
// get the form fields
pdfForm = stamper.getAcroFields();
// setting a value to our form field
pdfForm.setField("our_field", "whatever you want to put here");
// initialize our image
img = image.getInstance(yourimage);
// center our image horizontally, with a 50-unit margin from the top of the page
// (PDF coordinates start at the bottom-left corner)
x = (reader.getPageSize(1).width() - img.scaledWidth()) / 2;
y = (reader.getPageSize(1).height() - img.scaledHeight()) - 50;
// now we assign the position to our image (x first, then y)
img.setAbsolutePosition(javacast("float", x),javacast("float", y));
// add our image to the existing PDF
content.addImage(img);
// flatten our form so our values show
stamper.setFormFlattening(true);
// close the stamper and output our new PDF
stamper.close();
// close the reader
reader.close();
</cfscript>
<!--- write out new PDF to the browser --->
<cfcontent type="application/pdf" file = "#writePDF#" deleteFile = "yes">
<cfpdf> + DDX seems possible.
See http://forums.adobe.com/thread/332697
I did it another way with the iText library.
I didn't want to overwrite my existing PDF, only to modify the original by inserting the image, but simply inserting the image with iText didn't work for me.
So I had to insert the image into a blank PDF (http://itextpdf.com/examples/iia.php?id=59)
and then join my original PDF and the new image PDF, obtaining one PDF with several pages
(http://itextpdf.com/examples/iia.php?id=110).
After that you can overlay the PDF pages with this cool concept:
http://itextpdf.com/examples/iia.php?id=113