pyPDF2 error - PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly - pdf

I am a Python newbie and wrote a script about a year ago to retrieve PDF files and merge them into a single file/book. The script works and does what I need it to do, but lately, now that I am no longer the only user, some files seem to be causing it to crash. My suspicion is that newer PDF files with higher-resolution images, forms, etc. may be more than PyPDF2 can handle, but I don't know that for sure. It would help to know which file(s) are causing the issue, so that I could scan those files instead of using the native PDF, but I can't figure that out because the error only occurs when all the files are written out. Below is the snippet of my code that I use for merging them, along with the error. Is PyPDF2 still being updated? Is there a better-supported or commercial utility out there that can do this? Any suggestions and help would be greatly appreciated!
for f in Submittal_Files:
    h_path, f_name = os.path.split(f)
    outfile.append(open(f, 'rb'))
outfile.write(open(Final_Submittal_File_Name + ".pdf", 'wb'))
print("\nSUCCESSFULLY COMPLETED!")
print("PRESS ENTER TO END!")
program_holder = input()
break
Sorry about the long error log before. I won't be able to get the files until Monday 7/23 when I will have access to them. I will provide 2 examples.
LOG:
https://www.dropbox.com/s/ymbe1tnak7uuvs7/PDF-Merge-Crash-log.txt?dl=0
File Set #1 - Large Submittal - I had to delete some files to protect the innocent. They are mostly cover pages for sections and I know those are not an issue as they are used all the time.
https://www.dropbox.com/s/xavrpfxmo6dr7mb/Case%20%231%20-%20JUST%20PDFs.rar?dl=0
File Set #2 - Smaller Submittal - Same as above.
https://www.dropbox.com/s/2lxk8bts0w2qsx8/Case%20%232%20-%20JUST%20PDFS.rar?dl=0
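One way to narrow down which file triggers the error, offered as a rough sketch rather than a tested fix: run each input through its own merge-and-write into an in-memory buffer and record the ones that raise. The names `find_problem_files` and `check_with_pypdf2` are made up for this example, and the `PdfFileMerger` class name is an assumption that matches older PyPDF2 releases; newer releases renamed it, so adjust the import to whatever your installed version provides.

```python
import io


def find_problem_files(paths, check):
    """Call check(path) for every path and collect the ones that raise.

    Returns a list of (path, "ErrorType: message") tuples.
    """
    problems = []
    for path in paths:
        try:
            check(path)
        except Exception as exc:  # deliberately broad: we only want to report
            problems.append((path, f"{type(exc).__name__}: {exc}"))
    return problems


def check_with_pypdf2(path):
    """Merge a single file and write it to memory, so a failure points at that file.

    Assumption: an older PyPDF2 release that still exposes PdfFileMerger.
    """
    from PyPDF2 import PdfFileMerger  # imported lazily; only needed for this check

    merger = PdfFileMerger()
    with open(path, "rb") as handle:
        merger.append(handle)
        # The input must stay open until the write has finished.
        merger.write(io.BytesIO())
```

Calling `find_problem_files(Submittal_Files, check_with_pypdf2)` before the real merge should list the suspect files along with the exception each one raised, which can then be scanned or re-saved instead of merged as-is.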

Related

Problem when trying to read EXCEL after implementing OFFICE 365 "Confidentiality Label"

I have an ETL routine in Pentaho and I'm migrating it to Apache Hop.
However, I ran into a problem: the Hop step/plugin "Microsoft Excel Input" cannot read the data until I open the Excel file and confirm the "Add Confidentiality Label" prompt.
This problem does not occur in Pentaho PDI. Does anyone have any tips?
After clicking and adding a confidentiality label like "public" for example and saving and closing the file, the process works perfectly.
Note: This only happens with some files.
This sounds like a problem that will not have a clear and direct answer and will require some changes in the code.
The code for Apache Hop is managed on GitHub.
You can create an issue there and one of the developers will help you get this sorted out. When creating a ticket, please be as specific as you can and add a sample file; that will improve the chances of getting a fix on short notice.

Somehow send command line commands on windows externally and get back the response

Problem: Need to convert local HTML (with local images etc.) to PDF from an AIX box running UniVerse 11.2.5 with System Builder
Current solution: FTP the HTML file over to a Windows server, which converts in batches and sends the e-mail to the destination
Proposed solution: Do everything on the AIX box, from converting HTML to PDF to sending the e-mail.
Current problem: Unable to find a way to convert local HTML to PDF on the AIX box. I have tried many different approaches, including trying to install Python 3, but to no avail.
The only really difficult part of the process is getting the HTML to render into a format that will properly lay out your HTML in pages suitable for printing. There is a fair amount of magic that goes on between the HTTP GET and clicking print in a browser window that needs to be accounted for.
I was trying to accomplish something similar many moons ago on AIX but ran into a skill level/time wall, because I was essentially going to have to create a headless browser to render the HTML. It looks like there are now some utilities that you might be able to leverage. I found this recently updated article on Super User that actually got me somewhat excited, especially since I don't use AIX anymore, so precompiled binaries and well-understood, easily attainable dependencies are something I can actually have in my life.
https://superuser.com/questions/280552/how-can-i-render-a-website-as-an-image-from-the-shell
Good Luck.
There seem to be several questions rolled into this one item.
Converting HTML to PDF is just data manipulation that you could do in BASIC, but writing such code would be a large task. The option you use now, sending it to another system, is valid, but it puts more points of failure into the system. I would think you could find code to do it on the AIX box.
Rocket plans on getting MV Python to work on AIX. That will make converting HTML to PDF much easier, since there are a lot of open source modules.
As for my suggestion of using sockets, that applies if you intend to send the HTML to a service that takes it and returns the PDF document.
See, for example: Is there a web service for converting HTML to PDF?
Once you have the PDF document, you can either store it in a UniVerse type-19 file, or base64-encode it and store it in a UniVerse hash file.
Hope this helps,
Mike
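To make the service idea above a little more concrete, here is a minimal Python sketch, assuming Python is available on whichever machine performs this step. The conversion service and its URL are hypothetical (no particular service is implied), and the function names are invented for illustration. It shows building the POST that carries the HTML, reading back the PDF bytes, and base64-encoding them so they can be stored as text.

```python
import base64
import urllib.request


def build_convert_request(html_text, service_url):
    """Build (but do not send) a POST request carrying the HTML.

    service_url is a hypothetical HTML-to-PDF endpoint.
    """
    return urllib.request.Request(
        service_url,
        data=html_text.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def fetch_pdf(request, timeout=60):
    """Send the request and return the raw response body (the PDF bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def pdf_to_base64(pdf_bytes):
    """Encode PDF bytes as ASCII text, e.g. for storing in a hashed file."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def base64_to_pdf(text):
    """Reverse of pdf_to_base64."""
    return base64.b64decode(text)
```

A real service will have its own request format and authentication, so treat the request shape here as a placeholder to adapt.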

VB.net: Is there a way to get the Printed File Path from Printform?

I'm working on my first big program. I will try to explain briefly how the important part of the program works, and then explain what my problem is.
My program is used by a few people who drive around Europe and repair our machines. After the work they start my program and write a report. Until now the program generated three files at the end: a PDF file generated by PrintForm, a text file containing the same information again, and an Excel file containing the data that was entered in the DataGridView.
The workers used to send all three files separately by email. As you can imagine, that sometimes went badly: after work they are tired, and sometimes they sent the wrong files. So I made an upgrade that lets the user send the files directly from the program, so everything is sure to be correct. In the background I created a directory where two of the three files are always saved. The problem is that PrintForm opens a window where the user can select the path. This is where the trouble starts: some of the workers select a different path, and then my program can't find the file again (it's very important that all three files are together). I searched for something that would look like
dim printformpath as string = printform.getpath
Is there something that works that way? I searched but didn't find anything helpful.
Thank you for your understanding and help.
Thanks, I added the path to the PrintFileName property and changed the PrintForm setting from print to preview to print to file :)
Have a nice day

site moved to a new server and now pdfs wont download

I had a Joomla 1.5 site running for a couple of years on a Linux cPanel host, and everything was fine. Last week we moved it to a new Linux server and now we have a strange problem. The general navigation etc. works fine, but linking to PDFs seems to have gone "random". For example, there are 4 PDF links on this page, http://www.coinstreet.org/spacehire/conferencemeetings.html, and they all have different links. However, for 3 of them the same PDF is downloaded (despite the different links), and one doesn't work at all and returns a 406 error.
The new install was done by taking a straight cPanel backup and then re-installing it. All other functionality seems to be fine.
I am at a bit of a loss - so any suggestions would be gratefully received!
PS Just noticed that I see a lot of lines like this in the error logs
[Sat Mar 31 14:50:10 2012] [error] [client 65.92.86.225] File does not exist: /data03/c9566644/public_html/406.shtml, referer: http://www.coinstreet.org/images/stories/coinstreet/JDPS%20Childcare%20Assistant.pdf
don't know if that's relevant?
UPDATE
I created a test link to fred.pdf, which DIDN'T exist on the server. When the link is clicked, a PDF is downloaded rather than the expected error. It turns out that several PDFs are missing, and that this same file is downloaded for each of them. When I put a fred.pdf on the server, the test link worked as expected. So, where is this "wrong" PDF coming from...
Another update
I have now discovered that the same problem was happening on the old server too... oh boy!
Also, the mystery PDF that downloads is actually part of one of the articles on the site, turned into a PDF on the fly. If I unpublish that article, I just get a blank PDF. Time to look closely at the .htaccess file, methinks... anyone else got any thoughts?
I believe you have the filename typed in wrongly!
The link you have on your site:
http://www.coinstreet.org/images/stories/coinstreet/Meeting%20Space%20User%20Special%20Requirements%20Policy3.pdf
The link I used to see the PDF successfully:
http://www.coinstreet.org/images/stories/coinstreet/Meeting%20Space%20User%20Special%20Requirements%20Policy(3).pdf
Notice the ( ) around the 3
I think you are seeing the results of two different effects here:
a) Some form of incorrect encoding when the files were uploaded: clearly the spaces and extended characters in the file name are being URL-encoded prior to saving. I've seen FTP programs do this and I've also seen Joomla do this periodically. I don't know whether cPanel backups sometimes do this too. The best advice I can give is to always rename files to remove spaces (swapping them for hyphens) and to strip brackets, ampersands and exclamation marks (bangs). Train users to do likewise to avoid these issues, and check the error logs periodically to pick up on anything you miss or that changes.
b) With your requests for the files not matching the actual file names, you would expect to see 404 errors, but sometimes you get a file, though not always the right one. My guess is that you have the Apache module that attempts to match files based on misspellings enabled; sorry, I can't recall the exact name offhand. A quick search shows mod_speling (the misspelling of "spelling" is deliberate) is the most likely contender.
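The renaming advice in (a) can be sketched in a few lines of Python. This is only an illustration of that rule (spaces become hyphens; brackets, ampersands and exclamation marks are dropped); the function name `sanitize_filename` is made up for the example, and any links that point at the old names would still need updating separately.

```python
import re


def sanitize_filename(name):
    """Return a web-friendlier version of a file name.

    Spaces are swapped for hyphens; brackets, ampersands and
    exclamation marks are stripped; runs of hyphens are collapsed.
    """
    name = name.replace(" ", "-")
    name = re.sub(r"[()\[\]{}&!]", "", name)
    name = re.sub(r"-{2,}", "-", name)
    return name
```

For instance, the file from this thread, `Meeting Space User Special Requirements Policy(3).pdf`, would come out as `Meeting-Space-User-Special-Requirements-Policy3.pdf`.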

Word error when processing multiple documents

Right now we have a program that opens all of the Word files in a folder and adds some information to the footer. After switching from Windows XP and Word 2003 to Windows 7 and Word 2007, we started getting the following error:
Run-time error '5096':
EOALPHABETICARABICARABICABJADARABICALPHABAHTTEXTCAPSCA
(c:\Users...\Content.MSO\8BE508C6.docx)
It occurs at random in any of the files (you can be 10 files in or 100). I thought that it might be a problem with the files being on a network drive, so I modified the program to copy each file locally first, add the footer to that copy, and then copy it back out to the network. However, that didn't seem to work. I'm looking for any suggestions on how to fix the problem or possibly rewrite the solution. However, I need to keep the solution in VBA, since this app is part of a bigger suite which is not ready for a .NET rewrite.
As it turns out, I was able to work around this problem by copying the Word files locally, running the main code on the local copies, then copying them back out to the network. It didn't seem to be working at first because the user's machine on which I tested it did not have the most recent version of the program.
Adding this note for those still looking for answers to this error...
I have a macro that creates several docs from one that is open based on sections in the main document.
I was getting the Run-time error '5096' and was unable to find any solutions that worked.
Upon further examination while debugging the code, I discovered that when pulling data out of the main doc and building a file name, a carriage return ended up after the file name and before the extension. After adding a replace statement to remove carriage returns, I am able to process documents all day without an error.
I have found that the error is caused by the total number of characters in the file name being too long. This usually occurs when manipulating file names with the full path set in the variable. I will manually chop pieces out of really long names to force it to work.
I suppose if I were smart, I'd move into the directory and just deal with the file names without the directory prepended to the name.
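The two causes reported above (a stray carriage return inside the built file name, and an overly long full path) can both be checked before the save step. The thread is about VBA, so the Python below is only a language-neutral sketch of the checks; the function names are invented, and the `limit` default is an assumed threshold, not a documented Word value.

```python
import os


def clean_file_name(raw_name):
    """Strip carriage returns, line feeds and surrounding whitespace
    from a file name that was built out of document text."""
    return raw_name.replace("\r", "").replace("\n", "").strip()


def path_too_long(folder, file_name, limit=255):
    """Report whether folder + file name exceeds an assumed length limit.

    limit is a placeholder; tune it to whatever length actually
    triggers the error in your environment.
    """
    return len(os.path.join(folder, file_name)) > limit
```

The same idea in VBA would be a `Replace` call on the name plus a `Len` check on the full path before saving.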