Extract data from PDF file with VB.Net - vb.net

I'm trying to pull some data from a large PDF file in VB.Net. I found the following code online, but it's not helping:
Sub PrintPDF(strPDFFileName As String)
Dim sAdobeReader As String
'This is the full path to the Adobe Reader or Acrobat application on your computer
sAdobeReader = "C:\Program Files\Adobe\Acrobat 6.0\Reader\AcroRd32.exe"
'Quote both paths (they may contain spaces) and keep /P separate from them
Dim RetVal As Integer = Shell(Chr(34) & sAdobeReader & Chr(34) & " /P " & Chr(34) & strPDFFileName & Chr(34), AppWinStyle.Hide)
End Sub
I'm really lost. Any ideas?

/P will just load the file and display the Print dialog, which is a dead end for extracting data.
You will probably need a library of some sort to get to the contents of the PDF.
If the file has a very simple structure, you may be able to extract the data just by reading the bytes. Try opening it in an editor like Notepad++ to see the contents.
BTW VS2010 has several editors for looking at/editing files:
File, Open File... pick the file then use the dropdown on the Open button.
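As a rough way to check whether reading the bytes is viable, here is a minimal VB.NET sketch. The module name, the DescribePdf helper and the sample path are all hypothetical. It only looks for the %PDF- header and for the /FlateDecode filter name, which indicates compressed streams that will not show up as readable text.

```vbnet
Imports System.IO
Imports System.Text

Module PdfPeek
    ' Hypothetical helper: reports whether a PDF looks searchable as plain text.
    Function DescribePdf(pdfPath As String) As String
        Dim raw As Byte() = File.ReadAllBytes(pdfPath)
        ' Latin-1 maps every byte to exactly one character, so offsets are preserved.
        Dim content As String = Encoding.GetEncoding("iso-8859-1").GetString(raw)

        If Not content.StartsWith("%PDF-", StringComparison.Ordinal) Then
            Return "Not a PDF (missing %PDF- header)"
        End If
        If content.Contains("/FlateDecode") Then
            Return "PDF with Flate-compressed streams; a PDF library is needed"
        End If
        Return "PDF without Flate-compressed streams; a plain-text search may work"
    End Function

    Sub Main()
        ' Assumed sample path; replace with the real file.
        Console.WriteLine(DescribePdf("C:\Temp\sample.pdf"))
    End Sub
End Module
```

If the second message comes back, the page content is compressed and a PDF library is the practical route.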

Related

VBA - Macro - To Print Multiple digital signed PDF file and save it in subfolder by using "Microsoft Print to PDF" Printer

I am printing multiple digitally signed PDF files to PDF via "Microsoft Print to PDF" (so that the documents can be edited). The VBA code below works, but each time it runs it asks for a file name and destination folder for every printed file.
My expectation:
It should take the file name from the existing document and use a destination folder path specified in the VBA code.
Please help me solve this.
Public Sub Print_All_PDF_Files_in_Folder()
Dim folder As String
Dim PDFfilename As String
folder = "C:\Users\Desktop\VBA\" 'CHANGE AS REQUIRED
If Right(folder, 1) <> "\" Then folder = folder & "\"
PDFfilename = Dir(folder & "*.pdf", vbNormal)
While Len(PDFfilename) <> 0
Print_PDF folder & PDFfilename
PDFfilename = Dir() ' Get next matching file
Wend
End Sub
Private Sub Print_PDF(sPDFfile As String)
Shell "C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe /p /h " & Chr(34) & sPDFfile & Chr(34), vbNormalFocus
End Sub
A path that contains spaces must be enclosed in quotes. The /p and /h switches must be separated from the program name. I check it this way:
I run the command in cmd.exe, and once I see that it is correct I rewrite it in the macro.
Private Sub Print_PDF(sPDFfile As String)
Shell "" & Chr(34) & "C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe" & Chr(34) & " /p /h " & Chr(34) & sPDFfile & Chr(34)
End Sub
You seem to have conflicting requirements.
Your command includes the switch that opens the print dialog:
/P <filename> - Open and go straight to the Printer Prompt dialog
With "Microsoft Print to PDF", that lets you make manual changes and then manually save to a folder and file name of your choosing.
However, you say you want Acrobat to save to a known file name without prompting. That makes me question why you are using Acrobat to open a PDF and re-save it under another file name without interaction.
If that is all you need, you could simply rename the PDF without opening it in Acrobat.
One reason to programmatically open a "complex" PDF in Acrobat and reprint it as a "dumber" PDF through "Microsoft Print to PDF" is that reprinting flattens the document, although far less efficiently than dedicated flattening methods. For that you need to use:
/T <filename> <printername> <drivername> <portname> - Print the file on the specified printer.
Here printername and drivername are both "Microsoft Print to PDF", and portname is where you want it printed (the output file).
There are much lighter ways to process a PDF from the command line, but if you already have the heavyweight Adobe Reader installed, this is the de facto standard.
[EDIT] In the comments you imply that you still need to use Acrobat for processing before printing to a fixed name. In that case you need to run those actions first and save the result as a new PDF before printing. The steps are:
get filename
make changes
save changes as new filename
send new filename to printer using:-
"C:\Full path\to\AcroRd32.exe" /T "C:\path to\Input.pdf" "Microsoft Print to PDF" "Microsoft Print to PDF" "C:\path to\Output.pdf"
The problem with batch printing using /T is that Reader behaves like a TSR (Terminate and Stay Resident) program: the window stays open waiting for the next print in the batch. Most users then add /H to hide it, and afterwards complain that the window is not accessible to close at the end of the batch (which simply requires SendKeys %FX or Alt+F4 sent to the open window).
One way round that is to invoke /T without /H on the last print, then use a VB focus command (object.AppActivate title); at its simplest, SendKeys %FX will then close the window.
If you were using the command line or a .cmd file, it would be simple to do this with WScript and a single-line .VBS command; in this case, however, you are already using VBA.

Printing to PDF with correct file path and with correct file name

I have written a small macro that takes a daily Excel report and prints it to a specific printer (printing to PDF). The macro runs, but the final steps are still manual: I have to navigate to the correct file path and click the "Save" button in the dialog that pops up. Is there a way to have it save the file automatically into the correct folder (as seen in the code below)?
Sub printToPDF()
'declare variable for my file path
Dim filePath As String
'declare variable for my file name
Dim fileName As String
fileName = "Operations_Daily_Outage_Report_" & Format(Date, "yyyy-mm-dd")
filePath = "M:\Daily_Outage_Report\Active"
Worksheets("general_report").PageSetup.CenterVertically = False
ActiveWindow.SelectedSheets.PrintOut Copies:=1, ActivePrinter:="Foxit Reader PDF Printer"
End Sub
I think everyone is trying to give you answers that you can try out. I'm not sure why you can't just test it and tell us if it works for you?
If you have a reasonably new version of Excel (from within the last 10 years), then you should be able to use the built-in Office PDF converter.
Change this line:
ActiveWindow.SelectedSheets.PrintOut Copies:=1, ActivePrinter:="Foxit Reader PDF Printer"
To This:
Worksheets("general_report").ExportAsFixedFormat Type:=xlTypePDF, _
FileName:=filePath & "\" & fileName, Quality:=xlQualityStandard
Come back and tell us if it worked for you.

Cannot delete file because it is being used by another process - VB.net

I am converting a TIFF file to a .PNG file in VB.NET. First, I save the TIFF file to a specific location (given by NewFileName). Then I convert the file to .PNG using the Bitmap.Save method. Later in my program, I attempt to delete all the .tif files, but I get an error saying the files are still in use. From my research, this error often comes from not closing a FileStream. However, I do not use a FileStream in my program, so I think it is something else. Another suggested cause was that the file was opened twice. I have scoured my code and I am fairly sure the files were never opened explicitly, only saved and accessed with the Bitmap.Save call. I also downloaded handle.exe and Process Explorer to find which process was locking the files. Apparently, the files are held by my program only after I convert them to PNG using Bitmap.Save. Is there a way to release the file after Bitmap.Save? Any other suggestions would be appreciated as well.
objApplication.StartCommand(SolidEdgeConstants.AssemblyCommandConstants.AssemblyViewBottomView)
view = window.View
view.Fit()
withoutExt = "C:\Folder" & "\" & shortForms(12) & FileName
NewFileName = withoutExt & ".tif"
view.SaveAsImage(NewFileName, Width:=width, Height:=height, Resolution:=resolution, ColorDepth:=colorDepth)
System.Drawing.Bitmap.FromFile(NewFileName).Save(withoutExt & ".png", System.Drawing.Imaging.ImageFormat.Png)
Thanks in advance!
I figured it out. All you need is a using statement so that NewFileName is released before I delete it. I changed the code to this and it worked:
view = window.View
view.Fit()
withoutExt = ChosenFile & "\" & shortForms(13) & FileName
NewFileName = withoutExt & ".tif"
view.SaveAsImage(NewFileName, Width:=width, Height:=height, Resolution:=resolution, ColorDepth:=colorDepth)
Using tempImage = System.Drawing.Bitmap.FromFile(NewFileName)
tempImage.Save(withoutExt & ".png", System.Drawing.Imaging.ImageFormat.Png)
End Using
My.Computer.FileSystem.DeleteFile(NewFileName)
Thanks everyone for your help!

How to open a file from folder where EXE was opened. VB

As part of a program I am making, I need to open a file (for example a txt file) from the folder where the program was opened.
The idea is that it can be zipped up and put anywhere without having to place the file in a certain location.
It's got to be Visual Basic and I will really appreciate some help.
I have googled this but found nothing for VB. I'm relatively new to the language.
Thanks, Jack
To open the file do this:
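Dim fileName As String = "yourfile.txt"
'Use Location (a plain file path); CodeBase returns a file:/// URI rather than a usable folder path
Dim appDir As String = System.IO.Path.GetDirectoryName( _
System.Reflection.Assembly.GetExecutingAssembly().Location)
Process.Start(System.IO.Path.Combine(appDir, fileName))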
You can use this to get the path to the folder where the currently executing assembly (i.e. the EXE) is located:
System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location)
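Another option, sketched here as a small console program, is AppDomain.CurrentDomain.BaseDirectory, which gives the application's base folder without any string slicing. The module name is illustrative and "yourfile.txt" is the same placeholder as above. The sketch reads the file rather than launching it.

```vbnet
Imports System.IO

Module OpenLocalFile
    Sub Main()
        ' Base folder of the running application, independent of the current directory.
        Dim appDir As String = AppDomain.CurrentDomain.BaseDirectory
        ' "yourfile.txt" is a placeholder file name.
        Dim filePath As String = Path.Combine(appDir, "yourfile.txt")

        If File.Exists(filePath) Then
            Console.WriteLine(File.ReadAllText(filePath))
        Else
            Console.WriteLine("File not found: " & filePath)
        End If
    End Sub
End Module
```

Path.Combine takes care of the separator, so it does not matter whether the folder string ends with a backslash.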

Printing to a pdf printer programmatically

I am trying to print an existing file to PDF programmatically in Visual Basic 2008.
Our current relevant assets are:
Visual Studio 2008 Professional
Adobe Acrobat Professional 8.0
I thought about getting an SDK like iTextSharp, but it seems like overkill for what I am trying to do, especially since we have the full version of Adobe Acrobat.
Is there a relatively simple bit of code to print to a PDF printer (and of course direct the output to a specific location), or will it require another library?
I want to print a previously created document to a PDF file. In this case it is a .snp file that I want to make into a .pdf file, but I think the logic would be the same for any file type.
I just tried the ShellExecute approach suggested above, and it does not do what I need: it prompts me for where to print, and it still does not put the output where I want it. Printing to multiple locations is crucial, because we create a lot of identically named PDF files (with different data inside) and place them in the corresponding client folders.
The current process is:
Go to \\report server\client1
create pdf files of all the snp documents in the folder by hand
copy the pdf to \\website reports\client1
then repeat for all 100+ clients, which takes roughly two hours to complete and verify
I know this can be done better but I have only been here three months and there were other pressing concerns that were a lot more immediate. I also was not expecting something that looks this trivial to be that hard to code.
The big takeaway point here is that PDF IS HARD. If there is anything you can do to avoid creating or editing PDF documents directly, I strongly advise that you do so. It sounds like what you actually want is a batch SNP to PDF converter. You can probably do this with an off-the-shelf product, without even opening Visual Studio at all. Somebody mentioned Adobe Distiller Server. Check your docs for Acrobat: I know it comes with basic Distiller, and you may be able to set up Distiller to run in a similar mode, where it watches Directory A and writes PDF versions of any files that show up there to Directory B.
An alternative: since you're working with Access snapshots, you might be better off writing a VBA script that iterates through all the SNPs in a directory and prints them to the installed PDF printer.
ETA: if you need to specify the output of the PDF printer, that might be harder. I'd suggest having the PDF distiller configured to output to a temp directory, so you can print one, move the result, then print another, and so on.
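The "print one, move the result, then print another" idea could be sketched in VB.NET roughly as follows. MoveWhenReady, the polling interval and the paths in Main are all assumptions made for illustration. The helper waits until the PDF printer has released the temp file and then moves it to its final location.

```vbnet
Imports System.IO
Imports System.Threading

Module MovePrintedPdf
    ' Waits until tempFile exists and is no longer locked, then moves it to destFile.
    ' Returns False if that does not happen within timeoutMs milliseconds.
    Function MoveWhenReady(tempFile As String, destFile As String, timeoutMs As Integer) As Boolean
        Dim waited As Integer = 0
        Do While waited < timeoutMs
            If File.Exists(tempFile) Then
                Try
                    ' Opening with FileShare.None fails while the printer still holds the file.
                    Using probe As New FileStream(tempFile, FileMode.Open, FileAccess.Read, FileShare.None)
                    End Using
                    If File.Exists(destFile) Then File.Delete(destFile)
                    File.Move(tempFile, destFile)
                    Return True
                Catch ex As IOException
                    ' Still being written; retry after the pause below.
                End Try
            End If
            Thread.Sleep(250)
            waited += 250
        Loop
        Return False
    End Function

    Sub Main()
        ' Hypothetical paths: the PDF printer's temp output and a client folder.
        Dim ok As Boolean = MoveWhenReady("C:\Temp\pdf\report.pdf", "C:\Reports\client1\report.pdf", 30000)
        Console.WriteLine(If(ok, "Moved", "Timed out waiting for the PDF"))
    End Sub
End Module
```

Calling this after each print job keeps identically named outputs from overwriting one another in the temp directory.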
This is how I do it in VBA (Access). It might not be directly usable for you, but it might get you started. You need to have a PDF maker (Adobe Acrobat) installed as a printer named "Adobe PDF".
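'Module-level constants used by the routines below
Const PDF_WILDCARD As String = "*.pdf"
Const PDF_EXT As String = ".pdf" 'assumed value; not defined in the original snippet
Const PrnName As String = "Adobe PDF"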
Sub PrintToPDF(ReportName As String, TempPath As String, _
OutputName As String, OutputDir As String, _
Optional RPTOrientation As Integer = 1)
Dim rpt As Report
Dim NewFileName As String, TempFileName As String
'--- Printer Set Up ---
DoCmd.OpenReport ReportName, View:=acViewPreview, WindowMode:=acHidden
Set rpt = Reports(ReportName)
Set rpt.Printer = Application.Printers(PrnName)
'Set up orientation
If RPTOrientation = 1 Then
rpt.Printer.Orientation = acPRORPortrait
Else
rpt.Printer.Orientation = acPRORLandscape
End If
'--- Print ---
'Print (open) and close the actual report without saving changes
DoCmd.OpenReport ReportName, View:=acViewNormal, WindowMode:=acHidden
' Wait until file is fully created
Call waitForFile(TempPath, ReportName & PDF_EXT)
'DoCmd.Close acReport, ReportName, acSaveNo
DoCmd.Close acReport, ReportName
TempFileName = TempPath & ReportName & PDF_EXT 'default pdf file name
NewFileName = OutputDir & OutputName & PDF_EXT 'new file name
'Trap errors caused by COM interface
On Error GoTo Err_File
FileCopy TempFileName, NewFileName
'Delete all PDFs in the TempPath
'(which is why you should assign it to a pdf directory)
On Error GoTo Err_File
Kill TempPath & PDF_WILDCARD
Exit_pdfTest:
Set rpt = Nothing
Exit Sub
Err_File: ' Error-handling routine while copying file
Select Case Err.Number ' Evaluate error number.
Case 53, 70 ' "File not found" and "Permission denied" errors
' The PDF may not be fully written yet: log, wait 2 seconds, then retry.
Debug.Print "Error " & Err.Number & ": " & Err.Description
Call sleep(2, False)
Resume
Case Else
MsgBox Err.Number & ": " & Err.Description
Resume Exit_pdfTest
End Select
Resume
End Sub
Sub waitForFile(ByVal pathName As String, ByVal tempfile As String)
' Application.FileSearch was removed in Office 2007, so poll with Dir instead.
' Assumes pathName ends with a backslash, as TempPath does in PrintToPDF.
Do While Len(Dir(pathName & tempfile)) = 0
DoEvents ' keep the application responsive while waiting
Loop
End Sub
Public Sub sleep(seconds As Single, EventEnable As Boolean)
On Error GoTo errSleep
Dim oldTimer As Single
oldTimer = Timer
Do While (Timer - oldTimer) < seconds
If EventEnable Then DoEvents
Loop
errSleep:
Err.Clear
End Sub
PDFforge offers PDFCreator. It will create PDFs from any program that is able to print, even existing programs. Note that it's based on GhostScript, so maybe not a good fit to your Acrobat license.
Have you looked into Adobe Distiller Server ? You can generate PostScript files using any printer driver and have it translated into PDF. (Actually, PDFCreator does a similar thing.)
What you want to do is find a good free PDF printer driver. These are installed as printers, but instead of printing to a physical device, they render the printer commands as a PDF. Then you can either use ShellExecute as stated above, or use the built-in .NET PrintDocument, referring to the PDF "printer" by name. I found a couple of free ones pretty quickly, including products from Primo and BullZip (free for up to 10 users).
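A minimal sketch of the PrintDocument route, assuming a printer named "Microsoft Print to PDF" is installed and that its driver honors PrintToFile and PrintFileName (this is driver-specific, so treat it as something to test rather than a guarantee). The output path and the drawn text are placeholders.

```vbnet
Imports System.Drawing
Imports System.Drawing.Printing

Module PrintToPdfPrinter
    Sub Main()
        Dim doc As New PrintDocument()
        ' Assumed printer name; it must match an installed PDF printer exactly.
        doc.PrinterSettings.PrinterName = "Microsoft Print to PDF"
        If Not doc.PrinterSettings.IsValid Then
            Console.WriteLine("Printer not installed: " & doc.PrinterSettings.PrinterName)
            Return
        End If

        ' Ask the driver to write straight to a file; the path is hypothetical.
        doc.PrinterSettings.PrintToFile = True
        doc.PrinterSettings.PrintFileName = "C:\Temp\output.pdf"
        ' StandardPrintController prints without showing a status dialog.
        doc.PrintController = New StandardPrintController()

        AddHandler doc.PrintPage,
            Sub(sender As Object, e As PrintPageEventArgs)
                Using f As New Font("Arial", 12)
                    e.Graphics.DrawString("Hello from PrintDocument", f, Brushes.Black, 50, 50)
                End Using
                e.HasMorePages = False ' single-page document
            End Sub

        doc.Print()
    End Sub
End Module
```

This only covers content your own program draws in the PrintPage handler. Printing an existing file such as an .snp still needs an application that can render it.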
It looks like SNP files are Microsoft Access Snapshots. You will have to look for a command line interface to either Access or the Snapshot Viewer that will let you specify the printer destination.
I also saw that there is an ActiveX control included in the SnapshotViewer download. You could try using that in your program to load the snp file, and then tell it where to print it to, if it supports that functionality.
I had the same challenge. The solution I've made was buying a component called PDFTron. It has an API to send pdf documents to a printer from an unattended service.
I posted some information in my blog about that. Take a look!
How to print a PDF file programmatically???
Try using ShellExecute with the Print Verb.
Here is a blog I found with Google.
http://www.vbforums.com/showthread.php?t=508684
If you are trying to generate the PDF by hand (rather than with an SDK or a PDF printer driver), it's not very easy. The PDF format reference is available from Adobe.
The problem is that the file is a mix of ASCII and tables that have binary offsets within the file to reference objects. It is an interesting format, and very extensible, but it is difficult to write a simple file.
It's doable if you need to. I looked at the examples in the Adobe PDF reference, hand typed them in and worked them over till I could get them to work as I needed. If you will be doing this a lot it might be worth it, otherwise look at an SDK.
I encountered a similar problem in a C# ASP.NET app. My solution was to fire a LaTeX compiler at the command line with some generated code. It's not exactly a simple solution but it generates some really beautiful .pdfs.
Imports System.Drawing.Printing
Imports System.Reflection
Imports System.Runtime.InteropServices
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim pkInstalledPrinters As String
' Find all printers installed
For Each pkInstalledPrinters In _
PrinterSettings.InstalledPrinters
printList.Items.Add(pkInstalledPrinters)
Next pkInstalledPrinters
' Set the combo to the first printer in the list
If printList.Items.Count > 0 Then
printList.SelectedIndex = 0
End If
End Sub
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Try
Dim pathToExecutable As String = "AcroRd32.exe"
Dim sReport = " " 'pdf file that you want to print
'Dim SPrinter = "HP9F77AW (HP Officejet 7610 series)" 'Name Of printer
Dim SPrinter As String
SPrinter = printList.SelectedItem
'MessageBox.Show(SPrinter)
Dim starter As New ProcessStartInfo(pathToExecutable, "/t """ + sReport + """ """ + SPrinter + """")
Dim Process As New Process()
Process.StartInfo = starter
Process.Start()
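'Kill only if Reader is still running after 10 seconds; Kill throws if it has already exited
If Not Process.WaitForExit(10000) Then Process.Kill()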
Process.Close()
Catch ex As Exception
MessageBox.Show(ex.Message) 'show the error so that it can be investigated
End Try
End Sub
End Class
Similar to other answers, but much simpler. I finally got it down to 4 lines of code, no external libraries (although you must have Adobe Acrobat installed and configured as Default for PDF).
Dim psi As New ProcessStartInfo
psi.FileName = "C:\Users\User\file_to_print.pdf"
psi.Verb = "print"
Process.Start(psi)
This will open the file, print it with default settings and then close.
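Since this depends on a "print" verb being registered for .pdf files, a slightly more defensive sketch checks ProcessStartInfo.Verbs first. The module name is illustrative and the path is the same placeholder as above.

```vbnet
Imports System.Diagnostics

Module PrintWithVerb
    Sub Main()
        ' Placeholder path to the PDF that should be printed.
        Dim psi As New ProcessStartInfo("C:\Users\User\file_to_print.pdf")
        ' Verbs are only honored when the shell launches the file.
        psi.UseShellExecute = True

        ' Verbs lists the shell verbs registered for this file's extension.
        Dim canPrint As Boolean = Array.Exists(psi.Verbs,
            Function(v) String.Equals(v, "print", StringComparison.OrdinalIgnoreCase))

        If canPrint Then
            psi.Verb = "print"
            Process.Start(psi)
        Else
            Console.WriteLine("No 'print' verb is registered for .pdf on this machine.")
        End If
    End Sub
End Module
```

Setting UseShellExecute explicitly matters on newer .NET versions, where it defaults to False.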
Adapted from this C# answer