VBA in Excel to Download all PDFs from Webpage - vba

I am needing to download all the PDFs from a webpage and save them into a folder. The PDFs on the webpage are downloaded via different links. Here is the webpage that the PDFs are located: NRCS Engineering Manuals and Handbooks | NRCS North Dakota. I have already created the folder location as shown below:
Function FileFolderExists(strFullPath As String) As Boolean
'Macro Purpose: Check if a folder exists
On Error GoTo EarlyExit
If Not Dir(strFullPath, vbDirectory) = vbNullString Then FileFolderExists = True
EarlyExit:
On Error GoTo 0
End Function
'Creates file folder for saving imported precipitation data
'Used as Micro for "Make Destination Folder" Button
Sub Make_Folder()
Dim objFSO As Object
Dim objFolder As Object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the folder object associated with the directory
Set objFolder = objFSO.GetFolder(CurDir())
Range("u11").Select
Selection.ClearContents
' Opens windows explorer for creation of folder to save .pdf files
If Len(Dir(CurDir() & "\Stockwater PDFs", vbDirectory)) = 0 Then MkDir(CurDir() & "\Stockwater PDFs")
If FileFolderExists(CurDir() & "\Stockwater PDFs") Then
MsgBox "Folder Created Sucessfully!!!"
Else
MsgBox "Folder does not exist!"
End If
If FileFolderExists(CurDir() & "\Stockwater PDFs") Then
ActiveSheet.Range("u11").Value = "Stockwater PDFs folder made in the " & objFolder.Name
End If
End Sub
Once the files are downloaded from the website (the first part of the code below that I need help with) I would then list what files are located within the folder they were saved into (which I have already):
Sub GetWebPageDocs()
' Erases all listed files shown to be located in the CurDir()\Stockwater PDFs folder
Range("n17:n50").Select
Selection.ClearContents
Range("n16").Select
' Lists current files located in the CurDir()\Stockwater PDFs folder
Dim objFSO As Object
Dim objFolder As Object
Dim objFile As Object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the folder object associated with the directory
Set objFolder = objFSO.GetFolder(CurDir() & "\Stockwater PDFs")
irow = 17
icolumn = 14
ActiveSheet.Range("N16").Value = "The files found in the " & objFolder.Name & " folder are:"
'Loop through the Files collection
For Each objFile In objFolder.Files
ActiveSheet.Cells(irow, icolumn).Value = objFile.Name
irow = irow + 1
icolumn = icolumn
Next
'Clean up!
Set objFolder = Nothing
Set objFile = Nothing
Set objFSO = Nothing
End Sub
If it would be easier to have all the desired PDFs located within a ZIP file that was a single link on the website then that may be the best option.
Thanks in advance for the help.
I have tried the following but get a compile error on Dim xHTTP As MSXML2.XMLHTTP
Dim sUrl As String
Dim xHttp As MSXML2.XMLHTTP
Dim hDoc As MSHTML.HTMLDocument
Dim hAnchor As MSHTML.HTMLAnchorElement
Dim Ret As Long
Dim sPath As String
Dim i As Long
sPath = CurDir() & "\Stockwater PDFs"
sUrl = "http://www.nrcs.usda.gov/wps/portal/nrcs/detail/nd/technical/engineering/?cid=stelprdb1269591"
'Get the directory listing
Set xHttp = New MSXML2.XMLHTTP
xHttp.Open "GET", sUrl
xHttp.send
'Wait for the page to load
Do Until xHttp.readyState = 4
DoEvents
Loop
'Put the page in an HTML document
Set hDoc = New MSHTML.HTMLDocument
hDoc.body.innerHTML = xHttp.responseText
'Loop through the hyperlinks on the directory listing
For i = 0 To hDoc.getElementsByTagName("a").Length - 1
Set hAnchor = hDoc.getElementsByTagName("a").Item(i)
'test the pathname to see if it matches your pattern
If hAnchor.pathname Like "*.pdf" Then
Ret = URLDownloadToFile(0, sUrl & hAnchor.pathname, sPath & hAnchor.pathname, 0, 0)
If Ret = 0 Then
Debug.Print sUrl & hAnchor.pathname & " downloaded to " & sPath
Else
Debug.Print sUrl & hAnchor.pathname & " not downloaded"
End If
End If
Next i

Related

Move File into specified Parent Folder, if format of files are different make child folders as well

Respected Experts,
Today I have received a wonderful code which matches the folders name and move the files in them accordingly. However, i have forgotten to mention that it also requires to create a child folders within the folder if the files which moved are of different formats. there are 4 different formats of files (XML, PDF, RAR ZIP) i.e. If 100 files been moved into a Folder Name "Robert Davidson" and if out of 100 the 50 files are of XML and 50 files are PDF format then it automatically create 2 more child folders named XML and PDF within Robert Davidson and moved the files in them accordingly. I just have a request if anyone can amend the code. I hope I have clarified the question :)
Sub moveFilesToFolder()
Dim lRow As Long
Dim wb As Workbook
Dim ws As Worksheet
Dim ccell As Range
Dim fsO As Object, oFolder As Object, oFile As Object
Dim pathFiles As String, sFolderPath As String, sSource As String, sDestination As String
Set wb = ActiveWorkbook
lRow = Range("A" & Rows.Count).End(xlUp).Row
pathFiles = "E:\Archiving system\" 'could be gotten from wb technically
Set fsO = CreateObject("Scripting.FileSystemObject")
Set oFolder = fsO.GetFolder(pathFiles)
For Each oFile In oFolder.Files 'go through all the files
For Each ccell In Range("A2:A" & lRow).Cells 'go through all the folder-names
'Debug.Print ccell.Value2
'Debug.Print oFile.Name
If InStr(oFile.Name, ccell.Value2) > 0 Then 'if folder name is in file name
sFolderPath = wb.Path & "\" & ccell.Value2 & "\"
If Dir(sFolderPath, vbDirectory) <> "" Then 'if Folder exists
sDestination = sFolderPath & oFile.Name
If Dir(sDestination) = "" Then 'file doesn't exist yet
sSource = pathFiles & oFile.Name
'Debug.Print sSource
'Debug.Print sDestination
Call fsO.MoveFile(pathFiles & oFile.Name, sFolderPath & oFile.Name)
GoTo Skip
End If
Else
MsgBox ("Folder " & ccell.Value2 & " doesn't exist yet")
End If
End If
Next ccell
Skip:
Next oFile
End Sub
Thanks in Advance

VBA: Replace hyperlinks in Word files within every subfolder in a given directory

Recently, my organization changed the name of our One Drive causing hyperlinks in my documents linking to other files in the One Drive to become obsolete.
I am trying to create a VBA script that will ask the user for a directory, go through each word doc in the folder and subfolder (and subfolder's subfolder, etc...) and replace a section of all the hyperlinks. I am new to VBA so please, don't hesitate to point out anything.
So far, this is what I have but I can't get it to change what's in the file, let alone go through all the subfolders:
Sub getDirectory()
Dim xFd As FileDialog
Dim xFdItem As Variant
Dim xFileName As String
Dim FileSystem As Object
Dim HostFolder As String
Set xFd = Application.FileDialog(msoFileDialogFolderPicker)
'Get the directory from user
If xFd.Show = -1 Then
xFdItem = xFd.SelectedItems(1) & Application.PathSeparator
xFileName = Dir(xFdItem & "*.doc*")
End If
'convert directory into string
With xFd
.Filters.Clear
.AllowMultiSelect = False
.Show
Path = .SelectedItems(1)
End With
Set FileSystem = CreateObject("Scripting.FileSystemObject")
DoFolder FileSystem.GetFolder(Path)
End Sub
Sub DoFolder(Folder)
'loop through folders after calling LoopThroughFiles
Dim SubFolder
Dim File
For Each SubFolder In Folder.SubFolders
DoFolder SubFolder
Next
For Each File In Folder.Files
' Operate on each file
Call LoopThroughFiles
Next
End Sub
Sub LoopThroughFiles()
Dim xFd As FileDialog
Dim xFdItem As Variant
Dim xFileName As String
Dim h As Hyperlink
Dim sOld As String
Dim sNew As String
Set xFd = Application.FileDialog(msoFileDialogFolderPicker)
Do While xFileName <> ""
With Documents.Open(xFdItem & xFileName)
Application.ScreenUpdating = False
'Replace this:
sOld = "AAAA"
'With this:
sNew = "BBBB"
'Example: C:\Users\Me\AAAA\file.doc ---> C:\Users\Me\BBBB\file.doc
'replace hyperlinks
For Each h In ActiveDocument.Hyperlinks
h.Address = Replace(h.Address, sOld, sNew)
Next h
Application.ScreenUpdating = True
End With
xFileName = Dir
Loop
End Sub

Query user to choose path

I have a code that reads a folder contents (only other folders) and lists them into excel in a certain range.
The problem is that the path where the code reads contents (/CtrExtrase) is given in the code.
I need the path to be choosen by the user. Tried and failed totally.
My code:
Sub distribuire_foldere()
Dim objFSO As Object
Dim objFolder As Object
Dim objSubFolder As Object
Dim i As Integer
'CLEARS ALL PREVIOUS CONTENT
Sheets("DISTRIBUIRE foldere").Range("A2:A2000").ClearContents
'INSERTS IN CELL THE PATH WHERE THE SCRIPT IS READING
Sheets("DISTRIBUIRE foldere").Range("$E$1").Value = ThisWorkbook.path & "\CtrExtrase"
' LISTS THE CONTENT OF THE CHOOSEN FOLDER
Application.StatusBar = ""
'Create an instance of the FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the folder object
On Error GoTo nuexistafolderul
'THIS IS THE PROBLEM, AS I NEED THE USER TO CHOOSE THE PATH:
Set objFolder = objFSO.GetFolder(Application.ThisWorkbook.path & "\CtrExtrase")
i = 1
'loops through each folder in the directory and prints their names
On Error GoTo handleCancel
Application.EnableCancelKey = xlErrorHandler
For Each objSubFolder In objFolder.subfolders
Application.StatusBar = objSubFolder.path & " " & objSubFolder.Name
'OUTPUTS THE FOLDERS NAME
Cells(i + 1, 1) = objSubFolder.Name
i = i + 1
Next objSubFolder
handleCancel:
If Err = 18 Then
MsgBox "Ai anulat procesul inainte de finalizare! Reia procedura!"
nuexistafolderul:
MsgBox "Nu exista folderul pentru extractia contractelor! Extrage intai contractele!"
End If
'CALLS A MODULE THAT INSERTS CERTAIN TEXT INTO A BATCH FILE
Call Module1.batchfile2
End Sub
Use FileDialog with FolderPicker, here it's wrap in a function :
Function GetFolder(Optional strPath As String = "C:\") As String
Dim fldr As FileDialog
Dim sItem As String
GetFolder = vbNullString
Set fldr = Application.FileDialog(msoFileDialogFolderPicker)
With fldr
.Title = "Select a Folder"
.AllowMultiSelect = False
.InitialFileName = strPath
If .Show <> -1 Then GoTo NextCode
sItem = .SelectedItems(1)
End With
NextCode:
GetFolder = sItem
Set fldr = Nothing
End Function
And your code, you can set the default path in GetFolder(ThisWorkbook.Path & "\") :
Sub distribuire_foldere()
Dim objFSO As Object
Dim objFolder As Object
Dim objSubFolder As Object
Dim i As Integer
'CLEARS ALL PREVIOUS CONTENT
Sheets("DISTRIBUIRE foldere").Range("A2:A2000").ClearContents
'INSERTS IN CELL THE PATH WHERE THE SCRIPT IS READING
Sheets("DISTRIBUIRE foldere").Range("$E$1").Value = ThisWorkbook.Path & "\CtrExtrase"
' LISTS THE CONTENT OF THE CHOOSEN FOLDER
Application.StatusBar = ""
'Create an instance of the FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the folder object
On Error GoTo nuexistafolderul
'THIS IS THE PROBLEM, AS I NEED THE USER TO CHOOSE THE PATH:
Set objFolder = objFSO.GetFolder(GetFolder(ThisWorkbook.Path & "\"))
i = 1
'loops through each folder in the directory and prints their names
On Error GoTo handleCancel
Application.EnableCancelKey = xlErrorHandler
For Each objSubFolder In objFolder.SubFolders
Application.StatusBar = objSubFolder.Path & " " & objSubFolder.Name
'OUTPUTS THE FOLDERS NAME
Cells(i + 1, 1) = objSubFolder.Name
i = i + 1
Next objSubFolder
handleCancel:
If Err = 18 Then
MsgBox "Ai anulat procesul inainte de finalizare! Reia procedura!"
nuexistafolderul:
MsgBox "Nu exista folderul pentru extractia contractelor! Extrage intai contractele!"
End If
'CALLS A MODULE THAT INSERTS CERTAIN TEXT INTO A BATCH FILE
Call Module1.batchfile2
End Sub

VBA to read input from a file

I am trying to modify the following code, it will merge the Word Documents fine, but I have text file with every line being "*Name*.docx" "*Name2*.docx", etc, I would like the VBA macro to read the text file line by line and merge all the documents that match the pattern, should be 27 documents when done and save each one preferably with the a title that includes the "*Name" tag so I can know which is which. Any help would be greatly appreciated
Sub MergeDocs()
Dim rng As Range
Dim MainDoc As Document
Dim strFile As String
Const strFolder = "C:\test\"
Set MainDoc = Documents.Add
strFile = Dir$(strFolder & "*Name*.docx")
Do Until strFile = ""
Set rng = MainDoc.Range
rng.Collapse wdCollapseEnd
rng.InsertFile strFolder & strFile
strFile = Dir$()
Loop
MsgBox ("Files are merged")
End Sub
I think it's just a matter of adding an extra loop that reads the input file line by line and then uses your loop above.
This example uses the scripting filesystemobject to open the file and read it.
I assume what you've said above is what you actually mean - and the file spec is in the text file. Change the constants to fit your needs
Sub MergeDocs()
Const FOLDER_START As String = "C:\test\" ' Location of inout word files and text file
Const FOLDER_OUTPUT As String = "C:\test\output\" ' send resulting word files here
Const TEST_FILE As String = "doc-list.txt"
Dim rng As Range
Dim MainDoc As Document
Dim strFile As String
Dim strFileSpec As String
Dim strWordFile As String
Dim objFSO As Object ' FileSystemObject
Dim objTS As Object ' TextStream
Set objFSO = CreateObject("Scripting.FileSystemObject")
strFile = FOLDER_START & TEST_FILE
If Not objFSO.FileExists(strFile) Then
MsgBox "File Doesn't Exist: " & strFile
Exit Sub
End If
Set objTS = objFSO.OpenTextFile(strFile, 1, False) 'The one was ForReading but for me it threw an error
While Not objTS.AtEndOfStream
Set MainDoc = Documents.Add
' Read file spec from each line in file
strFileSpec = objTS.ReadLine ' get file seacrh spec from input file
'strFileSpec = "*NAME2*"
strFile = Dir$(FOLDER_START & strFileSpec & ".docx") ' changed strFolder to FOLDER_START
Do Until strFile = ""
Set rng = MainDoc.Range
rng.Collapse wdCollapseEnd
rng.InsertFile FOLDER_START & strFile ' changed strFolder again
strFile = Dir$() ' Get next file in search
Loop
strWordFile = Replace(strFileSpec, "*", "") ' Remove wildcards for saving filename
strWordFile = FOLDER_OUTPUT & strWordFile & ".docx"
MainDoc.SaveAs2 strWordFile
MainDoc.Close False
Set MainDoc = Nothing
Wend
objTS.Close
Set objTS = Nothing
Set objFSO = Nothing
MsgBox "Files are merged"
End Sub

How do you convert a large number of files to docx?

I had a large number of doc files that I wanted to convert to docx files.
I discovered that there was not a really good way to automatically do this conversion.
I have submitted the method I used to do this but perhaps there are now other ways.
I found a few thing that might help:
Microsoft Bulk Converter
Simple Microsoft Word macro
However I was not satisfied with macro provided. I needed something recursive to also convert nested files. So I expanded it to do so.
Sub SaveAllAsDOCX()
'Search #EXT to change the extensions to save to docx
Dim strDocName As String
Dim strPath As String
Dim oDoc As Document
Dim fDialog As FileDialog
Dim intPos As Integer
'Create a folder dialog
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
.Title = "Select root folder and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
'Select root folder
strPath = fDialog.SelectedItems.Item(1)
'Ensure the Folder Name ends with a "\"
If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
'Close any open documents
If Documents.Count > 0 Then
Documents.Close SaveChanges:=wdPromptToSaveChanges
End If
'remove any quotes from the folder string
If Left(strPath, 1) = Chr(34) Then
strPath = Mid(strPath, 2, Len(strPath) - 2)
End If
'begin recusion
recurse (strPath)
End Sub
'This method controls the recusion
Function recurse(folder As String)
'save all the files in the current folder
SaveFilesInFolder (folder)
'get all the subfolders of the current folder
Dim folderArray
folderArray = GetSubFolders(folder)
'Loop through all the non-empty elements for folders
For j = 1 To UBound(folderArray)
If folderArray(j) <> "" Then
'begin recusion on subfolder
recurse (folder & folderArray(j) & "\")
End If
Next
End Function
'Saves all files with listed extensions
Function SaveFilesInFolder(folder As String)
'List of extensions to look for #EXT
Dim strFilename As String
extsArray = Array("*.rtf", "*.doc")
'Loop through extensions
For i = 0 To (UBound(extsArray))
'select the 1st file with the current extension
strFilename = Dir(folder & extsArray(i), vbNormal)
'double check the current extension (don't to resave docx files)
Dim ext As String
ext = ""
On Error Resume Next
ext = Right(strFilename, 5)
If ext = ".docx" Or ext = "" Then
'Don't need to resave files in docx format
Else
'Save the current file in docx format
While Len(strFilename) <> 0
Set oDoc = Documents.Open(folder & strFilename)
strDocName = ActiveDocument.FullName
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
strDocName = strDocName & ".docx"
oDoc.SaveAs FileName:=strDocName, _
FileFormat:=wdFormatDocumentDefault
oDoc.Close SaveChanges:=wdDoNotSaveChanges
strFilename = Dir
Wend
End If
Next
strFilename = ""
End Function
'List all the subfolders in the current folder
Function GetSubFolders(RootPath As String)
Dim FS As New FileSystemObject
Dim FSfolder As folder
Dim subfolder As Variant
Set FSfolder = FS.GetFolder(RootPath)
'subfolders is variable length
Dim subfolders() As String
ReDim subfolders(1 To 10)
Dim i As Integer
i = LBound(subfolders)
For Each subfolder In FSfolder.subfolders
subfolders(i) = subfolder.Name
'increase the size of subfolders if it's needed
i = i + 1
If (i >= UBound(subfolders)) Then
ReDim subfolders(1 To (i + 10))
End If
Next subfolder
Set FSfolder = Nothing
GetSubFolders = subfolders
End Function
Yeah I know it's a lot of code. :)