Scraping webpage data modification - VBA

I have come up with some code that copies a webpage as text into my Excel sheet, but a few modifications are required.
The additional modification is to loop through the code so that it reads the input from Excel (the Input sheet in the attachment) and changes the URL accordingly. I noticed that only the last word of the URL needs to change; it will be taken from column 1 of the Excel file, row by row, until a blank cell is found.
The loop itself works correctly, but there is no corresponding loop for pasting the data, so all the looped data is dumped into one cell.
The basic requirement of this macro is to access the link in column A and paste its data into column B.
Sub Trial()
Dim IE As Object
Dim URL As Range
For Each URL In Range("A1:A3").Cells
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate "1ox11is" & URL
Do Until .readyState = 4: DoEvents: Loop
'Range("B1").Value = .document.body.innerText
'wsSheet.Range("B" & Rows).Value = .document.body.innerText
Sheets("Sheet1").Range("B1").Value = .document.body.innerText
.Quit
End With
Next
End Sub

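Assuming the links are in cells A1, A2, A3, etc. and the data from each website should appear next to them in cells B1, B2, B3, etc., change:
Sheets("Sheet1").Range("B1").Value = .document.body.innerText
to:
Sheets("Sheet1").Range("B" & URL.Row).Value = .document.body.innerText
URL.Row is the row number of the cell currently being processed, so each page's text is written beside its own link instead of overwriting B1 on every pass.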
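A fuller sketch of the same fix, which also covers the "loop until a blank cell" requirement from the question. It assumes the values (the last word of each URL) are in column A of a sheet named Sheet1 starting at row 1, and it keeps the base URL "1ox11is" exactly as given in the question; that prefix looks truncated, so substitute the real one. It reuses a single Internet Explorer instance instead of creating one per row. PageUrl and ScrapeUntilBlank are hypothetical names introduced here.

```vba
Option Explicit

' Hypothetical helper: the question says only the last word of the URL changes,
' so the full address is the fixed prefix plus the value from column A.
Public Function PageUrl(ByVal baseUrl As String, ByVal lastWord As String) As String
    PageUrl = baseUrl & Trim$(lastWord)
End Function

Public Sub ScrapeUntilBlank()
    Const BASE_URL As String = "1ox11is"   ' prefix as given in the question; replace with the real one
    Dim ws As Worksheet
    Dim IE As Object
    Dim r As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")   ' assumed sheet name
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    r = 1
    ' Stop at the first empty cell in column A
    Do While Len(Trim$(CStr(ws.Cells(r, "A").Value))) > 0
        IE.navigate PageUrl(BASE_URL, CStr(ws.Cells(r, "A").Value))
        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop
        ' Write the page text on the same row as its input value
        ws.Cells(r, "B").Value = IE.document.body.innerText
        r = r + 1
    Loop

    IE.Quit
    Set IE = Nothing
End Sub
```

Creating Internet Explorer once and calling navigate repeatedly avoids the cost of starting and quitting a browser process for every row.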

Related

Return address from a Google term search to Excel using VBA

I am familiar with StackOverflow but have only recently signed up. I am trying to search for a hotel on Google and return its address in Excel using VBA. Below is an image of the information I am trying to return from Google. From my research, I was able to find a VBA macro that returns the result stats.
Would it be possible to modify my code to return the box at the top of my Google search?
I would really appreciate your help! Below is the VBA I am using to return the search results.
[Sample image: Red Roof Inn and its address]
Sub SearchGoogle()
Dim ie As Object
Dim form As Variant
Dim button As Variant
Dim LR As Long
Dim x As Long
Dim var As String
Dim var1 As Object
LR = Cells(Rows.Count, 1).End(xlUp).Row
For x = 2 To LR
var = Cells(x, 1).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
With ie
.Visible = True
.navigate "http://www.google.co.in"
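While .readyState <> 4 ' 4 = READYSTATE_COMPLETE; that named constant is undefined under late binding unless a reference such as Microsoft Internet Controls is set
DoEvents
Wend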
End With
'Wait some to time for loading the page
While ie.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
ie.document.getElementById("lst-ib").Value = var
'Here we are clicking on search Button
Set form = ie.document.getElementsByTagName("form")
Application.Wait (Now + TimeValue("0:00:02"))
Set button = form(0).onsubmit
form(0).submit
'wait for page to load
While ie.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
Set var1 = ie.document.getElementById("resultStats")
Cells(x, 2).Value = var1.innerText
ie.Quit
Set ie = Nothing
Next x
End Sub
Right now your code loads the page and then reads the text of the resultStats element.
So the section of your code that you will need to alter is:
Set var1 = ie.document.getElementById("resultStats")
Cells(x, 2).Value = var1.innerText
The first step is to understand the DOM of the HTML page you are working with, in this case Google's. I would suggest exploring the DOM with your browser's developer tools, as that will give you a good idea of what the whole page is doing.
Because you want to do this from a macro, you will need a path through the DOM that always leads to the element you want. I would suggest having two pages with different searches open so that you can check your hypothesis as you go.
For example, the boxes you refer to seem to be inside an element with the class kp-header. Knowing this, you can build a path through the DOM that returns the text displayed on screen. Again, you will need to do your own investigation to find the best starting point for your search, as kp-header was just the first potentially helpful result I could find.
Note, however, that depending on how quickly you load these pages, you may hit a limit imposed by Google, which discourages scraping. A better option, which avoids these limits and saves you from investigating all of Google's DOM, would be to use one of Google's APIs.
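A minimal sketch of that idea, under several assumptions: it builds the search URL directly (https://www.google.com/search?q=...) instead of typing into the lst-ib box, it encodes the hotel name with Excel's WorksheetFunction.EncodeURL (available in Excel 2013 and later), and it reads the first element whose class is kp-header. Google's markup and class names change often, and Google may serve a different page to Internet Explorer, so treat kp-header as a placeholder to confirm with your own DOM investigation. GoogleSearchUrl, FirstTextByClass and SearchGoogleKnowledgePanel are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: build a Google search URL for a term.
Public Function GoogleSearchUrl(ByVal term As String) As String
    GoogleSearchUrl = "https://www.google.com/search?q=" & _
        Application.WorksheetFunction.EncodeURL(term)
End Function

' Hypothetical helper: innerText of the first element with the given class,
' or an empty string if none is found.
Public Function FirstTextByClass(ByVal doc As Object, ByVal className As String) As String
    Dim hits As Object
    Set hits = doc.getElementsByClassName(className)
    If hits.Length > 0 Then FirstTextByClass = hits(0).innerText
End Function

Public Sub SearchGoogleKnowledgePanel()
    Dim ie As Object
    Dim LR As Long, x As Long

    LR = Cells(Rows.Count, 1).End(xlUp).Row
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    For x = 2 To LR
        ie.navigate GoogleSearchUrl(CStr(Cells(x, 1).Value))
        Do While ie.Busy Or ie.readyState <> 4
            DoEvents
        Loop
        ' "kp-header" is the class mentioned above; verify it in the browser's inspector
        Cells(x, 2).Value = FirstTextByClass(ie.document, "kp-header")
        ' Pause between searches to reduce the chance of hitting Google's limits
        Application.Wait Now + TimeValue("0:00:02")
    Next x

    ie.Quit
    Set ie = Nothing
End Sub
```

Navigating straight to the results URL removes the dependency on the search box's ID and the form submission, which are the parts most likely to break when the page changes.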

Web scraping - create object for IE

Sub Get_Data()
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate "http://www.scramble.nl/military-database/usaf"
Do While ie.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
SendKeys "03-3114"
SendKeys "{ENTER}"
End Sub
The code above searches for the typed value 03-3114 and gets data from the table. If I'd like to search for a value that is already in cell A1 and scrape the "Code", "Type", "CN" and "Unit" values from the table into range B1:E1, what should I do?
You are using SendKeys, which is highly unreliable :) Why not find the names of the textbox and the search button and interact with them directly, as shown below?
Sub Get_Data()
Dim ie As Object, objInputs As Object, ele As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate "http://www.scramble.nl/military-database/usaf"
Do While ie.readystate <> 4: DoEvents: Loop
'~~> Find the textbox by its ID and enter the search value
ie.Document.getElementById("serial").Value = "03-3114"
'~~> Here we try to identify the search button and click it
Set objInputs = ie.Document.getElementsByTagName("input")
For Each ele In objInputs
If ele.Name Like "sbm" Then
ele.Click
Exit For
End If
Next
End Sub
Note: To see how I got the names serial and sbm, refer to the explanation about inspecting elements further below.
To search for the value in A1, use it directly in place of the hard-coded value:
ie.Document.getElementById("serial").Value = Sheets("Sheet1").Range("A1").Value
To get the values from the table, identify the table's elements by right-clicking on it in the browser and choosing "Inspect Element" (in Chrome it is just "Inspect").
I can give you the code but I want you to do it yourself. If you are still stuck then update the question with the code that you tried and then we will take it from there.
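For reference, a minimal sketch of the full flow, built on several unverified assumptions: after the search, the results are in the first <table> on the page, row index 1 is the first data row (row 0 being the header), and its first four cells are Code, Type, CN and Unit in that order. Check all of this with Inspect and adjust the indexes. Unlike the code above, it also waits for the results page to load after the click. TableRowTexts and Get_Data_FromA1 are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: innerText of every cell in one table row, as a zero-based array.
' Row and cell indexes in the DOM are zero-based.
Public Function TableRowTexts(ByVal tbl As Object, ByVal rowIndex As Long) As Variant
    Dim rowCells As Object
    Dim result() As String
    Dim i As Long

    Set rowCells = tbl.Rows(rowIndex).Cells
    If rowCells.Length = 0 Then
        TableRowTexts = Array()
        Exit Function
    End If
    ReDim result(0 To rowCells.Length - 1)
    For i = 0 To rowCells.Length - 1
        result(i) = Trim$(rowCells(i).innerText & "")
    Next i
    TableRowTexts = result
End Function

Public Sub Get_Data_FromA1()
    Dim ie As Object, ele As Object, tables As Object
    Dim ws As Worksheet
    Dim values As Variant
    Dim i As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "http://www.scramble.nl/military-database/usaf"
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    ' Use the value in A1 instead of a hard-coded serial
    ie.document.getElementById("serial").Value = ws.Range("A1").Value

    For Each ele In ie.document.getElementsByTagName("input")
        If ele.Name = "sbm" Then
            ele.Click
            Exit For
        End If
    Next ele

    ' Give the click a moment to start navigating, then wait for the results page
    Application.Wait Now + TimeValue("0:00:02")
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    ' Assumption: the first <table> holds the results and row 1 is the first data row
    Set tables = ie.document.getElementsByTagName("table")
    If tables.Length > 0 Then
        If tables(0).Rows.Length > 1 Then
            values = TableRowTexts(tables(0), 1)
            ' Assumption: the first four cells are Code, Type, CN, Unit, in that order
            For i = 0 To 3
                If i <= UBound(values) Then ws.Cells(1, 2 + i).Value = values(i)
            Next i
        End If
    End If

    ie.Quit
    Set ie = Nothing
End Sub
```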
Interesting read: html parsing of cricinfo scorecards

How can I grab URLs contained in webpages?

I'm trying to get URLs from within an external webpage using a macro. Here's my current code:
Sub GoToWebSite()
Dim IE As Object
Application.ScreenUpdating = False
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Navigate "www.website.com/careers/"
.Visible = True
End With
Application.ScreenUpdating = True
Set IE = Nothing
End Sub
From here, I want to supply the macro with a particular URL, have it search for particular text within www.website.com/careers/, grab the hyperlink corresponding to that text, and paste the hyperlink into a cell in the spreadsheet. For example, search for "Sales", then paste the URL corresponding to "Sales" into a particular cell.
The DOM offers no method for selecting an element by its innerText, so you'll need to iterate over the anchor (link) node list and check each one to see if it's the one you're looking for.
For example:
Dim objLink
For Each objLink in IE.document.getElementsByTagName("a")
If StrComp(objLink.innerText, "sales", vbTextCompare) = 0 Then
' Found the link matching our text. Display its URL...
Debug.Print objLink.href
Exit For
End If
Next
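Putting that loop together with the navigation from the question gives the sketch below. It assumes the link text matches apart from case and surrounding spaces, and it writes the result to Sheet1!B1; that cell and the names FindLinkHref and GrabSalesLink are hypothetical. It also waits for the page to finish loading before reading the links, which the original GoToWebSite does not do. The URL keeps the question's placeholder www.website.com/careers/.

```vba
Option Explicit

' Hypothetical helper: href of the first <a> whose visible text matches linkText
' (case-insensitive, ignoring surrounding spaces); "" if no link matches.
Public Function FindLinkHref(ByVal doc As Object, ByVal linkText As String) As String
    Dim objLink As Object
    For Each objLink In doc.getElementsByTagName("a")
        If StrComp(Trim$(objLink.innerText & ""), linkText, vbTextCompare) = 0 Then
            FindLinkHref = objLink.href
            Exit Function
        End If
    Next objLink
End Function

Public Sub GrabSalesLink()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate "www.website.com/careers/"
    ' Wait for the page to finish loading before reading its links
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop
    ThisWorkbook.Worksheets("Sheet1").Range("B1").Value = FindLinkHref(IE.document, "Sales")
    IE.Quit
    Set IE = Nothing
End Sub
```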

VBA code to download multiple files from a website

I was wondering if I could get some help with this code.
I am trying to download files from a website; the code I am using is:
Public Sub DownloadFile()
Dim weblink As String
Dim Rng As Range, cell As Range, link As String
weblink = "https://documentflowmanagerus.abcabc.com/DFM_HBTC/Default.aspx?page=RetrDocView.aspx&URN="
Set Rng = Range("A1:A100")
For Each cell In Rng
'test if cell is empty
If cell.Value <> "" Then
link = weblink & cell.Value
ActiveWorkbook.FollowHyperlink Address:=link, NewWindow:=False
End If
Next
End Sub
This code works: column A holds the list of all the URN numbers, and the code loops through them and follows each hyperlink to download the file.
The trouble is that I have to save every file manually, because each time it loops it opens a new tab in the browser.
I am looking for a way to download each file without saving it manually and to close the tab automatically.
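One commonly used alternative, sketched here under assumptions: skip the browser and download each file directly with the Windows URLDownloadToFile function from urlmon.dll. This only works if the URL returns the file itself without an interactive login; a site like this may require authentication, in which case the saved file will be a login page rather than the document. The folder C:\Downloads\URN, the .pdf extension and the helper TargetPath are hypothetical, so adjust them to your files. The PtrSafe/LongPtr declaration needs VBA7 (Office 2010 or later).

```vba
Option Explicit

Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long

' Hypothetical helper: build the full save path for one URN.
Public Function TargetPath(ByVal folder As String, ByVal urn As String, ByVal ext As String) As String
    If Right$(folder, 1) <> "\" Then folder = folder & "\"
    TargetPath = folder & urn & ext
End Function

Public Sub DownloadFiles()
    Const WEB_LINK As String = "https://documentflowmanagerus.abcabc.com/DFM_HBTC/Default.aspx?page=RetrDocView.aspx&URN="
    Const SAVE_FOLDER As String = "C:\Downloads\URN"   ' hypothetical; the folder must already exist
    Dim cell As Range
    Dim result As Long

    For Each cell In ActiveSheet.Range("A1:A100").Cells
        If Len(Trim$(CStr(cell.Value))) > 0 Then
            ' Download straight to disk; no browser tab is opened
            result = URLDownloadToFile(0, WEB_LINK & cell.Value, _
                TargetPath(SAVE_FOLDER, CStr(cell.Value), ".pdf"), 0, 0)
            ' 0 means success; record any other return code next to the URN
            If result <> 0 Then cell.Offset(0, 1).Value = "Download failed: " & result
        End If
    Next cell
End Sub
```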

Code returning 90 empty values when pulling hyperlinks from a document

I am fairly new to coding, not to mention VBA. After a week of really cracking down on learning VBA, I've started to get the hang of it. At the moment, I'm trying to put together code that will pull the hyperlinks (both addresses and display names) out of a Word document (eventually Word, Excel, and PowerPoint files) and dump them into the Excel file I run the code from. It also writes the file path and name at the top of the list. I can run the code and pull links from one file at a time, and the code writes the output below the last filled row. It will save me endless amounts of time when I have to update links.
Sub ExtractWordLinks()
'the following code gets and sets an open file command bar for word documents
Dim Filter, Caption, SelectedFile As String
Dim Finalrow As Long
Filter = "docx Files (*.docx),*.docx, doc Files (*.doc),*.doc, xlsx Files (*.xlsx),*.xlsx"
Caption = "Please Select .doc, .docx, .xlsx files only, " & TheUser
SelectedFile = Application.GetOpenFilename(Filter, , Caption)
'exit if the dialog was cancelled (GetOpenFilename returns False) or nothing was chosen
Finalrow = Cells(Rows.Count, 1).End(xlUp).Row
If SelectedFile = "False" Or Trim(SelectedFile) = "" Then
Exit Sub
Else
'setting up the initial word application object
Set wordapp = CreateObject("word.Application")
'opening the document that is defined in the open file dialog
wordapp.documents.Open (SelectedFile)
'ability to change whether it needs to burn cycles updating the UI
wordapp.Visible = False
'declare excel sheet
Dim xlsSheet As Excel.Worksheet
'set active sheet
Set xlsSheet = Application.ActiveSheet
Dim i As Integer
i = 1
'MsgBox (wordapp.ActiveDocument.Hyperlinks.Count)
For i = 1 To wordapp.ActiveDocument.Hyperlinks.Count
'puts the title of the document in the formatted cells
'xlsSheet.Cells(Finalrow + 1, 1).Value = wordapp.ActiveDocument.Path & "\" & wordapp.ActiveDocument.Name
'formats the file name cell to be a bit easier to discern from the listing.
Range(Cells(Finalrow + 1, 1), Cells(Finalrow + 1, 2)).Font.Bold = True
Range(Cells(Finalrow + 1, 1), Cells(Finalrow + 1, 2)).Merge
'save the links address.
xlsSheet.Cells(Finalrow + i, 1).Value = wordapp.ActiveDocument.Hyperlinks(i).Address
'save the links display text
xlsSheet.Cells(Finalrow + i, 2).Value = wordapp.ActiveDocument.Hyperlinks(i).TextToDisplay
Next
wordapp.ActiveDocument.Close SaveChanges:=False 'wdDoNotSaveChanges (0) is a Word constant that Excel does not define under late binding
wordapp.Quit SaveChanges:=False
End If
End Sub
My problem is that when I run this code on a simple sample file with three or so hyperlinks on a single page, it returns everything exactly as I want: the file path/name at the top and all the links directly below it (address in one column, display text in the other). However, when I run it on one of the files I am writing this code for (a 95+ page .docx file with ~30 links), it prints the path/file in the formatted section, then drops 90 blank lines (90 every time) before printing the path/file a second time, followed by all the links in the document. Apart from the inexplicable second path/file (which appears even if I comment out the line I added) and the 90 blank entries, it works perfectly.
Can anyone explain what's going on, or should I try to bypass the issue by removing my own link code and adding a step that removes all blank lines?
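One possible explanation, not verified against the actual file: long documents often contain internal hyperlinks (for example, table-of-contents entries) whose Address is empty because their target is a bookmark stored in SubAddress, and those would print as blank rows. The sketch below is an untested variation on the code above under that assumption: it keeps its own output row counter, skips links with an empty Address, writes the file header once before the loop, and works with the Document object returned by Documents.Open instead of ActiveDocument. IsExternalLink and ExtractWordLinksSkippingBlanks are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: True when a Word hyperlink points outside the document.
' Internal links (e.g. to bookmarks) have an empty Address; their target is in SubAddress.
Public Function IsExternalLink(ByVal address As String) As Boolean
    IsExternalLink = Len(Trim$(address)) > 0
End Function

Public Sub ExtractWordLinksSkippingBlanks()
    Dim selectedFile As Variant
    Dim wordApp As Object, doc As Object, hl As Object
    Dim ws As Worksheet
    Dim outRow As Long

    selectedFile = Application.GetOpenFilename("Word Files (*.docx;*.doc),*.docx;*.doc", , "Select a Word file")
    If VarType(selectedFile) = vbBoolean Then Exit Sub   ' Cancel returns False

    Set ws = ActiveSheet
    outRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row + 1

    Set wordApp = CreateObject("Word.Application")
    wordApp.Visible = False
    ' Keep a reference to the opened document instead of relying on ActiveDocument
    Set doc = wordApp.Documents.Open(selectedFile, ReadOnly:=True)

    ' Header row: written once, before the loop
    ws.Cells(outRow, 1).Value = doc.FullName
    ws.Cells(outRow, 1).Font.Bold = True
    outRow = outRow + 1

    For Each hl In doc.Hyperlinks
        If IsExternalLink(hl.Address) Then
            ws.Cells(outRow, 1).Value = hl.Address
            ws.Cells(outRow, 2).Value = hl.TextToDisplay
            outRow = outRow + 1   ' advance only when a row was actually written
        End If
    Next hl

    doc.Close SaveChanges:=False
    wordApp.Quit
    Set wordApp = Nothing
End Sub
```

Because the row counter only advances when a link is written, skipped internal links leave no gaps; if you do want them, write hl.SubAddress in their place instead of skipping.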