How can I grab URLs contained in webpages? - vba

I'm trying to get URLs from within an external webpage using a macro. Here's my current code:
Sub GoToWebSite()
    Dim IE As Object
    Application.ScreenUpdating = False
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Navigate "www.website.com/careers/"
        .Visible = True
    End With
    Application.ScreenUpdating = True
    Set IE = Nothing
End Sub
From here, I want to give the macro a particular URL, have it search www.website.com/careers/ for particular text, grab the hyperlink for that text, and paste the hyperlink into a cell in the spreadsheet. For example, search for "Sales" and then paste the URL of the "Sales" link into a particular cell.

The DOM methods available here (getElementById, getElementsByTagName and so on) can't select an element by its innerText, so you'll need to iterate over the anchor (link) node list and check each one to see if it's the one you're looking for.
For example:
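' Assumes the page has finished loading (IE.Busy = False and IE.readyState = 4)
Dim objLink As Object
For Each objLink In IE.document.getElementsByTagName("a")
    ' Trim so that stray whitespace around the link text doesn't prevent a match
    If StrComp(Trim$(objLink.innerText), "Sales", vbTextCompare) = 0 Then
        ' Found the link matching our text. Display its URL...
        Debug.Print objLink.href
        Exit For
    End If
Next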
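The loop above can be packaged as a function and its result written to a worksheet cell, which is what the question asks for. This is a minimal sketch using late binding, as the question does; the helper names FindLinkByText and CopySalesLink, the sheet name Sheet1 and the target cell B2 are hypothetical. The anchor's href property normally returns the fully qualified URL even when the page markup uses a relative link.

```vba
' Hypothetical helper: returns the href of the first <a> whose visible text matches
' linkText (case-insensitive, surrounding whitespace ignored), or "" if none matches.
Function FindLinkByText(ByVal doc As Object, ByVal linkText As String) As String
    Dim objLink As Object
    For Each objLink In doc.getElementsByTagName("a")
        If StrComp(Trim$(objLink.innerText), linkText, vbTextCompare) = 0 Then
            FindLinkByText = objLink.href
            Exit Function
        End If
    Next objLink
End Function

Sub CopySalesLink()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "www.website.com/careers/"
    ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop
    ' Hypothetical target: cell B2 on Sheet1
    ThisWorkbook.Sheets("Sheet1").Range("B2").Value = FindLinkByText(IE.document, "Sales")
    IE.Quit
    Set IE = Nothing
End Sub
```

Keeping the search in a function means it can be called for several link texts without reloading the page.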

Related

Sub to find text in a Word document by specified font and font size

Goal: Find headings in a document by their font and font size and put them into a spreadsheet.
All headings in my document are formatted as Arial, size 16. I want to search the Word document, select the matching range of text to the end of the line, and assign it to a variable so I can put it in a spreadsheet. I can do an advanced find for the font and size successfully, but I can't get it to select the range of text or assign it to a variable.
I tried modifying the code below (from http://www.vbaexpress.com/forum/showthread.php?55726-find-replace-fonts-macro) but couldn't figure out how to select the found text and assign it to a variable. If I can get it assigned to the variable, I can take care of the rest to get it into a spreadsheet.
'A basic Word macro coded by Greg Maxey
Sub FindFont
    Dim strHeading as string
    Dim oChr As Range
    For Each oChr In ActiveDocument.Range.Characters
        If oChr.Font.Name = "Ariel" And oChr.Font.Size = "16" Then
            strHeading = .selected
    Next
lbl_Exit:
    Exit Sub
End Sub
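To get the current code working, amend strHeading = .selected to something like strHeading = strHeading & oChr.Text, which appends each matching character in turn. The paragraph mark at the end of each heading normally has the same formatting as the heading, so it keeps the headings on separate lines. You'll also need to add an End If statement after that line, and probably change "Ariel" to "Arial".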
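A sketch of the character-by-character version with those changes applied, written as a function so it can be reused; the name CollectHeadingText is hypothetical. It appends oChr.Text rather than adding vbNewLine after every character (which would put each character on its own line), and it assumes each heading's paragraph mark is formatted Arial 16 like the rest of the heading. Looping over every character is slow on long documents, which is one reason to prefer Find.

```vba
' Hypothetical helper: returns every character formatted as Arial 16, in document order.
' Assumes each heading's paragraph mark shares that formatting, so headings stay vbCr-separated.
Function CollectHeadingText(ByVal doc As Document) As String
    Dim strHeading As String
    Dim oChr As Range
    For Each oChr In doc.Range.Characters
        If oChr.Font.Name = "Arial" And oChr.Font.Size = 16 Then
            strHeading = strHeading & oChr.Text
        End If
    Next oChr
    CollectHeadingText = strHeading
End Function

Sub FindFont()
    ' Print the collected headings to the Immediate window
    Debug.Print CollectHeadingText(ActiveDocument)
End Sub
```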
I think a better way to do this would be to use Word's Find method. Depending on how you are going to be inserting the data into the spreadsheet, you may also prefer to put each header that you find in a collection instead of a string, although you could easily delimit the string and then split it before transferring the data into the spreadsheet.
Just to give you some more ideas, I've put some sample code below.
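Sub Demo()
    Dim Find As Find
    Dim Result As Collection
    Set Find = ActiveDocument.Range.Find
    With Find
        .ClearFormatting
        .Text = ""            ' match any text...
        .Format = True        ' ...that has the formatting below
        .Font.Name = "Arial"
        .Font.Size = 16
        .Forward = True
        .Wrap = wdFindStop    ' stop at the end of the document instead of wrapping around
    End With
    Set Result = Execute(Find)
    If Result.Count = 0 Then
        MsgBox "No match found"
        Exit Sub
    Else
        TransferToExcel Result
    End If
End Sub
Function Execute(Find As Find) As Collection
    ' Each successful Find.Execute moves Find.Parent (the searched range) onto the match
    Set Execute = New Collection
    Do While Find.Execute
        Execute.Add Find.Parent.Text
    Loop
End Function
Sub TransferToExcel(Data As Collection)
    Dim i As Long
    ' Late-bound Excel, so no reference to the Excel object library is needed
    With CreateObject("Excel.Application")
        With .Workbooks.Add
            With .Sheets(1)
                For i = 1 To Data.Count
                    .Cells(i, 1) = Data(i)
                Next
            End With
        End With
        .Visible = True
    End With
End Sub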
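The delimited-string alternative mentioned earlier could look like this sketch (the name SplitHeadings is hypothetical). It splits on vbCr, Word's paragraph mark, drops empty entries such as the one left by a trailing delimiter, and returns a Collection so the result can be passed straight to TransferToExcel.

```vba
' Hypothetical helper: turns a vbCr-delimited string of headings into a Collection,
' skipping empty or whitespace-only entries.
Function SplitHeadings(ByVal strHeading As String) As Collection
    Dim parts() As String, i As Long
    Set SplitHeadings = New Collection
    parts = Split(strHeading, vbCr)
    For i = LBound(parts) To UBound(parts)
        If Len(Trim$(parts(i))) > 0 Then SplitHeadings.Add parts(i)
    Next i
End Function
```

For example, TransferToExcel SplitHeadings(strHeading) would send the headings collected by the character loop to a new workbook.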

Word VBA Getting page number of a specific footer in section

Couldn't find the answer I was looking for.
I want to get the current page number as a string, including its format.
For example, some sections may include a chapter identifier (1-1), some use Roman numerals, and so on.
My hope was to select the specific footer, loop through its fields and read the PAGE field's data (its output is the string I want).
As far as I can see, there is no way to loop through the footers of a given section; I can only get the section's general footer, which acts as a template for its pages, and try working with that.
I'm aware of wdActiveEndAdjustedPageNumber from Selection.Range.Information, but it gives me only part of the information.
Am I wrong? Is there a way to work with a specific footer I choose?
If not, can you guide me on how to get the following data:
The closest chapter number value
The page number value in a special format such as Roman or alphabetical (that is, the page number format applied to the wdActiveEndAdjustedPageNumber value)
Thanks.
Edit for clarification:
In my Word template, the Heading 1 style creates the following heading: Chapter 1, followed by Chapter 2, and so on.
The page number format has an option to include the current chapter number in the page number.
For example, with the chapter number included, the { PAGE } field shows these values on successive pages: 1-1, 1-2, 1-3, ...
My goal is to somehow get this entire "value" for a specific page footer.
Here is a code snippet which won't work properly:
Sub getPageFieldInFooter()
    ' get current section number
    Dim sectionNum As Integer
    sectionNum = Selection.Range.Information(wdActiveEndSectionNumber)
    ' select the section's primary footer, loop through its fields and find the PAGE field
    ActiveDocument.Sections(sectionNum).Footers(wdHeaderFooterPrimary).Range.Select
    Dim f As Field
    For Each f In Selection.Fields
        If f.Type = wdFieldPage Then
            ' do something with the page data (Result is the displayed text, e.g. "1-1")
            MsgBox f.Result
        End If
    Next f
End Sub
The output of this macro is '1-1'.
It doesn't work because it can only retrieve the value for the first page (or the second, using wdHeaderFooterEvenPages).
The same goes for the Roman numeral format and every other format in that list.
For any of these page number settings, I want to get the "value" shown in a specific footer.
The code above returns the values for the first or second page, and that's it.
Is there a way to access any footer in the document and perform my code example?
If not, how can I get the page number "value" for any footer I choose?
Hope this is clearer.
The following works for me, although I'm not certain how reliable it is. Apparently, if I query the footer (or header) of the current selection in the document, it returns the information for the footer (or header) of that page.
Things get very complicated as soon as you start working with multiple sections and Different First Page. I've done some testing for that in the code below, but I wouldn't swear it's "production code". However, it should give you a starting place.
Sub GetFormattedPageNumberFromSelection()
    Dim sel As Word.Selection
    Dim sec As Word.Section
    Dim r As Word.Range, rOriginal As Word.Range
    Dim fld As Word.Field
    Dim secCurrIndex As Long
    Dim sNoPageNumber As String
    Set sel = Selection
    If Not sel.InRange(sel.Document.Content) Then Exit Sub
    Set sec = sel.Sections(1)
    If Not sec.Footers(wdHeaderFooterFirstPage).Exists Then
        Set r = sec.Footers(wdHeaderFooterPrimary).Range
    Else
        Set r = sel.Range
        Set rOriginal = r.Duplicate
        secCurrIndex = sec.Index
        If secCurrIndex <> 1 Then
            ' If the previous page is still in this section, we're not on its first page
            sel.GoToPrevious wdGoToPage
            If sel.Sections(1).Index = secCurrIndex Then
                Set r = sec.Footers(wdHeaderFooterPrimary).Range
            Else
                Set r = sec.Footers(wdHeaderFooterFirstPage).Range
            End If
            rOriginal.Select 'return to original selection
        ElseIf r.Information(wdActiveEndPageNumber) = 1 Then
            Set r = sec.Footers(wdHeaderFooterFirstPage).Range
        Else
            Set r = sec.Footers(wdHeaderFooterPrimary).Range
        End If
    End If
    ' Set before the loop so the message is also printed when the footer has no fields at all
    sNoPageNumber = "No page number"
    For Each fld In r.Fields
        If fld.Type = wdFieldPage Then
            Debug.Print fld.Result
            sNoPageNumber = ""
            Exit For
        End If
    Next
    If Len(sNoPageNumber) > 0 Then Debug.Print sNoPageNumber
End Sub
...and sometimes we don't see the simplest way.
Insert a Page field at the current selection, read the result, then delete it again:
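Sub GetFormattedPageNumberFromSelection2()
    Dim rng As Word.Range
    Dim fld As Word.Field
    Set rng = Selection.Range
    ' Collapse first: a field added over a non-collapsed range replaces the selected text
    rng.Collapse wdCollapseStart
    Set fld = rng.Fields.Add(rng, wdFieldPage)
    Debug.Print fld.Result
    fld.Delete
End Sub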
What you haven't told us is how you're 'choosing' the page you want the reference for. Assuming it's based on whichever page is selected/displayed, you could use something like the following for a page header:
Sub Demo()
    Application.ScreenUpdating = False
    Dim Fld As Field
    ' Open the header of the page containing the selection
    ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
    For Each Fld In Selection.HeaderFooter.Range.Fields
        If Fld.Type = wdFieldPage Then
            MsgBox Fld.Result
            Exit For
        End If
    Next
    ' Return to the main document
    ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument
    Application.ScreenUpdating = True
End Sub
Unfortunately, wdSeekCurrentPageFooter returns the next page's footer, so you can't use it for the current footer. The following, however, should work wherever the PAGE field is located:
Sub Demo()
    Application.ScreenUpdating = False
    Dim i As Long, Fld As Field, bExit As Boolean: bExit = False
    ' Pages() is indexed by physical page, so use wdActiveEndPageNumber rather than
    ' wdActiveEndAdjustedPageNumber, which reflects any "Start at" renumbering
    With ActiveWindow.ActivePane.Pages(Selection.Information(wdActiveEndPageNumber))
        For i = 1 To .Rectangles.Count
            With .Rectangles(i).Range
                For Each Fld In .Fields
                    If Fld.Type = wdFieldPage Then
                        MsgBox Fld.Result
                        bExit = True: Exit For
                    End If
                Next
            End With
            If bExit = True Then Exit For
        Next
    End With
    Application.ScreenUpdating = True
End Sub

scraping webpage data modification

I have come up with some code that copies a webpage's text into my Excel sheet.
A few modifications are still required.
The code needs to loop so that it reads its input from Excel (the Input sheet in the attachment) and changes the URL accordingly. I noticed that only the last word of the URL needs to change; it should be taken from column 1 of the Excel file, row by row, until a blank cell is found.
It is looping correctly, but every iteration pastes into the same cell, so all the looped data ends up in one cell.
The basic requirement of this macro is to access the link from column A and paste its data into column B.
Sub Trial()
    Dim IE As Object
    Dim URL As Range
    For Each URL In Range("A1:A3").Cells
        Set IE = CreateObject("InternetExplorer.Application")
        With IE
            .Visible = True
            .navigate "1ox11is" & URL
            Do Until .readyState = 4: DoEvents: Loop
            'Range("B1").Value = .document.body.innerText
            'wsSheet.Range("B" & Rows).Value = .document.body.innerText
            Sheets("Sheet1").Range("B1").Value = .document.body.innerText
            .Quit
        End With
    Next
End Sub
Assuming that the links are in cells A1, A2, A3 etc. and the data from the websites should appear next to them in cells B1, B2, B3 etc., change:
Sheets("Sheet1").Range("B1").Value = .document.body.innerText
to:
Sheets("Sheet1").Range("B" & URL.Row).Value = .document.body.innerText
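The question also asks to keep going down column A until a blank cell, rather than stopping at A3. Here is a sketch under these assumptions: the URL prefix is unknown (it is garbled in the question), so BASE_URL below is a hypothetical placeholder; the sheet is Sheet1; and one Internet Explorer instance is reused for every row instead of a new one being created each time. An Excel cell holds at most 32,767 characters, so the page text is truncated to fit by the hypothetical helper FitCellText.

```vba
' Hypothetical helper: Excel cells hold at most 32,767 characters
Function FitCellText(ByVal s As String) As String
    FitCellText = Left$(s, 32767)
End Function

Sub ScrapeUntilBlank()
    ' Hypothetical placeholder: replace with the real URL prefix
    Const BASE_URL As String = "https://www.example.com/lookup/"
    Dim IE As Object, ws As Worksheet, r As Long
    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    r = 1
    ' Walk down column A until the first empty cell
    Do While Len(ws.Cells(r, "A").Value) > 0
        IE.navigate BASE_URL & ws.Cells(r, "A").Value
        Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop
        ' Paste the page text next to its input, in column B of the same row
        ws.Cells(r, "B").Value = FitCellText(IE.document.body.innerText)
        r = r + 1
    Loop
    IE.Quit
    Set IE = Nothing
End Sub
```

Reusing one browser instance avoids starting and quitting Internet Explorer for every row.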

VBA checkbox website interaction

I'm currently having a problem with VBA: I'm trying to tick a checkbox on an external website. Here is the HTML snippet I'm working with: [image: HTML code from the external website]
I have redacted any confidential information from the snippet.
Here is my VBA code:
Set elements = objIE.document.getElementsByTagName("input")
For Each ele In elements
    ele.toString
    If ele.Value = "xxx" Then ele.Click
Next
So in this code the elements variable is an object, but the loop variable ele is not being populated at all. I need to check whether the checkbox's value is xxx. I'm not too experienced in VBA.
Any help would be appreciated
Thanks
I really don't know if that works, but in VBA the .Value property of a (UserForm) checkbox is either True or False. The property that holds the text shown next to it is .Caption.
This code is looping properly:
Sub test()
    Dim objIE As Object, elements As Object, ele As Object
    Dim ip As Variant
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    ' Visit each URL listed in A2:A13
    For Each ip In Sheets("Sheet1").Range("A2:A13").Value
        objIE.Navigate ip
        ' Wait until the page has finished loading
        Do Until Not objIE.Busy And objIE.ReadyState = 4
            DoEvents
        Loop
        Set elements = objIE.document.getElementsByTagName("input")
        For Each ele In elements
            If ele.Value = "xxx" Then ele.Click
        Next
    Next
End Sub
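On a web page the checkbox is an HTML input element, not a VBA (UserForm) checkbox, so its .Value is the value attribute (a string such as "xxx") and its ticked state is the .Checked property. A sketch that uses both; the function name TickCheckboxesByValue is hypothetical. It only looks at inputs whose type is checkbox, and it clicks a box only if it isn't already ticked, so running it twice doesn't untick anything.

```vba
' Hypothetical helper: ticks every <input type="checkbox"> whose value attribute
' equals targetValue and returns the number of matching checkboxes.
Function TickCheckboxesByValue(ByVal doc As Object, ByVal targetValue As String) As Long
    Dim ele As Object
    For Each ele In doc.getElementsByTagName("input")
        If LCase$(ele.Type) = "checkbox" And ele.Value = targetValue Then
            ' Click (rather than setting Checked) so any onclick handlers on the page also run
            If Not ele.Checked Then ele.Click
            TickCheckboxesByValue = TickCheckboxesByValue + 1
        End If
    Next ele
End Function
```

Inside the navigation loop above, this would replace the inner For Each with TickCheckboxesByValue objIE.document, "xxx".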

Web scraping - create object for IE

Sub Get_Data()
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "http://www.scramble.nl/military-database/usaf"
    Do While ie.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    SendKeys "03-3114"
    SendKeys "{ENTER}"
End Sub
The code above types the value 03-3114 with SendKeys and brings up its data in the table. If I'd like to search for a value that is already in cell A1 and scrape the "Code", "Type", "CN" and "Unit" values from the table into the range B1:E1, what should I do?
You are using SendKeys, which is highly unreliable :) Why not find the names of the textbox and the search button and interact with them directly, as shown below?
Sub Get_Data()
    Dim ie As Object, objInputs As Object, ele As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "http://www.scramble.nl/military-database/usaf"
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
    '~~> The search textbox has the ID "serial"; put the search value into it
    ie.Document.getElementById("serial").Value = "03-3114"
    '~~> Here we try to identify the search button and click it
    Set objInputs = ie.Document.getElementsByTagName("input")
    For Each ele In objInputs
        If ele.Name Like "sbm" Then
            ele.Click
            Exit For
        End If
    Next
End Sub
Note: to see how I got the names serial and sbm, refer to the explanation about inspecting elements below.
To search for the value in A1, put it directly in place of the hard-coded value:
ie.Document.getElementById("serial").Value = Sheets("Sheet1").Range("A1").Value
To get the values from the table, identify the table's elements by right-clicking the table in the browser and choosing "Inspect Element" (in Chrome it is just "Inspect").
I can give you the code but I want you to do it yourself. If you are still stuck then update the question with the code that you tried and then we will take it from there.
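For anyone still stuck after inspecting the table, here is a generic sketch rather than a site-specific answer. It assumes the search results appear as ordinary tr/td table rows and returns the cell texts of the first row containing the search value; which columns hold Code, Type, CN and Unit on scramble.nl is not known here, so check that with Inspect before mapping the cells to B1:E1. The names RowCellsContaining and WriteMatchingRow are hypothetical, and after clicking the search button you would need to wait for the results page to load (the same Busy/readyState loop) before calling them.

```vba
' Hypothetical helper: returns the trimmed cell texts of the first <tr> that contains
' matchText (case-insensitive), or an empty array if no row matches.
Function RowCellsContaining(ByVal doc As Object, ByVal matchText As String) As Variant
    Dim tr As Object, td As Object, vals() As String, n As Long
    For Each tr In doc.getElementsByTagName("tr")
        If InStr(1, tr.innerText, matchText, vbTextCompare) > 0 Then
            ReDim vals(0 To tr.Cells.Length - 1)
            For Each td In tr.Cells
                vals(n) = Trim$(td.innerText)
                n = n + 1
            Next td
            RowCellsContaining = vals
            Exit Function
        End If
    Next tr
    RowCellsContaining = Array()
End Function

Sub WriteMatchingRow(ByVal doc As Object)
    Dim v As Variant
    v = RowCellsContaining(doc, CStr(Sheets("Sheet1").Range("A1").Value))
    ' Only write when a row was found; mapping the cells to B1:E1 is an assumption
    If UBound(v) >= 0 Then Sheets("Sheet1").Range("B1").Resize(1, UBound(v) + 1).Value = v
End Sub
```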
Interesting read: html parsing of cricinfo scorecards