Modify data retrieval macro - vba

I have the below code that pulls a table from two pages that require a login. The code opens IE, goes to the login page, puts in the credentials, then pulls the 2 tables.
However, if IE is already logged in with a user, it directly takes you to this page (and thus the code errors out because there is no login fields):
https://www.example.com/taskprocessing/manage.jsp
I need to add an IF statement that if it lands on this page, to click on this link to log out, then proceed with logging in with the credentials:
https://www.example.com/taskprocessing/logout.jsp
Sub GetTable()
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate "https://www.example.com/taskprocessing/login.jsp"
Do Until .ReadyState = 4
DoEvents
Loop
.Document.all.Item("Username").Value = "username123"
.Document.all.Item("Password").Value = "password123"
.Document.forms(0).submit
End With
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;https://www.example.com/taskprocessing/report_pending_tasks.jsp", Destination:=Range("J1"))
.WebSelectionType = xlAllTables
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;https://www.example.com/taskprocessing/report_task_processing_stats.jsp", Destination:=Range("A1"))
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = """user"""
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End With
End Sub

You can just skip the login process if not needed. To do this, you might re-structure your top-code as follows:
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate "https://www.example.com/taskprocessing/login.jsp"
Do Until .ReadyState = 4
DoEvents
Loop
On Error Resume Next 'this will help to understand if there was an error when trying retrieving the UserName field
Set userField = .Document.getElementById("Username")
If Err.Number = 0 Then 'if there was no error when trying to search for the username...
.Document.all.Item("Username").Value = "username123"
.Document.all.Item("Password").Value = "password123"
.Document.forms(0).submit
End If
On Error GoTo 0 'turn-off the error handling once again
End With
The lines I added are:
On Error Resume Next
...to avoid the code breaking if there's an error when looking for a non-existing element;
Set userField = .Document.getElementById("Username")
...to see if the page is logged-in or logged-out. If it's logged-out, there will be no error while doing this. If it is not, then this will cause a "object does not exist error".
If Err.Number = 0 Then
...so if there was no error while doing this, we proceed with the login... otherwise we just go and keep on doing what we were doing.
On Error GoTo 0
...finally, we turn off the error handling so if there's any other error in the page you will keep on being informed regularly.

Thanks! I needed it to log out, so I added an Else statement.
On Error Resume Next 'this will help to understand if there was an error when trying retrieving the UserName field
Set userField = .Document.getElementById("Username")
If Err.Number = 0 Then 'if there was no error when trying to search for the username...
.Document.all.Item("Username").Value = "username123"
.Document.all.Item("Password").Value = "password123"
.Document.forms(0).submit
Else
.Navigate "https://www.example.com/taskprocessing/logout.jsp"
Do While ie.Busy: DoEvents: Loop
Do Until .ReadyState = 4
DoEvents
Loop
.Document.all.Item("Username").Value = "username123"
.Document.all.Item("Password").Value = "password123"
.Document.forms(0).submit
End If
On Error GoTo 0 'turn-off the error handling once again

Related

Excel VBA: message box error in connection

There is a website, that can create thousands of .csv files that contain tables. The CSV files are based on the information the user asks.
I created an excel file with VBA script. The user enters the data to the excel file, then the VBA script generates the correct URL, and tries to get the required data from the .csv in that URL.
In my excel file, the user can ask for hundreds of .csv tables, and I want the user to be able to enter the hundreds of information kinds he wants, then run the VBA script and leave the computer to work on it.
I first do URL check, and if it is ok, I try to get the data in the .csv file in that URL.
most of the times, it works completely fine. Works fine for a case when HttpExists returns TRUE, and also works fine for a case that HttpExists returns FALSE (it just skips the current active cell and goes to the next one).
But there are a few times, that the URL check answers that the URL is fine (HttpExists returns TRUE), but when it tried to get the data, it opens a message box that says "sorry, we couldn't open 'url address' ". (message box for Run Time Error 1004) and then the VBA scripts terminates.
I would like to know how can I fix it. How can I just skip the current URL in case of error, instead of showing a message box that terminates the script run?
Sub my_method()
On Error GoTo handleCancel
Dim errorFlag As Boolean
.......
Do Until ActiveCell.Value = ""
errorFlag = True
URLstring= ....
........
If Not HttpExists(URLstring) Then
symbolStatus = "Data unavailable"
logAddress = updateLog("invalid URL " & ActiveCell.Value, logAddress, debugString)
Application.DisplayAlerts = False
Sheets(currentSymbol).Delete
Application.DisplayAlerts = True
Else
With Sheets(currentSymbol).QueryTables.Add(Connection:= _
"TEXT;" & URLstring _
, Destination:=Sheets(currentSymbol).Range(dataAddress))
.Name = ""
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 850
.TextFileStartRow = 2
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(2, 2, 2, 2, 2, 2, 9)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
.......
errorFlag = False
handleCancel:
ActiveCell.Offset(1, 0).Select
If errorFlag = True Then
symbolStatus = "Data unavailable"
logAddress = updateLog("invalid URL " & ActiveCell.Value,
logAddress, debugString)
Application.DisplayAlerts = False
Sheets(currentSymbol).Delete
Application.DisplayAlerts = True
End If
Loop
End Sub
Function HttpExists(sURL As String) As Boolean
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.ServerXMLHTTP")
If Not UCase(sURL) Like "HTTP:*" Then
sURL = "http://" & sURL
End If
On Error GoTo haveError
oXHTTP.Open "HEAD", sURL, False
oXHTTP.send
HttpExists = IIf(oXHTTP.status = 200, True, False)
Exit Function
haveError:
HttpExists = False
End Function
It sometimes goes out with a messagebox of Run Time Error 1004, and it happens in the line of:
With Sheets(currentSymbol).QueryTables.Add(Connection:= _
"TEXT;" & URL _
I would like it just to skip the current cell in a case of error, and go on with the next cell, without any messagebox and without crashing.
How can I fix it?
Thanks
See if this error handling structure works better. I eliminated parts that are unnecessary and adjusted to what should work, but I am not sure what code is in the ..... sections. Anyway, this should at least give you a general understanding. I commented a few things to explain more clearly in code.
Option Explicit
Sub my_method()
Do Until ActiveCell.Value = ""
'URLstring= ....
If Not HttpExists(URLstring) Then
LogError 'create sub since you do same thing twice
Else
On Error GoTo handleBadURL 'now is only time you need to move to actual error handling
With Sheets(currentSymbol).QueryTables.Add(Connection:= _
"TEXT;" & URLstring _
, Destination:=Sheets(currentSymbol).Range(dataAddress))
.Name = ""
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 850
.TextFileStartRow = 2
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(2, 2, 2, 2, 2, 2, 9)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
On Error Go To 0 'reset error handling (doesn't matter so much here, but good practice to always reset when not needed
End If
ActiveCell.Offset(1, 0).Select
Loop
Exit Sub 'leave sub when all is done (so it doesn't move to error handling code below
handleBadURL:
LogError 'created sub since you do same thing twice
Resume Next 'this statement will allow code to continue from point of error onward (the loop will keep going
End Sub
Sub LogError()
symbolStatus = "Data unavailable"
logAddress = updateLog("invalid URL " & ActiveCell.Value, logAddress, debugString)
Application.DisplayAlerts = False
Sheets(currentSymbol).Delete
Application.DisplayAlerts = True
End Sub
You need to add error handling to your code. Server timeout notices doesn't reflect an issue with your coding, but an issue with the server (which is out of your control, unless of course, you entered an incorrect URL).
In your code, you need to place On Error GoTo ErrHandler, make sure you have the error number, and since you are wanting to just resume to the next cell you can do something like this:
Sub Test()
On Error GoTo ErrHandler
'Your code
Exit Sub
ErrHandler:
If Err.Number = 123456 Then
'Get the code ready for the next cell, if necessary
Resume Next
Else
'Other Errs
End If
End Sub

VBA Error Handling in QueryTables Loop

I am importing multiple csv files from Google Finance and saving the data to a single worksheet. Whenever a URL is not valid I would like it to go to an error handler where it essentially says "this info could not be found" and then continue back to grab the next csv file from the web.
The code works fine UNTIL it gets to an invalid url, and executes the error handler. The error handler does its job the first time, but then when the loop continues with a new URL, it errors again even when the url is valid.
Once the querytable .Refresh errors once, each url after will also error even if it is a valid url. Is there a way to clear the error and move on to the next URL?
Code:
Sub getQuotes()
For i = 1 To 3775
sym = Worksheets("2").Range("C" & i)
lookup = "TEXT;http://www.google.com/finance/historical?q=" + sym + "&startdate=Jun+1%2C+2016&enddate=Aug+5%2C+2016&num=30&authuser=1&output=csv"
With Worksheets("Raw Data").QueryTables.Add(Connection:=lookup, Destination:=Worksheets("Raw Data").Range("A2"))
.Name = _
" "
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 775
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1)
.TextFileDecimalSeparator = ","
.TextFileTrailingMinusNumbers = True
On Error GoTo Err1:
.Refresh BackgroundQuery:=True
End With
Next
Err1:
Worksheets("Raw Data").Range("A:F").EntireColumn.Insert
Worksheets("Raw Data").Range("A2") = sym + " data could not be extracted"
Resume Next
End Sub
Although I was not able to reproduce the specific problem you've referenced, you're missing Exit Sub before your Error handling section. Without it, the code under Err1: gets executed after your For loop finishes, regardless of whether there was an error.
Simply add Exit Sub on the line above your Err1: and let 'er rip. My code worked with an error in the middle of my sym list with this change.

Copy table (with no id) from password protected website

I would like to retrieve a table from my bank's website, and paste it in the same format in an excel sheet.
Thanks to a few posts on this very useful website, I managed to write a vba code that automatically :
Open the Internet Explorer windows
Fills in the Log-in and Password details
Clicks on the submit Button, thus connecting me to my bank account
Clicks on the "previous day transaction report" button, thus generating the report
And finally shows this report in Internet Explorer
Now my problem is :
How can I copy this report (which is a table) and paste it in an excel sheet, with the Exact same format as it appears on the website ?
On the html code of the website page, this table has no "id". But it has a class, called "report". And it is the only one who has this class.
I suppose I have to use this :
IE.document.getElementsByClassName("report")(0).outerHTML
But I am not sure how to use it... Basically the simplest thing for me would be to write a last portion of code that copies the report to the clipboard, and paste it with the same format on the spreadsheet.
Any idea how I could do that?
i recorded a macro to get the code, using this should copy the table for you. I think its the best way for you. You have to change the URL and the webtable. This will copy it.
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://THE URL AFTER LOGIN WITH THE TABLE", _
Destination:=Range("$A$1"))
.CommandType = 0
.Name = "SEQUENCE OF URL"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "11"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
Lets make this work!! :) step by step.
You are right. Fill a form using it.
Try open the page, press F12 and use the console to make sure everything is right.
To fill
document.getElementById("Id").value = "xxxxx"
document.getElementById("Id2").value = "pass"
To click
document.getElementById("name").click()
Set objie = CreateObject("InternetExplorer.Application")
objie.Visible = True
With objie
.AddressBar = False
.StatusBar = False
.MenuBar = False
.Toolbar = 0
.NAVIGATE "http://www.loginpage.com"
While .BUSY
Wend
Do While .READYSTATE <> 4: DoEvents: Loop
'first you login
Set htmldoc = .document
htmldoc.getElementById("UserId").Value = "login"
htmldoc.getElementById("Password").Value = "pass"
htmldoc.getElementById("submit-btn").Click
End With
Try using this one, its first step... Makes sense to you? Im new here, so i will do my best

Using Data from Web, how to login?

I have a macro that uses the "Data from Web" function. I have logged into the website (in Internet Explorer) that I'm pulling data from - but the results I get in Excel just keep telling me I'm not logged in.
Is there a special way to login via Excel for "Data from Web"? I know it works, as I used the Macro Recorder to learn how Excel gets the data - and doing so manually, the website asked me to login in the "Excel IE Browser window"...but it's been over an hour, so I was logged out. How do I log in again to use it?
here's the applicable data pull code if it helps (the URL works fine, once logged in):
With ActiveSheet.QueryTables.Add(Connection:="URL;" & theURL, Destination:=webInfoWS.Range("$A$2"))
.name = cel.Value & " hex"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "3"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
For the mean time, I found a work around (mainly from this thread):
Adding this after Debug.Print "Opening " & theURL and just before With ActiveSheet.QueryTables.Add(...)
''' Log in to the web stuff
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate theURL
Do Until .READYSTATE = 4
DoEvents
Loop
If Left(.Document.Title, 5) <> "Welcome!" Then
.Document.all.Item("username").Value = "My User Name"
.Document.all.Item("password").Value = "MyPassword"
.Document.forms(0).submit
End If
' .Quit
End With
''''''
What this does is actually opens the IE Window, then (automated) puts in my username and password, and submits.
However, if I run the macro again (meaning that I already did log in), this gives me an error, because there is no username/password entry form.
Thoughts on how to get around - use On Error Goto Next, but I don't like using that, but it might be the best option. I think I'll try instead to get the Window title (via HTML) and check if that is the login form or not...
Edit: A note on how to know that the .Item("____") is "username" and "password". That just comes from the HTML Input ID tag:
You'll notice in the post I found this in, the text in the .Item() is different - I assume because that HTML ID is different as well.
EDIT 2: This doesn't work! I'm able to log in, see the web page in IE, but when I get to .Refresh BackgroundQuery:=False, the resulting information is the text saying I need to login :/

Fetch specific table only from website into Excel

I need to fetch the table from http://www.zillow.com/homes/comps/67083361_zpid/ into Excel using VBA. I just want the table, nothing else. But when I'm using:
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Visible = True
.Navigate "http://www.zillow.com/homes/comps/67083361_zpid/"
Do While .ReadyState <> 4: DoEvents: Loop
Debug.Print .document.Body.outerText
End With
it gives me text like:
4723 N 63rd Dr$63,50008/17/201241.752,0747,6751972$360.11
for each product which I can't analyze and store into different cells of Excel.
So is there a way I can fetch the page data in a manageable way. I am OK if I need to traverse a loop for this. Also I can do additional processing to fill the row data into Excel properly.
I'd use the below since I find query tables slow and IE excruciatingly slow ;)
Sub GetData()
Dim x As Long, y As Long
Dim htm As Object
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", "http://www.zillow.com/homes/comps/67083361_zpid/", False
.send
htm.body.innerhtml = .responsetext
End With
With htm.getelementbyid("comps-results")
For x = 0 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
Sheets(1).Cells(x + 1, y + 1).Value = .Rows(x).Cells(y).innertext
Next y
Next x
End With
End Sub
I have done it using following code:
Sub FetchData()
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://www.zillow.com/homes/comps/67083361_zpid", Destination:=Range( _
"$A$1"))
.Name = "67083361_zpid"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub