I wrote a simple macro function in VBA for Excel to extract text appearing in a particular location in an HTML document, first retrieving the HTML document from a URL value in another cell. The macro function itself is not important, except for the fact that it sends an HTTP request and creates an HTML file object, which I fear will cause Excel to crash if I paste, say, a column of 100 or more URLs and it starts trying to calculate all the values at once. I can see that it stops and churns for a moment if I drag the formula down 10 cells where there is already a column of URLs. Is there a setting to force Excel to calculate one formula at a time, so that it may take longer but is less likely to freeze up or crash?
Update: I incorporated a static collection variable into the function to at least avoid repeated slowdowns retrieving the same HTML in the same worksheet:
Function GetUSPatentAbstract(ByVal url As String) As String
Static colAbstract As New Collection
Dim abstract As String
On Error Resume Next
abstract = colAbstract(url)
`If there is no abstract for the URL in the collection yet, then it is retrieved:
If Err.Number <> 0 Then
Dim description As String
Dim abstractStart As Long
Dim abstractEnd As Long
Dim abstractLength As Long
Set html_doc = CreateObject("htmlfile")
Set xml_obj = CreateObject("MSXML2.XMLHTTP")
xml_obj.Open "GET", url, False
xml_obj.send
html_doc.body.innerhtml = xml_obj.responseText
Set xml_obj = Nothing
description = html_doc.body.innertext
abstractStart = InStr(description, "Abstract") + 8
abstractEnd = InStr(description, "Inventors:")
abstractLength = abstractEnd - abstractStart
abstract = Mid(description, abstractStart, abstractLength)
colAbstract.Add Item:=abstract, Key:=url
End If
On Error GoTo 0
GetUSPatentAbstract = abstract
End Function
Related
I want to import the web page source code in excel what I see using View Page Source option in Chrome. But when I import it using below code, it doesn't import all content. The values that I'm looking for do not get displayed on web page.
I'm also unable to locate the element using getElementsByClassName or other methods.
Private Sub HTML_VBA_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code
sURL = "http://pntaconline/getPrDetails?entry=8923060"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
'Get webpage data into Excel
' If longer sourcecode mean, you need to save to a external text file or somewhere,
' since excel cell have some limits on storing max characters
ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML
MsgBox "XMLHTML Fetch Completed"
End Sub
Data I want to import is IDs and Name:
So you need to understand the DOM in order to realize why this isnt loading everything.
XMLHTTP is going to load that specific resource you requested. A lot of web pages, sorry pretty much all web pages, load extra resources after the initial request is done.
If you're missing stuff, it's probably loaded on a different network request. So open up your DevTools in Chrome, make sure Network tab is recording, and watch how many network requests go in and out when you load your target page.
Essentially, this if you're using XMLHTTP, you'd have to simulate each of those to get the requests you want to scrape.
EDIT
So you're just kind of pasting the data response into Excel.
Better to create HTMLDocument variable then set the response from XMLHTTP to be the response like here: https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762275(v=vs.85)
set xmlhttp = new ActiveXObject("Msxml2.XMLHTTP.3.0");
xmlhttp.open("GET", "http://localhost/books.xml", false);
xmlhttp.send();
Debug.print(xmlhttp.responseText);
Dim xString as String
xSring = xmlhttp.responseText
'search the xString variable
You can then split that response for the sheet or search it and extract the values in VBA memory, rather than print to the sheet.
You could also set the xString responseText as the innerHTML for a new HTMLDocument variable
Dim xHTML as HTMLDocument
Set xHTML.innertext = xString
Hello fellow Excel vba coders
I have this great macro in my excel sheet, where i compile XML based on the rows the user puts in - after this it post the xml to a webservice.
You can have a look at my code below - It is fairly simple:
Set XMLHttpRequest = New MSXML2.XMLHTTP
With XMLHttpRequest
.Open "POST", URL, False
.setRequestHeader "Content-Type", "text/xml; encoding='utf-8'"
.setRequestHeader "Content-Length", strLength
.send strXML
End With
Right now it works great when there is less than 200 rows, yet it times out when the row number gets above 1000 rows. The string of XML I post is really big, and i'm quite sure that's the reason it times out.
Now my problem is, how do i post this huge dataset, that exceed 1.000 rows, maybe even above 20.000 rows, to a webservice?
So far i have spend a lot of time to look for a possible solution around the web, but have yet to find a way to handle this. So far i have the following ideas to solve the problem:
Copy the sheet to a new workbook, take the location of the new workbook and convert the file to a Base64 string and post the entire file to a new .asmx service and handle the "workbook" in C# code.
Convert the huge string to some kind of byte array and post that to a new .asmx webservice and handle the C# code.
I really hope one of you guys can point me in the right direction and help me solve this problem?
Instead of sending the entire sheet as a single object, send each row individually.
This will require your web service to be modified to accept just a row, rather than, what I would guess is currently being sent as an array of rows.
This would allow you to process any number of rows as the most you are processing at any one time is exactly one.
It is possible the issue of configuration on server side... See this post
and this.
Ok - I found a solution to my problem, and it seems like the best way to handle this problem.
I use the following function to make a copy of the workbook:
Private Function saveAS(Path As String)
Application.EnableEvents = False
Application.DisplayAlerts = False
ActiveWorkbook.Sheets.Copy
ActiveWorkbook.saveAS Path, FileFormat:=51
ActiveWorkbook.Close savechanges:=True
Application.EnableEvents = True
Application.DisplayAlerts = True
End Function
Then i encode the file into a base64string, like so:
Private Function EncodeFileBase64(Filename As String) As String
Dim arrData() As Byte
Dim fileNum As Integer
Filename = Filename + ".xlsx"
fileNum = FreeFile
Open Filename For Binary As fileNum
ReDim arrData(LOF(fileNum) - 1)
Get fileNum, , arrData
Close fileNum
Dim objXML As MSXML2.DOMDocument
Dim objNode As MSXML2.IXMLDOMElement
Set objXML = New MSXML2.DOMDocument
Set objNode = objXML.createElement("b64")
objNode.DataType = "bin.base64"
objNode.nodeTypedValue = arrData
EncodeFileBase64 = objNode.Text
Set objNode = Nothing
Set objXML = Nothing
End Function
And then i send the encoded string to my own .asmx webservice and work with it in C# The webmethod looks like this:
[WebMethod]
[ScriptMethod(UseHttpGet = true)]
public string UploadXML(string base64string)
{
try
{
byte[] bytes = Convert.FromBase64String(base64string);
using (MemoryStream ms = new MemoryStream(bytes))
{
using (var package = new ExcelPackage(ms))
{
ExcelWorkbook workBook = package.Workbook;
ExcelWorksheet settings = workBook.Worksheets.SingleOrDefault(w => w.Name == "sheet1");
ExcelWorksheet data = workBook.Worksheets.SingleOrDefault(w => w.Name == "sheet2");
//Getting data
string SS1 = (string)settings.Cells[8, 3].Value;
string ss2 = (string)settings.Cells[7, 3].Value;
}
}
return "success";
}
catch (Exception ee)
{
return ee.Message;
}
}
I just need to find a good way to pull out all the data in a smart algorithm, i dont think that will be a problem at all :)
I know you have a solution but I'd like to give an opinion...
If the data is really so tabular then shouldn't it be in a database like SQL Server? If you are uploading to a database then SQL Server has some nice bulk upload features to load workbooks efficiently.
I am using the below code in Blue prism for filtering in excel for multi criteria.
But i am not able to filter multi criteria for Not equal to scenario.
Dim wb As Object
Dim excel as Object
Dim range as Object
Try
wb = GetWorkbook(Handle, Workbook)
excel = wb.Application
range = excel.Range(FRange)
Dim listOfValues as Array
listOfValues = Split(FCriteria,";")
wb.worksheets(Worksheet).select
range.select
range.Autofilter(FCol,listOfValues,7)
Success = True
Catch e As Exception
Success = False
Message = e.Message
Finally
wb = Nothing
End Try
Please help me tweaking the script
I'm almost sure that there is no filter option to set a "negative list". You can specify either a (positive) list of values (this is what your code does so far, for this you have to set the 7 as third parameter), or you can give a maximum of 2 individual criteria (in Excel, choose "Custom Filter" to set them.
You should play with the filter directly in Excel and try to set it like you want. Once you are satisfied with it, clear the filter, record a macro and repeat the filtering. Go to the VBA editor and see what's in there. It is straightforward to translate this into C# code.
But:
It's not possible to set any filtering by code (neither C# nor VBA) that you cannot set via the Excel GUI
I would question what you are trying to do. Since you are using Blue Prism, you should be trying to access the underlying data in a BP Collection(VB DataTable), rather than applying a filter, which is a visual tool for humans to further play with the interface. The robot will still have to do something with the filtered data, and it far easier to write code to proceed with data during the loop.
Otherwise use the Filter Collection Page of the 'Utilities - Collection Manipulation' VBO to get a filtered collection.
Also you are using VBA Split function, when you should use Split in VB as a method of the String.
Try this for a new page in the 'Utilities - Collection Manipulation' VBO(untested):
Dim NewRow As DataRow
Collection_Out = Collection_In.Clone
Dim Select_Concat As String
Select_Concat = "NOT(" & fieldName & " = '" & [String].Join("' OR " & fieldName & " = '", FCriteria.Split(";"c)) & "')"
For Each parentRow As DataRow In Collection_In.Select(Select_Concat)
NewRow = Collection_Out.NewRow
For Each c As DataColumn In NewRow.Table.Columns
NewRow(c.ColumnName) = parentRow(c.ColumnName)
Next c
Collection_Out.Rows.Add(NewRow)
Next parentRow
NewRow = Nothing
Collection_In = Nothing
Inputs: Collection_In(Collection), fieldName(Text), FCriteria(Text)
Outputs: Collection_Out(Collection)
You first need to get the entire range into an unfiltered Collection(which will be your Collection_In to this page, and then get the filtered Collection out....
I have following code in an lotusscript agent that removes attachments from NotesDocuments. But NotesDocument.save() causes loss of rich text formatting (font, color). Is there any way to retain the formatting?
Sub removeAttachments_v2(doc As NotesDocument)
Dim session As NotesSession
Dim rtitem As Variant
Dim filename As String
Dim ans As Variant
Set session = New NotesSession
Dim richstyle As NotesRichTextStyle
Set richstyle = session.CreateRichTextStyle
richstyle.NotesColor = COLOR_BLUE
If doc.HasEmbedded Then
Set rtitem = doc.getfirstitem("Body")
If (rtitem.type = RICHTEXT) Then
ForAll object In rtitem.EmbeddedObjects
If (object.Type = EMBED_ATTACHMENT) Then
filename = object.source
Call object.remove
Call rtitem.AddNewLine( 2 )
Call rtitem.AppendStyle(richstyle)
Call rtitem.AppendText( "Attachemnt removed: " & filename )
Call doc.Save( True, True , True )
End If
End ForAll
End If
End If
End sub
Edit1: Initialize function
Sub Initialize
Dim db As New NotesDatabase("","")
Dim col As NotesDocumentCollection
Dim doc As NotesDocument
Call db.Open("", "C:\this\is\db\dir\test.nsf")
Set col = db.Alldocuments
Set doc = col.Getfirstdocument()
While Not ( doc Is Nothing )
Call RemoveAttachments_v2(doc)
Call doc.Save(False, False, False)
Set doc = col.GetNextDocument( doc )
Wend
End Sub
Despite of the fact, that you save the document for every attachment I cannot find any reason, why this should happen. I just copied your code in an agent, and it removes the attachments as desired and appends the text in blue...
No formatting is lost...
The error has to be somewhere else in your code, probably in the calling function.
OLD ANSWER (wrong due to own tests, just kept here as history):
The Problem here most probably is: you defined rtitem as Variant. And
getfirstitem gets you a NotesItem instead of a NotesRichtextItem, so
when saving, it is converted to a "plain Text" item.
Most probably you used Variant instead of NotesRichtextItem, because
there are Mime- mails where defining the variable as NotesRichtextItem
will cause an "Type Missmatch" or similar error. As long as you do not
write anything back this is OK.
As Mime Mails need complete different handling to achieve your goal,
you should first fix the code for pure NotesRichtextItems by using the
right type, and then write another code- branch for handling Mime-
items
I create a browser like so, and manually navigate to the web page I need to be. I intend to automatically pull certain elements once I get to the page I need to be on via a seperate macro
Sub Test()
Set CAS = New SHDocVw.InternetExplorer ' create a browser
CAS.Visible = True ' make it visible
CAS.navigate "http://intraneturl"
Do Until CAS.readyState = 4
DoEvents
Loop
This works fine, then I do
Public Sub Gather
Set HTMLDoc2 = CAS.document.frames("top").document
Call Timer1
With HTMLDoc2
.getElementById("tab4").FirstChild.Click
End With
Call Timer2
Dim fir, las, add1, add2, cit, stat, zi As String
Dim First As Variant
Dim Last As Variant
Dim addr1 As Variant
Dim addr2 As Variant
Dim city As Variant
Dim Thisstate As Variant
Dim Zip As Variant
Call Timer2
Set HTMLDoc = CAS.document.frames("MainFrame").document
Call Timer2
With HTMLDoc
First = .getElementsByName("IndFirst")
Last = .getElementsByName("IndLast")
addr1 = .getElementsByName("txtAdd_Line1")
addr2 = .getElementsByName("txtAdd_Line2")
city = .getElementsByName("txtAdd_City")
Thisstate = .getElementsByName("cmb_Add_State")
Zip = .getElementsByName("txtAdd_Zip")
End With
fir = First.Value
las = Last.Value
add1 = addr1.Value
add2 = addr2.Value
cit = city.Value
stat = Thisstate.Value
zi = Zip.Value
'navigate back to start page
With HTMLDoc2
.getElementById("tab3").FirstChild.Click
End With
End Sub
This works the first time, but after the first time, I get "Object variable or with block variable not set" when trying to run the gather() sub again, on a different web page that contains similar information. Any Ideas as to what im doing wrong?
"The error "object variable or with block variable not set" occurs on: Set HTMLDoc2 = CAS.document.frames("top").document the second time i try running Gather()."
This is probably one of three things:
CAS is no longer an object
To check this, set a breakpoint on the line, press ctr+G in the VBA Editor and type ?CAS Is Nothing in the Immediate Window; the result should be False; if it is True CAS is no longer an object
Like Daniel Dusek suggested, make sure CAS.document.frames("top") is an actual element on the page.
To check this, open the webpage you are trying to script, press F12 in Internet Explorer, click on the arrow in the toolbar and click on the "top" frame element in the webpage, switch back to the Developer Tool and look at the line highlighted. Make sure the frame element is named "top".
The HTML hasn't fully loaded when you try to reference the frame element. Set a longer delay or a loop.
i.e. (untested):
Do Until HtmlDoc2 Is Nothing = false
Set HTMLDoc2 = CAS.document.frames("top").document
Loop
Maybe the more important question is why manually navigate to another page? Can't you automate that part of your process too?