VBA/DOM - Get elements based on attribute - vba

Excel 2013 on Windows 7. XPath/Javascript/jQuery is out of scope.
I am trying to iterate over select div elements in a page, namely elements that have a specific data-level attribute.
My current approach is similar to this, but I was unable to find a non-manual way to select elements based on attributes. The closest I came was something like:
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", url, False
.Send
pHTML.body.innerHTML = .ResponseText
End With
Set eCollection = pHTML.getElementsByClassName("chapter").getElementsByTagName("div")
For i = 0 To eCollection.Length
If eCollection(i).getAttribute("data-level") >= 0 Then ' Throw cake
Next i
This solution, while I am sure it is viable (if unelegant), seems sub-optimal if only for how big the loop is going to end up being when I start looking for specific elements and sequences of elements within these elements.
So I am looking for a way to do something like this:
For Each pElement In pHTML.getElementsByClassName("chapter").getElementsByTagName("div").getElementsByAttribute("data-level")
' Throw cake at the element
Next
I'm aware that there is no method getElementsByAttribute, hence the question.
Is there some approach here that I am blind to, or am I locked to manual iteration?
Alternatively, if I swap my current approach for creating an IE instance, á la this answer, could I concievably use querySelectorAll to end up with something resembling the result I have outlined above?

For anyone else coming this way, the outer shell, so to speak, can look like this:
Sub ScrapeWithHTMLObj(url As String, domClassName As String, domTag As String, domAttribute As String, domAttributeValue As String)
' Dependencies:
' * Microsoft HTML Object Library
' Declare vars
Dim pHTML As HTMLDocument
Dim pElements As Object, pElement As Object
Set pHTML = New HTMLDocument
' Basic URL healthcheck
Do While (url = "" Or (Left(url, 7) <> "http://" And Left(url, 8) <> "https://"))
MsgBox ("Invalid URL!")
url = InputBox("Enter new URL: (0 to terminate)")
If url = "0" Then Exit Sub
Loop
' Fetch page at URL
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", url, False
.Send
pHTML.body.innerHTML = .ResponseText
End With
' Declare page elements
Set pElements = pHTML.getElementsByClassName(domClassName)
Set pElement = pElements(0).getElementsByTagName(domTag)
' Extract only elements with wanted attribute
pEleArray = getElementsByAttribute(pElement, domAttribute, domAttributeValue)
For Each e In pEleArray
' Do stuff to elements
Debug.Print e.getAttribute(domAttribute)
Next
End Sub
If you go this route, you'll also need something like this:
Function getElementsByAttribute(pObj As Object, domAttribute As String, domAttributeValue As String) As Object()
Dim oTemp() As Object
ReDim oTemp(1 To 1)
For i = 0 To pObj.Length - 1
'Debug.Print pObj(i).getAttribute(domAttribute)
If pObj(i).getAttribute(domAttribute) = domAttributeValue Then
Set oTemp(UBound(oTemp)) = pObj(i)
ReDim Preserve oTemp(1 To UBound(oTemp) + 1)
End If
Next i
ReDim Preserve oTemp(1 To UBound(oTemp) - 1)
getElementsByAttribute = oTemp
End Function
Depending on the HTML tree, you'll need to change which elements you zero in on in the sub, obviously. For the site I used in testing, this structure worked flawlessly.
Example usage:
Call ScrapeWithHTMLObj("https://somesite", "chapter-index", "div", "data-level", "1")
It will enter the first class named chapter-index, select all elements with the div tag, and finally extract all elements containing the attribute data-level with value 1.

Related

Listbox.List(i) error - Method or Data Member not Found

I'm trying to use a multi-select listbox so users can select cleaning tasks they have completed and mark them as done. While looping through the list I want to see if the item is selected and create a record if so. When I try to use the .List method to return the data from a specific row, I keep getting the method not found error.
I originally did not have the forms 2.0 library loaded so I thought that was the issue, but that did not resolve the problem. I've also compacted and repaired thinking it might just be an odd fluke, but that did not help either.
'loop through values in listbox since its a multi-select
For i = 0 To listCleaningTasks.ListCount - 1
If listCleaningTasks.Selected(i) Then
'add entry to cleaning log
Set rsCleaning = CurrentDb.OpenRecordset("SELECT * FROM cleaning_log;")
With rsCleaning
.AddNew
.Fields("cleaning_task_id") = Form_frmCleaning.listCleaningTasks.List(i)
.Fields("employee_id") = Me.cmbUser
.Fields("cleanroom_id") = Me.cmbCleanroom
.Fields("cleaning_time") = Now()
.Update
.Close
End With
End If
Next i
Any ideas?
Use .listCleaningTasks.ItemData(r) to pull bound column value from row specified by index.
Use .listCleaningTasks.Column(c, r) to pull value specified by column and row indices.
Open and close recordset only one time, outside loop.
Really just need to loop through selected items, not the entire list.
Dim varItem As Variant
If Me.listCleaningTasks.ItemsSelected.Count <> 0 Then
Set rsCleaning = CurrentDb.OpenRecordset("SELECT * FROM cleaning_log")
With rsCleaning
For Each varItem In Me.listCleaningTasks.ItemsSelected
`your code to create record
...
.Fields("cleaning_task_ID") = Me.listCleaningTasks.ItemData(varItem)
...
Next
.Close
End With
Else
MsgBox "No items selected.", vbInformation
End If
Of course the solution of June7 is correct. If you need to store the selected items and then later recall and re-select the list box items, consider to get the selected items comma delimited using this function
Public Function GetSelectedItems(combo As ListBox) As String
Dim result As String, varItem As Variant
For Each varItem In combo.ItemsSelected
result = result & "," & combo.ItemData(varItem)
Next
GetSelectedItems = Mid(result, 2)
End Function
Store it into one column of a table and after reading it back pass it to this sub:
Public Sub CreateComboBoxSelections(combo As ListBox, selections As String)
Dim N As Integer, i As Integer
Dim selectionsArray() As String
selectionsArray = Split(selections, ",")
For i = LBound(selectionsArray) To UBound(selectionsArray)
With combo
For N = .ListCount - 1 To 0 Step -1
If .ItemData(N) = selectionsArray(i) Then
.Selected(N) = True
Exit For
End If
Next N
End With
Next i
End Sub
This will select items in your ListBox as they were before.

Get AutoCAD Block Handle - Differing Results w/ Attout VB.NET

I have an attribute tab delimited text file that I want to apply to multiple drawings. In order for AutoCAD to NOT pop up and say "One or more blocks could not be found, do you want to select the data interactively?" , I have to use the HANDLE property of the block. On a given drawing, if I use ATTOUT to see the Handle of my block, I get a value such as '8B3F. Using ATTIN with that Handle works. Applying this to multiple drawings that have different handles, I have to get the handle for each block if each drawing. Here is my code - writing the handle to an excel doc.
xlbook = xlapp.Workbooks.Open(attInText,, False)
xlsheet = xlbook.Worksheets(dwgName)
Dim Handle As String = ""
'get the handle to the CHS11x17TB title block
For Each blk As AutoCAD.AcadBlock In cadDOC.Blocks
If blk.Name.ToUpper = "CHS11X17TB" Then
Handle = blk.Handle
xlsheet.Cells(2, "A").value = Handle
Exit For
End If
Next
Now, the problem is that the Handle is NOT the same as the one generated using ATTOUT - I'll get something like '75B0 using the code. Why do you think ATTOUT gives me a different handle than looping through the blocks of the drawing? I would note that my block is in paperspace, if that makes any difference. If that question cannot be answered, I'm interested in any alternative solutions for getting the handle to my block.
It looks like you're confusing block definition (Block) contained in the block table (Blocks) and block reference (BlockReference) inserted in the ModelSpace or PaperSpace.
Here's a not tested snippet which serac for a block reference in the model space (you can replace ModelSpace with PaperSpace to search the active paper space.
xlbook = xlapp.Workbooks.Open(attInText,, False)
xlsheet = xlbook.Worksheets(dwgName)
Dim Handle As String = ""
'get the handle to the CHS11x17TB title block
For Each obj As AutoCAD.AcadObject In cadDOC.ModelSpace
If obj.ObjectName = "AcDbBlockReference" Then
If obj.EffectiveName.ToUpper = "CHS11X17TB" Then
Handle = obj.Handle
xlsheet.Cells(2, "A").value = Handle
Exit For
End If
End If
Next
Here's what I did to make it work. The block reference I wanted was in the paperspace. Note that EntityType 7 is an AcadBlockReference.
Dim Handle As String = ""
Dim count As Integer
count = cadDOC.PaperSpace.Count
Dim newObjs(count) As AutoCAD.AcadEntity
Dim index As Integer
For index = 0 To count - 1
newObjs(index) = cadDOC.PaperSpace.Item(index)
Next
For i = 0 To count - 1
Try
If newObjs(i).EntityType = 7 Then
Dim blk As AutoCAD.AcadBlockReference = newObjs(i)
If blk.Name.ToUpper = "CHS11X17TB" Then
Handle = "'" & blk.Handle
End If
End If
Catch ex As Exception
End Try
Next

How to identify string in htm.getelementbyid("mystring") using vba?

I am trying to get data from a different website using the vba code bellow, but I don't know how to identify the string inside the parenthesis in this statement "With htm.getelementbyid("comps-results"). How do I get the string in the parenthesis from, for example, this website
I would appreciate very much if someone could help me on this matter.
Thank you in advance.
Sub GetData()
Dim x As Long, y As Long
Dim htm As Object
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", "http://www.zillow.com/homes/comps/67083361_zpid/", False
.send
htm.body.innerhtml = .responsetext
End With
With htm.getelementbyid("comps-results")
For x = 0 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
Sheets(1).Cells(x + 1, y + 1).Value = .Rows(x).Cells(y).innertext
Next y
Next x
End With
End Sub
The getElementByID method takes a unique ID as an argument and returns a single HTML element if there is one with such an ID value.
Probably what you need to do is use the getElementsByTagName method, which returns a collection of matching elements. Since this may result in multiple matches, I find it best to create an object first, and an iterator variable:
Dim compresults
Dim el
Set compresults = htm.getelementsbytagname("comps-results")
For each el in compresults
MsgBox el.InnerText
Next
BTW, I am fairly certain ( but have not verified) that an HTMLElementCollection does not have a .Rows member, so the next line in your code will probably raise an error. Likewise, the .Rows does not have a .Length property, so there's at least two errors on that single line of code AND in the next line, note that .Cells does not have a .Length member, either, so another error.
For assistance with those parts of your code, I urge you to ask a new question. This answer addresses your original question.

Find and replace all names of variables in VBA module

Let's assume that we have one module with only one Sub in it, and there are no comments. How to identify all variable names ? Is it possible to identify names of variables which are not defined using Dim ? I would like to identify them and replace each with some random name to obfuscate my code (O0011011010100101 for example), replace part is much easier.
List of characters which could be use in names of macros, functions and variables :
ABCDEFGHIJKLMNOPQRSTUVWXYZdefghijklmnopqrstuvwxyzg€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙ÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙
Below are my function I've wrote recenlty :
Function randomName(n as integer) as string
y="O"
For i = 2 To n:
If Rnd() > 0.5 Then
y = y & "0"
Else
y = y & "1"
End If
Next i
randomName=y
End Function
In goal to replace given strings in another string which represent the code of module I use below sub :
Sub substituteNames()
'count lines in "Module1" which is part of current workbook
linesCount = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.CountOfLines
'read code from module
code = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.Lines(StartLine:=1, Count:=linesCount)
inputStr = Array("name1", "name2", "name2") 'some hardwritten array with string to replace
namesLength = 20 'length of new variables names
For i = LBound(inputStr) To UBound(inputStr)
outputString = randomName(namesLength-1)
code = Replace(code, inputStr(i), outputString)
Next i
Debug.Print code 'view code
End Sub
then we simply substitute old code with new one, but how to identify strings with names of variables ?
Edition
Using **Option Explicit ** decrease safety of my simple method of obfuscation, because to reverse changes you only have to follow Dim statements and replace ugly names with something normal. Except that to make such substitution harder, I think it's good idea to break the line in the middle of variable name :
O0O000O0OO0O0000 _
0O00000O0OO0
the simple method is also replacing some strings with chains based on chr functions chr(104)&chr(101)&chr(108)&chr(108)&chr(111) :
Sub stringIntoChrChain()
strInput = "hello"
strOutput = ""
For i = 1 To Len(strInput)
strOutput = strOutput & "chr(" & Asc(Mid(strInput, i, 1)) & ")&"
Next i
Debug.Print Mid(strOutput, 1, Len(strOutput) - 1)
End Sub
comments like below could make impression on user and make him think that he does not poses right tool to deal with macro etc.:
'(k=Äó¬)w}ż^¦ů‡ÜOyúm=ěËnóÚŽb W™ÄQó’ (—*-ĹTIäb
'R“ąNPÔKZMţ†üÍQ‡
'y6ű˛Š˛ŁŽ¬=iýQ|˛^˙ ‡ńb ¬ĂÇr'ń‡e˘źäžŇ/âéç;1qýěĂj$&E!V?¶ßšÍ´cĆ$Âű׺Ůî’ﲦŔ?TáÄu[nG¦•¸î»éüĽ˙xVPĚ.|
'ÖĚ/łó®Üă9Ę]ż/ĹÍT¶Mµę¶mÍ
'q[—qëýY~Pc©=jÍ8˘‡,Ú+ń8ŐűŻEüńWü1ďëDZ†ć}ęńwŠbŢ,>ó’Űçµ™Š_…qÝăt±+‡ĽČg­řÍ!·eŠP âńđ:ŶOážű?őë®ÁšńýĎáËTbž}|Ö…ăË[®™
You can use a regular expression to find variable assignments by looking for the equals sign. You'll need to add a reference to the Microsoft VBScript Regular Expressions 5.5 and Microsoft Visual Basic for Applications Extensibility 5.3 libraries as I've used early binding.
Please be sure to back up your work and test this before using it. I could have gotten the regex wrong.
UPDATE:
I've refined the regular expressions so that it no longer catches datatypes of strongly typed constants (Const ImAConstant As String = "Oh Noes!" previously returned String). I've also added another regex to return those constants as well. The last version of the regex also mistakenly caught things like .Global = true. That was corrected. The code below should return all variable and constant names for a given code module. The regular expressions still aren't perfect, as you'll note that I was unable to stop false positives on double quotes. Also, my array handling could be done better.
Sub printVars()
Dim linesCount As Long
Dim code As String
Dim vbPrj As VBIDE.VBProject
Dim codeMod As VBIDE.CodeModule
Dim regex As VBScript_RegExp_55.RegExp
Dim m As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim i As Long
Dim j As Long
Dim isInDatatypes As Boolean
Dim isInVariables As Boolean
Dim datatypes() As String
Dim variables() As String
Set vbPrj = VBE.ActiveVBProject
Set codeMod = vbPrj.VBComponents("Module1").CodeModule
code = codeMod.Lines(1, codeMod.CountOfLines)
Set regex = New RegExp
With regex
.Global = True ' match all instances
.IgnoreCase = True
.MultiLine = True ' "code" var contains multiple lines
.Pattern = "(\sAs\s)([\w]*)(?=\s)" ' get list of datatypes we've used
' match any whole word after the word " As "
Set matches = .Execute(code)
End With
ReDim datatypes(matches.count - 1)
For i = 0 To matches.count - 1
datatypes(i) = matches(i).SubMatches(1) ' return second submatch so we don't get the word " As " in our array
Next i
With regex
.Pattern = "(\s)([^\.\s][\w]*)(?=\s\=)" ' list of variables
' begins with a space; next character is not a period (handles "with" assignments) or space; any alphanumeric character; repeat until... space
Set matches = .Execute(code)
End With
ReDim variables(matches.count - 1)
For i = 0 To matches.count - 1
isInDatatypes = False
isInVariables = False
' check to see if current match is a datatype
For j = LBound(datatypes) To UBound(datatypes)
If matches(i).SubMatches(1) = datatypes(j) Then
isInDatatypes = True
Exit For
End If
'Debug.Print matches(i).SubMatches(1)
Next j
' check to see if we already have this variable
For j = LBound(variables) To i
If matches(i).SubMatches(1) = variables(j) Then
isInVariables = True
Exit For
End If
Next j
' add to variables array
If Not isInDatatypes And Not isInVariables Then
variables(i) = matches(i).SubMatches(1)
End If
Next i
With regex
.Pattern = "(\sConst\s)(.*)(?=\sAs\s)" 'strongly typed constants
' match anything between the words " Const " and " As "
Set matches = .Execute(code)
End With
For i = 0 To matches.count - 1
'add one slot to end of array
j = UBound(variables) + 1
ReDim Preserve variables(j)
variables(j) = matches(i).SubMatches(1) ' again, return the second submatch
Next i
' print variables to immediate window
For i = LBound(variables) To UBound(variables)
If variables(i) <> "" And variables(i) <> Chr(34) Then ' for the life of me I just can't get the regex to not match doublequotes
Debug.Print variables(i)
End If
Next i
End Sub

Keeping a count in a dictionary, bad result when running the code, good result adding inspections

Weird problem. Stepping through the code with inspections gives me correct answers. Just running it doesn't.
This program loops through each cell in a column, searching for a regex match. When it finds something, checks in a adjacent column to which group it belongs and keeps a count in a dictonary. Ex: Group3:7, Group5: 2, Group3:8
Just stepping through the code gives me incorrect results at the end, but adding and inspection for each known item in the dictionary does the trick. Using Debug.Print for each Dictionary(key) to check how many items I got in each loop also gives me a good output.
Correct // What really hapens after running the code
Group1:23 // Group1:23
Group3:21 // Group3:22
Group6:2 // Group6:2
Group7:3 // Group7:6
Group9:8 // Group9:8
Group11:1 // Group11:12
Group12:2 // Group12:21
Sub Proce()
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches
Dim Rango, RangoJulio, RangoAgosto As String
Dim DictContador As New Scripting.Dictionary
Dim j As Integer
Dim conteo As Integer
Dim Especialidad As String
regEx.Pattern = "cop|col"
regEx.Global = False 'True matches all occurances, False matches the first occurance
regEx.IgnoreCase = True
i = 3
conteo = 1
RangoJulio = "L3:L283"
RangoAgosto = "L3:L315"
Julio = Excel.ActiveWorkbook.Sheets("Julio")
Rango = RangoJulio
Julio.Activate
For Each celda In Julio.Range(Rango)
If regEx.Test(celda.Value) Then
Set matches = regEx.Execute(celda.Value)
For Each Match In matches
j = 13 'column M
Especialidad = Julio.Cells(i, j).Value
If (Not DictContador.Exists(Especialidad)) Then
Call DictContador.Add(Especialidad, conteo)
GoTo ContinueLoop
End If
conteo = DictContador(Especialidad)
conteo = CInt(conteo) + 1
DictContador(Especialidad) = conteo
Next
End If
ContinueLoop:
i = i + 1
'Debug.Print DictContador(key1)
'Debug.Print DictContador(key2)
'etc
Next
'Finally, write the results in another sheet.
End Sub
It's like VBA saying "I'm going to dupe you if I got a chance"
Thanks
Seems like your main loop can be reduced to this:
For Each celda In Julio.Range(Rango)
If regEx.Test(celda.Value) Then
Especialidad = celda.EntireRow.Cells(13).Value
'make sure the key exists: set initial count=0
If (Not DictContador.Exists(Especialidad)) Then _
DictContador.Add Especialidad, 0
'increment the count
DictContador(Especialidad) = DictContador(Especialidad) +1
End If
Next
You're getting different results stepping through the code because there's a bug/feature with dictionaries that if you inspect items using the watch or immediate window the items will be created if they don't already exist.
To see this put a break point at the first line under the variable declarations, press F5 to run to the break point, then in the immediate window type set DictContador = new Dictionary so the dictionary is initialised empty and add a watch for DictContador("a"). You will see "a" added as an item in the locals window.
Collections offer an alternative method that don't have this issue, they also show values rather than keys which may be more useful for debugging. On the other hand an Exists method is lacking so you would either need to add on error resume next and test for errors instead or add a custom collection class with an exists method added. There are trade-offs with both approaches.