I'm trying to use VBA to scrape some product pages that list 30 products per page. If you inspect the page in dev tools, the elements I need are all in span tags with straightforward class names: "part-number", "price", etc. There is one for each product on the page, plus one empty one. But if you press Ctrl+U and look at the source code, only the empty one is there. So in dev tools there are 31 span tags with class "part-number" (30 populated and one empty), while the source code contains only the empty one.
If I attempt to do something like...
Dim x As Long
' Assumes 30 populated "part-number" spans exist in the DOM that IE has loaded
For x = 0 To 29
    Debug.Print "Part number = " & ie.document.getElementsByClassName("part-number")(x).innerText
Next x
...it returns one blank value and then errors out with "Object variable or With block variable not set."
I'm fairly sure that the HTML served by the server includes only the one blank span tag for each class, and the rest of the markup is generated dynamically by JavaScript, but VBA is only reading the original source.
Is there any way to get it to scrape the DOM after it has been rendered instead?
I suspect my strategy is wrong, as I can't seem to apply the search results for the keywords I've tried over the last few days. (https://stackoverflow.com/questions/54496552 seems to be the closest.)
I am able to populate a single document from the two forms I've built and save it under a new name:
...
MsgBox "Saving as " & aFullPath
ActiveDocument.SaveAs FileName:=aFullPath, FileFormat:=wdFormatDocumentDefault
But that renames the parent document that contains my forms (Word 2016 document: waiver.docm). In practice that won't be a problem, because the user will not save "waiver.docm" except by accident.
Still, that's why I think my approach is wrong.
Ideally my VBA code would:
1. Load a prototype waiver template with the page heading, bookmarks and table, as I have now in waiver.docm.
2. Once that template is filled in, append another waiver template as a new page.
3. Return control to the form.
4. Repeat the two steps above (appending pages) until the user indicates completion (e.g. with a "Finish" command button), typically after 1 to 4 pages.
5. Print and save the entire document.
Right now I interrupt the process after each page to make the user print and save the document (under a unique name).
I have a table in Word with column titles. When the table breaks across pages, it continues on the next page and the header row repeats. However, I also have section titles that need to be visible as well. If you look at the example below, I have section '2' at the top next to sub-section 'C'.
a) I will be generating MHTML dynamically for import into Word, so if it is possible to generate MHTML that achieves this, that would be great. Otherwise...
b) Is there any way within Word, either manually or with VBA, to mark up the sections so they roll over to the next page automatically, so that the table updates itself whenever the page-break locations change? Alternatively...
c) I might have to write some VBA that checks the section numbers are in the right place every time the code is run manually, although I suspect that could get messy, since I would also have to remove any previously inserted 'pulled' section numbers.
Thanks
I tried Selenium IDE for the first time a few days ago and have been struggling every day just to understand the basic workflow, but I've now managed to log in to a web page and store a variable.
I then wanted to use that variable as a list.
The variable looks like this:
item1,item2,item3,item4
However, when I use a for each loop on this variable, it doesn't go item1 > item2 > item3; it goes i > t > e > m > 1 > i > t > e > m > 2, etc. Either Selenium doesn't automatically parse the variable into an array, or I'm doing something wrong.
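The behaviour described above is what happens in most languages when you loop over a plain string: a string is itself a sequence of characters, so it has to be split on the comma before it can be iterated item by item. The sketch below illustrates this in Python (it is not Selenium IDE syntax); the variable name `stored` is only for illustration.

```python
# The stored variable is one string, not a list of items
stored = "item1,item2,item3,item4"

# Looping over a string visits each character: i, t, e, m, 1, ",", i, ...
chars = [c for c in stored]

# Splitting on the comma produces the list that was intended
items = stored.split(",")

for item in items:
    print(item)  # prints item1, item2, item3, item4 on separate lines
```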
Sorry, I don't know how to get the code out of Selenium IDE, so I'll show a screenshot instead.
https://i.imgur.com/0XaoanA.png
In the screenshot you can see that I store the variable, run a for each loop over it, and the loop types one letter instead of one word into a field on the web page.
Any help appreciated
I was trying to create a simple program that pulls a text item from a website and adds it to a textbox. I'm just experimenting and thought I could do it, but it isn't that easy for me. I know how to get the entire source code of a website (below). The element has an id that I know, but it does not have a tag name, so I'm not really sure how to make the program read through the text and keep only the part next to the id. Or would it be better to use a WebBrowser control and get the text item that way? I'm just trying to do whatever is faster. I think my first option is better because it would use less RAM. Using the code below, I don't know what to add next.
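' "Website" is a placeholder for the real page URL
Dim source As String
Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("Website"), System.Net.HttpWebRequest)
Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
    Using sr As New System.IO.StreamReader(response.GetResponseStream())
        source = sr.ReadToEnd() ' Full HTML of the page as one string
    End Using
End Using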
Let's say the id is "name", for example. In the page source, that part looks like this (below). How can I parse the source string, find this section, get the name Brandon, and add it to the textbox?
<span id="name">Brandon</span>
There are a few ways to go about this. I'm not going to write any source code, though, since I haven't used Visual Basic in a very long time. But if you search for how to do any of the following, you should find plenty of tutorials and documentation.
Regular Expressions
Running a regular expression over the full source code can find the element by searching for its id attribute, which should be unique. Regular expressions can sometimes be slow, so they are best avoided if you have to perform many searches over large amounts of text.
/<([a-z0-9]+)\sid="name"(.*?)>(.*?)<\// -> Not Tested, but might help you
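As a sketch of the regular-expression approach, here is a slightly tightened variant of the pattern above using Python's `re` module. It allows other attributes before or after `id="name"` and uses a backreference so the closing tag must match the opening tag name. The sample HTML is made up for illustration, and like any regex over HTML it will not handle every kind of markup (for example, nested tags with the same name).

```python
import re

# Sample page source (made up for illustration)
source = '<html><body><p>Hi</p><span class="x" id="name">Brandon</span></body></html>'

# Group 1: tag name; group 2: inner text. "</\1>" requires the closing tag
# to have the same name as the opening tag.
pattern = re.compile(
    r'<([a-z0-9]+)\s[^>]*?\bid="name"[^>]*>(.*?)</\1>',
    re.IGNORECASE | re.DOTALL,
)

match = pattern.search(source)
name = match.group(2) if match else None
print(name)  # Brandon
```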
String Position
A function that finds the position of a substring within a string would be useful. In C it's strstr (which returns a pointer to the match) and in PHP it's strpos. These functions give you the starting position of a substring; in your case you would search for id="name". Once you find it, find the position of the end of that opening tag (the next >), and then the position of the element's closing tag. Finally, use a substring function to extract the text starting just after the opening tag, with a length equal to the position where the closing tag starts minus the position just after the opening tag.
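A minimal sketch of the string-position steps above, written in Python with `str.find` playing the role of `strstr`/`strpos` (in VB.NET the equivalent calls would be `String.IndexOf` and `String.Substring`). The helper name `text_after_id` is hypothetical, and it assumes the element contains plain text with no nested tags, since it simply stops at the first `</` after the opening tag.

```python
def text_after_id(source, id_value):
    """Return the inner text of the first element with the given id, using only substring searches."""
    # 1. Find the attribute itself, e.g. id="name"
    attr_pos = source.find(f'id="{id_value}"')
    if attr_pos == -1:
        return None
    # 2. Find the end of the opening tag (the next '>')
    open_end = source.find(">", attr_pos)
    if open_end == -1:
        return None
    start = open_end + 1
    # 3. Find where the closing tag starts
    close_pos = source.find("</", start)
    if close_pos == -1:
        return None
    # 4. Substring of length close_pos - start
    return source[start:close_pos]


source = '<div><span id="name">Brandon</span></div>'
print(text_after_id(source, "name"))  # Brandon
```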
HTML / XML Library
There are probably plenty of HTML / XML libraries that will parse the document into some sort of object tree or array. You can then loop through the elements until you find the one you are looking for. Some of these libraries may even have functions to look up an element by its ID, similar to JavaScript's document.getElementById.
These libraries may be hard to get started with, but they will offer you a lot of options in the future if you need to continue finding more HTML elements.
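To illustrate the library approach, this sketch uses Python's standard-library `html.parser.HTMLParser` as a stand-in (in .NET, a third-party parser such as HtmlAgilityPack plays a similar role). The class `IdTextFinder` is hypothetical: it switches on when it meets the element whose `id` matches, collects text until that element closes (including text in nested tags), and ignores void elements such as `<br>` that never get a closing tag.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextFinder(HTMLParser):
    """Collect all text inside the element whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # > 0 while inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1              # a tag nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1               # entering the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    @property
    def text(self):
        return "".join(self.parts)


finder = IdTextFinder("name")
finder.feed('<div><span id="name">Brandon</span><p>other</p></div>')
print(finder.text)  # Brandon
```

Unlike the string-position version, this copes with nested tags and attribute order, at the cost of a little more code.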
I am fairly new to VBA (Word 2010) and I'm unsure whether something I'd like to do is even possible the way I want to do it, or whether I need to investigate completely different avenues. I want to be able to print ranges (or items) that are not currently enumerated in either wdPrintOutRange or wdPrintOutItem. Is it possible to define a new member of a wd enumeration?
As an example, I'd like to be able to print comments by a particular user. wdPrintComments is a member of the wdPrintOutItem enumeration, but I only want comments whose Initial value is JQC. Can I define a wdPrintCommentsJQC constant? My code is reasonably simple: I have a userform that lets the user pick some settings (comments by user, endnotes only, etc.) and a Run button whose Click event should call the PrintOut method with the proper arguments. Am I on the wrong track?
(If it matters, the Initial values will be known to me as I write the code. I have a discrete list.)
No, it's not possible to add a constant to a predefined enumeration type.
However, one possible approach is to build a string of the page numbers that contain the items you want to print, get the Print dialog from the Dialogs collection, set it to print a specified range of pages, and insert the string containing the list of pages (separated by commas). Finally, call the dialog's .Show method to display it to the user, giving them the chance to set any other options and click OK. I've done something very similar when I needed to print a specific chapter of a long document, where I had to specify the "from" section and page and the "to" section and page for the user. Below I just show how to specify a list of pages instead of the "from" and "to" I was using:
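With Dialogs(wdDialogFilePrint)
    .Range = wdPrintRangeOfPages   ' Print only the pages listed in .Pages
    .Pages = "3,5,7-11"            ' Comma-separated pages and page ranges
    .Show                          ' Let the user review the settings and click OK
End With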
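The list of pages passed to .Pages has to be built first. As a language-neutral sketch of that step, the hypothetical Python function below collapses a list of page numbers into the comma-and-range format shown above. It assumes you have already collected the page numbers of the items you want to print (in Word VBA, one possibility is reading Information(wdActiveEndPageNumber) from each matching comment's Scope range; that part is not shown here).

```python
def pages_string(page_numbers):
    """Collapse page numbers into a print-dialog style list, e.g. [3, 5, 7, 8, 9, 10, 11] -> "3,5,7-11"."""
    pages = sorted(set(page_numbers))  # drop duplicates, put pages in order
    parts = []
    i = 0
    while i < len(pages):
        start = pages[i]
        # Extend the run while the next page is consecutive
        while i + 1 < len(pages) and pages[i + 1] == pages[i] + 1:
            i += 1
        end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


print(pages_string([3, 5, 7, 8, 9, 10, 11]))  # 3,5,7-11
```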
I'm not sure how you want to print the comments (or other elements), but you could create another document and insert what you want to print into that document.
Depending on what you need, you could insert them as they were (comments, footnotes, etc.), as plain text, or in any other format.