How to access comment section of a page source using selenium? - selenium

How can we access the content that follows class="hidden_elem" using Selenium? It does not show up in the page source when we access the page. The content inside the comments is what we actually need to extract, since it is part of the script that gets run.
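One possible approach, sketched under assumptions: if the markup string you obtain (for example from `driver.page_source`, or from `driver.execute_script("return document.documentElement.outerHTML")`) still contains the comment nodes, the standard-library `html.parser` module can pull their text out. The `CommentCollector` class, the `extract_comments` helper, and the sample markup below are hypothetical names made up for illustration; whether the comments survive into the string depends on the page and the browser.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> node with the text between the markers
        self.comments.append(data)


def extract_comments(page_source):
    """Return a list with the raw text of each comment found in page_source."""
    collector = CommentCollector()
    collector.feed(page_source)
    collector.close()
    return collector.comments


# Hypothetical markup; in practice the string would come from the browser
sample = '<code class="hidden_elem"><!-- <div id="payload">hello</div> --></code>'
print(extract_comments(sample))
```

The extracted comment text is itself markup here, so it could be fed to a second parser if the elements inside it are what you need.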

Related

Is it possible to scrape an Angular Website using Selenium-python?

I have been trying to scrape an Angular website using Selenium. To my surprise, the rendered HTML is not available to scrape, because the page builds it dynamically with JavaScript. I want to locate those tags for scraping but have been unable to do so. What is the right way to scrape them? Here is some more context:
Some say you can't do it using Python.
Others have tried downloading all the HTML content and then reading it, but that isn't my use case.
My use case is quite different:
I want to log in to my Google account, which redirects me to an Angular page where I click a button called "reporting". From there I am redirected to a page where I finally have to click a download button to download the report.
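A common way to deal with content that appears only after JavaScript has run is to poll for it instead of reading the page immediately. Selenium ships this idea as `WebDriverWait` together with `expected_conditions`; the sketch below shows only the underlying polling loop in plain Python so it can run without a browser. `wait_until`, `FakePage`, and the `"reporting-button"` value are hypothetical stand-ins, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose button appears only after a few polls."""

    def __init__(self, polls_before_render):
        self.remaining = polls_before_render

    def find_button(self):
        # Returns None while the page is still "rendering"
        if self.remaining > 0:
            self.remaining -= 1
            return None
        return "reporting-button"


page = FakePage(polls_before_render=3)
button = wait_until(page.find_button, timeout=5.0, poll_interval=0.01)
print(button)
```

With a real driver, the condition would be a lookup of the element you want to click, and each step of the login, reporting, and download flow would wait for its own element before acting.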

Does Selenium/Protractor look for element in current loaded page?

Does Selenium/Protractor look for an element only in the currently loaded page, or across the entire application, when the same CSS selector matches equivalent elements on different pages?
E.g.:
Save Button on Customers Form class="save"
Save Button on Vendors Form class="save"
Protractor works by interacting with the browser (via Selenium). Selenium uses browser drivers to interact with your page, and the browser only contains the code it asked for (what the server returned for the request that was made).
So yes, it only looks for elements in the currently loaded page. It has no access to your entire application code.
Selenium looks for the element only on the loaded page. Not sure about Protractor though.

Windows Phone 8: Load/Create HTML on the fly and load into browser

I am working on an app that reads XML and displays content according to what the XML contains. I have the XML part done, but I need one other part: loading a small section of HTML code into a web browser element. Is there any way for me to either dynamically create an HTML file (I was thinking of creating one, saving it to storage, and loading it from there) or directly insert code into the web browser element?
Failing this, I'll just create a PHP page on my server that adjusts according to the value it is passed.
You can store your entire HTML code in a string variable and call the NavigateToString method.
myWebBrowser.NavigateToString("myHTMLcode");
How you create the HTML string depends on your app but you could store a basic template and use String.Replace to replace any particular items in the code.
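The template-and-replace idea is independent of the platform. A minimal sketch in Python, assuming a made-up `{{NAME}}` placeholder syntax and a hypothetical `render` helper (on Windows Phone the resulting string would be what gets passed to NavigateToString):

```python
import html

# Hypothetical template with {{NAME}} placeholders
TEMPLATE = "<html><body><h1>{{TITLE}}</h1><p>{{BODY}}</p></body></html>"


def render(template, values):
    """Replace each {{KEY}} placeholder with its HTML-escaped value."""
    page = template
    for key, value in values.items():
        page = page.replace("{{" + key + "}}", html.escape(value))
    return page


print(render(TEMPLATE, {"TITLE": "News", "BODY": "Fish & chips"}))
```

Escaping the inserted values keeps text taken from the XML from being interpreted as markup.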

Disable browser cache for a PDF displayed in an iframe by means of TCPDF

I have been trying for hours to solve the following caching problem.
My application has the following structure (simplified):
index.php - main page (contains various input fields, a submit button and an iframe for displaying PDF content with the help of TCPDF)
generate.php - generates PDF file based on the supplied POST parameters and stores the file to the filesystem
viewer.php - Displays the PDF document (TCPDF libraries). The iframe loads this script to show the pdf file
The workflow is pretty simple: the user chooses some options and clicks the submit button on the main page. The selected parameters are sent via an AJAX POST request to the generate.php script. The script generates the PDF file and stores it to the filesystem. At the end it returns the newly created/edited filename. The filename is picked up in the AJAX callback function, which then refreshes the iframe with the new/edited filename:
viewer.php?filename=NEW_OR_EDITED_FILENAME
Everything is working, but when the file is being replaced, sometimes (NOT ALWAYS), the browser shows the old pdf file, although the new version is on the hard drive. I tried the following solutions:
Add Meta tags to disable cache to the generated HTML by index.php and viewer.php
Disabling cache for jQuery AJAX calls by: jQuery.ajaxSetup({cache: false});
Adding a random string to the filename parameter:
viewer.php?filename=FILENAME_RANDOMSTRING
The script then strips RANDOMSTRING and extracts the filename.
None of these solutions worked for me. Tested browsers are: Chrome 25.0.1364.152 and Firefox 19.0. Can someone help me with this?
Thanks in advance
I just had the same problem, but after adding a random string it works perfectly:
<iframe src="file.pdf?=<?=time();?>"></iframe>
After many hours of trying, the solution I found is to really generate a new file each time (solution 3 from the question, without removing the random string at the end of the filename). As a result, it was necessary to update the database and delete the old files on every change. My initial intention was to avoid these actions, but unfortunately no other solution was found.
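The random-string approach discussed above amounts to appending a query parameter whose value changes on every request, so the browser sees a URL it has not cached. A minimal sketch, where `add_cache_buster` and the `_` parameter name are hypothetical choices; note that the asker reported this alone did not fix their case, so treat it as an illustration of the technique, not a guaranteed fix:

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="_", value=None):
    """Return url with an extra query parameter that differs between calls."""
    if value is None:
        # Nanosecond timestamp; any value that changes per request would do
        value = str(time.time_ns())
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Fixed value used here only to make the result reproducible
print(add_cache_buster("viewer.php?filename=report.pdf", value="1700000000"))
```

The server-side script can ignore the extra parameter, so the filename itself does not need to be altered.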

how to read/parse dynamically generated web content?

I need to find a way to write a program (in any language) that will connect to a website and read dynamically generated data from it.
Note that the data is dynamically generated: it's not enough to get the source HTML, because the data I'm interested in is generated via JavaScript that calls back-end code. So when I view the page source, I can't see the data. (For example, go to Google and do a search, then check the source code of the results page. Very little of the data your browser is displaying is reflected in the source; most of it is dynamically generated. I need some way to access this data.)
Pick a language and environment that includes an HTML renderer (e.g. .NET and the WebBrowser control). Use the HTML renderer to get the URL and produce an HTML DOM in memory (making sure that scripting is enabled). Read the contents of the HTML DOM after the renderer has done its work.
Example (you'll need to do this inside a System.Windows.Form derived class):
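WebBrowser browser = new WebBrowser();
// Navigate is asynchronous, so read the DOM only after DocumentCompleted fires
browser.DocumentCompleted += (sender, e) =>
{
    HtmlDocument document = browser.Document;
    // extract what you want from the document
};
browser.Navigate("http://www.google.com");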
I used to have a Perl program that accessed Mapguide.com to get driving directions from one location to another. I parsed the returned page and saved the results to a database. This works as long as the source never changes its format. The problem is that the format often changes, and then your parser needs to change too.
A simple thought: if we're talking about AJAX, you can instead look up the URLs that the page requests its dynamic data from and fetch them directly. The JavaScript on the page shows how that data gets reformatted for display.
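As a rough sketch of that idea: once the browser's network tools have shown which URL the page requests, a script can call it directly and decode the response. Everything specific below is an assumption for illustration: that the endpoint returns JSON, the `results`/`title` payload shape, the `X-Requested-With` header (some back ends expect it, many do not), and the helper names.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """Request the endpoint the page's JavaScript calls and decode its JSON body."""
    request = Request(url, headers={"X-Requested-With": "XMLHttpRequest"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload):
    """Pull the 'title' field out of each item; the payload shape is assumed."""
    return [item["title"] for item in payload.get("results", [])]


# Offline demonstration with a hypothetical response body
sample_body = '{"results": [{"title": "first"}, {"title": "second"}]}'
print(extract_titles(json.loads(sample_body)))
```

In real use, `extract_titles(fetch_json(url))` would replace the offline sample, with `url` being whatever endpoint the page actually calls.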
If you have Firefox with Greasemonkey, writing a DOM dumper should be a simple matter.