Is it possible to scrape an Angular Website using Selenium-python? - selenium

I have been trying to scrape an Angular website using Selenium. To my surprise, I cannot scrape the rendered HTML, because the page builds its content dynamically with JavaScript. I want to locate those tags in order to scrape them, but I am unable to do so. What is the right way to scrape them? Here is some more context:
Some people say it can't be done using Python.
Others have tried downloading all the HTML content and then reading it, but that doesn't fit my use case.
My use case is quite different:
I want to log in to my Google account, which then redirects me to an Angular page where I click a button called "Reporting". From there I am redirected to a page where I finally have to click a download button to download the report.
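
Selenium drives a real browser, so the DOM that Angular renders with JavaScript is available to it. The usual stumbling block is looking for an element before Angular has inserted it. The click-through described above could be sketched as follows. This is only a sketch under assumptions: the two locators, the start URL, and an already logged-in session are hypothetical placeholders, and the hand-rolled `wait_for` helper merely mirrors the polling that Selenium's own `WebDriverWait` performs.

```python
import time

# Locator strategies as plain strings; these are the values behind
# Selenium's By.XPATH and By.CSS_SELECTOR constants.
XPATH = "xpath"
CSS = "css selector"

# Hypothetical locators: adjust them to the real page markup.
REPORTING_BUTTON = (XPATH, "//button[normalize-space()='Reporting']")
DOWNLOAD_BUTTON = (CSS, "button.download")


def wait_for(driver, locator, timeout=20.0, poll=0.5):
    """Poll until the element exists and is displayed, then return it.

    Angular inserts elements after the initial page load, so a single
    find_element call right after driver.get() often fails.
    """
    by, value = locator
    deadline = time.monotonic() + timeout
    while True:
        try:
            element = driver.find_element(by, value)
            if element.is_displayed():
                return element
        except Exception:  # not rendered yet (or went stale); retry below
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no visible element for {locator!r}")
        time.sleep(poll)


def download_report(driver, start_url):
    """Open the Angular page, click 'Reporting', then click the download button.

    Assumes `driver` is a Selenium WebDriver whose session is already
    logged in; the login step is intentionally left out of this sketch.
    """
    driver.get(start_url)
    wait_for(driver, REPORTING_BUTTON).click()
    wait_for(driver, DOWNLOAD_BUTTON).click()
```

With a real browser this would be called as `download_report(webdriver.Chrome(), "https://example.com/app")`, where the URL is a placeholder. In production code, `WebDriverWait(driver, 20).until(EC.element_to_be_clickable(locator))` from `selenium.webdriver.support` is the idiomatic replacement for `wait_for`.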

Related

Reason SPA pages are refreshing?

I just finished learning Vue.js, and I visited a few websites that use Vue.js, such as:
a) https://coderstape.com
b) https://www.thenetninja.co.uk
c) https://laracasts.com
I noticed that when navigating around these websites by clicking on navbar links and some other links, the pages refresh, and I haven't been able to find the reason online. Could someone kindly explain what is happening? Doesn't it go against the purpose of an SPA?
Take, for example, the last site you specified: https://laracasts.com.
On its main page there is a white "BROWSE COURSES" button. If you open the Chrome DevTools panel (see the picture with explanations), go to the "Network" tab (1) and then click this white button, you can see a GET request to "series?curated" (2). If you open its details, you can see that the response is a whole new page in the form of HTML code (3), not JSON, for example, as is usually the case in an SPA.
Also, if you look up which technology this site uses, for example with the service https://whatcms.org/?s=laracasts.com, you can see that it runs on PHP, namely Laravel.
From all this, I assume that they use Vue.js only partially, perhaps in several components, while the site navigation itself is built from traditional server-rendered pages, which is why the page reloads.
By contrast, if you take a look at https://www.spendesk.com/, you can see that they use Vue.js with Nuxt.js, as well as Node.js, as indicated by whatcms.org, and if you navigate to various pages on this site, you will see no full page reload. I would say that this site is a true SPA in the sense you mean.
I heard that you can do a SPA with a Laravel backend, but I think that's another story.
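
The DevTools check described above (does a navigation request return a whole HTML document or just data?) can also be expressed as a small heuristic. The sketch below is illustrative only: the function name and the three labels are made up here, and real sites can blur the line, for instance by fetching HTML fragments.

```python
def classify_navigation_response(content_type, body):
    """Guess whether a navigation request returned a whole page or just data.

    Returns "full-page" for an HTML document (the browser reloads),
    "data" for a JSON payload (typical of SPA navigation), else "unknown".
    """
    # Drop parameters such as "; charset=UTF-8" from the header value.
    media_type = content_type.split(";", 1)[0].strip().lower()
    start = body.lstrip().lower()

    if media_type == "text/html" or start.startswith(("<!doctype html", "<html")):
        return "full-page"
    if media_type == "application/json" or media_type.endswith("+json"):
        return "data"
    return "unknown"
```

For the "series?curated" request described in the answer, the response is an HTML document, so this heuristic would report "full-page".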

Populate the device data contents from ThingsBoard on mobile?

I have this device data that are shown in real time using ThingsBoard.
And I have an iframe to show the device data content on a web page.
If I use the same iframe on my Ionic app's HTML page, I get the header, the submenu and other things that are not needed in the app.
What I need is only the main content area of the device data section.
Can I do that with the iframe, or do I need to call the individual ThingsBoard APIs and populate the dashboard on my own?
If you manage the ThingsBoard instance yourself, you can develop custom iframes with the unnecessary headers and footers removed. You will then get clean iframes to display inside your Ionic app.
If the ThingsBoard instance is not managed by you, you will have to develop adapters that scrape the HTML data from the web page. These adapters will give your Ionic app only the data it requires.
You can use PHP Simple HTML DOM Parser for implementing these data scraping adapters.
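
The adapter idea can be sketched in Python with only the standard library (the answer itself suggests a PHP parser; Python is used here purely for illustration). The sketch assumes the data is present in the HTML the server returns, which may not hold for a page rendered in the browser, and the element id `device-data` is hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collects the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, target_id):
    """Return the text chunks inside the element whose id is target_id."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Everything outside the chosen element (header, submenu, footer) is dropped, so the app receives only the device data section.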

capturing a browser refreshed event using Selenium Web Driver

I am writing a program to automate link validation on a site. Our site has more than 400 links per page, and we need to open each link and check that it returns a valid page, i.e. status 200. There are other requirements as well, such as checking whether the page is a 404 redirection page. This means that validating 400 links takes about 30 minutes.
My design is to integrate this with the front-end (Selenium) automation, so that each time the browser loads a new page or refreshes, it triggers a new thread that receives the page source and validates all the hrefs in it.
We are not following a page object model; otherwise I could trigger this from each of my pages.
My question is: is there any way to listen for a browser refresh or page load event using Selenium WebDriver?
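
The validation step described in the question (take the page source, check every href) might look like the sketch below. It uses only the standard library; the function names are illustrative, and `status_of` is a parameter so that the network call can be swapped out. Note that `urlopen` follows redirects, so a link that redirects to a custom "not found" page returning 200 would need an extra check.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(page_source, base_url):
    """Return unique absolute http(s) links in document order."""
    collector = HrefCollector()
    collector.feed(page_source)
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if urlsplit(url).scheme in ("http", "https") and url not in links:
            links.append(url)
    return links


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # 4xx/5xx responses arrive as exceptions
        return error.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        return None


def validate_links(page_source, base_url, status_of=http_status, workers=20):
    """Check all links of one page concurrently; returns {url: status}."""
    links = extract_links(page_source, base_url)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(status_of, links))
    return dict(zip(links, statuses))
```

A call such as `validate_links(driver.page_source, driver.current_url)` would run the checks for the page currently open in the browser. Running the requests in a thread pool rather than one after another is what should shorten the 30 minutes mentioned above.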
Correct me if I have misunderstood your question, but detecting a page refresh and detecting a page load event can be two very different goals if you are dealing with AJAX. You can try this article about the AJAX part
and this one about synchronizing custom events in Selenium.
This solution here is the most up-to-date one I could find.
Selenium can also execute JavaScript in the page, so these answers may be helpful if you want to try that approach:
check-if-page-reloaded-or-refresh-in-js
is-page-reloaded-or-refreshed-using-jquery-or-javascript
post_detect_refresh_with_javascript
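
One JavaScript-based way to notice a reload from Selenium is to plant a marker on `window`: a reload or navigation creates a fresh `window`, so the marker disappears. The sketch below assumes a Selenium WebDriver object (only `execute_script` and `page_source` are used); the marker name and the function names are arbitrary, and the check has to be polled, for example after each test step, because nothing is pushed from the browser.

```python
# JavaScript run in the page: returns true only the first time it is
# executed in a given document, because a reload or navigation creates a
# new `window` object and discards the marker.
MARKER_JS = """
if (window.__linkCheckSeen) { return false; }
window.__linkCheckSeen = true;
return true;
"""


def is_new_document(driver):
    """True if the current document has not been marked yet."""
    return bool(driver.execute_script(MARKER_JS))


def check_page_once(driver, validate):
    """Call validate(page_source) only if this document is new.

    Returns True when the validation was triggered.
    """
    if is_new_document(driver):
        validate(driver.page_source)
        return True
    return False
```

For navigations issued through the driver itself, the Python bindings also provide `EventFiringWebDriver` with an `AbstractEventListener` (in `selenium.webdriver.support.events`), whose `after_navigate_to` hook fires after `driver.get()`. Reloads triggered by the page itself would not go through that listener, which is where the marker approach may help.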

Login to Google from iFrame

I have seen that it is not possible to display any Google page from an iframe. An error message is displayed: cannot display, open in a new window.
I need to log in to Google (OpenID authentication) from an iframe in Joomla (I cannot change this). Is there a workaround for this? I thought I could open the authentication page in a new window, then close that window and reload the original one, but I am not sure I can do that.
Thanks
Well, you can just take the form (the HTML code) and put it in your iframe, but this will get very messy; for example, there may be certain JS files that you need to include as well.
Redirecting to Google is the best way to implement it, since some companies do not allow their pages to be displayed in an iframe for security reasons.

using selenium-rc can i load a page, click a bookmarklet, and fill up the in-page loaded form?

Is there any way to automate the following steps
in Selenium-RC:
open a page
click on bookmarklet in browser toolbar
fill up data in the form loaded into the page by said bookmarklet.
If the bookmarklet is not accessible as it is part of the browser/bookmark toolbar,
is there a way in which I can inject the javascript into the page and have it execute?
You are 99% there! You're right, you can't actually click the bookmarklet, but you can inject the same JavaScript into the page. Simply use the getEval() command to evaluate the JavaScript.
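
The answer refers to Selenium RC's getEval(). For readers on the newer WebDriver Python bindings, the same injection could be sketched with `execute_script`, which is assumed here as the counterpart. The helper names are illustrative. Since a bookmarklet is stored as a `javascript:` URL, the prefix is stripped and percent-encoding is undone before the code is injected.

```python
from urllib.parse import unquote


def bookmarklet_to_script(bookmarklet):
    """Turn a `javascript:` bookmarklet URL into plain JavaScript source."""
    prefix = "javascript:"
    text = bookmarklet.strip()
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    # Bookmarklets are stored as URLs, so they are often percent-encoded.
    return unquote(text)


def run_bookmarklet(driver, bookmarklet):
    """Inject the bookmarklet's code into the page currently open in driver."""
    return driver.execute_script(bookmarklet_to_script(bookmarklet))
```

After `run_bookmarklet(driver, ...)` has loaded the form into the page, its fields can be filled in with the usual element lookups and `send_keys` calls.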