I am trying to crawl the information on each stock page on investing.com, starting from the "Dow Jones Industrial Average" option in the drop-down list on investing.com/equities.
I have been thinking about using scrapy with
options = response.css('select[class=stocksFilter] option[id="166"]')
but this does not simulate a selection action.
After the selection, I want to go through the table rows in #cross_rate_markets_stocks_1 one by one and crawl each equity page recursively.
Can you point out how to simulate a click action?
The selection is a user interaction with the browser UI, but Scrapy doesn't render the webpage, so we cannot simulate user interaction or run JavaScript with it. If you do want to crawl by simulating user interaction, Selenium might be a better tool for you.
Back to the question: if we are to crawl with Scrapy, we should focus on the requests and responses exchanged with the target website, which you can log in your browser's Developer Tools. With the Developer Tools open, click the dropdown menu and you can see that the corresponding request is sent to this URL:
https://cn.investing.com/equities/StocksFilter?noconstruct=1&smlID=0&sid=&tabletype=price&index_id=166
It's a GET request with index_id set to the ID of the selected stock index. You can get the IDs and names from the HTML of https://investing.com/equities:
XPath of the stock ID: //*[@id="stocksFilter"]/option/@id
XPath of the stock name: //*[@id="stocksFilter"]/option/text()
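Putting that together, a minimal Scrapy sketch of this approach could look like the code below. The StocksFilter URL pattern and the selectors come from the answer above; the spider name, callback structure, and the fields yielded at the end are illustrative assumptions, and the site's markup may have changed since.

import scrapy


class EquitiesSpider(scrapy.Spider):
    name = "investing_equities"
    start_urls = ["https://www.investing.com/equities"]

    def parse(self, response):
        # Read each (ID, name) pair from the #stocksFilter dropdown.
        for option in response.xpath('//*[@id="stocksFilter"]/option'):
            index_id = option.xpath('./@id').get()
            name = option.xpath('./text()').get()
            if not index_id:
                continue
            # Reproduce the GET request that the dropdown selection triggers.
            url = (
                "https://www.investing.com/equities/StocksFilter"
                "?noconstruct=1&smlID=0&sid=&tabletype=price&index_id=" + index_id
            )
            yield scrapy.Request(url, callback=self.parse_index,
                                 cb_kwargs={"index_name": name})

    def parse_index(self, response, index_name):
        # Walk the price table and follow each equity link.
        for href in response.css(
                "#cross_rate_markets_stocks_1 a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_equity)

    def parse_equity(self, response):
        # Placeholder extraction for the individual equity page.
        yield {"url": response.url, "title": response.css("h1::text").get()}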
I am trying to automate a Goibibo hotel search. After clicking the Search button it navigates to the results page, but the page does not finish loading: the hotel options and other options on the page are not displayed, and it keeps showing a loading indicator. What could be the possible reason?
I'm wondering about the best way to prevent a GTM tag from firing. I found https://rbardini.com/automating-gtm-data-layer-tests/, which talks about fetching the dataLayer variable and comparing it in an assertion, but this looks like a clumsy approach when you want to write to the dataLayer on every page.
For example, it suggests:
const getDataLayer = ClientFunction(() => window.dataLayer)
We use Google Tag Manager to automatically load tags on our website. Unfortunately one of them is CloudIQ (from PayPal) which pops up an iframe overlay offering a newsletter signup or ability to save your shopping basket. The Trigger in our GTM setup for that tag is simply 'All Pages'. When it pops up it generally blocks our test because Selectors cannot be clicked.
Our page flow is over several pages of an online shop, e.g.:
visit home page, click a product - navigates to a product page
click some options on the product page, then add to cart
go through checkout flow
So there might be many pages visited due to click actions.
There is an ability in GTM to define Variables and then use them in Exceptions for a tag, so I could prevent the CloudIQ tag firing either via a/ a global variable or b/ a dataLayer variable. However, I can't see how to elegantly get these set for each page visited during my test, such that they would exist when the GTM examines variables in order to block a Tag from being loaded. Fixture.beforeEach isn't right because it would only run once per fixture, and any data it set on the page's scope would be lost as soon as a page navigation occurs.
Anyone got experience of this sort of thing?
(The alternative of course is to detect the overlay, use switchToIframe to switch into the CloudIQ iframe and close it manually, but it pops up quite erratically and I'd prefer to simply disable the Tag altogether during tests as it's not core functionality of our website that we need to test.)
One way would be to set a custom user agent string in your test suite, create a custom JavaScript variable in GTM that returns the value of navigator.userAgent, and make an exception trigger that blocks the tag.
Or any variation on that theme: set a cookie, use a URL parameter, or, if your test suite allows it, inject a global JS variable, and check for the value in an exception trigger.
There is no need to prevent the events from firing on the client side. Just mock the service routes for Google Tag Manager and CloudIQ and imitate correct responses for them.
I am developing an e-commerce application using Broadleaf Commerce.
My requirement is that I have to add a product from the admin panel and display that product only to logged-in users. Some products will be visible to all users (guests too) and some will be visible only to logged-in users.
Is there any way to do this?
Thanks soulfly1983 for your attempt, but I found another alternative to do this without any customization. Here is the full procedure:
1. Add a new category from the admin panel.
2. Add a new page from the admin panel (under the Content tab). Note that the URL should be the same for the category and the page.
3. On the page, click on the Rule tab.
4. Check "Yes" for "Restrict to certain customers?".
5. Click the +Rule button, select "Match All", and set "Customer Registered" is equal to "false". This way the page will be visible only to guest users.
6. In the HTML body section of the rule (in the General tab), write a message such as "You need to log in to view this stuff."
When the user logs in successfully, they will no longer see the page, because we applied a rule that only logged-out users can see it; instead, the user will see the category and the products added to that category.
Am I doing this right? Any suggestions regarding this approach?
You can either extend the Product entity and add a field that will indicate whether that product will be visible to all users, or alternatively you could simply add an attribute for each product via the admin interface. Either way you will need to modify the UI logic so that it will take this additional field (or attribute) into consideration.
I am writing a program to automate link validation on a site. Our site has more than 400 links per page, and we need to open each link and verify that it returns a valid page, i.e. HTTP 200; there are other requirements as well, such as checking whether the page is a 404 redirection page, etc. This means validating 400 links takes about 30 minutes or so.
My design is to integrate this with the front-end (Selenium) automation so that each time the browser loads a new page or refreshes, a new thread is triggered with the page source to validate all the available hrefs.
We are not following a page object model, otherwise I could trigger this in each page object.
The question here is: is there any way to listen for a browser refresh or page load event using Selenium WebDriver?
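To make the validation step concrete, here is a rough sketch of what I mean: collect the hrefs from the current page, then check their status codes in a worker thread pool with requests. The thread pool size, timeout, and URL are placeholders.

from concurrent.futures import ThreadPoolExecutor

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By


def check_link(url):
    # HEAD is usually enough to read the status code cheaply; fall back
    # to GET when a server rejects HEAD.
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code == 405:
            resp = requests.get(url, allow_redirects=True, timeout=10)
        return url, resp.status_code
    except requests.RequestException as exc:
        return url, str(exc)


def validate_page_links(driver):
    hrefs = {a.get_attribute("href")
             for a in driver.find_elements(By.TAG_NAME, "a")
             if a.get_attribute("href")}
    with ThreadPoolExecutor(max_workers=20) as pool:
        return list(pool.map(check_link, hrefs))


driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL
for url, status in validate_page_links(driver):
    print(status, url)
driver.quit()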
Correct me if I misunderstand your question, but a page refresh and a page load event can be two very different goals for you if you are dealing with AJAX. You can try this article about the AJAX part,
and this one for Selenium custom event synchronization.
This solution is the most up-to-date I could find.
Selenium can also execute JavaScript in the driven browser, so these answers can be helpful if you want to try that approach too:
check-if-page-reloaded-or-refresh-in-js
is-page-reloaded-or-refreshed-using-jquery-or-javascript
post_detect_refresh_with_javascript
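Along the lines of those JS answers, one way to approximate a page load/refresh "event" with plain Selenium WebDriver is to plant a marker on window via execute_script and treat its absence as a newly loaded document. This is only a sketch; the marker name, polling interval, and URL are arbitrary choices.

import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL


def validate_current_page(driver):
    # Placeholder for the actual link-validation work.
    print("validating:", driver.current_url)


try:
    while True:
        # A freshly loaded (or refreshed) document will not carry the marker.
        if not driver.execute_script("return window.__linksValidated === true;"):
            validate_current_page(driver)
            # Plant the marker so this document is only validated once.
            driver.execute_script("window.__linksValidated = true;")
        time.sleep(1)
finally:
    driver.quit()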
Is it possible to make a program that opens a page (as if a bookmark file were opened by IE) and, based on its content, generates feedback that is then fed back into a textbox on said page by pressing a button on said page?
I need this program to execute on a set time schedule to feed some data to a web server based on time-dependent web page data.
Yes, that is possible. It is generally called screen scraping. You basically retrieve the web page in question via an HTTP request, parse/analyze the page you got, then send back the data that should go into the textbox (again via an HTTP request).
There are libraries to do that. Here is an article describing an example in Perl:
http://www.perl.com/pub/a/2003/01/22/mechanize.html
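Here is a minimal Python sketch of the same idea (the article above shows it with Perl's WWW::Mechanize); the URL, the form field name, and the way the feedback is computed are placeholders.

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://example.com/page-with-textbox"  # placeholder

# 1. Retrieve the page via an HTTP GET request.
html = requests.get(PAGE_URL, timeout=30).text

# 2. Parse/analyze the page and compute the feedback to send back.
soup = BeautifulSoup(html, "html.parser")
feedback = "generated from: " + soup.title.get_text(strip=True)

# 3. Send back the data that the button press would submit
#    (again an HTTP request, here a POST to the form's action).
form = soup.find("form")
action = urljoin(PAGE_URL, form.get("action", "")) if form else PAGE_URL
requests.post(action, data={"textbox": feedback}, timeout=30)

For the time-schedule requirement, such a script can simply be run from cron or the Windows Task Scheduler.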