I would like to use selenium to scrape the data from the table on this website: https://app.powerbi.com/view?r=eyJrIjoiM2ZiOGM4ODMtYzU0ZS00NzVlLTkyNjgtY2EwYzg0ZWVmMGI1IiwidCI6IjRlMjRkMDI2LWI5MTYtNGNiMS04YWZmLTI1ZmZhNzA1ZWVhMSIsImMiOjEwfQ%3D%3D
It seems that I cannot inspect the page. Does anyone have any clue how I can access the table and save the data in a pandas dataframe?
Thank you!
Any HTML element that is present on the page can be inspected. Press F12 in Chrome to open the developer tools, then hover over the table.
The table is a div with role "presentation", and each row is a div with role "row".
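A minimal sketch of how those rows could be collected with Selenium and loaded into pandas. The role="row" selector comes from the answer above; everything else is an assumption: that each cell's text lands on its own line of the row's visible text, that Chrome and a matching driver are installed, and that the function names (split_row_text, scrape_table) are hypothetical helpers.

```python
def split_row_text(row_text):
    """Split one row's visible text into cell strings, dropping blank lines.

    Assumption: every cell of the row appears on its own line of row.text.
    """
    return [cell.strip() for cell in row_text.splitlines() if cell.strip()]


def scrape_table(url, wait_seconds=30):
    """Open the report in Chrome and return the rows found as a DataFrame."""
    # Imported here so split_row_text can be used without a browser installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    row_locator = (By.CSS_SELECTOR, "div[role='row']")
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # The report is rendered by JavaScript, so wait until rows exist.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_all_elements_located(row_locator)
        )
        rows = driver.find_elements(*row_locator)
        records = [split_row_text(row.text) for row in rows]
    finally:
        driver.quit()
    # No header handling here: every row, including a header row, is data.
    return pd.DataFrame([record for record in records if record])
```

Calling scrape_table with the report URL from the question would return whatever rows are in the DOM at that moment. If the table only renders the rows currently visible, it may need to be scrolled and read repeatedly to capture everything.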
I'm scraping car rental data from getaround.com. I recently saw that it is possible to get car availability with scrapy-splash from a calendar rendered with JavaScript. An example is given at this URL:
https://fr.getaround.com/location-voiture/liege/ford-fiesta-533656
The information I need is contained in the div tag with class owner_calendar_month. However, some data seems to be accessible in the div tag with class js_car_calendar calendar_large, whose data-path attribute specifies /dashboard/cars/533656/calendar. Do you know how to access this path and scrape the data within it using Scrapy?
If you visit https://fr.getaround.com/dashboard/cars/533656/calendar you get an error saying you have to be logged in to view the data. So first of all you would have to create a method in Scrapy to sign in to the website if you want to be able to scrape that data.
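The URL above is simply the data-path value resolved against the car page's URL. A small sketch of that step using only the standard library; the sample HTML is a hand-written stand-in built from the class and attribute names in the question, and DataPathFinder and calendar_url are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataPathFinder(HTMLParser):
    """Collect the data-path attribute of every div carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.paths = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "div" and self.wanted_class in classes:
            path = attributes.get("data-path")
            if path:
                self.paths.append(path)


def calendar_url(page_url, html):
    """Return the absolute calendar URL, or None if no data-path is found."""
    finder = DataPathFinder("js_car_calendar")
    finder.feed(html)
    return urljoin(page_url, finder.paths[0]) if finder.paths else None


PAGE_URL = "https://fr.getaround.com/location-voiture/liege/ford-fiesta-533656"
# Hand-written sample, not the real page source.
SAMPLE_HTML = (
    '<div class="js_car_calendar calendar_large" '
    'data-path="/dashboard/cars/533656/calendar"></div>'
)

print(calendar_url(PAGE_URL, SAMPLE_HTML))
```

Inside a Scrapy callback, response.urljoin(path) does the same resolution. The login step itself could probably be built on scrapy.FormRequest.from_response, but the form field names would have to be read from the site's sign-in page.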
I am trying to crawl a website : https://www.firstpost.com/search/sachin-tendulkar
steps followed :
a. fetch("https://www.firstpost.com/search/sachin-tendulkar")
b. view(response) --> everything is working as expected till this point.
Once I start extracting data with the syntax below, I can only get divs up to a certain level:
response.xpath('//div[@id="results"]').extract()
After this div I am not able to access any other divs or their content.
I haven't faced this kind of issue in the past when developing crawlers for other websites. Is the issue site-specific?
Can you please let me know a way to crawl the inner divs?
Can you elaborate on "not able to access any other divs and its content"? Do you get any error?
I can access all the divs and their content. For example, the main content of the search results is inside the div gsc-expansionArea, which can be accessed via
//div[@class="gsc-expansionArea"]
and this gives you an iterable to work with.
Only the first result is outside this div; it can be accessed via another div:
//div[@class="gsc-webResult gsc-result"]
The last sibling, //div[@class="gcsc-branding"], has no search results in it.
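A short sketch of the attribute predicate syntax used in this answer, where @class selects on the class attribute. It runs the expressions against a hand-written sample with the standard library's ElementTree rather than the live page. The sample markup is an assumption that mirrors the layout described here, not the site's real HTML.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the structure described in the answer.
SAMPLE = """
<div id="results">
  <div class="gsc-webResult gsc-result">first result</div>
  <div class="gsc-expansionArea">
    <div class="gsc-webResult gsc-result">second result</div>
    <div class="gsc-webResult gsc-result">third result</div>
  </div>
  <div class="gcsc-branding"></div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# The first result is a direct child of the results div.
first = root.find("./div[@class='gsc-webResult gsc-result']").text

# The remaining results live inside the expansion area.
expansion = root.find(".//div[@class='gsc-expansionArea']")
inner = [
    div.text
    for div in expansion.findall("./div[@class='gsc-webResult gsc-result']")
]

print(first)
print(inner)
```

In the Scrapy shell the equivalent call would be response.xpath('//div[@class="gsc-expansionArea"]'). If that returns nothing even with correct syntax, the results may be inserted by JavaScript after the page loads, in which case they are not in the raw response that Scrapy downloads.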
I have a group of year 9 girls who have entered a national competition. One of their tasks is to find the token that is displayed after they have clicked on a link 1,000,000 times. The webpage is simple - it has one button on it. I am sure that we can write some code to do this for us - I have heard of the Beautiful soup thing - does anyone have instructions how to do this? Thank you!
BeautifulSoup is a package for parsing HTML, i.e., retrieving elements or text from a request. You want something that simulates interacting with a web browser. Selenium is a good choice for this and works with Python.
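A rough sketch of what the Selenium approach could look like in Python. The page URL, the assumption that the button is the only button element on the page, and the idea that the token shows up in the page text afterwards are all guesses, since the competition page is not shown; click_repeatedly and run are hypothetical names.

```python
def click_repeatedly(driver, by, value, times):
    """Locate the element and click it, `times` times in a row.

    The element is looked up again before every click so the loop still
    works if the page reloads after each click.
    """
    for _ in range(times):
        driver.find_element(by, value).click()


def run(url, times=1_000_000):
    """Open the page, click the button `times` times, return the page text."""
    # Imported here so click_repeatedly can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: the page's single button is a <button> element.
        click_repeatedly(driver, By.TAG_NAME, "button", times)
        # Assumption: the token is displayed somewhere in the page body.
        return driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()
```

A million real browser clicks will take a long time, so it is worth trying run with a small number such as times=10 first to check that the locator is right.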
Can you help me to identify element ID or any other locator of timeline composer in Facebook profile ?
I need this to use in Robot framework with selenium2library to post something on my wall.
I can log in to Facebook and navigate to my profile, but I can't input text into the timeline composer. I tried using Click Element before inserting the text, but with no success.
I am using "inspect element" in the browser and the Firebug add-on to identify elements.
Unfortunately, all the locators I have tried give errors like:
Element does not appear in 5 seconds
or
Element must be user editable in order to clear it
A non-dynamic locator for the FB timeline composer is the name "xhpc_message_text" (as of 18.10.2016):
Input Text    name=xhpc_message_text    test
In the browser I have a couple of img ids that change dynamically with every different selection of the release date.
Can you please guide me with Selenium WebDriver code that clicks the checkbox image even if the image id changes with each selection of the release date?
How can I retrieve all the img ids on the page?
<img id="R1410ENDec14001001" class="child-img" src="../dyn/assets/checkbox_unchecked.png">
If the id is dynamic, you can use the class or other attributes instead, for example
//img[@class='child-img'] or
//img[contains(@src,'checkbox_unchecked.png')]
To retrieve all the images that have an id, you can use the following XPath:
//img[@id]
Use findElements with By.xpath and the expression above, then iterate over the results and get the id attribute of each element.
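The question does not name a language, so here is a sketch of that idea using Selenium's Python bindings, where the method is find_elements. The helper names are hypothetical, and driver is assumed to be an already opened WebDriver on the page in question.

```python
# "xpath" is the string value of selenium.webdriver.common.by.By.XPATH,
# so these helpers work with a real driver without importing Selenium here.
XPATH = "xpath"


def collect_img_ids(driver):
    """Return the id attribute of every img element that has one."""
    images = driver.find_elements(XPATH, "//img[@id]")
    return [image.get_attribute("id") for image in images]


def click_unchecked_boxes(driver):
    """Click every checkbox image that still shows the unchecked picture.

    The images are matched by their src, so a changing id does not matter.
    Returns the number of images clicked.
    """
    boxes = driver.find_elements(
        XPATH, "//img[contains(@src, 'checkbox_unchecked.png')]"
    )
    for box in boxes:
        box.click()
    return len(boxes)
```

With a real session this would be used as ids = collect_img_ids(driver). If clicking a checkbox re-renders the list, the remaining elements may go stale and would need to be located again after each click.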