How to get data in dashboard with Scrapy?

I'm scraping some data about car rental from getaround.com. I recently saw that it is possible to get car availability with scrapy-splash from a calendar rendered with JavaScript. An example is given at this URL:
https://fr.getaround.com/location-voiture/liege/ford-fiesta-533656
The information I need is contained in the div tag with class owner_calendar_month. However, I saw that some data seem to be accessible in the div tag with class js_car_calendar calendar_large, whose data-path attribute specifies /dashboard/cars/533656/calendar. Do you know how to access this path, and how to scrape the data within it using Scrapy?

If you visit https://fr.getaround.com/dashboard/cars/533656/calendar you get an error saying you have to be logged in to view the data. So first of all you would have to create a method in Scrapy to sign in to the website if you want to be able to scrape that data.
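A minimal sketch of what that could look like, assuming the site exposes a standard HTML login form (the sign-in URL and form field names below are guesses; inspect the real form to get the right ones):

import scrapy

class GetaroundSpider(scrapy.Spider):
    name = 'getaround'
    start_urls = ['https://fr.getaround.com/users/sign_in']  # assumed sign-in URL

    def parse(self, response):
        # Fill in the login form; the field names here are assumptions
        yield scrapy.FormRequest.from_response(
            response,
            formdata={'user[email]': 'you@example.com', 'user[password]': 'your-password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Once authenticated, the session cookies let you request the dashboard path
        yield scrapy.Request(
            'https://fr.getaround.com/dashboard/cars/533656/calendar',
            callback=self.parse_calendar,
        )

    def parse_calendar(self, response):
        # Extract whatever the calendar endpoint returns; adjust to the actual markup
        yield {'calendar': response.text}

Note that if the calendar itself is rendered with JavaScript, you may still need scrapy-splash on top of the authenticated session.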

Related

How Do I Get MetaObject Values On Single MetaObject Page In Shopify

I am using metaObjects in shopify. I have created a metaObject name "Projects" . I created a page "Projects" to display those saved projects through liquid code. It is working as per the requirement on Projects page but now i want to display each project detail on single page. How do i approach this ? Like should i create a page and set a template on it and in that template , fetch the metaobject handle to display its details . I am facing issue like how do i call the same single page for all projects details and display its details. Can someone please guide what is the possible way to get each metaObject values on single project page .
Thanks

XHR request pulls a lot of HTML content, how can I scrape it/crawl it?

So, I'm trying to scrape a website with infinite scrolling.
I'm following this tutorial on scraping infinite scrolling web pages: https://blog.scrapinghub.com/2016/06/22/scrapy-tips-from-the-pros-june-2016
But the example given looks pretty easy: it's an orderly JSON object containing the data you want.
I want to scrape this https://www.bahiablancapropiedades.com/buscar#/terrenos/venta/bahia-blanca/todos-los-barrios/rango-min=50.000,rango-max=350.000
The XHR response for each page is weird; it looks like corrupted HTML code.
This is how the Network tab looks
I'm not sure how to navigate the items inside "view". I want the spider to enter each item and crawl some information for every one.
In the past I've successfully done this with normal pagination and rules guided by XPaths.
https://www.bahiablancapropiedades.com/buscar/resultados/0
This is the XHR URL. While scrolling, the page loads 8 records per request. So take the total record count from the page, divide it by 8, and that gives you the number of XHR requests to make. I ran into the same issue, and logic along these lines resolved it (the XPath for the result counter is a placeholder; adjust it to the real page):

total_count = int(response.xpath('//span[@class="total"]/text()').get())  # placeholder XPath for the result counter
pages = total_count // 8  # 8 records per XHR request
for page in range(pages + 1):
    url = f'https://www.bahiablancapropiedades.com/buscar/resultados/{page}'
    yield scrapy.Request(url, callback=self.parse_results)  # parse_results is your parsing callback
It is not corrupted HTML; it is escaped to prevent it from breaking the JSON. Some websites return plain JSON data and others, like this one, return the actual HTML to be inserted into the page.
To get the elements, you need to pull the HTML out of the JSON response and create your own parsel Selector (the same kind of object you get when you use response.css(...)).
You can try the following in scrapy shell to get all the links in one of the "next" pages:
scrapy shell https://www.bahiablancapropiedades.com/buscar/resultados/3
import json
import parsel
json_data = json.loads(response.text)
sel = parsel.Selector(json_data['view']) # view contains the HTML
sel.css('a::attr(href)').getall()
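The same idea inside a spider, as a minimal sketch (the spider and callback names are placeholders, and the item fields are illustrative):

import json
import scrapy

class BahiaSpider(scrapy.Spider):
    name = 'bahia'
    start_urls = ['https://www.bahiablancapropiedades.com/buscar/resultados/0']

    def parse(self, response):
        data = json.loads(response.text)
        # Re-wrap the escaped HTML so the usual selectors work on it
        sel = scrapy.Selector(text=data['view'])
        for href in sel.css('a::attr(href)').getall():
            yield response.follow(href, callback=self.parse_property)

    def parse_property(self, response):
        # Extract the fields you need from each property page; selectors are placeholders
        yield {
            'title': response.css('h1::text').get(),
            'url': response.url,
        }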

How to follow lazy loading with scrapy?

I am trying to crawl a page that is using lazy loading to get the next set of items. My crawler follows normal links, but this one seems to be different:
The page:
https://www.omegawatches.com/de/vintage-watches
is followed by https://www.omegawatches.com/de/vintage-watches?p=2
But only if you load it in the browser; Scrapy will not follow the link.
Is there a way to make Scrapy follow pages 1, 2, 3, 4, ... automatically?
The page uses virtual scrolling, and the API it gets its data from is
https://www.omegawatches.com/de/vintage-watches?p=1&ajax=1
It returns JSON containing various details, including the products as HTML, and whether a next page exists is indicated by an a tag with the class link next.
Increase the page number until there is no a tag with the link next class.
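A minimal sketch of that loop, assuming the JSON key holding the product HTML is called 'productlist' (inspect the actual response to confirm the key, and adjust the CSS selectors to the real markup):

import json
import scrapy

class OmegaVintageSpider(scrapy.Spider):
    name = 'omega_vintage'
    base_url = 'https://www.omegawatches.com/de/vintage-watches?p={}&ajax=1'

    def start_requests(self):
        yield scrapy.Request(self.base_url.format(1), cb_kwargs={'page': 1})

    def parse(self, response, page):
        data = json.loads(response.text)
        # 'productlist' is an assumption; check the JSON for the key that holds the HTML
        sel = scrapy.Selector(text=data['productlist'])
        for href in sel.css('a.product-item-link::attr(href)').getall():  # placeholder selector
            yield response.follow(href, callback=self.parse_watch)
        # Keep paginating while the returned HTML still contains the "link next" anchor
        if sel.css('a.link.next'):
            yield scrapy.Request(self.base_url.format(page + 1), cb_kwargs={'page': page + 1})

    def parse_watch(self, response):
        # Placeholder extraction
        yield {'url': response.url, 'name': response.css('h1::text').get()}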

Scrapy is not returning any data after a certain level of div

I am trying to crawl a website : https://www.firstpost.com/search/sachin-tendulkar
Steps followed:
a. fetch("https://www.firstpost.com/search/sachin-tendulkar")
b. view(response) --> everything is working as expected up to this point.
Once I start to extract the data with the syntax below, I am only able to get divs up to a certain level:
response.xpath('//div[@id="results"]').extract()
After this div I am not able to access any other divs or their content.
I haven't faced this kind of issue in the past when developing crawlers for other websites. Is the issue site-specific?
Can you please let me know a way to crawl the internal divs?
Can you elaborate on "not able to access any other divs and its content"? Do you get any error?
I can access all the divs and their content. For example, the main content of the search results is inside the div gsc-expansionArea, which can be accessed via
//div[@class="gsc-expansionArea"]
and this gives you an iterable to work with.
Only the first result is outside this div; it can be accessed via another div:
//div[@class="gsc-webResult gsc-result"]
And the last sibling of this, //div[@class="gcsc-branding"], has no search results in it.
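In scrapy shell, iterating those results could look like this minimal sketch (the inner XPaths are assumptions; adjust them to the real markup):

for result in response.xpath('//div[@class="gsc-expansionArea"]/div'):
    # each child div should be one search result; the inner paths are placeholders
    title = result.xpath('.//a//text()').get()
    link = result.xpath('.//a/@href').get()
    print(title, link)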

Having one spider use items returned from another spider?

So I've written a spider that extracts certain desired links from a webpage and puts the URL, link text, and other information not necessarily contained in the <a> tag itself, into an item for each link.
How should I pass this item onto another spider which scrapes the URL provided in that item?
This question has been asked many times.
Below are some links on this site that answer your question.
Some answer it directly, i.e. passing items to another function, but you may realise that you do not need to do it that way, so other methods are linked to show what's possible.
Using multiple spiders at in the project in Scrapy
Scrapy - parse a page to extract items - then follow and store item url contents
Scrapy: Follow link to get additional Item data?
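For the direct approach, the usual pattern is to keep everything in one spider and hand the partially-filled item to a second callback rather than a second spider. A minimal sketch (the start URL, selectors, and field names are illustrative):

import scrapy

class LinksSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['https://example.com/']  # placeholder start page

    def parse(self, response):
        for link in response.css('a'):
            item = {
                'url': response.urljoin(link.attrib.get('href', '')),
                'text': link.css('::text').get(),
            }
            # Carry the item into the next callback via cb_kwargs
            yield response.follow(link, callback=self.parse_detail, cb_kwargs={'item': item})

    def parse_detail(self, response, item):
        # Enrich the item with data from the linked page, then emit it
        item['page_title'] = response.css('title::text').get()
        yield item

(On older Scrapy versions, request.meta serves the same purpose as cb_kwargs.)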