Scrapy + Splash doesn't render the whole page - scrapy-splash

I'm trying to use scrapy and splash to get information from this webpage: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?locations=XD-XP-XM.
Here is my Splash script:
function main(splash, args)
    assert(splash:go(args.url))
    splash:wait(20)
    splash.private_mode_enabled = false
    splash.html5_media_enabled = true
    splash.request_body_enabled = true
    splash:set_viewport_full()
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
But for some reason, Splash can't render the entire page.
The webpage shows a list of countries, but the HTML that Splash returns only contains part of the page.
I have tried the solutions at https://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly, but nothing seems to work.
Could someone please help me solve this problem?
Thank you very much.
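As a side note, render options such as splash.private_mode_enabled are usually set at the top of the script, before splash:go, so that they take effect for the initial page load. For reference, a Lua script like this is typically sent from Scrapy through scrapy-splash's execute endpoint. A minimal sketch of the argument-building side only, so it stays self-contained (assumes scrapy-splash is installed and configured; in a real spider these arguments would be passed to SplashRequest(url, callback, endpoint='execute', args=...)):

```python
# Sketch: wiring a Lua rendering script into scrapy-splash.
# Only the request arguments are built here; the actual SplashRequest
# call lives in the spider and requires a running Splash instance.
LUA_SCRIPT = """
function main(splash, args)
    splash.private_mode_enabled = false  -- set BEFORE go()
    assert(splash:go(args.url))
    splash:wait(20)
    return {html = splash:html()}
end
"""

def build_splash_args(wait=20, timeout=90):
    """Arguments for a SplashRequest using the 'execute' endpoint."""
    return {
        "lua_source": LUA_SCRIPT,
        "wait": wait,        # extra settle time after the page loads
        "timeout": timeout,  # Splash-side hard limit; must exceed wait
    }
```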

Related

React Native not able to draw image on canvas from file

I am having trouble getting an iOS asset file drawn to react-native-canvas.
I am loading the image with Expo Image Picker, and that part is working (it works on the website as well).
Somewhere in this code an exception is being thrown, and I believe it is because the image src is not valid.
CanvasImage is imported as { Image } from react-native-canvas.
setImage2[0] is the current image loaded from Expo Image Picker; its URI starts with file:// and it is a PNG.
imageDraw = new CanvasImage(canv, setImage2[0].width, setImage2[0].height);
var imageinfo = await FileSystem.readAsStringAsync(setImage2[0].uri, { encoding: FileSystem.EncodingType.Base64 });
imageDraw.src = "data:image/png;base64, " + imageinfo;
imageDraw.height = setImage2[0].height;
imageDraw.width = setImage2[0].width;
var image_drawn = await can_context.drawImage(imageDraw, 0, 0, new_width, new_height);
const gID = await can_context.getImageData(0, 0, new_width, new_height);
I am looking to draw the image to the canvas (context is can_context), and then get the pixels after drawing on the image.
Any help is greatly appreciated, even a different route to the same solution works.
Thanks in advance.

SAPUI5: PDF in iframe blank content

I've retrieved my PDF from my OData service call. However, when I insert it into HTML content using an iframe tag, it displays a blank page.
I tried the code below. When I hardcode the URL with the drive.google viewer, my app is able to show the PDF page, but when I remove the hardcoded value, my HTML content is not able to render the PDF page. Any inputs on this?
// var pdfURL = "https://drive.google.com/viewerng/viewer?url=https://assets.cdn.sap.com/sapcom/docs/2015/07/c06ac591-5b7c-0010-82c7-eda71af511fa.pdf?pid=explorer&efh=false&a=v&chrome=false&embedded=true";
var pdfURL = "https://assets.cdn.sap.com/sapcom/docs/2015/07/c06ac591-5b7c-0010-82c7-eda71af511fa.pdf";
var oHtmlChange = new sap.ui.core.HTML({
    content: "<iframe src=" + pdfURL + " width='800' height='800'></iframe>"
});
poFormPanel.addContent(oHtmlChange);
Try to set the following content property value:
...
content: "<embed src='https://drive.google.com/viewerng/viewer?embedded=true&url=https://assets.cdn.sap.com/sapcom/docs/2015/07/c06ac591-5b7c-0010-82c7-eda71af511fa.pdf' width='800' height='800'>"
...
Here is a working example.

Crawling website by selecting java script drop down menu in scrapy using splash

I am trying to get daily prices from https://www.steelmint.com/ingot-prices-indian. I have set up a Scrapy script using Splash, where I need to select different dates from a drop-down menu and scrape the price as a number. I just need two pieces of data from the page: the date and the price.
I am not able to get the drop-down to change its value, and I cannot find any tutorial that covers this. Most deal with form handling, but that is not working here.
My lua script using Splash is:
function main(splash, args)
    local form = splash:select('form-control')
    local values = assert(form:form_values())
    values.frmDt = "14"
    values.frmMt = "March"
    values.frmYr = "2018"
    assert(form:fill(values))
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
Once the page is rendered, I can easily get the value. Newbie here. Thanks in advance.
I think you should run the JavaScript on the page through Splash; it is simpler. Look at the following working example:
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:runjs('document.getElementById("frmDt").value = "14"'))
    assert(splash:runjs('document.getElementById("frmMt").value = "March"'))
    assert(splash:runjs('document.getElementById("frmYr").value = "2018"'))
    assert(splash:wait(0.5))
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end

How can I make sure scrapy-splash has rendered the entire page successfully

A problem occurred when I was crawling a whole website using Splash to render each target page. Some pages randomly failed to render successfully, so I could not get information that was supposed to be there once the render job was done. That means I only got part of the information from some render results, although I could get the complete information from others.
Here is my code:
yield SplashRequest(url, self.splash_parse, args={"wait": 3}, endpoint="render.html")
settings:
SPLASH_URL = 'XXX'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
# Enable SplashDeduplicateArgsMiddleware:
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
# Set a custom DUPEFILTER_CLASS:
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
# a custom cache storage backend:
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
I am replying this late because the question has no answer and because it is visible on Google search.
I had a similar problem, and the only solution I found (besides increasing the wait argument, which may or may not work, but is not reliable) is using the execute endpoint with a custom Lua script that waits for an element. If this sounds unnecessarily complex, it is; Scrapy and Splash are not well designed, in my opinion, but I have found nothing better yet for my needs.
My Lua script looks something like this:
lua_base = '''
function main(splash)
    splash:init_cookies(splash.args.cookies)
    splash:go(splash.args.url)
    while not splash:select("{}") do
        splash:wait(0.1)
    end
    splash:wait(0.1)
    return {{
        cookies = splash:get_cookies(),
        html = splash:html()
    }}
end
'''
css = 'table > tr > td.mydata'
lua_script = lua_base.format(css)
and I generate requests like this:
yield SplashRequest(link, self.parse, endpoint='execute',
args={
'wait': 0.1,
'images': 0,
'lua_source': lua_script,
})
It is very ugly, but it works.
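For anyone puzzled by the doubled braces in the script above: the Lua template goes through Python's str.format, which treats single braces as substitution fields, so literal Lua braces must be written as {{ and }}. A minimal standalone sketch of just that templating step (the selector is only an example):

```python
# Template a Lua script with str.format: "{}" receives the CSS selector
# to wait for, while "{{" and "}}" collapse to the literal braces Lua needs.
lua_base = '''
function main(splash)
    splash:go(splash.args.url)
    while not splash:select("{}") do
        splash:wait(0.1)
    end
    return {{ html = splash:html() }}
end
'''

css = 'table > tr > td.mydata'
lua_script = lua_base.format(css)
print(lua_script)
```

After formatting, the script contains splash:select("table > tr > td.mydata") and plain single braces, ready to pass as lua_source.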

Babylon Text Select

How does Babylon recognize text selection in any software? I have to implement this.
Thanks.
Provided you're using Babylon just for your 3D content and HTML for your UI, try window.getSelection(). You could call this on a mouseup event, as you'd expect that to occur at the end of the highlight process.
Something like...
const elem = document.getElementById('yourDiv');
let textSelection;
elem.addEventListener('mouseup', () => { textSelection = window.getSelection(); });
https://developer.mozilla.org/en-US/docs/Web/API/Window/getSelection