Splash is not loading JavaScript properly when crawling a website (scrapy-splash)

I am having difficulty working out why Splash does not load the JavaScript properly for this website: https://www.bigc.co.th/2000003703081.html
I have followed this answer to stop Splash from running in private mode, and I have tried other solutions from the docs, such as increasing the wait time and setting the user agent.
However, the website still loads in Splash as though JavaScript were disabled, as seen in the picture below.
I want to get the page rendering correctly before incorporating this script into my scrapy-splash code.
Here is my current script. Thank you in advance for your time and assistance!
function main(splash, args)
  splash:set_user_agent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36')
  splash.private_mode_enabled = false
  splash:set_viewport_full()
  splash:go{url=args.url}
  assert(splash:wait(1))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

React website showing a blank white page and no content

My question is related to:
Scrapy-splash not rendering dynamic content from a certain react-driven site
But disabling private mode did not fix the issue (passing --disable-private-mode as an argument to the Docker container).
I made a single page app using React and all it does is add an element to root:
function F(){
  class HelloMessage extends React.Component {
    render() {
      return React.createElement(
        "div",
        null,
        "Hello ",
        this.props.name
      );
    }
  }
  return(React.createElement(HelloMessage, { name: "Taylor" }));
}
But Splash only shows a blank white page!
I also made a single-page website using Django, where the HTML loads a script in the head:
<script src="/static/js/modify-the-dom.js"></script>
All the script does is write something on the page after the content has loaded:
window.onload = function(){
  document.write('Modified the DOM!');
}
This change shows up in browsers, but not in Splash. If I remove the window.onload wrapper and just call document.write('Modified the DOM!'); directly, it works fine in Splash too. So the issue seems to be with window.onload, even though I wait long enough (about 10 seconds) in the Lua script.
So I was wondering how to fix that.
I also checked my Splash instance against this website, and it reports that JS is not enabled:
https://www.whatismybrowser.com/detect/is-javascript-enabled
I wonder if there is an option I have not enabled?
One example of a client-rendered website that Splash does not crawl properly (same issues as above) is https://www.digikala.com/. It only shows a blank white page with some HTML that needs to be populated by AJAX requests. Even https://web.archive.org/ does not crawl this website's pages properly and shows a blank page.
Thanks a lot!
Update:
The Splash script is:
function main(splash, args)
  splash:set_user_agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36")
  assert(splash:go(args.url))
  assert(splash:wait(30)) -- Tried different timeouts.
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
I also tried setting these to no avail:
splash.plugins_enabled = true
splash.resource_timeout = 0
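For the window.onload test page above, one thing that might be worth ruling out is the handler being attached after the load event has already fired. The sketch below is only a debugging aid under that assumption, not a confirmed fix for Splash. runWhenLoaded is a hypothetical helper name, and win and doc stand in for window and document so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of Splash or any library): run `callback`
// once the page has finished loading, even if the load event fired before
// this script had a chance to attach its handler.
function runWhenLoaded(win, doc, callback) {
  if (doc.readyState === 'complete') {
    // The load event has already happened, so call straight away.
    callback();
  } else {
    // Otherwise wait for the load event as window.onload would.
    win.addEventListener('load', callback);
  }
}
```

In modify-the-dom.js this would be called as runWhenLoaded(window, document, function () { document.write('Modified the DOM!'); }); if the text then appears in Splash, the original handler was probably being attached too late.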

Twitch thumbnail not visible in webview

I am working on a React Native app that displays HTML content using the react-native-render-html library. This HTML data is coming from content formatted in WordPress. The same content is also being displayed in the web app made with ReactJS. I am also using this iframe plugin, as recommended by the render-html library, to display the iframes in my HTML.
I have the following problem with rendering Twitch video embeds.
The Twitch embeds are in this format:
<iframe src="https://clips.twitch.tv/embed?clip=ShortHelpfulSquirrelCmonBruh-hWj49qxBfx-VKseO&autoplay=false&parent=my.parent.domain" width="640px" height="360px" frameborder="0" scrolling="no" allowfullscreen="true"></iframe>
Please note that these are not live streams, but clips. The ReactJS website displays these embeds correctly with the thumbnail:
With the same embed code, however, my React Native app displays the embed without the thumbnail but with just a black background (although it shows the title and controls):
Once I play and pause the video, the background does not remain black, however:
Here is the rendererProp that I am passing to the library for iframes:
iframe: {
  scalesPageToFit: true,
  webViewProps: {
    scrollEnabled: false,
    mediaPlaybackRequiresUserAction: true,
    javaScriptEnabled: true,
    domStorageEnabled: true,
    userAgent: Platform.select({
      ios: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15",
      android: "Chrome/56.0.0.0 Mobile android",
    }),
  },
}
I am passing the IframeRenderer and iframeModel, imported directly from @native-html/iframe-plugin (linked at the beginning), into the renderers and customHTMLElementModels props of the <RenderHtml/> component. I am also using the WebView from the react-native-webview library.
My team and I cannot figure out why the thumbnail is not appearing in the react-native-webview, but works fine in the website. If anyone has any experience rendering Twitch video iframe embeds on react-native-webview, please take a look. Any help will be appreciated.

Selenium Webdriver (Appium) - switching to a UIWebview in a Hybrid app (iOS)

I am using a hybrid app and writing tests using Appium + Selenium Webdriver in Ruby.
I start my test with some textbox editing and a button click that opens the UIWebview (so far everything works). The problem comes once the UIWebview is open: I cannot access it, and it closes immediately when I try to click an HTML element. (I am using Appium inspector to find elements and to record my Ruby test.) I understand that I have to switch to the UIWebview (as I found here), but I cannot make it work.
Code example:
require 'rubygems'
require 'selenium-webdriver'
capabilities = {
  'browserName' => 'iOS',
  'platform' => 'Mac',
  'version' => '7.1',
  'device' => 'iPhone Retina (4-inch)',
  'app' => '/Users/{my user here}/Library/Developer/Xcode/DerivedData/{path here}/Build/Products/Debug-iphonesimulator/SDK.app'
}
server_url = "http://127.0.0.1:4723/wd/hub"
@wd = Selenium::WebDriver.for(:remote, :desired_capabilities => capabilities, :url => server_url)
# ...
# Do all kind of native actions here
# ...
@wd.find_element(:name, "showWebviewButton").click
@wd.manage.timeouts.implicit_wait = 30 # seconds
# ???
# How do I switch to my UIWebview here???
# (cannot access html elements here with @wd.find_element(:name, "htmlElement"))
# ???
@wd.quit
EDIT:
Using Appium inspector I found that my UIWebview is "window(1)", so I tried:
@wd.switch_to.window(1)
This gives me the error:
A request to switch to a different window could not be satisfied because the window could not be found
(The error is thrown before the UIWebview is loaded)
It seems you are switching to the WebView before it loads. Please pause the script for some time and switch to the WebView only after it appears.
I have tried the following in Java and it worked fine for me; you may need to find the Ruby equivalent.
driver.switchTo().window("WEBVIEW");
Try this:
@wd.switch_to.window("WEBVIEW") # I am not sure about the syntax
You need to access the element as shown below:
findElement(By.xpath("//input[@name='firstName']"))
The only solution that finally worked for me (using Appium 1.2) is the following, taken from here and written in node.js:
// javascript
// assuming we have an initialized `driver` object for an app
driver
  .contexts().then(function (contexts) { // get list of available views. Returns array: ["NATIVE_APP","WEBVIEW_1"]
    return driver.context(contexts[1]); // choose the webview context
  })
  // do some web testing
  .elementsByCss('.green_button').click()
  .context('NATIVE_APP') // leave webview context
  // do more native stuff here if we want
  .quit() // stop webdrivage
The link above also has the solution written in other languages.
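The answers above combine two ideas: wait until the webview actually exists, then switch to it by context name. A minimal sketch of that combination follows, assuming only that driver.contexts() returns a promise of an array of context names, as in the snippet above. pickWebviewContext and waitForWebviewContext are hypothetical helper names, not part of Appium or any client library, and the timeout values are arbitrary.

```javascript
// Hypothetical helper: return the first context whose name starts with
// "WEBVIEW", or null when only native contexts are listed.
function pickWebviewContext(contexts) {
  var found = contexts.find(function (name) {
    return typeof name === 'string' && name.indexOf('WEBVIEW') === 0;
  });
  return found || null;
}

// Hypothetical helper: poll driver.contexts() every `intervalMs` until a
// webview context shows up, rejecting once `timeoutMs` has elapsed.
function waitForWebviewContext(driver, timeoutMs, intervalMs) {
  var deadline = Date.now() + timeoutMs;
  function attempt() {
    return driver.contexts().then(function (contexts) {
      var found = pickWebviewContext(contexts);
      if (found) {
        return found;
      }
      if (Date.now() >= deadline) {
        throw new Error('No WEBVIEW context appeared within ' + timeoutMs + ' ms');
      }
      // Sleep for one interval, then ask the driver again.
      return new Promise(function (resolve) {
        setTimeout(resolve, intervalMs);
      }).then(attempt);
    });
  }
  return attempt();
}
```

With a driver like the one above, this could be used as waitForWebviewContext(driver, 30000, 500).then(function (name) { return driver.context(name); }), which avoids relying on the webview always being at index 1.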

CasperJS/PhantomJS unable to open Facebook

I've seen examples such as this one showing how to log in to Facebook with CasperJS:
How to login into a website with CasperJS?
but I am unable to get this code to work. I'm not interested in the login portion; I just want to load any Facebook page in CasperJS or PhantomJS, but the page keeps failing to load.
Is this working for anyone else? Or has Facebook detected the browser and stopped allowing access?
Here is a simplified version of what I am unable to do:
var casper = require('casper').create({
  verbose: true,
  logLevel: 'debug',
  pageSettings: {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4',
    loadImages: false, // The WebPage instance used by Casper will
    loadPlugins: false // use these settings
  }
});
casper.start();
casper.thenOpen('https://www.facebook.com/pfchangs', function() {
  this.echo(this.getHTML());
  // this just prints out an empty page: <html><head></head><body></body></html>
});
casper.run();
I'm running this on Windows 7.
Looks like this is a known bug with the CasperJS Windows batch file:
https://github.com/n1k0/casperjs/commit/0d659f140f1e2120bed967d8301657b5fe79f19c

Issues hiding a Flash SWF object in Safari 5.1 on OSX 10.7 (It works fine with Safari 5.1 on 10.6)

I am using JS to hide and show a flash video object at various stages in a flow. The function works perfectly in all browsers including Safari 5.1 on OSX 10.6, but does not work on Safari 5.1.3, 5.1.4 and 5.1.5 on OSX 10.7. It repositions on the page but remains visible.
You can see the issue here.
Any help really appreciated!
Embed code:
var swfVersionStr="10.2.0";
var xiSwfUrlStr="/video/expressInstall.swf";
var flashvars={
  sToken:"#{@stream_name}",
  sSWFPath: "/video/Recorder.swf",
  sConfigPath: "#{current_recorder_config_file}"
};
var params={
  bgcolor:"#FFFFFF",
  allowfullscreen:"true",
  allownetworking:"all",
  allowscriptaccess:"always",
  base:".",
  devicefont:"false",
  menu:"false",
  play:"true",
  quality:"high",
  salign:"tl",
  scale:"showall",
  seamlesstabbing:"false",
  swliveconnect:"true",
  wmode:"window"
};
var attributes={
  id:"Recorder",
  name:"Recorder"
};
swfobject.embedSWF("/video/Recorder.swf", "flashContent", "384", "318", swfVersionStr, xiSwfUrlStr, flashvars, params, attributes);
JS for Hide and Show:
function hideVideo() {$('.step_video, #flashContent').css({visibility:'hidden', height:1})}
function showVideo() {$('.step_video, #flashContent').css({visibility:'visible', height:'auto'})}
I've had this bug before. You shouldn't hide it; it's a bug with Flash.
My workaround was to move it off-screen instead:
position: absolute;
left: -5000px;
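Applied from JavaScript, the workaround above might look like the sketch below. hideOffscreen and showOnscreen are hypothetical names, and el is assumed to be any object with a style property, such as the DOM element for #flashContent. Clearing the inline values on show is also an assumption: it returns the element to whatever the stylesheet specifies.

```javascript
// Hypothetical helper: move the element far off-screen instead of changing
// its visibility, so the Flash object keeps rendering but is out of view.
function hideOffscreen(el) {
  el.style.position = 'absolute';
  el.style.left = '-5000px';
}

// Hypothetical helper: undo hideOffscreen by clearing the inline overrides,
// which hands positioning back to the page's own CSS.
function showOnscreen(el) {
  el.style.position = '';
  el.style.left = '';
}
```

In the page from the question this could be wired up as hideOffscreen(document.getElementById('flashContent')) inside hideVideo, with showOnscreen used the same way inside showVideo.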