React website showing a blank white page and no content - scrapy-splash

My question is related to:
Scrapy-splash not rendering dynamic content from a certain react-driven site
But disabling private mode did not fix the issue (passing --disable-private-mode as an argument to the Docker container).
I made a single-page app using React, and all it does is add an element to the root:
function F() {
  class HelloMessage extends React.Component {
    render() {
      return React.createElement(
        "div",
        null,
        "Hello ",
        this.props.name
      );
    }
  }
  return React.createElement(HelloMessage, { name: "Taylor" });
}
But Splash only shows a blank white page!
I also made a single-page website using Django, whose HTML loads a script in the head:
<script src="/static/js/modify-the-dom.js"></script>
And all the script does is write something on the page after the content has loaded:
window.onload = function () {
  document.write('Modified the DOM!');
};
This change shows up in browsers! But not in Splash. If I remove the window.onload wrapper and just call document.write('Modified the DOM!'); directly, it works fine! So the issue seems to be with window.onload, even though I'm waiting long enough (about 10 seconds) in the Lua script.
So I was wondering how to fix that.
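One possible workaround for the window.onload behaviour above (a sketch under an assumption, not a confirmed fix for Splash): calling document.write after the page has finished loading implicitly reopens and replaces the whole document, which headless engines may handle differently from desktop browsers. Appending a node avoids document.write entirely. The appendMessage helper name is made up for illustration:

```javascript
// Hypothetical helper: append a text node instead of using document.write,
// which implicitly wipes the document when invoked after the page has loaded.
function appendMessage(doc, text) {
  const node = doc.createElement("p");
  node.textContent = text;
  doc.body.appendChild(node);
  return node;
}

// In the browser, wire it to the load event:
// window.addEventListener("load", function () {
//   appendMessage(document, "Modified the DOM!");
// });
```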
I also checked my Splash instance against this page, and it reports that JS is not enabled!
https://www.whatismybrowser.com/detect/is-javascript-enabled I wonder if there's an option I have not enabled?
One example of a client-rendered website that Splash does not crawl properly (same issues as above) is https://www.digikala.com/: it only shows a blank white page with some HTML skeleton that needs to be populated by AJAX requests. Even https://web.archive.org/ does not crawl this website's pages properly and shows a blank page.
Thanks a lot!
Update:
The Splash script is:
function main(splash, args)
  splash:set_user_agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36")
  assert(splash:go(args.url))
  assert(splash:wait(30)) -- Tried different timeouts.
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
I also tried setting these to no avail:
splash.plugins_enabled = true
splash.resource_timeout = 0

Related

<video> tag. DOMException: The element has no supported sources, when not utilizing require()

I am trying to play a video when developing locally with VueJS 2.
My code is the following:
<video class="back_video" :src="`../videos/Space${videoIndex}.mp4`" id="background-video"></video>
...
data: function () {
  return {
    videoIndex: 1
  };
}
...
const vid = document.getElementById("background-video");
vid.crossOrigin = 'anonymous';
let playPromise = vid.play();
if (playPromise !== undefined) {
  playPromise.then(function () {
    console.log("video playing");
  }).catch(function (error) {
    console.error(error);
  });
}
This code causes the exception given in the title. I tried it in several browsers, always with the same result.
If I change the src to:
:src="require(`../videos/Space${videoIndex}.mp4`)"
it works.
But in that case the build takes very long, since I have many different videos in my videos directory: adding require() forces all of them to be copied into the build output (vue-cli serve), which is really annoying. In other words, I want to reference videos that live outside the build directory to avoid this (and also to keep the videos out of my git repository).
It is interesting to note that when I deploy server side, it works perfectly with my original code:
:src="`../videos/Space${videoIndex}.mp4`"
Note also that if I replace my code with simply
src="../videos/Space1.mp4"
it works too. So neither the video itself nor its location is the source of the problem.
Any clue?
You can host your videos on a CDN to have something faster and easier to debug and work with.
Otherwise, the bundler has to include them locally, which can take some time.
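A minimal sketch of the CDN approach (the CDN base URL and the videoSrc helper name are hypothetical):

```javascript
// Hypothetical CDN base; the videos are served from here instead of being
// bundled by vue-cli, so the build no longer copies them.
const CDN_BASE = "https://cdn.example.com/videos";

// Builds the same Space${index}.mp4 names the component already uses.
function videoSrc(index) {
  return `${CDN_BASE}/Space${index}.mp4`;
}
```

In the template this would become :src="videoSrc(videoIndex)", with videoSrc exposed as a method or computed property.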

Splash is not loading javascript properly for crawling a website (scrapy-splash)

I am having difficulty debugging why Splash is unable to load the javascript properly for this website: https://www.bigc.co.th/2000003703081.html
I have referred to this answer to stop Splash from running in private mode, and also tried other solutions from the docs, such as increasing the wait time and setting the user agent.
However, the website still loads in Splash as though JS were disabled, as seen in the picture below.
I am currently debugging why the website is not loading properly for the purpose of incorporating this with scrapy-splash code.
Here is my current script. Thank you for your time and assistance in advance!
function main(splash, args)
  splash:set_user_agent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36')
  splash.private_mode_enabled = false
  splash:set_viewport_full()
  splash:go{url=args.url}
  assert(splash:wait(1))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

Uploading image [Cypress]

I'm trying to upload a jpeg image from local files to a webpage developed here at my job. The thing is, we need to click on the page to open the file explorer and then select the image (or drag and drop it onto the same spot that would be clicked).
Here is a picture from the web page
I don't know how I could do that. I was trying some code I've seen in https://medium.com/@chrisbautistaaa/adding-image-fixtures-in-cypress-a88787daac9c, but it didn't work. I actually don't know how it works exactly; could anyone help me?
Here is my code
After @brendan's help, I was able to solve the problem by finding an input that was hidden under another element. However, before that I tried drag-n-drop, and Cypress returned an error (despite the successful upload). The context was: immediately after the upload the element re-renders, and Cypress reported an error (the message was only shown in a screenshot, not reproduced here).
Besides the success with the input element, I was wondering how to resolve this error: is it possible to make Cypress ignore it, or wait until the element re-renders back to normal?
Solutions suggested by cypress:
We're doing this using cypress-file-upload
Here's an example from our code:
cy.fixture(fileName).then(fileContent => {
  cy.get(selectors.dropZoneInput).upload(
    { fileContent, fileName, mimeType: "application/pdf" },
    { subjectType: "drag-n-drop" }
  );
});
For your purpose, I think this will work:
cy.fixture(imagePath).then(fileContent => {
  cy.get(".upload-box").first().upload(
    { fileContent, fileName, mimeType: "image/jpeg" },
    { subjectType: "drag-n-drop" }
  );
});

reCAPTCHA not showing after page refresh

Why is Google reCAPTCHA 2 (gReCaptcha) not showing after a page refresh, while it does show if the page is reopened via a link?
See this video for explanation: http://take.ms/I2a9Z
Page url: https://orlov.io/signup
Page first open: captcha exists.
Navigate by link: captcha exists.
Open new browser tab: captcha exists.
Refreshing the page via the refresh icon, Ctrl+R, or Ctrl+F5: captcha does NOT exist.
I added a body unload event to prevent browser caching; it did not help.
Browsers for testing:
Firefox 39.0
Chrome 44.0.2403.125 m
Opera 30.0
In all browsers I get the same result. So does this mean there's an error on my side?
I think it has to do with the browser and the speed of your network. You are loading reCAPTCHA with a callback, but the callback is defined after the script tag that references it. Depending on your network speed or browser quirks, the reCAPTCHA loader might execute before the rest of your script has loaded.
Line 330:
<script src="//www.google.com/recaptcha/api.js?onload=renderReCaptchaCallback&render=explicit&hl=en-US" async defer></script>
Line 351:
<script type="text/javascript">
if (typeof (renderReCaptchaCallback) === "undefined") {
  var reCaptchaWidgets = {};
  var renderReCaptchaCallback = function () {
    jQuery.each(reCaptchaWidgets, function (widgetId, widgetOptions) {
      grecaptcha.render(document.getElementById(widgetId), widgetOptions);
    });
  };
}
</script>
So I would move the definition of renderReCaptchaCallback to the top of the page, so it is defined well before the reCAPTCHA loader tries to invoke it.
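To illustrate the ordering fix, here is a standalone sketch. The grecaptcha object is stubbed so the logic can run outside the browser, and the widget id and site key are made up:

```javascript
// Stub standing in for Google's API, just to exercise the ordering.
const rendered = [];
const grecaptcha = {
  render: function (id, options) { rendered.push({ id: id, options: options }); }
};

// 1. Define the registry and the callback FIRST...
const reCaptchaWidgets = {};
function renderReCaptchaCallback() {
  Object.keys(reCaptchaWidgets).forEach(function (widgetId) {
    grecaptcha.render(widgetId, reCaptchaWidgets[widgetId]);
  });
}

// 2. ...then register widgets (this would normally happen further down the page).
reCaptchaWidgets["signup-captcha"] = { sitekey: "YOUR_SITE_KEY" };

// 3. When the async api.js loader fires ?onload=renderReCaptchaCallback,
//    the function already exists, whatever the network timing was.
renderReCaptchaCallback();
```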

Casperjs switch MainFrame to a new tab

I'm using casperjs to navigate a site, but I'm having trouble with the login process:
On the site, when you log in, the browser switches to a new tab and the login form resets to blank. I can see the new tab, with the data I need, being requested in the navigation debug output, like this:
[debug] [phantom] Navigation requested: url=www.thesite.com, type=FormSubmitted, willNavigate=true, isMainFrame=false
I've noticed isMainFrame = false, and I've tried to switch frames with methods like switchToChildFrame, casper.withFrame(), or casper.withPopup(), but failed.
Is there any way that I can retrieve and interact with the content of that request?
My code so far:
casper.start('www.thesite.com', function () {
  casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X)');
  this.wait(2000);
  casper.withFrame('centro', function () {
    this.fillSelectors('form[name="teclado"]', {
      'input[name="IdGroup"]': 'AAA',
      'input[name="IdUser"]': 'BBB',
      'input[name="password"]': 'CCC'
    }, false);
    // this capture retrieves the form filled with my data
    this.capture('../../../1.png');
    this.click("a[onclick='enviar();return true;']");
  });
});

casper.withFrame(0, function () {
  // 2.png is getting the form with blank fields
  this.capture('../../../2.png');
});
Thanks in advance.
OK, I solved my issue.
The problem was that my withPopup code was wrong: the regex I wrote to match the popup URL was too strict, because I included most of the URL I expected:
casper.withPopup(/Estatico\/BEComponentesGeneralesAccesoSEI\/Html\/login.htm/, function() {
This works OK for me:
casper.withPopup(/true$/, function() {
this.capture('../../../2.png');
});
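The difference between the two patterns can be seen with a quick check (the popup URL below is hypothetical, standing in for whatever the site actually opens):

```javascript
// The original, overly strict pattern: it must match the popup's final URL
// exactly, so any redirect or path change breaks it.
const strict = /Estatico\/BEComponentesGeneralesAccesoSEI\/Html\/login.htm/;

// The loose pattern that worked: it only requires the URL to end in "true".
const loose = /true$/;

// Hypothetical popup URL after the login redirect.
const popupUrl = "https://www.thesite.com/sei/acceso?logged=true";

strict.test(popupUrl); // false: the expected path is not in the final URL
loose.test(popupUrl);  // true: the URL still ends in "true"
```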