I'm trying to scrape a site that requires logging in, so I log in first.
The pages directly after the login seem to have a lot of JavaScript.
How can I test Splash's responses in the Scrapy shell?
Once I log in, how can I run Splash on the next URL from the command line, have it process the JavaScript, and get back a response I can parse?
I don't understand what I actually need to do to run the Splash service or how it works with Scrapy. Please point me in the right direction.
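One way to try this out, as a minimal sketch: Splash runs as a separate HTTP service, and its render.html endpoint returns the page's HTML after the JavaScript has run. The helper below only builds that URL. The function name splash_render_url is made up for illustration, and http://localhost:8050 assumes Splash is running locally on its default port.

```python
from urllib.parse import urlencode

# Assumption: Splash is running locally on its default port (e.g. via Docker).
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=2.0, splash_base=SPLASH_BASE):
    """Build a Splash render.html URL for target_url (hypothetical helper).

    `url` and `wait` are query parameters of Splash's render.html endpoint;
    `wait` is the number of seconds Splash waits after the page loads.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    # Paste the printed URL into `scrapy shell "<url>"` or call fetch("<url>")
    # inside an already running Scrapy shell.
    print(splash_render_url("https://quotes.toscrape.com/js/"))
```

The response you get back in the shell is the rendered HTML, so the usual response.css and response.xpath selectors should work on it. Note that cookies from a login made by Scrapy are not shared with Splash automatically; the scrapy-splash plugin is probably the place to look for that part.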
Related
I am trying to scrape a fashion website that uses JavaScript, using Scrapy.
This is the page: https://www.thekooples.com/us_en/women/ready-to-wear/dresses.html
I have docker, and followed the instructions on splash docs to set up splash on localhost:8050.
I am able to render https://quotes.toscrape.com/js/ properly.
As I understand it, that is a JS-rendered page: when I disable JavaScript in my browser, it does look different.
However, I am unsuccessful in rendering the fashion webpage. This is what I get:
That is actually what the page looks like without JS, so I know the rendering was unsuccessful. What could be happening?
You can try the following two things:
Increase the delay. There is also an example script on the Splash server's homepage that waits for a particular element to appear.
Download and print the HAR and check whether any of the requests failed. If any did, you may need to set a User-Agent in your Splash request.
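The HAR check above could be sketched like this, assuming you have already fetched the HAR as JSON (for example from Splash's render.har endpoint) and parsed it into a dict. The function name failed_har_entries and the sample data are made up for illustration; the log/entries/request/response layout follows the standard HAR format.

```python
def failed_har_entries(har, min_status=400):
    """Return (status, url) pairs for HAR entries that look failed.

    An entry counts as failed if its HTTP status is >= min_status, or 0
    (a status of 0 usually means no response was received at all).
    """
    failed = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        url = entry.get("request", {}).get("url", "")
        if status == 0 or status >= min_status:
            failed.append((status, url))
    return failed


# Made-up sample in HAR layout, only to show the shape of the input.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/"},
             "response": {"status": 200}},
            {"request": {"url": "https://example.com/api/products"},
             "response": {"status": 403}},
            {"request": {"url": "https://example.com/app.js"},
             "response": {"status": 0}},
        ]
    }
}

if __name__ == "__main__":
    for status, url in failed_har_entries(sample_har):
        print(status, url)
```

If the failures turn out to be 403s or similar on the script or API requests, that would point towards the User-Agent suggestion; if everything succeeds but the page is still bare, a longer wait is the more likely fix.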
I've looked up basic tutorials on Scrapy but don't have any idea where to start with this. I know that I will have to do a FormRequest of some sort in order to access the main website. From there on, I don't know how to simulate clicking the button so that Scrapy can access the other website. Using the inspect function, it's clear that the button simply redirects to another website, but requires credentials that are not the same as the main website.
I haven't begun any code because I want to lay it all out and plan it from there to see if it is feasible, but am very confused on where to start.
Any help would be appreciated.
I'm creating an app in Ionic 2 which consumes a web API from an existing site. To use this API I have to authenticate with it in the following way (similar to Facebook login):
I call the api login page in a InAppBrowser component, sending the proper keys and a return URL.
The user types the login and password into the form displayed, and the API validates and authenticates them.
The API calls the return URL passing the authorization token.
I 'hijack' this redirect to the return url in the InAppBrowser 'loadstart' event, and extract and store the authorization token.
In the following calls to the API, i send the authorization token in the header.
This all works fine in the emulator, but it doesn't work in the browser (with ionic serve), because when I call InAppBrowser it actually calls window.open, and the events don't fire. I can't detect the redirect made in the opened window.
I'd like to make this work in the browser since it's easier to debug the application there. My first thought was to send "http://localhost:8001" as the return URL, but I couldn't find a way to catch the token parameter in the Ionic application.
Does anyone know how I can catch this parameter, or any other way to make this login work in the browser? It is for development and debugging purposes only, so strict security is not an issue (I can comment out any insecure code in the production version).
Edit: Hayden Braxton's answer didn't solve my problem, but only because of something specific to my app. Since it could really help someone who wants to make plugins work, I'll keep it as the selected answer.
Besides that, I'll share the solution I found to my problem in case it could help anyone. It was simple, actually:
I pass "http://localhost:8001" as the API's return_uri parameter.
After checking the login and password, the API redirects to http://localhost:8001?token=MY_AUTH_TOKEN.
This reloads the application and opens the login page again.
In the login page I call this.platform.getQueryParam("token"); to get the token.
Add
"browser": "ionic-app-scripts serve --iscordovaserve --sourceMap source-map --wwwDir platforms/browser/www/ --buildDir platforms/browser/www/build",
to the scripts section of your package.json. Then, instead of running ionic serve, run
npm run browser
We use ionic2 to develop our apps where I work, and this is what we figured out after some research.
Before using this, you need to have the browser platform added. You can accomplish this with the following:
ionic platform add browser
If the browser platform is already added, delete the browser directory from your platforms directory and then run the add platform command, just to be on the safe side.
I am developing a Cocoa app for Mac OS X. It's a basic browser application and I use the WebView component.
On the page I want to connect to, there is a standard "Login with Google Account" button for logging in with my existing Google account. When I click this button, nothing happens.
The same functionality works properly when I visit the same page in Safari or any other browser, but there is no reaction in the WebView component.
I've checked the action behind the Google's login button and here is the JS code.
onclick="return Dialog.Login.loginWithGoogle(false, 'https://www.mywebsite.com/-/oauth2callback', 'https://www.mywebsite.com/')"
As part of the standard OAuth flow, there are several redirects after this URL is called, and the flow should normally end at my site's login screen. However, WebView doesn't handle this.
Please note that the website I am trying to connect to in my WebView does not belong to me and I have no control over it.
I checked many solutions on the web for 2 days but nothing helped.
Any help/hint will be appreciated.
I have had Facebook Connect set up for a few months and have done a lot of testing on it, and everything seemed to be working correctly. Suddenly, when I try to log in using the PHP Facebook SDK, I get redirected to the following page: https://www.facebook.com/help/258359927634494
It lets me log in on occasion, but it usually redirects to this page. I assume my app was reported. However, we only have a few test users at this time as we are in beta, and I stopped my app from repeatedly asking for post permissions a while back, as soon as I was aware it was doing that. What can I do to clear up the report?
This message appears when an app loads the login dialog many times in a short period. Make sure you are only loading the dialog once per user. Until this redirect behavior is tweaked to be more lenient, try waiting ~30 seconds between tests.