Scrapy - Is there any way to clear cookies by spider?

I am crawling a website with Scrapy. After a bunch of requests the spider stops, meaning there is no more pagination. When I open the same URL in a browser it shows me "Please close the browser and try again, or return to the home page and try again", but when I open inspect element and clear the cookies in the Resources tab I am able to view the page again...
How do I clear the cookies from my spider? I have set download_delay to 4, but I still see the same problem.

Try to disable cookies within your settings.py file. Set:
COOKIES_ENABLED = False
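If you only need this for a single spider rather than the whole project, the same setting can be overridden per spider through custom_settings. A minimal sketch, with a placeholder spider name:

import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"  # placeholder name
    # same effect as COOKIES_ENABLED = False in settings.py, but scoped to this spider
    custom_settings = {"COOKIES_ENABLED": False}

    def parse(self, response):
        pass  # your parsing logic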

Related

TestCafe: why t.navigateTo(URL) clears out the cookie

I'm manually setting the cookie during the test run because userRole is not working for me in my local environment: the client sets the cookie, and for some reason TestCafe clears it out.
When my first test runs it gets kicked out, so I set the cookie before the second test. I can see that the cookie exists, but since I'm already on the login page I need to use t.navigateTo(URL) to get to the homepage. When I use it, t.navigateTo(URL) clears out the cookie and I stay on the login page instead of the homepage.
If I manually enter the URL of my homepage while the test is stopped at a breakpoint, I can successfully reach my home page, and the test passes if debugging is resumed from that point onwards.
I was hoping that t.navigateTo(URL) would just redirect to the URL, but it seems that along with the redirect it clears out the cookies as well. Is there any fix for this?
I actually figured it out: running the tests with the --disable-page-caching flag fixed it.

Session variables inside a 3rd-party iframe don't work on Safari

My sites, when used inside iframes, stop working in Safari because session variables stop working; in IE and Chrome they keep working.
To resolve this issue, you have to configure IIS as follows:
If you have AJAX calls, remove the backslash from the URL "\Compra"

Cookies set through Selenium WebDriver do not persist if I set them on the base URL and then navigate away to another URL (in the same domain)

I am getting the login session through an API. I do the following:
1. Navigate to www.example.com
2. Set the cookies through Selenium WebDriver.
3. Navigate the browser to www.example.com/some-other-path
And voila, the cookies don't get applied, because I get the login page again.
However, if I reload the page after doing #2 and then navigate away, the cookies seem to be applied correctly. Any idea what could be the issue?
Here is my code:
driver.get("http://www.example.com");
driver.manage().deleteAllCookies();
driver.manage().addCookie(c1); //I have the cookie object
//driver.navigate().refresh(); // if I uncomment this line, it works fine
driver.navigate().to("http://www.example.com/some-other-url");
You could go through the following post, which talks about reusing the same cookies:
https://sqa.stackexchange.com/questions/15594/selenium-how-to-access-the-same-session-in-a-new-window
Hope this helps.
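The refresh workaround noted in the question's code also maps directly onto Selenium's Python bindings; a minimal sketch, with a placeholder cookie instead of the real one obtained from the login API:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.example.com")
driver.delete_all_cookies()
# placeholder cookie; in the real test this comes from the login API
driver.add_cookie({"name": "session_id", "value": "abc123", "path": "/"})
# reload so the browser applies the freshly added cookie before navigating on
driver.refresh()
driver.get("http://www.example.com/some-other-url")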

Saving cookies between scrapy scrapes

I'm collecting data from a site on a daily basis. Each day I run Scrapy, and the first request always gets redirected to the site's homepage, seemingly because Scrapy doesn't have any cookies set yet. After the first request, however, Scrapy receives the cookie and from then on works just fine.
This makes it very difficult for me to use tools like "scrapy view" with any particular URL, because the site always redirects to the home page and that's what Scrapy will open in my browser.
Can Scrapy save the cookie so I can tell it to use it on all scrapes? And can I specify it for "scrapy view" etc.?
There is no built-in mechanism to persist cookies between Scrapy runs, but you can build one yourself (the code below is just to demonstrate the idea, not tested):
Step 1: Writing the cookies to a file
Get the cookies from the 'Set-Cookie' response header in your parse function, then serialize them into a file.
There are several ways to do this, explained here: Access session cookie in scrapy spiders
I prefer the direct approach:
# in your parse method ...
# get cookies (Scrapy header values are bytes, so decode before splitting)
cookies = b";".join(response.headers.getlist('Set-Cookie')).decode('utf-8')
cookies = cookies.split(";")
cookies = {c.split("=", 1)[0].strip(): c.split("=", 1)[1] for c in cookies if "=" in c}
# serialize cookies
# ...
Ideally this should be done with the last response your scraper receives. Serialize the cookies that come with each response into the same file, overwriting the cookies you serialized during processing previous responses.
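The answer leaves the actual serialization open; one possible approach (my assumption, not from the original answer) is to dump the dict to JSON under a hypothetical cookies.json path:

import json

def serialize_cookies(cookies, path="cookies.json"):
    # overwrite the file on every call so it always holds the latest cookies
    with open(path, "w") as f:
        json.dump(cookies, f)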
Step 2: Reading and using the cookies from the file
To use the cookies after loading them from the file, you just have to pass them into the first Request you make via the 'cookies' parameter:
def start_requests(self):
    old_cookies = deserialize_cookies(xyz)  # load the cookies saved in step 1
    yield Request(url, cookies=old_cookies, ...)
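A matching sketch of deserialize_cookies under the same JSON-file assumption (only the function name appears in the answer's pseudocode; the file format is assumed):

import json

def deserialize_cookies(path="cookies.json"):
    # load the cookie dict written at the end of the previous run
    with open(path) as f:
        return json.load(f)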

Clicking the back button after logging out still renders my password protected page

I'm writing a web application using ASP.NET 4.0 and C#. In my application, when I log out the page redirects to the Default page. But when I click the back button in my browser, it goes back to the web page I was working on, even though I'm logged out.
How do I stop it from doing this?
You could set cache headers on authenticated pages to avoid them being cached downstream on the client. Here's an article you may take a look at.
So you could set the following headers on authenticated pages:
Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.Cache.SetNoStore();
This could also be done in a custom HTTP module to avoid repeating the code in all pages.