Browser doesn't cache script tag requests on page reload even if the URL is the same - browser-cache

This might sound like a very basic question, but I couldn't find much help on Google.
I have an HTML file:
<!doctype html>
<html>
<head>
<title>New Form Title</title>
<script type='text/javascript' src='http://localhost/whatever.js'></script>
</head>
<body>
</body>
</html>
When I hit F5 (after loading the page for the first time), I can see that the server returned a 304 status, but I was under the assumption that no request would be sent to the server in the first place (i.e. because the URL is the same, the browser would use the cached item without asking the server).
What am I missing? Is this the actual behaviour?
Thank you.
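For reference, a 304 reply means the browser did send a request, but a conditional one: it attached a validator for its cached copy (for example an If-None-Match header carrying the ETag), and the server answered that the copy is still current, so no body was transferred. An explicit reload is one situation in which browsers have traditionally revalidated cached resources instead of reusing them silently. Below is a minimal sketch of the server-side decision, assuming ETag-based validation. The function names and the sample ETag values are hypothetical and are not taken from whatever serves whatever.js.

```python
from typing import Mapping


def _strip_weak(etag: str) -> str:
    """Drop surrounding whitespace and a W/ prefix so weak and strong tags compare equal."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def conditional_status(request_headers: Mapping[str, str], current_etag: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    Hypothetical helper for illustration; a real server also handles
    If-Modified-Since, request methods, and so on.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    if_none_match = headers.get("if-none-match")
    if if_none_match is None:
        return 200  # unconditional request: send the full body

    # Naive comma split; good enough for a sketch with simple ETag values.
    candidates = [_strip_weak(tag) for tag in if_none_match.split(",")]
    if "*" in candidates or _strip_weak(current_etag) in candidates:
        return 304  # cached copy is still valid: no body is sent
    return 200


# The reload in the question corresponds to the first case below.
print(conditional_status({"If-None-Match": '"v1"'}, '"v1"'))
print(conditional_status({}, '"v1"'))
```

Whether the browser skips the request entirely depends on the freshness information sent with the original response, such as a Cache-Control: max-age value.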

Related

Unable to scrape parts of a webpage with Scrapy

I'm using Scrapy to crawl an e-commerce website. I'm experienced with simpler websites, where Scrapy alone or with Splash/Selenium handles most cases.
I have a new situation that I have no experience dealing with. From my investigation, it could be something like a captcha, but one that asks nothing of the user.
I've tried to solve it with Scrapy alone and with Scrapy plus Selenium, with no success.
With my Scrapy request I receive the following response:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Challenge Validation</title>
<link rel="stylesheet" type="text/css" href="/_sec/cp_challenge/sec-2-9.css">
<script type="text/javascript">function cp_clge_done(){location.reload(true);}</script>
<script src="/_sec/cp_challenge/sec-cpt-int-2-9.js" async defer></script>
<script type="text/javascript">sessionStorage.setItem('data-duration', 5);</script>
</head>
<body>
<div class="sec-container">
<div id="sec-text-container"><iframe id="sec-text-if" class="custmsg" src="https://beta.elcorteingles.es/sgfm/statics/eci_non_food/contents/cc/cca.html"></iframe></div>
<div id="sec-if-container">
<iframe id="sec-cpt-if" class="crypto" data-key="" data-duration=5 src="/_sec/cp_challenge/ak-challenge-2-9.htm"></iframe>
</div>
</div>
</body>
</html>
With the Chrome inspector I also noticed two GET requests (non-java) that might be related:
check -> returns HTML ( ... <title>RP iframe</title> ...)
check-session?origin=https%3A%2F%2Fwww.elcorteingles.es -> returns HTML (...<title>OP iframe</title>...)
Using scrapy shell with view(response), it looks like a captcha situation that is waiting for something. An example page:
scrapy shell "https://www.elcorteingles.es/supermercado/0110120903000022-coosur-aceite-de-oliva-intenso-1-botella-1-l/"
The title 'Challenge Validation' suggests as much. I have no idea how to handle this case. From my research, I've seen solutions involving Scrapy middleware, but only for cases where input was requested from the user. I found no example similar to this one. Any guidance on how to proceed is appreciated.
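One small step that may help, whatever approach ends up passing the challenge: recognising the interstitial page so that it is not parsed as a product page. The sketch below uses only the Python standard library and checks the title shown in the response above. The function name is hypothetical, and the check assumes the challenge page keeps that exact title. It does not solve the challenge itself.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_challenge_page(html: str) -> bool:
    """Return True if the HTML looks like the 'Challenge Validation' interstitial.

    Hypothetical helper; the title string is taken from the response in the question.
    """
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip().lower() == "challenge validation"


sample = "<html><head><title>Challenge Validation</title></head><body></body></html>"
print(is_challenge_page(sample))
```

A spider callback or a downloader middleware could call such a check on the response body and, when it returns True, hand the URL to a browser-based fetch or schedule a retry instead of parsing it.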

Why is the referer from my server always null?

I am trying to work out why my referrer from my server always seems to be blank. I have knocked together the following to test it:
<html>
<head>
<meta http-equiv="Refresh" content="0; url='https://www.whatismyreferer.com/'" />
<meta name="referrer" content="origin" />
</head>
<body>
</body>
</html>
When I go to this page I get this:
Is this something that is being set at a server level in Apache? I have a case where I need to pass the referrer so finding out what is controlling this would be good.
The referrer header (with the famous "referer" misspelling) is sent by the browser. If the browser decides not to send it (e.g. for privacy reasons), it simply won't. You should never rely on the header being there. Even if you find configurations that currently work, the request is valid with or without this header, and browsers might change their behaviour at any time (they already did: the header used to be omnipresent, now it is less so).

How to capture JS redirects in Selenium?

Is there any way to capture all the redirects that a page performs in JS? For instance, take this web page, which redirects using window.location:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Redirect JS</title>
</head>
<body>
<script>
window.location = "http://www.example.com";
</script>
</body>
</html>
or with a meta tag:
<meta http-equiv="refresh" content="0; url=http://example.com/">
I would like to render the web page and get all the URLs the user has been redirected to. Is it possible? How can I do that in Selenium?
In Python: http://selenium-python.readthedocs.org/en/latest/api.html : webdriver has property current_url. After you driver.get() the page, I would assume current_url is the redirected URL. Is it not?
Your requirement "in Selenium" will make this impossible. Selenium interacts with a browser as a human would - a human should generally not know or care about all the redirects. If you are willing to abandon Selenium for this purpose, then there are libraries such as HttpBuilder (in the Java world) and many others (for other languages) that allow you to manipulate and watch HTTP traffic, which is what you are after here.
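As a complement to the two suggestions above, the redirect targets in the question's examples can also be read statically from the HTML. The sketch below is not a Selenium feature and does not execute JavaScript. It only recognises meta refresh tags and literal string assignments to window.location (or location.href), so computed URLs are missed. All function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List

# Literal assignments such as: window.location = "http://..." or location.href = '...'
_JS_LOCATION = re.compile(
    r"""(?:window\.)?location(?:\.href)?\s*=\s*(["'])(?P<url>.+?)\1"""
)
# The url part of a meta refresh content value, e.g. "0; url=http://..."
_META_URL = re.compile(r"""url\s*=\s*["']?(?P<url>[^"'\s]+)""", re.IGNORECASE)


class _RedirectParser(HTMLParser):
    """Collects client-side redirect targets in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: List[str] = []
        self._script_parts: List[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._script_parts = []
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("http-equiv") or "").lower() == "refresh":
                match = _META_URL.search(attr.get("content") or "")
                if match:
                    self.targets.append(match.group("url"))

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            # Scan the whole script body once it has been collected.
            source = "".join(self._script_parts)
            self.targets.extend(m.group("url") for m in _JS_LOCATION.finditer(source))

    def handle_data(self, data):
        if self._in_script:
            self._script_parts.append(data)


def extract_client_redirects(html: str) -> List[str]:
    """Return redirect URLs found in meta refresh tags and literal location assignments."""
    parser = _RedirectParser()
    parser.feed(html)
    parser.close()
    return parser.targets
```

The input would be the raw HTML of each page in the chain, fetched without letting a browser follow the redirect first.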

Permalinks vs pretty URLs

Let's say I have a simple blog engine. I've published a post with the URL
http://example.org/blog/awesomr-post
A few days later I noticed the typo and fixed the URL:
http://example.org/blog/awesome-post
But search engines have already indexed "awesomr-post", and anybody who follows that link gets a 404 error. The same issue affects bookmarked pages.
So I think the post should be reachable through two links:
http://example.org/blog/awesome-post
http://example.org/permalinks/1
Now I have to specify the relationship between them somehow. Here is what I can do:
http://example.org/permalinks/1
<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="http://example.org/blog/awesome-post">
</head>
<body>
page content
</body>
</html>
http://example.org/blog/awesome-post
<!DOCTYPE html>
<html>
<head>
<link rel="bookmark" href="http://example.org/permalinks/1">
</head>
<body>
page content
</body>
</html>
Is this the right solution? And should I use the canonical or the permalink URL when linking from other pages of the site?
One way is to issue a 301 (permanent) redirect from http://example.org/blog/awesomr-post to http://example.org/blog/awesome-post
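The redirect suggestion above can be sketched as a lookup that the blog engine performs before answering 404. This is a minimal illustration, assuming the engine keeps a table of renamed slugs. The table, the set of posts, and the function name are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical data for illustration: current posts and renamed slugs.
POSTS: Set[str] = {"awesome-post"}
SLUG_REDIRECTS: Dict[str, str] = {"awesomr-post": "awesome-post"}

BLOG_PREFIX = "/blog/"


def resolve_blog_path(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request path under /blog/.

    200 with no location for a current post, 301 with the new path for a
    renamed slug, and 404 otherwise.
    """
    if not path.startswith(BLOG_PREFIX):
        return 404, None
    slug = path[len(BLOG_PREFIX):].strip("/")
    if slug in POSTS:
        return 200, None
    if slug in SLUG_REDIRECTS:
        # Old URL: point clients and crawlers at the corrected one permanently.
        return 301, BLOG_PREFIX + SLUG_REDIRECTS[slug]
    return 404, None


print(resolve_blog_path("/blog/awesomr-post"))
```

Recording the old slug whenever a URL is edited keeps indexed and bookmarked links working without serving the same content at two addresses.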

Sniff and modify URL requests coming from UIWebViewController

I have one HTML page open inside a UIWebViewController with Cordova. While index.html is loading inside the UIWebViewController, can we sniff the requests that originate from index.html?
For example, I have the following HTML being opened in the UIWebViewController:
<html>
<head>
<link rel="stylesheet" type="text/css" href="theme.css">
<script src="app.js"></script>
</head>
<body>
<img src="img.jpg"/>
</body>
</html>
Can I sniff and modify the URLs requested inside the UIWebViewController (i.e. img.jpg, theme.css, app.js) to something like content/img.jpg, css/theme.css, js/app.js using Objective-C?
Yes, that’s possible using NSURLProtocol, see this blog post by NSHipster and this related Stack Overflow thread.