Change link behavior for non-existent files / broken links - apache

When users click on a link to a file (e.g. csv, zip), one of two things can happen:
If the file exists: it starts downloading (the web page does not change).
If the file does not exist: the user will be redirected to a 404 page (default or custom).
If there is no redirection when the file exists (or, to be precise, the user does not perceive one), why is there a redirection when the file doesn't exist?
I understand that when the browser gets a 404 response code, it needs to somehow display it to the user, but shouldn't it take into consideration the event that triggered the request? Typing an incorrect URL in the address bar is not the same as clicking on a link. Wouldn't the browser showing an alert and remaining on the same page be a more "appropriate behavior"?
It would be relatively easy to create some back-end (or front-end) script that checks all the links, verifies whether each file exists, and then changes the behavior accordingly. But is there something different that can be done? Is it possible to (programmatically) change the browser's settings so it doesn't redirect if the file doesn't exist? Or have the server return an HTTP code other than 404 (maybe in the 400s family) to cause such behavior?
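The front-end variant mentioned above could look roughly like the following sketch. It probes each download link with a HEAD request and only navigates when the server reports success. The names `actionForStatus` and `guardDownloadLinks`, the `a[download]` selector, and the `notify` callback are all hypothetical, and the sketch assumes the files are same-origin (or served with CORS headers) so that `fetch` can read the status.

```javascript
// Decide what to do with a link click based on the HTTP status of a HEAD probe.
function actionForStatus(status) {
  if (status >= 200 && status < 300) return "download";
  if (status === 404 || status === 410) return "notify-missing";
  return "notify-error";
}

// Hypothetical wiring for a browser page: intercept clicks on <a download> links.
// Nothing here runs until guardDownloadLinks(document, callback) is called.
function guardDownloadLinks(doc, notify) {
  doc.querySelectorAll("a[download]").forEach((link) => {
    link.addEventListener("click", async (event) => {
      event.preventDefault();
      let action;
      try {
        const response = await fetch(link.href, { method: "HEAD" });
        action = actionForStatus(response.status);
      } catch {
        action = "notify-error"; // network failure or blocked request
      }
      if (action === "download") {
        window.location.assign(link.href); // hand the URL back to the browser
      } else {
        notify(action, link.href); // e.g. show a message and stay on the page
      }
    });
  });
}
```

The cost is one extra request per click, and the file can still disappear between the probe and the download, so this reduces broken-link navigations rather than eliminating them.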

Related

What are the risks of using 301?

We want to migrate from a CMS to our own system.
Page addresses in the CMS and in our own system are different. We want to use 301 redirects for all of the website's pages.
The HTML output of the CMS and of our own system differs in a few ways:
OpenGraph semantics
No JavaScript generated by the CMS
Should we be worried about losing search traffic?
If done properly, a 301 is the way to go when a URL changes permanently. The most common mistake our clients make is to change the URL of a page and never redirect the old URL to the new location. This causes 404 pages.
My advice is to structure everything. Start by generating a file containing all current URLs of the website. Then paste those into an Excel file or Google Sheet. Right next to that column add another one: this is the column where you decide what to do with each URL (keep it, remove/kill it, combine it with another page, or change the URL).
Since you want to change the CMS, I am not sure how you have structured your pages right now and how they are indexed, but whatever you do, make sure that if anything in a URL has changed, you use a 301 to permanently redirect the old URL to the new location. Otherwise, if that page receives traffic, you will lose it, because visitors will land on a 404 (page not found) page.
Go through the URL list one by one and determine what will happen with each page/URL.
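As a rough illustration of turning that spreadsheet into redirects, here is a minimal sketch. It assumes the site runs on Apache with mod_alias, and the row shape (`oldPath`, `action`, `newPath`) and the function name `buildRedirectRules` are made up for the example. Answering removed pages with 410 Gone is my assumption, not something stated above.

```javascript
// Each row mirrors one spreadsheet line: the old path, the decision,
// and (for moved or combined pages) the new path.
function buildRedirectRules(rows) {
  const rules = [];
  for (const { oldPath, action, newPath } of rows) {
    if (action === "keep") continue; // URL unchanged, nothing to emit
    if (action === "remove") {
      rules.push(`Redirect gone ${oldPath}`); // mod_alias: answers 410 Gone
    } else if (action === "move" || action === "combine") {
      if (!newPath) throw new Error(`Missing target for ${oldPath}`);
      rules.push(`Redirect 301 ${oldPath} ${newPath}`);
    } else {
      throw new Error(`Unknown action "${action}" for ${oldPath}`);
    }
  }
  return rules;
}

// Example rows (hypothetical paths).
const exampleRules = buildRedirectRules([
  { oldPath: "/index.php?id=7", action: "keep" },
  { oldPath: "/old-about", action: "move", newPath: "/about" },
  { oldPath: "/promo-2012", action: "remove" },
]);
console.log(exampleRules.join("\n"));
```

Throwing on a missing target or an unknown action makes gaps in the sheet visible before launch instead of after the 404s show up.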

How to Avoid a Mixed-Content Error When Displaying a Search Result?

Question:
How can I include both https: and http: results from a single domain in a Google custom search engine but display any such result in an iframe with a secure parent window?
How It's Structured:
My Google custom search engine currently searches "mydomainname.com/directory/" with the option to "Include all pages whose address contains this URL". It operates on a specific page of the website to search pages within the specified directory. The Link Target set in Websearch Settings is an iframe on the same page as the search bar.
The browser window and the iframe src are both on the same secure domain. And since the search results are all from a directory within the site structure, are all on this same domain as well.
Currently some results appear as "https://..." and some appear "www...". Obviously, this creates a mixed-content error when the browser window is https:// and an attempt is made to display a http:// search result in the iframe.
The results that are http:// will, of course, also work as https:// urls. I do not know what makes a page or file appear in the search results as "www." or "https://" when they all originate from a single secure domain.
The "http://" results appear even if I specify the site to be searched as https://www.mydomainname.com/directory/. I don't want to exclude these results, but I want them to be able to be displayed when browsing the site securely.
The Objective:
So the bottom-line rule that I need to work around is that insecure pages or files cannot be loaded into an iframe on a secure web page. I obviously want users to be able to utilize the https:// site but then I need the search to function in such a way that allows for all possible search results for these users.
The reason I need the results' target to be this iframe is that this is the frame that displays all the content of the web page. The search results work in harmony with the organization of other information. Such that choosing a link from a category in the page's navigation and choosing a search result from the custom search result display the chosen content into the same location, the iframe.
What I've Tried:
I've tried designating https:// specifically in the Google Search Engine (gse) settings and removing : 'http' from the script line gcse.src =(document.location.protocol == 'https:' ? 'https:' : 'http:') + '//cse.google.com/cse.js?cx=' + cx;.
I looked in the script file that it's linking to: http://cse.google.com/cse.js?cx=012685392925564329750:ghl2znnfada but I can't decipher what might need to be changed in it.
In the console's error log I don't see much that looks relevant, apart from the expected refusal to load insecure pages while browsing securely. But there is this, which may be relevant, though I could be completely wrong because I can't really decipher it either:
Mixed Content: The page at 'https://mydomainname.com/directory/index.php' was loaded over HTTPS, but requested an insecure script 'http://www.google.com/jsapi?key=ABQIAAAAdCtw6Xq1Q31YAr7VSQOSvxS5g7WKqCWUBuUdhz3-rUOumR2saRSPGvey2WjYALW7f5_JzakSL3lAEg'. This request has been blocked; the content must be served over HTTPS.
Insecure Script from Error Message:
http://www.google.com/jsapi?key=ABQIAAAAdCtw6Xq1Q31YAr7VSQOSvxS5g7WKqCWUBuUdhz3-rUOumR2saRSPGvey2WjYALW7f5_JzakSL3lAEg
Proposed Paths to a Solution:
I am open to any solution methods that may be possible. I have considered several routes but am not sure how to properly execute them or have failed in my attempts to execute them.
Some solutions I thought may work are:
Show all results as https:// links (without excluding any) so that they can be accessed whether on a secure connection to the site or not.
Redirect any links clicked without https:// to be loaded into the iframe as https://
Change something about the pages and files on the server so that they only appear in the search results as https://
Change something about Google's search engine script so it parses all found results as https://
Somehow show links as http:// if browsing non-secure, and https:// if browsing secure *
*I don't know how viable or efficient this would be
The most robust solution is to migrate your whole website to HTTPS:
use a 301 (permanent) redirect from http to https
and activate HSTS (if possible with includeSubDomains)
Google will take a little time to update its index, but HSTS will make the browser automatically replace http with https, so you should avoid any mixed-content issues.
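Until the index catches up, one of the routes listed in the question (loading clicked results into the iframe as https://) could be sketched as below. This is only an illustration: `upgradeToHttps` is a hypothetical helper, and how you would hook it into the custom search result links is not shown because that depends on the search widget.

```javascript
// Rewrite a search-result URL to https when it points at our own site.
// rawUrl may lack a scheme (results displayed as "www...").
function upgradeToHttps(rawUrl, siteHost) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(rawUrl);
  const candidate = hasScheme ? rawUrl : "http://" + rawUrl;
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return rawUrl; // not parseable, leave untouched
  }
  const stripWww = (host) => host.toLowerCase().replace(/^www\./, "");
  if (url.protocol === "http:" && stripWww(url.hostname) === stripWww(siteHost)) {
    url.protocol = "https:";
  }
  return url.href;
}

console.log(upgradeToHttps("http://www.mydomainname.com/directory/page.html", "mydomainname.com"));
```

URLs on other hosts are returned unchanged, since there is no guarantee a third-party page is reachable over HTTPS.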

Error 404 for file download without browser redirection?

On a website I display links to PDF files.
The first time a file is requested, the request gets redirected to a PHP script that generates and returns the file. Additionally, it saves the file to the linked location so that next time it will be directly available. I send the PDF MIME type to make the browser open a download dialog instead of redirecting.
Due to reasons beyond my control, one out of 20 files cannot be generated.
How to respond?
Error 404 or 500 would direct the browser to an error page, while sending the MIME type would let the user download an empty or defective PDF file. Is there an established best practice? How can I let the user know that a file link is broken, yet keep them on the site without a redirect?
I had the same problem and solved it as follows:
If you have link to file, for example:
<a download href="/files/document.pdf">Click to download</a>
And if you don't want the browser to redirect to a blank/error page when the file doesn't exist, just reply with 204 and no content.
Nothing will happen, the user will stay where he is without redirection.
In php it would look something like this:
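$path = "/files/document.pdf";
// If the file is missing, answer 204 No Content so the browser stays on the page.
if (!is_readable($path)) {
    http_response_code(204);
    die();
}
header("Content-Type: application/pdf");
readfile($path);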
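For comparison, the whole generate-on-first-request flow from the question, combined with the 204 fallback, might be sketched like this in JavaScript. Everything here is an assumption for illustration: `respondToPdfRequest` is a made-up name, a `Map` stands in for the directory where generated files are saved, and `generate` is a placeholder that returns `null` when a file cannot be produced.

```javascript
// Return a description of the HTTP response for a PDF request.
// store:    Map of file name -> file contents (stands in for the saved files)
// generate: function(name) -> contents, or null when generation fails
function respondToPdfRequest(name, store, generate) {
  if (!store.has(name)) {
    const contents = generate(name);
    if (contents === null) {
      // Nothing to send: 204 keeps the browser on the current page.
      return { status: 204, headers: {}, body: null };
    }
    store.set(name, contents); // next request is served without regenerating
  }
  return {
    status: 200,
    headers: {
      "Content-Type": "application/pdf",
      "Content-Disposition": `attachment; filename="${name}"`,
    },
    body: store.get(name),
  };
}
```

Keep in mind that with a 204 the user gets no feedback at all, so the page itself still has to tell them the file is unavailable.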

how to solve anti-leech in a better way?

When I ran into the hot-linking (leeching) problem, I searched the web and found two ways to solve it.
The first is the easier and simpler way, with the code shown below:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|png|swf)$ [mydomain.com...] [R,NC]
This can only prevent simple leeching, but does nothing against a determined person.
The other, better way is a script-and-cookies-based approach. The description said: "You set a cookie on an 'authorized' page of your site, and then use a script to serve images only if the correct cookie is present in the image request. Images are kept in a directory accessible only to the script, and not via the Web. So, the script acts as an 'image server' on your site." I understand the principle but have no idea how to implement it. Does anyone know how to implement this?
Any help appreciated.
I can't really give an implementation, only an idea of how it can be achieved:
You will need a "portal" page, where you set the cookie for the user. Any request for resources that arrives without a cookie from your site should be redirected here. There may or may not be a login mechanism here, depending on the purpose of your site, but usually you will set the cookie after the user has logged in.
All resource links will point to the same "script" page. The difference is that each resource will have a different identifier (it can be some sort of id, if you maintain a database mapping ids to file paths). The identifier must be included in the query string of the URL. The "script" will find the resource on the server based on the identifier (in the case of an id-to-file mapping, you will obtain the file path and retrieve the file).
There will be a "script" page, which can be PHP code, for example. It will check for the cookie, then check for the identifier, then load the resource accordingly. You may also want to check the Referer to restrict access a bit more (without that check, hot linking will work for any logged-in user).
In this implementation, sharing a hot link to a resource will not work for any user who hasn't visited the "portal" page (or hasn't logged in, depending on your website). It will also not work even for a logged-in user if they click the link from somewhere else.
However, scraping your website for resources is simple in both implementations mentioned in your question, since a scraper can freely set the HTTP headers.
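To make the order of checks a bit more concrete, here is a minimal sketch of the decision the "script" page would make. It is not a full server: `authorizeResourceRequest`, the `session` cookie name, the `/portal` path, and the `id` query parameter are all hypothetical, and a real script would validate the cookie's value rather than just its presence.

```javascript
// Decide how to answer a resource request.
// request:   { cookies, query, referer } already parsed from the HTTP request
// resources: Map of identifier -> file path kept outside the web root
// siteHost:  host name that the Referer must match
function authorizeResourceRequest(request, resources, siteHost) {
  const { cookies = {}, query = {}, referer = "" } = request;

  // 1. No cookie from our site: send the visitor to the portal page.
  if (!cookies.session) return { status: 302, location: "/portal" };

  // 2. Look up the identifier from the query string.
  const path = resources.get(query.id);
  if (!path) return { status: 404 };

  // 3. Optional extra restriction: the link must have been clicked on our site.
  let refererHost = "";
  try {
    refererHost = new URL(referer).hostname;
  } catch {
    refererHost = ""; // missing or malformed Referer
  }
  if (refererHost !== siteHost) return { status: 403 };

  // 4. All checks passed: the caller streams this file to the client.
  return { status: 200, path };
}
```

Note that step 3 also rejects visitors whose browser sends no Referer at all, which is stricter than the .htaccess rule in the question.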

default Twitter button doesn't load image

I went to Twitter's resource page here (https://twitter.com/about/resources/tweetbutton) and got the following code:
Tweet<script type="text/javascript" src="//platform.twitter.com/widgets.js"></script>
When I put this in my Wordpress template, I don't get the Twitter button -- I just get the text "Tweet". However, when I change the src for widgets.js to include https:// or http:// at the beginning it works.
Could it be that it's just an error that they forgot the protocol? Also, do you think it is better to use https (for consistency with the share link) versus http, or does it not matter?
Thanks for your suggestions.
The URL "//example.com/script.js" tells the browser to open the URL using the protocol of the current page, which is likely to be "file://" if your browser opened an html file on your own machine. Of course, you don't have a file called "file://example.com/script.js" on your computer.
In the past, URLs for embedded widgets used to include the protocol (http or https), but a site visitor would receive warnings whenever a secure page loaded a script from an insecure page, and sometimes even vice versa. Now, widgets from Twitter, Google Analytics, and other sites no longer specify the protocol so that the same embed code can work on any page on the internet. The downside is that this does not work when you embed such a widget into a file and view it in your own browser by double-clicking it!
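You can see the effect with the standard `URL` class, which resolves a relative reference against a base the same way the browser does. A small sketch (the two page addresses are made-up examples):

```javascript
// Resolve a protocol-relative script src against the address of the page that embeds it.
function resolveScriptUrl(src, pageUrl) {
  return new URL(src, pageUrl);
}

const src = "//platform.twitter.com/widgets.js";

// Page served from a web server over HTTPS: the script inherits https.
const onWeb = resolveScriptUrl(src, "https://example.com/blog/post");

// Page opened from disk by double-clicking: the script inherits the file scheme.
const onDisk = resolveScriptUrl(src, "file:///home/me/page.html");

console.log(onWeb.protocol, onWeb.hostname);
console.log(onDisk.protocol, onDisk.hostname);
```

In the second case the browser goes looking for a local file on a host called platform.twitter.com instead of fetching the script from the web, which is why only the bare "Tweet" text shows up.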