How can I find out who is hotlinking my content?

How can I find out who is hotlinking my content? - apache

I have lots of videos on my website, I am curious to know what websites are hotlinking to it.
I am using cpanel with awstats, I have google analytics too.
The server is running Apache.

Actually you can check Referer header.
If you want block all requests outside of your domain. Here is example for Apache server.
But this technique has 2 disadvantages:
Very-very easy to send faked Referer header
Some browsers in very rear case may not send Referer header at all
Most common way to prevent content from cross-linking is generate dynamic temporary links with limited session time.

Related

Confusion on the 'Access-Control-Allow-Origin' header with apache

Lets say I have my website named SiteA.com running on an Apache web server. I have defined the ff. below on my httpd.conf file:
Header set Access-Control-Allow-Origin "CustomBank.com"
Questions:
Does this mean only CustomBank.com can access my site (SiteA.com) directly? or does it mean only my site (SiteA.com) can access the CustomBank.com domain directly? I am confused if this setting is for inbound or outbound.
In reality I don't have any CORS requirement needed for my site, so I didn't implement the setting mentioned above, the one below shows up in my response header.
Access-Control-Allow-Origin: *
Penetration Testing team said this setting is overly permissive. Do I just need to remove it? if not what should I do?

It means javascript loaded from CustomBank.com can make requests to your site (the site whose configuration has changed) via XMLHTPRequest in the background.
Since XMLHTTPRequest will send a users existing session cookie with your site, malicious scripts could do all kinds of nefarious/misleading things on behalf of your user. That's why * is not normally a suitable fix.
The restrictions apply to other script-like invocations that are more esoteric that you can read about in the specs.

What htaccess rule would you use to redirect users already using the secure version of your site to purely secure links without affecting HTTP access?

Basically if somebody is already on an HTTPS page, I don't want them to be capable of being redirected to/accidentally clicking an HTTP one (on the same site at least). It seems to me like you would use the referer as a RewriteCond to accomplish this, except for the fact that it is apparently browser policy not to send referers when going from HTTPS pages to HTTP ones. So if a user loads an HTTP page, how can I detect if they came from an HTTPS one and make sure they are redirected to the secure version of the page they are trying to access?
Unfortunately the software we are using has many hardcoded HTTP links so it is necessary to use some sort of redirection.

non-browsable URLs?

How can one make URL's on their site non-browsable?
Example:
http://mydomain.com/files/file1.txt
If a user hits it directly, don't allow it.
If I call it inside an href on MY site then it would work.
Would one url-rewrite t accomplish this?
or how?
Apache, CentOS 5.5

You can check the Referer header.
Note that not all browsers send Referer headers, so you'll be completely locking out some users.
Also note that the Referer header is trivially spoofable.
Alternatively, and more securely, you can protect the files with a server-side script.
Change your links to point to a server-side script and include a randomly-generated one-time passcode in the querystring.
The server-side script should verify the one-time passcode (use a database), then send the file to the client.
Depending on your application, you can also use an ordinary password-based authentication system. (if you have user accounts)

.htaccess, YSlow, and "Use cookie-free domains"

One of YSlow's measurables is to use cookie-free domains to serve static files.
"When the browser requests a static
image and sends cookies with the
request, the server ignores the
cookies. These cookies are unnecessary
network traffic. To workaround this
problem, make sure that static
components are requested with
cookie-free requests by creating a
subdomain and hosting them there." --
Yahoo YSlow
I interpret this to mean that I could experience performance gains if I move www.example.com/images to static.example.com/images.
Although this is easy to do, I would lose the handy ability within my content management system (Joomla/WordPress) to easily reference and link to these images.
Is it possible to use .htaccess to redirect all requests for a particular folder on www.example.com to a folder on static.example.com instead? Would this method also fool the CMS into thinking the images were located in the default locations on its own domain?

Is it possible to use .htaccess to redirect all requests
for a particular folder on www.example.com to a folder on
static.example.com instead?
Possible, but counter productive — the client would have to make an HTTP request, get the redirect response, then make another HTTP request.
This costs a lot more than the single line of cookie data saved!
Would this method also fool the CMS into thinking the images
were located in the default locations on its own domain?
No.

Although this is easy to do, I would
lose the handy ability within my
content management system
(Joomla/WordPress) to easily reference
and link to these images.
What you could try to do is create a plugin in Joomla that dinamically creates these references.
For example, you have a plugin that when you enter {dinamic_path path} in an article, it appends 'static.example.com/images' to the path provided. So, everytime you need to change the server path, you just change in the plugin. For the links that are already in the database, you can try to use phpMyAdmin to change them in this structure.
It still loses the WYSIWYG hability in TinyMCE, but is an alternative.

In theory you could create a virtual domain that points directly to the images folder, such as images.example.com. Then in your CMS (hopefully at the theme layer) you could replace any paths that point to the images folder with an absolute path to the subdomain.

The redirects would cause far more network traffic, and far more latency, than simply leaving things as they are.

It would redirect the request but the client would still be sending its cookies to the server, so really you accomplished nothing. You would have to directly access the files from a domain that isn't storing cookies for it to work.

What you really want to do is use staticexample.com/images instead of static.example.com/images so that you don't pick up any cookies on the example.com domain that you may have set. If all you do is server images from that domain with a simple apache server or something then you can configure that server not to return even a session cookie.
The redirects are a very bad idea. Cookies cause some performance hits but round trips to the server such as a redirect would cause are a much more serious performance issue.

I did below and gained success:
<FilesMatch "!\.(gif|jpe?g|png)$">
php_value session.cookie_domain example.com
</FilesMatch>
What it means is that if you do not set images in cookie information.
Then images are cookie-free with server.

Should I get rid of bots visiting my site?

I have been noticing on my trackers that bots are visiting my site ALOT. Should I change or edit my robots.txt or change something? Not sure if thats good, because they are indexing or what?

Should i change or edit my robots.txt or change something?
Depends on the bot. Some bots will dutifully ignore robots.txt.
We had a similar problem 18 months ago with the google AD bot because our customer was purchasing Soooo many ads.
Google AD bots will (as documented) ignore wildcard (*) exclusions, but listen to explicit ignores.
Remember, bots that honor robots.txt will just not crawl your site. This is undesirable if you want them to get access to your data for indexing.
A better solution is to throttle or supply static content to the bots.
Not sure if thats good, because they are indexing or what?
They could be indexing/scraping/stealing. All the same really. What I think you want is to throttle their http request processing based on UserAgents. How to do this depends on your web server and app container.
As suggested in other answers, if the bot is malicious, then you'll need to either find the UserAgent pattern and send them 403 forbiddens. Or, if the malicious bots dynamically change user agent strings you have a two further options:
White-list UserAgents - e.g. create a user agent filter that only accepts certain user agents. This is very imperfect.
IP banning - the http header will contain the source IP. Or, if you're getting DOS'd (denial of service attack), then you have bigger problems

I really don't think changing the robots.txt is going to help, because only GOOD bots abide by it. All other ignore it and parse your content as they please. Personally I use http://www.codeplex.com/urlrewriter to get rid of the undesirable robots by responding with a forbidden message if they are found.

The spam bots don't care about robots.txt. You can block them with something like mod_security (which is a pretty cool Apache plugin in its own right). Or you could just ignore them.

You might have to use .htaccess to deny some bots to screw with your logs.
See here : http://spamhuntress.com/2006/02/13/another-hungry-java-bot/
I had lots of Java bots crawling my site, adding
SetEnvIfNoCase User-Agent ^Java/1. javabot=yes
SetEnvIfNoCase User-Agent ^Java1. javabot=yes
Deny from env=javabot
made them stop. Now they only get 403 one time and that's it :)

I once worked for a customer who had a number of "price comparison" bots hitting the site all of the time. The problem was that our backend resources were scarce and cost money per transaction.
After trying to fight off some of these for some time, but the bots just kept changing their recognizable characteristics. We ended up with the following strategy:
For each session on the server we determined if the user was at any point clicking too fast. After a given number of repeats, we'd set the "isRobot" flag to true and simply throttle down the response speed within that session by adding sleeps. We did not tell the user in any way, since he'd just start a new session in that case.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How can I find out who is hotlinking my content? - apache

I have lots of videos on my website, I am curious to know what websites are hotlinking to it. I am using cpanel with awstats, I have google analytics too. The server is running Apache.

Related

Confusion on the 'Access-Control-Allow-Origin' header with apache

What htaccess rule would you use to redirect users already using the secure version of your site to purely secure links without affecting HTTP access?

non-browsable URLs?

.htaccess, YSlow, and "Use cookie-free domains"

Should I get rid of bots visiting my site?

Categories

Resources