I would like to shut down people running screen scrapers against our site and mining the data. Or at least slow them down big time.
My idea is to write every IP address into a memory object and count how many requests they make per minute, then put them in some "no access" list if they exceed some number that I set.
I'm just looking for some community validation on whether this is a sound approach for a Rails application. Thanks for any help you can offer.
I'm not sure it's a good idea to protect your site from these IPs at the application level. I would personally investigate whether it's possible to do it at the network level instead, for example in your firewall or router. If you have a Cisco router, check out the "rate-limit" command.
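For illustration only, the legacy "committed access rate" form of that command looks roughly like the lines below; the interface name, access-list number, source address, and rate values are all placeholders, and the exact options available depend on your IOS version:

! Placeholder example: match traffic from one abusive source address
access-list 101 permit ip host 203.0.113.50 any
interface GigabitEthernet0/0
 ! Limit matching inbound traffic to roughly 512 kbps and drop the excess
 rate-limit input access-group 101 512000 8000 8000 conform-action transmit exceed-action drop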
I don't know, guys; I came up with a pretty nice way to do this in the app. I only want to punish bad behavior in this one app anyway. What I ended up doing was:
def log_ip
  # Count this request against the IP's current one-minute window
  if IPLOG.include?(request.ip)
    IPLOG[request.ip][:count] += 1
  else
    IPLOG[request.ip] = { :time => Time.now, :count => 1 }
  end

  # Start a fresh window (time and count) once the old one is over a minute old
  IPLOG[request.ip] = { :time => Time.now, :count => 1 } if IPLOG[request.ip][:time] < 1.minute.ago

  # Record offenders that exceed the per-minute threshold
  if IPLOG[request.ip][:count] > REQUESTS_PER_MINUTE_UNTIL_BLACKLIST
    Blacklist.create(:ip_address => request.ip, :count => IPLOG[request.ip][:count])
  end

  # Block anything already on the blacklist
  if Blacklist.where(:ip_address => request.ip).first
    IPLOG.delete(request.ip)
    redirect_to blocked_path
  end
end
I'm sure I can tighten this up so I'm not doing the db hit every time, but it appears to be working pretty well. It caught the GoogleBot last night. Plus, there's opportunity for whitelisting IP addresses in case a bunch of people are coming in through a known proxy.
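For what it's worth, here is a rough sketch of how the db hit and the whitelisting could be handled in memory. The WHITELIST and BLACKLISTED_IPS names are made up for illustration (they're not part of the code above), and Set comes from Ruby's standard library (require 'set' outside Rails):

# Hypothetical additions: cache the blacklist in memory so the database is
# only touched when an IP is first blocked, and skip counting for known IPs.
WHITELIST = ['203.0.113.10'].freeze                          # e.g. a known office proxy
BLACKLISTED_IPS = Set.new(Blacklist.all.map(&:ip_address))   # warmed once at boot

def log_ip
  return if WHITELIST.include?(request.ip)

  # ... same per-minute counting as above ...

  if IPLOG[request.ip][:count] > REQUESTS_PER_MINUTE_UNTIL_BLACKLIST
    Blacklist.create(:ip_address => request.ip, :count => IPLOG[request.ip][:count])
    BLACKLISTED_IPS << request.ip
  end

  redirect_to blocked_path if BLACKLISTED_IPS.include?(request.ip)
end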
I often see in my web server logs the "x-middleton" flag from Amazon IP ranges, and it looks like normal traffic (there is a variety of user agents, but they all share that x-middleton at the end).
Does anyone have any idea what it might be?
I came across this link http://support.ezoic.com/hc/en-us/articles/206245065-Origin-Errors-and-other-error-messages- but it doesn't make much sense to me either.
Ezoic adds it to user agents when it proxies traffic.
Recently I've been asked this type of question several times.
The questions are along the lines of: "If you can't access the back end, how do you know whether a problem is in the front end or the back end?", or "If you can't access the database, how do you know whether a performance issue on the website comes from the front end or the back end?", or "From the front end, how can you tell whether a problem originates in the front end or the back end?"
I really don't have a clue how to answer this kind of question. Can someone help me? Thanks in advance.
I think all of these questions could have a great number of answers, all of them depending on the context. Backend and frontend get mixed up a lot when AJAX requests are involved.
Let's say we're talking about performance problems in a website and we don't know anything about the architecture behind it.
In this case I would take a look at the network stats and timelines from Firebug or a similar tool. If some request to our application server is taking too much time, it could be the backend. But how can you be sure? Maybe the frontend is asking, via an AJAX call, for all the database entries when the page only needs to display a single one.
Maybe the only good answer to all of these questions is: "I would start by looking for performance bottlenecks and understanding their nature."
So my friend hosts a little get together every once in a while where space is limited to the first 14 people who RSVP. He emails the invite out to a list and then accepts the first people who respond. Tonight I barely got in because I can't always check my email, so I told him that I would write a program that would respond instantly to his request. This would not normally be a problem (autoresponder, easy) except he has recently created an online signup form. I think it would be funny for him to send out his next invite and get a sub-100ms response from me, so I would like to give this a try.
The problem is, I'm not quite sure how to go about it without going to too much expense. I have a personal site that can host some .NET backend code, but it's on a shared GoDaddy server, so I don't really have a ton of access to the mail server or anything. I was thinking that if I could get an email sent to a certain address, maybe it could trigger a web request that pulls down his page, fills out the (very simple, like 2 or 3 inputs) form, and submits it, but again, I'm not quite sure how.
Would anyone have an idea about how I could go about this? I would want for this to happen automatically without any sort of interaction from me, just basically as soon as I get an email from a certain email address, somehow my code is triggered and the form filled out and submitted.
This is just for fun, but the programmer in me is curious as to how I could actually get this to work.
Thanks!
The most affordable thing I know of would be through NearlyFreeSpeech.NET. If you set up an account there, you can configure a domain with email forwarding for 3 cents/day. They have an option to forward the email to a script, so you could write something that would look at the mail, pull down the form, and post to a server.
I'm not sure but I think the script has to be running on their servers, so you'll have to set up a website (another few cents per day) and write the script to run in a UNIX environment (PHP or Perl or such). If you insist on .NET, you could write a minimal PHP script to forward the data to your GoDaddy account.
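As a rough illustration (not specific to NearlyFreeSpeech.NET's setup), this is the general shape of a mail-to-script handler. It assumes the raw message is piped to the script on standard input, and the sender address, form URL, and field names are placeholders you'd replace after inspecting the real signup form; Ruby is used here only because any scripting language on the host would do:

#!/usr/bin/env ruby
# Hypothetical sketch: an email-forwarding rule pipes the raw message to
# this script on stdin; it then submits the signup form immediately.
require 'net/http'
require 'uri'

message = $stdin.read

# Only react to invites from the friend's address (placeholder).
exit unless message =~ /^From: .*friend@example\.com/i

signup_url = URI('http://example.com/signup')                 # placeholder form action
form_data  = { 'name' => 'Me', 'email' => 'me@example.com' }  # placeholder fields

# Submit the form the same way a browser POST would.
Net::HTTP.post_form(signup_url, form_data)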
There is one IP (from China) which is trying to download my entire website. It downloads all my pages and loads the server significantly (I have more than 500 000 pages). Looking at the access logs I can tell it's definitely not a Google bot or any other search engine bot.
I've banned it temporarily (using iptables rules), but that's not a solution for me, because some of my real users share the same IP, so they are also banned and cannot access the website.
Is there any way to prevent this kind of "user activity"? Maybe a mechanism that shows a captcha if someone makes more than 5 requests a second, or something like that?
P.S. I'm using Yii framework (PHP).
Any suggestions are greatly appreciated.
thank you!
You have answered your own question!
Make a captcha appear if the requests exceed a certain number per second or per minute!
You can use Yii's CCaptchaAction to implement it.
I guess the best way to monitor for suspicious user activity is really the user session, via CWebUser's getState()/setState(). Store the current request time in the user session, compare it to several previous values, and show a captcha if the user makes requests too often.
Create a new component, preload it via CWebApplication::$preload, and check user activity in the component's init() function. This way you'll be able to turn the bot check on and off easily.
I'm making something that requires me to pass information from one domain to a subdomain. The subdomain would be in an iframe on the main domain. I know I can use cookies, sessions, or a database, but I'm trying to save processing time, so I thought about using the referrer. I know that some people turn the referrer off for some reason, but exactly how many do? If they do, this won't work for them.
Oh and I can't use the URL to pass information.
I'd say fewer than 0.001% of all Internet users have ever heard of referrers. An even smaller portion of them would be willing to switch them off, and an even smaller number would be able to.