How do I block spammers from using my site's FTP, bandwidth and MySQL?

THE PROBLEM
My hosting provider gave me an ultimatum (3 business days):
"We regret to say that your database is currently consuming excessive resources on our servers, which degrades server performance and affects other customers whose database-driven sites are hosted on the same server. The database/tables/queries statistics are provided below:
Avg. queries / logged / killed: 79,500 / 0 / 0
There are several reasons why the query count increases. Unused plugins will increase the number of queries. If the plugins are not causing the issue, you can go ahead and block the IP addresses of the spammers, which will optimize the queries. You can also look for any existing spam content in the database and clean it up.
You need to check the top hitters in the Stats page. Depending upon the bandwidth accessed, top hits and IPs, you need to take specific actions on them to optimize the database queries. You need to block the unknown robot (identified by 'bot*'), since these bots are scraping content from your website, spamming your blog comment area, harvesting email addresses, sniffing for security holes in your scripts, and trying to use your mail form scripts as relays to send spam email. The .htaccess Editor tool is available to block the IP addresses."
THE BACKGROUND
The site is built 100% by us in VB.NET and MySQL on a Windows platform (except for the Snitz Forum). The only place we ever received spam was a comment form, which now has a captcha. We are talking about more than 4,000 files (tools, articles, forums, etc.) for a total of 19 GB of space; the upload alone takes me two weeks.
STATISTICS OF ROBOTS
Awstats tells us for the month of February 2012:
ROBOTS AND SPIDERS
Googlebot: 2,572,945 (+303) accesses, 5.35 GB
Unknown robot (identified by 'bot*'): 772,520 (+2,740) accesses, 259.55 MB
BaiDuSpider: 96,639 (+95) accesses, 320.02 MB
Google AdSense: 35,907 accesses, 486.16 MB
MJ12bot: 33,567 (+1,208) accesses, 844.52 MB
Yandex bot: 18,876 (+104) accesses, 433.84 MB
[...]
STATISTICS OF IP
41.82.76.159: 11,681 pages, 12,078 accesses, 581.68 MB
87.1.153.254: 9,807 pages, 10,734 accesses, 788.55 MB
[...]
Other: 249,561 pages, 4,055,612 accesses, 59.29 GB
THE SITUATION
Help! I don't know how to block an IP with .htaccess, and I don't even know which IPs to block! Awstats is missing the last 4 days!
In the past I already tried changing the FTP and account passwords, with no effect. I don't think we are a specific target; these look like generic attacks aimed at obtaining backlinks and redirects (which often don't even work)!
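For reference, blocking individual IPs in .htaccess looks like the sketch below; the two addresses are simply the heaviest hitters from the stats above, not a recommendation to block those specific visitors.

    # Sketch only: Apache 2.2 syntax; on Apache 2.4 use "Require not ip ..." inside <RequireAll>.
    # The addresses are the two top IPs from the AWStats report above.
    Order Allow,Deny
    Allow from all
    Deny from 41.82.76.159
    Deny from 87.1.153.254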

This isn't really an .htaccess issue. Look at your own stats: you've had ~4M hits at some 12 KB per hit over the reported month. I ran the OpenOffice.org user forums for 5 years, and this sort of access rate can be typical for a busy forum. I used to run on a dedicated quad-core box but migrated to a modern single-core VM, which, once tuned, took this sort of load.
The relative bot volumes are also not surprising as a percentage of that traffic, nor are the ~75K database queries.
I think what your hosting provider is pointing out is that you are using an unacceptable amount of system (database) resources for your type of account. You either need to upgrade your hosting plan or examine how you can optimise your database use: e.g., are your tables properly indexed, and do you routinely CHECK/ANALYZE/OPTIMIZE all tables? If not, you should!
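For instance, a minimal maintenance pass from the MySQL console might look like this; the table names are placeholders for your own:

    -- Placeholders: replace forum_posts / forum_topics with your own table names.
    CHECK TABLE forum_posts, forum_topics;
    ANALYZE TABLE forum_posts, forum_topics;
    OPTIMIZE TABLE forum_posts, forum_topics;
    -- Or sweep every table from the shell in one go:
    -- mysqlcheck --all-databases --optimize -u youruser -p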
It may well be that spammers are exploiting your forum for SPAM link posts, but you need to look at the content in the first instance to see if this is the case.

Related

Static files as API GET targets

I'm creating a RESTful backend API for eventual use by a phone app, and am toying with the idea of making some of the API read functions nothing more than static files, created and periodically updated by my server-side code, that the app will simply GET directly.
Is this a good idea?
My hope is to significantly reduce the CPU and memory load on the server by not requiring any code to run at all for many of the API calls. However, there could potentially be a huge number of these files (at least one per user of the phone app, which will be a public app listed in the app stores that I naturally hope will get lots of downloads) and I'm wondering if that alone will lead to latency issues I'm trying to avoid.
Here are more details:
It's an Apache server
The hardware is a hosting provider's VPS with about 1 GB of memory and 20 GB of free disk space
The average file size (in terms of content and not disk footprint) will probably be < 1kb
I imagine my server-side code might update a given user's data once a day or so at most.
The app will probably do GETs on these files just a few times a day. (There's no real-time interaction going on.)
I might password-protect the directory the files will live in at the .htaccess level. There's no personal or proprietary information in any of the files, so maybe I don't need to, but if I do, will that make a difference to the main question of feasibility and performance?
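(For reference, the protection I have in mind is just HTTP basic auth along these lines; the paths and realm name are placeholders, and the credentials file would be created with htpasswd -c.)

    # Hypothetical paths; adjust to the real directory layout.
    AuthType Basic
    AuthName "API data"
    AuthUserFile /home/myaccount/.htpasswd
    Require valid-user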
Thanks for any help you can give me.
This is generally a good thing to do: anything that can be static rather than dynamic is a win for performance and cost (it's why we do caching!), but the main issue is with authorization, which you'll still need to handle for each incoming request.
You might also want to consider using a cloud service for storage of the static data (e.g., Amazon S3 or Google Cloud Storage). There are neat ways to provide temporary authorized URLs that you can pass to users so that they can read the data for a short time and then must re-authorize to continue having access.
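With S3, for example, a time-limited URL can be generated with the AWS CLI; the bucket and key below are hypothetical:

    # Generate a URL that stays valid for 15 minutes (900 seconds).
    aws s3 presign s3://my-api-data/users/12345.json --expires-in 900

Anything fetched with that URL works until it expires, after which the client has to ask your backend for a fresh one.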

Avoiding Safari's repeated requests for IndexedDB permission beyond 50 MB

I'm trying to get Safari to run a series of tests (web-platform-tests) served from my local machine. The tests create a large amount of data in IndexedDB, for which Safari requires permission (the requests are for more than 50 MB), but approving the permission each time becomes too cumbersome when cycling through hundreds of tests.
In Preferences->Privacy->Cookies and website data->Manage Website Data..., there is an entry, "Local documents on your computer (Databases)", apparently indicating the presence of this data, but it does not provide any configuration options, nor does setting Preferences->Privacy->Cookies and website data to "Always allow" avoid the prompting.
Is there any other configuration which would let me get around the need for manual permission? (I'm asking here instead of Super User because I don't know whether there might also be an API which can persistently override the limit.)

Good idea to host data that will be downloaded internationally using S3?

I don't have any experience with server-hosting performance and how slow it can get, so I wanted to ask this question.
My situation is, I want to host a ~1MB data file that needs to be downloaded by clients occasionally (once every 2-3 days). Of course I would like to minimize costs as long as it does not hurt user experience too much. I have data to indicate that I have clients globally.
I wanted to ask what the ballpark figure would be for the amount of time it would take to download a file of this size from other parts of the world (data is hosted in the US). Does anyone have any idea, for instance, how long it would take to download a 1MB file from locations such as Japan?
In case people are wondering, I personally would consider it OK if it takes under 10s to download in most parts of the world.
The first thing to do when you don't know how well something works... is to try it. Create buckets in all of the regions, store a file, and then download it and see.
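A minimal way to run that test with the AWS CLI might look like this; the bucket name, region and file are placeholders:

    # Create a test bucket in a specific region and upload the ~1 MB file.
    aws s3 mb s3://my-latency-test-ap --region ap-northeast-1
    aws s3 cp ./data.bin s3://my-latency-test-ap/data.bin
    # Time a plain HTTPS download of the object (make it public or use a presigned URL).
    curl -o /dev/null -s -w "%{time_total}s\n" \
        "https://my-latency-test-ap.s3.ap-northeast-1.amazonaws.com/data.bin"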
The official AWS-centric answer for global content distribution is to connect a CloudFront distribution to an S3 bucket, and set things up so that your content is downloaded from S3 via CloudFront. This will tend to improve download speeds more when the user is distant from the bucket, even if the content isn't cached at a CloudFront edge, because most of the distance the download has to travel, it will be traveling on the AWS "Edge Network," a global network connecting CloudFront to the AWS regions, with fewer unknowns than the Internet at large between here and wherever.
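As a sketch, the CLI shortcut below creates a distribution with the bucket as its origin; the bucket name is hypothetical, and in practice you would also lock down direct access to the bucket:

    # Create a CloudFront distribution that pulls from the S3 bucket.
    aws cloudfront create-distribution \
        --origin-domain-name my-api-data.s3.amazonaws.com
    # Point download links at the *.cloudfront.net domain name it returns.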
I have a global client base; my shopping pages' catalog images, for example, are stored in S3 in Oregon (us-west-2), but with links pointing to CloudFront.
Interestingly, the pricing for using both services together sometimes works out a little bit less expensive than using only S3. A possible explanation for this is that edge network egress traffic represents a lower cost to AWS and the rates are set accordingly. It's not a major difference, but once you understand the pricing tables, you'll see it.
1 MB in 10 s equals 800 kbps. I'd be very surprised if any reputable hosting provider couldn't keep up with that speed of delivery. Looking at Akamai rankings (2015)*, in Japan (as in your example) the average user's speed is 15 Mbps: your file would then download in 0.53 seconds.
( *Looking at the rankings, keep in mind that in countries where fast internet infra is yet to be ubiquitous, the "average speed" will be an average of fast corporate pipes and other premium links, with actual mainstream users having substantially slower speeds.)
Beyond that, in most cases this comes down to the user's connection speed and, further, their ISP's international links, which can be much slower than their national or regional pipes, especially in countries with less developed internet infrastructure, where operators cut costs and corners.
In deciding whether you need to deploy S3, other CDN solutions, or no extra solution at all, you'll have to start by mapping out your user demographics. If there's a substantial share of users in far-away countries with weak net infrastructure, it makes sense. Otherwise, your target speed of 1 MB in 10 s will most likely be met even without any special means of delivery.
If you have some, but not substantial, traffic from countries/regions where you reckon international transfers might be slower, and you want to avoid extra costs, I figure your users will survive even if it occasionally takes 15-20 seconds as their speeds fluctuate. (This is opinion-based, relative to how picky your users are!) In that case, I'd only bother with a CDN if I wanted to improve speeds across the board, e.g. for all requests for static resources, not just a single file requested every couple of days; that would make a more substantial contribution to the general user experience.

How long should it take to return a 200 MB blob from SQL Server?

I have SQL Server 2008 R2 supporting a SharePoint 2010 environment. Some of the files will be stupidly large (e.g. 200 MB). While Remote Blob Storage will be added to our platform in the future, I want to understand what reasonable performance might look like in getting a 200 MB file to the user.
The network path to and from the SharePoint WFE is only one part. Simply reading the blob from disk and passing it through the SharePoint layer must take some time, but I have no idea how to calculate this (or what additional information people would need to help out).
Any suggestions?
That's a very complex question and requires knowledge of the environment in which you are working. The network, as you rightly say, is one aspect, but there are many others: traffic congestion, QoS, SQL Server versions, setup, hardware, etc. Then there are issues with how the Web Front Ends are handing off the data, the HTTP pipe to the user, the browser in use, and so on.
Have a look at installing the Developer Dashboard for SharePoint 2010 and you'll be able to see all of the steps in fetching and delivering files and how long each one takes. You'll be quite surprised at how detailed the path is.
SharePoint 2010 Developer Dashboard
Regardless of the large size, you should consider activating the BlobCache feature if your large content is currently stored in a document library.
That will put it on your WFEs after first access, deliver it with proper expiration headers, and completely cut the load from the SQL Server. Imagine 20 concurrent users accessing a 200 MB file: if it is not in the BlobCache, your farm will have a hard time swallowing that useless load.
The first access will take longer than your single-user test scenario, but any further access will be as fast as IIS 7 can deliver it and as the network capacity to your clients allows.
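For reference, BlobCache is switched on per web application in its web.config; the sketch below uses example values, and you would add the extensions of your own large files to the path regex (maxSize is in GB):

    <!-- Sketch only: example values in the web application's web.config. -->
    <BlobCache location="D:\BlobCache\14"
               path="\.(gif|jpg|jpeg|png|css|js|pdf|wmv|mp4)$"
               maxSize="10"
               enabled="true" />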
Hope it helped.

The ideal multi-server LAMP environment

There's a lot of information out there on setting up LAMP stacks on a single box, or perhaps moving MySQL onto its own box, but growing beyond that doesn't seem to be very well documented.
My current web environment is having capacity issues, and so I'm looking for best-practices regarding configuration tuning, determining bottlenecks, security, etc.
I presently host around 400 sites, with a fair need for redundancy and security, and so I've grown beyond the single-box solution, but I am not at the level of a full ISP or dedicated web-hosting company.
Can anyone point me in the direction of some good expertise on setting up a great apache web-farm with a view to security and future expansion?
My web environment consists of 2 redundant MySQL servers, 2 redundant web-content servers, 2 load-balancing front-end Apache servers that mount the content via NFS and share Apache config and session directories between them, and a single "developer's" server which also mounts the web content via NFS and contains all the developer accounts.
I'm pretty happy with a lot of this setup, but it seems to be choking on the load prematurely.
Thanks!!
--UPDATE--
Turns out the "choking on the load" was related to mod_log_sql, which I use to send my Apache logs to a MySQL database. By reconfiguring the web servers to write their SQL statements to a disk file, and then creating a separate process to submit those to the database, the web servers free up their threads much quicker and handle a much greater load.
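I'm not reproducing the exact mod_log_sql directives here; as a rough illustration of the same idea using stock Apache logging plus a periodic bulk load (the log format, paths and table are assumptions):

    # Write a tab-separated access log to local disk: fast and non-blocking.
    LogFormat "%h\t%{%Y-%m-%d %H:%M:%S}t\t%r\t%>s\t%b" tsv
    CustomLog /var/log/apache2/access_tsv.log tsv

A cron job can then rotate that file every few minutes and import it with LOAD DATA LOCAL INFILE ... FIELDS TERMINATED BY '\t' into the logging table, so no request ever waits on a database round trip.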
You need to be able to identify bottlenecks and test improvements.
To identify bottlenecks, you need to use your system's reporting tools. Some examples:
MySQL has a slow query log.
Linux provides stats like load average, iostat, vmstat, netstat, etc.
Apache has the access log and the server-status page.
Programming languages have profilers, like PEAR's Benchmark package.
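For instance, the MySQL slow query log from the first item can be enabled in my.cnf along these lines (MySQL 5.1+ variable names; older versions use log_slow_queries):

    [mysqld]
    # Log statements slower than 1 second, plus queries that scan without using an index.
    slow_query_log                = 1
    slow_query_log_file           = /var/log/mysql/mysql-slow.log
    long_query_time               = 1
    log_queries_not_using_indexes = 1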
Use these tools to identify the slowest/biggest offenders and concentrate on them. Try an improvement and measure to see if it actually improves performance.
This becomes a never-ending loop for two reasons: there's always something in a complex system that can be made faster, and as your system grows, different functions will start slowing down.
Based on the description of your system, my first hunch would be disk I/O and network I/O on the NFS servers, then I'd look at MySQL query times. I'd also check the performance of the shared sessions.
The schoolbook way of doing it would be to identify the bottlenecks with real empirical data.
Is it the database, Apache, the network, CPU, memory, or I/O? Do you need more RAM or sharding? Is it disk I/O, the NFS network load, or CPU spent on full table scans?
Once you find where the problem is, you may run into the issue that scaling the infrastructure alone isn't enough because of the way the code works, and you end up needing either to create more instances of your current setup or to change the code.
As a first step towards scalability, I would also recommend off-loading your content to a CDN like Edgecast, and using your current two content servers as additional web servers.