Removing Hacked URL Strings From Google - apache

I recently suffered a hack on a number of websites which were hosted on the same server. I've identified and removed the source of the hack, and used Patrick Altoft's smart Google Alerts idea to monitor for further attempts.
I then logged into Google Webmaster Tools, asked for the sites to be re-evaluated post-hack, and also re-submitted sitemaps to speed up a re-crawl.
However, I would like to remove the infected URLs from Google, and I was thinking the best way to speed up this process would be to use .htaccess to return a 404 error whenever a URL containing a specific string variable is requested.
Is this possible with a .htaccess file, or is there a better course of action to take?
You can see the damage done here.
Thanks for any help and suggestions.

A 404 will work, but it is possibly not the best solution. A better solution would be 301 (Moved Permanently) or 410 (Gone).
A 404 tells you that a page is missing, but not why. Google may keep these URLs for a while to check later whether they exist again. By using 301 or 410, you explicitly tell Google that the URL is not going to be fixed.
410 is the better option, but I'm not sure whether it can be sent from .htaccess, although you could 301 to a PHP file that returns a 410 header.
Addition: Here's an article about redirecting using the '410, Gone' header with .htaccess. http://diveintomark.org/archives/2003/03/27/http_error_410_gone
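The status-code advice above can be sketched outside Apache as a small Python WSGI middleware. This is a minimal sketch, assuming the infected URLs are recognizable by an injected query-string key; `hacked_param` and `gone_for_hacked_urls` are hypothetical names, so substitute the strings you actually see. In Apache itself, a mod_rewrite rule with the `[G]` flag returns the same 410 response.

```python
from urllib.parse import parse_qs

# Hypothetical query-string keys injected by the hack; replace them with the
# strings that actually appear in the infected URLs.
HACKED_QUERY_KEYS = ("hacked_param",)


def gone_for_hacked_urls(app):
    """Wrap a WSGI app so requests carrying a hacked query key get 410 Gone."""

    def wrapper(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        if any(key in params for key in HACKED_QUERY_KEYS):
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other request is handled by the real application.
        return app(environ, start_response)

    return wrapper
```

Wrapping the real application this way leaves normal URLs untouched; only requests with one of the listed keys are answered with 410 before the application runs.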

Yep, give them a 404, 410 or 301 status code and Google will remove them in a day or two; I've done that before. It would take far too long for Google to refresh its cache with a 200 status code.

Related

301 or 404 redirect?

One of my websites is constantly being scanned for WordPress directories and files. This particular site has never had, and never will have, WordPress. If it did, I would follow the standard practices outlined at Hackertarget to prevent getting hacked.
The site currently serves a blank 404 page (not really user-friendly, but that's the point). This does not seem like the best option, so I am considering either an internal 301 redirect or redirecting any requests for /wp/*, /wordpress/*, etc. over to WordPress.org.
A similar question was asked, but I am not concerned about SEO and those answers do not address this particular scenario.
So, which is best?
1. Keep the blank 404.
2. Internal 301.
3. External 301.
The 404 response is certainly the standard. Any of the 3xx codes will just divert the traffic somewhere else, which would be rude on your part. If you are being scanned, don't expect the scanner to take heed of the "permanence" of your 301 response. Please, go with the 404.
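Going with the 404, as the answer recommends, can be as simple as matching the probe paths and returning an empty 404 before the site's own code runs. A minimal WSGI sketch; the prefix list and the function names are assumptions, so extend the tuple with whatever paths show up in your logs.

```python
# Hypothetical prefixes probed by WordPress scanners on a site that never ran WordPress.
WORDPRESS_PROBE_PREFIXES = ("/wp/", "/wordpress/", "/wp-admin/", "/wp-login.php")


def is_wordpress_probe(path):
    """True if the request path looks like a WordPress scan (case-insensitive)."""
    return path.lower().startswith(WORDPRESS_PROBE_PREFIXES)


def block_wordpress_probes(app):
    """Answer WordPress probes with an empty 404 instead of redirecting them anywhere."""

    def wrapper(environ, start_response):
        if is_wordpress_probe(environ.get("PATH_INFO", "")):
            start_response("404 Not Found", [("Content-Length", "0")])
            return [b""]
        # Real pages are served normally.
        return app(environ, start_response)

    return wrapper
```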

How to remove unwanted URL from google cache

We bought a new domain from HugeDomains.com a month ago and made it live last week.
Before we went live, the advertisement page published by HugeDomains.com was cached by search engines.
Now we need to remove that cached URL from all search engines.
The cached URL follows this pattern; it's just a query string being passed:
http://www.example.com/?fp=ah1QKL6n%2FlECnlCZX2M7prGsvtbv8ddXendjKdEvTBtzHaEkYE%2BEk37MD1iDIPnimmKBVn7jZKj%2BPGqRUxNQzA%3D%3D&prvtof=ytNnOdijWVo6UL0CLJYkUNs043cNT%2BNtJQ5d5VD69Ac%3D&poru=RLg1S8TlJRc59ObVEdjqkbBOZjhk%2FIf%2BH8W1DtjVOk5VRbieT62uHl%2FGfuWk4d%2FnOfDQwYDvqLza3nG76SMxZA%3D%3D&
I have used Disallow in robots.txt to remove it, but it's not working. Here is the code:
Disallow: /*?fp=
Disallow: /?fp=ah1QKL6n%2FlECnlCZX2M7prGsvtbv8ddXendjKdEvTBtzHaEkYE%2BEk37MD1iDIPnimmKBVn7jZKj%2BPGqRUxNQzA%3D%3D&prvtof=ytNnOdijWVo6UL0CLJYkUNs043cNT%2BNtJQ5d5VD69Ac%3D&poru=RLg1S8TlJRc59ObVEdjqkbBOZjhk%2FIf%2BH8W1DtjVOk5VRbieT62uHl%2FGfuWk4d%2FnOfDQwYDvqLza3nG76SMxZA%3D%3D&
I even set up a 302 redirect from URLs with the fp= query string to my home page.
Please let me know a way to resolve this.
Thanks in advance.
I wouldn't do this with robots.txt.
Just wait. I think most search engines will recognize that your website is new, so they will crawl it again in the near future.
Otherwise, you can create a Google Webmaster Tools account and submit your URL to Google to be crawled again.
EDIT: You can also disallow a URL parameter in Webmaster Tools.
A robots.txt Disallow should do it, but another good way is to return a 410 Gone response; Google will then stop indexing the page, since it will see that the page has disappeared.
Edit
Looks like I was wrong about robots.txt, but right about the 410 Gone response:
Reference
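The question's pattern `Disallow: /*?fp=` does match those URLs under Google's wildcard rules (`*` matches any run of characters, a trailing `$` anchors the end). The catch is that robots.txt only stops crawling: a blocked URL is not removed from the index, and Google cannot fetch it to see a 410 or 301. The simplified matcher below, with hypothetical function names, ignores Allow rules, rule precedence and percent-encoding normalization. Python's own `urllib.robotparser` does plain prefix matching and would not treat `*` as a wildcard, which is why the translation is done by hand here.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is literal. Without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(pattern, path_and_query):
    """True if a single Disallow pattern matches the given path plus query string."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None
```

For example, `is_disallowed("/*?fp=", "/?fp=ah1QKL6n%2FlECn")` is true, so the rule in the question was already blocking the crawler from those URLs.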
You have to do a 301 permanent redirect for Google to drop the old indexed page. If you do a 302, Google will try to crawl that URL once in a while, as the redirect is temporary. Ignoring query parameters does not help in clearing the cache; it just signals that the URL with the query parameter is the same as the one without it, which I guess is not what you want. My suggestion would be to do a 301 permanent redirect whenever you encounter the query parameter fp.
Right now I doubt Google handles 404 and 410 very differently, so you can return a 410 as well.
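A minimal WSGI sketch of that suggestion, assuming the spam URLs are identified by the `fp` parameter from the question and should go to the same path with the query string dropped; `redirect_fp_urls` is a hypothetical name. Google has to be able to fetch these URLs to see the 301, so it only helps if robots.txt does not disallow them.

```python
from urllib.parse import parse_qs


def redirect_fp_urls(app):
    """Answer any request carrying the 'fp' parameter with a 301 to the bare path."""

    def wrapper(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        if "fp" in params:
            # Drop the whole query string; fall back to "/" for the site root.
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            start_response("301 Moved Permanently", [
                ("Location", path or "/"),  # relative Location, allowed by RFC 7231
                ("Content-Length", "0"),
            ])
            return [b""]
        return app(environ, start_response)

    return wrapper
```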
Google Webmaster Tools can help you remove outdated or cached content from Google search results:
1. Copy your domain's cached URL.
2. Browse to https://www.google.com/webmasters/tools/removals
3. Follow the request instructions.
The cached copy can be removed within a few hours; Google then crawls the new/current content of the URL.

htaccess 301 redirect - how to disable it?

I added a 301 redirect to my website by mistake (while I was doing maintenance). Now lots of people can't get back to my website because they are still redirected to the other page, even though I removed the redirect (I even deleted the .htaccess file). From what I found searching around, it's because the 301 redirect is cached in users' browsers, and I wasn't able to find any solution for this. Is there any way to fix this? I can't just lose hundreds of visitors because of something like this.
This page explains what is going on in good detail:
301 Redirects: The Horror That Cannot Be Uncached
Basically, modern browsers cache a 301 redirect response for an indeterminate amount of time and will not make a new request to your old web page to refresh it. Users can clear the cache manually and, because it is a cache, entries can be purged if the browser needs the space for other data (like other redirects).
This SuperUser question resolves the caching issue from the client's end:
How can I make Chrome stop caching redirects?
One interesting answer is:
//superuser.com/a/660522/178910
In this answer, the user points out that the browser treats http://example.com/ and http://example.com/? as two different URLs. You could go to the "new" site and set up an HTTP 302 redirect pointing back to the original page with a ? on the end, and it should load. If the original page already had a query as part of the URL, you can simply add an & to the end to achieve the same result.
It's not perfect -- it is a different URL after all -- but at least they'll be able to view your old site.
Note that your web application may try to redirect empty queries or invalid queries back to a "clean" page, which you may have to disable to get the intended result.
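The trick only needs the target of the 302 to differ from the cached URL. A tiny helper (its name is hypothetical) that applies the rule from that answer: append `?` when the URL has no query string, otherwise append `&`, keeping any `#fragment` at the end.

```python
def cache_busting_variant(url):
    """Return a URL the browser treats as distinct from `url` (hypothetical helper).

    Appends '?' if there is no query string yet, otherwise '&'; a fragment,
    if present, stays after the added character.
    """
    base, sep, fragment = url.partition("#")
    suffix = "&" if "?" in base else "?"
    return base + suffix + sep + fragment
```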
UPDATE
One other option is to put a redirect from the new site back to the old site (make this a 302 or 307 redirect to avoid the 301 problem you're currently having). From my testing, Chrome will remove the old redirect when it does this. It may throw a "redirect loop" error, but only once. I was unable to reproduce the cached redirect problem at all with the latest version of Firefox. Other browsers' behavior is probably going to be inconsistent.
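A minimal sketch of that update, as a WSGI app running on the new site; `OLD_SITE` and the function name are hypothetical. It uses a 307 as suggested and adds `Cache-Control: no-store` as an extra precaution (an assumption, not part of the answer). As the answer notes, a browser may report a redirect loop once while the cached 301 is still in place.

```python
# Hypothetical origin of the old site that users can no longer reach.
OLD_SITE = "http://old.example.com"


def send_back_to_old_site(environ, start_response):
    """WSGI app for the *new* site: temporarily redirect each request back to the old site.

    Unlike a 301, a 307 is not cacheable by default, so browsers should not
    remember it; Cache-Control: no-store makes that explicit.
    """
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")
    location = OLD_SITE + path + ("?" + query if query else "")
    start_response("307 Temporary Redirect", [
        ("Location", location),
        ("Cache-Control", "no-store"),
        ("Content-Length", "0"),
    ])
    return [b""]
```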

SEO - 301 redirect via 404 page

I am new to this so I will try to explain myself clearly.
I am doing my 301 redirects from a custom 404 page. Now that I have it working, my question is how Google will treat this. Since the request goes through a 404 page, would Google just record it as a 404 page, or would it actually record the 301? As I said, I am new to this and have searched Google for an answer.
Anyway, any help or comments would be greatly appreciated. Thanks in advance.
Best practice in this case could be:
If the page doesn't exist, but we have a new one with highly similar content, we can do a 301 redirect, simply saying "Moved Permanently", which instructs Google to take the new URL into account and prioritize it.
If the page doesn't exist, and we have no idea why someone would type this link, because the URL never existed and is simply wrong, then we serve a 404 "Not Found". It simply means that the URL is wrong and someone (or some other website) has misled you into following this link. You shouldn't automatically redirect the user from this page; place a link to the homepage instead, so the user can choose what to do.
If the page doesn't exist, and we know that we had this page but it will not exist in the future either (we have simply decided that we will no longer have this page), then serve a 410 "Gone" page, also with a link to the homepage, and let the user decide.
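The three cases map onto a small decision. A sketch using Python's `http.HTTPStatus`; the function and parameter names are hypothetical, and the rule order follows the answer: a similar replacement wins, then a deliberate removal, then a plain not-found.

```python
from http import HTTPStatus


def status_for_missing_page(has_similar_replacement, existed_before):
    """Pick a status code for a URL that no longer serves its own content."""
    if has_similar_replacement:
        return HTTPStatus.MOVED_PERMANENTLY  # 301: point users and Google at the new URL
    if existed_before:
        return HTTPStatus.GONE  # 410: the page existed and was deliberately removed
    return HTTPStatus.NOT_FOUND  # 404: the URL never existed or is simply wrong
```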
HTTP status codes are not just theory; they are a standard we should use. I've noticed that many 404 pages are served without the correct HTTP response code, which suggests poor development behind them.
More about HTTP response codes here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
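A friendly page and the correct code are not mutually exclusive. A minimal WSGI sketch of a custom 404 page that still sends the 404 status (a page that says "not found" but returns 200 is what Google calls a "soft 404"); the HTML and the homepage path are assumptions.

```python
HOMEPAGE = "/"  # assumption: the site's home page lives at the root

NOT_FOUND_HTML = (
    "<!doctype html><title>Page not found</title>"
    "<h1>Page not found</h1>"
    f'<p>This URL does not exist. <a href="{HOMEPAGE}">Go to the home page</a>.</p>'
).encode("utf-8")


def not_found_page(environ, start_response):
    """Serve a helpful 404 page with a real 404 status instead of 200."""
    start_response("404 Not Found", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(NOT_FOUND_HTML))),
    ])
    return [NOT_FOUND_HTML]
```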
From my understanding, a 301 redirect is the best way to retain "link juice" and should be used if the page now returning 404 has a lot of external links, substantial traffic, etc.
Redirecting every 404 straight to the home page is not ideal, as it may confuse the user. Letting the 404 stand keeps the page from being repeatedly indexed and crawled by search engines.
Read more about it here: http://moz.com/learn/seo/http-status-codes.
It is not OK to redirect a 404 page to another page. It's better to fix it and show the old page. If that's impossible, you should show a 404 page and put some helpful links on it.
If you want to redirect to the correct page, that's OK, but the best way is to display the original page regardless of duplication. In that case you must use rel="canonical" to tell search engines where the correct version of the page is.
https://support.google.com/webmasters/answer/139394?hl=en

Apache / Google undo 301 redirect

I did something stupid a couple of days ago :)
There was a hacker attack on a server where I had a client's website hosted.
Long story short, some files were deleted and I had to rebuild it.
In the meantime, I copied a snippet of code I found on stackoverflow.com to redirect everybody who came to that domain to another client's domain (with another, similar website).
I didn't notice that the code I copied was a 301 (permanent) redirect...
So I'm guessing the redirect is cached in users' browsers and can't be cleared out.
But what about Google?
I'm guessing Google will frown upon this mistake and give the domain a penalty of some sort... Or maybe just remove the content from the search results...
Is there a way to resolve this so that Google is affected as little as possible?
Thanks!
Unfortunately, Google will transfer your old page rankings to the temporary redirect target because the redirects were 301s. You can't tell Google how to index the internet, but you can create similar 301 redirects from the temporary site to the new permanent site, and that should preserve most of your Google juice.
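"Similar" 301 redirects here means page-to-page redirects rather than sending every URL to one place. A minimal WSGI sketch for the temporary site, assuming a hand-maintained mapping; `REDIRECT_MAP`, the example.com URLs and the function name are hypothetical. Paths without an entry fall through to the site itself instead of being redirected to the home page.

```python
# Hypothetical one-to-one mapping from pages on the temporary site to the
# matching pages on the permanent site.
REDIRECT_MAP = {
    "/": "https://www.example.com/",
    "/about": "https://www.example.com/about",
    "/contact": "https://www.example.com/contact",
}


def mapped_permanent_redirects(app):
    """301 mapped paths to their counterparts; leave every other path to `app`."""

    def wrapper(environ, start_response):
        target = REDIRECT_MAP.get(environ.get("PATH_INFO", "") or "/")
        if target is not None:
            start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)

    return wrapper
```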