robots.txt file is probably invalid [closed] - indexing

Closed. This question is off-topic. It is not currently accepting answers. Closed 10 years ago.
This is my robots.txt. I want to allow only the base URL, domain.com, to be indexed, and to disallow all sub-URLs such as domain.com/foo and domain.com/bar.html.
User-agent: *
Disallow: /*/
Because I am not sure whether this is valid syntax, I tested it with Google Webmaster Tools. It shows me this message:
robots.txt file is probably invalid.
Is my file valid? Is there a better way of only allowing the base url for indexing?
Update: Google downloaded my robots.txt 4 hours ago. I think that's why it doesn't work yet. I will wait some time, and if the problem persists I will update my question again.

Here is a link to a validator. It might help you work through any errors in the file.
Robots.txt Checker
I checked on another validator, robots.txt Checker, and this is what I got for the second line:
Wildcard characters (like "*") are not allowed here. The line below must be an allow, disallow, comment or a blank line statement.
This might be what you're looking for:
User-Agent: *
Allow: /index.html
Disallow: /
This assumes your homepage is index.html.
If index.php is your homepage, you should be able to swap out index.html for index.php.
User-Agent: *
Allow: /index.php
Disallow: /
On my dynamic websites that run through index.php, going to mydomain.com/index.php still takes me to the homepage, so the above should work.
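For crawlers that support the $ end-of-URL anchor and the Allow directive (Google and Bing do; neither is part of the original robots.txt standard), there is also a way to allow only the bare root URL without hard-coding index.html or index.php — a sketch:

```
User-agent: *
Allow: /$
Disallow: /
```

Here /$ matches only domain.com/ itself, so the homepage stays crawlable while every sub-URL like /foo or /bar.html is disallowed. Crawlers that ignore Allow will treat the whole site as disallowed, so test this in Webmaster Tools first.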

Related

robots.txt block bots crawling subdirectory [closed]

I'd like to block all bots from crawling the subdirectory http://www.mysite.com/admin, plus any files and folders in that directory. For example, there may be further directories inside /admin, such as http://www.mysite.com/admin/assets/img.
I'm not sure of the exact declarations to include in robots.txt to do this.
Should it be:
User-agent: *
Disallow: /admin/
Or:
User-agent: *
Disallow: /admin/*
Or:
User-agent: *
Disallow: /admin/
Disallow: /admin/*
Based on information available on the net (I can't retrieve it all, but some forums do report the problem, as here and here, for example), I'd follow those who suggest we never tell people or bots where the things we don't want them to look at live ("admin" sounds like sensitive content...).
Having checked, I can confirm it's the first form you list. Reference here.
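Following that advice, a sketch of the alternative: leave /admin out of robots.txt entirely and protect it at the server level instead. Assuming Apache with a hypothetical password file at /etc/apache2/.htpasswd, a .htaccess placed inside /admin/ could be:

```apache
# /admin/.htaccess -- require a login instead of advertising the path in robots.txt
AuthType Basic
AuthName "Restricted area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

Crawlers and visitors alike get a 401 without credentials, and robots.txt no longer points anyone at the admin area.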

Apache structure issue (.htaccess) [closed]

I have a structure like this
/ < root folder
/Site/Public/.htaccess
/Site/Public/index.php
/Site/Public/error.php
/Site/Public/images/chat.png
In my .htaccess I have disabled access to subfolders and set a default 403 document like so:
ErrorDocument 403 error.php
Options All -Indexes
But the problem is that I cannot get it to pick up that error.php file unless I use the full path starting from the root. I also tried this:
ErrorDocument 403 chat.png
And it doesn't pick that up either; it just displays a string in both situations. Can anyone tell me how to target that error.php file without using the absolute path?
The URL I am experimenting with is localhost/Site/Public/images.
From http://httpd.apache.org/docs/2.2/mod/core.html#errordocument:
URLs can begin with a slash (/) for local web-paths (relative to the
DocumentRoot), or be a full URL which the client can resolve.
Alternatively, a message can be provided to be displayed by the
browser.
Any argument that is not a full URL (http://www.example.com) and does not start with / will be treated as a string.
The URLs have to be defined relative to the DocumentRoot, which in your case seems to be the same for all your sites. Alternatively, you can use full URLs that the client can resolve; that may be an option for you.
Everything else you need to know can be read in the manual:
http://httpd.apache.org/docs/current/mod/core.html#errordocument
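Applying that to the directory layout in the question: with DocumentRoot at the server root, the local web-path has to be written out from the DocumentRoot, not relative to the .htaccess file:

```apache
# /Site/Public/.htaccess
Options All -Indexes
# Local web-path: must start with / and is resolved against DocumentRoot
ErrorDocument 403 /Site/Public/error.php
```

If DocumentRoot pointed at /Site/Public instead, ErrorDocument 403 /error.php would be the correct form.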

index.php appending to url [closed]

I have a Magento site which has index.php appended to every URL you click on. I googled a lot to find a solution and did what I could find. To clear my doubts, I uploaded a fresh copy of the .htaccess file from the Magento package, set URL rewrites to Yes in Configuration > System > Web, and cleared the cache too, but it still puts index.php in the URL. I have also double-checked the secure and unsecure base URLs to see if they contain index.php, which they don't.
I have done all the research I can and applied it, but nothing changed. What can I do, or what could be wrong?
The steps you describe should be right:
System > Configuration > Web > Use Web Server Rewrites set to yes (also check the store view level value, because the scope for this is not global)
.htaccess present in document root
clear Magento cache
Additional things to check:
System > Configuration > Web > (Un)secure base url
does your Apache take .htaccess files into consideration (AllowOverride)
how did you clear the cache
the scope for your settings System > Configuration > Web > Use Web Server Rewrites
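On the AllowOverride point: if the vhost has AllowOverride None, Apache silently ignores Magento's .htaccess, the rewrite rules never run, and index.php stays in the URL. A minimal Apache 2.4 vhost fragment (the path is an example) that enables them:

```apache
# In httpd.conf or the vhost -- without this, the .htaccess in the Magento root is ignored
<Directory "/var/www/magento">
    AllowOverride All
    Require all granted
</Directory>
```

Reload Apache afterwards and flush the Magento cache again (System > Cache Management, or delete the contents of var/cache/).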

301 Redirect every page on a site to a new domains homepage [closed]

I know there are a million other posts like this, but this one is a bit specific.
Basically, I have an old site that has been dead; I just made it live again, and there are still a few old links in Google. I want something in the .htaccess that will 302-redirect ALL pages on the site to my new domain. For example, I need:
www.oldsite.com > www.newsite.co.uk
oldsite.com > www.newsite.co.uk
www.oldsite.com/?color=red > www.newsite.co.uk
www.oldsite.com/?color=red&size=large > www.newsite.co.uk
www.oldsite.com/page > www.newsite.co.uk
www.oldsite.com/something.html > www.newsite.co.uk
Any idea how to do this?
You're right that a .htaccess file is the answer.
RewriteEngine on
RewriteRule .* http://www.newsite.co.uk/? [R=301,L]
As you want to redirect everything, no RewriteCond is required.
I believe you'll want it 301'ed, though, if you want Google to follow these redirects.
Edit: I have added ? to the end of the target domain. This removes the current query string.

Redirect all requests to a certain folder using .htaccess [closed]

I really need some help with a .htaccess file; if someone could help me it would be very much appreciated.
All I want to do is redirect any request to the folder /test/.
So if you go to my website http://www.example.com, I want the .htaccess file to auto-redirect to the /test/ directory!
Every time I try to do it, it causes a redirect loop! Help! :(
If anyone has any idea of exactly how I could do this, it would be much appreciated.
Thanks for your help.
Please note: I am trying to redirect from one website to the same website, just in a different folder; i.e. I want example.com to redirect to example.com/test/.
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/test/
RewriteRule ^ /test/ [R=301,L]
The condition excludes anything already under /test/, which is what prevents the redirect loop. Alternatively, with mod_alias, anchor the redirect to the root only (a plain Redirect / would match /test/ itself and loop):
RedirectMatch 301 ^/$ http://www.example.com/test/