Google Webmaster is not Accepting my Sitemap [closed] - seo

After setting up my Google Webmaster Tools account and verifying my website, I failed to add my sitemap to it; it kept being rejected with an error.
I tried to do the following:
I removed the robots.txt and it still didn't work.
I tried to verify my sitemap on http://www.validome.org/google/validate and it got reported as valid.
I checked the sitemap and my URL several times for errors and everything seemed to be alright.
For Reference:
My sitemap.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.myDomain.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.00</priority>
  </url>
  <url>
    <loc>http://www.myDomain.com/about/</loc>
    <changefreq>daily</changefreq>
    <priority>0.90</priority>
  </url>
  <url>
    <loc>http://www.myDomain.com/help.php</loc>
    <changefreq>daily</changefreq>
    <priority>0.90</priority>
  </url>
</urlset>
My robots.txt is as follows:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.myDomain.com/sitemap.xml

Here are some nice references for your problem with .htaccess rules against bots:
http://www.widexl.com/tutorials/htaccess.html
http://www.howtoforge.com/forums/showthread.php?t=27809

If you want something really effective for your needs, replace your robots.txt rules with this one:
User-agent: *
Allow: /
Sitemap: http://www.myDomain.com/sitemap.xml
And add this to your .htaccess
# Block ia_archiver & duggmirror
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (ia_archiver|duggmirror) [NC]
RewriteRule .* - [F]

# Block direct access to the PHP CGI binary and php.ini
<FilesMatch "^php5?\.(ini|cgi)$">
Order Deny,Allow
Deny from All
Allow from env=REDIRECT_STATUS
</FilesMatch>
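
If mod_rewrite happens to be unavailable on your host, a roughly equivalent block can be done with mod_setenvif instead. This is just a sketch along the lines of the snippet above, using the same Apache 2.2-style access directives:
<IfModule mod_setenvif.c>
# Flag the unwanted bots by User-Agent, then deny the flagged requests
BrowserMatchNoCase "ia_archiver|duggmirror" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</IfModule>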

It turned out the problem was on Google's servers. I didn't change anything; I just left the whole topic alone for a week and tried again, and everything seems to be working fine now.
Sometimes, due to major updates or busy Google servers, the acceptance of sitemaps takes more time than usual. For those facing similar problems: just wait a few days and give it another shot before complaining in the Google forums and such.

Related

Convert NGINX rewrite rules to Apache htaccess [closed]

I have a client who is moving away from an NGINX web server to Apache. Everything is simple, nothing complicated; however, since I'm an NGINX kind of guy, I forgot how to convert NGINX rewrite rules into Apache ones.
For example, these are the NGINX rewrites
rewrite ^/Tower-Topics-Calendar/?$ https://$host/events/ permanent;
How would I convert something like that onto .htaccess to use with Apache?
rewrite ^/Tower-Topics-Calendar/?$ https://$host/events/ permanent;
This looks like an external 301 redirect from /Tower-Topics-Calendar (with and without a trailing slash) to https://<host>/events/ - where <host> is the same hostname from the request and you specifically state the HTTPS protocol in the target.
In .htaccess you can achieve this using mod_rewrite. For example:
RewriteEngine On
RewriteRule ^Tower-Topics-Calendar/?$ https://%{HTTP_HOST}/events/ [R=301,L]
Note the absence of the slash prefix in the RewriteRule pattern.
However, if you don't specifically need to include the HTTPS scheme (i.e. this is already canonicalised) then you can use a single mod_alias RedirectMatch directive instead. Unlike the RewriteRule pattern above, the RedirectMatch pattern is matched against the full URL-path, so it keeps the leading slash. For example:
RedirectMatch 301 ^/Tower-Topics-Calendar/?$ /events/
OR, to include the HTTPS protocol, you need to hardcode the hostname:
RedirectMatch 301 ^/Tower-Topics-Calendar/?$ https://example.com/events/
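
For completeness, if the rules end up in the main server or virtual-host configuration instead of .htaccess (an assumption, since the question only mentions .htaccess), the mod_rewrite pattern also needs the leading slash, because nothing strips the per-directory prefix there:
RewriteEngine On
# In server/virtual-host context the matched URL-path keeps its leading slash
RewriteRule ^/Tower-Topics-Calendar/?$ https://%{HTTP_HOST}/events/ [R=301,L]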

Will this combination of robots.txt and .htaccess rules block indexing of certain file types?

I'm working on a WordPress site that has a login portal where users can access 'classified' documents in PDF, DOC, and a few other formats. The files are uploaded via the media manager, so they are always stored in /wp-content/uploads.
I need to make sure these file types are not shown in search results. I've made some rules in .htaccess and robots.txt that I think will work, but it's very hard to test, so I was hoping someone could glance over them and let me know whether they'll do what I'm expecting them to or not. One thing in particular I wasn't sure of: would the Disallow: /wp-content/ rule stop the X-Robots-Tag from being seen?
.htaccess - under # end Wordpress
# do not index specified file types
<IfModule mod_headers.c>
<FilesMatch "\.(doc|docx|xls|xlsx|pdf|ppt|pptx)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
</IfModule>
robots.txt - complete
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-
Disallow: /growers-portal
Disallow: /growers-portal/
Disallow: /grower_posts
Disallow: /grower_posts/
Sitemap: http://www.pureaussiepineapples.com.au/sitemap_index.xml
Neither of those stops anyone from reading your "classified" documents. To do that you really want to restrict access based on logged-in users.
The robots tag will keep the files out of the search results.
However, robots.txt does not stop files from appearing in the search results. Google takes that directive to mean it can't read the file, but it can still include it in the index.
This causes an interesting scenario: your robots.txt stops Google from reading the robots tag, so Google does not know you want the file out of the index.
So, if you're not going to physically control access to the files, I would use the robots tag but not the robots.txt directives.
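
As a sketch of what that would look like for this site: the Disallow: /wp-content/ line has to go so crawlers can fetch the files and see the X-Robots-Tag header, and note that Disallow: /wp- also matches /wp-content/uploads/, so it has to go as well (this assumes you still want the other paths blocked):
# /wp-content/ and /wp- removed so crawlers can reach the uploads and see the X-Robots-Tag header
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /growers-portal
Disallow: /growers-portal/
Disallow: /grower_posts
Disallow: /grower_posts/
Sitemap: http://www.pureaussiepineapples.com.au/sitemap_index.xml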

How to disallow a mirror site (on a sub-domain) using robots.txt? [closed]

I have a website at:
http://domain.com/
mirror site on
http://cdn.domain.com/
I don't want the CDN to be indexed. How can I write a robots.txt rule to prevent the CDN from being indexed without disturbing my present robots.txt excludes?
My present robots.txt excludes :
User-agent: *
Disallow: /abc.php
How can I prevent cdn.domain.com from being indexed?
Keep your existing robots.txt as it is:
User-agent: *
Disallow: /abc.php
Then, in your root .htaccess file, add the following:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt
And then create a separate robots-cdn.txt:
User-agent: *
Disallow: /
When robots.txt is requested through http://cdn.domain.com/ (i.e. by the CDN), this will return the contents of the robots-cdn.txt file; otherwise the rewrite won't kick in and the true robots.txt will be served.
This way you are free to mirror the entire site (including the .htaccess file) with the expected behaviour.
Update: matching on HTTP_USER_AGENT did the trick, since Amazon sends it when querying from any location. I have verified it and it works.
If the codebase is the same, you can generate your robots.txt dynamically and change its content depending on the requested (sub)domain.
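
If generating it dynamically is not an option, a host-based rewrite in .htaccess is another sketch of the same idea. This assumes requests arriving via the CDN still carry the cdn.domain.com hostname, which depends on how the CDN forwards the Host header:
RewriteEngine On
# Serve the blanket-disallow file only when the request arrives on the CDN hostname
RewriteCond %{HTTP_HOST} ^cdn\.domain\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-cdn.txt [L]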

Error 500 with .htaccess edits but mod_rewrite is loaded [closed]

As the title says, I'm having problems with my .htaccess file. Everything should be set up fine, but as soon as I write something as basic as
RewriteEngine on
it starts giving me that nice 500 Internal Server Error. I'm hosting on localhost on an Apache server (UNIX).
Obviously I triple-checked that everything is set up fine, and on top of all that, mod_rewrite is loaded.
Thanks for your precious help!
If the error occurs for every single type of instruction you put in the file (that is, caching, FilesMatch, ErrorDocument, etc), there are two possible options that I can think of right now:
The encoding of your .htaccess file is not compatible with the server you're running. Try converting it to ANSI, and then try again (Apache does not support Byte Order Marks, so you'd need to save it to ANSI, or UTF-8, without the BOM). If that does not work:
AllowOverride is not set correctly, or not at all. If you have access to the Virtual Host/Directory configuration, you'll need to enable it by adding the line AllowOverride All in the <Directory> container.
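For the second option, here is a minimal sketch of what that Directory container could look like. The DocumentRoot path /var/www/html is an assumption (adjust it to your setup), and Require all granted is Apache 2.4 syntax; on 2.2 you would use the Order/Allow directives instead:
<Directory "/var/www/html">
    Options FollowSymLinks
    # Allow .htaccess files in this tree to override configuration
    AllowOverride All
    Require all granted
</Directory>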
If you remove the line RewriteEngine on and URL remapping or redirecting still works, then you do not need that line in your .htaccess file, because your HTTP server is already configured with the rewrite engine on. But if you remove it and the rewriting or redirecting stops, then you should include all of your .htaccess directives in your question. And if the file is otherwise empty, why turn the RewriteEngine on if you're not going to apply even a single RewriteRule?
Have you ever tried this:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^notexist.html$ /index.html
Then try visiting yourdomain.com/notexist.html to see whether the URL gets remapped to /index.html.
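
A variant of the same check as a visible redirect can make the result easier to spot in the browser address bar (just a sketch; remove it once you're done testing):
Options +FollowSymlinks
RewriteEngine on
# With an external redirect the address bar changes, so a successful rewrite is easy to spot
RewriteRule ^notexist\.html$ /index.html [R=302,L]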

url rewrite `/abc/products-` to `/def/products-` [closed]

I have many url links in /abc/ sub-directory, like
http://www.domain.com/abc/products-12345
http://www.domain.com/abc/products-23456
http://www.domain.com/abc/products-34567
http://www.domain.com/abc/new-items
Now I want to rewrite /abc/products- to /def/products-, so the URLs become
http://www.domain.com/def/products-12345
http://www.domain.com/def/products-23456
http://www.domain.com/def/products-34567
http://www.domain.com/abc/new-items
This is my code in .htaccess, but nothing changed. How do I rewrite in this case? Thanks.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^abc/products-(.*)$ /def/products-$1 [L,R=301]
</IfModule>
As you can test on this simulator, the rules should work.
The most probable problems you are facing are:
The global configuration does not allow .htaccess overrides (AllowOverride).
The .htaccess file is not readable by the Apache user.
mod_rewrite is not active on this Apache server, so the <IfModule> block is never entered.
The best troubleshooting option would be to try redirecting all pages to a static page; if that does not work, look for a configuration problem.
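
A minimal sketch of that troubleshooting redirect (the /test.html page is an assumption; any existing static file will do, and the rule should be removed after the check):
RewriteEngine On
# Temporary test: send everything except the test page itself to /test.html.
# If nothing happens at all, .htaccess is not being applied (AllowOverride / mod_rewrite problem).
RewriteCond %{REQUEST_URI} !^/test\.html$
RewriteRule ^ /test.html [R=302,L]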