Robots.txt disallow path with regular expression - seo

Does robots.txt accept regular expressions?
I have many URLs in this format:
https://example.com/view/99/title-sample-text
where 99 is the ID and title-sample-text is the title.
I used this:
Disallow: /view
But it looks like this is not working, because Google has indexed more pages. So I want to do this with a regex, something like this:
Disallow: /view/([0-9]+)/([^/]*)
Is this the correct format, and is it valid in robots.txt?

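robots.txt does not support regular expressions, but Google and other major search engines support the * wildcard (and $ to anchor the end of a URL), so you can use:
User-agent: *
Disallow: /view/*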
See https://webmasters.stackexchange.com/questions/72722/can-we-use-regex-in-robots-txt-file-to-block-urls
Hope this helps.
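As a quick check of the original rule, Python's standard-library urllib.robotparser can evaluate robots.txt lines. It implements only plain prefix matching (not Google's * and $ extensions), so treat this as a sketch of prefix behavior rather than of Googlebot; the helper name allowed() is hypothetical. It suggests that Disallow: /view already blocks /view/99/title-sample-text, while the regex version is read as a literal path and blocks nothing. If such pages still show up in Google, keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still be indexed if other pages link to it.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

URL = "https://example.com/view/99/title-sample-text"


def allowed(rules: list[str], url: str = URL) -> bool:
    """Hypothetical helper: parse robots.txt rules for all user agents and
    report whether a generic crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", *rules])
    return parser.can_fetch("*", url)


# Prefix match: "/view/99/title-sample-text" starts with "/view", so it is blocked.
print(allowed(["Disallow: /view"]))  # False
# Regex syntax is not interpreted; the pattern is compared as a literal path.
print(allowed(["Disallow: /view/([0-9]+)/([^/]*)"]))  # True
```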

Related

How to rewrite URLs in htaccess that end with recurring characters

I have changed web platforms and have old URLs that I cannot and do not want to match on the new platform where the old content is now living.
I have an array of old product URLs that all have '-p-' in the URL, followed by a string of numbers and ending in .html (osCommerce platform URLs).
I would like to know how to rewrite:
/x/[rest-of-url]-p-[random numbers].html
to
/x/[rest-of-url]
I would like the end result to look something like this:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html
redirects to:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo
Does anyone know if this is doable in the htaccess file as a rewrite rule?
My managed hosting service provider, BeepWeb, answered my question.
RewriteEngine On
RewriteRule ^/?shop/(.*)-p-(.*)\.html$ http://www.shop.com/product/$1/ [R=302,L]
The first argument is the regular expression matched against the requested URL path. Each (.*) matches any sequence of characters. The second argument is the destination URL. $1 corresponds to the first (.*), $2 to the second (.*), and so on. [R=302] makes the rewrite a 302 redirect (use [R=301] for a 301 redirect). In .htaccess the leading slash is stripped before matching, hence ^/?; \. makes the dot before html literal, and L stops any further rules from processing the request.
Using (.*) is essentially like using a wildcard. You can narrow this down by specifying which characters you want to match instead of all characters: for example, instead of (.*) you could use ([abc]*), which matches only a, b and c characters.
Also, be careful not to match other URLs unintentionally: make sure the pattern matches only the URLs being rewritten.
If you need the source reference, see the following:
https://httpd.apache.org/docs/current/rewrite/intro.html
Thanks again to http://www.beepweb.com for their detailed response.
Hope it helps others.
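A quick way to check what the pattern captures is to run an equivalent regex in Python. This is a sketch, not mod_rewrite itself: it assumes a version of the pattern with the dot escaped and an optional leading slash (for .htaccess context), and it builds the same /product/$1/ target as the answer. The rewrite() helper is hypothetical.

```python
from __future__ import annotations

import re

# Python stand-in for the RewriteRule pattern; \. makes the dot literal and
# /? tolerates the leading slash that .htaccess context strips off.
PATTERN = re.compile(r"^/?shop/(.*)-p-(.*)\.html$")


def rewrite(path: str) -> str | None:
    """Return the redirect target for path, or None if the rule does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # m.group(1) corresponds to $1, m.group(2) to $2 (the product number).
    return f"http://www.shop.com/product/{m.group(1)}/"


print(rewrite("shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html"))
print(rewrite("shop/top-p-mini-bag-p-17.html"))  # greedy (.*) stops at the LAST "-p-"
print(rewrite("shop/about.html"))                # None: no "-p-", rule is skipped
```

Because the first (.*) is greedy, a product name that itself contains "-p-" is still captured in full; replacing the second (.*) with ([0-9]+) would additionally require the number to be numeric.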

rewrite spaces to redirect to underscores when following a specific domain url string and only

How do I write a mod_rewrite rule that replaces a space (%20) with an underscore? For example, http://cityinsider.com/b/ocean%20shores_wa/mikes-seafood-ocean-shores should redirect to http://cityinsider.com/b/ocean_shores_wa/mikes-seafood-ocean-shores. The underscores are always there, but two- and three-word city names sometimes contain spaces. Only spaces in the part of the URL that follows cityinsider.com/b/ should be replaced, so the rule must not affect other URLs: it should not apply to a space immediately after the domain root (e.g. cityinsider.com/%20/appleWood), nor where the path does not start with cityinsider.com/b/ (e.g. cityinsider.com/c/%20).
Try:
RewriteEngine On
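RewriteRule ^b/([^/]*)\ ([^/]*)/(.*)$ /b/$1_$2/$3 [L,R=301]
The ^b/ prefix restricts the rule to URLs under /b/ (in .htaccess the leading slash is not part of the matched path).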
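To see how a rule of this kind behaves over repeated redirects, here is a Python model of a variant scoped to /b/. The ^b/ prefix and the /b/$1_$2/$3 target are assumptions about how to restrict it, and redirect_chain() is a hypothetical helper. Because the first group is greedy, each pass replaces the last space in the city segment, so a three-word city name needs two redirects.

```python
from __future__ import annotations

import re

# Python stand-in for: RewriteRule ^b/([^/]*)\ ([^/]*)/(.*)$ /b/$1_$2/$3
# mod_rewrite matches the decoded path, so %20 shows up as a literal space,
# and in .htaccess context the leading slash is not part of the path.
RULE = re.compile(r"^b/([^/]*) ([^/]*)/(.*)$")


def redirect_chain(path: str, max_hops: int = 10) -> list[str]:
    """Apply the rule repeatedly, as a browser would follow each 301."""
    chain = [path]
    for _ in range(max_hops):
        m = RULE.match(path)
        if m is None:
            break  # no space left in the /b/ segment: final URL reached
        path = f"b/{m.group(1)}_{m.group(2)}/{m.group(3)}"
        chain.append(path)
    return chain


print(redirect_chain("b/ocean shores_wa/mikes-seafood-ocean-shores"))
print(redirect_chain("b/new york city_ny/some-place"))
print(redirect_chain(" /appleWood"))  # not under b/, left alone
```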

How to disallow service api and multilingual urls in robots.txt

I need to disallow the following URLs:
service API: /_s/user, /_s/place, ... (all start with /_s/)
save form: /{language}/save, for example /en/save, /ru/save, ...
NOTE: most URLs have the language at the beginning (/en/event, ...); I don't want to block them.
It should be something like this (but this is not allowed by the robots.txt format):
Disallow: /_s/*
Disallow: /:lang/save
In robots.txt matching is from the left, so it matches anything that begins with /pattern.
A wildcard such as /*pattern matches any beginning, as long as it is followed by the given pattern. Therefore a trailing * is never needed (e.g. /foo* is equivalent to /foo).
So in your case you can use
Disallow: /_s/
to disallow anything which starts with /_s/ e.g. /_s/foo
Disallow: /*save
to disallow paths such as /en/save, but also /foosave or /en/save/other
You can use $ to signify "must end with"
Disallow: /*save$
to disallow paths such as /en/save or /fr/save, but not /en/save/other
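The matching rules described above can be modeled in a few lines of Python by translating each pattern into a regular expression. This is a sketch under simplifying assumptions: the helper names are hypothetical, it only handles Disallow lines, and it ignores Allow rules and the longest-match precedence Google applies when rules conflict.

```python
from __future__ import annotations

import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with Google-style * and $ into a regex."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * stand for any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_blocked(path: str, disallow: list[str]) -> bool:
    # re.match anchors at the start: robots.txt patterns match from the left.
    return any(robots_pattern_to_regex(p).match(path) for p in disallow)


rules = ["/_s/", "/*save$"]
for path in ["/_s/user", "/en/save", "/en/save/other", "/en/event"]:
    print(path, is_blocked(path, rules))
# /_s/user True, /en/save True, /en/save/other False, /en/event False
```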
You can find a bit more on robots.txt in the article "Robots.txt: 4 Things You Should Know".
I hope that will help.

Redirect 301 from a Directory to a Single File

I'm having a bit of trouble figuring out something that should be simple. I want to 301 redirect everything in a directory to one single file in a new location.
In my .htaccess, I've already tried the following...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/index.html
and this...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/
and this...
Redirect 301 /myDir http://www.mydomain.com/myNewDir
The problem is that each of these simply maps every file within /myDir/ by appending its path to the end of the destination URL.
After Googling, I saw something that said to do this...
Redirect 301 ^/myDir(.*) http://www.mydomain.com/myNewDir
But that just does the same thing... it's mapping the existing file location to the end of the URL.
It was easy finding lots of ".htaccess redirect" tutorials online but they seem to only show the obvious examples like 'one-to-one file mapping' or 'one-to-one directory mapping'. These tutorials also seem to neglect explaining the various relevant file directives and how to properly use them.
This particular hosting account is garbage and also has FrontPage extensions installed. mod_rewrite fails (it breaks the whole site), yet the Redirect 301 lines work fine. So until I can move this new (non-FrontPage) site to a more robust hosting account, I'll need to stick with the Redirect 301 one-liner.
How can I simply use a Redirect 301 to redirect everything within /myDir/ to the same single file located at /myNewDir/index.html? (I'd prefer using just /myNewDir/ if possible). Kindly explain, in detail, the file directives used in your solution.
UPDATE:
Previously accepted answer is not working.
Example:
RedirectMatch 301 /myDir1/(.*) http://mydomain.org/newpath/myDir1/index.html
...is giving a "Too many redirects occurred trying to open" error.
This is because the unanchored pattern /myDir1/(.*) matches anywhere in the URL path, so if the target URL contains /myDir1/ anywhere (not just at the start of the path), the request is redirected again and again in a nasty loop.
See my own posted answer for correct solution.
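The loop can be reproduced with a small Python model. RedirectMatch looks for its pattern anywhere in the URL path (like re.search), so the unanchored pattern also matches the path of its own target. The function name is hypothetical, and max_hops stands in for the browser's redirect limit.

```python
import re

# Path part of the redirect target http://mydomain.org/newpath/myDir1/index.html
TARGET_PATH = "/newpath/myDir1/index.html"


def count_redirects(pattern: str, start: str, max_hops: int = 10) -> int:
    """Count redirects until the pattern stops matching (capped at max_hops)."""
    regex = re.compile(pattern)
    path, hops = start, 0
    while hops < max_hops and regex.search(path):
        path = TARGET_PATH  # every match sends the client to the same target
        hops += 1
    return hops


print(count_redirects(r"/myDir1/(.*)", "/myDir1/old.html"))   # 10: hits the cap, i.e. a loop
print(count_redirects(r"^/myDir1/(.*)", "/myDir1/old.html"))  # 1: target no longer matches
```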
I found the answer within one of my old projects.
Redirect 301 is all wrong for this. I really wanted RedirectMatch 301 instead.
RedirectMatch 301 ^/myDir/(.*) http://www.example.com/myNewDir/
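The difference between the two directives can be sketched in Python. These are simplified, hypothetical models: the real Redirect also handles details such as repeated slashes and path-segment boundaries, and the real RedirectMatch would substitute $1, $2 if the target used them (this target doesn't).

```python
from __future__ import annotations

import re


def redirect_prefix(path: str, url_path: str, target: str) -> str | None:
    """Simplified model of mod_alias Redirect: prefix match, remainder appended."""
    if not path.startswith(url_path):
        return None
    return target + path[len(url_path):]


def redirect_match(path: str, pattern: str, target: str) -> str | None:
    """Simplified model of RedirectMatch: regex search, target used as given."""
    return target if re.search(pattern, path) else None


old = "/myDir/products/page.html"
print(redirect_prefix(old, "/myDir/", "http://www.mydomain.com/myNewDir/"))
# http://www.mydomain.com/myNewDir/products/page.html  (path carried over)
print(redirect_match(old, r"^/myDir/(.*)", "http://www.example.com/myNewDir/"))
# http://www.example.com/myNewDir/  (everything lands on one URL)
```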
Explanation(s):
http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch
"This directive is equivalent to Redirect, but makes use of standard
regular expressions, instead of simple prefix matching."
http://www.zytrax.com/tech/web/regex.htm
"The ^ (circumflex or caret) outside square brackets means look only at
the beginning of the target string, for example, ^Win will not find
Windows in STRING1 but ^Moz will find Mozilla."
and...
"The . (period) means any character(s) in this position, for example,
ton. will find tons, tone and tonneau but not wanton because it has no
following character."
and...
"The * (asterisk or star) matches the preceding character 0 or more
times, for example, tre* will find tree (2 times) and tread (1 time)
and trough (0 times)."
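The three quoted rules can be tried directly with Python's re module, whose syntax agrees with Apache's regular expressions for these simple cases. The user-agent string is a made-up stand-in for the tutorial's STRING1, and first_match() is a hypothetical helper. Note that . matches exactly one character, which is why ton. finds tonn in tonneau.

```python
from __future__ import annotations

import re


def first_match(pattern: str, text: str) -> str | None:
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


# Hypothetical user-agent string standing in for the tutorial's STRING1.
ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
print(first_match(r"^Win", ua))  # None: "Win" is not at the start
print(first_match(r"^Moz", ua))  # Moz

for word in ["tons", "tone", "tonneau", "wanton"]:
    print(word, first_match(r"ton.", word))  # tons, tone, tonn, None

for word in ["tree", "tread", "trough"]:
    print(word, first_match(r"tre*", word))  # tree, tre, tr
```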
Try this:
RedirectMatch 301 ^/myDir/.* http://www.mydomain.com/myNewDir/index.html
Reference: http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch.
As for the parentheses around .*: RedirectMatch uses standard regular expressions, which means you can capture matched characters and reference them in your redirect target as $1, $2, etc.
In regular expressions, * means any number of repetitions of the previous character, and . denotes any character, so the combination .* matches any number of any characters. Hence /myDir.* matches /myDir and /myDir/, and also /myDir/test.html, so /myDir.* can also be used.
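A few test paths make the difference between the slash and no-slash patterns concrete. This Python sketch uses hypothetical paths and a hypothetical matches() helper; it also shows a side effect worth knowing: without the slash, the pattern also catches unrelated directories such as /myDirectory.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """True if the regex matches somewhere in path (as RedirectMatch does)."""
    return re.search(pattern, path) is not None


for path in ["/myDir", "/myDir/", "/myDir/test.html", "/myDirectory/x.html"]:
    print(path, matches(r"^/myDir/.*", path), matches(r"^/myDir.*", path))

# Parentheses capture the matched text; RedirectMatch exposes it as $1.
m = re.search(r"^/myDir/(.*)", "/myDir/sub/test.html")
print(m.group(1))  # sub/test.html
```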

How to replace escape codes in url and redirect it using htaccess?

I need to redirect multiple urls from this format:
http://site.com/gallery.php%3Fpage%3D12
(the 12 at the end is the page number, i have many links like this with different numbers at the end)
to this:
http://site.com/gallery.php?page=12
How do I write a rule in .htaccess that replaces those characters in all the URLs and redirects them to the correct URLs?
By default, URLs in mod_rewrite are decoded (unescaped), so there is no need to escape (encode) them!
As mentioned by "Death", there is no need to replace the characters; this simple rule did the trick:
RewriteRule ^gallery\.php\?page\=(.*) http://site.com/gallery.php?page=$1 [R=301,L]
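Why this works, and why it does not loop, can be sketched in Python. The model assumes .htaccess context (no leading slash) and uses urllib.parse to mimic mod_rewrite's decoding; redirect_for() is a hypothetical helper, not part of Apache, and the unnecessary \= escape from the rule is omitted.

```python
from __future__ import annotations

import re
from urllib.parse import unquote, urlsplit

RULE = re.compile(r"^gallery\.php\?page=(.*)")


def redirect_for(raw_url: str) -> str | None:
    """Model how the rule sees a request: query string split off, path decoded."""
    # A real "?" starts the query string, which RewriteRule never sees;
    # an encoded %3F stays in the path and is decoded to "?" before matching.
    path = unquote(urlsplit(raw_url).path).lstrip("/")
    m = RULE.match(path)
    return f"http://site.com/gallery.php?page={m.group(1)}" if m else None


print(redirect_for("/gallery.php%3Fpage%3D12"))  # http://site.com/gallery.php?page=12
print(redirect_for("/gallery.php?page=12"))      # None, so no redirect loop
```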