Mod_Rewrite how to ignore slashes that don't belong in url string - apache

I'm trying to set up canonical links in our forum and I need to come up with a rewrite rule that ignores slashes that don't belong in the URL.
The proper URL would look like:
http://www.truckingtruth.com/truckers-forum/Topic-20315/Page-1/speak-to-recruiter
For this I'm using the following mod_rewrite rule to pass the 'topic', 'page', and 'subjectString' variables:
^Topic-(.*)/Page-(.*)/(.*)$ index.html?topic=$1&page=$2&subjectString=$3
But sometimes improper links to our site or improper links in a comment will add slashes to the URL that don't belong there and it throws off the rule. Example:
http://www.truckingtruth.com/truckers-forum/Topic-1652/Page-1/www.truckingtruth.com/free_truck_driving_schools/swift/how-to-use-the-qualcomm
When that happens the variables being passed are:
topic = "1652"
page = "1/www.truckingtruth.com/free_truck_driving_schools/swift"
subjectString = "how-to-use-the-qualcomm"
What I want it to do is pass:
topic = "1652"
page = "1"
subjectString = "www.truckingtruth.com/free_truck_driving_schools/swift/how-to-use-the-qualcomm"
How can I create a rewrite rule that will pass everything after "Page-1" as the subjectString even if there are slashes in it?

Since the topic is always an integer, for your first capturing group you can use \d, which matches any decimal digit (equivalent to [0-9]).
For the page, just make sure not to include any slashes; [^/] will take care of that.
Everything else then goes to the third capturing group, so the resulting regex will be:
^Topic-(\d*)/Page-([^/]*)/(.*)$
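To see why the tighter character classes help, here is a small Python sketch that runs both patterns against the problem path from the question. Python's re module is only standing in for Apache's regex engine here, and the names OLD_RULE, NEW_RULE and BAD_PATH are made up for the illustration.

```python
import re

# The original rule from the question and the tightened rule from the answer.
OLD_RULE = re.compile(r"^Topic-(.*)/Page-(.*)/(.*)$")
NEW_RULE = re.compile(r"^Topic-(\d*)/Page-([^/]*)/(.*)$")

# Path portion of the problem URL (relative to /truckers-forum/).
BAD_PATH = (
    "Topic-1652/Page-1/www.truckingtruth.com/"
    "free_truck_driving_schools/swift/how-to-use-the-qualcomm"
)

# The greedy (.*) for the page swallows everything up to the last slash.
old_topic, old_page, old_subject = OLD_RULE.match(BAD_PATH).groups()

# [^/]* stops the page at the first slash, so the rest lands in the subject.
new_topic, new_page, new_subject = NEW_RULE.match(BAD_PATH).groups()

print(old_page)     # 1/www.truckingtruth.com/free_truck_driving_schools/swift
print(new_page)     # 1
print(new_subject)
```

The greedy .* in the second group is what pushed the extra path segments into page; excluding the slash from that group is enough to fix it.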

Related

How do I enable query string in Cloudflare page rules?

I want to forward this URL
https://demo.example.com/page1?v=105
to
https://www.example.com/page1
It's not working if I set so:
But if I remove the ?v= part, it works:
Is there any way to include the ?v= part in this page rule?
The rule you mentioned works only for that exact URL: https://demo.example.com/page1?v=105. If the URL has an additional query parameter or even one extra character, e.g. https://demo.example.com/page1?v=1050, it won't match the rule.
If you need to use complicated URL matches that require specific query strings, you might need to use bulk redirects or a Cloudflare worker.

Apache regex -301 redirects to eradicate duplicates in url path

We are using a CMS that produces URLs of the format www.domain.com/home/help/contact/contact. Here the first occurrence of contact is the directory and the second occurrence is the HTML page itself. These URLs are causing SEO issues.
We have implemented canonical tags, but the business wants to make sure these duplicates don't show up in either the search engines or Google Analytics, and has asked us to implement a 301 solution on our web server.
We have a regex that finds these matches, but I also need the part of the URL before the match. The regex we have is .*?([\w]+)\/\1+ and it returns contact in /home/help/contact/contact. How can I get the /home/help/ path as well, so I can redirect to the right page? Can someone help with this, please? I am a beginner when it comes to regex.
Since you're able to get contact using a matching group, enclose everything before that inside a matching group as well:
(.*?)(/[\w]+)\2+
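I have put the / inside the second matching group too, so that you won't get a false positive for a path such as
/home/some/app/page
Without the slash in the group, as in (.*?)([\w]+)/\2+, the single character p at the end of app and at the start of page would be found as a repetition, and \1 would be /home/some/ap.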
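The difference between the two variants can be checked with a short Python sketch. Python's re module is used only as a stand-in for the regex engine on the web server, and the names WITH_SLASH and WITHOUT_SLASH are invented for this example.

```python
import re

# Pattern from the answer: the slash is part of the repeated group.
WITH_SLASH = re.compile(r"(.*?)(/[\w]+)\2+")
# Variant with the slash outside the group, to show the false positive.
WITHOUT_SLASH = re.compile(r"(.*?)([\w]+)/\2+")

dup = WITH_SLASH.match("/home/help/contact/contact")
prefix, segment = dup.group(1), dup.group(2)
# A 301 target could then be built as prefix + segment.
redirect_target = prefix + segment
print(prefix, segment, redirect_target)

# No repeated segment here, so the answer's pattern does not match at all.
no_dup = WITH_SLASH.match("/home/some/app/page")
print(no_dup)

# The slash-less variant wrongly treats "p/p" in "app/page" as a repetition.
false_positive = WITHOUT_SLASH.match("/home/some/app/page")
print(false_positive.groups())
```

Whether the redirect target should keep or drop the trailing duplicate depends on how the CMS serves the page, so the concatenation above is only one possible choice.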

mod_rewrite for 1 defined and other arbitrary parameters

I am struggling with mod_rewrite. Currently I have this url:
http://localhost/products.php?page=3&sort=r90&range=25-50
The page, sort and range parameters are not mandatory. What I would like is for the following URLs to work:
For page 1:
http://localhost/products --> http://localhost/products.php
For page 3:
http://localhost/products/3/ --> http://localhost/products.php?page=3
For page 5 with n parameters:
http://localhost/products/5/?sort=r90&range=25-50 --> http://localhost/products.php?page=5&sort=r90&range=25-50
I currently have this rule:
^products/([0-9]+)$ http://localhost/products.php?page=$1
which correctly works for:
http://localhost/products/3/ and http://localhost/products/
However when I add parameters e.g. http://localhost/products/2/?sort=r90&range=25-50
It redirects to http://localhost/products.php?sort=r90&range=25-50 (without the page parameter)
Any thoughts?
Use the QSA flag. Modify your current rule to:
RewriteRule ^products/([0-9]+)/?$ /products.php?page=$1 [NC,QSA,L]
From Apache's docs on mod_rewrite:
Modifying the Query String
By default, the query string is passed through unchanged. You can,
however, create URLs in the substitution string containing a query
string part. Simply use a question mark inside the substitution string
to indicate that the following text should be re-injected into the
query string. When you want to erase an existing query string, end the
substitution string with just a question mark. To combine new and old
query strings, use the [QSA] flag.
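As a rough illustration of what the quoted passage describes, the following Python sketch imitates how the rule above combines the new page parameter with the incoming query string. It is a simplified model rather than Apache's implementation, and rewrite_products with its qsa switch is a hypothetical helper written for this example.

```python
import re

# Same pattern as the RewriteRule above; NC is modelled with IGNORECASE.
PRODUCTS = re.compile(r"^products/([0-9]+)/?$", re.IGNORECASE)

def rewrite_products(path, query, qsa=True):
    """Return the rewritten URL, or None if the rule does not match."""
    m = PRODUCTS.match(path)
    if m is None:
        return None
    new_query = f"page={m.group(1)}"
    # With QSA the original query string is appended after the new one;
    # without it, a substitution containing "?" replaces the original query.
    if qsa and query:
        new_query = f"{new_query}&{query}"
    return f"/products.php?{new_query}"

print(rewrite_products("products/5/", "sort=r90&range=25-50"))
print(rewrite_products("products/5/", "sort=r90&range=25-50", qsa=False))
```

The path is passed without a leading slash, which assumes the rule lives in a per-directory context such as .htaccess.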

What's the best way to compare this in VB.net

I would like to ask your opinion on the best way to compare URLs. Let's say there are several possible formats for a URL. I've listed them below.
http://domain.com
https://domain.com
http://www.domain.com
https://www.domain.com
www.domain.com
domain.com
And the same ones with a trailing slash:
http://domain.com/
https://domain.com/
http://www.domain.com/
https://www.domain.com/
www.domain.com/
domain.com/
What would be the best way to easily check whether any of these URLs matches an item in a listbox? I'm currently doing something else, which constructs 3 different URLs, but the code is too messy and I'm looking for something a bit cleaner.
I'm looking for something like the code below.
But how well would this actually compare the two URLs?
For Each result As String In lb_results.Items
If String.Compare(result, "urls to compare") Then
End If
Next
If you're simply comparing "domain.com" to its variants and need to strip the rest out...
Quickly normalize the string so that the domain name and extension sit between periods.
MessyURL = Replace(MessyURL, "/", ".").Trim
Get rid of the pesky trailing slash, which is now a period, if it's there.
If Mid(MessyURL, MessyURL.Length, 1) = "." Then
MessyURL = Mid(MessyURL, 1, MessyURL.Length - 1)
End If
Put the name, a dot, and the com (or whatever) back together.
Dim TestName As String
TestName = MessyURL.Split(".").ElementAt(MessyURL.Split(".").Count - 2) &
"." & MessyURL.Split(".").ElementAt(MessyURL.Split(".").Count - 1)
And voilà, a nice 'domain.com' test name to compare against a list, and/or insert into the list if it's not there...
If MyListOfUrLs.Items.IndexOf(TestName) = -1 Then MyListOfUrLs.Items.Add(TestName)
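The same normalize-then-compare idea can be sketched in Python with the standard URL parser instead of string replacement. This is an alternative illustration, not a translation of the VB code above: normalize_host is a hypothetical helper, and dropping a leading "www." is an assumption about what should count as the same site.

```python
from urllib.parse import urlsplit

def normalize_host(url):
    """Reduce a URL variant to a bare lower-case host without 'www.'."""
    candidate = url.strip().lower()
    # urlsplit only recognises a host after "//", so add it when no scheme is given.
    if "://" not in candidate:
        candidate = "//" + candidate
    host = urlsplit(candidate).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host

variants = [
    "http://domain.com", "https://domain.com",
    "http://www.domain.com", "https://www.domain.com",
    "www.domain.com", "domain.com",
    "http://domain.com/", "https://domain.com/",
    "http://www.domain.com/", "https://www.domain.com/",
    "www.domain.com/", "domain.com/",
]
normalized = {normalize_host(v) for v in variants}
print(normalized)
```

Comparing normalized hosts avoids building several candidate URLs per item; every entry in the list would be normalized once and then compared with plain equality.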
If you just want to check whether the URLs belong to the same site, use:
For Each result As String In lb_results.Items
If result.Contains("domain") = True Then
MsgBox("They have the same URL")
End If
Next

Apache rule to change / (slash) to _ (underscore)

I've set up an Apache HTTP server with VirtualHosts in front of a proprietary web server in the back. The backend server can only have one (1) level in its ID paths so the following public URLs:
http://public-server/path1/path2/path3?querystring-parameters
should be converted for the backend to:
http://internal-server/path1/path2/page/<path1>_<path2>_<path3>?querystring-parameters
Notice that there can be any number of path1, path2, path3, path4, .... and they should all (no matter if only 1 exists or multiple) be concatenated with an underscore. Also notice that the querystring-parameters CAN contain '?', '/' and '_' so the rule should not alter the querystring in any way.
I've tried searching for solutions to this but can't figure out how to overcome the problem. Any suggestions?
If you can come up with some maximum number of possible paths, you can do something to this effect:
# This will work for up to 5 paths
RewriteRule /([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)/?([^/]*) http://internal-server%{REQUEST_URI}$1_$2_$3_$4_$5 [L,QSA]
The /?([^/]*) part can be added to the end as many times as you need, along with the corresponding groups (_$6 ..) in the rewritten URL.
Unfortunately, there is no way to accept a completely unknown number of paths and at the same time use all of them in the rewritten URL. Also, the [QSA] flag will attach your query string to the forwarded URL, untouched.
Hope this helps.
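One detail of the fixed-size pattern is worth checking before relying on it: groups that match nothing still contribute their underscore. The Python sketch below reproduces the five-group pattern with Python's re module as a stand-in for Apache's regex engine; the function names are invented here, and join_nonempty_segments only shows the intended mapping, not something mod_rewrite can do in a single rule.

```python
import re

# Same pattern as the RewriteRule above (up to 5 path segments).
FIVE_SEGMENTS = re.compile(r"/([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)")

def join_all_groups(path):
    """Mimic $1_$2_$3_$4_$5: empty groups still leave their underscores."""
    match = FIVE_SEGMENTS.search(path)
    return "_".join(match.groups())

def join_nonempty_segments(path):
    """Intended result: any number of segments, no stray underscores."""
    return "_".join(segment for segment in path.split("/") if segment)

print(join_all_groups("/path1/path2/path3"))         # path1_path2_path3__
print(join_all_groups("/a/b/c/d/e"))                 # a_b_c_d_e
print(join_nonempty_segments("/path1/path2/path3"))  # path1_path2_path3
```

If the trailing underscores matter to the backend, one option is a separate rule per segment count rather than a single rule with optional groups.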