Apache httpd server redirect url with anchor tag(fragment id) in it [duplicate] - apache

I'm overhauling a website that someone else built for my organization. It was originally set up with "not so great" anchor links which included spaces. I have replaced those anchors with new ones that will work better.
Example:
One of the old anchors looked like this /course/#To Have which browsers would luckily convert to /course/#To%20Have. I changed that anchor to this: /course/#to-have.
I'm now wanting to make sure that any anchors that may have been shared on social media or that could be linked to from other websites still work; I was planning on doing this via redirect in the .htaccess file, such as this one:
Redirect 301 /course/#To%20Have /course/#to-have
After some research I've found that this is not possible due to the # in the URLs. And I also have not seen examples where an anchor was redirected to another anchor.
Is this possible?

As mentioned in my comment, this is not possible with .htaccess.
Reason being: the hash part (known as the fragment) is not actually sent to the server, and so Apache would not be able to pick it up. Servers may only pick up everything before that, which is described in the Syntax section of this article.
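To make that concrete, here is a minimal sketch using the standard URL class (available in browsers and in Node.js). The host example.org and the helper name splitRequest are placeholders invented for illustration; the point is only that the request line is built from the path and query, while the fragment stays in the browser.

```javascript
// Split a URL into the part that goes into the HTTP request and the part
// the browser keeps to itself. example.org is a placeholder host.
function splitRequest(href) {
  const url = new URL(href);
  return {
    sentToServer: url.pathname + url.search, // what Apache can see
    keptInBrowser: url.hash                  // never part of the request
  };
}

const parts = splitRequest('https://example.org/course/#To%20Have');
console.log(parts.sentToServer);  // "/course/"
console.log(parts.keptInBrowser); // "#To%20Have"
```

Because Apache only ever receives /course/, a Redirect or RewriteRule has nothing to match the old anchor against.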
As an alternative, I would recommend using JavaScript to convert the fragment before scrolling to its location. You can do that by reading [window.]location.hash (the part in square brackets is optional, since location is also available in the global scope) if it exists, as shown below:
if (window.location.hash) {
  // Function to 'slugify' the fragment
  // @see https://gist.github.com/mathewbyrne/1280286#gistcomment-1606270
  var slugify = function(input) {
    return input.toString().toLowerCase().trim()
      .replace(/\s+/g, '-')     // Replace spaces with -
      .replace(/&/g, '-and-')   // Replace & with '-and-'
      .replace(/[^\w\-]+/g, '') // Remove all non-word chars
      .replace(/\-\-+/g, '-');  // Replace multiple - with single -
  };
  // Get the hash without the leading '#', decode it (browsers report
  // '#To Have' as '#To%20Have'), and 'slugify' it
  var hash = slugify(decodeURIComponent(window.location.hash.slice(1)));
  // Go to the new hash by setting it
  window.location.hash = '#' + hash;
}

Related

Mod_Rewrite how to ignore slashes that don't belong in url string

I'm trying to set up canonical links in our forum, and I need a rewrite rule that ignores slashes that don't belong in the URL.
The proper URL would look like:
http://www.truckingtruth.com/truckers-forum/Topic-20315/Page-1/speak-to-recruiter
For this I'm using the following mod_rewrite rule to pass the 'topic', 'page', and 'subjectString' variables:
^Topic-(.*)/Page-(.*)/(.*)$ index.html?topic=$1&page=$2&subjectString=$3
But sometimes improper links to our site or improper links in a comment will add slashes to the URL that don't belong there and it throws off the rule. Example:
http://www.truckingtruth.com/truckers-forum/Topic-1652/Page-1/www.truckingtruth.com/free_truck_driving_schools/swift/how-to-use-the-qualcomm
When that happens the variables being passed are:
topic = "1652"
page = "1/www.truckingtruth.com/free_truck_driving_schools/swift"
subjectString = "how-to-use-the-qualcomm"
What I want it to do is pass:
topic = "1652"
page = "1"
subjectString = "www.truckingtruth.com/free_truck_driving_schools/swift/how-to-use-the-qualcomm"
How can I create a rewrite rule that will pass everything after "Page-1" as the subjectString even if there are slashes in it?
Since the topic is always an integer, for your first capturing group you can use \d, which matches any decimal digit (equivalent to [0-9]).
For page, just make sure not to include any slashes, [^/] will take care of that.
The rest should then all go to third capturing group, so the resulting regex will be:
^Topic-(\d*)/Page-([^/]*)/(.*)$
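As a quick check, the pattern can be tried outside Apache. The sketch below runs it in JavaScript against the two URLs from the question; for this particular pattern JavaScript and mod_rewrite's PCRE behave the same way. It assumes the rule sees the path relative to the truckers-forum directory, and parseForumPath is a hypothetical helper name.

```javascript
// Same pattern as in the answer: digits for the topic, no slashes in the page,
// and everything that remains goes to subjectString.
const rule = /^Topic-(\d*)\/Page-([^/]*)\/(.*)$/;

function parseForumPath(path) {
  const m = rule.exec(path);
  if (m === null) return null;
  return { topic: m[1], page: m[2], subjectString: m[3] };
}

const good = parseForumPath('Topic-20315/Page-1/speak-to-recruiter');
const messy = parseForumPath(
  'Topic-1652/Page-1/www.truckingtruth.com/free_truck_driving_schools/swift/how-to-use-the-qualcomm'
);
console.log(good);
console.log(messy);
```

For the second URL, page comes out as "1" and the extra slashes all end up in subjectString, which is the behaviour the question asks for.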

How to rewrite rules NginX in this example?

I am creating an API to feed some apps.
So the app could call these possible URLs to get information from the database;
mysite.com/api/v1/get/menus/list_tblname1.json.php
mysite.com/api/v1/get/menus/list_tblname1.json.php?type=arr
mysite.com/api/v1/get/menus/list_tblname2.json.php
mysite.com/api/v1/get/menus/list_tblname2.json.php?type=arr
In PHP I already have the code that grabs the tblname from the URL and gives me back all the table content. It works well (it is not the final version).
But now I find myself copying and pasting the same code for each page where the URL points to. Here is the code:
<?php
header('Content-Type:application/json');
include_once '../../../../class/db.php';
$verb = $_SERVER['REQUEST_METHOD'];
$filePath = $_SERVER['SCRIPT_NAME'];
$split1 = explode("/", $filePath);
preg_match("/(?<=_)[^.]+/", $split1[5], $matches);
$tableName = $matches[0];
if ($verb == "GET") {
    header("HTTP/1.1 200 ok");
    if (isset($_GET['type']) && $_GET['type'] == "arr") {
        echo db::get_list($tableName, 'arr'); // Reply ARRAY
    } else {
        echo db::get_list($tableName); // Reply JSON
    }
} else {
    die("Nothing for you at this page!");
}
I mean, I have the same code inside each of these pages:
list_tblname1.json.php
list_tblname2.json.php
I am not sure how to solve this, but I think this is a case for rewrite rules.
One possible solution is to create a single page, called returncontent.php for example, and add server rules that point to it whenever certain pages are requested, passing the $tableName parameter along. I think I should give the regex to my server and read $tableName with $_GET[] inside returncontent.php.
I am not sure about it.
I am using NginX.
How to implement it in this scenario?
As a rule, it's bad practice to parse a URI in NginX and pass the result downstream.
Rather, call one script and pass the file name as a query parameter: mysite.com/api/v1/get/menus/returncontent.php?file=list_tblname2.json
No changes to NginX are needed. Parse the query parameter (file) in PHP.
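The parsing step itself is small. The sketch below shows the logic in JavaScript purely for illustration; the same regular expression is the one the question already uses with preg_match, so it carries over to PHP directly. The allow-list, the table names in it, and the helper name tableFromRequest are assumptions, added because the table name now comes straight from user input.

```javascript
// Hypothetical allow-list; the real table names are not given in the question.
const ALLOWED_TABLES = new Set(['tblname1', 'tblname2']);

function tableFromRequest(href) {
  const url = new URL(href);
  const file = url.searchParams.get('file') || '';
  // Same pattern as the question's preg_match: text after "_" up to the first "."
  const m = /(?<=_)[^.]+/.exec(file);
  if (m === null || !ALLOWED_TABLES.has(m[0])) return null;
  return { table: m[0], asArray: url.searchParams.get('type') === 'arr' };
}

const req = tableFromRequest(
  'https://mysite.com/api/v1/get/menus/returncontent.php?file=list_tblname2.json&type=arr'
);
console.log(req);
```

Requests naming a table outside the allow-list return null, which the single script can turn into an error response.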

Completely custom path with YII?

I have various products with their own set paths. Eg:
electronics/mp3-players/sony-hg122
fitness/devices/gymboss
I want to be able to access URLs in this format. For example:
http://www.mysite.com/fitness/devices/gymboss
http://www.mysite.com/electronics/mp3-players/sony-hg122
My strategy was to override the "init" function of the SiteController in order to catch the paths and then direct it to my own implementation of a render function. However, this doesn't allow me to catch the path.
Am I going about it the wrong way? What would be the correct strategy to do this?
** EDIT **
I figure I have to make use of the URL manager. But how do I dynamically add path formats if they are all custom in a database?
Eskimo's setup is a good solid approach for most Yii systems. However, for yours, I would suggest creating a custom UrlRule to query your database:
http://www.yiiframework.com/doc/guide/1.1/en/topics.url#using-custom-url-rule-classes
Note: the URL rules are parsed on every single Yii request, so be careful in there. If you aren't efficient, you can rapidly slow down your site. By default rules are cached (if you have a cache setup), but I don't know if that applies to dynamic DB rules (I would think not).
In your URL manager (protected/config/main.php), set urlFormat to path, and optionally set showScriptName to false (this hides the index.php part of the URL):
'urlManager' => array(
'urlFormat' => 'path',
'showScriptName'=>false,
Next, in your rules, you could setup something like:
catalogue/<category_url:.+>/<product_url:.+> => product/view,
What this does is route any request with a structure like catalogue/electronics/ipods to actionView in ProductController. You can then access the category_url and product_url portions of the URL like so:
$_GET['category_url'];
$_GET['product_url'];
The rule works like this: any URL that starts with the word catalogue (directly after your domain name), followed by a category_url part and a product_url part separated by a slash, is directed to that controller/action.
You will notice that in my example I am preceding the category and product with the word catalogue. Obviously you could replace this with whatever you prefer, or leave it out altogether. The reason I have put it in is this: consider the following URL:
http://mywebsite.com/site/about
If you left out the 'catalogue' portion of the URL and defined your rule only as:
<category_url:.+>/<product_url:.+> => product/view,
the URL Manager would see the site portion of the URL as the category_url value, and the about portion as the product_url. To prevent this you can either keep the catalogue portion of the URL, or define rules for the non-catalogue pages (i.e. define a rule for site/about).
Rules are interpreted top to bottom, and only the first rule is matched. Obviously you can add as many rules as you need for as many different URL structures as you need.
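The first-match behaviour can be modelled with plain regular expressions. The sketch below is not Yii code; it is a hypothetical JavaScript model in which each rule is a pattern plus a route, and the resolve helper returns the first rule that matches. It shows why a specific rule such as site/about has to sit above the catch-all product rule.

```javascript
// Each rule is [pattern, route]; the patterns mimic the rules from the answer.
const rules = [
  [/^site\/about$/, 'site/about'],
  [/^(?<category_url>.+)\/(?<product_url>.+)$/, 'product/view'],
];

// Walk the rules top to bottom and stop at the first match.
function resolve(path) {
  for (const [pattern, route] of rules) {
    const m = pattern.exec(path);
    if (m !== null) {
      return { route, params: { ...(m.groups || {}) } };
    }
  }
  return null;
}

console.log(resolve('site/about'));              // route: "site/about"
console.log(resolve('fitness/devices/gymboss')); // route: "product/view"
```

In this model the greedy .+ in category_url swallows everything up to the last slash, so fitness/devices/gymboss yields category_url "fitness/devices" and product_url "gymboss". If the two rules were swapped, site/about would be captured by the product rule instead.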
I hope this gets you on the right path, feel free to comment with any questions or clarifications you need

How to Detect and Redirect from URL with Anchor Using mod_rewrite/htaccess?

I've seen a number of examples of the opposite, but I'm looking to go from an anchor/hash URL to a non-anchor URL, like so:
From: http://old.swfaddress-site.com/#/page/name
To: http://new.html-site.com/page/name
None of the examples at http://karoshiethos.com/2008/07/25/handling-urlencoded-swfaddress-links-with-mod_rewrite/ have worked for me. It sounds like REQUEST_URI should have the /#/stuff in it, but neither I nor my Apache (2.0.54) can see it.
Any ideas, past experiences or successes?
Anything after the # is a fragment, and will not be sent to the webserver. You cannot capture it at any point there; you'll have to use a client-side approach to capture it.
@RobRuchte: would it not be better to use window.location.hash, with a replace instead of a regular expression?
var redirectFragment = window.location.hash.replace(/^#/, '');
if ('' !== redirectFragment) {
  window.location = 'http://new.html-site.com' + redirectFragment;
}
I'm the author of the post you linked to. Wrikken is correct, the content after the named anchor is not sent to the server unless something has mangled the URL along the way. On the client side, you need some JavaScript like this in your landing page to redirect the swfaddress links to corresponding URLs on another domain:
var re = new RegExp('#(.*)');
var redirectFragment = re.exec(document.location.toString());
if (redirectFragment != null) {
  document.location = 'http://new.html-site.com' + redirectFragment[1];
}
I used a modified version of the answer by @m14t. This works for redirects that look like http://example.com/path/to/page#fragment --> http://example.com/path/to/page/fragment. Notice that I also concatenated window.location.pathname for the redirect; otherwise I would not get the full path. If the new file path is completely different from the old one, this will not work.
var redirectFragment = window.location.hash.replace(/#/, '/');
if ('' !== redirectFragment) {
  window.location = 'http://example.com' + window.location.pathname + redirectFragment;
}
In my case, I needed to build fragmented links into individual pages, which is part of what is commonly done to improve a website's SEO.
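The string handling in the snippets above can be pulled into a small pure function, which makes it easy to try outside a browser. This is a sketch under assumptions: fragmentToPath is a hypothetical name, the redirect stays on the same origin, and a trailing slash on the path or a leading slash in the fragment is collapsed so the result never contains "//".

```javascript
// Build the redirect target from a full URL; returns null when there is no fragment.
function fragmentToPath(href) {
  const url = new URL(href);
  const fragment = url.hash.replace(/^#\/?/, '');  // drop "#" and an optional leading "/"
  if (fragment === '') return null;
  const base = url.pathname.replace(/\/+$/, '');   // drop trailing slashes
  return url.origin + base + '/' + fragment;
}

console.log(fragmentToPath('http://example.com/path/to/page#fragment'));
// "http://example.com/path/to/page/fragment"
console.log(fragmentToPath('http://old.swfaddress-site.com/#/page/name'));
// "http://old.swfaddress-site.com/page/name"
```

In a page, the result could be passed to window.location.replace(...) when it is not null, with window.location.href as the input.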

Are colons allowed in URLs?

I thought using colons in URIs was "illegal". Then I saw that vimeo.com is using URIs like http://www.vimeo.com/tag:sample.
What do you feel about the usage of colons in URIs?
How do I make my Apache server work with the "colon" syntax because now it's throwing the "Access forbidden!" error when there is a colon in the first segment of the URI?
Colons are allowed in the URI path. But you need to be careful when writing relative URI paths with a colon since it is not allowed when used like this:
<a href="tag:sample">
In this case tag would be interpreted as the URI’s scheme. Instead you need to write it like this:
<a href="./tag:sample">
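The difference can be observed with the standard URL class, which resolves a reference against a base the way a browser resolves an href. This is a small sketch; the base URL simply reuses the Vimeo host from the question.

```javascript
const base = 'http://www.vimeo.com/';

// Without "./", everything before the colon is parsed as a scheme.
const bare = new URL('tag:sample', base);
// With "./", the reference is a relative path and the colon is just a path character.
const dotted = new URL('./tag:sample', base);

console.log(bare.protocol, bare.href); // "tag:" "tag:sample"
console.log(dotted.href);              // "http://www.vimeo.com/tag:sample"
```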
Are colons allowed in URLs?
Yes, unless it's in the first path segment of a relative-path reference
So for example you can have a URL like this:
https://en.wikipedia.org/wiki/Template:Welcome
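And you can use it normally as an absolute URL or in some relative variants:
<a href="https://en.wikipedia.org/wiki/Template:Welcome">Welcome Template</a>
<a href="/wiki/Template:Welcome">Welcome Template</a>
<a href="wiki/Template:Welcome">Welcome Template</a>
But this would be invalid:
<a href="Template:Welcome">Welcome Template</a>
because the "Template" here would be mistaken for the protocol scheme.
You would have to use:
<a href="./Template:Welcome">Welcome Template</a>
to use a relative link from a page on the same level in the hierarchy.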
The spec
See RFC 3986, Section 3.3:
https://www.rfc-editor.org/rfc/rfc3986#section-3.3
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules. [emphasis added]
Example URL that uses a colon:
https://en.wikipedia.org/wiki/Template:Welcome
Also note the difference between Apache on Linux and Windows. Apache on Windows somehow doesn't allow colons to be used in the first part of the URL. Linux has no problem with this, however.