In Express router, what is the best way to specially handle "random" urls that aren't handled by anything else? - express

Say I was making a URL shortener service, and I want it to be able to make urls like [domain]/xf6B2sT. But I also want to be able to have "normal looking urls," whether the pages are static or dynamic, and if a normal page is routed, it won't continue to look for ones of this compact format.

It would be best if you had an algorithmic way to tell whether a URL was a shortened URL or not without looking it up in your database and without comparing to all the regular site URLs. That algorithm just has to be something that allows you to examine a URL and immediately determine whether it's a shortened URL or not. If not, you send it to a router for your site URLs and if it doesn't match there you return a 404. If it does match the format for a shortened URL, then you look it up in the database and go from there.
The algorithm could be whatever you want. It could be that all site URLs have one level of path: http://yourdomain.com/site/home or it could be that all shortened URLs start with some magic character like an x that no site URLs will ever start with. There's an infinite number of possible algorithms you could invent. The point is you need to be able to quick look at a URL with some Javascript in your middleware and determine which it is without looking up anything in a database.

Related

Is there a quick way to detect redirections?

I am migrating a website and it has many redirections. I would like to generate a list in which I can see all redirects, target and source.
I tried using Cyotek WebCopy but it seems to be unable to give the data I need. Is there a crawling method to do that? Or probably this can be accessed in Apache logs?
Of course you can do it by crawling the website, but I advise against it in this specific situation, because there is an easier solution.
You use Apache, so you are (probably) working with HTTP/HTTPS protocol. You could refer to HTTP referrer, if you use PHP, then you can reach the previous page via $_SERVER['HTTP_REFERER']. So, you will need to do the following:
figure out a way to store previous-next page pairs
at the start of each request store such a pair, knowing what the current URL is and what the previous was
maybe you will need to group your URLs and do some aggregation
load the output somewhere and analyze

How to direct multiple clean URL paths to a single page?

(Hi! This is my first time asking a question on Stack Overflow after years of finding answers here... Thanks!)
I have a dynamic page, and I'd like to have fixed URLs that point to different states of that page. So, for example: "www.mypage.co"(/index.php) is the base page, and it rearranges its content based on user choices. I'd then like to be able to point to "www.mypage.co/contentA" or "www.mypage.co/contentB" in order to automatically load base the page at "www.mypage.co" with the desired content.
At heart the problem is an aesthetic one. I know I could simply write www.mypage.co/index.html?state=contentA to reach the desired end, but I want to keep the URL simple and readable (ie, clean). I also, due to limitations in my hosting relationship, would most appreciate a solution that is server-independent (across LAM[PHP] stacks, at least), if possible.
Also, if I just have incorrect assumptions about how to implement clean URLs, I'd appreciate direction to a good, comprehensive explanation. I can't seem to find one...
You could use a htaccess file to redirect all requests to one location and then from there determine what you want to return to the client. Look over the htaccess/dispatch system that Tonic uses.
If you use Apache, you can use mod_rewrite. I have a rule like this where multiple restful urls all go to the same page, using regex and moving parts of the old url into parameters for the new url:
RewriteRule ^/testapp/(name|number|rn|sid|unii|inchikey|formula)(/(startswith))?/?(.*) /testapp/ProxyServlet?objectHandle=Search&actionHandle=drillIn&searchtype=$1&searchterm=$4&startswith=$3 [NC,PT]
That particular regex accepts urls like
testapp/name
testapp/name/zuchini
testapp/name/startswith/zuchini
and forwards them to the same page.
I also use UrlRewriteFilter for Tomcat, but as you mentioned PHP, that doesn't seem that it would be useful.

How to use regular urls without the hash symbol in spine.js?

I'm trying to achieve urls in the form of http://localhost:9294/users instead of http://localhost:9294/#/users
This seems possible according to the documentation but I haven't been able to get this working for "bookmarkable" urls.
To clarify, browsing directly to http://localhost:9294/users gives a 404 "Not found: /users"
You can turn on HTML5 History support in Spine like this:
Spine.Route.setup(history: true)
By passing the history: true argument to Spine.Route.setup() that will enable the fancy URLs without hash.
The documentation for this is actually buried a bit, but it's here (second to last section): http://spinejs.com/docs/routing
EDIT:
In order to have urls that can be navigated to directly, you will have to do this "server" side. For example, with Rails, you would have to build a way to take the parameter of the url (in this case "/users"), and pass it to Spine accordingly. Here is an excerpt from the Spine docs:
However, there are some things you need to be aware of when using the
History API. Firstly, every URL you send to navigate() needs to have a
real HTML representation. Although the browser won't request the new
URL at that point, it will be requested if the page is subsequently
reloaded. In other words you can't make up arbitrary URLs, like you
can with hash fragments; every URL passed to the API needs to exist.
One way of implementing this is with server side support.
When browsers request a URL (expecting a HTML response) you first make
sure on server-side that the endpoint exists and is valid. Then you
can just serve up the main application, which will read the URL,
invoking the appropriate routes. For example, let's say your user
navigates to http://example.com/users/1. On the server-side, you check
that the URL /users/1 is valid, and that the User record with an ID of
1 exists. Then you can go ahead and just serve up the JavaScript
application.
The caveat to this approach is that it doesn't give search engine
crawlers any real content. If you want your application to be
crawl-able, you'll have to detect crawler bot requests, and serve them
a 'parallel universe of content'. That is beyond the scope of this
documentation though.
It's definitely a good bit of effort to get this working properly, but it CAN be done. It's not possible to give you a specific answer without knowing the stack you're working with.
I used the following rewrites as explained in this article.
http://www.josscrowcroft.com/2012/code/htaccess-for-html5-history-pushstate-url-routing/

Hash character in URLs (accessing and redirecting in Apache)

It looks as though this question has been asked in part by some others, but I can't find the answer I'm looking for specifically, so I thought I'd pose my particular scenario in case anyone is able to help.
We have an old website (developed externally by a third party) that is due to be retired and replaced by a new site designed in house. For reasons best known to themselves, the developers of the old site used the hash character as part of the URL for the old site (www.mysite.com/#/my-content-stuff). To assist with the transition and help with SEO I need to set up 301 redirects for the top performing URLs from the old site. As I'm now discovering however, I'm not able to set up a simple redirect in the .htaccess file as I believe it takes the hash character to be a comment and ignores the remainder of the line. I've tried escape characters, using %23 instead, wildcard matching, nothing seems to work.
As a workaround, I wondered about simply creating dummy files with the same paths and URLs as the old site had, then simply creating HTML redirects within them to drive traffic to the correct new pages, but it looks as though the server is doing something similar regarding the hash character in the URL, and ignoring anything afterit. So, if I create a sub-folder on my news server called '#' and create a file in there called 'test.html', I expected to be able to just go to 'www.myNEWsite.com/#/test.html', but it just takes me to the default root file of my site.
Please can anyone shed any light on how I might get around this? I must admit I'm not that clued up on Apache so I'm having to learn a lot as I go.
Many thanks in advance for any pointers or info anyone can provide.
Cheers,
Rich
A hash character in the URL specifies the anchor, and it's not even sent to your webserver. A redirect is impossible on the server side, and the old developer probably did it using JavaScript. Implement fallback URLs without the hash instead, and have a global JavaScript script detect these URLs and redirect automatically.
Hash tags cannot be read by the server. They are regarded as locations within the document and are therefore not exposed to the server. The client is the only one whom see's these. The best you could do is use a "meta refresh" tag, or alternatively, you could use javascript to detect the url, and if its one which requires 301 redirection, use "window.location" to move the user to a full url where mod_rewrite or a php page can issue a 301 header.
However neither are SEO friendly and only really solve the issue for users that click onto an old link via an external site
<!-- Put in head tag so the page does not wait to load the content-->
<script type="text/javascript">
if(window.location.hash != "") {
var h = window.location.hash.match(/#\/?(.*)/i)[1];
switch(h) {
case "something_old":
window.location = "/something_new.html";
break;
case "something_also_old":
window.location = "/something_also_new.html";
break;
}
}
</script>

Removing URL duplicates when using pretty urls

I'm using pretty URLs in my web app, one example is 'forum/post/1' which invokes PostController in Forum module, which loads a post with id=1. This is what I need but that post is also accessible from 'forum/post/view/id/1'. That's bad, because search crawlers don't like when same page is accessible from several URLs, right?
I'm using Yii framework which supports 'useStrictParsing' option, which tells that incoming request must match at least one "pretty" route, otherwise request fails with 404. However it's not a perfect solution, because I don't have pretty URLs for every controller/action.
Ideally, framework should redirect 'forum/post/view/id/1' to 'forum/post/1' with a 301 status code. How did you solve this problem? It's not Yii/PHP specific question, how does your framework/tool deal with it?
The best way to make sure search engines only rank one page the pretty url over another, if there are multiple ways to view the content is to your a canonical tag within the header of your document
<link rel="canonical" href="http://www.mydomain.com/nice-url/" />
This is very useful with windows based system as IIS is not case sensitive with its web pages but the web standard is case sensitive.
So
www.maydomain.com/Newpage.aspx
www.maydomain.com/newpage.aspx
www.maydomain.com/NEWPAGE.aspx
These are all seen by Google as different pages, and you are then marked down for having a site with duplicate content, but not so with a canonical as each page in the case above would have the same canonical meta tag and the that url is the only one which will be used by the search engines.
Provided that no one links to your non-pretty urls, the search engines will never know that they exist.
If you do want to eliminate them, you could bypass your web framework by adding an alias in you web server's configuration file; the url will be redirected before it ever reaches the framework.
Frameworks like Django, which don't provide 'magic' routing, don't face this issue, the only routes which exist are those which you define manually. In it's case, you could define a view for the non-pretty url which returns the appropriate redirect.