Advantage of placing webpages in separate directories? - structure

My question is: is there really an advantage to placing each webpage in its own directory, compared to putting them all in one directory?
( www.example.com/ and www.example.com/b.php ) vs ( www.example.com/ and www.example.com/b/ )

What you've seen is probably not that each file is in its own folder, but rather a rewriting/routing engine in action. The basic concept is that you tell the server "a URL that looks like <this> should point to a file with a filename like <this>, with <these> parameters". This way, you can create easily readable URLs (which benefit users, developers, and search engines alike).
Example:
A user types in domain.com/cats/Garfield/. This could be interpreted as domain.com/index.php?category=cats&cat=Garfield by the server. Thus, the "usage URL" is far cleaner and easier to read and remember.
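On Apache, a minimal .htaccess sketch of that example could look like the lines below (this assumes mod_rewrite is available; index.php and the category/cat parameter names are simply the ones from the example above):
# .htaccess in the document root (assumes mod_rewrite is enabled)
RewriteEngine On
# Don't rewrite requests for files or directories that actually exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# /cats/Garfield/ -> /index.php?category=cats&cat=Garfield (the visible URL stays clean)
RewriteRule ^([^/]+)/([^/]+)/?$ index.php?category=$1&cat=$2 [L,QSA]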
More info in the Wikipedia article about URL Rewriting.

Related

How to direct multiple clean URL paths to a single page?

(Hi! This is my first time asking a question on Stack Overflow after years of finding answers here... Thanks!)
I have a dynamic page, and I'd like to have fixed URLs that point to different states of that page. So, for example: "www.mypage.co" (/index.php) is the base page, and it rearranges its content based on user choices. I'd then like to be able to point to "www.mypage.co/contentA" or "www.mypage.co/contentB" in order to automatically load the base page at "www.mypage.co" with the desired content.
At heart the problem is an aesthetic one. I know I could simply write www.mypage.co/index.html?state=contentA to reach the desired end, but I want to keep the URL simple and readable (i.e., clean). Also, due to limitations in my hosting arrangement, I would most appreciate a solution that is server-independent (across LAM[PHP] stacks, at least), if possible.
Also, if I just have incorrect assumptions about how to implement clean URLs, I'd appreciate direction to a good, comprehensive explanation. I can't seem to find one...
You could use an .htaccess file to redirect all requests to one location and then, from there, determine what to return to the client. Look over the htaccess/dispatch system that Tonic uses.
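As a rough sketch of that idea on Apache (assuming mod_rewrite is available; index.php and the state parameter are taken from the question), the .htaccess could send every non-file request to a single entry point:
RewriteEngine On
# Serve real files (css, js, images) as-is
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# /contentA -> index.php?state=contentA, while the browser keeps showing /contentA
RewriteRule ^([^/]+)/?$ index.php?state=$1 [L,QSA]
index.php can then branch on the state parameter (or inspect the original request path) to decide whether to render contentA or contentB. This isn't fully server-independent, but it works on typical LAMP shared hosting where .htaccess overrides are allowed.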
If you use Apache, you can use mod_rewrite. I have a rule like this, where multiple RESTful URLs all go to the same page, using a regex and moving parts of the old URL into parameters for the new URL:
RewriteRule ^/testapp/(name|number|rn|sid|unii|inchikey|formula)(/(startswith))?/?(.*) /testapp/ProxyServlet?objectHandle=Search&actionHandle=drillIn&searchtype=$1&searchterm=$4&startswith=$3 [NC,PT]
That particular regex accepts URLs like
testapp/name
testapp/name/zuchini
testapp/name/startswith/zuchini
and forwards them to the same page.
I also use UrlRewriteFilter for Tomcat, but since you mentioned PHP, that doesn't seem like it would be useful here.

Sub-domain vs Sub-directory to block from crawlers

I've googled a lot and read a lot of articles, but got mixed reactions.
I'm a little confused about which is the better option if I want a certain section of my site to be blocked from being indexed by search engines. Basically, I make a lot of updates to my site and also design for clients, and I don't want all the "test data" that I upload for previews to be indexed, in order to avoid duplicate-content issues.
Should I use a sub-domain and block the whole sub-domain
or
Create a sub-directory and block it using robots.txt?
I'm new to web design and am a little insecure about using sub-domains (I read somewhere that it's a slightly advanced procedure and even a tiny mistake could have big consequences). Moreover, Matt Cutts has also mentioned something similar (source):
"I’d recommend using sub directories until you start to feel pretty
confident with the architecture of your site. At that point, you’ll be
better equipped to make the right decision for your own site."
But on the other hand, I'm hesitant to use robots.txt as well, since anyone can access the file.
What are the pros and cons of both?
For now I am under the impression that Google treats both similarly and it would be best to go for a sub-directory with robots.txt, but I'd like a second opinion before "taking the plunge".
Either you ask bots not to index your content (→ robots.txt) or you lock everyone out (→ password protection).
For this decision it's not relevant whether you use a separate subdomain or a folder. You can use robots.txt or password protection for both. Note that the robots.txt always has to be put in the document root.
Using robots.txt gives no guarantee; it's only a polite request. Polite bots will honor it, others will not. Human users will still be able to visit your "disallowed" pages. Even those bots that honor your robots.txt (e.g. Google) may still link to your "disallowed" content in their search results (they won't index the content, though).
Using a login mechanism protects your pages from all bots and visitors.
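To make that concrete, a robots.txt in the document root that politely asks bots to stay out of a folder, and an .htaccess snippet that actually locks the same folder with a password, might look roughly like this (the folder name /test/ and the .htpasswd path are placeholders):
# robots.txt (document root) - a polite request only
User-agent: *
Disallow: /test/
# .htaccess inside /test/ - real protection via HTTP Basic Auth
AuthType Basic
AuthName "Preview area"
AuthUserFile /home/example/.htpasswd
Require valid-user
The same two snippets work whether /test/ is a folder on the main site or the root of a separate subdomain.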

Restrict unauthenticated access to files with mod_rewrite and scripting language

I have scavenged for answers online, but none seem similar to what I am trying to achieve. As such, I hope that the gurus at Stack Overflow can help me out.
What is it that I am trying to accomplish?
I want to restrict access to content for non-authorized users. Content accessible to non-authorized users will be specified in a whitelist; all other content is blacklisted.
What is my environment?
I am running Apache in conjunction with a scripting language very similar to PHP. The scripting language will not be known by many, but it is Fazzt (in case you do know it and can infer its differences from PHP: there are no pointers / memory management, decimal values, or binary data). I have to use this environment due to the nature of the project.
What is happening on the site?
The site authenticates users and stores authentication in sessions. An unauthenticated user is presented with a styled webpage (containing images, CSS, JS, etc.). Hence, I need to whitelist all of the static image, CSS, and JS files so that they are available for download by the client browser. Once signed in, a broader range of dynamic content is presented (as such, anything that is not whitelisted is automatically blacklisted).
How did I plan to solve the problem?
This is silly, but I guess the obvious is not always seen. My approach involved mod_rewriting all requests for existing files that do not match .fzt and .fsp pages. The rewrite would go to a scripting file that would check the requested file against the whitelist. If the file is present in the list, the request would get routed directly to the file (yes, silly me... it would get mod_rewritten again >_<). If it's not in the list, the user's authentication would be checked. If the user is not authenticated, an HTTP "File not found" response would be returned. Otherwise, the request would be redirected to the file and served (same folly).
As you can see, the approach is greatly flawed. However, I am sure something of this nature should be possible... yet I have not found any proof just yet. What do you think? Is the mod_rewrite / script combination a completely wrong way of performing this task? How would you do it otherwise? Note that I cannot simply slap on an .htaccess password, as access is determined by user authentication that is tracked by Fazzt (read above: the scripting language similar to PHP).
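For reference, a rough .htaccess sketch of the kind of rules I was picturing, with pass-through rules added to avoid the re-rewriting problem (the gatekeeper script name and the static-extension list are placeholders standing in for my whitelist):
RewriteEngine On
# Let the Fazzt pages (and the gatekeeper script itself) through untouched
RewriteCond %{REQUEST_URI} \.(fzt|fsp)$ [NC]
RewriteRule ^ - [L]
# Whitelisted static assets are served directly to everyone
RewriteCond %{REQUEST_URI} \.(css|js|png|jpg|gif)$ [NC]
RewriteRule ^ - [L]
# Everything else is handed to a checking script, which can verify the session
# against the whitelist and either serve the file or return a 404
RewriteRule ^(.*)$ /gatekeeper.fzt?path=$1 [L,QSA]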
Any suggestions or thoughts would be greatly appreciated!

How do many websites hide their file structure?

When I look at many large sites (e.g. Wikipedia or this site), the URLs look like this:
http://en.wikipedia.org/wiki/StackOverflow
And not like:
http://en.wikipedia.org/wiki.php?article=StackOverflow
http://en.wikipedia.org/wiki.pl?article=StackOverflow
... or even
http://en.wikipedia.org/wiki?article=StackOverflow
I suppose that Wikipedia does not create a separate file for every article (and then use Apache modules like mod_rewrite to hide the file extensions).
But how do they do this? Are they using a special server? Is there a way to configure Apache to act like this? For example, one script is called for every request, and the path of the request is passed to the script, which then decides what to output.
These are called friendly or clean URLs.
Have a look at
http://en.wikipedia.org/wiki/Rewrite_engine
http://en.wikipedia.org/wiki/Clean_URL
http://www.petefreitag.com/item/503.cfm
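As a minimal illustration of the idea behind those links (assuming Apache with mod_rewrite; wiki.php and the article parameter are just the hypothetical names from the question):
RewriteEngine On
# Leave requests for real files (images, stylesheets, ...) alone
RewriteCond %{REQUEST_FILENAME} !-f
# /wiki/StackOverflow -> wiki.php?article=StackOverflow, with the clean URL kept in the address bar
RewriteRule ^wiki/(.+)$ wiki.php?article=$1 [L,QSA]
So there is no separate file per article: one script receives every matching request and decides what to output, exactly as you guessed.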

Upgrade URL for SEO from example.com/dbtable_id/ to example.com/dbtable_id/article-title

I have an existing journal website with the following url structure
http://example.com/dbtable_id/
(eg. http://example.com/89348/)
where 89348 is the primary key id of the journal article.
I want to add the title of the article to the url for SEO purposes like
http://example.com/dbtable_id/article-title
(eg. http://example.com/89348/hello-world)
I like this approach because I don't need to change the PHP code, since it will still look up the article by dbtable_id. All I have to do is append URL-friendly titles to the relevant links in the template files and add one more rule to the .htaccess file.
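For example, the extra rule might look roughly like this (the script name article.php is a placeholder for whatever currently handles the lookup; dbtable_id is the id from the question):
RewriteEngine On
# /89348/hello-world -> article.php?dbtable_id=89348
# The title slug is matched but ignored, so the lookup still uses only the numeric id
RewriteRule ^([0-9]+)(/[^/]*)?/?$ article.php?dbtable_id=$1 [L,QSA]
The regex requires the first path segment to be purely numeric, so rules for other pages are unaffected.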
Is there anything I should be concerned about? Am I following best practices? Will the possibility of a mismatch between "dbtable_id" and "article-title" affect SEO?
There are some who argue that shallow paths are better than deeper paths, but I don't put too much stock in this. A semantic page with a screwed-up URL will always do better than an unsemantic page with a "perfect" URL.
So I say, go for it. As long as it doesn't have any query-string parameters, you should be fine.