Special characters in document.location.href

Special characters in document.location.href - apache

Is it bad practice to make a page that has a web address like these:
http://example.com/-products-and-services.php
http://example.com/-contact-us.php
http://example.com/--books.php
http://example.com/--translation.php
http://example.com/--illustration.php
http://example.com/-$-special-feature.php
http://example.com/-$-vip-area.php
Will google or apache have problems with these (- $) characters?
I am doing this because I makes it easier for me to view and categorise pages while still letting me add keywords to the file names.
Thanks

You can use alphanumerics, and the special characters "$-_.+!*'()," in url.
But it may not be very helpful with seo, search engine indexing etc.
You can read what google says here,
https://support.google.com/webmasters/answer/76329?hl=en

Related

Sharing a link (share?url=) with spaces?

Trying to add a sharing function to my site, but GPlus seems to have trouble sharing url's with spaces in them.
Even escaped they dont seem to work.
eg;
https://plus.google.com/share?url=http://www.google.com/%23test%20test
It only seems to recognize upto before the %20.
Any ideas? Is this a bug? Am I doing something wrong?
The site is rather ajaxy, and in the history tokens would be a pain to need to use non-standard escaping of characters just for google plus.

I don't think that this is a bug with Google+ but rather its likely intentional because those URLs would need to be double URL encoded because one URL is sharing a second URL, thus your shared URL should be http%3A%2F%2Fwww.google.com%2F%2523test%2Btest
This won't work to create a preview in the share snippet but the URL is correct when it is shared.
All said, you shouldn't use spaces in your URLs because they are considered unsafe, see RFC 1738. You should change your app's URL structure.

Why are urls usually lowercase with words separated by a dash and no special characters?

As an example, the Rails parameterize method would create a string like so:
"hello-there-joe-smith" == "Hello There Joe.Smith".parameterize
For legacy reasons, a project I am working on requires uppercase letters as well as periods to be available in a particular URL parameter.
Why would this ever be a problem?
Clarification
The url type I'm talking about is what is used instead of an id, commonly knows as a slug.
Would a Rails app with the following url come to any issues: http://example.com/Smith.Joe?

This will be a problem both in terms of SEO and browser caching (and hence performance,)
Search engines are case sensitive, so same URL in different case will be taken as two URLs.
Browser like IE's caching is case sensitive, so eg. if you try to access your page as MYPAGE.aspx and at some place in code, you write it as mypage.aspx then IE will treat them as two different pages and instead of getting it from cahce, it will get it from server.
Dashes should be fine but underscores should be avoided : http://www.mattcutts.com/blog/dashes-vs-underscores/

Is there a way to prevent Googlebot from indexing certain parts of a page?

Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest?
There are a couple of different issues we've come across which would be helped by this, such as:
RSS feed/news ticker-type text on a page displaying content from an external source
users entering contact phone etc. details who want them visible on the site but would rather they not be google-able
I'm aware that both of the above can be addressed via other techniques (such as writing the content with JavaScript), but am wondering if anyone knows if there's a cleaner option already available from Google?
I've been doing some digging on this and came across mentions of googleon and googleoff tags, but these seem to be exclusive to Google Search Appliances.
Does anyone know if there's a similar set of tags to which Googlebot will adhere?
Edit: Just to clarify, I don't want to go down the dangerous route of cloaking/serving up different content to Google, which is why I'm looking to see if there's a "legit" way of achieving what I'd like to do here.

What you're asking for, can't really be done, Google either takes the entire page, or none of it.
You could do some sneaky tricks though like insert the part of the page you don't want indexed in an iFrame and use robots.txt to ask Google not to index that iFrame.

In short NO - unless you use cloaking with is discouraged by Google.

Please check out the official documentation from here
http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html
Go to section "Excluding Unwanted Text from the Index"
<!--googleoff: index-->
here will be skipped
<!--googleon: index-->

Found useful resource for using certain duplicate content and not to allow index by search engine for such content.
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index>

At your server detect the search bot by IP using PHP or ASP. Then feed the IP addresses that fall into that list a version of the page you wish to be indexed. In that search engine friendly version of your page use the canonical link tag to specify to the search engine the page version that you do not want to be indexed.
This way the page with the content that do want to be index will be indexed by address only while the only the content you wish to be indexed will be indexed. This method will not get you blocked by the search engines and is completely safe.

Yes definitely you can stop Google from indexing some parts of your website by creating custom robots.txt and write which portions you don't want to index like wpadmins, or a particular post or page so you can do that easily by creating this robots.txt file .before creating check your site robots.txt for example www.yoursite.com/robots.txt.

All search engines either index or ignore the entire page. The only possible way to implement what you want is to:
(a) have two different versions of the same page
(b) detect the browser used
(c) If it's a search engine, serve the second version of your page.
This link might prove helpful.

There are meta-tags for bots, and there's also the robots.txt, with which you can restrict access to certain directories.

Is a deep directory structure a bad thing for SEO?

a friend of mine told me that the company he works at are redoing their SEO for their large website. Large == both number of pages and traffic they get a day.
Currently they have a (quote) deeply nested site , which i'm assuming means /x/y/z/a/b/c.. or something. I also know it's very unRESTful from some of the pages i've also seen -> eg. foo.blah?a=1&b=2&c=3......z=24 (yep, lots of crap in the url).
So updating their SEO sounds like a much needed thing.
But, they are going flat. I mean -> totally flat. eg. /foo-bar-pew-pew-abc-article1
This scares the bollox out of me.
From what he said (if i understood him right), each - character doesn't mean a new heirachial level.
so /foo-bar-pew-pew-abc-article1 does not mean /foo/bar/pew/pew/abc/article1
A space could be replace by a -. A + represents a space, but only if the two words are suppose to be one word (whatever that means). ie. Jean-Luke will be jean+luke but if i had a subject like 'hello world, that would be listed ashello-world`.
Excuse me while i blow my head up.
Is this just mean or is it totally silly to go completly flat. To mean, I was under the impression that when SEO people say keep it as flat as possible, they are trying to say keep it to 1 or 2 levels. 4 is the utter max=.
Is this me or is a flat heirachy a 'really really good thing' for seo ... for MEDIUM and LARGE sites (lots of resources, not necessairly lots of hits/page views).

Well, let's take a step back and look at what SEO is supposed to accomplish; it's meant to help a search engine identify quality, relevant content for users based on key phrases and terms.
Take, for example, the following blog URLs:
* http://blog.example.com/articles/2010/01/20/how-to-improve-seo/
* http://blog.example.com/how-to-improve-seo/
Yes, one is deep and the other is flat; but the URL structure is important for two reasons:
URL terms and phrases are high-value targets for determining relevance of a page by a search engine
A confusing URL may immediately force a user to skip your link in the search results
Let's face it: Google and other search engines can associate even the worst URLs with relevant content.
Take, for example, a search for "sears kenmore white refrigerator" in Google: http://www.google.com/search?q=sears+kenmore+white+refrigerator&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a.
Notice the top hit? The URL is http://www.sears.com/shc/s/p_10153_12605_04665802000P , and yet Google replaces the lousy URL with www.sears.com › Refrigerators › Top Freezers. (Granted, 2 results down is the true URL.)
If your goal for SEO is optimized organic relevance, then I would wholeheartedly recommend generating either key/value pairs in the URL, like www.sears.com/category/refrigerators/company/kenmore (meh), or phrase-like URLs like www.sears.com/kenmore/refrigerators/modelNumber. You want to align your URLs with the user's search terms and phrases to maximize your effort.
In the end, if you offer valuable content and you structure your content and site properly, the search engines will accurately gather it. You just need to help them realize how specific and authoritative your content is. :)

Generally the less navigation to reach content the better. But with a logical breadcrumb strategy and well thought out deep linking the excess of directory depth can be managed and not hurt seo and the visibility in search.
Remember that Google is trying to return the most relevant link and the best user experience, so if your site has 3 urls coming up for the same search term and it take 2 or 3 exits to find the appropriate content, Google will read that as bad and start lowering all of your urls in SERPs.
You have to consider how visitors will find your content - not navigate it. Think content discovery and just navigation.
HTH

Flat or deeply nested really shouldn't affect the SEO. The key part is how those individual pages are linked to will determine how they get ranked. I did write some basic stuff on this years ago see here, but essentially if pages are not buried deeply within a site, i.e. it takes several clicks (or links from Google's perspective) then they should rank fairly much the same in either case. Google used to put a lot more weight on keywords in URL's but this has been scaled back in more recent algorithm changes. It helps to have keywords there, but its no longer the be-all and end-all.
What you/they will need to consider are the following two important points:
1) How will the URL structure be perceived by the users of the site? Will they they be able to easily navigate the site and not have to rely on the URL structure in the address bar?
2) In making navigational changes such as this its vitally important to set-up redirects from old url's. Google, hates 404's and they should either put in 410 (Gone) HTTP responses for pages are no longer valid or 301 HTTP response for permanent redirects (with new url).
In making any large changes such as this you can save loads of time getting the site indexed successfully by utilising XML sitemaps and Google's webmaster console.

Efficient way to add Canonical tags

If the value of the href for Canonical tags is populated via javascript function, would that affect the Search engine indexing (as search engines ignore javascript) ?

I'm not sure I fully understand the question as you worded it. But here's my take:
Canonical tags are used to make sure that Google (et al) knows that the same page with different URLs are, in fact, the same page.
This saves Google a lot of processing time, because it will treat those pages as a single page instead of trying to index every one of them. Also, your domain's search engine ranking will probably go up because Google doesn't think you're duplicating content.
For any page that could be duplicated because of parameters, you should include a canonical link of the page you want known as the original. So yes, it would help in your case. Though you cannot put a canonical link on someone else's domain pointing to your domain, so putting it on a partner's page would not have the intended consequences.
If you want more information, read up here: Google Webmaster Central: Specify Your Canonical

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas