If we develop a site to be SEO compatible, is it possible to use Session Variables?
If not, what is the alternative?
Many thanks.
Best regards.
A search engine indexes pages on your site based on their URLs. If your URLs are not dependent on the unique Session ID assigned to every request, then a spider should not have a problem indexing your site.
That said, the content of your pages also matters. If the page content relies heavily on Session variables (or Viewstate params), you might have a problem getting that page indexed. The best way is to have unique and static URLs for each section of your site.
From Google's Webmaster Guidelines:
"Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page."
So I don't think it is a good idea to require a session for content that you want indexed. It depends on what your requirements are as to possible alternatives/workarounds.
You should use cookies, because they are machine dependent. Session identifiers in the URL are very unsafe (session stealing), because you expose your session if you send the URL to somebody.
I agree with Cerebrus. Just make sure that
You have unique and static URLs. If you don't have unique URLs you will lose the links to that page.
You have the same title for all states of a page
You target the same keywords for all states of a page
Session variables can be passed in an HTTP request as a cookie, a POST variable, or in the URL.
Search engines do not support cookies or POST variables, and they try to avoid pages with session variables in the URL.
You can use cookie or POST based session tracking for your users, but be aware that requests from search engines will always appear as the start of a new session.
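If you happen to be on PHP (the question doesn't say, so treat this as an assumption), a minimal sketch of cookie-only sessions looks like this; it stops the session ID from ever being appended to URLs, so crawlers see clean, stable addresses:

<?php
// Sketch: force cookie-only session handling so the session ID
// never appears in URLs or rewritten links.
ini_set('session.use_only_cookies', '1');   // accept the session ID from cookies only
ini_set('session.use_trans_sid', '0');      // never rewrite URLs to carry the session ID
session_start();

// Store per-visitor data as usual; URLs stay clean for crawlers.
$_SESSION['views'] = isset($_SESSION['views']) ? $_SESSION['views'] + 1 : 1;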
If the URL is independent of the unique Session ID assigned to every request, then the web spider should not have a problem indexing your site on Google.
So I made a mistake in my application which caused thousands of URLs to be indexed by Google with the session ID appended. What should I do to remove all those session IDs from the Google index? I'd like to have only the page indexed, minus the session ID.
You can fix this with an edit to your robots.txt. Also, there's a Webmasters Stack Exchange; consider checking there for your answer in the future, I know they have an SEO tag.
robots.txt stops crawling; it doesn't stop URLs being indexed.
You could use a canonical tag to consolidate the URLs to their parent URL and/or parameter handling in Webmaster Tools.
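For example, a rough PHP sketch (assuming the session ID shows up as a PHPSESSID query parameter, which may not match your setup) that emits a canonical link pointing at the URL minus the session ID:

<?php
// Sketch only: build a canonical URL that drops the session parameter.
$scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';
$path = strtok($_SERVER['REQUEST_URI'], '?');   // path without the query string
$params = $_GET;
unset($params['PHPSESSID']);                    // assumed name of the session parameter
$canonical = $scheme . '://' . $_SERVER['HTTP_HOST'] . $path;
if (!empty($params)) {
    $canonical .= '?' . http_build_query($params);
}
echo '<link rel="canonical" href="' . htmlspecialchars($canonical) . '">';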
When I use LIMIT to make pages of results, how do we usually know the offset, i.e. which page should be retrieved for each request?
Via cookies?
Via a query string parameter, traditionally. URLs typically include a ?page=3 to request page 3, like you'll see all over Stack Overflow: https://stackoverflow.com/questions?page=2&sort=newest
This is something you absolutely should not do through cookies. The URL should include everything necessary to navigate to the given page. Consider a user bookmarking page three of your results, or trying to link somebody else to the page they're looking at: Using cookies to store pagination data breaks these situations completely.
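In PHP terms (the question doesn't name a language, so this is just a sketch with assumed table, column and connection details), the page parameter maps onto LIMIT/OFFSET like this:

<?php
// Sketch: derive LIMIT/OFFSET from a ?page= query string parameter.
$perPage = 20;
$page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset = ($page - 1) * $perPage;

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // assumed credentials
$stmt = $pdo->prepare('SELECT id, title FROM items ORDER BY id DESC LIMIT :limit OFFSET :offset');
$stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);   // the rows for the requested page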
Usually via request parameters in action frameworks (RoR, ZF, Cake, Django) and via the state of the session in component frameworks (Prado, JSF, ASP.NET). The session is usually associated with the request by a cookie.
Using the session to store the current page is quite common in business-oriented applications, where the state of the GUI might be very complicated and the practicality of being able to bookmark a page is limited.
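A rough sketch of that session-backed approach, using plain PHP sessions rather than any particular component framework:

<?php
// Sketch: keep the current results page in the session instead of the URL.
session_start();
if (isset($_POST['goToPage'])) {
    $_SESSION['resultsPage'] = max(1, (int) $_POST['goToPage']);
}
$page = isset($_SESSION['resultsPage']) ? $_SESSION['resultsPage'] : 1;
$perPage = 20;
$offset = ($page - 1) * $perPage;   // feed this into the LIMIT/OFFSET query as above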
I have been advised by an SEO consultant to add the "google-site-verification" meta tag to every page of my site. This is to make sure that my pages are indexed by google.
However, I am reluctant to do this for a couple of reasons:
1) My site is already verified using an alternative method of verification - by hosting an HTML verification file on the server.
2) I recall reading an article indicating that this meta tag does not impact crawling or page rank.
I do have some pages that are not indexed.
An example is
http://www.contractsforgeeks.com/TechJobs/Florida/Tampa.aspx
But I am making the assumption that adding this meta tag will not help the page get indexed.
Is there any value in adding the site verification meta tag to each page instead of uploading a single html verification file?
For example, what happens if I accidentally delete the verification file from my site (some time after the site has already been verified)? Does it need to be re-verified, or is the verification process a one-time deal? In which case, it may be safer to include the meta tag in each page (even though it does not help indexing?)
One method is enough to verify your site. If you choose the HTML file method, you don't need to put the "google-site-verification" meta tag on every page.
Moreover, as you assumed, this meta tag doesn't help your site to be indexed by Google. It doesn't impact crawling or PageRank.
If you want to see your site indexed, you can submit a sitemap.xml to Google Webmaster Tools and get more links from other sites pointing to yours.
And if you delete the verification HTML file from your site, you'll need to verify your site again; the verification process is not a one-time deal.
It does not help indexing. It does not help ranking. Its only purpose is to verify that you are who you claim to be when registering at Google Webmaster Tools.
If you delete the verification file, you'd need to verify your domain again. Otherwise it would still be possible to control a domain in GWT even though the owner changed in the meantime.
If you need to argue against the use of the corresponding meta element, you could point out that it could actually lower your ranking (of course this would have no real, measurable effect, only in theory!) because Google prefers faster-loading pages.
I understand the og:url meta tag is the canonical URL for the resource in the Open Graph.
What strategies can I use if I wish to support 301 redirecting of the resource, while preserving its place in the Open Graph? I don't want to lose my likes because I've changed the URLs.
Is the best way to do this to store the original url of the content, and refer to that? Are there any other strategies for dealing with this?
To clarify - I have page:
/page1, with an og:url of http://www.example.com/page1
I now want to move it to
/page2, using a 301 redirect to http://www.example.com/page2
Do I have any options to avoid losing the likes and comments other than setting the og:url meta to /page1?
Short answer: you can't.
Once the object has been created on Facebook's side, its URL in Facebook's graph is fixed - the Likes and Comments are associated with that URL and object; you need that URL to be accessible by Facebook's crawler in order to maintain that object in the future. (Note that the object becoming inaccessible doesn't necessarily remove it from Facebook, but effectively you'd be starting over.)
What I usually recommend here is (with examples http://www.example.com/oldurl and http://www.example.com/newurl):
On /newurl, keep the og:url tag pointing to /oldurl
Add a HTTP 301 redirect from /oldurl to /newurl
Exempt the Facebook crawler from this redirect (see the sketch below)
Continue to serve the meta tags for the page on http://www.example.com/oldurl if the request comes from the Facebook crawler.
No need to return any actual content to the crawler, just a simple HTML page with the appropriate tags
Thus:
Existing instances of the object on Facebook will, when clicked, bring users to the correct (new) page via your redirect
The Like button on the (new) page will still produce a like of the correct object (but at the old URL)
If you're moving a lot of URLs around or completely rewriting your URL scheme you should use the new URLs for new articles/products/etc, but you'll need to keep the redirect in place if you want to retain likes, comments, etc on the older content.
This includes if you're changing domain.
The only problem here is maintaining the old URL -> new URL mapping somewhere in your code, but it's not technically difficult, just an additional thing to maintain in the future.
BTW, the Facebook crawler UA is currently facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
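A rough PHP sketch of the crawler exemption described above (assuming a plain PHP page answering at /oldurl; the title is a placeholder, and you'd adapt this to your framework or rewrite rules):

<?php
// Sketch: on http://www.example.com/oldurl, redirect everyone except the
// Facebook crawler, which keeps seeing the original Open Graph tags.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (stripos($ua, 'facebookexternalhit') === false) {
    // Normal visitors (and other bots) get the permanent redirect to the new URL.
    header('Location: http://www.example.com/newurl', true, 301);
    exit;
}
?>
<!DOCTYPE html>
<html>
<head>
<meta property="og:url" content="http://www.example.com/oldurl">
<meta property="og:type" content="article">
<meta property="og:title" content="Original title (placeholder)">
</head>
<body></body>
</html>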
I'm having the same problem with my old sites. Domains are changing, admins want to change URLs for SEO, etc.
I came to the conclusion that it's best to have some sort of unique ID in the database just for Facebook, right from the beginning. For articles, for example, I have myurl.com/a/123 where 123 is the ID of the article.
The real URL is myurl.com/category/article-title. The article can then be put in a different category, renamed, etc., with extensive logic for 301 redirects behind it. But the basic FB identifier can stay the same forever.
Of course this is viable only when starting with a fresh site or when implementing fb comments for the first time.
Just an idea if you can plan ahead :) Let me know what you think.
I'm doing very rudimentary tracking of page views by logging URLs, referral codes, sessions, times, etc., but I'm finding it's getting bombarded with robots (Google, Yahoo, etc.). I'm wondering what an effective way is to filter out or not log these statistics?
I've experimented with robot IP lists etc but this isn't foolproof.
Is there some kind of robots.txt, .htaccess, PHP server-side code, JavaScript or other method(s) that can "trick" robots or ignore non-human interaction?
Just to add: a technique you can employ within your interface would be to use JavaScript to encapsulate the actions that lead to certain user-interaction view/counter increments. For a very rudimentary example, a robot will (can) not follow a link like this:
<a href="javascript:viewItem(4)">Chicken Farms</a>

function viewItem(id)
{
    // Navigate with a marker parameter so the hit can be attributed to a real click.
    window.location.href = 'http://www.example.com/items?id=' + id + '&from=userclick';
}
To make those clicks easier to track, they might yield a request such as
www.example.com/items?id=4&from=userclick
That would help you reliably track how many times something is 'clicked', but it has obvious drawbacks, and of course it really depends on what you're trying to achieve.
It depends on what you want to achieve.
If you want search bots to stop visiting certain paths/pages you can include them in robots.txt. The majority of well-behaving bots will stop hitting them.
If you want bots to index these paths but you don't want to see them in your reports then you need to implement some filtering logic. E.g. all major bots have a very clear user-agent string (e.g. Googlebot/2.1). You can use these strings to filter these hits out from your reporting.
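A rough PHP sketch of that filtering idea; the substring list is illustrative, not exhaustive, and the log file is a stand-in for whatever logging you already do:

<?php
// Sketch: skip logging a page view when the user agent looks like a known crawler.
function isProbablyBot($userAgent)
{
    $needles = array('googlebot', 'bingbot', 'slurp', 'baiduspider', 'yandex', 'bot', 'crawler', 'spider');
    $ua = strtolower($userAgent);
    foreach ($needles as $needle) {
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!isProbablyBot($ua)) {
    // Append the hit to a simple log file (stand-in for your real logging).
    file_put_contents('pageviews.log', date('c') . ' ' . $_SERVER['REQUEST_URI'] . "\n", FILE_APPEND);
}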
Well, the robots will all use a specific user agent, so you can just disregard those requests.
But also, if you just use a robots.txt and deny them from visiting, that will work too.
Don't reinvent the wheel!
Any statistical tool these days filters robot requests. You can install AWStats (open source) even if you are on shared hosting. If you don't want to install software on your server, you can use Google Analytics by adding just a script at the end of your pages. Both solutions are very good. This way you only have to log your errors (500, 404 and 403 are enough).