I'm going to have a site where content remains available for 15 days and then gets removed.
I don't know too much about SEO, but my concern is about the SEO implications of having content indexed by the search engines and then, one day, it suddenly disappears and leaves a 404.
What is the best thing I can do to cope with content that comes and goes in the most SEO friendly way possible?
The best approach is to respond with HTTP status code 410 (Gone);
from the W3C's HTTP/1.1 specification:
The requested resource is no longer available at the server and no
forwarding address is known. This condition is expected to be
considered permanent. Clients with link editing capabilities SHOULD
delete references to the Request-URI after user approval. If the
server does not know, or has no facility to determine, whether or not
the condition is permanent, the status code 404 (Not Found) SHOULD be
used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web
maintenance by notifying the recipient that the resource is
intentionally unavailable and that the server owners desire that
remote links to that resource be removed. Such an event is common for
limited-time, promotional services and for resources belonging to
individuals no longer working at the server's site. It is not
necessary to mark all permanently unavailable resources as "gone" or
to keep the mark for any length of time -- that is left to the
discretion of the server owner.
More about status codes can be found in the HTTP/1.1 specification.
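For concreteness, here is a minimal sketch of serving a 410 for expired items, using Rails purely as an example stack (the question doesn't name a framework); the Article model, slug lookup, and expired? flag are hypothetical names for illustration:

# app/controllers/articles_controller.rb
class ArticlesController < ApplicationController
  def show
    @article = Article.find_by(slug: params[:id])

    if @article.nil?
      head :not_found          # never existed: plain 404
    elsif @article.expired?    # e.g. past the 15-day window
      head :gone               # 410: tells crawlers the removal is permanent
    end
    # otherwise fall through to the normal render of the show view
  end
end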
To keep the traffic, another option is not to delete the old content but to archive it, so it remains accessible at its old URL but is only linked from deeper archive pages on your site.
If you really want to delete it, then it is totally OK to return a 404 or 410. Spiders understand that the resource is not available anymore.
Most search engines use a file called robots.txt. You can specify which URLs and paths you want the search engine to ignore. So if all of your content is at www.domain.com/content/*, you can have Google ignore that whole branch of your site.
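For example, a robots.txt at the site root like the following asks compliant crawlers to skip that branch entirely (the /content/ path is just the hypothetical one from the answer above):

User-agent: *
Disallow: /content/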
We have a live website with a URL, e.g. abc.com, but when the site fully loads it gets redirected to abc.com/home.
I indexed all the pages in Google Search Console; under Coverage it says
"Duplicate without user-selected canonical" and the page is not listed under valid URLs. We have not added the URL "abc.com/home" to the sitemap that we submitted to Search Console.
How do I deal with "Duplicate without user-selected canonical" so that I get good SEO rankings?
Google maintains a support article listing all the different ways you can specify which of your URLs to treat as the "canonical" or "main" version when Google detects that your site has multiple pages that are duplicates (that is, if they are in fact actually duplicates; if the pages aren't meant to be duplicates, find out why and fix it).
Reasons you may see duplicate URLs:
There are valid reasons why your site might have different URLs that point to the same page, or have duplicate or very similar pages at different URLs. Here are the most common:
To support multiple device types:
https://example.com/news/koala-rampage
https://m.example.com/news/koala-rampage
https://amp.example.com/news/koala-rampage
To enable dynamic URLs for things like search parameters or session IDs:
https://www.example.com/products?category=dresses&color=green
https://example.com/dresses/cocktail?gclid=ABCD
https://www.example.com/dresses/green/greendress.html
If your blog system automatically saves multiple URLs as you position the same post under multiple sections:
https://blog.example.com/dresses/green-dresses-are-awesome/
https://blog.example.com/green-things/green-dresses-are-awesome/
If your server is configured to serve the same content for www/non-www http/https variants:
http://example.com/green-dresses
https://example.com/green-dresses
http://www.example.com/green-dresses
If content you provide on a blog for syndication to other sites is replicated in part or in full on those domains:
https://news.example.com/green-dresses-for-every-day-155672.html (syndicated post)
https://blog.example.com/dresses/green-dresses-are-awesome/3245/ (original post)
I should also add an example for analytics campaigns. In my case, Google is detecting URLs with third-party (non-Google) campaign URL parameters as separate (and therefore duplicate) pages.
Telling Google about your Canonical pages:
The support article also includes a table and details on the various methods for telling Google about canonical pages, roughly in order of importance (a short sketch of the first and last options follows this list):
Add a <link> tag with the rel=canonical attribute to the HTML of all duplicate pages, pointing to the canonical URL (e.g. Google's example: <link rel="canonical" href="https://example.com/dresses/green-dresses" />)
Send a rel=canonical HTTP header (e.g. Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical")
Submit your canonical URLs in a sitemap
Use 301 (permanent) redirects for URLs that have permanently moved so that the old and new locations aren't marked as duplicates of each other
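As promised above, here is a minimal Rails-flavored sketch of the first and last options (Rails is simply the framework named elsewhere in this collection; the file paths, the canonical_url helper, and the example routes are illustrative assumptions, not anything confirmed by the question).

In the page layout (e.g. app/views/layouts/application.html.erb), emit a canonical link tag pointing at whichever URL you pick as the main one:

<%# canonical_url is a hypothetical per-page helper you would define %>
<link rel="canonical" href="<%= canonical_url %>">

In config/routes.rb, a 301 redirect keeps an old, permanently moved URL from being indexed alongside the new one:

# redirect() in the Rails router issues a 301 by default
get "/old-green-dresses", to: redirect("/dresses/green-dresses", status: 301)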
What's the best/RESTful way to design an API endpoint for checking the existence of resources?
For example, there is a user database. While a new user tries to sign up, I want to check on the fly whether the email has already been used.
My idea is: POST /user/exists with a payload of something like {"email": "foo@bar.com"}. The response would be either 200 OK or 409 Conflict.
Is this a proper way?
Thanks!
HEAD is the most efficient method for existence checks:
HEAD /users/{username}
Request a user's path, and return a 200 if they exist, or a 404 if they don't.
Mind you, you probably don't want to be exposing endpoints that check email addresses. It opens a security and privacy hole. Usernames that are already publicly displayed around a site, like on reddit, could be ok.
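A minimal sketch of that check in Rails (used here only as an example stack; the User model and username column are assumptions for illustration):

# config/routes.rb
get "/users/:username", to: "users#show"
# Rails serves HEAD requests through the matching GET route and strips the body.

# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def show
    if User.exists?(username: params[:username])
      head :ok          # 200: the user exists
    else
      head :not_found   # 404: no such user
    end
  end
end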
I believe the proper way to just check for existence is to use a HEAD verb for whatever resource you would normally get with a GET request.
I recently came across a situation where I wanted to check the existence of a potentially large video file on the server. I didn't want the server to try and start streaming the bytes to any client so I implemented a HEAD response that just returned the headers that the client would receive when doing a GET request for that video.
You can check out the W3 specification's definition of HEAD, or read about practical uses of the HEAD verb.
I think this is awesome because you don't have to think about how to form your route any differently from a normal RESTful route in order to check for the existence of any resource, whether that's a file or a typical resource like a user.
GET /users?email=foo@bar.com
This is a basic search query: find me the users which have the email address specified. Respond with an empty collection if no users exist, or respond with the users which match the condition.
I prefer:
HEAD /users/email/foo@bar.com
Explanation: you are searching through all the users for someone using the e-mail foo@bar.com. I'm assuming here that the e-mail is not the key of your resource and that you would like some flexibility in your endpoints, because if you need another endpoint to check the availability of other user information (like username, number, etc.), this approach fits very well:
HEAD /users/email/foo@bar.com
HEAD /users/username/foobar
HEAD /users/number/56534324
As the response, you only need to return an HTTP 200 (it exists, so it's not available) or 404 (it doesn't exist, so it's available) status code.
You can also use:
HEAD /emails/foo@bar.com
if HEAD /users/email/foo@bar.com conflicts with an existing REST resource, such as a GET /users/email/foo@bar.com with a different business rule. As described in Mozilla's documentation:
The HEAD method asks for a response identical to that of a GET request, but without the response body.
So having a GET and HEAD with different rules is not good.
A HEAD /users/foo@bar.com is a good option too if the e-mail is the "key" of the users, because you (probably) have a GET /users/foo@bar.com.
My staging site is showing up in search results, even though I've specified that I don't want the site crawled. Here's the contents of my robots.txt file for the staging site:
User-agent: Mozilla/4.0 (compatible; ISYS Web Spider 9)
Disallow:
User-agent: *
Disallow: /
Is there something I'm doing wrong here?
Your robots.txt tells Google not to crawl/index your page's content.
It doesn't tell Google not to add your URL to their search results.
So if your page (which is blocked by robots.txt) is linked somewhere else, and Google finds this link, it checks your robots.txt to see whether it is allowed to crawl. It finds that it is forbidden, but hey, it still has your URL.
Now Google might decide that it would be useful to include this URL in their search index. But as they are not allowed (per your robots.txt) to get the page's metadata/content, they only index it with keywords from your URL itself, and possibly anchor/title text that someone else used to link to your page.
If you don't want your URLs to be indexed by Google, you'd need to use the robots meta tag, e.g.:
<meta name="robots" content="noindex">
See Google's documentation: Using meta tags to block access to your site
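As an illustration only: if the staging site happened to be a Rails app (an assumption about the stack), the tag could be emitted conditionally in the shared layout so production stays indexable; the file path is the usual Rails default:

<%# app/views/layouts/application.html.erb %>
<% unless Rails.env.production? %>
  <meta name="robots" content="noindex">
<% end %>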
Your robots.txt file looks clean, but remember that Google, Yahoo, Bing, etc. do not need to crawl your site in order to index it.
There is a very good chance the Open Directory Project or a less polite bot of some kind stumbled across it. Once someone else finds your site these days it seems everyone gets their hands on it. Drives me crazy too.
A good rule of thumb when staging is:
1. Always test your robots.txt file for any syntax oversights before putting it on your production site. Try robots.txt Checker, Analyze robots.txt, or Robots.txt Analysis - Check whether your site can be accessed by Robots.
2. Password-protect your content while staging. Even if it's somewhat bogus, put a login and password at your index's root (a short sketch follows at the end of this answer). It's an extra step for your fans and testers -- but well worth it if you want polite -- or impolite -- bots out of your hair.
3. Depending on the project, you may not want to use your actual domain for testing. Even if I have a static IP, sometimes I'll use dnsdynamic or noip.com to stage my password-protected site. For example, if I want to stage my domain ihatebots.com :) I will simply go to dnsdynamic or noip (they're free, by the way) and create a fake domain such as ihatebots.user32.com or somthingtotallyrandom.user32.com, and then assign my IP address to it. This way, even if someone crawls my staging project, my original domain ihatebots.com is still untouched by any kind of search engine result (and so are its records, by the way).
Remember there are billions of dollars around the world aimed at finding you 24 hours a day, and that number is ever increasing. It's tough these days. Be creative, and always password-protect if you can while staging.
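For the password-protection step, if the staging app were a Rails app (an assumption about the stack; Apache or Nginx basic auth works just as well), a single class-level gate is enough to turn away both polite and impolite bots; the credentials are placeholders:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # Guard everything outside production, e.g. the staging deploy.
  http_basic_authenticate_with name: "preview", password: "change-me" unless Rails.env.production?
end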
Good luck.
I have a product page on my website which I added 3 years ago.
Production of the product has now stopped, and the product page was removed from the website.
What I did is start displaying a message on the product page saying that production of the product has stopped.
When someone searches Google for that product, the product page which was removed from the site still shows up first in the search results.
The PageRank for the product page is also high.
I don't want the removed product page to be shown at the top of the search results.
What is the proper method for removing a page from a website so that the removal is also reflected in what Google has indexed?
Thanks for the reply.
Delete It
The proper way to remove a page from a site is to delete the actual file that is being returned to the user/bot when the page is requested. If the file is not on the web server, any well-configured web server will return a 404, and the bot/spider will remove the page from its index on the next refresh.
Redirect It
If you want to keep the good "Google juice" or SERP ranking the page has, probably due to inbound links from external sites, you'd be best to set your web server to do a 302 redirect to a similar (updated) product.
Keep and convert
However, if the page is doing so well that it ranks #1 for searches to the entire site, you need to use this to your advantage. Leave the bulk of the copy on the page the same, but highlight to the viewer that the product no longer exists and provide some helpful options to the user instead: tell them about a newer, better product, tell them why it's no longer available, tell them where they can go to get support if they already have the discontinued product.
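If the site behind it happens to be a Rails app (an assumption; any web server's rewrite rules would do the same), the "Redirect It" option above is a one-liner in the routes file; the product paths are hypothetical:

# config/routes.rb -- send visitors of the discontinued page to its replacement
get "/products/old-widget", to: redirect("/products/new-widget", status: 302)
# Switch to status: 301 once you are sure the move is permanent.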
I completely agree with the above suggestion and want to add just one point.
If you want to remove that page from Google search results, just log in to Google Webmaster Tools (you must have verified the website there) and submit that particular page as an index removal request.
Google will de-index that page and it will be removed from Google search rankings.
I want to create a website builder. What I'm thinking is to have one server as the main web server.
My concept is as follows:
1 - the user enters a URL (http://www.userdomain.com)
2 - it masks and redirects to one of my custom domains (http://www.myapp.userdomain.com)
3 - from the custom domain (myapp.userdomain) my application will identify the website
4 - according to the website, it will render pages
My concerns are:
1 - Is this the proper way of doing something like this (an online website builder)?
2 - Since I'm masking the URL, I will not be able to do something like
'http://www.myapp.userdomain.com/products',
and if the user refreshes the page it goes to the home page (http://www.myapp.userdomain.com). How do I avoid that?
3 - I'm thinking of using Rails and Liquid for this. Will that be a good option?
Thanks in advance.
Cheers,
sameera
Masking domains with redirects is going to get messy, plus all those redirects may not play nicely with SEO. Rails doesn't care whether you host everything under a common domain name. It's just as easy to detect the requested domain name as it is the requested subdomain.
I suggest pointing all of your end-user domains directly at the IP of your main server so that redirects are not required. Use the :domain and :subdomain conditions in the Rails router, or parse them in your application controller, to determine which site to actually render based on the hostname the user requested. This gives you added flexibility later, as you could tell Apache or Nginx which domains to listen for and set up different instances of your application to support rolling upgrades and things like that.
Sounds like this was @wukerplank's approach, and I agree. A custom router that looks at the domain name of the current request keeps the rest of your application simple.
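A minimal sketch of that hostname lookup in a Rails application controller; the Site model and its host column are assumptions for illustration, not part of the question:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :set_current_site

  private

  # Pick the tenant site from the hostname the visitor actually requested.
  def set_current_site
    @current_site = Site.find_by(host: request.host)
    head :not_found if @current_site.nil?   # unknown domain: nothing to render
  end
end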
You may get some more help by looking at the details of existing online site builders; you can look at Wix, Weebly, ecositebuilder, WordPress, and many others.