I have a Single Page Application with some dynamic content, but the meta tags don't work when pages are shared on social sites (for obvious reasons: the crawlers don't execute the JavaScript that sets them). I was thinking of detecting the user agent on the server side and rendering a static version of the page when I detect Googlebot, Facebook, or other crawlers.
Is this good practice? Would it land me in any trouble with any of the social sites/search engines?
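For reference, here is roughly the mechanism I have in mind, as a minimal sketch assuming a Node/Express server; the bot list and the renderStatic/renderSpa helpers are placeholders for whatever prerenderer and SPA shell are actually used:

```typescript
import express, { Request, Response, NextFunction } from "express";

const app = express();

// Illustrative list of crawler user-agent fragments (not exhaustive).
const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

// Stand-in for a real prerenderer that produces HTML with the meta tags filled in.
async function renderStatic(path: string): Promise<string> {
  return `<!doctype html><html><head><meta property="og:title" content="Title for ${path}"></head><body>...</body></html>`;
}

// Stand-in for the normal SPA shell whose content is filled in by client-side JS.
async function renderSpa(_path: string): Promise<string> {
  return `<!doctype html><html><head></head><body><div id="app"></div><script src="/bundle.js"></script></body></html>`;
}

app.get("*", async (req: Request, res: Response, next: NextFunction) => {
  try {
    const ua = req.headers["user-agent"] ?? "";
    // Serve the prerendered page only to known crawlers; everyone else gets the SPA.
    const html = BOT_PATTERN.test(ua) ? await renderStatic(req.path) : await renderSpa(req.path);
    res.send(html);
  } catch (err) {
    next(err);
  }
});

app.listen(3000);
```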
I think you may get banned by Google for this.
It is not a good idea at all, because Google will count it as black-hat SEO.
Google wants to see the same content that a normal user sees.
My two partners and I are about to create software that automates liking, commenting, and following on Instagram using browser simulation (meaning we log into the user's account through a browser such as Google Chrome).
Is that kind of automation allowed by Instagram? And if not, is there a possibility of getting approved?
Yes, it's against their terms. I wouldn't bother or risk it. Instagram is actively suing bot services. Look at the biggest bot service, Instagress, which mysteriously shut down entirely.
They're also penalizing accounts that use bots. I run an agency and have seen my clients' engagement mysteriously drop by 50-90% for a seemingly endless amount of time after using bots.
I imagine the purpose of doing it with "browser simulation" like Chrome is to try to avoid detection? Good luck. Instagram is smart and of course has some of the best programmers in the world who know how to combat this type of stuff.
I would say that such an operation goes against Instagram's Terms of Use. Under "General Description", section 10:
We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).
Since you will be accessing content (and performing actions) via automated means, I would interpret that as a violation of this section.
I was just dabbling a little with the Instagram API and noticed that all the media links (e.g. image / video URLs) that point to the fbcdn are publicly accessible.
I usually try to use signed URLs for user-generated content and was wondering why Instagram apparently chooses not to do that?
If they did that, the URLs used to render the webpage or the app could be valid for only a couple of minutes, so if someone collected them (for example, to scrape user profiles) they would no longer be accessible afterwards. I know this is not perfect, but it feels like it would at least give the privacy aspect a little extra protection...
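For illustration, here is a minimal sketch of what I mean by expiring signed URLs, using an HMAC over the path plus an expiry timestamp; the secret, parameter names, and verification step are my own assumptions, not anything Instagram actually documents:

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Assumed server-side secret shared with the CDN edge; purely illustrative.
const SECRET = "server-side-secret";

// Build a URL that is only valid for `ttlSeconds` from now.
function signUrl(path: string, ttlSeconds: number): string {
  const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
  const signature = createHmac("sha256", SECRET)
    .update(`${path}:${expiresAt}`)
    .digest("hex");
  return `${path}?exp=${expiresAt}&sig=${signature}`;
}

// Check the expiry and the signature before serving the media file.
function verifyUrl(path: string, exp: string, sig: string): boolean {
  if (Math.floor(Date.now() / 1000) > Number(exp)) return false;
  const expected = createHmac("sha256", SECRET)
    .update(`${path}:${exp}`)
    .digest("hex");
  return expected.length === sig.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(sig));
}

// Example: a media link that stops working after five minutes.
console.log(signUrl("/media/abc123.jpg", 300));
```

A scraper that harvests such links would find them dead once the expiry has passed, which is the small privacy gain I was getting at.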
Before responsive design we needed mobile-specific sitemaps, but with responsive design they were no longer needed.
But with the introduction of Accelerated Mobile Pages (AMP), we again have mobile-specific URLs, so my questions are:
Do we need a separate (mobile) sitemap for AMP pages?
If yes, then what schema should we use?
The old schema http://www.google.com/schemas/sitemap-mobile/1.0, or something new?
No need, provided you have a rel="amphtml" link in your regular page to tell crawlers where the AMP HTML version is, as discussed here:
https://www.ampproject.org/docs/guides/discovery.html
Similarly, your AMP pages should have a rel="canonical" link pointing to the real page, to avoid search engines thinking you have duplicate content.
In fact, for Google, the Google Search Console for your site has an AMP section (under Search Appearance) that shows all the AMP pages it has found and whether there are any problems with them.
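To make the relationship concrete, here is a minimal sketch of the two discovery links, written as small TypeScript helpers that emit the markup; the URLs are placeholders:

```typescript
// Emits the tag that goes in the <head> of the regular (canonical) page.
function amphtmlLink(ampUrl: string): string {
  return `<link rel="amphtml" href="${ampUrl}">`;
}

// Emits the tag that goes in the <head> of the AMP page, pointing back at the canonical page.
function canonicalLink(pageUrl: string): string {
  return `<link rel="canonical" href="${pageUrl}">`;
}

console.log(amphtmlLink("https://example.com/article/amp"));
console.log(canonicalLink("https://example.com/article"));
```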
As BazzaDP said, there is no need for a separate sitemap, but you do need to add the rel="amphtml" link to the head of the page. That said, having a separate sitemap for the AMP pages can make it easier for the Google crawler to detect them and display them in search results, though it is not necessary. My opinion: if creating a sitemap for the AMP pages is difficult with your stack, leave it; if not, do it, as it will also allow other search engines to detect them easily. Creating a separate sitemap doesn't give you any particular advantage.
As for your question, there is no need for it.
Will Web API based website suffer SEO problems?
Given that all content of a page is pulled in by JavaScript...
will search engine crawlers be able to get the page content?
I have heard that crawlers do not always support or execute JavaScript when crawling a page.
It's not Web API that is bad for SEO, it's choosing an architecture where you use a web browser to navigate to empty HTML pages and then use JS to pull in the data. ASP.NET Web API does not have to be used that way.
You can't blame a hammer for building a bad house.
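To make that concrete, here is a hedged sketch (generic Node/Express rather than ASP.NET, with made-up routes): the first handler serves an empty shell that relies on client-side JavaScript to fetch the content, while the second renders the same content into the HTML on the server, so any crawler sees it in the initial response.

```typescript
import express from "express";

const app = express();

// Pretend data source; in a real app this would be a database or API call.
async function getArticle(id: string): Promise<{ title: string; body: string }> {
  return { title: `Article ${id}`, body: "Content that crawlers should be able to see." };
}

// Pattern 1: empty shell, content fetched by client-side JS after load.
// Crawlers that don't execute JavaScript see nothing useful here.
app.get("/spa/article/:id", (_req, res) => {
  res.send(`<!doctype html><html><body><div id="app"></div><script src="/bundle.js"></script></body></html>`);
});

// Pattern 2: the same data rendered into the HTML on the server.
// The content is present in the initial response, JavaScript or not.
app.get("/ssr/article/:id", async (req, res) => {
  const article = await getArticle(req.params.id);
  res.send(`<!doctype html><html><body><h1>${article.title}</h1><p>${article.body}</p></body></html>`);
});

app.listen(3000);
```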
Depends.
Will ALL search engine crawlers be able to get the page content? I do not know.
Do the search engine crawlers that matter get the page content? Yes.
Google and Bing combined own the search market, and both can index content pulled in by JavaScript (and probably others can as well).
Robert Scavilla on how content is indexed.
Search Engine Land on Google executing JavaScript for indexing.
I have a site that has been developed completely in Flash. The site owners do not want to shift to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site that Googlebot will be redirected to (by detecting the user agent). My question is: is this officially allowed by Google?
If not, then how come there are many subscription-based sites that display a different set of data to Google than to users? Is that allowed?
Thank you very much.
I've dealt with this exact scenario for a large ecommerce site and Google essentially ignored the site. Google considers it cloaking and addresses it directly here and says:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Instead, create an ADA-compliant version of the website so that users with screen readers and vision aids can use your site. As long as there is a link from your home page to your ADA-compliant pages, Google will index them.
The official advice seems to be: offer a visible link to a non-Flash version of the site. Fooling Googlebot is a surefire way to get in trouble. And remember, Google results will link to the matching page! Don't create useless results.
Google already indexes Flash content, so my suggestion would be to check how your site is being indexed. Maybe you don't have to do anything.
I don't think showing an alternate version of the site is good from a Google perspective.
If you serve up your page at the exact same address, then you're probably fine. For example, if you show 'http://www.somesite.com/' but direct Googlebot to 'http://www.somesite.com/alt.htm', then Google might direct search users to alt.htm. You don't want that, right?
This is called cloaking. I'm not sure what the effects of it are, but it is certainly not white hat. I am pretty sure Google is working on a way to crawl Flash now, so it might not even be a concern.
I'm assuming you're not really doing a redirect but instead a PHP include or something similar, so it shows up as the same page. If you're actually redirecting, then it's just going to index the other page as normal.
Some sites offer a different level of content -- they LIMIT the content; they don't offer alternative or additional content. This is generally done so unrelated things don't get indexed.