SEO and stripping UTM parameters with Varnish [closed]

Recently I had a problem where a client of mine sent out an email with MailChimp containing UTM (Google) and MC (Mailchimp) parameters in the URL.
Since the link was pointing to a Magento 2 site with Varnish running, I had to come up with a fix for that; otherwise Varnish would create a lot of different cache entries for the "unique" URLs.
Now, by using this adjusted snippet in the Varnish .vcl, I was able to strip these parameters:
if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|mc_[a-z]+|utm_[a-z]+)=") {
    # Strip each matching name=value pair, including a trailing "&" if present
    set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|mc_[a-z]+|utm_[a-z]+)=[-_A-Za-z0-9+()%.]+&?", "");
    # Remove any "?" or "&" left dangling at the end of the URL
    set req.url = regsub(req.url, "[?&]+$", "");
}
And this works pretty well: it strips the parameters from the URL.
BUT I can't find a clear explanation of whether this will in any way affect SEO or Analytics tracking - I Googled it as much as I could, but found nothing definitive.
Anyone here with a solution and / or explanation?

This will not affect SEO in any way. Those parameters are typically added by Google itself (Analytics, AdWords) or by email marketing campaigns that use the same conventions. Search engines will not see those links, so there's no impact on SEO whatsoever.
The parameters mentioned are used by JavaScript libraries and never by the PHP scripts, so what you did for better cacheability is correct. The browser's JavaScript engine will still see them because it has access to the full URL; the PHP backend (Magento) does not need them.
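For context, here is a rough sketch of where such a snippet would typically live (assuming Varnish 4.x VCL syntax): the rewrite runs in vcl_recv before the cache lookup, so all tagged variants of a URL collapse onto a single cache object.
sub vcl_recv {
    # Normalize the URL used as the cache key and sent to the backend.
    # The browser still sees the original URL with all tracking parameters.
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|mc_[a-z]+|utm_[a-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|mc_[a-z]+|utm_[a-z]+)=[-_A-Za-z0-9+()%.]+&?", "");
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
Google Analytics reads the utm_* values client-side from the URL in the browser's address bar, so stripping them at the Varnish layer does not break campaign tracking.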

Related

Search engine page creating [closed]

I noticed that Google uses web page content when it indexes pages for SEO purposes. So what I did was create the web pages and use a lot of keywords on them. Then I applied the background color to those keywords to show users.
The question is: do they block this kind of page?
Search Engine Optimization (SEO) is something you really need an expert for these days. The days of relying on a few keywords and some metadata are long gone, so you need to keep up to date with current SEO tricks to move your site up the Google rankings. You can also check the Alexa rankings for your website.
Take a look at the SEO guidelines from Google here
Take a look at some pointers here and here, but you really need to invest some time and research into the best practices.
You should also make your site as accessible as possible; this will make it easier to spider. There are some tools here to look at, and there's a site here you can use.

what is the meaning of Disallow: /search in robot.txt? [closed]

I found this in my root.txt file
Disallow: /search
what does it mean?
If you're talking about a robots.txt file, then it indicates to web crawlers that they are to avoid going into URLs beginning with /search on that host. Your robots.txt file is related to the Robots Exclusion Standard.
You mention "robot.txt" in the question title and "root.txt" in the body. If this is indeed a robots.txt file, it needs to be named "robots.txt", otherwise it has no effect at all.
It instructs robots/crawlers/spiders that they shouldn't access anything within that folder, or variants of that URL, such as the following examples (a complete robots.txt block follows the list):
/search
/search?term=x
/search/page/
/search/category=y&term=x
/search/category-name/term/
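In context, the directive usually sits inside a block like this (the wildcard user-agent line is an assumption; yours may target a specific bot):
User-agent: *
Disallow: /search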
With regards to the comments above on how this affects indexation (whether or not a search engine or other entity will catalogue the URL), none of them are quite correct.
It should be noted that instructions in a robots.txt file are crawl directives, not indexation directives. Whilst compliant bots will read the robots.txt file prior to requesting a URL and determine whether or not they're allowed to crawl it, disallow rules do not prevent indexation (nor even, in the case of non-compliant bots, prevent access/crawling/scraping).
You'll periodically see instances of search results in Google with a meta description alluding to the page having been included though inaccessible; something along the lines of "we weren't able to show a description because we're not allowed to crawl this page". This typically happens when Google (or whatever engine) encounters a disallowed URL but believes that the URL should still be catalogued - in Google's case, usually when a highly linked and/or authoritative URL is disallowed.
To prevent indexation, you're much better off using an on-page meta robots tag, or an X-Robots-Tag HTTP header (particularly useful for non-page resources, such as PDFs).
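For illustration, the two mechanisms look roughly like this (the Apache configuration is an assumption; any server that can set response headers will do):
<meta name="robots" content="noindex, follow">
# Apache example using mod_headers: mark all PDFs as noindex
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>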
"Disallow: /search" tells search engine robots not to index and crawl those links which contains "/search" For example if the link is http://yourblog.blogspot.com/search.html/bla-bla-bla then robots won't crawl and index this link.

SEO, Remove OLD pages from Google? [closed]

A site I worked on recently used to be Joomla-based, had a ton of articles in it, and the entire business is now different.
After clearing out the site over FTP, starting fresh, and finally finishing everything, the site's rankings on Google are plagued by old pages which no longer exist. Furthermore, Google seems to think these pages still exist.
I was under the impression that after recrawling the site (at whatever time it saw fit) Google would recognise those pages are now non-existent and replace them when it could.
It's driving me insane. There are hundreds of pages, so I can't put in requests to remove them all. Won't they ever be removed automatically?
It will take a while but they will eventually stop looking for those pages. They keep trying for a little while under the assumption that their being missing is an error and they will return. If you're not going to do removal requests then you will have to simply wait it out.
Make sure all old pages return a 404 or 410 status. If Googlebot encounters a 404/410 status multiple times, it will remove those pages from the index.
I'd also suggest checking whether any of those pages have backlinks. If Googlebot keeps encountering backlinks to an outdated page, it might still hold that page in the search index. If some pages do have valid backlinks, 301-redirect them to valid pages (a rough example follows).
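As a rough illustration (assuming an Apache host with mod_alias enabled; the paths are invented), the two cases might look like this in .htaccess:
# Old article with no valuable backlinks: tell crawlers it is gone for good
Redirect gone /old-joomla-article.html
# Old article that still earns backlinks: send visitors and link equity to the replacement
Redirect 301 /old-category/old-article.html /new-page/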
Try the answer using Google Webmaster Tools, which you may find here.

Is serving a bot-friendly page to google-bot likely to adversely affect SEO? [closed]

I have a site whose homepage contains a great deal of JavaScript. I am conscious that this isn't great for mobile clients, JavaScript-less browsers and crawlers/bots. The page uses proper <noscript /> alternatives, alt attributes, etc.
The user agent can easily be sniffed to serve up the page content without JavaScript (there is a non-JavaScript version of the content already on the site), but I don't want to be seen to be cheating crawlers (google-bot).
Humans that use mobile clients and JavaScript-less browsers would surely appreciate a tailored version (given an option to switch back to the full version if they want). Bots might think they're being cheated.
Finally, the site has been indexed very well so far, so I am tempted to not tailor it for google-bot, just for humans that use mobile clients and javascript-less browsers. It seems like a safer option.
If you serve different content to the search engines than you do to your users, you are cloaking and definitely in violation of Google's terms of service.
The proper way to handle content generated with JavaScript is to use progressive enhancement. This means that all of your content is available without JavaScript being required to fetch or display it. Then you enhance that content using JavaScript. This way everyone has access to the same content, but users with JavaScript get a better experience. This is good usability and good for SEO.
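A minimal sketch of that idea (the element IDs and wording are made up for illustration): the content is plain HTML that every client and crawler receives, and a small script only enhances it when JavaScript is available.
<!-- Content is ordinary HTML, visible to crawlers and no-JS clients alike -->
<div id="product-description">
  <p>The full product description lives here as plain HTML.</p>
</div>
<script>
  // Enhancement only: collapse the description behind a "Show more" button.
  // Without JavaScript, the full text above is simply shown as-is.
  (function () {
    var box = document.getElementById('product-description');
    var button = document.createElement('button');
    button.textContent = 'Show more';
    box.style.display = 'none';
    button.addEventListener('click', function () {
      box.style.display = '';
      button.parentNode.removeChild(button);
    });
    box.parentNode.insertBefore(button, box);
  })();
</script>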

How should google crawl my blog? [closed]

I was wondering how (or whether) I should guide Googlebot through my blog. Should I only allow it to visit pages with single entries, or should it also crawl the main page (which also shows full entries)? My concern is that the main page changes whenever I add a new post, and Google keeps the old version for some time. I also find directing people to the main page annoying - you have to look through all the posts before you find the one you're interested in. So what is the proper way to solve this issue?
Why not submit a sitemap with the appropriate <changefreq> tags -- if you set that to "always" for the homepage, the crawler will know that your homepage is very volatile (and you can have accurate change freq for other URLs too, of course). You can also give a lower priority to your homepage and a higher one to the pages you prefer to see higher in the index.
I do not recommend telling crawlers to avoid indexing your homepage completely, as that would throw away any link juice you might be getting from links to it from other sites -- tweaking change freq and priority seems preferable.
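As a rough sketch (the URLs are placeholders), the relevant part of a sitemap.xml might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Homepage: changes constantly, lower priority -->
  <url>
    <loc>https://example.com/</loc>
    <changefreq>always</changefreq>
    <priority>0.5</priority>
  </url>
  <!-- A single blog entry: changes rarely, preferred landing page -->
  <url>
    <loc>https://example.com/posts/my-entry/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>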
Make a sitemap.xml and regenerate it periodically. Check out Google Webmaster Tools.