I have a Meteor project with the spiderable package added. If I load the page normally and then view the page source, I don't get anything in the <body> tag. If I append the ugly ?_escaped_fragment_= to the URL and look at the page source again, everything shows up as it should. I think this means the spiderable package is working and is correctly rendering the HTML with PhantomJS.
So the question is: how do I make the regular URL, without the ugly part, crawlable? I want to submit the site to Google AdSense, and the ugly URL is not accepted. Trying to see what Google sees with the http://www.feedthebot.com/tools/spider/ tool returns an empty result. Any suggestions?
Edit 1: Adding the Google crawl result from Google Webmaster Tools
Date: Saturday, April 5, 2014 at 8:13:45 PM PDT
Googlebot Type: Web
Download Time (in milliseconds): 304
HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
Date: Sun, 06 Apr 2014 03:13:58 GMT
Connection: keep-alive
Transfer-Encoding: chunked
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="/7a2b57749c356bfba1728bdae88febf653d0c6ca.css?meteor_css_resource=true">
<script type='text/javascript'>__meteor_runtime_config__ = {"meteorRelease":"0.7.2","PUBLIC_SETTINGS":{"ga":{"account":"UA-********-1"}},"ROOT_URL":"http://****.***","ROOT_URL_PATH_PREFIX":"","autoupdateVersion":"8213872485a2cc1cff2745d78330d7c8db8d8899"};</script>
<script type="text/javascript" src="/caefe2b2510e562c5e310f649c4ff564ddb6b519.js"></script>
<script type='text/javascript'>
if (typeof Package === 'undefined' ||
! Package.webapp ||
! Package.webapp.WebApp ||
! Package.webapp.WebApp._isCssLoaded())
document.location.reload();
</script>
<meta name="fragment" content="!">
</head>
<body>
</body>
</html>
Edit 2:
For now it seems that Google indexes the site correctly, but AdSense doesn't use the same policies, which is the core of this issue for me. Meteor + spiderable + PhantomJS = incompatible with AdSense, but compatible with indexing by Google.
The issue appears to be simply how Google reports the crawl in Webmaster Tools. After some testing with a dummy app, it appears that even though Google Webmaster Tools reports that it fetched an empty page, the site still gets crawled, indexed, and cached properly by Google.
So for some reason it shows the result for the pretty URL, even though the ugly URL is the actual page getting crawled, as expected. This doesn't seem like a problem specific to Meteor, but rather one with Webmaster Tools. The spiderable package appears to be working as expected.
After all, http://meteor.com, http://docs.meteor.com, and http://atmosphere.meteor.com are all running Meteor and they are indexed/cached fine on Google.
One way you can verify that your site is being crawled without submitting it to be indexed is to look at the thumbnail of the site on your Webmaster Tools homepage:
https://www.google.com/webmasters/tools/home?hl=en
If you're running Apache, you could set up a mod_rewrite rule that pushes every request for a missing file to a script. The script would check whether the request points to a special folder (like the 'content' folder below) and, if so, pull the content for the corresponding ugly URL.
The change to the .htaccess file would look something like this:
RewriteEngine on
# If the request doesn't match an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...or an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...hand it to director.php, preserving the original query string
RewriteRule (.*) /director.php?q=$1 [L,QSA]
The director.php script would work something like this (a rough sketch follows these steps):
Check whether the unmatched request is targeting a specific folder, like 'content'
Example: http://myplace.com/content/f-re=feedddffv
Convert the unknown URL into the corresponding ugly URL
http://myplace.com/content/f-re=feedddffv becomes http://myplace.com/?f-re=feedddffv
The script then uses cURL to pull the ugly URL's content into a variable
Echo the content to the viewer
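A minimal sketch of what director.php could look like, assuming the 'content' folder convention and the example URLs above (the folder name, query handling, and cURL options are illustrative, not from the original answer):
<?php
// director.php - rough sketch; receives the original request path in ?q=
$q = isset($_GET['q']) ? $_GET['q'] : '';
$q = ltrim($q, '/'); // the rewrite may or may not pass a leading slash

// Only handle requests that target the special 'content' folder.
if (strpos($q, 'content/') !== 0) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// Convert the pretty URL into the corresponding ugly URL, e.g.
// /content/f-re=feedddffv  ->  /?f-re=feedddffv
$uglyUrl = 'http://myplace.com/?' . substr($q, strlen('content/'));

// Pull the ugly URL's content into a variable with cURL...
$ch = curl_init($uglyUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$content = curl_exec($ch);
curl_close($ch);

// ...and echo it to the viewer.
header('Content-Type: text/html; charset=utf-8');
echo $content;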
You also need to create a sitemap for the search engines with the new pretty links. You can do something similar in IIS with URL Rewrite. Using something like cURL can be slow, so try to keep your sitemap away from human eyes if possible.
Related
I'm trying to optimise the speed of a web page by adding the Link HTTP header to instruct the browser to preload some assets and data used on the page. However, most visitors will get redirected first, before they reach the actual HTML page.
Does it make sense to already include the Link header in the response that contains the redirect? Or will the browser not use those assets for the redirected page anyway, so can I better leave them out to prevent unnecessary requests?
Example
The user first opens /initial, and gets redirected from there:
GET /initial HTTP/1.1
HTTP/1.1 302 Found
Link: </static/css/bundle.css>; rel=preload; as=style
Location: /final
The browser then follows the redirect to /final:
GET /final HTTP/1.1
HTTP/1.1 200 OK
Link: </static/css/bundle.css>; rel=preload; as=style
<html>
<link rel="stylesheet" href="/static/css/bundle.css">
...
So, does it make sense that the Link header is already included in the first 302 response, or can I better leave it out there?
I have an ecommerce site with hundreds of products. I recently changed my permalinks and their base. Using WordPress and the WooCommerce plugin, I removed /shop/%product-category% from the URL. However, my old URLs are still active. Check out the following example:
greenenvysupply.com/shop/accessories/gro1-1-3mp-usb-led-digital-microscope-10x-300x/
greenenvysupply.com/gro1-1-3mp-usb-led-digital-microscope-10x-300x/
The first URL is old. Why does it still work? Shouldn't I get a 404 page?
Here is code from page source related to the canonical:
href="https://www.greenenvysupply.com/shop/feed/" />
<link rel='canonical' href='https://www.greenenvysupply.com/gro1-1-3mp-usb-led-digital-microscope-10x-300x/' />
<meta name="description" content="The 1.3 Mega-Pixel USB LED Digital Microscope is great for identifying pests and diseases on your plants so you can accurately resolve the problem."/>
<link rel="canonical" href="https://www.greenenvysupply.com/gro1-1-3mp-usb-led-digital-microscope-10x-300x/" />
Because the old URL is still active and not redirecting, my entire website is being seen as having duplicate content. Google's crawlers are not being redirected. Why is the URL with /shop/ in it still active even though I have changed the permalink structure? There has got to be an easy fix for this.
A canonical URL or other metadata in your response is not the same as a redirect. To accomplish a redirect, your server needs to return a 3xx status code (typically a 301 or 308 for a permanent move, as you have here, or a 302 or 307 for a temporary move) along with a Location header that indicates the URL to redirect to. How exactly you make your server do this depends on the type of server or server framework you happen to be using for your website.
How to accomplish a redirect is somewhat independent of your implicit SEO question about whether to prefer a redirect over a canonical URL, which I'm afraid I can't answer. Regardless of the approach you use, though, be aware that search engines -- Google or otherwise -- may not reflect changes to your website immediately, so don't panic if you don't see the desired result right after making a change.
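For illustration only: if you handled the redirect at the PHP level rather than through a WordPress redirection plugin or server rewrite rules, a permanent redirect is just the status code plus the Location header. A minimal sketch, using the new permalink from your example as the target:
<?php
// Sketch: send a 301 (permanent) redirect to the new pretty URL.
header('Location: https://www.greenenvysupply.com/gro1-1-3mp-usb-led-digital-microscope-10x-300x/', true, 301);
exit;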
My website is here, and visiting it in Chrome gives the 'load unsafe script' warning and insecure content errors in the console. Firefox loads the site, but there isn't a lock icon.
My site is entirely in PHP, and I'm not sure where to start. The console and Firebug say the site is loading insecure scripts over HTTP, but how do I make it all HTTPS?
Thanks in advance!
Your HTML has lots of links to http:// resources, e.g.:
<link rel="stylesheet" type="text/css" href="http://portal.thespartaninstitute.com/...">
You need to ditch the http: part and just link to //portal.thespartaninstitute.com/... - that way the resources are requested over HTTPS whenever the page itself has been loaded over HTTPS.
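Since the site is generated by PHP, the fix usually lives in whatever template or include emits those tags. A sketch of the corrected output (the stylesheet filename here is just a placeholder, not your actual path):
<?php
// Protocol-relative URL: the browser reuses the scheme (http or https)
// that the page itself was loaded with. "main.css" is a placeholder path.
echo '<link rel="stylesheet" type="text/css" href="//portal.thespartaninstitute.com/main.css">';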
I'm tearing my hair out over Internet Explorer 9's caching.
I set a series of cookies from a perl script depending on a query string value. These cookies hold information about various things on the page like banners and colours.
The problem I'm having is that in IE9 it will always, ALWAYS, use the cache instead of using the new values. The sequence of events runs like this:
Visit www.example.com/?color=blue
Perl script sets cookies, I am redirected back to www.example.com
Colours are blue, everything is as expected.
Visit www.example.com/?color=red
Cookies set, redirected, colours set to red, all is normal
Re-visit www.example.com/?color=blue
Perl script runs, cookies are re-set (I have confirmed this), but IE9 retrieves all resources from the cache, so after the redirect all my colours stay red.
So, every time I visit a new URL it gets the resources fresh, but each time I visit a previously visited URL it retrieves them from the cache.
The following meta tags are in the <head> of example.com, which I thought would prevent the cache from being used:
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META HTTP-EQUIV="EXPIRES" CONTENT="0">
For what it's worth - I've also tried <META HTTP-EQUIV="EXPIRES" CONTENT="-1">
IE9 seems to ignore ALL these directives. The only time I've had success so far in that browser is by using developer tools and ensuring that it is manually set to "Always refresh from server"
Why is IE ignoring my headers, and how can I force it to check the server each time?
Those are not headers. They are <meta> elements, which are an extremely poor substitute for HTTP headers. I suggest you read Mark Nottingham's caching tutorial; it goes into detail about this and about which caching directives are appropriate to use.
Also, ignore anybody telling you to set the caching to private. That enables caching in the browser - it says "this is okay to cache as long as you don't forward it on to another client".
Try sending the following as HTTP Headers (not meta tags):
Cache-Control: private, must-revalidate, max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 GMT
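Since the page is generated by a script (Perl in your case, but the idea is the same in any language), the headers have to be sent with the response itself rather than placed in the markup. A PHP sketch, just for illustration:
<?php
// Send the caching directives as real HTTP headers, before any output.
header('Cache-Control: private, must-revalidate, max-age=0');
header('Expires: Thu, 01 Jan 1970 00:00:00 GMT');
// ... then render the page as usual ...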
I don't know if this will be useful to anybody, but I had a similar problem on my movies website (crosstastemovies.com). Whenever I clicked the "get more movies" button (which retrieves a new random batch of movies to rate), IE9 would return the exact same page and ignore the server's response... :P
I had to append a random variable to the query string to keep IE9 from doing this. So instead of calling "index.php?location=rate_movies" I changed it to "index.php?location=rate_movies&rand=RANDOMSTRING".
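For example, something along these lines (a sketch; the parameter name and uniqid() are just one way to generate a throwaway value):
<?php
// Append a cache-busting parameter so IE9 sees a new URL every time.
$url = 'index.php?location=rate_movies&rand=' . uniqid();
echo '<a href="' . htmlspecialchars($url) . '">get more movies</a>';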
Everything is ok now.
Cheers
I'll just mention that I had a problem that looked very much like this, but when I tried IE9 on a different computer there was no issue. Going to Internet Options -> General -> Delete and deleting everything restored correct behaviour; deleting the cache alone was not sufficient.
The only http-equiv values that HTML5 specifies are content-type, default-style and refresh. See the spec.
Anything else that seems to work is only by the grace of the browser and you can't depend on it.
johnstok is correct. Typing in that code will allow content to update from the server and not just refresh the page.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Cache-Control" content="no-cache" />
Put these lines into your <head> section if you need to handle it in your ASP code, and it should work.
I'm having an issue with a friend's iWeb website - http://www.africanhopecrafts.org. Rather than the pages displaying in the browser, they download instead, even though they're all HTML files. I've tried messing with my .htaccess file to see if that was affecting it, but nothing's working.
Thanks so much
Most likely your friend's web site is dishing up the wrong MIME type. The web server might be misconfigured, but the page can override the Content-Type response header by adding a <meta> tag to the page's <head> like this:
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
(where the charset reflects that of the actual web page).
If the page is being served up with the correct content type, the browser might be misconfigured not to handle that content type. Does the problem occur for everybody, or just you? Is the problem dependent on the browser in use?
You can sniff the content type by installing Firefox's Tamper Data plug-in. Fire up Firefox, start Tamper Data, and fetch the errant web page. Examining the response headers for the request should tell you what content type the page is being served with.