How do you dynamically edit robots.txt in a load balanced environment? - seo

Looks like we are going to have to start load balancing our webservers here soon.
We have a feature request to edit robots.txt dynamically which is not a problem for one host -- however once we get our load balancer up and going -- it sounds like I will have to scp the file over to the other host(s).
This sounds extremely 'bad'. How would you handle this situation?
I already let the client edit the 'robots' meta tag, which (imo) should effectively do the same thing he wants from the robots.txt editing, but I really don't know that much about SEO.
Maybe there is a completely different way of handling this?
UPDATE
Looks like we will store it in S3 for now and memcache it on the front end...
HOW WE ARE DOING IT NOW
So we are using Merb. I mapped a route to our robots.txt like so:
match('/robots.txt').to(:controller => 'welcome', :action => 'robots')
Then the relevant code looks like this:
def robots
  @cache = MMCACHE.clone
  begin
    # try memcached first...
    robot = @cache.get("/robots/robots.txt")
  rescue
    # ...and fall back to S3, repopulating the cache (0 = never expire)
    robot = S3.get('robots', "robots.txt")
    @cache.set("/robots/robots.txt", robot, 0)
  end
  @cache.quit
  return robot
end

I might have the app edit the contents of robots.txt and save the user input to a database. Then, at certain intervals, have a background process pull the latest from the DB and push it to each of your servers.
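A minimal sketch of that background process as a PHP script run from cron (the DSN, table, and paths are hypothetical):

<?php
// pull_robots.php -- run every few minutes from cron on each web server
$pdo = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');
$content = $pdo->query('SELECT content FROM robots_txt ORDER BY updated_at DESC LIMIT 1')
               ->fetchColumn();
if ($content !== false) {
    // write atomically (tmp file + rename) so a crawler never sees a half-written file
    file_put_contents('/var/www/public/robots.txt.tmp', $content);
    rename('/var/www/public/robots.txt.tmp', '/var/www/public/robots.txt');
}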

An alternative would be to have the reverse proxy that is doing your load balancing treat robots.txt differently. You could serve it directly from the reverse proxy, or have all requests for that file go to a single server. This makes a lot of sense since robots.txt is requested relatively infrequently.
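For instance, if the balancer happened to be nginx, special-casing robots.txt could look something like this (upstream names and paths are hypothetical):

# everything goes to the pool...
location / {
    proxy_pass http://app_cluster;
}

# ...except robots.txt, which is pinned to one designated backend
location = /robots.txt {
    proxy_pass http://app_server_1;
    # or serve a local copy straight from the proxy box:
    # root /etc/nginx/static;
}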

I'm not sure if you've got this sorted yet. If so, ignore. (UPDATE: I see the note on your original post, but this may be useful regardless.)
If you map a call to robots.txt to an HTTP handler or similar, you can generate the response from, say, a DB.

Serve it via whatever dynamic content generation you are using. It's just a file, nothing special.
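For example, a bare-bones PHP handler, assuming the web server maps /robots.txt to this script and a hypothetical robots_txt table:

<?php
// robots.php -- rewrite /robots.txt to this script in the server config
header('Content-Type: text/plain');
$pdo = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');
echo $pdo->query('SELECT content FROM robots_txt ORDER BY updated_at DESC LIMIT 1')
         ->fetchColumn();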

Related

Yii framework multi language, multi top level domain

My goal is to have this:
.com/english-urls - English (United States)
.com.br/portuguese-urls - Portuguese (Brazil)
.com.mx/spanish-urls - Spanish (México)
...
I already have working multilingual functionality using this Language Switcher: http://www.yiiframework.com/wiki/293/manage-target-language-in-multilingual-applications-a-language-selector-widget-i18n/
And URL localization using this: http://www.yiiframework.com/wiki/55/i18n-subdomains-and-url-rules/
Any idea on how to have the multi top level domain functionality?
Thanks in advance for contributing to Yii development.
There are a few different ways you can approach this:
1. Parameterized hostnames (a sketch follows below). See the guide for details on how to set it up: http://www.yiiframework.com/doc/guide/1.1/en/topics.url#parameterizing-hostnames
2. Use environment variables set as part of your web server configuration, depending on the domain name being used.
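For option 1, a minimal sketch of what the urlManager rules might look like in Yii 1.1 (the hostnames, route, and language codes here are just placeholders):

'components' => array(
    'urlManager' => array(
        'urlFormat' => 'path',
        'rules' => array(
            // each rule matches a full hostname and pins the language parameter
            'http://www.example.com/<view:.+>'    => array('site/page', 'defaultParams' => array('language' => 'en')),
            'http://www.example.com.br/<view:.+>' => array('site/page', 'defaultParams' => array('language' => 'pt_br')),
            'http://www.example.com.mx/<view:.+>' => array('site/page', 'defaultParams' => array('language' => 'es')),
        ),
    ),
),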
I've done #1 in the past and it works pretty well. One nasty side effect comes up if you have a site that runs with SSL but your devs work with non-SSL machines. Parameterized host names require the full http:// or https:// as part of the URL rule unless you extend CUrlManager.
Another bug I hit recently occurs if you use parameterized hostnames AND a baseUrl (https://github.com/yiisoft/yii/issues/3520). Probably not something to worry about, but an FYI that it is there.
Which is why the idea of using environment variables intrigues me. You might be able to load only the rule sets that match your given language, etc., but I haven't personally built a system using that approach.
Finally I got a solution!
Using this class: http://www.yiiframework.com/wiki/55/i18n-subdomains-and-url-rules/
1.- Define your top-level domain list.
public $domainList = array('www.example.com.mx' => 'es', 'www.example.com' => 'en');
2.- Comment out the unnecessary code.
3.- Detect SERVER_NAME and save it as activeLanguage.
$languageCode = $this->domainList[$_SERVER['SERVER_NAME']];
$this->activeLanguage = $this->isSupportedLanguage($languageCode);
4.- Create the links in your header or main layout.
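A hypothetical sketch of step 4, reusing the $domainList map from step 1 (this assumes the customised class above is your configured urlManager):

<?php foreach (Yii::app()->urlManager->domainList as $domain => $lang): ?>
    <link rel="alternate" hreflang="<?php echo $lang; ?>"
          href="http://<?php echo $domain . Yii::app()->request->requestUri; ?>" />
<?php endforeach; ?>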
Thanks to twitter.com/atrandafir and acorncom for contributing!

Domain Specific URL include() exemption

I want to allow a URL to be put inside PHP's include(), but only for certain domains, not every one. The other domains are owned by me and are on a separate server. If someone tries to include() any domain but these, I want it disallowed.
If this is not possible, is there a workaround?
My recommendation? Don't do it with includes. Executing code in that fashion is like swallowing a chocolate covered cherry bomb.
You could do it at the web server level, so a domain that is not allowed never gets to parse or fetch any files. Remember that every request goes through the web server to either serve static content or be parsed, all of which takes time, and then add the time for the PHP script to execute.
// hostnames that are allowed to be include()d; $url is the URL you were given
$allowed_domains = array(
    'stackoverflow.com',
    'www.stackoverflow.com',
    'facebook.com',
    'www.facebook.com',
    'google.com',
    'www.google.com',
);
if (!in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) {
    // throw an error
}

Modern File Structuring for Website Development

So I am fairly new to website development, PHP, MySQL, etc., so it's a given that I'll get some downvotes for my sheer lack of intelligence; I just want the answer haha.
I have probably jumped on the bandwagon or inherited a completely bad coding practice: instead of simplistic website structures such as stackoverflow.com/questions.php?q=ask (displaying content based on GET data), or even more simplistic ones such as stackoverflow.com/ask.php, we have the seemingly straightforward stackoverflow.com/questions/ask.
So what's the weird magic going on?
It's likely you're looking for mod_rewrite.
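In its simplest form, that's a couple of rewrite rules in an .htaccess file funnelling every "pretty" URL into one entry script (a generic sketch, not tied to any particular framework):

RewriteEngine On
# if the request doesn't map to a real file or directory...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# ...hand it to index.php, keeping the original path in the URL
RewriteRule ^(.*)$ index.php/$1 [L]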
mod_rewrite will work, but it really won't improve your actual file structure on the site (behind the scenes).
For that you would use a PHP framework. I would suggest starting with CodeIgniter which is simpler than Zend. (I don't have experience with CakePHP so I won't comment on that.)
You will need to configure routing to trap a URL so that it maps to a certain controller, then use a function to capture the rest of the URL as parameters:
public function _remap($method, $params = array())
{
    // route every URL segment to index(); CodeIgniter 2.x passes the method name first
    return call_user_func_array(array($this, 'index'), array_merge(array($method), $params));
}
Then, in the same controller change the index function like this:
function index($id = null)
{
    $data['question'] = /* get your data from the database */;
    $this->load->view('index', $data);
    return true;
}
This assumes you started with the welcome controller example in the zip file.
But, to answer your question more directly, there isn't really any magic going on. The browser is requesting a certain resource, and the server is returning that resource according to its own logic and how it is configured. The layout of files on the server is an internal issue; the browser only sees the representation of the server's state. Read up on the REST principle to understand this better.

Server with the sole purpose of setting cookies

At work we ran up against the problem of setting server-side cookies - a lot of them. Right now we have a PHP script, the sole purpose of which is to set a cookie on the client for our domain. This happens a lot more than 'normal' requests to the server (which is running an app), so we've discussed moving it to its own server. This would be an Apache server, probably dedicated, with one PHP script 3 lines long, just running over and over again.
Surely there must be a faster, better way of doing this, rather than starting up the whole PHP environment. Basically, I need something super simple that can sit around all day/night doing the following:
1. Check if a certain cookie is set, and
2. If that cookie is not set, fill it with a random hash (right now it's a simple md5(microtime); roughly the sketch below).
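(For reference, the kind of script meant here is presumably something like this hypothetical three-liner:)

<?php
// hypothetical version of the current cookie-setting script
if (!isset($_COOKIE['track'])) {
    setcookie('track', md5(microtime()), time() + 365 * 24 * 60 * 60, '/');
}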
Any suggestions?
You could create a simple HTTP server yourself to accept requests and return the Set-Cookie header with an empty body. This would allow you to move the cookie generation overhead to wherever you see fit.
I echo the sentiments above, though; unless cookie generation is significantly expensive, I don't think you will gain much by moving from your current setup.
By way of an example, here is an extremely simple server written with Tornado that simply sets a cookie on GET or HEAD requests to '/'. It includes an async example listening for '/async' which may be of use depending on what you are doing to get your cookie value.
import time

import tornado.ioloop
import tornado.web


class CookieHandler(tornado.web.RequestHandler):
    def get(self):
        cookie_value = str(time.time())
        self.set_cookie('a_nice_cookie', cookie_value, expires_days=10)
        # self.set_secure_cookie('a_double_choc_cookie', cookie_value)
        self.finish()

    def head(self):
        return self.get()


class AsyncCookieHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self._calculate_cookie_value(self._on_create_cookie)

    @tornado.web.asynchronous
    def head(self):
        self._calculate_cookie_value(self._on_create_cookie)

    def _on_create_cookie(self, cookie_value):
        self.set_cookie('double_choc_cookie', cookie_value, expires_days=10)
        self.finish()

    def _calculate_cookie_value(self, callback):
        ## meaningless async example... just wastes 2 seconds
        def _fake_expensive_op():
            val = str(time.time())
            callback(val)
        tornado.ioloop.IOLoop.instance().add_timeout(time.time() + 2, _fake_expensive_op)


application = tornado.web.Application([
    (r"/", CookieHandler),
    (r"/async", AsyncCookieHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
Launch this process with Supervisord and you'll have a simple, fast, low-overhead server that sets cookies.
You could try using mod_headers (usually available in the default install) to manually construct a Set-Cookie header and emit it -- no programming needed as long as it's the same cookie every time. Something like this could work in an .htaccess file:
Header add Set-Cookie "foo=bar; Path=/; Domain=.foo.com; Expires=Sun, 06 May 2012 00:00:00 GMT"
However, this won't work for you. There's no code here. It's just a stupid header. It can't come up with the new random value you'd want, and it can't adjust the expire date as is standard practice.
This would be an Apache server, probably dedicated, with one PHP script 3 lines long, just running over and over again. [...] Surely there must be a faster, better way of doing this, rather than starting up the whole PHP environment.
Are you using APC or another bytecode cache? If so, there's almost no startup cost. Because you're talking about setting up an entire server just for this, it sounds like you control the server as well. This means that you can turn off apc.stat for even less of a startup hit.
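For reference, the relevant php.ini settings would look something like this (a sketch; the option names come from APC's documentation):

; keep the opcode cache on
apc.enabled = 1
; skip per-request stat() checks; requires a reload to pick up code changes
apc.stat = 0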
Really though, if all that script is doing is building an md5 hash and setting a cookie, it should already be blisteringly fast, especially if it's mod_php. Do you already know, through benchmarking and testing, that the script isn't performing as well as you'd like? If so, can you share those benchmarks with us?
It would be interesting to know why you think you need an extra server: do you actually have a bottleneck generating the cookie, or is it somewhere else? Is it log writing, since requests happen a lot? Ajax polling? Client download speed?
For starters, I'd look at something more efficient than fetching the time to generate the "random hash". For example, on the Intel i7 laptop I have, generating 999999 MD5 hashes from microtime takes roughly 4 seconds, and doing the same thing with random numbers is a second faster (not taking seeding of rand into account).
Then, once you take the opening and closing of a socket into account, moving your script (which is most likely already really fast) to another server may actually slow the requests down. Actually, now that I've re-read your question: is your cookie-setter script already a dedicated page, or do you just include it into real content served by another PHP script? If it is a dedicated page, try the include approach. That would also be beneficial if you have default logging rules for Apache: when cookies are set on their own page, Apache logs a row for each of those requests, and on high-load systems this accumulates into real IO time spent by Apache.
Also, consider that testing whether the cookie is set and then setting it might be slower than just setting it unconditionally, whether it already exists or not.
But overall, I don't think you need to set up a server just to offload cookie generation, at least not without knowing more about how you handle the cookies now... unless you are doing something really nasty.
Apache has a module called mod_usertrack which looks like it might do exactly what you want. There's no need for PHP and you could likely create a really optimised lightweight Apache config to serve this with.
If you want to go for something even faster and are happy not to use Apache, you could use lighttpd and its mod_usertrack, or nginx's HttpUserId module.
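For the Apache route, a minimal sketch of the mod_usertrack config (the cookie name and expiry here are placeholders):

# path to the module varies by distribution
LoadModule usertrack_module modules/mod_usertrack.so
CookieTracking on
CookieName track
CookieExpires "2 weeks"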

Up and download directly - no waiting

I want to program something where you upload a file on one side and the other person can download it the moment I start uploading. I knew of such a service but can't remember the name. If you know the service, I'd like to know its name; if it's not there anymore, I'd like to program it as an open-source project.
And it is supposed to be a website.
What you're describing sounds a lot like Bit Torrent.
You might be able to achieve this by uploading via a custom ISAPI filter (if you use IIS). Most CGI implementations won't start running your script until the request has completed, which makes sense, as you won't have received all the values just yet; I'd suspect ISAPI may fall foul of this as well.
So, your next best bet is to write a custom HTTP server, that can handle the serving of files yet to finish uploading.
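As a rough illustration of the serving half, a PHP sketch that streams a file to the client while another process is still writing it (the path and the ".done" marker convention are made up for this example):

<?php
$path = '/tmp/uploads/transfer.bin';  // hypothetical in-progress upload
header('Content-Type: application/octet-stream');
$fp = fopen($path, 'rb');
while (true) {
    $chunk = fread($fp, 8192);
    if ($chunk !== false && $chunk !== '') {
        echo $chunk;              // relay bytes as they arrive
        flush();
    } elseif (file_exists($path . '.done')) {
        break;                    // uploader finished and the file is drained
    } else {
        clearstatcache();         // don't let PHP cache the file_exists() result
        fseek($fp, 0, SEEK_CUR);  // clear the stream's EOF flag before retrying
        usleep(100000);           // wait briefly for more data
    }
}
fclose($fp);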
I found it: pipebytes.com :)