Communicating between two processes on Heroku (what port to use)

I have a Procfile like so:
web: bundle exec rails server -p $PORT
em: script/eventmachine
The em process starts an EventMachine server with start_server (on port ENV['PORT']), and my web process occasionally needs to communicate with it.
My question is: how does the web process know which port to reach it on? If I understand Heroku correctly, it assigns a random port when the process starts up (and the port can change if the process is killed or restarted). Thanks!

According to Heroku documentation,
Two processes running on the same dyno can communicate over TCP/IP using whatever ports they want.
Two processes running on different dynos cannot communicate over TCP/IP at all. They need to use memcached, or the database, or one of the Heroku plugins, to communicate.
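For the first case (two processes on the same dyno), a minimal sketch is for both sides to agree on a fixed internal port rather than relying on $PORT. The EM_PORT variable name is an assumption made for illustration, not something Heroku provides, and a plain Ruby TCPServer stands in here for the EventMachine process.

```ruby
require "socket"

# Hypothetical convention: both processes read the same EM_PORT value.
# "0" asks the OS for any free port, which is convenient for a local demo.
def start_ack_server(port)
  server = TCPServer.new("127.0.0.1", port)
  thread = Thread.new do
    loop do
      client = server.accept
      line = client.gets             # read one request line
      client.puts("ack: #{line}")    # reply, then close the connection
      client.close
    end
  end
  [server, thread]
end

# Client side: what the web process would do when it needs the other process.
def send_message(port, message)
  TCPSocket.open("127.0.0.1", port) do |sock|
    sock.puts(message)
    sock.gets.to_s.chomp
  end
end

server, thread = start_ack_server(Integer(ENV.fetch("EM_PORT", "0")))
port = server.addr[1]                # the port the server actually bound to
reply = send_message(port, "hello")
puts reply

thread.kill
server.close
```

This only helps when both processes really share a dyno; across dynos the second rule above still applies.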

Processes are isolated and cannot communicate directly with each other.
http://www.12factor.net/processes
There are, however, a few other ways. One is to use a backing service such as Redis or Postgres as an intermediary; another is to use a FIFO (named pipe) to communicate.
http://en.wikipedia.org/wiki/FIFO
It is a good thing that your processes are isolated and share-nothing, but you do need to architect your application slightly differently to accommodate this.

I'm reading this while on my commute to work. So I haven't tried anything with it (sorry) but this looks relevant and potentially awesome.
https://blog.heroku.com/archives/2013/5/2/new_dyno_networking_model

Related

Redis connection settings for app "surviving" redis connectivity issues

I'm using Azure Redis Cache for certain performance monitoring services. Basically, when events like page loads occur, I send a fire-and-forget command to Redis to record the event. My goal is for my app to function fine whether or not it can contact the Redis server. I'm looking for a best practice for this scenario. I would be OK with losing some events if necessary. I've been finding that even though I'm using fire and forget, the app staggers when the web server runs into high latency or connectivity issues with the Redis server.
I'm using StackExchange.Redis. Any best practice configuration options/programming practices for this scenario?
The way I was implementing a singleton pattern on the connection turned out to be blocking requests. Once I fixed this my app behaves as I want (e.g. it still functions when redis connection dies).

init.d values for separate redis-server instances

I need to clear a concept. I have two redis servers running on a single VM. Server#1 connects via TCP, server#2 connects via a UNIX socket. I'm on the cusp of converting the TCP server to UNIX as well.
An excerpt from the init.d script for server#1 is:
DAEMON=/usr/bin/redis-server
DAEMON_ARGS=/etc/redis/redis.conf
NAME=redis-server
DESC=redis-server
RUNDIR=/var/run/redis
PIDFILE=$RUNDIR/redis-server.pid
The comparable excerpt from the init.d script for server#2 is (which has its own config):
DAEMON=/usr/bin/redis-server
DAEMON_ARGS=/etc/redis/redis-2.conf
NAME=redis2-server
DESC=redis2-server
RUNDIR=/var/run/redis
PIDFILE=$RUNDIR/redis2-server.pid
Both servers are currently up and running. My question is: how come DAEMON is kept the same for both servers? Why wasn't a separate executable needed?
I configured the two servers using config from various internet forums. While it works, I've failed to understand the significance of the DAEMON value, given it remains the same for both server instances. Is it because the executable is fed different config files, and thus the same DAEMON is able to handle multiple server instances? Being a beginner, I'd really like some expert opinion on this. Thanks in advance.
Open terminal (or cmd). Now open it again. You have two copies open, but they are both using the same executable.
You're doing the same with Redis: DAEMON just says where to find the program, and since you're happy to use the same version of Redis for both, you can use the same path for both DAEMON values. Each running instance has its own process ID, which is stored in its PIDFILE. That is why the PIDFILE paths need to be different; otherwise the two instances would interfere with each other.
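The same idea can be sketched in a few lines of Ruby: one executable path, two separate processes, each with its own PID file. The Ruby interpreter stands in for redis-server here, and the temporary run directory is an assumption for the demo; the instance names are borrowed from the question.

```ruby
require "rbconfig"
require "tmpdir"

# One executable path reused for every instance (the role DAEMON plays).
daemon = RbConfig.ruby
launched = []

Dir.mktmpdir do |rundir|
  %w[redis-server redis2-server].each do |name|
    pid = Process.spawn(daemon, "-e", "sleep")   # same program, new process
    pidfile = File.join(rundir, "#{name}.pid")   # one PID file per instance
    File.write(pidfile, pid.to_s)
    launched << { name: name, pid: pid, pidfile: pidfile }
  end

  launched.each { |i| puts "#{i[:name]}: pid #{File.read(i[:pidfile])}" }

  # Stop each instance using only the PID recorded in its own PID file.
  launched.each do |i|
    Process.kill("TERM", Integer(File.read(i[:pidfile])))
    Process.wait(i[:pid])
  end
end
```

If both instances wrote to the same PID file, the second would overwrite the first and a stop command could only ever find one of them.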

Resque Workers from other hosts registered and active on my system

The Rails application I'm currently working on is hosted at Amazon EC2 servers. It's using Resque for running background jobs, and there are 2 such instances (would-be production and a stage). Also I've mounted Resque monitoring web app to the /resque route (on stage only).
Here is my question:
Why are there workers from multiple hosts registered within my stage system, and how can I avoid this?
Some additional details:
I see workers from apparently 3 different machines, but I have managed to identify only 2 of them: the stage (obviously) and the production. The third has a different address format (it starts with domU), and I haven't any clue what it could be.
It looks like you're sharing a single Redis server across multiple Resque environments.
The best way to do this safely is to use separate Redis servers, or separate Redis databases or namespaces. The redis-namespace gem can be used with Resque to isolate each environment's Resque queues and worker data.
I can't really help you with what the unknown one is, but I had something similar happen when moving hosts and having dns names change. The only way I found to clear out the old ones was to stop all workers on the machine, fire up IRB, require 'resque' and look at Resque.workers. This will list all the workers resque knows about, which in your case will include about 20 bogus ones. You can then do:
Resque.workers.each { |worker| worker.unregister_worker }
This should prune all the not-really-there workers and get you back to a proper display of the real workers.
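Before unregistering anything, it can help to see which hosts the registered workers claim to come from. Resque worker identifiers print as hostname:pid:queues, so a rough sketch is to group them by the part before the first colon. The worker ids below are made up for illustration, and group_workers_by_host is a hypothetical helper, not part of Resque.

```ruby
# Hypothetical helper: groups worker id strings by their hostname prefix.
def group_workers_by_host(worker_ids)
  worker_ids.group_by { |id| id.split(":", 2).first }
end

# Made-up ids in the "hostname:pid:queues" shape.
ids = [
  "stage-01:4242:default",
  "prod-01:1717:default,mailers",
  "domU-EXAMPLE:999:default"
]

by_host = group_workers_by_host(ids)
by_host.each { |host, workers| puts "#{host}: #{workers.size} worker(s)" }
```

In IRB with Resque loaded, the input could come from Resque.workers.map(&:to_s) instead of the hard-coded list.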

Redis clients broadcast problems (in the context of Socket.IO)

So I've read some articles about scaling Socket.IO. For various reasons I don't want to use the built-in Socket.IO scaling mechanism (mostly because it seems inefficient, since from my point of view it publishes a lot more to Redis than required).
So I've come up with this simple idea:
Each Socket.IO server creates Redis pub/sub/store clients, connects to Redis and subscribes to a channel. Now, when I want to broadcast data I just publish it to Redis and all other Socket.IO servers get it and push it to users.
There is a problem, though (which I think is also a problem for Socket.IO built-in mechanism). Let's say I want to know the number of all connected users. There are at least two ways of doing that:
Server A publishes give_me_clients to Redis. Then each Socket.IO server counts connections and publishes number_of_clients. Server A grabs this data, combines it and sends it to the client.
Each server updates number_of_clients_for::ID_HERE in Redis whenever user connects/disconnects to the server. Then Server A just fetches data and combines it. Might be more efficient.
There are problems with these solutions though:
Server A is not aware of other servers. Therefore it does not know when it should stop listening for number_of_clients. One could fix this by making Server A aware of other servers: whenever a server connects to Redis, it publishes new_server (Server A grabs the data and stores it in memory). But what should happen when the Redis - Socket.IO connection breaks? Is there a way for Redis to notify clients that one of the clients has disconnected?
Actually the same as above. When a Socket.IO server crashes, how do I clear its number_of_clients data?
So the real question is: can Redis notify clients (publish to a channel) that the connection with one of them has just ended?
After a lot of testing, it seems that Redis does not have such functionality. I've also found out that scaling Socket.IO is really a pain.
So I've switched from Socket.IO to WS (see this link). It is low level (but perfect for my use) and it only supports WebSockets (in all major versions). But then again, I only want to support WebSockets and FlashSocket (which I have to implement manually, but that's fine).
The advantage is that I can easily create a cluster of such servers. HAProxy works with them almost out of the box (with some minor tuning). Servers can easily communicate on a local network (with UDP, or with a central TCP server if the cluster is big).
The disadvantage is that one has to manually implement some cool features like heartbeats, broadcasting, rooms, etc. Also, you won't have a long-polling fallback, but that's fine in my case. Scaling is still more important, imho.
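For the per-server counter idea from the question, one workaround that does not depend on disconnect notifications is a heartbeat: each server refreshes its own count together with a timestamp, and readers ignore entries that have not been refreshed recently. This is a different technique from anything the answer describes. The sketch below uses a plain Hash in place of Redis, and the class name and the 10-second TTL are assumptions for illustration; with a real Redis the same effect is commonly obtained by setting an expiry on each server's key.

```ruby
# In-memory stand-in for a shared store such as Redis (assumption for the demo).
class ClientCounts
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}   # server_id => [count, last_seen]
  end

  # Each server calls this periodically as its heartbeat.
  def report(server_id, count, now = Time.now)
    @entries[server_id] = [count, now]
  end

  # Sum only the servers whose last heartbeat is within the TTL.
  def total(now = Time.now)
    @entries.values
            .select { |_count, seen| now - seen <= @ttl }
            .sum { |count, _seen| count }
  end
end

t0 = Time.at(0)
counts = ClientCounts.new(10)
counts.report("A", 5, t0)
counts.report("B", 7, t0)
counts.report("A", 6, t0 + 8)          # A keeps refreshing; B has crashed

fresh_total = counts.total(t0 + 9)     # B is still within the TTL: 6 + 7
later_total = counts.total(t0 + 15)    # B has expired: only A's 6 remains
puts fresh_total                       # prints 13
puts later_total                       # prints 6
```

The trade-off is that a crashed server's count lingers for up to one TTL before it drops out of the total.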

New Relic API - difference between instances and hosts?

Referring to https://github.com/newrelic/newrelic_api for the New Relic API, I was wondering what was the difference between hosts and instances.
Basically, I get what an application is and what a server is (obviously). I would assume instances are instances of the application, i.e. if my app were running on Heroku, each instance would correspond to a dyno running my app. But then what is a host? And what's the difference between host and instance?
Thanks,
-Billy
UPDATE
Thanks for the answer!
So if I got this right, in the general case, the mapping between applications and instances is 1-to-n, i.e. each app can have 1 or more instances. Also, the mapping between instances and hosts is n-to-m, i.e. each instance can be running on at most one host (at any given time), but instances are distributed among available hosts. Similarly, hosts are distributed among servers (say, m-to-s). Is that it? (Apologies if this sounds like I'm stating the obvious, but I'm unfamiliar with the terminology they use over at New Relic.)
If the above is correct, how can I get the instances - hosts and hosts - servers mappings from the API? I can see how to get the applications - instances and applications - hosts, but what about the other two?
Thanks again for your help!
A host (server) can run many instances of an application. Each process that responds to requests (e.g., a Unicorn worker) is an instance from the New Relic perspective. The host/instance distinction is roughly equivalent to the difference between an IP address and a port.
If you're using Heroku, New Relic treats the entire dyno grid as a single host/server, and each dyno as an instance.
Re: the updated question
A host is a machine or VM that applications run on, and each one can run N instances of the application.
A "server", for the purposes of the NR API, is an OS+hardware that's monitored by New Relic Server Monitoring. The NR application monitoring agent can also be running on a server monitored by the Server Monitoring agent. In that case, both the host and the server should report the same name to New Relic ("server01.example.com").
There isn't a way to get the instance-host or host-server mappings explicitly from the New Relic API. But in the case of server-host, the mapping is that they share the same name. You can probably infer the instance-host mapping from the instance names, too, since they will almost always contain the host name (and possibly also the port number).
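As a rough sketch of that inference, assuming instance names look like host or host:port (an assumption; the actual naming depends on how the agent reports them), the host can be taken as everything before the first colon. The names below are invented for illustration, and host_of is a hypothetical helper rather than part of the New Relic API.

```ruby
# Hypothetical helper: takes the part before the first ":" as the host name.
def host_of(instance_name)
  instance_name.split(":", 2).first
end

# Invented instance names; real ones would come from the API response.
names = [
  "server01.example.com:8080",
  "server01.example.com:8081",
  "server02.example.com"
]

instances_by_host = names.group_by { |name| host_of(name) }
instances_by_host.each do |host, instances|
  puts "#{host}: #{instances.join(', ')}"
end
```

Since server and host share a name, the same keys could then be matched against the server names reported by Server Monitoring.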