Redis mimic MASTER/MASTER? or something else? - redis

I have been reading a lot of the posts on here and surfing the web, but maybe I am not asking the right question. I know that Redis is currently Master/slave until Cluster becomes available. However, I was wondering if someone can tell me how I would want to configure Redis logistically to meet my needs (or if its not the right tool).
Scenerio:
we have 2 sites on opposite ends of the US. We want clients to be able to write at each site at a high volume. We then want each client to be able to perform reads at their site as well. However we want the data to be available from a write at the sister site in < 50ms. Given that we have plenty of bandwidth. Is there a way to configure redis to meet our needs? our writes maximum size would be on the order of 5k usually much less. The main point is how can i have2 masters that are syncing to one another even if it is not supported by default.

The catch with Tom's answer is that you are not running any sort of cluster, you are just writing to two servers. This is a problem if you want to ensure consistency between them. Consider what happens when your client fails a write to the remote server. Do you undo the write to local? What happens to the application when you can't write to the remote server? What happens when you can't read from the local?
The second catch is the fundamental physics issue Joshua raises. For a round trip you are talking a theoretical minimum of 38ms leaving a theoretical maximum processing time on both ends (of three systems) of 12ms. I'd say that expectation is a bit too much and bandwidth has nothing to do with latency in this case. You could have a 10GB pipe and those timings are still extant. That said, transferring 5k across the continent in 12ms is asking a lot as well. Are you sure you've got the connection capacity to transfer 5k of data in 50ms, let alone 12? I've been on private no-utilization circuits across the continent and seen ping times exceeding 50ms - and ping isn't transferring 5k of data.
How will you keep the two unrelated servers in-sync? If you truly need sub-50ms latency across the continent, the above theoretical best-case means you have 12ms to run synchronization algorithms. Even one query to check the data on the other server means you are outside the 50ms window. If the data is out of sync, how will you fix it? Given the above timings, I don't see how it is possible to synchronize in under 50ms.
I would recommend revisiting the fundamental design requirements. Specifically, why this requirement? Latency requirements of 50ms round trip across the continent are usually the sign of marketing or lack of attention to detail. I'd wager that if you analyze the requirements you'll find that this 50ms window is excessive and unnecessary. If it isn't, and data synchronization is actually important (likely), than someone will need to determine if the significant extra effort to write synchronization code is worth it or even possible to keep within the 50ms window. Cross-continent sub-50ms latency data sync is not a simple issue.
If you have no need for synchronization, why not simply run one server? You could use a slave on the other side of the continent for recovery-only purposes. Of course, that still means that best-case you have 12ms to get the data over there and back. I would not count on
50ms round trip operations+latency+5k/10k data transfer across the continent.

It's about 19ms at the speed of light to cross the US. <50ms is going to be hard to achieve.
http://www.wolframalpha.com/input/?i=new+york+to+los+angeles

This is probably best handled as part of your client - just have the client write to both nodes. Writes generally don't need to be synchronous, so sending the extra command shouldn't affect the performance you get from having a local node.

Related

What is cyclic and acyclic communication?

So I've already searched if there was a question like this posted before, but I wasn't able to find the answer I liked.
I've been working with some PLCs and variable frequency drives lately and thought it was about time I finally found out what cyclic and non-cyclic communication is.
So correct me if I'm wrong, but when I think of cyclic data, I think of data that is continuously being updated and is able to be sent/sampled to other devices. With relation to what I'm doing, I'm thinking that the variable frequency drive is able to update information such as speed and frequency that can be sampled/read from a PLC. This is what I would consider cyclic communication, something that is always updating a certain type of information that can be sent as data.
So I might be completely wrong with this assumption, and that leaves me with the question of what exactly would be considered non-cyclic or acyclic communication.
Any help?
Forenote: This is mostly a programming based site, and while your question does have an answer within the contexts of programming, I happen to know that in your industrial application, the importance of cyclic vs acyclic tends to be very hardware/protocol specific, and is really more of a networking problem than a programming one.
Cyclic data is not simply "continuous" data. In industry, it refers to data delivered on a guaranteed (or at least highly predictable) schedule. If the data stream were to violate the schedule, it could have disastrous consequences (a VFD misses its shutdown command by a fraction of a second, and you lose your arm!).
Acyclic data is still reliable for machine control, it is just delivered in a less deterministic way (on the order of milliseconds, sometimes up to several seconds). When accessing a single VFD with a single PLC, you will probably never notice this bursting behavior, and in fact, you may perceive smoother and quicker data transmissions. From the hardware interface perspective, acyclic data transfer does not provide as strong of a guarantee about if or when one machine will respond to the request of another.
Both forms of data transfer deliver data at speeds much faster than humans can deal with, but in certain applications they will each have their own consequences.
Cyclic networks usually must take the form of master/slave, where only one device is allowed speak at a time, and answers are always returned, even if just to confirm that the message was received. Cyclic networks usually do not allow as many devices on the same wire, and often they will pass larger amounts of data at slower rates.
Acyclic networks might be thought of as a bit more choatic, but since they skip handshaking formalities, they can often cheat more devices onto the network and get higher speeds all at the same time. This comes at the cost of occasional data collisions/bottlenecks, and sometimes even, requests for critical data are simply ignored/lost with no indication of failure or success from the target ( in the case the sender will likely be sitting and waiting desperately for a message it will not get, and often then trigger process watchdogs that will shutdown the system).
From a programmer perspective, not much is different between these two transmission types.
What will usually dictate a situation,
how many devices are running on the wire (sometimes this forces the answer right away)
how sensitive/volatile is the data they want to share (how useful are messages if they are a little late)
how much data they might be required to send at any given time ( shifting demands on a network that already produces race conditions can be hard to anticipate/avoid if you don't see it coming before hand).
Hope that helps :)

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximation of max concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching(like memcached or similiar (still one server))
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism, I am just trying to figure out when its time to move from one implementation to the next, because they each come with their own implementation feat either programming wise or setup wise.
In your question you talk about caching and this is probably one of the most important factors in a web architecture r.e performance and capacity.
Memcache is useful, but actually, before that, you should be ensuring proper HTTP cache directives on your server responses. This does 2 things; it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). This can also be improved by using an HTTP accelerator like Varnish and a CDN.
Another factor to consider is whether your system is stateless. By stateless, it usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible. The less state the more horizontally scalable a system. Most people introduce state when confronted with issues of personalisation - i.e serving up different content for different users. In such cases you should first investigate using the HTML5 session storage (i.e store the complete user data in javascript on the client, obviously over https) or if the data set is smaller, secure javascript cookies. That way you can still serve up cached resources and then personalise with javascript on the client.
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system then again it should be quite easy to horizontally scale. If there are reads and writes, its typically better to separate the read write datasets into a separate database and have the read only in another. You can then use more relevant methods to scale.
These setups do not spit out a single answer that you can then compare to each other. The answer will vary on way more factors than you have listed.
Even if they did spit out a single answer, then it is just one metric out of dozens. What makes this the most important metric?
Even worse, each of these alternatives is not free. There is engineering effort and maintenance overhead in each of these. Which could not be analysed without understanding your organisation, your app and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.

What is a good access time to a database (SQL)?

Hey.. i wanna know which time is a good accesstime, because i'm searching for a good sql database and hsqldb says their accesstime is 12ms... <-- good?
I think it would depend on your needs. Is it for a web server or a desktop application? The amount of data is also important, because reading lots of small records will perform differently than reading a few large records. Access time is also based upon your hardware, software and maybe even some other factors.
For example, you can use a database with lightning-fast access, but if your users need to connect to it over a 5 megabit VPN connection, passing through three different proxies and with trafic world-wide, your database would then just be a waste of power.
Basically, it's a marketing thing that they're claiming. It's a good product but don't just focus on access time. Make sure you also look at your other needs. Another system might just perform better, even if it has a slower acess time, because it is more optimized in reading it's indices and stuff.
So, what do you want, exactly?
I don't think access time tells you anything, really. If you have slow or incorrectly configured storage, then this access time metric will be dwarfed by how much time is spent on waits and split I/Os. Network latency is also a factor, since I'm guessing you probably won't want to have your code on the same machine as your database, and you will most likely have a few network devices you'll need to traverse in your production environment.
In my experience, all the database platforms these days will all perform adequately if configured correctly and paired with a complementary application. Pick the DBMS that best fits your requirements, follow the best practices for configuration of the DBMS on your hardware, and you should be please with the outcome.

How online-game clients are able to exchange data through internet so fast?

Let's imagine really simple game... We have a labirinth and two players trying to find out exit in real time through internet.
On every move game client should send player's coordinates to server and accept current coordinates of another client. How is it possible to make this exchange so fast (as all modern games do).
Ok, we can use memcache or similar technology to reduce data mining operations on server side. We can also use fastest webserver etc., but we still will have problems with timings.
So, the questions are...
What protocol game clients are usually using for exchanging information with server?
What server technologies are coming to solve this problem?
What algorithms are applied for fighting with delays during game etc.
Usually with Network Interpolation and prediction. Gamedev is a good resource: http://www.gamedev.net/reference/list.asp?categoryid=30
Also check out this one: http://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
use UDP, not TCP
use a custom protocol, usually a single byte defining a "command", and as few subsequent bytes as possible containing the command arguments
prediction is used to make the other players' movements appear smooth without having to get an update for every single frame
hint: prediction is used anyway to smooth the fast screen update (~60fps) since the actual game speed is usually slower (~25fps).
The other answers haven't spelled out a couple of important misconceptions in the original post, which is that these games aren't websites and operate quite differently. In particular:
There is no or little "data-mining" that needs
to be speeded up. The fastest online
games (eg. first person shooters)
typically are not saving anything to
disk during a match. Slower online
games, such as MMOs, may use a
database, primarily for storing
player information, but for the most
part they hold their player and world data in memory,
not on disk.
They don't use
webservers. HTTP is a relatively slow
protocol, and even TCP alone can be
too slow for some games. Instead they
have bespoke servers that are written just for that particular game. Often these servers are tuned for low latency rather than throughput, because they typically don't serve up big documents like a web server would, but many tiny messages (eg. measured in bytes rather than kilobytes).
With those two issues covered, your speed problem largely goes away. You can send a message to a server and get a reply in under 100ms and can do that several times per second.

First Time Architecturing?

I was recently given the task of rebuilding an existing RIA. The new RIA that I've designed is based on Silverlight, with a WCF service to connect to MS SQL Server. This is my first time doing something like this, so I'm not sure how to design the entire thing.
Basically, the client can look through graphs of "stocks" (allowing the client to choose different time periods, settings, etc). I've written the whole application essentially, but I'm not sure how to put it together.
The graphs are supposed to be directly based on the database, and to create the datapoints on the graph, some calculations need to be done (not very expensive ones).
The problem I'm having is to decide where to put the calculations (client or serverside? Or half and half?)
What factors should I look for to help me decide where the calculations should be done? And how can I go about optimizing this (caching, etc)?
Obviously this is a very broad subject, so I'm not expecting an immediate answer, but any help/pointing in the right direction/resources would be appreciated.
A few tips for this kind of app.
Put as much logic as possible on the client.
Make the client responsible for session data, making all your server code stateless.
Try to minimize traffic to and from the server (Bigger requests are more efficient than multiple smaller ones) so consolidate requests when possible.
If this project is likely to grow beyond it's current feature set I think it's probably a good idea to perform the calculations client side. This can avoid scaling issues, because you're using all the client side CPUs ratther than you're single, precious server CPU. This does however rely on being able to transfer the required data to the client in an efficient way, otherwise you replace a processor bottleneck with a network bottleneck.
As for caching it depends on your inputs, what variables can users of the client affect? If any of the variables they can alter are discrete (ie they can be a fixed set of values) then they're candidates for caching. For example if a user can select a date range of stock variations to view then that's probably not so useful, if however they can only select a year then you could cache your data sets by year (download each data set to the client and perform your calculation). I'd not worry about caching too much unless you find it's a real performance problem, it'll only make your code more complex, so don't add it until you have proven you need it.
One other thing, if this project is unlikely to be a long term concern then implement the calculations wherever is easiest and fastest, you can revisit if the project becomes more important later on.
Be REALLY REALLY careful about implementing client-side caching. Caching is INSANELY hard to do right while maintaining performance, security and correctness. Note that your DB Server's caching mechanism is already likely to be way better than any local caching mechanism you're likely to implement in less than 2 weeks' effort!
I would urge you to do as much work on the back-end as possible and to limit your client to render the data in a manner that is appropriate for your users. While many may balk at this suggestion, it's based on a number of observations from building many such systems in the past:
If you're going to filter some of the data returned by your service, you've just wasted thousands of clock cycles shipping data that need never have left your server
If you're going to sort your data, your DB could have done the sorting for you (often using otherwise idle CPU ticks) while waiting for the data to be read from its disks.
Your server most likely has more CPU and RAM available than your clients and has a surprising amount of "free time" to use for sorting, filtering, running inline calculations, etc., while its waiting for disks to read sectors etc.
As Roman suggested: Minimize your round-trips between your client and your server as much as possible.
But perhaps most importantly:
BEFORE YOU START DESIGNING YOUR SYSTEM, state your performance goals
Design what you think will achieve those goals. Try to find bottlenecks in your design, particularly areas where you make blocking calls. Re-design those areas to use async patterns wherever you can.
Build your intended solution
Measure your actual perforamnce under actual real-world load
If you're within your expected performance goals, then you're done.
If not, work out where you're spending too long and tune the design of that portion of the system. Goto 3.
Don't try to build the perfect system in one try - chances are that you won't manage it, no matter how hard you try, for a variety of reasons including user expectations, your servers ability to process the required load, your clients' ability to handle the returned data, your network's ability to carry the traffic, etc.
They're a little old now, but I suggest you read through some of the earlier posts at http://blogs.msdn.com/richardt for more thoughts around designing and constructing Service Oriented and distributed systems.