Iptables Custom Chain vs IPSET - iptables

There is not any research comparing the difference between custom chain in IP tables and IPset.
Can anyone tell me the difference in terms of function and performance between these two please? (especially the efficiency)
If possible to provide any published research paper or information, please?

If you server receives really high traffic, a lot of iptables rules can have significant overhead especially on the CPU. It depends on the types of rules you have and the iptables extensions you use.
You could read these two papers:
https://workshop.netfilter.org/2013/wiki/images/a/ab/Jozsef_Kadlecsik_ipset-osd-public.pdf
http://www.netdevconf.org/1.1/proceedings/slides/kadlecsik-ipset-firewalling-iptables.pdf
They provide different scenarios, how iptables works, performance tests and so on.
Not sure if that's what you're looking for but it's a good read nonetheless
UPDATE:
ipset is an extension to iptables that allows you to create firewall rules that match entire "sets" of addresses at once. Unlike normal iptables chains, which are stored and traversed linearly, IP sets are stored in indexed data structures, making lookups very efficient, even when dealing with large sets.
Supposing you want to block 10k ip addresses, with just iptables you'll have to create 10k rules, one for each ip address, while with ipset you can create a single rule for a specific set of those ip addresses. This translates in speed, less rules, less cpu processing power etc

Related

Redis > Isolate keys with large values?

It's my understanding that best practice for redis involves many keys with small values.
However, we have dozens of keys that we'd like to have store a few MB each. When traffic is low, this works out most of the time, but in high traffic situations, we find that we have timeout errors start to stack up. This causes issues for all of our tiny requests to redis, which were previously reliable.
The large values are for optimizing a key part of our site's functionality, and a real performance boost when it's going well.
Is there a good way to isolate these large values so that they don't interfere with the network I/O of our best practice-sized values?
Note, we don't need to dynamically discover if a value is >100KB or in the MBs. We have a specific method that we could have use a separate redis server/instance/database/node/shard/partition (I'm not a hardware guy).
Just install/configure as many instances as needed (2 in the case), each managing independently on a logical subset if keys (e.g. big and small), with routing done by the application. Simple and effective - divide and converter conquer
The correct solution would would be to have 2 separate redis clusters, one for big sized keys, and another one for small sized keys. These 2 clusters could run on the same set of physical or virtual machines, aka multitenancy (You would want to do that to fully utilize the underlying cores on your machine, as redis server is single threaded). This way you would be able to scale both the clusters separately, and your problem of small requests timing out because of queueing behind the bigger ones will be alleviated.

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximation of max concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching(like memcached or similiar (still one server))
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism, I am just trying to figure out when its time to move from one implementation to the next, because they each come with their own implementation feat either programming wise or setup wise.
In your question you talk about caching and this is probably one of the most important factors in a web architecture r.e performance and capacity.
Memcache is useful, but actually, before that, you should be ensuring proper HTTP cache directives on your server responses. This does 2 things; it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). This can also be improved by using an HTTP accelerator like Varnish and a CDN.
Another factor to consider is whether your system is stateless. By stateless, it usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible. The less state the more horizontally scalable a system. Most people introduce state when confronted with issues of personalisation - i.e serving up different content for different users. In such cases you should first investigate using the HTML5 session storage (i.e store the complete user data in javascript on the client, obviously over https) or if the data set is smaller, secure javascript cookies. That way you can still serve up cached resources and then personalise with javascript on the client.
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system then again it should be quite easy to horizontally scale. If there are reads and writes, its typically better to separate the read write datasets into a separate database and have the read only in another. You can then use more relevant methods to scale.
These setups do not spit out a single answer that you can then compare to each other. The answer will vary on way more factors than you have listed.
Even if they did spit out a single answer, then it is just one metric out of dozens. What makes this the most important metric?
Even worse, each of these alternatives is not free. There is engineering effort and maintenance overhead in each of these. Which could not be analysed without understanding your organisation, your app and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.

redis sharding, pipelining, and round-trips

Suppose in your web application you need to do a number of redis calls to render a page, like, getting a bunch of user hashes. To speed this up you could wrap up your redis commands in a MULTI/EXEC section, thus using pipelining, so that you avoid doing many round-trips. But you also want to shard your data, because you have lots of it and/or you want to distribute writes. Then pipelining wouldn't work, because different keys would potentially live on different nodes, unless you have a clear idea of the data layout of your application and shard based on roles rather than using a hash function. So, what are the best practices to shard data across different servers without compromising performance too much due to many servers being contacted to complete a "conceptually unique" job? I believe the answer depends on the web application one is developing, and I'll eventually run some tests, but it'd be helpful to know how others have coped with the trade-offs I mentioned.
MULTI/EXEC and pipelining are two different things. You can do MULTI/EXEC without any pipelining and vice versa.
If you want to shard and pipeline at the same time, you need to group the operations to pipeline per Redis instance, and then use pipelining for each instance.
Here is a simple example using Ruby: https://gist.github.com/2587593
One way to further improve performance is to parallelize the traffic on the Redis instances once the operations have been grouped (i.e. you group the operations, you send them to all instances in parallel, you wait for the answers from all instances).
This is a bit more complex, because an asynchronous non blocking client is required. For maximum performance, C/C++ should be used on client side. This can be easily implemented by using hiredis + the event loop of your choice.

architecture for high availability

I have this scenario:
You have a factory process line which runs 24/7. Downtime is extremely expensive.
The software controlling all different parts must use a shared form of database storage
The main reason for this is to know in which state the factory is in. For example some products can be mixed when using the same set of equipement and others DEFINITELY not.
requirements:
I want to the software be able to detect that an error in one part of
the plant must result in some machine shutdown more then 1 km away. so stoing data in the plc's is not an option.
Updates and upgrades to the factory environment are frequent
load (in computer terms) will be really low.
The systems handles a few hunderd assignments a day for which calculations / checks are done followed by instructions send for the factory machines. Systems will be bored most of the time. Most important requirement is the central computer system must be correct and always working.
I was thinking to use a dynamo based database (riak or cassandra) where data gets written to multiple machines with each machine having the whole database
When one system goes down it will go down unoticed. A Traditional sql databse might be more of a pain to upgrade when tables changes and this master slave is harder to configure.
What would be your solution?
Network has been made redundant and most other single points of failure to. The database system is critical because downtime of the db means downtime for the entire plant not just one of the machines which is acceptable.
How to solve shared state problem.
complexity in the database will not be a problem. I will be more like a simple key value store to get the most current and correct data.
I don't think this is a sql/nosql question. All of Postgres, MySQL and MS SQL Server have some kind of cluster or hot standby option.
Configuration is a one-time thing, but any NoSQL option is going to give you headaches from top-to-bottom of code, if you are trying to do something fundamentally relational on a platform that has given up relational for the purposes of running things like Amazon or Facebook. The configuration is once, the coding is forever.
So I would say stick with a tried and true solution and get that hot replication going.
This also provides a solution for upgrades. The typical sequence is to "fail over" to the standby, upgrade the master, flip back to the master, upgrade the standby, and resume. With details specific to the situation of course.
Use an established RDBMS that supports such things natively
Do you really want to run a 24/7 mission critical system on something that may be consistent at any point in time?
You need to avoid single points of failure.
All the major players in our dbms world offer at least one way to avoid making the database itself a single point of failure. I might question whether they can propagate changes fast enough for your manufacturing processes. (Or is data update not really an issue? Can't really tell from your question.) My db work in manufacturing is limited to the car and the chemical industry. Microseconds didn't matter to them.
But the dbms isn't the only thing that can fail. "Always working" means that the clients have to always be working, too. Client hardware, connections to the network, the network and network servers themselves all probably have single points of failure. Failure-tolerant servers have multiple power supplies, multiple NICs, etc.
"Always working" is really expensive. I have a feeling that the database isn't going to be the biggest problem for your company.

How online-game clients are able to exchange data through internet so fast?

Let's imagine really simple game... We have a labirinth and two players trying to find out exit in real time through internet.
On every move game client should send player's coordinates to server and accept current coordinates of another client. How is it possible to make this exchange so fast (as all modern games do).
Ok, we can use memcache or similar technology to reduce data mining operations on server side. We can also use fastest webserver etc., but we still will have problems with timings.
So, the questions are...
What protocol game clients are usually using for exchanging information with server?
What server technologies are coming to solve this problem?
What algorithms are applied for fighting with delays during game etc.
Usually with Network Interpolation and prediction. Gamedev is a good resource: http://www.gamedev.net/reference/list.asp?categoryid=30
Also check out this one: http://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
use UDP, not TCP
use a custom protocol, usually a single byte defining a "command", and as few subsequent bytes as possible containing the command arguments
prediction is used to make the other players' movements appear smooth without having to get an update for every single frame
hint: prediction is used anyway to smooth the fast screen update (~60fps) since the actual game speed is usually slower (~25fps).
The other answers haven't spelled out a couple of important misconceptions in the original post, which is that these games aren't websites and operate quite differently. In particular:
There is no or little "data-mining" that needs
to be speeded up. The fastest online
games (eg. first person shooters)
typically are not saving anything to
disk during a match. Slower online
games, such as MMOs, may use a
database, primarily for storing
player information, but for the most
part they hold their player and world data in memory,
not on disk.
They don't use
webservers. HTTP is a relatively slow
protocol, and even TCP alone can be
too slow for some games. Instead they
have bespoke servers that are written just for that particular game. Often these servers are tuned for low latency rather than throughput, because they typically don't serve up big documents like a web server would, but many tiny messages (eg. measured in bytes rather than kilobytes).
With those two issues covered, your speed problem largely goes away. You can send a message to a server and get a reply in under 100ms and can do that several times per second.