Two-way encryption/authentication between servers and clients - authentication

To be honest I don't know if this is the appropriate title since I am completely new to this area, but I will try my best to explain below.
The scenario can be modeled as a group of functionally identical servers and a group of functionally identical clients. Assume each client knows the endpoints of all the servers (possibly from a broker or some kind of name service), and randomly chooses one to talk to.
Problem 1: The client and the server first need to authenticate themselves to each other (i.e. the client must show the server that it's a valid client, and vice versa).
Problem 2: After that, the client and server talk to each other over some kind of encrypted channel.
For Problem 1, I don't know what the best solution is. For Problem 2, I'm thinking about letting each client create a private key and give the corresponding public key to the server it talks to right after authentication, so that no one else can decrypt its messages; and letting all servers share a private key and distribute the corresponding public key to all clients, so that the outside world (including the clients) can't decrypt what the clients send to the servers.
These are probably very naive approaches though, so I'd really appreciate any help & thoughts on the problems. Thank you.

I asked a similar question about half a year ago here and was redirected to Information Security.
After reading through my answer and rethinking your question, I suggest asking there if your questions remain this broad. Stack Overflow, as far as I know, is more about programming (and thus security in programming) than about security concepts. Either way, you will probably have to migrate there at some point during your project.
To begin with, you need to seriously consider what needs protecting in your system. As discussed here (check Gilles' comment and others), one of the first and most important things to do is to think through what security measures you have to take. You mentioned authentication and encryption, but there are many more things that matter, such as data integrity. Check the Wikipedia page on security measures. Once you know more about these, you can choose which (if any) encryption algorithms, hashing functions and other primitives you need.
For example, forgetting about data integrity means forgetting about hashing, which is the security measure I encounter most often on SO. Encryption 'merely' lets you expect that no one else can read the message; it does not tell you whether the message reached its destination unchanged (or at all), whether because of interception or transmission errors. I assume you need that assurance.
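For illustration only, here is a minimal sketch of such an integrity check using an HMAC (a keyed hash) via Erlang's :crypto module from Elixir; the key, the message and the use of :crypto.mac/4 (OTP 22.1+) are assumptions made just for this example:

# Shared secret, distributed out of band between the two parties.
key = :crypto.strong_rand_bytes(32)
message = "example payload"

# Tag is sent along with the (possibly encrypted) message.
tag = :crypto.mac(:hmac, :sha256, key, message)

# The receiver recomputes the tag with the same key and compares.
# (In production, prefer a constant-time comparison over ==.)
valid? = tag == :crypto.mac(:hmac, :sha256, key, message)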
A typical architecture I am aware of uses asymmetric encryption to exchange a symmetric (secret) key, and then communicates using that symmetric key. This is because public-key infrastructure (PKI) assumes that one side's key is publicly known, which makes establishing communication easier but the communication itself slower (e.g. due to key length: RSA [asymmetric] keys start at 512 bits, with 2048 bits being typical now, compared to the weakest still-secure AES [symmetric] key length of 128 bits). The problem is, as you stated, that the server and client are not authenticated to each other, so the server does not really know whether the party sending the data is who it claims to be. The data could also have been changed in transit.
To prevent that, you need a so-called 'key exchange algorithm', such as one of the Diffie-Hellman schemes (so DH might be the 'raw' answer to both of your problems).
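As a rough sketch of what such an exchange looks like in code, here is an ephemeral X25519 Diffie-Hellman agreement using Erlang's :crypto from Elixir (a recent OTP is assumed; in a real protocol the public keys must themselves be authenticated, e.g. with certificates or signatures, or you are open to man-in-the-middle attacks):

# Each side generates an ephemeral key pair.
{client_pub, client_priv} = :crypto.generate_key(:ecdh, :x25519)
{server_pub, server_priv} = :crypto.generate_key(:ecdh, :x25519)

# Only the public keys travel over the network; both sides derive the same secret.
client_secret = :crypto.compute_key(:ecdh, server_pub, client_priv, :x25519)
server_secret = :crypto.compute_key(:ecdh, client_pub, server_priv, :x25519)
true = client_secret == server_secret

# Derive a symmetric session key from the shared secret, e.g. by hashing it.
session_key = :crypto.hash(:sha256, client_secret)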
Taking all of the above into consideration, you might want to use one (or more) of the popular protocols and/or services to define your architecture. Popular ones are SSH, SSL/TLS and IPsec. Read about them, define what services you need, and check whether they are provided by one of the protocols above and whether you are willing to use it. If not, you can always design your own using raw crypto algorithms and digests (hashes).

Related

How can I set up authenticated links between processes in Elixir?

Background:
I am trying to write a program in Elixir to test distributed algorithms by running them on a set of processes and recording certain statistics. To begin with I will be running these processes on the same machine, but the intention is eventually to have them running on separate machines/VMs.
Problem:
One of the requirements for algorithms I wish to implement is that messages include authentication. That is, whenever a process sends a message to another process, the receiver should be able to verify that this message did indeed come from the sender, and wasn't forged by another process. The following snippets should help to illustrate the idea:
# Sender
a = authenticate(self(), receiver, msg)
send(receiver, {msg, self(), a})

# Receiver
if verify(msg, sender, a) do
  deliver(msg)
end
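One way the hypothetical authenticate/verify helpers above could be filled in is with an HMAC over the message plus the sender and receiver identities, assuming each pair of processes already shares a secret key distributed out of band (the module, argument list and key handling here are illustrative only, and :crypto.mac/4 requires OTP 22.1+):

defmodule MsgAuth do
  # The tag covers sender, receiver and message so it cannot be replayed elsewhere.
  def authenticate(sender, receiver, msg, key) do
    :crypto.mac(:hmac, :sha256, key, :erlang.term_to_binary({sender, receiver, msg}))
  end

  def verify(msg, sender, receiver, tag, key) do
    tag == authenticate(sender, receiver, msg, key)   # prefer a constant-time comparison
  end
end

Note that an HMAC only proves that the sender knows the shared key; if you need to prove to a third party who sent a message (non-repudiation), you would need public-key signatures instead.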
Thoughts so far:
I have searched far and wide for any documentation of authenticated communication between Elixir processes, and haven't been able to find anything. Perhaps in some way this is already done for me behind the scenes, but so far I haven't been able to verify this. If it were the case, I wonder if it would still be correct when the processes aren't running on the same machine.
I have looked into the possibility of using the SSL/TLS functions provided by Erlang, but with my limited knowledge in this area, I'm not sure how this would apply to my situation of running a set of processes, as opposed to the more standard use in client-server systems and HTTPS. If I went down this route, I believe I would have to set up all the keys and signatures myself beforehand, which I believe could be possible using the X509 Elixir package, though I'm not sure this is appropriate and it may be more work than necessary.
In summary:
Is there a standard/pre-existing way to achieve authenticated communication between processes in Elixir?
If yes, will it be suitable for processes communicating between separate machines/VMs?
If no to either of the above, what is the simplest way I could achieve this myself?
As Aleksei and Paweł point out, if something is in your cluster, it is already trusted. It's not quite like authenticating random web requests that could have originated virtually anywhere; you are talking about messages originating from inside your local network of trusted machines. If some nefarious actor is running on one of your servers, you have far bigger problems to worry about than just authenticating messages.
There are very few limitations put on Elixir/Erlang processes running inside a cluster with respect to security: their states can be inspected by any other process, for example. Some of this transparency is by-design and necessary in order to have a fault-tolerant system capable of doing hot-code reloads, but the conversation about the specific how's and why's is too nuanced for me to do it justice.
If you really need some logging to have an auditable "paper trail" verifying which process sent which message, I think you'll have to roll your own solution, which could rely on a number of common techniques (such as keys + signatures, blockchains, etc.). But keep in mind: these are concerns that would come up if you were dealing with web requests between different servers anyhow! And there are already protocols for establishing secure connections between computers, so I would not recommend re-inventing those network protocols in your application.
Your time may be better spent working on the algorithms themselves and not trying to re-invent the wheel on security. Your app should focus on the unique stuff that nobody else is doing (algorithms in your case). If you have multiple interconnected VMs passing messages to each other, all the "security" requirements there come with defining the proper access to each machine/subnet, and that requirement holds no matter what application/language you're running on them.
The more I read about what you are trying to achieve, the more I am sure that all you need is the footprint of the calling process.
For synchronous calls (GenServer.handle_call/3), you already have the caller's footprint in the second parameter.
For asynchronous messages, you might add the caller information to the messages themselves. For example, instead of sending a plain :foo message, send {:foo, pid()} or something even more sophisticated like {:foo, {pid(), timestamp(), ip(), ...}}, and make the callee verify those (see the sketch below).
That would be safe by all means: the Erlang cluster ensures these messages come from trusted sources, and your internal validation can ensure that the source is valid within your own rules.
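A minimal sketch of that idea (module name, message shapes and the trust check are illustrative only):

defmodule Worker do
  use GenServer

  def init(state), do: {:ok, state}

  # Synchronous calls: the caller's pid is already in the `from` tuple.
  def handle_call(:ping, {caller_pid, _tag}, state) do
    {:reply, {:pong, caller_pid}, state}
  end

  # Plain/asynchronous messages: the sender tags the message itself.
  def handle_info({:foo, from_pid, sent_at}, state) do
    if node(from_pid) in [node() | Node.list()] do
      IO.inspect({:accepted, :foo, from_pid, sent_at})
    end
    {:noreply, state}
  end
end

# Sender side:
# send(worker_pid, {:foo, self(), System.system_time(:millisecond)})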

Why do we need consistent hashing when round robin can distribute the traffic evenly?

When the load balancer can use a round-robin algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing versus round robin?
From this blog,
With traditional "modulo hashing", you simply consider the request hash as a very large number. If you take that number modulo the number of available servers, you get the index of the server to use. It's simple, and it works well as long as the list of servers is stable. But when servers are added or removed, a problem arises: the majority of requests will hash to a different server than they did before. If you have nine servers and you add a tenth, only one-tenth of requests will (by luck) hash to the same server as they did before.
Then
there's consistent hashing. Consistent hashing uses a more elaborate scheme, where each server is assigned multiple hash values based on its name or ID, and each request is assigned to the server with the "nearest" hash value. The benefit of this added complexity is that when a server is added or removed, most requests will map to the same server that they did before. So if you have nine servers and add a tenth, about 1/10 of requests will have hashes that fall near the newly-added server's hashes, and the other 9/10 will have the same nearest server that they did before. Much better! So consistent hashing lets us add and remove servers without completely disturbing the set of cached items that each server holds.
In short, the round-robin algorithm suits the scenario where the list of servers is stable and the load-balanced traffic is essentially random. Consistent hashing suits the scenario where backend servers need to scale out or in while most requests keep mapping to the same server they did before. Consistent hashing also achieves well-distributed uniformity.
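To see the "majority of requests move" effect concretely, here is a tiny, illustrative check in Elixir (the key space and hash choice are arbitrary):

keys = 1..10_000

# Count how many keys map to a different server when going from 9 to 10 servers.
moved =
  Enum.count(keys, fn k ->
    rem(:erlang.phash2(k), 9) != rem(:erlang.phash2(k), 10)
  end)

IO.puts("#{moved} of 10000 keys map to a different server")   # roughly 9/10 of them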
Let's say we want to maintain user sessions on servers, so we want all requests from a user to go to the same server. Round robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve 1:1 mapping between a user and a server, we need to use hashing based load balancers. Consistent hashing works on this idea and it also elegantly handles cases when we want to add or remove servers.
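For illustration, here is a toy consistent-hash ring in Elixir (the module name, virtual-node count and hash choice are all arbitrary; real load balancers use more careful hashing):

defmodule Ring do
  @vnodes 100   # virtual nodes per server smooth out the key distribution

  def build(servers) do
    points =
      for s <- servers, v <- 1..@vnodes do
        {:erlang.phash2({s, v}, 4_294_967_296), s}
      end

    Enum.sort(points)
  end

  # A key belongs to the first ring point at or after its hash (wrapping around).
  def lookup(ring, key) do
    h = :erlang.phash2(key, 4_294_967_296)
    {_point, server} = Enum.find(ring, List.first(ring), fn {p, _s} -> p >= h end)
    server
  end
end

# ring = Ring.build(["s1", "s2", "s3"])
# Ring.lookup(ring, "user:42")   # every request for user 42 lands on the same server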
References: check out Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8
For completeness, I want to point out one other important feature of Consistent Hashing that hasn't yet been mentioned: DOS mitigation.
If a load balancer is getting spammed with requests (whether from too many customers, an attack, or a haywire local service), a round-robin approach will spread the request spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DOS'd by the barrage.
Being able to "limit the blast radius" in this manner is a critical feature of production systems.
Consistent hashing fits well for stateful systems (where the context of previous requests is needed to handle the current request). In a stateful system, if the previous and current requests land on different servers, the context is lost and the system cannot fulfil the current request. With consistent hashing, we can route all requests for a particular user to the same server. Round robin cannot achieve this; it is good for stateless systems.

Why should time-based nonces be avoided?

In challenge-response mechanisms (and other systems), it is advised not to use time-based nonces.
Why should they be avoided?
(Disclaimer: I have no degree in crypto, everything I wrote is just a layman's opinion.)
Using time-based nonces is discouraged because they are likely to collide by accident and are easy to implement incorrectly.
Nonces (“numbers used only once”) are not the same thing as secret keys or initialization vectors. The ciphers that use them are usually designed bearing in mind that:
exposing nonces to the attacker doesn't harm security as long as the secret key is not compromised;
nonces don't have to be random at all; all they have to be is unique for a given secret key.
So, it's perfectly okay to select zero as the starting nonce and increment it before sending each successive message. Nonce predictability is not an issue at all.
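For illustration, a counter can be used directly as the nonce of an AEAD cipher; here is a hedged Elixir sketch using Erlang's :crypto (crypto_one_time_aead/6 needs OTP 22+; the module name and message framing are made up for the example):

defmodule CounterNonce do
  # The counter is sent in the clear with each message; it only has to be
  # unique per key, not secret or unpredictable.
  def encrypt(key, counter, plaintext) do
    nonce = <<0::32, counter::unsigned-64>>   # 96-bit GCM nonce built from a 64-bit counter
    {ciphertext, tag} =
      :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, plaintext, <<>>, true)
    {counter, ciphertext, tag}
  end
end

# key = :crypto.strong_rand_bytes(32)
# CounterNonce.encrypt(key, 0, "hello")   # the next message uses counter 1, then 2, ...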
The main reason why time-based nonces are discouraged is the possibility of backward clock adjustments. If your system's NTP service rewinds the clock two seconds, you are likely to send two encrypted messages with the same nonce within that short period. If you can guarantee that no clock rewinds will ever happen, then go ahead.
Another point against time-based nonces is that the clock resolution may not be enough to give each message a unique number.
UPD:
Using counter-based or time-based nonces is safe in terms of encryption strength. However, they may weaken your overall security by giving the attacker additional information, namely: how many messages your system has already sent, what the average message rate is, how many clients it serves simultaneously, and so on. The attacker may be able to use this information to their advantage. That's called a side-channel attack.
See also:
https://crypto.stackexchange.com/questions/37903
https://crypto.stackexchange.com/questions/53153
https://download.libsodium.org/doc/secret-key_cryptography/encrypted-messages.html, section “Nonce-misuse resistance”
a time- or counter-based nonce could lead to a scenario where an attacker can prepare in advance ... that alone usually won't break a system, but it is one step in the wrong direction... unpredictable nonces usually don't hurt...

Interoperability in DDS

I am new to the DDS domain and need help with the following understanding:
How do I publish common topics between two vendors to achieve interoperability in DDS?
The Scenario is :
Suppose there are two vendor products, V1 and V2. V1 has a publisher which publishes on topic T1, and V2 wants to subscribe to this topic. How will the subscriber (V2) know that topic T1 exists?
I have a similar doubt at the domain level: how will a subscriber know which domain it has to participate in?
I am using OpenDDS.
Thanks
Interoperability between vendors is possible, and regularly tested/demonstrated by the main vendors.
You will need to configure your DDS implementation to use RTPS (I think RTPS 2 currently), rather than any proprietary transport the vendor may use. This might be enabled by default.
In terms of which domain to participate in, you programmatically create a domain participant in a particular domain (which domain it connects to might be controlled by a config file), and all further entities (publishers, subscribers, etc.) that you create then belong to that domain participant and therefore operate in that domain.
To build on #rcs's answer a bit... the actual amount of work you have to do can depend on DDS implementations (OpenDDS, RTI, Prismtech...) because they'll have different defaults. If you use the same on both ends, then your configuration becomes a lot simpler since defaults should line up for things like domain and RTPS.
You will need to make sure the following match:
Domain ID
Domain Partition
Transport (I recommend RTPS; FWIW the version difference between 2.1 and 2.2 hasn't mattered in my experience)
TCP or UDP
Discovery port and data port - this will matter more or less depending on which implementations you use and whether you're using the same one on both ends of the connection; if you are using the same one, they'll have the same defaults
Make sure the topic one side publishes matches the topic the other subscribes to; this applies to both the Topic and the Type (see more here)
Serialization of the data
Discovery (unicast vs. multicast; make sure whatever setup you choose is valid, e.g. both devices are in the same multicast group)
QoS settings will need to line up, though I think defaults will likely work (read more here)
Get the Shapes demo working between the machines you're working on first; this does some basic sanity checking to confirm that communication is possible with the given configuration and network setup. Every vendor/implementation that I've seen has a shapes demo to run; for example, here is RTI's.
That's all I can think of right now, hope that helps. I have found the DDS documentation to be really good, especially once you know when you can (and when you can't) apply an answer from one vendor's documentation to your implementation (e.g. an answer found in RTI's docs or forum, and whether or not it works for your OpenDDS application). Often the solutions are similar, but you'll find RTI supports the most features, and RTI and Prismtech have some of the best documentation.
The DDS RTPS protocol exchanges discovery information so that different applications participating in the same domain (!) know who is out there, and what they are offering/requesting. You need to make sure that the two applications are using the same domain ID (specified on the domain participant). Also, as some implementations allow for different transport options, make sure to use RTPS (sometimes called DDSI) networking.
The RTPS specification contains a mapping from domain ID to port numbers, so if applications from different vendors use the same ID, it should just work. Implementations might, however, override port numbers through configuration.
To maximize the chance that the applications communicate properly, ensure they use the same IDL data model. Vendors have different approaches to type evolution and to mapping types that don't exactly match, and not all of them implement the XTypes specification (yet).
Also, as some implementations are stricter than others, ensure that you stay within bounds of the specification. This means that a topic name should only contain alphanumerical characters (I sometimes see ':' to indicate scoping, that is not allowed).
Things that will definitely not work between vendors is TRANSIENT/PERSISTENT durability or communication over TCP, as both have not been standardized yet. TRANSIENT_LOCAL should work. The difference between TRANSIENT_LOCAL and TRANSIENT is that with TRANSIENT_LOCAL, data is no longer aligned after a publisher (writer) leaves the system, whereas with TRANSIENT that data will still be available.
Also note that for API-level interoperability between vendors, your best chance is to use the new isocpp API, since that one has been implemented pretty consistently across the vendor implementations I've seen.
Hope that helps!

How much security is required for message storage and transmission?

I need to implement a very secure web service using WCF. I have read a lot of documents about security in WCF concerning authorization, authentication and message encryption. The web service will use HTTPS, Windows Authentication for access to the WS, the SQL Server Membership/Role Provider for user authentication and authorization on WS operations, and finally message encryption.
I read in one of the documents that it is good to consider security on each layer independently, i.e. Transport Layer security should be designed without relying on Message Layer security. Therefore, using SSL through HTTPS in combination with message encryption (using public/private key encryption and signatures) would be good practice, since HTTPS concerns the Transport Layer and message encryption concerns the Message Layer.
But a friend told me that [HTTPS + message encryption] is too much; HTTPS alone is sufficient.
What do you think?
Thanks.
If you have SSL then you still need to encrypt your messages if you don't really trust the server which stores them (it could have its files stolen), so this is all good practice.
There comes a point where you have a weakest link problem.
What is your weakest link?
Example: I spend $100,000,000 defending an airport from terrorists, so they go after a train station instead. Money and effort both wasted.
Ask yourself what the threat model is and design your security for that. TLS is a bare minimum for any Internet-based communications, but it doesn't matter if somebody can install a keystroke logger.
As you certainly understand, the role of Transport-Level Security is to secure the transmission of the message, whereas Message-Level Security is about securing the message itself.
It all depends on the attack vectors (or more generally the purpose) you're considering.
In both cases, the security mechanisms involved can serve two purposes: protection against eavesdropping (relying on encryption) and integrity protection (ultimately relying on signatures, since it is based on public-key cryptography in most cases).
TLS with server-certificate only will provide you with the security of the transport, and the client will know that the communication really comes from the server it expects (if configured properly, of course). In addition, if you use client-certificate, this will also guarantee the server that the communication comes from a client that has the private key for this client certificate.
However, when the data is no longer in transit, you rely on the security of the machine where it's used and stored. You might no longer be able to assert with certainty where the data came from, for example.
Message-level security doesn't rely on how the communication was made. Message-level signature allows you to know where the messages came from at a later date, independently of how they've been transferred. This can be useful for audit purposes. Message-level encryption would also reduce the risks of someone getting hold of the data if it's stored somewhere where some data could be taken (e.g. some intranet storage systems).
Basically, if the private key used to decrypt the messages has the same protection as the private key used for SSL authentication, and if the messages are not stored for longer than the connection lasts, then message-level security is certainly overkill.
OTOH, if you've got different servers, or if the key is stored e.g. using hardware security of sorts, or is only made available by user input, then it is good advice to secure the messages themselves as well. Application level security also makes sense for auditing purposes and against configuration mistakes, although personally I think signing the data (integrity protection) is more important in this respect.
Of course, the question can also become: if you're already using a web-service that uses SOAP/WSDL, why not use XML encrypt/sign? It's not that hard to configure. Note that it does certainly take more processor time and memory. Oh, one warning: don't even try it if the other side does not know what they are doing - you'll spend ages explaining it and even then you run into trouble if you want to change a single parameter later on.
Final hint: use standards and standardized software or you'll certainly run into crap. Spend some time getting to know how things work, and make sure you don't accept ill-formatted messages when you call verify (e.g. XML-signing the wrong node or accepting MD5 and such things).