I've been reading a lot about OpenSSL, specifically the TLS and DTLS APIs. Most of it makes sense, it's a pretty intuitive API once you understand it. One thing has really got me scratching my head though...
When/why would I use BIOs?
For example, this wiki page demonstrates setting up a barebones TLS server. There isn't even a mention of BIOs anywhere in the example.
Now this page Uses BIOs exclusively, not ever using the read and write functions of the SSL struct. Granted it's from 2013, but it's not the only one that uses BIOs.
To make it even more confusing this man page suggests that the SSL struct has an "underlying BIO" without ever needing to set it explicitly.
So why would I use BIOs if I can get away with using SSL_read() and SSL_write()? What are the advantages? Why do some examples use BIOs and others don't? What Is the Airspeed Velocity of an Unladen Swallow?
BIO's are always there, but they might be hidden by the simpler interface. Directly using the BIO interface is useful if you want more control - with more effort. If you just want to use TLS on a TCP socket then the simple interface is usually sufficient. If you instead want to use TLS on your own underlying transport layer or if you want have more control on how it interacts with the transport layer then you need BIO.
An example for such a use case is this proposal where TLS is tunneled as JSON inside HTTPS, i.e. the TLS frames are encoded in JSON and which is then transferred using POST requests and responses. This can be achieved by handling the TLS with memory BIO's which are then encoded to and decoded from JSON.
First, your Q is not very clear. SSL is (a typedef for) a C struct type, and you can't use the dot operator on a struct type in C, only an instance. Even assuming you meant 'an instance of SSL', as people sometimes do, in older versions (through 1.0.2) it did not have members read and write, and in 1.1.0 up it is opaque -- you don't even know what its members are.
Second, there are two different levels of BIO usage applicable to the SSL library. The SSL/TLS connection (represented by the SSL object, plus some related things linked to it like the session) always uses two BIOs to respectively send and receive protocol data -- including both protocol data that contains the application data you send with SSL_write and receive with SSL_read, and the SSL/TLS handshake that is handled within the library. Much as Steffen describes, these normally are both set to a socket-BIO that sends to and receives from the appropriate remote host process, but they can instead be set to BIOs that do something else in-between, or even instead. (This normal case is automatically created by SSL_set_{,r,w}fd which it should be noted on Windows actually takes a socket handle -- but not any other file handle; only on Unix are socket descriptors semi-interchangeable with file descriptors.)
Separately, the SSL/TLS connection itself can be 'wrapped' in an ssl-BIO. This allows an application to handle an SSL/TLS connection using mostly the same API calls as a plain TCP connection (using a socket-BIO) or a local file, as well as the provided 'filter' BIOs like a digest (md) BIO or a base64 encoding/decoding BIO, and any additional BIOs you add. This is the case for the IBM webpage you linked (which is for a client not a server BTW). This is similar to the Unix 'everything is (mostly) a file' philosophy, where for example the utility program grep, by simply calling read on fd 0, can search data from a file, the terminal, a pipe from another program, or (if run under inetd or similar) from a remote system using TCP (but not SSL/TLS, because that isn't in the OS). I haven't encountered many cases where it is particularly beneficial to be able to easily interchange SSL/TLS data with some other type of source/sink, but OpenSSL does provide the ability.
Related
this might be somewhat of a weird, long and convoluted question but hear me out.
I am running a licensed 3rd party closed-source proprietary software on my on-premise server that stores and manipulates data, the specifics of what it does are not important. One of the features of this software is that it has an API that accepts requests to insert/manipulate/retrieve data. Because of the poorly designed software, there is no mechanism to write internal scripts (at least not anymore, it has been deprecated in the newest versions) for the software or any events to attach to for writing code that further enhances the functionality of the software (further manipulation of the data according to preset rules, timestamping through a TSA of the incoming packages, etc.).
How can I bypass the need for an internal scripting functionality that still gives me a way to e.g. timestamp an incoming package and return an appropriate response via the API to the sender in case of an error.
I have thought about using the in-built database trigger mechanisms (specifically MongoDB Change Streams API) to intercept the incoming data and adding the required hash and other timestamping-related information directly into the database. This is a neat solution, other than the fact that in case of an error (there have been some instances where our timestamping authority API is down or not responding to requests) there is no way to inform the sender that the timestamping process has not gone through as expected and that the new data will not be accepted into the server (all data on the server must be timestamped by law).
Another way this could be done is by intercepting the API request somehow before it reaches its endpoint, doing whatever needs to be done to the data, and then forwarding the request further to the server's API endpoint and letting it do its thing. If I am not mistaken the concept is somewhat similar to what a reverse proxy does on the network layer - it routes incoming requests according to rules set in the configuration, removes/adds headers to the packets, encrypts the connection to the server, etc.
Finally, my short question to this convoluted setup would be: what is the best way of tackling this problem, are there any software solutions or concepts that I should be researching?
I am developing a client server pair of applications to transfer files by streaming bytes over TCP/IP and the channel would use TLS always.
(Note: Due to certain OS related limitations SFTP or other such secure file transfer protocols cannot be used)
The application level protocol involves minimum but sufficient features to get the file to the other side.
I need to decide if the application level protocol needs to implement an integrity check (Ex: MD5).
Since TLS guarantees integrity, would this be redundant?
The use of TLS can provide you with some confidence that the data has not been changed (intentionally or otherwise) in transit, but not necessarily that the file that you intended to send is identical to the one that you receive.
There are plenty of other opportunities for the file to be corrupted/truncated/modified (such as when it's being read from the disk/database by the sender, or when it's written to disk by the receiver). Implementing your own integrity checking would help protect against those cases.
In terms of how you do the checking, if you're worried about malicious tampering then you should be checking a cryptographic signature (using something like GPG), rather than just a hash of the file. If you're going to use a hash then it's generally recommended to use a more modern algorithm such as a SHA-256 rather than the (legacy) MD5 algorithm - although most of the issues with MD5 won't affect you if you're only concerned about accidental corruption.
Background:
I am trying to write a program in Elixir to test distributed algorithms by running them on a set of processes and recording certain statistics. To begin with I will be running these processes on the same machine, but the intention is eventually to have them running on separate machines/VMs.
Problem:
One of the requirements for algorithms I wish to implement is that messages include authentication. That is, whenever a process sends a message to another process, the receiver should be able to verify that this message did indeed come from the sender, and wasn't forged by another process. The following snippets should help to illustrate the idea:
# Sender
a = authenticate(self, receiver, msg)
send(receiver, {msg, self, a})
# Receiver
if verify(msg, sender, a) do
deliver(msg)
end
Thoughts so far:
I have searched far and wide for any documentation of authenticated communication between Elixir processes, and haven't been able to find anything. Perhaps in some way this is already done for me behind the scenes, but so far I haven't been able to verify this. If it were the case, I wonder if it would still be correct when the processes aren't running on the same machine.
I have looked into the possibility of using SSL/TLS functions provided by Erlang, but with my limited knowledge in this area, I'm not sure how this would apply to my situation of running a set of processes as opposed to the more standard use in client-server systems and HTTPS. If I went down this route, I believe I would have to set up all the keys and signatures myself beforehand, which I believe could possible using the X509 Elixir package, though I'm not sure if this is appropriate and may be more work than is necessary.
In summary:
Is there a standard/pre-existing way to achieve authenticated communication between processes in Elixir?
If yes, will it be suitable for processes communicating between separate machines/VMs?
If no to either of the above, what is the simplest way I could achieve this myself?
As Aleksei and Paweł point out, if something is in your cluster, it is already trusted. It's not quite like authenticating random web requests that could have originated virtually anywhere, you are talking about messages originating from inside your local network of trusted machines. If some nefarious actor is running on one of your servers, you have far bigger problems to worry about than just authenticating messages.
There are very few limitations put on Elixir/Erlang processes running inside a cluster with respect to security: their states can be inspected by any other process, for example. Some of this transparency is by-design and necessary in order to have a fault-tolerant system capable of doing hot-code reloads, but the conversation about the specific how's and why's is too nuanced for me to do it justice.
If you really need to do some logging to have an auditable "paper trail" to verify which process sent which message, I think you'll have to roll your own solution which could rely on a number of common techniques (such as keys + signatures, block-chains, etc.). But keep in mind: these are concerns that would come up if you were dealing with web requests between different servers anyhow! And there are already protocols for establishing secure connections between computers, so I would not recommend re-inventing those network protocols in your application.
Your time may be better spent working on the algorithms themselves and not trying to re-invent the wheel on security. Your app should focus on the unique stuff that nobody else is doing (algorithms in your case). If you have multiple interconnected VMs passing messages to each other, all the "security" requirements there come with defining the proper access to each machine/subnet, and that requirement holds no matter what application/language you're running on them.
The more I read what are you trying to achieve, the more I am sure all you need is the footprint of the calling process.
For synchronous calls GenServer.handle_call/3 you already have the second parameter as a footprint.
For asynchronous messages, you might add the caller information to the messages themselves. Like, instead of sending a plain :foo message, send {:foo, pid()} or somewhat even more sophisticated like {:foo, {pid(), timestamp(), ip(), ...} and make callee to verify those.
That would be safe by all means: erlang cluster would ensure these messages are coming from trusted sources, and your internal validation might ensure that the source is valid within your internal rules.
I was reading TURN server RFCs. All related RFCs ( 5766 and the more recent 8656) support Channel mechanism to avoid the 36 bytes overheads of STUN headers (Section 2.5 of RFC 5766) required for the send/data approach:
For some applications (e.g., Voice over IP), the 36 bytes of overhead
that a Send indication or Data indication adds to the application
data can substantially increase the bandwidth required between the
client and the server. To remedy this, TURN offers a second way for
the client and server to associate data with a specific peer.
For WebRTC, clearly there is no point in using the send/data mechanism. How do browsers choose between the two mechanisms for relaying? Is send/data a fallback? Will support for Channels alone in a TURN server be sufficient for WebRTC use-case?
They will usually do SendIndications while waiting for the Channel to be created.
SendIndications also are important if you get something on the relay before the Channel is created. Some clients only create the Channel when they send and not right when the permission is created.
Firefox doesn't support TURN channels: https://bugzilla.mozilla.org/show_bug.cgi?id=857736
Chrome also uses send/binding indications until ICE is done (presumably to avoid the overhead of creating channels which are not used later)
Don't rely on partial implementations of a spec, that won't work.
I need to implement a very secured Web Service using WCF. I have read a lot of documents about security in WCF concerning authorization, authentication, message encryption. The web service will use https, Windows Authentication for access to the WS, SQL Server Membership/Role Provider for user authentication and authorization on WS operations and finally message encryption.
I read in one of documents that it is good to consider security on each layer indenpendently, i.e. Transport Layer security must be thought without considering Message Layer. Therefore, using SSL through https in combination with message encryption (using public/private key encryption and signature) would be a good practice, since https concerns Transport Layer and message encryption concerns Message Layer.
But a friend told me that [https + message encryption] is too much; https is sufficient.
What do you think?
Thanks.
If you have SSL then you still need to encrypt your messages if you don't really trust the server which stores them (it could have its files stolen), so this is all good practice.
There comes a point where you have a weakest link problem.
What is your weakest link?
Example: I spend $100,000,000 defending an airport from terrorists, so they go after a train station instead. Money and effort both wasted.
Ask yourself what the threat model is and design your security for that. TLS is a bare minimum for any Internet-based communications, but it doesn't matter if somebody can install a keystroke logger.
As you certainly understand, the role of Transport-Level Security is to secure the transmission of the message, whereas Message-Level Security is about securing the message itself.
It all depends on the attack vectors (or more generally the purpose) you're considering.
In both cases, the security models involved can have to purposes: protection against eavesdropping (relying on encryption) and integrity protection (ultimately relying on signatures, since based on public-key cryptography in most cases).
TLS with server-certificate only will provide you with the security of the transport, and the client will know that the communication really comes from the server it expects (if configured properly, of course). In addition, if you use client-certificate, this will also guarantee the server that the communication comes from a client that has the private key for this client certificate.
However, when the data is no longer in transit, you rely on the security of the machine where it's used and stored. You might no longer be able to assert with certainty where the data came from, for example.
Message-level security doesn't rely on how the communication was made. Message-level signature allows you to know where the messages came from at a later date, independently of how they've been transferred. This can be useful for audit purposes. Message-level encryption would also reduce the risks of someone getting hold of the data if it's stored somewhere where some data could be taken (e.g. some intranet storage systems).
Basically, if the private key used to decrypt the messages has the same protection as the private key used for SSL authentication, and if the messages are not stored for longer time than the connection, in that case it is certainly overkill.
OTOH, if you've got different servers, or if the key is stored e.g. using hardware security of sorts, or is only made available by user input, then it is good advice to secure the messages themselves as well. Application level security also makes sense for auditing purposes and against configuration mistakes, although personally I think signing the data (integrity protection) is more important in this respect.
Of course, the question can also become: if you're already using a web-service that uses SOAP/WSDL, why not use XML encrypt/sign? It's not that hard to configure. Note that it does certainly take more processor time and memory. Oh, one warning: don't even try it if the other side does not know what they are doing - you'll spend ages explaining it and even then you run into trouble if you want to change a single parameter later on.
Final hint: use standards and standardized software or you'll certainly run into crap. Spend some time getting getting to know how things work, and make sure you don't accept ill formatted messages when you call verify (e.g. XML signing the wrong node or accepting MD5 and such things).