LDAP server-side sorting - really a good idea?

I'm toying with using server-side sorting (SSS) in my OpenLDAP server. However, as I also get to write the client code, I can see that all it buys me in this case is one line of sorting code at the client. And as the client is one of presently 4, soon to be 16 Tomcats, maybe hundreds if the usage balloons, sorting at the client actually makes more sense to me. I'm wondering whether SSS is really considered much of a good idea. My search results in this case aren't large: dozens rather than hundreds. Just wondering whether it might be more of a weapon than a tool.
In OpenLDAP it is bundled with VLV - Virtual List View, which I will need some day, so it is already installed: so it's really a programming question, not just a configuration question, hence SO not SF.
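For a sense of scale, the "one line of sorting code at the client" could look like the following minimal Python sketch. The entry shape (a dict mapping attribute names to lists of string values) and the function name sort_entries are assumptions made for illustration, not the API of any particular LDAP library.

```python
def sort_entries(entries, attribute):
    """Sort search results client-side by the first value of one attribute.

    `entries` is assumed to be a list of dicts that map attribute names to
    lists of string values; entries lacking the attribute sort first.
    """
    return sorted(
        entries,
        key=lambda entry: (entry.get(attribute) or [""])[0].casefold(),
    )
```

With result sets of a few dozen entries, the cost of this on each Tomcat is negligible compared with the search itself.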

Server-side sorting is intended for use by clients that are unable or unwilling to sort results themselves; this might be useful in hand-held clients with limited memory and CPU mojo.
The advantages of server-side sorting include, but are not limited to:
the server can enforce a time limit on the processing of the sorting
clients can specify an ordering rule for the server to use
professional-quality servers can be configured to reject requests with sort controls attached if the client connection is not secure
the server can enforce resource limits, for example, the aforementioned time limit, or administration limits
the server can enforce access restrictions on the attributes and on the sort request control itself; this may not be that effective if the client can retrieve the attributes anyway
the server may indicate it is too busy to perform the sort or simply unwilling to perform the sort
professional-quality servers can be configured to reject search requests for all clients except for clients with the necessary mojo (privilege, bind DN, IP address, or whatever)
The disadvantages include, but are not limited to:
servers can be overwhelmed by sorting large result sets from multiple clients if the server software is unable to cap the number of sorts to process simultaneously
client-side APIs have to support the server-side sort request control and response
it might be easier to configure clients to sort by their own 'ordering rules'; although these can be added to professional-quality, extensible servers

To answer my own question, and not to detract from Terry's answer, use of the Virtual List View requires a Server Side Sort control.


Intercepting, manipulating and forwarding API calls to a 3rd party API

this might be somewhat of a weird, long and convoluted question but hear me out.
I am running a licensed 3rd party closed-source proprietary software on my on-premise server that stores and manipulates data, the specifics of what it does are not important. One of the features of this software is that it has an API that accepts requests to insert/manipulate/retrieve data. Because of the poorly designed software, there is no mechanism to write internal scripts (at least not anymore, it has been deprecated in the newest versions) for the software or any events to attach to for writing code that further enhances the functionality of the software (further manipulation of the data according to preset rules, timestamping through a TSA of the incoming packages, etc.).
How can I work around the lack of internal scripting and still, for example, timestamp an incoming package and return an appropriate response via the API to the sender in case of an error?
I have thought about using the in-built database trigger mechanisms (specifically MongoDB Change Streams API) to intercept the incoming data and adding the required hash and other timestamping-related information directly into the database. This is a neat solution, other than the fact that in case of an error (there have been some instances where our timestamping authority API is down or not responding to requests) there is no way to inform the sender that the timestamping process has not gone through as expected and that the new data will not be accepted into the server (all data on the server must be timestamped by law).
Another way this could be done is by intercepting the API request somehow before it reaches its endpoint, doing whatever needs to be done to the data, and then forwarding the request further to the server's API endpoint and letting it do its thing. If I am not mistaken the concept is somewhat similar to what a reverse proxy does on the network layer - it routes incoming requests according to rules set in the configuration, removes/adds headers to the packets, encrypts the connection to the server, etc.
Finally, my short question to this convoluted setup would be: what is the best way of tackling this problem, are there any software solutions or concepts that I should be researching?
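To make the "intercept, process, then forward" idea concrete, here is a minimal Python sketch of the decision logic such an intercepting layer might run for each request. Everything in it is hypothetical: handle_request, TimestampError, and the two injected callables stand in for the real TSA client and the real call to the vendor API, and the request body is assumed to be a JSON object.

```python
import hashlib
import json
from typing import Callable, Tuple


class TimestampError(Exception):
    """Raised by the (hypothetical) TSA client when timestamping fails."""


def handle_request(
    body: bytes,
    request_timestamp: Callable[[str], str],
    forward: Callable[[bytes], Tuple[int, bytes]],
) -> Tuple[int, bytes]:
    """Timestamp the payload, then forward it; reject it if the TSA fails."""
    digest = hashlib.sha256(body).hexdigest()
    try:
        token = request_timestamp(digest)
    except TimestampError as exc:
        # The sender is told immediately; nothing reaches the backend.
        error = {"error": "timestamping failed", "detail": str(exc)}
        return 503, json.dumps(error).encode()

    payload = json.loads(body)  # assumes the body is a JSON object
    payload["timestamp"] = {"sha256": digest, "token": token}
    return forward(json.dumps(payload).encode())
```

Because the check runs before the request reaches the vendor software, a TSA outage turns into an error response to the sender, which is what the database-trigger approach cannot offer.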

How can I set up authenticated links between processes in Elixir?

Background:
I am trying to write a program in Elixir to test distributed algorithms by running them on a set of processes and recording certain statistics. To begin with I will be running these processes on the same machine, but the intention is eventually to have them running on separate machines/VMs.
Problem:
One of the requirements for algorithms I wish to implement is that messages include authentication. That is, whenever a process sends a message to another process, the receiver should be able to verify that this message did indeed come from the sender, and wasn't forged by another process. The following snippets should help to illustrate the idea:
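# Sender (authenticate/3 is a placeholder for the primitive I am looking for)
a = authenticate(self(), receiver, msg)
send(receiver, {msg, self(), a})

# Receiver (verify/3 and deliver/1 are placeholders as well)
receive do
  {msg, sender, a} ->
    if verify(msg, sender, a), do: deliver(msg)
end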
Thoughts so far:
I have searched far and wide for any documentation of authenticated communication between Elixir processes, and haven't been able to find anything. Perhaps in some way this is already done for me behind the scenes, but so far I haven't been able to verify this. If it were the case, I wonder if it would still be correct when the processes aren't running on the same machine.
I have looked into the possibility of using SSL/TLS functions provided by Erlang, but with my limited knowledge in this area, I'm not sure how this would apply to my situation of running a set of processes as opposed to the more standard use in client-server systems and HTTPS. If I went down this route, I believe I would have to set up all the keys and signatures myself beforehand, which I believe could be possible using the X509 Elixir package, though I'm not sure if this is appropriate and it may be more work than is necessary.
In summary:
Is there a standard/pre-existing way to achieve authenticated communication between processes in Elixir?
If yes, will it be suitable for processes communicating between separate machines/VMs?
If no to either of the above, what is the simplest way I could achieve this myself?
As Aleksei and Paweł point out, if something is in your cluster, it is already trusted. It's not quite like authenticating random web requests that could have originated virtually anywhere, you are talking about messages originating from inside your local network of trusted machines. If some nefarious actor is running on one of your servers, you have far bigger problems to worry about than just authenticating messages.
There are very few limitations put on Elixir/Erlang processes running inside a cluster with respect to security: their states can be inspected by any other process, for example. Some of this transparency is by-design and necessary in order to have a fault-tolerant system capable of doing hot-code reloads, but the conversation about the specific how's and why's is too nuanced for me to do it justice.
If you really need to do some logging to have an auditable "paper trail" to verify which process sent which message, I think you'll have to roll your own solution which could rely on a number of common techniques (such as keys + signatures, block-chains, etc.). But keep in mind: these are concerns that would come up if you were dealing with web requests between different servers anyhow! And there are already protocols for establishing secure connections between computers, so I would not recommend re-inventing those network protocols in your application.
Your time may be better spent working on the algorithms themselves and not trying to re-invent the wheel on security. Your app should focus on the unique stuff that nobody else is doing (algorithms in your case). If you have multiple interconnected VMs passing messages to each other, all the "security" requirements there come with defining the proper access to each machine/subnet, and that requirement holds no matter what application/language you're running on them.
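If you do go down the "keys + signatures" route, the authenticate/verify pair from the question could be sketched with an HMAC over a shared secret. The sketch below is in Python rather than Elixir and rests on assumptions: each pair of processes already shares a secret key (key distribution is not shown), and process identities are plain strings.

```python
import hashlib
import hmac


def _encode(*parts: bytes) -> bytes:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)


def authenticate(sender: str, receiver: str, msg: bytes, key: bytes) -> bytes:
    """Return a tag binding the message to this sender/receiver pair."""
    data = _encode(sender.encode(), receiver.encode(), msg)
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(sender: str, receiver: str, msg: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = authenticate(sender, receiver, msg, key)
    return hmac.compare_digest(expected, tag)
```

Note that a shared-key HMAC only proves the message came from someone holding the key; if the algorithm under test needs a receiver to prove the origin to third parties, digital signatures would be needed instead.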
The more I read about what you are trying to achieve, the more I am sure all you need is the footprint of the calling process.
For synchronous calls handled by GenServer.handle_call/3, you already have the second parameter (from) as a footprint.
For asynchronous messages, you might add the caller information to the messages themselves. For example, instead of sending a plain :foo message, send {:foo, pid()} or something more sophisticated like {:foo, {pid(), timestamp(), ip(), ...}} and have the callee verify those.
That would be safe by all means: the Erlang cluster would ensure these messages are coming from trusted sources, and your internal validation might ensure that the source is valid within your internal rules.

why do we need consistent hashing when round robin can distribute the traffic evenly

When the load balancer can use a round-robin (RR) algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing and for using RR?
From this blog,
With traditional “modulo hashing”, you simply consider the request
hash as a very large number. If you take that number modulo the number
of available servers, you get the index of the server to use. It’s
simple, and it works well as long as the list of servers is stable.
But when servers are added or removed, a problem arises: the majority
of requests will hash to a different server than they did before. If
you have nine servers and you add a tenth, only one-tenth of requests
will (by luck) hash to the same server as they did before.
Then
there’s consistent hashing. Consistent hashing uses a more elaborate
scheme, where each server is assigned multiple hash values based on
its name or ID, and each request is assigned to the server with the
“nearest” hash value. The benefit of this added complexity is that
when a server is added or removed, most requests will map to the same
server that they did before. So if you have nine servers and add a
tenth, about 1/10 of requests will have hashes that fall near the
newly-added server’s hashes, and the other 9/10 will have the same
nearest server that they did before. Much better! So consistent
hashing lets us add and remove servers without completely disturbing
the set of cached items that each server holds.
Similarly, the round-robin algorithm suits the scenario where the list of servers is stable and traffic can be spread at random. Consistent hashing suits the scenario where the backend servers need to scale out or scale in and most requests should still map to the same server that they did before. Consistent hashing can also achieve a well-distributed, uniform load.
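The remapping behaviour described in the quoted passage can be checked with a small Python sketch. It is an illustration under assumptions, not the blog's implementation: MD5 is used only as a convenient deterministic hash, each server gets 100 virtual points on the ring, and a request goes to the next point clockwise rather than the strictly "nearest" one, which is a common variant.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Deterministic hash of a string, treated as a very large number."""
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


def modulo_server(key: str, servers: list) -> str:
    """Traditional modulo hashing: hash modulo the number of servers."""
    return servers[h(key) % len(servers)]


class Ring:
    """Consistent-hash ring with several virtual points per server."""

    def __init__(self, servers, vnodes=100):
        self.points = sorted(
            (h(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.hashes = [point for point, _ in self.points]

    def server_for(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self.hashes, h(key)) % len(self.points)
        return self.points[index][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose server changes between two mappings."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)
```

Comparing nine servers against ten with moved_fraction shows most keys changing server under modulo_server, while under Ring only the keys claimed by the new server move.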
Let's say we want to maintain user sessions on servers. So, we would want all requests from a user to go to the same server. Using round-robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve a stable mapping from each user to a single server, we need to use hash-based load balancers. Consistent hashing works on this idea, and it also elegantly handles cases when we want to add or remove servers.
References: Check out Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8
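The session-affinity point can be illustrated with a short Python sketch. The server names and function names are made up for the example, and plain modulo hashing is used for brevity, so it shows stickiness only, not graceful handling of added or removed servers.

```python
import hashlib
from itertools import cycle


def make_round_robin(servers):
    """Return a picker that ignores the user and walks the server list."""
    rotation = cycle(servers)
    return lambda user_id: next(rotation)


def make_hash_based(servers):
    """Return a picker that always maps the same user to the same server."""
    def pick(user_id: str) -> str:
        digest = hashlib.sha256(user_id.encode()).digest()
        return servers[int.from_bytes(digest[:8], "big") % len(servers)]
    return pick
```

Calling the round-robin picker repeatedly for one user walks through every server, whereas the hash-based picker keeps returning the same one, so the session state stays where the user's requests land.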
For completeness, I want to point out one other important feature of Consistent Hashing that hasn't yet been mentioned: DOS mitigation.
If a load balancer is getting spammed with requests (either from too many customers, an attack, or a haywire local service), a round-robin approach will apply the request spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DOS'd by the barrage, provided the offending requests share a hash key (for example, the same client or resource).
Being able to "limit the blast radius" in this manner is a critical feature of production systems.
Consistent hashing fits stateful systems well, where the context of a previous request is required for the current request. In such systems, if the previous and current requests land on different servers, the context is lost and the system won't be able to fulfil the current request. With consistent hashing we can route all requests for a particular user to the same server. Round robin cannot achieve this, so it is better suited to stateless systems.

Verify WCF interface is the same between client and server applications

We've got a Windows service that is connected to various client applications via a duplex WCF channel. The client and server applications are installed on different machines, in different locations, potentially at widely different times, and by different people. In addition, the client can be pointed at a different machine running the same Windows service at startup.
Going forward, we know that the interface between the client and the server applications will likely evolve. The application in the field will be administered by local IT personnel, and we have no real control over what version of either of these applications will be installed when/where or which will be connecting to the other. Since these are installed at various physical locations and by different people, there's a high likelihood that either the client or server application will be out of date compared to the other.
Since we can't control what versions of the applications in the field are trying to connect to each other, I'd like to be able to verify that the contracts between the client application and the server application are compatible.
Some things I'm looking for (may not be able to realistically get them all):
I don't think I care if the server's interface is newer or older, as long as the server's interface is a super-set of the client's
I want to use something other than an "interface version number". Any developer-kept version number will eventually be forgotten about or missed.
I'd like to use a computed interface comparison if that's possible
How can I do this? Any ideas on how to go about this would be greatly appreciated.
Seems like this is a case of designing your service for versioning. WCF has very good versioning capabilities and extension points. Here are a couple of good MSDN articles on versioning the service contract and more specifically the data contracts. For backward and "forward" compatible versioning look at this article on using the IExtensibleDataObject interface.
If the server's endpoint has metadata publishing enabled, you can programmatically inspect an endpoint's interface by using the MetadataResolver class. This class lets you retrieve the metadata from the server endpoint, and in your case, you would be interested in the ContractDescription which contains the list of all operations. You could then compare the list of operations to your client proxy's endpoint operations.
Of course, comparing the lists of operations would still need to be implemented; you could simply compare the operation names and fail if one of the client's operations is not found among the server's operations. This would not necessarily catch all incompatibilities, e.g. request/response schema changes.
I have not tried implementing any of this by the way, so it's more of a theoretical view of your problem. If you don't want to fiddle with the framework, you could implement a custom operation that would return the list of operation names. This would be of minimal effort but is less standards-compliant.
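The name comparison described above boils down to a set-difference check: the server is acceptable if its operations are a superset of the client's. A language-neutral sketch in Python follows; the function names are hypothetical, and the operation names would come from the metadata or from the custom operation, neither of which is shown.

```python
from typing import Iterable, Set


def missing_operations(client_ops: Iterable[str], server_ops: Iterable[str]) -> Set[str]:
    """Operations the client expects that the server does not offer."""
    return set(client_ops) - set(server_ops)


def is_compatible(client_ops: Iterable[str], server_ops: Iterable[str]) -> bool:
    """True when the server's operations are a superset of the client's."""
    return not missing_operations(client_ops, server_ops)
```

Comparing strings that also encode parameter and return types, rather than bare names, would catch some signature changes as well, though still not every schema change.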

using BOSH/similar technique for existing application/system

We have an existing system which connects to the back end via HTTP (Apache/SSL) and polls the server for new messages; needless to say, we have scalability issues.
I'm researching how to remove this polling and have come across BOSH/XMPP, but I'm not sure how we should adopt the BOSH technique (using long-lived HTTP connections).
I've seen there are a few libraries available, but the entire thing seems bloated since we do not need buddy lists etc. and simply want to notify the clients of available messages.
The client is written in C/C++ and works across most OSes, so that is an important factor. The server is in Java.
Does BOSH result in a huge number of httpd processes? Since it has to keep all the clients connected, what would be the limit on that? We are also planning to move to a 64-bit JVM/Apache; what would be the maximum number of clients in that case?
Any hints?
I would note that BOSH is separate from XMPP, so there's no "buddy lists" involved. XMPP-over-BOSH is what you're thinking of there.
Take a look at collecta.com and associated blog posts (probably by Jack Moffitt) about how they use BOSH (and also XMPP) to deliver real-time information to large numbers of users.
As for the scaling issues with Apache, I don't know; presumably each connection uses few resources, so you can increase the number of connections per Apache process. But you could also check out some of the connection manager technologies (like punjab) mentioned on the BOSH page above.
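Independently of XMPP, the core of the long-lived request technique is that the server holds a request open until a message is available or a timeout expires, instead of answering "nothing new" immediately. A minimal Python sketch of that hold-and-wait step is below; the per-client mailbox queue and the function name long_poll are assumptions for illustration, and the HTTP layer is not shown.

```python
import queue


def long_poll(mailbox: queue.Queue, timeout_s: float):
    """Block until a message arrives for this client or the timeout expires.

    Returns the message, or None on timeout so that the client can
    immediately issue the next long-lived request.
    """
    try:
        return mailbox.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

The scaling question then becomes how cheaply the server can keep many such waits open, which is why a thread-per-connection or process-per-connection front end tends to be the limiting factor.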