Whether spring cloud config cache/store config data from backend - spring-cloud-config

In my project, I am planning to use multiple backend to store different data in my spring cloud conifg server setup: use git backend to store un-sensitive data, and use vault to store sensitive data like password/token. This is simiar to what https://content.pivotal.io/blog/spring-cloud-services-supports-vault-multiple-backends-use-the-right-config-repo-for-the-job suggests.
My question is since the returned decrypted value from vault is passed back to "client application" through config server, will config server cache/store/log the response from vault in any way. If this is true, config server will be a big target for hacker and we may have to protect the config server with extra configuration.

I suppose the true answer to your concern would be to secure each and every layer of your stack to prevent intrusion at any single point.
The Spring documentation makes no explicit reference to caching data - so you should be safe in that regard. It would also not make a lot of sense for Config Server to cache the configuration from external data stores as it is not the source-of-truth for that data. We want it to always fetch the data from source to ensure we get the latest version of the data. I'm supposing that there might be a case of caching if Config Server stored the configuration locally and was able to watch the files for changes and refresh its cache accordingly. But having said that, I'm still not sold on the benefit of caching at this layer.
From personal use of Spring Cloud Config Server I've yet to see it logging out the entire configuration; in-fact it logs very little to start off with. I'm sure you can suppress logging even further by setting the appropriate levels.
What you should also look at doing is securing the connections between Vault & Config Server and Config Server and each application using SSL. That will prevent you from transmitting data in clear text and will provide you with an additional layer of security.

Related

How to better utilize local cache with load balancing strategies?

I have an Authentication service where I need to cache some user information for better performance. I chose to use local cache because Authentication service probably will be called on each request so I want it to be super fast. Compared to remote cache options local cache is a lot faster (local cache access is below 1ms while remote cache access is around 25ms).
The problem is I can not cache as much information as a distributed cache without running out of memory (talking about millions of users). I can either leave it as it is and when local cache reaches the memory limit it would evict some other data but that would be bad optimization of the cache. Or I can use some kind of load balancer strategy where users will be redirected to same Authentication service instances based on their IP address or other criteria thus the cache hits will be a lot higher.
It kind of defeats the purpose of having stateless services however I think I can slightly compromise from this principle in network layer if I want both consistency and availability. And as for Authentication both are crucial for full security (user info always has to be up-to-date and available).
What kind of load balancing techniques out there for solving this kind of problem? Can there be other solutions?
Note: Even though this question is specific to Authentication I think many other services that are frequently accesses and requires speed can benefit a lot from using local caches.
So - to answer the question here - load balancers can handle this with their hashing algorithms.
I'm using Azure a lot so I'm giving Azure Load Balancer as an example:
Configuring the distribution mode
Load balancing algorithm
From the docs:
Hash-based distribution mode
The default distribution mode for Azure
Load Balancer is a five-tuple hash.
The tuple is composed of the:
Source IP
Source port
Destination IP
Destination port
Protocol type
The hash is used to map traffic to the available servers. The
algorithm provides stickiness only within a transport session. Packets
that are in the same session are directed to the same datacenter IP
behind the load-balanced endpoint. When the client starts a new
session from the same source IP, the source port changes and causes
the traffic to go to a different datacenter endpoint.

Prometheus target management

We are using prometheus in our production envirment recently. Before we only have 30-40 nodes for each service and those servers not change very often, so we just write it in the prometheus.yml, but right now it become too long to hold in one file and change much frequently then before, so my question is should i use file_sd_config to put those server list out of yml file and change those config files sepearately, or using consul for service discovery(same much easy to handle changes).
I have install 3 nodes consul cluster in data center and as i can see if i change to use consul to slove this problem , i also need to install consul client in each server(node) and define its services info. Is that correct? or does anyone have good advise.
Thanks
I totally advocate the use of a service discovery system. It may be a bit hard to deploy at first but surely it will worth it in the future.
That said, Prometheus comes with a lot of service discovery integrations. It's possible that you don't need a Consul cluster. If your servers are in a cloud provider like AWS, GCP, Azure, Openstack, etc, prometheus are able to autodiscover the instances.
If you keep running with Consul, the answer is yes, the agent must be running in every node. You can also register services and nodes via API but it's easier to deploy the agent.

database Vs cache management in deepstream

I was wondering about how deepstream decides to store an info in cache vs database if both of them are configured. Can this be decided by the clients?
Also, when using redis will it provide both cache and database functionality? I would be using amazon elastic cache with redis backend for the same.
It stores it in both, first in the cache in a blocking way and outside the critical path in the database in a non-blocking way.
Here's an animation illustrating this.
You can also find more information here: https://deepstream.io/tutorials/core/storing-data/

How Can I use Apache to load balance Marklogic Cluster

Hi I am new to Marklogic and Apache. I have been provided task to use apache as loadbalancer for our Marklogic cluster of 3 machines. Marklogic cluster is currently running on Linux servers.
How can we achieve this? Any information regarding this would be helpful.
You could use mod_proxy_balancer. How you configure it depends what MarkLogic client you would like to use. If you would like to use the Java Client API, please follow the second example here to allow apache to generate stickiness cookies. If you would like to use XCC, please configure it to use the ML-Server-generated or backend-generated "SessionID" cookie.
The difference here is that XCC uses sessions whereas the Java Client API builds on the REST API which is stateless, so there are no sessions. However, even in the Java Client API when you use multi-request transactions, that imposes state for the duration of that transaction so the load balancer needs a way to route requests during that transaction to the correct node in the MarkLogic cluster. The stickiness cookie will be resent by the Java Client API with every request that uses a Transaction so the load balancer can maintain that stickiness for requests related to that transaction.
As always, do some testing of your configuration to make sure you got it right. Properly configuring apache plugins is an advanced skill. Since you are new to apache, your best hope of ensuring you got it right is checking with an HTTP monitoring tool like WireShark to look at the HTTP traffic from your application to MarkLogic Server to make sure things are going to the correct node in the cluster as expected.
Note that even with the client APIs (Java, Node.js) its not always obvious or explicit at the language API layer what might cause a session to be created. Explicitly creating multi statement transactions definately will, but other operations may do so as well. If you are using the same connection for UI (browser) and API (REST or XCC) then the browser app is likely to be doing things that create session state.
The safest, but least flexable configuration is "TCP Session Affinity". If they are supported they will eliminate most concerns related to load balancing. Cookie Session Affinity relies on guarenteeing that the load balencer uses the correct cookie. Not all code is equal. I have had cases where it the load balancer didn't always use the cookie provided. Changing the configuration to "Load Balancer provided Cookie Affinity" fixed that.
None of this is needed if all your communications are stateless at the TCP layer, the HTTP layer and the app layer. The later cannot be inferred by the server.
Another conern is if your app or middle tier is co-resident with other apps or the same app connecting to the same load balancer and port. That can be difficult to make sure there are no 'crossed wires' . When ML gets a request it associates its identity with the client IP and port. Even without load balencers, most modern HTTP and TCP client libraries implement socket caching. A great perfomrance win, but a hidden source of subtle random severe errors if the library or app are sharing "cookie jars" (not uncomnon). A TCP and Cookie Jar cache used by different application contexts can end up sending state information from one unrelated app in the same process to another. Mostly this is in middle tier app servers that may simply pass on requests from the first tier without domain knowledge, presuming that relying on the low level TCP libraries to "do the right thing" ... They are doing the right thing -- for the use case the library programmers had in mind -- don't assume that your case is the one the library authors assumed. The symptoms tend to be very rare but catastrophic problems with transaction failures and possibly data corruption
and security problems (at an application layer) because the server cannot tell the difference between 2 connections from the same middle tier.
Sometimes a better strategy is to load balance between the first tier and the middle tier, and directly connect from the middle tier to MarkLogic.
Especially if caching is done at the load balancer. Its more common for caching to be useful between the middle tier and the client then the middle tier and the server. This is also more analogous to the classic 3 tier architecture used with RDBMS's .. where load balancing is between the client and business logic tiers not between business logic and database.

Apache HTTP load balancing based on URL pattern

I have a Apache web server in front of 2 tomcats which are connected to the same MySQL backend database.
I need to load balance the incoming requests between two tomcats based on a URL parameter named "projectid". For example all even project ids may be served with tomcat 1 and odd requests with tomcat 2.
This is required because the user may start jobs in a project of tomcat 1 which tomcat 2 won't be aware of and these jobs are currently not stored in the database.
Is there a way to achieve this using mod-proxy-load-balancing?
I'm not aware of such a load algorithm being already present. However, keep in mind that the most common loadbalancing outcome (especially when you have server-side state as you obviously have) is a sticky session: You're only balancing the initial request. After that, all requests are typically directed to the same server.
I typically recommend against distributing the session data as it adds some commonly unnecessary performance hit onto each request, negating the improved performance that you can get with clustering. This is subject to be changed in actual installations though and just a first rule of thumb.
You might be able to create your own loadbalancing algorithm with mod-proxy-load-balancing (you'll need to configure the algorithm in the config file), but I believe your time is better spent fixing your implementation, or implement business specific logic to check all cluster machines for running jobs.