Hadoop Security using SSL for data in transit - Use of hadoop.ssl.hostname.verifier

I am trying to understand the use of hadoop.ssl.hostname.verifier. According to https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/core-default.xml, it is "The hostname verifier to provide for HttpsURLConnections. Valid values are: DEFAULT, STRICT, STRICT_I6, DEFAULT_AND_LOCALHOST and ALLOW_ALL".
The associated codebase at https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ssl/SSLHostnameVerifier.java describes the values in a bit more detail, but I am trying to see where I should observe this, either in the log messages or in some associated impact. The HTTPS endpoints for the NameNode or any other service do not seem to show any difference in their responses. I would appreciate it if someone could tell me how to test the impact of the different values.
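For reference, this is the kind of core-site.xml change I am toggling between the documented values (STRICT shown here only as an example):

<property>
  <name>hadoop.ssl.hostname.verifier</name>
  <value>STRICT</value>
</property>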

What information is logged by IdentityModel when ShowPII is set to true?

IdentityModelEventSource has a property called ShowPII, which means that Personally Identifiable Information will be added to the logs (in relation to security). This value is used to decide whether to log some OAuth2-sensitive data.
I am trying to understand what kind of Personally Identifiable Information will be logged:
Client ID? (aka Client Key, Consumer Key)
Client Secret? (aka Consumer Secret)
Json Web Tokens? (aka JWT)
Access Tokens?
Refresh Tokens?
Kerberos Tickets?
PKCE Values?
Authorization Codes?
I know it cannot get access to usernames and passwords because they are only exchanged directly with the IDP.
But I need to know whether I have to find a way to lock down my log files, because they will contain data that constitutes a security vulnerability.
These are the possible log messages of IdentityModel: LogMessages.cs
About "I am trying to understand what kind of Personally Identifiable Information will be logged":
I won't copy-paste the log messages from there (especially as they can change at any moment). You can check them yourself and decide what should be considered PII.
But here's an interesting example:
"IDX10615: Encryption failed. No support for: Algorithm: '{0}', SecurityKey: '{1}'."
and this is how it's used:
throw LogHelper.LogExceptionMessage(new SecurityTokenEncryptionFailedException(LogHelper.FormatInvariant(TokenLogMessages.IDX10615, encryptingCredentials.Enc, encryptingCredentials.Key)));
If you follow the trail, you'll find out that encryptingCredentials.Key will be logged if ShowPII = true and won't be logged if ShowPII = false.
Of course, depending on your use case, this particular message may never appear in your logs. And not all messages are so outrageously leaky. But you never know:
your use case may change
you may be mistaken about the set of messages IdentityModel can emit for your use case
IdentityModel code may change, and you may forget to check whether the set of messages is still secure
So, about "if I need to find a way to lock down my log files":
Yes, you definitely need to.
Or better yet: don't use ShowPII = true in production for monitoring; use it only in a development environment for debugging purposes.
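As a minimal sketch of what that looks like in code (IdentityModelEventSource.ShowPII is the library's actual switch; gating it on the environment is just a suggestion here, not something the library does for you):

using Microsoft.IdentityModel.Logging;

public static class PiiLogging
{
    public static void Configure(bool isDevelopment)
    {
        // Only reveal parameter values (keys, token material, etc.) in exception
        // messages while debugging; leave the default (false) everywhere else.
        IdentityModelEventSource.ShowPII = isDevelopment;
    }
}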
Looking at the source, it appears that when ShowPII is off, it will do two things:
Replace all parameters passed to library-specific exceptions with their data type names
For all system exceptions - replace inner message with exception type name
In this context, "library-specific" means an exception whose full type name starts with "Microsoft.IdentityModel." (the library defines a few of these).
Depending on your use case you'd see a variety of parameters that can be logged with custom exceptions. A quick search for FormatInvariant yields quite a few for your consideration.
Again, depending on how you use it, you might get a better idea of the possible error messages by looking through the relevant LogMessages.cs file for your specific namespace.
P.S.: on a side note, it appears that default ShowPII setting is GDPR-compliant

How To: Save JMeterVariable values to InfluxDB with the SampleResults

I'd like to store some JMeterVariables together with the SampleResults in InfluxDB, using a BackendListenerClient for InfluxDB (I am using the package rocks.nt.apm.jmeter to get the raw results).
My current test logs in as a random customer, requests some random entities and logs out. Most of the results are within a normal range; I'd like to zoom in on certain extreme sample results and find out for which customer / requested entity they occur. We have seen in the past that we can find performance issues with specific configurations this way.
I store the customer and entity ID in variables. My issue is that the JMeterVariables are not accessible from the BackendListenerClient. I looked at the sample_variables property, but it stores the variables in the SampleEvent, which is not accessible in the BackendListener.
I could use the thread name or sample label to store the vars, but I saw that the CSV writer can actually write the variable values from the event, which would be a much nicer solution.
Looking forward to your thoughts,
Best regards, Spud
You've got it right - the Backend Listener is not customizable in terms of fine-shaping the data you're sending to Influx.
Alas.
However, there's a Swiss Army Knife always available in JMeter: the JSR223 components.
The JSR223 listener, in your case.
The InfluxDB line protocol is as simple as can be, and HTTP/REST libraries are in abundance (Apache HTTP should already be included with standard JMeter, to my recollection, so no additional jars are needed) - just pick it all up, form your time series as you like, toss it at your InfluxDB REST endpoint, and the job's done.
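A rough sketch of that idea as a Groovy JSR223 Listener (the InfluxDB URL, database name and the variable names customerId / entityId are assumptions for illustration; plain HttpURLConnection is used here to avoid extra dependencies):

// One line-protocol record per sample: tags carry the JMeter variables,
// fields carry the timing, and the timestamp is in nanoseconds.
def customerId = vars.get("customerId") ?: "unknown"
def entityId   = vars.get("entityId") ?: "unknown"
def line = "jmeter_samples,label=${prev.getSampleLabel().replace(' ', '\\ ')}," +
           "customer=${customerId},entity=${entityId} " +
           "elapsed=${prev.getTime()}i,success=${prev.isSuccessful()} " +
           "${prev.getTimeStamp()}000000"

// POST it to the InfluxDB 1.x write endpoint.
def conn = new URL("http://influxdb.example.com:8086/write?db=jmeter").openConnection()
conn.setRequestMethod("POST")
conn.doOutput = true
conn.outputStream.withWriter("UTF-8") { it << line }
if (conn.responseCode >= 300) {
    log.warn("InfluxDB write failed: HTTP " + conn.responseCode)
}

In a real test you would probably batch the lines rather than make one HTTP call per sample, but the shape of the solution stays the same.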

Why is the ExpressionLanguageScope in the DBCPConnectionPool service limited to only 'VARIABLE_REGISTRY' and not 'FLOWFILE_ATTRIBUTES'?

The DBCPConnectionPool service requires 5 connection parameters to establish a connection to a database, as shown in the picture below [Marked Yellow].
I used the UpdateAttribute processor to manually add these 5 connection parameters and gave them their respective values, as shown in the picture below [Marked Yellow].
Now, when I tried to read the values for the connection parameters in the DBCPConnectionPool service through these attributes (shown in the picture below), I was unable to read them.
To find out why the DBCPConnectionPool service was unable to read the FlowFile attributes, I went ahead and checked the source code for both the DBCPConnectionPool service and the UpdateAttribute processor:
Source code for the DBCPConnectionPool service: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-dbcp-service/src/main/java/org/apache/nifi/dbcp/DBCPConnectionPool.java
Source code for the UpdateAttribute processor: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-update-attribute-bundle/nifi-update-attribute-processor/src/main/java/org/apache/nifi/processors/attributes/UpdateAttribute.java
Thus, I came to understand why it was unable to read the values from the FlowFile attributes: the ExpressionLanguageScope is limited to VARIABLE_REGISTRY and does not include FLOWFILE_ATTRIBUTES.
Now, my question is: why is the ExpressionLanguageScope for the DBCPConnectionPool service limited to VARIABLE_REGISTRY? What is the reason for this limitation? I am asking because I want to read the values for the connection parameters through FlowFile attributes.
For the same question asked on the NiFi dev mailing list, Andy gave the best possible answer. The reason DBCPConnectionPool, or any controller service for that matter, uses ExpressionLanguageScope.VARIABLE_REGISTRY is that controller services have no access to FlowFiles, so they cannot read FlowFile attributes. As for why it supports only VARIABLE_REGISTRY:
Just because it doesn't read flowfile attributes doesn't mean that it should not use attributes from elsewhere.
One of the major reasons VARIABLE_REGISTRY was introduced was to avoid exposing sensitive values, which is what happens when we pass such values around as FlowFile attributes. Controller services fit this case, because many of them use sensitive properties such as passwords.
And if you're assuming that you can make it work just by changing the scope of those properties to ExpressionLanguageScope.FLOWFILE_ATTRIBUTES, you're wrong. Changing them makes no sense and doesn't work; the reason, again, is that controller services never get access to the FlowFiles.
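To make the scope concrete, this is roughly how such a property is declared (a sketch modelled on the pattern DBCPConnectionPool uses, not a copy of its exact code):

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.expression.ExpressionLanguageScope;
import org.apache.nifi.processor.util.StandardValidators;

public class ExampleDescriptors {
    public static final PropertyDescriptor DATABASE_URL = new PropertyDescriptor.Builder()
            .name("Database Connection URL")
            .required(true)
            // A controller service is configured once and never has a FlowFile in hand,
            // so only the variable registry can be evaluated here.
            .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();
}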
If you have a specific requirement where you need to use different property values for different FlowFiles, Andy shared some links in the original dev thread, which I'm posting again:
https://stackoverflow.com/a/49412970/70465
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Using_Custom_Properties

Define custom load balancing algorithm

Here is the situation:
I have a number of web servers, say 10. I need to use a (software) load balancer, which can be implemented using a reverse proxy server such as HAProxy or Varnish. Now, all the traffic we serve is over https, not http, so Varnish is out of the question.
Now, I want to divide the users' requests into a few categories, depending on one of the input (POST) parameters of the request. Depending on that parameter, I need to divide the requests among the servers, because, based on it (even if all other input (POST) parameters are the same), different servers would serve the request differently.
So, I need to define a custom load balancing algorithm such that, for a particular value of that parameter, I send the load to a specific 3 servers (say), for some other value to a specific 2, and for the other value(s) to the remaining 5.
Since I cannot use Varnish, as it cannot be used to terminate SSL (defining a custom algorithm would have been easy in VCL), I am thinking of using HAProxy.
So, here is the question:
Can anyone help me with how to define a custom load balancing function using HAProxy?
I've researched a lot and could not find any document describing how. So, if it is not possible with HAProxy, can you refer me to some other reverse proxy that can also be used as a load balancer and meets both criteria above (SSL termination and the ability to define a custom load balancing algorithm)?
EDIT:
This question is a follow-up to one of my previous questions: Varnish to be used for https
I'm not sure what your goal is, but I'd suggest NOT doing custom routing based on the HTTP request body at all. It will perform very poorly, and the cost will likely outweigh any benefit you are trying to achieve.
Anything that has to parse values beyond the typical HTTP headers at your load balancer will slow things down. Cookies by themselves are generally a bad idea if you can avoid them.
If you can control the path/route values, that is likely a much better idea than parsing every POST for certain values.
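For example, if the category can be encoded in the URL path, the split is plain haproxy.cfg; the following is only a sketch with made-up backend names, paths and addresses:

frontend fe_https
    mode http
    bind :443 ssl crt /etc/haproxy/site.pem
    acl is_group_a path_beg /group-a
    acl is_group_b path_beg /group-b
    use_backend be_group_a if is_group_a
    use_backend be_group_b if is_group_b
    default_backend be_rest

backend be_group_a
    mode http
    balance roundrobin
    server web1 10.0.0.1:8443 ssl verify none
    server web2 10.0.0.2:8443 ssl verify none
    server web3 10.0.0.3:8443 ssl verify none
    # be_group_b and be_rest would be defined the same way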
You can probably achieve what you want via NGINX with lua scripts (the Kong platform is based on them), but I can't say how hard that would be for you...
https://github.com/openresty/lua-nginx-module#readme
Here's an article with a specific example of setting different upstreams based on lua input.
http://sosedoff.com/2012/06/11/dynamic-nginx-upstreams-with-lua-and-redis.html
server {
    # ...BASE CONFIG HERE...
    port_in_redirect off;

    location /somepath {
        lua_need_request_body on;
        set $upstream "default.server.hostname";
        rewrite_by_lua '
            ngx.req.read_body() -- explicitly read the req body
            local data = ngx.req.get_body_data()
            if data then
                -- use data: see
                -- https://github.com/openresty/lua-nginx-module#ngxreqget_body_data
                ngx.var.upstream = some_deterministic_value
            end
        ';
        # ...OTHER PARAMS...
        proxy_pass http://$upstream;
    }
}

Gemfire region with data expiration

Regarding this document, "entry-time-to-live-expiration" means how long the region's entries can remain in the cache without being accessed or updated; the default is no expiration of this type. However, when I use Spring Cache and a client region with the following configuration, I find that the setting does not behave as described with respect to "being accessed". Going further, regarding this document -> XML TTL tab, it says "Configures a replica region to invalidate entries that have not been modified for 15 seconds." So I am confused about whether TTL applies to "being accessed".
<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>
So, the documentation you might want to refer to is here and here. Perhaps relevant to your situation is...
"Requests for entries that have expired on the consumers will be forwarded to the producer."
Based on your configuration, given you did not set either a ClientRegionShortcut or DataPolicy, your Client Region, "Customer2", defaults to ClientRegionShortcut.LOCAL, which sets a DataPolicy of "NORMAL". DataPolicy.NORMAL states...
"Allows the contents in this cache to differ from other caches. Data that this region is interested in is stored in local memory."
And for the shortcut of "LOCAL"...
"A LOCAL region only has local state and never sends operations to a server. ..."
However, that does not mean the client Region cannot receive data (of interest) from the server. It simply implies that operations are not distributed to the server. It may be expiring the entry and then repopulating it from the server (producer).
Of course, I am speculating and have not tested these ideas. You might try setting the Expiration Action to "LOCAL_DESTROY" and/or changing your distribution properties through different ClientRegionShortcuts.
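For instance, a variation to experiment with might look like the following (a sketch only; CACHING_PROXY and LOCAL_DESTROY are just examples of the shortcut and expiration-action changes suggested above, not a tested fix):

<gfe:client-region id="Customer2" name="Customer2" shortcut="CACHING_PROXY" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="LOCAL_DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>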
Post back if you are still having problems. I also echo what @hubbardr is asking.
Cheers!