How to resolve this node affinity with Envoy - load-balancing

I provide a gRPC service that, unfortunately, requires node affinity between the BeginTransaction and Commit API calls.
A consumer's call sequence is typically:
BeginTransaction() returns txnID
DoStuff(txnID, moreParams...)
DoStuff(txnID, moreParams...)
...
Commit(txnID)
Consumers can be multithreaded processes that make simultaneous calls to my API, so they might have hundreds of transactions open at any point in time.
If I use Envoy proxy as my service entry point, BeginTransaction should be routed to any healthy node in the cluster, but subsequent calls that use the returned txnID must be routed to the same node.
Passing context info in HTTP headers, or in any other part of the messages, is acceptable in my case.

I made some progress using the Ring Hash load balancer.
In the Envoy proxy server config (look for "hash"):
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: http2
          stat_prefix: ingress_http # just for statistics
          route_config:
            name: local_route
            virtual_hosts:
            - name: samplefront_virtualhost
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/mycompany.sample.v1"
                  grpc: {}
                route:
                  cluster: sampleserver
                  hash_policy:
                    header:
                      header_name: "x-session-hash"
              - match:
                  prefix: "/bbva.sample.admin"
                  grpc: {}
                route:
                  cluster: sampleadmin
          http_filters:
          - name: envoy.router
            config: {}
  clusters:
  - name: sampleserver
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: ring_hash
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: sampleserver
        port_value: 80 # Connect to the sidecar Envoy
  - name: sampleadmin
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: sampleadmin
        port_value: 80 # Connect to the sidecar Envoy
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001
In my consumers, I create a random hash just before BeginTransaction() and make sure it is sent in the x-session-hash header on every call until Commit(txnID).
It works, but it has some limitations:
When I scale up the service by adding more nodes, some operations fail with the error "upstream connect error or disconnect/reset before headers". Failures are acceptable when a node is lost, but they are hardly acceptable when a node is added. The good news is that the load gets rebalanced in both cases.
The client must generate the hash before the first call (BeginTransaction) is made, so it is the client that inadvertently dictates which node will serve the requests for this transaction.
I will keep investigating.
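The rebalancing described above follows from how a hash ring assigns keys to nodes. Below is a minimal Python sketch of a generic consistent-hash ring; it is not Envoy's ring_hash implementation. The `HashRing` class, the `new_session_metadata` helper, the node names, and the choice of MD5 and `uuid4` are all assumptions made for illustration. It shows a client generating one x-session-hash value per transaction, and that adding a node remaps only some of those values, so a transaction that began before the scale-up may afterwards be routed to a different node.

```python
import bisect
import hashlib
import uuid


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


def new_session_metadata():
    """Hypothetical client helper: one random hash per transaction."""
    return [("x-session-hash", uuid.uuid4().hex)]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    sessions = [new_session_metadata()[0][1] for _ in range(1000)]
    before = {s: ring.node_for(s) for s in sessions}

    ring.add_node("node-d")  # scale up by one node
    after = {s: ring.node_for(s) for s in sessions}

    moved = [s for s in sessions if before[s] != after[s]]
    print(f"{len(moved)} of {len(sessions)} sessions changed node")
    # Every session that moved now belongs to the newly added node.
    assert all(after[s] == "node-d" for s in moved)
```

In this model, sessions that do not move keep their node, while the ones that move all land on the new node. Whether that matches the failures seen in practice depends on details of the real deployment that the sketch does not model.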

Inconsistent behaviour while achieving stickiness using Kong Ingress controller

I am using Kong ingress controller on EKS.
High level flow:
NLB → Kong ingress controller and proxy(running in the same pod) → k8s service → backend pods
I am trying to achieve stickiness using the hash_on: cookie configuration on the upstream.
I am using the session and hmac-auth plugins to generate the session cookie.
1st request from the client: When the client first sends a message to the NLB, the NLB sends the traffic to the Kong ingress controller, and from there it goes to one of the backend pods. Since this is the first request, Kong generates a cookie and sends it back to the client in the response.
2nd request from the client: The client now includes the cookie it received in the response to the 1st request. When this request reaches Kong, Kong forwards it to a different pod from the one that served the 1st request.
3rd, 4th ... nth request: Kong forwards each request to the same pod that served the 2nd request.
How can we achieve stickiness for every request?
My expectation was that when Kong first receives a request from a client, it generates a cookie containing some detail specific to the pod it sends the traffic to, and that whenever the same client later sends a request with that cookie, Kong uses it to forward the request to the same pod as the first time. This is not happening: I get stickiness from the 2nd request onward, but not between the 1st and 2nd requests.
Ingress resource used for defining path:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    konghq.com/strip-path: "true"
  name: kong-ingress-bk-srvs
  namespace: default
spec:
  ingressClassName: kong
  rules:
  - http:
      paths:
      - backend:
          service:
            name: httpserver-service-cip
            port:
              number: 8084
        path: /api/v1/serverservice
        pathType: Prefix
      - backend:
          service:
            name: httpserver-service-cip-health
            port:
              number: 8084
        path: /api/v1/healthservice
        pathType: Prefix
Upstream config:
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: stickiness-upstream
upstream:
  hash_on: cookie
  hash_on_cookie: my-test-cookie
  hash_on_cookie_path: /
Session plugin:
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: session-plugin
config:
  cookie_path: /
  cookie_name: my-test-cookie
  storage: cookie
  cookie_secure: false
  cookie_httponly: false
  cookie_samesite: None
plugin: session
HMAC plugin:
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: hmac-plugin
config:
  validate_request_body: true
  enforce_headers:
  - date
  - request-line
  - digest
  algorithms:
  - hmac-sha512
plugin: hmac-auth
Consumer:
apiVersion: configuration.konghq.com/v1
kind: KongConsumer
metadata:
  name: kong-consumer
  annotations:
    kubernetes.io/ingress.class: kong
username: consumer-user-3
custom_id: consumer-id-3
credentials:
- kong-cred
Pod service config (ingress backend service):
apiVersion: v1
kind: Service
metadata:
  annotations:
    konghq.com/override: stickiness-upstream
    konghq.com/plugins: session-plugin,hmac-plugin
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"configuration.konghq.com":"stickiness-upstream"},"labels":{"app":"httpserver"},"name":"httpserver-service-cip","namespace":"default"},"spec":{"ports":[{"name":"comm-port","port":8085,"targetPort":8085},{"name":"dur-port","port":8084,"targetPort":8084}],"selector":{"app":"httpserver"},"sessionAffinity":"ClientIP","sessionAffinityConfig":{"clientIP":{"timeoutSeconds":10000}}}}
  creationTimestamp: "2023-02-04T16:44:00Z"
  labels:
    app: httpserver
  name: httpserver-service-cip
  namespace: default
  resourceVersion: "6729057"
  uid: 481b7d8c-1f07-4293-809c-3b4b7dca41e0
spec:
  clusterIP: 10.101.99.87
  clusterIPs:
  - 10.101.99.87
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: comm-port
    port: 8085
    protocol: TCP
    targetPort: 8085
  - name: dur-port
    port: 8084
    protocol: TCP
    targetPort: 8084
  selector:
    app: httpserver
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000
  type: ClusterIP
status:
  loadBalancer: {}

Adding SASL Authentication in Kafka Banzai

I am looking to add SASL plaintext authentication to Banzai Kafka. I have added the following settings to my read-only config section:
readOnlyConfig: |
  auto.create.topics.enable=false
  cruise.control.metrics.topic.auto.create=true
  cruise.control.metrics.topic.num.partitions=1
  cruise.control.metrics.topic.replication.factor=2
  delete.topic.enable=true
  offsets.topic.replication.factor=2
  group.initial.rebalance.delay.ms=3000
  sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256
  sasl.enabled.mechanisms=SCRAM-SHA-256
  listener.name.external.scram-sha-256.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="user" password="testuser";
I have set the following in the listener config:
listenersConfig:
  externalListeners:
    - type: "sasl_plaintext"
      name: "external"
      externalStartingPort: 51985
      containerPort: 29094
      accessMethod: LoadBalancer
  internalListeners:
    - type: "plaintext"
      name: "internal"
      containerPort: 29092
      usedForInnerBrokerCommunication: true
    - type: "plaintext"
      name: "controller"
      containerPort: 29093
      usedForInnerBrokerCommunication: false
      usedForControllerCommunication: true
When I try to connect a producer or consumer, Kafka returns an "Authentication Authorization failed" error.
I am setting the following client properties:
session.timeout.ms=60000
partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="user" password="testuser";
Can anyone advise on this?

HTTPS redirect not working for default backend of nginx-ingress-controller

I'm having trouble getting an automatic redirect to occur from HTTP -> HTTPS for the default backend of the NGINX ingress controller for kubernetes where the controller is behind an AWS Classic ELB; is it possible?
According to the guide, HSTS is enabled by default:
HTTP Strict Transport Security
HTTP Strict Transport Security (HSTS) is an opt-in security enhancement specified through the use of a special response header. Once a supported browser receives this header that browser will prevent any communications from being sent over HTTP to the specified domain and will instead send all communications over HTTPS.
HSTS is enabled by default.
And redirecting HTTP -> HTTPS is also enabled by default:
Server-side HTTPS enforcement through redirect
By default the controller redirects HTTP clients to the HTTPS port 443 using a 308 Permanent Redirect response if TLS is enabled for that Ingress.
However, when I deploy the controller as configured below and navigate to http://<ELB>.elb.amazonaws.com I am unable to get any response (curl reports Empty reply from server). What I would expect to happen instead is I should see a 308 redirect to https then a 404.
This question is similar: Redirection from http to https not working for custom backend service in Kubernetes Nginx Ingress Controller but they resolved it by deploying a custom backend and specifying on the ingress resource to use TLS. I am trying to avoid deploying a custom backend and just simply want to use the default so this solution is not applicable in my case.
I've shared my deployment files on gist and have copied them here as well:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx-sit
  labels:
    app.kubernetes.io/name: ingress-nginx-sit
    app.kubernetes.io/part-of: ingress-nginx-sit
spec:
  minReadySeconds: 2
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: '50%'
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx-sit
      app.kubernetes.io/part-of: ingress-nginx-sit
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx-sit
        app.kubernetes.io/part-of: ingress-nginx-sit
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --annotations-prefix=nginx.ingress.kubernetes.io
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx
            - --ingress-class=$(POD_NAMESPACE)
            - --election-id=leader
            - --watch-namespace=$(POD_NAMESPACE)
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx-sit
  labels:
    app.kubernetes.io/name: ingress-nginx-sit
    app.kubernetes.io/part-of: ingress-nginx-sit
data:
  hsts: "true"
  ssl-redirect: "true"
  use-proxy-protocol: "false"
  use-forwarded-headers: "true"
  enable-access-log-for-default-backend: "true"
  enable-owasp-modsecurity-crs: "true"
  proxy-real-ip-cidr: "10.0.0.0/24,10.0.1.0/24" # restrict this to the IP addresses of ELB
---
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx-sit
  labels:
    app.kubernetes.io/name: ingress-nginx-sit
    app.kubernetes.io/part-of: ingress-nginx-sit
  annotations:
    # replace with the correct value of the generated certificate in the AWS console
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:<region>:<account>:certificate/<id>"
    # Specify the ssl policy to apply to the ELB
    service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELBSecurityPolicy-TLS-1-2-2017-01"
    # the backend instances are HTTP
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    # Terminate ssl on https port
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "*"
    # Ensure the ELB idle timeout is less than nginx keep-alive timeout. By default,
    # NGINX keep-alive is set to 75s. If using WebSockets, the value will need to be
    # increased to '3600' to avoid any potential issues.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
    # Security group used for the load balancer.
    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-xxxxx"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx-sit
    app.kubernetes.io/part-of: ingress-nginx-sit
  loadBalancerSourceRanges:
    # Restrict allowed source IP ranges
    - "192.168.1.1/16"
  ports:
    - name: http
      port: 80
      targetPort: http
      # The range of valid ports is 30000-32767
      nodePort: 30080
    - name: https
      port: 443
      targetPort: http
      # The range of valid ports is 30000-32767
      nodePort: 30443
I think I found the problem.
For some reason, the default server has force_ssl_redirect set to false when it determines whether to redirect the incoming request to HTTPS.
In the output of cat /etc/nginx/nginx.conf, notice that the rewrite_by_lua_block passes force_ssl_redirect = false:
...
## start server _
server {
    server_name _ ;
    listen 80 default_server reuseport backlog=511;
    set $proxy_upstream_name "-";
    set $pass_access_scheme $scheme;
    set $pass_server_port $server_port;
    set $best_http_host $http_host;
    set $pass_port $pass_server_port;
    listen 443 default_server reuseport backlog=511 ssl http2;
    # PEM sha: 601213c2dd57a30b689e1ccdfaa291bf9cc264c3
    ssl_certificate /etc/ingress-controller/ssl/default-fake-certificate.pem;
    ssl_certificate_key /etc/ingress-controller/ssl/default-fake-certificate.pem;
    ssl_certificate_by_lua_block {
        certificate.call()
    }
    location / {
        set $namespace "";
        set $ingress_name "";
        set $service_name "";
        set $service_port "0";
        set $location_path "/";
        rewrite_by_lua_block {
            lua_ingress.rewrite({
                force_ssl_redirect = false,
                use_port_in_redirects = false,
            })
            balancer.rewrite()
            plugins.run()
        }
...
Then the Lua code requires both force_ssl_redirect and redirect_to_https() to be true before it redirects.
From cat /etc/nginx/lua/lua_ingress.lua:
...
if location_config.force_ssl_redirect and redirect_to_https() then
  local uri = string_format("https://%s%s", redirect_host(), ngx.var.request_uri)
  if location_config.use_port_in_redirects then
    uri = string_format("https://%s:%s%s", redirect_host(), config.listen_ports.https, ngx.var.request_uri)
  end
  ngx_redirect(uri, config.http_redirect_code)
end
...
From what I can tell the force_ssl_redirect setting is only controlled at the Ingress resource level through the annotation nginx.ingress.kubernetes.io/force-ssl-redirect: "true". Because I don't have an ingress rule setup (this is meant to be the default server for requests that don't match any ingress), I have no way of changing this setting.
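The decision in the Lua excerpt can be restated as a small Python sketch, purely to make the two-condition check explicit. The function name `https_redirect_target` and its parameters are hypothetical. In particular, `request_is_plain_http` stands in for the result of `redirect_to_https()`, whose body is not shown above, and `https_port` stands in for `config.listen_ports.https`.

```python
from typing import Optional


def https_redirect_target(
    force_ssl_redirect: bool,
    request_is_plain_http: bool,
    host: str,
    request_uri: str,
    use_port_in_redirects: bool = False,
    https_port: int = 443,
) -> Optional[str]:
    """Return the redirect URL, or None when no redirect is issued."""
    # Both conditions must hold, mirroring
    # `location_config.force_ssl_redirect and redirect_to_https()`.
    if not (force_ssl_redirect and request_is_plain_http):
        return None
    if use_port_in_redirects:
        return f"https://{host}:{https_port}{request_uri}"
    return f"https://{host}{request_uri}"


if __name__ == "__main__":
    # Generated default server: force_ssl_redirect = false, so no redirect.
    print(https_redirect_target(False, True, "example.com", "/foo"))
    # A server block that passes force_ssl_redirect = true.
    print(https_redirect_target(True, True, "example.com", "/foo"))
```

With `force_ssl_redirect` fixed to false for the default server, the first condition never holds, which is consistent with the missing redirect.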
So what I determined I have to do is define my own custom server snippet on a different port that has force_ssl_redirect set to true and then point the Service Load Balancer to that custom server instead of the default. Specifically:
Added to the ConfigMap:
...
  http-snippet: |
    server {
      server_name _ ;
      listen 8080 default_server reuseport backlog=511;
      set $proxy_upstream_name "-";
      set $pass_access_scheme $scheme;
      set $pass_server_port $server_port;
      set $best_http_host $http_host;
      set $pass_port $pass_server_port;
      server_tokens off;
      location / {
        rewrite_by_lua_block {
          lua_ingress.rewrite({
            force_ssl_redirect = true,
            use_port_in_redirects = false,
          })
          balancer.rewrite()
          plugins.run()
        }
      }
      location /healthz {
        access_log off;
        return 200;
      }
    }
  server-snippet: |
    more_set_headers "Strict-Transport-Security: max-age=31536000; includeSubDomains; preload";
Note I also added the server-snippet to enable HSTS correctly. I think because the traffic from the ELB to NGINX is HTTP not HTTPS, the HSTS headers were not being correctly added by default.
Added to the DaemonSet:
...
          ports:
            - name: http
              containerPort: 80
            - name: http-redirect
              containerPort: 8080
...
Modified the Service:
...
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
...
  ports:
    - name: http
      port: 80
      targetPort: http-redirect
      # The range of valid ports is 30000-32767
      nodePort: 30080
    - name: https
      port: 443
      targetPort: http
      # The range of valid ports is 30000-32767
      nodePort: 30443
...
And now things seem to be working. I've updated the Gist so it includes the full configuration that I am using.

How to test RDP port is up using Prometheus Blackbox

I have been struggling to implement an RDP probe that checks multiple ports on Windows machines using Prometheus Blackbox.
So far I have managed to check DNS, ping, and ports 80 and 8080, but I cannot manage to test 3389.
As a rule of thumb, I would like to be able to probe any port that has a service running on these hosts.
My blackbox.yml is:
modules:
  http_2xx:
    prober: http
    http:
  http_get_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  dns_test:
    prober: dns
    timeout: 5s
    dns:
      query_name: google.com
      preferred_ip_protocol: ip4
And my prometheus.yml entry for the port 3389 probe is:
  - job_name: "rdp-dev-status"
    metrics_path: /probe
    params:
      module: [dns_test]
    static_configs:
      - targets:
        - nostradata-dvmh-prodweb-01
    # file_sd_configs:
    #   - files:
    #     - /opt/prometheus/tools/targets/rdp-dev-targets.yml
    relabel_configs:
      # Ensure port is 3389, pass as URL parameter
      - source_labels: [__address__]
        regex: (.*)(:.*)?
        replacement: ${1}:3389
        target_label: __param_target
      # Make instance label the target
      - source_labels: [__param_target]
        target_label: instance
      # Actually talk to the blackbox exporter though
      - target_label: __address__
        replacement: PROD-NIFI:9115
module: [dns_test]
Using a DNS probe is probably not going to work with RDP. Try the tcp_connect module.
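To illustrate what a plain TCP connect check amounts to, and why it suits RDP better than a DNS query, here is a minimal Python sketch. It is not how the Blackbox exporter is implemented. The function name `tcp_port_open` and the 5-second default timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_port_open("nostradata-dvmh-prodweb-01", 3389)` would report whether the RDP port on the host from the question accepts connections. No RDP protocol exchange is attempted, only the connection itself.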

Cannot add SSL certificate on ELB using Ansible

I'm trying to create an Elastic Load Balancer and use an existing SSL certificate to secure it, as follows:
---
- name: Setting up Elastic Load Balancer
  hosts: local
  connection: local
  gather_facts: False
  vars_files:
    - vars/global_vars.yml
  tasks:
    - local_action:
        name: "TestLoadbalancer"
        module: ec2_elb_lb
        state: present
        region: 'us-east-1'
        zones:
          - us-east-1c
        listeners:
          - protocol: http
            load_balancer_port: 80
            instance_port: 80
        listeners:
          - protocol: ssl
            load_balancer_port: 443
            instance_protocol: tcp
            instance_port: 7286
            ssl_certificate_id: "arn:aws:iam::xxxxxx:server-certificate/LB_cert"
    - local_action:
        name: "TestLoadbalancer"
        module: ec2_elb_lb
        state: present
        region: 'us-east-1'
        zones:
          - us-east-1c
        listeners:
          - protocol: http
            load_balancer_port: 80
            instance_port: 80
        health_check:
          ping_protocol: http
          ping_port: 80
          ping_path: "/"
          response_timeout: 5
          interval: 30
          unhealthy_threshold: 2
          healthy_threshold: 10
But it is not adding the SSL - TCP listener.
The other listener, HTTP/80, is added and is visible in the console.
Why is the SSL one missing? Am I missing any required parameters?
You are adding multiple keys with the name "listeners":
listeners:
  - protocol: http
    ...
listeners:
  - protocol: ssl
    ...
But glancing at the example in the documentation, it should be:
listeners:
  - protocol: http
    ...
  - protocol: ssl
    ...
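Repeated keys in a mapping are easy to miss because many parsers accept them silently. The Python sketch below uses JSON as a stand-in, since the standard library has no YAML parser; how a particular YAML loader (including the one Ansible uses) treats duplicates may differ. The helper name `reject_duplicate_keys` and the sample document are assumptions for illustration.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a mapping repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"listeners": [{"protocol": "http"}], "listeners": [{"protocol": "ssl"}]}'

# Python's json module silently keeps only the last value for a repeated key.
print(json.loads(doc))

# With the hook, the duplicate is reported instead of being dropped.
try:
    json.loads(doc, object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)
```

Either way, a mapping can hold only one value per key, so both listeners have to go into a single list under one "listeners" key.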