NIFI CLUSTER AND ZOOKEEPER CLUSTER - ssl

I want to configure a NIFI Cluster with external TLS zookeeper cluster (deployed in a kubernetes cluster). All is ok (quorum, zookeeper tls...) but when I set the zookeeper connection string to … myzk:3181,myzk2:3181… and Nifi tries connect to zookeeper cluster, I get this message :
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000002d0000
I think that is because Nifi is talking with zookeeper HTTP and the 3181 is HTTPS
Thanks in advance, regards
NIFI Version : 1.12.1
Zookeeper 3.7.0 (QUORUM IS OK)
#nifi.properties
# Site to Site properties
nifi.remote.input.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
# web properties #
nifi.web.war.directory=./lib
nifi.web.proxy.host=my_proxy.com
nifi.web.http.port=
nifi.web.https.port=9443
nifi.web.http.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
# nifi.web.proxy.context.path=
# security properties #
nifi.sensitive.props.key=
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
nifi.sensitive.props.additional.keys=
nifi.security.keystore=/opt/nifi/nifi-current/config-data/certs/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.keyPasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.truststore=/opt/nifi/nifi-current/config-data/certs/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=wRbjBPa62GLnlWaGMIMg6Ak6n+AyCeUKEquGSwyJt24
nifi.security.needClientAuth=true
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=
# OpenId Connect SSO Properties #
nifi.security.user.oidc.discovery.url=https://my_url_oidc
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
nifi.security.user.oidc.client.id=lkasdnlnsda
nifi.security.user.oidc.client.secret=fdjksalfnslknasfiDHn
nifi.security.user.oidc.preferred.jwsalgorithm=
nifi.security.user.oidc.claim.identifying.user=email
nifi.security.user.oidc.additional.scopes=
# Apache Knox SSO Properties #
nifi.security.user.knox.url=
nifi.security.user.knox.publicKey=
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=
# Identity Mapping Properties #
# These properties allow normalizing user identities such that identities coming from different identity providers
# (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. The following example demonstrates normalizing
# DNs from certificates and principals from Kerberos into a common identity string:
#
# nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
# nifi.security.identity.mapping.value.dn=$1#$2
# nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance#(.*?)$
# nifi.security.identity.mapping.value.kerb=$1#$2
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=nifi-zk:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.client.secure=true
## BY DEFAULT, NIFI CLIENT WILL USE nifi.security.* if you require separate keystore and truststore uncomment below section
nifi.zookeeper.security.keystore=/opt/nifi/nifi-current/config-data/certs/zk/keystore.jks
nifi.zookeeper.security.keystoreType=JKS
nifi.zookeeper.security.keystorePasswd=123456
nifi.zookeeper.security.truststore=/opt/nifi/nifi-current/config-data/certs/zk/truststore.jks
nifi.zookeeper.security.truststoreType=JKS
nifi.zookeeper.security.truststorePasswd=123456
# Zookeeper properties for the authentication scheme used when creating acls on znodes used for cluster management
# Values supported for nifi.zookeeper.auth.type are "default", which will apply world/anyone rights on znodes
# and "sasl" which will give rights to the sasl/kerberos identity used to authenticate the nifi node
# The identity is determined using the value in nifi.kerberos.service.principal and the removeHostFromPrincipal
# and removeRealmFromPrincipal values (which should align with the kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal
# values configured on the zookeeper server).
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=
# kerberos #
nifi.kerberos.krb5.file=
# kerberos service principal #
nifi.kerberos.service.principal=
nifi.kerberos.service.keytab.location=
# kerberos spnego principal #
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
nifi.kerberos.spnego.authentication.expiration=12 hours
# external properties files for variable registry
# supports a comma delimited list of file locations
nifi.variable.registry.properties=

You usually see this when you have a HTTP vs HTTPS mismatch
ideally you would be calling your service over the HTTP
spring:
cloud:
gateway:
discovery:
locator:
url-expression: "'lb:http://'+serviceId"
For reference using client port 2181
apiVersion: v1
kind: Service
metadata:
name: zk-hs
labels:
app: zk
spec:
ports:
- port: 2888
name: server
- port: 3888
name: leader-election
clusterIP: None
selector:
app: zk
---
apiVersion: v1
kind: Service
metadata:
name: zk-cs
labels:
app: zk
spec:
ports:
- port: 2181
name: client
selector:
app: zk
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: zk-pdb
spec:
selector:
matchLabels:
app: zk
maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: zk
spec:
selector:
matchLabels:
app: zk
serviceName: zk-hs
replicas: 3
updateStrategy:
type: RollingUpdate
podManagementPolicy: OrderedReady
template:
metadata:
labels:
app: zk
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10"
resources:
requests:
memory: "1Gi"
cpu: "0.5"
ports:
- containerPort: 2181
name: client
- containerPort: 2888
name: server
- containerPort: 3888
name: leader-election
command:
- sh
- -c
- "start-zookeeper \
--servers=3 \
--data_dir=/var/lib/zookeeper/data \
--data_log_dir=/var/lib/zookeeper/data/log \
--conf_dir=/opt/zookeeper/conf \
--client_port=2181 \
--election_port=3888 \
--server_port=2888 \
--tick_time=2000 \
--init_limit=10 \
--sync_limit=5 \
--heap=512M \
--max_client_cnxns=60 \
--snap_retain_count=3 \
--purge_interval=12 \
--max_session_timeout=40000 \
--min_session_timeout=4000 \
--log_level=INFO"
readinessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
livenessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
volumeMounts:
- name: datadir
mountPath: /var/lib/zookeeper
securityContext:
runAsUser: 1000
fsGroup: 1000
volumeClaimTemplates:
- metadata:
name: datadir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi

NiFi did not support TLS with ZooKeeper until release 1.13.0. If you're using NiFi 1.12.1 it will not support configuring TLS for ZooKeeper.

Related

Strimzi Kafka Zookeeper not starting

i'm trying to deploy kafka using strimzi, but zookeeper keep throwing following exception
Failed to verify hostname: 10.244.0.14 (org.apache.zookeeper.common.ZKTrustManager) [ListenerHandler-my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc/10.244.1.20:3888]
javax.net.ssl.SSLPeerUnverifiedException:
Certificate for <10.244.0.14> doesn't match any of the subject alternative names: [*.my-
cluster-zookeeper-client.kafka.svc,
my-cluster-zookeeper-client, my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc.cluster.local,
my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc, my-cluster-zookeeper-client.kafka, my-cluster-zookeeper-client.kafka.svc,
*.my-cluster-zookeeper-nodes.kafka.svc,
*.my-cluster-zookeeper-nodes.kafka.svc.cluster.local, *.my-cluster-zookeeper-client.kafka.svc.cluster.local, my-cluster-zookeeper-client.kafka.svc.cluster.local]
below is the deployment file i'm using
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: my-cluster
spec:
kafka:
version: 3.1.0
replicas: 2
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: external
port: 9094
type: loadbalancer
tls: false
config:
offsets.topic.replication.factor: 2
transaction.state.log.replication.factor: 2
transaction.state.log.min.isr: 2
default.replication.factor: 2
min.insync.replicas: 2
inter.broker.protocol.version: "3.1"
storage:
type: ephemeral
zookeeper:
replicas: 2
storage:
type: ephemeral
this is how i created strimzi cluster operator
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

Registered Targets Disappear

I have a working EKS cluster. It is using a ALB for ingress.
When I apply a service and then an ingress most of these work as expected. However some target groups eventually have no registered targets. If I get the service IP address kubectl describe svc my-service-name and manually register the EndPoints in the target group the pods are reachable again but that's not a sustainable process.
Any ideas on what might be happening? Why doesn't EKS find the target groups as pods cycle?
Each service (secrets, deployment, service and ingress consists of a set of .yaml files applied like:
deploy.sh
#!/bin/bash
set -e
kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml
service.yaml
apiVersion: v1
kind: Service
metadata:
name: "site-bob"
namespace: "next-sites"
spec:
ports:
- port: 80
targetPort: 3000
protocol: TCP
type: NodePort
selector:
app: "site-bob"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: "site-bob"
namespace: "next-sites"
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/ip-address-type: ipv4
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
alb.ingress.kubernetes.io/group.name: eks-ingress-1
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
alb.ingress.kubernetes.io/healthcheck-port: traffic-port
alb.ingress.kubernetes.io/healthcheck-path: /
alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
alb.ingress.kubernetes.io/success-codes: 200,201
alb.ingress.kubernetes.io/healthy-threshold-count: '2'
alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
alb.ingress.kubernetes.io/actions.ssl-redirect: >
{
"type": "redirect",
"redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}
}
alb.ingress.kubernetes.io/actions.svc-host: >
{
"type":"forward",
"forwardConfig":{
"targetGroups":[
{
"serviceName":"site-bob",
"servicePort": 80,"weight":20}
],
"targetGroupStickinessConfig":{"enabled":true,"durationSeconds":200}
}
}
labels:
app: site-bob
spec:
rules:
- host: "staging-bob.imgeinc.net"
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ssl-redirect
port:
name: use-annotation
- backend:
service:
name: svc-host
port:
name: use-annotation
pathType: ImplementationSpecific
Something in my configuration added tagged two security groups as being owned by the cluster. When I checked the load balancer controller logs:
kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb
I saw many lines like:
{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.
sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
I removed the tag:
{
Key: 'kubernetes.io/cluster/_cluster name_',
value:'owned'
}
from sg-04b2754f1c85ac8b9
and the TargetGroups started to fill in and everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.
facing the same issue when creating the cluster with terraform. Solved updating aws load balancer controller from 2.3 to 2.4.4

Assign roles to EKS cluster in manifest file?

I'm new to Kubernetes, and am playing with eksctl to create an EKS cluster in AWS. Here's my simple manifest file
kind: ClusterConfig
apiVersion: eksctl.io/v1alpha5
metadata:
name: sandbox
region: us-east-1
version: "1.18"
managedNodeGroups:
- name: ng-sandbox
instanceType: r5a.xlarge
privateNetworking: true
desiredCapacity: 2
minSize: 1
maxSize: 4
ssh:
allow: true
publicKeyName: my-ssh-key
fargateProfiles:
- name: fp-default
selectors:
# All workloads in the "default" Kubernetes namespace will be
# scheduled onto Fargate:
- namespace: default
# All workloads in the "kube-system" Kubernetes namespace will be
# scheduled onto Fargate:
- namespace: kube-system
- name: fp-sandbox
selectors:
# All workloads in the "sandbox" Kubernetes namespace matching the
# following label selectors will be scheduled onto Fargate:
- namespace: sandbox
labels:
env: sandbox
checks: passed
I created 2 roles, EKSClusterRole for cluster management, and EKSWorkerRole for the worker nodes? Where do I use them in the file? I'm looking at eksctl Config file schema page and it's not clear to me where in manifest file to use them.
As you mentioned, it's in the managedNodeGroups docs
managedNodeGroups:
- ...
iam:
instanceRoleARN: my-role-arn
# or
# instanceRoleName: my-role-name
You should also read about
Creating a cluster with Fargate support using a config file
AWS Fargate

Asp.net Core get "Back-off restarting failed container" on AKS

I am trying to deploy my very first and simple ASP.net Core Web Api on the AKS (ref to this article)
Here is my yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
name: aexp
labels:
app: aexp
spec:
replicas: 1
selector:
matchLabels:
service: aexp
template:
metadata:
labels:
app: aexp
service: aexp
spec:
containers:
- name: aexp
image: f2021.azurecr.io/aexp:v1
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
env:
- name: ASPNETCORE_URLS
value: http://+:80
---
apiVersion: v1
kind: Service
metadata:
name: aexp
labels:
app: aexp
service: aexp
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
selector:
service: aexp
It looks simple and straightforward, but I couldn't figure out why my pod gets Back-off restarting failed container. Any advice a clue to prevent the error? thanks in advance.
Name: aexp-5b5b7b6464-5lfz4
Namespace: default
Priority: 0
Node: aks-nodepool1-38572550-vmss000000/10.240.0.4
Start Time: Wed, 20 Jan 2021 10:01:52 +0700
Labels: app=aexp
pod-template-hash=5b5b7b6464
service=aexp
Annotations: <none>
Status: Running
IP: 10.244.0.14
IPs:
IP: 10.244.0.14
Controlled By: ReplicaSet/aexp-5b5b7b6464
Containers:
aexp:
Container ID: docker://25ffdb3ce92eeda465e1971daa363d6f532ac73ff82df2e9b3694a8949f50615
Image: f2021.azurecr.io/aexp:v1
Image ID: docker-pullable://f2021.azurecr.io/aexp#sha256:bf6aa2a47f5f857878280f5987192f1892e91e365b9e66df83538109b9e57c46
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 20 Jan 2021 10:33:47 +0700
Finished: Wed, 20 Jan 2021 10:33:47 +0700
Ready: False
Restart Count: 11
Environment:
ASPNETCORE_URLS: http://+:80
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-g4ks9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-g4ks9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-g4ks9
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36m default-scheduler Successfully assigned default/aexp-5b5b7b6464-5lfz4 to aks-nodepool1-38572550-vmss000000
Normal Pulled 35m (x4 over 36m) kubelet Successfully pulled image "f2021.azurecr.io/aexp:v1"
Normal Created 35m (x4 over 36m) kubelet Created container aexp
Normal Started 35m (x4 over 36m) kubelet Started container aexp
Normal Pulling 34m (x5 over 36m) kubelet Pulling image "f2021.azurecr.io/aexp:v1"
Warning BackOff 62s (x166 over 36m) kubelet Back-off restarting failed container
And here is my az snippet to create AKS cluster
az aks create \
--location $REGION \
--resource-group $AKS_RG \
--name $AKS_NAME \
--ssh-key-value ./.ssh/id_rsa.pub \
--service-principal "xxxxxxxx-b8d1-4206-8a8a-xxxxx66c086c" \
--client-secret "xxxx.xxxxeNzq25iJeuRjWTh~xxxxxUGxu" \
--network-plugin kubenet \
--load-balancer-sku basic \
--outbound-type loadBalancer \
--node-vm-size Standard_B2s \
--node-count 1 \
--tags 'ENV=DEV' 'SRV=EXAMPLE' \
--generate-ssh-keys
Update 1:
I try with VS2019, start Debug using “Bridge to Kubernetes”, then it works, the same docker image, same deployment and same service.
Update 2: add docker file
#See https://aka.ms/containerfastmode to understand how Visual Studio uses this Dockerfile to build your images for faster debugging.
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443
FROM mcr.microsoft.com/dotnet/core/sdk:3.1-buster AS build
WORKDIR /src
COPY ["Aexp/Aexp.csproj", "Aexp/"]
RUN dotnet restore "Aexp/Aexp.csproj"
COPY . .
WORKDIR "/src/Aexp"
RUN dotnet build "Aexp.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "Aexp.csproj" -c Release -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "Aexp.dll"]
Update 3 [Jan 27] : I figured out the issue doesn't relate to my code or my yaml, at all. I have 02 azure subscriptions, one got the issue, one is working just fine with the same code, same deployment.yaml and configuration.
There can be several reasons for the pod to be crashing. The best way forward is to check the logs of your pod to see if crash comes from your application.
kubectl logs aexp-5b5b7b6464-5lfz4 --previous
Where --previous makes sure you can access the logs from the crashed pod.
If the log is empty you want to check your Dockerfile. It seems that the container does not have any long running process because it completed with a 'success' exit code:
Last State: Terminated
Reason: Completed
Exit Code: 0

Istio with SDS and Mutual TLS: upstream connect error or disconnect/reset before headers. reset reason: connection failure

I am trying to set up a cluster with Istio on it, where the SSL traffic gets terminated at the Ingress. I have deployed Istio with SDS and Mutual TLS. With the below yaml, I only get the error message upstream connect error or disconnect/reset before headers. reset reason: connection failure when accessing my cluster in the browser:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: default-gateway
namespace: istio-system
spec:
selector:
istio: ingressgateway
servers:
- hosts:
- '*'
port:
name: http
number: 80
protocol: HTTP
---
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx1
name: nginx1
spec:
containers:
- image: nginx
name: nginx
resources: {}
ports:
- containerPort: 80
dnsPolicy: ClusterFirst
restartPolicy: Never
status: {}
---
apiVersion: v1
kind: Service
metadata:
labels:
run: nginx1
name: nginx1
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
run: nginx1
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: nginx1
spec:
hosts:
- "*"
gateways:
- istio-system/default-gateway
http:
- match:
- uri:
prefix: /nginx1
route:
- destination:
port:
number: 80
host: nginx1.default.svc.cluster.local
The ingressgateway logs show the following TLS error:
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:88] creating a new connection
[2019-07-09 09:07:24.907][29][debug][client] [external/envoy/source/common/http/codec_client.cc:26] [C4759] connecting
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:702] [C4759] connecting to 100.200.1.59:80
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:711] [C4759] connection in progress
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:20] queueing request due to no available connections
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:550] [C4759] connected
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C4759] handshake error: 2
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C4759] handshake error: 1
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C4759] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C4759] closing socket: 0
[2019-07-09 09:07:24.907][29][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C4759] disconnect. resetting 0 pending requests
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C4759] client disconnected, failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:164] [C4759] purge pending, failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][router] [external/envoy/source/common/router/router.cc:671] [C4753][S3527573287149425977] upstream reset: reset reason connection failure
[2019-07-09 09:07:24.907][29][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C4753][S3527573287149425977] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
Reading though this blog I thought I might need to add
- hosts:
- '*'
port:
name: https
number: 443
protocol: HTTPS
tls:
mode: SIMPLE
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
privateKey: /etc/istio/ingressgateway-certs/tls.key
to the ingressgateway configuration. However, this did not solve the problem. Additionally, since I am using SDS, there won't be any certificates in ingressgateway-certs (see https://istio.io/docs/tasks/security/auth-sds/#verifying-no-secret-volume-mounted-file-is-generated) as it is described in https://istio.io/docs/tasks/traffic-management/ingress/secure-ingress-mount/
Can anyone point me to a correct configuration? Most of what I find online is referring to the "old" filemount approach...
The issue has been resolved by not using istio-cni. See https://github.com/istio/istio/issues/15701
You may have to specify the minimum or maximum TLS version. The options are documented here, under minProtocolVersion and maxProtocolVersion:
https://istio.io/docs/reference/config/networking/v1alpha3/gateway/#Server-TLSOptions
Under the hood, these values map to the following Envoy parameters:
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/auth/cert.proto#auth-tlsparameters