Istio with SDS and Mutual TLS: upstream connect error or disconnect/reset before headers. reset reason: connection failure - ssl

I am trying to set up a cluster with Istio on it, where the SSL traffic gets terminated at the Ingress. I have deployed Istio with SDS and Mutual TLS. With the below yaml, I only get the error message upstream connect error or disconnect/reset before headers. reset reason: connection failure when accessing my cluster in the browser:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: default-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx1
  name: nginx1
spec:
  containers:
  - image: nginx
    name: nginx
    resources: {}
    ports:
    - containerPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx1
  name: nginx1
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nginx1
spec:
  hosts:
  - "*"
  gateways:
  - istio-system/default-gateway
  http:
  - match:
    - uri:
        prefix: /nginx1
    route:
    - destination:
        port:
          number: 80
        host: nginx1.default.svc.cluster.local
The ingressgateway logs show the following TLS error:
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:88] creating a new connection
[2019-07-09 09:07:24.907][29][debug][client] [external/envoy/source/common/http/codec_client.cc:26] [C4759] connecting
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:702] [C4759] connecting to 100.200.1.59:80
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:711] [C4759] connection in progress
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:20] queueing request due to no available connections
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:550] [C4759] connected
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C4759] handshake error: 2
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C4759] handshake error: 1
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C4759] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C4759] closing socket: 0
[2019-07-09 09:07:24.907][29][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C4759] disconnect. resetting 0 pending requests
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C4759] client disconnected, failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:164] [C4759] purge pending, failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2019-07-09 09:07:24.907][29][debug][router] [external/envoy/source/common/router/router.cc:671] [C4753][S3527573287149425977] upstream reset: reset reason connection failure
[2019-07-09 09:07:24.907][29][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C4753][S3527573287149425977] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
Reading through this blog I thought I might need to add
- hosts:
  - '*'
  port:
    name: https
    number: 443
    protocol: HTTPS
  tls:
    mode: SIMPLE
    serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
    privateKey: /etc/istio/ingressgateway-certs/tls.key
to the ingressgateway configuration. However, this did not solve the problem. Additionally, since I am using SDS, there won't be any certificates mounted in ingressgateway-certs (see https://istio.io/docs/tasks/security/auth-sds/#verifying-no-secret-volume-mounted-file-is-generated), as there are in the file-mount approach described in https://istio.io/docs/tasks/traffic-management/ingress/secure-ingress-mount/. A rough sketch of the SDS-style server block I would expect is below.
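For reference, my understanding is that with SDS the server block references a Kubernetes secret by name (credentialName) instead of a mounted certificate path; a rough sketch (the secret name is only a placeholder):
- hosts:
  - '*'
  port:
    name: https
    number: 443
    protocol: HTTPS
  tls:
    mode: SIMPLE
    # credentialName points to a TLS secret created in the istio-system
    # namespace; the name below is a placeholder
    credentialName: my-tls-credential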
Can anyone point me to a correct configuration? Most of what I find online refers to the "old" file-mount approach...

The issue has been resolved by not using istio-cni. See https://github.com/istio/istio/issues/15701

You may have to specify the minimum or maximum TLS version. The options are documented here, under minProtocolVersion and maxProtocolVersion:
https://istio.io/docs/reference/config/networking/v1alpha3/gateway/#Server-TLSOptions
Under the hood, these values map to the following Envoy parameters:
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/auth/cert.proto#auth-tlsparameters
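A rough sketch of where those fields go (the credential name and the version pins are just examples, not tested against your setup):
servers:
- hosts:
  - '*'
  port:
    name: https
    number: 443
    protocol: HTTPS
  tls:
    mode: SIMPLE
    credentialName: my-tls-credential   # placeholder SDS-backed secret
    minProtocolVersion: TLSV1_2         # example values only
    maxProtocolVersion: TLSV1_3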

Related

Strimzi Kafka Zookeeper not starting

I'm trying to deploy Kafka using Strimzi, but ZooKeeper keeps throwing the following exception:
Failed to verify hostname: 10.244.0.14 (org.apache.zookeeper.common.ZKTrustManager) [ListenerHandler-my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc/10.244.1.20:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.244.0.14> doesn't match any of the subject alternative names: [*.my-cluster-zookeeper-client.kafka.svc, my-cluster-zookeeper-client, my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc.cluster.local, my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc, my-cluster-zookeeper-client.kafka, my-cluster-zookeeper-client.kafka.svc, *.my-cluster-zookeeper-nodes.kafka.svc, *.my-cluster-zookeeper-nodes.kafka.svc.cluster.local, *.my-cluster-zookeeper-client.kafka.svc.cluster.local, my-cluster-zookeeper-client.kafka.svc.cluster.local]
Below is the deployment file I'm using:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.1.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.1"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 2
    storage:
      type: ephemeral
This is how I created the Strimzi cluster operator:
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
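For anyone reproducing this, the operator and ZooKeeper pods can be inspected with standard kubectl commands (pod names assume the my-cluster naming from the error above):
# check that the cluster operator and ZooKeeper pods are running
kubectl get pods -n kafka
# inspect the ZooKeeper logs for the hostname-verification errors shown above
kubectl logs my-cluster-zookeeper-0 -n kafka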

NiFi cluster and ZooKeeper cluster

I want to configure a NiFi cluster with an external TLS ZooKeeper cluster (deployed in a Kubernetes cluster). Everything is OK (quorum, ZooKeeper TLS, ...), but when I set the ZooKeeper connection string to … myzk:3181,myzk2:3181 … and NiFi tries to connect to the ZooKeeper cluster, I get this message:
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000002d0000
I think this is because NiFi is talking to ZooKeeper in plaintext (HTTP) while port 3181 expects TLS (HTTPS).
Thanks in advance, regards
NiFi version: 1.12.1
ZooKeeper 3.7.0 (quorum is OK)
#nifi.properties
# Site to Site properties
nifi.remote.input.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
# web properties #
nifi.web.war.directory=./lib
nifi.web.proxy.host=my_proxy.com
nifi.web.http.port=
nifi.web.https.port=9443
nifi.web.http.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
# nifi.web.proxy.context.path=
# security properties #
nifi.sensitive.props.key=
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
nifi.sensitive.props.additional.keys=
nifi.security.keystore=/opt/nifi/nifi-current/config-data/certs/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.keyPasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.truststore=/opt/nifi/nifi-current/config-data/certs/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=wRbjBPa62GLnlWaGMIMg6Ak6n+AyCeUKEquGSwyJt24
nifi.security.needClientAuth=true
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=
# OpenId Connect SSO Properties #
nifi.security.user.oidc.discovery.url=https://my_url_oidc
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
nifi.security.user.oidc.client.id=lkasdnlnsda
nifi.security.user.oidc.client.secret=fdjksalfnslknasfiDHn
nifi.security.user.oidc.preferred.jwsalgorithm=
nifi.security.user.oidc.claim.identifying.user=email
nifi.security.user.oidc.additional.scopes=
# Apache Knox SSO Properties #
nifi.security.user.knox.url=
nifi.security.user.knox.publicKey=
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=
# Identity Mapping Properties #
# These properties allow normalizing user identities such that identities coming from different identity providers
# (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. The following example demonstrates normalizing
# DNs from certificates and principals from Kerberos into a common identity string:
#
# nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
# nifi.security.identity.mapping.value.dn=$1#$2
# nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance#(.*?)$
# nifi.security.identity.mapping.value.kerb=$1#$2
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=nifi-zk:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.client.secure=true
## BY DEFAULT, NIFI CLIENT WILL USE nifi.security.* if you require separate keystore and truststore uncomment below section
nifi.zookeeper.security.keystore=/opt/nifi/nifi-current/config-data/certs/zk/keystore.jks
nifi.zookeeper.security.keystoreType=JKS
nifi.zookeeper.security.keystorePasswd=123456
nifi.zookeeper.security.truststore=/opt/nifi/nifi-current/config-data/certs/zk/truststore.jks
nifi.zookeeper.security.truststoreType=JKS
nifi.zookeeper.security.truststorePasswd=123456
# Zookeeper properties for the authentication scheme used when creating acls on znodes used for cluster management
# Values supported for nifi.zookeeper.auth.type are "default", which will apply world/anyone rights on znodes
# and "sasl" which will give rights to the sasl/kerberos identity used to authenticate the nifi node
# The identity is determined using the value in nifi.kerberos.service.principal and the removeHostFromPrincipal
# and removeRealmFromPrincipal values (which should align with the kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal
# values configured on the zookeeper server).
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=
# kerberos #
nifi.kerberos.krb5.file=
# kerberos service principal #
nifi.kerberos.service.principal=
nifi.kerberos.service.keytab.location=
# kerberos spnego principal #
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
nifi.kerberos.spnego.authentication.expiration=12 hours
# external properties files for variable registry
# supports a comma delimited list of file locations
nifi.variable.registry.properties=
You usually see this when you have an HTTP vs HTTPS mismatch; ideally you would be calling your service over HTTP:
spring:
  cloud:
    gateway:
      discovery:
        locator:
          url-expression: "'lb:http://'+serviceId"
For reference, here is a ZooKeeper setup using client port 2181:
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - zk
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10"
        resources:
          requests:
            memory: "1Gi"
            cpu: "0.5"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
NiFi did not support TLS connections to ZooKeeper until release 1.13.0, so NiFi 1.12.1 cannot be configured to use a TLS-enabled ZooKeeper.
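If you upgrade to 1.13.0 or later, the client-side settings are the ones already present in your posted nifi.properties; a rough sketch (paths and passwords are placeholders, and the keystore/truststore fall back to nifi.security.* when left blank):
nifi.zookeeper.connect.string=myzk:3181,myzk2:3181
nifi.zookeeper.client.secure=true
# dedicated keystore/truststore for the ZooKeeper client connection (placeholders)
nifi.zookeeper.security.keystore=/path/to/zk/keystore.jks
nifi.zookeeper.security.keystoreType=JKS
nifi.zookeeper.security.keystorePasswd=changeit
nifi.zookeeper.security.truststore=/path/to/zk/truststore.jks
nifi.zookeeper.security.truststoreType=JKS
nifi.zookeeper.security.truststorePasswd=changeit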

Argocd Failed to Get Static Asset when Loading UI

New to ArgoCD. I have deployed ArgoCD on my EKS cluster fronted with an AWS ALB Controller.
...
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/listen-port: '[{"HTTPS":443}]'
name: argo-ingress
namespace: argocd
spec:
rules:
- host: argocd.example.com
http:
paths:
- backend:
serviceName: argocd-server
servicePort: 80
path: /
Given that SSL is terminated at the ALB, I deployed the API server with the following parameters:
spec:
  containers:
  - command:
    - argocd-server
    - --insecure
    - --staticassets
    - /shared/app
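One way to apply this is a JSON patch against the deployment; a rough sketch, assuming the deployment is named argocd-server and the server container is the first in the list:
kubectl -n argocd patch deployment argocd-server --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["argocd-server", "--insecure", "--staticassets", "/shared/app"]}]'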
When I port forward ArgoCD on the cluster, I am able to retrieve the objects locally.
HTTP request sent, awaiting response... 200 OK
Length: 2080536 (2.0M) [application/javascript]
Saving to: ‘main.12b930b6a3d660c9da5a.js.2’
100%[===================================================================================================================>] 2,080,536 --.-K/s in 0.03s
2020-10-26 02:14:53 (64.2 MB/s) - ‘main.12b930b6a3d660c9da5a.js.2’ saved [2080536/2080536]
However, when I access the UI in a browser, I get a 200 with a blank UI page, and 400 errors for main.js and the images.
Can anyone help me to troubleshoot this?
I managed to find the issue.
There was a typo in the ingress rules. As a result, all requests were being handled by the last ALB rule, which resulted in a 404. The fix was to include a '*' in the path. See below:
...
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/listen-port: '[{"HTTPS":443}]'
  name: argo-ingress
  namespace: argocd
spec:
  rules:
  - host: argocd.example.com
    http:
      paths:
      - backend:
          serviceName: argocd-server
          servicePort: 80
        path: /*

Let's Encrypt DNS challenge using HTTP

I'm trying to set up a Let's Encrypt certificate on Google Cloud. I recently changed it from the http01 to the dns01 challenge type so that I could create Cloud DNS zones and have the ACME challenge TXT record added automatically.
Here's my certificate.yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: san-tls
  namespace: default
spec:
  secretName: san-tls
  issuerRef:
    name: letsencrypt
  commonName: www.evolut.net
  altNames:
  - portal.evolut.net
  dnsNames:
  - www.evolut.net
  - portal.evolut.net
  acme:
    config:
    - dns01:
        provider: clouddns
      domains:
      - www.evolut.net
      - portal.evolut.net
However, now I get the following error when I kubectl describe certificate:
Message: DNS names on TLS certificate not up to date: ["portal.evolut.net" "www.evolut.net"]
Reason: DoesNotMatch
Status: False
Type: Ready
More worryingly, when I kubectl describe order I see the following:
Status:
  Challenges:
    Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz/redacted
    Config:
      Http 01:
    Dns Name:   portal.evolut.net
    Issuer Ref:
      Kind:  Issuer
      Name:  letsencrypt
    Key:       redacted
    Token:     redacted
    Type:      http-01
    URL:       https://acme-v02.api.letsencrypt.org/acme/challenge/redacted
    Wildcard:  false
    Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz/redacted
    Config:
      Http 01:
Notice how the Type is always http-01, although in the Certificate they are listed under dns01.
This means that the ACME TXT record is never created in Cloud DNS, and of course the domains aren't validated.
This seems to be related to an issue with the use of multiple domains. I suggest using two different namespaces. You can check an example in the following link:
Failed to list *v1alpha1.Order: orders.certmanager.k8s.io is forbidden
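As a side note, the provider: clouddns reference in the Certificate has to match a dns01 provider defined on the letsencrypt Issuer. In the old certmanager.k8s.io/v1alpha1 API that section looks roughly like this (email, project ID and secret names are placeholders):
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: default
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com             # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key      # placeholder secret name
    dns01:
      providers:
      - name: clouddns
        clouddns:
          project: my-gcp-project        # placeholder GCP project ID
          serviceAccountSecretRef:
            name: clouddns-service-account   # placeholder secret
            key: key.json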

Cert-manager certificates not found and challenges not created

I followed https://docs.cert-manager.io/en/venafi/tutorials/quick-start/index.html from start to end, and everything seems to be working except that I'm not getting an external IP for my ingress.
NAME HOSTS ADDRESS PORTS AGE
staging-site-ingress staging.site.io,staging.admin.site.io, 80, 443 1h
Although I'm able to use the nginx ingress controller's external IP and DNS to access the sites. When I go to the URLs I'm redirected to HTTPS, so I assume that's working fine.
It redirects to HTTPS but still says "not secured", so it doesn't get a certificate issued.
When I'm debugging I get the following information:
Ingress:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreateCertificate 54m cert-manager Successfully created Certificate "tls-secret-staging"
Normal UPDATE 35m (x3 over 1h) nginx-ingress-controller Ingress staging/staging-site-ingress
Normal CreateCertificate 23m (x2 over 35m) cert-manager Successfully created Certificate "letsencrypt-staging-tls"
Certificate:
Status:
  Conditions:
    Last Transition Time:  2019-02-27T14:02:29Z
    Message:               Certificate does not exist
    Reason:                NotFound
    Status:                False
    Type:                  Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal OrderCreated 3m (x2 over 14m) cert-manager Created Order resource "letsencrypt-staging-tls-593754378"
Secret:
Name: letsencrypt-staging-tls
Namespace: staging
Labels: certmanager.k8s.io/certificate-name=staging-site-io
Annotations: <none>
Type: kubernetes.io/tls
Data
====
ca.crt: 0 bytes
tls.crt: 0 bytes
tls.key: 1679 bytes
Order:
Status:
  Certificate:   <nil>
  Finalize URL:
  Reason:
  State:
  URL:
Events:          <none>
So it seems something goes wrong in the Order and no challenges are created.
Here are my ingress.yaml and issuer.yaml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: staging-site-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-staging"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - staging.site.io
    - staging.admin.site.io
    - staging.api.site.io
    secretName: letsencrypt-staging-tls
  rules:
  - host: staging.site.io
    http:
      paths:
      - backend:
          serviceName: frontend-service
          servicePort: 80
        path: /
  - host: staging.admin.site.io
    http:
      paths:
      - backend:
          serviceName: frontend-service
          servicePort: 80
        path: /
  - host: staging.api.site.io
    http:
      paths:
      - backend:
          serviceName: gateway-service
          servicePort: 9000
        path: /
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: hello@site.io
    privateKeySecretRef:
      name: letsencrypt-staging-tls
    http01: {}
Does anyone know what I can do to fix this or what went wrong? cert-manager is 100% installed correctly; I'm just not sure about the ingress and what went wrong in the order.
Thanks in advance!
EDIT: I found this in the nginx-ingress-controller logs:
W0227 14:51:02.740081 8 controller.go:1078] Error getting SSL certificate "staging/letsencrypt-staging-tls": local SSL certificate staging/letsencrypt-staging-tls was not found. Using default certificate
It's being spammed, the CPU load is always at 0.003, and the CPU graph is full (the other services show almost nothing).
I stumbled over the same issue once, following exactly the same official tutorial.
As @mikebridge mentioned, the issue is an Issuer/Secret namespace mismatch.
For me, the best fix was to switch from Issuer to ClusterIssuer, which is not scoped to a single namespace.
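A rough sketch of the ClusterIssuer equivalent (same fields, but cluster-scoped, so no namespace; the ingress annotation then becomes certmanager.k8s.io/cluster-issuer):
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: hello@site.io
    privateKeySecretRef:
      # ACME account key; for a ClusterIssuer this secret lives in
      # cert-manager's own namespace, and the name here is a placeholder
      name: letsencrypt-staging-account-key
    http01: {}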
The reason your certificate order is not completing is that the challenge is failing to complete successfully. Review your solver configuration in either your Issuer or ClusterIssuer.
See my answer here for more details: https://stackoverflow.com/a/75454772/4820940