I'm running a .dotnet 3.1 RestAPI inside a pod in Openshift, and it process every request smoothly - all transactions to the database (outside the Openshift network) are executed properly, all programatic executions are being finalized without errors. However, 1 in 15 requests will always ECONNRESET and fail to return the HTTP request.
Let's say I make a GET to /users/id/3 - I can see this request hitting my restAPI, being processed all the way down the infra layer, fetch data from the DB, wrap the return, and send it back finishing the request, however at this point, no return is received by the frontend, or postman, but I can see on the API logs that the request was finished and returned.
All of this requests take 2.3min to execute, and often finish in a ECONNRESET. I'm at odds at how to troubleshoot this. I have tried curl'ing the resource in another pod and the same behaviour appears.
I think these requests sometimes are getting lost in the cluster network, so I tried playing with the sessionaffinity of the service config but it's not really tied to this, as far as I understood. Do I have a wrong route config, or service config?
Route config
spec:
host: api.com.cloud
to:
kind: Service
name: api
weight: 100
port:
targetPort: 8080-tcp
tls:
termination: edge
wildcardPolicy: None
status:
ingress:
- host: api.com.cloud
routerName: default
conditions:
- type: Admitted
status: 'True'
lastTransitionTime: XXXX
wildcardPolicy: None
routerCanonicalHostname: router-default.apps.com.cloud
Service config
spec:
clusterIP: XXXX
ipFamilies:
- IPv4
ports:
- name: 8080-tcp
protocol: TCP
port: 8080
targetPort: 8080
internalTrafficPolicy: Cluster
clusterIPs:
- XXX
type: ClusterIP
ipFamilyPolicy: SingleStack
sessionAffinity: None
selector:
deploymentconfig: api
status:
loadBalancer: {}
I'm using the Mercure hub 0.13, everything works fine on my development machine, but on my test server the hub keeps on trying to bind on port 80, resulting in a error, as nginx is already running on port 80.
run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: address already in use
I'm starting the hub with the following command:
MERCURE_PUBLISHER_JWT_KEY=$(cat publisher.key.pub) \
MERCURE_PUBLISHER_JWT_ALG=RS256 \
MERCURE_SUBSCRIBER_JWT_KEY=$(cat publisher.key.pub) \
MERCURE_SUBSCRIBER_JWT_ALG=RS256 \
./mercure run -config Caddyfile.dev
Caddyfile.dev is as follows:
# Learn how to configure the Mercure.rocks Hub on https://mercure.rocks/docs/hub/config
{
{$GLOBAL_OPTIONS}
}
{$SERVER_NAME:localhost:3000}
log
route {
redir / /.well-known/mercure/ui/
encode zstd gzip
mercure {
# Transport to use (default to Bolt)
transport_url {$MERCURE_TRANSPORT_URL:bolt://mercure.db}
# Publisher JWT key
publisher_jwt {env.MERCURE_PUBLISHER_JWT_KEY} {env.MERCURE_PUBLISHER_JWT_ALG}
# Subscriber JWT key
subscriber_jwt {env.MERCURE_SUBSCRIBER_JWT_KEY} {env.MERCURE_SUBSCRIBER_JWT_ALG}
# Permissive configuration for the development environment
cors_origins *
publish_origins *
demo
anonymous
subscriptions
# Extra directives
{$MERCURE_EXTRA_DIRECTIVES}
}
respond /healthz 200
respond "Not Found" 404
}
When I provider the SERVER_NAME as an environment variable, without a domain, SERVER_NAME=:3000, the hub actually starts on port 3000, but runs in http mode, which only allows for anonymous subscriptions and is not what I need.
Server:
Operating System: CentOS Stream 8
Kernel: Linux 4.18.0-383.el8.x86_64
Architecture: x86-64
Full output when trying to start the Mercure hub:
2022/05/10 04:50:29.605 INFO using provided configuration {"config_file": "Caddyfile.dev", "config_adapter": ""}
2022/05/10 04:50:29.606 WARN input is not formatted with 'caddy fmt' {"adapter": "caddyfile", "file": "Caddyfile.dev", "line": 3}
2022/05/10 04:50:29.609 INFO admin admin endpoint started {"address": "tcp/localhost:2019", "enforce_origin": false, "origins": ["localhost:2019", "[::1]:2019", "127.0.0.1:2019"]}
2022/05/10 04:50:29.610 INFO http enabling automatic HTTP->HTTPS redirects {"server_name": "srv0"}
2022/05/10 04:50:29.610 INFO tls.cache.maintenance started background certificate maintenance {"cache": "0xc0003d6150"}
2022/05/10 04:50:29.627 INFO tls cleaning storage unit {"description": "FileStorage:/root/.local/share/caddy"}
2022/05/10 04:50:29.628 INFO tls finished cleaning storage units
2022/05/10 04:50:29.642 INFO pki.ca.local root certificate is already trusted by system {"path": "storage:pki/authorities/local/root.crt"}
2022/05/10 04:50:29.643 INFO tls.cache.maintenance stopped background certificate maintenance {"cache": "0xc0003d6150"}
run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: address already in use
I'm a bit late, but I hope that will help someone.
As mentionned here, you can specify the http_port manually in your caddy configuration file.
I have yml configuration file with a router and a service. Every time I get a 404 error. I know the URL works and I can access the server from Traefik server. What am I missing? Also, for some reason the request reroutes to https. Perhaps a conflicting rule?
Also note, Traefik runs in docker, but the connecting server does not. The goal here is to add multiple nodes to the load balancer.
http:
routers:
demo_1-rtr:
rule: "Host(`http://demo.lab.local`)"
service: demo_1
entryPoints:
- http
services:
demo_1:
loadBalancer:
servers:
- url: "http://172.16.9.90:16000"
Traefik Config:
global:
checkNewVersion: true
sendAnonymousUsage: true
api:
insecure: true
providers:
docker:
endpoint: "unix://var/run/docker.sock"
exposedByDefault: false
file:
directory: /rules
watch: true
log:
level: DEBUG
accessLog: {}
entryPoints:
http:
address: ":80"
I suspect it would be this
--api.insecure=true global argument and it should work.
So in your case add the following in traefik.toml
[api]
insecure = true
Otherwise I would need more information to debug more.
I'm following the tutorial to use traefik as the ingress and ingress controller for Azure Kubernetes Service (AKS) cluster. I'm using terraform to deploy the traefik (version 1.7.24) helm chart.
resource "helm_release" "traefik" {
name = "traefik"
namespace = "traefik"
repository = "https://charts.helm.sh/stable"
chart = "traefik"
version = "1.87.2"
values = [<<EOF
loadBalancerIP: "50.100.200.300"
service:
annotations:
service.beta.kubernetes.io/azure-load-balancer-resource-group: "aks-rg"
kubernetes:
ingressClass: traefik
ingressEndpoint:
useDefaultPublishedService: true
dashboard:
enabled: true
domain: traefik.mydomain.tld
ingress:
annotations:
kubernetes.io/ingress.class: traefik
metrics:
serviceMonitor:
enabled: true
rbac:
enabled: true
ssl:
enabled: true
enforced: true
acme:
enabled: true
email: admin#mydomain.tld
staging: true
tlsChallenge: true
entrypoint: https
ports: "443:443"
challengeType: tls-alpn-01
onHostRule: true
domains:
enabled: true
domainsList:
- main: "mydomain.tld"
- sans:
- "traefik.mydomain.tld"
EOF
]
}
The DNS records are correctly pointed to the AKS Load Balancer IP.
When I check the traefik logs, I could see that "tls-alpn-01" challenge fails with the following error:
{"level":"error","msg":"Unable to obtain ACME certificate for domains \"mydomain.tld,traefik.mydomain.tld\" : unable to generate a certificatefor the domains [mydomain.tld traefik.mydomain.tld]: acme: Error -\u003e One or more domains had a problem:\n[mydomain.tld] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem), url: \n[traefik.mydomain.tld] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem), url: \n","time":"2021-02-26T02:32:05Z"}
Full log is given below:
{"level":"info","msg":"Using TOML configuration file /config/traefik.toml","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"No tls.defaultCertificate given for https: using the first item in tls.certificates as a fallback.","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Traefik version v1.7.24 built on 2020-03-25_04:34:11PM","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/v1.7/basics/#collected-data\n","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Preparing server traefik \u0026{Address::8080 TLS:\u003cnil\u003e Redirect:\u003cnil\u003e Auth:\u003cnil\u003e WhitelistSourceRange:[] WhiteList:\u003cnil\u003e Compress:false ProxyProtocol:\u003cnil\u003e ForwardedHeaders:0xc000851700} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Preparing server http \u0026{Address::80 TLS:\u003cnil\u003e Redirect:0xc0000b1b80 Auth:\u003cnil\u003e WhitelistSourceRange:[] WhiteList:\u003cnil\u003e Compress:true ProxyProtocol:\u003cnil\u003e ForwardedHeaders:0xc0008516a0} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting server on :8080","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Preparing server https \u0026{Address::443 TLS:0xc000431e60 Redirect:\u003cnil\u003e Auth:\u003cnil\u003e WhitelistSourceRange:[] WhiteList:\u003cnil\u003e Compress:true ProxyProtocol:\u003cnil\u003e ForwardedHeaders:0xc0008516c0} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting provider configuration.ProviderAggregator {}","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting server on :80","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting server on :443","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting provider *kubernetes.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"traefik\",\"IngressEndpoint\":{\"IP\":\"\",\"Hostname\":\"\",\"PublishedService\":\"traefik/traefik\"},\"ThrottleDuration\":0}","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"ingress label selector is: \"\"","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Creating in-cluster Provider client","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Starting provider *acme.Provider {\"Email\":\"admin#mydomain.tld\",\"ACMELogging\":false,\"CAServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"Storage\":\"/acme/acme.json\",\"EntryPoint\":\"https\",\"KeyType\":\"RSA4096\",\"OnHostRule\":true,\"OnDemand\":false,\"DNSChallenge\":null,\"HTTPChallenge\":null,\"TLSChallenge\":{},\"Domains\":[{\"Main\":\"mydomain.tld\",\"SANs\":[\"traefik.mydomain.tld\"]}],\"Store\":{}}","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Testing certificate renew...","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Server configuration reloaded on :8080","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Server configuration reloaded on :80","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Server configuration reloaded on :443","time":"2021-02-26T02:31:38Z"}
{"level":"info","msg":"Server configuration reloaded on :443","time":"2021-02-26T02:31:39Z"}
{"level":"info","msg":"Server configuration reloaded on :8080","time":"2021-02-26T02:31:39Z"}
{"level":"info","msg":"Server configuration reloaded on :80","time":"2021-02-26T02:31:39Z"}
{"level":"info","msg":"Register...","time":"2021-02-26T02:31:42Z"}
{"level":"info","msg":"Updated status on ingress traefik/traefik-dashboard","time":"2021-02-26T02:31:42Z"}
{"level":"info","msg":"Server configuration reloaded on :443","time":"2021-02-26T02:31:55Z"}
{"level":"info","msg":"Server configuration reloaded on :8080","time":"2021-02-26T02:31:55Z"}
{"level":"info","msg":"Server configuration reloaded on :80","time":"2021-02-26T02:31:55Z"}
{"level":"error","msg":"Unable to obtain ACME certificate for domains \"mydomain.tld,traefik.mydomain.tld\" : unable to generate a certificatefor the domains [mydomain.tld traefik.mydomain.tld]: acme: Error -\u003e One or more domains had a problem:\n[mydomain.tld] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem), url: \n[traefik.mydomain.tld] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem), url: \n","time":"2021-02-26T02:32:05Z"}
I have even added a "AllowAll" rule in AKS LoadBalancer NSG (firewall). But still tls-alpn-01 validation is facing the timeout error. The ssl certificate generation doesn't happen and my website is using the default example.com expired ssl certificate.
I can confirm that telnet to port 443 of mydomain.tld also works fine.
PS: I don't want to use "dns-01" challenge for ssl certificate as the dns provider doesn't have the APIs for let's encrypt. I can't use "http-01" as these are backend servers which do not have any web server.
Any help is much appreciated. I would also like to know how the tls-alpn-01 challenge works.
I'm trying to setup a Let's Encrypt certificate on Google Cloud. I recently changed it from http01 to dns01 challenge type so that I could create Cloud DNS zones and the acme challenge TXT record would automatically be added.
Here's my certificate.yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
name: san-tls
namespace: default
spec:
secretName: san-tls
issuerRef:
name: letsencrypt
commonName: www.evolut.net
altNames:
- portal.evolut.net
dnsNames:
- www.evolut.net
- portal.evolut.net
acme:
config:
- dns01:
provider: clouddns
domains:
- www.evolut.net
- portal.evolut.net
However now I get the following error when I kubectl describe certificate:
Message: DNS names on TLS certificate not up to date: ["portal.evolut.net" "www.evolut.net"]
Reason: DoesNotMatch
Status: False
Type: Ready
More worryingly, when I kubectl describe order I see the following:
Status:
Challenges:
Authz URL: https://acme-v02.api.letsencrypt.org/acme/authz/redacted
Config:
Http 01:
Dns Name: portal.evolut.net
Issuer Ref:
Kind: Issuer
Name: letsencrypt
Key: redacted
Token: redacted
Type: http-01
URL: https://acme-v02.api.letsencrypt.org/acme/challenge/redacted
Wildcard: false
Authz URL: https://acme-v02.api.letsencrypt.org/acme/authz/redacted
Config:
Http 01:
Notice how the Type is always http-01, although in the certificate they are listed under dns01.
This means that the ACME TXT file is never created in Cloud DNS and of course the domains aren't validated.
This seems to be related an issue related to the use of multiple domains. I suggest the use of two different namespaces. You can check an example in the following link:
Failed to list *v1alpha1.Order: orders.certmanager.k8s.io is forbidden