Airflow worker trying to connect to redis despite rabbitmq configuration

I installed and set up Airflow 2.0.1 with Celery, RabbitMQ and PostgreSQL 9.6 on RHEL 7, using the constraints https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.7.txt.
So I am not using a Docker container, and am in fact building a cluster with 3 nodes.
I created a database and user for PostgreSQL, created a user and set permissions for RabbitMQ, and am able to access its web UI on port 15672.
I am able to run the Airflow webserver and scheduler and access the Airflow web UI with no problem.
The issue arises when I try to start an Airflow worker (whether from the master node or the worker nodes). Even though airflow.cfg is set to point to RabbitMQ, I get this error:
ImportError: Missing redis library (pip install redis)
It is trying to access Redis instead of RabbitMQ, but I have no idea why.
I checked airflow.cfg line by line and there is not a single line with redis, so am I missing a configuration or what?
My airflow.cfg configuration:
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pw@10.200.159.59:5432/airflow
broker_url = amqp://rabbitmq_user:rabbitmq_pw@10.200.159.59:5672/airflow_virtual_host
celery_result_backend = db+postgresql://airflow_user:airflow_pw@10.200.159.59:5432/airflow
dags_are_paused_at_creation = True
load_examples = False
Why does my airflow worker try to reach redis when it is configured for rabbitmq?

I found the problem after spending many hours on such a simple, silly issue.
Airflow still tried to connect to Redis, which is the default Airflow config, despite my RabbitMQ configuration in airflow.cfg, because I had written all of the configs under the [core] section, whereas each line must go under its related section in airflow.cfg.
I moved broker_url and result_backend under [celery] and the issue was resolved.
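For reference, a minimal sketch of the same settings once each one sits under its proper section (hosts, credentials and virtual host are just the values from the question):

[core]
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pw@10.200.159.59:5432/airflow
dags_are_paused_at_creation = True
load_examples = False

[celery]
# with broker_url read from [celery], the worker no longer falls back to the default Redis broker
broker_url = amqp://rabbitmq_user:rabbitmq_pw@10.200.159.59:5672/airflow_virtual_host
result_backend = db+postgresql://airflow_user:airflow_pw@10.200.159.59:5432/airflow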

Related

Google cloud kubernetes cluster newbie question

I am a GKE newbie. I created a GKE cluster with a very simple setup. It only has one GPU node and everything else was left at the defaults. After the cluster was up, I was able to list the nodes and ssh into them. But I have two questions here.
I tried to install nvidia driver using the command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
The output was:
kubectl apply --filename https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
daemonset.apps/nvidia-driver-installer configured
But 'nvidia-smi' cannot be found at all. Should I do something else to make it work?
On the worker node, there was no .kube directory and no 'config' file. I had to copy it from the master node to the worker node to make things work. And the config file on the master node updates automatically, so I have to copy it again and again. Did I miss some steps in the creation of the cluster, or how can I resolve this problem?
I would appreciate it if someone could shed some light on this. It drove me crazy after working on it for several days.
Tons of thanks.
Alex.
For the DaemonSet to work, you need to have a label on your worker node named cloud.google.com/gke-accelerator (see this line). The DaemonSet checks for this label on a node before scheduling any pods for installing the driver. I'm guessing the default node pool you created did not have this label on it. You can find more details on this in the GKE docs here.
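As a quick check (a sketch; the cluster name, zone and GPU type below are placeholders), you can list that label across your nodes and, if it is missing, create a node pool that actually has an accelerator attached, which is what makes GKE add the label:

# show the gke-accelerator label as a column next to each node
kubectl get nodes -L cloud.google.com/gke-accelerator

# create a node pool with a GPU attached (placeholder names and GPU type)
gcloud container node-pools create gpu-pool \
    --cluster my-cluster --zone us-central1-a \
    --accelerator type=nvidia-tesla-t4,count=1

Nodes whose gke-accelerator column comes back empty will never be selected by the driver-installer DaemonSet.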
The worker nodes, by design, are just that: worker nodes. They do not need privileged access to the Kubernetes API, so they don't need any kubeconfig files. The communication between worker nodes and the API is strictly controlled through the kubelet binary running on the node. Therefore, you will never find kubeconfig files on a worker node. Also, you should never put them on a worker node, since if a node gets compromised, the keys in that file can be used to damage the API server. Instead, make it a habit to either use the master nodes for kubectl commands or, better yet, keep the kubeconfig on your local machine, keep it safe, and issue commands remotely to your cluster.
After all, all you need is access to an API endpoint for your Kubernetes API server, and it shouldn't matter where you access it from, as long as the endpoint is reachable. So, there is no need whatsoever to have kubeconfig on the worker nodes :)
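For example (a sketch; the cluster name and zone are placeholders), gcloud can write the kubeconfig entry for you on your local machine, so there is nothing to copy between nodes:

# fetch credentials for the cluster into your local kubeconfig
gcloud container clusters get-credentials my-cluster --zone us-central1-a
# kubectl now talks to the cluster from your laptop
kubectl get nodes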

RabbitMQ and EC2: Clusters can't join

I'm trying to create a RabbitMQ cluster.
The instances have been set up identically (they have been installed identically), they can resolve each other's hostnames (both with dig and rabbitmqctl resolve_hostname), and their cookie hash is the same.
I'm wondering whether or not there are more steps to setting up a RabbitMQ cluster when in EC2.
I'm running RabbitMQ 3.9.13 and Ubuntu 20.04
Thank you all in advance
-brej
Basically, that should be sufficient. Make sure to declare all these settings in the RabbitMQ config file; this way, each time a node starts, it will be able to rejoin the cluster when needed.
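For instance, with the classic peer-discovery backend, a sketch of /etc/rabbitmq/rabbitmq.conf on every node could look like this (the hostnames are placeholders for your EC2 instances):

# list every cluster member so a restarting node knows whom to rejoin
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@ip-10-0-0-11
cluster_formation.classic_config.nodes.2 = rabbit@ip-10-0-0-12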

How to check if the config command name is changed in AWS ElastiCache (Redis)

I am trying to access AWS ElastiCache (Redis). I followed these instructions:
https://redsmin.uservoice.com/knowledgebase/articles/734646-amazon-elasticache-and-redsmin
Redis is connected now, but when I click on configuration, I get this error:
"Redsmin can't load the configuration. Check with your provider that you have access to the configuration command."
edit 1:
The CONFIG Redis command is sadly not available on AWS ElastiCache; see their documentation:
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/RestrictedCommands.html
To deliver a managed service experience, ElastiCache restricts access to certain cache engine-specific commands that require advanced privileges. For cache clusters running Redis, the following commands are unavailable:
[...]
config
That's why the Redsmin configuration module (the only module impacted) cannot display your current Redis AWS ElastiCache configuration.
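You can confirm this yourself from redis-cli (a sketch; the endpoint below is a placeholder for your ElastiCache primary endpoint). The call should fail with an error instead of returning the parameter value, confirming the command is restricted:

# attempt a restricted command against the ElastiCache endpoint
redis-cli -h my-cluster.example.cache.amazonaws.com -p 6379 CONFIG GET maxmemory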

Airflow celery with redis - timeout after 6h

I'm having some trouble using Airflow 1.9.0 with CeleryExecutor, using Redis as the broker.
I need to run a job that takes more than 6 hours to complete and I'm losing my celery workers.
Looking into the Airflow code on GitHub, there is a hard-coded configuration:
https://github.com/apache/incubator-airflow/blob/d760d63e1a141a43a4a43daee9abd54cf11c894b/airflow/config_templates/default_celery.py#L31
How could I bypass this problem?
This is configurable in airflow.cfg under the section celery_broker_transport_options.
See the commit adding this possibility https://github.com/apache/incubator-airflow/commit/be79f87f36b6b99649e0a1f6ab92b41640b3beaa
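Concretely, a sketch of the override in airflow.cfg (the value is only an example; pick something longer than your longest-running task, in seconds):

[celery_broker_transport_options]
# the hard-coded default in the linked file is 21600 s (6 h), which matches the cutoff you see
visibility_timeout = 86400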

Celery works without broker and backend running

I'm running Celery on my laptop, with RabbitMQ as the broker and Redis as the backend. I just used all the default settings and ran celery -A tasks worker --loglevel=info, and it all worked. The workers get jobs done and I can fetch the execution results by calling result.get(). My question is why this works even though I didn't run the RabbitMQ and Redis servers at all. I did not set up accounts on the servers either. In many tutorials, the first step is to run the broker and backend servers before starting Celery.
I'm new to these tools and do not quite understand how they work behind the scene. Any input would be greatly appreciated. Thanks in advance.
Never mind. I just realized that Redis and RabbitMQ run automatically after installation or at system startup. They must be running for Celery to work.
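A quick way to verify this (a sketch assuming systemd-managed services; the unit names can differ by platform, e.g. with Homebrew on macOS):

systemctl status rabbitmq-server redis-server
# or probe the services directly
redis-cli ping               # should reply PONG
rabbitmq-diagnostics ping    # should report the node is up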