Connecting Celery to Redis Sentinel

How come celery cannot find my sentinel service?
I have:
app.conf.broker_url = "sentinel://192.168.29.11:26379"
app.conf.broker_transport_options = {"master_name": "mymaster"}
And what I am getting is:
{"message": "consumer: Cannot connect to sentinel://192.168.29.11:26379: No master found for None.
Trying again in 2.00 seconds... (1/100)",
"level": "ERROR",
"logger": "celery.worker.consumer.consumer"}
Why is there "No master found for None" when I am specifying the master_name?

It was my mistake: I had another app.conf.broker_transport_options defined elsewhere, and it overwrote the setting.
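For reference, a minimal working configuration kept in one place might look like this (a sketch using the host and master name from the question; the app name is arbitrary):

from celery import Celery

app = Celery("tasks")  # "tasks" is a placeholder app name

# Point the broker at the Sentinel endpoint; several sentinels can be
# listed as "sentinel://host1:26379;sentinel://host2:26379".
app.conf.broker_url = "sentinel://192.168.29.11:26379"

# Assign broker_transport_options exactly once: a later assignment
# replaces the whole dict and drops master_name, which is what produces
# "No master found for None".
app.conf.broker_transport_options = {"master_name": "mymaster"}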

Related

neutron-linuxbridge-agent oslo_service.service amqp.exceptions.InternalError: Connection.open: (541) INTERNAL_ERROR

The log of the neutron-linuxbridge-agent component of OpenStack (Train release) shows this error:
2022-03-17 14:38:36.727 6 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.6/site-packages/amqp/connection.py", line 648, in _on_close
2022-03-17 14:38:36.727 6 ERROR oslo_service.service (class_id, method_id), ConnectionError)
2022-03-17 14:38:36.727 6 ERROR oslo_service.service amqp.exceptions.InternalError: Connection.open: (541) INTERNAL_ERROR - access to vhost '/' refused for user 'openstack': vhost '/' is down
2022-03-17 14:38:36.727 6 ERROR oslo_service.service
2022-03-17 14:38:36.729 6 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.
docker logs neutron_linuxbridge_agent shows:
++ /usr/bin/update-alternatives --query iptables
update-alternatives: error: no alternatives for iptables
++ . /usr/local/bin/kolla_neutron_extend_start
+ echo 'Running command: '\''neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'\'''
+ exec neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini
Running command: 'neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'
openstack network agent list shows every agent's State as UP, but Alive as XXX.
What's the problem with my cluster, and how can I fix it? Thanks a lot.
The key service is RabbitMQ, as the reference to amqp.exceptions.InternalError suggests, and rabbit@node-3.log shows:
2022-03-18 06:50:35.270 [error] <0.21119.0> Error on AMQP connection <0.21119.0> (1.1.1.2:12345 -> 1.1.1.3:55672 - neutron-linuxbridge-agent:7:11111111-1111-1111-1111-111111111111, vhost: 'none', user: 'openstack', state: opening), channel 0:
{handshake_error,opening,
{amqp_error,internal_error,
"access to vhost '/' refused for user 'openstack': vhost '/' is down",
'connection.open'}}
When I check and log in to the RabbitMQ management site (http://1.1.1.3:15672/), I get this error tip:
rabbitmq virtual host experienced an error on node and may be inaccessible
Solve it by:
1. Going into the rabbitmq container and removing (or moving out) the recovery.dets file in the directory /var/lib/rabbitmq/mnesia/rabbit@node-3/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L.
2. Restarting the rabbitmq container.
Because of:
In RabbitMQ versions starting with 3.7.0 all messages data is combined in the msg_stores/vhosts directory and stored in a subdirectory per vhost. Each vhost directory is named with a hash and contains a .vhost file with the vhost name, so a specific vhost's message set can be backed up separately.
In RabbitMQ versions prior to 3.7.0 messages are stored in several directories under the node data directory: queues, msg_store_persistent and msg_store_transient. Also there is a recovery.dets file which contains recovery metadata if the node was stopped gracefully.
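Before removing anything, it can help to confirm which hashed directory belongs to which vhost by reading the .vhost marker files described above (a small sketch; the node directory name is taken from this example and may differ on your system):

import pathlib

# Map each hashed vhost directory (RabbitMQ >= 3.7.0 layout) to the vhost
# name stored in its .vhost marker file.
vhosts = pathlib.Path("/var/lib/rabbitmq/mnesia/rabbit@node-3/msg_stores/vhosts")
for directory in sorted(vhosts.iterdir()):
    marker = directory / ".vhost"
    if marker.is_file():
        print(f"{directory.name} -> {marker.read_text().strip()}")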
My whole cluster was rebooted by accident, and it was recovered by this method.
If you want to fix your problem easily, deploy your RabbitMQ again with kolla-ansible:
kolla-ansible -i <INVENTORY> deploy -t rabbitmq -vvvv
In my experience, the easiest and lowest-cost way to fix a RabbitMQ or oslo problem in OpenStack is to redeploy RabbitMQ rather than sink time into debugging it.

Unable to start second rabbitmq node on single Windows host

I am trying to run two RabbitMQ nodes on a single Windows host. My end goal is to run two RabbitMQ services.
Currently, I have the following commands for the second node in rabbitmq-env-conf.bat:
set RABBITMQ_CONFIG_FILE=%APPDATA%\RabbitMQ\rabbitmq.conf
set RABBITMQ_NODENAME=rabbit2@hostname
set RABBITMQ_DIST_PORT=5673
set RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]"
Running .\rabbitmq-server.bat start produces the following error:
. . .
Starting broker...Logger - error: {removed_failing_handler,rabbit_log}
BOOT FAILED
===========
Error during startup: {error,
{rabbitmq_management,
{bad_return,
{{rabbit_mgmt_app,start,[normal,[]]},
{'EXIT',
{{could_not_start_listener,
[{cowboy_opts,[{sendfile,false}]},{port,15672}],
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,
{acceptor,{0,0,0,0,0,0,0,0},15672},
eaddrinuse}}}},
. . .
From the log:
Application rabbitmq_management exited with reason: {{could_not_start_listener,[{cowboy_opts,[{sendfile,false}]},{port,15672}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,{acceptor,{0,0,0,0,0,0,0,0},15672},eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_management_tcp,[{cowboy_opts,[{sendfile,false}]},{port,15672}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[],[],rabbit_mgmt_wm_static,{priv_file,rabbitmq_management,"www/index.html"}},{[<<"api">>,<<"overview">>],[],rabbit_mgmt_wm_overview,...},...]}],...},...]}}
It looks like I am unable to set up the RabbitMQ management port successfully despite supplying start args.
15672 is the first node's management port number, and I am not sure why that number is still being picked up.
Some troubleshooting pointers would be appreciated.
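For what it's worth, the eaddrinuse in the boot error means something is already listening on 15672, i.e. the management plugin ignored the new port and tried the default one. A quick way to confirm which ports are taken (an illustrative check, not part of the original post):

import socket

# Try to bind a port; a failure with "address already in use" means another
# listener (here the first node's management UI on 15672) already owns it.
def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

print(port_is_free(15672))  # expected: False while node 1 is running
print(port_is_free(15673))  # expected: True before node 2 starts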

Error while running query on Impala with Superset

I'm trying to connect Impala to Superset. When I test the connection, it prints "Seems OK!", and when I browse the databases in the SQL Editor's left-side panel, it shows all databases without problems.
Preview of Databases/Tables
But when I write a query and click "Run Query", it gives the error: "Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Ticket expired)'"
Error running query
I'm running Superset with SSL in production mode (with Gunicorn), and Impala with SSL in a Kerberized Hadoop cluster. My Impala database config is:
Impala Config
And in the extras I put:
{
  "metadata_params": {},
  "engine_params": {
    "connect_args": {
      "port": 21050,
      "use_ssl": "True",
      "ca_cert": "path/to/my/ca_cert.pem",
      "auth_mechanism": "GSSAPI"
    }
  },
  "metadata_cache_timeout": {},
  "schemas_allowed_for_csv_upload": []
}
How can I solve this error? In my Superset log it only shows:
Triggering query_id: 65
INFO:superset.views.core:Triggering query_id: 65
Query 65: Running query on a Celery worker
INFO:superset.views.core:Query 65: Running query on a Celery worker
Versions: Superset 0.36.0, Impyla 0.16.2
I was able to fix this error with these steps:
1 - Created a service user for the celery worker, created a Kerberos ticket for it, and added a crontab entry to renew the ticket.
2 - Ran the celery worker as this service user instead of as root.
3 - Killed a celery worker that was running on another machine in my cluster.
4 - Restarted Impala and Superset.
I think this error occurred because some queries, instead of using the celery worker on my Superset machine, were routed to the celery worker on another machine that had no valid Kerberos ticket. I tracked this down because the celery worker log showed a failed connection to the worker on the other machine while a query was running.
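A quick way to check whether a given worker's environment holds a usable ticket is to open the same connection directly with Impyla from that machine (a sketch reusing the connect_args from the question; the host name is a placeholder):

from impala.dbapi import connect

# Run this as the celery worker's service user so it picks up that user's
# Kerberos ticket cache. "impala-host.example.com" is hypothetical; use
# your Impala daemon's host.
conn = connect(
    host="impala-host.example.com",
    port=21050,
    use_ssl=True,
    ca_cert="path/to/my/ca_cert.pem",
    auth_mechanism="GSSAPI",
)
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())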

How to get detailed log/info about rabbitmq connection action?

I have a Python program connecting to a RabbitMQ server. When the program starts, it connects fine. But when the RabbitMQ server restarts, my program cannot reconnect to it, and the only error is "Socket closed" (produced by kombu), which tells me nothing.
I want detailed info about the connection failure. On the server side, there is nothing useful in the RabbitMQ log file either; it just says "connection failed" with no reason given.
I tried the trace plugin (https://www.rabbitmq.com/firehose.html), and found that no trace info was published to the amq.rabbitmq.trace exchange when the connection failure happened. I enabled the plugin with:
rabbitmq-plugins enable rabbitmq_tracing
systemctl restart rabbitmq-server
rabbitmqctl trace_on
and then I wrote a client to get messages from the amq.rabbitmq.trace exchange:
#!/bin/env python
from kombu.connection import BrokerConnection
from kombu.messaging import Queue, Consumer

def on_message(body, message):
    print("RECEIVED MESSAGE: %r" % (body, ))
    message.ack()

def main():
    conn = BrokerConnection('amqp://admin:pass@localhost:5672//')
    channel = conn.channel()
    queue = Queue('debug', channel=channel, durable=False)
    queue.declare()
    queue.bind_to(exchange='amq.rabbitmq.trace',
                  routing_key='publish.amq.rabbitmq.trace')
    consumer = Consumer(channel, queue)
    consumer.register_callback(on_message)
    consumer.consume()
    while True:
        conn.drain_events()

if __name__ == '__main__':
    main()
I also tried to get some debug logs from the RabbitMQ server. I reconfigured rabbitmq.config according to https://www.rabbitmq.com/configure.html, and set
log_levels to
{log_levels, [{connection, info}]}
but as a result the rabbitmq server failed to start. It seems the official doc does not apply to my version; my RabbitMQ server version is 3.3.5. However,
{log_levels, [connection,debug,info,error]}
or
{log_levels, [connection,debug]}
works, but with this there is no DEBUG info showing in the logs, and I don't know whether the log_levels configuration is not taking effect or there is simply never any DEBUG output printed.
I know that this answer comes massively late, but for future readers, this worked for me:
[
  {rabbit, [
    {log_levels, [{connection, debug}, {channel, debug}]}
  ]}
].
Basically, you just need to wrap the parameters you want to set in whichever module/plugin they belong to.
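On the client side, kombu itself can also surface the real exception instead of the bare "Socket closed", for example by passing an errback to ensure_connection (a sketch against kombu's connection API; the broker URL is the one from the question):

import logging

from kombu import Connection

logging.basicConfig(level=logging.DEBUG)

def on_connection_error(exc, interval):
    # Called on each failed attempt with the underlying exception, which is
    # far more informative than the "Socket closed" seen at the top level.
    logging.warning("Broker connection failed: %r; retrying in %ss", exc, interval)

conn = Connection('amqp://admin:pass@localhost:5672//')
conn.ensure_connection(errback=on_connection_error, max_retries=None)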

RavenDB Replication Issue - database cannot be found

I have followed the documentation, but am unable to make replication work for RavenDB over the WAN.
Scenario:
Using Raven build #2261
Master DB: has a local name of "it23"
Slave DB: has a remote name of "http://184.169.xxx.xxx" (xxx's are for privacy)
On both servers I have created a database called "TonyTest".
On the Master db, I have set up replication using the following document:
{
  "Destinations": [
    {
      "Url": "http://184.169.xxx.xxx:8080",
      "Username": null,
      "Password": null,
      "Domain": null,
      "ApiKey": null,
      "Database": "TonyTest",
      "TransitiveReplicationBehavior": "None",
      "IgnoredClient": false,
      "Disabled": false,
      "ClientVisibleUrl": null
    }
  ]
}
When browsing to the remote server using the same URL of http://184.169.xxx.xxx:8080, the RavenDB studio launches correctly, and I can see the TonyTest database. This seems to confirm that the URL is formatted correctly.
However, the master database immediately generates a document showing failures:
{
  "Destination": "http://184.169.xxx.xxx:8080/databases/TonyTest",
  "FailureCount": 142
}
When we look at the logs for the REMOTE db, we see that there IS communication with the master, but the replication doesn't complete.
Debug 3/9/2013 12:19:44 AM Document with key 'Raven/Replication/Sources/http://it23:8080/databases/TonyTest' was not found Raven.Storage.Esent.StorageActions.DocumentStorageActions
It looks like the remote server is saying that the db "TonyTest" can't be found, but it IS created.
Can anyone spot my mistake?
Per Ayende's request, here are some log samples from the LOCAL server after attempting to set up replication (again, I replaced IPs with xxx for privacy). We do not see any errors in the LOCAL db's log, and we do see errors pop up in the REMOTE db's log. This seems to imply that the LOCAL db is connecting to the REMOTE db, but the replication does not happen. Here are the LOCAL logs:
Debug 3/11/2013 3:17:00 PM No work was found, workerWorkCounter: 17626, for: ReducingExecuter, will wait for additional work Raven.Database.Indexing.WorkContext
Debug 3/11/2013 3:17:00 PM Going to index 1 documents in IndexName: Raven/DocumentsByEntityName, LastIndexedEtag: 00000001-0000-0100-0000-000000002265: (Raven/Replication/Destinations/184.169.xxx.xxx8080databasesTonyTest) Raven.Database.Indexing.AbstractIndexingExecuter
Debug 3/11/2013 3:17:00 PM Document with key 'Raven/Studio/PriorityColumns' was not found Raven.Storage.Esent.StorageActions.DocumentStorageActions
Debug 3/11/2013 3:16:56 PM Going to index 1 documents in IndexName: Raven/DocumentsByEntityName, LastIndexedEtag: 00000001-0000-0100-0000-000000002256: (Raven/Replication/Destinations/184.169.xxx.xxx8080databasesTonyTest) Raven.Database.Indexing.AbstractIndexingExecuter
Update 3/11 8:24p Pacific time
I am now seeing the following errors in the MASTER/Local raven logs:
Failed to close response
System.AggregateException: One or more errors occurred. ---> System.Net.HttpListenerException: An operation was attempted on a nonexistent network connection
at System.Net.HttpResponseStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
at Raven.Database.Util.Streams.BufferPoolStream.Dispose(Boolean disposing) in c:\Builds\RavenDB-Stable\Raven.Database\Util\Streams\BufferPoolStream.cs:line 144
at System.IO.Stream.Close()
at Raven.Database.Impl.ExceptionAggregator.Execute(Action action) in c:\Builds\RavenDB-Stable\Raven.Database\Impl\ExceptionAggregator.cs:line 23
--- End of inner exception stack trace ---
at Raven.Database.Impl.ExceptionAggregator.ThrowIfNeeded() in c:\Builds\RavenDB-Stable\Raven.Database\Impl\ExceptionAggregator.cs:line 38
at Raven.Database.Server.Abstractions.HttpListenerResponseAdapter.Close() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Abstractions\HttpListenerResponseAdapter.cs:line 94
at Raven.Database.Server.Abstractions.HttpListenerContextAdpater.FinalizeResponse() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Abstractions\HttpListenerContextAdpater.cs:line 92
---> (Inner Exception #0) System.Net.HttpListenerException (0x80004005): An operation was attempted on a nonexistent network connection
at System.Net.HttpResponseStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
at Raven.Database.Util.Streams.BufferPoolStream.Dispose(Boolean disposing) in c:\Builds\RavenDB-Stable\Raven.Database\Util\Streams\BufferPoolStream.cs:line 144
at System.IO.Stream.Close()
at Raven.Database.Impl.ExceptionAggregator.Execute(Action action) in c:\Builds\RavenDB-Stable\Raven.Database\Impl\ExceptionAggregator.cs:line 23<---
Solved this, although I don't understand the full reason why.
On the SLAVE server, you must set the following key in the Raven.Server.exe config file:
<add key="Raven/AnonymousAccess" value="All"/>
The default was
<add key="Raven/AnonymousAccess" value="Get"/>.
The default worked fine when the master and slave were on the same machine. But when the master and slave were on separate machines (either on the LAN, or across the WAN) replication failed.
I could never find a log entry on the master that pointed toward this problem. The only log entry I could see was on the slave, which said that Raven/Replication/Sources/ was not found. I realized that the master was connecting to the slave, but the slave was unable to create the "Raven/Replication/Sources/" document remotely.