We are using RabbitMQ as the broker for Celery task execution. One of our queues held over 230,000 tasks and crashed yesterday with the log below:
<code>2019-02-11 22:30:32,770 WARNING 13003 [celery.worker.consumer] consumer.py:289 - consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 278, in start
blueprint.start(self)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
step.start(parent)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 821, in start
c.loop(*c.loop_args())
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/loops.py", line 70, in asynloop
next(loop)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/kombu/async/hub.py", line 340, in create_loop
cb(*cbargs)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/kombu/transport/base.py", line 164, in on_readable
reader(loop)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/kombu/transport/base.py", line 146, in _read
drain_events(timeout=0)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/amqp/connection.py", line 324, in drain_events
return amqp_method(channel, args)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/amqp/channel.py", line 1647, in _basic_cancel_notify
raise ConsumerCancelled(consumer_tag, (60, 30))
ConsumerCancelled: Basic.cancel: (0) None8
2019-02-11 22:30:32,878 INFO 13003 [celery.worker.consumer] consumer.py:479 - Connected to amqp://celery:**#127.0.0.1:5672//
2019-02-11 22:31:20,308 ERROR 13003 [celery.worker.consumer] consumer.py:364 - consumer: Cannot connect to amqp://celery:**#127.0.0.1:5672//: [Errno 104] Connection res$
Trying again in 2.00 seconds...
</code>
After the crash I restarted RabbitMQ with the command below:
sudo service rabbitmq-server restart
After the restart I lost all my queues. The queue durability was set to Durable, but the message delivery mode was non-persistent.
Is there any way to recover the messages that were in the queue? They contained very important user data that was being processed.
Nope. Non-persistent means they were in RAM, not stored on the disk.
A general comment: RabbitMQ is not a database. Even if you had made the messages persistent, expecting a message broker to reliably act as temporary storage for 200,000 messages is asking for trouble. Your system should be designed so that the broker is a buffer between tasks, with an average queue length near zero. If you see numbers that large, either speed up processing or store the data in a database, which is designed to survive occasional restarts with little to no consequence.
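For future setups, here is a minimal kombu sketch of publishing with persistent delivery to a durable queue (the queue, exchange, payload, and connection URL are placeholders, not taken from the question):

from kombu import Connection, Exchange, Queue

# A durable queue plus delivery_mode=2 lets messages survive a broker restart
# (at the cost of disk writes). All names below are illustrative.
task_exchange = Exchange('tasks', type='direct', durable=True)
task_queue = Queue('tasks', task_exchange, routing_key='tasks', durable=True)

with Connection('amqp://guest:guest@localhost:5672//') as conn:
    producer = conn.Producer(serializer='json')
    producer.publish(
        {'user_id': 42},          # example payload
        exchange=task_exchange,
        routing_key='tasks',
        declare=[task_queue],     # make sure the durable queue exists
        delivery_mode=2,          # 2 = persistent, 1 = transient
    )

Persistence only protects against broker restarts; it does not make the broker suitable as long-term storage, so the sizing advice above still applies.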
I am trying to connect to the Nessus server with the command below in Python, but it fails with an error message. Can you tell me what the cause might be? I have checked my network connection and it is fine.
requests.post('https://164.99.175.30:8834' + '/session', data={'username': 'admin', 'password': 'micro#123'}, verify=False)
Error message:
Traceback (most recent call last):
File "nessus.py", line 425, in <module>
login()
File "nessus.py", line 111, in login
res = requests.post(url + '/session',data={'username':username,'password':password},verify=verify)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='localhost', port=8834): Max retries exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f46f2d6d410>: Failed to establish a new connection: [Errno 111] Connection refused',))
The Nessus API is deprecated as of version 7.x; this is the best source I could find.
EDIT: I have found a better source directly from Tenable.
What has been removed from Nessus 7:
There is a restriction in scan API capabilities.
The ability to manage scans via API and CLI has been removed in v7. All Nessus Pro scanning operations must be done through the user interface.
So the current capabilities of the Nessus API are as follows:
The ability to run scans or reports and to create new objects has been removed.
The read features remain: the ability to pull scan data, so GET /scans/<scan id> now works again, which helps with some of the integration processes.
https://community.tenable.com/s/article/The-differences-between-Nessus-6-and-Nessus-7
This applies only to Nessus Pro versions.
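For the read-only part that still works, pulling scan data could look like the sketch below (the scan ID and API keys are placeholders, and the X-ApiKeys header format is my assumption about the Nessus REST API, not something from the question):

import requests

NESSUS_URL = 'https://164.99.175.30:8834'   # host taken from the question
HEADERS = {'X-ApiKeys': 'accessKey=YOUR_ACCESS_KEY; secretKey=YOUR_SECRET_KEY'}

# GET /scans/<scan id> -- read access is still allowed on Nessus 7+
resp = requests.get(NESSUS_URL + '/scans/42', headers=HEADERS, verify=False)
resp.raise_for_status()
print(resp.json())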
I am using the Prometheus client library to push metrics to Pushgateway. I frequently get the error below while pushing metrics. How can I find the root cause of this issue?
push_to_gateway(
File "/usr/local/lib/python3.8/dist-packages/prometheus_client/exposition.py", line 285, in push_to_gateway
_use_gateway('PUT', gateway, job, registry, grouping_key, timeout, handler)
File "/usr/local/lib/python3.8/dist-packages/prometheus_client/exposition.py", line 358, in _use_gateway
handler(
File "/usr/local/lib/python3.8/dist-packages/prometheus_client/exposition.py", line 217, in handle
resp = build_opener(HTTPHandler).open(request, timeout=timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1369, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/lib/python3.8/urllib/request.py", line 1330, in do_open
r = h.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1332, in getresponse
response.begin()
File "/usr/lib/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 272, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without respons
We use Kubernetes internally for deploying services, and here I am pushing metrics to the Pushgateway through its ingress. Switching to the Kubernetes service name instead of the ingress reduced these errors significantly, but that is not a portable solution in case the service is relocated to another cluster. The solution that worked for me was to retry the push using a Python decorator, as sketched below.
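A minimal sketch of that retry approach (the gateway address, job name, and retry/backoff values are illustrative assumptions, not our exact production code):

import time
from functools import wraps

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def retry(attempts=3, delay=2, exceptions=(Exception,)):
    """Retry the wrapped function a few times before giving up."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise
                    time.sleep(delay * attempt)  # simple linear backoff
        return wrapper
    return decorator


@retry(attempts=3, delay=2)
def push_metrics(registry):
    # 'pushgateway.example.com:9091' and the job name are placeholders.
    push_to_gateway('pushgateway.example.com:9091', job='my_batch_job',
                    registry=registry, timeout=10)


registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last successful run', registry=registry)
g.set_to_current_time()
push_metrics(registry)

This does not remove the root cause (the remote end closing the connection), but it makes transient disconnects survivable.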
I'm using the RPC pattern to process my objects with RabbitMQ.
As you might suspect, I have an object, and I want the processing to finish before the ack is sent back to the RPC client.
The ack has a default timeout of about 3 minutes, and my processing takes longer than that.
How can I change this timeout for each object, or how should I handle long-running processes like these?
Modern versions of RabbitMQ have a delivery acknowledgement timeout:
In modern RabbitMQ versions, a timeout is enforced on consumer delivery acknowledgement. This helps detect buggy (stuck) consumers that never acknowledge deliveries. Such consumers can affect node's on disk data compaction and potentially drive nodes out of disk space.
If a consumer does not ack its delivery for more than the timeout value (30 minutes by default), its channel will be closed with a PRECONDITION_FAILED channel exception. The error will be logged by the node that the consumer was connected to.
Error message will be:
Channel error on connection <####> :
operation none caused a channel exception precondition_failed: consumer ack timed out on channel 1
The timeout is 30 minutes (1,800,000 ms) by default (note 1) and is configured by the consumer_timeout parameter in rabbitmq.conf.
Note 1: the timeout was 15 minutes (900,000 ms) before RabbitMQ 3.8.17.
If you run RabbitMQ in Docker, you can mount rabbitmq.conf as a volume, create the file inside that volume, and set consumer_timeout there.
For example, docker-compose.yml:
version: "2.4"
services:
  rabbitmq:
    image: rabbitmq:3.9.13-management-alpine
    network_mode: host
    container_name: 'your-container-name'
    ports:
      - 5672:5672
      - 15672:15672   # only needed if you use the management GUI for RabbitMQ
    volumes:
      - /etc/rabbitmq/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
And you need to create the rabbitmq.conf file on your server under /etc/rabbitmq/.
Documentation with the available parameters: https://github.com/rabbitmq/rabbitmq-server/blob/v3.8.x/deps/rabbit/docs/rabbitmq.conf.example
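The file itself only needs the one setting. As a sketch (the one-hour value is just an illustrative assumption; the unit is milliseconds):

# /etc/rabbitmq/rabbitmq.conf
consumer_timeout = 3600000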
I have more than 4 servers up, each running an s3-auth server.
I am able to authenticate a user from the first server.
If we turn off the first server, how can I detect through botocore that the s3 server is not running on it, so that it automatically uses the next server?
When I turn off the first server and send a request to list users, no response ever comes back from botocore. It retries the operation 5 times and after that does nothing.
Botocore Version: 1.3.30
Boto3 version: 1.2.2
Please help with this.
See the botocore logs below:
DEBUG:botocore.endpoint:Response received to retry, sleeping for 7.72060814652 seconds
DEBUG:botocore.hooks:Event request-created.iam.ListUsers: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f238444ccd0>>
DEBUG:botocore.auth:Calculating signature using v4 auth.
DEBUG:botocore.auth:CanonicalRequest:
POST
/
host:iam.test.com:8085
user-agent:Boto3/1.2.2 Python/2.7.5 Linux/3.10.0-229.11.1.el7.x86_64 Botocore/1.3.30
x-amz-date:20170322T131218Z
host;user-agent;x-amz-date
b6359072c78d70ebee1e81adcbab4f01bf2c23245fa365ef83fe8f1f955085e2
DEBUG:botocore.auth:StringToSign:
AWS4-HMAC-SHA256
20170322T131218Z
20170322/us-east-1/iam/aws4_request
e74bb593aaf7d92c8dfb517a4daedfe353ec4f9806a7b1c50bca7d7ed2e9e45e
DEBUG:botocore.auth:Signature:
adc96eb87a11f5b69214163bfafab7a82574113be0bcb0cddb43dfa70cbbc789
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [POST]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (5): iam.test.com
DEBUG:botocore.endpoint:ConnectionError received when sending HTTP request.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 174, in _get_response
proxies=self.proxies, timeout=self.timeout)
File "/usr/lib/python2.7/site-packages/botocore/vendored/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/botocore/vendored/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
DEBUG:botocore.hooks:Event needs-retry.iam.ListUsers: calling handler <botocore.retryhandler.RetryHandler object at 0x7f23843a2790>
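One way to handle the failover yourself (a sketch under assumptions: the endpoint URLs are placeholders, and the Config-based retry/timeout options shown require a much newer botocore than the 1.3.30 listed above) is to cap retries and connect timeouts and fall back to the next endpoint on connection errors:

import boto3
from botocore.config import Config
from botocore.exceptions import ConnectTimeoutError, EndpointConnectionError

# Placeholder endpoints for the s3-auth servers
ENDPOINTS = ['https://iam1.test.com:8085', 'https://iam2.test.com:8085']

# Fail fast instead of retrying 5 times against a dead server
config = Config(connect_timeout=5, retries={'max_attempts': 1})

def list_users():
    for endpoint in ENDPOINTS:
        client = boto3.client('iam', endpoint_url=endpoint, config=config)
        try:
            return client.list_users()
        except (EndpointConnectionError, ConnectTimeoutError):
            continue  # this endpoint is unreachable, try the next one
    raise RuntimeError('No s3-auth endpoint is reachable')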
I ran into an issue where the Celery worker's connection to RabbitMQ hits a broken pipe error in gevent mode, while there is no problem when the Celery worker runs in process pool mode (without gevent and without monkey patching).
After that, the Celery workers no longer get task messages from RabbitMQ until they are restarted.
The issue always happens when the Celery workers consume task messages more slowly than the Django applications produce them and about 3,000 messages have piled up in RabbitMQ.
Gevent version 1.1.0
Celery version 3.1.22
====== Celery log ======
[2016-08-08 13:52:06,913: CRITICAL/MainProcess] Couldn't ack 293, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
write_frame(1, channel, payload)
File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
frame_type, channel, size, payload, 0xce,
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 412, in sendall
timeleft = self.__send_chunk(chunk, flags, timeleft, end)
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 351, in __send_chunk
data_sent += self.send(chunk, flags)
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 320, in send
return sock.send(data, flags)
error: [Errno 32] Broken pipe
======= Rabbitmq log ==================
=ERROR REPORT==== 8-Aug-2016::14:28:33 ===
closing AMQP connection <0.15928.4> (10.26.39.183:60732 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15981.4> (10.26.39.183:60736 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15955.4> (10.26.39.183:60734 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
A similar issue appears when the Celery worker uses eventlet.
[2016-08-09 17:41:37,952: CRITICAL/MainProcess] Couldn't ack 583, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
write_frame(1, channel, payload)
File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
frame_type, channel, size, payload, 0xce,
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall
tail = self.send(data, flags)
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send
return self._send_loop(self.fd.send, data, flags)
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop
return send_method(data, *args)
error: [Errno 32] Broken pipe
Additional setup and load test info:
We use supervisor to launch Celery with the following options:
celery worker -A celerytasks.celery_worker_init -Q default -P gevent -c 1000 --loglevel=info
Celery uses RabbitMQ as the broker.
We have 4 Celery worker processes, specified with "numprocs=4" in the supervisor configuration.
We use JMeter to emulate web access load; the Django applications produce tasks for the Celery workers to consume. Those tasks basically need to access a MySQL DB to get/update some data.
According to the RabbitMQ web admin page, the task-producing rate is about 50 per second while the consuming rate is about 20 per second. After about 1 minute of load testing, the log file shows that many connections between RabbitMQ and Celery hit the broken-pipe error.
We noticed that this issue is also caused by a combination of a high prefetch count and high concurrency.
We had concurrency set to 500 and the prefetch multiplier to 100, which means the effective prefetch is 500 * 100 = 50,000 per worker.
We had around 100k tasks piled up, and because of this configuration one worker reserved all the tasks for itself while the other workers weren't even used. That one worker kept getting the broken pipe error and never acknowledged any task, which meant the tasks were never cleared from the queue.
We then changed the prefetch multiplier to 3 and restarted all the workers, which fixed the issue. Since lowering the prefetch we have seen zero instances of the broken pipe error, whereas we used to see it quite often before.
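For reference, lowering the prefetch can be done in the Celery configuration; a minimal sketch with the value described above (the setting is worker_prefetch_multiplier on Celery 4+, while 3.x releases such as the one in the question use the CELERYD_PREFETCH_MULTIPLIER name):

# celeryconfig.py -- illustrative value, not our exact production settings
worker_prefetch_multiplier = 3       # Celery 4+: effective prefetch = 3 * concurrency
# CELERYD_PREFETCH_MULTIPLIER = 3    # equivalent setting name on Celery 3.x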