How to set producer/publisher socket timeout for RabbitMQ (pika)? - rabbitmq

Is there a way to set the socket timeout when publishing?
I'm testing correct recovery from a lost connection with Pika by:
1. establishing a BlockingConnection
2. disconnecting from the network to force an error
3. reestablishing the network connection and checking that the producer reconnects correctly and continues producing.
However, I don't seem to be able to set the socket timeout, and basic_publish hangs for WAY more than 5 seconds: 60 or more.
credentials = pika.PlainCredentials(worker_config.username, worker_config.password)
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=worker_config.host,
    credentials=credentials,
    port=worker_config.port,
    connection_attempts=1,
    retry_delay=5,
    socket_timeout=5,
))
# No effect
#connection._impl.socket.settimeout(5)
channel = connection.channel()
while True:
    result = channel.basic_publish(
        exchange=EXCHANGE,
        routing_key=ROUTING_KEY,
        body=message,
        properties=pika.BasicProperties(
            delivery_mode=1,  # MQ_TRANSIENT_DELIVERY_MODE
        ))
    # Some time after a few successful publishes, disconnect the network.
Pika ends up in this loop (select_connection.py):
def poll(self, write_only=False):
    """Poll until the next timeout waiting for an event
    :param bool write_only: Only process write events
    """
    while True:
        try:
            events = self._poll.poll(self.get_next_deadline())
            break
        except _SELECT_ERROR as error:
            if _get_select_errno(error) == errno.EINTR:
                continue
            else:
                raise
... and indeed, get_next_deadline is returning 5.
_poll is a python Poll object which takes a timeout in seconds.
What's up with this?
There's a similar question, but it has no answers (not enough detail?)

Related

mysql python multiprocessing pool issues

Error I keep getting:
Lost connection to MySQL server during query
My code:
def runDBQuery(bl_sel):
    dbResponse = []
    bl_cur.execute(bl_sel)
    myresult2 = bl_cur.fetchall()
    dbResponse.append(myresult2)
    return dbResponse
if __name__ == '__main__':
    p1abl_sel = bl_sel_template.replace("{firstupc}",p1afirstupc).replace("{lastupc}",p1alastupc)
    p2abl_sel = bl_sel_template.replace("{firstupc}",p2afirstupc).replace("{lastupc}",p2alastupc)
    list_of_columns = [ p1abl_sel, p2abl_sel ]
    #list_of_columns = [ p1abl_sel ]
    p = Pool(processes=2)
    data = p.map(runDBQuery, [i for i in list_of_columns])
    # the 4 lines below are my failed attempts to try to resolve this.
    bl_cur.close()
    if cur and con:
        cur.close()
        con.close()
    p.close()
    print(data)
Whenever I uncomment the list_of_columns so there's only one element(query) in the list, it works and I get back a response from the DB. However, if I have more than one element in the list, I encounter the listed error.
Can anyone help me solve this problem?
The problem may not be in your code.
The MySQL error "Lost connection to MySQL server during query" can occur because of a read timeout. The timeout can come from either the client side or the MySQL server configuration.
MySQL server settings:
max_execution_time: The execution timeout for SELECT statements, in milliseconds. If the value is 0, timeouts are not enabled.
connect_timeout: Number of seconds the mysqld server waits for a connect packet before responding with 'Bad handshake'.
interactive_timeout: Number of seconds the server waits for activity on an interactive connection before closing it.
wait_timeout: Number of seconds the server waits for activity on a connection before closing it.
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_execution_time
For PyMySQL, check read_timeout:
https://pymysql.readthedocs.io/en/latest/modules/connections.html
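On the client side, the timeouts mentioned above can be passed as keyword arguments when opening the connection. The following is only a minimal sketch: the helper names (build_timeout_kwargs, open_connection) and the default values are assumptions made for illustration and do not come from the question. connect_timeout, read_timeout and write_timeout are keyword arguments of pymysql.connect, all in seconds.

```python
def build_timeout_kwargs(connect_timeout=10, read_timeout=30, write_timeout=30):
    """Return timeout keyword arguments (in seconds) for pymysql.connect().

    The default values are illustrative assumptions, not recommendations.
    """
    timeouts = {
        "connect_timeout": connect_timeout,
        "read_timeout": read_timeout,
        "write_timeout": write_timeout,
    }
    for name, value in timeouts.items():
        if value <= 0:
            raise ValueError("%s must be positive, got %r" % (name, value))
    return timeouts


def open_connection(host, user, password, database, **timeouts):
    """Hypothetical helper: open a PyMySQL connection with explicit timeouts."""
    # Imported lazily so build_timeout_kwargs() can be used without the driver.
    import pymysql

    return pymysql.connect(
        host=host,
        user=user,
        password=password,
        database=database,
        **build_timeout_kwargs(**timeouts)
    )
```

For example, open_connection("dbhost", "user", "secret", "mydb", read_timeout=120) would allow up to two minutes for a slow query to return data before the client gives up. Whether that is the right limit depends on the queries being run.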

Is there a way to set up a timeout on the gRPC server side?

I am unable to time out a gRPC connection from the server side. A client may establish a connection and then hold it open or sleep, which causes the gRPC server connection to hang. Is there a way on the server side to disconnect the connection after a certain time, or to set a timeout?
We managed to disconnect the connection from the client side, but not from the server side. In the linked question "Problem with gRPC setup. Getting an intermittent RPC unavailable error", Angad says that it is possible, but we were unable to define those parameters in Python.
My code snippet:
def serve():
    server = grpc.server(thread_pool=futures.ThreadPoolExecutor(max_workers=2), maximum_concurrent_rpcs=None, options=(('grpc.so_reuseport', 1),('grpc.GRPC_ARG_KEEPALIVE_TIME_MS', 1000)))
    stt_pb2_grpc.add_ListenerServicer_to_server(Listener(), server)
    server.add_insecure_port("localhost:50051")
    print("Server starting in port "+str(50051))
    server.start()
    try:
        while True:
            time.sleep(60 * 60 * 24)
    except KeyboardInterrupt:
        server.stop(0)
if __name__ == '__main__':
    serve()
I expect to be able to time out the connection from the gRPC server side in Python as well.
In short, you may find context.abort(...) useful; see the API reference. Timing out a server handler is not supported by the underlying C-Core API of gRPC Python, so you have to implement your own timeout mechanism in Python.
You can try out some of the solutions from other StackOverflow questions.
Or use simple, but high-overhead, extra threads to abort the RPC after a certain length of time. It might look like this:
import threading
import time
import grpc
_DEFAULT_TIME_LIMIT_S = 5
class FooServer(FooServicer):
    def RPCWithTimeLimit(self, request, context):
        rpc_ended = threading.Condition()
        work_finished = threading.Event()
        def wrapper(...):
            YOUR_ACTUAL_WORK(...)
            work_finished.set()
            # notify_all() may only be called while holding the condition's lock
            with rpc_ended:
                rpc_ended.notify_all()
        def timer():
            time.sleep(_DEFAULT_TIME_LIMIT_S)
            with rpc_ended:
                rpc_ended.notify_all()
        work_thread = threading.Thread(target=wrapper, ...)
        work_thread.daemon = True
        timer_thread = threading.Thread(target=timer)
        timer_thread.daemon = True
        # Take the lock before starting the threads so that neither of them
        # can notify before this handler has started waiting.
        with rpc_ended:
            work_thread.start()
            timer_thread.start()
            rpc_ended.wait()
        if work_finished.is_set():
            return NORMAL_RESPONSE
        else:
            context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, 'RPC Time Out!')
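The same idea can be tried out without gRPC at all. The sketch below is a stand-alone variant under stated assumptions: run_with_time_limit and TimeLimitExceeded are hypothetical names, and it uses a single threading.Event with a wait timeout instead of a Condition plus a timer thread. In a real servicer method, the place where it raises TimeLimitExceeded is where context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, ...) would be called. As in the answer above, the worker thread is not killed; it is only abandoned.

```python
import threading


class TimeLimitExceeded(Exception):
    """Hypothetical stand-in for aborting the RPC with DEADLINE_EXCEEDED."""


def run_with_time_limit(work, limit_s):
    """Run work() in a daemon thread and give up waiting after limit_s seconds."""
    outcome = {}
    finished = threading.Event()

    def wrapper():
        try:
            outcome["value"] = work()
        except Exception as exc:  # hand the failure back to the caller
            outcome["error"] = exc
        finally:
            finished.set()

    worker = threading.Thread(target=wrapper, daemon=True)
    worker.start()

    # Event.wait() returns False if the timeout elapsed before set() was called.
    if not finished.wait(timeout=limit_s):
        raise TimeLimitExceeded("work did not finish within %s s" % limit_s)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Using an Event with a timeout avoids the second thread, at the cost of still needing one extra thread per call for the work itself.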

Mosquitto TLS sets auto payload size limit

I've implemented an unsecured Mosquitto broker which works fantastically for sending a large amount of data periodically (a ~200kb file once per minute) over port 1883.
Since I implemented TLS, the broker seems to automatically reject data >128kb over port 8883, despite message_size_limit being set to 0.
Here's my mosquitto.conf:
listener 1883 localhost
listener 8883
certfile /etc/letsencrypt/live/example.com/cert.pem
cafile /etc/letsencrypt/live/example.com/chain.pem
keyfile /etc/letsencrypt/live/example.com/privkey.pem
And here's the script I use to test the broker; it works fine without TLS over port 1883:
client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
#publish file as zip
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
client.publish("topic", pickle.dumps(payload_json), 0)
time.sleep(1)
client.disconnect()
Is there a limit on the payload size with TLS or is something wrong with my setting/script?
The problem here is that the MQTT client network loop is not being run.
When the payload is larger than will fit in a single TCP packet, the call to client.publish() has to queue up the rest of the message, which is then broken up into multiple packets and sent by the client loop.
Increasing the keepalive period is not the correct fix. There are 2 ways to solve this with the Python Paho library.
First, you can use the publish module instead of the Client class. It includes a single function that handles all the background tasks required to ensure the whole message is delivered.
import hashlib
import pickle
import paho.mqtt.publish as publish
tls_opt = {
    'certfile': "./mqtt/cert.pem",
    'keyfile': "./mqtt/key.pem"
}
m = hashlib.md5()  # assumed: the question does not show how m is created
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
publish.single("topic", payload=pickle.dumps(payload_json), qos=0, hostname="example.com", port=8883, tls=tls_opt)
The second way is to start the network loop as follows:
import hashlib
import pickle
import time
import paho.mqtt.client as mqtt
client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
client.loop_start()
#publish file as zip
m = hashlib.md5()  # assumed: the question does not show how m is created
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
client.publish("topic", pickle.dumps(payload_json), 0)
time.sleep(1)
client.loop_stop()
client.disconnect()
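To see why a loop is needed at all, the queuing behaviour described in this answer can be reproduced with plain sockets. This is only an illustrative sketch, not Paho's implementation: send_with_loop is a hypothetical helper, the 4096-byte send buffer is an arbitrary assumption, and a local socket pair stands in for the connection to the broker. A single non-blocking send() accepts only part of a large payload; the rest has to be pushed out by repeatedly servicing the socket.

```python
import socket


def send_with_loop(payload, sndbuf=4096):
    """Send payload over a local non-blocking socket pair.

    Returns (bytes accepted by the first send() call, everything received).
    """
    writer, reader = socket.socketpair()
    try:
        # A small send buffer (arbitrary value) makes the partial send obvious.
        writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        writer.setblocking(False)
        reader.setblocking(False)

        # One send() call only accepts what currently fits in the socket buffer.
        first_sent = writer.send(payload)
        pending = memoryview(payload)[first_sent:]
        received = bytearray()

        # Stand-in for the client network loop: keep flushing the queued
        # remainder while the other side drains the socket.
        while len(received) < len(payload):
            if pending:
                try:
                    sent = writer.send(pending)
                    pending = pending[sent:]
                except BlockingIOError:
                    pass  # buffer full, try again on the next iteration
            try:
                received.extend(reader.recv(65536))
            except BlockingIOError:
                pass  # nothing to read yet
        return first_sent, bytes(received)
    finally:
        writer.close()
        reader.close()
```

If the loop is never run, only the first part of the payload ever leaves the process, which matches the symptom of large messages being lost while small ones get through.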
An old question, but I experienced the same issue with large messages (>500kb). My solution was to increase the keepalive on the client from the default 60 seconds to 300 seconds.
This is probably related to TLS encryption of large messages taking longer than the keepalive.
Edit: Added Python code for connect:
client.connect(
    host="example.com",
    port=8883,
    keepalive=300)
Update:
I found this question while looking for answers to a problem that looked similar to mine: MQTT publish failing for large (>500kb) payloads when using MQTT over TLS. As @hardillb indicates in his answer, the OP is missing client.loop_start(). That does not fix my problem, however.
keepalive should have no impact, but that is just not the case. Increasing the value definitely fixes the problem. My theory is that the broker fails the connection on timeout because it tries to PING the client, but the client does not respond within the keepalive period because it is busy trying to encrypt the message. This is just a theory, though.
I've created some test code to illustrate the problem. I also included a "last will" to check whether the connection is lost without a proper disconnect(), and the result seems to fit my theory. Using too small a keepalive definitely activates the last will on the broker, indicating a "timeout".
Increasing the keepalive does not activate the "last will" on the broker.
Here is the code I used to test different keepalive values and payload sizes.
import paho.mqtt.client as mqtt_client
import time
from datetime import datetime
password = 'somepassword'
user = 'someuser'
address = 'somebroker.no'
connected = False
def on_connect(client, userdata, flags, rc):
    global connected
    connected = True
    print("Connected!")
def on_disconnect(client, userdata, rc):
    global connected
    connected = False
    print("Disconnected!")
client = mqtt_client.Client()
client.username_pw_set(user, password)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.tls_set()
client.will_set(topic='tls_test/connected', payload='False', qos=0, retain=True)
client.connect(host=address, port=8883, keepalive=100)
client.loop_start()
while not connected:
    time.sleep(1)
topic = 'tls_test/abc'
payload = 'a'*1000000
start = time.time()
print('Start: {}'.format(datetime.fromtimestamp(start).strftime('%H:%M:%S')))
result = client.publish(topic='tls_test/connected', payload='True', qos=0, retain=True)
result = client.publish(topic=topic, payload=payload)
if result.rc != 0:
    print("MQTT Publish failed: {}".format(result.rc))
    exit()
client.loop_stop()
client.disconnect()
stop = time.time()
print('Stop: {}, delta: {} sec'.format(datetime.fromtimestamp(stop).strftime('%H:%M:%S'), stop-start))
Using the code above (keepalive=100), it sends 1,000,000 bytes, and tls_test/connected has the value True on the broker after finishing.
The data is transmitted successfully. The console output is:
python3 .\mqtt_tls.py
Connected!
Start: 10:51:16
Disconnected!
Stop: 10:53:01, delta: 105.57992386817932 sec
After reducing the keepalive (keepalive=10), transmission fails, and tls_test/connected has the value False on the broker after finishing.
The data transmission fails, and the console output is:
python3 .\mqtt_tls.py
Connected!
Start: 11:08:23
Disconnected!
Disconnected!
Stop: 11:08:43, delta: 19.537118196487427 sec
Tailing /var/log/mosquitto/mosquitto.log on the broker gives the following error message:
1612346903: New client connected from x.x.x.x as xxx (c1, k10, u'someuser').
1612346930: Socket error on client xxx, disconnecting.
My conclusion is: keepalive does have an impact on large payloads when using TLS

Creating a large number of client-side connections with Twisted

I use Twisted to run a test job against a server. I need to create a lot of connections to the server. This is my code:
class Account(Protocol):
    def connectionMade(self):
        print "connection made"
    def connectionLost(self, reason):
        print "connection Lost. reason: ", reason
    def createAccount(self, name):
        self.transport.write(...)
        print "create account: ", name
class AccountFactory(Factory):
    def buildProtocol(self, addr):
        return Account()
def accountCreate(p, i):
    print "begin create"
    p.createAccount(NAME_PREFIX+str(i))
def onError(err):
    return 'error: ', err
c = 0
while c < 100:
    accountPoint = TCP4ClientEndpoint(reactor, server_ip, port)
    accountConn = accountPoint.connect(AccountFactory())
    accountConn.addCallback(accountCreate, c)
    accountConn.addErrback(onError)
    c += 1
reactor.run()
If the server and client are located in the same LAN, there is no problem: all 100 "create account: xxx" lines are printed. But when I put the server on a remote address (over the internet), the client only prints about 50% of the "create account: xxx" lines. onError doesn't fire.
The log is:
2014-07-29 15:57:06+0800 [Uninitialized] connection made
2014-07-29 15:57:06+0800 [Uninitialized] begin create
2014-07-29 15:57:06+0800 [Uninitialized] create account: xxx
repeat 60 times
2014-07-29 15:57:17+0800 [Uninitialized] Stopping factory <__main__.AccountFactory instance at xxx>
repeat 40 times
Some callbacks are never called, and those connections are not even made. The only difference is the latency between server and client.
The most interesting thing is that the duration between the first success log and the first "Stopping factory" log is exactly 20 seconds (I tried this many times). But I am sure this is not caused by a timeout, because the TCP4ClientEndpoint default timeout is 30 seconds.
The log timestamps are also abnormal: they come in bundles. For example, 10 logs are at 2014-07-29 17:25:09 and 20 logs are at 2014-07-29 17:25:15. If the connections were made asynchronously, the timestamps should be spread out fairly randomly. They should not cluster together, with 10 connections made at time a and another 20 at time a+15 sec. Or is this a problem with the log utility?
Revised:
I think this is a bug in Twisted. The reason for "Stopping" is a timeout. When I run this on Linux, the duration between the first log and the first "Stopping" is the timeout in seconds that I passed into TCP4ClientEndpoint, but on Windows, whatever timeout I set, the duration is always 21 seconds. When I use (blocking) sockets to do the same thing instead, everything works fine. So this should be a bug in Twisted involving timeouts when making a lot of connections.
You haven't added any error handlers to your code, nor have you enabled logging so that unhandled errors will be reported anywhere.
Enable logging, either by calling twisted.python.log.startLogging or by writing your code as an IServiceMaker plugin and running it with twistd.
And add errbacks to each Deferred in your application so you can handle failures from their associated operations.

Pika + RabbitMQ: setting basic_qos to prefetch=1 still appears to consume all messages in the queue

I've got a Python worker client that spins up 10 workers, each of which hooks onto a RabbitMQ queue. A bit like this:
#!/usr/bin/python
worker_count=10
def mqworker(queue, configurer):
    connection = pika.BlockingConnection(pika.ConnectionParameters(host='mqhost'))
    channel = connection.channel()
    channel.queue_declare(queue=qname, durable=True)
    channel.basic_consume(callback,queue=qname,no_ack=False)
    channel.basic_qos(prefetch_count=1)
    channel.start_consuming()
def callback(ch, method, properties, body):
    doSomeWork();
    ch.basic_ack(delivery_tag = method.delivery_tag)
if __name__ == '__main__':
    for i in range(worker_count):
        worker = multiprocessing.Process(target=mqworker)
        worker.start()
The issue I have is that, despite setting basic_qos on the channel, the first worker to start accepts all the messages off the queue, whilst the others sit there idle. I can see this in the RabbitMQ interface: even when I set worker_count to 1 and dump 50 messages on the queue, all 50 go into the 'unacknowledged' bucket, whereas I'd expect 1 to become unacknowledged and the other 49 to be ready.
Why isn't this working?
I appear to have solved this by moving where basic_qos is called.
Placing it just after channel = connection.channel() appears to alter the behaviour to what I'd expect.
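The reordering described here might look like the sketch below. It is written against a channel object passed in as a parameter, so the ordering can be checked without a broker. start_worker and RecordingChannel are hypothetical names introduced for illustration. The basic_consume call uses the same older pika call style as the question (callback first, no_ack keyword); newer pika releases take on_message_callback and auto_ack instead.

```python
def start_worker(channel, queue_name, callback, prefetch_count=1):
    """Set the prefetch limit BEFORE registering the consumer."""
    channel.queue_declare(queue=queue_name, durable=True)
    # QoS first: the limit is already in place when the broker starts
    # delivering messages to this consumer.
    channel.basic_qos(prefetch_count=prefetch_count)
    # Older pika call style, as in the question.
    channel.basic_consume(callback, queue=queue_name, no_ack=False)
    channel.start_consuming()


class RecordingChannel:
    """Hypothetical stand-in for a pika channel that records the call order."""

    def __init__(self):
        self.calls = []

    def queue_declare(self, **kwargs):
        self.calls.append("queue_declare")

    def basic_qos(self, **kwargs):
        self.calls.append("basic_qos")

    def basic_consume(self, *args, **kwargs):
        self.calls.append("basic_consume")

    def start_consuming(self):
        self.calls.append("start_consuming")
```

With a real connection, channel would be the object returned by connection.channel(), and callback would acknowledge each message with ch.basic_ack(delivery_tag=method.delivery_tag) as in the question.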