I'm unable to time out a gRPC connection from the server side. It is possible that a client establishes a connection but is then kept on hold/sleep, which causes the gRPC server connection to hang. Is there a way on the server side to disconnect the connection after a certain time, or to set a timeout?
We tried disconnecting the connection from the client side, but we are unable to do so from the server side. In the linked question Problem with gRPC setup. Getting an intermittent RPC unavailable error, Angad says that it is possible, but I'm unable to figure out how to set those parameters in Python.
My code snippet:
def serve():
server = grpc.server(thread_pool=futures.ThreadPoolExecutor(max_workers=2), maximum_concurrent_rpcs=None, options=(('grpc.so_reuseport', 1),('grpc.GRPC_ARG_KEEPALIVE_TIME_MS', 1000)))
stt_pb2_grpc.add_ListenerServicer_to_server(Listener(), server)
server.add_insecure_port("localhost:50051")
print("Server starting in port "+str(50051))
server.start()
try:
while True:
time.sleep(60 * 60 * 24)
except KeyboardInterrupt:
server.stop(0)
if __name__ == '__main__':
serve()
I expect the connection to be timed out from the gRPC server side in Python as well.
In short, you may find context.abort(...) useful; see the API reference. Timing out a server handler is not supported by the underlying C-Core API of gRPC Python, so you have to implement your own timeout mechanism in Python.
You can try out some solutions from other StackOverflow questions.
Or use a simple but high-overhead approach with extra threads to abort the connection after a certain length of time. It might look like this:
_DEFAULT_TIME_LIMIT_S = 5

class FooServer(FooServicer):

    def RPCWithTimeLimit(self, request, context):
        rpc_ended = threading.Condition()
        work_finished = threading.Event()

        def wrapper(...):
            YOUR_ACTUAL_WORK(...)
            work_finished.set()
            # Condition.notify_all() must be called with the lock held.
            with rpc_ended:
                rpc_ended.notify_all()

        def timer():
            time.sleep(_DEFAULT_TIME_LIMIT_S)
            with rpc_ended:
                rpc_ended.notify_all()

        work_thread = threading.Thread(target=wrapper, ...)
        work_thread.daemon = True
        work_thread.start()

        timer_thread = threading.Thread(target=timer)
        timer_thread.daemon = True
        timer_thread.start()

        # Condition.wait() must also be called with the lock held.
        with rpc_ended:
            rpc_ended.wait()

        if work_finished.is_set():
            return NORMAL_RESPONSE
        else:
            context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, 'RPC Time Out!')
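A lighter variant of the same idea (still a sketch, with YOUR_ACTUAL_WORK and NORMAL_RESPONSE as placeholders for your own logic) can drop the timer thread and condition entirely, since threading.Event.wait() itself accepts a timeout:

_DEFAULT_TIME_LIMIT_S = 5

class FooServer(FooServicer):

    def RPCWithTimeLimit(self, request, context):
        work_finished = threading.Event()

        def wrapper():
            YOUR_ACTUAL_WORK(...)
            work_finished.set()

        work_thread = threading.Thread(target=wrapper)
        work_thread.daemon = True
        work_thread.start()

        # Event.wait() returns True if set() was called, False on timeout.
        if work_finished.wait(timeout=_DEFAULT_TIME_LIMIT_S):
            return NORMAL_RESPONSE
        context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, 'RPC Time Out!')

The trade-off is the same in both versions: the worker thread keeps running after the abort, so the abandoned work must be safe to discard.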
Error I keep getting:
Lost connection to MySQL server during query
My code:
from multiprocessing import Pool

# bl_cur is a cursor on an already-open MySQL connection; bl_sel_template
# and the *firstupc/*lastupc values are defined earlier in the script.
def runDBQuery(bl_sel):
    dbResponse = []
    bl_cur.execute(bl_sel)
    myresult2 = bl_cur.fetchall()
    dbResponse.append(myresult2)
    return dbResponse

if __name__ == '__main__':
    p1abl_sel = bl_sel_template.replace("{firstupc}", p1afirstupc).replace("{lastupc}", p1alastupc)
    p2abl_sel = bl_sel_template.replace("{firstupc}", p2afirstupc).replace("{lastupc}", p2alastupc)
    list_of_columns = [p1abl_sel, p2abl_sel]
    #list_of_columns = [ p1abl_sel ]
    p = Pool(processes=2)
    data = p.map(runDBQuery, [i for i in list_of_columns])
    # the 4 lines below are my failed attempts to try to resolve this.
    bl_cur.close()
    if cur and con:
        cur.close()
        con.close()
    p.close()
    print(data)
Whenever I uncomment the shorter list_of_columns so that there's only one element (query) in the list, it works and I get back a response from the DB. However, if I have more than one element in the list, I encounter the listed error.
Can anyone help me solve this problem?
The problem may not be in your code.
The MySQL error "Lost connection to MySQL server during query" can occur because of a read timeout, either on the client side or in the MySQL server configuration.
MySQL
max_execution_time: The execution timeout for SELECT statements, in milliseconds. If the value is 0, timeouts are not enabled.
connect_timeout: Number of seconds the mysqld server waits for a connect packet before responding with 'Bad handshake'.
interactive_timeout: Number of seconds the server waits for activity on an interactive connection before closing it.
wait_timeout: Number of seconds the server waits for activity on a connection before closing it.
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_execution_time
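To see what a server is actually configured with, these variables can be inspected from any cursor; a quick sketch, reusing the bl_cur cursor from the question:

# Standard MySQL: list every timeout-related server variable.
bl_cur.execute("SHOW VARIABLES LIKE '%timeout%'")
for name, value in bl_cur.fetchall():
    print(name, value)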
For PyMySQL, check read_timeout:
https://pymysql.readthedocs.io/en/latest/modules/connections.html
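For illustration, the client-side timeouts are plain keyword arguments to pymysql.connect, all in seconds (the connection values below are placeholders):

import pymysql

# Placeholder credentials; adjust to your environment.
con = pymysql.connect(host='localhost', user='user', password='secret',
                      database='mydb',
                      connect_timeout=10,   # fail fast on connect
                      read_timeout=300,     # allow long-running SELECTs
                      write_timeout=300)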
I've implemented an unsecured mosquitto broker, which works fantastically to send a large amount of data periodically (a ~200kb file once per minute) over port 1883.
Since I've implemented TLS, the broker seems to reject data >128kb automatically over port 8883, despite setting message_size_limit = 0.
Here's my mosquitto.conf:
listener 1883 localhost
listener 8883
certfile /etc/letsencrypt/live/example.com/cert.pem
cafile /etc/letsencrypt/live/example.com/chain.pem
keyfile /etc/letsencrypt/live/example.com/privkey.pem
And here's the script used to test the broker, which works fine without TLS over 1883:
client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
#publish file as zip
with open("./mqtt/20180319171000.gz", 'rb') as f:
byte_array = f.read()
m.update(byte_array)
file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
client.publish("topic", pickle.dumps(payload_json), 0)
time.sleep(1)
client.disconnect()
Is there a limit on the payload size with TLS or is something wrong with my setting/script?
The problem here is that the MQTT client network loop is not being run.
When the payload is larger than will fit in a single TCP packet, the call to client.publish() needs to queue up the rest of the message, which is then broken up into multiple packets and sent via the client loop.
The correct response is not to increase the keepalive period. There are two ways to solve this with the Python Paho library.
First, you can use the publish helper module instead of the Client class. Its single() function handles all the background tasks required to ensure the whole message is delivered.
import pickle
import hashlib

import paho.mqtt.publish as publish

m = hashlib.md5()

tls_opt = {
    'certfile': "./mqtt/cert.pem",
    'keyfile': "./mqtt/key.pem"
}

with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
    payload_json = {'byte_array': byte_array, 'md5': file_hash}
    publish.single("topic", payload=pickle.dumps(payload_json), qos=0,
                   hostname="example.com", port=8883, tls=tls_opt)
Second, you can start the network loop yourself, as follows:
import pickle
import time
import hashlib

import paho.mqtt.client as mqtt

m = hashlib.md5()

client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
client.loop_start()

# publish file as zip
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
    payload_json = {'byte_array': byte_array, 'md5': file_hash}
    client.publish("topic", pickle.dumps(payload_json), 0)

time.sleep(1)
client.loop_stop()
client.disconnect()
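One further note on the second variant: the fixed time.sleep(1) is only a guess at how long delivery takes. client.publish() returns an MQTTMessageInfo object, and its wait_for_publish() method blocks until the message has actually been sent, so the tail of the script can be made deterministic:

# publish returns an MQTTMessageInfo; wait on it instead of sleeping
info = client.publish("topic", pickle.dumps(payload_json), 0)
info.wait_for_publish()
client.loop_stop()
client.disconnect()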
An old question, but I experienced the same issue with large messages (>500kb). My solution was to increase the keepalive on the client from the default 60 to 300 sec.
This is probably related to the TLS encryption of large messages taking longer than the keepalive.
Edit: Added Python code for the connect call:
client.connect(
    host="example.com",
    port=8883,
    keepalive=300)
Update:
I found this question while looking for answers to a problem that looked similar to mine: MQTT publishing failed for large (>500kb) payloads when using MQTT TLS. As @hardillb indicates in his answer, the OP is missing client.loop_start(). That does not fix my problem, however.
keepalive should have no impact, but that is just not the case; increasing the value definitely fixes the problem. My theory is that the broker fails the connection on timeout because it tries to PING the client, but the client does not respond within keepalive because it is busy encrypting the message. This is just a theory, though.
I've created some test code to illustrate the problem. I also included a "last will" to check whether the connection is lost without a proper disconnect(), and it seems to fit my theory: a keepalive that is too small definitely activates the last will on the broker, indicating a "timeout".
Increasing the keepalive does not activate the "last will" on the broker.
Here is the code I used to test different keepalive values and payload sizes.
import paho.mqtt.client as mqtt_client
import time
from datetime import datetime

password = 'somepassword'
user = 'someuser'
address = 'somebroker.no'
connected = False

def on_connect(client, userdata, flags, rc):
    global connected
    connected = True
    print("Connected!")

def on_disconnect(client, userdata, rc):
    global connected
    connected = False
    print("Disconnected!")

client = mqtt_client.Client()
client.username_pw_set(user, password)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.tls_set()
client.will_set(topic='tls_test/connected', payload='False', qos=0, retain=True)
client.connect(host=address, port=8883, keepalive=100)
client.loop_start()

while not connected:
    time.sleep(1)

topic = 'tls_test/abc'
payload = 'a' * 1000000

start = time.time()
print('Start: {}'.format(datetime.fromtimestamp(start).strftime('%H:%M:%S')))
result = client.publish(topic='tls_test/connected', payload='True', qos=0, retain=True)
result = client.publish(topic=topic, payload=payload)
if result.rc != 0:
    print("MQTT Publish failed: {}".format(result.rc))
    exit()

client.loop_stop()
client.disconnect()
stop = time.time()
print('Stop: {}, delta: {} sec'.format(datetime.fromtimestamp(stop).strftime('%H:%M:%S'), stop - start))
Using the code above (keepalive=100), it sends 1,000,000 bytes, and tls_test/connected has the value True on the broker after finishing.
The data is transmitted successfully, and the console output is:
python3 .\mqtt_tls.py
Connected!
Start: 10:51:16
Disconnected!
Stop: 10:53:01, delta: 105.57992386817932 sec
Reducing the keepalive (keepalive=10), the transmission fails, and tls_test/connected has the value False on the broker after finishing.
The data transmission fails, and the console output is:
python3 .\mqtt_tls.py
Connected!
Start: 11:08:23
Disconnected!
Disconnected!
Stop: 11:08:43, delta: 19.537118196487427 sec
Tailing /var/log/mosquitto/mosquitto.log on the broker gives the following error message:
1612346903: New client connected from x.x.x.x as xxx (c1, k10, u'someuser').
1612346930: Socket error on client xxx, disconnecting.
My conclusion is: keepalive does have an impact on large payloads when using TLS.
Is there a way to set the socket timeout when publishing?
I'm testing correct recovery from a lost connection with Pika, by:
establishing a BlockingConnection connection
disconnecting from the network to force an error
reestablishing a connection and checking that the producer reconnects correctly and continues producing.
However, I don't seem to be able to set the socket timeout, and basic_publish hangs for WAY more than 5 seconds: 60 or more.
import pika

# worker_config, message, delivery_mode, EXCHANGE and ROUTING_KEY are
# defined elsewhere in the producer.
credentials = pika.PlainCredentials(worker_config.username, worker_config.password)
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=worker_config.host,
    credentials=credentials,
    port=worker_config.port,
    connection_attempts=1,
    retry_delay=5,
    socket_timeout=5,
))
# No effect
#connection._impl.socket.settimeout(5)
channel = connection.channel()

while True:
    result = channel.basic_publish(
        exchange=EXCHANGE,
        routing_key=ROUTING_KEY,
        body=message,
        properties=pika.BasicProperties(
            delivery_mode,  # MQ_TRANSIENT_DELIVERY_MODE, #1
        ))
    # At some point after a few successful publishes, disconnect the network.
Pika ends up in this loop (select_connection.py):
def poll(self, write_only=False):
    """Poll until the next timeout waiting for an event

    :param bool write_only: Only process write events

    """
    while True:
        try:
            events = self._poll.poll(self.get_next_deadline())
            break
        except _SELECT_ERROR as error:
            if _get_select_errno(error) == errno.EINTR:
                continue
            else:
                raise
... and indeed, get_next_deadline is sending 5.
_poll is a python Poll object which takes a timeout in seconds.
What's up with this?
There's a similar question, but it has no answers (not enough detail?).
I wrote a very simple spider program to fetch webpages from a single site.
Here is a minimized version.
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
from twisted.web.client import Agent, HTTPConnectionPool, readBody

baseUrl = 'http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode='
start = 1001
end = 3500

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 10
agent = Agent(reactor, pool=pool)

def onHeader(response, i):
    deferred = readBody(response)
    deferred.addCallback(onBody, i)
    deferred.addErrback(errorHandler)
    return response

def onBody(body, i):
    print('Received %s, Length %s' % (i, len(body)))

def errorHandler(err):
    print('%s : %s' % (reactor.seconds() - startTimeStamp, err))

def requestFactory():
    for i in range(start, end):
        deferred = agent.request('GET', baseUrl + str(i))
        deferred.addCallback(onHeader, i)
        deferred.addErrback(errorHandler)
        print('Generated %s' % i)
        reactor.iterate(1)
    print('All requests have been generated, elapsed %s' % (reactor.seconds() - startTimeStamp))

startTimeStamp = reactor.seconds()
reactor.callWhenRunning(requestFactory)
reactor.run()
For a few requests, like 100, it works fine. But for massive numbers of requests, it fails.
I expected all of the requests (around 3000) to be automatically pooled, scheduled and pipelined, since I use HTTPConnectionPool, set maxPersistentPerHost, create an Agent instance with the pool, and create the connections incrementally.
But it doesn't work that way; the connections are neither kept alive nor pooled.
The program does establish the connections incrementally, but the connections aren't pooled: each connection closes after the body is received, and later requests never wait in the pool for an available connection.
So it takes up thousands of sockets, and finally fails due to a timeout, because the remote server has a connection timeout set to 30s. Thousands of requests can't be completed within 30s.
Could you please give me some help on this?
I have tried my best on this; here are my findings.
The error occurs exactly 30s after the reactor starts running, and isn't influenced by anything else.
Pointing the spider at my own server, I found something interesting:
The HTTP protocol version is 1.1 (I checked the Twisted documentation; the default HTTPClient is 1.0 rather than 1.1).
If I didn't add any explicit headers (just like in the minimized version), the request header didn't contain Connection: Keep-Alive, and neither did the response header.
If I added an explicit header to ensure a keep-alive connection, the request header did contain Connection: Keep-Alive, but the response header still did not. (I am sure my server behaves correctly; other clients like Chrome and wget do receive the Connection: Keep-Alive header.)
I checked /proc/net/sockstat during the run; the socket count increases rapidly at first, and decreases rapidly later. (I had increased the ulimit to support plenty of sockets.)
I wrote a similar program with treq (a Twisted-based request library). The code is almost the same, so I won't paste it here.
Link: https://gist.github.com/Preffer/dad9b1228fcd75cebd75
Its behavior is almost the same: no pooling, although pooling is expected, as described in treq's feature list.
If I added an explicit header to it, Connection: Keep-Alive never appeared in the response header.
Based on all of the above, I strongly suspect that the quirky Connection: Keep-Alive header ruins the program. But this header is part of the HTTP 1.1 standard, and the client did report HTTP 1.1. I am completely puzzled by this.
I solved the problem myself, with help from IRC and another StackOverflow question, Queue remote calls to a Python Twisted perspective broker?
In summary, the agent's behavior is very different from that in Node.js (I have some experience with Node.js). As described in the Node.js docs:
agent.requests
An object which contains queues of requests that have not yet been assigned to sockets.
agent.maxSockets
By default set to 5. Determines how many concurrent sockets the agent can have open per origin. Origin is either a 'host:port' or 'host:port:localAddress' combination.
So, here is the difference.
Twisted:
There is no doubt that Agent can queue requests if it is constructed with an HTTPConnectionPool instance.
But if a new request is issued after the connections in the pool have run out, the agent will still create a new connection and perform the request, rather than put it in a queue.
Actually, it drops a connection from the pool and pushes the newly created connection into it, keeping the connection count equal to maxPersistentPerHost.
Node.js:
By default, the agent queues requests with an implicit connection pool, which has a size of 5 connections.
If a new request is issued after the connections in the pool have run out, the agent queues the request in the agent.requests variable, waiting for an available connection.
The agent's behavior in Twisted gives the impression that the agent is able to queue requests, but actually it doesn't.
Following our intuition, once a connection pool is assigned to an agent, one expects the agent to use only the connections in the pool, and to wait for an available connection when the pool has run out. That is exactly how the agent in Node.js behaves.
Personally, I think this is buggy behavior in Twisted, or at least it could be improved by providing an option to set the agent's behavior.
Given this, I have to use a DeferredSemaphore to manually schedule the requests.
I raised an issue against the treq project on GitHub and got a similar solution: https://github.com/dreid/treq/issues/71
Here is my solution.
#!/usr/bin/env python
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
from twisted.web.client import Agent, HTTPConnectionPool, readBody
from twisted.internet.defer import DeferredSemaphore

baseUrl = 'http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode='
start = 1001
end = 3500
count = end - start
concurrency = 10

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = concurrency
agent = Agent(reactor, pool=pool)
sem = DeferredSemaphore(concurrency)
done = 0

def onHeader(response, i):
    deferred = readBody(response)
    deferred.addCallback(onBody, i)
    deferred.addErrback(errorHandler, i)
    return deferred

def onBody(body, i):
    sem.release()
    global done, count
    done += 1
    print('Received %s, Length %s, Done %s' % (i, len(body), done))
    if done == count:
        print('All items fetched')
        reactor.stop()

def errorHandler(err, i):
    print('[%s] id %s: %s' % (reactor.seconds() - startTimeStamp, i, err))

def requestFactory(token, i):
    deferred = agent.request('GET', baseUrl + str(i))
    deferred.addCallback(onHeader, i)
    deferred.addErrback(errorHandler, i)
    print('Request sent %s' % i)
    # this function itself is a callback fired by the reactor,
    # so there's no need to iterate manually
    #reactor.iterate(1)
    return deferred

def assign():
    for i in range(start, end):
        sem.acquire().addCallback(requestFactory, i)

startTimeStamp = reactor.seconds()
reactor.callWhenRunning(assign)
reactor.run()
Is this right? I'd be glad to have any errors pointed out, along with possible improvements.
For a few requests, like 100, it works fine. But for massive requests, it will fail.
This is either a protection against web crawlers or a server protection against DoS/DDoS: you are sending too many requests from the same IP in a short time, so the firewall or the WSA blocks your future requests. Just modify your script to make requests in batches, spaced out over some time. You can use callLater() with some delay after every X requests, as sketched below.
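For illustration, here is a minimal sketch of that batching, reusing agent, baseUrl, onHeader and errorHandler from the question; the batch size and delay are arbitrary values to tune against the target server:

BATCH_SIZE = 50   # requests per batch (arbitrary)
BATCH_DELAY = 5   # seconds between batches (arbitrary)

def requestBatch(i):
    # fire one batch of requests, then schedule the next one
    for j in range(i, min(i + BATCH_SIZE, end)):
        deferred = agent.request('GET', baseUrl + str(j))
        deferred.addCallback(onHeader, j)
        deferred.addErrback(errorHandler)
    if i + BATCH_SIZE < end:
        reactor.callLater(BATCH_DELAY, requestBatch, i + BATCH_SIZE)

reactor.callWhenRunning(requestBatch, start)
reactor.run()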
I'm implementing a Twisted-based heartbeat client/server combo, based on this example. It is my first Twisted project.
Basically it consists of a UDP listener (Receiver), which calls a listener method (DetectorService.update) upon receiving packets. The DetectorService always holds a list of currently active/inactive clients (I extended the example a lot, but the core is still the same), making it possible to react to clients which seem disconnected for a specified timeout.
This is the source taken from the site:
UDP_PORT = 43278; CHECK_PERIOD = 20; CHECK_TIMEOUT = 15

import time
from twisted.application import internet, service
from twisted.internet import protocol
from twisted.python import log

class Receiver(protocol.DatagramProtocol):
    """Receive UDP packets and log them in the clients dictionary"""

    def datagramReceived(self, data, (ip, port)):
        if data == 'PyHB':
            self.callback(ip)

class DetectorService(internet.TimerService):
    """Detect clients not sending heartbeats for too long"""

    def __init__(self):
        internet.TimerService.__init__(self, CHECK_PERIOD, self.detect)
        self.beats = {}

    def update(self, ip):
        self.beats[ip] = time.time()

    def detect(self):
        """Log a list of clients with heartbeat older than CHECK_TIMEOUT"""
        limit = time.time() - CHECK_TIMEOUT
        silent = [ip for (ip, ipTime) in self.beats.items() if ipTime < limit]
        log.msg('Silent clients: %s' % silent)

application = service.Application('Heartbeat')
# define and link the silent clients' detector service
detectorSvc = DetectorService()
detectorSvc.setServiceParent(application)

# create an instance of the Receiver protocol, and give it the callback
receiver = Receiver()
receiver.callback = detectorSvc.update

# define and link the UDP server service, passing the receiver in
udpServer = internet.UDPServer(UDP_PORT, receiver)
udpServer.setServiceParent(application)

# each service is started automatically by Twisted at launch time
log.msg('Asynchronous heartbeat server listening on port %d\n'
        'press Ctrl-C to stop\n' % UDP_PORT)
This heartbeat server runs as a daemon in the background.
Now my problem:
I need to be able to run a script "externally" to print the number of offline/online clients on the console, using the data the DetectorService gathers during its lifetime (self.beats). Like this:
$ pyhb showactiveclients
3 clients online
$ pyhb showofflineclients
1 client offline
So I need to add some kind of additional server (socket, TCP, RPC; it doesn't matter, as long as I can build a client script with the above behavior) to my DetectorService, which allows connecting to it from outside. It should just give a response to a request.
This server needs access to the internal variables of the running DetectorService instance, so my guess is that I have to extend the DetectorService with some kind of additional service.
After some hours of trying to combine the DetectorService with several other services, I still have no idea what the best way to realize this behavior is. So I hope somebody can give me at least an essential hint on how to start solving this problem.
Thanks in advance!!!
I think you already have the general idea of the solution here, since you already applied it to an interaction between Receiver and DetectorService. The idea is for your objects to have references to other objects which let them do what they need to do.
So, consider a web service that responds to requests with a result based on the beats data:
from twisted.web.resource import Resource

class BeatsResource(Resource):
    # It has no children; let it respond to the / URL for brevity.
    isLeaf = True

    def __init__(self, detector):
        Resource.__init__(self)
        # This is the idea - BeatsResource has a reference to the detector,
        # which has the data needed to compute responses.
        self._detector = detector

    def render_GET(self, request):
        limit = time.time() - CHECK_TIMEOUT
        # Here, use that data.
        beats = self._detector.beats
        silent = [ip for (ip, ipTime) in beats.items() if ipTime < limit]
        request.setHeader('content-type', 'text/plain')
        return "%d silent clients" % (len(silent),)
# Integrate this into the existing application
application = service.Application('Heartbeat')
detectorSvc = DetectorService()
detectorSvc.setServiceParent(application)
.
.
.
from twisted.web.server import Site
from twisted.application.internet import TCPServer
# The other half of the idea - make sure to give the resource that reference
# it needs.
root = BeatsResource(detectorSvc)
TCPServer(8080, Site(root)).setServiceParent(application)
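With that in place, the pyhb commands from the question could boil down to a plain HTTP request against the new endpoint. A sketch, assuming the service runs on localhost; urllib2 matches the Python 2 style of the example above (use urllib.request on Python 3):

import urllib2

# Fetch and print the one-line summary served by BeatsResource.
print(urllib2.urlopen('http://localhost:8080/').read())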