I have written a program that uses multiprocessing and issues Scan, Get, and Set commands against Redis. We chose Redis to speed things up.
Does anybody have any recommendations? I tried putting a retry loop around the get statements (and can do the same for set), but the scan statements will be trickier. Is there some parameter I can change or increase to avoid this?
Error:
redis.exceptions.ConnectionError: Error 10048 connecting to 192.168.5.51:6379. Only one usage of each socket address (protocol/network address/port) is normally permitted.
File "C:\Apps\ProcessData\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\Apps\ProcessData\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
File "C:\Apps\ProcessData\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
I think the second part of the error was this (I am waiting for it to happen again to get the exact text):
Error 10048 connecting to 192.168.5.51:6379. Only one usage of each socket address (protocol/network address/port) is normally permitted.
For the code that does a redis-get, I am trying the following:
redis_max_retries = 72
redis_retries = 0
redis_get_success = False
sleep_time_in_seconds = .1
while not redis_get_success and redis_retries < redis_max_retries:
    try:
        json_str = redis_obj.get(redis_key)
        redis_get_success = True
    except redis.exceptions.ConnectionError as e:
        redis_retries += 1
        time.sleep(sleep_time_in_seconds)
if not redis_get_success:
    print("redis_retries=", redis_retries, " on json_str = redis_obj.get(redis_key) in process_intervals")
    sys.exit(0)
But I also get the error on a scan, which looks trickier to retry:
for keybatch in batcher(redis_obj.scan_iter(key_pattern), 1500):
    batch_counter += 1
    # print(batch_counter, "keybatch=", keybatch)
    for key in keybatch:
        if key is not None:
            # print("Trying key=", key)
            matching_keys_found += 1
            etc...
Try connecting to an online Redis hosting service such as RedisLabs.com, following its documentation, or try changing the port.
The error arises either from the port or from a misconfigured Redis server setup.
For the same port and address, see also the question "Only one usage of each socket address (protocol/network address/port) is normally permitted?".
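The retry loop from the question can be factored into a small helper so that get, set and scan share the same logic. The following is a minimal sketch, not a definitive fix: the name call_with_retry and its parameters are made up for illustration, and the built-in ConnectionError stands in for redis.exceptions.ConnectionError, which is what you would pass with redis-py. One way to retry a scan with it is to wrap a function that materializes the whole scan, for example lambda: list(redis_obj.scan_iter(key_pattern)), at the cost of holding all keys in memory and restarting the scan from the beginning after a failure.

```python
import time


def call_with_retry(func, exceptions, max_retries=72, sleep_s=0.1):
    """Call func() until it succeeds or max_retries attempts have failed."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for _ in range(max_retries):
        try:
            return func()
        except exceptions as error:
            # Remember the failure and pause before the next attempt
            last_error = error
            time.sleep(sleep_s)
    # Every attempt failed: re-raise the most recent error
    raise last_error


# Hypothetical stand-in for a flaky Redis call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_scan():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated connection failure")
    return ["key:1", "key:2"]


keys = call_with_retry(flaky_scan, (ConnectionError,), max_retries=5, sleep_s=0.01)
print(keys, "after", attempts["count"], "attempts")
```

Re-raising the last error instead of calling sys.exit(0) lets the calling process decide how to report the failure.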
I am unable to time out a gRPC connection from the server side. A client may establish a connection and then hold it open or sleep, which causes the gRPC server connection to hang. Is there a way on the server side to disconnect the connection after a certain time, or to set a timeout?
We tried disconnecting from the client side, but we are unable to do so from the server side. In the linked question "Problem with gRPC setup. Getting an intermittent RPC unavailable error", Angad says that it is possible, but I am unable to define those parameters in Python.
My code snippet:
def serve():
    server = grpc.server(
        thread_pool=futures.ThreadPoolExecutor(max_workers=2),
        maximum_concurrent_rpcs=None,
        options=(('grpc.so_reuseport', 1), ('grpc.GRPC_ARG_KEEPALIVE_TIME_MS', 1000)))
    stt_pb2_grpc.add_ListenerServicer_to_server(Listener(), server)
    server.add_insecure_port("localhost:50051")
    print("Server starting in port " + str(50051))
    server.start()
    try:
        while True:
            time.sleep(60 * 60 * 24)
    except KeyboardInterrupt:
        server.stop(0)

if __name__ == '__main__':
    serve()
I expect to be able to time out the connection from the gRPC server side in Python as well.
In short, you may find context.abort(...) useful; see the API reference. Timing out a server handler is not supported by the underlying C-Core API of gRPC Python, so you have to implement your own timeout mechanism in Python.
You can try out solutions from other StackOverflow questions.
Or use extra threads, which is simple but has significant overhead, to abort the connection after a certain length of time. It might look like this:
_DEFAULT_TIME_LIMIT_S = 5

class FooServer(FooServicer):

    def RPCWithTimeLimit(self, request, context):
        rpc_ended = threading.Condition()
        work_finished = threading.Event()

        def wrapper(...):
            YOUR_ACTUAL_WORK(...)
            work_finished.set()
            # notify_all() must be called with the condition's lock held
            with rpc_ended:
                rpc_ended.notify_all()

        def timer():
            time.sleep(_DEFAULT_TIME_LIMIT_S)
            with rpc_ended:
                rpc_ended.notify_all()

        # Hold the lock while starting the threads so that no notification is missed
        with rpc_ended:
            work_thread = threading.Thread(target=wrapper, ...)
            work_thread.daemon = True
            work_thread.start()
            timer_thread = threading.Thread(target=timer)
            timer_thread.daemon = True
            timer_thread.start()
            rpc_ended.wait()

        if work_finished.is_set():
            return NORMAL_RESPONSE
        else:
            context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, 'RPC Time Out!')
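The same time-limit idea can be sketched without gRPC using only the standard library. This is a minimal illustration under stated assumptions, not the gRPC API: run_with_time_limit is a hypothetical helper name, and it relies on threading.Event.wait(timeout), which returns False when the timeout expires first. In a servicer method, the timed-out branch is where context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, ...) would go.

```python
import threading
import time


def run_with_time_limit(work, time_limit_s):
    """Run work() in a daemon thread and return (finished, result)."""
    result = {}
    finished = threading.Event()

    def wrapper():
        result["value"] = work()
        finished.set()

    thread = threading.Thread(target=wrapper, daemon=True)
    thread.start()
    # Event.wait() returns True if the event was set before the timeout
    if finished.wait(timeout=time_limit_s):
        return True, result["value"]
    return False, None


fast = run_with_time_limit(lambda: 21 * 2, 1.0)
slow = run_with_time_limit(lambda: time.sleep(0.5), 0.05)
print(fast, slow)
```

Note that the worker thread is not cancelled when the limit expires; it keeps running in the background, so the actual work should be safe to abandon. If work() raises, the event is never set and the call is reported as timed out.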
I've implemented an unsecured mosquitto broker which works fantastically for sending a large amount of data periodically (a ~200 kB file once per minute) over port 1883.
Since I implemented TLS, the broker seems to automatically reject data larger than 128 kB over port 8883, despite setting message_size_limit = 0.
Here's my mosquitto.conf:
listener 1883 localhost
listener 8883
certfile /etc/letsencrypt/live/example.com/cert.pem
cafile /etc/letsencrypt/live/example.com/chain.pem
keyfile /etc/letsencrypt/live/example.com/privkey.pem
And here's the script I use to test the broker; it works fine without TLS over port 1883:
import hashlib
import pickle
import time
import paho.mqtt.client as mqtt

m = hashlib.md5()
client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
# publish file as zip
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
client.publish("topic", pickle.dumps(payload_json), 0)
time.sleep(1)
client.disconnect()
Is there a limit on the payload size with TLS or is something wrong with my setting/script?
The problem here is that the MQTT client loop is not being run.
When the payload is larger than can fit in a single TCP packet, the call to client.publish() needs to queue up the rest of the message, which is then broken up into multiple packets and sent via the client loop.
The correct response is not to increase the keepalive period. There are two ways to solve this with the Python Paho library.
First, you can use the publish module instead of the Client class. It includes a single function that handles all the background tasks required to ensure the whole message is delivered.
import hashlib
import pickle
import paho.mqtt.publish as publish

tls_opt = {
    'certfile': "./mqtt/cert.pem",
    'keyfile': "./mqtt/key.pem"
}
m = hashlib.md5()
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
publish.single("topic", payload=pickle.dumps(payload_json), qos=0, hostname="example.com", port=8883, tls=tls_opt)
Second is to start the network loop as follows:
client = mqtt.Client("test")
client.tls_set(certfile="./mqtt/cert.pem", keyfile="./mqtt/key.pem")
client.connect("example.com", 8883)
client.loop_start()
# publish file as zip
with open("./mqtt/20180319171000.gz", 'rb') as f:
    byte_array = f.read()
    m.update(byte_array)
    file_hash = m.hexdigest()
payload_json = {'byte_array': byte_array, 'md5': file_hash}
client.publish("topic", pickle.dumps(payload_json), 0)
time.sleep(1)
client.loop_stop()
client.disconnect()
An old question, but I experienced the same issue with large messages (>500 kB). My solution was to increase the keepalive on the client from the default 60 seconds to 300 seconds.
This is probably related to TLS encryption of large messages taking longer than the keepalive interval.
Edit: Added python-code for connect:
client.connect(
    host="example.com",
    port=8883,
    keepalive=300)
Update:
I found this question while looking for answers to a problem that looked similar to mine: MQTT publish failing for large (>500 kB) payloads when using MQTT over TLS. As @hardillb indicates in his answer, the OP is missing client.loop_start(). This does not fix my problem, however.
In principle keepalive should have no impact, but that is just not the case: increasing the value definitely fixes the problem. My theory is that the broker drops the connection on timeout because it tries to ping the client, but the client fails to respond within the keepalive interval because it is busy encrypting the message. This is just a theory, though.
I've created some test code to illustrate the problem. I also included a "last will" to check whether the connection is lost without a proper disconnect(), and the result seems to fit my theory. Using too small a keepalive definitely activates the last will on the broker, indicating a timeout.
Increasing the keepalive does not activate the last will on the broker.
Here is the code I used to test different keepalive values and payload sizes.
import paho.mqtt.client as mqtt_client
import time
from datetime import datetime

password = 'somepassword'
user = 'someuser'
address = 'somebroker.no'
connected = False

def on_connect(client, userdata, flags, rc):
    global connected
    connected = True
    print("Connected!")

def on_disconnect(client, userdata, rc):
    global connected
    connected = False
    print("Disconnected!")

client = mqtt_client.Client()
client.username_pw_set(user, password)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.tls_set()
client.will_set(topic='tls_test/connected', payload='False', qos=0, retain=True)
client.connect(host=address, port=8883, keepalive=100)
client.loop_start()

while not connected:
    time.sleep(1)

topic = 'tls_test/abc'
payload = 'a'*1000000
start = time.time()
print('Start: {}'.format(datetime.fromtimestamp(start).strftime('%H:%M:%S')))
result = client.publish(topic='tls_test/connected', payload='True', qos=0, retain=True)
result = client.publish(topic=topic, payload=payload)
if result.rc != 0:
    print("MQTT Publish failed: {}".format(result.rc))
    exit()

client.loop_stop()
client.disconnect()
stop = time.time()
print('Stop: {}, delta: {} sec'.format(datetime.fromtimestamp(stop).strftime('%H:%M:%S'), stop-start))
Using the code above (keepalive=100), it sends 1,000,000 bytes, and tls_test/connected has the value True on the broker after finishing.
The data is transmitted successfully. The console output is:
python3 .\mqtt_tls.py
Connected!
Start: 10:51:16
Disconnected!
Stop: 10:53:01, delta: 105.57992386817932 sec
After reducing the keepalive (keepalive=10), transmission fails and tls_test/connected has the value False on the broker after finishing.
The data transmission fails, and the console output is:
python3 .\mqtt_tls.py
Connected!
Start: 11:08:23
Disconnected!
Disconnected!
Stop: 11:08:43, delta: 19.537118196487427 sec
Tailing /var/log/mosquitto/mosquitto.log on the broker gives the following error message:
1612346903: New client connected from x.x.x.x as xxx (c1, k10, u'someuser').
1612346930: Socket error on client xxx, disconnecting.
My conclusion is that keepalive does have an impact on large payloads when using TLS.
When I check the webserver's mod_status page (/server-status), I notice that there are a bunch of threads in the state ..reading..
Running strace on a thread, this is what actually happens while the thread is in ..reading..:
...
...
semop(327681, {{0, 1, SEM_UNDO}}, 1) = 0
gettimeofday({1452260985, 867058}, NULL) = 0
getsockname(156, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("172.31.9.248")}, [16]) = 0
fcntl(156, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(156, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1452260985, 867479}, NULL) = 0
read(156, 0x558f4c26e9d8, 8000) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=156, events=POLLIN}], 1, 300000) = 1 ([{fd=156, revents=POLLIN}])
read(156, "", 8000) = 0
gettimeofday({1452261254, 669634}, NULL) = 0
gettimeofday({1452261254, 669691}, NULL) = 0
shutdown(156, SHUT_WR) = 0
poll([{fd=156, events=POLLIN}], 1, 2000) = 1 ([{fd=156, revents=POLLIN|POLLHUP}])
read(156, "", 512) = 0
close(156) = 0
read(6, 0x7fff901f67e7, 1) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1452261254, 670341}, NULL) = 0
semop(327681, {{0, -1, SEM_UNDO}}, 1) = 0
...
...
When the threads are in ..waiting.. the strace stops at the following line:
poll([{fd=156, events=POLLIN}], 1, 300000) = 1 ([{fd=156, revents=POLLIN}])
The Apache config value of "Timeout" is set to 300 (seconds) in this case, which corresponds to the value "300000" (milliseconds) above.
This is the timeout it waits for; lowering the configuration value changes the value shown above and makes the timeout expire sooner.
From my new knowledge of strace, it looks to me as if it tries to get a socket to look up an internal IP address, but that is not successful.
The setting "HostnameLookups" is off.
Checking our production environment shows the same pattern when Apache stops in ..reading.., but there it shows an IPv6 address pattern.
Example:
getsockname(154, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::ffff:172.31.3.239", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
It then stops at "poll" and gets the "(Timeout)", as in the example above.
But is there some input why it stops in ..waiting.. ?
Does the "Resource temporarily unavailable" message leave any clue?
If it matters, Apache is running on EC2 instances behind an ELB in the Amazon cloud.
Update:
Here is an image of how a production server looks right now with thread states. Lots of ..reading..
[Image: Apache thread states]
We are also running lots of VirtualHosts on the servers, if that gives any clue as to why this happens.
The closest thread I found on the web with the same problem is this one: http://apache-http-server.18135.x6.nabble.com/Apache-Hangs-Server-Status-shows-all-Reading-td4751342.html
The threads stuck in ..reading.. were caused by a mismatch between the "Idle Timeout" connection setting in the ELB and the KeepAliveTimeout setting in httpd.conf.
The connection timeout set in the ELB was a lot longer than the KeepAliveTimeout set in the Apache configuration. As a result, the Elastic Load Balancer tries to keep connections open while Apache wants to close them.
See http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/config-idle-timeout.html
After changing the ELB settings to match the settings in the Apache config (currently 60 seconds), I no longer get a long list of threads stuck in state R (Reading). They are now in state K (Keepalive).
This looks more like the expected behavior of the threads.
It is polling its sockets waiting for one or more of them to become readable, or for the read timeout to expire.
But is there some input why it stops in ..waiting.. ?
There isn't any input. That's why it blocks.
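The sequence in the strace (a non-blocking read that fails with EAGAIN, a wait with a timeout, then a read that returns an empty string) can be reproduced with a pair of connected sockets. This is only an illustrative sketch: wait_for_input is a made-up helper name, select.select is used here in place of the poll call Apache makes, and the timeouts are shortened from Apache's 300 seconds.

```python
import select
import socket
import threading


def wait_for_input(sock, timeout_s):
    """Try a non-blocking read, then wait until readable or until the timeout expires."""
    sock.setblocking(False)
    try:
        return sock.recv(8000)      # data (or b"" at EOF) was already available
    except BlockingIOError:
        pass                        # EAGAIN: nothing to read yet
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None                 # timed out without any input
    return sock.recv(8000)          # b"" means the peer closed the connection


server_side, client_side = socket.socketpair()

# The peer stays silent: the wait runs until the timeout expires.
idle_result = wait_for_input(server_side, 0.2)

# The peer closes the connection while we wait: the socket becomes readable
# and the read returns an empty byte string, like read(156, "", 8000) = 0.
threading.Timer(0.1, client_side.close).start()
closed_result = wait_for_input(server_side, 2.0)
server_side.close()

print(idle_result, closed_result)
```

In other words, "Resource temporarily unavailable" only says that no bytes had arrived yet; the thread then sits in the wait until the peer sends something, closes, or the timeout runs out.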
I wrote a very simple spider program to fetch webpages from a single site.
Here is a minimized version.
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
from twisted.web.client import Agent, HTTPConnectionPool, readBody

baseUrl = 'http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode='
start = 1001
end = 3500

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 10
agent = Agent(reactor, pool=pool)

def onHeader(response, i):
    deferred = readBody(response)
    deferred.addCallback(onBody, i)
    deferred.addErrback(errorHandler)
    return response

def onBody(body, i):
    print('Received %s, Length %s' % (i, len(body)))

def errorHandler(err):
    print('%s : %s' % (reactor.seconds() - startTimeStamp, err))

def requestFactory():
    for i in range(start, end):
        deferred = agent.request('GET', baseUrl + str(i))
        deferred.addCallback(onHeader, i)
        deferred.addErrback(errorHandler)
        print('Generated %s' % i)
        reactor.iterate(1)
    print('All requests generated, elapsed %s' % (reactor.seconds() - startTimeStamp))

startTimeStamp = reactor.seconds()
reactor.callWhenRunning(requestFactory)
reactor.run()
For a few requests, like 100, it works fine. But for massive numbers of requests, it fails.
I expected all of the requests (around 3000) to be automatically pooled, scheduled and pipelined, since I use HTTPConnectionPool, set maxPersistentPerHost, create an Agent instance with it, and create the connections incrementally.
But that is not what happens: the connections are neither kept alive nor pooled.
In this program, the connections are indeed established incrementally, but they are not pooled. Each connection closes after its body is received, and later requests never wait in the pool for an available connection.
So it takes thousands of sockets and finally fails due to a timeout, because the remote server has a connection timeout of 30 s. Thousands of requests can't be completed within 30 s.
Could you please give me some help on this?
I have tried my best on this; here are my findings.
The error occurs exactly 30 s after the reactor starts running and is not influenced by anything else.
Pointing the spider at my own server, I found something interesting.
The HTTP protocol version is 1.1 (I checked the Twisted documentation; the default HTTPClient is 1.0 rather than 1.1).
If I don't add any explicit header (as in the minimized version), the request header doesn't contain Connection: Keep-Alive, and neither does the response header.
If I add an explicit header to ensure it is a keep-alive connection, the request header does contain Connection: Keep-Alive, but the response header still does not. (I am sure my server behaves correctly; other clients such as Chrome and wget do receive the Connection: Keep-Alive header.)
I checked /proc/net/sockstat while it was running; the socket count increases rapidly at first and decreases rapidly later. (I have increased the ulimit to support plenty of sockets.)
I wrote a similar program with treq (a Twisted-based request library). The code is almost the same, so I won't paste it here.
Link: https://gist.github.com/Preffer/dad9b1228fcd75cebd75
Its behavior is almost the same: no pooling. It is expected to pool, as described in treq's feature list.
Even if I add an explicit header, Connection: Keep-Alive never appears in the response header.
Based on all of the above, I strongly suspect that the quirky Connection: Keep-Alive header ruins the program. But this header is part of the HTTP 1.1 standard, and the connection does report as HTTP 1.1. I am completely puzzled by this.
I solved the problem myself, with help from IRC and another question on Stack Overflow, "Queue remote calls to a Python Twisted perspective broker?".
In summary, the agent's behavior is very different from that in Node.js (I have some experience with Node.js). As described in the Node.js docs:
agent.requests
An object which contains queues of requests that have not yet been assigned to sockets.
agent.maxSockets
By default set to 5. Determines how many concurrent sockets the agent can have open per origin. Origin is either a 'host:port' or 'host:port:localAddress' combination.
So, here is the difference.
Twisted:
There is no doubt that Agent could queue requests if constructed with an HTTPConnectionPool instance.
But if a new request is issued after the connections in the pool have run out, the agent still creates a new connection and performs the request, rather than putting it in a queue.
In fact, this causes a connection in the pool to be dropped and the newly created connection to be pushed into the pool, keeping the connection count equal to maxPersistentPerHost.
Node.js:
By default, the agent queues requests with an implicit connection pool, which has a size of 5 connections.
If a new request is issued after the connections in the pool have run out, the agent puts the request into the agent.requests queue, where it waits for an available connection.
The upshot is that the agent in Twisted is able to queue requests, but in practice it doesn't.
Intuitively, once a connection pool is assigned to an agent, one would expect the agent to use only the connections in the pool and to wait for an available connection when the pool has run out. That is exactly how the agent in Node.js behaves.
Personally, I think this is buggy behavior in Twisted, or at least it could be improved by providing an option to set the agent's behavior.
Because of this, I have to use DeferredSemaphore to schedule the requests manually.
I raised an issue with the treq project on GitHub and got a similar solution: https://github.com/dreid/treq/issues/71
Here is my solution.
#!/usr/bin/env python
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
from twisted.web.client import Agent, HTTPConnectionPool, readBody
from twisted.internet.defer import DeferredSemaphore

baseUrl = 'http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode='
start = 1001
end = 3500
count = end - start
concurrency = 10

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = concurrency
agent = Agent(reactor, pool=pool)
sem = DeferredSemaphore(concurrency)
done = 0

def onHeader(response, i):
    deferred = readBody(response)
    deferred.addCallback(onBody, i)
    deferred.addErrback(errorHandler, i)
    return deferred

def onBody(body, i):
    sem.release()
    global done, count
    done += 1
    print('Received %s, Length %s, Done %s' % (i, len(body), done))
    if(done == count):
        print('All items fetched')
        reactor.stop()

def errorHandler(err, i):
    print('[%s] id %s: %s' % (reactor.seconds() - startTimeStamp, i, err))

def requestFactory(token, i):
    deferred = agent.request('GET', baseUrl + str(i))
    deferred.addCallback(onHeader, i)
    deferred.addErrback(errorHandler, i)
    print('Request send %s' % i)
    # this function is itself a callback fired by the reactor, so there is no need to iterate manually
    # reactor.iterate(1)
    return deferred

def assign():
    for i in range(start, end):
        sem.acquire().addCallback(requestFactory, i)

startTimeStamp = reactor.seconds()
reactor.callWhenRunning(assign)
reactor.run()
Is it right? Please point out any errors and possible improvements.
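One thing to watch in a scheme like this is that a slot should be released when a request fails as well as when it succeeds; otherwise every failure permanently shrinks the available concurrency. The sketch below shows the idea using the standard library's asyncio.Semaphore rather than Twisted's DeferredSemaphore, so it is a swapped-in technique and not a drop-in change to the code above. fetch_all and fake_fetch are hypothetical names, and fake_fetch merely simulates a request that sometimes fails.

```python
import asyncio


async def fetch_all(ids, concurrency, fetch):
    """Run fetch(i) for every id with at most `concurrency` requests in flight."""
    sem = asyncio.Semaphore(concurrency)
    results, errors = {}, {}

    async def worker(i):
        # "async with" releases the slot on success and on failure alike
        async with sem:
            try:
                results[i] = await fetch(i)
            except Exception as exc:
                errors[i] = exc

    await asyncio.gather(*(worker(i) for i in ids))
    return results, errors


async def fake_fetch(i):
    # Hypothetical stand-in for an HTTP request; tracks how many run at once
    fake_fetch.active += 1
    fake_fetch.peak = max(fake_fetch.peak, fake_fetch.active)
    try:
        await asyncio.sleep(0.001)
        if i % 7 == 0:
            raise IOError("simulated failure for id %s" % i)
        return i * 2
    finally:
        fake_fetch.active -= 1


fake_fetch.active = 0
fake_fetch.peak = 0

results, errors = asyncio.run(fetch_all(range(1, 51), 10, fake_fetch))
print(len(results), "succeeded,", len(errors), "failed, peak concurrency", fake_fetch.peak)
```

Counting completed requests in the same place, for both outcomes, would also let a "stop when done" check fire even when some requests fail.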
For a few requests, like 100, it works fine. But for massive requests,
it will failed.
This is either a protection against web crawlers or a server protection against DoS/DDoS: because you are sending too many requests from the same IP in a short time, the firewall or the WSA will block your further requests. Just modify your script to make requests in batches spaced out over time. You can use callLater() with some delay after every X requests.
I am trying to implement an IRC Bot on a local server. The bot that I am using is identical to the one found at Eric Florenzano's Blog. This is the simplified code (which should run)
import sys
import re
from twisted.internet import reactor
from twisted.words.protocols import irc
from twisted.internet import protocol

class MomBot(irc.IRCClient):
    def _get_nickname(self):
        return self.factory.nickname
    nickname = property(_get_nickname)

    def signedOn(self):
        print "attempting to sign on"
        self.join(self.factory.channel)
        print "Signed on as %s." % (self.nickname,)

    def joined(self, channel):
        print "attempting to join"
        print "Joined %s." % (channel,)

    def privmsg(self, user, channel, msg):
        if not user:
            return
        if self.nickname in msg:
            msg = re.compile(self.nickname + "[:,]* ?", re.I).sub('', msg)
            prefix = "%s: " % (user.split('!', 1)[0], )
        else:
            prefix = ''
        self.msg(self.factory.channel, prefix + "hello there")

class MomBotFactory(protocol.ClientFactory):
    protocol = MomBot

    def __init__(self, channel, nickname='YourMomDotCom', chain_length=3,
                 chattiness=1.0, max_words=10000):
        self.channel = channel
        self.nickname = nickname
        self.chain_length = chain_length
        self.chattiness = chattiness
        self.max_words = max_words

    def startedConnecting(self, connector):
        print "started connecting on {0}:{1}".format(str(connector.host), str(connector.port))

    def clientConnectionLost(self, connector, reason):
        print "Lost connection (%s), reconnecting." % (reason,)
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        print "Could not connect: %s" % (reason,)

if __name__ == "__main__":
    chan = sys.argv[1]
    reactor.connectTCP("localhost", 6667, MomBotFactory('#' + chan,
        'YourMomDotCom', 2, chattiness=0.05))
    reactor.run()
I added the startedConnecting method to the client factory; the bot reaches it and prints the proper host:port. It then disconnects, enters clientConnectionLost and prints the error:
Lost connection ([Failure instance: Traceback (failure with no frames):
<class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]), reconnecting.
If working properly, it should log into the appropriate channel, specified as the first argument on the command line (e.g. python module2.py botwar would join channel #botwar). It should respond with "hello there" if anyone in the channel sends anything.
I have NGIRC running on the server, and it works if I connect from mIRC or any other IRC client.
I am unable to work out why it is continually disconnecting. Any help would be greatly appreciated. Thank you!
One thing you may want to do is make sure you will see any error output produced by the server when your bot connects to it. My hunch is that the problem has something to do with authentication, or perhaps an unexpected difference in how ngirc handles one of the login/authentication commands used by IRCClient.
One approach that almost always applies is to capture a traffic log. Use a tool like tcpdump or wireshark.
Another approach you can try is to enable logging inside the Twisted application itself. Use twisted.protocols.policies.TrafficLoggingFactory for this:
from twisted.protocols.policies import TrafficLoggingFactory
appFactory = MomBotFactory(...)
logFactory = TrafficLoggingFactory(appFactory, "irc-")
reactor.connectTCP(..., logFactory)
This will log output to files starting with "irc-" (a different file for each connection).
You can also hook directly into your protocol implementation, at any one of several levels. For example, to display any bytes received at all:
class MomBot(irc.IRCClient):
    def dataReceived(self, bytes):
        print "Got", repr(bytes)
        # Make sure to up-call - otherwise all of the IRC logic is disabled!
        return irc.IRCClient.dataReceived(self, bytes)
With one of those approaches in place, hopefully you'll see something like:
:irc.example.net 451 * :Connection not registered
which I think means... you need to authenticate? Even if you see something else, hopefully this will help you narrow in more closely on the precise cause of the connection being closed.
Also, you can use tcpdump or wireshark to capture the traffic log between ngirc and one of the working IRC clients (eg mIRC) and then compare the two logs. Whatever different commands mIRC is sending should make it clear what changes you need to make to your bot.
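Once both logs are reduced to the lines each client sent, the comparison itself can be done with the standard library's difflib. This is just a sketch of that step: compare_logs is a made-up helper, and the log lines below are invented for illustration only, not captured from ngirc or mIRC.

```python
import difflib


def compare_logs(bot_lines, reference_lines):
    """Return unified-diff lines showing how the reference log differs from the bot's."""
    return list(difflib.unified_diff(bot_lines, reference_lines,
                                     fromfile="bot", tofile="reference",
                                     lineterm=""))


# Hypothetical client-to-server lines, invented for illustration only
bot_log = ["NICK bot", "USER bot 0 * :bot"]
reference_log = ["PASS secret", "NICK bot", "USER bot 0 * :bot"]

diff = compare_logs(bot_log, reference_log)
for line in diff:
    print(line)
```

Lines prefixed with "+" appear only in the working client's log and are the first candidates for what the bot is not sending.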