postgres LockError... how to investigate - locking

Hi, I am using gunicorn with nginx and a PostgreSQL database to run my web app. I recently changed my gunicorn command from
gunicorn run:app -w 4 -b 0.0.0.0:8080 --workers=1 --timeout=300
to
gunicorn run:app -w 4 -b 0.0.0.0:8080 --workers=2 --timeout=300
using 2 workers. Now I am getting error messages like
File "/usr/local/lib/python2.7/dist-packages/flask_sqlalchemy/__init__.py", line 194, in session_signal_after_commit
models_committed.send(session.app, changes=list(d.values()))
File "/usr/local/lib/python2.7/dist-packages/blinker/base.py", line 267, in send
for receiver in self.receivers_for(sender)]
File "/usr/local/lib/python2.7/dist-packages/flask_whooshalchemy.py", line 265, in _after_flush
with index.writer() as writer:
File "/usr/local/lib/python2.7/dist-packages/whoosh/index.py", line 464, in writer
return SegmentWriter(self, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/whoosh/writing.py", line 502, in __init__
raise LockError
LockError
I can't really do much with these error messages, but they seem to be linked to the Whoosh search that I have on the User table in my database model:
import sys
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask.ext.whooshalchemy as whooshalchemy

class User(db.Model):
    __searchable__ = ['username', 'email', 'position', 'institute', 'id']  # these fields will be indexed by whoosh
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(100), index=True)
    ...

    def __repr__(self):
        return '<User %r>' % (self.username)

if enable_search:
    whooshalchemy.whoosh_index(app, User)
Any ideas how to investigate this? I thought Postgres allows parallel access, so lock errors should not happen. They did not happen when I used only 1 worker, so it is definitely caused by having multiple workers...
any help is appreciated
thanks
carl

This has nothing to do with PostgreSQL. Whoosh holds file locks for writing and it's failing on the last line of this code...
class SegmentWriter(IndexWriter):
    def __init__(self, ix, poolclass=None, timeout=0.0, delay=0.1, _lk=True,
                 limitmb=128, docbase=0, codec=None, compound=True, **kwargs):
        # Lock the index
        self.writelock = None
        if _lk:
            self.writelock = ix.lock("WRITELOCK")
            if not try_for(self.writelock.acquire, timeout=timeout,
                           delay=delay):
                raise LockError
Note that the timeout here defaults to 0.0 seconds (with a retry delay of 0.1 seconds), so if the writer cannot acquire the lock essentially immediately, it raises LockError. You increased your workers, so now you have contention on the lock. From the following docs...
https://whoosh.readthedocs.org/en/latest/threads.html
Locking
Only one thread/process can write to an index at a time. When
you open a writer, it locks the index. If you try to open a writer on
the same index in another thread/process, it will raise
whoosh.store.LockError.
In a multi-threaded or multi-process environment your code needs to be
aware that opening a writer may raise this exception if a writer is
already open. Whoosh includes a couple of example implementations
(whoosh.writing.AsyncWriter and whoosh.writing.BufferedWriter) of ways
to work around the write lock.
While the writer is open and during the commit, the index is still
available for reading. Existing readers are unaffected and new readers
can open the current index normally.
You can find examples of how to use Whoosh concurrently in the API docs:
Buffered
https://whoosh.readthedocs.org/en/latest/api/writing.html#whoosh.writing.BufferedWriter
Async
https://whoosh.readthedocs.org/en/latest/api/writing.html#whoosh.writing.AsyncWriter
I'd try the buffered version first since batching writes is almost always faster.
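For reference, here is a minimal sketch of what the Whoosh-level workaround looks like, assuming an index directory created by Flask-WhooshAlchemy under your WHOOSH_BASE (the path and field values below are illustrative). Note that Flask-WhooshAlchemy builds its writer internally in _after_flush, so to use AsyncWriter or BufferedWriter in your app you would have to patch or replace that handler.

from whoosh.index import open_dir
from whoosh.writing import AsyncWriter

# Open the existing index for the User model (path is illustrative).
ix = open_dir("whoosh_index/User")

# AsyncWriter tries to take the write lock; if it cannot, it queues the
# changes and commits them from a background thread instead of raising
# LockError.
writer = AsyncWriter(ix)
writer.update_document(id=u"42", username=u"carl")
writer.commit()

BufferedWriter looks the same from the caller's point of view, but it buffers documents in memory and commits them periodically, which is why batching tends to be faster.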

Related

Python BigQuery Storage Write retry strategy when writing to default stream

I'm testing python-bigquery-storage to insert multiple items into a table using the _default stream.
I used the example shown in the official docs as a basis, and modified it to use the default stream.
Here is a minimal example that's similar to what I'm trying to do:
customer_record.proto
syntax = "proto2";

message CustomerRecord {
    optional string customer_name = 1;
    optional int64 row_num = 2;
}
append_rows_default.py
from itertools import islice

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types
from google.cloud.bigquery_storage_v1 import writer
from google.protobuf import descriptor_pb2

import customer_record_pb2

import logging
logging.basicConfig(level=logging.DEBUG)

CHUNK_SIZE = 2  # Maximum number of rows to use in each AppendRowsRequest.

def chunks(l, n):
    """Yield successive `n`-sized chunks from `l`."""
    _it = iter(l)
    while True:
        chunk = [*islice(_it, 0, n)]
        if chunk:
            yield chunk
        else:
            break

def create_stream_manager(project_id, dataset_id, table_id, write_client):
    # Use the default stream
    # The stream name is:
    # projects/{project}/datasets/{dataset}/tables/{table}/_default
    parent = write_client.table_path(project_id, dataset_id, table_id)
    stream_name = f'{parent}/_default'

    # Create a template with fields needed for the first request.
    request_template = types.AppendRowsRequest()

    # The initial request must contain the stream name.
    request_template.write_stream = stream_name

    # So that BigQuery knows how to parse the serialized_rows, generate a
    # protocol buffer representation of our message descriptor.
    proto_schema = types.ProtoSchema()
    proto_descriptor = descriptor_pb2.DescriptorProto()
    customer_record_pb2.CustomerRecord.DESCRIPTOR.CopyToProto(proto_descriptor)
    proto_schema.proto_descriptor = proto_descriptor
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.writer_schema = proto_schema
    request_template.proto_rows = proto_data

    # Create an AppendRowsStream using the request template created above.
    append_rows_stream = writer.AppendRowsStream(write_client, request_template)

    return append_rows_stream

def send_rows_to_bq(project_id, dataset_id, table_id, write_client, rows):
    append_rows_stream = create_stream_manager(project_id, dataset_id, table_id, write_client)
    response_futures = []
    row_count = 0

    # Send the rows in chunks, to limit memory usage.
    for chunk in chunks(rows, CHUNK_SIZE):
        proto_rows = types.ProtoRows()
        for row in chunk:
            row_count += 1
            proto_rows.serialized_rows.append(row.SerializeToString())

        # Create an append row request containing the rows
        request = types.AppendRowsRequest()
        proto_data = types.AppendRowsRequest.ProtoData()
        proto_data.rows = proto_rows
        request.proto_rows = proto_data

        future = append_rows_stream.send(request)
        response_futures.append(future)

    # Wait for all the append row requests to finish.
    for f in response_futures:
        f.result()

    # Shutdown background threads and close the streaming connection.
    append_rows_stream.close()

    return row_count

def create_row(row_num: int, name: str):
    row = customer_record_pb2.CustomerRecord()
    row.row_num = row_num
    row.customer_name = name
    return row

def main():
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    rows = [create_row(i, f"Test{i}") for i in range(0, 20)]
    send_rows_to_bq("PROJECT_NAME", "DATASET_NAME", "TABLE_NAME", write_client, rows)

if __name__ == '__main__':
    main()
Note:
In the above, CHUNK_SIZE is 2 just for this minimal example, but, in a real situation, I used a chunk size of 5000.
In real usage, I have several separate streams of data that need to be processed in parallel, so I make several calls to send_rows_to_bq, one for each stream of data, using a thread pool (one thread per stream of data). (I'm assuming here that AppendRowsStream is not meant to be shared by multiple threads, but I might be wrong).
It mostly works, but I often get a mix of intermittent errors in the call to append_rows_stream's send method:
google.cloud.bigquery_storage_v1.exceptions.StreamClosedError: This manager has been closed and can not be used.
google.api_core.exceptions.Unknown: None There was a problem opening the stream. Try turning on DEBUG level logs to see the error.
I think I just need to retry on these errors, but I'm not sure how to best implement a retry strategy here. My impression is that I need to use the following strategy to retry errors when calling send:
If the error is a StreamClosedError, the append_rows_stream stream manager can't be used anymore, and so I need to call close on it and then call my create_stream_manager again to create a new one, then try to call send on the new stream manager.
Otherwise, on any google.api_core.exceptions.ServerError error, retry the call to send on the same stream manager.
Am I approaching this correctly?
Thank you.
The best solution to this problem is to update to a newer release of the library.
The problem happens (or was happening) in older versions because once the write API connection reaches 10 MB, it hangs.
If updating the library does not help, you can try these options:
Limit each connection to < 10 MB.
Disconnect and reconnect to the API.
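If you do end up retrying around send as described in the question, a rough sketch of the disconnect-and-reconnect option might look like the following (untested; recreate_stream and MAX_RETRIES are illustrative names, and only the send/close calls and exception types already shown above are used):

from google.api_core.exceptions import ServerError
from google.cloud.bigquery_storage_v1.exceptions import StreamClosedError

MAX_RETRIES = 3  # illustrative

def send_with_retry(append_rows_stream, request, recreate_stream):
    # recreate_stream is assumed to be a zero-argument callable that wraps
    # create_stream_manager() from the question and returns a fresh
    # AppendRowsStream.
    for attempt in range(MAX_RETRIES):
        try:
            return append_rows_stream.send(request), append_rows_stream
        except StreamClosedError:
            # This manager cannot be reused: close it and build a new one.
            append_rows_stream.close()
            append_rows_stream = recreate_stream()
        except ServerError:
            # Transient server-side error: retry on the same stream manager.
            continue
    raise RuntimeError("send failed after %d attempts" % MAX_RETRIES)

The caller would keep the returned stream manager for subsequent sends, since it may have been replaced along the way.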

Twisted deferreds block when URI is the same (multiple calls from the same browser)

I have the following code
# -*- coding: utf-8 -*-
# 好
##########################################
import time
from twisted.internet import reactor, threads
from twisted.web.server import Site, NOT_DONE_YET
from twisted.web.resource import Resource
##########################################
class Website(Resource):
def getChild(self, name, request):
return self
def render(self, request):
if request.path == "/sleep":
duration = 3
if 'duration' in request.args:
duration = int(request.args['duration'][0])
message = 'no message'
if 'message' in request.args:
message = request.args['message'][0]
#-------------------------------------
def deferred_activity():
print 'starting to wait', message
time.sleep(duration)
request.setHeader('Content-Type', 'text/plain; charset=UTF-8')
request.write(message)
print 'finished', message
request.finish()
#-------------------------------------
def responseFailed(err, deferred):
pass; print err.getErrorMessage()
deferred.cancel()
#-------------------------------------
def deferredFailed(err, deferred):
pass; # print err.getErrorMessage()
#-------------------------------------
deferred = threads.deferToThread(deferred_activity)
deferred.addErrback(deferredFailed, deferred) # will get called indirectly by responseFailed
request.notifyFinish().addErrback(responseFailed, deferred) # to handle client disconnects
#-------------------------------------
return NOT_DONE_YET
else:
return 'nothing at', request.path
##########################################
reactor.listenTCP(321, Site(Website()))
print 'starting to serve'
reactor.run()
##########################################
# http://localhost:321/sleep?duration=3&message=test1
# http://localhost:321/sleep?duration=3&message=test2
##########################################
My issue is the following:
When I open two tabs in the browser, point one at http://localhost:321/sleep?duration=3&message=test1 and the other at http://localhost:321/sleep?duration=3&message=test2 (the messages differ), reload the first tab and then, as soon as possible, the second one, they finish almost at the same time: the first tab about 3 seconds after hitting F5, the second about half a second after the first.
This is expected, as each request got deferred into a thread, and they are sleeping in parallel.
But when I now change the URL of the second tab to be the same as the one of the first tab, that is to http://localhost:321/sleep?duration=3&message=test1, then all this becomes blocking. If I press F5 on the first tab and as quickly as possible F5 on the second one, the second tab finishes about 3 seconds after the first one. They don't get executed in parallel.
As long as the entire URI is the same in both tabs, this server starts to block. This is the same in Firefox as well as in Chrome. But when I start one in Chrome and another one in Firefox at the same time, then it is non-blocking again.
So it may not necessarily be related to Twisted, but may instead be caused by some kind of connection reuse.
Anyone knows what is happening here and how I can solve this issue?
Coincidentally, someone asked a related question over at the Tornado section. As you suspected, this is not an "issue" in Twisted but rather a "feature" of web browsers :). Tornado's FAQ page has a small section dedicated to this issue. The proposed solution is appending an arbitrary query string.
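For example, giving each tab its own throwaway query parameter is enough to stop the browser from serializing the two requests (the parameter name and value are arbitrary):
http://localhost:321/sleep?duration=3&message=test1&nocache=1
http://localhost:321/sleep?duration=3&message=test1&nocache=2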
Quote of the day:
One dev's bug is another dev's undocumented feature!

Is this model method locking up my PostgreSQL database?

My application is crashing really, really hard, and it appears to be related to the database. The application deals with lots and lots of data, and hundreds of simultaneous users. In an effort to speed up data loads, I am loading some records like this:
def load(filename)
  rc = Publication.connection.raw_connection
  rc.exec("COPY invoice_line_items FROM STDIN WITH CSV HEADER")
  # open up your CSV file looping through line by line and getting the line into a format suitable for pg's COPY...
  error = false
  begin
    CSV.foreach(filename) do |line|
      until rc.put_copy_data( line.to_csv )
        ErrorPrinter.print " waiting for connection to be writable..."
        sleep 0.1
      end
    end
  rescue Errno => err
    User.inform_admin(false, User.me, "Line Item import failed with #{err.class.name} the following error: #{err.message}", err.backtrace)
    error = true
  else
    rc.put_copy_end
    while res = rc.get_result
      if (res.result_status != 1)
        User.inform_admin(false, User.me, "Line Item import result of COPY was: %s" % [ res.res_status(res.result_status) ], "")
        error = true
      end
    end
  end
end
I also have Sidekiq running with about 90 threads. Does this method of loading put an exclusive lock on that table? Is it possible that these jobs are running into each other? If they are, am I better off just doing inserts?
COPY takes the same level of lock as INSERT. (It's missing from the explicit locking chapter, but visible in the source code). So whatever's giving you trouble, it's probably not that.
You should be looking at pg_locks and pg_stat_activity to see if anything is stuck waiting on a lock. You can find more information in other questions on SO or DBA.SE, the manual, and the PostgreSQL wiki.
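For example, on PostgreSQL 9.6+ a quick way to see who is blocked and by whom is pg_blocking_pids(); here is a minimal sketch, shown with psycopg2 purely for illustration (the same SQL works from psql or a Rails console, the connection string is a placeholder, and on older versions you would use the lock-monitoring query from the PostgreSQL wiki instead):

import psycopg2

conn = psycopg2.connect("dbname=myapp")  # placeholder connection string
with conn.cursor() as cur:
    cur.execute("""
        SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
        FROM pg_stat_activity
        WHERE cardinality(pg_blocking_pids(pid)) > 0
    """)
    for pid, blocked_by, state, query in cur.fetchall():
        print("pid %s (%s) is blocked by %s: %s" % (pid, state, blocked_by, query))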

Twisted IRC Bot connection lost repeatedly to localhost

I am trying to implement an IRC Bot on a local server. The bot that I am using is identical to the one found at Eric Florenzano's Blog. This is the simplified code (which should run)
import sys
import re
from twisted.internet import reactor
from twisted.words.protocols import irc
from twisted.internet import protocol

class MomBot(irc.IRCClient):
    def _get_nickname(self):
        return self.factory.nickname
    nickname = property(_get_nickname)

    def signedOn(self):
        print "attempting to sign on"
        self.join(self.factory.channel)
        print "Signed on as %s." % (self.nickname,)

    def joined(self, channel):
        print "attempting to join"
        print "Joined %s." % (channel,)

    def privmsg(self, user, channel, msg):
        if not user:
            return
        if self.nickname in msg:
            msg = re.compile(self.nickname + "[:,]* ?", re.I).sub('', msg)
            prefix = "%s: " % (user.split('!', 1)[0], )
        else:
            prefix = ''
        self.msg(self.factory.channel, prefix + "hello there")

class MomBotFactory(protocol.ClientFactory):
    protocol = MomBot

    def __init__(self, channel, nickname='YourMomDotCom', chain_length=3,
                 chattiness=1.0, max_words=10000):
        self.channel = channel
        self.nickname = nickname
        self.chain_length = chain_length
        self.chattiness = chattiness
        self.max_words = max_words

    def startedConnecting(self, connector):
        print "started connecting on {0}:{1}".format(str(connector.host), str(connector.port))

    def clientConnectionLost(self, connector, reason):
        print "Lost connection (%s), reconnecting." % (reason,)
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        print "Could not connect: %s" % (reason,)

if __name__ == "__main__":
    chan = sys.argv[1]
    reactor.connectTCP("localhost", 6667, MomBotFactory('#' + chan,
        'YourMomDotCom', 2, chattiness=0.05))
    reactor.run()
I added the startedConnecting method in the client factory, and it is reached and prints out the proper host:port. The bot then disconnects, enters clientConnectionLost, and prints the error:
Lost connection ([Failure instance: Traceback (failure with no frames):
<class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]), reconnecting.
If working properly, it should log into the appropriate channel, specified as the first argument on the command line (e.g. python module2.py botwar would be channel #botwar). It should respond with "hello there" if anyone in the channel sends anything.
I have NGIRC running on the server, and it works if I connect from mIRC or any other IRC client.
I am unable to find a resolution as to why it is continually disconnecting. Any help on why would be greatly appreciated. Thank you!
One thing you may want to do is make sure you will see any error output produced by the server when your bot connects to it. My hunch is that the problem has something to do with authentication, or perhaps an unexpected difference in how ngirc handles one of the login/authentication commands used by IRCClient.
One approach that almost always applies is to capture a traffic log. Use a tool like tcpdump or wireshark.
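For example, if the bot and ngIRCd are both on this machine, something like the following captures the loopback traffic on the IRC port into a file you can open in wireshark later (the interface name and output filename are illustrative):
tcpdump -i lo -s 0 -w irc.pcap port 6667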
Another approach you can try is to enable logging inside the Twisted application itself. Use twisted.protocols.policies.TrafficLoggingFactory for this:
from twisted.protocols.policies import TrafficLoggingFactory
appFactory = MomBotFactory(...)
logFactory = TrafficLoggingFactory(appFactory, "irc-")
reactor.connectTCP(..., logFactory)
This will log output to files starting with "irc-" (a different file for each connection).
You can also hook directly into your protocol implementation, at any one of several levels. For example, to display any bytes received at all:
class MomBot(irc.IRCClient):
    def dataReceived(self, bytes):
        print "Got", repr(bytes)
        # Make sure to up-call - otherwise all of the IRC logic is disabled!
        return irc.IRCClient.dataReceived(self, bytes)
With one of those approaches in place, hopefully you'll see something like:
:irc.example.net 451 * :Connection not registered
which I think means... you need to authenticate? Even if you see something else, hopefully this will help you narrow in more closely on the precise cause of the connection being closed.
Also, you can use tcpdump or wireshark to capture the traffic log between ngirc and one of the working IRC clients (eg mIRC) and then compare the two logs. Whatever different commands mIRC is sending should make it clear what changes you need to make to your bot.

Celery: Task Singleton?

I have a task that I need to run asynchronously from the web page that triggered it. This task runs rather long, and as the web page could be getting a lot of these requests, I'd like celery to only run one instance of this task at a given time.
Is there any way I can do this in Celery natively? I'm tempted to create a database table that holds this state for all the tasks to communicate with, but it feels hacky.
You can probably create a dedicated worker for that task, configured with CELERYD_CONCURRENCY=1; all tasks sent to that worker will then run one at a time.
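A minimal sketch of that setup, using old-style (pre-4.0) setting names to match CELERYD_CONCURRENCY above; the task path and queue name are illustrative:

# celeryconfig.py (or your Django settings): route the long-running task
# to its own queue so only the dedicated worker picks it up.
CELERY_ROUTES = {
    'myapp.tasks.long_task': {'queue': 'singleton'},
}
# Start one worker that consumes only that queue, one task at a time:
#   celery -A myapp worker -Q singleton --concurrency=1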
You can use memcache/redis for that.
There is an example on the celery official site - http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html
And if you prefer Redis (this is a Django implementation, but you can easily modify it for your needs):
from celery import Task
from django.core.cache import cache
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

class SingletonTask(Task):
    def __call__(self, *args, **kwargs):
        # cache.lock requires a cache backend that supports locks (e.g. django-redis).
        lock = cache.lock(self.name)
        if not lock.acquire(blocking=False):
            logger.info("{} failed to lock".format(self.name))
            return
        try:
            super(SingletonTask, self).__call__(*args, **kwargs)
        except Exception as e:
            lock.release()
            raise e
        lock.release()
And then use it as a base task:
from celery import shared_task

@shared_task(base=SingletonTask)
def test_task():
    from time import sleep
    sleep(10)
This implementation is nonblocking. If you want the next task to wait for the previous one, change blocking=False to blocking=True and add a timeout.