My application is crashing really, really hard, and it appears to be related to the database. The application deals with lots and lots of data, and hundreds of simultaneous users. In an effort to speed up data loads, I am loading some records like this:
def load(filename)
  rc = Publication.connection.raw_connection
  rc.exec("COPY invoice_line_items FROM STDIN WITH CSV HEADER")
  # open up your CSV file, looping through line by line and getting each
  # line into a format suitable for pg's COPY...
  error = false
  begin
    CSV.foreach(filename) do |line|
      until rc.put_copy_data(line.to_csv)
        ErrorPrinter.print " waiting for connection to be writable..."
        sleep 0.1
      end
    end
  rescue SystemCallError, PG::Error => err   # Errno is a namespace module, not a rescuable class
    # abort the COPY so the connection is left in a usable state
    rc.put_copy_end("aborted: #{err.message}")
    while rc.get_result; end  # drain any pending results
    User.inform_admin(false, User.me, "Line Item import failed with #{err.class.name} the following error: #{err.message}", err.backtrace)
    error = true
  else
    rc.put_copy_end
    while res = rc.get_result
      if res.result_status != PG::PGRES_COMMAND_OK
        User.inform_admin(false, User.me, "Line Item import result of COPY was: %s" % [res.res_status(res.result_status)], "")
        error = true
      end
    end
  end
end
I also have Sidekiq running with about 90 threads. Does this method of loading put an exclusive lock on that table? Is it possible that these jobs are running into each other? If they are, am I better off just doing inserts?
COPY takes the same level of lock as INSERT, namely a ROW EXCLUSIVE lock on the table. (It's missing from the explicit locking chapter of the docs, but it's visible in the source code.) So whatever's giving you trouble, it's probably not that.
You should be looking at pg_locks and pg_stat_activity to see if anything's stuck waiting on a lock. There's more info in other questions on SO and DBA.SE, in the manual, and on the PostgreSQL wiki.
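If you want to see at a glance whether anything is actually stuck, a query against those two views will show it. Here's a minimal diagnostic sketch (standalone Python with psycopg2, nothing to do with your Rails app; the DSN is a placeholder and the column names assume PostgreSQL 9.2+):
import psycopg2

# List every backend waiting on a lock it hasn't been granted,
# along with the query it is trying to run.
BLOCKED_SQL = """
SELECT a.pid, l.locktype, l.mode, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
"""

def show_blocked(dsn="dbname=mydb"):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(BLOCKED_SQL)
            for pid, locktype, mode, query in cur.fetchall():
                print("pid %s waiting for %s (%s): %s" % (pid, locktype, mode, query))

if __name__ == "__main__":
    show_blocked()
If that returns no rows while the application is struggling, the problem is somewhere other than lock contention.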
Hi, I am using gunicorn with nginx and a PostgreSQL database to run my web app. I recently changed my gunicorn command from
gunicorn run:app -w 4 -b 0.0.0.0:8080 --workers=1 --timeout=300
to
gunicorn run:app -w 4 -b 0.0.0.0:8080 --workers=2 --timeout=300
using 2 workers. Now I am getting error messages like
File "/usr/local/lib/python2.7/dist-packages/flask_sqlalchemy/__init__.py", line 194, in session_signal_after_commit
models_committed.send(session.app, changes=list(d.values()))
File "/usr/local/lib/python2.7/dist-packages/blinker/base.py", line 267, in send
for receiver in self.receivers_for(sender)]
File "/usr/local/lib/python2.7/dist-packages/flask_whooshalchemy.py", line 265, in _after_flush
with index.writer() as writer:
File "/usr/local/lib/python2.7/dist-packages/whoosh/index.py", line 464, in writer
return SegmentWriter(self, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/whoosh/writing.py", line 502, in __init__
raise LockError
LockError
I can't really do much with these error messages, but they seem to be linked to the Whoosh search I have on the User table in my database model:
import sys
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask.ext.whooshalchemy as whooshalchemy

class User(db.Model):
    __searchable__ = ['username', 'email', 'position', 'institute', 'id']  # these fields will be indexed by whoosh
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(100), index=True)
    ...

    def __repr__(self):
        return '<User %r>' % (self.username)

if enable_search:
    whooshalchemy.whoosh_index(app, User)
Any ideas how to investigate this? I thought Postgres allows parallel access, so I assumed lock errors should not happen. When I used only one worker they did not happen, so it is definitely caused by having multiple workers...
Any help is appreciated.
Thanks,
carl
This has nothing to do with PostgreSQL. Whoosh holds file locks for writing and it's failing on the last line of this code...
class SegmentWriter(IndexWriter):
    def __init__(self, ix, poolclass=None, timeout=0.0, delay=0.1, _lk=True,
                 limitmb=128, docbase=0, codec=None, compound=True, **kwargs):
        # Lock the index
        self.writelock = None
        if _lk:
            self.writelock = ix.lock("WRITELOCK")
            if not try_for(self.writelock.acquire, timeout=timeout,
                           delay=delay):
                raise LockError
Note the defaults here: timeout=0.0 with a retry delay of 0.1 seconds, so if the writer cannot acquire the lock more or less immediately, it fails. You increased your workers, so now you have contention on the lock. From the following docs...
https://whoosh.readthedocs.org/en/latest/threads.html
Locking
Only one thread/process can write to an index at a time. When
you open a writer, it locks the index. If you try to open a writer on
the same index in another thread/process, it will raise
whoosh.store.LockError.
In a multi-threaded or multi-process environment your code needs to be
aware that opening a writer may raise this exception if a writer is
already open. Whoosh includes a couple of example implementations
(whoosh.writing.AsyncWriter and whoosh.writing.BufferedWriter) of ways
to work around the write lock.
While the writer is open and during the commit, the index is still
available for reading. Existing readers are unaffected and new readers
can open the current index normally.
The docs also include examples of how to use Whoosh concurrently.
Buffered
https://whoosh.readthedocs.org/en/latest/api/writing.html#whoosh.writing.BufferedWriter
Async
https://whoosh.readthedocs.org/en/latest/api/writing.html#whoosh.writing.AsyncWriter
I'd try the buffered version first since batching writes is almost always faster.
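For illustration, here is roughly what both look like against a plain Whoosh index (a sketch only; the index directory and field names are placeholders, and note that flask-whooshalchemy opens its writer internally in _after_flush, so wiring either of these in means patching that call or moving indexing out of the request cycle):
from whoosh.index import open_dir
from whoosh.writing import AsyncWriter, BufferedWriter

ix = open_dir("indexdir")

# AsyncWriter: if the lock is taken, buffer the changes and commit them
# from a background thread once the lock frees, instead of raising LockError.
writer = AsyncWriter(ix)
writer.add_document(username=u"carl", email=u"carl@example.com")
writer.commit()

# BufferedWriter: keep one long-lived writer and batch commits
# (here at most every 60 seconds or every 10 documents).
writer = BufferedWriter(ix, period=60, limit=10)
try:
    writer.add_document(username=u"carl", email=u"carl@example.com")
finally:
    writer.close()  # flushes buffered documents and releases the lock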
I’m currently writing a more or less sophisticated Rails generator. It’s quite straightforward, save for the lack of documentation. I see that internal methods (which come from Thor, AFAIU), like create_file and others, colorize their output.
On the other hand, I could not find the code handling this colorization anywhere during my quick walk through open-source generators (the Rails defaults, pundit, devise, etc.). I expected this functionality to be exposed, something like say WARN, 'You are doing it wrong'.
Another minor question: is it indeed necessary to handle all the errors manually, print out a message, and gracefully exit, instead of raising some kind of rails-generators-aware standard exception?
Any suggestions on what I am missing are strongly appreciated.
Well, while there was silence, I managed to implement color logging myself. In case some stranger in the future needs the same functionality, I'll just drop it here.
SYMBOLS = {
  scs: ['107', '✔'],
  nfo: ['68',  '✓'],
  wrn: ['226', '✗'],
  err: ['196', '✘']
}

# Works out which wrapper (scs/nfo/wrn/err) called it by parsing the
# caller stack, then prints the message with the matching 256-color
# ANSI escape and glyph, prefixed with the generator name.
def log msg
  sym = SYMBOLS[caller(0)[1][/`(\w+)'/, 1].to_sym]
  puts "\e[01;38;05;#{sym.first}m#{sym.last} #{generator.name}\e[0m: #{msg}"
end

def err msg
  log "#{msg}\nAborting...\n\n"
  exit 1
end

def nfo msg ; log msg ; end
def scs msg ; log msg ; end
def wrn msg ; log msg ; end

private :log
Now wrn 'You are doing it wrong' prints a colorized ✗, the generator name, and the message, while err additionally aborts the run.
I'm trying to run multiple DDLs (around 90) on an SQL Server.
The DDLs don't contain any changes to tables, only views, stored procedures, and functions. The DDLs might have inter-dependencies between them; one stored procedure calling another, for example.
I don't want to start organizing the files in the correct order, because it would take too long, and I want the entire operation to fail if any one of the scripts has an error.
How can I achieve this?
My idea so far is to start a transaction, tell SQL Server to ignore errors (which I don't know how to do), run all the scripts once, tell it to start raising errors again, run all the scripts a second time, and then commit if everything succeeds.
Is this a good idea?
How do I CREATE / ALTER a stored procedure or view even though it has errors?
To clarify and address some concerns...
This is not intended for production. I just don't want to leave the DB I'm testing on broken.
What I would like to achieve is this: run a big group of scripts on the server without taking the time to order them, but if any of the scripts has an error in it, roll back the entire operation.
I don't care about isolation; I only want the operation to happen as a single transaction.
Organize the files in the correct order, test the procedure on a test environment, have a validation and acceptance test, then run it in production.
While running DDL in a transaction may seem possible, in practice it is not. Many DDL statements don't mix well with transactions. You must put the application offline, take a database backup (or create a snapshot) before the schema changes, run the tested and verified upgrade procedure (your scripts), validate the result with acceptance tests, and then turn the application back online. If something fails, revert to the backup created at the start (with all the implications vis-à-vis any downstream log consumer like replication, log shipping, or mirroring).
This is the correct way, and as far as I'm concerned the only way. I know you'll find plenty of advice on how to do this the wrong way.
We actually do something like this to deploy our database scripts to production. We do this in an application that connects to our databases. To add to the complication, we also have 600 databases that should have the same schema, but don't really. Here's our approach:
1. Merge all our scripts into one big file, injecting a GO between every single file. This makes it look like there's one very long script. We do a simple ordering based on what the coders requested.
2. Split everything into "GO blocks". Since GO isn't legal SQL (it's a batch separator understood by the client tools), we split on it into multiple blocks that get executed one at a time.
3. Open a database connection.
4. Start a transaction.
5. For each GO block:
   - Make sure the transaction is still active. (This is VERY important; I'll explain why in a bit.)
   - Run the code, recording any errors.
6. If there were any errors, roll back. Otherwise, commit.
In our multi-database setup, we do this whole thing twice: run through every database once, "testing" the code to make sure there are no errors on any database, then go back and run it all again "for real".
Now on to why you need to make sure the transaction is still active: some statements will roll back your transaction on error! Imagine our surprise the first time we found this out... everything before the error had been rolled back, but everything after it had been committed, because each later block ran with no transaction open. With the check in place, nothing after a failure gets committed, so it's all good.
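The same flow, condensed into a short sketch (Python with pyodbc here, purely for illustration; the connection string handling is assumed, and our real code is the VB further down):
import re
import pyodbc

def run_script(conn_str, script):
    # Split on GO batch separators; GO is a client-side convention, not T-SQL.
    blocks = [b for b in re.split(r"(?im)^\s*GO\s*$", script) if b.strip()]
    conn = pyodbc.connect(conn_str, autocommit=True)  # we manage the transaction ourselves
    cur = conn.cursor()
    cur.execute("BEGIN TRANSACTION")
    errors = 0
    for block in blocks:
        # Some errors roll the transaction back behind your back,
        # so re-check that it is still open before every block.
        if cur.execute("SELECT @@TRANCOUNT").fetchval() == 0:
            errors += 1
            break
        try:
            cur.execute(block)
        except pyodbc.Error as e:
            errors += 1
            print(e)
    if errors == 0:
        cur.execute("COMMIT TRANSACTION")
    elif cur.execute("SELECT @@TRANCOUNT").fetchval() > 0:
        cur.execute("ROLLBACK TRANSACTION")
    return errors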
Below is the core of our execution code. We use a wrapper around SqlClient, but it should look very similar to plain SqlClient.
Dim T = New DBTransaction(client)
For Each block In scriptBlocks
    ' Re-begin the transaction if an earlier error silently rolled it back
    If Not T.RestartIfNecessary Then
        exceptionCount += 1
        Log("Could not (re)start the transaction for {0}. Not executing the rest of the script.", scriptName)
        Exit For
    End If
    Debug.Assert(T.IsInTransaction)
    Try
        client.Text = block
        client.ExecNonQuery()
    Catch ex As Exception
        exceptionCount += 1
        Log(ex.Message + " on {0} executing: '{1}'", client.Connection.Database, block.Replace(vbNewLine, ""))
    End Try
Next
If exceptionCount > 0 Then Log("There were {0} exceptions while executing {1}.", exceptionCount, scriptName)
If testing OrElse exceptionCount > 0 Then
    Try
        T.Rollback()
        Log("Rolled back all changes for {0} on {1}.", scriptName, client.Connection.Database)
    Catch ex As Exception
        Log("Could not roll back {0} on {1}: {2}", scriptName, client.Connection.Database, ex.Message)
        If Debugger.IsAttached Then
            Debugger.Break()
        End If
    End Try
Else
    T.Commit()
    Log("Successfully committed all changes for {0} on {1}.", scriptName, client.Connection.Database)
End If
Return exceptionCount
Class DBTransaction
    Private _tName As String
    Public ReadOnly Property name() As String
        Get
            Return _tName
        End Get
    End Property

    Private _client As OB.Core2.DB.Client

    Public Sub New(client As OB.Core2.DB.Client, Optional name As String = Nothing)
        If name Is Nothing Then
            name = "T" & Guid.NewGuid.ToString.Replace("-", "").Substring(0, 30)
        End If
        _tName = name
        _client = client
    End Sub

    Public Function Begin() As Boolean
        Return RestartIfNecessary()
    End Function

    ' Begins the named transaction again if it is no longer listed in
    ' sys.dm_tran_active_transactions (i.e. an error rolled it back).
    Public Function RestartIfNecessary() As Boolean
        Try
            _client.Text = "IF NOT EXISTS (Select transaction_id From sys.dm_tran_active_transactions where name = '" & name & "') BEGIN BEGIN TRANSACTION " & name & " END"
            _client.ExecNonQuery()
            Return IsInTransaction()
        Catch ex As Exception
            Return False
        End Try
    End Function

    Public Function IsInTransaction() As Boolean
        _client.Text = "Select transaction_id From sys.dm_tran_active_transactions where name = '" & name & "'"
        Dim scalar As String = _client.ExecScalar
        Return scalar <> ""
    End Function

    Public Sub Rollback()
        _client.Text = "ROLLBACK TRANSACTION " & name
        _client.ExecNonQuery()
    End Sub

    Public Sub Commit()
        _client.Text = "COMMIT TRANSACTION " & name
        _client.ExecNonQuery()
    End Sub
End Class
You have a good answer already; here is the "hack" answer, for the case of "you cannot do this, but if you want it very much, then go on". I'm quite confident that you will not achieve what you are thinking of, therefore:
DO A FULL BACKUP!
Assuming there are no COMMIT or GO statements (explicit or, mind you, implicit!) in any of these files, the only thing you need to do is run them in a single transaction: combine them into one file, wrap it in a transaction, and run it.
How to combine 90 files into one:
If sorting by name brings them into the right order, run this from the folder with the files at a command prompt:
FOR /F "tokens=1" %G IN ('dir /b /-d /o:n *.sql') DO (
type %G >> Big_SQL_Script.sql && echo. >> Big_SQL_Script.sql
)
If the order is random, create a list of the files with dir /b /-d *.sql > File_Name_List.txt and order it manually. Then run:
FOR /F "tokens=1" %G IN (File_Name_List.txt) DO (
type %G >> Big_SQL_Script.sql && echo. >> Big_SQL_Script.sql
)
This way you can concatenate the 90 files in an automated fashion. Run it and see what happens.
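If you'd rather not depend on cmd.exe, the same concatenation is a few lines of Python (just a sketch; the file names are placeholders):
import glob

# Concatenate *.sql files sorted by name, with a newline after each,
# into one big script. Reorder `files` by hand if name order is wrong.
files = sorted(glob.glob("*.sql"))
with open("Big_SQL_Script.sql", "w") as out:
    for name in files:
        with open(name) as f:
            out.write(f.read())
        out.write("\n")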
Good luck!
This question is a follow-up to this question: where should I place this code?
connection = ActiveRecord::Base.connection
class << connection
  alias :original_exec :execute
  def execute(sql, *name)
    # try to log the sql command, but ignore any errors that occur in this block;
    # we log before executing, in case the execution raises an error
    begin
      File.open(RAILS_ROOT + "/log/sql.txt", 'a') { |f| f.puts Time.now.to_s + ": " + sql }
    rescue Exception => e
      ;
    end
    # execute the original statement
    original_exec(sql, *name)
  end
end
I have tried placing it inside a model, but when I execute some SQL query more than once I get a "stack level too deep" error.
Put it in config/initializers. The error most likely comes from class reloading in the dev environment: each reload runs the alias again, so original_exec ends up pointing at the already-patched execute and the method calls itself. This code needs to be executed only once.
I want to save to a log file some of the SQL queries Rails performs (namely the CREATE, UPDATE and DELETE ones), therefore I need to intercept all queries, filter them, maybe with some regexp, and log them as needed.
Where would I put such a thing in the Rails code?
Here is a simplified version of what c0r0ner linked to, to show it better:
connection = ActiveRecord::Base.connection
class << connection
  alias :original_exec :execute
  def execute(sql, *name)
    # try to log the sql command, but ignore any errors that occur in this block;
    # we log before executing, in case the execution raises an error
    begin
      # note: no leading slash, or Pathname#join will treat the path as absolute
      File.open(Rails.root.join("log/sql.txt"), 'a') { |f| f.puts Time.now.to_s + ": " + sql }
    rescue Exception => e
      ;
    end
    # execute the original statement
    original_exec(sql, *name)
  end
end
SQL logging in Rails, in brief: you need to override the ActiveRecord execute method. There you can add any logic for logging.
As a note for followers, you can "log all queries" as described in "See generated SQL queries in Log files", and then grep the files for the ones you want, if desired.
If you are using MySQL, I would look into mysqlbinlog. It tracks everything that potentially updates data; you can easily grep whatever you need out of that log.
http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html
http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
SQL Server? If so...
Actually, I'd do this at the SQL Server end. You could set up a trace and collect every query that comes through a connection with a particular Application Name. If you save the trace to a table, you can easily query it later.
Slightly updated version of #luca's answer for at least Rails 4 (and probably Rails 5)
Place this in config/initializers/sql_logger.rb:
connection = ActiveRecord::Base.connection
class << connection
  alias :original_exec :execute
  def execute(sql, *name)
    # try to log the sql command, but ignore any errors that occur in this block;
    # we log before executing, in case the execution raises an error
    begin
      File.open(Rails.root.join("log/sql.log"), 'a') do |file|
        file.puts Time.now.to_s + ": " + sql
      end
    rescue Exception => e
      # actually report the failure (a bare string here would be a no-op)
      Rails.logger.warn "Error logging SQL: #{e}"
    end
    # execute the original statement
    original_exec(sql, *name)
  end
end