Django: Mixed managed and raw db commits - TransactionManagementError

I'm writing a bulk insert script using Django's ORM + custom raw SQL. The code has the following outline:
import sys, os
from django.core.management import setup_environ
from my_project import settings
setup_environ(settings)  # must run before importing models
from my_project.my_app.models import Model1, Model2
from django.db import transaction
from django.db import connection

@transaction.commit_manually
def process_file(relevant_file):
    data_file = open(relevant_file, 'r')
    cursor = connection.cursor()
    input_row_i = 0  # count input rows so we can commit in batches
    while 1:
        line = data_file.readline()
        if line == '':
            break
        input_row_i += 1
        if not (input_row_i % 1000):
            transaction.commit()
        if [some rare condition]:
            model_1 = Model1([Some assignments based on line])
            model_1.save()
        values = [Some values based on line]
        cursor.execute("INSERT INTO `table_1` (`field_1`, `field_2`, `field_3`) VALUES (%s, %s, %s)", values)
    data_file.close()
    transaction.commit()
I keep getting the following error:
django.db.transaction.TransactionManagementError: Transaction managed block ended with pending COMMIT/ROLLBACK
How can I solve this?

Use transaction.commit_unless_managed()
I've written a post explaining this in greater detail, with an example.
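For illustration, here is a minimal sketch of that approach under the old (pre-Django 1.6) transaction API, assuming autocommit is left on for the ORM saves and only the raw cursor work needs an explicit flush (insert_raw_row is a hypothetical helper):

from django.db import connection, transaction

def insert_raw_row(values):
    # Raw SQL bypasses the ORM, so Django's transaction bookkeeping
    # never sees it; commit_unless_managed() flushes it when no
    # managed transaction is active, and is a no-op otherwise.
    cursor = connection.cursor()
    cursor.execute(
        "INSERT INTO `table_1` (`field_1`, `field_2`, `field_3`) VALUES (%s, %s, %s)",
        values)
    transaction.commit_unless_managed()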

I started getting this exception in a similar circumstance. The Django ORM was actually throwing a django.core.exceptions.ValidationError because a date was incorrectly formatted. Because I was using manual transaction processing to batch database writes, Django's transaction-handling code tried to clean up while the ValidationError was propagating, and raised its own django.db.transaction.TransactionManagementError. Try a try/except around your model_1 code to see if any other exceptions are being thrown. Something like:
try:
    model_1 ...
    model_1.save()
except:
    print "Unexpected error:", sys.exc_info()[0]
    print 'line:', line
to see if there are any problems with the input data or the object-creation code.

You could try a workaround: place a transaction.commit() right after the model_1.save(). I think you need to isolate the raw and ORM transactions.
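A minimal sketch of that workaround inside the loop from the question; rare_condition and build_model_1 are hypothetical stand-ins for the bracketed placeholders:

from django.db import transaction

if rare_condition(line):            # stand-in for [some rare condition]
    model_1 = build_model_1(line)   # stand-in for the assignments based on line
    model_1.save()
    transaction.commit()            # flush the ORM write before the raw cursor runs again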

Related

When using pandas.to_sql() with method='multi' for Oracle, got an error message: 'CompileError' object has no attribute 'orig'

When I insert my query result into an Oracle (and even a Netezza) table, the code below works well with the queryResultToTable() method; all records are loaded as expected.
import cx_Oracle
import pandas
import sys
from multiprocessing.pool import ThreadPool  # Import ThreadPool to enable parallel execution
from sqlalchemy import create_engine, inspect  # Import create_engine to use Pandas database functions, e.g. dataframe.to_sql()
from sqlalchemy.dialects.oracle import \
    BFILE, BLOB, CHAR, CLOB, DATE, \
    DOUBLE_PRECISION, FLOAT, INTERVAL, LONG, NCLOB, \
    NUMBER, NVARCHAR, NVARCHAR2, RAW, TIMESTAMP, VARCHAR, \
    VARCHAR2
import netezza_dialect

class databaseOperation():
    def queryResultToTable(self, sourceDBEngineURL, targetDBEngineURL, targetSchemaName, targetTableName, targetDataTypes, queryScript):
        sourceDBEngine = create_engine(sourceDBEngineURL)
        try:
            with sourceDBEngine.connect() as sourceDBConnection:
                try:
                    queryResult = pandas.read_sql(queryScript, sourceDBConnection)
                except Exception as e:
                    print(e)
        except Exception as e:
            print(e)
            return
        targetDBEngine = create_engine(targetDBEngineURL)
        try:
            with targetDBEngine.connect() as targetDBConnection:
                targetDBConnection.execution_options(autocommit = True)  # submit commit() automatically
                try:
                    queryResult.to_sql(targetTableName, targetDBConnection, targetSchemaName, if_exists = 'append', index = False, dtype = targetDataTypes, method = None)
                    # !!! method = 'multi' doesn't work for the Oracle database
                except Exception as e:
                    print(e)
        except Exception as e:
            print(e)
            return

if __name__ == '__main__':
    db = databaseOperation()
    sourceORAEngineURL = "....."  # format like "oracle+cx_oracle://user:pwd@server_address1/db1"
    targetORAEngineURL = "....."  # format like "oracle+cx_oracle://user:pwd@server_address2/db2"
    sql = "SELECT abc, def, ggg FROM table_name WHERE abc = 'txt'"
    ORA_targetSCHEMANAME = 'hr'
    ORA_targetTABLENAME = 'cmpresult'
    ORA_targetDATATYPES = {
        'abc': NVARCHAR2(20),
        'def': NVARCHAR2(100),
        'ggg': NVARCHAR2(100)
    }
    db.queryResultToTable(sourceORAEngineURL, targetORAEngineURL, ORA_targetSCHEMANAME, ORA_targetTABLENAME, ORA_targetDATATYPES, sql)
    sys.exit(0)
But when I change method = None to method = 'multi', like:
queryResult.to_sql(targetTableName, targetDBConnection, targetSchemaName, if_exists = 'append', index = False, dtype = targetDataTypes, method = 'multi')
With the same code, Netezza works fine, but Oracle gives the message below:
'CompileError' object has no attribute 'orig'
Other than that, no more information is displayed, and I have no idea what the issue is. I also tried switching Connection.execution_options(autocommit = True) on and off, but nothing changed.
Could someone help me out?
It looks like this is not supported for Oracle databases.
Per the pandas docs:
Pass multiple values in a single INSERT clause. It uses a special SQL
syntax not supported by all backends.
The SQLAlchemy documentation notes:
Two phase transactions are not supported under cx_Oracle due to poor driver support. As of cx_Oracle 6.0b1, the interface for two phase transactions has been changed to be more of a direct pass-through to the underlying OCI layer with less automation. The additional logic to support this system is not implemented in SQLAlchemy.
This question suggests using executemany from cx_Oracle.
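For illustration, a minimal executemany() sketch with cx_Oracle, reusing the table and column names from the question (the connection details and the queryResult dataframe are assumed):

import cx_Oracle

connection = cx_Oracle.connect("user", "pwd", "server_address2/db2")
cursor = connection.cursor()

# turn the dataframe from the question into a list of plain tuples
rows = list(queryResult.itertuples(index=False, name=None))

# one executemany() call inserts the whole batch
cursor.executemany(
    "INSERT INTO cmpresult (abc, def, ggg) VALUES (:1, :2, :3)",
    rows)
connection.commit()

cursor.close()
connection.close()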

How to prevent django from loading objects in memory when using `delete()`?

I'm having memory issues because it looks like Django is loading the objects into memory when using delete(). Is there any way to prevent Django from doing that?
From the Django docs:
Django needs to fetch objects into memory to send signals and handle cascades. However, if there are no cascades and no signals, then Django may take a fast-path and delete objects without fetching into memory. For large deletes this can result in significantly reduced memory usage. The amount of executed queries can be reduced, too.
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#delete
I don't use signals. I do have foreign keys on the model I'm trying to delete, but I don't see why Django would need to load the objects into memory. It looks like it does, because memory usage rises as the query runs.
You can use a function like this to iterate over a huge number of objects without using too much memory:
import gc

def queryset_iterator(qs, batchsize=500, gc_collect=True):
    iterator = qs.values_list('pk', flat=True).order_by('pk').distinct().iterator()
    eof = False
    while not eof:
        primary_key_buffer = []
        try:
            while len(primary_key_buffer) < batchsize:
                primary_key_buffer.append(next(iterator))
        except StopIteration:
            eof = True
        for obj in qs.filter(pk__in=primary_key_buffer).order_by('pk').iterator():
            yield obj
        if gc_collect:
            gc.collect()
Then you can use the function to iterate over the objects to delete:
for obj in queryset_iterator(HugeQueryset.objects.all()):
obj.delete()
For more information you can check this blog post.
You can import the Django database connection and use it with raw SQL to delete. I had exactly the same problem and this helped me a lot. Here's a snippet (I'm using MySQL, by the way, but you can run any SQL statement):
from django.db import connection

# consider passing params to cursor.execute() instead of interpolating,
# to avoid SQL injection
sql_query = "DELETE FROM usage WHERE date < '%s' ORDER BY date" % date
cursor = connection.cursor()
try:
    cursor.execute(sql_query)
finally:
    cursor.close()
This should execute only the delete operation on that table without affecting any of your model relationships.

Django traceback on queries

I want a traceback from every query executed during a request, so I can find where they're coming from and reduce the count/complexity.
I'm using this excellent snippet of middleware to list and time queries, but I don't know where in the code they're coming from.
I've poked around in django/db/models/sql/compiler.py but, apart from getting a local version of Django and editing that code, I can't see how to latch on to queries. Is there a signal I can use? It seems like there isn't one fired on every query.
Is it possible to specify the default Manager?
(I know about django-debug-toolbar; I'm hoping for a solution that doesn't require it.)
An ugly but effective solution (e.g. it prints the trace on all queries and only requires one edit) is to add the following to the bottom of settings.py:
import django.db.backends.utils as bakutils
import traceback

bakutils.CursorDebugWrapper_orig = bakutils.CursorDebugWrapper

def print_stack_in_project():
    stack = traceback.extract_stack()
    for path, lineno, func, line in stack:
        # skip frames from libraries and from this file
        if 'lib/python' in path or 'settings.py' in path:
            continue
        print 'File "%s", line %d, in %s' % (path, lineno, func)
        print '  %s' % line

class CursorDebugWrapperLoud(bakutils.CursorDebugWrapper_orig):
    def execute(self, sql, params=None):
        try:
            return super(CursorDebugWrapperLoud, self).execute(sql, params)
        finally:
            print_stack_in_project()
            print sql
            print '\n\n\n'

    def executemany(self, sql, param_list):
        try:
            return super(CursorDebugWrapperLoud, self).executemany(sql, param_list)
        finally:
            print_stack_in_project()
            print sql
            print '\n\n\n'

bakutils.CursorDebugWrapper = CursorDebugWrapperLoud
I am still not sure whether there is a more elegant way of doing this.
Django debug toolbar will tell you what you want with spectacular awesomeness.
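If you do end up reaching for it, the setup is small; a sketch for the Django 1.x era this question dates from (newer toolbar versions configure slightly differently):

# settings.py
INSTALLED_APPS += ('debug_toolbar',)
MIDDLEWARE_CLASSES += ('debug_toolbar.middleware.DebugToolbarMiddleware',)
INTERNAL_IPS = ('127.0.0.1',)  # the toolbar only renders for these addresses

Its SQL panel then lists every query for the request along with the stack trace that issued it.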

How to test whether a program exits or not

I want to test the following class:
from random import randint

class End(object):
    def __init__(self):
        self.quips = ['You dead', 'You broke everything you can', 'You turn you head off']

    def play(self):
        print self.quips[randint(0, len(self.quips)-1)]
        exit(1)
I want to test it with nosetests so I can see that the class exits correctly with code 1. I tried different variants, but nosetests returns an error like:
File "C:\Python27\lib\site.py", line 372, in __call__
raise SystemExit(code)
SystemExit: 1
----------------------------------------------------------------------
Ran 1 test in 5.297s
FAILED (errors=1)
Of course I can assume that it exits, but I want the test to return an OK status, not an error. Sorry if my question is stupid. I'm very new to Python and this is my first time testing something.
I would recommend using the assertRaises context manager. Here is an example test that ensures that the play() method exits:
import unittest
import end

class TestEnd(unittest.TestCase):
    def testPlayExits(self):
        """Test that the play method exits."""
        ender = end.End()
        with self.assertRaises(SystemExit) as exitexception:
            ender.play()
        # Check for the requested exit code; the context manager
        # stores the raised exception in its .exception attribute.
        self.assertEqual(exitexception.exception.code, 1)
As you can see in the traceback, sys.exit()* raises an exception called SystemExit when you call it. So that's what you want to test for with nose's assert_raises(). If you are writing tests with unittest2.TestCase, that's self.assertRaises.
*Actually you used the plain built-in exit(), but you really should use sys.exit() in a program.
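For completeness, the same check written as a plain nose test function, assuming the class lives in a module named end as above:

from nose.tools import assert_raises
import end

def test_play_exits():
    # exit(1) raises SystemExit, which the context manager captures
    with assert_raises(SystemExit) as cm:
        end.End().play()
    assert cm.exception.code == 1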

Inserting data into PostgreSQL table from MATLAB with JDBC throws BatchUpdateException

I am trying to write to a PostgreSQL database table from MATLAB. I have got the connection working using JDBC and created the table, but I am getting a BatchUpdateException when I try to insert a record.
The MATLAB code to insert the data is:
user_table = 'rm_user';
colNames = {user_id};
data = {longRecords(iterator)};
fastinsert(conn, user_table, colNames, data);
The exception says:
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO rm_user (user_id) VALUES ( '4') was aborted. Call getNextException to see the cause.
But I don't know how to call getNextException from MATLAB.
Any ideas what's causing the problem or how I can find out more about the exception?
EDIT
Turns out I was looking at documentation for a newer version of MATLAB than mine. I have changed from fastinsert to insert and it is now working. However, I'm still interested in knowing whether there is a way to call getNextException from MATLAB.
This should work:
try
    user_table = 'rm_user';
    colNames = {user_id};
    data = {longRecords(iterator)};
    fastinsert(conn, user_table, colNames, data);
catch err
    err.getNextException()
end
Alternatively, just look at the caught error; it should contain the same information.
Also, MATLAB has a function lasterr which will give you the last error without a catch statement. The function is deprecated, but you can find the documentation for replacements at the link provided.