Django: get result of raw SQL query in one line - sql

This works:
connection = get_connection()
cursor=connection.cursor()
cursor.execute('show application_name')
application_name_of_connection=cursor.fetchone()[0]
But why four lines? Is there no way to get this in one line?

No, there is not a way to do this in one line. Languages such as Python are designed to describe a set of instructions. There are four instructions here. There may even be helper methods or clever/inefficient arrangements of these statements that could lower this to three lines, but you are best to keep it how it is.
For example, if you are for some reason using this code frequently, you would encapsulate it in its own method
def get_application_name_of_connection()
connection = get_connection()
cursor=connection.cursor()
cursor.execute('show application_name')
return cursor.fetchone()[0]
And then simply:
get_application_name_of_connection()
This is how it works. You want less code? Hide the functionality.

It can be done with comprehension but on sqlite:
results = [row[0] for row in get_connection().cursor().execute('select * from mytable')]
If you are using postgresql then it doesn't work, you could try this for a more compact form:
with get_connection().cursor() as cur:
cur.execute('show all')
print cur.fetchone()
But only if you need the cursor thrown away after the block execution.
I didn't tried on other db managers, results may vary.

Related

Parsing a SQL spatial column in Python

I am struggling a bit as I am new to programming. I am currently writing a python script and I am a bit stuck. The goal is to parse some spatial information the gets pulled from SQL to a format that is usable for my py script down the line.
I was able to CAST through a SQL query and fetchall using the obdc module. However once I fetch the data that is where it gets trick for me. Here is an example of a print from the fetchall:
[(u'POLYGON ((7014.186279296875 6602.99658203125 1612.5, 7015.984375 6600.416015625 1612.5))',), (u'POLYGON ((6730.962646484375 6715.2490234375 1522.5, 6730.0869140625 6714.13916015625 1522.5))',)]
I am not exactly sure what I am getting here it is like a list of tuples. which I have tried converting to a list of list, but there must be something I am missing.
Here is the usable format I am looking for:
[[7014.186279296875, 6602.99658203125, 1612.5], [7015.984375, 6600.416015625, 1612.5]]
[[6730.962646484375, 6715.2490234375, 1522.5], [6730.0869140625, 6714.13916015625, 1522.5]]
Any ideas of how I can accomplish this? Maybe there is a better way to CAST in SQL or a module in python that would be easier to use instead of just doing a cursor.fetchall() and parsing? Or any any parsing help would be useful. Thanks.
If you want to do parsing, that should be straight forward. For example you've provided next code would do the thing:
result = []
for element in data:
single_elements = element[0][10:-2].split(', ')
for se in single_elements:
row = str(se).split(' ')
result.append([float(a) for a in row])
Result will contain what you need. If parsing is not an option, then paste some of your code so I can see how you're fetching data.

How to execute SQL Script which may or may not return data?

This is an extension of an old question of mine where the answer wasn't quite what I was asking. What I'm doing is executing SQL Script on an MS SQL Server database. This script may or may not return any recordsets. The problem is that the way that ADO components work, at least to my knowledge, I can only explicitly request one or the other.
If I know a query will return data, I use TADOQuery.Open
If I know a query will not return data, I use TADOConnection.Execute
If I don't know whether query will return data or not... ???
How can I execute any query and read the response to determine whether it has any recordsets or not so I can read that recordset?
What I've tried:
Calling TADOQuery.Open, but raises exception if there's no recordset
Calling TADOQuery.ExecSql, but never returns any data
Calling TADOConnection.Execute, but never returns any data
Using Option 3 and reverting to Option 1 on exceptions, but this is double the work (script files over 38,000 lines) and kinda nasty.
Using TADOCommand.Execute, but keeps raising "Parameter object is improperly defined. Inconsistent or incomplete information was provided" on creating some stored procedures (which otherwise don't happen when using TADOConnection.Execute).
Calling TADOConnection.Execute overload which returns _Recordset, but then the TADOConnection.Errors returns empty (and I depend on this).
Just as some background, the purpose is to implement something like the SQL Query tool in the SQL Server Management Studio. It allows you to execute any SQL script, and if it returns any recordsets, it displays them, otherwise it just displays output messages. The tool I'm working on automatically breaks SQL Script at GO statements, and executes each individual block separately. Some of those blocks might return data, and others might not. It's already obvious that I cannot make this determination prior to execution, so I'm looking for a way to go ahead with the execution and observe the result. TADOConnection.Execute provides some useful information, including the Errors (or output messages).
As of now, the only option I have is to supply an option in the user interface to allow the user to choose which type of execution to use - but this is what I'm trying to eliminate.
EDIT
The TADOCommand.Execute method is the closest to what I want. However, it fails on some bits of script which otherwise work perfectly fine using TADOConnection.Execute. See #5 above in "What I've tried". I almost wrote that as my answer, until I found this error happens on almost everything.
EDIT
After posting my answer below, I then came to learn that the Errors property no longer returns anything when I use this other overload of Execute. See #6 above in "What I've tried".
Calling...
ADOConnection1.Execute('select * from something', cmdText, []);
...does not return anything in ADOConnection1.Errors, whereas...
var
R: Integer;
begin
ADOConnection1.Execute('select * from something', R);
...does return messages in ADOConnection1.Errors, which is what I need, but yet, doesn't return any recordsets.
EDIT: Not the right solution
I discovered my solution finally after digging even deeper. The answer is to use the TADOConnection.Execute() overload which supports returning the recordset:
function TADOConnection.Execute(const CommandText: WideString;
const CommandType: TCommandType = cmdText;
const ExecuteOptions: TExecuteOptions = []): _Recordset;
Then, just assign the resulting _Recordset to the Recordset property of supported dataset components.
var
RS: _Recordset;
begin
RS := ADOConnection1.Execute('select * from something', cmdText, []);
if Assigned(RS) then begin
ADODataset1.Recordset:= RS;
...
end;
end;
The downside is that you cannot use the other overload which supports returning the RowsAffected. Also, nothing is returned in the Errors property of the TADOConnection when using this overload version of Execute, whereas the other one does. But that other doesn't return a recordset.

What is the best way to format long chained methods?

What is the best way to format code with chained methods? Especially if it goes on for a long time? If you have a chain of three or so, you can put it on one line, but it gets cumbersome after you have a lot and it makes debugging difficult.
FYI, I'm talking about this: http://en.wikipedia.org/wiki/Method_chaining
Sometimes I write code like this (in Java):
DetachedCriteria criteria = DetachedCriteria.forClass(Taskdsr.class);
criteria=criteria.add(someRestriction);
criteria=criteria.add(someOtherRestriction);
criteria=criteria.setFetchMode(Criteria.DISTINCT_ROOT_ENTITY);
in place of:
DetachedCriteria criteria = DetachedCriteria.forClass(Taskdsr.class).add(someRestriction).add(someOtherRestriction).setFetchMode(Criteria.DISTINCT_ROOT_ENTITY);
You can format it across multiple lines:
DetachedCriteria criteria = DetachedCriteria.forClass(Taskdsr.class)
.add(someRestriction)
.add(someOtherRestriction)
.setFetchMode(Criteria.DISTINCT_ROOT_ENTITY);

In Django, what is the most efficient way to check for an empty query set?

I've heard suggestions to use the following:
if qs.exists():
...
if qs.count():
...
try:
qs[0]
except IndexError:
...
Copied from comment below: "I'm looking for a statement like "In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster so it's advisable to always use exists() when choosing between the two."
query.exists() is the most efficient way.
Especially on postgres count() can be very expensive, sometimes more expensive then a normal select query.
exists() runs a query with no select_related, field selections or sorting and only fetches a single record. This is much faster then counting the entire query with table joins and sorting.
qs[0] would still includes select_related, field selections and sorting; so it would be more expensive.
The Django source code is here (django/db/models/sql/query.py RawQuery.has_results):
https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499
def has_results(self, using):
q = self.clone()
if not q.distinct:
q.clear_select_clause()
q.clear_ordering(True)
q.set_limits(high=1)
compiler = q.get_compiler(using=using)
return compiler.has_results()
Another gotcha that got me the other day is invoking a QuerySet in an if statement. That executes and returns the whole query !
If the variable query_set may be None (unset argument to your function) then use:
if query_set is None:
#
not:
if query_set:
# you just hit the database
exists() is generally faster than count(), though not always (see test below). count() can be used to check for both existence and length.
Only use qs[0]if you actually need the object. It's significantly slower if you're just testing for existence.
On Amazon SimpleDB, 400,000 rows:
bare qs: 325.00 usec/pass
qs.exists(): 144.46 usec/pass
qs.count() 144.33 usec/pass
qs[0]: 324.98 usec/pass
On MySQL, 57 rows:
bare qs: 1.07 usec/pass
qs.exists(): 1.21 usec/pass
qs.count(): 1.16 usec/pass
qs[0]: 1.27 usec/pass
I used a random query for each pass to reduce the risk of db-level caching. Test code:
import timeit
base = """
import random
from plum.bacon.models import Session
ip_addr = str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))
try:
session = Session.objects.filter(ip=ip_addr)%s
if session:
pass
except:
pass
"""
query_variatons = [
base % "",
base % ".exists()",
base % ".count()",
base % "[0]"
]
for s in query_variatons:
t = timeit.Timer(stmt=s)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000)
It depends on use context.
According to documentation:
Use QuerySet.count()
...if you only want the count, rather than doing len(queryset).
Use QuerySet.exists()
...if you only want to find out if at least one result exists, rather than if queryset.
But:
Don't overuse count() and exists()
If you are going to need other data from the QuerySet, just evaluate it.
So, I think that QuerySet.exists() is the most recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use results later, it's better to evaluate it.
I also think that your third option is the most expensive, because you need to retrieve all records just to check if any exists.
#Sam Odio's solution was a decent starting point but there's a few flaws in the methodology, namely:
The random IP address could end up matching 0 or very few results
An exception would skew the results, so we should aim to avoid handling exceptions
So instead of filtering something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache, but also ensuring the same number of rows.
I only tested against a local MySQL database, with the dataset:
>>> Session.objects.all().count()
40219
Timing code:
import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
pass
"""
s = base.format('count')
query_variations = [
"",
".exists()",
".count()",
"[0]",
]
for variation in query_variations:
t = timeit.Timer(stmt=base.format(variation))
print "{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000)
outputs:
=> 1390.177710 usec/pass
.exists() => 2.479579 usec/pass
.count() => 22.426991 usec/pass
[0] => 2.437079 usec/pass
So you can see that count() is roughly 9 times slower than exists() for this dataset.
[0] is also fast, but it needs exception handling.
I would imagine that the first method is the most efficient way (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually getting a whole object from the database, so it is almost certainly the most expensive.
But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.
I was also in this trouble. Yes exists() is faster for most cases but it depends a lot on the type of queryset you are trying to do. For example for a simple query like:
my_objects = MyObject.objets.all() you would use my_objects.exists(). But if you were to do a query like: MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() probably you need to test which one fits better (exists(), count(), len(my_objects)). Remember the DB engine is the one who will perform the query, and to get a good result in performance, it depends a lot on the data structure and how the query is formed. One thing you can do is, audit the queries and test them on your own against the DB engine and compare your results you will be surprised by how naive sometimes django is, try QueryCountMiddleware to see all the queries executed, and you will see what i am talking about.

How to quote values for LuaSQL?

LuaSQL, which seems to be the canonical library for most SQL database systems in Lua, doesn't seem to have any facilities for quoting/escaping values in queries. I'm writing an application that uses SQLite as a backend, and I'd love to use an interface like the one specified by Python's DB-API:
c.execute('select * from stocks where symbol=?', t)
but I'd even settle for something even dumber, like:
conn:execute("select * from stocks where symbol=" + luasql.sqlite.quote(t))
Are there any other Lua libraries that support quoting for SQLite? (LuaSQLite3 doesn't seem to.) Or am I missing something about LuaSQL? I'm worried about rolling my own solution (with regexes or something) and getting it wrong. Should I just write a wrapper for sqlite3_snprintf?
I haven't looked at LuaSQL in a while but last time I checked it didn't support it. I use Lua-Sqlite3.
require("sqlite3")
db = sqlite3.open_memory()
db:exec[[ CREATE TABLE tbl( first_name TEXT, last_name TEXT ); ]]
stmt = db:prepare[[ INSERT INTO tbl(first_name, last_name) VALUES(:first_name, :last_name) ]]
stmt:bind({first_name="hawkeye", last_name="pierce"}):exec()
stmt:bind({first_name="henry", last_name="blake"}):exec()
for r in db:rows("SELECT * FROM tbl") do
print(r.first_name,r.last_name)
end
LuaSQLite3 as well an any other low level binding to SQLite offers prepared statements with variable parameters; these use methods to bind values to the statement parameters. Since SQLite does not interpret the binding values, there is simply no possibility of an SQL injection. This is by far the safest (and best performing) approach.
uroc shows an example of using the bind methods with prepared statements.
By the way in Lua SQL there is an undocumented escape function for the sqlite3 driver in conn:escape where conn is a connection variable.
For example with the code
print ("con:escape works. test'test = "..con:escape("test'test"))
the result is:
con:escape works. test'test = test''test
I actually tried that to see what it'd do. Apparently there is also such a function for their postgres driver too. I found this by looking at the tests they had.
Hope this helps.