Identify Django queryset from SQL logs - sql

I use Django 1.8.17 (I know it's not so young anymore).
I have logged slow requests on PostGres for more than one minute.
I have a lot of trouble finding the Queryset to which the SQL query listed in the logs belongs.
Is there an identifier that could be added to the Queryset to find the associated SQL query in the Logs or a trick to easily identify it?
Here is an exemple of common Queryset almost impossible to identify as I have several similars ones.
Queryset:
Video.objects.filter(status='online').order_by('created')
LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created"
Desired LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created" (ID=555)

Add middleware to log a warning when a query takes a long time:
class LongQueryLogMiddleware(object):
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
response = self.get_response(request)
for q in connection.queries:
if float(q['time']) >= settings.LONG_QUERY_TIME_SEC:
logger.warning("Found long query (%s sec): %s", q['time'], q['sql'])
return response
I've made a small gist with all the code. Sorry for the indentation, GitHub keeps removing the indentation.
In the code above I only log the query, but you can add request information that will help you identify where the query comes from.

I don't know Django, so I may be off the mark, but there's a simple trick I heard from one of the people that runs RDS:
Add an identifier to your query as a comment.
So, include a UUID, ID, label, etc. to the query
-- as a comment
and that will flow through to the log. This is an easy way to tie Postgres log entries to specific methods/scripts, it sounds like it would need a bit of adaptation to be useful in your case. (If the idea applies at all.)

Related

Product Index Using Django ORM

I have a list of Products with a field called 'Title' and I have been trying to get a list of initial letters with not much luck. The closes I have is the following that dosn't work as 'Distinct' fails to work.
atoz = Product.objects.all().only('title').extra(select={'letter': "UPPER(SUBSTR(title,1,1))"}).distinct('letter')
I must be going wrong somewhere,
I hope someone can help.
You can get it in python after the queryset got in, which is trivial:
products = Project.objects.values_list('title', flat=True).distinct()
atoz = set([i[0] for i in products])
If you are using mysql, I found another answer useful, albeit using sql(django execute sql directly):
SELECT DISTINCT LEFT(title, 1) FROM product;
The best answer I could come up with, which isn't 100% ideal as it requires post processing is this.
atoz = sorted(set(Product.objects.all().extra(select={'letter': "UPPER(SUBSTR(title,1,1))"}).values_list('letter', flat=True)))

Why does the where() method run SQL queries after all nested relations are eager-loaded?

In my controller method for the the index view I have the following line.
#students_instance = Student.includes(:memo_tests => {:memo_target => :memo_level})
So for each Student I eager-load all necessary info.
Later on in a .map block, I call the .where() method on one of the relations as shown below.
#all_students = #students_instance.map do |student|
...
last_pass = student.memo_tests.where(:result => true).last.created_at.utc
difference_in_weeks = ((last_pass.to_i - current_date.to_i) / 1.week).round
...
end
This leads to a single SQL query for each student. And since I have over 300+ students, leads to very slow load times and over 300+ SQL queries.
Am I right in thinking that this is caused by the .where() method. I think this because I have checked everything else and these are the two lines that cause all of the queries.
More importantly, is there a better way to do this that reduces these queries to a single query?
The moment you ask where, the statement is translated to a query. Normally, the result should be sql-cached...
Anyway, in order to be sure, you can instead add programming logic to your statement. That way, you are not requesting a NEW sql statement.
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.sort.last
EDIT
I see the OP's question does not require sorting... So, leaving the sorting out:
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.last
compact is required to remove nil results from the array.

Trying to refactor Rails 4.2 scope

I have updated this question
I have the following SQL scope in a RAILS 4 app, it works, but has a couple of issues.
1) Its really RAW SQL and not the rails way
2) The string interpolation opens up risks with SQL injection
here is what I have:
scope :not_complete -> (user_id) { joins("WHERE id NOT IN
(SELECT modyule_id FROM completions WHERE user_id = #{user_id})")}
The relationship is many to many, using a join table called completions for matching id(s) on relationships between users and modyules.
any help with making this Rails(y) and how to set this up to take the arg of user_id with out the risk, so I can call it like:
Modyule.not_complete("1")
Thanks!
You should have added few info about the models and their assocciation, anyways here's my trial, might have some errors because I don't know if the assocciation is one to many or many to many.
scope :not_complete, lambda do |user_id|
joins(:completion).where.not( # or :completions ?
id: Completion.where(user_id: user_id).pluck(modyule_id)
)
end
PS: I turned it into multi line just for readability, you can change it back to a oneline if you like.

DQL query to return all files in a Cabinet in Documentum?

I want to retrieve all the files from a cabinet (called 'Wombat Insurance Co'). Currently I am using this DQL query:
select r_object_id, object_name from dm_document(all)
where folder('/Wombat Insurance Co', descend);
This is ok except it only returns a maximum of 100 results. If there are 5000 files in the cabinet I want to get all 5000 results. Is there a way to use pagination to get all the results?
I have tried this query:
select r_object_id, object_name from dm_document(all)
where folder('/Wombat Insurance Co', descend)
ENABLE (RETURN_RANGE 0 100 'r_object_id DESC');
with the intention of getting results in 100 file increments, but this query gives me an error when I try to execute it. The error says this:
com.emc.documentum.fs.services.core.CoreServiceException: "QUERY" action failed.
java.lang.Exception: [DM_QUERY2_E_UNRECOGNIZED_HINT]error:
"RETURN_RANGE is an unknown hint or is being used incorrectly."
I think I am using the RETURN_RANGE hint correctly, but maybe I'm not. Any help would be appreciated!
I have also tried using the hint ENABLE(FETCH_ALL_RESULTS 0) but this still only returns a maximum of 100 results.
To clarify, my question is: how can I get all the files from a cabinet?
You have already accepted an answer which is using DFS.
Since your are playing with DFC, these information might help you.
DFS:
If you are using DFS, you have to aware about the number of concurrent sessions that you can consume with DFS.
I think it is 100 or 150.
DFC:
Actually there is a limit that you can fetch via DFC (I'm not sure with DFS).
Go to your DFC application(webtop or da or anything) and check the dfc.properties file.
# Maximum number of results to retrieve by a query search.
# min value: 1, max value: 10000000
#
dfc.search.max_results = 100
# Maximum number of results to retrieve per source by a query search.
# min value: 1, max value: 10000000
#
dfc.search.max_results_per_source = 400
dfc.properties.full or similar file is there and you can verify these values according to your system.
And I'm talking about the ContentServer side, not the client side dfc.properties file.
If you use ENABLE (RETURN_TOP) hint with DFC, there are 2 ways to fetch the results from the ContentServer.
Object based
Row based
You have to configure this by using the parameter return_top_results_row_based in the server.ini file.
All of these changes for the documentum server side, not for your DFC/DQL client.
Aha, I've figured it out. Using DFS with Java (an abstraction layer on top of DFC) you can set the starting index for query results:
String queryStr = "select r_object_id, object_name from dm_document(all)
where folder('/Wombat Insurance Co', descend);"
PassthroughQuery query = new PassthroughQuery();
query.setQueryString(queryStr);
query.addRepository(repositoryStr);
QueryExecution queryEx = new QueryExecution();
queryEx.setCacheStrategyType(CacheStrategyType.DEFAULT_CACHE_STRATEGY);
queryEx.setStartingIndex(currentIndex); // set start index here
OperationOptions operationOptions = null;
// will return 100 results starting from currentIndex
QueryResult queryResult = queryService.execute(query, queryEx, operationOptions);
You can just increment the currentIndex variable to get all results.
Well, the hint is being used incorrectly. Start with 1, not 0.
There is no built-in limit in DQL itself. All results are returned by default. The reason you get only 100 results must have something to do with the way you're using DFC (or whichever other client you are using). Using IDfCollection in the following way will surely return everything:
IDfQuery query = new DfQuery("SELECT r_object_id, object_name "
+ "FROM dm_document(all) WHERE FOLDER('/System', DESCEND)");
IDfCollection coll = query.execute(session, IDfQuery.DF_READ_QUERY);
int i = 0;
while (coll.next()) i++;
System.out.println("Number of results: " + i);
In a test environment (CS 6.7 SP1 x64, MS SQL), this outputs:
Number of results: 37162
Now, there's proof. Using paging is however a good idea if you want to improve the overall performance in your application. As mentioned, start counting with the number 1:
ENABLE(RETURN_RANGE 1 100 'r_object_id DESC')
This way of paging requires that sorting be specified in the hint rather than as a DQL statement. If all you want is the first 100 records, try this hint instead:
ENABLE(RETURN_TOP 100)
In this case sorting with ORDER BY will work as you'd expect.
Lastly, note that adding (all) will not only find all documents matching the specified qualification, but all versions of every document. If this was your intention, that's fine.
I've worked with DFC API (with Java) for a while but I don't remember any default limit on queries, IIRC we've always got all of the documents, there weren't any limit. Actually (according to my notes) we have to set the limit explicitly with, for example, enable (return_top 2000). (As far I know the syntax might be depend on the DBMS behind EMC Documentum.)
Just a guess: check your dfc.properties file.

Apache solr - more like this score

I have a small index with ~1000 documents with only two fields:
- id (string)
- content (text_general)
I noticed that when I do MLT search by id for similar content, the original document(which id is the searched id) have a score 5.241327.
There is 1:1 duplicated document and for the duplicated content it is returning score = 1.5258181. Why? Why it is not 5.241327 when it is 100% duplicate.
Another question is can I in any way to get similarity documents by content by passing some text in the query.
Example:
/mlt/?q=content:Some encoded long text&mlt.fl=content
I am trying to check if there is similar content uploaded and the check must be performed at new content upload time.
It might be worth to try some different parameters. I also use MLT on only one field, I use the following parameters:
'mlt.boost': 'true',
'mlt.fl': 'my_field_name',
'mlt.maxqt': 1000,
'mlt.mindf': '0',
'mlt.mintf': '0',
'qt': 'mlt',
'rows': '10'
See http://wiki.apache.org/solr/MoreLikeThis for an explanation of the parameters. I think with a small index mindf might be important and I see the default mintf (term frequency) is 2, so I assume an ID is only one term, so this is probably ignored!
First, how does Solr More-Like-This works?
A regular Solr query is conducted (e.g. "?q=content:Some encoded long text&.....".
For each document returned by the above query, More-Like-This conduct More like this query...
So, the first result set "response", is just like any Solr query results set.
The More-Like-This appears below and start with something like that (Json format):
"moreLikeThis":{
"57375":{"numFound":18155,"start":0,"docs":["
For an explanation about More Like This algorithm, please read that:
http://blog.brattland.no/node/18
and: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If you didn't solved the problem yet, please let me know and I will guide you through.