tl;dr
How do I convert the SQL below to Arel (or whatever is considered standard in Rails)?
#toplist = ActiveRecord::Base.connection.execute(
'select ci.crash_info_id,
count(distinct(user_guid))[Occurences],
c.md5
from crashes ci
join crash_infos c on c.id=crash_info_id
group by ci.crash_info_id
order by [Occurences] desc')
--- end of tl;dr ----
I'm working on a small web project. Its goal is to take our customers' crash reports (when our desktop app crashes, we send diagnostics to our servers), analyze them, and display the most common bugs that cause our application to crash, so that we can concentrate on fixing the bugs that affect the biggest share of our users.
I have 2 tables:
crash_infos - id, md5 (MD5 of a stack trace. In .NET, stack traces and exception messages are sometimes translated into the user's native language, so I sanitize the exception by removing the language-specific parts and compute an MD5 of the result)
crashes - id, user_guid (user id), crash_info_id (id in the crash_infos table)
Now the question: I wanted a query that shows the most common crashes counted per unique user (so the same crash is not counted many times for the same user), sorted by number of occurrences. Unfortunately I didn't know enough Arel (or whatever the Ruby on Rails way is), so I ended up with raw SQL :(
#toplist = ActiveRecord::Base.connection.execute(
'select ci.crash_info_id,
count(distinct(user_guid))[Occurences],
c.md5
from crashes ci
join crash_infos c on c.id=crash_info_id
group by ci.crash_info_id
order by [Occurences] desc')
How can I convert this to something more "railsy"?
Thank you in advance
Not sure if this is actually any better but at least you should have a database independent query...
crashes = Arel::Table.new(:crashes)
crash_infos = Arel::Table.new(:crash_infos)
crashes.
  project(crashes[:crash_info_id], crash_infos[:md5], Arel::Nodes::Count.new([crashes[:user_guid]], true, "occurences")).
  join(crash_infos).on(crashes[:crash_info_id].eq(crash_infos[:id])).
  group(crashes[:crash_info_id]).
  order("occurences DESC").
  to_sql
Gives:
SELECT "crashes"."crash_info_id", "crash_infos"."md5", COUNT(DISTINCT "crashes"."user_guid") AS occurences FROM "crashes" INNER JOIN "crash_infos" ON "crashes"."crash_info_id" = "crash_infos"."id" GROUP BY "crashes"."crash_info_id" ORDER BY occurences DESC
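To check what this query returns before porting it, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Only the table and column names come from the question; the sample rows are made up for illustration. md5 is also added to the GROUP BY, since stricter databases reject selecting a column that is neither grouped nor aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crash_infos (id INTEGER PRIMARY KEY, md5 TEXT);
    CREATE TABLE crashes (id INTEGER PRIMARY KEY, user_guid TEXT, crash_info_id INTEGER);
    -- Made-up sample data: user u1 hits crash 1 three times,
    -- users u1 and u2 each hit crash 2 once.
    INSERT INTO crash_infos (id, md5) VALUES (1, 'aaa'), (2, 'bbb');
    INSERT INTO crashes (user_guid, crash_info_id) VALUES
        ('u1', 1), ('u1', 1), ('u1', 1),
        ('u1', 2), ('u2', 2);
""")

# Count each user at most once per crash, most widespread crash first.
rows = conn.execute("""
    SELECT c.crash_info_id, ci.md5, COUNT(DISTINCT c.user_guid) AS occurences
    FROM crashes c
    JOIN crash_infos ci ON ci.id = c.crash_info_id
    GROUP BY c.crash_info_id, ci.md5
    ORDER BY occurences DESC
""").fetchall()

for row in rows:
    print(row)
```

Crash 1 was reported three times by the same user, so it counts once; crash 2 was hit by two different users, so it sorts first.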
I am using PyPika (version 0.37.6) to create queries to be used in BigQuery. I am building up a query that has two WITH clauses, and one clause is dependent on the other. Due to the dynamic nature of my application, I do not have control over the order in which those WITH clauses are added to the query.
Here is example working code:
from pypika import AliasedQuery, Query
from pypika.terms import Term

a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")
q = Query.with_(a_subq, "a").from_(a_alias).select(a_alias.y)
q = q.with_(b_subq, "b").from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)
That generates a working query:
WITH a AS (SELECT '1' z,'2' y) ,b AS (SELECT a.z FROM a) SELECT a.y,b.z FROM a,b
However, if I add the b WITH clause first, then since a is not yet defined, the resulting query:
WITH b AS (SELECT a.z FROM a), a AS (SELECT '1' z,'2' y) SELECT a.y,b.z FROM a,b
does not work. Since BigQuery does not support WITH RECURSIVE, that is not an option for me.
Is there any way to control the order of the WITH clauses? I see the _with list in the QueryBuilder (the type of variable q), but since that's a private variable, I don't want to rely on that, especially as new versions of PyPika may not operate the same way.
One way I tried to do this is to always insert the first WITH clause at the beginning of the _with list, like this:
q._with.insert(0, q._with.pop())
Although this works, I'd like to use a PyPika supported way to do that.
In a related question, is there a supported way within PyPika to see what has already been added to the select list or other parts of the query? I noticed the q.selects member variable, but selects is not part of the public documentation. Using q.selects did not actually work for me when using our project's Python version (3.6) even though it did work in Python 3.7. The code I was trying to use is:
if any(field.name == "date" for field in q.selects if isinstance(field, Field))
The error I got was as follows:
    def __getitem__(self, item: slice) -> "BetweenCriterion":
        if not isinstance(item, slice):
>           raise TypeError("Field' object is not subscriptable")
Thank you in advance for your help.
I could not figure out how to control the order of the WITH clauses after calling query.with_() (except for the hack already noted). As a result, I restructured my application to get around this problem. I am now calling query.with_() before building up the rest of the query.
This also made my related question moot, because I no longer need to see what I've already added to the query.
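If the WITH clauses are collected first and only attached at the end, their order can be settled before any call to query.with_(). Below is a minimal sketch of that idea in plain Python. It assumes the application can record which other clauses each clause reads from; the deps mapping and the order_ctes helper are hypothetical and not part of PyPika.

```python
def order_ctes(deps):
    """Return CTE names so that every name comes after the names it depends on.

    deps maps each CTE name to the names of the CTEs it selects from.
    """
    ordered = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("circular WITH dependency involving %r" % name)
        state[name] = "visiting"
        for dep in deps.get(name, ()):
            visit(dep)
        state[name] = "done"
        ordered.append(name)

    for name in deps:
        visit(name)
    return ordered


# "b" was registered first, but it reads from "a", so "a" must be emitted first.
print(order_ctes({"b": ["a"], "a": []}))
```

The resulting list can then drive the with_() calls, for example by looping over it and calling q = q.with_(subqueries[name], name), where subqueries is whatever mapping of names to subqueries the application keeps.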
I use Django 1.8.17 (I know it's not so young anymore).
I have PostgreSQL logging requests that take more than one minute.
I have a lot of trouble finding the QuerySet that a SQL query listed in the logs belongs to.
Is there an identifier that could be added to the QuerySet to find the associated SQL query in the logs, or a trick to identify it easily?
Here is an example of a common QuerySet that is almost impossible to identify, as I have several similar ones.
Queryset:
Video.objects.filter(status='online').order_by('created')
LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created"
Desired LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created" (ID=555)
Add middleware to log a warning when a query takes a long time:
import logging
from django.conf import settings
from django.db import connection

logger = logging.getLogger(__name__)

class LongQueryLogMiddleware(object):
    # New-style middleware (Django 1.10+); on 1.8, put the loop in process_response().
    def __init__(self, get_response):
        self.get_response = get_response
    def __call__(self, request):
        response = self.get_response(request)
        # connection.queries is only populated when DEBUG = True.
        for q in connection.queries:
            if float(q['time']) >= settings.LONG_QUERY_TIME_SEC:
                logger.warning("Found long query (%s sec): %s", q['time'], q['sql'])
        return response
I've made a small gist with all the code.
In the code above I only log the query, but you can add request information that will help you identify where the query comes from.
I don't know Django, so I may be off the mark, but there's a simple trick I heard from one of the people that runs RDS:
Add an identifier to your query as a comment.
So, include a UUID, ID, label, etc. in the query
-- as a comment
and that will flow through to the log. This is an easy way to tie Postgres log entries to specific methods or scripts. It sounds like it would need a bit of adaptation to be useful in your case (if the idea applies at all).
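A minimal sketch of the tagging idea in plain Python. The tag_sql helper and the label format are made up for illustration, and how the tagged string reaches the database depends on your ORM or driver. A block comment is used here rather than "--" so the tag stays safe even if more text follows on the same line.

```python
import re


def tag_sql(sql, label):
    """Append a block comment carrying `label` to a SQL statement."""
    # Keep only harmless characters so the label cannot close the comment
    # early or smuggle in extra SQL.
    safe = re.sub(r"[^A-Za-z0-9_.:=-]", "_", label)
    # Drop a trailing semicolon so the comment stays inside the statement.
    return "%s /* %s */" % (sql.rstrip().rstrip(";"), safe)


print(tag_sql('SELECT "video"."id" FROM "video" ORDER BY "video"."created";', "ID=555"))
```

Searching the database log for the label then leads back to the call site that produced the statement.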
I have updated this question
I have the following SQL scope in a Rails 4 app. It works, but has a couple of issues:
1) It's really raw SQL and not the Rails way
2) The string interpolation opens up the risk of SQL injection
Here is what I have:
scope :not_complete, -> (user_id) { joins("WHERE id NOT IN
(SELECT modyule_id FROM completions WHERE user_id = #{user_id})")}
The relationship is many-to-many, using a join table called completions to match ids between users and modyules.
Any help making this Rails-y, and setting it up to take user_id as an argument without the risk, would be appreciated, so that I can call it like:
Modyule.not_complete("1")
Thanks!
You should have added some info about the models and their associations. Anyway, here's my attempt; it might have some errors because I don't know if the association is one-to-many or many-to-many.
scope :not_complete, ->(user_id) {
  # No join on completions here: an inner join would drop modyules that nobody has completed.
  where.not(
    # The hash condition binds user_id safely instead of interpolating it.
    id: Completion.where(user_id: user_id).pluck(:modyule_id)
  )
}
PS: I split it over multiple lines just for readability; you can change it back to a one-liner if you like.
I've searched a lot here but couldn't find a similar topic: I need to write a query in Arel because I use will_paginate to browse through the results, so I'd lose much of the convenience with raw SQL.
Here's what I need to express in Arel:
SELECT m.*
FROM messages m
JOIN (SELECT tmp.original_id as original_id,
max(tmp.id) as id
FROM messages tmp
WHERE tmp.recipient_id = ?
GROUP BY tmp.original_id) g ON (m.id = g.id)
ORDER BY m.updated_at DESC;
In short: the subquery retrieves all messages for a user. If a message has newer replies (in replies I save the referring message id as original_id), the older ones are ignored. For the resulting messages I want Rails to give me the corresponding objects.
I'm quite skilled in SQL but unfortunately not in Arel. Any help would be greatly appreciated.
How about something like this?
class Message < ActiveRecord::Base
  # Bind the cutoff time as a parameter rather than interpolating Ruby into the SQL string.
  scope :newer_messages, lambda { |num_days = 10| where("updated_at > ?", num_days.days.ago) }
end
You can then find newer messages for a given user like this:
Message.where(recipient_id: user_id).newer_messages
With many HQL queries, time and time again I am getting this exception:
Antlr.Runtime.NoViableAltException
This is really generic and unhelpful - does anyone know how best to debug this? Obviously it's a problem with my HQL - but without any clue as to what exactly is wrong it's very much a case of trial and error. I'm pulling my hair out every time I see this.
Note: I don't want to post any HQL here, because this is something I come across often, not a problem related to one query.
Does anyone know the best way to tackle this? Is there any tool for validating HQL queries?
Have a look at NHibernate Query Analyzer. It is not perfect, but it will be helpful in many situations.
I can't help you directly, but here's something I can share.
When dealing with Hibernate or NHibernate (NH), I generally debug by enabling NHibernate's log4net logging and/or query logging on the DB side (e.g. MySQL).
These tell me which queries are being formulated and executed at the DB, and which exceptions the DB throws back.
For instance, if you have the following error:
Exception of type 'Antlr.Runtime.NoViableAltException' was thrown. near line 1, column
745 [select afe.AFEDataId, afe.Name, afe.AdditionalDescription, afe.ProjectType,
afe.BusinessUnit, afe.plantId, afe.fuelTypeId, afe.DisplayStatus, afe.BudgetedAmount,
sum(exp.Amount), afe.CreatedDate from Company.AFE.Model.AFEData as afe inner join
afe.Expenditures as exp inner join exp.ExpenditureType where (afe.Status.StatusId =
:statusId) and (afe.IsRestrictedVisibility = false OR (select count(AFEDataId) from
Company.AFE.Model.Reader as r where r.AFEData.AFEDataId = afe.AFEDataId AND
r.Contact.ContactId = '70bc6350-c466-40d5-a067-9d1f00bed7dc') > 0 OR (select count(AFEDataId)
from Company.AFE.Model.Editor as e where e.AFEData.AFEDataId = afe.AFEDataId AND
e.Contact.ContactId = '70bc6350-c466-40d5-a067-9d1f00bed7dc') > 0 OR 1=1) afe.AFEDataId,
afe.Name, afe.AdditionalDescription, afe.ProjectType, afe.BusinessUnit, afe.plantId,
afe.fuelTypeId, afe.DisplayStatus, afe.BudgetedAmount, afe.CreatedDate order by afe.Name ASC]
Go and look at character 745 from the original query that was provided and check to see if there is a spelling error, as there was in this one that I just looked at.
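To avoid counting characters by hand, a few lines of Python can print the text around the reported position. This is a minimal sketch: the context_at helper is hypothetical, the query is a made-up one with a stray "a.Name" after the where clause, and it assumes the column in the message is a 1-based offset into the single-line query string shown in brackets.

```python
def context_at(query, column, width=30):
    """Return the text around a 1-based column, with a marker at the column."""
    index = max(column - 1, 0)
    start = max(index - width, 0)
    return query[start:index] + " >>> " + query[index:index + width]


# Made-up HQL; 56 stands in for the column reported by the parser.
hql = "select a.Id, a.Name from Model.A as a where a.Id = :id a.Name order by a.Name"
print(context_at(hql, 56, width=12))
```

The text right after the ">>>" marker is where the parser gave up, which is usually the place to start looking.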
Which version of NH are you using? In the latest version the query handling has changed in some places: for example, you can't use "count(1)" in a query any more and must change it to count([alias name]), which NH will translate to "select class.id from ... "