Trying to refactor Rails 4.2 scope - sql

I have updated this question
I have the following SQL scope in a RAILS 4 app, it works, but has a couple of issues.
1) Its really RAW SQL and not the rails way
2) The string interpolation opens up risks with SQL injection
here is what I have:
scope :not_complete -> (user_id) { joins("WHERE id NOT IN
(SELECT modyule_id FROM completions WHERE user_id = #{user_id})")}
The relationship is many to many, using a join table called completions for matching id(s) on relationships between users and modyules.
any help with making this Rails(y) and how to set this up to take the arg of user_id with out the risk, so I can call it like:
Modyule.not_complete("1")
Thanks!

You should have added few info about the models and their assocciation, anyways here's my trial, might have some errors because I don't know if the assocciation is one to many or many to many.
scope :not_complete, lambda do |user_id|
joins(:completion).where.not( # or :completions ?
id: Completion.where(user_id: user_id).pluck(modyule_id)
)
end
PS: I turned it into multi line just for readability, you can change it back to a oneline if you like.

Related

Identify Django queryset from SQL logs

I use Django 1.8.17 (I know it's not so young anymore).
I have logged slow requests on PostGres for more than one minute.
I have a lot of trouble finding the Queryset to which the SQL query listed in the logs belongs.
Is there an identifier that could be added to the Queryset to find the associated SQL query in the Logs or a trick to easily identify it?
Here is an exemple of common Queryset almost impossible to identify as I have several similars ones.
Queryset:
Video.objects.filter(status='online').order_by('created')
LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created"
Desired LOGs:
duration: 1056.540 ms statement: SELECT "video"."id", "video"."title",
"video"."description", "video"."duration", "video"."html_description",
"video"."niche_id", "video"."owner_id", "video"."views",
"video"."rating" FROM "video" WHERE "video"."status" = 'online'
ORDER BY "video"."created" (ID=555)
Add middleware to log a warning when a query takes a long time:
class LongQueryLogMiddleware(object):
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
response = self.get_response(request)
for q in connection.queries:
if float(q['time']) >= settings.LONG_QUERY_TIME_SEC:
logger.warning("Found long query (%s sec): %s", q['time'], q['sql'])
return response
I've made a small gist with all the code. Sorry for the indentation, GitHub keeps removing the indentation.
In the code above I only log the query, but you can add request information that will help you identify where the query comes from.
I don't know Django, so I may be off the mark, but there's a simple trick I heard from one of the people that runs RDS:
Add an identifier to your query as a comment.
So, include a UUID, ID, label, etc. to the query
-- as a comment
and that will flow through to the log. This is an easy way to tie Postgres log entries to specific methods/scripts, it sounds like it would need a bit of adaptation to be useful in your case. (If the idea applies at all.)

Django ORM Cross Product

I have three models:
class Customer(models.Model):
pass
class IssueType(models.Model):
pass
class IssueTypeConfigPerCustomer(models.Model):
customer=models.ForeignKey(Customer)
issue_type=models.ForeignKey(IssueType)
class Meta:
unique_together=[('customer', 'issue_type')]
How can I find all tuples of (custmer, issue_type) where there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
for issue_type in IssueType.objects.all():
for customer in Customer.objects.filter(
issuetypeconfigpercustomer__issue_type=None
):
yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every Queryset is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want - this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
(customer, IssueType.objects.get(pk=customer.issue_type_id))
for customer in
Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
]
Not only this requires the same number of trips to the database as the "generator" based version but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
issue_list = dict(
(issue.id, issue)
for issue in
IssueType.objects.filter(
pk__in=set(m.issue_type_id for m in missing)
)
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.

How to use LINQ to Entities to make a left join using a static value

I've got a few tables, Deployment, Deployment_Report and Workflow. In the event that the deployment is being reviewed they join together so you can see all details in the report. If a revision is going out, the new workflow doesn't exist yet new workflow is going into place so I'd like the values to return null as the revision doesn't exist yet.
Complications aside, this is a sample of the SQL that I'd like to have run:
DECLARE #WorkflowID int
SET #WorkflowID = 399 -- Set to -1 if new
SELECT *
FROM Deployment d
LEFT JOIN Deployment_Report r
ON d.FSJ_Deployment_ID = r.FSJ_Deployment_ID
AND r.Workflow_ID = #WorkflowID
WHERE d.FSJ_Deployment_ID = 339
The above in SQL works great and returns the full record if viewing an active workflow, or the left side of the record with empty fields for revision details which haven't been supplied in the event that a new report is being generated.
Using various samples around S.O. I've produced some Entity to SQL based on a few multiple on statements but I feel like I'm missing something fundamental to make this work:
int Workflow_ID = 399 // or -1 if new, just like the above example
from d in context.Deployments
join r in context.Deployment_Reports.DefaultIfEmpty()
on
new { d.Deployment_ID, Workflow_ID }
equals
new { r.Deployment_ID, r.Workflow_ID }
where d.FSJ_Deployment_ID == fsj_deployment_id
select new
{
...
}
Is the SQL query above possible to create using LINQ to Entities without employing Entity SQL? This is the first time I've needed to create such a join since it's very confusing to look at but in the report it's the only way to do it right since it should only return one record at all times.
The workflow ID is a value passed in to the call to retrieve the data source so in the outgoing query it would be considered a static value (for lack of better terminology on my part)
First of all don't kill yourself on learning the intricacies of EF as there are a LOT of things to learn about it. Unfortunately our deadlines don't like the learning curve!
Here's examples to learn over time:
http://msdn.microsoft.com/en-us/library/bb397895.aspx
In the mean time I've found this very nice workaround using EF for this kind of thing:
var query = "SELECT * Deployment d JOIN Deployment_Report r d.FSJ_Deployment_ID = r.Workflow_ID = #WorkflowID d.FSJ_Deployment_ID = 339"
var parm = new SqlParameter(parameterName="WorkFlowID" value = myvalue);
using (var db = new MyEntities()){
db.Database.SqlQuery<MyReturnType>(query, parm.ToArray());
}
All you have to do is create a model for what you want SQL to return and it will fill in all the values you want. The values you are after are all the fields that are returned by the "Select *"...
There's even a really cool way to get EF to help you. First find the table with the most fields, and get EF to generated the model for you. Then you can write another class that inherits from that class adding in the other fields you want. SQL is able to find all fields added regardless of class hierarchy. It makes your job simple.
Warning, make sure your filed names in the class are exactly the same (case sensitive) as those in the database. The goal is to make a super class model that contains all the fields of all the join activity. SQL just knows how to put them into that resultant class giving you strong typing ability and even more important use-ability with LINQ
You can even use dataannotations in the Super Class Model for displaying other names you prefer to the User, this is a super nice way to keep the table field names but show the user something more user friendly.

Arel: Left outer join using symbols

I have this use case where I get the symbolized deep associations from a certain model, and I have to perform certain queries that involve using outer joins. How can one do it WITHOUT resorting to write the full SQL by hand?
Answers I don't want:
- using includes (doesn't solve deep associations very well ( .includes(:cars => [:windows, :engine => [:ignition]..... works unexpectedly ) and I don't want its side-effects)
- writing the SQL myself (sorry, it's 2013, cross-db support, etc etc..., and the objects I fetch are read_only, more side-effects)
I'd like to have an Arel solution. I know that using the arel_table's from the models I can construct SQL expressions, there's also a DSL for the joins, but somehow i cannot use it in the joins method from the model:
car = Car.arel_table
engine = Engine.arel_table
eng_exp = car.join(engine).on(car[:engine_id].eq(engine[:id]))
eng_exp.to_sql #=> GOOD! very nice!
Car.joins(eng_exp) #=> Breaks!!
Why this doesn't work is beyond me. I don't know exactly what is missing. But it's the closest thing to a solution I have now. If somebody could help me completing my example or provide me with a nice work-around or tell me when will Rails include such an obviously necessary feature will have my everlasting gratitude.
This is an old question, but for the benefit of anyone finding it through search engines:
If you want something you can pass into .joins, you can either use .create_join and .create_on:
join_on = car.create_on(car[:engine_id].eq(engine[:id]))
eng_join = car.create_join(engine, join_on, Arel::Nodes::OuterJoin)
Car.joins(eng_join)
OR
use the .join_sources from your constructed join object:
eng_exp = car.join(engine, Arel::Nodes::OuterJoin).on(car[:engine_id].eq(engine[:id]))
Car.joins(eng_exp.join_sources)
I found a blog post that purports to address this problem: http://blog.donwilson.net/2011/11/constructing-a-less-than-simple-query-with-rails-and-arel/
Based on this (and my own testing), the following should work for your situation:
car = Car.arel_table
engine = Engine.arel_table
sql = car.project(car[Arel.star])
.join(engine, Arel::Nodes::OuterJoin).on(car[:engine_id].eq(engine[:id]))
Car.find_by_sql(sql)
If you don't mind adding a dependency and skipping AREL altogether, you could use Ernie Miller's excellent Squeel gem. It would be something like
Car.joins{engine.outer}.where(...)
This would require that the Car model be associated with Engine like so:
belongs_to :engine

ActiveRecord .select(): Possible to clear old selects?

Is there a way to clear old selects in a .select("table.col1, ...") statement?
Background:
I have a scope that requests accessible items for a given user id (simplified)
scope :accessible, lambda { |user_id|
joins(:users).select("items.*")
.where("items_users.user_id = ?) OR items.created_by = ?", user_id, user_id)
}
Then for example in the index action i only need the item id and title, so i would do this:
#items = Item.accessible(#auth.id).select("polls.id, polls.title")
However, this will select the columns "items., items.id, items.title". I'd like to avoid removing the select from the scope, since then i'd have to add a select("items.") everywhere else. Am I right to assume that there is no way to do this, and i either live with fetching too many fields or have to use multiple scopes?
Fortunately you're wrong :D, you can use the #except method to remove some parts of the query made by the relation, so if you want to remove the SELECT part just do :
#items = Item.accessible(#auth.id).except(:select).select("polls.id, polls.title")
reselect (Rails 6+)
Rails 6 introduced a new method called reselect, which does exactly what you need, it replaces previously set select statement.
So now, your query can be written even shorter:
#items = Item.accessible(#auth.id).reselect("polls.id, polls.title")