Django queryset LEFT OUTER JOIN against another queryset as subquery - orm

I have two Django models, X and Y. I have a queryset of X objects and I want to join it against a queryset of Y objects.
I can't use raw SQL here as the API of the code I'm working with expects a queryset to be returned.
I can do the following:
def my_method(x_queryset):
    # values('x_id') makes the subquery select x_id rather than Y's primary key
    y_subquery = Y.objects.distinct('x_id').order_by('-dt').values('x_id')
    return x_queryset.filter(id__in=y_subquery)
This generates the approximate SQL:
SELECT *
FROM x
WHERE id IN (SELECT DISTINCT ON (x_id) x_id
FROM y
ORDER BY dt DESC)
But the above is not exactly what I want (and not equivalent to what I need). What I really want to do is a LEFT OUTER JOIN, with the rough SQL being:
SELECT x.*
FROM x
LEFT OUTER JOIN
(SELECT DISTINCT ON (x_id) x_id, created
FROM y
ORDER BY dt DESC) sub ON x.id = sub.x_id;
I cannot figure out how to do a custom left outer join against a subquery with the Django ORM. Pseudocode of what I would like:
def my_method(x_queryset):
    y_subquery = Y.objects.distinct('x_id').order_by('-dt')
    # left_outer_join is hypothetical; querysets have no such method
    return x_queryset.left_outer_join(y_subquery, {'id': 'x_id'})
Any help would be greatly appreciated!

Related

In postgres, how to sort on, among others, a result value of a string_agg function, without selecting that field? Subquery not possible

I have this query (already stripped down but more complex in reality):
SELECT
e.id,
string_agg(csb.description, ',' ORDER BY cs.id ASC) AS condition_set_descriptions
FROM event e
JOIN event_trigger_version etv ON e.event_trigger_version_id = etv.id
LEFT JOIN event_condition_set ecs ON ecs.event_id = e.id
JOIN condition_set cs ON cs.id = ecs.condition_set_id
JOIN condition_set_base csb ON cs.condition_set_base_id = csb.id
JOIN subject s ON e.subject_id = s.id
GROUP BY
e.id, event_level, s.name
ORDER BY s.name ASC, condition_set_descriptions ASC, event_level DESC
LIMIT 20 OFFSET 0
Now I can have a dynamic ORDER BY including other columns too (that are omitted in this example), but always including condition_set_descriptions somewhere in the order. This column is the result of a string_agg function. I cannot move this to a subquery because the LIMIT that is set should apply to the result of the combination of ORDER BY columns that are defined.
The example works fine. The downside is that the condition_set_descriptions column is also returned in the result; it is a lot of data and is not actually needed (the actual descriptions are looked up another way, using some of the omitted data). All that is needed is that the result is sorted. How can I do this without moving the aggregate into a subquery, which would ruin the correctness of the multi-sort limited result set?
ORDER BY can work on calculated expressions too. You don't have to calculate something in the SELECT, alias it, and then reference the alias in the ORDER BY.
Take a look at doing:
SELECT
e.id
FROM
...
ORDER BY
s.name ASC,
string_agg(csb.description, ',' ORDER BY cs.id ASC) ASC,
event_level DESC
LIMIT 20 OFFSET 0
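To see the idea in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The event and tag tables and their rows are made up for illustration, and min() stands in for string_agg, so treat it as a demonstration of sorting by an aggregate that is not in the select list rather than as the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (event_id INTEGER, description TEXT);
    INSERT INTO event VALUES (1), (2), (3);
    INSERT INTO tag VALUES (1, 'm'), (1, 'z'), (2, 'a'), (3, 'k'), (3, 'b');
""")

# The aggregate appears only in ORDER BY, so it is used for sorting
# (and therefore for LIMIT) but is not returned to the caller.
rows = conn.execute("""
    SELECT e.id
    FROM event e
    JOIN tag t ON t.event_id = e.id
    GROUP BY e.id
    ORDER BY min(t.description) ASC
    LIMIT 2
""").fetchall()

event_ids = [row[0] for row in rows]
print(event_ids)
```

Each returned row has a single column (the id), while the LIMIT is still applied after sorting on the aggregate.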
Also double check that left join you have there in the middle of the inner joins; any nulls it produces will be removed again when the next table is inner joined into it, so you can either inner join it or if you're losing data adopt a pattern of:
w
JOIN x
LEFT JOIN (
y
JOIN z
) a
That is, don't left join y to x and then inner join z to y; inner join y and z first, then left join the result onto x.
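A small sketch of the difference, using Python's sqlite3 module with made-up tables x, y and z (the names mirror the pattern above; the columns and rows are assumptions for illustration). The nested form is written as an explicit subquery here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY);
    CREATE TABLE y (id INTEGER PRIMARY KEY, x_id INTEGER);
    CREATE TABLE z (id INTEGER PRIMARY KEY, y_id INTEGER, label TEXT);
    INSERT INTO x VALUES (1), (2);      -- x.id = 2 has no matching y row
    INSERT INTO y VALUES (10, 1);
    INSERT INTO z VALUES (100, 10, 'matched');
""")

# LEFT JOIN followed by INNER JOIN: the NULL y.id produced for x.id = 2
# can never satisfy z.y_id = y.id, so that row is dropped again.
chained = conn.execute("""
    SELECT x.id, z.label
    FROM x
    LEFT JOIN y ON y.x_id = x.id
    JOIN z ON z.y_id = y.id
    ORDER BY x.id
""").fetchall()

# INNER JOIN y and z first, then LEFT JOIN the result onto x.
nested = conn.execute("""
    SELECT x.id, a.label
    FROM x
    LEFT JOIN (
        SELECT y.x_id, z.label
        FROM y
        JOIN z ON z.y_id = y.id
    ) a ON a.x_id = x.id
    ORDER BY x.id
""").fetchall()

print(chained)
print(nested)
```

The chained version loses the x row without a match, while the nested version keeps it with a NULL label.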

SQL Left Join - OR clause

I am trying to join two tables. I want to join where all three identifiers (contract id, company code and book id) match in both tables; if there is no such match, fall back to matching on contract id and company code; and as a last step, match on contract id alone.
Can this be done in a single join: try all three parameters first, then the two parameters, then just the contract id?
Code:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
LEFT JOIN #claim_total C
ON ( ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd
AND C.book_id_2 = A.book_id )
OR ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd )
OR ( C.contract_id_2 = A.contract_id ) )
Your ON clause boils down to C.contract_id_2 = A.contract_id. This gets you all matches, whether a row is the most precise match (including company and book) or a lesser one. What you want is a ranking. Two methods come to mind:
Join on C.contract_id_2 = A.contract_id, then rank the rows with ROW_NUMBER and keep the best ranked ones.
Use a lateral join in order to only join the best match with TOP.
Here is the second option. You forgot to tell us which DBMS you are using. SELECT INTO looks like SQL Server. I hope I got the syntax right:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
OUTER APPLY
(
SELECT TOP(1) *
FROM #claim_total C
WHERE C.contract_id_2 = A.contract_id
ORDER BY
CASE
WHEN C.company_cd_2 = A.company_cd AND C.book_id_2 = A.book_id THEN 1
WHEN C.company_cd_2 = A.company_cd THEN 2
ELSE 3
END
) best_match; -- SQL Server requires an alias on the derived table
If you want to join all rows in case of ties (e.g. many rows matching contract, company and book), then make this TOP(1) WITH TIES.
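The first option (join on contract id, rank with ROW_NUMBER, keep the best row) is not spelled out above, so here is a rough sketch of it. It uses Python's sqlite3 module instead of SQL Server, assumes SQLite 3.25 or newer for window functions, and the table contents and the amount column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_detail (contract_id INTEGER, company_cd TEXT, book_id INTEGER);
    CREATE TABLE claim_total (contract_id_2 INTEGER, company_cd_2 TEXT,
                              book_id_2 INTEGER, amount INTEGER);
    INSERT INTO contract_detail VALUES (1, 'A', 10), (2, 'B', 20), (3, 'C', 30);
    INSERT INTO claim_total VALUES
        (1, 'A', 10, 100),  -- contract + company + book
        (1, 'A', 99, 200),  -- contract + company only
        (2, 'Z', 99, 300),  -- contract only
        (2, 'B', 99, 400);  -- contract + company only
""")

# Join on contract id alone, rank the candidate claims per contract row
# from most to least precise match, then keep only the best-ranked one.
rows = conn.execute("""
    SELECT contract_id, amount
    FROM (
        SELECT a.contract_id, c.amount,
               ROW_NUMBER() OVER (
                   PARTITION BY a.contract_id, a.company_cd, a.book_id
                   ORDER BY CASE
                       WHEN c.company_cd_2 = a.company_cd
                            AND c.book_id_2 = a.book_id THEN 1
                       WHEN c.company_cd_2 = a.company_cd THEN 2
                       ELSE 3
                   END
               ) AS rn
        FROM contract_detail a
        LEFT JOIN claim_total c ON c.contract_id_2 = a.contract_id
    )
    WHERE rn = 1
    ORDER BY contract_id
""").fetchall()

print(rows)
```

Contract 3 has no claim at all and still comes back (with a NULL amount) because of the LEFT JOIN, which matches what OUTER APPLY does in the query above.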

Filtering sqlite query

I'm having trouble with filtering in my sqlite3 query. I am working with the three tables below.
Table: models
id|data
1|car
2|truck
Table: descriptions
id|model_id|colour|make
1|1|blue|accord
2|1|green|prius
3|1|red|fusion
4|1|black|civic
5|1|white|jeep
6|1|purple|jeep
7|1|brown|jeep
8|1|brown|civic
Table: banned
model_id|colour_id|colour
1|3|black
1|15|brown
The statement below counts how many of each make there are per model (car or truck).
SELECT models.id, make, count(make)
FROM descriptions
JOIN models ON models.id = descriptions.model_id
GROUP BY models.id, descriptions.make;
The output would be:
1|accord|1
1|prius|1
1|fusion|1
1|civic|2
1|jeep|3
However, I want to put in a qualifier that voids anything containing a banned colour/model combo, by using banned.colour.
I tried joining the table and filtering out like below, but it seems to double the count.
SELECT models.id, make, count(make)
FROM descriptions
JOIN models ON models.id = descriptions.model_id
JOIN banned ON banned.model_id = models.id
WHERE NOT ( banned.colour = descriptions.colour)
GROUP BY models.id, descriptions.make;
My desired output voids the cars that match a banned combination from the count. The final result should be:
1|accord|1
1|prius|1
1|fusion|1
1|jeep|2
How can I achieve this?
You can use your approach . . . with a left join and where check:
SELECT m.id, d.make, count(*)
FROM descriptions d JOIN
models m
ON m.id = d.model_id LEFT JOIN
banned b
ON b.model_id = m.id AND b.colour = d.colour
WHERE b.model_id IS NULL -- no match
GROUP BY m.id, d.make;
A common way to write the query would also use NOT EXISTS:
SELECT m.id, d.make, count(*)
FROM descriptions d JOIN
models m
ON m.id = d.model_id
WHERE NOT EXISTS (SELECT 1
FROM banned b
WHERE b.model_id = m.id AND b.colour = d.colour
)
GROUP BY m.id, d.make;
Although you can also use NOT IN, I highly discourage using it with a subquery. It will not do what you want if any of the values returned by the subquery are NULL.
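If you want to try both variants against the sample data from the question, something like the following sketch should do. It uses Python's sqlite3 module; the column types are assumptions, since the question only shows the data, and an ORDER BY is added so the output order is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE models (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE descriptions (id INTEGER PRIMARY KEY, model_id INTEGER,
                               colour TEXT, make TEXT);
    CREATE TABLE banned (model_id INTEGER, colour_id INTEGER, colour TEXT);
    INSERT INTO models VALUES (1, 'car'), (2, 'truck');
    INSERT INTO descriptions VALUES
        (1, 1, 'blue', 'accord'), (2, 1, 'green', 'prius'),
        (3, 1, 'red', 'fusion'),  (4, 1, 'black', 'civic'),
        (5, 1, 'white', 'jeep'),  (6, 1, 'purple', 'jeep'),
        (7, 1, 'brown', 'jeep'),  (8, 1, 'brown', 'civic');
    INSERT INTO banned VALUES (1, 3, 'black'), (1, 15, 'brown');
""")

# Anti-join: keep only description rows that found no banned match.
left_join_rows = conn.execute("""
    SELECT m.id, d.make, count(*)
    FROM descriptions d
    JOIN models m ON m.id = d.model_id
    LEFT JOIN banned b ON b.model_id = m.id AND b.colour = d.colour
    WHERE b.model_id IS NULL
    GROUP BY m.id, d.make
    ORDER BY d.make
""").fetchall()

# Same filter expressed with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT m.id, d.make, count(*)
    FROM descriptions d
    JOIN models m ON m.id = d.model_id
    WHERE NOT EXISTS (SELECT 1
                      FROM banned b
                      WHERE b.model_id = m.id AND b.colour = d.colour)
    GROUP BY m.id, d.make
    ORDER BY d.make
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

Both queries drop civic entirely (both of its rows are banned colours) and count two jeeps, which is the desired output from the question.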
You may use NOT IN with a dependent subquery to avoid banned combinations.
SELECT models.id, make, count(make)
FROM descriptions
JOIN models ON models.id = descriptions.model_id
WHERE colour NOT IN (
SELECT colour
FROM banned
WHERE banned.model_id = models.id
)
GROUP BY models.id, descriptions.make;
The only pitfall of NOT IN is the possibility of a NULL in the subquery result. However, I believe that it is safe if the colour attribute is defined as NOT NULL (which makes sense in this case).
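To make the NULL pitfall concrete, here is a tiny sketch with Python's sqlite3 module. The two single-column tables are simplified stand-ins for the ones in the question, with a NULL colour added to banned on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE descriptions (colour TEXT);
    CREATE TABLE banned (colour TEXT);
    INSERT INTO descriptions VALUES ('blue'), ('black');
    INSERT INTO banned VALUES ('black'), (NULL);
""")

# 'blue' NOT IN ('black', NULL) evaluates to NULL, not TRUE,
# so the WHERE clause rejects every row.
not_in_rows = conn.execute("""
    SELECT colour FROM descriptions
    WHERE colour NOT IN (SELECT colour FROM banned)
    ORDER BY colour
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT colour FROM descriptions d
    WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.colour = d.colour)
    ORDER BY colour
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, NOT IN returns nothing at all, while NOT EXISTS still returns the unbanned colour.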

BigQuery COALESCE() with SELECT subquery

I am getting the error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN
On the following query
(SELECT DISTINCT video_id,
COALESCE(custom_id,
(SELECT custom_id FROM `test2.channel_map` b
WHERE a.channel_id = b.channel_id LIMIT 1),
'Default')
FROM `test2.revenue` a)
I am essentially trying to replace null custom_ids with another custom_id from a lookup table. Is there a better way to do this that BigQuery will accept?
Just use a regular LEFT JOIN, something like below:
SELECT DISTINCT video_id,
COALESCE(
a.custom_id,
b.custom_id,
'Default'
)
FROM `test2.revenue` a
LEFT JOIN `test2.channel_map` b
ON a.channel_id = b.channel_id
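The fallback order of COALESCE over a LEFT JOIN can be checked locally with a small sketch. This one runs on Python's sqlite3 module rather than BigQuery, and the rows are made up; it assumes at most one channel_map row per channel_id, since otherwise the join can produce more than one candidate per video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue (video_id TEXT, channel_id TEXT, custom_id TEXT);
    CREATE TABLE channel_map (channel_id TEXT, custom_id TEXT);
    INSERT INTO revenue VALUES
        ('v1', 'c1', 'own'),   -- has its own custom_id
        ('v2', 'c1', NULL),    -- falls back to the lookup table
        ('v3', 'c9', NULL);    -- no lookup row either
    INSERT INTO channel_map VALUES ('c1', 'mapped');
""")

# COALESCE returns the first non-NULL argument: the row's own custom_id,
# then the mapped one from the LEFT JOIN, then the literal default.
rows = conn.execute("""
    SELECT DISTINCT a.video_id,
           COALESCE(a.custom_id, b.custom_id, 'Default') AS custom_id
    FROM revenue a
    LEFT JOIN channel_map b ON a.channel_id = b.channel_id
    ORDER BY a.video_id
""").fetchall()

print(rows)
```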

Recursive SQL command and duplicate results?

I was wondering whether removing duplicate results with DISTINCT is the best way to handle my recursive query. Here is my code sample:
with cte as(
SELECT
dbo.Users.Username,
dbo.Contacts.FirstName,
dbo.Contacts.LastName,
tenant.Name,
tenant.Id,
tenant.ParentTenantId
FROM
dbo.Tenants AS tenant
INNER JOIN dbo.Users ON tenant.Id = dbo.Users.TenantId
INNER JOIN dbo.Contacts ON dbo.Users.ContactId = dbo.Contacts.Id
where tenant.Id = '6CD4C969-C794-4C95-9CA2-5984AEC0E32C'
union all
SELECT
dbo.Users.Username,
dbo.Contacts.FirstName,
dbo.Contacts.LastName,
childTenant.Name,
childTenant.Id,
childTenant.ParentTenantId
FROM
dbo.Tenants AS childTenant
INNER JOIN dbo.Users ON childTenant.Id = dbo.Users.TenantId
INNER JOIN dbo.Contacts ON dbo.Users.ContactId = dbo.Contacts.Id
INNER JOIN cte on childTenant.ParentTenantId = cte.Id)
select DISTINCT UserName, FirstName, LastName, Name, Id, ParentTenantId from cte ORDER BY Id
(Result screenshots, with and without the DISTINCT keyword, were attached here.)
While DISTINCT works I am wondering if it is the best way to handle the duplicate results or if I should rework my query somehow.
I'm not sure whether that is the case here, but it is a common misunderstanding that distinct is a function that applies to a certain column. Distinct applies to the row, that is:
select distinct(x), y from t
is the same as:
select distinct x, y from t
or:
select distinct x, (y) from t
Furthermore:
select x, distinct(y) from t
is an invalid construction.
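You can check this quickly with a throwaway table. The sketch below uses Python's sqlite3 module instead of SQL Server, and the table t and its rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER, y TEXT);
    INSERT INTO t VALUES (1, 'a'), (1, 'a'), (1, 'b');
""")

# The parentheses around x are just an expression grouping;
# DISTINCT still applies to the whole (x, y) row.
with_parens = conn.execute(
    "SELECT DISTINCT (x), y FROM t ORDER BY x, y").fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT x, y FROM t ORDER BY x, y").fetchall()

# DISTINCT in the middle of the select list is a syntax error.
try:
    conn.execute("SELECT x, DISTINCT (y) FROM t")
    misplaced_distinct_rejected = False
except sqlite3.OperationalError:
    misplaced_distinct_rejected = True

print(with_parens, without_parens, misplaced_distinct_rejected)
```

Both valid forms return the same two rows, even though x alone has only one distinct value.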