Nested subquery in Django ORM

I need to translate this query to the Django ORM, but I can't figure out how.
SELECT SUM(income)
FROM (
SELECT COUNT(keyword)*
CASE
WHEN country='ca' THEN 390
WHEN country='fi' THEN 290
WHEN country='it' THEN 280
WHEN country='nl' THEN 260
ELSE 250
END AS income
FROM analytics_conversions
WHERE keyword = 'online'
AND click_time BETWEEN '2022-06-01' AND '2022-06-30'
GROUP BY country) as _
This is the code I have now, but it returns multiple rows. These rows should be summed so that only a single row is returned, because the result is used in a subquery.
keywords_conversions_params = {
'keyword': OuterRef('keyword'),
'keyword_type': OuterRef('keyword_type')
}
keywords_conversions_value = Conversions.objects.filter(
**keywords_conversions_params).order_by().values('keyword').annotate(
value=Count('pk') * Case(
When(country='ca', then=350),
When(country='fi', then=290),
When(country='it', then=280),
When(country='nl', then=260),
default=250
)).values('value')

I managed to fix this by removing the grouping and simply summing the conditional values. This is the fixed code.
keywords_conversions_params = {
'keyword': OuterRef('keyword'),
'keyword_type': OuterRef('keyword_type')
}
keywords_conversions_value = Conversions.objects.filter(
**keywords_conversions_params).values('keyword').annotate(
value=Sum(Case(
When(country='ca', then=350),
When(country='fi', then=290),
When(country='it', then=280),
When(country='nl', then=260),
default=250
))).values('value')
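Why dropping the inner grouping gives the same total: every row of a given country contributes the same rate, so COUNT * rate per country, summed over countries, equals the per-row rate summed over all rows. Below is a minimal sketch of that equivalence using Python's built-in sqlite3 module instead of Django. The table name and sample rows are made up for illustration, and the rates follow the SQL in the question.

```python
import sqlite3

# Hypothetical in-memory table standing in for analytics_conversions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversions (keyword TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO conversions VALUES (?, ?)",
    [("online", "ca"), ("online", "ca"), ("online", "fi"),
     ("online", "de"), ("offline", "ca")],
)

# Per-country rate, copied from the SQL in the question.
RATE = """CASE
    WHEN country = 'ca' THEN 390
    WHEN country = 'fi' THEN 290
    WHEN country = 'it' THEN 280
    WHEN country = 'nl' THEN 260
    ELSE 250
END"""

# Original shape: group by country, then sum the per-country incomes.
nested = f"""
SELECT SUM(income) FROM (
    SELECT COUNT(keyword) * {RATE} AS income
    FROM conversions WHERE keyword = 'online'
    GROUP BY country
)"""

# Flattened shape: sum the rate of every matching row directly.
flat = f"SELECT SUM({RATE}) FROM conversions WHERE keyword = 'online'"

nested_total = conn.execute(nested).fetchone()[0]
flat_total = conn.execute(flat).fetchone()[0]
print(nested_total, flat_total)
```

Both queries return a single row, which is what a subquery used as a scalar value needs.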

Related

UNION ALL Slower than N queries

This question follows up on an earlier question, in which I wanted to select the MAX value of multiple fields while retrieving the corresponding row for each.
The accepted answer with UNION ALL worked like a charm but I now have some scaling issues.
To give some context, I have more than 3 million rows in my matches table and the filters used in the WHERE condition can reduce this dataset to about 5000-6000 rows. I'm using PostgreSQL.
The query takes about 14-16 seconds to run. The strange thing is that if I run the sub-queries one at a time, each takes about 150 ms.
So if my math is correct, the total duration should be 150 ms * 20 (the number of fields to select the max value for) = 3 seconds, not 16.
Why does the entire query take so much time?
Here are some questions I have about that:
Is it simply better to run 20 queries and aggregate the final result?
Can I speed up my query by using an index?
Is it possible to apply the WHERE filters and the JOIN only once instead of repeating them in every sub-query?
PS: here is the Node.js code I use if you want to read the query in a more readable way than the 500 lines of the pastebin:
const fields = [
'match_players.kills',
'match_players.deaths',
'match_players.assists',
'match_players.gold',
'matches.game_duration',
'match_players.minions',
'match_players.kda',
'match_players.damage_taken',
'match_players.damage_dealt_champions',
'match_players.damage_dealt_objectives',
'match_players.kp',
'match_players.vision_score',
'match_players.critical_strike',
'match_players.time_spent_living',
'match_players.heal',
'match_players.turret_kills',
'match_players.killing_spree',
'match_players.double_kills',
'match_players.triple_kills',
'match_players.quadra_kills',
'match_players.penta_kills',
]
const query = fields
.map((field) => {
return `
(SELECT
'${field}' AS what,
${field} AS amount,
match_players.win as result,
matches.id,
matches.date,
matches.gamemode,
match_players.champion_id
FROM
match_players
INNER JOIN
matches
ON
matches.id = match_players.match_id
WHERE
match_players.summoner_puuid = :puuid
AND match_players.remake = 0
AND matches.gamemode NOT IN (800, 810, 820, 830, 840, 850, 2000, 2010, 2020)
ORDER BY
${field} DESC, matches.id
LIMIT
1)
`
})
.join('UNION ALL ')
const { rows } = await Database.rawQuery(query, { puuid })
Thanks a lot for your time.
If your database engine and API support common table expressions (WITH keyword), then you could first perform the query that makes the join and the filtering, and then use the result set for performing the UNION ALL:
const query = `
WITH base as (
SELECT
${fields.join()},
match_players.win as result,
matches.id,
matches.date,
matches.gamemode,
match_players.champion_id
FROM
match_players
INNER JOIN
matches
ON
matches.id = match_players.match_id
WHERE
match_players.summoner_puuid = :puuid
AND match_players.remake = 0
AND matches.gamemode NOT IN (800, 810, 820, 830, 840, 850, 2000, 2010, 2020)
)
` + fields.map((field) => `
(SELECT
'${field}' AS what,
${field.split(".").pop()} AS amount,
result,
id,
date,
gamemode,
champion_id
FROM
base
ORDER BY
2 DESC, id
LIMIT
1)
`).join(' UNION ALL ');
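As a rough check that the CTE form returns the same kind of result, here is a small sketch in Python against an in-memory SQLite database rather than PostgreSQL. The sample rows, the reduced field list and the shortened gamemode filter are assumptions for illustration. Two things differ from the query above: SQLite does not accept ORDER BY and LIMIT directly on a UNION ALL branch, so each branch is wrapped in SELECT * FROM (...), and the CTE columns are aliased explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (id INTEGER PRIMARY KEY, gamemode INTEGER);
CREATE TABLE match_players (match_id INTEGER, summoner_puuid TEXT,
                            remake INTEGER, kills INTEGER, deaths INTEGER);
INSERT INTO matches VALUES (1, 400), (2, 400), (3, 800);
INSERT INTO match_players VALUES
    (1, 'p1', 0, 5, 9),
    (2, 'p1', 0, 12, 3),
    (3, 'p1', 0, 30, 30);
""")

fields = ["match_players.kills", "match_players.deaths"]

# The join and the filters run once, inside the CTE.
select_list = ", ".join(f"{f} AS {f.split('.')[-1]}" for f in fields)
base = f"""
WITH base AS (
    SELECT {select_list}, matches.id AS id
    FROM match_players
    INNER JOIN matches ON matches.id = match_players.match_id
    WHERE match_players.summoner_puuid = :puuid
      AND match_players.remake = 0
      AND matches.gamemode NOT IN (800)
)"""

def branch(field):
    """One 'top row for this field' query that reads only from the CTE."""
    column = field.split(".")[-1]
    return f"""
SELECT * FROM (
    SELECT '{field}' AS what, {column} AS amount, id
    FROM base ORDER BY amount DESC, id LIMIT 1
)"""

query = base + " UNION ALL ".join(branch(f) for f in fields)
rows = conn.execute(query, {"puuid": "p1"}).fetchall()
print(rows)
```

Match 3 has the highest values but is excluded by the gamemode filter, so the top rows come from matches 2 and 1.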

SQL GROUP BY function returning incorrect SUM amount

I've been working on this problem, researching what I could be doing wrong but I can't seem to find an answer or fault in the code that I've written. I'm currently extracting data from a MS SQL Server database, with a WHERE clause successfully filtering the results to what I want. I get roughly 4 rows per employee, and want to add together a value column. The moment I add the GROUP BY clause against the employee ID, and put a SUM against the value, I'm getting a number that is completely wrong. I suspect the SQL code is ignoring my WHERE clause.
Below is a small selection of data:
hr_empl_code  hr_doll_paid
1             20.5
1             51.25
1             102.49
1             560
I expect that a GROUP BY with SUM would give me the value of 734.24. The value I'm given is 211461.12. While troubleshooting, I added a COUNT(*) column to my query to work out how many rows it's running against, and it gives a result of 1152, which further reinforces my belief that it's ignoring my WHERE clause.
My SQL code is as below. Most of it has been generated by the front-end application that I'm running it from, so there is some additional code in there that I believe does assist the query.
SELECT DISTINCT
T000.hr_empl_code,
SUM(T175.hr_doll_paid)
FROM
hrtempnm T000,
qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE 1 = 1
AND T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
AND (T175.hr_paym_type = 'd' OR T175.hr_paym_type = 't')
GROUP BY T000.hr_empl_code
ORDER BY hr_empl_code
I'm really lost as to where it could be going wrong. I have stripped out the additional WHERE ... AND conditions and brought it down to just T166.hr_empl_code = T175.hr_empl_code, but it doesn't make a difference.
I'm by no means an expert in SQL Server and queries, but I have a decent grasp of the technology. Any help would be much appreciated!
GROUP BY is not wrong; the way you are using it is.
SELECT
T000.hr_empl_code,
T.totpaid
FROM
hrtempnm T000
inner join (SELECT
hr_empl_code,
SUM(hr_doll_paid) as totPaid
FROM
hrtpaytp T175
where hr_paym_type = 'd' OR hr_paym_type = 't'
GROUP BY hr_empl_code
) T on t.hr_empl_code = T000.hr_empl_code
where exists
(select * from qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
)
ORDER BY hr_empl_code
Note: It would be clearer if you had used explicit JOINs instead of old-style joins in the WHERE clause.
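The figures in the question are consistent with join fan-out rather than an ignored WHERE clause: 1152 rows / 4 expected rows = 288, and 734.24 * 288 = 211461.12, so each payment row appears to be counted 288 times. The sketch below reproduces the effect on a small scale with Python's sqlite3 module. The two tables are simplified stand-ins with made-up column names, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pay (emp INTEGER, paid REAL);       -- stands in for hrtpaytp
CREATE TABLE payroll (emp INTEGER, run TEXT);    -- stands in for hrtmspay
INSERT INTO pay VALUES (1, 20.5), (1, 51.25), (1, 102.49), (1, 560);
INSERT INTO payroll VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# Joining on the employee code alone pairs every pay row with every
# payroll row, so each amount is counted three times here.
inflated, n_rows = conn.execute("""
    SELECT SUM(pay.paid), COUNT(*)
    FROM pay JOIN payroll ON payroll.emp = pay.emp
    GROUP BY pay.emp
""").fetchone()

# Aggregating first and only testing for existence keeps one row per employee.
correct = conn.execute("""
    SELECT t.total
    FROM (SELECT emp, SUM(paid) AS total FROM pay GROUP BY emp) AS t
    WHERE EXISTS (SELECT 1 FROM payroll WHERE payroll.emp = t.emp)
""").fetchone()[0]

print(round(inflated, 2), n_rows, round(correct, 2))
```

A COUNT(*) that is a whole multiple of the expected row count is usually the quickest sign that a join, not the filter, is the problem.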

SQL Server: weighted average + GROUP BY

I am trying to calculate a weighted average in SQL Server. I'm aware that there are tons of questions out there addressing the problem, but I have the additional problem that I query a lot of other columns with a GROUP BY and aggregate functions like sum() and avg().
Here is my query:
SELECT
AVG(tauftr.kalkek) AS 'PurchPrice',
SUM(tauftr.amount) AS 'Amount',
AVG(tauftr.price) AS 'SellingPrice',
tauftr.product AS 'Product',
auftrkopf.ins_usr AS 'Seller',
DATEPART(wk, auftrkopf.date) AS 'Week',
AVG([margin]) AS 'Margin' /* <--- THIS IS WRONG */
/* CALCULATE HERE: WEIGHTED AVERAGE BETWEEN 'amount' and 'margin' */
FROM
[tauftr] AS tauftr
JOIN
tauftrkopf AS auftrkopf ON tauftr.linktauftrkopf = auftrkopf.kopfnr
WHERE
auftrkopf.[status] = 'L'
AND auftrkopf.typ = 'B'
AND auftrkopf.date >= '01.03.2017'
AND auftrkopf.ins_usr ='HS'
GROUP BY
tauftr.product, auftrkopf.ins_usr, DATEPART(wk,auftrkopf.date)
I suppose it would be possible to use an INNER JOIN with exactly the same WHERE clause, but I don't want to execute the query twice. And I don't know which field to JOIN on...
Is it possible without creating a table? (I do not have write permissions)
Assuming you want the Weighted Avg Margin ...
Select
...
sum(amount*margin)/sum(amount) as 'Weighted Avg'
...
From ...
Group By ...
Edit - To avoid the dreaded Divide-By-Zero
case when sum(amount)=0 then null else sum(amount*margin)/sum(amount) end as 'Weighted Avg'
Edit 2 - NullIf()
...
sum(amount*margin)/NullIf(sum(amount),0) as 'Weighted Avg'
...
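To see how the weighted average differs from a plain AVG, and what NullIf() does when the total amount is zero, here is a small sketch using Python's sqlite3 module (SQLite also provides NULLIF). The table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL, margin REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 10, 0.20), ("A", 30, 0.40),  # weighted: (2 + 12) / 40
     ("B", 0, 0.50)],                    # total amount is zero
)

rows = conn.execute("""
    SELECT product,
           AVG(margin) AS plain_avg,
           SUM(amount * margin) / NULLIF(SUM(amount), 0) AS weighted_avg
    FROM sales
    GROUP BY product
    ORDER BY product
""").fetchall()
print(rows)
```

For product A the larger order pulls the weighted margin above the plain average; for product B the zero total yields NULL (None in Python) instead of a division error.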

Update column within CASE statement with results of a subquery postgres

I need to update a column based on the results of a subquery. If the subquery returns a result for a row, the column must be updated with that result; if the subquery returns no result for that row, I need to update the column with 0.
I do not know where to place the subquery and how to combine it with the CASE statement. This is what I thought but the syntax is not correct. Can anybody help please?
(SELECT datazones.ogc_fid, count(*) as total
FROM suppliersnew suppliers, datazone_report_resupply datazones
WHERE St_contains(datazones.geom, suppliers.geometry) AND (suppliers.status = 'Under construction' OR
suppliers.status = 'Unknown' OR suppliers.status = 'Operational') GROUP by datazones.ogc_fid ORDER BY total ASC) sources
UPDATE datazone_report_resupply
SET es_actual =
CASE
WHEN datazone_report_resupply.ogc_fid = sources.ogc_fid THEN sources.total
ELSE 0
END
The query is a little hard to follow, because the aggregation is on the outer column (this is unusual). However, you don't need aggregation or order by. You only seem to care whether a row exists.
I think the logic is:
UPDATE datazone_report_resupply r
SET es_actual =
(CASE WHEN EXISTS (SELECT 1
FROM suppliersnew s
WHERE St_contains(r.geom, s.geometry) AND
s.status IN ('Under construction', 'Unknown', 'Operational')
)
THEN 1 ELSE 0
END);
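If the count itself is wanted rather than a 0/1 flag, as sources.total in the question suggests, a correlated scalar subquery is one option. The sketch below uses Python's sqlite3 module. SQLite has no St_contains, so a hypothetical zone_id column stands in for the spatial test; the table names and rows are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datazones (ogc_fid INTEGER PRIMARY KEY, es_actual INTEGER);
-- zone_id is a hypothetical stand-in for the St_contains() spatial test
CREATE TABLE suppliers (zone_id INTEGER, status TEXT);
INSERT INTO datazones (ogc_fid) VALUES (1), (2), (3);
INSERT INTO suppliers VALUES
    (1, 'Operational'), (1, 'Unknown'), (1, 'Closed'), (2, 'Closed');
""")

# COUNT(*) over an empty match returns 0, so no CASE or ELSE branch is needed.
conn.execute("""
    UPDATE datazones
    SET es_actual = (
        SELECT COUNT(*)
        FROM suppliers s
        WHERE s.zone_id = datazones.ogc_fid
          AND s.status IN ('Under construction', 'Unknown', 'Operational')
    )
""")

result = conn.execute(
    "SELECT ogc_fid, es_actual FROM datazones ORDER BY ogc_fid"
).fetchall()
print(result)
```

Zone 1 gets 2 (the 'Closed' supplier is filtered out), while zones 2 and 3 get 0.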

T-SQL Sum Values of Like Rows

I currently use this select statement in SSRS to report Recent Demand and Days of Inventory to end users.
select Issue.MATERIAL_NUMBER,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = #CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER,Main.QUANTITY_TOTAL_STOCK
Pretty simple.
But it has come to my attention that there are similar Material Numbers whose values need to be combined.
Material     | Qty
0242-55161W  | 1
0242-55161   | 3
The two Material Numbers above should be combined and reported as 0242-55161 Qty 4.
How do I combine rows like this? This is just 1 of many queries that will need to be adjusted. Is it possible?
EDIT - The similar material will always be the base number plus the "W", if that matters.
Please note I am brand new to SQL and SSRS, and this is my first time posting here.
Let me know if I need to include any other details.
Thanks in advance.
Answer:
Using just replace, it kept returning 2 unique lines even when using SUM.
I was able to get the desired result using the following. Can you see anything wrong with this method?
with Issue_Con AS
(
select replace(Issue.MATERIAL_NUMBER,'W','') As [MATERIAL_NUMBER],
Issue.SHIPPED_QTY AS [SHIPPED_QTY]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTSHIP
and Issue.SALES_ORDER_TYPE_CODE = 'ZTPC'
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
)
select Issue_Con.MATERIAL_NUMBER,
SUM(Issue_Con.SHIPPED_QTY)AS [DEMAND_QTY],
Main_Con.QUANTITY_TOTAL_STOCK / SUM(Issue_Con.SHIPPED_QTY) * 122 AS [DOI]
From Issue_Con
join Main_Con on Main_Con.MATERIAL_Number = Issue_Con.MATERIAL_Number
Group By Issue_Con.MATERIAL_NUMBER, Main_Con.QUANTITY_TOTAL_STOCK;
You need to replace Issue.MATERIAL_NUMBER in the select and group by with something else. What that something else is depends on your data.
If it's always 10 characters with anything afterwards ignored, then you can use substring(Issue.MATERIAL_NUMBER, 1, 10)
If the extraneous character is always W and there are no Ws in the proper number, then you can use replace(Issue.MATERIAL_NUMBER, 'W', '')
If everything from the first alphabetic character onwards should be dropped, then you can use case when patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER) = 0 then Issue.MATERIAL_NUMBER else substring(Issue.MATERIAL_NUMBER, 1, patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER) - 1) end
You could group your data by this expression instead of MATERIAL_NUMBER:
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END
That is, check if the last character is W. If it is, return all but the last character, otherwise return the entire value.
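The same rule can be sketched outside the database to check how it groups values. This is a small Python illustration; the first two shipments come from the question, and 'W100-200' is a made-up value included to show that only a trailing W is removed.

```python
from collections import defaultdict

def material_group(material_number: str) -> str:
    """Drop a single trailing 'W'; leave every other value untouched."""
    if material_number.endswith("W"):
        return material_number[:-1]
    return material_number

# (material number, shipped quantity)
shipments = [("0242-55161W", 1), ("0242-55161", 3), ("W100-200", 5)]

# Equivalent of GROUP BY <expression> with SUM(SHIPPED_QTY).
totals = defaultdict(int)
for material, qty in shipments:
    totals[material_group(material)] += qty

print(dict(totals))
```

A plain replace of every 'W' would also rewrite 'W100-200', which is why checking only the last character is the safer choice when a W can appear elsewhere in a number.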
To avoid repeating the same expression twice (once in GROUP BY and once in SELECT) you could use a subselect, for example like this:
select Issue.MATERIAL_NUMBER_GROUP,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From (
SELECT
*,
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END AS MATERIAL_NUMBER_GROUP
FROM AGS_DATAMART.dbo.GOODS_ISSUE
) AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = #CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER_GROUP,Main.QUANTITY_TOTAL_STOCK