SQL getting averages with multiple joins - sql

I'm trying to write a single query using 3 tables.
The tables and their columns that I will be using are:
Sec – ID, Symbol
Hss – Code, HDate, Holiday
Fddd – ID, Date, Price
Given a symbol AAA, I need to get the ID from the first table and match it with the ID from the third table. The second table's date must match the third table's dates with the condition of Code=1 and Holiday=1.
The Dates in the second and third table are in ascending order with most recent dates at the bottom. I want to get the average 50 day and 200 day prices. The dates in the tables are in ascending order so I want to make it descending and select the top 50 and 200 to get the average prices.
So far I can only get one average. I cannot add a second SELECT TOP 50 or add a subquery within the second avg().
SELECT AVG(TwoHun)TwoHunAvg --, AVG(Fifty) AS FiftyAvg
FROM (SELECT TOP 200 Fddd.price AS TwoHun --, TOP 50 Fddd.price AS Fifty
FROM Sec
JOIN Fddd
ON Sec.ID = Fddd.ID AND Sec.symbol = 'AAA'
JOIN Hss
ON Fddd.date = Hss.Hdate AND Hss.Code = 1 AND Hss.Holiday = 1
ORDER BY Fddd.Date DESC) AS tmp;
Thanks in advance!

Consider a union query which even scales for other Averages. I add a Type column to indicate the Averages.
SELECT '200 DAY AVG' As Type, AVG(TwoHun) As Avg
FROM
(SELECT TOP 200 Fddd.price AS TwoHun
FROM Sec
INNER JOIN Fddd ON Sec.ID = Fddd.ID
INNER JOIN Hss ON Fddd.date = Hss.Hdate
WHERE Sec.symbol = 'AAA' AND Hss.Code = 1 AND Hss.Holiday = 1
ORDER BY Fddd.Date DESC) AS tmp;
UNION
SELECT '50 DAY AVG' As Type, AVG(FiftyHun) As Avg
FROM
(SELECT TOP 50 Fddd.price AS FiftyHun
FROM Sec
INNER JOIN Fddd ON Sec.ID = Fddd.ID
INNER JOIN Hss ON Fddd.date = Hss.Hdate
WHERE Sec.symbol = 'AAA' AND Hss.Code = 1 AND Hss.Holiday = 1
ORDER BY Fddd.Date DESC) AS tmp;
Also, I moved some of your join expressions to where clause which should not change performance but does in readability.

I suspect your using SQL Server or MS Access.
One quick solution would be to have your total query as a subquery and then copy a modified version as a second subquery.
Quick rough example:
SELECT (SELECT
AVG(Fifty) AS FiftyAvg
FROM (SELECT TOP 50
Fddd.price AS Fifty
FROM Sec
JOIN Fddd
ON Sec.ID = Fddd.ID
AND Sec.symbol = 'AAA'
JOIN Hss
ON Fddd.date = Hss.Hdate
AND Hss.Code = 1
AND Hss.Holiday = 1
ORDER BY Fddd.Date DESC) AS tmp)
AS FiftyAvg,
(SELECT
AVG(TwoHun) TwoHunAvg
FROM (SELECT TOP 200
Fddd.price AS TwoHun
FROM Sec
JOIN Fddd
ON Sec.ID = Fddd.ID
AND Sec.symbol = 'AAA'
JOIN Hss
ON Fddd.date = Hss.Hdate
AND Hss.Code = 1
AND Hss.Holiday = 1
ORDER BY Fddd.Date DESC) AS tmp)
AS TwoHundredAverge;

Related

Problem optimizing sql query with cross apply sub query

So I have three tables:
MakerParts, that holds the primary information of a Vehicle Part:
Id
MakerId
PartNumber
Description
1
1
ABC1234
Tire
2
1
XYZ1234
Door
MakerPrices, that holds the price history variation for the parts (references MakerParts.Id on MakerPartNumberId, and the table MakerPriceUpdates on UpdateId):
Id
MakerPartNumberId
UpdateId
Price
1
1
1
9.83
2
1
2
11.23
MakerPriceUpdates, that holds the date of prices updates. This update is basically a CSV file that is uploaded to our system. One file, one line on this table, multiple prices changes on the table MakerPrices.
Id
Date
FileName
1
2019-01-09 00:00:00.000
temp.csv
2
2019-01-11 00:00:00.000
temp2.csv
This means that one part (MakerParts) may have multiple prices (MakerPrices). The date of the price change is on the table MakerPricesUpdates.
I want to select all MakerParts where the most recent price is zero, filtering by the MakerId on table MakerParts.
What I've tried:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
But that is absurdly slow (we have about 100 million lines on the MakerPrices table). I'm having a hard time optimizing this query. (the result is only two rows for the MakerId 1, and it took 2 mins to run). I also tried:
select * from (
select
mp.*,
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId
where MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as Price
from MakerParts mp) as temp
where temp.Price = 0 and MakerId = 1
Same result, and same time. My query plan (for the first query) (no new indexes suggested by Management Studio):
I think you can avoid joining MakerPriceUpdates with makerprices since with the highest
UpdateId you can find the latest price updates. It will save you some time.
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices where
MakerPrices.MakerPartNumberId = mp.Id order by MakerPrices.UpdateId desc) as p
where mp.MakerId = 1 and p.Price = 0
You can further reduced some times by avoiding sort and order by with cte and row_number() as below:
;with LatestMakerPrices as
(
select *,row_number()over(partition by MakerPartNumberId order by updateid desc)rn from MakerPrices
)
select mp.* from MakerParts mp cross apply
(select price from LatestMakerPrices lmp where lmp.MakerPartNumberId=mp.Id) as p
where mp.MakerId = 1 and p.Price = 0
Execution plan difference between query in question and my answer:
try:
WITH tab AS (
SELECT *, NULL as Price FROM MakerParts
WHERE not exists (
SELECT Id
FROM MakerPrices
WHERE MakerPrices.MakerPartNumberId = MakerParts.Id
)
)
SELECT * from tab WHERE MakerId = 2
UNION ALL
SELECT a.* , Price
FROM [dbo].[MakerParts] a
LEFT JOIN [dbo].[MakerPrices] b
ON b.MakerPartNumberId = a.Id
WHERE MakerId = 2 AND Price = 0
Try your query:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
After creating below index:
CREATE NONCLUSTERED INDEX [NCIdx_MakerPrices_MakerPartNumberId_UpdateId] ON [dbo].[MakerPrices]
(
[MakerPartNumberId] ASC,
[UpdateId] ASC
)
INCLUDE([Price])
And making ID column of MakerPricesUpdates table primary key.

Incorrect count value in subquery on table join column

The query I'm executing seems to be ignoring the where clause in the subquery
(select count(amazon) from orders where b.amazon = 2 and manifest = a.dbid)
column amazon is type INT
SQL SERVER 2014
If I run the query on its own and enter the value for manifest I get the correct result which I am expecting and is 1
select count(amazon) from orders where amazon = 2 and manifest = '211104'
Result Returns 1
When I run the query below I get a result of 5 which is the count of all orders where manifest = 211104 but the value of amazon is 1 in 4 results and 2 in 1 result.
Select distinct
top 30 DBID, today ,sum([amazon-orders])
From
(
SELECT [dbid], [today],
(select count(amazon) from orders
where b.amazon = 2 and manifest = a.dbid) as [amazon-orders]
FROM [manifest] a
join orders b on a.[dbid] = b.[manifest]
) t1
Group By
DBID, today
order by dbid desc
Can someone please help me.
Thanks
You have an extra join so you are counting multiple times... do this:
Select distinct
top 30 DBID, today ,sum([amazon-orders])
From
(
SELECT [dbid], [today],
(select count(amazon) from orders b
where b.amazon = 2 and manifest = a.dbid) as [amazon-orders]
FROM [manifest] a
) t1
Group By
DBID, today
order by dbid desc
or like this
SELECT [dbid], [today], count(o.amazon)
FROM [manifest] a
join orders o on a.dbid = o.manifest and o.amazon = 2
group by dbid, today
or this if you have columns you don't want to join (there is more going on than just this one join in your query and you need to use a left join):
SELECT [dbid], [today], sum(case when o.amazon is not null then 1 else 0 end)
FROM [manifest] a
left join orders o on a.dbid = o.manifest and o.amazon = 2
group by dbid, today
How's this?
SELECT top 30 [dbid], [today], sum(case when b.amazon = 2 then 1 else 0 end) as [amazon-orders]
FROM [manifest] a
join orders b on a.[dbid] = b.[manifest]
group by DBID, today
order by dbid desc
Pretty sure that because your query is in effect joining orders twice it's increasing the count.
Use this :
select a.DBID, a.today, count(b.amazon) from [manifest] a
join orders b on a.[dbid] = b.[manifest] and b.amazon = 2
Group By a.DBID, a.today

T-SQL cursor or if or case when

I have this table:
Table_NAME_A:
quotid itration QStatus
--------------------------------
5329 1 Assigned
5329 2 Inreview
5329 3 sold
4329 1 sold
4329 2 sold
3214 1 assigned
3214 2 Inreview
Result output should look like this:
quotid itration QStatus
------------------------------
5329 3 sold
4329 2 sold
3214 2 Inreview
T-SQL query, so basically I want the data within "sold" status if not there then "inreview" if not there then "assigned" and also at the same time if "sold" or "inreview" or "assigned" has multiple iteration then i want the highest "iteration".
Please help me, thanks in advance :)
This is a prioritization query. One way to do this is with successive comparisons in a union all:
select a.*
from table_a a
where quote_status = 'sold'
union all
select a.*
from table_a a
where quote_status = 'Inreview' and
not exists (select 1 from table_a a2 where a2.quoteid = a.quoteid and a2.quotestatus = 'sold')
union all
select a.*
from table_a a
where quote_status = 'assigned' and
not exists (select 1
from table_a a2
where a2.quoteid = a.quoteid and a2.quotestatus in ('sold', 'Inreview')
);
For performance on a larger set of data, you would want an index on table_a(quoteid, quotestatus).
You want neither cursors nor if/then for this. Instead, you'll use a series of self-joins to get these results. I'll also use a CTE to simplify getting the max iteration at each step:
with StatusIterations As
(
SELECT quotID, MAX(itration) Iteration, QStatus
FROM table_NAME_A
GROUP BY quotID, QStats
)
select q.quotID, coalesce(sold.Iteration,rev.Iteration,asngd.Iteration) Iteration,
coalesce(sold.QStatus, rev.QStatus, asngd.QStatus) QStatus
from
--initial pass for list of quotes, to ensure every quote is included in the results
(select distinct quotID from table_NAME_A) q
--one additional pass for each possible status
left join StatusIterations sold on sold.quotID = q.quotID and sold.QStatus = 'sold'
left join StatusIterations rev on rev.quotID = q.quotID and rev.QStatus = 'Inreview'
left join StatusIterations asngd on asngd.quotID = q.quotID and asngd.QStatus = 'assigned'
If you have a table that equates a status with a numeric value, you can further improve on this:
Table: Status
QStatus Sequence
'Sold' 3
'Inreview' 2
'Assigned' 1
And the code becomes:
select t.quotID, MAX(t.itration) itration, t.QStatus
from
(
select t.quotID, MAX(s.Sequence) As Sequence
from table_NAME_A t
inner join Status s on s.QStatus = t.QStatus
group by t.quotID
) seq
inner join Status s on s.Sequence = seq.Sequence
inner join table_NAME_A t on t.quotID = seq.quotID and t.QStatus = s.QStatus
group by t.quoteID, t.QStatus
The above may look like complicated at first, but it can be faster and it will scale easily beyond three statuses without changing the code.

Insert a batch number into a table in Oracle SQL

I have a temp table where I insert modified data extracted from a SELECT query.
In this temp table I want to group my rows into batches, so I added an indexed INT column called "BATCH_NUM"
The idea that I am hoping to achieve is this (for say 1000 results in my SELECT statement).
Pseudo Code
Batch Size = 100
Count = 0
For batch size in results set
Insert Into Temp Table (a , b , y , count)
Count++
Current SQL - inputs static value of 1 into BATCH_NUM column
INSERT INTO TEMP_TABLE
(
ASSET_ID,
PAR_PROM_INTEG_ID,
IGNORE
BATCH_NUM
)
SELECT carelevel.row_id, pstn.PROM_INTEG_ID,
CASE
WHEN promoprod.fabric_cd = 'Disabled'
THEN 'Y'
ELSE 'N'
END
'1'
FROM SIEBEL.S_ASSET carelevel
INNER JOIN SIEBEL.S_ASSET pstn
ON pstn.row_id = carelevel.par_asset_id
INNER JOIN SIEBEl.S_ASSET promotion
ON pstn.prom_integ_id = promotion.integration_id
INNER JOIN SIEBEL.S_PROD_INT prod
ON prod.row_id = carelevel.prod_id
INNER JOIN SIEBEL.S_ORG_EXT bill
ON carelevel.bill_accnt_id = bill.row_id
INNER JOIN SIEBEL.S_INV_PROF invoice
ON bill.row_id = invoice.accnt_id
INNER JOIN SIEBEL.S_PROD_INT promoprod
ON promotion.prod_id = promoprod.row_id
WHERE prod.part_num = 'Testproduct'
But if the select statement has 1000 results, then I want BATCH_NUM to go from 1,2,3,4,5,6,7,8,9,10 per 100 records.
Can this be done?
To map record to batch, you might simply want to use integer division. Or slightly more complicated as row are numbered from 1, but something like TRUNC((ROWNUM-1)/100)+1 will do the trick.
Here is a test for that mapping:
select level, trunc((ROWNUM-1)/100)+1 from dual connect by level <= 1000
Result:
ROWNUM TRUNC((ROWNUM-1)/100)+1
1 1
...
100 1
101 2
...
200 2
201 3
...
...
900 9
901 10
...
1000 10
Given your query:
INSERT INTO TEMP_TABLE
(
ASSET_ID,
PAR_PROM_INTEG_ID,
IGNORE,
BATCH_NUM
)
SELECT carelevel.row_id, pstn.PROM_INTEG_ID,
CASE
WHEN promoprod.fabric_cd = 'Disabled'
THEN 'Y'
ELSE 'N'
END,
TRUNC((ROWNUM-1)/100)+1,
-- ^^^^^^^^^^^^^^^^^^^^
-- map rows 1-100 to batch 1, rows 101-200 to batch 2 and so on
FROM SIEBEL.S_ASSET carelevel
INNER JOIN SIEBEL.S_ASSET pstn
ON pstn.row_id = carelevel.par_asset_id
INNER JOIN SIEBEl.S_ASSET promotion
ON pstn.prom_integ_id = promotion.integration_id
INNER JOIN SIEBEL.S_PROD_INT prod
ON prod.row_id = carelevel.prod_id
INNER JOIN SIEBEL.S_ORG_EXT bill
ON carelevel.bill_accnt_id = bill.row_id
INNER JOIN SIEBEL.S_INV_PROF invoice
ON bill.row_id = invoice.accnt_id
INNER JOIN SIEBEL.S_PROD_INT promoprod
ON promotion.prod_id = promoprod.row_id
WHERE prod.part_num = 'Testproduct'

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...