Selecting rows with highest date - sql

I have the following query that returns a result like the one in the example:
SELECT P.IdArt, P.IdAdr, P.gDate, P.Price
FROM dbo.T_PriceData AS P INNER JOIN
dbo.T_Adr AS A ON P.IdAdr = A.IdAdr INNER JOIN
dbo.T_Stat AS S ON A.IdStat = S.IdStat
GROUP BY P.IdArt, P.IdAdr, P.gDate, P.Price
IdArt IdAdr gDate Price
1 10 01/01/2018 1.25
1 10 02/01/2018 1.17
1 10 03/01/2018 1.18
2 15 01/01/2018 1.03
2 18 10/01/2018 0.12
3 25 12/01/2018 0.98
3 25 28/01/2018 1.99
4 30 15/01/2018 2.55
5 35 08/01/2018 0.11
The final result I want is:
When IdArt and IdAdr are the same, there should be only one row: the one with the highest date among those rows (case IdArt 1).
When IdArt is the same but IdAdr differs, there should be one row per IdAdr, each with the highest date for that IdAdr (case IdArt 2).
Price doesn't affect anything.
So the final table I would like to have is:
IdArt IdAdr gDate Price
1 10 03/01/2018 1.18
2 15 01/01/2018 1.03
2 18 10/01/2018 0.12
3 25 28/01/2018 1.99
4 30 15/01/2018 2.55
5 35 08/01/2018 0.11
How can I do that?
I tried using a HAVING clause that selects by MAX(gDate) but, of course, I only get one row with the max date from the whole database.

There are lots of answers out there on how to do this; however, this gets you what you are after:
SELECT TOP 1 WITH TIES
P.IdArt,
P.IdAdr,
P.gDate,
P.Price
FROM dbo.T_PriceData P
--INNER JOIN dbo.T_Adr A ON P.IdAdr = A.IdAdr --You don't reference this in the SELECT or WHERE. Why is it here?
--INNER JOIN dbo.T_Stat S ON A.IdStat = S.IdStat --You don't reference this in the SELECT or WHERE. Why is it here?
ORDER BY ROW_NUMBER() OVER (PARTITION BY P.IdArt, P.IdAdr ORDER BY P.gDate DESC);
Edit: If the JOINs are there to ensure that matching rows exist in the other tables, then, as per the comments, I would use EXISTS. If you just use JOIN while only returning columns from the first table, you could end up with duplicate rows.
SELECT TOP 1 WITH TIES
P.IdArt,
P.IdAdr,
P.gDate,
P.Price
FROM dbo.T_PriceData P
WHERE EXISTS (SELECT 1
FROM dbo.T_Adr A
WHERE P.IdAdr = A.IdAdr
AND EXISTS (SELECT 1 --Nested so that alias A is in scope here
FROM dbo.T_Stat S
WHERE A.IdStat = S.IdStat))
ORDER BY ROW_NUMBER() OVER (PARTITION BY P.IdArt, P.IdAdr ORDER BY P.gDate DESC);
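To see why the edit prefers EXISTS, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server (so TOP is left out). The rows are hypothetical: address 10 is assumed to appear twice in T_Adr, which is exactly the case where a plain JOIN multiplies rows and EXISTS does not.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server. Table and column
# names mirror the question; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_PriceData (IdArt INTEGER, IdAdr INTEGER, gDate TEXT, Price REAL);
    CREATE TABLE T_Adr (IdAdr INTEGER, IdStat INTEGER);
    CREATE TABLE T_Stat (IdStat INTEGER);
    INSERT INTO T_PriceData VALUES (1, 10, '2018-01-03', 1.18);
    -- Hypothetical: address 10 appears twice in T_Adr (no unique key).
    INSERT INTO T_Adr VALUES (10, 1), (10, 1);
    INSERT INTO T_Stat VALUES (1);
""")

# Plain INNER JOINs: one output row per matching T_Adr row.
join_rows = conn.execute("""
    SELECT P.IdArt, P.IdAdr, P.gDate, P.Price
    FROM T_PriceData P
    JOIN T_Adr A ON P.IdAdr = A.IdAdr
    JOIN T_Stat S ON A.IdStat = S.IdStat
""").fetchall()

# Nested EXISTS: only checks that a match exists, so rows are never multiplied.
exists_rows = conn.execute("""
    SELECT P.IdArt, P.IdAdr, P.gDate, P.Price
    FROM T_PriceData P
    WHERE EXISTS (SELECT 1
                  FROM T_Adr A
                  WHERE A.IdAdr = P.IdAdr
                    AND EXISTS (SELECT 1
                                FROM T_Stat S
                                WHERE S.IdStat = A.IdStat))
""").fetchall()

print(len(join_rows), len(exists_rows))  # 2 1
```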

You want the highest date for each IdArt/IdAdr combination. Window functions are tempting, but the most efficient method is often a correlated subquery.
Your query only selects from T_PriceData, so the rest of the query (the joins and the GROUP BY) does not seem necessary, unless the joins are filtering the data, which seems unlikely because they are joins to reference tables.
So I would recommend:
SELECT P.IdArt, P.IdAdr, P.gDate, P.Price
FROM dbo.T_PriceData P
WHERE P.gDate = (SELECT MAX(P2.gDate)
FROM dbo.T_PriceData P2
WHERE P2.IdArt = P.IdArt AND
P2.IdAdr = P.IdAdr
);
For performance, you want an index on (IdArt, IdAdr, gDate).
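A minimal sketch of the correlated-subquery answer, assuming Python's sqlite3 module as a stand-in for SQL Server. The rows are the question's sample data with the dd/mm/yyyy dates rewritten as ISO strings, so that MAX() on TEXT compares them chronologically. The last step adds a hypothetical extra row to show that this query returns every row tied on the maximum date.

```python
import sqlite3

# Question's sample rows; dates rewritten as YYYY-MM-DD text.
ROWS = [
    (1, 10, "2018-01-01", 1.25),
    (1, 10, "2018-01-02", 1.17),
    (1, 10, "2018-01-03", 1.18),
    (2, 15, "2018-01-01", 1.03),
    (2, 18, "2018-01-10", 0.12),
    (3, 25, "2018-01-12", 0.98),
    (3, 25, "2018-01-28", 1.99),
    (4, 30, "2018-01-15", 2.55),
    (5, 35, "2018-01-08", 0.11),
]

LATEST_PER_PAIR = """
    SELECT P.IdArt, P.IdAdr, P.gDate, P.Price
    FROM T_PriceData P
    WHERE P.gDate = (SELECT MAX(P2.gDate)
                     FROM T_PriceData P2
                     WHERE P2.IdArt = P.IdArt
                       AND P2.IdAdr = P.IdAdr)
    ORDER BY P.IdArt, P.IdAdr
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_PriceData (IdArt INTEGER, IdAdr INTEGER, gDate TEXT, Price REAL)")
conn.executemany("INSERT INTO T_PriceData VALUES (?, ?, ?, ?)", ROWS)
# The composite index suggested above; it covers the correlated MAX() lookup.
conn.execute("CREATE INDEX ix_pricedata ON T_PriceData (IdArt, IdAdr, gDate)")

latest = conn.execute(LATEST_PER_PAIR).fetchall()
for row in latest:
    print(row)

# Caveat: a second row with the same maximum date for (4, 30) is returned too,
# because both rows satisfy gDate = MAX(gDate).
conn.execute("INSERT INTO T_PriceData VALUES (4, 30, '2018-01-15', 2.60)")
with_tie = conn.execute(LATEST_PER_PAIR).fetchall()
print(len(with_tie))  # 7
```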

You can use ROW_NUMBER():
SELECT
q.IdArt
, q.IdAdr
, q.gDate
, q.Price
FROM (
SELECT
t.IdArt
, t.IdAdr
, t.gDate
, t.Price
, ROW_NUMBER() OVER (PARTITION BY t.IdArt, t.IdAdr ORDER BY t.gDate DESC) rn
FROM dbo.T_PriceData t
) q
WHERE q.rn = 1
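A matching sketch for the ROW_NUMBER() derived-table answer, again with Python's sqlite3 module (SQLite 3.25 or later, for window functions) standing in for SQL Server and the question's dates rewritten as ISO text. It also adds a hypothetical second row for (4, 30) on the same date to show how ties are handled differently from the MAX() comparison.

```python
import sqlite3

ROWS = [
    (1, 10, "2018-01-01", 1.25), (1, 10, "2018-01-02", 1.17),
    (1, 10, "2018-01-03", 1.18), (2, 15, "2018-01-01", 1.03),
    (2, 18, "2018-01-10", 0.12), (3, 25, "2018-01-12", 0.98),
    (3, 25, "2018-01-28", 1.99), (4, 30, "2018-01-15", 2.55),
    (5, 35, "2018-01-08", 0.11),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_PriceData (IdArt INTEGER, IdAdr INTEGER, gDate TEXT, Price REAL)")
conn.executemany("INSERT INTO T_PriceData VALUES (?, ?, ?, ?)", ROWS)

# Number the rows within each (IdArt, IdAdr) group, newest first, keep rn = 1.
LATEST_PER_PAIR = """
    SELECT q.IdArt, q.IdAdr, q.gDate, q.Price
    FROM (SELECT t.IdArt, t.IdAdr, t.gDate, t.Price,
                 ROW_NUMBER() OVER (PARTITION BY t.IdArt, t.IdAdr
                                    ORDER BY t.gDate DESC) AS rn
          FROM T_PriceData t) q
    WHERE q.rn = 1
    ORDER BY q.IdArt, q.IdAdr
"""
latest = conn.execute(LATEST_PER_PAIR).fetchall()

# ROW_NUMBER() keeps exactly one row per pair even when two rows share the
# newest date; which of the tied rows is kept is then arbitrary.
conn.execute("INSERT INTO T_PriceData VALUES (4, 30, '2018-01-15', 2.60)")
with_tie = conn.execute(LATEST_PER_PAIR).fetchall()
print(len(latest), len(with_tie))  # 6 6
```

If a deterministic winner matters, a second ORDER BY key inside OVER (...) can break the tie.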

Related

How to get the most sold Product in PostgreSQL?

Given a table products
pid name
123 Milk
456 Tea
789 Cake
... ...
and a table sales
stamp pid units
14:54 123 3
15:02 123 9
15:09 456 1
15:14 456 1
15:39 456 2
15:48 789 12
... ... ...
How would I be able to get the product(s) with the most sold units?
My goal is to run a SELECT statement that results in, for this example,
pid name
123 Milk
789 Cake
because each of those products sold 12 units in total, which is the maximum (Tea sold only 4 units, despite having more sales rows).
I have the following query:
SELECT DISTINCT products.pid, products.name
FROM sales
INNER JOIN products ON sales.pid = products.pid
INNER JOIN (
SELECT pid, SUM(units) as sum_units
FROM sales
GROUP BY pid
) AS total_units ON total_units.pid = sales.pid
WHERE total_units.sum_units IN (
SELECT MAX(sum_units) as max_units
FROM (
SELECT pid, SUM(units) as sum_units
FROM sales
GROUP BY pid
) AS total_units
);
However, this seems very long, confusing, and inefficient, even repeating the sub-query to obtain total_units, so I was wondering if there was a better way to accomplish this.
How can I simplify this? Note that I can't use ORDER BY SUM(units) LIMIT 1 in case there are multiple (i.e., >1) products with the most units sold.
Thank you in advance.
Since version 13, PostgreSQL has supported FETCH FIRST ... WITH TIES, so your query can simply be:
select p.pId, p.name
from sales s
join products p on p.pid = s.pid
group by p.pId, p.name
order by Sum(units) desc
fetch first 1 rows with ties;
Alternatively, using DENSE_RANK():
WITH cte1 AS
(
SELECT s.pid, p.name,
SUM(units) as total_units
FROM sales s
INNER JOIN products p
ON s.pid = p.pid
GROUP BY s.pid, p.name
),
cte2 AS
(
SELECT *,
DENSE_RANK() OVER(ORDER BY total_units DESC) as rn
FROM cte1
)
SELECT pid,name
FROM cte2
WHERE rn = 1
ORDER BY pid;
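A minimal sketch of the DENSE_RANK() answer on the question's sample rows, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for PostgreSQL. SQLite has no FETCH FIRST ... WITH TIES, so only the CTE version is shown; the same DENSE_RANK() pattern is what you would use on PostgreSQL versions before 13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (stamp TEXT, pid INTEGER, units INTEGER);
    INSERT INTO products VALUES (123, 'Milk'), (456, 'Tea'), (789, 'Cake');
    INSERT INTO sales VALUES
        ('14:54', 123, 3), ('15:02', 123, 9),
        ('15:09', 456, 1), ('15:14', 456, 1), ('15:39', 456, 2),
        ('15:48', 789, 12);
""")

# Total units per product, then DENSE_RANK over the totals so that every
# product tied for the maximum gets rank 1.
best_sellers = conn.execute("""
    WITH cte1 AS (
        SELECT s.pid, p.name, SUM(s.units) AS total_units
        FROM sales s
        JOIN products p ON s.pid = p.pid
        GROUP BY s.pid, p.name
    ),
    cte2 AS (
        SELECT *, DENSE_RANK() OVER (ORDER BY total_units DESC) AS rn
        FROM cte1
    )
    SELECT pid, name FROM cte2 WHERE rn = 1 ORDER BY pid
""").fetchall()
print(best_sellers)  # [(123, 'Milk'), (789, 'Cake')]
```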

SQL sum grouped by field with all rows

I have this table:
id sale_id price
-------------------
1 1 100
2 1 200
3 2 50
4 3 50
I want this result:
id sale_id price sum(price by sale_id)
------------------------------------------
1 1 100 300
2 1 200 300
3 2 50 50
4 3 50 50
I tried this:
SELECT id, sale_id, price,
(SELECT sum(price) FROM sale_lines GROUP BY sale_id)
FROM sale_lines
But I get an error because the subquery returns more than one row.
How can I do it?
I want all the rows of sale_lines table selecting all fields and adding the sum(price) grouped by sale_id.
You can use a window function:
SELECT id, sale_id, price, SUM(price) OVER (PARTITION BY sale_id) AS total_price
FROM sale_lines;
If you want a subquery, then you need to correlate it:
SELECT sl.id, sl.sale_id, sl.price,
(SELECT sum(sll.price)
FROM sale_lines sll
WHERE sl.sale_id = sll.sale_id
)
FROM sale_lines sl;
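Both approaches above can be checked side by side in a small sketch, assuming Python's sqlite3 module (SQLite 3.25 or later, for window functions) in place of the asker's database; the rows are the four from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_lines (id INTEGER PRIMARY KEY, sale_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sale_lines VALUES (?, ?, ?)",
                 [(1, 1, 100), (2, 1, 200), (3, 2, 50), (4, 3, 50)])

# Window function: SUM over the rows sharing each row's sale_id. There is no
# GROUP BY, so every detail row is kept.
windowed = conn.execute("""
    SELECT id, sale_id, price,
           SUM(price) OVER (PARTITION BY sale_id) AS total_price
    FROM sale_lines
    ORDER BY id
""").fetchall()

# Correlated subquery: one value per outer row, because it is filtered by the
# outer row's sale_id instead of being grouped.
correlated = conn.execute("""
    SELECT sl.id, sl.sale_id, sl.price,
           (SELECT SUM(sll.price)
            FROM sale_lines sll
            WHERE sll.sale_id = sl.sale_id) AS total_price
    FROM sale_lines sl
    ORDER BY sl.id
""").fetchall()

print(windowed == correlated)  # True
print(windowed)  # [(1, 1, 100, 300), (2, 1, 200, 300), (3, 2, 50, 50), (4, 3, 50, 50)]
```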
Don't use GROUP BY in the subquery; make it a correlated subquery:
SELECT sl1.id, sl1.sale_id, sl1.price,
(SELECT sum(sl2.price) FROM sale_lines sl2 where sl2.sale_id = sl1.sale_id) as total
FROM sale_lines sl1
In addition to the other approaches, you can use CROSS APPLY to get the sum:
SELECT id, sale_id, price, Price_Sum
FROM YourTable AS ot
CROSS APPLY
(SELECT SUM(price) AS Price_Sum
FROM YourTable
WHERE sale_id = ot.sale_id) AS s;
SELECT t1.*,
total_price
FROM `sale_lines` AS t1
JOIN(SELECT Sum(price) AS total_price,
sale_id
FROM sale_lines
GROUP BY sale_id) AS t2
ON t1.sale_id = t2.sale_id
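The last answer aggregates once per sale_id in a derived table and joins those totals back to the detail rows. A minimal sketch of the same idea, assuming Python's sqlite3 module as a stand-in database (the MySQL backticks are dropped) and the question's four rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_lines (id INTEGER PRIMARY KEY, sale_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sale_lines VALUES (?, ?, ?)",
                 [(1, 1, 100), (2, 1, 200), (3, 2, 50), (4, 3, 50)])

# t2 holds one total per sale_id; the join attaches it to every matching line.
joined = conn.execute("""
    SELECT t1.id, t1.sale_id, t1.price, t2.total_price
    FROM sale_lines AS t1
    JOIN (SELECT sale_id, SUM(price) AS total_price
          FROM sale_lines
          GROUP BY sale_id) AS t2
      ON t1.sale_id = t2.sale_id
    ORDER BY t1.id
""").fetchall()
print(joined)  # [(1, 1, 100, 300), (2, 1, 200, 300), (3, 2, 50, 50), (4, 3, 50, 50)]
```

Every line keeps its row because each sale_id present in sale_lines also has a total in t2.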

How do you get the average of the top 5 records for each id within the same table?

I need to calculate an average of a computed price, as shown below. I can do that over all records for each id in the table, but how do I do it for only the 5 most recent records by date? The query below returns an average, but over everything; I am looking for the top 5 by date descending. This can probably be solved using row_number(), but that might be too slow, because the table holds a huge number of records and row_number has to process every row. Is there a better way to do this? I am doing this on SQL Server, by the way.
select a.id, AVG(a.someprice) as avgPrice
from
(select p.id, p.date, calculateSomePrice(p.id, p.date) someprice
from table1 p
where p.id in (select id from table1)) a
group by a.id
;
Here are some example records from the table to clarify what I am looking for:
id date prc
1 08/02/2017 1.5
1 08/01/2017 2.5
1 07/31/2017 3.5
1 07/30/2017 1
1 07/29/2017 4
1 07/23/2017 4.5
1 07/20/2017 5
1 07/22/2017 5.5
2 07/29/2017 1.5
2 07/28/2017 2.5
2 07/27/2017 4
2 07/26/2017 4
2 07/23/2017 5.5
2 07/22/2017 3.5
2 07/21/2017 5
2 07/20/2017 0.5
2 07/18/2017 4.5
When I run my query above, I get the average over all records, but what I am looking for is the average of the 5 most recent dates for each id, in this case for both 1 and 2.
The query below gives me the result I want for id 1, but I need it for every id in the table, and the table has 481397362 records, so doing this in a loop doesn't make sense.
select a.id, AVG(a.somePrice)
from (
select top 5 p.id, p.date, calculateSomePrice(p.id, p.date) somePrice
from table1 p
where p.id = 1
order by p.date desc ) a
group by a.id;
If I understand your requirement correctly, you can query as follows:
select a.id, AVG(a.someprice) as avgPrice
from
( select p.id, p.date, calculateSomePrice(p.id, p.date) someprice,
RowN = Row_Number() over(partition by p.Id order by p.[date] desc)
from table1 p
) a
Where a.RowN <= 5
group by a.id
;
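A minimal sketch of the ROW_NUMBER() answer on the question's sample rows, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. The stored prc column stands in for the hypothetical calculateSomePrice(p.id, p.date), and the dates are rewritten as ISO text so they sort chronologically.

```python
import sqlite3

# Question's sample rows (id, date, prc), dates as YYYY-MM-DD.
ROWS = [
    (1, "2017-08-02", 1.5), (1, "2017-08-01", 2.5), (1, "2017-07-31", 3.5),
    (1, "2017-07-30", 1.0), (1, "2017-07-29", 4.0), (1, "2017-07-23", 4.5),
    (1, "2017-07-20", 5.0), (1, "2017-07-22", 5.5),
    (2, "2017-07-29", 1.5), (2, "2017-07-28", 2.5), (2, "2017-07-27", 4.0),
    (2, "2017-07-26", 4.0), (2, "2017-07-23", 5.5), (2, "2017-07-22", 3.5),
    (2, "2017-07-21", 5.0), (2, "2017-07-20", 0.5), (2, "2017-07-18", 4.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT, prc REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# Number each id's rows newest first, keep the first five, then average them.
averages = conn.execute("""
    SELECT a.id, AVG(a.prc) AS avgPrice
    FROM (SELECT p.id, p.prc,
                 ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.date DESC) AS RowN
          FROM table1 p) a
    WHERE a.RowN <= 5
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(averages)  # [(1, 2.5), (2, 3.5)]
```

For id 1 the five newest prices are 1.5, 2.5, 3.5, 1 and 4 (sum 12.5); for id 2 they are 1.5, 2.5, 4, 4 and 5.5 (sum 17.5). On the real table, an index on (id, date) may let the engine read each id's rows already in date order, though whether it does depends on the plan it chooses.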
You can pick the TOP 5 ids and use them to compute the average.
select a.id, AVG(a.someprice) as avgPrice
from
(select p.id, p.date, calculateSomePrice(p.id, p.date) someprice
from table1 p
where p.id in (select TOP 5 id from table1 order by update_Date desc)) a
group by a.id;
You can do it like this:
select a.id, AVG(a.someprice) as avgPrice
from
( select TOP 5 p.id, p.date, calculateSomePrice(p.id, p.date) someprice
from table1 p
ORDER BY P.DATE DESC
) a
group by a.id
;
Based on your query and sample result, we need to rank avgPrice in descending order and keep the top 5:
WITH cte AS (
SELECT a.id,
AVG(a.someprice) as avgPrice
FROM (SELECT p.id,
p.date, calculateSomePrice(p.id, p.date) someprice
FROM table1 p
WHERE p.id in (SELECT id
FROM table1)
) a
GROUP BY a.id
)
SELECT id,
avgPrice
FROM (SELECT id,
avgPrice,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY avgPrice DESC) rank
FROM cte
GROUP BY id,
avgPrice
) t
WHERE rank <=5
select a.id, avg(a.prc) as avgPrice from
(
SELECT *
FROM (
SELECT id, date, prc, Rank()
over (Partition BY id
ORDER BY date DESC ) AS Rank
FROM table1
) rs WHERE Rank <= 5
) a
group by a.id
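One difference from the ROW_NUMBER() answer: RANK() gives tied dates the same rank, so `Rank <= 5` can keep more than five rows per id. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) and hypothetical rows in which id 1 has two prices on its fifth-newest date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT, prc REAL)")
# Hypothetical rows: two entries share the date 2017-07-29.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "2017-08-02", 1.5), (1, "2017-08-01", 2.5), (1, "2017-07-31", 3.5),
    (1, "2017-07-30", 1.0), (1, "2017-07-29", 4.0), (1, "2017-07-29", 6.0),
])

def rows_kept(ranking_function):
    # Count the rows that survive the "<= 5" filter. ranking_function is one of
    # the fixed names below, never user input, so the f-string is safe here.
    sql = f"""
        SELECT COUNT(*)
        FROM (SELECT {ranking_function}() OVER (PARTITION BY id ORDER BY date DESC) AS r
              FROM table1) rs
        WHERE r <= 5
    """
    return conn.execute(sql).fetchone()[0]

print(rows_kept("RANK"), rows_kept("ROW_NUMBER"))  # 6 5
```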

Need Solution for SQL query

My SQL query is:
SELECT
T.TAX_NAME,A.TAX_AMT_ID,A.TAX_MAP_ID,A.EFFECTIVE_FROM
FROM
MAS_TAX T
INNER JOIN
MAS_TAX_MAP M ON T.TAX_ID = M.TAX_ID
LEFT OUTER JOIN
MAS_TAX_AMOUNT A ON M.TAX_MAP_ID = A.TAX_MAP_ID
WHERE
EFFECTIVE_FROM <= GETDATE()
The output of the above query is:
TAX_NAME TAX_AMT_ID TAX_MAP_ID EFFECTIVE_FROM
-------------------------------------------------------
Income Tax 12 5 02-06-2014
Service Tax 16 4 02-06-2014
Gift Tax 3 1 29-05-2014
Gift Tax 2 1 28-05-2014
Gift Tax 4 1 27-05-2014
But I need to get the output below. Can anyone help me?
TAX_NAME TAX_AMT_ID TAX_MAP_ID EFFECTIVE_FROM
-------------------------------------------------------
Income Tax 12 5 02-06-2014
Service Tax 16 4 02-06-2014
Gift Tax 3 1 29-05-2014
You seem to want the most recent record. You can do this in SQL Server using row_number():
select TAX_NAME, TAX_AMT_ID, TAX_MAP_ID, EFFECTIVE_FROM
from (select t.*,
row_number() over (partition by tax_name order by effective_from desc) as seqnum
from your_table t -- your_table: placeholder for your table or query
) t
where seqnum = 1;
EDIT:
For your particular query:
SELECT t.TAX_NAME, t.TAX_AMT_ID, t.TAX_MAP_ID, t.EFFECTIVE_FROM
FROM (SELECT T.TAX_NAME, A.TAX_AMT_ID, A.TAX_MAP_ID, A.EFFECTIVE_FROM,
ROW_NUMBER() OVER (PARTITION BY T.TAX_NAME ORDER BY A.EFFECTIVE_FROM DESC) as seqnum
FROM MAS_TAX T INNER JOIN
MAS_TAX_MAP M
ON T.TAX_ID = M.TAX_ID LEFT OUTER JOIN
MAS_TAX_AMOUNT A
ON M.TAX_MAP_ID = A.TAX_MAP_ID
WHERE EFFECTIVE_FROM <= GETDATE()
) t
WHERE seqnum = 1;
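A minimal sketch of the same row_number() filter, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. To keep it short, a hypothetical table tax_rows holds the rows that the joined query returns (from the question's output, dates as ISO text); note that the outer SELECT refers only to the derived-table alias t.

```python
import sqlite3

# Hypothetical stand-in table holding the rows of the joined query's output.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tax_rows (
    TAX_NAME TEXT, TAX_AMT_ID INTEGER, TAX_MAP_ID INTEGER, EFFECTIVE_FROM TEXT)""")
conn.executemany("INSERT INTO tax_rows VALUES (?, ?, ?, ?)", [
    ("Income Tax", 12, 5, "2014-06-02"),
    ("Service Tax", 16, 4, "2014-06-02"),
    ("Gift Tax", 3, 1, "2014-05-29"),
    ("Gift Tax", 2, 1, "2014-05-28"),
    ("Gift Tax", 4, 1, "2014-05-27"),
])

# Newest EFFECTIVE_FROM per TAX_NAME: number each name's rows newest first,
# then keep seqnum = 1.
latest = conn.execute("""
    SELECT t.TAX_NAME, t.TAX_AMT_ID, t.TAX_MAP_ID, t.EFFECTIVE_FROM
    FROM (SELECT r.*,
                 ROW_NUMBER() OVER (PARTITION BY r.TAX_NAME
                                    ORDER BY r.EFFECTIVE_FROM DESC) AS seqnum
          FROM tax_rows r) t
    WHERE t.seqnum = 1
    ORDER BY t.TAX_NAME
""").fetchall()
for row in latest:
    print(row)
```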

Pivot SQL with Rank

Basically, I have the following query, and I am trying to isolate only the unique ranks from it:
WITH numbered_rows
as (
SELECT Claim,
reserve,
time,
RANK() OVER (PARTITION BY Claim ORDER BY time asc) as 'Rank'
FROM (
SELECT cc.Claim,
MAX(csd.time) as time,
csd.reserve
FROM ClaimData csd WITH (NOLOCK)
JOIN Core cc WITH (NOLOCK)
on cc.ClaimID = csd.ClaimID
GROUP BY cc.Claim, csd.Reserve
) as t
)
select *
from numbered_rows cur, numbered_rows prev
where cur.Claim= prev.Claim
and cur.Rank = prev.Rank -1
The result set I get is the following:
Claim reserve Time Rank Claim reserve Time Rank
--------------------------------------------------------------------
11 0 12/10/2012 1 11 15000 5/30/2013 2
34 2000 1/21/2013 1 34 750 1/31/2013 2
34 750 1/31/2013 2 34 0 3/31/2013 3
07 800000 5/9/2013 1 07 0 5/10/2013 2
But I only want to see the following (with the Claim 34 Rank 1/Rank 2 row removed, because it is not the highest):
Claim reserve Time Rank Claim reserve Time Rank
--------------------------------------------------------------------
11 0 12/10/2012 1 11 15000 5/30/2013 2
34 750 1/31/2013 2 34 0 3/31/2013 3
07 800000 5/9/2013 1 07 0 5/10/2013 2
I think you can do this by just reversing your logic: order by time DESC, switch cur and prev in your final SELECT, change -1 to +1, and then limit prev.Rank to 1. This ensures that you only include the latest two results for each claim:
WITH numbered_rows AS
( SELECT Claim,
reserve,
time,
[Rank] = RANK() OVER (PARTITION BY Claim ORDER BY time DESC)
FROM ( SELECT cc.Claim,
[Time] = MAX(csd.time),
csd.reserve
FROM ClaimData AS csd WITH (NOLOCK)
INNER JOIN Core AS cc WITH (NOLOCK)
ON cc.ClaimID = csd.ClaimID
GROUP BY cc.Claim, csd.Reserve
) t
)
SELECT *
FROM numbered_rows AS prev
INNER JOIN numbered_rows AS cur
ON cur.Claim= prev.Claim
AND cur.Rank = prev.Rank + 1
WHERE prev.Rank = 1;
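A minimal sketch of the reversed-rank self-join, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. A hypothetical table claim_times holds the (Claim, reserve, time) rows taken from the question's result set, i.e. what the inner MAX(time) ... GROUP BY stage produces, with dates as ISO text; the rank column is named rnk here only to keep it distinct from the function name.

```python
import sqlite3

# Hypothetical table with each claim's (reserve, time) rows, dates as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim_times (Claim TEXT, reserve INTEGER, time TEXT)")
conn.executemany("INSERT INTO claim_times VALUES (?, ?, ?)", [
    ("11", 0, "2012-12-10"), ("11", 15000, "2013-05-30"),
    ("34", 2000, "2013-01-21"), ("34", 750, "2013-01-31"), ("34", 0, "2013-03-31"),
    ("07", 800000, "2013-05-09"), ("07", 0, "2013-05-10"),
])

# Rank newest first, then pair rank 1 (prev) with rank 2 (cur) for each claim.
pairs = conn.execute("""
    WITH numbered_rows AS (
        SELECT Claim, reserve, time,
               RANK() OVER (PARTITION BY Claim ORDER BY time DESC) AS rnk
        FROM claim_times
    )
    SELECT prev.Claim, prev.reserve, prev.time, prev.rnk,
           cur.reserve, cur.time, cur.rnk
    FROM numbered_rows AS prev
    JOIN numbered_rows AS cur
      ON cur.Claim = prev.Claim
     AND cur.rnk = prev.rnk + 1
    WHERE prev.rnk = 1
    ORDER BY prev.Claim
""").fetchall()
for row in pairs:
    print(row)
```

Each claim yields exactly one row: its latest entry paired with the one before it. Claim 34's older 2000 to 750 step no longer appears.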