SQL - get rid of the nested aggregate select - sql

There is a table Payments that tracks, for example, the amount of money a user puts into an account. Simplified, it looks like this:
===================================
Id | UserId | Amount | PayDate |
===================================
1 | 42 | 11 | 01.02.99 |
2 | 42 | 31 | 05.06.99 |
3 | 42 | 21 | 04.11.99 |
4 | 24 | 12 | 05.11.99 |
What is needed is a table that also shows the balance just before each payment, e.g.:
===============================================
Id | UserId | Amount | PayDate | Balance |
===============================================
1 | 42 | 11 | 01.02.99 | 0 |
2 | 42 | 31 | 05.06.99 | 11 |
3 | 42 | 21 | 04.11.99 | 42 |
4 | 24 | 12 | 05.11.99 | 0 |
Currently the select statement looks something like
SELECT
Id,
UserId,
Amount,
PayDate,
(SELECT sum(amount) FROM Payments nestedp
WHERE nestedp.UserId = outerp.UserId AND
nestedp.PayDate < outerp.PayDate) as Balance
FROM
Payments outerp
How can I rewrite this select to get rid of the nested aggregate selection? The database in question is SQL Server 2019.

You can use a CTE with some custom logic to handle this type of problem.
WITH PaymentCte
AS (
SELECT ROW_NUMBER() OVER (
PARTITION BY UserId ORDER BY Id
) AS RowId
,Id
,UserId
,PayDate
,Amount
,SUM(Amount) OVER (
PARTITION BY UserId ORDER BY Id
) AS Balance
FROM Payment
)
SELECT X.Id
,X.UserId
,X.Amount
,X.PayDate
,Y.Balance
FROM PaymentCte x
INNER JOIN PaymentCte y ON x.userId = y.UserId
AND X.RowId = Y.RowId + 1
UNION
SELECT X.Id
,X.UserId
,X.Amount
,X.PayDate
,0 AS Balance
FROM PaymentCte x
WHERE X.RowId = 1
This provides the desired output

You can try the following using lag with a cumulative sum
with b as (
select * , isnull(lag(amount) over (partition by userid order by id),0) Amt
from t
)
select Id, UserId, Amount, PayDate,
Sum(Amt) over (partition by userid order by id) Balance
from b
order by Id

Thanks to the other participants' leads, I came up with a query that seems to work:
SELECT
Id,
UserId,
Amount,
PayDate,
COALESCE(sum(Amount) over (partition by UserId
order by PayDate
rows between unbounded preceding and 1 preceding), 0) as Balance
FROM
Payments
ORDER BY
UserId, PayDate
Lots of related examples can be found here
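The frame clause in the last query can be checked quickly outside SQL Server. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine and ISO-formatted dates so that ORDER BY PayDate sorts chronologically. The function name balances_before_payment is made up for this illustration.

```python
import sqlite3


def balances_before_payment():
    """Return (Id, UserId, Amount, PayDate, Balance) rows.

    Balance is the sum of the user's earlier payments, excluding the current row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Payments (Id INTEGER, UserId INTEGER, Amount INTEGER, PayDate TEXT)"
    )
    # Sample rows from the question; dates rewritten as ISO strings (an assumption).
    conn.executemany(
        "INSERT INTO Payments VALUES (?, ?, ?, ?)",
        [
            (1, 42, 11, "1999-02-01"),
            (2, 42, 31, "1999-06-05"),
            (3, 42, 21, "1999-11-04"),
            (4, 24, 12, "1999-11-05"),
        ],
    )
    rows = conn.execute(
        """
        SELECT Id, UserId, Amount, PayDate,
               COALESCE(SUM(Amount) OVER (
                   PARTITION BY UserId
                   ORDER BY PayDate
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS Balance
        FROM Payments
        ORDER BY Id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in balances_before_payment():
        print(row)
```

The frame ends at 1 PRECEDING, so the first payment of each user sees an empty frame, SUM yields NULL, and COALESCE turns that into 0.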

Related

Filtering consecutive dates ranges using SQL Server

I want to filter categories that only have consecutive dates.
I will explain with an example.
My table is
| ID | Category | Date |
|--------------------|-----------------|---------------------|
| 1 | 1 | 01-04-2021 |
| 2 | 1 | 02-04-2021 |
| 3 | 2 | 01-03-2021 |
| 4 | 2 | 04-03-2021 |
| 5 | 2 | 01-02-2010 |
| 6 | 3 | 02-02-2010 |
| 7 | 3 | 03-02-2010 |
| 8 | 4 | 03-02-2010 |
Expected output:
| Category |
|----------------|
| 1 |
| 3 |
| 4 |
I would like to filter my data so that I keep only the categories whose dates are all consecutive, i.e. categories without gaps between dates.
… for unique dates per category
select category
from mytable
group by category
having max(Date) = dateadd(day, count(*)-1, min(Date))
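The HAVING test above works because, with unique dates, a gap-free run of n dates always spans exactly n - 1 days. Here is a minimal sketch of the same check, assuming SQLite via Python's sqlite3 module, where date(x, '+N days') plays the role of dateadd. The dates are assumed to be DD-MM-YYYY in the question and are rewritten as ISO strings; the column name EventDate and the function name are chosen for this illustration only.

```python
import sqlite3


def categories_without_gaps():
    """Return the categories whose (unique) dates form one consecutive run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ID INTEGER, Category INTEGER, EventDate TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            (1, 1, "2021-04-01"),
            (2, 1, "2021-04-02"),
            (3, 2, "2021-03-01"),
            (4, 2, "2021-03-04"),
            (5, 2, "2010-02-01"),
            (6, 3, "2010-02-02"),
            (7, 3, "2010-02-03"),
            (8, 4, "2010-02-03"),
        ],
    )
    rows = conn.execute(
        """
        SELECT Category
        FROM mytable
        GROUP BY Category
        -- first date plus (row count - 1) days must land exactly on the last date
        HAVING MAX(EventDate) = date(MIN(EventDate), '+' || (COUNT(*) - 1) || ' days')
        ORDER BY Category
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(categories_without_gaps())
```

A category with a single row passes trivially, since its first and last dates coincide.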
Here's one way. You may have to adjust it for your particular flavor of SQL.
WITH a AS (
SELECT
category,
-- days from the previous date to the current one (NULL for the first row)
DATEDIFF('days', LAG(date) OVER (PARTITION BY category ORDER BY
date), date) AS days_apart
FROM tbl
),
b AS (
SELECT
category,
MAX(days_apart) AS max_days_apart
FROM a
GROUP BY 1
)
SELECT
category
FROM b
WHERE max_days_apart IS NULL OR max_days_apart = 1
select distinct category
from dates
where category not in (
select distinct category
from (
select category, [date],
row_number() over (partition by category order by [date]) as days_cnt,
min([date]) over (partition by category) as min_date
from dates
group by category, [date]
) as c
where c.[date]<>dateadd(d, c.days_cnt-1, c.min_date))
order by category
Categories where the sequence of dates is the same as the sequence of ids.
with cte as (
select [category],
row_number() over (partition by [category] order by [date], [id])
- row_number() over (partition by [category] order by [id]) drn
from mytable -- placeholder table name
)
select [category]
from cte
group by [category]
having sum(abs(drn)) = 0;

How to find the next highest value in a new column? (SQL)

I have a table with multiple work orders for the same product, with each row showing a different hour meter reading for how long the product has been run. I would like to create a new column that shows the next-highest hour meter reading for the product next to the highest one. I've been trying to rank the entries by hour meter but have not gotten any further than this:
ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Current_Counter_Reading DESC) AS 'Ranking'
Does anyone have any advice on how to approach this? Below is an example of what I am trying to do:
Product | Work Order | Hour Meter
--------+------------+------------
Car1 1 100
Car1 2 200
Desired output:
Product | Higher Hour Meter | Lower Hour Meter
--------+-------------------+-----------------
Car1 200 100
Thanks!
See also LEAD or LAG window functions:
WITH cte1 AS (
SELECT t.*
, LEAD(HourMeter) OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS NextLower
, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS rn
FROM work_orders AS t
)
SELECT Product, HourMeter, NextLower
FROM cte1
WHERE rn = 1
ORDER BY Product
;
Given the following data:
+-----------+---------+-----------+
| WorkOrder | Product | HourMeter |
+-----------+---------+-----------+
| 2 | Car1 | 200 |
| 1 | Car1 | 100 |
| 3 | Car1 | 50 |
| 5 | Car2 | 66 |
| 4 | Car2 | 55 |
| 6 | Car2 | 45 |
+-----------+---------+-----------+
The result:
+---------+-----------+-----------+
| Product | HourMeter | NextLower |
+---------+-----------+-----------+
| Car1 | 200 | 100 |
| Car2 | 66 | 55 |
+---------+-----------+-----------+
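The LEAD query and its result table can be reproduced with a short script. This is a sketch, assuming SQLite via Python's sqlite3 module as the engine; the table name work_orders comes from the query above, while the function name is hypothetical.

```python
import sqlite3


def highest_and_next_lower():
    """Return (Product, HourMeter, NextLower) for the top reading of each product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE work_orders (WorkOrder INTEGER, Product TEXT, HourMeter INTEGER)"
    )
    conn.executemany(
        "INSERT INTO work_orders VALUES (?, ?, ?)",
        [
            (2, "Car1", 200),
            (1, "Car1", 100),
            (3, "Car1", 50),
            (5, "Car2", 66),
            (4, "Car2", 55),
            (6, "Car2", 45),
        ],
    )
    rows = conn.execute(
        """
        WITH cte1 AS (
            SELECT t.*,
                   -- in descending order, the following row holds the next lower reading
                   LEAD(HourMeter) OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS NextLower,
                   ROW_NUMBER() OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS rn
            FROM work_orders AS t
        )
        SELECT Product, HourMeter, NextLower
        FROM cte1
        WHERE rn = 1
        ORDER BY Product
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(highest_and_next_lower())
```

For a product with only one work order, LEAD has no following row, so NextLower would come back as NULL (None in Python).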
Working Test Case
WITH CTE AS (
SELECT 'Car1' as Product,1 as WorkOrder,100 as HourMeter
union all
select 'Car1',2,200
union all
select 'Car1',3,20
)
SELECT
c1.Product,
MAX(c1.HourMeter) as HigherHourMeter,
MAX(c2.HourMeter) LowerHourMeter
FROM CTE c1
INNER JOIN CTE c2 ON c2.Product=c1.Product and c2.HourMeter<c1.HourMeter
GROUP BY c1.Product;
Assuming lower meter is the 2nd highest
Here we use the window function row_number() over() in concert with a conditional aggregation.
Select Product
,Higher_Meter = max( case when RN=1 then [Hour Meter] end)
,Lower_Meter = max( case when RN=2 then [Hour Meter] end)
From (
Select *
,RN = row_number() over (partition by Product order by [Hour Meter] desc)
From YourTable
) src
Group By Product

PostgreSQL: Filter select query by comparing against other rows

Suppose I have a table of Events that lists a userId and the time the Event occurred:
+----+--------+----------------------------+
| id | userId | time |
+----+--------+----------------------------+
| 1 | 46 | 2020-07-22 11:22:55.307+00 |
| 2 | 190 | 2020-07-13 20:57:07.138+00 |
| 3 | 17 | 2020-07-11 11:33:21.919+00 |
| 4 | 46 | 2020-07-22 10:17:11.104+00 |
| 5 | 97 | 2020-07-13 20:57:07.138+00 |
| 6 | 17 | 2020-07-04 11:33:21.919+00 |
| 7 | 17 | 2020-07-11 09:23:21.919+00 |
+----+--------+----------------------------+
I want to get the list of events that had a previous event on the same day, by the same user. The result for the above table would be:
+----+--------+----------------------------+
| id | userId | time |
+----+--------+----------------------------+
| 1 | 46 | 2020-07-22 11:22:55.307+00 |
| 3 | 17 | 2020-07-11 11:33:21.919+00 |
+----+--------+----------------------------+
How can I perform a select query that filters results by evaluating them against other rows in the table?
This can be done using an EXISTS condition:
select t1.*
from the_table t1
where exists (select *
from the_table t2
where t2.userid = t1.userid -- for the same user
and t2.time::date = t1.time::date -- on the same day
and t2.time < t1.time); -- but previously on that day
You can use lag():
select t.*
from (select t.*,
lag(time) over (partition by userid, time::date order by time) as prev_time
from t
) t
where prev_time is not null;
Here is a db<>fiddle.
Or row_number():
select t.*
from (select t.*,
row_number() over (partition by userid, time::date order by time) as seqnum
from t
) t
where seqnum >= 2;
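The lag() variant can be tried without a PostgreSQL server. The following sketch assumes SQLite via Python's sqlite3 module, uses date(time) in place of the time::date cast, drops the +00 offset from the timestamps, and gives the last sample row id 7 so that ids are unique. All of these are assumptions made for the illustration, as are the table name events and the function name.

```python
import sqlite3


def events_with_earlier_same_day_event():
    """Return events that have an earlier event by the same user on the same day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, userId INTEGER, time TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            (1, 46, "2020-07-22 11:22:55.307"),
            (2, 190, "2020-07-13 20:57:07.138"),
            (3, 17, "2020-07-11 11:33:21.919"),
            (4, 46, "2020-07-22 10:17:11.104"),
            (5, 97, "2020-07-13 20:57:07.138"),
            (6, 17, "2020-07-04 11:33:21.919"),
            (7, 17, "2020-07-11 09:23:21.919"),
        ],
    )
    rows = conn.execute(
        """
        SELECT id, userId, time
        FROM (
            SELECT e.*,
                   -- previous event of the same user on the same calendar day, if any
                   LAG(time) OVER (PARTITION BY userId, date(time) ORDER BY time) AS prev_time
            FROM events AS e
        ) AS x
        WHERE prev_time IS NOT NULL
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in events_with_earlier_same_day_event():
        print(row)
```

Because the day is part of the PARTITION BY, the first event of each user on each day has no predecessor in its partition and is filtered out.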
You can use LAG() to find the previous row for a user. Then a simple comparison will tell whether it occurred on the same day or not.
For example:
select *
from (
select
*,
lag(time) over(partition by userId order by time) as prev_time
from t
) x
where time::date = prev_time::date
You can use the ROW_NUMBER() analytic function:
SELECT id, userId, time
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY userId, date_trunc('day', time) ORDER BY time) AS rn,
t.*
FROM Events t
) q
WHERE rn > 1
This returns every event that has an earlier event by the same user on the same day.

Getting Top 40% users basis sales

I have a table which has columns date, user_id, sales_amount. The table sample is as below
+------------+---------+--------------+
| date | user_id | sales_amount |
+------------+---------+--------------+
| 2020-01-01 | 1 | 27 |
| 2020-01-01 | 2 | 32 |
| 2020-01-01 | 3 | 17 |
| 2020-01-03 | 1 | 19 |
| 2020-01-03 | 2 | 18 |
| 2020-01-03 | 3 | 40 |
| ………….. | ………….. | ………….. |
| ………….. | ………….. | ………….. |
| ………….. | ………….. | ………….. |
+------------+---------+--------------+
I want to get the top 40% of users by sales. I would have used something like SELECT TOP 40 PERCENT after aggregation, but I am not using MS SQL Server, so that method is not applicable.
The approach I know is as follows.
First, get the number of rows with the query below:
SELECT MAX(Rn) AS number_of_rows
FROM(
SELECT *,row_number() OVER(ORDER BY Amt DESC) as Rn
FROM
(SELECT user_id, SUM(AMOUNT) AS Amt
FROM table
GROUP BY user_id) A ) B
Second, calculate 40% of that value and select the users:
SELECT *
FROM
(SELECT *,row_number() OVER(ORDER BY Amt DESC) as Rn
FROM
(SELECT user_id, SUM(AMOUNT) AS Amt
FROM table
GROUP BY user_id) A ) B
WHERE Rn <= 0.4* (number_of_rows)
The two steps above can be combined as follows:
SELECT *
FROM
(SELECT *,row_number() OVER(ORDER BY Amt DESC) as Rn
FROM
(SELECT user_id, SUM(AMOUNT) AS Amt
FROM table
GROUP BY user_id) A ) B
WHERE Rn <= 0.4 * (SELECT MAX(Rn) AS number_of_rows
FROM(
SELECT *,row_number() OVER(ORDER BY Amt DESC) as Rn
FROM
(SELECT user_id, SUM(AMOUNT) AS Amt
FROM table
GROUP BY user_id) A ) B)
Is there a more efficient way, or a built-in function, to do this in Hive?
Yes! You can do both in one step:
SELECT u.*
FROM (SELECT user_id, SUM(AMOUNT) as amt,
ROW_NUMBER() OVER (ORDER BY SUM(AMOUNT) DESC) as seqnum,
COUNT(*) OVER () as cnt
FROM t
GROUP BY user_id
) u
WHERE seqnum <= cnt * 0.4;
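The idea of the query above (rank the per-user totals and compare each rank with 40% of the number of users) can be sketched as follows. This is an illustration only: it assumes SQLite via Python's sqlite3 module rather than Hive, it computes the totals in a CTE first for portability instead of nesting SUM inside the window's ORDER BY, and users 4 and 5 are hypothetical rows added so that 40% is a whole number of users.

```python
import sqlite3


def top_40_percent_users():
    """Return (user_id, total) for the top 40% of users ranked by total sales."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, user_id INTEGER, sales_amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("2020-01-01", 1, 27),
            ("2020-01-01", 2, 32),
            ("2020-01-01", 3, 17),
            ("2020-01-03", 1, 19),
            ("2020-01-03", 2, 18),
            ("2020-01-03", 3, 40),
            ("2020-01-03", 4, 10),  # hypothetical extra user
            ("2020-01-03", 5, 30),  # hypothetical extra user
        ],
    )
    rows = conn.execute(
        """
        WITH totals AS (
            SELECT user_id, SUM(sales_amount) AS amt
            FROM sales
            GROUP BY user_id
        )
        SELECT user_id, amt
        FROM (
            SELECT user_id, amt,
                   ROW_NUMBER() OVER (ORDER BY amt DESC) AS seqnum,  -- rank by total
                   COUNT(*) OVER () AS cnt                           -- number of users
            FROM totals
        ) AS u
        WHERE seqnum <= cnt * 0.4
        ORDER BY seqnum
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_40_percent_users())
```

With five users, cnt * 0.4 is 2, so the two users with the largest totals are returned.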

Getting the whole row from grouped result

Here's a sample database table :
| ID | ProductID | DateChanged | Price
| 1 | 12 | 2011-11-11 | 93
| 2 | 2 | 2011-11-12 | 12
| 3 | 3 | 2011-11-13 | 25
| 4 | 4 | 2011-11-14 | 17
| 5 | 12 | 2011-11-15 | 97
Basically, what I want is to get the latest price for each ProductID.
The result should be like this :
| ID | ProductID | Price
| 2 | 2 | 12
| 3 | 3 | 25
| 4 | 4 | 17
| 5 | 12 | 97
Notice that the first row is not there, because ProductID 12 has a newer price in the row with ID 5.
In other words, I need something like: get ID, ProductID and Price for each ProductID from the row where DateChanged is the latest.
SELECT ID, ProductId, Price
FROM
(
SELECT ID, ProductId, Price
, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY DateChanged DESC) AS rowNumber
FROM yourTable
) AS t
WHERE t.rowNumber = 1
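The ROW_NUMBER() query above can be run against the sample rows as a quick check. This is a minimal sketch, assuming SQLite via Python's sqlite3 module as the engine; yourTable is the placeholder name from the query and the function name is made up for the illustration.

```python
import sqlite3


def latest_price_per_product():
    """Return (ID, ProductID, Price) of the most recently changed row per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (ID INTEGER, ProductID INTEGER, DateChanged TEXT, Price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
        [
            (1, 12, "2011-11-11", 93),
            (2, 2, "2011-11-12", 12),
            (3, 3, "2011-11-13", 25),
            (4, 4, "2011-11-14", 17),
            (5, 12, "2011-11-15", 97),
        ],
    )
    rows = conn.execute(
        """
        SELECT ID, ProductID, Price
        FROM (
            SELECT ID, ProductID, Price,
                   -- number each product's rows from newest to oldest
                   ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY DateChanged DESC) AS rowNumber
            FROM yourTable
        ) AS t
        WHERE t.rowNumber = 1
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_price_per_product():
        print(row)
```

If two rows of one product shared the same DateChanged, ROW_NUMBER() would pick one of them arbitrarily; adding ID DESC as a second sort key would make the choice deterministic.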
SELECT ID, ProductID,DateChanged, Price
FROM myTable
WHERE ID IN
(
SELECT MAX(ID)
FROM myTable
GROUP BY ProductID
)
select a.id, a.productid, a.price
from mytable a,
(select productid, max(datechanged) as maxdatechanged
from mytable
group by productid) as b
where a.productid = b.productid and a.datechanged = b.maxdatechanged
SELECT ID, ProductId, Price
from myTable A
where DateChanged >= all
(select DateChanged
from myTable B
where B.ProductID = A.ProductID);