How to find the next highest value in a new column? (SQL)

I have a table with multiple work orders for the same product, where each row shows a different hour-meter reading for how long the product has run. I would like to create a new column that shows the next-highest hour-meter reading for each product next to the highest one. So far I have only managed to rank the entries by hour meter:
ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Current_Counter_Reading DESC) AS 'Ranking'
Does anyone have any advice on how to approach this? Below is an example of what I am trying to do:
Product | Work Order | Hour Meter
--------+------------+------------
Car1    | 1          | 100
Car1    | 2          | 200
Desired result:
Product | Higher Hour Meter | Lower Hour Meter
--------+-------------------+-----------------
Car1    | 200               | 100
Thanks!

See also LEAD or LAG window functions:
WITH cte1 AS (
SELECT t.*
, LEAD(HourMeter) OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS NextLower
, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS rn
FROM work_orders AS t
)
SELECT Product, HourMeter, NextLower
FROM cte1
WHERE rn = 1
ORDER BY Product
;
Given the following data:
+-----------+---------+-----------+
| WorkOrder | Product | HourMeter |
+-----------+---------+-----------+
| 2 | Car1 | 200 |
| 1 | Car1 | 100 |
| 3 | Car1 | 50 |
| 5 | Car2 | 66 |
| 4 | Car2 | 55 |
| 6 | Car2 | 45 |
+-----------+---------+-----------+
The result:
+---------+-----------+-----------+
| Product | HourMeter | NextLower |
+---------+-----------+-----------+
| Car1 | 200 | 100 |
| Car2 | 66 | 55 |
+---------+-----------+-----------+
Working Test Case
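The LEAD query above can also be checked locally. The following is only a sketch: it assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions), not the asker's actual database, and the helper name highest_and_next is made up for illustration.

```python
import sqlite3

# Sample rows from the answer above: (WorkOrder, Product, HourMeter).
ROWS = [
    (2, "Car1", 200), (1, "Car1", 100), (3, "Car1", 50),
    (5, "Car2", 66), (4, "Car2", 55), (6, "Car2", 45),
]

QUERY = """
WITH cte1 AS (
    SELECT t.*,
           LEAD(HourMeter) OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS NextLower,
           ROW_NUMBER()    OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS rn
    FROM work_orders AS t
)
SELECT Product, HourMeter, NextLower
FROM cte1
WHERE rn = 1
ORDER BY Product
"""


def highest_and_next(rows):
    """Return (Product, highest HourMeter, next lower HourMeter) per product."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE work_orders (WorkOrder INTEGER, Product TEXT, HourMeter INTEGER)"
    )
    con.executemany("INSERT INTO work_orders VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(highest_and_next(ROWS))
```

A product with only one work order still appears in the result; its NextLower is NULL (None in Python) because LEAD has no following row to read.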

WITH CTE AS (
SELECT 'Car1' as Product,1 as WorkOrder,100 as HourMeter
union all
select 'Car1',2,200
union all
select 'Car1',3,20
)
SELECT
c1.Product,
MAX(c1.HourMeter) as HigherHourMeter,
MAX(c2.HourMeter) LowerHourMeter
FROM CTE c1
INNER JOIN CTE c2 ON c2.Product=c1.Product and c2.HourMeter<c1.HourMeter
GROUP BY c1.Product;

Assuming lower meter is the 2nd highest
Here we use the window function row_number() over() in concert with a conditional aggregation.
Select Product
,Higher_Meter = max( case when RN=1 then [Hour Meter] end)
,Lower_Meter = max( case when RN=2 then [Hour Meter] end)
From (
Select *
,RN = row_number() over (partition by Product order by [Hour Meter] desc)
From YourTable
) src
Group By Product
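As a rough check of the row_number() plus conditional aggregation idea, here is a sketch that runs an equivalent query in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). The column is named HourMeter instead of [Hour Meter], the alias = expression form is rewritten as expression AS alias for portability, and top_two_meters is a hypothetical helper name.

```python
import sqlite3

QUERY = """
SELECT Product,
       MAX(CASE WHEN rn = 1 THEN HourMeter END) AS Higher_Meter,
       MAX(CASE WHEN rn = 2 THEN HourMeter END) AS Lower_Meter
FROM (
    SELECT Product, HourMeter,
           ROW_NUMBER() OVER (PARTITION BY Product ORDER BY HourMeter DESC) AS rn
    FROM YourTable
) AS src
GROUP BY Product
ORDER BY Product
"""


def top_two_meters(rows):
    """rows are (Product, WorkOrder, HourMeter); returns one row per product."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE YourTable (Product TEXT, WorkOrder INTEGER, HourMeter INTEGER)"
    )
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


# The two rows from the question, plus a product with a single work order.
print(top_two_meters([("Car1", 1, 100), ("Car1", 2, 200), ("Car2", 3, 70)]))
```

Because the CASE expressions yield NULL for non-matching rows and MAX ignores NULLs, a product with a single reading gets NULL as its lower meter rather than being dropped.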

Related

SQL - get rid of the nested aggregate select

There is a table Payment, which for example tracks the amount of money user puts into account, simplified as
===================================
Id | UserId | Amount | PayDate |
===================================
1 | 42 | 11 | 01.02.99 |
2 | 42 | 31 | 05.06.99 |
3 | 42 | 21 | 04.11.99 |
4 | 24 | 12 | 05.11.99 |
What I need is a table that also shows the balance just before each payment, e.g.:
===============================================
Id | UserId | Amount | PayDate | Balance |
===============================================
1 | 42 | 11 | 01.02.99 | 0 |
2 | 42 | 31 | 05.06.99 | 11 |
3 | 42 | 21 | 04.11.99 | 42 |
4 | 24 | 12 | 05.11.99 | 0 |
Currently the select statement looks something like
SELECT
Id,
UserId,
Amount,
PayDate,
(SELECT sum(amount) FROM Payments nestedp
WHERE nestedp.UserId = outerp.UserId AND
nestedp.PayDate < outerp.PayDate) as Balance
FROM
Payments outerp
How can I rewrite this select to get rid of the nested aggregate selection? The database in question is SQL Server 2019.

You need to use cte with some custom logic to handle this type of problem.
WITH PaymentCte
AS (
SELECT ROW_NUMBER() OVER (
PARTITION BY UserId ORDER BY Id
) AS RowId
,Id
,UserId
,PayDate
,Amount
,SUM(Amount) OVER (
PARTITION BY UserId ORDER BY Id
) AS Balance
FROM Payment
)
SELECT X.Id
,X.UserId
,X.Amount
,X.PayDate
,Y.Balance
FROM PaymentCte x
INNER JOIN PaymentCte y ON x.userId = y.UserId
AND X.RowId = Y.RowId + 1
UNION
SELECT X.Id
,X.UserId
,X.Amount
,X.PayDate
,0 AS Balance
FROM PaymentCte x
WHERE X.RowId = 1
This provides the desired output

You can try the following using lag with a cumulative sum
with b as (
select * , isnull(lag(amount) over (partition by userid order by id),0) Amt
from t
)
select Id, UserId, Amount, PayDate,
Sum(Amt) over (partition by userid order by id) Balance
from b
order by Id

Thanks to the other participants' leads, I came up with a query that seems to work:
SELECT
Id,
UserId,
Amount,
PayDate,
COALESCE(sum(Amount) over (partition by UserId
order by PayDate
rows between unbounded preceding and 1 preceding), 0) as Balance
FROM
Payments
ORDER BY
UserId, PayDate
Lots of related examples can be found here
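The window frame "rows between unbounded preceding and 1 preceding" sums every earlier payment of the same user and skips the current row, which is why the first payment needs COALESCE to turn NULL into 0. A small sketch to try this out, assuming SQLite 3.25+ via Python's sqlite3 rather than SQL Server 2019; the dates are rewritten in ISO format (an assumption) so that they sort correctly as text, and balances_before_payment is a made-up helper name.

```python
import sqlite3

# Sample payments from the question: (Id, UserId, Amount, PayDate as ISO text).
PAYMENTS = [
    (1, 42, 11, "1999-02-01"),
    (2, 42, 31, "1999-06-05"),
    (3, 42, 21, "1999-11-04"),
    (4, 24, 12, "1999-11-05"),
]

QUERY = """
SELECT Id, UserId, Amount, PayDate,
       COALESCE(SUM(Amount) OVER (PARTITION BY UserId
                                  ORDER BY PayDate
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS Balance
FROM Payments
ORDER BY Id
"""


def balances_before_payment(rows):
    """Return each payment together with the user's balance before it."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Payments (Id INTEGER, UserId INTEGER, Amount INTEGER, PayDate TEXT)"
    )
    con.executemany("INSERT INTO Payments VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in balances_before_payment(PAYMENTS):
    print(row)
```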

SQL how to calculate median not based on rows

I have a sample of cars in my table and I would like to calculate the median price for my sample with SQL. What is the best way to do it?
+-----+-------+----------+
| Car | Price | Quantity |
+-----+-------+----------+
| A | 100 | 2 |
| B | 150 | 4 |
| C | 200 | 8 |
+-----+-------+----------+
I know that I can use percentile_cont (or percentile_disc) if my table is like this:
+-----+-------+
| Car | Price |
+-----+-------+
| A | 100 |
| A | 100 |
| B | 150 |
| B | 150 |
| B | 150 |
| B | 150 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
+-----+-------+
But in the real world, my first table has about 100 million rows and the second table should have about 3 billiard rows (and moreover I don't know how to transform my first table into the second).

Here is a way to do this in SQL Server.
In the first step I calculate the positions corresponding to the lower and upper bounds of the median (with an odd number of elements the lower and upper bounds are the same; otherwise they are the x/2-th and (x/2+1)-th values).
Then I get the cumulative sum of the quantity and use it to pick the elements corresponding to the lower and upper bounds, as follows:
with median_dt
as (
select case when sum(quantity)%2=0 then
sum(quantity)/2
else
sum(quantity)/2 + 1
end as lower_limit
,case when sum(quantity)%2=0 then
(sum(quantity)/2) + 1
else
sum(quantity)/2 + 1
end as upper_limit
from t
)
,data
as (
select *,sum(quantity) over(order by price asc) as cum_sum
from t
)
,rnk_val
as(select *
from (
select price,row_number() over(order by d.cum_sum asc) as rnk
from data d
join median_dt b
on b.lower_limit<=d.cum_sum
)x
where x.rnk=1
union all
select *
from (
select price,row_number() over(order by d.cum_sum asc) as rnk
from data d
join median_dt b
on b.upper_limit<=d.cum_sum
)x
where x.rnk=1
)
select avg(price) as median
from rnk_val
+--------+
| median |
+--------+
| 200 |
+--------+
db fiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c5cfa645a22aa9c135032eb28f1749f6
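The same cumulative-sum idea can be sketched more compactly. This is an illustration under assumptions, not the answer's exact query: it runs on SQLite 3.25+ through Python's sqlite3, relies on integer division so that (n + 1) / 2 and (n + 2) / 2 are the lower and upper middle positions, and uses a hypothetical helper name weighted_median.

```python
import sqlite3

MEDIAN_QUERY = """
WITH totals AS (
    SELECT SUM(quantity) AS n FROM t
),
running AS (
    -- cum_sum is the position of the last unit sold at this price
    SELECT price, SUM(quantity) OVER (ORDER BY price) AS cum_sum FROM t
)
SELECT AVG(p) AS median
FROM (
    -- lowest price whose running quantity reaches the lower middle position
    SELECT MIN(price) AS p FROM running, totals WHERE cum_sum >= (n + 1) / 2
    UNION ALL
    -- lowest price whose running quantity reaches the upper middle position
    SELECT MIN(price) AS p FROM running, totals WHERE cum_sum >= (n + 2) / 2
)
"""


def weighted_median(rows):
    """rows are (car, price, quantity); returns the median price as a float."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (car TEXT, price INTEGER, quantity INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    median = con.execute(MEDIAN_QUERY).fetchone()[0]
    con.close()
    return median


print(weighted_median([("A", 100, 2), ("B", 150, 4), ("C", 200, 8)]))
```

The table is never expanded to one row per unit, so the work grows with the number of distinct rows rather than with the total quantity.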

This looks right on a few results, but try it on a larger set to double-check.
First create a table with the price and quantity for each car (or use a CTE or sub-query; your choice). I'm just creating a separate table here.
create table table2 as
(
select car,
quantity,
price
from table1
)
Then run this query, which looks for the first price group whose running quantity reaches the middle of the total quantity.
select min(price) as median
from (
select car, price,
sum(quantity) over (order by price) as rollsum,
sum(quantity) over () as total
from table2
)a
where rollsum >= total/2.0
Correctly returns a value of 200 for the sample data.

SQL: Get top records per category, per day, per country?

A little trickier than just getting the top # per category. I want the top 2 videos per artist, per day per country.
My code, which didn't give me the right results is:
Select *
From
(
Select t2.*, dense_rank() over(partition by artist order by views desc)
From
(select country, day, artist, song, sum(view) as views
From t1
Group by 1,2,3,4
) t2
)
Where rn >=5
Sample data results
| Country | Date | artist | video | views | rn |
|---------|------|----------|-------|-------|----|
| US | Jan1 | Beyonce | ab | 100 | 1 |
| US | Jan1 | Beyonce | ac | 99 | 2 |
| US | Jan2 | C. Brown | ad | 89 | 1 |
| US | Jan2 | C. Brown | ai | 103 | 2 |
| AU | Jan1 | Beyonce | bf | 99 | 1 |
| AU | Jan1 | Beyonce | bb | 89 | 2 |
I want all artists per day, per country, but only 10 videos per artist.
I am kind of confused as to how to achieve this.
I generally struggle when it comes to window functions, so I would appreciate any help.
I am using Amazon Redshift
Thanks

You need to partition by all of the columns you mentioned since you are ranking views within each combination of these elements.
Because you've renamed the aggregate column as "views", you need to call it by that name.
Finally, if you want the top 2 videos/songs, use this condition: where rn <= 2
Select *
From
(
Select t2.*, dense_rank() over(partition by country, day, artist order by views desc) as rn
From
(select country, day, artist, song, sum(views) as views
From t1
Group by 1,2,3,4
) t2
) t3
Where rn <= 2

This ranks the views per artist, per day, per country and shows the top two for each combination:
Select *
From
(
Select t2.*, ROW_NUMBER() over(partition by artist, day, country order by views desc) as rn
From t1 t2
) t3
Where rn <= 2
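To see the aggregate-then-rank pattern end to end, here is a small sketch. It assumes SQLite 3.25+ through Python's sqlite3 instead of Amazon Redshift, a raw column named view_count (hypothetical), and song as a tie-breaker so that the ranking is deterministic; top_two_per_group is also a made-up name.

```python
import sqlite3

# Raw rows: (country, day, artist, song, view_count). "ab" appears twice
# on purpose so that the inner query has something to aggregate.
ROWS = [
    ("US", "Jan1", "Beyonce", "ab", 60),
    ("US", "Jan1", "Beyonce", "ab", 40),
    ("US", "Jan1", "Beyonce", "ac", 99),
    ("US", "Jan1", "Beyonce", "ae", 50),
    ("AU", "Jan1", "Beyonce", "bf", 99),
    ("AU", "Jan1", "Beyonce", "bb", 89),
]

QUERY = """
SELECT country, day, artist, song, views, rn
FROM (
    SELECT t2.*,
           ROW_NUMBER() OVER (PARTITION BY country, day, artist
                              ORDER BY views DESC, song) AS rn
    FROM (
        SELECT country, day, artist, song, SUM(view_count) AS views
        FROM t1
        GROUP BY country, day, artist, song
    ) AS t2
) AS ranked
WHERE rn <= 2
ORDER BY country, day, artist, rn
"""


def top_two_per_group(rows):
    """Return the two most viewed songs per (country, day, artist)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t1 (country TEXT, day TEXT, artist TEXT, song TEXT, view_count INTEGER)"
    )
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in top_two_per_group(ROWS):
    print(row)
```

ROW_NUMBER returns exactly two rows per group; DENSE_RANK would instead keep every song tied for first or second place.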

How to get Top 10 for a grouped column?

My data is a list of customers and products, and the cost for each product
Member Product Cost
Bob A123 $25
Bob A123 $25
Bob A123 $75
Joe A789 $50
Joe A789 $50
Bob C321 $50
Joe A123 $50
etc, etc, etc
My current query grabs each customer, product and cost, and also the total cost for that customer. It gives results like this:
Member Product Cost Total Cost
Bob A123 $125 $275
Bob A1433 $100 $275
Bob C321 $50 $275
Joe A123 $150 $250
Joe A789 $100 $250
How can I get the top 10 by Total Cost, not just the top 10 records overall? My query is:
SELECT a.Member
,a.Product
,SUM(a.Cost)
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
,a.Product
ORDER BY [Total Cost] DESC
If I do a SELECT TOP 10 it only gives me the first 10 rows. The actual Top 10 would end up being more like 40 or 50 rows.
Thanks!

Try this one.
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
WHERE stbl.rn <= 10
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Online Demo: http://sqlfiddle.com/#!18/87857/1/0
Table structure & Sample Data
CREATE TABLE mytable
(
member NVARCHAR(50),
product NVARCHAR(10),
cost INT
)
INSERT INTO mytable
VALUES ('Bob','A123','25'),
('Bob','A123','25'),
('Bob','A123','75'),
('Joe','A789','50'),
('Joe','A789','50'),
('Bob','C321','50'),
('Joe','A123','50'),
('Rock','A123','50'),
('Anord','A100','50'),
('Jack','A123','50'),
('Anord','A123','50'),
('Joe','A123','50'),
('Karma','A123','50'),
('Seetha','A123','50'),
('Aruna','A123','50'),
('Jake','A123','50'),
('Paul','A123','50'),
('Logan','A123','50'),
('Joe','A123','50');
Subquery - Total cost per customer
SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member
Subquery: Output
+---------+------------+----+
| member | totalcost | rn |
+---------+------------+----+
| Joe | 250 | 1 |
| Bob | 175 | 2 |
| Anord | 100 | 3 |
| Aruna | 50 | 4 |
| Jack | 50 | 5 |
| Jake | 50 | 6 |
| Karma | 50 | 7 |
| Logan | 50 | 8 |
| Paul | 50 | 9 |
| Rock | 50 | 10 |
| Seetha | 50 | 11 |
+---------+------------+----+
Record Count: 11
Main Query
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost,
Max(stbl.rn) AS rn
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Main Query: Output
+---------+----------+-------+------------+----+
| member | product | cost | totalcost | rn |
+---------+----------+-------+------------+----+
| Joe | A123 | 150 | 250 | 1 |
| Joe | A789 | 100 | 250 | 1 |
| Bob | C321 | 50 | 175 | 2 |
| Bob | A123 | 125 | 175 | 2 |
| Anord | A100 | 50 | 100 | 3 |
| Anord | A123 | 50 | 100 | 3 |
| Aruna | A123 | 50 | 50 | 4 |
| Jack | A123 | 50 | 50 | 5 |
| Jake | A123 | 50 | 50 | 6 |
| Karma | A123 | 50 | 50 | 7 |
| Logan | A123 | 50 | 50 | 8 |
| Paul | A123 | 50 | 50 | 9 |
| Rock | A123 | 50 | 50 | 10 |
| Seetha | A123 | 50 | 50 | 11 |
+---------+----------+-------+------------+----+
Record Count: 14

You can rank the members with a window function such as dense_rank() and then filter on that rank:
with temp as (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member)
as [Total Cost]
FROM MyTable a
GROUP BY a.Member,a.Product
)
select r.*
from (
select a.*, dense_rank() over (order by [Total Cost] desc, Member) as rnk
from temp a
) r
where r.rnk <= 10
order by r.rnk

You can use dense_rank() with apply:
select mt.*
from (select mt.*, sum(mt.Cost) over (partition by Product, Member) as ProductCost,
dense_rank() over (order by TotalCost desc) as seq
from MyTable mt cross apply
(select sum(mt1.Cost) as TotalCost
from MyTable mt1
where mt1.member = mt.member
) mt1
) mt
where mt.seq <= 10;

Use a subquery to get the TOP 10 total costs and join to your query:
SELECT
t.Member, t.Product, t.Cost, g.[Total Cost]
FROM (
SELECT Member, Product, SUM(Cost) as Cost
FROM MyTable
GROUP BY Member, Product
) t INNER JOIN (
SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC
Depending on your requirement you may use:
SELECT TOP (10) WITH TIES...

You don't have to select from the same table twice. Use SUM OVER to get the total per member.
Use DENSE_RANK to get the totals ranked (highest total = 1, second highest total = 2, ...).
Use TOP(10) WITH TIES to get all rows having the top ten totals.
The query:
select top(10) with ties *
from
(
select
member,
product,
sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from mytable
group by member, product
) results
order by dense_rank() over (order by total_cost) desc;

If you want exactly 10 customers even when there are ties, then a slight variation on Thorsten's method will work:
select top(10) with ties t.*
from (select member, product, sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from t
group by member, product
) t
order by dense_rank() over (order by total_cost) desc, member;
The addition of member as a second key may seem like a minor addition. However, it ensures that the dense_rank() is unique for each member (of course ordered by total_cost). This, in turn, guarantees that you get exactly 10 customers.
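One way to see the effect of using member as a tie-breaker is to compute the rank in a subquery and filter on it in an outer WHERE clause. This is a sketch of that variation, not the TOP ... WITH TIES query above: it assumes SQLite 3.25+ through Python's sqlite3, and the names top_members and member_rank are made up for illustration.

```python
import sqlite3

# A subset of the sample data from the accepted answer: (member, product, cost).
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75), ("Bob", "C321", 50),
    ("Joe", "A789", 50), ("Joe", "A789", 50),
    ("Joe", "A123", 50), ("Joe", "A123", 50), ("Joe", "A123", 50),
    ("Rock", "A123", 50), ("Anord", "A100", 50), ("Anord", "A123", 50),
]

QUERY = """
WITH per_product AS (
    SELECT member, product, SUM(cost) AS cost
    FROM mytable
    GROUP BY member, product
),
with_total AS (
    SELECT *, SUM(cost) OVER (PARTITION BY member) AS total_cost
    FROM per_product
),
ranked AS (
    -- member as a second key gives every member a distinct rank
    SELECT *, DENSE_RANK() OVER (ORDER BY total_cost DESC, member) AS member_rank
    FROM with_total
)
SELECT member, product, cost, total_cost
FROM ranked
WHERE member_rank <= ?
ORDER BY member_rank, cost DESC, product
"""


def top_members(rows, n):
    """Return all product rows of the n members with the highest total cost."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, (n,)).fetchall()
    con.close()
    return result


for row in top_members(ROWS, 2):
    print(row)
```

Every product row of a member shares that member's rank, so the filter keeps whole members and the number of returned rows can exceed n.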

You can use dense_rank() like below. This worked in SQL Server 2016. Change the value of the @limit variable to control how many ranks are returned.
declare @limit int = 10;
SELECT *
FROM
(
select x.*,rn = dense_rank() over (order by x.TotalCost desc)
from (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'TotalCost'
FROM MyTable a
GROUP BY a.Member
,a.Product
) x
) y
where rn <= @limit
order by rn

Rolling up remaining rows into one called "Other"

I have written a query that selects, let's say, 10 rows for this example.
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Store9 | 1 |
| Store2 | 1 |
| Store3 | 1 |
| Store4 | 1 |
| Store5 | 0 |
| Store6 | 0 |
| Store10 | 0 |
+-----------+------------+
How would I go about displaying the top 3 rows but having the remaining rows roll up into a row called "Other" that adds all of their complaints together?
So like this for example:
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Other | 4 |
+-----------+------------+
So what has happened above is that it displays the top 3 and then adds the complaints of the remaining rows into a row called "Other".
I have exhausted all my resources and cannot find a solution. Please let me know if this makes sense.
I have created a SQLfiddle of the above tables that you can edit if it is possible :)
Here's hoping this is possible :)
Thanks,
Mike

Something like this may work
select *, row_number() over (order by complaints desc) as sno
into #temp
from
(
SELECT
a.StoreName
,COUNT(b.StoreID) AS [Complaints]
FROM Stores a
LEFT JOIN
(
SELECT
StoreName
,Complaint
,StoreID
FROM Complaints
WHERE Complaint = 'yes') b on b.StoreID = a.StoreID
GROUP BY a.StoreName
) as t ORDER BY [Complaints] DESC
select storename,complaints from #temp where sno<4
union all
select 'other',sum(complaints) as complaints from #temp where sno>=4

I do this with double aggregation and row_number():
select (case when seqnum <= 3 then storename else 'Other' end) as StoreName,
sum(numcomplaints) as numcomplaints
from (select c.storename, count(*) as numcomplaints,
row_number() over (order by count(*) desc) as seqnum
from complaints c
where c.complaint = 'Yes'
group by c.storename
) s
group by (case when seqnum <= 3 then storename else 'Other' end) ;
From what I can see, you don't really need any additional information from stores, so this version just leaves that table out.
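The double aggregation can be tried on the per-store counts from the question. The sketch below makes several assumptions: it runs on SQLite 3.25+ through Python's sqlite3, starts from an already aggregated table called store_complaints (a hypothetical name), and adds storename as a tie-breaker, so among the stores with one complaint Store2 takes third place rather than Store8.

```python
import sqlite3

# Per-store complaint counts from the question.
COUNTS = [
    ("Store1", 4), ("Store7", 2), ("Store8", 1), ("Store9", 1), ("Store2", 1),
    ("Store3", 1), ("Store4", 1), ("Store5", 0), ("Store6", 0), ("Store10", 0),
]

QUERY = """
SELECT CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END AS storename,
       SUM(complaints) AS complaints
FROM (
    SELECT storename, complaints,
           ROW_NUMBER() OVER (ORDER BY complaints DESC, storename) AS seqnum
    FROM store_complaints
) AS s
GROUP BY CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END
ORDER BY MIN(seqnum)
"""


def top_three_and_other(rows):
    """Return the top three stores plus one 'Other' row for the rest."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE store_complaints (storename TEXT, complaints INTEGER)")
    con.executemany("INSERT INTO store_complaints VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in top_three_and_other(COUNTS):
    print(row)
```

Ordering by MIN(seqnum) keeps the top three in rank order and places "Other" last, since every row folded into it has a sequence number of 4 or higher.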
