Finding last and second last date and corresponding values - sql

Consider the following schema (fiddle):
CREATE TABLE meters
(
id int,
description varchar(10)
);
CREATE TABLE readings
(
id int,
meterid int,
date date,
value int
);
INSERT INTO readings (id, meterid, date, value)
VALUES
(1, 4, '20081231', 500),
(2, 4, '20090203', 550),
(3, 1, '20090303', 300),
(4, 2, '20090303', 244),
(5, 4, '20090303', 600),
(6, 1, '20090403', 399),
(7, 2, '20090403', 288),
(8, 3, '20090403', 555);
INSERT INTO meters (id, description)
VALUES
(1, 'this'),
(2, 'is'),
(3, 'not'),
(4, 'really'),
(5, 'relevant');
For each meter.id I need to find the latest reading date, value, and the value difference vs previous reading.
For the sample data my output would look like this (plus some other columns from meters):
meterid  latest    value  delta value
-------  --------  -----  -----------
1        20090403  399    99
2        20090403  288    44
3        20090403  555    null
4        20090303  600    50
5        null      null   null
I figured I could first build a query with the relevant info and then join against it, but I'm struggling to get that query right.
I've tried to adapt this method, but I get 2 rows for each id instead of one:
SELECT
p.meterid,
[1] AS [LastDate],
[2] AS [BeforeLastDate]
FROM
(SELECT TOP (2) WITH ties
*,
RowN = ROW_NUMBER() OVER (PARTITION BY r.meterid ORDER BY date DESC)
FROM
readings AS r
ORDER BY
(ROW_NUMBER() OVER (PARTITION BY r.meterid ORDER BY date DESC) - 1) / 2 + 1) a
PIVOT
(MAX(date) FOR RowN IN ([1], [2])) p
ORDER BY
p.meterId
I'm looking for ideas on how to solve the double-row issue or, if that's a dead end, how to get my desired output.

If I understand correctly, you can just use window functions:
select m.id, r.date, r.value, r.value - prev_value
from meters m left join
(select r.*,
lag(value) over (partition by meterid order by date) as prev_value,
row_number() over (partition by meterid order by date desc) as seqnum
from readings r
) r
on r.meterid = m.id and seqnum = 1
order by m.id;
No aggregation is necessary. Here is a db<>fiddle.
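As a quick way to check the query above without a SQL Server instance, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions) and declares the date column as text, which is a choice made for this sketch rather than part of the original schema.

```python
import sqlite3

# In-memory database with the sample data from the question.
# Assumption: the date column is stored as text (yyyymmdd), which sorts correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meters (id int, description varchar(10));
CREATE TABLE readings (id int, meterid int, date text, value int);
INSERT INTO readings (id, meterid, date, value) VALUES
  (1, 4, '20081231', 500), (2, 4, '20090203', 550),
  (3, 1, '20090303', 300), (4, 2, '20090303', 244),
  (5, 4, '20090303', 600), (6, 1, '20090403', 399),
  (7, 2, '20090403', 288), (8, 3, '20090403', 555);
INSERT INTO meters (id, description) VALUES
  (1, 'this'), (2, 'is'), (3, 'not'), (4, 'really'), (5, 'relevant');
""")

# LAG fetches the previous reading per meter; ROW_NUMBER (descending) marks the latest row.
# The LEFT JOIN keeps meters that have no readings at all (meter 5).
query = """
SELECT m.id, r.date, r.value, r.value - r.prev_value AS delta_value
FROM meters m
LEFT JOIN (
    SELECT r.*,
           LAG(value) OVER (PARTITION BY meterid ORDER BY date) AS prev_value,
           ROW_NUMBER() OVER (PARTITION BY meterid ORDER BY date DESC) AS seqnum
    FROM readings r
) r ON r.meterid = m.id AND r.seqnum = 1
ORDER BY m.id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Meter 3 has a single reading, so its delta comes back as NULL (None in Python), and meter 5 has no readings, so all of its reading columns are NULL.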

Use LEAD over a descending date order to get the previous reading's value, and ROW_NUMBER to keep only the latest row.
SELECT *
FROM
(SELECT *,
delta_value = value - LEAD(value) over (PARTITION BY r.meterid ORDER BY date DESC),
RowN = Row_Number() over(PARTITION BY r.meterid ORDER BY date DESC)
FROM readings AS r
) a
WHERE RowN = 1
ORDER BY a.meterid
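To illustrate why LEAD works here, the sketch below (Python's sqlite3, assuming SQLite 3.25+ and a text date column) computes the previous value both ways: LAG over ascending dates and LEAD over descending dates. With no duplicate dates per meter, as in the sample data, the two columns should agree on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id int, meterid int, date text, value int);
INSERT INTO readings (id, meterid, date, value) VALUES
  (1, 4, '20081231', 500), (2, 4, '20090203', 550),
  (3, 1, '20090303', 300), (4, 2, '20090303', 244),
  (5, 4, '20090303', 600), (6, 1, '20090403', 399),
  (7, 2, '20090403', 288), (8, 3, '20090403', 555);
""")

# "Next row" in descending date order is the chronologically previous reading.
pairs = conn.execute("""
SELECT id,
       LAG(value)  OVER (PARTITION BY meterid ORDER BY date)      AS prev_via_lag,
       LEAD(value) OVER (PARTITION BY meterid ORDER BY date DESC) AS prev_via_lead
FROM readings
ORDER BY id
""").fetchall()

for row in pairs:
    print(row)
```

Note that a query reading only from readings cannot return a row for a meter with no readings (meter 5 in the sample data); that needs a LEFT JOIN from meters.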

Related

Display duplicate row indicator and get only one row when duplicate

I built the schema at http://sqlfiddle.com/#!18/7e9e3
CREATE TABLE BoatOwners
(
BoatID INT,
OwnerDOB DATETIME,
Name VARCHAR(200)
);
INSERT INTO BoatOwners (BoatID, OwnerDOB,Name)
VALUES (1, '2021-04-06', 'Bob1'),
(1, '2020-04-06', 'Bob2'),
(1, '2019-04-06', 'Bob3'),
(2, '2012-04-06', 'Tom'),
(3, '2009-04-06', 'David'),
(4, '2006-04-06', 'Dale1'),
(4, '2009-04-06', 'Dale2'),
(4, '2013-04-06', 'Dale3');
I would like to write a query that would produce a result with the following characteristics:
- Returns only one owner per boat.
- When there are multiple owners on a single boat, returns the youngest owner.
- Displays a column to indicate whether a boat has multiple owners.
So the following data set when apply that query would produce
I tried
ROW_NUMBER() OVER (PARTITION BY ....
but haven't had much luck so far.
with data as (
select BoatID, OwnerDOB, Name,
row_number() over (partition by BoatID order by OwnerDOB desc) as rn,
count(*) over (partition by BoatID) as cnt
from BoatOwners
)
select BoatID, OwnerDOB, Name,
case when cnt > 1 then 'Yes' else 'No' end as MultipleOwner
from data
where rn = 1
This is just a case of numbering the rows for each BoatId group and also counting the rows in each group, then filtering accordingly:
select BoatId, OwnerDob, Name, Iif(qty=1,'No','Yes') MultipleOwner
from (
select *, Row_Number() over(partition by boatid order by OwnerDOB desc)rn, Count(*) over(partition by boatid) qty
from BoatOwners
)b where rn=1
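The numbering-plus-counting approach can be tried out with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite 3.25+ for window functions, OwnerDOB stored as ISO text, and a CASE expression in place of Iif so the query stays portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BoatOwners (BoatID INT, OwnerDOB TEXT, Name VARCHAR(200));
INSERT INTO BoatOwners (BoatID, OwnerDOB, Name) VALUES
  (1, '2021-04-06', 'Bob1'), (1, '2020-04-06', 'Bob2'), (1, '2019-04-06', 'Bob3'),
  (2, '2012-04-06', 'Tom'),  (3, '2009-04-06', 'David'),
  (4, '2006-04-06', 'Dale1'), (4, '2009-04-06', 'Dale2'), (4, '2013-04-06', 'Dale3');
""")

# rn = 1 is the youngest owner (latest date of birth) per boat;
# cnt is the number of owners per boat, repeated on every row of the group.
owners = conn.execute("""
SELECT BoatID, OwnerDOB, Name,
       CASE WHEN cnt > 1 THEN 'Yes' ELSE 'No' END AS MultipleOwner
FROM (
    SELECT BoatID, OwnerDOB, Name,
           ROW_NUMBER() OVER (PARTITION BY BoatID ORDER BY OwnerDOB DESC) AS rn,
           COUNT(*)     OVER (PARTITION BY BoatID) AS cnt
    FROM BoatOwners
) AS numbered
WHERE rn = 1
ORDER BY BoatID
""").fetchall()

for row in owners:
    print(row)
```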

how to simplify this multiple-CTE solution to this sql question?

DDL:
create table transactions
(
product_id int,
store_id int,
quantity int,
price numeric
);
DML:
insert into transactions values
(1, 1, 10, 2),
(2, 1, 5, 2),
(1, 2, 5, 4),
(2, 2, 2, 4),
(2, 3, 1, 20),
(1, 3, 1, 8),
(2, 4, 2, 10),
(1, 5, 2, 5),
(2, 5, 1, 3),
(2, 6, 4, 8);
I'm trying to find the top 3 products of the top 3 stores, both are based on sale amount. The solution I have is to use cte as below:
with cte as
(
select store_id, rank_store
from
(select
*,
dense_rank() over(order by sale desc) as rank_store
from
(select
store_id, sum(quantity * price) as sale
from transactions
group by 1) t) t2
where
rank_store <= 3
),
cte2 as
(
select
a.store_id, a.product_id,
sum(a.quantity * a.price) as sale_store_product
from
transactions as a
join
cte as b on a.store_id = b.store_id
group by
1, 2
order by
1, 2
),
cte3 as
(
select
*,
dense_rank() over (partition by store_id order by sale_store_product desc) as rank_product
from
cte2
)
select *
from cte3
where rank_product <= 3;
Here is the expected result:
Basically, the first CTE gets the top 3 stores based on sale amount; I use the dense_rank() window function to handle ties. The second CTE gets the top 3 stores' products and their total sale amounts. The last CTE uses dense_rank() again to rank the products in each store by sale amount. My final query then gets the top 3 products in each store based on sale amount.
I'm wondering if this can be improved a bit, since three CTEs feels too complicated. I'd appreciate any solutions and ideas. Thanks.
I'm trying to find the top 3 products of the top 3 stores
How can this be done without aggregating the data twice -- once for store/products and once for stores? This is possible using window functions along with aggregation:
select sp.*
from (select sp.*,
dense_rank() over (order by store_sales desc, store_id) as store_seqnum
from (select t.store_id, t.product_id,
sum(quantity * price) as sp_sales,
sum(sum(quantity * price)) over (partition by store_id) as store_sales,
row_number() over (partition by t.store_id order by sum(quantity * price) desc) as sp_seqnum
from transactions t
group by t.store_id, t.product_id
) sp
) sp
where store_seqnum <= 3 and sp_seqnum <= 3;
The inner subquery calculates the store/product totals. The next level ranks the stores; note that ties are broken using store_id.
Here is a db<>fiddle.
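Here is a sketch of the same idea that can be run with Python's sqlite3 module (SQLite 3.25+ assumed). It aggregates once per store/product, derives store totals with a window function, and ranks in descending order so that rank 1 means the highest sales. For readability the aggregation is split into its own CTE rather than nesting SUM(SUM(...)); that split, and the CTE names, are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (product_id int, store_id int, quantity int, price numeric);
INSERT INTO transactions VALUES
  (1, 1, 10, 2), (2, 1, 5, 2), (1, 2, 5, 4), (2, 2, 2, 4), (2, 3, 1, 20),
  (1, 3, 1, 8), (2, 4, 2, 10), (1, 5, 2, 5), (2, 5, 1, 3), (2, 6, 4, 8);
""")

top = conn.execute("""
WITH sp AS (              -- one aggregation: sales per store/product
    SELECT store_id, product_id, SUM(quantity * price) AS sp_sales
    FROM transactions
    GROUP BY store_id, product_id
),
totals AS (               -- store totals and product rank within each store
    SELECT sp.*,
           SUM(sp_sales) OVER (PARTITION BY store_id) AS store_sales,
           ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sp_sales DESC) AS sp_seqnum
    FROM sp
),
ranked AS (               -- store rank; ties on sales are broken by store_id
    SELECT totals.*,
           DENSE_RANK() OVER (ORDER BY store_sales DESC, store_id) AS store_seqnum
    FROM totals
)
SELECT store_id, product_id, sp_sales, store_sales
FROM ranked
WHERE store_seqnum <= 3 AND sp_seqnum <= 3
ORDER BY store_seqnum, sp_seqnum
""").fetchall()

for row in top:
    print(row)
```

With the sample data, stores 2 and 3 both total 28. Because store_id is the tie-breaker, only store 2 makes the top 3, whereas the question's dense_rank() over sales alone would keep both.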

How to query the V shaped data?

TxnID RunningAmount MemberID
==================================
1 80000 20
2 90000 20
3 70000 20 //<==== Falls but previously never below 100k, hence ignore
4 90000 20
5 110000 20
6 60000 20 //<==== Falls below 100k, hence we want ID 8
7 80000 20
8 120000 20
9 85000 28
...
....
How do I construct a query that groups by member and gets the first transaction ID that completes the "V" shape? Even pseudocode is fine; I can't share an attempt because I am totally clueless about how to do it.
UPDATES:
Sorry for the lack of explanation of the conditions. The base amount we are looking at is 100k. The ID is random, so we definitely need a row number.
We ignore all transactions before ID = 5 because their RunningAmount never exceeded 100k.
At ID = 5 the amount exceeds 100k, so we check whether the transactions after ID = 5 show a downtrend in RunningAmount that falls below 100k.
Immediately we see that ID = 6 falls below 100k, so we want to find the first transaction that exceeds 100k again (if there is one).
From the sample data above, the expected result is only one record, which is ID = 8.
For every member, there will be either one or zero records found based on the conditions I've mentioned.
Try this query:
create table #tbl (TxnID int, RunningAmount int, MemberID int);
insert into #tbl values
(1, 80000, 20),
(2, 90000, 20),
(3, 70000, 20),
(4, 90000, 20),
(5, 110000, 20),
(6, 60000, 20),
(7, 120000, 20),
(8, 85000, 28);
select TxnID, RunningAmount, MemberID,
LAG(VShape) over (partition by MemberID order by TxnID) VShape
from (
select TxnID, RunningAmount, MemberID,
case when rn < lagrn and rn < leadrn then 1 else 0 end VShape
from (
select *,
LAG(rn) over (partition by MemberID order by TxnID) lagRn,
LEAD(rn) over (partition by MemberID order by TxnID) leadRn
from (
select TxnID,
RunningAmount,
MemberID,
ROW_NUMBER() over (partition by MemberID order by RunningAmount) rn
from #tbl
) a
) a
) a
The last column, VShape, indicates whether the value in RunningAmount completes a V shape (although the question could be clearer about what that means, rather than leaving everybody to figure it out). Now you can filter values based on RunningAmount (whether they fall below or above 100k).
Here is a version for earlier versions of SQL Server that don't have the LAG and LEAD functions:
;with cte as (
select *,
ROW_NUMBER() over (partition by MemberID order by RunningAmount) rn
from #tbl
), cte2 as (
select c1.TxnID, c1.RunningAmount, c1.MemberID, c1.rn, c2.rn [lagRn] , c3.rn [leadRn]
from cte c1
left join cte c2 on c1.TxnID = c2.TxnID + 1 and c1.MemberID = c2.MemberID
left join cte c3 on c1.TxnID = c3.TxnID - 1 and c1.MemberID = c3.MemberID
), cte3 as (
select TxnID, RunningAmount, MemberID,
case when rn < lagrn and rn < leadrn then 1 else 0 end VShape
from cte2
), FinalResult as (
select c1.TxnID, c1.RunningAmount, c1.MemberID, c2.VShape
from cte3 c1
left join cte3 c2 on c1.TxnID = c2.TxnID + 1 and c1.MemberID = c2.MemberID
)
select fr.*, fr2.RunningAmount RunningAmountLagBy2 from FinalResult fr
left join FinalResult fr2 on fr.TxnID = fr2.TxnID + 2
where fr.RunningAmount > 100000 and fr2.RunningAmount > 100000 and fr.VShape = 1
UPDATE
After the question update, here's a solution:
select TxnID from (
select *, ROW_NUMBER() over (partition by MemberID, VShape order by TxnID) CompletesVShape from (
select TxnID,
RunningAmount,
MemberID,
sum(case when RunningAmount >= 100000 then 1 else 0 end) over (partition by MemberID order by TxnID rows between unbounded preceding and current row) VShape
from #tbl
) a
) a where VShape > 1 and CompletesVShape = 1
Based on your question update, and assuming the necessary condition for a V shape is that the rows above and below have running amounts > 100000 and the middle row is smaller than both, below is a query showing how to do it in SQL Server 2008.
also see live demo
; with firstlargeamount as
(
select MemberId, minTrxid=min(TxnID)
from t
where RunningAmount>100000
group by MemberId
)
,tbl as
(
select *,
rn=row_number() over( partition by MemberId order by TxnId)
from
t
)
select t3.*,f.*
from tbl t1
join tbl t2
on
t1.memberId=t2.memberid and t1.rn=t2.rn +1
and t1.RunningAmount<t2.RunningAmount
join tbl t3
on
t1.memberId=t3.memberid and t1.rn=t3.rn -1
and t1.RunningAmount<t3.RunningAmount
join firstlargeamount f
on
f.Memberid=t2.memberid and f.minTrxid>=t1.TxnID
Explanation:
The first step generates a row number sequence per member in the CTE tbl, and the minimum limiting transaction per member in the CTE firstlargeamount.
The second step is a double self-join that finds, for each row, the rows above and below it that satisfy the V shape criteria, and also joins with firstlargeamount to find rows that satisfy the 100000 criteria.
Note that the rows above and below are found simply by using +1/-1 on the current row's row number computed in the first step.
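The three conditions in the question's update (first exceed 100k, then fall below it, then exceed it again) can also be written as three chained lookups. The following is only a sketch, not either answerer's method. It runs on Python's sqlite3 module and rests on two assumptions: TxnID reflects chronological order (the question says IDs may be random, in which case a row number over the real ordering column would be needed), and "exceed" means strictly greater than 100000. The table name txns is made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txns (TxnID int, RunningAmount int, MemberID int);
INSERT INTO txns VALUES
  (1, 80000, 20), (2, 90000, 20), (3, 70000, 20), (4, 90000, 20),
  (5, 110000, 20), (6, 60000, 20), (7, 80000, 20), (8, 120000, 20),
  (9, 85000, 28);
""")

completed = conn.execute("""
WITH first_above AS (     -- first transaction per member that exceeds 100k
    SELECT MemberID, MIN(TxnID) AS TxnID
    FROM txns
    WHERE RunningAmount > 100000
    GROUP BY MemberID
),
first_dip AS (            -- first later transaction that falls below 100k
    SELECT t.MemberID, MIN(t.TxnID) AS TxnID
    FROM txns t
    JOIN first_above a ON a.MemberID = t.MemberID AND t.TxnID > a.TxnID
    WHERE t.RunningAmount < 100000
    GROUP BY t.MemberID
)
SELECT t.MemberID, MIN(t.TxnID) AS TxnID   -- first recovery above 100k after the dip
FROM txns t
JOIN first_dip d ON d.MemberID = t.MemberID AND t.TxnID > d.TxnID
WHERE t.RunningAmount > 100000
GROUP BY t.MemberID
ORDER BY t.MemberID
""").fetchall()

print(completed)
```

Members that never complete the pattern (member 28 here) simply produce no row, which matches the "one or zero records per member" requirement.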

query in Server Management Studio + table in ssrs reports+-using select with correlated sub-query

I have a table consisting of fruit groups and sizes, i.e.:
fruit-package  size  send start-date  send end-date
--------------------------------------------------
apple          s     2.2.16           5.2.16
apple          s     7.2.16           **10.2.16**
apple          s     **20.2.16**      21.2.16
--------------------------------------------------
apple          l     1.2.16           **5.2.16**
apple          l     **25.2.16**      26.2.16
apple          l     26.2.16          27.2.16
--------------------------------------------------
orange         m     1.1.16           2.1.16
orange         m     3.1.16           **4.1.16**
orange         m     **24.1.16**      25.1.16
--------------------------------------------------
For each group of fruit-package and size (like apple + small), I need to find the maximum number of days that passed between one package's send end-date and the following package's send start-date within the group. I then want to select that end-date and the following start-date, and put them in the result table for that group, doing this for every group.
So the result table would be:
fruit-package  size  send start-date  send end-date
--------------------------------------------------
apple          s     20.2.16          10.2.16
--------------------------------------------------
apple          l     25.2.16          5.2.16
--------------------------------------------------
orange         m     24.1.16          4.1.16
--------------------------------------------------
I tried to do this in parts.
First part: for each group of fruit, find all combinations of (fruit-package) + (size) + (current end_date) and the start_date of the following package, like this:
select P.fruit
,P.size
,P.end_date
,(SELECT top 1 (pa.start_date)
FROM packages as pa
WHERE pa.start_date >= pa.end_date
and p.fruit=pa.fruit and p.size=pa.size
order by pa.start_date desc ) as start
into #temp
from packages p
group by p.fruit
, P.size
,p.end_date
The second step would simply be to find the row with the largest day difference in each group.
But the first part I wrote doesn't work: I get a null value as the start date, or a single end_date rather than one for each group from the inner select.
Why, and how do I correct it?
Please help, thanks.
This should work for you if you have SQL Server 2012 or later.
Create Table #Tbl (Name Varchar(8000), Size Char(1), StartDate Date, EndDate Date)
Insert #Tbl Values ('apple', 's', '2.2.16', '2.5.16')
Insert #Tbl Values ('apple', 's', '2.7.16', '2.10.16')
Insert #Tbl Values ('apple', 's', '2.20.16', '2.21.16')
Insert #Tbl Values ('apple', 'l', '2.1.16', '2.5.16')
Insert #Tbl Values ('apple', 'l', '2.25.16', '2.26.16')
Insert #Tbl Values ('apple', 'l', '2.26.16', '2.27.16')
Insert #Tbl Values ('orange', 'm', '1.1.16', '1.2.16')
Insert #Tbl Values ('orange', 'm', '1.3.16', '1.4.16')
Insert #Tbl Values ('orange', 'm', '1.24.16', '1.25.16')
;With cteQry As
(
Select *,
Lead(StartDate) Over (Partition By Name, Size Order By StartDate) NextStartDate,
DateDiff(d, EndDate, Lead(StartDate) Over (Partition By Name, Size Order By StartDate)) Days
From #Tbl
)
Select *
From
(
Select *,
Row_Number() Over (Partition By Name, Size Order By Days Desc) SortOrder
From cteQry
) A
Where SortOrder = 1
EDIT: Without lead function.
;With cteQry2 As
(
Select *,
DateDiff(d, EndDate,
(Select Top 1 StartDate
From #Tbl
Where Name = T1.Name
And Size = T1.Size
And StartDate > T1.StartDate
Order By StartDate)) Days
From #Tbl T1
)
Select *
From
(
Select *,
Row_Number() Over (Partition By Name, Size Order By Days Desc) SortOrder
From cteQry2
) A
Where SortOrder = 1
Order By Name, Size, StartDate
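The LEAD-based approach can be checked with Python's sqlite3 module. This sketch makes several assumptions: SQLite 3.25+ for window functions, dates rewritten as ISO text (the question's d.m.yy values read as day.month.year), julianday() standing in for DateDiff, and the table and column names taken from the asker's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (fruit TEXT, size TEXT, start_date TEXT, end_date TEXT);
INSERT INTO packages VALUES
  ('apple',  's', '2016-02-02', '2016-02-05'),
  ('apple',  's', '2016-02-07', '2016-02-10'),
  ('apple',  's', '2016-02-20', '2016-02-21'),
  ('apple',  'l', '2016-02-01', '2016-02-05'),
  ('apple',  'l', '2016-02-25', '2016-02-26'),
  ('apple',  'l', '2016-02-26', '2016-02-27'),
  ('orange', 'm', '2016-01-01', '2016-01-02'),
  ('orange', 'm', '2016-01-03', '2016-01-04'),
  ('orange', 'm', '2016-01-24', '2016-01-25');
""")

gaps = conn.execute("""
WITH nexts AS (           -- start date of the following package in the same group
    SELECT fruit, size, end_date,
           LEAD(start_date) OVER (PARTITION BY fruit, size ORDER BY start_date) AS next_start
    FROM packages
),
measured AS (             -- gap in whole days; NULL for the last package of a group
    SELECT nexts.*,
           CAST(julianday(next_start) - julianday(end_date) AS INTEGER) AS days
    FROM nexts
),
ranked AS (               -- largest gap first; NULL gaps sort last in descending order
    SELECT measured.*,
           ROW_NUMBER() OVER (PARTITION BY fruit, size ORDER BY days DESC) AS sort_order
    FROM measured
)
SELECT fruit, size, end_date, next_start, days
FROM ranked
WHERE sort_order = 1
ORDER BY fruit, size
""").fetchall()

for row in gaps:
    print(row)
```

The selected end_date and next_start pairs correspond to the dates highlighted in the question's table.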

Adding a rank to first row of each group

This is returning what I want but is there a simpler, more elegant, approach?
IF OBJECT_ID('TEMPDB..#test') IS NOT NULL DROP TABLE #test;
CREATE TABLE #test
(
userAcc VARCHAR(100),
game VARCHAR(100),
amount INT
);
INSERT INTO #test
values
('jas', 'x', 10),
('jas', 'y', 100),
('jas', 'z', 20),
('sam', 'j', 10),
('sam', 'q', 5);
--initial table sample
SELECT userAcc,
game,
amount
FROM #test;
WITH
X AS
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY userAcc ORDER BY game),
userAcc,
game,
amount,
rk = RANK() OVER (PARTITION BY userAcc ORDER BY amount DESC)
FROM #test
),
Y AS
(
SELECT RK,userAcc,
game,
targ = rn
FROM X
WHERE rk = 1
)
SELECT X.userAcc,
X.game,
X.amount,
ISNULL(Y.targ,0)
FROM X
LEFT OUTER JOIN Y
ON
X.userAcc = Y.userAcc AND
X.rn = Y.rk
ORDER BY X.userAcc,X.rn;
It returns this:
Here is the initial table:
What the script is doing is this:
1. Add a new column to original table
2. In new column add the rank of the game for each userAcc with the highest amount.
3. The rank is the alphabetical position of the game with the highest amount amongst the user's games. So for jas his highest game is y and that is positioned 2nd amongst his games.
4. The rank found in step 3 should only go against the first alphabetical game of the respective user.
You don't need a join for this. You can use accumulation.
If I understand correctly:
select userAcc, game, amount,
isnull( (case when rn = 1
then max(case when rk = 1 then rn end) over (partition by userAcc)
end),0) as newcol
from (select t.*,
ROW_NUMBER() OVER (PARTITION BY userAcc ORDER BY game) as rn,
RANK() OVER (PARTITION BY userAcc ORDER BY amount DESC) as rk
from #test t
) t
order by userAcc, rn;
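To see what this query returns for the sample data, here is a sketch that runs it with Python's sqlite3 module (SQLite 3.25+ assumed). COALESCE replaces ISNULL, a plain table named test replaces the #test temp table, and rows are ordered by userAcc and rn so the output is deterministic; all three are adaptations made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (userAcc VARCHAR(100), game VARCHAR(100), amount INT);
INSERT INTO test VALUES
  ('jas', 'x', 10), ('jas', 'y', 100), ('jas', 'z', 20),
  ('sam', 'j', 10), ('sam', 'q', 5);
""")

# rn: alphabetical position of the game within the user.
# rk: rank by amount within the user (1 = highest amount).
# The windowed MAX picks out the rn of the top-amount game, and the outer CASE
# shows it only on the user's first alphabetical row; every other row gets 0.
ranked = conn.execute("""
SELECT userAcc, game, amount,
       COALESCE(CASE WHEN rn = 1
                     THEN MAX(CASE WHEN rk = 1 THEN rn END) OVER (PARTITION BY userAcc)
                END, 0) AS newcol
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY userAcc ORDER BY game) AS rn,
           RANK()       OVER (PARTITION BY userAcc ORDER BY amount DESC) AS rk
    FROM test t
) t
ORDER BY userAcc, rn
""").fetchall()

for row in ranked:
    print(row)
```

For jas the highest-amount game is y, which is 2nd alphabetically, so 2 appears on the x row; for sam the highest-amount game j is also first alphabetically, so its row shows 1.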