How to get first_value from previous window partition - sql

I want to display the BalanceEndOfYesterday value from the day before in a query, as shown below.
| Date | Amount | BalanceEndOfDay | BalanceEndOfYesterday |
|------------|-------|-----------------|-----------------------|
| 2020-04-30 | 10 | 130 | 80 |
| 2020-04-30 | 20 | 130 | 80 |
| 2020-04-30 | 30 | 130 | 80 |
| 2020-04-30 | -10 | 130 | 80 |
| 2020-04-29 | 50 | 80 | 0 |
| 2020-04-29 | -10 | 80 | 0 |
| 2020-04-29 | 40 | 80 | 0 |
My query is
SELECT
BalanceEndOfDay ,
first_value(BalanceEndOfDay) OVER (ORDER BY Date DESC) -- here is some sort of window needed
FROM AccountTransactions

You can use APPLY:
SELECT at.*, COALESCE(at1.BalanceEndOfDay, 0) AS BalanceEndOfYesterday
FROM AccountTransactions at OUTER APPLY
( SELECT TOP (1) at1.BalanceEndOfDay
FROM AccountTransactions at1
WHERE at1.Date < at.Date
ORDER BY at1.Date DESC
) at1;
EDIT: If you only want the balance from exactly one calendar day earlier, you can use dateadd():
SELECT DISTINCT at.*, COALESCE(at1.balanceendofday, 0) AS BalanceEndOfYesterday
FROM AccountTransactions at LEFT JOIN
AccountTransactions at1
ON at1.date = dateadd(day, -1, at.date);
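To try the "latest earlier date" lookup without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the answer's exact query: SQLite has no OUTER APPLY or TOP, so a correlated scalar subquery with LIMIT 1 stands in for them (that substitution is my assumption). The table and column names follow the question.

```python
import sqlite3

# Sample rows from the question: (Date, Amount, BalanceEndOfDay).
rows = [
    ("2020-04-30", 10, 130),
    ("2020-04-30", 20, 130),
    ("2020-04-30", 30, 130),
    ("2020-04-30", -10, 130),
    ("2020-04-29", 50, 80),
    ("2020-04-29", -10, 80),
    ("2020-04-29", 40, 80),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AccountTransactions "
    "(Date TEXT, Amount INTEGER, BalanceEndOfDay INTEGER)"
)
conn.executemany("INSERT INTO AccountTransactions VALUES (?, ?, ?)", rows)

# The correlated subquery plays the role of OUTER APPLY (SELECT TOP (1) ...):
# for each row, fetch the end-of-day balance of the latest earlier date.
query = """
SELECT a.Date, a.Amount, a.BalanceEndOfDay,
       COALESCE((SELECT b.BalanceEndOfDay
                 FROM AccountTransactions b
                 WHERE b.Date < a.Date
                 ORDER BY b.Date DESC
                 LIMIT 1), 0) AS BalanceEndOfYesterday
FROM AccountTransactions a
ORDER BY a.Date DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why a plain TEXT column is enough here.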

We could use LAG here, after first aggregating by date to obtain a single end of day balance for each date. Then, we can join your table to this result to pull in the end of day balance from yesterday.
WITH cte AS (
SELECT Date, MAX(BalanceEndOfDay) AS BalanceEndOfDay,
LAG(MAX(BalanceEndOfDay), 1, 0) OVER (ORDER BY Date) As BalanceEndOfYesterday
FROM AccountTransactions
GROUP BY Date
)
SELECT
a1.Date,
a1.Amount,
a1.BalanceEndOfDay,
a2.BalanceEndOfYesterday
FROM AccountTransactions a1
INNER JOIN cte a2
ON a1.Date = a2.Date
ORDER BY
a1.Date DESC;
Demo

If you want to do this using only window functions, you can use:
select at.*,
max(case when prev_date = dateadd(day, -1, date) then prev_BalanceEndOfDay end) over (partition by date) as prev_BalanceEndOfDay
from (select at.*,
lag(BalanceEndOfDay) over (order by date) as prev_BalanceEndOfDay,
lag(date) over (order by date) as prev_date
from accounttransactions at
) at;
Note: This interprets "the day before" as being exactly one day before. If it means "the day before in the data", then the first comparison should just be max(case when prev_date <> date . . . ).
Here is a db<>fiddle.
Note that in databases that fully support the range window specification, this can be done directly with logic like this:
max(BalanceEndOfDay) over (order by datediff(day, '2000-01-01', date)
range between 1 preceding and 1 preceding
)
Alas, SQL Server does not support this (standard) functionality.

Related

Most efficient way of dividing rows by name and then transposing to one column for each name

I'm using standard SQL in Google Bigquery. So I have some data about metrics in this format:
Date | metric_name | metric_level
01/02/2019 | metric_one | 1
02/03/2019 | metric_one | 2
14/02/2019 | metric_two | 6
17/02/2019 | metric_two | 4
01/03/2019 | metric_three | 2
10/03/2019 | metric_three | 7
I want to get it in this format, date history going back one year, and then each metric filled in for each date. If a metric has no data for a particular date then it uses the most recent data point:
Date | metric_one | metric_two | metric_three
..........
01/02/2019 | 1 | null | null
02/02/2019 | 1 | null | null
03/02/2019 | 1 | null | null
...........
...........
13/02/2019 | 1 | null | null
14/02/2019 | 1 | 6 | null
15/02/2019 | 1 | 6 | null
...........
...........
09/03/2019 | 2 | 4 | 2
10/03/2019 | 2 | 4 | 7
11/03/2019 | 2 | 4 | 7
...........
and so on.
I've managed to write some code that does this, but I want to know if there's a more efficient way of doing it. There are actually a lot more than 3 metrics, so if I can make it more efficient in any way then it will save a lot of resources in the long run.
This is my code
WITH date_arr AS(
SELECT
date
FROM UNNEST(
GENERATE_DATE_ARRAY(
DATE_SUB(CURRENT_DATE(),INTERVAL 365 DAY),
CURRENT_DATE(),
INTERVAL 1 day
)
) AS date
),
metric_one_raw AS (
SELECT
date,
metric_level
FROM database
WHERE metric_name = 'metric_one'
),
metric_one_gapless AS (
SELECT
d.date AS date,
IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_one
FROM date_arr d
LEFT JOIN metric_one_raw i
ON d.date = i.date
WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
),
metric_two_raw AS (
SELECT
date,
metric_level
FROM database
WHERE metric_name = 'metric_two'
),
metric_two_gapless AS (
SELECT
d.date AS date,
IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_two
FROM date_arr d
LEFT JOIN metric_two_raw i
ON d.date = i.date
WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
),
metric_three_raw AS (
SELECT
date,
metric_level
FROM database
WHERE metric_name = 'metric_three'
),
metric_three_gapless AS (
SELECT
d.date AS date,
IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_three
FROM date_arr d
LEFT JOIN metric_three_raw i
ON d.date = i.date
WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
)
SELECT
*
FROM metric_one_gapless
LEFT JOIN metric_two_gapless USING(date)
LEFT JOIN metric_three_gapless USING(date)
Hope that makes sense. Thanks in advance!
You can do the following:
Generate the dates
Use a cross join to get all the rows
Use a left join to bring in your data
Use last_value() to fill in NULL values.
In other databases, I would prefer lag(ignore nulls), but BigQuery does not support that.
So:
select d, m.metric,
coalesce(mm.metric_level,
last_value(mm.metric_level ignore nulls) over (partition by m.metric order by d)
) as metric_level
from (select distinct metric from metrics) m cross join
unnest(generate_date_array(date_sub(current_date(), interval 1 year), current_date(), interval 1 day)) d left join
metrics mm
on mm.metric = m.metric and mm.date = d;
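The four steps above (generate dates, cross join, left join, fill forward) can be sketched outside BigQuery in plain Python. This is only an illustration of the logic, not a replacement for the query; the function name forward_fill and the tuple layout are assumptions of mine, and the sample rows come from the question (dates read as dd/mm/yyyy).

```python
from datetime import date, timedelta

# Sample observations from the question: (date, metric_name, metric_level).
observations = [
    (date(2019, 2, 1), "metric_one", 1),
    (date(2019, 3, 2), "metric_one", 2),
    (date(2019, 2, 14), "metric_two", 6),
    (date(2019, 2, 17), "metric_two", 4),
    (date(2019, 3, 1), "metric_three", 2),
    (date(2019, 3, 10), "metric_three", 7),
]

def forward_fill(observations, start, end):
    """Return {day: {metric: level or None}} for every day in [start, end]."""
    by_key = {(d, name): level for d, name, level in observations}
    metrics = sorted({name for _, name, _ in observations})
    latest = {name: None for name in metrics}  # most recent value seen so far
    table = {}
    day = start
    while day <= end:
        for name in metrics:
            if (day, name) in by_key:
                latest[name] = by_key[(day, name)]
        table[day] = dict(latest)  # snapshot, one "row" per date
        day += timedelta(days=1)
    return table

table = forward_fill(observations, date(2019, 2, 1), date(2019, 3, 11))
print(table[date(2019, 2, 14)])
print(table[date(2019, 3, 10)])
```

As in the SQL, an observation dated before the start of the range is not carried into it.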
After doing some research, I came up with the following. Because you use one LEFT JOIN per metric, and the number of joins may vary, and because you cannot use DECLARE in the BigQuery web UI, you are probably better off using the BigQuery REST API through one of its client libraries (C#, Go, Java, Node.js, PHP, Python, or Ruby). That lets you hold the metrics in a variable: first run a SELECT DISTINCT to find out which metrics exist, save them, and then loop over them to build the LEFT JOINs you need.
I hope this information helps; I'm here if you need more.

Cumulated sum based on condition in other column

I would like to create a view based on data in following structure:
CREATE TABLE my_table (
date date,
daily_cumulative_precip float4
);
INSERT INTO my_table (date, daily_cumulative_precip)
VALUES
('2016-07-28', 3.048)
, ('2016-08-04', 2.286)
, ('2016-08-11', 5.334)
, ('2016-08-12', 0.254)
, ('2016-08-13', 2.794)
, ('2016-08-14', 2.286)
, ('2016-08-15', 3.302)
, ('2016-08-17', 3.81)
, ('2016-08-19', 15.746)
, ('2016-08-20', 46.739998);
I would like to accumulate the precipitation for consecutive days only.
Below is the desired result for a different test case - except that days without rain should be omitted:
I have tried window functions with OVER(PARTITION BY date, rain_on_day) but they do not yield the desired result.
How could I solve this?
SELECT date
, dense_rank() OVER (ORDER BY grp) AS consecutive_group_nr -- optional
, daily_cumulative_precip
, sum(daily_cumulative_precip) OVER (PARTITION BY grp ORDER BY date) AS cum_precipitation_mm
FROM (
SELECT date, t.daily_cumulative_precip
, row_number() OVER (ORDER BY date) - t.rn AS grp
FROM (
SELECT generate_series (min(date), max(date), interval '1 day')::date AS date
FROM my_table
) d
LEFT JOIN (SELECT *, row_number() OVER (ORDER BY date) AS rn FROM my_table) t USING (date)
) x
WHERE daily_cumulative_precip > 0
ORDER BY date;
db<>fiddle here
Returns all rainy days with cumulative sums for consecutive days (and a running group number).
Basics:
Select longest continuous sequence
Here's a way to calculate cumulative precipitation without having to explicitly enumerate all dates:
SELECT date, daily_cumulative_precip, sum(daily_cumulative_precip) over (partition by group_num order by date) as cum_precip
FROM
(SELECT date, daily_cumulative_precip, sum(start_group) over (order by date) as group_num
FROM
(SELECT date, daily_cumulative_precip, CASE WHEN (date != prev_date + 1) THEN 1 ELSE 0 END as start_group
FROM
(SELECT date, daily_cumulative_precip, lag(date, 1, '-infinity'::date) over (order by date) as prev_date
FROM my_table) t1) t2) t3
yields
| date | daily_cumulative_precip | cum_precip |
|------------+-------------------------+------------|
| 2016-07-28 | 3.048 | 3.048 |
| 2016-08-04 | 2.286 | 2.286 |
| 2016-08-11 | 5.334 | 5.334 |
| 2016-08-12 | 0.254 | 5.588 |
| 2016-08-13 | 2.794 | 8.382 |
| 2016-08-14 | 2.286 | 10.668 |
| 2016-08-15 | 3.302 | 13.97 |
| 2016-08-17 | 3.81 | 3.81 |
| 2016-08-19 | 15.746 | 15.746 |
| 2016-08-20 | 46.74 | 62.486 |
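For anyone who wants to check the grouping logic by hand, here is a small Python sketch of the same idea: a new group starts whenever a date is not exactly one day after the previous one, and the sum restarts with it. The function name cumulative_by_streak is hypothetical, and I use 46.74 for the last value, as in the result table above.

```python
from datetime import date, timedelta

# Rows from the question: (date, daily_cumulative_precip).
rows = [
    (date(2016, 7, 28), 3.048),
    (date(2016, 8, 4), 2.286),
    (date(2016, 8, 11), 5.334),
    (date(2016, 8, 12), 0.254),
    (date(2016, 8, 13), 2.794),
    (date(2016, 8, 14), 2.286),
    (date(2016, 8, 15), 3.302),
    (date(2016, 8, 17), 3.81),
    (date(2016, 8, 19), 15.746),
    (date(2016, 8, 20), 46.74),
]

def cumulative_by_streak(rows):
    """Running sum that restarts whenever the date is not prev_date + 1 day."""
    out = []
    prev_day = None
    running = 0.0
    for day, precip in sorted(rows):
        # Same test as CASE WHEN date != prev_date + 1 in the SQL above.
        if prev_day is None or day != prev_day + timedelta(days=1):
            running = 0.0
        running += precip
        out.append((day, precip, round(running, 3)))
        prev_day = day
    return out

streaks = cumulative_by_streak(rows)
for row in streaks:
    print(row)
```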

Grouping rows if their dates overlap, and ranking them

My situation is I have a table of transactions, with start and end dates. The problem is that often times these transaction dates overlap with each other, and I want to group these scenarios together.
For example in the case below, transaction #1 is the "root" transaction, while #2-#4 are overlapping with #1 and/or with each other. However, transaction #5 is not overlapping with anything, hence it is a new "root" transaction.
+----------------+-----------+-----------+----------------------------------+
| Transaction ID | StartDate | EndDate | |
+----------------+-----------+-----------+----------------------------------+
| 1 | 1/1/2017 | 1/3/2017 | root transaction |
| 2 | 1/2/2017 | 1/6/2017 | overlaps with #1 |
| 3 | 1/5/2017 | 1/10/2017 | overlaps with #2 |
| 4 | 1/3/2017 | 1/13/2017 | overlaps with #2 and #3 |
| 5 | 1/15/2017 | 1/20/2017 | no overlap, new root transaction |
+----------------+-----------+-----------+----------------------------------+
Below is how I want the output to look. I want to
Identify the root transaction (column 4)
Rank the transactions in a chain by EndDate, so that the root is always = 1
+----------------+-----------+-----------+------------------+------+
| Transaction ID | Start | End | Root Transaction | Rank |
+----------------+-----------+-----------+------------------+------+
| 1 | 1/1/2017 | 1/3/2017 | 1 | 1 |
| 2 | 1/2/2017 | 1/6/2017 | 1 | 2 |
| 3 | 1/5/2017 | 1/10/2017 | 1 | 3 |
| 4 | 1/3/2017 | 1/13/2017 | 1 | 4 |
| 5 | 1/15/2017 | 1/20/2017 | 5 | 1 |
+----------------+-----------+-----------+------------------+------+
How would I go about this in SQL?
Here is one method using an OUTER APPLY
Declare @YourTable table ([Transaction ID] int,StartDate date,EndDate date)
Insert Into @YourTable values
(1,'1/1/2017','1/3/2017'),
(2,'1/2/2017','1/6/2017'),
(3,'1/5/2017','1/10/2017'),
(4,'1/3/2017','1/13/2017'),
(5,'1/15/2017','1/20/2017')
Select [Transaction ID]
,[Start] = StartDate
,[End] = EndDate
,[Root Transaction]=Grp
,[Rank] = Row_Number() over (Partition By Grp Order by [Transaction ID])
From (
Select A.*
,Grp = max(Flag*[Transaction ID]) over (Order By [Transaction ID])
From (
Select A.*,Flag = IsNull(B.Flg,1)
From @YourTable A
Outer Apply (
Select Top 1 Flg=0
From @YourTable
Where (StartDate between A.StartDate and A.EndDate
or EndDate between A.StartDate and A.EndDate )
and [Transaction ID]<A.[Transaction ID]
) B
) A
) A
Returns
EDIT - Some Commentary
In the OUTER APPLY, Flag is set to 1 or 0: 1 indicates a new group, and 0 indicates that the record overlaps with an existing range.
In the next query "up", we use the window function to apply a Grp code (Flag*Trans ID). Remember that a new group is 1 and an existing one is 0.
The window function then takes the max of this product as it traverses the transactions.
The final query just applies the Rank, using the window function partitioned by Grp and ordered by Trans ID.
If it helps with the visualization:
The 1st sub-query (outer apply) generates
The 2nd sub-query generates
This is an example of "gaps-and-islands". For your data, you can determine the "island"s by determining where each starts -- that is, where a record does not overlap with the previous record. You can then get the rank using row_number().
So, here is a method:
select t.*,
min(transactionId) over (partition by island) as start,
row_number() over (partition by island order by endDate) as rnk
from (select t.*,
sum(startIslandFlag) over (order by startDate) as island
from (select t.*,
(case when not exists (select 1
from t t2
where t2.startdate < t.startdate and
t2.enddate >= t.startdate
)
then 1 else 0
end) as startIslandFlag
from t
) t
) t;
Notes:
In the event that the lowest transaction id is not the "root", then a tweak may be needed to the code to get the transaction id with the minimum start date.
If there are duplicate start dates in the code, a tweak may be needed with the cumulative sums (using an explicit range window).
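The island logic can be sketched in Python to make the steps concrete. This is a rough equivalent, not a translation of the query: it walks the rows in start-date order, keeps the latest end date seen so far, and opens a new island when a start date falls after it. The name group_overlaps is hypothetical, and I take the root to be the row with the earliest start date in each island (the tweak mentioned in the first note), which is an assumption on my part.

```python
from datetime import date

# Sample rows from the question: (transaction_id, start_date, end_date).
transactions = [
    (1, date(2017, 1, 1), date(2017, 1, 3)),
    (2, date(2017, 1, 2), date(2017, 1, 6)),
    (3, date(2017, 1, 5), date(2017, 1, 10)),
    (4, date(2017, 1, 3), date(2017, 1, 13)),
    (5, date(2017, 1, 15), date(2017, 1, 20)),
]

def group_overlaps(transactions):
    """Return {transaction_id: (root_id, rank_by_end_date)}."""
    islands = []
    max_end = None  # latest end date seen in the current island
    for txn in sorted(transactions, key=lambda t: (t[1], t[0])):
        _, start, end = txn
        if max_end is None or start > max_end:
            islands.append([])  # no earlier row reaches this start: new island
            max_end = end
        else:
            max_end = max(max_end, end)
        islands[-1].append(txn)

    result = {}
    for members in islands:
        root_id = members[0][0]  # earliest start date in the island
        by_end = sorted(members, key=lambda t: (t[2], t[0]))
        for rank, (txn_id, _, _) in enumerate(by_end, start=1):
            result[txn_id] = (root_id, rank)
    return result

groups = group_overlaps(transactions)
for txn_id in sorted(groups):
    print(txn_id, groups[txn_id])
```

Ties on start date are broken by transaction id here, which sidesteps the duplicate-start-date issue from the second note.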
Identify the root transactions:
with roots as (
select *
from tran as t1
where not exists (
select 1
from tran as t2
where t2.Transaction_ID < t1.Transaction_ID
and (
t1.StartDate between t2.StartDate and t2.EndDate
or
t1.EndDate between t2.StartDate and t2.EndDate
)
)
)
Create a two root system to capture all the overlaps in between them
select t.Transaction_ID,
t.StartDate as [Start],
t.EndDate as [End],
r1.Transaction_ID as Root_Transaction,
row_number() over (partition by r1.Transaction_ID order by t.EndDate) as [Rank]
from roots as r1
inner join roots as r2
on r2.Transaction_ID > r1.Transaction_ID
inner join tran as t
on t.Transaction_ID >= r1.Transaction_ID
and t.Transaction_ID < r2.Transaction_ID
where not exists ( --this "not exists" makes sure r1 and r2 are consecutive roots
select 1
from roots as r3
where r3.Transaction_ID > r1.Transaction_ID
and r3.Transaction_ID < r2.Transaction_ID
)

get calculation result from two columns in sql server

Please refer to this table below.
|RefNbr | DocDate | OrigAmt | AdjAmt | Balances |
|INV001 | 2016-03-15 | 5,000.00 | 250.00 | 4,750.00 |
|INV002 | 2016-03-16 | 5,000.00 | 750.00 | 4,000.00 |
|INV003 | 2016-03-17 | 5,000.00 | 1,000.00 | 3,000.00 |
|INV004 | 2016-03-19 | 5,000.00 | 500.00 | 2,500.00 |
How can I write a query that computes the Balances column?
(For the first row, Balances = OrigAmt - AdjAmt. For the second row, Balances = previous Balances (the first row's balance) - AdjAmt, and so on.)
Here is one way using windowed aggregate function
select OrigAmt - sum(AdjAmt) over(order by DocDate asc) as Balances
From yourtable
For anything less than sql server 2012 use this
SELECT OrigAmt - cum_sum AS Balances
FROM yourtable a
CROSS apply (SELECT Sum(AdjAmt)
FROM yourtable b
WHERE b.DocDate <= a.DocDate) cs( cum_sum)
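If you want to verify the numbers without SQL Server, here is a small sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so I wrote the running sum as a correlated scalar subquery instead (my substitution, same idea as the pre-2012 variant). The table name yourtable follows the answer; the rows are from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(RefNbr TEXT, DocDate TEXT, OrigAmt REAL, AdjAmt REAL)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        ("INV001", "2016-03-15", 5000.00, 250.00),
        ("INV002", "2016-03-16", 5000.00, 750.00),
        ("INV003", "2016-03-17", 5000.00, 1000.00),
        ("INV004", "2016-03-19", 5000.00, 500.00),
    ],
)

# Balance = OrigAmt minus the sum of all adjustments up to and including this row.
query = """
SELECT a.RefNbr,
       a.OrigAmt - (SELECT SUM(b.AdjAmt)
                    FROM yourtable b
                    WHERE b.DocDate <= a.DocDate) AS Balances
FROM yourtable a
ORDER BY a.DocDate
"""
balances = conn.execute(query).fetchall()
for row in balances:
    print(row)
```

Note that this only matches the rule in the question because OrigAmt is the same on every row.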
Try the code below; it may help.
CREATE TABLE #TAB(REFNBR VARCHAR(MAX),DOCDATE DATETIME ,ORIGAMT DECIMAL(18,2),ADJAMT DECIMAL(18,2))
INSERT INTO #TAB VALUES ('INV001','2016-03-15',5000.00,250.00),('INV002','2016-03-16',5000.00,750.00),
('INV003','2016-03-17',5000.00,1000.00),('INV004','2016-03-19',5000.00,500.00)
;WITH CTE AS (
SELECT REFNBR,
DOCDATE,
ORIGAMT,
ADJAMT,
ORIGAMT-ADJAMT AS BALANCE,
ROW_NUMBER() OVER ( ORDER BY DOCDATE) AS RN
FROM #TAB)
SELECT a.REFNBR,
a.DOCDATE,
a.ORIGAMT,
a.ADJAMT,
CASE WHEN ISNULL(LAG(a.BALANCE + ISNULL(x.ADDS,0)) OVER (ORDER BY a.RN),0) + a.ORIGAMT - a.ADJAMT < 0
THEN 0
ELSE a.BALANCE + ISNULL(x.ADDS,0)
END AS FINAL_BALANCE
FROM CTE a
CROSS APPLY (SELECT SUM(BALANCE) AS ADDS
FROM CTE f
WHERE f.REFNBR = a.REFNBR AND f.RN < a.RN
) x
The code above is for SQL Server 2014; for earlier versions, try the code below:
SELECT REFNBR,
DocDate,
OrigAmt,
AdjAmt,
CASE
WHEN RNO > 1 THEN Sum(OrigAmt - ADJAMT)
OVER(
PARTITION BY REFNBR
ORDER BY RNO)
ELSE Iif(( OrigAmt - ADJAMT ) < 0, 0, OrigAmt - ADJAMT)
END
FROM (SELECT *,
Row_number()
OVER(
PARTITION BY REFNBR
ORDER BY DocDate) AS RNO
FROM #TAB) A

Get Monthly Totals from Running Totals

I have a table in a SQL Server 2008 database with two columns that hold running totals called Hours and Starts. Another column, Date, holds the date of a record. The dates are sporadic throughout any given month, but there's always a record for the last hour of the month.
For example:
ContainerID | Date | Hours | Starts
1 | 2010-12-31 23:59 | 20 | 6
1 | 2011-01-15 00:59 | 23 | 6
1 | 2011-01-31 23:59 | 30 | 8
2 | 2010-12-31 23:59 | 14 | 2
2 | 2011-01-18 12:59 | 14 | 2
2 | 2011-01-31 23:59 | 19 | 3
How can I query the table to get the total number of hours and starts for each month between two specified years? (In this case 2011 and 2013.) I know that I need to take the values from the last record of one month and subtract the values from the last record of the previous month. I'm having a hard time coming up with a good way to do this in SQL, however.
As requested, here are the expected results:
ContainerID | Date | MonthlyHours | MonthlyStarts
1 | 2011-01-31 23:59 | 10 | 2
2 | 2011-01-31 23:59 | 5 | 1
Try this:
SELECT c1.ContainerID,
c1.Date,
c1.Hours-c3.Hours AS "MonthlyHours",
c1.Starts - c3.Starts AS "MonthlyStarts"
FROM Containers c1
LEFT OUTER JOIN Containers c2 ON
c1.ContainerID = c2.ContainerID
AND datediff(MONTH, c1.Date, c2.Date)=0
AND c2.Date > c1.Date
LEFT OUTER JOIN Containers c3 ON
c1.ContainerID = c3.ContainerID
AND datediff(MONTH, c1.Date, c3.Date)=-1
LEFT OUTER JOIN Containers c4 ON
c3.ContainerID = c4.ContainerID
AND datediff(MONTH, c3.Date, c4.Date)=0
AND c4.Date > c3.Date
WHERE
c2.ContainerID is null
AND c4.ContainerID is null
AND c3.ContainerID is not null
ORDER BY c1.ContainerID, c1.Date
Using a recursive CTE and some 'creative' JOIN condition, you can fetch the next month's value for each ContainerID:
WITH CTE_PREP AS
(
--RN will be 1 for last row in each month for each container
--MonthRank will be sequential number for each subsequent month (to increment easier)
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(DATE) ORDER BY Date DESC) RN
,DENSE_RANK() OVER (ORDER BY YEAR(Date),MONTH(Date)) MonthRank
FROM Table1
)
, RCTE AS
(
--"Zero row", last row in December 2010 for each container
SELECT *, Hours AS MonthlyHours, Starts AS MonthlyStarts
FROM CTE_Prep
WHERE YEAR(date) = 2010 AND MONTH(date) = 12 AND RN = 1
UNION ALL
--for each next row just join on MonthRank + 1
SELECT t.*, t.Hours - r.Hours, t.Starts - r.Starts
FROM RCTE r
INNER JOIN CTE_Prep t ON r.ContainerID = t.ContainerID AND r.MonthRank + 1 = t.MonthRank AND t.Rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM RCTE
WHERE Date >= '2011-01-01' --to eliminate "zero row"
ORDER BY ContainerID
SQLFiddle DEMO (I have added some data for February and March in order to test on different lengths of months)
Old version fiddle
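Both answers boil down to the same two steps: pick the last record of each month per container, then subtract the previous month's last record. A plain Python sketch of that logic, using the sample rows from the question (the function name monthly_totals is hypothetical):

```python
from datetime import datetime

# Sample rows from the question: (ContainerID, Date, Hours, Starts).
readings = [
    (1, datetime(2010, 12, 31, 23, 59), 20, 6),
    (1, datetime(2011, 1, 15, 0, 59), 23, 6),
    (1, datetime(2011, 1, 31, 23, 59), 30, 8),
    (2, datetime(2010, 12, 31, 23, 59), 14, 2),
    (2, datetime(2011, 1, 18, 12, 59), 14, 2),
    (2, datetime(2011, 1, 31, 23, 59), 19, 3),
]

def monthly_totals(readings):
    """Return (ContainerID, Date, MonthlyHours, MonthlyStarts) per month."""
    # Step 1: keep the last reading of each (container, year, month).
    last = {}
    for cid, ts, hours, starts in sorted(readings, key=lambda r: (r[0], r[1])):
        last[(cid, ts.year, ts.month)] = (ts, hours, starts)

    # Step 2: subtract the previous month's last reading, when there is one.
    out = []
    for (cid, year, month), (ts, hours, starts) in sorted(last.items()):
        prev_key = (cid, year - 1, 12) if month == 1 else (cid, year, month - 1)
        if prev_key in last:
            _, prev_hours, prev_starts = last[prev_key]
            out.append((cid, ts, hours - prev_hours, starts - prev_starts))
    return out

totals = monthly_totals(readings)
for row in totals:
    print(row)
```

Months with no record for the preceding month are skipped, which plays the same role as filtering out the "zero row" in the recursive CTE.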