SQL Redshift conditional running sum based on the running sum result

I have a table with invoice values (column "invoice") aggregated by month (column "date") and partner (column "partner"). As output, I need a running sum that starts when an invoice value is negative and continues until the running sum becomes positive. Below is a representation of the result I need:
On 2022-05-01, the invoice value is 100, so it should not be included in the running sum. On 2022-06-01, the invoice value is negative (-250), so the running sum should start and continue until it turns positive (which happens on 2022-09-01). The same logic repeats: on 2022-11-01 the running sum starts again and continues until 2023-01-01, when it becomes positive again.
What SQL query could get me the output I need?
I tried a sum partitioned by partner and by whether the invoice value is negative, ordered by date. However, the running sum starts and stops within the negative values and doesn't consider the positive values in the sequence.
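Select
date
,partner
,case when invoice > 0 then 1 else 0 end as is_positive
-- a select-list alias cannot be referenced in the same select, so the expression is repeated
,sum(invoice) over (partition by partner, case when invoice > 0 then 1 else 0 end order by date) as running_sum
from mytable t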

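The logic you describe requires iterating over the dataset row by row, because whether a row adds to the running sum depends on the running sum computed for the previous row. Window functions do not provide this feature; in SQL, we would use a recursive query to address this problem.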
Redshift recently added support for recursive CTEs: yay! Consider:
with recursive
data (date, partner, invoice, rn) as (
select date, partner, invoice,
row_number() over(partition by partner order by date) rn
from mytable
),
cte (date, partner, invoice, rn, rsum) as (
-- anchor: the first row of each partner starts from its own invoice value
select date, partner, invoice, rn, invoice
from data
where rn = 1
union all
-- recursive step: once the previous sum is non-negative, restart from the current invoice;
-- otherwise keep accumulating
select d.date, d.partner, d.invoice, d.rn,
case when c.rsum >= 0 then d.invoice else c.rsum + d.invoice end
from cte c
inner join data d on d.partner = c.partner and d.rn = c.rn + 1
)
select * from cte order by partner, date
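To check the reset rule outside Redshift, here is a minimal sketch that runs a recursive CTE of the same shape in SQLite through Python's standard sqlite3 module. Assumptions: SQLite 3.25 or later (for row_number()), the table name mytable, and made-up invoice values chosen only to mirror the pattern described in the question (positive on 2022-09-01, restart on 2022-11-01, positive again on 2023-01-01). The reset condition here is simply c.rsum >= 0, so a positive first row is not carried into the next month. SQLite's dialect is not Redshift's, so treat this as an illustration rather than a drop-in query.

```python
import sqlite3

# Hypothetical sample data (partner, date, invoice); the values are made up
# to mirror the pattern described in the question.
ROWS = [
    ("P1", "2022-05-01", 100),
    ("P1", "2022-06-01", -250),
    ("P1", "2022-07-01", 50),
    ("P1", "2022-08-01", 100),
    ("P1", "2022-09-01", 150),
    ("P1", "2022-10-01", 30),
    ("P1", "2022-11-01", -80),
    ("P1", "2022-12-01", 20),
    ("P1", "2023-01-01", 70),
]

# Recursive CTE in SQLite syntax (row_number() needs SQLite >= 3.25).
QUERY = """
with recursive
data (date, partner, invoice, rn) as (
    select date, partner, invoice,
           row_number() over (partition by partner order by date)
    from mytable
),
cte (date, partner, invoice, rn, rsum) as (
    select date, partner, invoice, rn, invoice
    from data
    where rn = 1
    union all
    -- restart from the current invoice once the previous sum is non-negative
    select d.date, d.partner, d.invoice, d.rn,
           case when c.rsum >= 0 then d.invoice else c.rsum + d.invoice end
    from cte c
    join data d on d.partner = c.partner and d.rn = c.rn + 1
)
select date, partner, invoice, rsum from cte order by partner, date
"""


def conditional_running_sum(rows):
    """Load rows into an in-memory table and return (date, partner, invoice, rsum) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (date text, partner text, invoice integer)")
    conn.executemany("insert into mytable (partner, date, invoice) values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in conditional_running_sum(ROWS):
    print(row)
```

With these hypothetical rows the rsum column comes out as 100, -250, -200, -100, 50, 30, -80, -60, 10: the sum restarts at 2022-06-01 and 2022-11-01 and turns positive at 2022-09-01 and 2023-01-01.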

Related

Running average of amounts over the past 7 days for each date

The question is simple, but I am having trouble implementing the SQL logic. Suppose I have records like:
Phoneno Company Date Amount
83838 xyz 20210901 100
87337 abc 20210902 500
47473 cde 20210903 600
The expected output is the running average of the amount over the past 7 days for each date (the current date and the 6 days before it):
Date amount avg
20210901 100 100
20210902 500 300
20210903 600 400
I tried
Select date, amount,
(select avg(lg) from (
Select case when lag(amount)
Over (order by NULL) IS NULL
THEN AMOUNT
ELSE
lag(amount)
Over (order by NULL) END AS LG
From table
WHERE DATE>=t.date-7) x) as avg
From table t;
But I am getting wrong average values. Could anyone please help?
Note: I've tried without lag too, and it also gives the wrong averages.
You could use a self-join to group the dates:
select distinct
a.dt,
b.dt as preceding_dt, --just for QA purpose
a.amt,
b.amt as preceding_amt,--just for QA purpose
avg(b.amt) over (partition by a.dt) as avg_amt
from t a
join t b on a.dt-b.dt between 0 and 6
group by a.dt, b.dt, a.amt, b.amt; --to dedupe the data after the join
If you want to make your correlated subquery approach work, you don't really need the lag.
select dt,
amt,
(select avg(b.amt) from t b where a.dt-b.dt between 0 and 6) as avg_lg
from t a;
If you have exactly one row per date (no duplicate dates and no missing dates), this gets even simpler:
select dt,
amt,
avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
Also, the condition DATE>=t.date-7 that you used is open on one side: it has no upper bound, so it also qualifies every later date, which shouldn't be included.
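The correlated subquery and the ROWS-based window above can be compared on a toy dataset. This sketch uses Python's sqlite3 module, stores dates as ISO-8601 text, and uses julianday() for the date arithmetic in place of a.dt-b.dt (an SQLite-specific assumption). The table t with columns dt and amt follows the answer's naming, and the three rows are the ones from the question.

```python
import sqlite3

# The three rows from the question; dates stored as ISO-8601 text for SQLite.
ROWS = [("2021-09-01", 100), ("2021-09-02", 500), ("2021-09-03", 600)]

# Correlated subquery: average every row dated between dt - 6 days and dt.
CORRELATED = """
select a.dt, a.amt,
       (select avg(b.amt) from t b
        where julianday(a.dt) - julianday(b.dt) between 0 and 6) as avg_lg
from t a
order by a.dt
"""

# Window over rows: correct only when there is exactly one row per calendar day.
WINDOWED = """
select dt, amt,
       avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t
order by dt
"""


def run(query, rows):
    """Run a query against an in-memory table t(dt, amt) built from rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (dt text, amt integer)")
    conn.executemany("insert into t (dt, amt) values (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(CORRELATED, ROWS))
print(run(WINDOWED, ROWS))
```

Both queries give 100.0, 300.0 and 400.0 for the sample rows. With a hypothetical gap, for example rows only on 2021-09-01 and 2021-09-10, the correlated subquery returns 100.0 and 500.0, while the ROWS frame still averages the two rows (100.0, 300.0). That is why the ROWS version needs one row per calendar day.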
You can use an analytic function with a windowing clause to get your results (Oracle syntax):
SELECT DISTINCT BillingDate,
AVG(amount) OVER (ORDER BY BillingDate
RANGE BETWEEN TO_DSINTERVAL('6 00:00:00') PRECEDING -- current date plus the 6 days before it
AND TO_DSINTERVAL('0 00:00:00') FOLLOWING) AS RUNNING_AVG
FROM accounts
ORDER BY BillingDate;
Here is a DBFiddle showing the query in action (LINK)
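SQLite has no TO_DSINTERVAL, but from version 3.28 it accepts RANGE frames with numeric offsets, so ordering by julianday(dt) gives a date-based frame. Here is a minimal sketch under that assumption. The 2021-09-10 row is hypothetical and only there to show that a frame of 6 PRECEDING plus the current row covers exactly 7 calendar days and tolerates gaps.

```python
import sqlite3

# Rows from the question plus one hypothetical later row to create a gap.
ROWS = [("2021-09-01", 100), ("2021-09-02", 500), ("2021-09-03", 600), ("2021-09-10", 900)]

# RANGE frame over the day number: the current date and the 6 days before it,
# regardless of how many rows fall on each day (needs SQLite >= 3.28).
QUERY = """
select distinct dt,
       avg(amt) over (order by julianday(dt)
                      range between 6 preceding and current row) as running_avg
from t
order by dt
"""


def running_avg(rows):
    """Return (dt, running_avg) pairs computed over an in-memory table t(dt, amt)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (dt text, amt integer)")
    conn.executemany("insert into t (dt, amt) values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(running_avg(ROWS))
```

The 2021-09-10 row averages only itself (900.0), because 2021-09-03 is 7 days earlier and falls outside a 6-day-preceding frame. With 7 PRECEDING, it would be included.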

Creating one record for a continuous sequence of dates in a new table

We have a table in Microsoft SQL Server 2014, shown below, with the columns Id, LogId, AccountId, StateCode, Number and LastSentDate.
Our goal is to move the data to a new table. When we move it, we need to keep the first and last record of each series. In our data, LastSentDate starts on 5/1 and continues until 5/5, so we should create one new row as shown below (we set FirstSentDate to 5/1 and the Log Id to the first log id that appeared, 28369; since the series ended on 5/5, we set LastSentDate to 5/5 and LastSentLog Id to 28752).
If there are gaps between the dates, the desired output will be:
Since our date series continues, the last row in the new table will be:
We were trying to group by date to achieve this:
WITH t
AS (SELECT LastSentDate d,
ROW_NUMBER() OVER(
ORDER BY LastSentDate) i
FROM [dbo].[RegistrationActivity]
GROUP BY LastSentDate)
SELECT MIN(d),
MAX(d)
FROM t
GROUP BY DATEDIFF(day, i, d);
Use lag() to define where a group begins. Then use a cumulative sum to assign a group id to each group. And finally, extract the data you want. I'm not sure what data you actually want, but here is the idea:
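select accountid, min(lastsentdate), max(lastsentdate)
from (select t.*,
-- a new group starts when there is no previous date or it is more than one day earlier
-- (assumes lastsentdate holds dates without a time part)
sum(case when prev_lsd >= dateadd(day, -1, lastsentdate) then 0 else 1 end) over (partition by accountid order by lastsentdate) as grp
from (select t.*, lag(lastsentdate) over (partition by accountid order by lastsentdate) as prev_lsd
from t
) t
) t
group by accountid, grp;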
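Here is the lag-plus-cumulative-sum idea as a runnable sketch in SQLite via Python's sqlite3 module. The table name sends, the year 2021 and the 5/8 to 5/9 run are hypothetical, and julianday() replaces SQL Server's dateadd() for the one-day comparison. Note that lag() needs an ORDER BY, and the final GROUP BY needs the group id for each island to come out as its own row.

```python
import sqlite3

# Hypothetical rows (accountid, lastsentdate); the year and the 5/8-5/9 run are made up.
ROWS = [(1, d) for d in (
    "2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04", "2021-05-05",
    "2021-05-08", "2021-05-09",
)]

QUERY = """
select accountid, min(lastsentdate) as first_sent, max(lastsentdate) as last_sent
from (
    select accountid, lastsentdate,
           -- 1 starts a new island: no previous date, or a gap of more than one day
           sum(case when julianday(lastsentdate) - julianday(prev_lsd) <= 1
                    then 0 else 1 end)
               over (partition by accountid order by lastsentdate) as grp
    from (
        select accountid, lastsentdate,
               lag(lastsentdate) over (partition by accountid order by lastsentdate) as prev_lsd
        from sends
    ) with_prev
) with_grp
group by accountid, grp
order by accountid, first_sent
"""


def islands(rows):
    """Return one (accountid, first_sent, last_sent) row per run of consecutive dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sends (accountid integer, lastsentdate text)")
    conn.executemany("insert into sends (accountid, lastsentdate) values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(islands(ROWS))
```

For the first row of an account, prev_lsd is NULL, so the comparison is NULL and the CASE falls through to 1, which opens the first island.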

Microsoft SQL Server: getting the highest cost for the last purchase date

It's been a while since I used SQL, so I'm a bit rusty. Let's say you want to compare the cost of things purchased in the previous month with this month. An example would be a data table like this...
An item purchased in October cost $3, but the same item cost $2 and $1 in September. So you'd take the max cost on the max date (which would be the $2, not the $1). This would happen for every row of data.
I've done this with a scalar-valued function, but when handling 100K+ rows of data, it is nowhere near fast enough. How would you do this with a plain select query? What I did before was select both maxes in a select statement and return only one, then call that function in a select statement. I want to do the same without stored procedures or functions, for speed reasons. I know the following query won't work because a subquery used this way can only return one value, but it's what I'm going for.
Select
Purchase, Item, USD,
(select MAX(Purchase), MAX(USD) from Table
where Item = 845 and MONTH(Purchase) = MONTH(Purchase) -1) LastCost
from Table
Here is an example of what it should display:
What would be the best way to approach this?
Attention:
Select MAX(Purchase), MAX(USD) from Table will not return the highest cost for the latest date; it returns the latest date and the highest cost, regardless of date.
This is how I would do this (on SQL Server 2012 or later):
To get only one record per month and item (with the highest cost on the latest date), I use a numbering for the purchase date and cost (per item and month) with a descending sort order, first by date, then by cost. In the next step, I filter out only those records where the numbering is 1 (max cost for max date per item and month) and use the LAG function to access the previous cost:
WITH
numbering (Purchase, Item, Cost, p_no) AS (
SELECT Purchase,Item, Cost
,ROW_NUMBER() OVER (PARTITION BY Item, EOMONTH(Purchase) ORDER BY Purchase DESC, Cost DESC)
FROM tbl
)
SELECT Purchase, Item, Cost
, LAG(Cost) OVER (PARTITION BY Item ORDER BY Purchase) AS LastCost
FROM numbering
WHERE p_no = 1
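Here is a minimal SQLite sketch of the same ROW_NUMBER() plus LAG() approach, run through Python's sqlite3 module. SQLite has no EOMONTH, so strftime('%Y-%m', purchase) stands in as the month key (an assumption, not SQL Server syntax). The purchase dates are hypothetical, while item 845 and the $1, $2 and $3 costs come from the question. The two September purchases share a date so that the cost tiebreak is visible.

```python
import sqlite3

# Hypothetical purchases (purchase date, item, cost); item 845 and the costs come from the question.
ROWS = [
    ("2021-09-20", 845, 1),
    ("2021-09-20", 845, 2),
    ("2021-10-10", 845, 3),
]

QUERY = """
with numbering as (
    select purchase, item, cost,
           -- 1 = latest date in the month, highest cost on that date
           row_number() over (partition by item, strftime('%Y-%m', purchase)
                              order by purchase desc, cost desc) as p_no
    from tbl
)
select purchase, item, cost,
       lag(cost) over (partition by item order by purchase) as last_cost
from numbering
where p_no = 1
order by item, purchase
"""


def last_cost(rows):
    """Return one row per item and month with the previous month's winning cost."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (purchase text, item integer, cost integer)")
    conn.executemany("insert into tbl (purchase, item, cost) values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_cost(ROWS))
```

The October row picks up $2 as its last cost, since $2 is the highest cost on the latest September date. The September row has no earlier month, so its last cost is NULL (None in Python).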
SELECT Date, item, usd,
LAG(Date, 1) OVER(PARTITION BY item ORDER BY date ASC) as FormerDate,
LAG(usd, 1) OVER(PARTITION BY item ORDER BY date ASC) as FormerUsd
from (select date, item, max(usd) as usd from Data group by date, item) t
For each item, this returns the previous purchase date before the current entry, along with that date's max price.
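For SQL Server 2017 and later, the query below works for the sample data (it assumes exactly two purchase dates per item):
select purchase,item,
-- text after the comma: the latest cost
substring(usd,CHARINDEX(',',usd)+1,len(usd)) as USD,
-- text before the comma: the previous cost
substring(usd,1,CHARINDEX(',',usd)-1) as lastcost from
(select max(purchase) as purchase,item, STRING_AGG (usd, ',') WITHIN GROUP (ORDER BY purchase) AS usd
from
(
select purchase,item,max(usd) as usd from t
group by purchase,item
) as T group by item
) T1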
For your results, you need to use MAX() and ROW_NUMBER() with OVER(). Then partition the records by Item, Year, and Month. This ensures that the calculation is done per item, per year, per month. ROW_NUMBER() acts as a simple way to put the latest records at the top, so you take row number 1 for each item to get the latest cost. After that, you use it as a subquery to refine it as needed. For a start (your sample), you'll need to use CASE to split the USD into the previous and last cost; then you do the rest from there (simple methods).
Note that it's important to sort the records by year first, then month, and then by day if you need to include it. This way you ensure the records are sorted correctly.
So the query would look something like this:
SELECT
MAX(Purchase) Purchase
, MAX(Item) Item
, MAX(CASE WHEN LastCost > USD THEN LastCost ELSE NULL END) USD
, MAX(CASE WHEN LastCost = USD THEN LastCost ELSE NULL END) LastCost
FROM (
SELECT
Purchase
, Item
, USD
, MAX(USD) OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase)) LastCost
, ROW_NUMBER() OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase) ORDER BY Purchase DESC) RN
FROM Table
) D
WHERE
RN = 1
with data as (
select Item, eomonth(Purchase) as PurchaseMonth, max(USD) as MaxUSD
from T
group by Item, eomonth(Purchase)
)
select
PurchaseMonth, Item,
lag(MaxUSD) over (partition by Item order by PurchaseMonth) as PriorUSD
from data;

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table attached below, if any dates fall within 15 days of each other based on the effectivedate column, I should return only the first one.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, because the offset between these two dates is within 15 days, and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
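-- start a new group (1) unless the previous expiration is at most 15 days before this effective date
sum(case when prev_ed >= dateadd(day, -15, effectivedate)
then 0 else 1
end) over (partition by polno order by effectivedate) as grp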
from (select t.*,
lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
from t
) t
) t
group by polno, grp;
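If the goal is to keep just one row per cluster, as in the A1234 example (the latest effective date), a ROW_NUMBER() over each group can follow the cumulative sum. Here is a sketch in SQLite via Python's sqlite3 module. It compares consecutive effectivedate values, as the question describes, rather than the answer's expirationdate. The table name policies is hypothetical and the 2016-09-01 row is made up. Because rows are chained pairwise, a run of dates that are each within 15 days of the previous one forms a single group even if the whole run spans more than 15 days.

```python
import sqlite3

# Policy A1234's two dates are from the question; the 2016-09-01 row is hypothetical.
ROWS = [("A1234", "2016-06-12"), ("A1234", "2016-06-18"), ("A1234", "2016-09-01")]

QUERY = """
with with_prev as (
    select polno, effectivedate,
           lag(effectivedate) over (partition by polno order by effectivedate) as prev_ed
    from policies
),
grouped as (
    select polno, effectivedate,
           -- start a new group unless the previous date is at most 15 days earlier
           sum(case when julianday(effectivedate) - julianday(prev_ed) <= 15
                    then 0 else 1 end)
               over (partition by polno order by effectivedate) as grp
    from with_prev
),
ranked as (
    -- keep the latest effective date within each group
    select polno, effectivedate,
           row_number() over (partition by polno, grp order by effectivedate desc) as rn
    from grouped
)
select polno, effectivedate from ranked where rn = 1 order by polno, effectivedate
"""


def one_per_cluster(rows):
    """Return one (polno, effectivedate) row per cluster of nearby effective dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table policies (polno text, effectivedate text)")
    conn.executemany("insert into policies (polno, effectivedate) values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(one_per_cluster(ROWS))
```

To keep the first row of each cluster instead, order the ROW_NUMBER() by effectivedate ascending.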

Calculating business days in Teradata

I need help in business days calculation.
I have two tables:
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
Using these two tables, I need to calculate business days, not calendar days (which is the current logic), between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number; I can't bring all order numbers into a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE TABLE_DATE BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
) TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table a ON bd.table_date BETWEEN a.order_date AND a.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad product join) followed by a COUNT, you'd better assign a business day number to each date (ideally calculated only once and added as a column to your calendar table). Then it's two equi-joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamps instead of dates?
Porting from Oracle?
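The business-day numbering can be tried out with a small generated calendar. This sketch runs in SQLite through Python's sqlite3 module. The following are all assumptions for the sketch: the calendar flags only Saturdays and Sundays (no holidays), order_no replaces ORDER#, and julianday() replaces Teradata's DATE subtraction.

```python
import sqlite3
from datetime import date, timedelta


def build_calendar(start, days):
    """Hypothetical calendar rows (table_date, holiday_wknd_flag): 1 on Saturday/Sunday, no holidays."""
    rows = []
    for n in range(days):
        day = start + timedelta(days=n)
        rows.append((day.isoformat(), 1 if day.weekday() >= 5 else 0))
    return rows


QUERY = """
with cte as (
    select table_date,
           -- running count of business days; stays flat across weekends
           sum(case when holiday_wknd_flag = 1 then 0 else 1 end)
               over (order by table_date rows between unbounded preceding and current row)
               as business_day_nbr
    from business_dates
)
select t.order_no,
       cast(julianday(t.contact_date) - julianday(t.order_date) as integer) as n_days,
       b2.business_day_nbr - b1.business_day_nbr as n_business_days
from actual_table as t
join cte as b1 on t.order_date = b1.table_date
join cte as b2 on t.contact_date = b2.table_date
"""


def business_days(orders, calendar):
    """Return (order_no, calendar days, business days) for each order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table business_dates (table_date text, holiday_wknd_flag integer)")
    conn.execute("create table actual_table (order_no integer, order_date text, contact_date text)")
    conn.executemany("insert into business_dates values (?, ?)", calendar)
    conn.executemany("insert into actual_table values (?, ?, ?)", orders)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(business_days([(100, "2021-01-01", "2021-01-11")], build_calendar(date(2021, 1, 1), 11)))
```

For an order placed on Friday 2021-01-01 and contacted on Monday 2021-01-11, this gives 10 calendar days and 6 business days. The difference of day numbers excludes the order date itself, whereas counting calendar rows with BETWEEN includes both endpoints (7 non-weekend rows here), so pick whichever convention the report needs.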
You can use this query; hope it helps:
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a