Cumulative Additions in SQL Server 2008 - sql

I am trying to add the previous row value to the current row and keep this adding until I reach the total.
I have tried the below query and I have set row number to the rows as per my needs, the additions should take place in the same manner.
select *
from #Finalle
order by rownum ;
Output:
Type Count_DB Rownum
------------------------------------------
Within 30 days 399480 1
Within 60 days 30536 2
Within 90 days 10432 3
Within 120 days 11777 4
Greater than 120 days 13091 5
Blank 29297 6
Total 494613 7
When I try the below query, it works fine until the 6th row, but fails for the last row:
select
f1.[type],
(select
Sum(f.[Count of ED_DB_category]) as [Cummumative]
from
#Finalle f
where
f1.rownum >= f.rownum)
from
#Finalle f1
order by
rownum
Output:
Type No column name
--------------------------------------
Within 30 days 399480
Within 60 days 430016
Within 90 days 440448
Within 120 days 452225
Greater than 120 days 465316
Blank 494613 <---Add only until here
Total 989226
Here the total should return the same value as in the first table.
How do I achieve this?

Try this:
select f1.[type],
CASE
WHEN type = 'Total' THEN f1.[Count_DB]
ELSE SUM(f1.[Count_DB]) OVER (ORDER BY rownum)
END
from #Finalle f1
order by rownum

You may looking for this
select f1.[type],
(select Sum(f.[Count of ED_DB_category])
from #Finalle f where f1.rownum >= f.rownum AND
f.Type <> 'Total'
)as [Cummumative]
from #Finalle f1
order by rownum

Maybe you could use windowed function....
Here's a SWAG but I am not sure of your table structures.....or what to order by
SELECT
f1.[type],
SUM(f.[Count of ED_DB_category]) OVER(ORDER BY f1.[type] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Cummumative]
FROM
FROM #Finalle f1
WHERE f.Type <> 'Total'

Related

How do I use SQL to perform a cumulative sum where the increments have an expiration?

Say the scenario is this:
I have a database of student infractions. When a student is late to class, or misses a homework assignment they get an infraction.
student_id
infraction_type
day
1
tardy
0
2
missed_assignment
0
1
tardy
29
2
missed_assignment
15
1
tardy
99
2
missed_assignment
29
The school has three strike system, at each infraction disciplinary action is taken. Call them D0,D1,D2.
Infractions expire after 30 days.
I want to be able to perform a query to calculate the total counts of disciplinary actions taken in a given time period.
So the number of disciplinary actions taken in the last 100 days (at day 99) would be
disciplinary_action
count
D0
3
D1
2
D2
1
A table generated showing the disciplinary actions taken would look like:
student_id
infraction_type
day
disciplinary_action_gen
1
tardy
0
D0
2
missed_assignment
0
D0
1
tardy
29
D1
2
missed_assignment
15
D1
1
tardy
99
D0
2
missed_assignment
29
D2
What SQL query could I use to do such a cumulative sum?
You can solve your problem by checking in the following order:
if <30 days have passed from the last two infractions, assign D2
if <30 days have passed from last infraction, assign D1
assign D0 (given its the first infraction)
This will work assuming your DBMS supports the tools used for this solution, namely:
the CASE expression, to conditionally assign infraction values
the LAG window function, to retrieve the previous "day" values
SELECT *,
CASE WHEN day - LAG(day,2) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D2'
WHEN day - LAG(day,1) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D1'
ELSE 'D0'
END AS disciplinary_action_gen
FROM tab
Check a MySQL demo here.
A similar approach using COUNT() as a window function and a frame definition -
SELECT
*,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action_gen
FROM infractions;
The frame definition (RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) tells the server that we want to include all rows with a day value between (current row's value of day - 30) and (the current row's value of day). So, if the current row has a day value of 99, the count will be for all rows in the partition with a day value between 69 and 99.
To get the disciplinary counts, we can simply wrap this in a normal GROUP BY -
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
) t
GROUP BY disciplinary_action;
If your infractions are stored with a date, as opposed to the days in your example, this can be easily updated to use a date interval in the frame definition. And, if looking at counts of disciplinary actions in the last 100 days we need to include the previous 30 days, as these could impact the action (D0, D1 or D2) on the first day we are interested in.
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
`date`,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY `date` ASC
RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
WHERE `date` >= CURRENT_DATE - INTERVAL 130 DAY
) t
WHERE `date` >= CURRENT_DATE - INTERVAL 100 DAY
GROUP BY disciplinary_action;
Here's a db<>fiddle

Include Non-Existent Rows in Aggregate (Partition with Preceding and Following Rows)

I'm trying to get an average of the previous six months counts. However, I noticed if there are only 4 previous months, it will only do an average of those 4 months instead of 6 months. Is there a way to make so I forcefully sum over the 6 months?
select
ST.AccountNumber
, st.PrevMonth
, st.[Transaction Effective Date]
, st.[Transaction Amt]
, st.CurrentMonthTransCnt
, mt.EndOfMonth
, AvgMonthlyTransCntLast6Months = AVG(isnull(cnt, 0)) OVER (PARTITION BY MT.AccountNumber order by rowid ROWS BETWEEN 1 following and 6 following)
--into #AvgCntAndStdDev
from EDWAnalytics.ML.TEMP_SymitarTransactionsFinal as ST
left join MonthlyTransCnt as MT on MT.EndOfMonth = ST.PrevMonth and ST.AccountNumber = MT.AccountNumber
where ST.AccountNumber = '0000709510'
If you want the average of the previous six months for each month then the order should be inverted:
order by rowid desc
Another option is to use the current order, but with ROWS BETWEEN 6 preceding and 1 preceding
If you want the average to always be computed over six months you should replace avg with sum/6.
isnull(sum(cnt) OVER (PARTITION BY MT.AccountNumber order by rowid ROWS BETWEEN 6 preceding and 1 preceding),0)/6

Get moving average over time frame in PostgreSQL with inconsistent data

I have a table called answers with columns created_at and response, response being an integer 0 (for 'no'), 1 (for 'yes'), or 2 (for 'don't know'). I want to get a moving average for the response values, filtering out 2s for each day, only taking in to account the previous 30 days. I know you can do ROWS BETWEEN 29 AND PRECEDING AND CURRENT ROW but that only works if you have data for each day, and in my case there might be no data for a week or more.
My current query is this:
SELECT answers.created_at, answers.response,
AVG(answers.response)
OVER(ORDER BY answers.created_at::date ROWS
BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
FROM answers
WHERE answers.user_id = 'insert_user_id''
AND (answers.response = 0 OR answers.response = 1)
GROUP BY answers.created_at, answers.response
ORDER BY answers.created_at::date
But this will return an average based on the previous rows, if a user responded with a 1 on 2018-3-30 and a 0 on 2018-5-15, the rolling average on 2018-5-15 would be 0.5 instead of 0 as I want. How can I create a query that will only take in to account the responses that were created within the last 30 days for the rolling average?
Since Postgres 11 you can do this:
SELECT created_at,
response,
AVG(response) OVER (ORDER BY created_at
RANGE BETWEEN '29 day' PRECEDING AND current row) AS rolling_average
FROM answers
WHERE user_id = 1
AND response in (0,1)
ORDER BY created_at;
Try something like this:
SELECT * FROM (
SELECT
d.created_at, d.response,
Avg(d.response) OVER(ORDER BY d.created_at::date rows BETWEEN 29 PRECEDING AND CURRENT row) AS rolling_average
FROM (
SELECT
COALESCE(a.created_at, d.dates) AS created_at, response, a.user_id
FROM
(SELECT generate_series('2018-01-01'::date, '2018-05-31'::date, '1day'::interval)::date AS dates) d
LEFT JOIN
(SELECT * FROM answers WHERE answers.user_id = 'insert_user_id' AND ( answers.response = 0 OR answers.response = 1)) a
ON d.dates = a.created_at::date
) d
GROUP BY d.created_at, d.response
) agg WHERE agg.response IS NOT NULL
ORDER BY agg.created_at::date
generate_series creates list of days - you have to set reasonable boundaries
this list of days is LEFT JOINed with preselected answers
this result is used for rolling average calculation
after it I select only records with response and I get:
created_at | response | rolling_averagte
2018-03-30 | 1 | 1.00000000000000000000
2018-05-15 | 0 | 0.00000000000000000000

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said I am working with business days. I don't care for holidays, I have data from monday to friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/substract business days from a given date and works like that:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second and third last line to
w.day BETWEEN addbusinessdays(t1.day, -2) AND addbusinessdays(t1.day, 1)
and can deal with the business days.
Function
While using the function addbusinessdays(), consider this instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5)*2) is way too much padding. Replace with ((abs($2) * 7) / 5) + 3).
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for ensuing window functions works with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilá.
SQL Fiddle.
Basically the strategy is to first enumarate the flag days and then join others with them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
w.day BETWEEN
t1.day - interval '2 day' AND t1.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However there is a problem with work days as those can mean whatever (do holidays count?).

MySQL AVG function for recent 15 records by date (order date desc) in every symbol

I am trying to create a statement in SQL (for a table which holds stock symbols and price on specified date) with avg of 5 day price and avg of 15 days price for each symbol.
Table columns:
symbol
open
high
close
date
The average price is calculated from last 5 days and last 15 days. I tried this for getting 1 symbol:
SELECT avg(close),
avg(`trd_qty`)
FROM (SELECT *
FROM cashmarket
WHERE symbol = 'hdil'
ORDER BY `M_day` desc
LIMIT 0,15 ) s
but I couldn't get the desired list for showing avg values for all symbols.
You can either do it with row numbers as suggested by astander, or you can do it with dates.
This solution will also take the last 15 days if you don't have rows for every day while the row number solution takes the last 15 rows. You have to decide which one works better for you.
EDIT: Replaced AVG, use CASE to avoid division by 0 in case no records are found within the period.
SELECT
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.close * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS close_5,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.trd_qty * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS trd_qty_5,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.close * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS close_15,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.trd_qty * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS trd_qty_15
FROM
(
SELECT
cashmarket.*,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 15, 1, 0) AS is_15,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 5, 1, 0) AS is_5
FROM cashmarket
) c
The query returns the averages of close and trd_qty for the last 5 and the last 15 days. Current date is included, so it's actually today plus the last 4 days (replace < by <= to get current day plus 5 days).
Use:
SELECT DISTINCT
t.symbol,
x.avg_5_close,
y.avg_15_close
FROM CASHMARKET t
LEFT JOIN (SELECT cm_5.symbol,
AVG(cm_5.close) 'avg_5_close',
AVG(cm_5.trd_qty) 'avg_5_qty'
FROM CASHMARKET cm_5
WHERE cm_5.m_date BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
GROUP BY cm_5.symbol) x ON x.symbol = t.symbol
LEFT JOIN (SELECT cm_15.symbol,
AVG(cm_15.close) 'avg_15_close',
AVG(cm_15.trd_qty) 'avg_15_qty'
FROM CASHMARKET cm_15
WHERE cm_15.m_date BETWEEN DATE_SUB(NOW(), INTERVAL 15 DAY) AND NOW()
GROUP BY cm_15.symbol) y ON y.symbol = t.symbol
I'm unclear on what trd_qty is, or how it factors into your equation considering it isn't in your list of columns.
If you want to be able to specify a date rather than the current time, replace the NOW() with #your_date, an applicable variable. And you can change the interval values to suit, in case they should really be 7 and 21.
Have a look at How to number rows in MySQL
You can create the row number per item for the date desc.
What you can do is to retrieve the Rows where the rownumber is between 1 and 15 and then apply the group by avg for the selected data you wish.
trdqty is the quantity traded on particular day.
the days are not in order coz the market operates only on weekdays and there are holidays too so date may not be continuous