Partition SQL WINDOW function on certain criteria - sql

I am trying to calculate a running total of the AddToCart metric that only starts after a 'product/search/details' page was seen.
Here's the link to SQL Fiddle: http://sqlfiddle.com/#!15/bbf9b/1
In the sqlfiddle link, I've manually created a column to reflect my desiredoutput. The workingoutput column shows where I have gotten to with my code.
SUM(AddToCart) OVER (PARTITION BY SessionID ORDER BY HitNumber ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as workingoutput
I know the below syntax is all wrong, but this is essentially what I am trying to achieve
SUM(AddToCart) OVER (PARTITION BY SessionID ORDER BY HitNumber ROWS BETWEEN UNBOUNDED PRECEDING AND FIRST_VALUE(ROW LIKE "%/product/search/details%")) as workingoutput

You need to nest your window functions here.
First, compute a running conditional count that checks whether /product/search/details has been reached yet, and return AddToCart only once it has.
Then do a running sum over that result.
SELECT
    wd.SessionID,
    wd.HitNumber,
    wd.HitType,
    wd.EventType,
    wd.PageName,
    wd.AddToCart,
    SUM(wd.AddToCartFromSearch) OVER (PARTITION BY wd.SessionID
                                      ORDER BY HitNumber ROWS UNBOUNDED PRECEDING) AS DesiredOutput
FROM (
    SELECT *,
           CASE WHEN COUNT(CASE WHEN wd.PageName = '/product/search/details' THEN 1 END)
                     OVER (PARTITION BY wd.SessionID ORDER BY HitNumber ROWS UNBOUNDED PRECEDING) > 0
                THEN AddToCart ELSE 0 END AS AddToCartFromSearch
    FROM WebData wd
) wd
ORDER BY HitNumber;
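To see the nested window functions at work, here is a minimal sketch that runs essentially the same query through Python's built-in sqlite3 module (this assumes a SQLite build with window-function support, 3.25 or later). The sample rows are made up for illustration and are not the data from the fiddle; only the columns the query needs are included.

```python
import sqlite3

# Hypothetical sample rows: (SessionID, HitNumber, PageName, AddToCart)
rows = [
    (1, 1, "/home", 1),
    (1, 2, "/product/search/details", 0),
    (1, 3, "/cart", 1),
    (1, 4, "/cart", 1),
    (2, 1, "/cart", 1),   # session 2 never sees the details page
    (2, 2, "/home", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WebData (SessionID INT, HitNumber INT, PageName TEXT, AddToCart INT)"
)
conn.executemany("INSERT INTO WebData VALUES (?, ?, ?, ?)", rows)

query = """
SELECT SessionID, HitNumber,
       SUM(AddToCartFromSearch) OVER (PARTITION BY SessionID
                                      ORDER BY HitNumber ROWS UNBOUNDED PRECEDING) AS DesiredOutput
FROM (
    -- Inner step: keep AddToCart only once the details page has been seen
    SELECT *,
           CASE WHEN COUNT(CASE WHEN PageName = '/product/search/details' THEN 1 END)
                     OVER (PARTITION BY SessionID ORDER BY HitNumber ROWS UNBOUNDED PRECEDING) > 0
                THEN AddToCart ELSE 0 END AS AddToCartFromSearch
    FROM WebData
) AS wd
ORDER BY SessionID, HitNumber
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this sample the AddToCart on hit 1 of session 1 is ignored because it happens before the details page, and session 2 stays at 0 throughout.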

Related

How to calculate total 6 months in Oracle SQL?

I tried this SQL statement:
SELECT
custodycd,
SUM(mramt) mramt_6_month,
txdate,
CASE
WHEN LAG(mramt, 6) OVER (ORDER BY txdate) IS NOT NULL
THEN SUM(mramt) OVER (ORDER BY txdate ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
ELSE NULL
END AS mramt_6_month_1
FROM
(SELECT
MAX(mramt) mramt,
t.afacctno,
t.custodycd,
t.txdate
FROM
tbl_mr3007_log t
WHERE
txdate >= '30/nov/2020'
AND mramt <> 0
GROUP BY
t.afacctno,
t.txdate,
t.custodycd)
GROUP BY
custodycd,
txdate
ORDER BY
txdate
and I get an error:
ORA-00979: not a GROUP BY expression
Thanks for your help
The error message seems pretty clear. However, what you want to do is not clear.
I am guessing that for each custodycd you have multiple rows by date. Starting at the seventh row, you want the sum of the previous six rows.
If so, then the code looks like:
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY custodycd ORDER BY txdate) > 6
THEN SUM(SUM(mramt)) OVER (PARTITION BY custodycd
ORDER BY txdate
ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
)
END) AS mramt_6_month_1
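As a rough illustration of that expression inside a complete grouped query, the sketch below runs it in SQLite via Python's sqlite3 module rather than Oracle (an assumption made only so the example is runnable; it needs SQLite 3.25 or later). The table name `log`, the GROUP BY on custodycd and txdate, and the sample amounts are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (custodycd TEXT, txdate TEXT, mramt INT)")

# Hypothetical data: 8 days for one customer, two rows per day of amount `day` each,
# so the daily totals are 2, 4, 6, ..., 16.
for day in range(1, 9):
    for _ in range(2):
        conn.execute(
            "INSERT INTO log VALUES (?, ?, ?)", ("C1", f"2020-12-{day:02d}", day)
        )

query = """
SELECT custodycd,
       txdate,
       SUM(mramt) AS mramt_day,
       -- The inner SUM is the GROUP BY aggregate; the outer SUM is the window function.
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY custodycd ORDER BY txdate) > 6
            THEN SUM(SUM(mramt)) OVER (PARTITION BY custodycd
                                       ORDER BY txdate
                                       ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
       END AS mramt_6_month_1
FROM log
GROUP BY custodycd, txdate
ORDER BY custodycd, txdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first six rows come back as NULL (None in Python); from the seventh row on, the column holds the sum of the six preceding daily totals.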

Grouping Column Without Breaking The Sequence

The main objective is to number the rows sequentially by the Amount column so that consecutive rows with the same amount share a number. If a different value appears between two equal values, the two equal values must be numbered separately.
This is the raw data here:
SELECT Area, DateA, DateB, Amount
FROM (VALUES
('ABC', '2019-08-18', '2019-08-18 00:07:47.000', 3.75),
('ABC','2019-08-19', '2019-08-19 00:08:47.000', 3.75),
('ABC','2019-08-20', '2019-08-20 00:09:47.000', 3.65),
('ABC','2019-08-21', '2019-08-21 00:09:57.000', 3.75))
AS FeeCollection(Area, DateA, DateB, Amount)
I've tried this, but I don't know how to number the rows in this particular way.
DENSE_RANK() OVER(ORDER BY Area, Amount)
This is the sample result I want to achieve. I'm looking for simple logic to do it. Using a cursor or a WHILE loop would not be efficient for me.
I believe this is what you want. I use LAG to get the value of the prior row in a CTE, and then use a windowed COUNT to reduce the value of ROW_NUMBER by 1 for each row with the same consecutive value for amount:
WITH CTE AS(
SELECT Area,
DateA,
DateB,
Amount,
LAG(Amount) OVER (PARTITION BY Area ORDER BY DateA) AS PrevAmount
FROM (VALUES
('ABC', '2019-08-18', '2019-08-18 00:07:47.000', 3.75),
('ABC','2019-08-19', '2019-08-19 00:08:47.000', 3.75),
('ABC','2019-08-20', '2019-08-20 00:09:47.000', 3.65),
('ABC','2019-08-21', '2019-08-21 00:09:57.000', 3.75))
AS FeeCollection(Area, DateA, DateB, Amount))
SELECT Area,
DateA,
DateB,
Amount,
ROW_NUMBER() OVER (PARTITION BY Area ORDER BY DateA) -
COUNT(CASE Amount WHEN PrevAmount THEN 1 END) OVER (PARTITION BY Area ORDER BY DateA
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Number
FROM CTE
ORDER BY DateA;
I did assume your PARTITION BY clause, which you may need to change, remove, or move to the ORDER BY. As we had only one value for Area, it was impossible to know what the number should be when Area changes.
I would also do this using lag() and a cumulative sum, but written like this:
select t.*,
sum(case when prev_amount = amount then 0 else 1 end) over
(partition by area order by datea) as number
from (select t.*,
lag(amount) over (partition by area order by datea) as prev_amount
from t
) t;
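A small check that the two approaches above agree on the question's four rows, run through Python's sqlite3 module (an assumption made for runnability; the answers themselves target SQL Server, and window functions need SQLite 3.25 or later). The table name `t` follows the second answer; the column alias `grp` is a hypothetical stand-in for `Number`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (area TEXT, datea TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "2019-08-18", 3.75),
        ("ABC", "2019-08-19", 3.75),
        ("ABC", "2019-08-20", 3.65),
        ("ABC", "2019-08-21", 3.75),
    ],
)

# Approach 1: ROW_NUMBER minus a running count of rows that repeat the previous amount.
row_number_minus_count = conn.execute("""
WITH cte AS (
    SELECT t.*, LAG(amount) OVER (PARTITION BY area ORDER BY datea) AS prev_amount
    FROM t
)
SELECT datea,
       ROW_NUMBER() OVER (PARTITION BY area ORDER BY datea)
       - COUNT(CASE amount WHEN prev_amount THEN 1 END)
             OVER (PARTITION BY area ORDER BY datea
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM cte
ORDER BY datea
""").fetchall()

# Approach 2: cumulative sum of "amount changed" flags.
cumulative_flags = conn.execute("""
SELECT datea,
       SUM(CASE WHEN prev_amount = amount THEN 0 ELSE 1 END)
           OVER (PARTITION BY area ORDER BY datea) AS grp
FROM (
    SELECT t.*, LAG(amount) OVER (PARTITION BY area ORDER BY datea) AS prev_amount
    FROM t
) s
ORDER BY datea
""").fetchall()

print(row_number_minus_count)
print(cumulative_flags)
```

Both queries number the rows 1, 1, 2, 3: the second 3.75 run gets its own number because 3.65 sits between the two runs.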

Reset rolling sum to 0 after reaching the threshold

I'm trying to compute a running total and reset it to 0 based on 2 conditions or if the limit is reached.
Here is an example.
As in the image above, I need to get the running total while the following conditions are met:
monthly discount = 0 and monthly ticket = 1
If either discount = 1 or ticket = 0, the next value of the running total has to be 0.
running_total < 50
If the running total is >= 50, the running total has to restart from the value on the same row.
Is there any possibility to do this in HIVE? Thank you so much!!!
Here is what I'm trying to do now:
SELECT * ,
SUM(tag_flg) OVER (PARTITION BY account, flg_sum
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_sum
FROM
( SELECT * ,
SUM(CASE
WHEN tag_flg>=50 THEN value
ELSE tag_flg
END) OVER (PARTITION BY account
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS flg_sum
FROM
( SELECT * ,
CASE
WHEN month_disc =0
AND month_ticket = 1 THEN value
ELSE 0
END AS tag_flg
FROM source_table) x) y
Do the 40, 60 and 20 that aren't being accounted for matter at all in your report? Like would you want them to be counted then a new row added with a total of 0 to restart?
Here is the way I managed to do it:
SELECT *,
SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum, band_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
SELECT *,
FLOOR(SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/50.000001) as band_sum ---- create bands for running total
FROM (
SELECT *,
SUM(tag_flg) OVER (PARTITION BY account ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
FROM (
SELECT *,
CASE WHEN (month_disc=1 OR month_ticket=0) THEN 1 ELSE 0 END AS tag_flg ---- flag to count when the value is reset due to one of the conditions
FROM source_table) x ) y) z
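When testing a query like this, it can help to compare its output against a plain sequential loop. The sketch below is one possible reading of the rules in the question, not the author's method: it assumes a reset row (discount = 1 or ticket = 0) shows a running total of 0, and that when adding the current value would bring the total to 50 or more, the total restarts from the current row's value. The function name and sample rows are hypothetical, and the exact boundary behaviour may differ from the banding query above.

```python
def running_total_with_reset(rows, limit=50):
    """Sequential reference for the reset rules (assumed interpretation).

    rows: iterable of (month_disc, month_ticket, value) tuples for ONE account,
          already ordered by date.
    """
    total = 0
    totals = []
    for month_disc, month_ticket, value in rows:
        if month_disc == 1 or month_ticket == 0:
            # Condition broken: the running total drops to 0 on this row.
            total = 0
        elif total + value >= limit:
            # Limit reached: restart from the value on this same row.
            total = value
        else:
            total += value
        totals.append(total)
    return totals


# Hypothetical sample for one account: (month_disc, month_ticket, value)
sample = [
    (0, 1, 10),
    (0, 1, 20),
    (0, 1, 30),  # 30 + 30 = 60 >= 50, so restart at 30
    (1, 1, 40),  # discount row, so reset to 0
    (0, 1, 5),
]
totals = running_total_with_reset(sample)
print(totals)
```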

SUM OVER with GROUP BY

I am working on a large database with millions of rows and I am trying to be efficient in my queries. The database contains regular snapshots of a loan portfolio where sometimes loans default (status goes from '1' to <>'1'). When they do, they appear only once in the corresponding snapshot, then they are no longer reported. I am trying to get a cumulative count of such loans - as they develop over time and divided into many buckets depending on country of origin, vintage, etc.
SUM (...) OVER seems to be a very efficient function to achieve the result but when I run the following query
Select
assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status,
AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType,
count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal,
SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
SUM(aa27) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
SUM(aa26) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22>='2014-01' and aa22<='2014-12' and vintage='2015' and active=0 and aa74<>'1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate
I get
SQL Error (8120) column aa27 is invalid in the selected list because it is not contained in either an aggregate function or the GROUP BY clause
Can anyone shed some light? Thanks
I believe you want:
Select assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status,
AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType,
count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal,
SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
SUM(SUM(aa27)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
SUM(SUM(aa26)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22 >= '2014-01' and aa22 <= '2014-12' and vintage = '2015' and
active = 0 and aa74 <> '1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate;
Note the SUM(SUM()) in the cumulative sum expressions.
This is what I found to be working, comparing my results with some external research data.
I have simplified the fields for readability:
select
poolcutoffdate,
count(1) as LoanCount,
MAX(sum(case status when 'default' then 1 else 0 end))
over (order by poolcutoffdate
ROWS between unbounded preceding AND CURRENT ROW) as CumulDefaults
from myDatabase
group by poolcutoffdate
order by poolcutoffdate asc
I am thus counting all loans that have been in the 'default' status at least once from inception to the current cutoff date.
Note the use of MAX(SUM()) so that the result is the largest of the various iterations from the first to the current row. Using SUM(SUM()) would add up the various iterations, leading to a cumulative of cumulatives.
I considered using SUM(SUM()) with "PARTITION BY poolcutoffdate" so that the tally restarts from 0 and does not add to the previous cutoff date, but this would only include loans from the latest cutoff, so a loan that had defaulted and been removed from the pool would wrongly not be counted.
Note the CASE expression inside the windowed aggregate.
Thanks for all the help
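To make the difference between SUM(SUM()) and MAX(SUM()) concrete, here is a small sketch on made-up data, run through Python's sqlite3 module (an assumption for runnability; the thread is about SQL Server, and SQLite 3.25 or later is needed). The table name `loans` and its rows are hypothetical. Which of the two columns is the right one depends on how defaulted loans are reported in the real snapshots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (poolcutoffdate TEXT, status TEXT)")

# Hypothetical snapshots: per-date default counts are 1, 3 and 2.
sample = (
    [("2015-01", "default")] + [("2015-01", "performing")] * 2
    + [("2015-02", "default")] * 3
    + [("2015-03", "default")] * 2
)
conn.executemany("INSERT INTO loans VALUES (?, ?)", sample)

query = """
SELECT poolcutoffdate,
       COUNT(1) AS LoanCount,
       -- Running total of the per-date default counts
       SUM(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
           OVER (ORDER BY poolcutoffdate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_of_sums,
       -- Largest per-date default count seen so far
       MAX(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
           OVER (ORDER BY poolcutoffdate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_of_sums
FROM loans
GROUP BY poolcutoffdate
ORDER BY poolcutoffdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With per-date counts of 1, 3, 2, the SUM(SUM()) column runs 1, 4, 6 while the MAX(SUM()) column runs 1, 3, 3.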

Calculate moving weather stats in PostgreSQL

I'm trying to calculate the days since last rain and the amount of rain in that event for each day in my PostgreSQL table of weather data. I've been trying to achieve this with window functions but the limitation of ranges having to be unbounded has left me a bit stuck on how to proceed.
Here's the query I have so far:
SELECT
station_num,
ob_date,
rain,
max(rain) OVER (PARTITION BY station_num ORDER BY ob_date ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as prev_rain_mm,
'' as days_since_rain --haven't attempted this calculation yet
FROM
obs_daily_ground_moisture
This results in the following:
but I'm trying to achieve something more like this:
I feel like all the pieces are there in regards to window functions range & filter and nested queries but I'm not sure how to pull it all together. Also the above data is just a subset of the actual dataset, the entire dataset is just over half a million rows.
The key here is to group the observations starting from the first occurrence of rain>0 value to the next occurrence of rain>0 value. Thereafter you can use window functions to calculate the needed columns.
select
x.station_num,
x.ob_date,
max(rain) over(partition by station_num,col) prev_rain,
case when rain > 0 then 0
else row_number() over(partition by station_num, col order by ob_date)-1 end days_since_rain
from (select t.*,
sum(case when rain > 0 then 1 else 0 end) over(partition by station_num order by ob_date) col
from t) x
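Here is a minimal sketch of that grouping idea on a few made-up observations, run through Python's sqlite3 module instead of PostgreSQL (an assumption for runnability; SQLite 3.25 or later is needed). The table name `t` follows the query above; the rows are hypothetical, and `rain` is added to the select list only to make the output easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (station_num INT, ob_date TEXT, rain REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 5.0),
        (1, "2020-01-02", 0.0),
        (1, "2020-01-03", 0.0),
        (1, "2020-01-04", 2.0),
        (1, "2020-01-05", 0.0),
    ],
)

query = """
SELECT x.station_num,
       x.ob_date,
       x.rain,
       -- Each group starts on a rainy day, so its max is that day's rain
       MAX(rain) OVER (PARTITION BY station_num, col) AS prev_rain,
       CASE WHEN rain > 0 THEN 0
            ELSE ROW_NUMBER() OVER (PARTITION BY station_num, col ORDER BY ob_date) - 1
       END AS days_since_rain
FROM (
    -- Running count of rainy days: increments on every rain > 0 row
    SELECT t.*,
           SUM(CASE WHEN rain > 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY station_num ORDER BY ob_date) AS col
    FROM t
) x
ORDER BY x.station_num, x.ob_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Dry days that come before a station's first recorded rain all land in group 0, so they get a prev_rain of 0 and a day count that starts at the first observation.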
Try this.
-- T-SQL (SQL Server) syntax: @Rain is a local variable
DECLARE @Rain AS FLOAT;
-- Carry the last non-zero rain value forward into the dry rows
UPDATE A
SET
@Rain = CASE WHEN A.Rain = 0 THEN @Rain ELSE A.Rain END,
A.Rain = CASE WHEN @Rain IS NULL OR A.Rain <> 0 THEN A.Rain ELSE @Rain END
FROM obs_daily_ground_moisture A;
SELECT ob_date, Rain,
max(rain) OVER (PARTITION BY station_num ORDER BY ob_date ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as prev_rain_mm,
ROW_NUMBER() OVER(PARTITION BY Rain ORDER BY ob_date) - 1 as days_since_rain
FROM obs_daily_ground_moisture ORDER BY ob_date;