Fill in gaps with the prior record value having a populated quantity (LIMIT: no analytics can be used) - sql

Assume data with a structure like this:
WITH CAL AS(
SELECT 2022 YR, '01' PERIOD UNION ALL
SELECT 2022 YR, '02' PERIOD UNION ALL
SELECT 2022 YR, '03' PERIOD UNION ALL
SELECT 2022 YR, '04' PERIOD UNION ALL
SELECT 2022 YR, '05' PERIOD UNION ALL
SELECT 2022 YR, '06' PERIOD UNION ALL
SELECT 2022 YR, '07' PERIOD UNION ALL
SELECT 2022 YR, '08' PERIOD UNION ALL
SELECT 2022 YR, '09' PERIOD UNION ALL
SELECT 2022 YR, '10' PERIOD UNION ALL
SELECT 2022 YR, '11' PERIOD UNION ALL
SELECT 2022 YR, '12' PERIOD ),
Data AS (
SELECT 2022 YR, '01' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '02' PERIOD, 5 qty UNION ALL
SELECT 2022 YR, '04' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '05' PERIOD, 7 qty UNION ALL
SELECT 2022 YR, '09' PERIOD, 1 qty)
SELECT *
FROM CAL A
LEFT JOIN data B
on A.YR = B.YR
and A.Period = B.Period
WHERE A.Period < '10' and A.YR = 2022
ORDER by A.period
Giving us:
+------+--------+------+--------+-----+
|  YR  | PERIOD |  YR  | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 |   01   | 2022 |   01   |  10 |
| 2022 |   02   | 2022 |   02   |   5 |
| 2022 |   03   |      |        |     |
| 2022 |   04   | 2022 |   04   |  10 |
| 2022 |   05   | 2022 |   05   |   7 |
| 2022 |   06   |      |        |     |
| 2022 |   07   |      |        |     |
| 2022 |   08   |      |        |     |
| 2022 |   09   | 2022 |   09   |   1 |
+------+--------+------+--------+-----+
With Expected result of:
+------+--------+------+--------+-----+
|  YR  | PERIOD |  YR  | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 |   01   | 2022 |   01   |  10 |
| 2022 |   02   | 2022 |   02   |   5 |
| 2022 |   03   | 2022 |   03   |   5 |  -- SQL derives
| 2022 |   04   | 2022 |   04   |  10 |
| 2022 |   05   | 2022 |   05   |   7 |
| 2022 |   06   | 2022 |   06   |   7 |  -- SQL derives
| 2022 |   07   | 2022 |   07   |   7 |  -- SQL derives
| 2022 |   08   | 2022 |   08   |   7 |  -- SQL derives
| 2022 |   09   | 2022 |   09   |   1 |
+------+--------+------+--------+-----+
QUESTION:
How would one go about filling in the gaps in periods 03, 06, 07 and 08 with a record quantity referencing the nearest earlier period/year? Note the example is limited to one year, but a gap could be in period 01 of 2022, in which case we would need to return the 2021 period 12 quantity if populated, or keep going back until a quantity is found or no such record exists.
LIMITS:
I am unable to use table value functions. (No lateral, no Cross Apply)
I'm unable to use analytics (no lead/lag)
correlated subqueries are iffy.
Why the limits? This must be done in a HANA graphical calculation view, which supports none of those concepts. I've not done enough with correlated subqueries at this time to know whether they are possible.
I can create any number of inline views or materialized datasets needed.
STATISTICS:
this table has over a million rows and grows at a rate of products * locations * periods * years. So 1000 * 20 * 12 * 6 = 1.4 mil+ in 6 years with just 20 locations and 1000 products...
each product inventory may be recorded at the end of a month for a given location. (No activity for a product/location means no record, hence a gap. A silly mainframe storage-saving technique used in an RDBMS... I mean, how do I know the system just didn't error on inserting the record for that material, or omit it for some reason...)
In the cases where it is not recorded, we need to fill in the gap. The example provided is broken down to the bare bones, without location and material, as I do not believe they are salient to a solution.
ISSUE:
I'll need to convert the SQL to a "HANA Graphical calculation view"
Yes, I know I could create a SQL Script to do this. This is not allowed.
Yes, I know I could create a table function to do this. This is not allowed.
This must be accomplished through a graphical calculation view, which supports basic SQL functions,
BASIC joins (INNER, OUTER, FULL OUTER, CROSS), filters, aggregation, and a basic rank at a significant performance impact if all records are evaluated (and a few other things) - but not window functions, not CROSS APPLY, not LATERAL...
As to why: it has to do with maintenance and staffing. The staffed area is a reporting area that uses tools to create views used in universes. The area wishes to keep all scripts out of use to keep staffing costs lower, as SQL knowledge wouldn't be required for future staff positions, though it helps!
For those familiar, this issue is sourced from the MBEWH table in an ECC implementation.

This can be done with graphical calculation views in SAP HANA.
It's not pretty and probably not very efficient, though.
Whether or not the persons that are supposedly able to maintain graphical calc. views, but not SQL statements, will be able to successfully maintain this is rather questionable.
First, the approach in SQL, so that the approach becomes clear:
create column table calendar
( yr integer
, period nvarchar (2)
, primary key (yr, period))
insert into calendar
( select year (generated_period_start) as yr
, ABAP_NUMC( month(generated_period_start), 2) as period
from series_generate_date ('INTERVAL 1 MONTH', '2022-01-01', '2023-01-01'));
create column table data
( yr integer
, period nvarchar (2)
, qty integer
, primary key (yr, period));
insert into data values (2022, '01', 10);
insert into data values (2022, '02', 5);
insert into data values (2022, '04', 10);
insert into data values (2022, '05', 7);
insert into data values (2022, '09', 1);
SELECT *
FROM CALendar A
LEFT JOIN data B
on A.YR = B.YR
and A.Period = B.Period
WHERE A.Period <'10' and A.YR =2022
ORDER BY A.period;
/*
YR PERIOD YR PERIOD QTY
2,022 01 2,022 01 10
2,022 02 2,022 02 5
2,022 03 ? ? ?
2,022 04 2,022 04 10
2,022 05 2,022 05 7
2,022 06 ? ? ?
2,022 07 ? ? ?
2,022 08 ? ? ?
2,022 09 2,022 09 1
*/
The ABAP_NUMC() function creates ABAP NUMC strings (with leading zeroes) from integers. Other than this, it's pretty much the tables from the OP.
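A quick sanity check of the zero-padding, using the same two-argument form as in the INSERT above (assuming ABAP_NUMC is available on your HANA revision):
SELECT ABAP_NUMC(3, 2) AS period FROM dummy;  -- expected: '03'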
The general approach is to use the CALENDAR table as the main driving table that establishes for which dates/periods there will be output rows.
This is outer joined with the DATA table, leaving "missing" rows with NULL in the corresponding columns.
Next, the DATA table is joined again, this time with YEAR||PERIOD combinations that are strictly smaller than the YEAR||PERIOD from the CALENDAR table. This gives us rows for all the previous records in DATA.
Next, we need to pick which of the previous rows we want to look at.
This is done via the ROWNUM() function and a filter to the first record.
As graphical calculation views don't support ROWNUM() this can be exchanged with RANK() - this works as long as there are no two actual DATA records for the same YEAR||PERIOD combination.
Finally, in the projection we use COALESCE to switch between the actual information available in DATA and - if that is NULL - the previous period information.
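Putting these steps together as one plain-SQL sketch (using RANK() rather than ROWNUM(), as described; the cal_/dat_/prev_ aliases are mine, chosen to match the headers below):
SELECT cal_yr, cal_per
     , COALESCE(dat_yr , prev_yr ) AS fill_yr
     , COALESCE(dat_per, prev_per) AS fill_per
     , COALESCE(dat_qty, prev_qty) AS fill_qty
FROM (
    SELECT a.yr AS cal_yr , a.period AS cal_per
         , b.yr AS dat_yr , b.period AS dat_per , b.qty AS dat_qty
         , p.yr AS prev_yr, p.period AS prev_per, p.qty AS prev_qty
         -- per calendar row, rank the candidate "previous" records newest-first
         , RANK() OVER ( PARTITION BY a.yr, a.period
                         ORDER BY p.yr DESC, p.period DESC ) AS rank_column
    FROM calendar a
    LEFT JOIN data b
           ON a.yr = b.yr AND a.period = b.period
    -- "previous or same" records; <= keeps calendar rows that have no earlier data
    LEFT JOIN data p
           ON TO_VARCHAR(p.yr) || p.period <= TO_VARCHAR(a.yr) || a.period
) ranked
WHERE rank_column = 1
ORDER BY cal_yr, cal_per;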
/*
CAL_YR CAL_PER COALESCE(DAT_YR,PREV_YR) COALESCE(DAT_PER,PREV_PER) COALESCE(DAT_QTY,PREV_QTY)
2,022 01 2,022 01 10
2,022 02 2,022 02 5
2,022 03 2,022 02 5
2,022 04 2,022 04 10
2,022 05 2,022 05 7
2,022 06 2,022 05 7
2,022 07 2,022 05 7
2,022 08 2,022 05 7
2,022 09 2,022 09 1
*/
So far, so good.
As it's cumbersome to screenshot every single node of the graphical calc. view, I will include just the most important ones:
1. CAL_DAT_PREV
Since only equality joins are supported in graphical calc. views, we have to emulate the "larger than" join. For that, I created two calculated/constant columns join_const with the same value (integer 1 in this case) and joined on those.
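Expressed in plain SQL, the emulation is roughly this (a sketch; join_const is the constant column from the text):
SELECT c.yr, c.period, d.yr AS prev_yr, d.period AS prev_per, d.qty AS prev_qty
FROM (SELECT yr, period, 1 AS join_const FROM calendar) c
INNER JOIN (SELECT yr, period, qty, 1 AS join_const FROM data) d
    ON c.join_const = d.join_const;  -- pairs every calendar row with every data row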
2. PREVS_ARE_OLDER
This is the second part of the emulated "larger than" join: this projection simply filters out the records where prev_yr_per is larger than cal_yr_per. Equal values must be allowed here, since we don't want to lose records for which there is no smaller YEAR||PERIOD combination. Alternatively, one could insert an initial record into the DATA table that is guaranteed to be smaller than all other entries, e.g. YEAR = 0001 and PERIOD = 00 or something similar, as sketched below. If you're familiar with SAP application tables, then you've seen this approach.
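If you go the sentinel route, the initial record could look like this (hypothetical values; qty is left NULL so a gap with no real prior record still yields NULL after the COALESCE):
-- guaranteed to sort before every real YEAR||PERIOD combination
insert into data values (1, '00', NULL);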
By the way - for convenience reasons, I created calculated columns that combine the YEAR and PERIOD for the different tables - cal_yr_per, dat_yr_per, and prev_yr_per.
3. RANK_1
Here the rank is created for PREV_YR_PER, picking the first one only, and starting a new group for every new value of cal_yr_per.
This value is returned via Rank_Column.
4. REDUCE_PREV
The final piece of the puzzle: using a filter on Rank_Column = 1, we ensure we only get one "previous" row for every "calendar" row.
Also: by means of IF(ISNULL(...), ... , ...) we emulate COALESCE(...) in three calculated columns, aptly named FILL....
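For example, the FILL_QTY column would be defined with an expression along these lines (calculated-column expression syntax; column names as used in this model):
IF( ISNULL("DAT_QTY"), "PREV_QTY", "DAT_QTY" )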
And that's the nuts and bolts of this solution.
"It's works on my computer!" is probably the best I can say about it.
SELECT "CAL_YR", "CAL_PERIOD"
, "DAT_YR", "DAT_PER", "DAT_QTY"
, "FILL_YR", "FILL_QTY", "FILL_PER"
FROM "_SYS_BIC"."scratch/QTY_FILLUP"
ORDER BY "CAL_YR" asc, "CAL_PERIOD" asc;
/*
CAL_YR CAL_PERIOD DAT_YR DAT_PER DAT_QTY FILL_YR FILL_QTY FILL_PER
2,022 01 2,022 01 10 2,022 10 01
2,022 02 2,022 02 5 2,022 5 02
2,022 03 ? ? ? 2,022 5 02
2,022 04 2,022 04 10 2,022 10 04
2,022 05 2,022 05 7 2,022 7 05
2,022 06 ? ? ? 2,022 7 05
2,022 07 ? ? ? 2,022 7 05
2,022 08 ? ? ? 2,022 7 05
2,022 09 2,022 09 1 2,022 1 09
2,022 10 ? ? ? 2,022 1 09
2,022 11 ? ? ? 2,022 1 09
2,022 12 ? ? ? 2,022 1 09
*/

Related

Average of last 3 months (SQL vertica)

I need to find the average days past due for the last 3 months for each client. Not as a rolling/moving average, but a one-time number, always calculating the last 3 months, no matter if the data changes.
For example, right now the last data I have is from Sept 2022, so I need the average of Sept 2022, August 2022 and July 2022. But if the data changes and I now have October 2022, then I would need the average of Oct, Sept, August, and so on.
I tried this, but it calculates wrongly:
CREATE TABLE AVERAGE_dpd
AS (
SELECT "SUM_WEIGHTED_AVG_PERMONTH"."NAME",
AVG("SUM_WEIGHTED_AVG_PERMONTH"."SUM")
OVER (PARTITION BY "SUM_WEIGHTED_AVG_PERMONTH"."NAME"
order by MONTH ("SUM_WEIGHTED_AVG_PERMONTH"."LAST DAY OF MONTH_NETDUEDATE") desc
rows between 2 preceding and CURRENT ROW)
as AVG3Months
FROM "SUM_WEIGHTED_AVG_PERMONTH");
Thank you so much for your help!
I assume you are expecting this result:
client_name | avg_last_3mth
-------------+--------------
Client 01 | 16.76
Client 02 | 5.75
Client 03 | -13.95
So I assume you have something like this as input data (and this is how we usually like the data to accompany the question):
month_begin | client_name | dpd
------------+-------------+----
2022-05-01  | Client 01   | 12
2022-06-01  | Client 01   | 14
2022-07-01  | Client 01   | 18
2022-08-01  | Client 01   | 17
2022-05-01  | Client 02   |  2
2022-06-01  | Client 02   |  4
2022-07-01  | Client 02   |  8
2022-08-01  | Client 02   |  7
2022-05-01  | Client 03   | 22
2022-06-01  | Client 03   | 24
2022-07-01  | Client 03   | 28
2022-08-01  | Client 03   | 27
With this input data, you probably want the rows with month_begin of the first of the current month (TRUNC(CURRENT_DATE,'MONTH')), plus the two previous months. And this is what I do, then I obviously group by the client name:
WITH
-- input data I made up, don't use in query ..
dpd(month_begin,client_name,dpd) AS (
SELECT DATE '2022-05-01','Client 01',12
UNION ALL SELECT DATE '2022-06-01','Client 01',14
UNION ALL SELECT DATE '2022-07-01','Client 01',18
UNION ALL SELECT DATE '2022-08-01','Client 01',17
UNION ALL SELECT DATE '2022-05-01','Client 02', 2
UNION ALL SELECT DATE '2022-06-01','Client 02', 4
UNION ALL SELECT DATE '2022-07-01','Client 02', 8
UNION ALL SELECT DATE '2022-08-01','Client 02', 7
UNION ALL SELECT DATE '2022-05-01','Client 03',22
UNION ALL SELECT DATE '2022-06-01','Client 03',24
UNION ALL SELECT DATE '2022-07-01','Client 03',28
UNION ALL SELECT DATE '2022-08-01','Client 03',27
)
-- real query starts here ..
SELECT
client_name
, AVG(dpd)::NUMERIC(5,2) AS avg_last_3mth
FROM dpd
WHERE month_begin >= TRUNC(CURRENT_DATE,'MONTH') - '2 MONTHS'::INTERVAL YEAR TO MONTH
GROUP BY
client_name;
-- out client_name | avg_last_3mth
-- out -------------+---------------
-- out Client 02 | 6.33
-- out Client 01 | 16.33
-- out Client 03 | 26.33

Calculate running sum of previous 3 months from monthly aggregated data

I have a dataset that I have aggregated at monthly level. The next part needs me to take, for every block of 3 months, the sum of the data at monthly level.
So essentially my input data (after aggregated to monthly level) looks like:
month | year | status | count_id
------+------+--------+---------
08    | 2021 | stat_1 |  1
09    | 2021 | stat_1 |  3
10    | 2021 | stat_1 |  5
11    | 2021 | stat_1 | 10
12    | 2021 | stat_1 | 10
01    | 2022 | stat_1 |  5
02    | 2022 | stat_1 | 20
and then my output data to look like:
month | year | status | count_id | 3m_sum
------+------+--------+----------+-------
08    | 2021 | stat_1 |  1       |  1
09    | 2021 | stat_1 |  3       |  4
10    | 2021 | stat_1 |  5       |  8
11    | 2021 | stat_1 | 10       | 18
12    | 2021 | stat_1 | 10       | 25
01    | 2022 | stat_1 |  5       | 25
02    | 2022 | stat_1 | 20       | 35
i.e. 3m_sum for Feb = Feb + Jan + Dec. I tried to do this using a self join and wrote a query along the lines of
WITH cte AS (
SELECT date_part('month', date_col) as month
,date_part('year', date_col) as year
,status
,count(distinct transaction_id) as count_id
FROM my_table -- the source table holding (date_col, status, transaction_id)
GROUP BY 1, 2, 3
)
SELECT a.month, a.year, a.status, sum(b.count_id) as sum_3m
from cte as a
left join cte as b on a.status = b.status
and b.month >= a.month - 2 and b.month <= a.month
group by 1,2,3
This query NEARLY works. Where it falls apart is in Jan and Feb. My data runs from August 2021 to April 2022. That means the value for Jan should be Nov + Dec + Jan; similarly, for Feb it should be Dec + Jan + Feb.
As I am joining on the MONTH alone, all the months of Aug - Nov are treated as values greater than the month of Jan/Feb, and so the query isn't doing the correct sum.
How can I adjust this bit to give the correct sum?
I did think of using a LAG function, but (even though I'm 99% sure a month won't ever be missed) I can't guarantee we will never have a month with 0 values, and therefore my LAG function would be summing the wrong rows.
I also tried doing the same join at the individual date level (not aggregating in my nested query), but this gave vastly different numbers, as I want the sum of the aggregation, and I think the sum from the individual rows duplicated a lot of the stuff my COUNT DISTINCT removes.
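(One way to make the self-join itself year-aware is to compare a single month index, year * 12 + month, instead of the bare month; a sketch against the same CTE:)
left join cte as b
       on a.status = b.status
      and b.year * 12 + b.month between a.year * 12 + a.month - 2
                                    and a.year * 12 + a.month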
You can use a SUM with a window frame of 2 PRECEDING. To ensure you don't miss rows, use a calendar table and left-join all the results to it.
SELECT *,
SUM(a.count_id) OVER (ORDER BY c.year, c.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Calendar c
LEFT JOIN a ON a.year = c.year AND a.month = c.month
WHERE c.year >= 2021 AND c.year <= 2022;
db<>fiddle
You could also use LAG but you would need it twice.
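A sketch of that LAG variant, against the same calendar-joined set (COALESCE guards the first two rows and the NULL months):
SELECT *,
       COALESCE(a.count_id, 0)
     + COALESCE(LAG(a.count_id, 1) OVER (ORDER BY c.year, c.month), 0)
     + COALESCE(LAG(a.count_id, 2) OVER (ORDER BY c.year, c.month), 0) AS sum_3m
FROM Calendar c
LEFT JOIN a ON a.year = c.year AND a.month = c.month
WHERE c.year >= 2021 AND c.year <= 2022;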
It should be @Charlieface's answer - only that I get one different result than what you put in your expected result table:
WITH
-- your input - and I avoid keywords like "MONTH" or "YEAR"
-- and also identifiers starting with digits are forbidden -
indata(mm,yy,status,count_id,sum_3m) AS (
SELECT 08,2021,'stat_1',1,1
UNION ALL SELECT 09,2021,'stat_1',3,4
UNION ALL SELECT 10,2021,'stat_1',5,8
UNION ALL SELECT 11,2021,'stat_1',10,18
UNION ALL SELECT 12,2021,'stat_1',10,25
UNION ALL SELECT 01,2022,'stat_1',5,25
UNION ALL SELECT 02,2022,'stat_1',20,35
)
SELECT
*
, SUM(count_id) OVER(
ORDER BY yy,mm
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_3m_calc
FROM indata;
-- out mm | yy | status | count_id | sum_3m | sum_3m_calc
-- out ----+------+--------+----------+--------+-------------
-- out 8 | 2021 | stat_1 | 1 | 1 | 1
-- out 9 | 2021 | stat_1 | 3 | 4 | 4
-- out 10 | 2021 | stat_1 | 5 | 8 | 9
-- out 11 | 2021 | stat_1 | 10 | 18 | 18
-- out 12 | 2021 | stat_1 | 10 | 25 | 25
-- out 1 | 2022 | stat_1 | 5 | 25 | 25
-- out 2 | 2022 | stat_1 | 20 | 35 | 35

SQL: The second oldest date

Imagine you've got a table similar to this:
|email | purchase_date |
|:--------------|:---------------------|
|stan#gmail.com | Jun 30 2020 12:00AM |
|stan#gmail.com | Aug 05 2020 5:00PM |
|stan#gmail.com | Mar 22 2018 3:00AM |
|eric#yahoo.com | Aug 05 2020 5:00PM |
|eric#yahoo.com | Mar 22 2018 3:00PM |
|kyle#gmail.com | Mar 22 2018 3:00PM |
|kyle#gmail.com | Jun 30 2020 12:00AM |
|kyle#gmail.com | Aug 05 2020 5:00PM |
|kenny#gmail.com| Aug 05 2020 5:00PM |
Totally random. The actual database I work with is actually more complex with much more columns.
Both columns are STRING type, which is not convenient - the purchase date should be DATE type. Kenny made only one purchase, so there shouldn't be any row for him in the result table.
Also notice that there are a lot of identical dates.
I'd like to select the email and the 2nd oldest purchase date (named as 'second_purchase') for each email address, so that the result looks like this:
|email | second_purchase |
|:--------------|:-------------------- |
|stan#gmail.com | Jun 30 2020 12:00AM |
|eric#yahoo.com | Aug 05 2020 5:00PM |
|kyle#gmail.com | Jun 30 2020 12:00AM |
I can't seem to get the logic or syntax right, and I don't want to put all my code in here, because I've tried many variations of my idea...
None of them seemed to work. But I'd love to see an example from someone skilled in SQL; my idea is maybe not that great. :-)
This version is actually SOQL (Salesforce Object Query Language). That could be important.
Sorry for not styling the table properly - that didn't seem to work either, even when I used the recommended styling; I wasn't able to post. That was actually quite frustrating.
Anyway, thank you for any help!
You could try the following SQL, which uses a DENSE_RANK over each user's email and orders by the purchase_date cast to a timestamp.
Query #1
WITH date_converted_table AS (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
)
SELECT
email,
purchase_date as second_purchase
FROM
date_converted_table
WHERE dr=2;
email          | second_purchase
---------------+---------------------
eric#yahoo.com | Aug 05 2020 5:00PM
kyle#gmail.com | Jun 30 2020 12:00AM
stan#gmail.com | Jun 30 2020 12:00AM
Query #2
SELECT
email,
purchase_date as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
WHERE dr=2;
email          | second_purchase
---------------+---------------------
eric#yahoo.com | Aug 05 2020 5:00PM
kyle#gmail.com | Jun 30 2020 12:00AM
stan#gmail.com | Jun 30 2020 12:00AM
View on DB Fiddle
Update 1
As it pertains to the follow-up question in the comments:
"Is it possible to upgrade the result so that there are first_purchase dates (where dr=1) and second_purchase dates (where dr=2) in separate columns?"
A case expression and aggregation may assist you as shown below. The having clause ensures that there is both a first and second purchase date.
SELECT
email,
MAX(CASE
WHEN dr=1 THEN purchase_date
END) as first_purchase,
MAX(CASE
WHEN dr=2 THEN purchase_date
END) as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
GROUP BY email
HAVING
SUM(
CASE WHEN dr=1 THEN 1 ELSE 0 END
) > 0 AND
SUM(
CASE WHEN dr=2 THEN 1 ELSE 0 END
) > 0;
email          | first_purchase      | second_purchase
---------------+---------------------+---------------------
eric#yahoo.com | Mar 22 2018 3:00PM  | Aug 05 2020 5:00PM
kyle#gmail.com | Mar 22 2018 3:00PM  | Jun 30 2020 12:00AM
stan#gmail.com | Mar 22 2018 3:00AM  | Jun 30 2020 12:00AM
View on DB Fiddle
Let me know if this works for you.

Adding rows, running count, running sum to query results

I have a table with the following ddl.
CREATE TABLE "LEDGER"
("FY" NUMBER,
"FP" VARCHAR2(20 BYTE),
"FUND" VARCHAR2(20 BYTE),
"TYPE" VARCHAR2(2 BYTE),
"AMT" NUMBER
)
The table contains the following data.
REM INSERTING into LEDGER
SET DEFINE OFF;
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (15,'03','A','03',1);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (15,'04','A','03',2);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (16,'04','A','03',3);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (12,'05','A','04',6);
Based on the partition of FY, FP, FUND and TYPE, I would like to write a query that keeps a running count from the beginning FP up to a hard number of 14 (FP, though a varchar, represents the number of the month, i.e. 2 equals February, 3 equals March, etc.). Taking a closer look at the data, you will notice that in FY 15 the max period is 04, so I must add another 10 periods for my report to have the full 14 periods. Here is the expected output.
Here is what I tried, but I'm simply stumbling on this altogether.
WITH fy_range AS
(
SELECT MIN (fy) AS min_fy
, MAX (fy) AS max_fy
FROM ledger
),all_fys AS
(
SELECT min_fy + LEVEL - 1 AS fy
FROM fy_range
CONNECT BY LEVEL <= max_fy + 1 - min_fy
)
,all_fps AS
(
SELECT TO_CHAR (LEVEL, 'FM00') AS fp
FROM dual
CONNECT BY LEVEL <= 14
)
SELECT
FUND
,G.TYPE
,G.FY
,G.FP
,LAST_VALUE(G.AMT ignore nulls) OVER (PARTITION BY G.FUND ORDER BY Y.FY, P.FP) AS AMT
FROM all_fys y
CROSS JOIN all_fps p
LEFT OUTER JOIN LEDGER G PARTITION BY(FUND)
ON g.fy = y.fy
AND g.fp = p.fp;
but I end up with a bunch of nulls and some strange results.
This may not be the most efficient solution, but it is easy to understand and maintain. First (in the most deeply nested subquery) we find the min FP for each combination of FY, FUND and TYPE. Then we use a CONNECT BY query to fill all the FP for all FY, FUND, TYPE combinations (up to the hard upper limit of 14). Then we left-outer-join to the original data in the LEDGER table. So far we densified the data. In the final query (the join) we also add the column for the cumulative sum - that part is easy after we densified the data.
TYPE is an Oracle keyword, so it is probably best not to use it as a column name. It is also best not to use double-quoted table and column names (I had to use upper case everywhere because of that). I also made sure to convert from varchar2 to number and back to varchar2 - we shouldn't rely on implicit conversions.
select S.FY, to_char(S.FP, 'FM09') as FP, S.FUND, S.TYPE,
sum(L.AMT) over (partition by S.FY, S.FUND, S.TYPE order by S.FP) as CUMULATIVE_AMT
from (
select FY, MIN_FP + level - 1 as FP, FUND, TYPE
from (
select FY, min(to_number(FP)) as MIN_FP, FUND, TYPE
from LEDGER
group by FY, FUND, TYPE
)
connect by level <= 15 - MIN_FP
and prior FY = FY
and prior FUND = FUND
and prior TYPE = TYPE
and prior sys_guid() is not null
) S left outer join LEDGER L
on S.FY = L.FY and S.FP = L.FP and S.FUND = L.FUND and S.TYPE = L.TYPE
;
Output:
FY FP FUND TYPE CUMULATIVE_AMT
--- --- ---- ---- --------------
12 05 A 04 6
12 06 A 04 6
12 07 A 04 6
12 08 A 04 6
12 09 A 04 6
12 10 A 04 6
12 11 A 04 6
12 12 A 04 6
12 13 A 04 6
12 14 A 04 6
15 03 A 03 1
15 04 A 03 3
15 05 A 03 3
15 06 A 03 3
15 07 A 03 3
15 08 A 03 3
15 09 A 03 3
15 10 A 03 3
15 11 A 03 3
15 12 A 03 3
15 13 A 03 3
15 14 A 03 3
16 04 A 03 3
16 05 A 03 3
16 06 A 03 3
16 07 A 03 3
16 08 A 03 3
16 09 A 03 3
16 10 A 03 3
16 11 A 03 3
16 12 A 03 3
16 13 A 03 3
16 14 A 03 3

SQL find rows in groups where a column has a null and a non-null value

The Data
row ID YEAR PROD STA DATE
01 01 2011 APPLE NEW 2011-11-18 00:00:00.000
02 01 2011 APPLE NEW 2011-11-18 00:00:00.000
03 01 2013 APPLE OLD NULL
04 01 2013 APPLE OLD NULL
05 02 2013 APPLE OLD 2014-04-08 00:00:00.000
06 02 2013 APPLE OLD 2014-04-08 00:00:00.000
07 02 2013 APPLE OLD 2014-11-17 10:50:14.113
08 02 2013 APPLE OLD 2014-11-17 10:46:04.947
09 02 2013 MELON OLD 2014-11-17 11:01:19.657
10 02 2013 MELON OLD 2014-11-17 11:19:35.547
11 02 2013 MELON OLD NULL
12 02 2013 MELON OLD 2014-11-21 10:32:36.017
13 03 2006 APPLE NEW 2007-04-11 00:00:00.000
14 03 2006 APPLE NEW 2007-04-11 00:00:00.000
15 04 2004 APPLE OTH 2004-09-27 00:00:00.000
16 04 2004 APPLE OTH NULL
ROW is not a column in the table; it's just there to show which records I want.
The question
I need to find rows where a group consisting of (ID, YEAR, PROD, STA) has at least one NULL DATE and a non-NULL DATE.
Expected result
From the above dataset this would be rows 09 to 12 and 15 to 16.
I'm sitting in front of SSMS and have no idea how to get this. I'm thinking about GROUP BY and EXISTS, but really have no idea.
You can use COUNT ... OVER:
SELECT ID, YEAR, PROD, STA, [DATE]
FROM (
SELECT ID, YEAR, PROD, STA, [DATE],
COUNT(IIF([DATE] IS NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_nulls,
COUNT(IIF([DATE] IS NOT NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_not_nulls
FROM mytable) AS t
WHERE t.cnt_nulls > 0 AND t.cnt_not_nulls > 0
The window version of COUNT is applied twice over ID, YEAR, PROD, STA partitions of the data: for every row, it returns a count computed over that row's partition. The count is conditionally performed:
the first COUNT counts the number of NULL [Date] values within the partition
the second COUNT counts the number of NOT NULL [Date] values within the partition.
The outer query checks for partitions having a count of at least one for both of the two COUNT functions of the inner query.
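Since the question also mentions GROUP BY and EXISTS: if only the group keys (not the full rows) are needed, the same check can be written as a plain aggregate filter (a sketch):
SELECT ID, YEAR, PROD, STA
FROM mytable
GROUP BY ID, YEAR, PROD, STA
HAVING COUNT(CASE WHEN [DATE] IS NULL THEN 1 END) > 0  -- at least one NULL date
   AND COUNT([DATE]) > 0;                               -- at least one non-NULL date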