Get Running Total of IDs in Presto SQL

I would like to keep a running/cumulative array of new IDs.
Starting with this:
Date    IDs_Used_Today    New_IDs
Dec 6   1, 2, 3           1, 2, 3
Dec 7   1, 4              4
Dec 8   2, 3, 4           3
Dec 9   1, 2, 3, 5        5
And getting this:
Date    IDs_Used_Today    New_IDs    All_IDs_To_Date
Dec 6   1, 2, 3           1, 2, 3    1, 2, 3
Dec 7   1, 4              4          1, 2, 3, 4
Dec 8   2, 3, 4           null       1, 2, 3, 4
Dec 9   1, 2, 3, 5        5          1, 2, 3, 4, 5
I need to compute "All_IDs_To_Date" from the previous row's "All_IDs_To_Date" plus that row's "New_IDs".
Done that way, the table will always be accurate as long as there is at least one previous row of data.
So basically a combination of CONCAT(LAG(All_IDs_To_Date), New_IDs), with an IF conditional that falls back to that date's "New_IDs" value when there is no LAG(All_IDs_To_Date).
It is very important that if old rows are deleted, the most recent rows keep the same data. Meaning: if I start with 10 rows stored, the last running total being "1,2,3,4,5", and I then delete the first 9 rows, my next calculation would be based on that last stored row, so my running total would still build on the "1,2,3,4,5" that was previously stored.

Once you have unnested every element of "New_IDs", you can select the first time each element appears, then use the ARRAY_AGG window function to compute a running array aggregation over your dates. ARRAY_REMOVE is needed to remove the NULL values generated on days without new ids.
WITH cte AS (
    -- rn = 1 marks the first date on which each id appears
    SELECT t.*, u.elements,
           ROW_NUMBER() OVER (PARTITION BY u.elements ORDER BY t.date_) AS rn
    FROM tab t
    CROSS JOIN UNNEST(t.New_IDs) AS u(elements)
)
SELECT DISTINCT date_, ids_used_today, new_ids,
       ARRAY_REMOVE(ARRAY_AGG(CASE WHEN rn = 1 THEN elements END) OVER (ORDER BY date_), NULL) AS All_IDs_To_Date
FROM cte
ORDER BY date_
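If you want to try this, here is a self-contained sketch that wires the query above to inline sample data matching the question; the table name tab, the column names and the year 2021 for the dates are assumptions:

WITH tab AS (
    -- hypothetical sample data; note that Dec 8 "re-uses" id 3, which was already
    -- seen on Dec 6, so it must not be added to the running array again
    SELECT * FROM (
        VALUES
            (DATE '2021-12-06', ARRAY[1, 2, 3],    ARRAY[1, 2, 3]),
            (DATE '2021-12-07', ARRAY[1, 4],       ARRAY[4]),
            (DATE '2021-12-08', ARRAY[2, 3, 4],    ARRAY[3]),
            (DATE '2021-12-09', ARRAY[1, 2, 3, 5], ARRAY[5])
    ) AS v (date_, ids_used_today, new_ids)
),
cte AS (
    SELECT t.*, u.elements,
           ROW_NUMBER() OVER (PARTITION BY u.elements ORDER BY t.date_) AS rn
    FROM tab t
    CROSS JOIN UNNEST(t.new_ids) AS u(elements)
)
SELECT DISTINCT date_, ids_used_today, new_ids,
       -- as in the answer above, ARRAY_REMOVE strips the NULLs produced by rn <> 1 rows
       ARRAY_REMOVE(ARRAY_AGG(CASE WHEN rn = 1 THEN elements END) OVER (ORDER BY date_), NULL) AS all_ids_to_date
FROM cte
ORDER BY date_;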

Related

Oracle SQL - Add numbers separated by delimiter, columnwise

I have multiple rows with values like
a_b_c_d_e_f and x_y_z_m_n_o
and I need a SQL query that returns a result like a+x_b+y_c+z_d+m, and so on.
Sample data as requested
What I want to do is aggregate it at the Datetime level; aggregating Total is simple, but how can I do that for the last column? Thanks.
Expected Result
Here's one option; read comments within code. I didn't feel like typing too much so two dates will have to do.
Sample data (you already have it, so don't type it; the code you need begins at line #10):
SQL> with
2 -- sample data
3 test (datum, total, col) as
4 (select date '2020-07-20', 100, '10,0,20,30,0' from dual union all
5 select date '2020-07-20', 150, '15,3,40,30,2' from dual union all
6 --
7 select date '2020-07-19', 200, '50,6,50,30,8' from dual union all
8 select date '2020-07-19', 300, '20,1,40,10,2' from dual
9 ),
Split the comma-separated values into rows. Note the RB value, which will help us sum matching values:
10 -- split comma-separated values into rows
11 temp as
12 (select
13 datum,
14 total,
15 to_number(regexp_substr(col, '\d+', 1, column_value)) val,
16 column_value rb
17 from test cross join
18 table(cast(multiset(select level from dual
19 connect by level <= regexp_count(col, ',') + 1
20 ) as sys.odcinumberlist))
21 ),
Computing summaries is simple; nothing special about it. We'll keep the RB value as it'll be needed in the last step:
22 -- compute summaries
23 summary as
24 (select datum,
25 sum(total) total,
26 sum(val) sumval,
27 rb
28 from temp
29 group by datum, rb
30 )
The last step. Using LISTAGG, aggregate comma-separated values back, but this time added to each other:
31 -- final result
32 select datum,
33 total,
34 listagg(sumval, ',') within group (order by rb) new_col
35 from summary
36 group by datum, total
37 order by datum desc, total;
DATUM TOTAL NEW_COL
------------------- ---------- --------------------
20.07.2020 00:00:00 250 25,3,60,60,2
19.07.2020 00:00:00 500 70,7,90,40,10
SQL>
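As a side note, on Oracle 12c or later the splitting step can be written a bit more compactly with a LATERAL inline view instead of the TABLE(CAST(MULTISET(...))) trick. A sketch against the same TEST sample data (only the splitting step changes; the summary and LISTAGG steps stay as above):

select t.datum,
       t.total,
       to_number(regexp_substr(t.col, '\d+', 1, l.rb)) val,
       l.rb   -- position of the value within the comma-separated list
  from test t,
       lateral (select level rb
                  from dual
               connect by level <= regexp_count(t.col, ',') + 1) l;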

Get id based on MAX(Date) for each user [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 4 years ago.
I'm trying to query for the last read report and the date it was read.
UserReport
UserId, ReportId, DateRead
1, 2, 2018-01-01
1, 1, 2015-02-12
2, 3, 2016-03-11
3, 2, 2017-04-10
1, 3, 2016-01-01
2, 1, 2018-02-02
So to get for a specific user I can do a query like this:
SELECT TOP 1 *
FROM UserReport
WHERE UserId = 1
ORDER BY DateRead DESC
But I'm having trouble figuring out how to do this for each user. What is throwing me off is the TOP 1.
Expected Result:
UserId, ReportId, DateRead
1, 2, 2018-01-01
2, 1, 2018-02-02
3, 2, 2017-04-10
You could use:
SELECT TOP 1 WITH TIES *
FROM UserReport
ORDER BY ROW_NUMBER() OVER(PARTITION BY UserId ORDER BY DateRead DESC)
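For reference, a more portable way to express the same thing is the usual ROW_NUMBER pattern in a CTE (a sketch against the same UserReport table):

WITH ranked AS (
    SELECT UserId, ReportId, DateRead,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateRead DESC) AS rn
    FROM UserReport
)
SELECT UserId, ReportId, DateRead
FROM ranked
WHERE rn = 1;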

how to get sum across rows in a group based on condition in oracle

I have a dataset:
type month id1 id2 id3 value
history jan-17 1 2 3 10
future jan-17 1 2 3 15
history feb 1 2 3 12
history march 1 2 3 11
future march 1 2 3 14
I want to get a value for each month based on some calculation and on the value of the type column.
For eg : the output should look like this:
month id1 id2 id3 value
JAN-17 1 2 3 15(future value of jan) + 0(as future value of feb is not present)+ 14(take the future value of march)
FEB-17 1 2 3 10(history value of jan)+14(take the future value of march)
MAR-17 1 2 3 10(history value of jan)+12(history value of feb)+11(history value of mar)
The calculation is based on the quarter number of each month in a year.
If it is the first month of a quarter, take the future value of first month + future of 2nd month + future value of 3rd month
If the month is 2nd month of a quarter, take the history value of 1st month + future value of 2nd month + future value of 3rd month
If the month is 3rd month of a quarter, take the history value of 1st month + history value of 2nd month + future value of 3rd month .
I have tried partitioning the dataset by month, id1, id2, id3, but it does not give me the expected result.
Your desired output contradicts the rules that you wrote further on:
If it is the first month of a quarter, take the future value of first month + future of 2nd month + future value of 3rd month
If the month is 2nd month of a quarter, take the history value of 1st month + future value of 2nd month + future value of 3rd month
If the month is 3rd month of a quarter, take the history value of 1st month + history value of 2nd month + future value of 3rd month
I implemented these rules, because their logic is clearer and they are easier to implement:
with t1 as (select 'history' type from dual union all select 'future' type from dual),
d as (select add_months(date '2017-01-01', rownum - 1) month, 1 id1, 2 id2, 3 id3 from dual connect by level < 4),
t2 as (select * from t1, d),
source (type, month, id1, id2, id3, value) as (
select 'history', 'jan-17', 1, 2, 3, 10 from dual union all
select 'future', 'jan-17', 1, 2, 3, 15 from dual union all
select 'history', 'feb-17', 1, 2, 3, 12 from dual union all
select 'history', 'mar-17', 1, 2, 3, 11 from dual union all
select 'future', 'mar-17', 1, 2, 3, 14 from dual)
select to_char(month, 'mon-yy') mon, id1, id2, id3, history, future,
sum(history) over (partition by to_char(month, 'Q') order by month) - history +
sum(future) over (partition by to_char(month, 'Q') order by month desc) value
from (select t2.type, t2.month, nvl(s.id1, t2.id1) id1, nvl(s.id2, t2.id2) id2, nvl(s.id3, t2.id3) id3, nvl(value, 0) value
from t2 left join source s on s.type = t2.type and s.month = to_char(t2.month, 'mon-yy'))
pivot (sum(value) for type in ('history' history, 'future' future))
order by month
How it works:
subqueries t1, t2 and d are used to generate a full list of months and types
we then join them with the source data and pivot
we calculate running sums of "futures" and "histories" in opposite directions
then we combine these running sums and subtract the current history value, which gives us exactly what we need:
MON ID1 ID2 ID3 HISTORY FUTURE VALUE
------------ ---------- ---------- ---------- ---------- ---------- ----------
jan-17 1 2 3 10 15 29
feb-17 1 2 3 12 0 24
mar-17 1 2 3 11 14 36
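As a quick check, take feb-17: the running history sum within the quarter (ordered by month) is 10 + 12 = 22; subtracting the current history value 12 leaves 10; the running future sum ordered by month descending is 14 + 0 = 14. Together that gives 24, which matches the row above and the rule for the second month of a quarter (history of the 1st month + future of the 2nd + future of the 3rd = 10 + 0 + 14).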
This solution can be applied to any period of time; the data is calculated for each quarter separately.
I have no idea what to do with id1, id2, id3, so I chose to ignore them; it would be better if you explained the corresponding logic. Will they always have the same value for the history and future rows of a particular month?
with
x as
(select 'hist' type, To_Date('JAN-2017','MON-YYYY') ym , 10 value from dual union all
select 'future' type, To_Date('JAN-2017','MON-YYYY'), 15 value from dual union all
select 'future' type, To_Date('FEB-2017','MON-YYYY'), 1 value from dual),
y as
(select * from x Pivot(Sum(Value) For Type in ('hist' as h,'future' as f))),
/* Pivot for easy lag,lead query instead of working with rows..*/
z as
(
select ym,sum(h) H,sum(f) F from (
Select y.ym,y.H,y.F from y
union all
select add_months(to_Date('01-JAN-2017','DD-MON-YYYY'),rownum-1) ym, 0 H, 0 F
from dual connect by rownum <=3 /* depends on how many months you are querying...
so this dual adds the corresponding missing 0 records...*/
) group by ym
)
select
ym,
Case
When MOD(Extract(Month from YM),3) = 1
Then F + Lead(F,1) Over(Order by ym) + Lead(F,2) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 2
Then Lag(H,1) Over(Order by ym) + F + Lead(F,1) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 0 /* 3rd month of a quarter (3, 6, 9, 12) gives MOD 0 */
Then Lag(H,2) Over(Order by ym) + Lag(H,1) Over(Order by ym) + F
End Required_Value
from z

How to Dense Rank Sets of data

I am trying to get a dense rank that groups sets of data together. In my table I have ID, GRP_SET, SUB_SET, and INTERVAL, which simply represents a date field. When records are inserted for an ID, they get inserted as a GRP_SET of 3 rows, identified by SUB_SET. As you can see, when inserts happen the interval can change slightly before the set finishes inserting.
Here is some example data and the DRANK column represents what ranking I'm trying to get.
with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)
select * from q
Example Data
ID GRP_SET SUBSET INTERVAL DRANK
1 a 1 123 1
1 a 2 123 1
1 a 3 124 1
1 b 1 234 2
1 b 3 235 2
1 b 2 235 2
1 a 1 331 3
1 a 2 331 3
1 a 3 331 3
Here is the query I have; it gets close, but I seem to need something like:
Partition By: ID
Order within partition by: ID, Interval
Change Rank when: ID, GRP_SET (change)
select
id, GRP_SET, SUB_SET, interval,
DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
id, interval
Using the MODEL clause
Behold, for you are pushing your requirements beyond the limits of what is easy to express in "ordinary" SQL. But luckily, you're using Oracle, which features the MODEL clause, a device whose mystery is only exceeded by its power. You shall write:
SELECT
id, grp_set, sub_set, interval, drank
FROM (
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
MODEL PARTITION BY (id)
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
MEASURES (grp_set, sub_set, interval, drank)
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Explanation:
SELECT
id, grp_set, sub_set, interval, drank
FROM (
-- Here, we initialise your "dense rank" to 1
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)
-- We generate row numbers for all columns ordered by interval and sub_set,
-- such that we can then access row numbers in that particular order
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
-- These are the columns that we want to generate from the MODEL clause
MEASURES (grp_set, sub_set, interval, drank)
-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Of course, this only works if there are no interleaving events, i.e. there is no other grp_set than 'a' between intervals 123 and 124.
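If the MODEL clause feels heavy, the same change-detection idea can also be expressed with plain window functions. Here is a sketch, assuming the same q data as above and any database with LAG and a windowed SUM:

SELECT id, grp_set, sub_set, interval,
       SUM(chg) OVER (PARTITION BY id ORDER BY interval, sub_set) AS drank_test
FROM (
    -- flag the first row of each new GRP_SET run, ordered by interval
    SELECT q.*,
           CASE WHEN grp_set = LAG(grp_set)
                               OVER (PARTITION BY id ORDER BY interval, sub_set)
                THEN 0 ELSE 1 END AS chg
    FROM q
)
ORDER BY id, interval, sub_set;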
This might work for you. The complicating factor is that you want the same "DENSE RANK" for intervals 123 and 124 and for intervals 234 and 235. So we'll truncate them to the nearest 10 for purposes of ordering the DENSE_RANK() function:
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
FROM q
If you want the intervals to be even closer together in order to be grouped together, then you can multiply the value before truncating. This would group them by 3s (but maybe you don't need them so granular):
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
FROM q

Query the Minimum Value per day within a month's worth of data

I have two sets of pricing data (A and B). Set A consists of all of my pricing data per order over a month. Set B consists of all of my competitor's pricing data over the same month. I want to compare my competitor's lowest price to each of my prices per day.
Graphically, the data appears like this:
Date    Set A    Set B
1       25       31
1       54       47
1       23       56
1       12       23
1       76       40
1       42
I want to pass only the lowest price to a CASE statement which evaluates which prices are better. I would like to process an entire month's worth of data all at once, so in my example, dates 1 through 30 (or 31) would be included and crunched all at once, and for each day there would be only one value from Set B included: the lowest price in the set.
Important notes: Set B does not have a datapoint for each point in Set A
Hopefully this makes sense. Thanks in advance for any help you may be able to render.
That's a strange example you have - do you really have prices ranging from 12 to 76 within a single day?
Anyway, left joining your (grouped) data with their (grouped) data should work (untested):
with
my_min_prices as (
  -- my lowest price per day
  select price_date, min(price_value) min_price from my_prices group by price_date),
their_min_prices as (
  -- competitor's lowest price per day
  select price_date, min(price_value) min_price from their_prices group by price_date)
select
  mine.price_date,
  (case
     when theirs.min_price is null           then mine.min_price
     when theirs.min_price >= mine.min_price then mine.min_price
     else theirs.min_price
   end) min_price
from
  my_min_prices mine
  left join their_min_prices theirs on mine.price_date = theirs.price_date
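As a side note, assuming Oracle, that CASE expression can be shortened to least(mine.min_price, nvl(theirs.min_price, mine.min_price)), which treats a missing competitor price as "no better than mine".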
I'm still not sure that I understand your requirements. My best guess is that you want something like
SQL> ed
Wrote file afiedt.buf
1 with your_data as (
2 select 1 date_id, 25 price_a,31 price_b from dual
3 union all
4 select 1, 54, 47 from dual union all
5 select 1, 23, 56 from dual union all
6 select 1, 12, 23 from dual union all
7 select 1, 76, 40 from dual union all
8 select 1, 42, null from dual)
9 select date_id,
10 sum( case when price_a < min_price_b
11 then 1
12 else 0
13 end) better,
14 sum( case when price_a = min_price_b
15 then 1
16 else 0
17 end) tie,
18 sum( case when price_a > min_price_b
19 then 1
20 else 0
21 end) worse
22 from( select date_id,
23 price_a,
24 min(price_b) over (partition by date_id) min_price_b
25 from your_data )
26* group by date_id
SQL> /
DATE_ID BETTER TIE WORSE
---------- ---------- ---------- ----------
1 1 1 4