Oracle aggregate by ID with time range - sql

I'm sure I saw it somewhere, but I cannot find it.
Given this table Historic:
ID1 ID2 Event_Date Label
1   1   2020-01-01 1
1   1   2020-01-02 1
1   1   2020-01-04 1
1   1   2020-01-08 1
1   1   2020-01-20 1
1   1   2020-12-30 1
1   1   2020-01-01 0
1   1   2020-01-02 1
1   1   2020-01-04 0
1   1   2020-01-08 1
1   1   2020-01-20 0
1   1   2020-12-30 1
1   2   2020-01-01 1
1   2   2020-01-02 1
1   2   2020-01-04 1
2   1   2020-01-08 1
2   1   2020-01-20 1
2   1   2020-12-30 1
And the table startingpoint
ID1 ID2 Event_Date
1   1   2020-01-01
1   1   2020-01-02
1   1   2020-01-05
1   1   2020-01-08
1   1   2020-01-21
1   1   2021-01-01
1   1   2020-01-01
1   1   2020-01-03
1   1   2020-01-06
1   1   2020-01-11
1   1   2020-01-20
1   1   2020-12-31
1   2   2020-01-03
1   2   2020-01-05
1   2   2020-01-08
2   1   2020-01-08
2   1   2020-01-21
2   1   2021-01-01
For each row in startingpoint, compute the number of rows in historic with the same ID1 and ID2, where Event_Date in historic is between StartingPoint.Event_date - n days (n is a parameter, so that I can try different values) and StartingPoint.Event_date - 2 days. Then, using the same rules, compute the fraction of those rows with label = 1.
I know I can do this with a join, but if historic and startingpoint are very large, this looks very inefficient (for every row in startingpoint it will create a large join, and in the end it will summarize the same set of rows many times over). In the abstract, it looks to me like it would be better to first summarize historic for every ID1, ID2, Event_date, and then join with startingpoint and select the best, but I'm open to other solutions.

You can try the solution below, which uses a correlated subquery (here with n = 30; Oracle date arithmetic subtracts days directly):
select s.*,
(select count(*)
from historic h
where h.id1 = s.id1
and h.id2 = s.id2
and h.event_date between s.event_date - 30 and s.event_date - 2) as cnt
from startingpoint s

You have to have some form of join; either joining directly, or with a scalar subquery, which is probably not going to be as efficient.
The simplest way to do this is probably just a plain join, if you only want to see rows which have historic data:
select sp.id1, sp.id2, sp.event_date,
count(h.event_date) as any_label,
count(case when h.label = 1 then h.label end) as label_1,
count(case when h.label = 1 then h.label end) / count(h.event_date) as fraction_1
from startingpoint sp
join historic h on h.id1 = sp.id1
and h.id2 = sp.id2
and h.event_date >= sp.event_date - 10
and h.event_date < sp.event_date - 2
group by sp.id1, sp.id2, sp.event_date
order by sp.id1, sp.id2, sp.event_date;
where n is 10; which with your data would give you:
ID1 ID2 EVENT_DATE ANY_LABEL LABEL_1 FRACTION_1
--- --- ---------- --------- ------- --------------------
1 1 2020-01-05 4 3 .75
1 1 2020-01-06 4 3 .75
1 1 2020-01-08 6 4 .6666666666666666667
1 1 2020-01-11 8 6 .75
1 2 2020-01-05 2 2 1
1 2 2020-01-08 3 3 1
Or if you want to see zero counts, you can use an outer join; though then the fraction calculation needs some logic to avoid a divide-by-zero error:
select sp.id1, sp.id2, sp.event_date,
count(h.event_date) as any_label,
count(case when h.label = 1 then h.label end) as label_1,
case when count(h.event_date) > 0 then
count(case when h.label = 1 then h.label end) / count(h.event_date)
end as fraction_1
from startingpoint sp
left join historic h on h.id1 = sp.id1
and h.id2 = sp.id2
and h.event_date >= sp.event_date - 10
and h.event_date < sp.event_date - 2
group by sp.id1, sp.id2, sp.event_date
order by sp.id1, sp.id2, sp.event_date;
which gets:
ID1 ID2 EVENT_DATE ANY_LABEL LABEL_1 FRACTION_1
--- --- ---------- --------- ------- --------------------
1 1 2020-01-01 0 0
1 1 2020-01-02 0 0
1 1 2020-01-03 0 0
1 1 2020-01-05 4 3 .75
1 1 2020-01-06 4 3 .75
1 1 2020-01-08 6 4 .6666666666666666667
1 1 2020-01-11 8 6 .75
1 1 2020-01-20 0 0
1 1 2020-01-21 0 0
1 1 2020-12-31 0 0
1 1 2021-01-01 0 0
1 2 2020-01-03 0 0
1 2 2020-01-05 2 2 1
1 2 2020-01-08 3 3 1
2 1 2020-01-08 0 0
2 1 2020-01-21 0 0
2 1 2021-01-01 0 0
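If the join against the raw rows proves too heavy, the pre-aggregation idea from the question can be sketched as below. This is only a sketch, assuming the same historic and startingpoint tables and n = 10; the daily CTE and its n_rows / n_label_1 columns are names made up for illustration. It collapses historic to one row per (id1, id2, event_date) and then sums those daily totals over the window, so it should return the same rows as the outer-join query while joining against fewer rows when many historic rows share a date. Whether it is actually faster depends on your data and indexes, so compare the execution plans.

```sql
-- Step 1: one summary row per (id1, id2, event_date) in historic.
-- "daily", "n_rows" and "n_label_1" are illustrative names, not from the question.
with daily as (
  select id1, id2, event_date,
         count(*)                              as n_rows,
         count(case when label = 1 then 1 end) as n_label_1
  from historic
  group by id1, id2, event_date
)
-- Step 2: join the daily summary to startingpoint and add up the
-- per-day totals that fall inside the window (n = 10 here).
select sp.id1, sp.id2, sp.event_date,
       coalesce(sum(d.n_rows), 0)                   as any_label,
       coalesce(sum(d.n_label_1), 0)                as label_1,
       -- nullif avoids a divide-by-zero when no historic rows match
       sum(d.n_label_1) / nullif(sum(d.n_rows), 0)  as fraction_1
from startingpoint sp
left join daily d
  on d.id1 = sp.id1
 and d.id2 = sp.id2
 and d.event_date >= sp.event_date - 10
 and d.event_date <  sp.event_date - 2
group by sp.id1, sp.id2, sp.event_date
order by sp.id1, sp.id2, sp.event_date;
```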

Related

Need YTD and MTD calculations in SQL

Date Amt ytd mtd
01-Jan-21 1 2 2
01-Jan-21 1 2 2
02-Jan-21 1 3 3
03-Jan-21 1 4 4
01-Feb-21 1 5 1
02-Feb-21 1 6 2
03-Feb-21 1 7 3
04-Feb-21 1 8 4
05-Feb-21 1 9 5
01-Mar-21 1 10 1
02-Mar-21 1 11 2
03-Mar-21 1 12 3
04-Mar-21 1 13 4
01-Apr-21 1 14 1
02-Apr-21 1 15 2
03-Apr-21 1 16 3
01-May-21 1 17 1
02-May-21 1 18 2
03-May-21 1 19 3
04-May-21 1 20 4
05-May-21 1 21 5
06-May-21 1 22 6
I have the first two columns (Date, Amt) and I need the YTD and MTD columns in MS SQL so that I can show the above table.
It seems a rolling COUNT OVER was used to calculate ytd and mtd in the Oracle source.
(Personally, I would prefer RANK or DENSE_RANK.)
Since the Oracle date stamps can be cast to a DATE as-is, the query simply casts [Date]:
SELECT [Date], Amt
, ytd = COUNT(*) OVER (ORDER BY CAST([Date] AS DATE))
, mtd = COUNT(*) OVER (PARTITION BY EOMONTH(CAST([Date] AS DATE)) ORDER BY CAST([Date] AS DATE))
FROM your_table
ORDER BY CAST([Date] AS DATE)
Date      Amt ytd mtd
01-Jan-21 1   2   2
01-Jan-21 1   2   2
02-Jan-21 1   3   3
03-Jan-21 1   4   4
01-Feb-21 1   5   1
02-Feb-21 1   6   2
03-Feb-21 1   7   3
04-Feb-21 1   8   4
05-Feb-21 1   9   5
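The counting query matches the sample because Amt is always 1 and every date falls in the same year. A hedged variant for the more general case, assuming [Date] is (or has been cast to) a DATE: sum Amt instead of counting rows, and partition ytd by year and mtd by year and month. The inline VALUES list is only the first nine sample rows standing in for your_table; the alias v and column d are illustrative.

```sql
-- Running totals of Amt; the default window frame with ORDER BY
-- (RANGE UNBOUNDED PRECEDING .. CURRENT ROW) includes rows that share the
-- same date, so both 01-Jan-21 rows get ytd = 2 and mtd = 2.
SELECT [Date], Amt
     , ytd = SUM(Amt) OVER (PARTITION BY YEAR([Date]) ORDER BY [Date])
     , mtd = SUM(Amt) OVER (PARTITION BY YEAR([Date]), MONTH([Date]) ORDER BY [Date])
FROM (
    -- Stand-in for your_table, built from the sample data
    SELECT CAST(v.d AS DATE) AS [Date], v.Amt
    FROM (VALUES
          ('2021-01-01', 1), ('2021-01-01', 1), ('2021-01-02', 1)
        , ('2021-01-03', 1), ('2021-02-01', 1), ('2021-02-02', 1)
        , ('2021-02-03', 1), ('2021-02-04', 1), ('2021-02-05', 1)
    ) AS v(d, Amt)
) AS your_table
ORDER BY [Date];
```

On the sample rows this should reproduce the ytd and mtd columns shown above; with data spanning several years, ytd restarts each January.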

Need help joining incremental data to a fact table in an incremental manner

TableA
ID Counter Value
1  1       10
1  2       28
1  3       34
1  4       22
1  5       80
2  1       15
2  2       50
2  3       39
2  4       33
2  5       99
TableB
StartDate  EndDate
2020-01-01 2020-01-11
2020-01-02 2020-01-12
2020-01-03 2020-01-13
2020-01-04 2020-01-14
2020-01-05 2020-01-15
2020-01-06 2020-01-16
TableC (output)
ID Counter StartDate  EndDate    Val
1  1       2020-01-01 2020-01-11 10
2  1       2020-01-01 2020-01-11 15
1  2       2020-01-02 2020-01-12 28
2  2       2020-01-02 2020-01-12 50
1  3       2020-01-03 2020-01-13 34
2  3       2020-01-03 2020-01-13 39
1  4       2020-01-04 2020-01-14 22
2  4       2020-01-04 2020-01-14 33
1  5       2020-01-05 2020-01-15 80
2  5       2020-01-05 2020-01-15 99
1  1       2020-01-06 2020-01-16 10
2  1       2020-01-06 2020-01-16 15
I am attempting to come up with some SQL to create TableC. TableC takes the rows of TableB in chronological order and, for each ID in TableA, assigns the next Counter in the sequence to that Start/End date combination; when it reaches the last Counter, it starts back at 1.
Is something like this even possible with SQL?
Yes, this is possible. Try the following:
Calculate the maximum value of Counter in TableA into max_counter, using SELECT MAX(Counter) ....
Add a row number to each row in TableB, so that it can be matched to a Counter value, using SELECT ROW_NUMBER() OVER (ORDER BY StartDate) ....
Relate the row number in TableB to Counter in TableA like this: ... FROM TableB JOIN TableA ON COALESCE(NULLIF(TableB.row_number % max_counter, 0), max_counter) = TableA.Counter.
Then combine these queries into one using CTEs (Common Table Expressions), as the official documentation shows.
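Put together, the steps above might look like the sketch below. It assumes the TableA and TableB definitions from the question and a dialect that supports the % modulo operator (in Oracle you would use MOD() instead); the CTE names mc and b and the column rn are illustrative.

```sql
WITH mc AS (
    -- Step 1: highest Counter in TableA (5 in the sample data)
    SELECT MAX(Counter) AS max_counter
    FROM TableA
),
b AS (
    -- Step 2: number the date ranges chronologically
    SELECT StartDate, EndDate,
           ROW_NUMBER() OVER (ORDER BY StartDate) AS rn
    FROM TableB
)
-- Step 3: map row numbers 1, 2, ..., max, max+1, ... onto 1, 2, ..., max, 1, ...
-- rn % max_counter is 0 for multiples of max_counter; NULLIF/COALESCE
-- turns that 0 back into max_counter.
SELECT a.ID, a.Counter, b.StartDate, b.EndDate, a.Value AS Val
FROM b
CROSS JOIN mc
JOIN TableA a
  ON a.Counter = COALESCE(NULLIF(b.rn % mc.max_counter, 0), mc.max_counter)
ORDER BY b.StartDate, a.ID;
```

With the sample data, the sixth date range gets row number 6, and 6 % 5 = 1, so it wraps back to Counter 1 as in TableC.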
Consider the approach below:
select id, counter, StartDate, EndDate, value
from tableA
join (
select *, mod(row_number() over(order by StartDate) - 1, 5) + 1 as counter
from tableB
)
using (counter)
Applied to the sample data in your question, this produces the rows of TableC (the 5 in the modulus is the maximum Counter in TableA).

Distinguish the first rows where a given column's value changes in a grouped result

I want to create a select query in SQL Server where I group the rows by a column (BaseId) and also order them by Status, RTime and Version. I want to add a column "isFirst" that has the value 1 if the BaseId value is the first in the group, and 0 if it's not.
My sample table:
Table name: Head
Id BaseId Name RTime Status Version
2 2 abc 04-12 12:34 1 1
3 3 xyz 04-12 13:10 9 1
4 2 abc 04-13 14:25 0 2
5 3 xyz 04-14 12:34 0 2
6 3 xyz 04-14 13:10 9 3
7 3 xyz 04-16 14:25 1 4
8 2 abc 04-16 17:40 1 3
9 9 sql 04-17 02:23 9 1
10 9 sql 04-17 07:31 0 2
Expected result:
isFirst Id BaseId Name RTime Status Version
1 10 9 sql 04-17 07:31 0 2
0 9 9 sql 04-17 02:23 9 1
1 5 3 xyz 04-14 12:34 0 2
0 7 3 xyz 04-16 14:25 1 4
0 6 3 xyz 04-14 13:10 9 3
0 3 3 xyz 04-12 13:10 9 1
1 4 2 abc 04-13 14:25 0 2
0 8 2 abc 04-16 17:40 1 3
0 2 2 abc 04-12 12:34 1 1
My query now looks like this:
SELECT *
FROM Head
ORDER BY BaseId desc, Status, RTime desc, Version desc
I think I should use CASE to create the isFirst column, but I've had no luck so far. Could anyone help me?
You can use row_number() and a case expression:
select
case when row_number() over(
partition by BaseId
order by Status, RTime desc, Version desc
) = 1
then 1
else 0
end isFirst,
h.*
from head h
order by BaseId desc, Status, RTime desc, Version desc

How to count one or more values and display them in a column.

I'm just starting to get the hang of SQL, and I've tried to find an answer to this, but to no avail.
I have a table that kinda looks like this:
date base version prod_id place
2016-01-02 1 1 22 home
2016-01-02 1 1 22 home
2016-01-02 1 1 1 home
2016-01-02 1 1 1 store
2016-01-02 1 1 22 store
2016-01-02 1 1 2 store
2016-01-02 1 1 2 web
2016-01-02 1 1 24 web
2016-01-02 1 1 1 web
2016-01-02 1 2 24 home
2016-01-02 1 2 22 home
2016-01-02 1 2 22 store
2016-01-02 1 2 1 store
2016-01-02 1 2 2 web
2016-01-03 1 1 22 home
2016-01-03 1 1 22 home
2016-01-03 1 1 1 home
2016-01-03 1 1 24 store
2016-01-03 1 2 24 store
2016-01-03 1 2 22 web
2016-01-03 1 2 1 web
2016-01-03 1 2 2 web
2016-01-03 1 2 1 web
I'm trying to do a query that gives me this
date base version place 1,2 22,24 Total
2016-01-02 1 1 home 1 2 3
2016-01-02 1 1 store 2 1 3
2016-01-02 1 1 web 2 1 3
2016-01-02 1 2 home 2 0 2
2016-01-02 1 2 store 1 1 2
2016-01-02 1 2 web 1 0 1
2016-01-03 1 1 home 2 1 3
2016-01-03 1 1 store 0 1 1
2016-01-03 1 2 store 0 1 1
2016-01-03 1 2 web 3 0 3
So to put it in words: I'm trying to group and count the occurrences of each value in prod_id and put them in a column. And it's not only one value that needs to be counted and grouped together, but sometimes two or more. So in this example I've added up all occurrences of 1 and 2 on a specific date, version and place, and likewise 22 and 24. Do I make any sense?
Nice problem. You can count a complicated CASE.
The query
SELECT
prod_id,
CASE WHEN prod_id IN (1,2) THEN 1 ELSE 0 END AS v1Or2,
CASE WHEN prod_id IN (22,24) THEN 1 ELSE 0 END AS v22or24
FROM table1
This gives you something you can usefully SUM
prod_id v1or2 v22or24
22 0 1
22 0 1
1 1 0
1 1 0
22 0 1
2 1 0
2 1 0
With the appropriate GROUP BY you have...
SELECT date,place,base,version,
SUM(CASE WHEN prod_id IN (1,2) THEN 1 ELSE 0 END) AS v1Or2,
SUM(CASE WHEN prod_id IN (22,24) THEN 1 ELSE 0 END) AS v22or24
FROM table1
GROUP BY date,place,base,version
Select [Date]
,Base
,[version]
,place
,SUM(Case when prod_id IN (1,2) THEN 1 ELSE 0 END) AS [1,2]
,SUM(Case when prod_id IN (22,24) THEN 1 ELSE 0 END) AS [22,24]
,Count(*) AS Total
From TableName
GROUP BY [Date] ,Base ,[version],place
You can solve this by combining the sum aggregate function with case expressions to do the counts based on what value the prod_id has. This technique is sometimes referred to as conditional aggregation.
Try this:
select
date, base, version, place,
sum(case when prod_id in (1,2) then 1 else 0 end) as "1,2",
sum(case when prod_id in (22,24) then 1 else 0 end) as "22,24",
count(prod_id) as Total
from your_table
group by date, base, version, place
order by date, base, version;

T-SQL recursion

I have a set of data that looks like below
Name Time Perc Group Mode Control Cancelled
A 10:52 10.10 10 0 1 0
B 09:00 10.23 10 1 1 1
C 12:02 12.01 12 0 1 1
D 10:45 12.12 12 1 7 1
E 12:54 12.56 12 1 3 0
F 01:01 13.90 13 0 11 1
G 02:45 13.23 13 1 12 1
H 09:10 13.21 13 1 1 0
I need output like the below:
Group Perc Cancelled
10 20.33 1
12 36.69 2
13 40.34 2
What I'm getting is something like:
Group Perc Cancelled
10 20.33 5
12 36.69 5
13 40.34 5
I don't know what to call this. I have something like a CTE in mind, but I really can't figure it out.
Here's my source;
SELECT Group, SUM(Perc), Cancelled FROM
(SELECT Group, Perc, (SELECT COUNT(*) FROM tblName WHERE Cancelled=1) AS Cancelled FROM tblName WHERE 1=1 AND Group>=10)dt
GROUP BY Group, Cancelled
From your example, you don't need the nested query, any recursion, etc...
-- GROUP is a reserved word in T-SQL, so the column name is bracketed
SELECT
[Group],
SUM(Perc) AS total_perc,
SUM(cancelled) AS total_cancelled
FROM
tblName
WHERE
1=1
AND [Group] >= 10
GROUP BY
[Group]
If you did have some different data, then you might want to use something like...
SUM(CASE WHEN cancelled > 0 THEN 1 ELSE 0 END) AS total_cancelled
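As a complete statement, that variant could look like the sketch below. It is written for T-SQL and inlines the sample rows with a VALUES list so it can be run without tblName; the alias t and the omission of the Time, Mode and Control columns are choices made for the illustration.

```sql
SELECT
    [Group],
    SUM(Perc) AS total_perc,
    -- counts rows flagged as cancelled, whatever positive value the flag holds
    SUM(CASE WHEN Cancelled > 0 THEN 1 ELSE 0 END) AS total_cancelled
FROM (VALUES
      ('A', 10.10, 10, 0)
    , ('B', 10.23, 10, 1)
    , ('C', 12.01, 12, 1)
    , ('D', 12.12, 12, 1)
    , ('E', 12.56, 12, 0)
    , ('F', 13.90, 13, 1)
    , ('G', 13.23, 13, 1)
    , ('H', 13.21, 13, 0)
) AS t(Name, Perc, [Group], Cancelled)   -- stand-in for tblName
WHERE [Group] >= 10
GROUP BY [Group]
ORDER BY [Group];
```

On these rows it should give 20.33 / 1 for group 10, 36.69 / 2 for group 12 and 40.34 / 2 for group 13, which is the output asked for in the question.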