SQL distinct window frames of given length (time period) - sql

I have a table with several index-dates and I need to build groups of rows with maximum interval between first and last row not exceeding a specific value.
A reproducible example looks like this:
create myTable (id CHAR(1), rownr INTEGER, index DATE);
insert into myTable values ('A', 1, '2018-01-01');
insert into myTable values ('A', 2, '2018-08-01');
insert into myTable values ('A', 3, '2019-03-04');
insert into myTable values ('A', 4, '2019-09-09');
insert into myTable values ('A', 5, '2020-02-15');
insert into myTable values ('B', 1, '2020-12-31');
insert into myTable values ('B', 2, '2021-03-04');
insert into myTable values ('B', 3, '2022-01-01');
insert into myTable values ('B', 4, '2022-02-20');
I need a new Variable index_grp where the index date of all rows of the group is within 1 year. First row which index-date is farther then 1 year from current minimum index-date starts a new group (window). The desired result looks like this:
I tried windows moving functions:
select id, rownr, index
, min(index) over(partition by id order by index range between interval '1' year preceding and current row) as index_grp
from myTable;
but obviously index_grp of row A3 would be 2018-08-01 so the windows would overlap.
In Teradata-SQL there is a neat little addon to the analytical function, called "reset when", to start a new window when a given condition is met.
However, I need a ANSI-SQL solution.

Related

Contiguous Date Functionality: SAP HANA

We are trying to find continuous date from table
The expected output is attached in the image below:
Expected Output
create column table "PS_CMP_TIME_ANALYTICS"."Temp2" (
"ID" integer,
"Period" date);
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (4, '2010-04-03');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (5, '2010-04-07');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (2, '2010-04-10');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (3, '2010-04-15');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (6, '2010-04-16');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (7, '2010-04-17');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (3, '2010-04-22');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (4, '2010-04-24');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (7, '2010-04-30');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (2, '2010-05-01');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (5, '2010-05-02');
INSERT INTO "PS_CMP_TIME_ANALYTICS"."Temp2" VALUES (3, '2010-05-03');
The query we tried in SAP HANA and output we are getting is mentioned below:
SELECT MIN("Period") AS BeginRange,
MAX("Period") AS EndRange
FROM (
SELECT "Period",
--DATEDIFF(D, ROW_NUMBER() OVER(ORDER BY "Period"), "Period") AS DtRange
cast(ROW_NUMBER() OVER(ORDER BY "Period") as date) as xyz,
days_between(to_date(cast(ROW_NUMBER() OVER(ORDER BY "Period") as
Date),'YYYY-MM-DD'), "Period") AS DtRange
FROM "PS_CMP_TIME_ANALYTICS"."Temp2") AS dt
GROUP BY DtRange;
But we are not getting the output as expected find the attached out we got using SAP HANA SQL The END date should be changed
Our Output
How can we achieve in SAP HANA SQL
You Can below Code in SAP HANA
SELECT MIN("Period") as "Start_Date",MAX("Period") as "End_Date",
(days_between(Min("Period"),Max("Period")) + 1) as "Days"
FROM
(select "Period",add_days("Period" ,- ROW_NUMBER() OVER(ORDER BY "Period"))rn
from "SchemaName"."Temp2")
GROUP BY rn
Actually you are looking for upper bound of data islands in your ordered table columns which is similar to finding lower limits of gaps
An other solution can be following SQLScript where I used SQL LEAD() function to calculate the next value in date field
with cte as (
select
"ID", "Period", LEAD("Period", 1) over (order by "Period") NextDate
from "Temp2"
)
select "ID", "Period"
from cte
where IFNULL(DAYS_BETWEEN("Period",NextDate),9) > 1
I also used SQL CTE expressions which might me a new syntax for many HANA developers coming from ABAP programming experience.
But I use it a lot and is especially more powerful with multiple CTE in single SELECT statements.
Output is as follows for the above SQLScript query

PL/SQL update all records except with max value

Please help with SQL query. I've got a table:
CREATE TABLE PCDEVUSER.tabletest
(
id INT PRIMARY KEY NOT NULL,
name VARCHAR2(64),
pattern INT DEFAULT 1 NOT NULL,
tempval INT
);
Let's pretend it was filled with values:
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (1, 'A', 1, 10);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (2, 'A', 1, 20);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (3, 'A', 2, 10);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (5, 'A', 2, 20);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (4, 'A', 2, 30);
And I need to update all records (grouped by pattern) with NO MAX value TEMPVALUE. So as result I have to update records with Ids (1, 3, 5). Records with IDs (2, 4) has max values in there PATTERN group.
HELP PLZ
This select statement will help you get the IDs you need :
SELECT
*
FROM
(SELECT
id
,name
,pattern
,tempval
,MAX(tempval) OVER (PARTITION BY pattern) max_tempval
FROM
tabletest
)
WHERE 1=1
AND tempval != max_tempval
;
You should be able to build an update statement around that easily enough
Something like this:
update tabletest t
set ????
where t.tempval < (select max(tempval) from tabletest tt where tt.pattern = t.pattern);
It is unclear what values you want to set. The ???? is for the code that sets the values.

SQL merge statement with multiple conditions

I have a requirement with some business rules to implement on SQL (within a PL/SQL block): I need to evaluate such rules and according to the result perform the corresponding update, delete or insert into a target table.
My database model contains a "staging" and a "real" table. The real table stores records inserted in the past and the staging one contains "fresh" data coming from somewhere that needs to be merged into the real one.
Basically these are my business rules:
Delta between staging MINUS real --> Insert rows into the real
Delta between real MINUS staging--> Delete rows from the real
Rows which PK is the same but any other fields different: Update.
(Those "MINUS" will compare ALL the fields to get equality and distinguise the 3rd case)
I haven't figured out the way to accomplish such tasks without overlapping between rules by using a merge statement: Any suggestion for the merge structure? Is it possible to do it all together within the same merge?
Thank you!
If I understand you task correctly following code should do the job:
--drop table real;
--drop table stag;
create table real (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
create table stag (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
insert into real values (1, 1, 'a');
insert into real values (2, 2, 'b');
insert into real values (3, 3, 'c');
insert into real values (4, 4, 'd');
insert into real values (5, 5, 'e');
insert into real values (6, 6, 'f');
insert into real values (7, 6, 'g'); -- PK the same but at least one column different
insert into real values (8, 7, 'h'); -- PK the same but at least one column different
insert into real values (9, 9, 'i');
insert into real values (10, 10, 'j'); -- in real but not in stag
insert into stag values (1, 1, 'a');
insert into stag values (2, 2, 'b');
insert into stag values (3, 3, 'c');
insert into stag values (4, 4, 'd');
insert into stag values (5, 5, 'e');
insert into stag values (6, 6, 'f');
insert into stag values (7, 7, 'g'); -- PK the same but at least one column different
insert into stag values (8, 8, 'g'); -- PK the same but at least one column different
insert into stag values (9, 9, 'i');
insert into stag values (11, 11, 'k'); -- in stag but not in real
merge into real
using (WITH w_to_change AS (
select *
from (select stag.*, 'I' as action from stag
minus
select real.*, 'I' as action from real
)
union (select real.*, 'D' as action from real
minus
select stag.*, 'D' as action from stag
)
)
, w_group AS (
select id, max(action) as max_action
from w_to_change
group by id
)
select w_to_change.*
from w_to_change
join w_group
on w_to_change.id = w_group.id
and w_to_change.action = w_group.max_action
) tmp
on (real.id = tmp.id)
when matched then
update set real.col1 = tmp.col1, real.col2 = tmp.col2
delete where tmp.action = 'D'
when not matched then
insert (id, col1, col2) values (tmp.id, tmp.col1, tmp.col2);

How to group rows by their DATEDIFF?

I hope you can help me.
I need to display the records in HH_Solution_Audit table -- if 2 or more staffs enter the room within 10 minutes. Here are the requirements:
Display only the events that have a timestamp (LAST_UPDATED) interval of less than or equal to 10 minutes. Therefore, I must compare the current row to the next row and previous row to check if their DATEDIFF is less than or equal to 10 minutes. I’m done with this part.
Show only the records if the number of distinct STAFF_GUID inside the room for less than or equal to 10 minutes is at least 2.
HH_Solution_Audit Table Details:
ID - PK
STAFF_GUID - staff id
LAST_UPDATED - datetime when a staff enters a room
Here's what I got so far. This satisfies requirement # 1 only.
CREATE TABLE HH_Solution_Audit (
ID INT PRIMARY KEY,
STAFF_GUID NVARCHAR(1),
LAST_UPDATED DATETIME
)
GO
INSERT INTO HH_Solution_Audit VALUES (1, 'b', '2013-04-25 9:01')
INSERT INTO HH_Solution_Audit VALUES (2, 'b', '2013-04-25 9:04')
INSERT INTO HH_Solution_Audit VALUES (3, 'b', '2013-04-25 9:13')
INSERT INTO HH_Solution_Audit VALUES (4, 'a', '2013-04-25 10:15')
INSERT INTO HH_Solution_Audit VALUES (5, 'a', '2013-04-25 10:30')
INSERT INTO HH_Solution_Audit VALUES (6, 'a', '2013-04-25 10:33')
INSERT INTO HH_Solution_Audit VALUES (7, 'a', '2013-04-25 10:41')
INSERT INTO HH_Solution_Audit VALUES (8, 'a', '2013-04-25 11:02')
INSERT INTO HH_Solution_Audit VALUES (9, 'a', '2013-04-25 11:30')
INSERT INTO HH_Solution_Audit VALUES (10, 'a', '2013-04-25 11:45')
INSERT INTO HH_Solution_Audit VALUES (11, 'a', '2013-04-25 11:46')
INSERT INTO HH_Solution_Audit VALUES (12, 'a', '2013-04-25 11:51')
INSERT INTO HH_Solution_Audit VALUES (13, 'a', '2013-04-25 12:24')
INSERT INTO HH_Solution_Audit VALUES (14, 'b', '2013-04-25 12:27')
INSERT INTO HH_Solution_Audit VALUES (15, 'b', '2013-04-25 13:35')
DECLARE #numOfPeople INT = 2,
--minimum number of people that must be inside
--the room for #lengthOfStay minutes
#lengthOfStay INT = 10,
--number of minutes of stay
#dateFrom DATETIME = '04/25/2013 00:00',
#dateTo DATETIME = '04/25/2013 23:59';
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED,
ROW_NUMBER() OVER (ORDER BY LAST_UPDATED) AS row_num
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= #dateFrom AND LAST_UPDATED <= #dateTo
)
SELECT [current].ID, [current].STAFF_GUID, [current].LAST_UPDATED
FROM
cteSource AS [current]
LEFT OUTER JOIN
cteSource AS [previous] ON [current].row_num = [previous].row_num + 1
LEFT OUTER JOIN
cteSource AS [next] ON [current].row_num = [next].row_num - 1
WHERE
DATEDIFF(MINUTE, [previous].LAST_UPDATED, [current].LAST_UPDATED)
<= #lengthOfStay
OR
DATEDIFF(MINUTE, [current].LAST_UPDATED, [next].LAST_UPDATED)
<= #lengthOfStay
ORDER BY [current].ID, [current].LAST_UPDATED
Running the query returns IDs:
1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14
That satisfies requirement # 1 of having less than or equal to 10 minutes interval between the previous row, current row and next row.
Can you help me with the 2nd requirement? If it's applied, the returned IDs should only be:
13, 14
Here's an idea. You don't need ROW_NUMBER and previous and next records. You just need to queries unioned - one looking for everyone that have someone checked X minutes behind, and another looking for X minutes upfront. Each uses a correlated sub-query and COUNT(*) to find number of matching people. If number is greater then your #numOfPeople - that's it.
EDIT: new version: Instead of doing two queries with 10 minutes upfront and behind, we'll only check for 10 minutes behind - selecting those that match in cteLastOnes. After that will go in another part of query to search for those that actually exist within those 10 minutes. Ultimately again making union of them and the 'last ones'
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= #dateFrom AND LAST_UPDATED <= #dateTo
)
,cteLastOnes AS
(
SELECT * FROM cteSource c1
WHERE #numOfPeople -1 <= (SELECT COUNT(DISTINCT STAFF_GUID)
FROM cteSource c2
WHERE DATEADD(MI,#lengthOfStay,c2.LAST_UPDATED) > c1.LAST_UPDATED
AND C2.LAST_UPDATED <= C1.LAST_UPDATED
AND c1.STAFF_GUID <> c2.STAFF_GUID)
)
SELECT * FROM cteLastOnes
UNION
SELECT * FROM cteSource s
WHERE EXISTS (SELECT * FROM cteLastOnes l
WHERE DATEADD(MI,#lengthOfStay,s.LAST_UPDATED) > l.LAST_UPDATED
AND s.LAST_UPDATED <= l.LAST_UPDATED
AND s.STAFF_GUID <> l.STAFF_GUID)
SQLFiddle DEMO - new version
SQLFiddle DEMO - old version

Is it possible to do this in SQL (summing rows over different criteria per user)

I have an SQL table that looks like this:
CREATE TABLE diet_watch (
entry_date date NOT NULL,
user_id int default 1,
weight double precision NOT NULL
);
INSERT INTO diet_watch VALUES ('2001-01-01', 1, 128.2);
INSERT INTO diet_watch VALUES ('2001-01-02', 1, 121.2);
INSERT INTO diet_watch VALUES ('2001-01-03', 1, 100.6);
INSERT INTO diet_watch VALUES ('2001-01-04', 1, 303.7);
INSERT INTO diet_watch VALUES ('2001-01-05', 1, 121.0);
INSERT INTO diet_watch VALUES ('2001-01-01', 2, 121.0);
INSERT INTO diet_watch VALUES ('2001-01-06', 2, 128.0);
INSERT INTO diet_watch VALUES ('2001-01-07', 2, 138.0);
INSERT INTO diet_watch VALUES ('2001-01-01', 3, 128.2);
INSERT INTO diet_watch VALUES ('2001-01-02', 3, 125.5);
INSERT INTO diet_watch VALUES ('2001-01-03', 3, 112.8);
INSERT INTO diet_watch VALUES ('2001-01-06', 3, 111.2);
I further have this table:
CREATE TABLE summing_period (
user_id INT NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL);
insert into summing_period VALUES (1, '2001-01-01', '2001-01-03');
insert into summing_period VALUES (2, '2001-01-02', '2001-01-06');
insert into summing_period VALUES (3, '2001-01-03', '2001-01-06');
I want to write a query that returns DISTINCT ROWS with the following columns:
the user_id
the sum of the weights in table diet_watch between the specified dates in table summing_period (for the user_id)
So the result of the query based on the data in table summing period should be:
1,350.0
2,128.0
3,224.0
Unfortunately, this time, I have reached the limit of my SQLfu - and I no idea how to even get started in writing the SQL. Ideally, the solution should be ANSI SQL (i.e. db agnostic). however, since I am developing to a PostgreSQL 8.4 backend, if the solution is db centric, it must at least run on PG.
SELECT
sp.user_id, SUM(dw.weight)
FROM
summing_period sp,
diet_watch dw
WHERE
dw.user_id = sp.user_id AND
dw.entry_date >= sp.start_date AND
dw.entry_date <= sp.end_date
GROUP BY
sp.user_id
What this query does is join each diet_watch row to the row in summing_period that matches its user_id and whose date falls in the summing_period's range.
The SELECT then asks for the SUM of the weights for each different user_id (as a result of the GROUP BY user_id).