Identify rows subsequent to other rows based on criteria? - sql

I am fairly new to DB2 and SQL. I have a table of customers and their visits. I need to write a query that finds visits by the same customer that come after, and within 24 hours of, a visit where Sale = 'Y'.
Based on this example data:
CustomerId  VisitID  Sale  DateTime
1           1        Y     2021-04-23 20:16:00.000000
2           2        N     2021-04-24 20:16:00.000000
1           3        N     2021-04-23 21:16:00.000000
2           4        Y     2021-04-25 20:16:00.000000
3           5        Y     2021-04-23 20:16:00.000000
2           6        N     2021-04-25 23:16:00.000000
3           7        N     2021-05-23 20:16:00.000000
The query results should return:
VisitID
3
6
How do I do this?

Try this. You may uncomment the commented out block to run this statement as is.
/*
WITH MYTAB (CustomerId, VisitID, Sale, DateTime) AS
(
VALUES
(1, 1, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (1, 3, 'N', '2021-04-23 21:16:00'::TIMESTAMP)
, (2, 2, 'N', '2021-04-24 20:16:00'::TIMESTAMP)
, (2, 4, 'Y', '2021-04-25 20:16:00'::TIMESTAMP)
, (2, 6, 'N', '2021-04-25 23:16:00'::TIMESTAMP)
, (3, 5, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (3, 7, 'N', '2021-05-23 20:16:00'::TIMESTAMP)
)
*/
SELECT VisitID
FROM MYTAB A
WHERE EXISTS
(
SELECT 1
FROM MYTAB B
WHERE B.CustomerID = A.CustomerID
AND B.Sale = 'Y'
AND B.VisitID <> A.VisitID
AND A.DateTime BETWEEN B.DateTime AND B.DateTime + 24 HOUR
)
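To check the logic without a DB2 instance, here is a minimal sketch that runs the same EXISTS query against an in-memory SQLite database from Python. It assumes SQLite's datetime(..., '+24 hours') as a stand-in for DB2's `+ 24 HOUR`, and timestamps stored as ISO text; it is not DB2 syntax.

```python
import sqlite3

# In-memory SQLite stand-in for the DB2 table; names mirror the answer's MYTAB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MYTAB (CustomerId INT, VisitID INT, Sale TEXT, DateTime TEXT)"
)
conn.executemany(
    "INSERT INTO MYTAB VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Y", "2021-04-23 20:16:00"),
        (1, 3, "N", "2021-04-23 21:16:00"),
        (2, 2, "N", "2021-04-24 20:16:00"),
        (2, 4, "Y", "2021-04-25 20:16:00"),
        (2, 6, "N", "2021-04-25 23:16:00"),
        (3, 5, "Y", "2021-04-23 20:16:00"),
        (3, 7, "N", "2021-05-23 20:16:00"),
    ],
)

# Same shape as the DB2 query; only the 24-hour arithmetic is SQLite-specific.
query = """
SELECT A.VisitID
FROM MYTAB A
WHERE EXISTS (
    SELECT 1
    FROM MYTAB B
    WHERE B.CustomerId = A.CustomerId
      AND B.Sale = 'Y'
      AND B.VisitID <> A.VisitID
      AND A.DateTime BETWEEN B.DateTime AND datetime(B.DateTime, '+24 hours')
)
ORDER BY A.VisitID
"""
visit_ids = [row[0] for row in conn.execute(query)]
print(visit_ids)  # [3, 6]
```

Note that BETWEEN is inclusive on both ends, so a visit exactly 24 hours after the sale is also returned.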

Denormalize column

I have data in my database like this:
Code  meta   meta_ID  date
A     1,2    1        01/01/2022 08:08:08
B     1,2    2        01/01/2022 02:00:00
B     null   2        01/01/1900 02:00:00
C     null   3        01/01/2022 02:00:00
D     8      8        01/01/2022 02:00:00
E     5,6,7  5        01/01/2022 02:00:00
F     1,2    2        01/01/2022 02:00:00
I want to get the following, keeping the latest date (compared by day, month, and year):
Code  meta   meta_ID  list_Code  date
A     2,3    1        A,B,F      01/01/2022 08:08:08
B     1,3    2        A,B,F      01/01/2022 02:00:00
C     null   3        C          01/01/2022 02:00:00
D     8      8        D          01/01/2022 02:00:00
E     5,6,7  5        E          01/01/2022 02:00:00
F     1,2    3        A,B,F      01/01/2022 02:00:00
I want the list of codes that share the same meta group. Do you know how to do this with SQL Server?
The code below takes the 1st table as input and produces the 2nd table. The Meta and Date columns have more than one value per Code, so in the CTE I took the MAX of both fields. Different logic can be applied if needed.
It uses FOR XML PATH to concatenate the Code values of all rows with the same Meta into a single string for the List_Code column. The STUFF function removes the leading comma (,) delimiter.
CREATE TABLE MetaTable
(
Code VARCHAR(5),
Meta VARCHAR(100),
Meta_ID INT,
Date DATETIME
)
GO
INSERT INTO MetaTable
VALUES
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2','2', '01/01/2022 02:00:00'),
('B', NULL,'2', '01/01/1900 02:00:00'),
('C', NULL,'3', '01/01/2022 02:00:00'),
('D', '8','8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2','2', '01/01/2022 02:00:00')
GO
WITH CTE_Meta
AS
(
SELECT
Code,
MAX(Meta) AS 'Meta',
Meta_ID,
MAX(Date) AS 'Date'
FROM MetaTable
GROUP BY
Code,
Meta_ID
)
SELECT
T1.Code,
T1.Meta,
T1.Meta_ID,
STUFF
(
(
SELECT ',' + Code
FROM CTE_Meta T2
WHERE ISNULL(T1.Meta, '') = ISNULL(T2.Meta, '')
FOR XML PATH('')
), 1, 1, ''
) AS 'List_Code',
T1.Date
FROM CTE_Meta T1
ORDER BY 1
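If you want to check the grouping idea without SQL Server, here is a rough sketch of the same query run on an in-memory SQLite database from Python. It swaps in SQLite's GROUP_CONCAT for the STUFF / FOR XML PATH combination, and it assumes ISO-formatted date strings so that MAX compares them correctly as text. GROUP_CONCAT does not guarantee an order, so the list is sorted in Python afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetaTable (Code TEXT, Meta TEXT, Meta_ID INT, Date TEXT)")
conn.executemany(
    "INSERT INTO MetaTable VALUES (?, ?, ?, ?)",
    [
        ("A", "1,2", 1, "2022-01-01 08:08:08"),
        ("B", "1,2", 2, "2022-01-01 02:00:00"),
        ("B", None, 2, "1900-01-01 02:00:00"),
        ("C", None, 3, "2022-01-01 02:00:00"),
        ("D", "8", 8, "2022-01-01 02:00:00"),
        ("E", "5,6,7", 5, "2022-01-01 02:00:00"),
        ("F", "1,2", 2, "2022-01-01 02:00:00"),
    ],
)

# Same CTE as the answer; GROUP_CONCAT replaces STUFF(... FOR XML PATH('')).
query = """
WITH CTE_Meta AS (
    SELECT Code, MAX(Meta) AS Meta, Meta_ID, MAX(Date) AS Date
    FROM MetaTable
    GROUP BY Code, Meta_ID
)
SELECT T1.Code, T1.Meta, T1.Meta_ID,
       (SELECT GROUP_CONCAT(T2.Code, ',')
        FROM CTE_Meta T2
        WHERE IFNULL(T1.Meta, '') = IFNULL(T2.Meta, '')) AS List_Code,
       T1.Date
FROM CTE_Meta T1
ORDER BY T1.Code
"""

# Sort each concatenated list so the output does not depend on scan order.
result = [
    (code, meta, meta_id, ",".join(sorted(codes.split(","))), date)
    for code, meta, meta_id, codes, date in conn.execute(query)
]
for row in result:
    print(row)
```

Codes whose Meta is NULL only match each other here because both sides are mapped to an empty string before the comparison, as in the T-SQL version.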
I like the first answer using XML. It's very concise. This is more verbose, but might be more flexible if the data can have different meta values spread about in different records. The CAST to varchar(12) in various places is just for the display. I use STRING_AGG and STRING_SPLIT instead of XML.
WITH TestData as (
SELECT t.*
FROM (
Values
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2', '2', '01/01/2022 02:00:00'),
('B', null, '2', '01/01/1900 02:00:00'),
('C', null, '3', '01/01/2022 02:00:00'),
('D', '8', '8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2', '2', '01/01/2022 02:00:00'),
('G', '16', '17', '01/01/2022 02:00:00'),
('G', null, '17', '01/02/2022 03:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '20', '19', '01/04/2022 05:00:00'),
('G', '20', '20', '01/05/2022 06:00:00')
) t (Code, meta, meta_ID, date)
), CodeLookup as ( -- used to find the Code from the meta_ID
SELECT DISTINCT meta_ID, Code
FROM TestData
), Normalized as ( -- split out the meta values, one per row
SELECT t.Code, s.Value as [meta], meta_ID, [date]
FROM TestData t
OUTER APPLY STRING_SPLIT(t.meta, ',') s
), MetaLookup as ( -- used to find the distinct list of meta values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta, ',') WITHIN GROUP ( ORDER BY n.meta ASC ) as varchar(12)) as [meta]
FROM (
SELECT DISTINCT Code, meta
FROM Normalized
WHERE meta is not NULL
) n
GROUP BY n.Code
), MetaIdLookup as ( -- used to find the distinct list of meta_ID values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta_ID, ',') WITHIN GROUP ( ORDER BY n.meta_ID ASC ) as varchar(12)) as [meta_ID]
FROM (
SELECT DISTINCT Code, meta_ID
FROM Normalized
) n
GROUP BY n.Code
), ListCodeLookup as ( -- for every code, get all codes for the meta values
SELECT l.Code, CAST(STRING_AGG(l.lookupCode, ',') WITHIN GROUP ( ORDER BY l.lookupCode ASC ) as varchar(12)) as [list_Code]
FROM (
SELECT DISTINCT n.Code, c.Code as [lookupCode]
FROM Normalized n
INNER JOIN CodeLookup c
ON c.meta_ID = n.meta
UNION -- every record needs its own code in the list_code?
SELECT DISTINCT n.Code, n.Code as [lookupCode]
FROM Normalized n
) l
GROUP BY l.Code
)
SELECT t.Code, m.meta, mi.meta_ID, lc.list_Code, t.[date]
FROM (
SELECT Code, MAX([date]) as [date]
FROM TestData
GROUP BY Code
) t
LEFT JOIN MetaLookup m
ON m.Code = t.Code
LEFT JOIN MetaIdLookup mi
ON mi.Code = t.Code
LEFT JOIN ListCodeLookup lc
ON lc.Code = t.Code
Code meta meta_ID list_Code date
---- ------------ ------------ ------------ -------------------
A 1,2 1 A,B,F 01/01/2022 08:08:08
B 1,2 2 A,B,F 01/01/2022 02:00:00
C NULL 3 C 01/01/2022 02:00:00
D 8 8 D 01/01/2022 02:00:00
E 5,6,7 5 E 01/01/2022 02:00:00
F 1,2 2 A,B,F 01/01/2022 02:00:00
G 16,19,20 17,18,19,20 G 01/05/2022 06:00:00

Function that returns MAX OR MIN dates based on ID count

I have a task in SQL Server where I need to return the RESULT_DATE column using ID, PRODUCT_ID and DATE columns. Task criteria:
If the DATE column is filled only once for a PRODUCT_ID, I need to return that single date (as for PRODUCT_ID 1 and 3). Let's say it's the MIN date.
If the DATE column is filled more than once (as for PRODUCT_ID 2), I need to return the next filled DATE row.
Data:
CREATE TABLE #temp (
ID INT,
PRODUCT_ID INT,
[DATE] DATETIME
)
INSERT #temp (ID, PRODUCT_ID, DATE) VALUES
(1, 1, '2008-04-24 00:00:00.000'),
(2, 1, NULL),
(3, 2, '2015-12-09 00:00:00.000'),
(4, 2, NULL),
(5, 2, NULL),
(6, 2, '2022-01-01 13:06:45.253'),
(7, 2, NULL),
(8, 2, '2022-01-19 13:06:45.253'),
(9, 3, '2018-04-25 00:00:00.000'),
(10,3, NULL),
(11,3, NULL)
ID  PRODUCT_ID  DATE                     RESULT_DATE
1   1           2008-04-24 00:00:00.000  2008-04-24 00:00:00.000
2   1           NULL                     2008-04-24 00:00:00.000
3   2           2015-12-09 00:00:00.000  2022-01-01 13:06:45.253
4   2           NULL                     2022-01-01 13:06:45.253
5   2           NULL                     2022-01-01 13:06:45.253
6   2           2022-01-01 13:06:45.253  2022-01-19 13:06:45.253
7   2           NULL                     2022-01-19 13:06:45.253
8   2           2022-01-19 13:06:45.253  2022-01-19 13:06:45.253
9   3           2018-04-25 00:00:00.000  2018-04-25 00:00:00.000
10  3           NULL                     2018-04-25 00:00:00.000
11  3           NULL                     2018-04-25 00:00:00.000
I have tried different techniques, for example combinations of the LEAD and LAG functions. My latest script, which still does not work:
SELECT
COALESCE(DATE,
CAST(
SUBSTRING(
MAX(CAST(DATE AS BINARY(4)) + CAST(DATE AS BINARY(4))) OVER ( PARTITION BY PRODUCT_ID ORDER BY DATE ROWS UNBOUNDED PRECEDING)
,5,4)
AS INT)
) AS RESULT_DATE,
*
FROM TABLE
You can use two CTEs. The first selects all rows with a non-NULL date and numbers them per Product_Id with ROW_NUMBER. The second keeps, for each Product_Id, the row with the largest row number below 3, that is, the second non-NULL date, or the only one when there is just one. Finally, join the second CTE to the original table to supply that date to every row:
Set Up
CREATE TABLE #temp (
ID INT,
PRODUCT_ID INT,
MyDATE DATETIME
)
INSERT #temp (ID, PRODUCT_ID, MyDate)
VALUES
(1, 1, '2008-04-24 00:00:00.000'),
(2, 1, NULL),
(3, 2, '2015-12-09 00:00:00.000'),
(4, 2, NULL),
(5, 2, NULL),
(6, 2, '2022-01-01 13:06:45.253'),
(7, 2, NULL),
(8, 2, '2022-01-19 13:06:45.253'),
(9, 3, '2018-04-25 00:00:00.000'),
(10,3, NULL),
(11,3, NULL);
Query:
;WITH CTE
AS
(
SELECT ID, Product_ID, MyDate,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Id) AS rn
from #temp
WHERE MyDate IS NOT NULL
),
CTE2
AS
(
SELECT *
FROM CTE C1
WHERE C1.rn < 3
AND
C1.rn =
(SELECT MAX(rn) FROM CTE WHERE Product_Id = C1.Product_Id AND rn<3)
)
SELECT T.Id, T.Product_Id, T.MyDate, C.MyDate As Result_date
FROM #temp T
INNER JOIN CTE2 C
ON T.Product_Id = C.Product_Id
ORDER BY T.Id;
Results:
Id Product_Id MyDate Result_Date
1 1 2008-04-24 00:00:00.000 2008-04-24 00:00:00.000
2 1 NULL 2008-04-24 00:00:00.000
3 2 2015-12-09 00:00:00.000 2022-01-01 13:06:45.253
4 2 NULL 2022-01-01 13:06:45.253
5 2 NULL 2022-01-01 13:06:45.253
6 2 2022-01-01 13:06:45.253 2022-01-01 13:06:45.253
7 2 NULL 2022-01-01 13:06:45.253
8 2 2022-01-19 13:06:45.253 2022-01-01 13:06:45.253
9 3 2018-04-25 00:00:00.000 2018-04-25 00:00:00.000
10 3 NULL 2018-04-25 00:00:00.000
11 3 NULL 2018-04-25 00:00:00.000
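For comparison, the expected output in the question differs from the results above for IDs 6 to 8, where it shows 2022-01-19. One possible reading of the question is "the next non-NULL date after this row, otherwise the last non-NULL date up to this row". The sketch below tries that reading with two correlated subqueries. It is a different technique from the CTE answer, run on SQLite from Python, and the table name product_dates is a hypothetical stand-in for #temp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# product_dates is a hypothetical name standing in for #temp.
conn.execute("CREATE TABLE product_dates (ID INT, PRODUCT_ID INT, MyDate TEXT)")
conn.executemany(
    "INSERT INTO product_dates VALUES (?, ?, ?)",
    [
        (1, 1, "2008-04-24 00:00:00.000"),
        (2, 1, None),
        (3, 2, "2015-12-09 00:00:00.000"),
        (4, 2, None),
        (5, 2, None),
        (6, 2, "2022-01-01 13:06:45.253"),
        (7, 2, None),
        (8, 2, "2022-01-19 13:06:45.253"),
        (9, 3, "2018-04-25 00:00:00.000"),
        (10, 3, None),
        (11, 3, None),
    ],
)

# First subquery: next filled date after this row within the product.
# Second subquery: fallback to the last filled date at or before this row.
query = """
SELECT t.ID,
       COALESCE(
           (SELECT n.MyDate FROM product_dates n
            WHERE n.PRODUCT_ID = t.PRODUCT_ID
              AND n.ID > t.ID AND n.MyDate IS NOT NULL
            ORDER BY n.ID LIMIT 1),
           (SELECT p.MyDate FROM product_dates p
            WHERE p.PRODUCT_ID = t.PRODUCT_ID
              AND p.ID <= t.ID AND p.MyDate IS NOT NULL
            ORDER BY p.ID DESC LIMIT 1)
       ) AS Result_Date
FROM product_dates t
ORDER BY t.ID
"""
result_dates = [row[1] for row in conn.execute(query)]
for row_id, result_date in enumerate(result_dates, start=1):
    print(row_id, result_date)
```

In T-SQL the same shape would presumably use TOP 1 instead of LIMIT 1; that translation is untested here.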

How to get minimum and maximum value by using partition in sql

My table looks like this:
type  value  prod  date
a     20     2     2019-07-08
a     20     3     2019-07-08
b     30     2     2019-07-08
b     35     1     2019-07-08
a     40     4     2019-07-09
a     20     4     2019-07-09
b     32     3     2019-07-09
b     31     3     2019-07-09
b     30     2     2019-07-09
b     33     2     2019-07-09
b     12     1     2019-07-10
b     23     1     2019-07-10
b     20     2     2019-07-10
b     22     2     2019-07-10
First, I want to get the result of prod / value as util for each type and date, but each result also needs to include the sum from the previous dates.
On top of that, I need to know the minimum and maximum value for each type and date.
What I have done so far:
select t1.*, t1.value / t1.prod as util
from (
select
type, date, sum(value) as value, sum(prod) as prod
from table1
where true
and event_date <= '2019-07-11'
group by type, date) t1
How can I get the minimum and maximum util, given that the util calculation should also sum over the previous dates? I assume I need to use PARTITION BY, but I am not sure how.
Thanks in advance
Not sure if you are looking for this. It gives you min, max, sum values of value column by ordering by date and partitioning by type.
Check this:
drop table tmp_table10
create table tmp_table10
(
type nvarchar(5) null,
value float null,
prod nvarchar(255) null,
date nvarchar(255) null,
)
insert into tmp_table10
values('a', '20' ,2 , '2019-07-08'),
('a', '20' ,3 , '2019-07-08'),
('b', '30' ,2 , '2019-07-08'),
('b', '35' ,1 , '2019-07-08'),
('a', '40' ,4 , '2019-07-09'),
('a', '20' ,4 , '2019-07-09'),
('b', '32' ,3 , '2019-07-09'),
('b', '31' ,3 , '2019-07-09'),
('b', '30' ,2 , '2019-07-09'),
('b', '33' ,2 , '2019-07-09'),
('b', '12' ,1 , '2019-07-10'),
('b', '23' ,1 , '2019-07-10'),
('b', '20' ,2 , '2019-07-10'),
('b', '22' ,2 , '2019-07-10')
select
*
, max(value) over(partition by type order by date) maxValueByType
, min(value) over(partition by type order by date) minValueByType
, sum(value) over(partition by type order by date) sumValue
from tmp_table10
order by type, date
If I understand correctly, you want the cumulative min and max of util, where util is calculated like this:
select type, date, sum(value) / sum(prod) as util
from table1
where event_date <= '2019-07-11'
group by type, date;
Then you can use window functions:
select type, date, sum(value) / sum(prod) as util,
min(sum(value) / sum(prod)) over (partition by type order by date) as min_running_util,
max(sum(value) / sum(prod)) over (partition by type order by date) as max_running_util
from table1
where event_date <= '2019-07-11'
group by type, date;
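To see the numbers this produces on the sample data, here is a small sketch on an in-memory SQLite database from Python. It assumes the date filter is not needed for the sample rows, aggregates in a CTE first and then applies the window functions, and multiplies by 1.0 so that integer columns do not trigger integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (type TEXT, value INT, prod INT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("a", 20, 2, "2019-07-08"), ("a", 20, 3, "2019-07-08"),
        ("b", 30, 2, "2019-07-08"), ("b", 35, 1, "2019-07-08"),
        ("a", 40, 4, "2019-07-09"), ("a", 20, 4, "2019-07-09"),
        ("b", 32, 3, "2019-07-09"), ("b", 31, 3, "2019-07-09"),
        ("b", 30, 2, "2019-07-09"), ("b", 33, 2, "2019-07-09"),
        ("b", 12, 1, "2019-07-10"), ("b", 23, 1, "2019-07-10"),
        ("b", 20, 2, "2019-07-10"), ("b", 22, 2, "2019-07-10"),
    ],
)

# Step 1: one util per (type, date). Step 2: running min/max per type by date.
query = """
WITH daily AS (
    SELECT type, date, 1.0 * SUM(value) / SUM(prod) AS util
    FROM table1
    GROUP BY type, date
)
SELECT type, date, util,
       MIN(util) OVER (PARTITION BY type ORDER BY date) AS min_running_util,
       MAX(util) OVER (PARTITION BY type ORDER BY date) AS max_running_util
FROM daily
ORDER BY type, date
"""
rows = [
    (t, d, round(u, 2), round(lo, 2), round(hi, 2))
    for t, d, u, lo, hi in conn.execute(query)
]
for row in rows:
    print(row)
# ('a', '2019-07-08', 8.0, 8.0, 8.0)
# ('a', '2019-07-09', 7.5, 7.5, 8.0)
# ('b', '2019-07-08', 21.67, 21.67, 21.67)
# ('b', '2019-07-09', 12.6, 12.6, 21.67)
# ('b', '2019-07-10', 12.83, 12.6, 21.67)
```

With an ORDER BY in the OVER clause, the default window frame runs from the first row of the partition to the current row, which is what makes MIN and MAX cumulative here.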

How to find all those Sellers from the table who had increase in sales in at least 3 months consecutively in SQL?

How to find all those Sellers from below table who had increase in sales in at least 3 months consecutively?
Record | Seller_id | Months | Sales_amount
0 121 Feb 100
1 121 Jan 87
2 121 Mar 95
3 121 May 105
4 121 Apr 100
5 321 Jan 100
6 321 Feb 87
7 321 Mar 95
8 321 Apr 105
9 321 May 110
10 597 Jan 100
11 597 Feb 105
12 597 Mar 95
13 597 Apr 100
14 597 May 110
It is curious that you have no year and that the months are three-letter codes. You can do it with LAG and a lookup table of months:
With tbl as (
select * from (values
-- source data
(0 , 121,'Feb',100)
,(1 , 121,'Jan',87 )
,(2 , 121,'Mar',95 )
,(3 , 121,'May',105)
,(4 , 121,'Apr',100)
,(5 , 321,'Jan',100)
,(6 , 321,'Feb',87 )
,(7 , 321,'Mar',95 )
,(8 , 321,'Apr',105)
,(9 , 321,'May',110)
,(10, 597,'Jan',100)
,(11, 597,'Feb',105)
,(12, 597,'Mar',95 )
,(13, 597,'Apr',100)
,(14, 597,'May',110)
) t(id, Seller_id, Months, Sales_amount)
), months as (
select * from ( values
(1, 'Jan')
,(2, 'Feb')
,(3, 'Mar')
,(4, 'Apr')
,(5, 'May')
-- , etc
) t(id,name)
)
select *
from (
select t.*,
lag(Sales_amount,1) over (partition by Seller_id order by m.id) m1,
lag(Sales_amount,2) over (partition by Seller_id order by m.id) m2
from tbl t
join months m on m.name=t.Months
) t
where Sales_amount > m1 and m1 > m2;
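The query above returns the qualifying rows rather than a list of sellers. As a rough sketch, the same LAG idea can be reduced to distinct sellers and tried on SQLite from Python. Like the query above, it treats "3 months" as three month values each higher than the one before, and it assumes every seller has a row for every month, since LAG does not check for gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INT, Seller_id INT, Months TEXT, Sales_amount INT)")
conn.execute("CREATE TABLE months (id INT, name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (0, 121, "Feb", 100), (1, 121, "Jan", 87), (2, 121, "Mar", 95),
        (3, 121, "May", 105), (4, 121, "Apr", 100),
        (5, 321, "Jan", 100), (6, 321, "Feb", 87), (7, 321, "Mar", 95),
        (8, 321, "Apr", 105), (9, 321, "May", 110),
        (10, 597, "Jan", 100), (11, 597, "Feb", 105), (12, 597, "Mar", 95),
        (13, 597, "Apr", 100), (14, 597, "May", 110),
    ],
)
conn.executemany(
    "INSERT INTO months VALUES (?, ?)",
    [(1, "Jan"), (2, "Feb"), (3, "Mar"), (4, "Apr"), (5, "May")],
)

# m1 is the previous month's amount, m2 the amount two months back.
query = """
WITH ordered AS (
    SELECT t.Seller_id, t.Sales_amount,
           LAG(t.Sales_amount, 1) OVER (PARTITION BY t.Seller_id ORDER BY m.id) AS m1,
           LAG(t.Sales_amount, 2) OVER (PARTITION BY t.Seller_id ORDER BY m.id) AS m2
    FROM tbl t
    JOIN months m ON m.name = t.Months
)
SELECT DISTINCT Seller_id
FROM ordered
WHERE Sales_amount > m1 AND m1 > m2
ORDER BY Seller_id
"""
sellers = [row[0] for row in conn.execute(query)]
print(sellers)  # [121, 321, 597]
```

On this sample every seller qualifies: 121 and 597 through Mar, Apr, May, and 321 from Feb onwards.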
WITH a
AS (SELECT *
FROM
(
VALUES -- source data
(0, 121, 'Feb', 100),
(1, 121, 'Jan', 87),
(2, 121, 'Mar', 95),
(3, 121, 'May', 105),
(4, 121, 'Apr', 100),
(5, 321, 'Jan', 100),
(6, 321, 'Feb', 87),
(7, 321, 'Mar', 95),
(8, 321, 'Apr', 105),
(9, 321, 'May', 110),
(10, 597, 'Jan', 100),
(11, 597, 'Feb', 105),
(12, 597, 'Mar', 95),
(13, 597, 'Apr', 100),
(14, 597, 'May', 110)
) t (id, Seller_id, Months, Sales_amount) ),
b
AS (SELECT *
FROM
(
VALUES
(1, 'Jan'),
(2, 'Feb'),
(3, 'Mar'),
(4, 'Apr'),
(5, 'May') -- , etc
) t (id, name) ),
c
AS (SELECT a.*,
b.id id2,
ROW_NUMBER() OVER (PARTITION BY a.Seller_id ORDER BY b.id ASC) rnk
FROM a
LEFT JOIN b
ON a.Months = b.name),
d
AS (SELECT --c1.*
c1.Seller_id,
c1.Months AS m1,
c2.Months AS m2,
c3.Months AS m3,
c1.Sales_amount AS sa1,
c2.Sales_amount AS sa2,
c3.Sales_amount AS sa3
FROM c c1
LEFT JOIN c c2
ON c1.id2 = c2.id2 - 1
AND c1.Seller_id = c2.Seller_id
LEFT JOIN c c3
ON c2.id2 = c3.id2 - 1
AND c2.Seller_id = c3.Seller_id)
SELECT *,
CASE
WHEN sa1 < sa2
AND sa2 < sa3 THEN
1
ELSE
0
END is_con
FROM d;

SQL query to compare time difference

I've used the code below to query and got the output shown. Now I would like to query as described below. How should I do it?
Find each row with code 2 and check whether a row with code 1 comes after it within the same ItemID. If so, compare the time difference. If the time difference is less than 10 seconds, display the two compared rows.
SELECT [Date]
,[Code]
,[ItemId]
,[ItemName]
FROM [dbo].[Log] as t
join Item as d
on t.ItemId = d.Id
where ([Code] = 2 or [Code] = 1) and ([ItemId] > 97 and [ItemId] < 100)
order by [ItemId], [Date]
Output from the above query
Date Code ItemName ItemID
2017-01-06 11:00:49.000 2 B 98
2017-01-06 11:00:49.000 1 A 98
2017-01-06 11:00:55.000 2 B 98
2017-01-06 12:01:56.000 1 A 98
2017-01-06 12:02:37.000 2 B 98
2017-01-06 12:03:49.000 1 A 98
2017-01-06 12:05:44.000 2 B 98
2017-01-06 20:24:32.000 1 A 98
2017-01-06 20:24:55.000 2 B 98
2017-03-14 16:37:42.000 2 B 99
2017-03-14 17:40:24.000 1 A 99
2017-03-14 17:40:25.000 2 B 99
2017-03-14 21:28:46.000 1 A 99
2017-03-15 08:03:07.000 2 B 99
2017-03-15 10:43:00.000 1 A 99
2017-03-15 12:01:17.000 2 B 99
2017-03-15 14:18:19.000 2 B 99
Expected Result
Date Code ItemName ItemID
2017-01-06 11:00:49.000 2 B 98
2017-01-06 11:00:49.000 1 A 98
create table results ([Date] datetime, Code int, ItemName char(1), ItemID int);
insert into results values
('2017-01-06 11:00:49', 2, 'B', 98),
('2017-01-06 11:00:49', 1, 'A', 98),
('2017-01-06 11:00:55', 2, 'B', 98),
('2017-01-06 12:01:56', 1, 'A', 98),
('2017-01-06 12:01:58', 1, 'A', 98),
('2017-01-06 12:02:37', 2, 'B', 98),
('2017-01-06 12:03:49', 1, 'A', 98),
('2017-01-06 12:05:44', 2, 'B', 98),
('2017-01-06 20:24:32', 1, 'A', 98),
('2017-01-06 20:24:55', 2, 'B', 98),
('2017-03-07 00:02:27', 1, 'A', 91),
('2017-03-07 00:02:27', 1, 'A', 58),
('2017-03-14 16:37:42', 2, 'B', 99),
('2017-03-14 17:40:24', 1, 'A', 99),
('2017-03-14 17:40:38', 2, 'B', 99),
('2017-03-14 21:28:46', 1, 'A', 99),
('2017-03-15 08:03:07', 2, 'B', 99),
('2017-03-15 10:43:00', 1, 'A', 99),
('2017-03-15 12:01:17', 2, 'B', 99),
('2017-03-15 14:18:19', 1, 'A', 99);
--= set a reset point when ItemId changes, or there is no correlative (2,1) couples
--= keep in mind this solution assumes that first Code must be 2
--
WITH SetReset AS
(
SELECT [Date], Code, ItemName, ItemId,
CASE WHEN LAG([ItemId]) OVER (PARTITION BY ItemId ORDER BY [Date]) IS NULL
OR ([Code] = 2)
OR ([Code] = COALESCE(LAG([Code]) OVER (PARTITION BY ItemId ORDER BY [Date]), [Code]))
THEN 1 END is_reset
FROM results
)
--
--= set groups according to reset points
--
, SetGroup AS
(
SELECT [Date], Code, ItemName, ItemId,
COUNT(is_reset) OVER (ORDER BY [ItemId], [Date]) grp
FROM SetReset
)
--
--= calcs diff date for each group
, CalcSeconds AS
(
SELECT [Date], Code, ItemName, ItemId,
DATEDIFF(SECOND, MIN([Date]) OVER (PARTITION BY grp), MAX([Date]) OVER (PARTITION BY grp)) dif_sec,
COUNT(*) OVER (PARTITION BY grp) num_items
FROM SetGroup
)
--
--= selects those rows with 2 items by group and date diff less than 10 sec
SELECT [Date], Code, ItemName, ItemId
FROM CalcSeconds
WHERE dif_sec < 10
AND num_items = 2
;
GO
Date | Code | ItemName | ItemId
:------------------ | ---: | :------- | -----:
06/01/2017 11:00:49 | 2 | B | 98
06/01/2017 11:00:49 | 1 | A | 98
Warning: Null value is eliminated by an aggregate or other SET operation.
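A shorter alternative might be to pair each code 2 row with the row that directly follows it for the same item, using LEAD. The sketch below tries this on SQLite from Python with a subset of the rows above. It is not the grouping approach used in the query above. It assumes insertion order (SQLite's rowid) as the tie-breaker when two rows share a timestamp, and it uses strftime('%s', ...) where T-SQL would use DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (Date TEXT, Code INT, ItemName TEXT, ItemId INT)")
# Subset of the sample rows; insertion order breaks timestamp ties.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        ("2017-01-06 11:00:49", 2, "B", 98),
        ("2017-01-06 11:00:49", 1, "A", 98),
        ("2017-01-06 11:00:55", 2, "B", 98),
        ("2017-01-06 12:01:56", 1, "A", 98),
        ("2017-03-14 16:37:42", 2, "B", 99),
        ("2017-03-14 17:40:24", 1, "A", 99),
        ("2017-03-14 17:40:38", 2, "B", 99),
        ("2017-03-14 21:28:46", 1, "A", 99),
    ],
)

query = """
WITH w AS (
    -- attach the following row's code, date and rowid to every row
    SELECT rowid AS rid, Date, Code,
           LEAD(Code)  OVER win AS next_code,
           LEAD(Date)  OVER win AS next_date,
           LEAD(rowid) OVER win AS next_rid
    FROM results
    WINDOW win AS (PARTITION BY ItemId ORDER BY Date, rowid)
),
hits AS (
    -- code 2 directly followed by code 1 less than 10 seconds later
    SELECT rid, next_rid
    FROM w
    WHERE Code = 2 AND next_code = 1
      AND CAST(strftime('%s', next_date) AS INTEGER)
        - CAST(strftime('%s', Date) AS INTEGER) < 10
)
SELECT r.Date, r.Code, r.ItemName, r.ItemId
FROM results r
JOIN hits h ON r.rowid IN (h.rid, h.next_rid)
ORDER BY r.ItemId, r.Date, r.rowid
"""
pairs = conn.execute(query).fetchall()
for row in pairs:
    print(row)
# ('2017-01-06 11:00:49', 2, 'B', 98)
# ('2017-01-06 11:00:49', 1, 'A', 98)
```

Window functions and the WINDOW clause need SQLite 3.25 or later, which recent Python builds normally bundle.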