I have a table in Snowflake where I need to check, for each customer, whether FLAG_1 = 1 on at least 3 days in a row. FLAG_1 indicates whether the order contained any specific goods. The result should be a new table with CUSTOMER_ID and FLAG_2. I really don't know how to approach this problem.
Sample table:
CREATE TABLE TMP_TEST
(
CUSTOMER_ID INT,
ORDER_DATE DATE,
FLAG_1 INT
);
INSERT INTO TMP_TEST (CUSTOMER_ID, ORDER_DATE, FLAG_1)
VALUES
(001, '2020-04-01', 0),
(001, '2020-04-02', 1),
(001, '2020-04-03', 1),
(001, '2020-04-04', 1),
(001, '2020-04-05', 1),
(001, '2020-04-06', 0),
(001, '2020-04-07', 0),
(001, '2020-04-08', 0),
(001, '2020-04-09', 1),
(002, '2020-04-10', 1),
(002, '2020-04-11', 0),
(002, '2020-04-12', 0),
(002, '2020-04-13', 1),
(002, '2020-04-14', 1),
(002, '2020-04-15', 0),
(002, '2020-04-16', 1),
(002, '2020-04-17', 1);
Expected output table:
CUSTOMER_ID FLAG_2
001 1
002 0
Maybe this can help:
with calcflag as (
select customer_id, IFF( sum(flag_1) over (PARTITION by customer_id order by order_date rows between 2 preceding and current row) = 3, 1, 0 ) as new_flag
from tmp_Test)
select customer_id, max(new_flag) flag_2
from calcflag
group by 1
order by 1;
+-------------+--------+
| CUSTOMER_ID | FLAG_2 |
+-------------+--------+
| 1 | 1 |
| 2 | 0 |
+-------------+--------+
Using COUNT_IF also works:
with calcflag as (
select
customer_id,
IFF(
count_if(flag_1 = 1) over (
PARTITION by customer_id
order by order_date
rows between 2 preceding and current row
) = 3, 1, 0
) as new_flag
from tmp_Test
)
select
customer_id,
max(new_flag) flag_2
from calcflag
group by 1
+-------------+--------+
| CUSTOMER_ID | FLAG_2 |
|-------------+--------|
| 1 | 1 |
| 2 | 0 |
+-------------+--------+
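The rolling-window idea used in the answers above can be checked locally. The following is a minimal sketch that assumes SQLite (through Python's `sqlite3` module) as a stand-in for Snowflake; the `run_sum` column name is only illustrative, and `MAX(run_sum = 3)` relies on SQLite evaluating a comparison to 0 or 1.

```python
import sqlite3
from datetime import date, timedelta

# Sample data from the question: first order date and daily flags per customer.
samples = {1: (date(2020, 4, 1), "011110001"), 2: (date(2020, 4, 10), "10011011")}
rows = [
    (cid, (start + timedelta(days=i)).isoformat(), int(bit))
    for cid, (start, bits) in samples.items()
    for i, bit in enumerate(bits)
]

# Assumption: SQLite replaces Snowflake for this local test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_test (customer_id INT, order_date TEXT, flag_1 INT)")
conn.executemany("INSERT INTO tmp_test VALUES (?, ?, ?)", rows)

query = """
WITH calcflag AS (
    SELECT customer_id,
           SUM(flag_1) OVER (
               PARTITION BY customer_id
               ORDER BY order_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS run_sum
    FROM tmp_test
)
SELECT customer_id, MAX(run_sum = 3) AS flag_2
FROM calcflag
GROUP BY customer_id
ORDER BY customer_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that a ROWS frame counts consecutive rows, not calendar days. If a customer can have days without any order, an additional check on the dates would be needed.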
Snowflake supports MATCH_RECOGNIZE, which makes it easy to detect patterns across multiple rows.
To find 3 or more consecutive occurrences, the pattern is PATTERN ( a{3,} ):
SELECT *
FROM TMP_TEST
MATCH_RECOGNIZE (
PARTITION BY CUSTOMER_ID
ORDER BY ORDER_DATE
MEASURES MATCH_NUMBER() AS mn
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN ( a{3,} )
DEFINE a AS FLAG_1 = 1
) mr
ORDER BY CUSTOMER_ID, ORDER_DATE;
Collapsing to a single row per customer:
SELECT CUSTOMER_ID, COALESCE(MIN(MN),0) AS FLAG_2
FROM TMP_TEST
MATCH_RECOGNIZE (
PARTITION BY CUSTOMER_ID
ORDER BY ORDER_DATE
MEASURES MATCH_NUMBER() AS mn
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN ( a{3,})
DEFINE a AS FLAG_1 = 1
) mr
GROUP BY CUSTOMER_ID;
The power of this solution lies in the PATTERN clause, which can easily be extended with new conditions. For instance:
PATTERN ( a b{1,2} a )
DEFINE a AS FLAG_1 = 1,
b AS FLAG_1 = 0;
Here: find a row with flag = 1, followed by one or two rows with flag = 0, and ending with a row with flag = 1.
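To get a feel for what the two patterns above match, they can be mimicked with ordinary regular expressions over the flag sequence of each customer. This is only an analogy, not Snowflake code: each row is assumed to be encoded as the character '1' or '0', and regex matches are non-overlapping, which may differ from MATCH_RECOGNIZE depending on its AFTER MATCH SKIP setting.

```python
import re

# Flags per customer in date order, taken from the sample table.
flags = {1: "011110001", 2: "10011011"}

# Regex analogues of the row patterns (a -> '1', b -> '0').
three_or_more = re.compile(r"1{3,}")    # PATTERN ( a{3,} )
one_gap_one = re.compile(r"10{1,2}1")   # PATTERN ( a b{1,2} a )

runs = {c: [m.group() for m in three_or_more.finditer(bits)] for c, bits in flags.items()}
gaps = {c: [m.group() for m in one_gap_one.finditer(bits)] for c, bits in flags.items()}

for customer in flags:
    print(customer, runs[customer], gaps[customer])
```

Customer 1 has a run of four 1s but no short gap (the gap is three 0s long), while customer 2 has no run of three but two short gaps.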
I have a read-only table with a list of products,
and I need to avoid selecting duplicates based on the serial number ('serial').
When a serial has duplicates, I want to select the row whose 'label' starts with a letter between A and J.
Here are my data and my attempt at a selection without duplicates:
CREATE TABLE products(id INT, serial VARCHAR(25), label VARCHAR(50) , type VARCHAR(25));
INSERT INTO products(id, serial, label, type)
VALUES
( 1, '111', 'A1', 'computer'),
( 2, '222', 'B2', 'computer'),
( 3, '333', 'Z3', 'computer'),
( 4, '333', 'D4', 'computer'),
( 5, '555', 'E5', 'computer'),
( 6, '666', 'X6', 'computer'),
( 7, '777', 'G7', 'computer'),
( 8, '777', 'Y7', 'computer'),
( 9, '888', 'I8', 'computer'),
(10, '999', 'J9', 'screen'),
(11, '777', 'G7bis', 'computer'),
(12, '666', 'X6bis', 'computer');
SELECT COUNT(serial) OVER(PARTITION BY serial) as nbserial, *
FROM products
where type='computer' and nbserial=1 or
(nbserial>1 and LEFT(label, 1) between 'A' and 'J')
;
I have several problems. First, I cannot use nbserial in a condition in the WHERE clause.
Second, if a serial has 3 duplicates, I need to select one row that satisfies the condition (label's first letter is between A and J).
Third, if a serial has several duplicates but none satisfies the condition, any one of them can be selected.
Example of the expected result
(no duplicate serials, and if possible the label starts with a letter between A and J):
( 1, '111', 'A1', 'computer'),
( 2, '222', 'B2', 'computer'),
( 4, '333', 'D4', 'computer'),
( 5, '555', 'E5', 'computer'),
( 6, '666', 'X6', 'computer'),
( 7, '777', 'G7', 'computer'),
( 9, '888', 'I8', 'computer'),
(10, '999', 'J9', 'screen'),
How can I do that with a SELECT, given that I cannot change the table content?
Thanks
You can use row_number() and a conditional sort:
select *
from (
select p.*,
row_number() over(
partition by serial
order by case when left(label, 1) between 'A' and 'J' then 0 else 1 end, id
) rn
from products p
) p
where rn = 1
Or better yet, use DISTINCT ON in Postgres:
select distinct on (serial) p.*
from products p
order by serial, (left(label, 1) between 'A' and 'J') desc, id
This gives one row per serial and prioritizes labels whose first letter is between "A" and "J". When there are ties, the row with the lowest id is retained.
Demo on DB Fiddle:
id | serial | label | type
-: | :----- | :---- | :-------
1 | 111 | A1 | computer
2 | 222 | B2 | computer
4 | 333 | D4 | computer
5 | 555 | E5 | computer
6 | 666 | X6 | computer
7 | 777 | G7 | computer
9 | 888 | I8 | computer
10 | 999 | J9 | screen
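The row_number() variant is portable enough to be tried outside Postgres. Here is a minimal sketch assuming SQLite via Python's `sqlite3` module, with `substr` standing in for `left` (an adaptation for SQLite, not part of the original answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INT, serial TEXT, label TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "111", "A1", "computer"), (2, "222", "B2", "computer"),
        (3, "333", "Z3", "computer"), (4, "333", "D4", "computer"),
        (5, "555", "E5", "computer"), (6, "666", "X6", "computer"),
        (7, "777", "G7", "computer"), (8, "777", "Y7", "computer"),
        (9, "888", "I8", "computer"), (10, "999", "J9", "screen"),
        (11, "777", "G7bis", "computer"), (12, "666", "X6bis", "computer"),
    ],
)

query = """
SELECT id, serial, label, type
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY serial
               -- preferred labels (A..J) sort first, then lowest id wins
               ORDER BY CASE WHEN substr(label, 1, 1) BETWEEN 'A' AND 'J'
                             THEN 0 ELSE 1 END,
                        id
           ) AS rn
    FROM products p
)
WHERE rn = 1
ORDER BY id
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

Serial 666 has no label in the A to J range, so the sketch falls back to the lowest id (6), which matches the "select any line" requirement.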
I have data that looks like this:
ID | Value
-----------
1 | a
1 | b
2 | a
2 | c
3 | a
3 | d
And I would like it to look like this:
ID | Value_a | Value_b | Value_c | Value_d
---------------------------------------------
1 | 1 | 1 | 0 | 0
2 | 1 | 0 | 1 | 0
3 | 1 | 0 | 0 | 1
I think a dynamic conditional aggregation is required. Any help would be appreciated.
Conditional aggregation goes like:
select
id,
max(case when value = 'a' then 1 else 0 end) value_a,
max(case when value = 'b' then 1 else 0 end) value_b,
max(case when value = 'c' then 1 else 0 end) value_c,
max(case when value = 'd' then 1 else 0 end) value_d
from mytable
group by id
Here is a sample implementation of dynamic conditional aggregation:
--create test table
create table #values (
[ID] int
,[Value] char(1))
--populate test table
insert into #values
values
(1, 'a')
,(1, 'b')
,(2, 'a')
,(2, 'c')
,(3, 'a')
,(3, 'd')
--declare variable that will hold dynamic query
declare @query nvarchar(max) = ' select [ID] '
--build dynamic query and assign it to variable
select
@query = @query + max(',max(case when [value] = '''
+ [value] + ''' then 1 else 0 end) as Value_' + [value] )
from
#values
group by
[value]
--add group by clause to dynamic query
set @query = @query + ' from #values group by [id]'
--execute dynamic query
exec (@query)
Now you can add a value (for example id = 4 and value = 'e') replacing the original insert with this one:
insert into #values
values
(1, 'a')
,(1, 'b')
,(2, 'a')
,(2, 'c')
,(3, 'a')
,(3, 'd')
,(4, 'a')
,(4, 'e')
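The same dynamic conditional aggregation can be sketched from a client language instead of T-SQL. The version below assumes SQLite via Python's `sqlite3` module and a table named `vals` (a hypothetical name, since `#values` is SQL Server temp-table syntax); the two quoting helpers are illustrative and not a complete defence against hostile input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (id INT, value TEXT)")
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (3, "d"), (4, "a"), (4, "e")],
)

def quote_literal(text):
    # Escape single quotes for use inside a SQL string literal.
    return "'" + text.replace("'", "''") + "'"

def quote_ident(text):
    # Escape double quotes for use as a SQL identifier.
    return '"' + text.replace('"', '""') + '"'

# One MAX(CASE ...) column per distinct value found in the data.
distinct_values = [r[0] for r in conn.execute("SELECT DISTINCT value FROM vals ORDER BY value")]
columns = ", ".join(
    f"MAX(CASE WHEN value = {quote_literal(v)} THEN 1 ELSE 0 END) AS {quote_ident('Value_' + v)}"
    for v in distinct_values
)
query = f"SELECT id, {columns} FROM vals GROUP BY id ORDER BY id"

cursor = conn.execute(query)
header = [d[0] for d in cursor.description]
pivot = cursor.fetchall()
print(header)
for row in pivot:
    print(row)
```

Because the column list is derived from the data, adding a row such as (4, 'e') produces a new Value_e column without touching the query-building code.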
I have a supplier table with the columns Item, Suppliername and Status. For a given list of items, I have to fetch the (Suppliername, Status) combinations that exist for every one of the given items.
For example, given the table below:
Item Suppliername Status
A S1 Created
A S1 Approved
B S1 Approved
B S2 Created
C S1 Created
C S1 Approved
The given items are 'A', 'B', 'C'.
The output should be as below:
Suppliername Status
S1 Approved
A few options:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TYPE CharList IS TABLE OF CHAR(1)
/
CREATE TABLE table_name ( Item, Suppliername, Status ) AS
SELECT 'A', 'S1', 'Created' FROM DUAL UNION ALL
SELECT 'A', 'S1', 'Approved' FROM DUAL UNION ALL
SELECT 'B', 'S1', 'Approved' FROM DUAL UNION ALL
SELECT 'B', 'S2', 'Created' FROM DUAL UNION ALL
SELECT 'C', 'S1', 'Created' FROM DUAL UNION ALL
SELECT 'C', 'S1', 'Approved' FROM DUAL
/
Query 1:
SELECT Suppliername, Status
FROM table_name
GROUP BY Suppliername, Status
HAVING CharList( 'A', 'B', 'C' )
SUBMULTISET OF CAST( COLLECT( Item ) AS CharList )
Results:
| SUPPLIERNAME | STATUS |
|--------------|----------|
| S1 | Approved |
Query 2:
SELECT Suppliername, Status
FROM table_name
WHERE Item IN ( 'A', 'B', 'C' )
GROUP BY Suppliername, Status
HAVING COUNT( DISTINCT item ) = 3
Results:
| SUPPLIERNAME | STATUS |
|--------------|----------|
| S1 | Approved |
Query 3:
SELECT Suppliername, Status
FROM table_name
WHERE Item MEMBER OF CharList( 'A', 'B', 'C' )
GROUP BY Suppliername, Status
HAVING COUNT( DISTINCT item ) = CARDINALITY( CharList( 'A', 'B', 'C' ) )
Results:
| SUPPLIERNAME | STATUS |
|--------------|----------|
| S1 | Approved |
This is how I would do it:
WITH picklist(Item) AS
(
VALUES ('A'),
('B'),
('C')
), grouped AS
(
SELECT T.Suppliername, T.Status, COUNT(DISTINCT T.Item) AS C
FROM table_name T
JOIN picklist ON T.Item = picklist.Item
GROUP BY T.Suppliername, T.Status
)
SELECT Suppliername, Status
FROM grouped
WHERE C = (SELECT COUNT(*) FROM picklist)
If VALUES does not work on your platform, you can use this for SQL Server:
SELECT 'A' as Item
UNION ALL
SELECT 'B' as Item
UNION ALL
SELECT 'C' as Item
and this for Oracle (according to MT0):
SELECT 'A' as Item FROM DUAL
UNION ALL
SELECT 'B' as Item FROM DUAL
UNION ALL
SELECT 'C' as Item FROM DUAL
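The picklist approach can be tried without an Oracle or SQL Server instance. The sketch below assumes SQLite via Python's `sqlite3` module, which accepts a VALUES list inside a CTE; the table name `supplier` is taken from the question's wording and is otherwise an assumption. It counts distinct items per group so that repeated rows cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (item TEXT, suppliername TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [
        ("A", "S1", "Created"), ("A", "S1", "Approved"),
        ("B", "S1", "Approved"), ("B", "S2", "Created"),
        ("C", "S1", "Created"), ("C", "S1", "Approved"),
    ],
)

query = """
WITH picklist(item) AS (VALUES ('A'), ('B'), ('C'))
SELECT s.suppliername, s.status
FROM supplier s
JOIN picklist p ON p.item = s.item
GROUP BY s.suppliername, s.status
-- keep only the groups that cover every item in the picklist
HAVING COUNT(DISTINCT s.item) = (SELECT COUNT(*) FROM picklist)
"""
matches = conn.execute(query).fetchall()
print(matches)
```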
I have two sets of interval data, i.e.:
Start End Type1 Type2
0 2 L NULL
2 5 L NULL
5 7 L NULL
7 10 L NULL
2 3 NULL S
3 5 NULL S
5 8 NULL S
11 12 NULL S
What I'd like to do is merge these sets into one. This seems possible with a gaps-and-islands solution, but because the intervals are not continuous I'm not sure how to apply it. The output I'm expecting would be:
Start End Type1 Type2
0 2 L NULL
2 3 L S
3 5 L S
5 7 L S
7 8 L S
8 10 L NULL
11 12 NULL S
Anyone out there done something like this before??? Thanks!
Create script below:
CREATE TABLE Table1
([Start] int, [End] int, [Type1] varchar(4), [Type2] varchar(4))
;
INSERT INTO Table1
([Start], [End], [Type1], [Type2])
VALUES
(0, 2, 'L', NULL),
(2, 3, NULL, 'S'),
(2, 5, 'L', NULL),
(3, 5, NULL, 'S'),
(5, 7, 'L', NULL),
(5, 8, NULL, 'S'),
(7, 10, 'L', NULL),
(11, 12, NULL, 'S')
;
I assume that Start is inclusive, End is exclusive, and the given intervals of the same type do not overlap.
CTE_Numbers is a table of numbers. Here it is generated on the fly. I have it as a permanent table in my database.
CTE_T1 and CTE_T2 expand each interval into the corresponding number of rows using the table of numbers. For example, interval [2,5) generates rows with the values
2
3
4
This is done twice: for Type1 and Type2.
Results for Type1 and Type2 are FULL JOINed together on Value.
Finally, a gaps-and-islands pass groups/collapses intervals back.
Run the query step-by-step, CTE-by-CTE and examine intermediate results to understand how it works.
Sample data
I added a few rows to illustrate the case where there is a gap between values.
DECLARE @Table1 TABLE
([Start] int, [End] int, [Type1] varchar(4), [Type2] varchar(4))
;
INSERT INTO @Table1 ([Start], [End], [Type1], [Type2]) VALUES
( 0, 2, 'L', NULL),
( 2, 3, NULL, 'S'),
( 2, 5, 'L', NULL),
( 3, 5, NULL, 'S'),
( 5, 7, 'L', NULL),
( 5, 8, NULL, 'S'),
( 7, 10, 'L', NULL),
(11, 12, NULL, 'S'),
(15, 20, 'L', NULL),
(15, 20, NULL, 'S');
Query
WITH
e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 10
,e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b) -- 10*10
,e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
,CTE_Numbers
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY n) AS Number
FROM e3
)
,CTE_T1
AS
(
SELECT
T1.[Start] + CA.Number - 1 AS Value
,T1.Type1
FROM
@Table1 AS T1
CROSS APPLY
(
SELECT TOP(T1.[End] - T1.[Start]) CTE_Numbers.Number
FROM CTE_Numbers
ORDER BY CTE_Numbers.Number
) AS CA
WHERE
T1.Type1 IS NOT NULL
)
,CTE_T2
AS
(
SELECT
T2.[Start] + CA.Number - 1 AS Value
,T2.Type2
FROM
@Table1 AS T2
CROSS APPLY
(
SELECT TOP(T2.[End] - T2.[Start]) CTE_Numbers.Number
FROM CTE_Numbers
ORDER BY CTE_Numbers.Number
) AS CA
WHERE
T2.Type2 IS NOT NULL
)
,CTE_Values
AS
(
SELECT
ISNULL(CTE_T1.Value, CTE_T2.Value) AS Value
,CTE_T1.Type1
,CTE_T2.Type2
,ROW_NUMBER() OVER (ORDER BY ISNULL(CTE_T1.Value, CTE_T2.Value)) AS rn
FROM
CTE_T1
FULL JOIN CTE_T2 ON CTE_T2.Value = CTE_T1.Value
)
,CTE_Groups
AS
(
SELECT
Value
,Type1
,Type2
,rn
,ROW_NUMBER() OVER
(PARTITION BY rn - Value, Type1, Type2 ORDER BY Value) AS rn2
FROM CTE_Values
)
SELECT
MIN(Value) AS [Start]
,MAX(Value) + 1 AS [End]
,Type1
,Type2
FROM CTE_Groups
GROUP BY rn-rn2, Type1, Type2
ORDER BY [Start];
Result
+-------+-----+-------+-------+
| Start | End | Type1 | Type2 |
+-------+-----+-------+-------+
| 0 | 2 | L | NULL |
| 2 | 8 | L | S |
| 8 | 10 | L | NULL |
| 11 | 12 | NULL | S |
| 15 | 20 | L | S |
+-------+-----+-------+-------+
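The expand, join and collapse steps described above can be mirrored in plain Python to see the intermediate shapes without running the CTEs. This is only a sketch of the same idea, under the same assumptions (integer bounds, Start inclusive, End exclusive, no overlap within a type); the variable names are illustrative.

```python
intervals = [
    (0, 2, "L", None), (2, 3, None, "S"), (2, 5, "L", None), (3, 5, None, "S"),
    (5, 7, "L", None), (5, 8, None, "S"), (7, 10, "L", None), (11, 12, None, "S"),
    (15, 20, "L", None), (15, 20, None, "S"),
]

# Step 1: expand each half-open interval [start, end) into unit points, per type.
type1 = {v: t1 for s, e, t1, _ in intervals if t1 is not None for v in range(s, e)}
type2 = {v: t2 for s, e, _, t2 in intervals if t2 is not None for v in range(s, e)}

# Step 2: full outer join of the two point sets on the point value.
points = [(v, type1.get(v), type2.get(v)) for v in sorted(type1.keys() | type2.keys())]

# Step 3: collapse adjacent points that carry the same pair of types.
merged = []
for v, t1, t2 in points:
    if merged and merged[-1][1] == v and merged[-1][2:] == [t1, t2]:
        merged[-1][1] = v + 1            # extend the current island
    else:
        merged.append([v, v + 1, t1, t2])  # start a new island
merged = [tuple(m) for m in merged]

for row in merged:
    print(row)
```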
A step-by-step way is:
-- Finding all break points
;WITH breaks AS (
SELECT Start
FROM yourTable
UNION
SELECT [End]
FROM yourTable
) -- Finding Possible Ends
, ends AS (
SELECT Start
, (SELECT Min([End]) FROM yourTable WHERE yourTable.Start = breaks.Start) End1
, (SELECT Max([End]) FROM yourTable WHERE yourTable.Start < breaks.Start) End2
FROM breaks
) -- Finding periods
, periods AS (
SELECT Start,
CASE
WHEN End1 > End2 And End2 > Start THEN End2
WHEN End1 IS NULL THEN End2
ELSE End1
END [End]
FROM Ends
WHERE NOT(End1 IS NULL AND Start = End2)
) -- Generating results
SELECT p.Start, p.[End], Max(Type1) Type1, Max(Type2) Type2
FROM periods p, yourTable t
WHERE p.start >= t.Start AND p.[End] <= t.[End]
GROUP BY p.Start, p.[End];
The query above may not cover every situation; adapt it as needed ;).
First, get all distinct Start and End values via a UNION.
Then join those numbers to both the 'L' and the 'S' records.
A table variable is used for the test.
DECLARE @Table1 TABLE (Start int, [End] int, Type1 varchar(4), Type2 varchar(4));
INSERT INTO @Table1 (Start, [End], Type1, Type2)
VALUES (0, 2, 'L', NULL),(2, 3, NULL, 'S'),(2, 5, 'L', NULL),(3, 5, NULL, 'S'),
(5, 7, 'L', NULL),(5, 8, NULL, 'S'),(7, 10, 'L', NULL),(11, 12, NULL, 'S');
select
n.Num as Start,
(case when s.[End] is null or l.[End] <= s.[End] then l.[End] else s.[End] end) as [End],
l.Type1,
s.Type2
from
(select Start as Num from @Table1 union select [End] from @Table1) n
left join @Table1 l on (n.Num >= l.Start and n.Num < l.[End] and l.Type1 = 'L')
left join @Table1 s on (n.Num >= s.Start and n.Num < s.[End] and s.Type2 = 'S')
where (l.Start is not null or s.Start is not null)
order by Start, [End];
Output:
Start End Type1 Type2
0 2 L NULL
2 3 L S
3 5 L S
5 7 L S
7 8 L S
8 10 L NULL
11 12 NULL S
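The break-point idea behind this answer can also be sketched in plain Python: collect every Start and End, treat each pair of neighbouring break points as an elementary piece, and look up which 'L' and 'S' interval covers it. The helper name `covering` is hypothetical, and the sketch assumes half-open intervals that do not overlap within a type.

```python
intervals = [
    (0, 2, "L", None), (2, 3, None, "S"), (2, 5, "L", None), (3, 5, None, "S"),
    (5, 7, "L", None), (5, 8, None, "S"), (7, 10, "L", None), (11, 12, None, "S"),
]

# All distinct Start and End values, in ascending order.
breaks = sorted({p for s, e, _, _ in intervals for p in (s, e)})

def covering(start, end, index):
    """Return the type (column `index`) of the interval containing [start, end), if any."""
    for row in intervals:
        if row[index] is not None and row[0] <= start and end <= row[1]:
            return row[index]
    return None

pieces = []
for start, end in zip(breaks, breaks[1:]):
    t1 = covering(start, end, 2)
    t2 = covering(start, end, 3)
    if t1 is not None or t2 is not None:   # skip pieces no interval covers, e.g. [10, 11)
        pieces.append((start, end, t1, t2))

for row in pieces:
    print(row)
```

Unlike the gaps-and-islands variant, this keeps every break point as a boundary, so it reproduces the seven rows the question asks for rather than merging 2 to 8 into one row.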