Selecting rows from a table with specific values per id - sql

I have the below table
Table 1
Id WFID data1 data2
1 12 'd' 'e'
1 13 '3' '4f'
1 15 'e' 'dd'
2 12 'f' 'ee'
3 17 'd' 'f'
2 17 'd' 'f'
4 12 'd' 'f'
5 20 'd' 'f'
From this table I just want to select the rows which has 12 and 17 only exclusively. Like from the table I just want to retrieve the distinct id's 2,3 and 4. 1 is excluded because it has 12 but also has 13 and 15. 5 is excluded because it has 20.
2 in included because it has just 12 and 17.
3 is included because it has just 17
4 is included because it has just 12

If you just want the list of distinct ids that satisfy the conditions, you can use aggregation and filter with a having clause:
select id
from mytable
group by id
having max(case when wfid not in (12, 17) then 1 else 0 end) = 0
This filters out groups that have any wfid other than 12 or 17.
If you want the entire corresponding rows, then window functions are more appropriate:
select
from (
select t.*,
max(case when wfid not in (12, 17) then 1 else 0 end) over(partition by id) flag
from mytable t
) t
where flag = 0

You really need to start thinking in terms of sets. And it helps everyone if you provide a script that can be used to experiment and demonstrate. Here is another approach using the EXCEPT operator. The idea is to first generate a set of IDs that we want based on the filter. You then generate a set of IDs that we do not want. Using EXCEPT we can then remove the 2nd set from the 1st.
declare #x table (Id tinyint, WFID tinyint, data1 char(1), data2 varchar(4));
insert #x (Id, WFID, data1, data2) values
(1, 12, 'd', 'e'),
(1, 13, '3', '4f'),
(1, 15, 'e', 'dd'),
(2, 12, 'f', 'ee'),
(3, 17, 'd', 'f'),
(2, 17, 'd', 'f'),
(4, 12, 'd', 'f'),
(2, 12, 'z', 'ef'),
(5, 20, 'd', 'f');
select * from #x
select id from #x where WFID not in (12, 17);
select id from #x where WFID in (12, 17)
except
select id from #x where WFID not in (12, 17);
Notice the added row to demonstrate what happens when there are "duplicates".

Related

Replicating the functionality of CONDITIONAL_TRUE_EVENT (Snowflake) in ANSI SQL to group events Together

I have to rewrite a script written for Snowflake into Databricks and need some help on how to replicate CONDITIONAL_TRUE_EVENT as Databricks doesn't have that function.
The need is for me to group events together if they have the same user and device and took place within 300 seconds (5 minutes) of each other.
CREATE TABLE events
(
event_timestamp timestamp,
user_id bigint,
device_id bigint
);
INSERT INTO events VALUES
('2022-07-12 05:00:00',1,1),
('2022-07-12 05:03:00',1,1),
('2022-07-12 05:04:00',1,2),
('2022-07-12 05:05:00',1,2),
('2022-07-12 05:06:00',2,1),
('2022-07-12 05:07:00',1,1),
('2022-07-12 05:15:00',1,1);
SELECT event_timestamp, user_id, device_id, group_id
FROM events
should return
'2022-07-12 05:00:00',1,1,1
'2022-07-12 05:03:00',1,1,1
'2022-07-12 05:04:00',1,2,2
'2022-07-12 05:05:00',1,2,2
'2022-07-12 05:06:00',2,1,3
'2022-07-12 05:07:00',1,1,1
'2022-07-12 05:15:00',1,1,4
The first 3 instances where user_id = 1, device_id = 1 are all group_id = 1 because the next event is within 5 minute of the previous event except for the last one because (group_id = 4) because at 05:15:00 it is more than 5 minutes away from the previous event with user_id = 1, device_id = 1 (05:07:00).
It seems to me that I should be able to find some combination of LAG, CASE, and SUM to calculate the group_id, but I just cannot figure it out.
Edit: I had previously answered this for the CONDITIONAL_CHANGE_EVENT, which is a bit more challenging to express in ANSI SQL. This updated answer is for CONDITIONAL_TRUE_EVENT as the question asks.
It is simply a matter of conditional summing in the window function.
create or replace table T1(PK int, EVNT string);
insert into T1(PK, EVNT) values
(1, 'A'), (2, 'C'), (3, 'B'), (4, 'A'), (5, 'A'),
(6, 'C'), (7, 'C'), (8, 'A'), (9, 'D'), (10, 'A');
select
PK,
conditional_true_event(EVNT = 'A') over (partition by null order by PK)
from T1;
PK
CONDITIONAL_TRUE_EVENT(EVNT = 'A') OVER (PARTITION BY NULL ORDER BY PK)
1
1
2
1
3
1
4
2
5
3
6
3
7
3
8
4
9
4
10
5
select
PK,
sum(iff(EVNT = 'A', 1, 0)) over (partition by null order by PK) as TRUE_EVENT
from T1;
PK
TRUE_EVENT
1
1
2
1
3
1
4
2
5
3
6
3
7
3
8
4
9
4
10
5
So for your query you would replace iif(EVNT = 'A', 1, 0) in the conditional sum with iif(TIME_DIFF > 300, 1, 0)

SQL query to fetch distinct records

Can someone help me out with this sql query on postgres which I have to write but I just can't come up with, I have tried my best to simplify the problem from 1 million records and more constraints to this, I know this looks easy, but I am still unable to resolve this somehow :-
Table_name = t
Column_1_name = id
Column_2_name = st
Column_1_elements = [1,1,1,1,2,2,2,3,3]
Column_2_elements = [a,b,c,d,a,c,d,b,d]
Now I want to print to those distinct ids from id where they do not have their corresponding st equals to 'b' or 'a'.
For example, for the above example, the ouput should be [2,3] as 2 does not have corresponding 'b' and 3 does not have 'a'. [even though 3 does not have c also, but we are not concerned about 'c']. id=1 is not returned in solution as it has a relation with both 'a' and 'b'.
Let me know if you need more clarity.
Thanks in advance for helping.
edit1:- The number of elements for id = 1,2,3 could be anything. I just want those ids where there corresponding st does not "contain" 'a' or 'b'.
if there is an id=4 which has just one st which is 'r', and there is an id=5 which contains 'a','b','c','d','e','f','k','z'.
Then we want id=4 in the output as well as it does not contain 'a' or 'b'..
You might need to correct the syntax a little bit based on you SQL engine but this one is a working solution in Google BigQuery -
with temp as (
select 1 as id, 'a' as st union all
select 1 as id, 'b' as st union all
select 1 as id, 'c' as st union all
select 1 as id, 'd' as st union all
select 2 as id, 'a' as st union all
select 2 as id, 'c' as st union all
select 2 as id, 'd' as st union all
select 3 as id, 'b' as st union all
select 3 as id, 'd' as st union all
select 4 as id, 'e' as st union all
select 5 as id, 'g' as st union all
select 5 as id, 'h' as st
)
-- add 2 columns for is_a and is_b flags
, temp2 as (
select *
, case when st = 'a' then 1 else 0 end is_a
,case when st = 'b' then 1 else 0 end as is_b
from temp
)
-- IDs that have both the flags as 1 should be filtered out (like ID = 1)
select id
from temp2
group by 1
having max(is_a) + max(is_b) < 2
This solution takes care of the problem you mentioned with ID 4 . Let me know if this works for you.
See if this works:
create table t (id integer, st varchar);
insert into t values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'c'), (2, 'd'), (3, 'b'), (3, 'd'), (4, 'r');
insert into t values (5, 'a'), (5, 'b'), (5, 'c'), (5, 'd'), (5, 'e'), (5, 'f'), (5, 'k'), (5, 'z');
select id, array['a', 'b'] <# array_agg(st)::text[] as tf from t group by id;
id | tf
----+----
3 | f
5 | t
4 | f
2 | f
1 | t
select * from (select id, array['a', 'b'] <# array_agg(st)::text[] as tf from t group by id) as agg where agg.tf = 'f';
id | tf
----+----
3 | f
4 | f
2 | f
In the first select query the array_agg(st) aggregates all the st values for an id via the group by id. array['a', 'b'] <# array_agg(st)::text[] then asks if the a and b are both in the array_agg.
The query is then turned into a sub-query where the outer query selects those rows that where 'f'(false), in other words did not have both a and b in the aggregated id values.

How to return boolean value in Redshift

I have a following table
id person type counted expected
1 a A 0 1
2 a A 1 0
3 a B 1 0
4 a B 2 0
5 a B 3 4
6 b C 0 0
First I'd like to group by type and aggregate by summing counted and expected
person type sum(counted) sum(expected)
a A 1 1
a B 6 4
b C 0 0
Then I'd like to add boolean whether sum(counted)equalsum(expected) or not.
person type sum(counted) sum(expected) counted=expected
a A 1 1 true
a B 6 4 false
b C 0 0 true
And then I'd like to group by in person and return boolean with and in person
person has_false
a false
b true
Are there any way to achieve this?
I went halfway but didn't proceed yet.
select person,type,sum(counted),sum(expected)
from table
group by person,type
If someone has opinion,please let me know
Thanks
This should work. I've laid it out like you described but I don't think you need to sum by person, type - rather just summing by person will work (for this example).
drop table if exists test;
create table test (id int, person varchar(1), typ varchar(1), counted int, expected int);
insert into test values
(1, 'a', 'A', 0, 1),
(2, 'a', 'A', 1, 0),
(3, 'a', 'B', 1, 0),
(4, 'a', 'B', 2, 0),
(5, 'a', 'B', 3, 4),
(6, 'b', 'C', 0, 0);
with grouped as (
select person, typ, sum(counted) as scount, sum(expected) as ecount, scount = ecount as equal
from test
group by person,typ
)
select person, bool_and(equal) as has_false
from grouped
group by person;

How to recode a field into this specific structure using T-SQL?

I am using SQL Server 2014 and I have a column (ID) in a table (tbl1). The column ID is a nvarchar field.
Here are some examples of what it contains:
ID
18FD64245
533040174
12AZ61356
19AK13355
18HD24189
I would like to run a T-SQL query to recode those values based on the following logic:
IF THEN IF THEN
A 1 0 3
B 2 1 6
C 3 2 7
D 4 3 1
E 5 4 2
F 6 5 4
G 7 6 8
H 8 7 9
I 9 8 5
J 10 9 0
K 11
L 12
M 13
N 14
O 15
P 16
Q 17
R 18
S 19
T 20
U 21
V 22
W 23
X 24
Y 25
Z 26
Therefore the first 2 values shown above would be recoded as:
ID ID2
18FD64245 656482724
533040174 411323692
I am having a hard time approaching the problem from a T-SQL point of view. I am thinking about using CASE Statements to solve the problem. I also had a look at the REPLACE function.
But I am stuck as to how to go about it since the ID field is an alpha-numeric field.
Any ideas on how I move forward with this?
Edit (to show my sql codes as per answer proposed by #Squirrel):
declare #map table
(
map_fr char(1),
map_to varchar(2)
)
insert into #map
values
('A', '1'),
('B', '2'),
('C', '3'),
('D', '4'),
('E', '5'),
('F', '6'),
('G', '7'),
('H', '8'),
('I', '9'),
('J', '10'),
('K', '11'),
('L', '12'),
('M', '13'),
('N', '14'),
('O', '15'),
('P', '16'),
('Q', '17'),
('R', '18'),
('S', '19'),
('T', '20'),
('U', '21'),
('V', '22'),
('W', '23'),
('X', '24'),
('Y', '25'),
('Z', '26')
; with rcte as
(
select [ID], idx = 1, ch = substring([ID], 1, 1)
from Table1
WHERE [ID] IS NOT NULL
union all
select [ID], idx = idx + 1, ch = substring([ID], idx + 1, 1)
from rcte
where idx < len([ID])
),
cte as
(
select r.[ID], r.idx, m.map_to
from rcte r
inner join #map m on r.ch = m.map_fr
)
select [ID],
(select '' + map_to from cte x where x.[ID] = c.[ID] order by idx for xml path('')) as ID2
from cte c
group by [ID]
order by [ID]
I would create a mapping table like
declare #map table
(
map_fr char(1),
map_to varchar(2)
)
and insert the mapping there
insert into #map
values ('A', '1'), ('B', '2'), ('C', '3'), ('D', '4'), ('E', '5'), ('F', '6'),
('G', '7'), ('H', '8'), ('I', '9'), ('J', '10'),('K', '11'),('L', '12'),
('M', '13'),('N', '14'),('O', '15'),('P', '16'),('Q', '17'),('R', '18'),
('S', '19'),('T', '20'),('U', '21'),('V', '22'),('W', '23'),('X', '24'),
('Y', '25'),('Z', '26'),
('0', '3'), ('1', '6'), ('2', '7'), ('3', '1'), ('4', '2'), ('5', '4'),
('6', '8'), ('7', '9'), ('8', '5'), ('9', '0')
then use recursive CTE to split the character and join to the mapping table. And finally concatenate back the string using the mapped value.
; with rcte as
(
select ID, idx = 1, ch = substring(ID, 1, 1)
from yourtbl
union all
select ID, idx = idx + 1, ch = substring(ID, idx + 1, 1)
from rcte
where idx < len(ID)
),
cte as
(
select r.ID, r.idx, m.map_to
from rcte r
inner join #map m on r.ch = m.map_fr
)
select ID,
(select '' + map_to from cte x where x.ID = c.ID order by idx for xml path('')) as ID2
from cte c
group by ID
order by ID
This is better suited as an scalar function, but if you want to do it all on a single SQL statement, here is a way :
select ID,
case substring(ID, 1, 1) when 'A' then '1'
when 'B' then '2'
...
when '9' then '0'
end
+
case substring(ID, 2, 1) when 'A' then '1'
when 'B' then '2'
...
when '9' then '0'
end
+
...
...
case substring(ID, 9, 1) when 'A' then '1'
when 'B' then '2'
...
when '9' then '0'
end
as ID2
from MY_TABLE
You can also map these using a tally table and some of the new features of SQL Server 2017 (STRING_AGG):
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE IDS
(
ID NVARCHAR(9)
)
INSERT INTO IDS
VALUES ('18FD64245'),
('533040174'),
('12AZ61356'),
('19AK13355'),
('18HD24189');
Query 1:
WITH Tally
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Nums.Num) AS Number
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS Nums(Num)
CROSS APPLY (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS Nums2(Num)
),
Chars
As
(
-- Turn each character of ID to new row
SELECT ID, SUBSTRING(ID, Number, 1) AS OldChar, Number As Ind
FROM IDS
CROSS APPLY Tally
WHERE SUBSTRING(ID, Number, 1) <> ''
),
NewChars
AS
(
-- Map old characters to new characters
SELECT *,
CASE WHEN ISNumeric(OldChar) = 1 THEN
-- effectively a mapping string to map old characters to new
SUBSTRING('3671248950', CHARINDEX(OldChar, '0123456789'), 1)
ELSE
-- for alphanumeric we can simply make 'A' be 1 and 'B' be 2
-- by subtracting the ASCII value of 'A' from the ASCII of the
-- Character and add 1
ASCII(OldChar) - ASCII('A') + 1
END As NewChar
FROM Chars
)
-- Recombine New Characters to form new Id (SQL Server 2017 only)
SELECT ID, STRING_AGG(NewChar,'') WITHIN GROUP (ORDER BY Ind) AS NewId
FROM NewChars
GROUP BY ID
ORDER BY Id
Results:
| ID | NewId |
|-----------|------------|
| 12AZ61356 | 6712686148 |
| 18FD64245 | 656482724 |
| 18HD24189 | 658472650 |
| 19AK13355 | 6011161144 |
| 533040174 | 411323692 |

Fetch duplicate records as well as Unique record in ssis

I have table mentioned below (id and Loc are Primary Keys)
ID LOC RNK NBR1 NBR2
1 2 A 10 b --->
3 4 A 10 b --->
5 6 A 11 C
8 2 A 12 D
6 3 A 10 b --->
SO here I have to fetch only duplicate records according to NBR1 and NBR2, It should fetch all the records not only the duplicates(marked as --->).
If I understood your question correctly you can do it with a subquery
CREATE TABLE #Test (ID int, LOC int, RNK char(1), NBR1 int, NBR2 char(1) )
INSERT INTO #Test VALUES
(1, 2, 'A', 10, 'b'),
(3, 4, 'A', 10, 'b'),
(5, 6, 'A', 11, 'C'),
(8, 2, 'A', 12, 'D'),
(6, 3, 'A', 10, 'b')
SELECT *
FROM #Test t1
WHERE EXISTS
(SELECT 1
FROM #Test t2
WHERE t1.NBR1 = t2.NBR1
AND t1.NBR2 = t2.NBR2
GROUP BY NBR1, NBR2
HAVING COUNT(1) > 1)
You can also use this, but cost will be more. The RowsCount having values greater than 1 are duplicate and having values 1 are unique records.
With Temp As
(
Select ID,LOC,RNK,NBR1,NBR2,Row_NUMBER() OVER (PARTITION BY NBR2 ORDER BY NBR1) AS ROWSCOUNT FROM <<TABLE_NAME>>
)
Select * from Temp