Grouped IF statement in SQL - sql

My data take this basic shape: (http://sqlfiddle.com/#!9/d4ae98/1)
CREATE TABLE Table1
(`ID` int, `Type` varchar(1));
INSERT INTO Table1
(`ID`, `Type`)
VALUES
(123, 'A'),
(123, 'B'),
(123, 'C'),
(456, 'A'),
(789, 'A'),
(789, 'B')
;
What I want is a third column that is true/false for every row, based on whether that row's ID value has type='B' anywhere in the data. So the desired output would be:
ID Type V3
123 A t
123 B t
123 C t
456 A f
789 A t
789 B t
What is the best way to do this? (And, yes, I am aware that a scripting language like R or Python could easily do what I want here, but I want to use this output as a WITH clause in a larger SQL query.)

You can do this with a CASE expression in the SELECT list:
select *, CASE
WHEN id in (select id from table1 where type = 'B') then 't'
ELSE 'f'
END V3
from table1;
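To check the CASE/IN idea without a fiddle, here is a minimal sketch that runs it on an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for the MySQL fiddle (an assumption), and the sketch compares `type = 'B'` exactly rather than with a LIKE pattern.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL fiddle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, type TEXT);
    INSERT INTO table1 (id, type) VALUES
        (123, 'A'), (123, 'B'), (123, 'C'),
        (456, 'A'),
        (789, 'A'), (789, 'B');
""")

# Flag each row 't' when its id has a 'B' row anywhere in the table.
rows = conn.execute("""
    SELECT id, type,
           CASE WHEN id IN (SELECT id FROM table1 WHERE type = 'B')
                THEN 't' ELSE 'f' END AS v3
    FROM table1
    ORDER BY id, type
""").fetchall()

for row in rows:
    print(row)
```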
Fiddle link: http://sqlfiddle.com/#!9/e47bc37/1

A solution like this one may help you:
with table2 as (
select * from table1 where type = 'B'
)
select t1.*, case t2.type when 'B' then 't' else 'f' end v3
from table1 t1
left outer join table2 t2 on t1.id = t2.id;
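One thing to watch with the join version: if an ID had two 'B' rows, every row for that ID would be joined twice. This minimal sketch (SQLite via Python's `sqlite3`, standing in for MySQL; the extra `(789, 'B')` row is hypothetical) compares the CTE as written with one that selects `DISTINCT id`. Because the deduplicated CTE has no `type` column, the flag tests `t2.id IS NOT NULL` instead of `t2.type`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, type TEXT);
    INSERT INTO table1 (id, type) VALUES
        (123, 'A'), (123, 'B'), (123, 'C'),
        (456, 'A'),
        (789, 'A'), (789, 'B'),
        (789, 'B');  -- hypothetical duplicate, not in the original data
""")


def flag_rows(cte_body):
    """LEFT JOIN every row to the CTE of ids that have a 'B'; no match means 'f'."""
    sql = f"""
        WITH table2 AS ({cte_body})
        SELECT t1.id, t1.type,
               CASE WHEN t2.id IS NOT NULL THEN 't' ELSE 'f' END AS v3
        FROM table1 t1
        LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
        ORDER BY t1.id, t1.type
    """
    return conn.execute(sql).fetchall()


# One CTE row per 'B' row: rows of id 789 are multiplied by its two 'B' rows.
naive = flag_rows("SELECT * FROM table1 WHERE type = 'B'")
# One CTE row per id: every input row appears exactly once in the output.
deduped = flag_rows("SELECT DISTINCT id FROM table1 WHERE type = 'B'")

print(len(naive), len(deduped))
```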

Related

How to select just the rows in a table that fit a criterion, avoiding duplicates in SQL

So I have a df like the following:
USER Value object
0001 V V
0002 A NULL
0002 C C
0003 A NULL
0004 A NULL
0004 A NULL
0003 V V
So I basically want USER to be the unique ID for each row of this DF. If there is an A in the Value column, I only want it if it's the only option for that ID. For example, there are two 0002 rows; I only want to see the one that is not A, so C.
Because 0004 doesn't have a non-A Value, I'll take the A.
Final result:
USER Value
0001 V
0002 C
0003 V
0004 A
I think you are looking for the following:
select user,
'A' as value
from tbl
group by user
having sum(case when value = 'A' then 1 else 0 end) > 0
and sum(case when value <> 'A' then 1 else 0 end) = 0
union all
select user,
value
from tbl
where value <> 'A'
order by user;
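A minimal sketch of the UNION ALL query run on SQLite through Python's `sqlite3` module. SQLite stands in for the MySQL fiddle, and the columns are renamed `usr`/`val` to stay clear of keywords; both are assumptions of this sketch. The HAVING clause qualifies `tbl.val` so it cannot be confused with the `'A' AS val` alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (usr TEXT, val TEXT, obj TEXT);
    INSERT INTO tbl (usr, val, obj) VALUES
        ('0001', 'V', 'V'), ('0002', 'A', NULL), ('0002', 'C', 'C'),
        ('0003', 'A', NULL), ('0004', 'A', NULL), ('0004', 'A', NULL),
        ('0003', 'V', 'V');
""")

rows = conn.execute("""
    -- One 'A' row for each user that has 'A' rows and nothing else ...
    SELECT usr, 'A' AS val
    FROM tbl
    GROUP BY usr
    HAVING SUM(CASE WHEN tbl.val = 'A' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN tbl.val <> 'A' THEN 1 ELSE 0 END) = 0
    UNION ALL
    -- ... plus every non-'A' row as-is.
    SELECT usr, val
    FROM tbl
    WHERE val <> 'A'
    ORDER BY usr
""").fetchall()
print(rows)
```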
See Fiddle:
http://www.sqlfiddle.com/#!9/b28f4c/2/0
The desired result is achieved with your example data. However, your example data does not contain any users having more than one non-A value row. The above query will keep all of them. If you only want to keep one or some, explain how to pick which you want.
This returns one Value per user, falling back to A as a last resort (provided A is the smallest of the possible values):
select USER, max(Value) as value from tbl
group by USER;
Alternatively, this might return multiple rows per user if a user has several rows with different non-NULL object values:
select distinct user, coalesce(object, value)
from tbl;
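The MAX version can be checked the same way. This sketch (SQLite via Python's `sqlite3`, with columns renamed `usr`/`val` as an assumption) relies on 'A' sorting below every other value, as the answer notes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (usr TEXT, val TEXT, obj TEXT);
    INSERT INTO tbl (usr, val, obj) VALUES
        ('0001', 'V', 'V'), ('0002', 'A', NULL), ('0002', 'C', 'C'),
        ('0003', 'A', NULL), ('0004', 'A', NULL), ('0004', 'A', NULL),
        ('0003', 'V', 'V');
""")

# MAX keeps the highest-sorting value per user, so 'A' survives only
# when it is the user's only value (assumes 'A' sorts below the others).
rows = conn.execute("""
    SELECT usr, MAX(val) AS val
    FROM tbl
    GROUP BY usr
    ORDER BY usr
""").fetchall()
print(rows)
```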
Here's a solution if you don't like typing :-)
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL)
Copy, paste, and run in Snowflake:
CREATE or replace TABLE tbl( USR varchar(4), VAL varchar(1), OBJ varchar(4));
INSERT INTO tbl (USR,VAL,OBJ)
VALUES
('0001', 'V', 'V'),
('0002', 'A', NULL),
('0002', 'C', 'C'),
('0003', 'A', NULL),
('0004', 'A', NULL),
('0004', 'A', NULL),
('0003', 'V', 'V');
select
distinct USR
,VAL
from
TBL
qualify
max(ascii(VAL)) over (partition by USR ) = ascii(VAL);
If you are using SQL Server, you can try the following (USER is a reserved keyword in SQL Server, so the column name is bracketed):
select distinct [USER]
,Value
from
(
select *,rank() over (partition by [USER] order by Value desc) as ranking
from your_table_name
) as t
where ranking =1
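A sketch of the RANK approach on SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later; SQLite and the `usr`/`val` names are assumptions standing in for SQL Server). RANK gives both 0004 rows rank 1, which is why the DISTINCT is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (usr TEXT, val TEXT, obj TEXT);
    INSERT INTO tbl (usr, val, obj) VALUES
        ('0001', 'V', 'V'), ('0002', 'A', NULL), ('0002', 'C', 'C'),
        ('0003', 'A', NULL), ('0004', 'A', NULL), ('0004', 'A', NULL),
        ('0003', 'V', 'V');
""")

# Rank values per user from highest to lowest and keep the top rank;
# tied rows (the two '0004'/'A' rows) share rank 1, so DISTINCT merges them.
rows = conn.execute("""
    SELECT DISTINCT usr, val
    FROM (
        SELECT usr, val,
               RANK() OVER (PARTITION BY usr ORDER BY val DESC) AS ranking
        FROM tbl
    ) AS t
    WHERE ranking = 1
    ORDER BY usr
""").fetchall()
print(sqlite3.sqlite_version, rows)
```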

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get it to produce my output:
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use a GROUP BY with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
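Why the HAVING clause works: the CASE returns 0 for rows whose key is NULL and NULL otherwise, so the SUM is NULL only for groups with a non-NULL key, while the correlated COUNT keeps a Foo that has a single row. A minimal sketch on SQLite via Python's `sqlite3` (the `#tempx1` temp table becomes a plain `tempx1` table and the columns are renamed, as assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tempx1 (id INTEGER, foo TEXT, other_key_id INTEGER);
    INSERT INTO tempx1 (id, foo, other_key_id) VALUES
        (1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
        (4, 'C', NULL), (5, 'C', 1), (6, 'C', 2);
""")

rows = conn.execute("""
    SELECT foo, other_key_id
    FROM tempx1 t
    GROUP BY foo, other_key_id
    -- SUM is 0 for the NULL-key group and NULL for non-NULL keys;
    -- the OR keeps a foo that has exactly one row in the table.
    HAVING SUM(CASE WHEN other_key_id IS NULL THEN 0 END) IS NULL
        OR (SELECT COUNT(*) FROM tempx1 WHERE foo = t.foo) = 1
    ORDER BY foo, other_key_id
""").fetchall()
print(rows)
```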
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
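The NOT EXISTS filter can be checked the same way. A minimal sketch on SQLite via Python's `sqlite3` (`#tempx1` becomes a plain `tempx1` table, an assumption): a NULL row survives only when its Foo has no non-NULL row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tempx1 (id INTEGER, foo TEXT, other_key_id INTEGER);
    INSERT INTO tempx1 (id, foo, other_key_id) VALUES
        (1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
        (4, 'C', NULL), (5, 'C', 1), (6, 'C', 2);
""")

# Keep non-NULL keys, plus NULL rows whose foo has no non-NULL key.
rows = conn.execute("""
    SELECT t.foo, t.other_key_id
    FROM tempx1 t
    WHERE t.other_key_id IS NOT NULL
       OR NOT EXISTS (SELECT 1
                      FROM tempx1 t2
                      WHERE t2.foo = t.foo AND t2.other_key_id IS NOT NULL)
    ORDER BY t.foo, t.other_key_id
""").fetchall()
print(rows)
```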
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BYs. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)
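A minimal sketch of the two-subquery version on SQLite via Python's `sqlite3` (`#tempx1` becomes a plain table; the two 'D' rows are hypothetical extras). `COUNT(column)` skips NULLs, so `c` is the number of non-NULL keys per Foo, and because `b` is grouped, a Foo with several NULL-only rows still yields a single NULL row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tempx1 (id INTEGER, foo TEXT, other_key_id INTEGER);
    INSERT INTO tempx1 (id, foo, other_key_id) VALUES
        (1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
        (4, 'C', NULL), (5, 'C', 1), (6, 'C', 2),
        (7, 'D', NULL), (8, 'D', NULL);  -- hypothetical extra rows
""")

rows = conn.execute("""
    SELECT a.foo, b.other_key_id
    FROM (
        -- number of non-NULL keys per foo
        SELECT foo, COUNT(other_key_id) AS c
        FROM tempx1
        GROUP BY foo
    ) a
    JOIN (
        -- one row per distinct (foo, key) pair, NULL included
        SELECT foo, other_key_id
        FROM tempx1
        GROUP BY foo, other_key_id
    ) b ON b.foo = a.foo
    WHERE (b.other_key_id IS NULL AND a.c = 0)
       OR (b.other_key_id IS NOT NULL AND a.c > 0)
    ORDER BY a.foo, b.other_key_id
""").fetchall()
print(rows)
```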

Join type (inner, left) and data type casting influence the query plan and order of operations

create or replace table test.bugs.table_one as (
select *, random(1337) as cost
from (
values
('', '2010-01-01', 'one')
, ('10', '2010-01-01', 'two')
, ('11', '2010-01-01', 'three')
, ('12', '2010-01-01', 'four')
)
);
create or replace table test.bugs.table_two as (
select *, random(1337) as budget
from (
values
(9, '2010-01-01', 'one')
, (10, '2010-01-01', 'two')
)
);
with
t1 as (
select
column1::int as column1
, column2
, column3
, cost
from table_one
where column1 !=''
),
t2 as (
select
column1
, column2
, column3
, budget
from table_two
)
select *
from t1
left join t2
on t1.column1 = t2.column1
and t1.column2 = t2.column2
and t1.column3 = t2.column3;
Returns: 3 rows
Changing the join type to INNER results in the error: Numeric value '' is not recognized. Instead of ::int I ended up using the try_to_number() function, but it took some trial and error to figure out (the query above is simplified; mine was more convoluted).
Is this a bug, or am I doing something odd?
Databases do not guarantee the order of evaluation of expressions. In some databases, your code would always work. In others, it might work sometimes and fail other times.
Is this a bug? I consider it a bug, but clearly some database vendors do not. You have found the workaround. Another method would be a case expression:
select (case when column1 regexp '^[0-9]+$' then column1::int end)
This should work, because case should guarantee the order of evaluation of its arguments.
When the join becomes an inner join, things done before or after the join are equivalent, so operations like the cast can get hoisted.
The WHERE clause is supposed to evaluate before the SELECT list of the t1 CTE.
I just retested my bug-submission code, and now the broken case works, but the working case (with the correct TRY_TO_NUMBER) fails.
I have had queries like yours that worked, and then once an extra layer of SELECT with an aggregation over the results was wrapped around the outside, the cast was hoisted back into the error state.
But yes, it's a bug, so I would report it.

COALESCE function won't return CHAR(1)

Using COALESCE function but getting the following error:
Conversion failed when converting the varchar value 'X' to data type int.
I have to join two tables on two conditions. If the second condition doesn't hold but there is a blank cell (not NULL but blank '') in Table 1, I want to join to that row. If neither holds, I want to return a zero.
Join Table 1 and Table 2, returning Table 2 and column 3 from Table 1.
Table 1
(A, 1, X),
(A, 2, Y),
(A, 3, Z),
(A, , X),
(B, 1, X),
(B, 2, Z),
(B, 3, Y),
Table 2
(A, 1),
(A, 2),
(A, 3),
(A, 5),
(B, 1),
(B, 2),
(B, 3),
(B, 5)
I want to get a return of
(A, 1, X),
(A, 2, Y),
(A, 3, Z),
(A, 5, X),
(B, 1, X),
(B, 2, Z),
(B, 3, Y),
(B, 5, NULL)
Code:
CREATE TABLE #table1 (letter1 CHAR(1), num1 INT, letter2 CHAR(1))
CREATE TABLE #table2 (letter1 CHAR(1), num1 INT)
INSERT INTO #table1 VALUES
('A', 1, 'X'),
('A', 2, 'Y'),
('A', 3, 'Z'),
('A', null, 'X'),
('B', 1, 'X'),
('B', 2, 'Y'),
('B', 3, 'Z')
INSERT INTO #table2 VALUES
('A', 1),
('A', 2),
('A', 3),
('A', 5),
('B', 1),
('B', 2),
('B', 3),
('B', 5)
SELECT t2.*,
COALESCE(
(SELECT TOP 1 letter2 FROM #table1 WHERE letter1 = t2.letter1 AND num1 = t2.num1),
(SELECT TOP 1 letter2 FROM #table1 WHERE letter1 = t2.letter1 AND num1 IS NULL),
0
) AS missing_letter
FROM #table2 t2
Perhaps you need:
select t1.*, t2.*
from table1 t1 outer apply
( select top (1) t2.*
from table2 t2
where t2.col1 = t1.col1 and t1.col2 in ('', t2.col2)
order by t2.col2 desc
) t2;
If I understand correctly, this has less to do with coalesce() and more to do with the joins:
select t2.*, coalesce(t1.letter2, t1def.letter2) as letter2
from table2 t2 left join
table1 t1
on t2.letter1 = t1.letter1 and t2.num1 = t1.num1 left join
table1 t1def
on t2.letter1 = t1def.letter1 and t1def.num1 is null;
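A minimal sketch of the double LEFT JOIN on SQLite via Python's `sqlite3`, using the data from the question's code (SQLite stands in for SQL Server, an assumption). The second join to `t1def` supplies the fallback row whose num1 is NULL, and COALESCE only ever sees character values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (letter1 TEXT, num1 INTEGER, letter2 TEXT);
    CREATE TABLE table2 (letter1 TEXT, num1 INTEGER);
    INSERT INTO table1 VALUES
        ('A', 1, 'X'), ('A', 2, 'Y'), ('A', 3, 'Z'), ('A', NULL, 'X'),
        ('B', 1, 'X'), ('B', 2, 'Y'), ('B', 3, 'Z');
    INSERT INTO table2 VALUES
        ('A', 1), ('A', 2), ('A', 3), ('A', 5),
        ('B', 1), ('B', 2), ('B', 3), ('B', 5);
""")

# Exact (letter1, num1) match first; otherwise the letter's NULL-num1 row.
rows = conn.execute("""
    SELECT t2.letter1, t2.num1, COALESCE(t1.letter2, t1def.letter2) AS letter2
    FROM table2 t2
    LEFT JOIN table1 t1
      ON t2.letter1 = t1.letter1 AND t2.num1 = t1.num1
    LEFT JOIN table1 t1def
      ON t2.letter1 = t1def.letter1 AND t1def.num1 IS NULL
    ORDER BY t2.letter1, t2.num1
""").fetchall()
print(rows)
```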
The problem here is your data type. COALESCE is shorthand for a CASE expression. For example, COALESCE('a',1,'c') is shorthand for:
CASE WHEN 'a' IS NOT NULL THEN 'a'
WHEN 1 IS NOT NULL THEN 1
ELSE 'c'
END
The documentation (COALESCE (Transact-SQL)) describes this as well:
The COALESCE expression is a syntactic shortcut for the CASE
expression. That is, the code COALESCE(expression1,...n) is
rewritten by the query optimizer as the following CASE expression:
CASE
WHEN (expression1 IS NOT NULL) THEN expression1
WHEN (expression2 IS NOT NULL) THEN expression2
...
ELSE expressionN
END
A CASE expression follows data type precedence, and int has a higher data type precedence than varchar; thus everything is implicitly cast to an int. This is why both the COALESCE and the CASE expression fail: neither 'a' nor 'c' can be converted to an int.
You'll therefore need to explicitly CONVERT your int to a character type:
COALESCE('a',CONVERT(char(1),1),'c')
The documentation (cited above), however, also goes on to state:
This means that the input values (expression1, expression2,
expressionN, etc.) are evaluated multiple times. Also, in compliance
with the SQL standard, a value expression that contains a subquery is
considered non-deterministic and the subquery is evaluated twice. In
either case, different results can be returned between the first
evaluation and subsequent evaluations.
For example, when the code COALESCE((subquery), 1) is executed, the
subquery is evaluated twice. As a result, you can get different
results depending on the isolation level of the query. For example,
the code can return NULL under the READ COMMITTED isolation level in a
multi-user environment. To ensure stable results are returned, use the
SNAPSHOT ISOLATION isolation level, or replace COALESCE with the
ISNULL function.
Considering you are using a subquery, a (nested) ISNULL might be the better choice here.
It's worth noting, since people often confuse the two because they are functionally similar, that COALESCE and ISNULL do not behave the same. COALESCE uses data type precedence, whereas ISNULL implicitly casts the second value to the data type of the first parameter. Thus ISNULL('a',1) works fine, but COALESCE('a',1) does not.
Just change the zero to a NULL. All arguments of a COALESCE are converted to one common data type, and the int 0 forces the letters to be converted to int:
SELECT t2.*,
COALESCE(
(SELECT TOP 1 letter2 FROM #table1 WHERE letter1 = t2.letter1 AND num1 = t2.num1),
(SELECT TOP 1 letter2 FROM #table1 WHERE letter1 = t2.letter1 AND num1 IS NULL),
null
) AS missing_letter
FROM #table2 t2
The query works if the 0 in the COALESCE is replaced by '0'.
That way the COALESCE doesn't contain mixed data types.
SELECT t2.*,
COALESCE(
(SELECT TOP 1 letter2 FROM #table1 t1 WHERE t1.letter1 = t2.letter1 AND t1.num1 = t2.num1),
(SELECT TOP 1 letter2 FROM #table1 t1 WHERE t1.letter1 = t2.letter1 AND t1.num1 IS NULL),
'0'
) AS missing_letter
FROM #table2 t2
ORDER BY t2.letter1, t2.num1;
You can also avoid retrieving data from table1 twice by using an OUTER APPLY.
Since the expected results have a NULL for ('B',5), the COALESCE isn't even needed this way.
SELECT t2.letter1, t2.num1, t1.letter2 AS missing_letter
FROM #table2 AS t2
OUTER APPLY (
select top 1 t.letter2
from #table1 AS t
where t.letter1 = t2.letter1
and (t.num1 is null or t.num1 = t2.num1)
order by t.num1 desc
) AS t1
ORDER BY t2.letter1, t2.num1;
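SQLite has no OUTER APPLY, so this sketch (SQLite via Python's `sqlite3`, an assumption) swaps in a correlated scalar subquery with `ORDER BY ... LIMIT 1`, which plays the same role when only one column is needed. Like SQL Server, SQLite treats NULL as the lowest value, so `ORDER BY t.num1 DESC` prefers an exact num1 match over the NULL fallback row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (letter1 TEXT, num1 INTEGER, letter2 TEXT);
    CREATE TABLE table2 (letter1 TEXT, num1 INTEGER);
    INSERT INTO table1 VALUES
        ('A', 1, 'X'), ('A', 2, 'Y'), ('A', 3, 'Z'), ('A', NULL, 'X'),
        ('B', 1, 'X'), ('B', 2, 'Y'), ('B', 3, 'Z');
    INSERT INTO table2 VALUES
        ('A', 1), ('A', 2), ('A', 3), ('A', 5),
        ('B', 1), ('B', 2), ('B', 3), ('B', 5);
""")

rows = conn.execute("""
    SELECT t2.letter1, t2.num1,
           (SELECT t.letter2
            FROM table1 t
            WHERE t.letter1 = t2.letter1
              AND (t.num1 IS NULL OR t.num1 = t2.num1)
            -- NULLs sort last under DESC, so an exact match wins
            ORDER BY t.num1 DESC
            LIMIT 1) AS missing_letter
    FROM table2 t2
    ORDER BY t2.letter1, t2.num1
""").fetchall()
print(rows)
```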
Result:
letter1 num1 missing_letter
------- ---- --------------
A 1 X
A 2 Y
A 3 Z
A 5 X
B 1 X
B 2 Y
B 3 Z
B 5 NULL

Update column based on IF Else Condition

I have two tables A and B
Table A
ID_number as PK
first_name,
L_Name
Table B
ID_number,
Email_id,
Flag
I have several people who have multiple email IDs and are already flagged as X in table B.
I am trying to find the list of people who have one or more email IDs but were never flagged.
For example, John Clark might have 2 emails in table B but was never flagged.
Simply use not exists:
select a.*
from a
where not exists (select 1
from b
where b.id_number = a.id_number and b.flag = 'X'
);
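A minimal sketch of the NOT EXISTS query on SQLite via Python's `sqlite3`. The people and email addresses are hypothetical. Note that someone with no rows in B at all (Sam here) is also returned, since nothing flags them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_number INTEGER PRIMARY KEY, first_name TEXT, l_name TEXT);
    CREATE TABLE b (id_number INTEGER, email_id TEXT, flag TEXT);
    -- Hypothetical sample data.
    INSERT INTO a VALUES (1, 'John', 'Clark'), (2, 'Jane', 'Doe'), (3, 'Sam', 'Lee');
    INSERT INTO b VALUES
        (1, 'jclark@example.com', NULL), (1, 'john.c@example.com', NULL),
        (2, 'jdoe@example.com', 'X'), (2, 'jane.d@example.com', NULL);
""")

# People with no B row flagged 'X' (including people with no B rows at all).
rows = conn.execute("""
    SELECT a.*
    FROM a
    WHERE NOT EXISTS (SELECT 1
                      FROM b
                      WHERE b.id_number = a.id_number AND b.flag = 'X')
    ORDER BY a.id_number
""").fetchall()
print(rows)
```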
You may want to perform an update, but your question seems to be only about selecting (probably to update based on select). It should be something like this:
SELECT A.L_Name
FROM A
WHERE NOT EXISTS (
SELECT 1
FROM B
WHERE B.ID_number = A.ID_number AND B.Flag = 'X'
)
Or the LEFT JOIN version:
SELECT A.L_Name
FROM A
LEFT JOIN B ON B.ID_number = A.ID_number AND B.Flag = 'X'
WHERE B.ID_number IS NULL
Usually, the first version is faster than the second one.
Forget Table A...
SELECT DISTINCT ID_number FROM table_b t1
WHERE NOT EXISTS(
SELECT NULL FROM table_b t2 WHERE t1.ID_number=t2.ID_number AND t2.flag='X'
)
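Querying table B alone only ever returns IDs that actually have email rows. A minimal sketch on SQLite via Python's `sqlite3` with hypothetical data, where ID 1 has two unflagged emails and ID 2 has one flagged email:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_b (id_number INTEGER, email_id TEXT, flag TEXT);
    -- Hypothetical sample data.
    INSERT INTO table_b VALUES
        (1, 'jclark@example.com', NULL), (1, 'john.c@example.com', NULL),
        (2, 'jdoe@example.com', 'X'), (2, 'jane.d@example.com', NULL);
""")

# IDs that appear in table_b but have no row flagged 'X'.
rows = conn.execute("""
    SELECT DISTINCT id_number
    FROM table_b t1
    WHERE NOT EXISTS (SELECT NULL
                      FROM table_b t2
                      WHERE t1.id_number = t2.id_number AND t2.flag = 'X')
    ORDER BY id_number
""").fetchall()
print(rows)
```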
Judging by your responses in the comments, I believe this is what you are looking for:
--drop table update_test;
create table update_test
(
id_num number,
email_id number,
flag varchar2(1) default null
);
insert into update_test values (1, 1, null);
insert into update_test values (1, 2, null);
insert into update_test values (2, 3, null);
insert into update_test values (2, 7, null);
insert into update_test values (3, 2, null);
insert into update_test values (3, 3, 'X');
insert into update_test values (3, 7, null);
select * from update_test;
select id_num, min(email_id)
from update_test
group by id_num;
update update_test ut1
set flag = case
when email_id = (
select min(email_id)
from update_test ut2
where ut2.id_num = ut1.id_num
) then 'X'
else null end
where id_num not in (
select id_num
from update_test
where Flag is not null);
The last UPDATE statement sets the Flag field on the record with the lowest email_id in each id_num group. If any record in an id_num group already has the Flag field set, that group is skipped.
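A minimal sketch of the same UPDATE on SQLite via Python's `sqlite3` (SQLite stands in for Oracle, an assumption). The UPDATE is written here without the `ut1` alias, so the correlated subquery refers to the outer row as `update_test.id_num`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE update_test (id_num INTEGER, email_id INTEGER, flag TEXT DEFAULT NULL);
    INSERT INTO update_test VALUES
        (1, 1, NULL), (1, 2, NULL),
        (2, 3, NULL), (2, 7, NULL),
        (3, 2, NULL), (3, 3, 'X'), (3, 7, NULL);
""")

# Flag the lowest email_id in each group that has no flag yet;
# groups that already contain a flagged row are left untouched.
conn.execute("""
    UPDATE update_test
    SET flag = CASE
                 WHEN email_id = (SELECT MIN(ut2.email_id)
                                  FROM update_test ut2
                                  WHERE ut2.id_num = update_test.id_num)
                 THEN 'X'
                 ELSE NULL
               END
    WHERE id_num NOT IN (SELECT id_num
                         FROM update_test
                         WHERE flag IS NOT NULL)
""")

rows = conn.execute(
    "SELECT id_num, email_id, flag FROM update_test ORDER BY id_num, email_id"
).fetchall()
print(rows)
```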