There is some legacy code that I am convinced could be replaced with a more elegant and robust solution.
A series of flags is used to identify the classification of a row. A simplified example:
case when flag1 is True
and flag2 is True
and flag3 is True then 'ABC',
case when flag1 is False
and flag2 is True
and flag3 is True then 'DEF',
The challenge arises because not all flags are used in every case statement. The code continues:
case when flag3 is True
and flag4 is False then 'CEA',
etc.
I had thought of having a reference table which would have all classification combinations and could then be joined to the flags to get the classifications.
flag1 | flag2 | flag3 | flag4 | classification
True  | True  | True  |       | ABC
False | True  | True  |       | DEF
...   | ...   | ...   | ...   | ...
      |       | True  | False | CEA
Because of the way I've had the joins working, all flags are required, and I have not found a way to join on just flag1, flag2, and flag3 for the first case and on just flag3 and flag4 for the last case, etc.
It is acceptable for flag4 to be any value for the first two cases ('ABC' and 'DEF'), and so on for other cases where the flags are not explicitly defined.
The code I'm looking at has nearly 10000 lines of these case statements. I have found no rules that simplify the classifications enough to generate them in some other way.
Is there an elegant way to replace repetitive case statements as seen in this example?
I believe a reference table or similar solution would be ideal, as it would avoid code changes if any cases are added or modified.
I'm not proficient in BigQuery, but in Oracle I would do something like this:
from
data d
join
classification c
on coalesce(c.flag1, d.flag1, False) = coalesce(d.flag1, False)
and coalesce(c.flag2, d.flag2, False) = coalesce(d.flag2, False)
...
The idea is if the classification reference table doesn't care about a flag, you just compare the base table's flag against itself. The False defaults are there to handle cases where the base table has a null flag.
The main thing to be careful of when joining like this is that you lose the "first match" short circuit of a CASE statement, so a base row could easily end up joining to multiple reference rows. You'll want a priority column on the reference table so you can pick the "first match" after joining.
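The wildcard match plus priority idea can be sketched outside SQL as well. The following Python is only an illustration of the logic, not BigQuery or Oracle code: the RULES list, its priority values, and the classify function are hypothetical names, None in a rule stands for "this rule does not care about the flag", and a null flag in the data is treated as False, as in the join above.

```python
from typing import Optional

# Hypothetical reference rows: (priority, flag1, flag2, flag3, flag4, classification).
# None in a flag position means the rule ignores that flag.
RULES = [
    (1, True, True, True, None, "ABC"),
    (2, False, True, True, None, "DEF"),
    (3, None, None, True, False, "CEA"),
]


def classify(flags: tuple) -> Optional[str]:
    """Return the label of the matching rule with the smallest priority number."""
    # Mirror the coalesce(..., False) default: a null data flag counts as False.
    data = [False if f is None else f for f in flags]
    matches = [
        (priority, label)
        for priority, *rule_flags, label in RULES
        if all(r is None or r == d for r, d in zip(rule_flags, data))
    ]
    # Several rules may match; the priority column restores "first match" behaviour.
    return min(matches)[1] if matches else None


print(classify((False, True, True, False)))  # matches both DEF and CEA; DEF has priority
```

Without the priority value, the row (False, True, True, False) would match two reference rows, which is exactly the duplication the join can produce.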
Consider below for BigQuery
with classifications as (
select True flag1, True flag2, True flag3, null flag4, 'ABC' classification union all
select False, True, True, null, 'DEF' union all
select null, null, True, False, 'CEA'
)
select *,
(
select classification
from classifications
where if(flag1 is null, true, flag1 = t.flag1)
and if(flag2 is null, true, flag2 = t.flag2)
and if(flag3 is null, true, flag3 = t.flag3)
and if(flag4 is null, true, flag4 = t.flag4)
limit 1
) as classification
from your_table t
You can test it using the dummy data below for your_table:
with your_table as (
select true flag1, true flag2, true flag3, true flag4 union all
select false, true, true, false union all
select false, false, true, false
), classifications as (
select True flag1, True flag2, True flag3, null flag4, 'ABC' classification union all
select False, True, True, null, 'DEF' union all
select null, null, True, False, 'CEA'
)
select *,
(
select classification
from classifications
where if(flag1 is null, true, flag1 = t.flag1)
and if(flag2 is null, true, flag2 = t.flag2)
and if(flag3 is null, true, flag3 = t.flag3)
and if(flag4 is null, true, flag4 = t.flag4)
limit 1
) as classification
from your_table t
with output
You can use the below solution to get one row per classification of your choice. Only thing is: you have to keep the classification rules hardcoded inside the query (which may or may not be fine depending on your code):
-- this is your raw data
with data as (
select true as flag1, true as flag2, true as flag3, true as flag4
union all
select false as flag1, true as flag2, true as flag3, false as flag4
union all
select null as flag1, false as flag2, false as flag3, true as flag4
)
select distinct p.flag1, p.flag2, p.flag3, p.flag4, p.classification from data, unnest([
struct(
flag1 as flag1,
flag2 as flag2,
flag3 as flag3,
flag4 as flag4,
if(flag1 is true and flag2 is true and flag3 is true, 'ABC', null) as classification
), -- rule#1
struct(
flag1 as flag1,
flag2 as flag2,
flag3 as flag3,
flag4 as flag4,
if(flag1 is false and flag2 is true and flag3 is true, 'DEF', null) as classification
), -- rule#2
struct(
flag1 as flag1,
flag2 as flag2,
flag3 as flag3,
flag4 as flag4,
if(flag3 is true and flag4 is false, 'CEA', null) as classification
), -- rule#3
struct(
flag1 as flag1,
flag2 as flag2,
flag3 as flag3,
flag4 as flag4,
if(flag2 is true, 'XYZ', null) as classification
) -- rule#4
]) as p
where p.classification is not null
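To make the behaviour of this approach concrete, here is a rough Python sketch of the same idea: every rule is evaluated against every row, and all matching classifications are kept, so a row can yield several results. The RULES list and the classifications function are hypothetical names introduced for illustration; the predicates mirror rule#1 to rule#4 above.

```python
# Hypothetical rule list mirroring rule#1..rule#4: (label, predicate on a row dict).
RULES = [
    ("ABC", lambda r: r["flag1"] is True and r["flag2"] is True and r["flag3"] is True),
    ("DEF", lambda r: r["flag1"] is False and r["flag2"] is True and r["flag3"] is True),
    ("CEA", lambda r: r["flag3"] is True and r["flag4"] is False),
    ("XYZ", lambda r: r["flag2"] is True),
]


def classifications(row: dict) -> list:
    """Return every label whose rule matches the row (possibly none, possibly several)."""
    return [label for label, predicate in RULES if predicate(row)]


# The same three rows as the "data" CTE above.
data = [
    {"flag1": True, "flag2": True, "flag3": True, "flag4": True},
    {"flag1": False, "flag2": True, "flag3": True, "flag4": False},
    {"flag1": None, "flag2": False, "flag3": False, "flag4": True},
]

for row in data:
    print(row, classifications(row))
```

Unlike a CASE expression, nothing here stops at the first match, so the second row is reported under three classifications.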
Input Data:
columnA columnB
true false
true true
false false
false true
Problem Statement:
From above mentioned data, I want to use different columns to get the result.
Expected Output:
columnA columnB result
true false A
true true B
false false C
false true C
Tried SQL Query:
SELECT
columnA,
columnB,
CASE columnA WHEN 'true' AND columnB ='false' THEN 'A'
WHEN 'true' AND columnB ='true' THEN 'B'
ELSE 'C' END AS result
It seems unable to use different columns in CASE expression. Is there any solution?
Yes, you can use different columns, but you need to rewrite your query so that each WHEN contains a complete condition:
SELECT
columnA,
columnB,
CASE WHEN columnA = 'true' AND columnB ='false' THEN 'A'
WHEN columnA = 'true' AND columnB ='true' THEN 'B'
ELSE 'C' END AS result
FROM mytable
Consider below "version"
select *,
case (columnA, columnB)
when (true, false) then 'A'
when (true, true) then 'B'
else 'C'
end result
from your_table
if applied to sample data in your question - output is
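The tuple form of CASE amounts to a lookup keyed on the pair of column values with a default. A small Python sketch of that mapping, checked against the four rows from the question (RESULT_BY_PAIR and result are names made up for this illustration):

```python
# Hypothetical lookup table: (columnA, columnB) -> result; anything else falls back to "C".
RESULT_BY_PAIR = {
    (True, False): "A",
    (True, True): "B",
}


def result(column_a: bool, column_b: bool) -> str:
    """Return the label for a pair of flags, defaulting to 'C' like the ELSE branch."""
    return RESULT_BY_PAIR.get((column_a, column_b), "C")


rows = [(True, False), (True, True), (False, False), (False, True)]
for a, b in rows:
    print(a, b, result(a, b))
```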
I've been stuck on this one for quite some time now and I can't figure it out.
Here's my problem:
I have two boolean columns, condition_1 and condition_2, and I want to create a third column, inc, whose value increments each time the following condition is met: condition_2 is false and lead(condition_1) over(partition by column_x order by column_y) is false.
The result would look something like this:
column_x column_y condition_1 condition_2 inc
A 12/03/2020 true true 1
A 13/03/2020 true false 1
A 14/03/2020 false false 2
A 15/03/2020 false true 3
A 16/03/2020 true false 3
A 17/03/2020 false true 4
Doing something like
if(condition_2 is false and lead(condition_1) over(partition by column_x order by column_y) is false, lag(inc) over(partition by column_x order by column_y) + 1, lag(inc) over(partition by column_x order by column_y)) inc
obviously doesn't work, since inc doesn't exist yet at the time of the query, and doing
if(condition_2 is false and lead(condition_1) over(partition by column_x order by column_y) is false, + 1, + 0) inc
won't be incremental, as it will reset to 0 for each row.
Does someone have an idea?
Thanks a lot!
You describe this formula:
select t.*,
countif( (not condition_2) and (not next_1)) over (partition by column_x order by column_y)
from (select t.*,
lead(condition_1) over (partition by column_x order by column_y) as next_1
from t
) t;
If you want the numbers to start at 1, then you need to add "1" to the value.
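For readers who want to trace the window logic by hand, here is a rough Python equivalent of the query above: look one row ahead within each column_x partition, then keep a running count of the rows where the condition holds. The function name running_inc and the ISO-formatted dates are assumptions made for this sketch, and it assumes column_y is unique within a partition.

```python
from itertools import groupby
from operator import itemgetter


def running_inc(rows: list) -> list:
    """Add an 'inc' key: running count of rows where condition_2 is False
    and the next row's condition_1 (within the same column_x) is False."""
    out = []
    ordered = sorted(rows, key=itemgetter("column_x", "column_y"))
    for _, group in groupby(ordered, key=itemgetter("column_x")):
        part = list(group)
        count = 0
        for i, row in enumerate(part):
            # lead(condition_1): None on the last row of the partition.
            next_1 = part[i + 1]["condition_1"] if i + 1 < len(part) else None
            if row["condition_2"] is False and next_1 is False:
                count += 1
            out.append({**row, "inc": count})
    return out


# Sample rows from the question, with dates rewritten as ISO strings so they sort correctly.
sample = [
    {"column_x": "A", "column_y": "2020-03-12", "condition_1": True, "condition_2": True},
    {"column_x": "A", "column_y": "2020-03-13", "condition_1": True, "condition_2": False},
    {"column_x": "A", "column_y": "2020-03-14", "condition_1": False, "condition_2": False},
    {"column_x": "A", "column_y": "2020-03-15", "condition_1": False, "condition_2": True},
    {"column_x": "A", "column_y": "2020-03-16", "condition_1": True, "condition_2": False},
    {"column_x": "A", "column_y": "2020-03-17", "condition_1": False, "condition_2": True},
]

for row in running_inc(sample):
    print(row["column_y"], row["inc"])
```

The count starts at 0 here, matching the query as written; adding 1 shifts the whole column.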
I have a table where I am determining whether a person's ID number exists across multiple databases. If the ID exists in only one database, then I would like to add another column that labels the person as "UNIQUE"; otherwise, it should be labeled as "NOT UNIQUE".
My query thus far is set up like this:
/* CTE that creates a long column of all distinct PersonID's across three databases */
WITH cte as
(SELECT DISTINCT t1.*
FROM
(SELECT PersonID FROM DB_1.dbo.Persons
UNION
SELECT PersonID FROM DB_2.dbo.Persons
UNION
SELECT PersonID FROM DB_3.dbo.Persons)
t1)
/* Use CASE WHEN statements to check if Person exists in three other tables in DB_1, DB_2, and DB_3 */
SELECT PersonID,
CASE WHEN PersonID IN (SELECT PersonID FROM DB_1.dbo.Table_1
UNION
SELECT PersonID FROM DB_1.dbo.Table_2
UNION
SELECT PersonID FROM DB_1.dbo.Table_3)
THEN 'TRUE'
ELSE 'FALSE'
END AS IN_DB_1,
CASE WHEN PersonID IN (SELECT PersonID FROM DB_2.dbo.Table_1
UNION
SELECT PersonID FROM DB_2.dbo.Table_2
UNION
SELECT PersonID FROM DB_2.dbo.Table_3)
THEN 'TRUE'
ELSE 'FALSE'
END AS IN_DB_2,
CASE WHEN PersonID IN (SELECT PersonID FROM DB_3.dbo.Table_1
UNION
SELECT PersonID FROM DB_3.dbo.Table_2
UNION
SELECT PersonID FROM DB_3.dbo.Table_3)
THEN 'TRUE'
ELSE 'FALSE'
END AS IN_DB_3
FROM cte
The results look like this:
PersonID IN_DB_1 IN_DB_2 IN_DB_3
---------|----------|----------|----------|
001 TRUE FALSE FALSE
002 FALSE TRUE TRUE
003 TRUE FALSE FALSE
004 FALSE TRUE FALSE
005 TRUE FALSE TRUE
As can be seen, PersonID numbers 001, 003, and 004 appear only in one database.
I would like to add a fifth column called "PID_UNIQUE" that counts the number of "TRUE" text values across the columns and specifies whether the person is unique.
It should look like this:
PersonID IN_DB_1 IN_DB_2 IN_DB_3 PID_UNIQUE
---------|----------|----------|----------|-----------|
001 TRUE FALSE FALSE UNIQUE
002 FALSE TRUE TRUE NOT UNIQUE
003 TRUE FALSE FALSE UNIQUE
004 FALSE TRUE FALSE UNIQUE
005 TRUE FALSE TRUE NOT UNIQUE
I assume this would be set up using another CASE WHEN expression. I am a little stuck as to how I could write that out to count across the three "IN_DB_no" columns.
I tried this:
CASE WHEN COUNT('TRUE') = 1
THEN 'UNIQUE'
ELSE 'NOT UNIQUE'
END AS PID_UNIQUE
However, it returned a column where all records were unique, which is not what I need.
I have a table where I am determining whether a person's ID number exists across multiple databases.
Your sample query references many more tables than this suggests. Hence, it seems much more complicated than necessary.
Let me assume that there are really three tables, one in each database. I see just an aggregation after UNION ALL:
SELECT PersonID, MAX(in_1), MAX(in_2), MAX(in_3),
(CASE WHEN MAX(in_1) + MAX(in_2) + MAX(in_3) = 1 THEN 'UNIQUE'
ELSE 'NOT UNIQUE'
END) as pid_Unique
FROM ((SELECT DISTINCT PersonID, 1 as in_1, 0 as in_2, 0 as in_3
FROM DB_1.dbo.Persons
) UNION ALL
(SELECT DISTINCT PersonID, 0 as in_1, 1 as in_2, 0 as in_3
FROM DB_2.dbo.Persons
) UNION ALL
(SELECT DISTINCT PersonID, 0 as in_1, 0 as in_2, 1 as in_3
FROM DB_3.dbo.Persons
)
) p
GROUP BY PersonId;
I figured out a solution that works for me using the CROSS APPLY operator, along with a CASE / WHEN expression.
Basically, I added an additional column to the table I already made.
The query looked like this:
SELECT * FROM My_New_DB.dbo.My_New_Tbl
CROSS APPLY (
SELECT CASE WHEN 1 = (SELECT COUNT(*)
FROM (VALUES (IN_DB_1), (IN_DB_2), (IN_DB_3)) C (Val)
WHERE Val = 'TRUE')
THEN 'UNIQUE'
ELSE 'NOT UNIQUE'
END AS UNIQUE_ID
) A
Simply put, when exactly one of the three IN_DB columns holds 'TRUE' (the count equals 1), the ID is unique.
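The key point is that the count runs across the columns of a single row, whereas COUNT('TRUE') counts rows and never looks at the column values. A minimal Python sketch of the per-row check, using the sample results from the question (pid_unique is a hypothetical helper name):

```python
def pid_unique(in_db_flags: list) -> str:
    """Label a person UNIQUE when exactly one IN_DB_* value is the text 'TRUE'."""
    true_count = sum(1 for value in in_db_flags if value == "TRUE")
    return "UNIQUE" if true_count == 1 else "NOT UNIQUE"


# PersonID -> (IN_DB_1, IN_DB_2, IN_DB_3), copied from the sample results.
results = {
    "001": ("TRUE", "FALSE", "FALSE"),
    "002": ("FALSE", "TRUE", "TRUE"),
    "003": ("TRUE", "FALSE", "FALSE"),
    "004": ("FALSE", "TRUE", "FALSE"),
    "005": ("TRUE", "FALSE", "TRUE"),
}

for person_id, flags in results.items():
    print(person_id, pid_unique(list(flags)))
```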
I had to create 5 flags in my dataset, per member record. Now the final requirement is to sum all the flags for each member.
For example, one member has Y,Y,,,Y, i.e. 3 flags set to Y. I need the sum of these (3) in my last created sum field.
I am doing this in Oracle (Proc SQL in SAS).
Please help somebody!!
Thanks a lot..
Use CATX to combine all into one field
Use COUNTC to count them all.
select countc(catx(', ', flag1, flag2, flag3, flag4, flag5), 'Y') as num_y
One method uses case:
select t.*,
((case when flag1 = 'Y' then 1 else 0 end) +
(case when flag2 = 'Y' then 1 else 0 end) +
(case when flag3 = 'Y' then 1 else 0 end) +
(case when flag4 = 'Y' then 1 else 0 end) +
(case when flag5 = 'Y' then 1 else 0 end)
) as num_ys
from t;
Concatenate the 5 flags, then use regexp_count() to count the "Y"s. The following also allows for upper/lower case, if that is an issue.
select regexp_count(flag1||flag2||flag3||flag4||flag5,'[yY]')
Oracle 11g and up (regexp_count was added in 11g)
If possible, I'd recommend reconfiguring the table to store 1/0 instead of Y/N.
If not, Oracle has regexp_count: concatenate the fields, then apply regexp_count.
user9026530 specifically said he is using SAS. Here is a pure SQL solution involving no SAS functions:
WITH
aset
AS
(SELECT 'Y' flag1
, 'N' flag2
, 'Y' flag3
, 'Y' flag4
, NULL flag5
FROM DUAL)
-- NVL turns the NULL that LENGTH returns when no flag is 'Y' into 0
SELECT aset.*, NVL (LENGTH (REGEXP_REPLACE (flag1 || flag2 || flag3 || flag4 || flag5, '[^Y]', NULL)), 0) flag_count
FROM aset
FLAG1 FLAG2 FLAG3 FLAG4 FLAG5 FLAG_COUNT
Y N Y Y 3
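All of the answers above compute the same quantity: the number of flags equal to 'Y', with missing flags contributing nothing. As a quick cross-check of that logic, here is a small Python sketch (count_y is a hypothetical helper; None stands in for a NULL flag):

```python
def count_y(flags: list) -> int:
    """Count how many flags are exactly 'Y'; None and any other value count as 0."""
    return sum(1 for flag in flags if flag == "Y")


print(count_y(["Y", "N", "Y", "Y", None]))   # the row used in the query above
print(count_y(["Y", "Y", None, None, "Y"]))  # the Y,Y,,,Y example from the question
```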
I have a table with these columns:
flag1
flag2
flag1_column1
flag1_column2
flag1_column3
flag2_column1
flag2_column2
flag2_column3
my requirement is:
If both flag1 and flag2 have value true then in result I should get two records:
flag1, flag1_column1, flag1_column2, flag1_column3
flag2, flag2_column1, flag2_column2, flag2_column3
my second requirement:
If flag1 is null or 0 then I should get only one record:
flag2 ,flag2_column1, flag2_column2, flag2_column3
my third requirement:
If flag2 is null or 0 then I should get only one record:
flag1 , flag1_column1, flag1_column2, flag1_column3
This is query that returns desired result:
select flag1 as flag,
flag1_column1 as c1,
flag1_column2 as c2,
flag1_column3 as c3
from t where flag1=1
union
select flag2 as flag,
flag2_column1 as c1,
flag2_column2 as c2,
flag2_column3 as c3
from t where flag2=1
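The query turns each wide row into up to two narrow records, one per flag that is set. A rough Python sketch of that reshaping, assuming each row is a dict keyed by the column names listed above (unpivot is a name invented for this illustration):

```python
def unpivot(row: dict) -> list:
    """Return one (flag, c1, c2, c3) record for each of flag1/flag2 that equals 1."""
    records = []
    for prefix in ("flag1", "flag2"):
        # A flag that is None or 0 produces no record, as in the WHERE clauses.
        if row.get(prefix) == 1:
            records.append((
                row[prefix],
                row[f"{prefix}_column1"],
                row[f"{prefix}_column2"],
                row[f"{prefix}_column3"],
            ))
    return records


row = {
    "flag1": 1, "flag1_column1": "a1", "flag1_column2": "a2", "flag1_column3": "a3",
    "flag2": 1, "flag2_column1": "b1", "flag2_column2": "b2", "flag2_column3": "b3",
}
print(unpivot(row))
```

One difference worth keeping in mind: UNION removes duplicate records, so two identical output rows collapse into one in the SQL version, while this sketch keeps both.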