Nested case with multiple sub conditions - sql

I'm having trouble understanding how to nest CASE statements properly.
(MSSQL Server 2012)
Given the following table, the column StatusMissing is what I want to create:
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
| a1 | | a2 | | a3 | | b1 | | c1 | | d2 | | StatusMissing |
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | OK | | OK | | OK | | AllOK |
| NULL | | NULL | | OK | | OK | | OK | | OK | | As |
| OK | | NULL | | OK | | OK | | OK | | OK | | As |
| OK | | OK | | NULL | | OK | | OK | | OK | | As |
| OK | | OK | | OK | | NULL | | OK | | OK | | B |
| OK | | OK | | OK | | OK | | NULL | | OK | | C |
| OK | | OK | | OK | | OK | | OK | | NULL | | D |
| NULL | | OK | | OK | | NULL | | NULL | | OK | | ABC |
| NULL | | OK | | OK | | OK | | NULL | | NULL | | ACD |
| NULL | | OK | | OK | | NULL | | OK | | NULL | | ABD |
| NULL | | OK | | OK | | NULL | | NULL | | NULL | | ABCD |
| NULL | | OK | | OK | | OK | | NULL | | NULL | | ACD |
| OK | | OK | | OK | | NULL | | NULL | | OK | | BC |
| OK | | OK | | OK | | OK | | OK | | OK | | AllOK |
| OK | | NULL | | OK | | OK | | NULL | | OK | | AC |
| OK | | OK | | OK | | NULL | | OK | | NULL | | BD |
| OK | | OK | | OK | | OK | | NULL | | NULL | | CD |
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
First, to understand the concept of nesting I simplified the table:
+------+--+------+--+------+
| a1 | | a2 | | b1 |
+------+--+------+--+------+
| OK | | OK | | OK |
| OK | | OK | | NULL |
| OK | | NULL | | OK |
| NULL | | OK | | OK |
| NULL | | NULL | | OK |
| NULL | | OK | | NULL |
| OK | | NULL | | NULL |
+------+--+------+--+------+
Here are my attempts and the incorrect results they produce.
Query1
SELECT a1, a2, b1, 'StatusMissing' =
CASE
WHEN a1 IS NULL
THEN
CASE
WHEN a1 IS NULL
THEN
CASE
WHEN b1 IS NULL
THEN 'AB'
END
ELSE 'A'
END
WHEN b1 IS NULL
THEN 'B'
ELSE 'AllOK'
END
FROM Table;
Result1:
+------+--+------+--+------+--+---------------+
| a1 | | a2 | | b1 | | StatusMissing |
+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | AllOK |
| OK | | OK | | NULL | | B |
| OK | | NULL | | OK | | AllOK |
| NULL | | OK | | OK | | NULL |
| NULL | | NULL | | OK | | NULL |
| NULL | | OK | | NULL | | AB |
| OK | | NULL | | NULL | | B |
+------+--+------+--+------+--+---------------+
Query2 (Else as main)
SELECT a1, a2, b1, 'Status' =
CASE
WHEN a1 IS NOT NULL AND a2 IS NOT NULL AND b1 IS NOT NULL
THEN 'AllOK!'
ELSE
CASE
WHEN a2 IS NOT NULL OR a2 IS NOT NULL
THEN
CASE
WHEN b1 IS NULL
THEN 'AB'
END
WHEN b1 IS NULL
THEN 'B'
ELSE 'A'
END
END
FROM Table;
Result2
+------+--+------+--+------+--+---------------+
| a1 | | a2 | | b1 | | StatusMissing |
+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | AllOK |
| OK | | OK | | NULL | | AB |
| OK | | NULL | | OK | | A |
| NULL | | OK | | OK | | NULL |
| NULL | | NULL | | OK | | A |
| NULL | | OK | | NULL | | AB |
| OK | | NULL | | NULL | | B |
+------+--+------+--+------+--+---------------+
What the hell am I doing wrong?
I'm quite new to SQL, so if there is a proper function to do this I would appreciate the info!
EDIT:
What I mean is something like this, if it is possible in SQL:
Column StatusMissing = ' missing'
If(a1 == NULL) { StatusMissing += 'A'}
EDIT2:
The column StatusMissing IS NOT THERE!
I want to create it using the SQL statements like below.
SELECT .... Status =
So basically I only have A1,A2,B1 (in the simple table). Please don't get confused with the first Table. It's only there to SHOW HOW IT SHOULD look like.

For the simplified table, assuming the columns are nvarchar and that the table has a result column to hold the status, try a series of UPDATE statements:
UPDATE [dbo].[StatusMissing]
SET result='';
UPDATE [dbo].[StatusMissing]
SET result= CONCAT(result , 'A')
WHERE a1 is null or a2 is null;
UPDATE [dbo].[StatusMissing]
SET result= CONCAT(result , 'B')
WHERE b1 is null ;
UPDATE [dbo].[StatusMissing]
SET result= 'AllOK'
WHERE result ='';
This can be done in one step as well.
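A minimal sketch of what the one-step version could look like, run against an in-memory SQLite database from Python so it can be tried without a server. The table layout and the result column follow the UPDATE statements above; the use of SQLite is an assumption made for the demo. SQLite concatenates strings with ||, whereas SQL Server 2012 would use + or CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StatusMissing (a1 TEXT, a2 TEXT, b1 TEXT, result TEXT)")

# The seven rows of the simplified table from the question (None is NULL).
rows = [
    ("OK", "OK", "OK"),
    ("OK", "OK", None),
    ("OK", None, "OK"),
    (None, "OK", "OK"),
    (None, None, "OK"),
    (None, "OK", None),
    ("OK", None, None),
]
conn.executemany("INSERT INTO StatusMissing (a1, a2, b1) VALUES (?, ?, ?)", rows)

# Single UPDATE: each CASE contributes one letter or an empty string.
# NULLIF turns an empty result into NULL so that COALESCE can substitute 'AllOK'.
conn.execute("""
    UPDATE StatusMissing
    SET result = COALESCE(
        NULLIF(
            CASE WHEN a1 IS NULL OR a2 IS NULL THEN 'A' ELSE '' END ||
            CASE WHEN b1 IS NULL THEN 'B' ELSE '' END,
            ''),
        'AllOK')
""")

statuses = [r[0] for r in conn.execute("SELECT result FROM StatusMissing ORDER BY rowid")]
print(statuses)
```

Each group of columns gets its own independent CASE, which is why no nesting is needed: the letters are simply appended, much like the StatusMissing += 'A' idea in the question.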

I might suggest that you make two small modifications to your output:
Instead of "As", just say "A".
Instead of "AllOK", just leave the field blank.
With these modifications, the rules are pretty easy:
select t.*,
((case when a1 is null or a2 is null or a3 is null then 'A' else '' end) +
(case when b1 is null then 'B' else '' end) +
(case when c1 is null then 'C' else '' end) +
(case when d2 is null then 'D' else '' end)
) as StatusMissing
from table t;
If you do want your version, a subquery is perhaps the easiest way:
select t. . . .,
(case when StatusMissing = '' then 'AllOK'
when StatusMissing = 'A' then 'As'
else StatusMissing
end) as StatusMissing
from (select t.*,
((case when a1 is null or a2 is null or a3 is null then 'A' else '' end) +
(case when b1 is null then 'B' else '' end) +
(case when c1 is null then 'C' else '' end) +
(case when d2 is null then 'D' else '' end)
) as StatusMissing
from table t
) t

You can play with COALESCE and a couple of CASE conditions
SELECT a1,
a2,
a3,
b1,
c1,
d2,
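COALESCE(
CASE WHEN
b1 = 'OK'
AND c1 = 'OK'
AND d2 = 'OK'
AND (a1 IS NULL OR a2 IS NULL OR a3 IS NULL)
THEN 'As'
END, -- no ELSE here: NULL lets COALESCE fall through to the next argument
NULLIF(
CASE WHEN
(a1 IS NULL OR a2 IS NULL OR a3 IS NULL)
THEN 'A'
ELSE '' -- without this, NULL + 'B' would evaluate to NULL
END
+ CASE WHEN
b1 IS NULL
THEN 'B'
ELSE ''
END
+ CASE WHEN
c1 IS NULL
THEN 'C'
ELSE ''
END
+ CASE WHEN
d2 IS NULL
THEN 'D'
ELSE ''
END,
''), -- nothing missing: turn '' into NULL so that 'AllOK' is returned
'AllOK') AS 'StatusMissing'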
FROM Table;

Related

Spark DataFrame: Ignore columns with empty IDs in groupBy

I have a dataframe e.g. with this structure:
ID | Date | P1_ID | P2_ID | P3_ID | P1_A | P1_B | P2_A | ...
============================================================
1 | 123 | 1 | | | A1 | B1 | | ... <- only P1_x columns filled
1 | 123 | 2 | | | A2 | B2 | | ... <- only P1_x filled
1 | 123 | 3 | | | A3 | B3 | | ... <- only P1_x filled
1 | 123 | | 1 | | | | A4 | ... <- only P2_x filled
1 | 123 | | 2 | | | | A5 | ... <- only P2_x filled
1 | 123 | | | 1 | | | | ... <- only P3_x filled
I need to combine the rows that have the same ID, Date and Px_ID values, ignoring empty Px_ID values when comparing the key columns.
In the end I need a dataframe like this:
ID | Date | P1_ID | P2_ID | P3_ID | P1_A | P1_B | P2_A | ...
============================================================
1 | 123 | 1 | 1 | 1 | A1 | B1 | A4 | ...
1 | 123 | 2 | 2 | | A2 | B2 | A5 | ...
1 | 123 | 3 | | | A3 | B3 | | ...
Is this possible and how? Thank you!
I found a solution for this problem: Since the non-relevant x_ID columns are empty, one possible way is to create a new column combined_ID that contains a concatenation of all x_ID column values (this will only contain one value, since only one x_ID is not empty in each row):
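import org.apache.spark.sql.functions.{col, concat}
// Names must be mapped to Column objects; concat yields null if any input is null, so empty cells must be "" rather than null
val xIdArray = Seq("P1_ID", "P2_ID", "P3_ID").map(col)
myDF = myDF.withColumn("combined_ID", concat(xIdArray : _*))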
This changes the DF to following structure:
ID | Date | P1_ID | P2_ID | P3_ID | P1_A | P1_B | P2_A | ... | combined_ID
===========================================================================
1 | 123 | 1 | | | A1 | B1 | | ... | 1
1 | 123 | 2 | | | A2 | B2 | | ... | 2
1 | 123 | 3 | | | A3 | B3 | | ... | 3
1 | 123 | | 1 | | | | A4 | ... | 1
1 | 123 | | 2 | | | | A5 | ... | 2
1 | 123 | | | 1 | | | | ... | 1
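Now I can simply group my DF by ID, Date and combined_ID and aggregate all the relevant columns with, e.g., the max function to get the values of the non-empty cells:
import org.apache.spark.sql.functions.{col, max}
val groupByColumns : Seq[String] = Seq("ID", "Date", "combined_ID")
val aggColumns : Seq[String] = Seq("P1_ID", "P2_ID", "P3_ID", "P1_A", "P1_B", "P2_A", ...)
// agg expects Column expressions, so wrap each name in max(...) and keep the original column name
val aggExprs = aggColumns.map(c => max(col(c)).as(c))
myDF = myDF.groupBy(groupByColumns.head, groupByColumns.tail : _*).agg(aggExprs.head, aggExprs.tail : _*)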
Result:
ID | Date | combined_ID | P1_ID | P2_ID | P3_ID | P1_A | P1_B | P2_A | ...
===========================================================================
1 | 123 | 1 | 1 | 1 | 1 | A1 | B1 | A4 | ...
1 | 123 | 2 | 2 | 2 | | A2 | B2 | A5 | ...
1 | 123 | 3 | 3 | | | A3 | B3 | | ...
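To make the grouping logic easier to follow outside Spark, here is a rough plain-Python sketch of the same idea: build a combined key from the Px_ID columns, group on it, and keep the largest (that is, the non-empty) value per column. The row dictionaries and the combine_rows helper are illustrative assumptions, not part of the Spark solution, and empty cells are modelled as empty strings.

```python
from collections import defaultdict

ID_COLS = ["P1_ID", "P2_ID", "P3_ID"]
VALUE_COLS = ["P1_A", "P1_B", "P2_A"]


def make_row(p1="", p2="", p3="", p1_a="", p1_b="", p2_a=""):
    """Build one input row; ID and Date are fixed as in the example table."""
    return {"ID": 1, "Date": 123, "P1_ID": p1, "P2_ID": p2, "P3_ID": p3,
            "P1_A": p1_a, "P1_B": p1_b, "P2_A": p2_a}


rows = [
    make_row(p1="1", p1_a="A1", p1_b="B1"),
    make_row(p1="2", p1_a="A2", p1_b="B2"),
    make_row(p1="3", p1_a="A3", p1_b="B3"),
    make_row(p2="1", p2_a="A4"),
    make_row(p2="2", p2_a="A5"),
    make_row(p3="1"),
]


def combine_rows(rows):
    """Group rows by (ID, Date, combined_ID) and take the max of every other column."""
    groups = defaultdict(dict)
    for row in rows:
        # Only one Px_ID is filled per row, so the concatenation is that single value.
        combined_id = "".join(row[c] for c in ID_COLS)
        merged = groups[(row["ID"], row["Date"], combined_id)]
        for c in ID_COLS + VALUE_COLS:
            # "" sorts before any non-empty string, so max keeps the filled cell.
            merged[c] = max(merged.get(c, ""), row[c])
    return [{"ID": k[0], "Date": k[1], "combined_ID": k[2], **v}
            for k, v in sorted(groups.items())]


combined = combine_rows(rows)
for r in combined:
    print(r)
```

Note that this relies on each input row having exactly one non-empty Px_ID; if two were filled, the concatenated key would no longer match the other rows.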

Moving data to correct record

I have a table where the data needs to be corrected. Below is an example of one record. Basically, the value in the Selling row's close_units needs to be moved to the Agent to Agent Ref row's close_units. I have tried every approach I can think of, but I can't figure it out. I am sure it is fairly simple; I think I am just looking at it the wrong way. Any help is greatly appreciated!
Current (bad) data:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 1 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 0 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
Expected result:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 0 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 1 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
The following query will copy the value to the "Agent to Agent Ref" row:
update my_table t1 set close_units = (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Selling'
)
where ComType = 'Agent to Agent Ref';
And this one will reset the "Selling" value to zero:
update my_table t1
set close_units = 0
where ComType = 'Selling'
and exists (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Agent to Agent Ref'
)
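The two statements have to run in this order, because the second one overwrites the value the first one copies. A small sketch of that sequence against an in-memory SQLite database, using only the columns that matter here; the SQLite setup and the trimmed-down table are assumptions for the demo, and the outer table is referenced by name rather than by alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (sale_no INTEGER, payeeID INTEGER, ComType TEXT, close_units INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (7586, 1001, "Listing", 1),
        (7586, 2001, "Selling", 1),
        (7586, 3254, None, 0),
        (7586, 4684, "Agent to Agent Ref", 0),
    ],
)

# Run both updates in one transaction so the table is never left half-corrected.
with conn:
    # Step 1: copy the Selling value onto the Agent to Agent Ref row of the same sale.
    conn.execute("""
        UPDATE my_table
        SET close_units = (
            SELECT t2.close_units FROM my_table AS t2
            WHERE t2.sale_no = my_table.sale_no AND t2.ComType = 'Selling'
        )
        WHERE ComType = 'Agent to Agent Ref'
    """)
    # Step 2: only now reset the Selling row, and only for sales that have a referral row.
    conn.execute("""
        UPDATE my_table
        SET close_units = 0
        WHERE ComType = 'Selling'
          AND EXISTS (
            SELECT 1 FROM my_table AS t2
            WHERE t2.sale_no = my_table.sale_no AND t2.ComType = 'Agent to Agent Ref'
          )
    """)

units = dict(conn.execute("SELECT payeeID, close_units FROM my_table"))
print(units)
```

One thing to watch for on real data: a referral row whose sale has no Selling row would get NULL from the subquery in step 1, so it may be worth adding an EXISTS check there as well.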

Oracle SQL Left join same table unknown amount of times

I have this table
| old | new |
|------|-------|
| a | b |
| b | c |
| d | e |
| ... | ... |
| aa | bb |
| bb | ff |
| ... | ... |
| 11 | 33 |
| 33 | 523 |
| 523 | 4444 |
| 4444 | 21444 |
The result I want to achieve is
| old | newest |
|------|--------|
| a | e |
| b | e |
| d | e |
| ... | |
| aa | ff |
| bb | ff |
| ... | |
| 11 | 21444 |
| 33 | 21444 |
| 523 | 21444 |
| 4444 | 21444 |
I can hard code the query to get the result that I want.
SELECT
older.old,
older.new,
newer.new firstcol,
newer1.new secondcol,
…
newerX-1.new secondlastcol,
newerX.new lastcol
from Table older
Left join Table newer
on older.new = newer.old
Left join Table newer1
on newer.new = newer1.old
…
Left join Table newerX-1
on newerX-2.new = newerX-1.old
Left join Table newerX
on newerX-1.new = newerX.old;
and then just take the first value from the right that is not null.
Illustrated here:
| old | new | firstcol | secondcol | thirdcol | fourthcol | | lastcol |
|------|-------|----------|-----------|----------|-----------|-----|---------|
| a | b | c | e | null | null | ... | null |
| b | c | e | null | null | null | ... | null |
| d | e | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| aa | bb | ff | null | null | null | ... | null |
| bb | ff | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| 11 | 33 | 523 | 4444 | 21444 | null | ... | null |
| 33 | 523 | 4444 | 21444 | null | null | ... | null |
| 523 | 4444 | 21444 | null | null | null | ... | null |
| 4444 | 21444 | null | null | null | null | ... | null |
The problem is that the length of "the replacement chain" is always changing (Can vary from 10 to 100).
There must be a better way to do this?
What you are looking for is a recursive query. Something like this:
with cte (old, new, lev) as
(
select old, new, 1 as lev from mytable
union all
select m.old, cte.new, cte.lev + 1
from mytable m
join cte on cte.old = m.new
)
select old, max(new) keep (dense_rank last order by lev) as new
from cte
group by old
order by old;
The recursive CTE creates all iterations (you can see this by replacing the query by select * from cte). And in the final query we get the last new per old with Oracle's KEEP LAST.
Rextester demo: http://rextester.com/CHTG34988
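For anyone who wants to try the recursive idea without an Oracle instance, here is a rough equivalent run through SQLite from Python. The table name replacements and the column names old_id and new_id are made up for the demo, and since SQLite has no KEEP (DENSE_RANK LAST), the deepest level per starting value is picked with a correlated subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replacements (old_id TEXT, new_id TEXT)")
conn.executemany(
    "INSERT INTO replacements VALUES (?, ?)",
    [
        ("aa", "bb"), ("bb", "ff"),
        ("11", "33"), ("33", "523"), ("523", "4444"), ("4444", "21444"),
    ],
)

query = """
WITH RECURSIVE cte (old_id, new_id, lev) AS (
    -- Anchor: every direct replacement is a chain of length 1.
    SELECT old_id, new_id, 1 FROM replacements
    UNION ALL
    -- Step: extend a known chain backwards by one replacement.
    SELECT m.old_id, cte.new_id, cte.lev + 1
    FROM replacements AS m
    JOIN cte ON cte.old_id = m.new_id
)
SELECT c.old_id, c.new_id
FROM cte AS c
-- Keep only the longest chain per starting value, i.e. the final replacement.
WHERE c.lev = (SELECT MAX(c2.lev) FROM cte AS c2 WHERE c2.old_id = c.old_id)
"""

newest = dict(conn.execute(query))
print(newest)
```

This assumes the chains contain no cycles (for example a -> b and b -> a); with a cycle the recursion would not terminate on its own.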
I'm trying to understand how you group your rows to determine different "newest" values. Are these the groupings you want based on the old field?
Group 1 - one letter (a, b, d)
Group 2 - two letters (aa, bb)
Group 3 - any number (11, 33, 523, 4444)
Is this correct? If so, you just need to group them by an expression and then use a window function MAX(). Something like this:
SELECT
"old",
MAX("new") OVER(PARTITION BY MyGrouping) AS newest
FROM (
SELECT
"old",
"new",
CASE
WHEN NOT IS_NUMERIC("old") THEN 'string' || CHAR_LENGTH("old") -- If string, group by string length
ELSE 'number' -- Otherwise, group as a number
END AS MyGrouping
FROM MyTable
) src
I don't know if Oracle has equivalents of the IS_NUMERIC and CHAR_LENGTH functions, so you need to check on that. If not, replace that expression with something similar, like this:
https://www.techonthenet.com/oracle/questions/isnumeric.php

Pivot Table on Column Datatype in SSIS/SQL

I have been tasked to transform the following table:
+---------------+----------+---------+-------------+-----+-------------+--------+
| AnnualRevenue | City | Company | CreatedDate | Id | IsConverted | UserId |
+---------------+----------+---------+-------------+-----+-------------+--------+
| NULL | New York | ABC | 1/03/2015 | 123 | 0 | A1 |
| 200 | NULL | DEF | 2/03/2016 | 456 | 1 | A1 |
+---------------+----------+---------+-------------+-----+-------------+--------+
in either a SQL query or SSIS to this:
+-----+---------------+----------+-----------+------+------+--------+
| Id | name | nvarchar | date | int | bit | UserId |
+-----+---------------+----------+-----------+------+------+--------+
| 123 | AnnualRevenue | NULL | NULL | NULL | NULL | A1 |
| 123 | City | New York | NULL | NULL | NULL | A1 |
| 123 | Company | ABC | NULL | NULL | NULL | A1 |
| 123 | CreatedDate | NULL | 1/03/2015 | NULL | NULL | A1 |
| 123 | IsConverted | NULL | NULL | NULL | 0 | A1 |
| 456 | AnnualRevenue | NULL | NULL | 200 | | A1 |
| 456 | City | NULL | NULL | NULL | NULL | A1 |
| 456 | Company | DEF | NULL | NULL | NULL | A1 |
| 456 | CreatedDate | NULL | 2/03/2016 | NULL | NULL | A1 |
| 456 | IsConverted | NULL | NULL | NULL | 1 | A1 |
+-----+---------------+----------+-----------+------+------+--------+
I've tried to research online and found PIVOT transform in SSIS but I've never used that before. I'm unable to figure out how I can achieve the desired outcome using it. Can anyone point me in the right direction?
This option will dynamically unpivot your data and link data type to the Information Schema.
Note that the XML attribute names Id and UserId are case-sensitive.
Example
Select A.ID
,A.Name
,[nvarchar] = case when data_type='nvarchar' then value end
,[date] = case when data_type='date' then value end
,[int] = case when data_type='int' then value end
,[bit] = case when data_type='bit' then value end
,A.UserID
From (
Select C.*
From YourTable A
Cross Apply (Select XMLData = cast((Select A.* For XML Raw) as xml)) B
Cross Apply (
Select Id = r.value('@Id','int')
,UserID = r.value('@UserId','varchar(25)')
,Name = attr.value('local-name(.)','nvarchar(100)')
,Value = attr.value('.','nvarchar(max)')
From B.XMLData.nodes('/row') as A(r)
Cross Apply A.r.nodes('./@*') AS B(attr)
Where attr.value('local-name(.)','varchar(100)') not in ('Id','UserId')
) C
) A
Join (Select Column_Name,Data_Type From INFORMATION_SCHEMA.COLUMNS Where Table_Name='YourTable') B on B.Column_Name=A.Name
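The shape of the transformation may be easier to see in a few lines of plain Python. This is only a sketch of the logic, not a replacement for the query: the COLUMN_TYPES mapping is a hand-written assumption standing in for the INFORMATION_SCHEMA.COLUMNS lookup, and unpivot is a hypothetical helper name.

```python
# Assumed data types per source column (the query reads these from INFORMATION_SCHEMA.COLUMNS).
COLUMN_TYPES = {
    "AnnualRevenue": "int",
    "City": "nvarchar",
    "Company": "nvarchar",
    "CreatedDate": "date",
    "IsConverted": "bit",
}
TYPE_COLUMNS = ["nvarchar", "date", "int", "bit"]

source_rows = [
    {"AnnualRevenue": None, "City": "New York", "Company": "ABC",
     "CreatedDate": "1/03/2015", "Id": 123, "IsConverted": 0, "UserId": "A1"},
    {"AnnualRevenue": 200, "City": None, "Company": "DEF",
     "CreatedDate": "2/03/2016", "Id": 456, "IsConverted": 1, "UserId": "A1"},
]


def unpivot(rows):
    """Emit one output row per (source row, column), placing the value under its type."""
    out = []
    for row in rows:
        for name, data_type in COLUMN_TYPES.items():
            record = {"Id": row["Id"], "name": name}
            for type_col in TYPE_COLUMNS:
                # Only the column matching the data type carries the value.
                record[type_col] = row[name] if type_col == data_type else None
            record["UserId"] = row["UserId"]
            out.append(record)
    return out


unpivoted = unpivot(source_rows)
for r in unpivoted:
    print(r)
```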

Need to shift the data to next column, unfortunately added data in wrong column

I have a table test
+----+---------+-------+
| ID | Name1   | Name2 |
+----+---------+-------+
| 1  | Andy    | NULL  |
| 2  | Kevin   | NULL  |
| 3  | Phil    | NULL  |
| 4  | Maria   | NULL  |
| 5  | Jackson | NULL  |
+----+---------+-------+
I am expecting output like
+----+-------+---------+
| ID | Name1 | Name2   |
+----+-------+---------+
| 1  | NULL  | Andy    |
| 2  | NULL  | Kevin   |
| 3  | NULL  | Phil    |
| 4  | NULL  | Maria   |
| 5  | NULL  | Jackson |
+----+-------+---------+
Unfortunately, I inserted the data into the wrong column, and now I want to shift it to the next column.
You can use an UPDATE statement with no WHERE condition, to cover the entire table.
UPDATE test
SET Name2 = Name1,
Name1 = NULL
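A quick way to convince yourself before touching real data is to run the statement on a throwaway copy. The sketch below does that with an in-memory SQLite database (an assumption for the demo; the question does not say which database is in use). In SQLite, as in standard SQL, the right-hand sides are evaluated against the row's values from before the update, so Name2 receives the old Name1 even though Name1 is set to NULL in the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, Name1 TEXT, Name2 TEXT)")
conn.executemany(
    "INSERT INTO test (ID, Name1) VALUES (?, ?)",
    [(1, "Andy"), (2, "Kevin"), (3, "Phil"), (4, "Maria"), (5, "Jackson")],
)

# No WHERE clause: every row is shifted. The transaction is rolled back if anything fails.
with conn:
    conn.execute("UPDATE test SET Name2 = Name1, Name1 = NULL")

shifted = conn.execute("SELECT ID, Name1, Name2 FROM test ORDER BY ID").fetchall()
print(shifted)
```

Keeping Name2 = Name1 as the first assignment is the safer habit, since some engines (MySQL, for example) apply the assignments from left to right.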