Is there a way to do this in Spark SQL? We need to apply a filter after grouping the records

Whenever there are two different values in the approval_ind column ([Y, X] or [Y, N]) for the same Doc_Act_Checklist_Item_ID and Assigned_To_Person_ID, then:
1. Pick the record where entered_by_name = assigned_to_name.
2. If the names match on both records, pick the one with the minimum Entered_date.
3. In the above case the names do not match, so we pick the record with the minimum Entered_date, which is Doc_Act_Checklist_Item_Person_ID = 101.
I want to do this in Spark SQL. Please help me out.
I have tried:
SELECT * FROM STG_PUB_CHKLST_ITEM_PERSON
WHERE Doc_Act_Checklist_Item_Person_ID = (SELECT Doc_Act_Checklist_Item_Person_ID from (SELECT *,
CASE
WHEN ASSIGNED_TO_NAME = ENTERED_BY_NAME
THEN 'MATCH'
ELSE 'NO MATCH'
END AS MATCHING_STATUS
FROM STG_PUB_CHKLST_ITEM_PERSON
WHERE doc_act_checklist_item_id = 55
AND assigned_to_person_id = 33)
WHERE MATCHING_STATUS = 'MATCH')
;
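One possible way to express the three rules is to keep only the groups that have more than one distinct approval_ind (GROUP BY ... HAVING), then rank the rows inside each group so that a name match sorts first and the earliest Entered_date breaks ties. The sketch below is not from the original post. It runs the query through Python's built-in sqlite3 module as a stand-in for Spark SQL (the CTE, HAVING and ROW_NUMBER() parts are standard SQL and are assumed, not verified, to carry over), and all sample rows and person names are invented for illustration, apart from the ID 101 mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; column names follow the question.
conn.executescript("""
CREATE TABLE STG_PUB_CHKLST_ITEM_PERSON (
    Doc_Act_Checklist_Item_Person_ID INTEGER,
    Doc_Act_Checklist_Item_ID INTEGER,
    Assigned_To_Person_ID INTEGER,
    Assigned_To_Name TEXT,
    Entered_By_Name TEXT,
    Approval_Ind TEXT,
    Entered_Date TEXT
);
INSERT INTO STG_PUB_CHKLST_ITEM_PERSON VALUES
    (101, 55, 33, 'Ann', 'Bob',  'Y', '2020-01-01'),
    (102, 55, 33, 'Ann', 'Carl', 'X', '2020-01-05'),
    (201, 56, 34, 'Dan', 'Eve',  'Y', '2020-02-03'),
    (202, 56, 34, 'Dan', 'Dan',  'N', '2020-02-09'),
    (301, 57, 35, 'Fay', 'Fay',  'Y', '2020-03-01');
""")

query = """
WITH conflicted AS (
    -- groups that carry two or more different approval_ind values
    SELECT Doc_Act_Checklist_Item_ID, Assigned_To_Person_ID
    FROM STG_PUB_CHKLST_ITEM_PERSON
    GROUP BY Doc_Act_Checklist_Item_ID, Assigned_To_Person_ID
    HAVING COUNT(DISTINCT Approval_Ind) > 1
),
ranked AS (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY p.Doc_Act_Checklist_Item_ID, p.Assigned_To_Person_ID
               -- rows with matching names sort first, then the earliest date
               ORDER BY CASE WHEN p.Entered_By_Name = p.Assigned_To_Name
                             THEN 0 ELSE 1 END,
                        p.Entered_Date
           ) AS rn
    FROM STG_PUB_CHKLST_ITEM_PERSON p
    JOIN conflicted c
      ON c.Doc_Act_Checklist_Item_ID = p.Doc_Act_Checklist_Item_ID
     AND c.Assigned_To_Person_ID = p.Assigned_To_Person_ID
)
SELECT Doc_Act_Checklist_Item_Person_ID
FROM ranked
WHERE rn = 1
ORDER BY Doc_Act_Checklist_Item_Person_ID
"""

picked = [row[0] for row in conn.execute(query)]
print(picked)
```

With this sample data, group (55, 33) has no name match, so the earliest row (101) is kept; group (56, 34) keeps the row whose names match (202); group (57, 35) has only one approval_ind value and is left out.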

Create new column

I have a state table like this
And a Data table like this
What I'm trying to do is create a 'Target' column for my Data table
So
case when ( first 2 letters of [Cate_1] = [Abbreviation]
or [Cate_1] starts with [State] )
and [Cate_2] like 'A20%'
then 'State'
when [Cate_1] = "Customer_Name"
and [Cate_2] like 'A20%'
then 'State'
else 'Correct'
end
My first scenario is to get the 'State' value for the [Target] column.
My idea is to use LEFT, so:
if left(Cate_1,2) = [Abbreviation]
or left (Cate_, ) = [State]
then [State]
Please help if you know how to do this
Thank you so much.
Here are two ways to derive the sample output from the sample input - using MySQL:
select
d.Cate_1,
d.Cate_2,
'State' as Target
from State s
join Data d
on left(d.Cate_2, 3) = 'A20'
and
((left(d.Cate_1, 2) = s.abbreviation and length(d.Cate_1) = 4)
or
position(s.stateName in d.Cate_1) = 1
)
union all
select distinct
d.Cate_1,
d.Cate_2,
'Customer'
from State s
join Data d
on left(d.Cate_2, 3) = 'A20'
and
(Cate_1 in ('Louis', 'Adam', 'Customer3')
)
union all
select distinct
d.Cate_1,
d.Cate_2,
'Correct'
from State s
join Data d
on left(d.Cate_2, 3) <> 'A20'
order by 1
;
or
select
d.Cate_1,
d.Cate_2,
case
when d.Cate_1 in ('Louis', 'Adam', 'Customer3') and left(d.Cate_2, 3) = 'A20' then 'Customer'
when left(d.Cate_2, 3) = 'A20' then 'State'
else 'Correct'
end as Target
from Data d
left join State s
on left(d.Cate_2, 3) = 'A20'
and
((left(d.Cate_1, 2) = s.abbreviation and length(d.Cate_1) = 4)
or
position(s.stateName in d.Cate_1) = 1
)
order by 1
;
N.B.:
In general, keep in mind that you might hit the same Data record more than once, e.g., "Alaska 01" with both "Alaska" and "AL", at least if your server/session settings ignore upper/lower case. (That is why a length check has been added.)
The customer names could, of course, be read from a table as well.
If you'd rather update the third column, that's feasible, too. How exactly depends largely on the database system used.
The order by facilitates the comparison of the two result sets.
See it in action: SQL Fiddle.
Please comment if this requires adjustment or further detail.
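The two notes about duplicate hits and reading customer names from a table can be combined in a variant that uses EXISTS instead of a join, so each Data row appears exactly once no matter how many State rows match. This is only a sketch under assumptions: the Customer table and all sample rows are hypothetical, and it is run through Python's sqlite3 module, so substr() and instr() stand in for MySQL's left() and position(). Unlike the second query above, a row with an 'A20' prefix that matches neither a customer nor a state falls through to 'Correct'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; State and Data columns follow the answer above,
# the Customer table is an assumption made for this sketch.
conn.executescript("""
CREATE TABLE State (stateName TEXT, abbreviation TEXT);
CREATE TABLE Customer (name TEXT);
CREATE TABLE Data (Cate_1 TEXT, Cate_2 TEXT);
INSERT INTO State VALUES ('Alaska', 'AK'), ('Alabama', 'AL');
INSERT INTO Customer VALUES ('Louis'), ('Adam');
INSERT INTO Data VALUES
    ('AK01', 'A2001'),
    ('Alabama 02', 'A2002'),
    ('Louis', 'A2003'),
    ('AK01', 'B1000');
""")

query = """
SELECT d.Cate_1,
       d.Cate_2,
       CASE
           WHEN substr(d.Cate_2, 1, 3) <> 'A20' THEN 'Correct'
           WHEN EXISTS (SELECT 1 FROM Customer c WHERE c.name = d.Cate_1)
               THEN 'Customer'
           WHEN EXISTS (SELECT 1 FROM State s
                        WHERE (substr(d.Cate_1, 1, 2) = s.abbreviation
                               AND length(d.Cate_1) = 4)
                           OR instr(d.Cate_1, s.stateName) = 1)
               THEN 'State'
           ELSE 'Correct'
       END AS Target
FROM Data d
ORDER BY d.Cate_1, d.Cate_2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because EXISTS only asks whether at least one match exists, no DISTINCT is needed to undo row multiplication.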

SQL Server UDF array inputs and outputs

I have a set of columns CODE_1-10, which contain diagnostic codes. I want to create a set of variables CODE_GROUP_1-17, which indicate whether or not one of some particular set of diagnostic codes matches any of the CODE_1-10 variables. For example, CODE_GROUP_1 = 1 if any of CODE_1-10 match either '123' or '456', and CODE_GROUP_2 = 1 if any of CODE_1-10 match '789','111','333','444' or 'foo'.
Here's an example of how you could do this using values constructors.
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('123', '456')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_1,
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('789','111','333','444','foo')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_2
I am wondering if there is another way to do this that is more efficient. Is there a way to make a CLR UDF that takes an array of CODE_1-10, and outputs a set of columns CODE_GROUP_1-17?
You could at least avoid the repetition of FROM (VALUES ...) like this:
SELECT
CODE_GROUP_1 = COUNT(DISTINCT CASE WHEN val IN ('123', '456') THEN 1 END),
CODE_GROUP_2 = COUNT(DISTINCT CASE WHEN val IN ('789','111','333','444','foo') THEN 1 END),
...
FROM
(
VALUES
(CODE_1),
(CODE_2),
(CODE_3),
(CODE_4),
(CODE_5),
(CODE_6),
(CODE_7),
(CODE_8),
(CODE_9),
(CODE_10)
) AS value(val)
If CODE_1, CODE_2 etc. are column names, you can use the above query as a derived table in CROSS APPLY:
SELECT
...
FROM
dbo.atable -- table containing CODE_1, CODE_2 etc.
CROSS APPLY
(
SELECT ... -- the above query
) AS x
;
Can you create two new tables with the columns unpivoted into rows? One table would be dxCode, holding the dx code, a source column (if you need to retain which of CODE_1-10 it came from), and whatever key field(s) you need. The other table would be dxGroup, holding your 17 groups, the source groupID if you need it, and your target dx values.
Then to determine which codes are in which groups, you can join on your dx fields.
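A minimal sketch of that two-table layout follows, assuming hypothetical column names (record_id, source, dx, group_id) and invented sample rows; only the group definitions come from the question. It is run through Python's sqlite3 module rather than SQL Server, and it uses plain joins and aggregates that should translate directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# dxCode: one row per (record, CODE_n) pair; dxGroup: one row per (group, code).
conn.executescript("""
CREATE TABLE dxCode (record_id INTEGER, source INTEGER, dx TEXT);
CREATE TABLE dxGroup (group_id INTEGER, dx TEXT);
INSERT INTO dxCode VALUES
    (1, 1, '123'), (1, 2, '999'),
    (2, 1, 'foo'), (2, 2, '456'),
    (3, 1, '000');
INSERT INTO dxGroup VALUES
    (1, '123'), (1, '456'),
    (2, '789'), (2, '111'), (2, '333'), (2, '444'), (2, 'foo');
""")

query = """
SELECT c.record_id,
       -- 1 if any code of the record belongs to the group, otherwise 0
       MAX(CASE WHEN g.group_id = 1 THEN 1 ELSE 0 END) AS CODE_GROUP_1,
       MAX(CASE WHEN g.group_id = 2 THEN 1 ELSE 0 END) AS CODE_GROUP_2
FROM dxCode c
LEFT JOIN dxGroup g ON g.dx = c.dx
GROUP BY c.record_id
ORDER BY c.record_id
"""

flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Adding or changing a group then becomes a data change in dxGroup rather than an edit to the query's IN lists, although each group still needs its own output column.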

Grouping column data with common value or else show default text

I want to achieve the following transformation:
Sample Data
SELECT NumWURm,ReportAText,ReportBText,ReportCText,ReportDText,ReportEText,ReportFText
FROM t_SchFacility
WHERE FacID IN (483,485)
Result:
NumWURm ReportAText ReportBText ReportCText ReportDText ReportEText ReportFText
3 Report On venue Warm Up Photo Get Set
2 Report On venue Warm Up Photo
Desired Output
I want the common column values to be shown as they are; where the values differ, I want to show some default text.
NumWURm ReportAText ReportBText ReportCText ReportDText ReportEText ReportFText
3 Report On venue Warm Up Photo Default Text
This is just the case for my favourite MIN = MAX trick. When MIN and MAX are the same, then there's only one value, and either the MIN or the MAX can be used as THE value.
SELECT
MAX(NumWURm) AS NumWURm,
CASE WHEN MIN(ReportAText) = MAX(ReportAText)
THEN MIN(ReportAText)
ELSE 'Default Text'
END AS ReportAText,
CASE WHEN MIN(ReportBText) = MAX(ReportBText)
THEN MIN(ReportBText)
ELSE 'Default Text'
END AS ReportBText,
CASE WHEN MIN(ReportCText) = MAX(ReportCText)
THEN MIN(ReportCText)
ELSE 'Default Text'
END AS ReportCText,
CASE WHEN MIN(ReportDText) = MAX(ReportDText)
THEN MIN(ReportDText)
ELSE 'Default Text'
END AS ReportDText,
CASE WHEN MIN(ReportEText) = MAX(ReportEText)
THEN MIN(ReportEText)
ELSE 'Default Text'
END AS ReportEText,
CASE WHEN MIN(ReportFText) = MAX(ReportFText)
THEN MIN(ReportFText)
ELSE 'Default Text'
END AS ReportFText
FROM t_SchFacility
WHERE FacID IN (483,485)
If you need this to be really specific, you may need to specify a collation option for the string comparison (e.g. if case difference is significant to you).
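One more thing to watch for, beyond collation: MIN and MAX skip NULLs, so a column that is filled in one row and NULL in the other still passes the MIN = MAX test. The sketch below illustrates this with a cut-down, hypothetical version of the table (three text columns, invented values, and the assumption that a missing value is stored as NULL rather than an empty string), run through Python's sqlite3 module. The extra COUNT(col) = COUNT(*) check is an addition of this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced sample: ReportFText is NULL for FacID 485.
conn.executescript("""
CREATE TABLE t_SchFacility (
    FacID INTEGER, NumWURm INTEGER,
    ReportAText TEXT, ReportEText TEXT, ReportFText TEXT
);
INSERT INTO t_SchFacility VALUES
    (483, 3, 'Report', 'Get Set', 'Photo'),
    (485, 2, 'Report', 'Go',      NULL);
""")

query = """
SELECT
    MAX(NumWURm) AS NumWURm,
    -- same value in every row: MIN = MAX, so the value is kept
    CASE WHEN MIN(ReportAText) = MAX(ReportAText)
         THEN MIN(ReportAText) ELSE 'Default Text' END AS ReportAText,
    -- differing values: MIN <> MAX, so the default is used
    CASE WHEN MIN(ReportEText) = MAX(ReportEText)
         THEN MIN(ReportEText) ELSE 'Default Text' END AS ReportEText,
    -- plain trick: the NULL is ignored, so 'Photo' looks like a common value
    CASE WHEN MIN(ReportFText) = MAX(ReportFText)
         THEN MIN(ReportFText) ELSE 'Default Text' END AS ReportFText_plain,
    -- stricter variant: also require a non-NULL value in every row
    CASE WHEN MIN(ReportFText) = MAX(ReportFText)
              AND COUNT(ReportFText) = COUNT(*)
         THEN MIN(ReportFText) ELSE 'Default Text' END AS ReportFText_strict
FROM t_SchFacility
WHERE FacID IN (483, 485)
"""

result = conn.execute(query).fetchone()
print(result)
```

Whether a NULL should count as "different" depends on the report; the stricter variant is only needed if it should.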
The following would produce a single row of output for a non-empty set.
with [Selected] ([NumWURm], [ReportAText], [ReportBText], [ReportCText], [ReportDText], [ReportEText], [ReportFText]) as
(
select [NumWURm], [ReportAText], [ReportBText], [ReportCText], [ReportDText], [ReportEText], [ReportFText]
from [t_SchFacility]
where [FacID] IN (483, 485)
),
[Number_Selected] ([count]) as
(
select count(*)
from [Selected]
),
[ReportAText] ([ReportAText], [count]) as
(
select s.[ReportAText], n.[count]
from [Selected] as s cross join [Number_Selected] as n
group by s.[ReportAText], n.[count]
having count(*) = n.[count]
),
...
[ReportFText] ([ReportFText], [count]) as
(
select s.[ReportFText], n.[count]
from [Selected] as s cross join [Number_Selected] as n
group by s.[ReportFText], n.[count]
having count(*) = n.[count]
)
select
max(s.[NumWURm]) as 'NumWURm',
coalesce(max(a.[ReportAText]), 'Default Text') as 'ReportAText',
coalesce(max(b.[ReportBText]), 'Default Text') as 'ReportBText',
coalesce(max(c.[ReportCText]), 'Default Text') as 'ReportCText',
coalesce(max(d.[ReportDText]), 'Default Text') as 'ReportDText',
coalesce(max(e.[ReportEText]), 'Default Text') as 'ReportEText',
coalesce(max(f.[ReportFText]), 'Default Text') as 'ReportFText'
from
[Selected] as s
left outer join
[ReportAText] as a
on (null is null)
left outer join
...
[ReportFText] as f
on (null is null)
This is a code outline and so will require suitable testing and adjustment.

Output data after updating SQL

imagine there are 2 tables :
T_Customer (p_customer_id, name, prename, country, age)
and
T_SomeInfo (f_customer_id, somebit, otherbit)
Now I want to update 1 random somebit and OUTPUT the updated T_Customer row that belongs to the f_customer_id of the affected row.
At the moment I have the following statement:
UPDATE randombit SET randombit.somebit= 1
OUTPUT inserted.f_customer_id
FROM
(
SELECT TOP 1 * FROM T_SomeInfo
WHERE somebit= 0 AND otherbit = 0
ORDER BY NEWID()
) AS randombit
So I get the f_customer_id of my updated row.
But I'm not able to build a valid statement to OUTPUT a value from another table.
This is a statement I tried without success:
UPDATE randombit SET randombit.somebit= 1
OUTPUT customer.*
FROM T_Customer AS customer
WHERE customer.f_customer_id = inserted.f_customer_id
FROM
(
SELECT TOP 1 * FROM T_SomeInfo
WHERE somebit= 0 AND otherbit = 0
ORDER BY NEWID()
) AS randombit
Is there any solution to update and output (with INNER JOIN or SELECT) into one statement?
EDIT as example:
There are 2 customers :
T_Customer (1, "Smith", "John", "country", 10)
T_Customer (2, "John", "William", "country2", 20)
At the moment, an update like
UPDATE randombit SET randombit.somebit= 1
OUTPUT inserted.f_customer_id
FROM
(
SELECT TOP 1 * FROM T_SomeInfo
WHERE somebit= 0 AND otherbit = 0
ORDER BY NEWID()
) AS randombit
will output (if he's the random winner):
1
But I want to see
1, "Smith", "John", "country", 10

Using Order By with Distinct on a Join (PLSQL)

I have written a join on some tables and I have ordered the data using two levels of ordering - one of which is the primary key of one table.
Now, with this data sorted I want to then exclude any duplicates from my data using an in-line view and the DISTINCT clause - and this is where I am coming unstuck.
I seem to be able to either sort the data OR distinct it, but never both at the same time. Is there a way around this or have I stumbled upon the SQL equivalent of the uncertainty principle?
This code returns the data sorted, but with duplicates
SELECT
ada.source_tab source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 123456
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Tab type 1' THEN 1
WHEN source_tab = 'Tab type 2' THEN 2
ELSE 3
END
,ada.ada_id ASC;
This code removes the duplicates, but I lose the order...
SELECT DISTINCT source_tab, source_col, source_value FROM (
SELECT
ada.source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 123456
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Tab type 1' THEN 1
WHEN source_tab = 'Tab type 2' THEN 2
ELSE 3
END
,ada.ada_id ASC
)
;
If I try and include 'ORDER BY ada_id' at the end of the outer select, I get the error message 'ORA-01791: not a SELECTed expression' which is infuriating me!!
Why don't you include ada_id in the selected fields of the outer query?
;WITH CTE AS
(
SELECT
ada.source_tab source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
, ROW_NUMBER() OVER (PARTITION BY [COLUMNS_YOU_WANT TO BE DISTINCT]
ORDER BY [your_columns]) rn
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 356441
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Licensed Inventory' THEN 1
WHEN source_tab = 'CMDB' THEN 2
ELSE 3
END
,ada.ada_id ASC
)
select * from CTE WHERE rn<2
It seems that the ada_id is meaningless in the outer query.
You have removed all those values to boil the result down to the distinct source_tab and source_col...
What would you expect the order to be?
Perhaps you want the minimum ada_id for each table and column set to drive the order (although the table name seems appropriate to me).
Include the minimum ada_id in the inner query (you'll need a GROUP BY clause),
then reference that in the outer query and sort on it.
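The suggestion above can be sketched as follows. This is a simplified illustration, not the asker's actual query: the joins to are_aud_exec_checks and are_audit_elements are left out, the sample rows are invented, and it is run through Python's sqlite3 module rather than Oracle. The GROUP BY replaces DISTINCT, and MIN(ada_id) gives the outer query a well-defined value to sort on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; rows 1/3 and 2/5 are duplicates apart from ada_id.
conn.executescript("""
CREATE TABLE are_aud_data (
    ada_id INTEGER, source_tab TEXT, source_col TEXT, source_value TEXT
);
INSERT INTO are_aud_data VALUES
    (1, 'Tab type 2', 'col_a', 'x'),
    (2, 'Tab type 1', 'col_b', 'y'),
    (3, 'Tab type 2', 'col_a', 'x'),
    (4, 'Other',      'col_c', 'z'),
    (5, 'Tab type 1', 'col_b', 'y');
""")

query = """
SELECT source_tab, source_col, source_value
FROM (
    -- one row per distinct combination, carrying its smallest ada_id
    SELECT source_tab, source_col, source_value, MIN(ada_id) AS min_ada_id
    FROM are_aud_data
    GROUP BY source_tab, source_col, source_value
)
ORDER BY
    CASE
        WHEN source_tab = 'Tab type 1' THEN 1
        WHEN source_tab = 'Tab type 2' THEN 2
        ELSE 3
    END,
    min_ada_id
"""

ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```

Sorting in the outer query matters here: an ORDER BY inside the inline view is not guaranteed to survive the DISTINCT or GROUP BY applied on top of it.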