Structuring SQL

I need to generate a report based on the query below.
SELECT TASKTYPE.TaskTypeName, TASKWIP.DMTaskState_key
FROM MercuryProd.TEAMSPACE.F_DMCaseWIP WIP,
MercuryProd.TEAMSPACE.F_DMTaskWIP TASKWIP,
MercuryProd.TEAMSPACE.D_DMDataField_BM_ExternalCaseIdentifier EXTID,
MercuryProd.TEAMSPACE.D_DMTaskType TASKTYPE
WHERE WIP.DMCase_key=TASKWIP.DMCase_key
AND EXTID.BM_ExternalCaseIdentifier_key=WIP.VMAE_BM_ExternalCaseIdentifier_key
AND TASKTYPE.DMTaskType_key=TASKWIP.DMTaskType_key
AND EXTID.BM_ExternalCaseIdentifier='BMAX5C62970'
--AND TASKTYPE.DMTaskType_key=9 AND TASKWIP.DMTaskState_key=2
--AND TASKTYPE.DMTaskType_key=10 AND TASKWIP.DMTaskState_key=0
The last two (commented-out) lines of the SQL are the critical part: I need the cases that satisfy both conditions. A case can have multiple corresponding child records in the TASKWIP table, and I want only those cases whose child records meet both criteria, that is, task 9 with state 2 and task 10 with state 0. What I have given here is example data for one case. Other case keys will have multiple child records as well, for example task 9 with state 3 (not 2) and task 10 with state 2 (not 0). The report should not show such a case.
I am happy with a query in any SQL dialect, whether SQL Server, Oracle, or MySQL. I am more interested in the logic than in the syntax.
This case key qualifies because, as seen in the result set, it has a task type 10 with state 0 and a task type 9 with state 2.

The specification isn't clear. My guess, and it is only a guess, is that we want to return rows ONLY if BOTH of two specific rows exist.
One option is to use correlated subqueries in EXISTS predicates, for example something like this:
SELECT TASKTYPE.TaskTypeName
, TASKWIP.DMTaskState_key
FROM MercuryProd.TEAMSPACE.F_DMCaseWIP WIP
JOIN MercuryProd.TEAMSPACE.F_DMTaskWIP TASKWIP
ON TASKWIP.DMCase_key = WIP.DMCase_key
JOIN MercuryProd.TEAMSPACE.D_DMDataField_BM_ExternalCaseIdentifier EXTID
ON EXTID.BM_ExternalCaseIdentifier_key = WIP.VMAE_BM_ExternalCaseIdentifier_key
JOIN MercuryProd.TEAMSPACE.D_DMTaskType TASKTYPE
ON TASKTYPE.DMTaskType_key = TASKWIP.DMTaskType_key
WHERE EXTID.BM_ExternalCaseIdentifier = 'BMAX5C62970'
AND EXISTS ( SELECT 1
FROM MercuryProd.TEAMSPACE.D_DMTaskType tt92
WHERE tt92.DMTaskType_key = 9
AND TASKWIP.DMTaskState_key = 2
)
AND EXISTS ( SELECT 1
FROM MercuryProd.TEAMSPACE.D_DMTaskType tt10
WHERE tt10.DMTaskType_key = 10
AND TASKWIP.DMTaskState_key = 0
)
Note that it doesn't matter what value the subqueries return; EXISTS only checks whether at least one row is returned.
Note that this doesn't restrict which rows from TASKTYPE are returned. If we want to limit the result to just the specific matching rows, we can add a condition to the ON clause of the TASKTYPE join, or to the WHERE clause:
AND ( ( TASKTYPE.DMTaskType_key = 9 AND TASKWIP.DMTaskState_key = 2 )
OR ( TASKTYPE.DMTaskType_key = 10 AND TASKWIP.DMTaskState_key = 0 )
)
There are other query patterns we could use; we could do a single EXISTS like this:
AND EXISTS ( SELECT 1
FROM MercuryProd.TEAMSPACE.D_DMTaskType ttx
WHERE ( ttx.DMTaskType_key = 9 AND TASKWIP.DMTaskState_key = 2 )
OR ( ttx.DMTaskType_key = 10 AND TASKWIP.DMTaskState_key = 0 )
HAVING COUNT(DISTINCT ttx.DMTaskType_key) = 2
)
EDIT
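The first pattern demonstrated isn't sufficient. It requires both TASKTYPE rows to be related to the same TASKWIP row, and that can't happen because each TASKTYPE row requires a different state value from the TASKWIP row. The single-EXISTS variant has the same limitation, because it also tests the state of the outer TASKWIP row.
We need to do the join to TASKWIP inside the correlated subqueries, correlating on the case key instead.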
Something along these lines:
AND EXISTS ( SELECT 1
FROM MercuryProd.TEAMSPACE.F_DMTaskWIP tw92
JOIN MercuryProd.TEAMSPACE.D_DMTaskType tt92
ON tt92.DMTaskType_key = tw92.DMTaskType_key
AND tt92.DMTaskType_key = 9
WHERE tw92.DMTaskState_key = 2
AND tw92.DMCase_key = WIP.DMCase_key
)
AND EXISTS ( SELECT 1
FROM MercuryProd.TEAMSPACE.F_DMTaskWIP tw10
JOIN MercuryProd.TEAMSPACE.D_DMTaskType tt10
ON tt10.DMTaskType_key = tw10.DMTaskType_key
AND tt10.DMTaskType_key = 10
WHERE tw10.DMTaskState_key = 0
AND tw10.DMCase_key = WIP.DMCase_key
)
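The corrected pattern can be tried on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module; the tables case_wip and task_wip and their rows are hypothetical, simplified stand-ins for F_DMCaseWIP and F_DMTaskWIP, not the real schema.

```python
import sqlite3

# Hypothetical, simplified stand-ins for F_DMCaseWIP and F_DMTaskWIP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE case_wip (case_key INTEGER PRIMARY KEY);
    CREATE TABLE task_wip (case_key INTEGER, task_type_key INTEGER, task_state_key INTEGER);
    INSERT INTO case_wip VALUES (1), (2), (3);
    INSERT INTO task_wip VALUES
        (1, 9, 2), (1, 10, 0),   -- case 1: both conditions met
        (2, 9, 3), (2, 10, 2),   -- case 2: neither condition met
        (3, 9, 2), (3, 10, 1);   -- case 3: only the task 9 condition met
""")

# Each EXISTS looks for its own child row, correlated on the case key,
# so the two conditions may be satisfied by different task_wip rows.
matching_cases = [row[0] for row in conn.execute("""
    SELECT w.case_key
      FROM case_wip w
     WHERE EXISTS (SELECT 1 FROM task_wip t
                    WHERE t.case_key = w.case_key
                      AND t.task_type_key = 9 AND t.task_state_key = 2)
       AND EXISTS (SELECT 1 FROM task_wip t
                    WHERE t.case_key = w.case_key
                      AND t.task_type_key = 10 AND t.task_state_key = 0)
     ORDER BY w.case_key
""")]
print(matching_cases)
```

Only case 1 is returned, because it is the only case that has both a task 9 row in state 2 and a task 10 row in state 0.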

For Oracle, you can use LISTAGG like below:
SELECT DM_Case_Key,listagg(TaskTypeName,',') within group (order by DMTaskType_key)
over (partition by DM_Case_Key) as Tasks
FROM your_data

Related

Query bring lowest rank if client is under more than in 1 program

I am trying to write a query that returns the lowest rank number for a client who is active in more than one program; if the client is active in only one program, this condition should be ignored.
I have the following query (Query1):
SELECT --PC.OP__DOCID AS LegacyClientProgramId,
PC.ClientKey AS ClientId,
PC.PgmKey AS ProgramId,
CASE WHEN Date_Discharged_Program IS NULL THEN 4 ELSE 5 END AS STATUS,
PC.Date_Admit_Program AS RequestedDate,
PC.Date_Admit_Program AS EnrolledDate,
PC.Date_Discharged_Program AS DischargedDate,
TX.Rank
FROM FD__PROGRAM_CLIENT PC
LEFT JOIN LT__TXPLANHIERARCHY TX ON PC.PgmKey = TX.PgmKey
WHERE pc.ClientKey in ( SELECT ClientKey FROM LT__MIGRATE_CLIENT)
and pc.ClientKey in (3634164,99589547)
As you can see, it returns one ClientId (3634164) that is under 4 different ProgramIds (4,16,54,1,5) and another ClientId (99589547) that is only under ProgramId 158. I want the lowest rank number for that client, which I get
when I add the following clause to Query1:
and tx.Rank =
(
select min(txx.rank)
from FD__PROGRAM_CLIENT PCC
LEFT JOIN LT__TXPLANHIERARCHY Txx ON PCC.PgmKey = TXX.PgmKey
where pcc.ClientKey = pc.ClientKey)
But I also want to return the ClientId (99589547) that has NULL for the rank. That client has only 1 program.
Is there a way to apply the tx.Rank condition only if the client has more than 1 ProgramId, or to still return the client when its rank is NULL? Thank you very much.
I created a table of the clients that have more than 1 ProgramId and tried a CASE expression like
case when clientkey in ( select clientkey from TableCreated) then do this
tx.Rank =
(
select min(txx.rank)
from FD__PROGRAM_CLIENT PCC
LEFT JOIN LT__TXPLANHIERARCHY Txx ON PCC.PgmKey = TXX.PgmKey
where pcc.ClientKey = pc.ClientKey)
else ignore it, but with no luck.

Let's assume your logic is:
Return the row with the lowest non-NULL rank for each client, unless the client only has NULL-rank rows, in which case return the first one.
You don't say what should happen if a client is in two programs and both ranks are NULL; in that case this query returns the row with the lowest program ID.
The way to solve this is to create a counter per client, ordered by rank. The function for that is ROW_NUMBER:
select *
from (
SELECT --PC.OP__DOCID AS LegacyClientProgramId,
PC.ClientKey AS ClientId,
PC.PgmKey AS ProgramId,
CASE WHEN Date_Discharged_Program IS NULL THEN 4 ELSE 5 END AS STATUS,
PC.Date_Admit_Program AS RequestedDate,
PC.Date_Admit_Program AS EnrolledDate,
PC.Date_Discharged_Program AS DischargedDate,
TX.Rank,
ROW_NUMBER() OVER(PARTITION BY PC.ClientKey ORDER BY case when TX.Rank IS NOT NULL THEN 0 ELSE 1 END, TX.Rank, PC.PgmKey) AS sortOrder
FROM FD__PROGRAM_CLIENT PC
LEFT JOIN LT__TXPLANHIERARCHY TX ON PC.PgmKey = TX.PgmKey
WHERE pc.ClientKey in ( SELECT ClientKey FROM LT__MIGRATE_CLIENT)
and pc.ClientKey in (3634164,99589547)
) x
WHERE sortOrder = 1
The only downside is that you can't use window functions directly in the WHERE clause, so you have to wrap the query in a subquery.
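To see the ordering trick in isolation, here is a minimal sketch run against an in-memory SQLite database (window functions need SQLite 3.25 or later). The tables program_client and plan_rank, the column rank_no, and all rows are hypothetical stand-ins for FD__PROGRAM_CLIENT and LT__TXPLANHIERARCHY.

```python
import sqlite3

# Hypothetical stand-ins: client 100 is in three ranked programs,
# client 200 is in a single program that has no rank row at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program_client (client_key INTEGER, pgm_key INTEGER);
    CREATE TABLE plan_rank (pgm_key INTEGER, rank_no INTEGER);
    INSERT INTO program_client VALUES (100, 4), (100, 16), (100, 54), (200, 158);
    INSERT INTO plan_rank VALUES (4, 3), (16, 1), (54, 2);
""")

# Non-NULL ranks sort first, then the lowest rank, then the lowest program key.
result = conn.execute("""
    SELECT client_key, pgm_key, rank_no
      FROM (SELECT pc.client_key, pc.pgm_key, tx.rank_no,
                   ROW_NUMBER() OVER (
                       PARTITION BY pc.client_key
                       ORDER BY CASE WHEN tx.rank_no IS NOT NULL THEN 0 ELSE 1 END,
                                tx.rank_no, pc.pgm_key) AS sort_order
              FROM program_client pc
              LEFT JOIN plan_rank tx ON tx.pgm_key = pc.pgm_key) x
     WHERE sort_order = 1
     ORDER BY client_key
""").fetchall()
print(result)
```

Client 100 comes back with program 16 (rank 1), and client 200 still comes back with its only program even though its rank is NULL.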

MariaDB - GROUP BY with an order

I have a dataset that I would like to order by specific strings using ORDER BY FIELD(field_name, ...); after ordering, I want to group the dataset by another column.
I have tried a subquery, but it seems that my ORDER BY is ignored when the query is used as a subquery.
This is the query I would like to group with GROUP BY setting_id:
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
The ORDER BY works just fine, but I cannot get the grouping to respect my order; it always selects the row with the lowest primary key (id).
Here is my attempt which did not work.
SELECT * FROM (
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
) AS t
GROUP BY setting_id;
Here is some sample data
What I am trying to get from this sample data is a single row: the one with id 3.
The desired result set from the query should obey these rules:
1 row for each setting_id
owned_by_type together with owned_by_id is filtered the following way: agreement = 1006, user = 1, employee = 1.
When limiting to 1 row for each setting_id, it should be done with the following priority in the owned_by_type column: Employee, Agreement, User
Here is a SQLFiddle with it.
Running MariaDB version 10.2.6-MariaDB

First of all, the optimizer is free to ignore the inner ORDER BY, so please describe further what your intent is.
Getting past that, you can use a subquery:
SELECT ...
FROM ( SELECT
...
GROUP BY ...
ORDER BY ... -- This is lost unless followed by :
LIMIT 9999999999 -- something valid; or very high (for all)
) AS x
GROUP BY ...
Perhaps you are doing a groupwise max?
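If this is indeed a groupwise-max problem, one way to avoid depending on the inner ORDER BY is to turn the priority into a number and join each group to its best value. The sketch below runs on SQLite through Python's sqlite3 module; the rows are invented, the class names are shortened, and a CASE expression stands in for FIELD(), which SQLite does not have.

```python
import sqlite3

# Invented sample rows; owned_by_type values are shortened for readability.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE setting_values (
        id INTEGER PRIMARY KEY, setting_id INTEGER,
        owned_by_type TEXT, owned_by_id INTEGER);
    INSERT INTO setting_values VALUES
        (1, 1, 'User', 1),
        (2, 1, 'Agreement', 1006),
        (3, 1, 'Employee', 1),
        (4, 2, 'User', 1);
""")

# Step 1: map each owner type to a numeric priority (lower wins).
# Step 2: find the best priority per setting_id.
# Step 3: join back to pick the row carrying that priority.
picked = conn.execute("""
    WITH prioritized AS (
        SELECT id, setting_id, owned_by_type,
               CASE owned_by_type WHEN 'Employee'  THEN 1
                                  WHEN 'Agreement' THEN 2
                                  WHEN 'User'      THEN 3 END AS priority
          FROM setting_values)
    SELECT p.id, p.setting_id, p.owned_by_type
      FROM prioritized p
      JOIN (SELECT setting_id, MIN(priority) AS best
              FROM prioritized
             GROUP BY setting_id) b
        ON b.setting_id = p.setting_id AND b.best = p.priority
     ORDER BY p.setting_id
""").fetchall()
print(picked)
```

For setting 1 the Employee row (id 3) wins, and setting 2 falls back to its only row. If two rows of the same owner type exist for one setting_id, both are returned, so a tie-breaker would be needed in that case.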

Fastest way to check if values in one table does not exist in another based on multiple values

This is probably easy, but I have been racking my brain. I have two tables (TEMP_PARTY and GTT_PARTY).
TEMP_PARTY and GTT_PARTY have the following columns: system, case_num, party_id, party, role.
What I am trying to do is find all rows in TEMP_PARTY that do not exist in GTT_PARTY, based on the unique combination of system, case_num and party_id.
In other words, I want to pull all parties from TEMP_PARTY for which there is no record in GTT_PARTY. A record is uniquely identified by the data in the three columns (system, case_num and party_id).
Speed is a HUGE concern for me. Can someone please help?

Use NOT EXISTS, correlating the subquery to the outer row on the three key columns:
select * from TEMP_PARTY t
where not exists
(
select 1 from GTT_PARTY g
where g.System = t.System
and g.case_num = t.case_num
and g.party_id = t.party_id
)
You can also use the MINUS operator on the key columns (Oracle has it, and it is the same as EXCEPT in SQL Server):
select System, case_num, party_id from TEMP_PARTY
MINUS
select System, case_num, party_id from GTT_PARTY

One approach is an anti-join pattern:
SELECT t.system
, t.case_num
, t.party_id
, t.party
, t.role
FROM TEMP_PARTY t
LEFT
JOIN GTT_PARTY g
ON g.system = t.system
AND g.case_num = t.case_num
AND g.party_id = t.party_id
WHERE g.system IS NULL
This is an "outer" join, returning all rows from t, along with all matching rows from g, except we've added a predicate in the WHERE clause which effectively excludes all rows that had a match. So what we're left with is rows from t that don't have any matching row(s) in g.
This isn't the only way to get the result. There are several other approaches; for example, you can return an equivalent result with a query using NOT EXISTS and a correlated subquery.
SELECT t.system
, t.case_num
, t.party_id
, t.party
, t.role
FROM TEMP_PARTY t
WHERE NOT EXISTS
( SELECT 1
FROM GTT_PARTY g
WHERE g.system = t.system
AND g.case_num = t.case_num
AND g.party_id = t.party_id
)
For best performance, we'd want to see an index (with leading columns of)
... ON GTT_PARTY (system,case_num,party_id)
EXPLAIN output will show the execution plan.
(I was thinking this was for MySQL; these same queries will work in Oracle as well.)
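As a quick check that the two patterns agree, here is a minimal sketch using Python's sqlite3 module. The rows are invented, and the index name gtt_party_key is an assumption chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_party (system TEXT, case_num INTEGER, party_id INTEGER, party TEXT, role TEXT);
    CREATE TABLE gtt_party  (system TEXT, case_num INTEGER, party_id INTEGER, party TEXT, role TEXT);
    -- Index on the three key columns, as suggested above (name is hypothetical).
    CREATE INDEX gtt_party_key ON gtt_party (system, case_num, party_id);
    INSERT INTO temp_party VALUES
        ('A', 1, 10, 'Smith', 'plaintiff'),
        ('A', 1, 11, 'Jones', 'defendant'),
        ('B', 2, 10, 'Lee',   'witness');
    INSERT INTO gtt_party VALUES ('A', 1, 10, 'Smith', 'plaintiff');
""")

# Anti-join: keep only the rows of t for which the outer join found no match.
anti_join = conn.execute("""
    SELECT t.system, t.case_num, t.party_id
      FROM temp_party t
      LEFT JOIN gtt_party g
        ON g.system = t.system AND g.case_num = t.case_num AND g.party_id = t.party_id
     WHERE g.system IS NULL
     ORDER BY t.system, t.case_num, t.party_id
""").fetchall()

# NOT EXISTS with a correlated subquery on the same three columns.
not_exists = conn.execute("""
    SELECT t.system, t.case_num, t.party_id
      FROM temp_party t
     WHERE NOT EXISTS (SELECT 1 FROM gtt_party g
                        WHERE g.system = t.system
                          AND g.case_num = t.case_num
                          AND g.party_id = t.party_id)
     ORDER BY t.system, t.case_num, t.party_id
""").fetchall()
print(anti_join, not_exists)
```

Both queries return the two TEMP_PARTY rows that have no counterpart in GTT_PARTY. Which one is faster depends on the database and the data, so comparing the execution plans on the real tables is the safer way to decide.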

Comparing a list of values

For example, I have a head table with one column, id, and a position table with id, head-id (a reference to the head table, 1 to N), and a value. Now I select one row in the head table, say id 1. Looking in the position table, I find 2 rows that reference it, with the values 1337 and 1338. I want to select all heads that also have 2 positions with these values, 1337 and 1338. The position ids are not the same, only the values, because it is not an M to N relation. Can anyone tell me an SQL statement for this? I have no idea how to get it done.

Assuming that the value is not repeated for a given headid in the position table, and that it is never NULL, then you can do this using the following logic. Do a full outer join on the position table to the specific head positions you care about. Then check whether there is a full match.
The following query does this:
select *
from (select p.headid,
sum(case when p.value is not null then 1 else 0 end) as pmatches,
sum(case when ref.value is not null then 1 else 0 end) as refmatches
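from (select r.headid, r.value
from position r
where r.headid = <whatever>
) ref full outer join
position p
on p.value = ref.value and
p.headid <> ref.headid
group by p.headid
) t
where t.pmatches = t.refmatches
and t.refmatches = (select count(*) from position where headid = <whatever>)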
If you do have NULLs in the values, you can accommodate these using coalesce. If you have duplicates, you need to specify more clearly what to do in this case.

Assuming you have:
Create table head
(
id int
)
Create table pos
(
id int,
head_id int,
value int
)
and you need to find duplicates by value, then I'd use:
Select distinct p.head_id, p1.head_id
from pos p
join pos p1 on p.value = p1.value and p.head_id<>p1.head_id
where p.head_id = 1
This is for a specific head_id; omit the final WHERE to compare every head_id.
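If the goal is heads whose set of values is exactly the same as the reference head's, rather than heads that share at least one value, a GROUP BY with HAVING can count the matches. This is a minimal sketch on SQLite via Python's sqlite3 module, using the pos table from above with invented rows, and assuming that values are never NULL and never repeated within one head.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos (id INTEGER PRIMARY KEY, head_id INTEGER, value INTEGER);
    INSERT INTO pos (head_id, value) VALUES
        (1, 1337), (1, 1338),               -- reference head
        (2, 1337), (2, 1338),               -- same set of values
        (3, 1337),                          -- only a subset
        (4, 1337), (4, 1338), (4, 9999);    -- has an extra value
""")

# For every other head, count its rows and how many of them match a value of
# the reference head. The sets are equal when every row matches and the number
# of matches equals the size of the reference set.
same_heads = [row[0] for row in conn.execute("""
    SELECT p.head_id
      FROM pos p
      LEFT JOIN pos ref
        ON ref.head_id = :ref_head AND ref.value = p.value
     WHERE p.head_id <> :ref_head
     GROUP BY p.head_id
    HAVING COUNT(*) = COUNT(ref.value)
       AND COUNT(ref.value) = (SELECT COUNT(*) FROM pos WHERE head_id = :ref_head)
     ORDER BY p.head_id
""", {"ref_head": 1})]
print(same_heads)
```

Only head 2 qualifies: head 3 is missing a value and head 4 has one too many.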

Variant use of the GROUP BY clause in TSQL

Imagine the following schema and sample data (SQL Server 2008):
OriginatingObject
----------------------------------------------
ID
1
2
3
ValueSet
----------------------------------------------
ID  OriginatingObjectID  DateStamp
1   1                    2009-05-21 10:41:43
2   1                    2009-05-22 12:11:51
3   1                    2009-05-22 12:13:25
4   2                    2009-05-21 10:42:40
5   2                    2009-05-20 02:21:34
6   1                    2009-05-21 23:41:43
7   3                    2009-05-26 14:56:01
Value
----------------------------------------------
ID  ValueSetID  Value
1   1           28
etc (a set of rows for each related ValueSet)
I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.
I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?

You can do it with a correlated subquery, or with a multiple-column IN and a GROUP BY.
Note that a simple GROUP BY can only get you the list of OriginatingObjectIDs and their latest DateStamps. To pull the corresponding ValueSet IDs, the cleanest solution is to use a subquery.
Multiple-column IN with GROUP BY (probably faster where it is supported; SQL Server does not accept a multiple-column IN):
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
(V.OriginatingObjectID, V.DateStamp) IN
(
SELECT OriginatingObjectID, Max(DateStamp)
FROM ValueSet
GROUP BY OriginatingObjectID
)
Correlated Subquery:
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
V.DateStamp =
(
SELECT Max(DateStamp)
FROM ValueSet V2
WHERE V2.OriginatingObjectID = O.ID
)

SELECT OriginatingObjectID, id
FROM (
SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID
ORDER BY DateStamp DESC) as ranking
FROM ValueSet) ranked
WHERE ranking = 1;

This can be done with a correlated subquery. No GROUP BY is necessary.
SELECT
vs.ID,
vs.OriginatingObjectID,
vs.DateStamp,
v.Value
FROM
ValueSet vs
INNER JOIN Value v ON v.ValueSetID = vs.ID
WHERE
NOT EXISTS (
SELECT 1
FROM ValueSet
WHERE OriginatingObjectID = vs.OriginatingObjectID
AND DateStamp > vs.DateStamp
)
This returns a single ValueSet per OriginatingObjectID only if no two rows in ValueSet share the same DateStamp for that OriginatingObjectID.
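The sample ValueSet rows from the question can be used to check the result. The sketch below loads them into an in-memory SQLite database through Python's sqlite3 module and joins ValueSet to a derived table holding the latest DateStamp per OriginatingObjectID, a form that avoids both window functions and a multiple-column IN. Storing DateStamp as ISO-formatted text is an assumption made for SQLite, where such strings compare in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ValueSet (ID INTEGER PRIMARY KEY, OriginatingObjectID INTEGER, DateStamp TEXT);
    INSERT INTO ValueSet VALUES
        (1, 1, '2009-05-21 10:41:43'),
        (2, 1, '2009-05-22 12:11:51'),
        (3, 1, '2009-05-22 12:13:25'),
        (4, 2, '2009-05-21 10:42:40'),
        (5, 2, '2009-05-20 02:21:34'),
        (6, 1, '2009-05-21 23:41:43'),
        (7, 3, '2009-05-26 14:56:01');
""")

# Derived table: latest DateStamp per object; join back to fetch the row's ID.
latest_ids = conn.execute("""
    SELECT vs.OriginatingObjectID, vs.ID
      FROM ValueSet vs
      JOIN (SELECT OriginatingObjectID, MAX(DateStamp) AS MaxStamp
              FROM ValueSet
             GROUP BY OriginatingObjectID) latest
        ON latest.OriginatingObjectID = vs.OriginatingObjectID
       AND latest.MaxStamp = vs.DateStamp
     ORDER BY vs.OriginatingObjectID
""").fetchall()
print(latest_ids)
```

Object 1 maps to ValueSet 3, which is not its highest ID (6 is), so the query picks rows by DateStamp rather than by ID. As with the other variants, tied DateStamps within one object would yield more than one row.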
