SQL Inverse JOIN Query

SQL Inverse JOIN Query - sql

MS-SQL Server
Table A (Study_ID,Issue_Id)
XX,1
BB,2
Table B (Study_ID,System_Id)
XX,User1
BB,User2
XX,User2
View V : (Issue_Id,System_Id)
2,User1
View V should give all Issues from Table A, for System_Id X, which are not in Table B for the combination of Study and SytemID
The purpose is, The table A has Issues(Issue_Id), which are tied to Study(Study_id). If A user User1 logs in into system he should be able to see all issues in table A apart from the ones which have study_id for which the user isn't having rights. Table B indicates the StudyId's for which the user has no rights
How can I achieve this in an efficient way?

You can try to make a list of all the combinations of Study_IDs AND System_IDs and then by a left join you can see whether or not the combination exists.
I'm a bit confused as to your comment about user rights. Is this more of a database issue, or is using a AND System_ID = 'User1' in the WHERE statement a solution?
WITH T_A AS (SELECT *
FROM ( VALUES ('XX', 1)
, ('BB', 2)
) x (Study_ID, Issue_ID)
)
, T_B AS (SELECT *
FROM ( VALUES ('XX', 'User1')
, ('BB', 'User2')
, ('XX', 'User2')
) x ( Study_ID, System_ID)
)
SELECT Issue_ID, USERS.System_ID
FROM T_A
INNER JOIN (SELECT DISTINCT System_ID FROM T_B) USERS
ON 1 = 1
LEFT JOIN T_B
ON T_A.Study_ID = T_B.Study_ID
AND USERS.System_ID = T_B.System_ID
WHERE T_B.Study_ID IS NULL

This is one way to do it:
select * from a
except
select * from b

Related

Why is my query inserting the same values when I have added a 'not exists' parameter that should avoid this from happening?

My query should stop inserting values, as the not exists statement is satisfied (I have checked both tables) and matching incidents exist in both tables, any ideas why values are still being returned?
Here is the code:
INSERT INTO
odwh_system.ead_incident_credit_control_s
(
incident
)
SELECT DISTINCT
tp.incident
FROM
odwh_data.ead_incident_status_audit_s ei
INNER JOIN odwh_data.ead_incident_s tp ON ei.incident=tp.incident
WHERE
ei.status = 6
OR
ei.status = 7
AND NOT EXISTS
(
SELECT
true
FROM
odwh_system.ead_incident_credit_control_s ead
WHERE
ead.incident = tp.incident
)
AND EXISTS
(
SELECT
true
FROM
odwh_work.ead_incident_tp_s tp
WHERE
tp.incident = ei.incident
);

dont reuse table aliases
use sane aliases
avoid AND/OR conflicts; prefer IN()
INSERT INTO odwh_system.ead_incident_credit_control_s (incident)
SELECT -- DISTINCT
tp.incident
FROM odwh_data.ead_incident_s dtp
WHERE NOT EXISTS (
SELECT *
FROM odwh_system.ead_incident_credit_control_s sic
WHERE sic.incident = dtp.incident
)
AND EXISTS (
SELECT *
FROM odwh_work.ead_incident_tp_s wtp
JOIN odwh_data.ead_incident_status_audit_s dis ON wtp.incident = dis.incident AND dis.status IN (6 ,7)
WHERE wtp.incident = dtp.incident
);

How to convert list of comma separated Ids into their name?

I have a table that contains:
id task_ids
1 10,15
2 NULL
3 17
I have the table that has the names of this tasks:
id task_name
10 a
15 b
17 c
I want to generate the following output
id task_ids task_names
1 10,15 a,b
2 null null
3 17 c
I know this structure isn't ideal but this is legacy table which I will not change now.
Is there easy way to get the output ?
I'm using Presto but I think this can be solved with native sql

WITH data AS (
SELECT * FROM (VALUES (1, '10,15'), (2, NULL)) x(id, task_ids)
),
task AS (
SELECT * FROM (VALUES ('10', 'a'), ('15', 'b')) x(id, task_name)
)
SELECT
d.id, d.task_ids
-- array_agg will obviously capture NULL task_name comping from LEFT JOIN, so we need to filter out such results
IF(array_agg(t.task_name) IS NOT DISTINCT FROM ARRAY[NULL], NULL, array_agg(t.task_name)) task_names
FROM data d
-- split task_ids by `,`, convert into numbers, UNNEST into separate rows
LEFT JOIN UNNEST (split(d.task_ids, ',')) AS e(task_id) ON true
-- LEFT JOIN with task to pull the task name
LEFT JOIN task t ON e.task_id = t.id
-- aggregate back
GROUP BY d.id, d.task_ids;

You have a horrible data model, but you can do what you want with a bit of effort. Arrays are better than strings, so I'll just use that:
select t.id, t.task_id, array_agg(tt.task_name) as task_names
from t left join lateral
unnest(split(t.task_ids, ',')) u(task_id)
on 1=1 left join
tasks tt
on tt.task_id = u.task_id
group by t.id, t.task_id;
I don't have Presto on hand to test this. But this or some minor variant should do what you want.
EDIT:
This version might work:
select t.id, t.task_id,
(select array_agg(tt.task_name)
from unnest(split(t.task_ids, ',')) u(task_id) join
tasks tt
on tt.task_id = u.task_id
) as task_names
from t ;

Unable to convert this legacy SQL into Standard SQL in Google BigQuery

I am not able to validate this legacy sql into standard bigquery sql as I don't know what else is required to change here(This query fails during validation if I choose standard SQL as big query dialect):
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name,
salesperson.name,
rate_card.*
FROM (
SELECT
*
FROM
dfp_data.dfp_order_lineitem
WHERE
DATE(end_datetime) >= DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'))
OR end_datetime IS NULL ) lineitem
JOIN (
SELECT
*
FROM
dfp_data.dfp_order) porder
ON
lineitem.order_id = porder.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal_lineitem) proposal_lineitem
ON
lineitem.id = proposal_lineitem.dfp_lineitem_id
JOIN (
SELECT
*
FROM
dfp_data.dfp_company) company
ON
porder.advertiser_id = company.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_product) product
ON
proposal_lineitem.product_id=product.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal) proposal
ON
proposal_lineitem.proposal_id=proposal.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_rate_card) rate_card
ON
proposal_lineitem.ratecard_id=rate_card.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) trafficker
ON
porder.trafficker_id =trafficker.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) salesperson
ON
porder. salesperson_id =salesperson.id

Most likely the error you are getting is something like below
Duplicate column names in the result are not supported. Found duplicate(s): name
Legacy SQL adjust trafficker.name and salesperson.name in your SELECT statement into respectively trafficker_name and salesperson_name thus effectively eliminating column names duplication
Standard SQL behaves differently and treat both those columns as named name thus producing duplication case. To avoid it - you just need to provide aliases as in example below
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name,
rate_card.*
FROM ( ...
You can easily check above explained using below simplified/dummy queries
#legacySQL
SELECT
porder.*,
trafficker.name,
salesperson.name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
and
#standardSQL
SELECT
porder.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
Note: if you have more duplicate names - you need to alias all of them too

Oracle SQL XOR condition with > 14 tables

I have a question on sql desgin.
Context:
I have a table called t_master and 13 other tables (lets call them a,b,c... for simplicity) where it needs to compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val =
a.value.
If record exist in t_master, retrieve t_master record, else retrieve 'a' record.
I do not need to retrieve the records if it exists in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea on doing this, using WITH to subquery the non-master tables (a,b,c...) first with their respective WHERE clause.
Then use XOR statement to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause which I'm not sure if that will have impact on the performance as t_master is sort of a log table.
**Assume I cant change any schema.
Currently I'm not sure if this SQL will working correctly yet, so I'm hoping someone can guide me in the right direction regarding this.
update from used_by_already's suggestion:
This is what I'm trying to do (comparison between 2 tables first, before I add more, but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do i overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)

Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.

How to join three tables with distinct

I'm trying to join three tables to pull back a list of distinct blog posts with associated assets (images etc) but I keep coming up a cropper. The three tablets are tblBlog, tblAssetLink and tblAssets. The Blog tablet hold the blog, the asset table holds the assets and the Assetlink table links the two together.
tblBlog.BID is the PK in blog, tblAssets.AID is the PK in Assets.
This query works but pulls back multiple posts for the same record. I've tried to use select distinct and group by and even union but as my knowledge is pretty poor with SQL - they all error.
I'd like to also discount any assets that are marked as deleted (tblAssets.Deleted = true) but not hide the associated Blog post (if that's not marked as deleted). If anyone can help - it would be much appreciated! Thanks.
Here's my query so far....
SELECT dbo.tblBlog.BID,
dbo.tblBlog.DateAdded,
dbo.tblBlog.PMonthName,
dbo.tblBlog.PDay,
dbo.tblBlog.Header,
dbo.tblBlog.AddedBy,
dbo.tblBlog.PContent,
dbo.tblBlog.Category,
dbo.tblBlog.Deleted,
dbo.tblBlog.Intro,
dbo.tblBlog.Tags,
dbo.tblAssets.Name,
dbo.tblAssets.Description,
dbo.tblAssets.Location,
dbo.tblAssets.Deleted AS Expr1,
dbo.tblAssetLink.Priority
FROM dbo.tblBlog
LEFT OUTER JOIN dbo.tblAssetLink
ON dbo.tblBlog.BID = dbo.tblAssetLink.BID
LEFT OUTER JOIN dbo.tblAssets
ON dbo.tblAssetLink.AID = dbo.tblAssets.AID
WHERE ( dbo.tblBlog.Deleted = 'False' )
ORDER BY dbo.tblAssetLink.Priority, tblBlog.DateAdded DESC
EDIT
Changed the Where and the order by....
Expected output:
tblBlog.BID = 123
tblBlog.DateAdded = 12/04/2015
tblBlog.Header = This is a header
tblBlog.AddedBy = Persons name
tblBlog.PContent = *text*
tblBlog.Category = Category name
tblBlog.Deleted = False
tblBlog.Intro = *text*
tblBlog.Tags = Tag, Tag, Tag
tblAssets.Name = some.jpg
tblAssets.Description = Asset desc
tblAssets.Location = Location name
tblAssets.Priority = True

Use OUTER APPLY:
DECLARE #b TABLE ( BID INT )
DECLARE #a TABLE ( AID INT )
DECLARE #ba TABLE
(
BID INT ,
AID INT ,
Priority INT
)
INSERT INTO #b
VALUES ( 1 ),
( 2 )
INSERT INTO #a
VALUES ( 1 ),
( 2 ),
( 3 ),
( 4 )
INSERT INTO #ba
VALUES ( 1, 1, 1 ),
( 1, 2, 2 ),
( 2, 1, 1 ),
( 2, 2, 2 )
SELECT *
FROM #b b
OUTER APPLY ( SELECT TOP 1
a.*
FROM #ba ba
JOIN #a a ON a.AID = ba.AID
WHERE ba.BID = b.BID
ORDER BY Priority
) o
Output:
BID AID
1 1
2 1
Something like:
SELECT b.BID ,
b.DateAdded ,
b.PMonthName ,
b.PDay ,
b.Header ,
b.AddedBy ,
b.PContent ,
b.Category ,
b.Deleted ,
b.Intro ,
b.Tags ,
o.Name ,
o.Description ,
o.Location ,
o.Deleted AS Expr1 ,
o.Priority
FROM dbo.tblBlog b
OUTER APPLY ( SELECT TOP 1
a.* ,
al.Priority
FROM dbo.tblAssetLink al
JOIN dbo.tblAssets a ON al.AID = a.AID
WHERE b.BID = al.BID
ORDER BY al.Priority
) o
WHERE b.Deleted = 'False'

You cannot join three tables unless they all have the same attribute. It would work if all tables had BID, but the second join is trying to join AID. Which wont work. They all have to have BID.

Based on your comments
i would like to get is just one asset per blog post (top one ordered
by Priority)
You can change your query as following. I suggest changing the join with dbo.tblAssetLink to filtered one, which contains only one (highest priority) link for every blog.
SELECT dbo.tblBlog.BID,
dbo.tblBlog.DateAdded,
dbo.tblBlog.PMonthName,
dbo.tblBlog.PDay,
dbo.tblBlog.Header,
dbo.tblBlog.AddedBy,
dbo.tblBlog.PContent,
dbo.tblBlog.Category,
dbo.tblBlog.Deleted,
dbo.tblBlog.Intro,
dbo.tblBlog.Tags,
dbo.tblAssets.Name,
dbo.tblAssets.Description,
dbo.tblAssets.Location,
dbo.tblAssets.Deleted AS Expr1,
dbo.tblAssetLink.Priority
FROM dbo.tblBlog
LEFT OUTER JOIN
(SELECT BID, AID,
ROW_NUMBER() OVER (PARTITION BY BID ORDER BY [Priority] DESC) as N
FROM dbo.tblAssetLink) AS filteredAssetLink
ON dbo.tblBlog.BID = filteredAssetLink.BID
LEFT OUTER JOIN dbo.tblAssets
ON filteredAssetLink.AID = dbo.tblAssets.AID
WHERE dbo.tblBlog.Deleted = 'False' AND filteredAssetLink.N = 1
ORDER BY tblBlog.DateAdded DESC

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Inverse JOIN Query - sql

This is one way to do it: select * from a except select * from b

Related

Why is my query inserting the same values when I have added a 'not exists' parameter that should avoid this from happening?

How to convert list of comma separated Ids into their name?

Unable to convert this legacy SQL into Standard SQL in Google BigQuery

Oracle SQL XOR condition with > 14 tables

How to join three tables with distinct

Categories

Resources