Data Comparison between Two Tables


I have what should be a simple problem (maybe), but I am struggling with it.
Here is the scenario:
TABLE 1 contains all the data
TABLE 2 contains only a subset
I need a query that will look at table 1 and give a list of items that are not in table 2. Below is what I have, but I know it's not doing that.
SELECT c.[DOC_ID], d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d
LEFT JOIN [dbo].[Custom_SUAM_Content] c ON (c.[DOC_ID] = d.[DOCID])
WHERE c.[DOC_ID] IS NULL
OR d.[DOCID] IS NULL

You are describing a not exists scenario.
You can't expect to return data from c since by definition what you want doesn't exist:
select d.DOCID
from dbo.Custom_SUAM_Docuware d
where not exists(
select * from dbo.Custom_SUAM_Content c
where c.DOC_ID = d.DOCID
);
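To see the difference in practice, here is a minimal sketch that runs the NOT EXISTS query next to a corrected LEFT JOIN version. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Custom_SUAM_Docuware (DOCID INTEGER);
    CREATE TABLE Custom_SUAM_Content  (DOC_ID INTEGER);
    INSERT INTO Custom_SUAM_Docuware VALUES (1), (2), (3), (4);
    INSERT INTO Custom_SUAM_Content  VALUES (2), (4);
""")

# Anti-join with NOT EXISTS: rows of d that have no partner in c.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT d.DOCID
    FROM Custom_SUAM_Docuware d
    WHERE NOT EXISTS (
        SELECT 1 FROM Custom_SUAM_Content c WHERE c.DOC_ID = d.DOCID
    )
    ORDER BY d.DOCID
""")]

# Equivalent LEFT JOIN form: only the right-hand table can come back NULL,
# so c.DOC_ID is the only column worth testing and d.DOCID the only one
# worth selecting.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT d.DOCID
    FROM Custom_SUAM_Docuware d
    LEFT JOIN Custom_SUAM_Content c ON c.DOC_ID = d.DOCID
    WHERE c.DOC_ID IS NULL
    ORDER BY d.DOCID
""")]

print(not_exists_ids)  # [1, 3]
print(left_join_ids)   # [1, 3]
```

Both forms return the same ids here. The extra OR d.[DOCID] IS NULL test in the question's query adds nothing useful, because the left-hand table of a LEFT JOIN is never filled in with NULLs by the join.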

You can use EXCEPT:
SELECT c.[DOC_ID]
FROM [dbo].[Custom_SUAM_Content] c
EXCEPT
SELECT d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d ;
That would show all ids from c that are not in d.
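One thing worth knowing is that EXCEPT is a set operator, so it also removes duplicates, whereas NOT EXISTS keeps every qualifying row. The following is a small sketch of that difference, again using Python's sqlite3 as a stand-in for SQL Server with invented sample rows (the Docuware column is assumed to be DOCID, as in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Custom_SUAM_Content  (DOC_ID INTEGER);
    CREATE TABLE Custom_SUAM_Docuware (DOCID INTEGER);
    -- id 5 appears twice in Content on purpose
    INSERT INTO Custom_SUAM_Content  VALUES (1), (2), (5), (5);
    INSERT INTO Custom_SUAM_Docuware VALUES (2), (3);
""")

# EXCEPT: distinct ids from Content that do not appear in Docuware.
except_ids = sorted(row[0] for row in conn.execute("""
    SELECT c.DOC_ID FROM Custom_SUAM_Content c
    EXCEPT
    SELECT d.DOCID FROM Custom_SUAM_Docuware d
"""))

# NOT EXISTS: one output row per qualifying Content row, duplicates included.
not_exists_ids = sorted(row[0] for row in conn.execute("""
    SELECT c.DOC_ID
    FROM Custom_SUAM_Content c
    WHERE NOT EXISTS (
        SELECT 1 FROM Custom_SUAM_Docuware d WHERE d.DOCID = c.DOC_ID
    )
"""))

print(except_ids)      # [1, 5]
print(not_exists_ids)  # [1, 5, 5]
```

If the id column is unique, the two approaches give the same list.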

Related

Optimizing SQL Cross Join that checks if any array value in other column

Let's say I have a table events with structure:
id      value_array
XXXX    [a,b,c,d]
...     ...
I have a second table values_of_interest with structure:
value
x
y
z
a
I want to find the ids that have any of the values found in values_of_interest. All else equal, what would be the most performant SQL to make this happen? (I am using BigQuery, but feel free to answer more generally.)
My current thought is:
SELECT
DISTINCT e.id
FROM
events e, values_of_interest vi
WHERE
EXISTS(
SELECT
value
FROM
UNNEST(e.value_array) value
JOIN
vi ON vi.value = e.value
)

A few quick options for BigQuery Standard SQL:
Option 1
select id
from `project.dataset.events`
where exists (
select 1
from `project.dataset.values_of_interest`
where value in unnest(value_array)
)
Option 2
select id
from `project.dataset.events` t
where (
select count(1)
from t.value_array as value
join `project.dataset.values_of_interest`
using(value)
) > 0

I would write this using exists and a join:
select e.id
from `project.dataset.events` e
where exists (select 1
from unnest(e.value_array) val join
`project.dataset.values_of_interest` voi
on val = voi.value
);
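The shape of this EXISTS query can be tried outside BigQuery too. The sketch below is only an approximation: SQLite (via Python's sqlite3) has no UNNEST over array columns, so the arrays are exploded into a hypothetical child table event_values, which plays the role of unnest(e.value_array). The sample ids and values are made up.

```python
import sqlite3

# Made-up sample data: event id -> array of values.
events = {"e1": ["a", "b"], "e2": ["q"], "e3": ["x", "a", "z"]}
values_of_interest = ["x", "y", "z", "a"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id TEXT PRIMARY KEY);
    -- event_values is a hypothetical stand-in for UNNEST(value_array)
    CREATE TABLE event_values (id TEXT, value TEXT);
    CREATE TABLE values_of_interest (value TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in events])
conn.executemany(
    "INSERT INTO event_values VALUES (?, ?)",
    [(i, v) for i, vs in events.items() for v in vs],
)
conn.executemany(
    "INSERT INTO values_of_interest VALUES (?)",
    [(v,) for v in values_of_interest],
)

# Semi-join: each event id is returned at most once, with no DISTINCT and
# no cross join, however many of its values match.
matching_ids = [row[0] for row in conn.execute("""
    SELECT e.id
    FROM events e
    WHERE EXISTS (
        SELECT 1
        FROM event_values ev
        JOIN values_of_interest voi ON voi.value = ev.value
        WHERE ev.id = e.id
    )
    ORDER BY e.id
""")]

print(matching_ids)  # ['e1', 'e3']
```

Note that e3 matches three values of interest but still appears once, which is why the DISTINCT and the cross join from the question are not needed.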

Oracle SQL XOR condition with > 14 tables

I have a question on SQL design.
Context:
I have a table called t_master and 13 other tables (let's call them a, b, c... for simplicity) against which it needs to be compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val = a.value.
If the record exists in t_master, retrieve the t_master record, else retrieve the 'a' record.
I do not need to retrieve the records if they exist in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea of how to do this, using WITH to subquery the non-master tables (a, b, c...) first with their respective WHERE clauses.
Then use an XOR condition to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause, and I'm not sure whether that will have an impact on performance, as t_master is sort of a log table.
Assume I can't change any schema.
Currently I'm not sure whether this SQL will work correctly, so I'm hoping someone can guide me in the right direction.
Update following used_by_already's suggestion:
This is what I'm trying to do (a comparison between 2 tables first, before I add more), but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do I overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)

Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.
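Here is a cut-down sketch of that pattern with just two side tables, run through Python's sqlite3 rather than Oracle. The column names keycol and fk_to_m follow the placeholder names above; the rows and the extra src column (which labels where each surviving row came from) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_master (keycol TEXT);
    CREATE TABLE a (fk_to_m TEXT);
    CREATE TABLE b (fk_to_m TEXT);
    INSERT INTO t_master VALUES ('M1'), ('M2'), ('M3');
    INSERT INTO a VALUES ('M1'), ('A9');   -- M1 is in both, A9 only in a
    INSERT INTO b VALUES ('M2'), ('B7');   -- M2 is in both, B7 only in b
""")

rows = conn.execute("""
    -- master rows that match none of the side tables
    SELECT 't_master' AS src, m.keycol AS val
    FROM t_master m
    WHERE NOT EXISTS (SELECT NULL FROM a WHERE a.fk_to_m = m.keycol)
      AND NOT EXISTS (SELECT NULL FROM b WHERE b.fk_to_m = m.keycol)
    UNION ALL
    -- rows of a with no master row
    SELECT 'a', a.fk_to_m
    FROM a
    WHERE NOT EXISTS (SELECT NULL FROM t_master m WHERE m.keycol = a.fk_to_m)
    UNION ALL
    -- rows of b with no master row
    SELECT 'b', b.fk_to_m
    FROM b
    WHERE NOT EXISTS (SELECT NULL FROM t_master m WHERE m.keycol = b.fk_to_m)
""").fetchall()

# UNION ALL gives no ordering guarantee, so sort before printing.
xor_rows = sorted(rows)
print(xor_rows)  # [('a', 'A9'), ('b', 'B7'), ('t_master', 'M3')]
```

The values present on both sides (M1 and M2) drop out, which is the XOR behaviour asked for. Each branch can only return columns from its own table, which is also why a NOT EXISTS branch cannot show values from the table it is excluding.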

Oracle SQL - selective filtering causes cartesian

Oracle 12.2
I have a SQL statement that is causing me issues. I am retrieving data from a table called BURNDOWN. If the user is an admin, they get to see all the data. If the user is NOT an admin, they are restricted to what they can see, based on some join conditions.
The issue I am running into is that when the user is an admin, I don't need the other tables, so the JOIN condition is not relevant and Oracle decides to do a cartesian join across everything.
How do I get around this so that if the user is an admin, I only look at one table, else I look at all tables and include the join condition?
The example SQL is a contrived example, but it shows the issue.
Select
BURNDOWN.NAME,
BURNDOWN.ADDRESS,
BURNDOWN.STATE
from BURNDOWN, FILTER_A, FILTER_B, FILTER_C
Where
(
:ISAdmin = 1
Or
(
BURNDOWN.x=FILTER_A.x and
FILTER_A.y=FILTER_B.y and
FILTER_B.z=FILTER_C.z and
FILTER_C.user = :ThisUser
)
)

Use an EXISTS to see if the data exists in the FILTER tables without joining them in to the results.
select bd.*
from burndown bd
where ( :isadmin = 1 or
exists ( select 1
from filter_a a
inner join filter_b b on b.y = a.y
inner join filter_c c on c.z = b.z
where a.x = bd.x
and c.user = :ThisUser )
)
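A quick way to convince yourself that this avoids the cartesian product is to run it on a toy dataset. This sketch uses Python's sqlite3 instead of Oracle, with invented rows; the filter column is called user_name here (an assumption made for the example) instead of user, and visible_names is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE burndown (name TEXT, x INTEGER);
    CREATE TABLE filter_a (x INTEGER, y INTEGER);
    CREATE TABLE filter_b (y INTEGER, z INTEGER);
    CREATE TABLE filter_c (z INTEGER, user_name TEXT);
    INSERT INTO burndown VALUES ('alpha', 1), ('beta', 2), ('gamma', 3);
    -- 'alpha' is reachable for ann through two different paths (y=10, y=11)
    INSERT INTO filter_a VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO filter_b VALUES (10, 100), (11, 100), (20, 200);
    INSERT INTO filter_c VALUES (100, 'ann'), (200, 'bob');
""")

QUERY = """
    SELECT bd.name
    FROM burndown bd
    WHERE :isadmin = 1
       OR EXISTS (
            SELECT 1
            FROM filter_a a
            JOIN filter_b b ON b.y = a.y
            JOIN filter_c c ON c.z = b.z
            WHERE a.x = bd.x
              AND c.user_name = :thisuser
       )
    ORDER BY bd.name
"""

def visible_names(is_admin, this_user):
    """Return the burndown names visible to the given user."""
    params = {"isadmin": is_admin, "thisuser": this_user}
    return [row[0] for row in conn.execute(QUERY, params)]

print(visible_names(1, "nobody"))  # ['alpha', 'beta', 'gamma']
print(visible_names(0, "ann"))     # ['alpha']
print(visible_names(0, "bob"))     # ['beta']
```

Because the filter tables only appear inside EXISTS, every burndown row comes back at most once: the admin sees three rows rather than a multiple of them, and ann sees alpha once even though two filter paths lead to it.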

Presumably, you want:
select bd.*
from burndown bd
where :ISAdmin = 1 or
(exists (select 1 from FILTER_A a where bd.x = a.x) or
exists (select 1 from FILTER_B b where bd.y = b.y) or
exists (select 1 from FILTER_C c where bd.z = c.z)
);

Why is Selecting From Table Variable Far Slower than List of Integers

I have a pretty big MSSQL stored procedure that I need to conditionally check for certain IDs:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (1,2,3,4,5)
I wanted to conditionally check the SomeID field, so I did the following:
if @enteredText = 'This'
INSERT INTO @AwesomeIDs
VALUES(1),(2),(3)
if @enteredText = 'That'
INSERT INTO @AwesomeIDs
VALUES(4),(5)
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from @AwesomeIDs)
Nothing else has changed, yet I can't even get the latter query to grab 5 records. The top query returns 5000 records in less than 3 seconds. Why is selecting from a table variable so drastically slower?

Two other possible options you can consider
Option 1
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where
( b.SomeID IN (1,2,3) AND @enteredText = 'This')
OR
( b.SomeID IN (4,5) AND @enteredText = 'That')
Option 2
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where EXISTS (Select 1
from @AwesomeIDs
WHERE b.SomeID = ID)
Mind you, for table variables SQL Server always assumes there is only ONE row in the table (except SQL Server 2014, where the assumption is 100 rows), and this can affect the estimated and actual plans. But 1 row against 3 is not really a deal breaker.

Need help with a complex SQL query - I think I need a two-stage inner join or something like that?

Okay, here's what I'm trying to do. I have a drupal table (term_data) in mysql that lists tags and their ID numbers and a second table, (term_node) that shows the relationship between tags and the data the tags refer to. For example, if node 1 had 3 tags, "A", "B" and "C". term_data might look like this:
name tid
A 1
B 2
C 3
and term_node might look like this:
nid tid
1 1
1 2
2 2
3 3
3 2
In this example, node 1 has been tagged with "A" and "B", node 2 has been tagged with "A" and node 3 has been tagged with "B", and "C".
I need to write a query that, given a tag name, list for me all the OTHER tags that are ever used with that tag. In the above example, searching on "A" should return "A" and "B" because node 1 uses both, searching on "C" should return "B" and "C", and searching on "B" should return "A", "B" and "C".
Any ideas? I got this far:
select distinct n.nid from term_node n INNER join term_data t where n.tid = t.tid and t.name='A';
Which gives me a list of every node that has been tagged with "A" - but I can't figure out the next step.
Can anyone help me out?

Try:
select distinct d2.name
from term_data d1
join term_node n1 on d1.tid = n1.tid
join term_node n2 on n1.nid = n2.nid
join term_data d2 on n2.tid = d2.tid
where d1.name = 'A'
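To check this against the sample data from the question, here is a small sketch using Python's sqlite3 in place of MySQL. The rows are the ones listed in the question; related_tags is just a hypothetical helper name, and the ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term_data (name TEXT, tid INTEGER);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);
    INSERT INTO term_data VALUES ('A', 1), ('B', 2), ('C', 3);
    INSERT INTO term_node VALUES (1, 1), (1, 2), (2, 2), (3, 3), (3, 2);
""")

def related_tags(tag_name):
    """Return every tag used on a node that also carries tag_name."""
    rows = conn.execute("""
        SELECT DISTINCT d2.name
        FROM term_data d1
        JOIN term_node n1 ON d1.tid = n1.tid   -- nodes carrying the tag
        JOIN term_node n2 ON n1.nid = n2.nid   -- all tags on those nodes
        JOIN term_data d2 ON n2.tid = d2.tid   -- tag ids back to names
        WHERE d1.name = ?
        ORDER BY d2.name
    """, (tag_name,))
    return [row[0] for row in rows]

print(related_tags("A"))  # ['A', 'B']
print(related_tags("B"))  # ['A', 'B', 'C']
print(related_tags("C"))  # ['B', 'C']
```

The searched tag itself is included in the result, as the question asks. Adding AND d2.name <> d1.name to the WHERE clause would leave only the other tags.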

Updated: Mark pointed out that the query wasn't correct.
SELECT DISTINCT t.name, t2.name Other
FROM
term_data t
INNER JOIN term_node n ON t.tid = n.tid
INNER JOIN term_node n2 ON n2.nid = n.nid
INNER JOIN term_data t2 ON n2.tid = t2.tid
WHERE
t.name = 'A'
Mark's answer should be accepted since he got it right first. Here is a demonstration of a similar query:
https://data.stackexchange.com/stackoverflow/query/13283/demo-for-need-help-with-a-complex-sql-query

Your description of term_node data and the example do not seem to match but using the example data provided I believe the following query will do what you need.
select distinct td.name, td2.name as tagged_name
from term_data td
inner join term_node tn
on tn.tid = td.tid
inner join term_node tn2
on tn2.nid = tn.nid
inner join term_data td2
on td2.tid = tn2.tid
The first join looks up the term_node records that match the name; term_node is then joined to itself to find all other tids for that node; finally, the second term_node is joined to term_data to retrieve the names of the tags.
You need to tack on the appropriate WHERE clause to select just the tag you want.
The result set for the above follows:
name tagged_name
A A
A B
B A
B B
B C
C B
C C
Hope this helps
Ray

I created the schema in my workbench, and here's the query I came up with:
SELECT * FROM `term_data` WHERE `term_data`.`tid` IN (
SELECT `term_node`.`tid` from `term_node` WHERE `nid` IN (
SELECT `nid` FROM `term_node` JOIN `term_data` ON `term_data`.`tid` = `term_node`.`tid` WHERE `term_data`.`name` = 'A'
)
);
Sorry for the structure ;) Here's SHOW CREATE TABLE for both tables:
CREATE TABLE `term_data` (
`tid` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`tid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `term_node` (
`term_node_id` int(11) NOT NULL,
`nid` int(11) NOT NULL,
`tid` varchar(45) DEFAULT NULL,
PRIMARY KEY (`term_node_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This seemed to work as expected, if I understood your question correctly. So one more time: we have some nodes which are tagged. We'd like to select a tag (A), and then select the other tags that were used to tag the same nodes as tag A.
Cheers.
P.S. Output is the following:
tid name
/* For tag A */
1 A
2 B
/* For tag B */
1 A
2 B
3 C
/* For tag C */
2 B
3 C
