MySQL: Select pages that are not tagged? - sql

I have a db with two tables like these below,
page table
pg_id title
1 a
2 b
3 c
4 d
tagged table
tagged_id pg_id
1 1
2 4
I want to select the pages which are not tagged. I tried the query below, but it doesn't work:
SELECT *
FROM root_pages
LEFT JOIN root_tagged ON ( root_tagged.pg_id = root_pages.pg_id )
WHERE root_pages.pg_id != root_tagged.pg_id
It returns zero - Showing rows 0 - 1 (2 total, Query took 0.0021 sec)
But I want it to return
pg_id title
2 b
3 c
My query must be wrong.
How can I correctly return the pages which are not tagged?

SELECT *
FROM root_pages
LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
WHERE root_tagged.pg_id IS NULL
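The != (or <>) operator compares two values, but it cannot be used to test for NULL: any comparison with NULL evaluates to NULL (unknown), never to true.
NULL = NULL returns NULL
NULL = 0 returns NULL
NULL != NULL returns NULL
WHERE keeps only rows for which the condition is true. In your query, matched rows fail the != test and unmatched rows compare against NULL, so nothing is returned. To check for NULL, use the IS NULL or IS NOT NULL operator.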
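A minimal sketch of the behaviour described above, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; both engines agree on these NULL comparisons). The table names and rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE root_pages  (pg_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE root_tagged (tagged_id INTEGER PRIMARY KEY, pg_id INTEGER);
    INSERT INTO root_pages  VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO root_tagged VALUES (1, 1), (2, 4);
""")

# Comparisons with NULL evaluate to NULL (Python None), never to true.
# Only IS NULL gives a definite answer (1).
null_checks = conn.execute(
    "SELECT NULL = NULL, NULL = 0, NULL != NULL, NULL IS NULL"
).fetchone()

# Anti-join: keep only pages that have no matching row in root_tagged.
untagged = conn.execute("""
    SELECT root_pages.pg_id, root_pages.title
    FROM root_pages
    LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
    WHERE root_tagged.pg_id IS NULL
    ORDER BY root_pages.pg_id
""").fetchall()

print(null_checks)  # (None, None, None, 1)
print(untagged)     # [(2, 'b'), (3, 'c')]
```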

(Struck out in the original answer; see the note below the query.) If the ratio of tags to pages is more than about 2:1, then NOT EXISTS will be faster than LEFT JOIN + IS NULL:
SELECT *
FROM root_pages
WHERE NOT EXISTS (
SELECT *
FROM root_tagged
WHERE root_tagged.pg_id = root_pages.pg_id )
It is an alternative that more clearly states what you are looking for, a non-existence.
Regarding the struck-out text above:
The question is MySQL-specific and, assuming root_tagged.pg_id is not nullable, MySQL implements LEFT JOIN + IS NULL as an anti-join, which is the same strategy it uses for NOT EXISTS. There seems to be some overhead added by NOT EXISTS, though, so LEFT JOIN + IS NULL is expected to be somewhat faster.
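As a quick check that the two formulations are interchangeable in what they return, here is a small sketch using SQLite via Python's sqlite3 (a stand-in for MySQL, so it says nothing about MySQL's relative timings). The data is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE root_pages  (pg_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE root_tagged (tagged_id INTEGER PRIMARY KEY, pg_id INTEGER NOT NULL);
    INSERT INTO root_pages  VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO root_tagged VALUES (1, 1), (2, 4);
""")

# Strategy 1: LEFT JOIN and keep the rows where the join found nothing.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT root_pages.pg_id
    FROM root_pages
    LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
    WHERE root_tagged.pg_id IS NULL
    ORDER BY root_pages.pg_id
""")]

# Strategy 2: state the non-existence directly.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT root_pages.pg_id
    FROM root_pages
    WHERE NOT EXISTS (
        SELECT 1 FROM root_tagged
        WHERE root_tagged.pg_id = root_pages.pg_id)
    ORDER BY root_pages.pg_id
""")]

print(left_join_ids, not_exists_ids)  # [2, 3] [2, 3]
```

Which form runs faster depends on the engine and version, so comparing the execution plans on real data is the safer way to decide.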

Related

How to find difference between table with multiple conditions

I have two tables with identical structure but some differing values. I would like to find those differences, treating a row as different only when its value column differs by more than 10.
For example, if all 9 key columns have the same values in both tables but the value columns differ by 11, the record is different. If the difference is 9, the records count as the same.
So I wrote this query to get differences:
select *
from test.test m
inner join test.test1 t
on
m.month_date = t.month_date and
m.level_1 = t.level_1 and
m.level_2 = t.level_2 and
m.level_3 = t.level_3 and
m.level_4 = t.level_4 and
m.level_header = t.level_header and
m.unit = t.unit and
m.model_type_id = t.model_type_id and
m.model_version_desc = t.model_version_desc
where m.month_date = '2022-11-01' and abs(m.value - t.value) > 10
This returns all records whose key columns match but whose value difference exceeds the threshold.
Second, I have a full outer join to get the rows that exist in only one of the tables:
select *
from test.test m
full outer join test.test1 t
on
m.month_date = t.month_date and
m.level_1 = t.level_1 and
m.level_2 = t.level_2 and
m.level_3 = t.level_3 and
m.level_4 = t.level_4 and
m.level_header = t.level_header and
m.unit = t.unit and
m.model_type_id = t.model_type_id and
m.model_version_desc = t.model_version_desc
where m.month_date is null or t.month_date is null and m.month_date = '2022-11-01'
How can I combine the results of these two queries without UNION? I want to have only one query (sub query is acceptable)
Assuming that for a given day, you need to find
rows that match between the tables but exceed the value difference threshold
AND
rows present in either left or right table, that don't have a corresponding row in the other table
select *
from test.test m
full outer join test.test1 t
using (
month_date,
level_1,
level_2,
level_3,
level_4,
level_header,
unit,
model_type_id,
model_version_desc )
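where month_date = '2022-11-01' -- merged USING column, so one-sided rows are filtered by date too
and ( m.month_date is null -- row exists only in test1
or t.month_date is null -- row exists only in test
or abs(m.value - t.value) > 10 ); -- matched, but values differ by more than 10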
Since the columns used to join the tables have the same names in both, you can replace the lengthy table1.column1 = table2.column1 and ... list of pairs with a single USING (month_date, level_1, level_2, level_3, ...). As a bonus, the matching columns are listed only once in the output, instead of once for the left table and once for the right table:
select *
from (select 1,2,3) as t1(a,b,c)
full outer join
(select 1,2,3) as t2(a,b,c)
on t1.a=t2.a
and t1.b=t2.b
and t1.c=t2.c;
-- a | b | c | a | b | c
-----+---+---+---+---+---
-- 1 | 2 | 3 | 1 | 2 | 3
select *
from (select 1,2,3) as t1(a,b,c)
full outer join
(select 1,2,3) as t2(a,b,c)
using(a,b,c);
-- a | b | c
-----+---+---
-- 1 | 2 | 3
In your first query, you can replace the null values with a specific number. Something like this:
where m.month_date = '2022-11-01' and abs(coalesce(m.value, -99) - coalesce(t.value, -99)) > 10
The above replaces nulls with -99 (choose a value that cannot occur in your data), so if m.value is 10 and t.value is null, the row is returned by your first query.
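To illustrate why the sentinel matters, the sketch below compares the plain difference test with the COALESCE version on a few made-up rows. The tables m and t, their id key and the sample values are hypothetical, and SQLite (via Python's sqlite3) stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-column tables; id stands in for the 9 join columns.
    CREATE TABLE m (id INTEGER PRIMARY KEY, value REAL);
    CREATE TABLE t (id INTEGER PRIMARY KEY, value REAL);
    INSERT INTO m VALUES (1, 10), (2, 50), (3, 7);
    INSERT INTO t VALUES (1, NULL), (2, 55), (3, 30);
""")

# Plain test: abs(10 - NULL) is NULL, so row 1 is silently dropped.
plain = [row[0] for row in conn.execute("""
    SELECT m.id FROM m JOIN t ON t.id = m.id
    WHERE abs(m.value - t.value) > 10
    ORDER BY m.id
""")]

# Sentinel test: NULL is treated as -99, so row 1 now counts as different.
with_sentinel = [row[0] for row in conn.execute("""
    SELECT m.id FROM m JOIN t ON t.id = m.id
    WHERE abs(coalesce(m.value, -99) - coalesce(t.value, -99)) > 10
    ORDER BY m.id
""")]

print(plain)          # [3]
print(with_sentinel)  # [1, 3]
```

Note that when both values are NULL, the two sentinels cancel out and the row is treated as unchanged, which is usually the desired outcome.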

SELECT NOT IN with multiple columns in subquery

Regarding the statement below, sltrxid can exist as both ardoccrid and ardocdbid. I want to know how to include both in the NOT IN subquery.
SELECT *
FROM glsltransaction A
INNER JOIN cocustomer B ON A.acctid = B.customerid
WHERE sltrxstate = 4
AND araccttype = 1
AND sltrxid NOT IN(
SELECT ardoccrid,ardocdbid
FROM arapplyitem)
I would recommend not exists:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid IN (a.ardoccrid, a.ardocdbid)
)
Note that I changed the table aliases to things that are more meaningful. I would strongly recommend prefixing the column names with the table they belong to, so the query is unambiguous - in absence of any indication, I represented this as ?? in the query.
IN sometimes optimizes poorly. There are situations where two subqueries are more efficient:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardoccrid
)
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardocdbid
)
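The sketch below checks, on a few made-up rows, that the single NOT EXISTS with IN and the pair of NOT EXISTS subqueries select the same transactions. It also shows one reason to prefer NOT EXISTS over NOT IN when a column is nullable. The sample data and the assumption that ardocdbid can be NULL are hypothetical; SQLite via Python's sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE glsltransaction (sltrxid INTEGER PRIMARY KEY);
    CREATE TABLE arapplyitem (ardoccrid INTEGER, ardocdbid INTEGER);
    INSERT INTO glsltransaction VALUES (1), (2), (3), (4), (5);
    INSERT INTO arapplyitem VALUES (1, 2), (3, NULL);  -- NULL is an assumption
""")

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# One subquery testing both columns at once.
one_subquery = ids("""
    SELECT t.sltrxid FROM glsltransaction t
    WHERE NOT EXISTS (
        SELECT 1 FROM arapplyitem a
        WHERE t.sltrxid IN (a.ardoccrid, a.ardocdbid))
    ORDER BY t.sltrxid
""")

# Two subqueries, one per column.
two_subqueries = ids("""
    SELECT t.sltrxid FROM glsltransaction t
    WHERE NOT EXISTS (SELECT 1 FROM arapplyitem a WHERE t.sltrxid = a.ardoccrid)
      AND NOT EXISTS (SELECT 1 FROM arapplyitem a WHERE t.sltrxid = a.ardocdbid)
    ORDER BY t.sltrxid
""")

# NOT IN against a list containing NULL is never true, so no row survives.
not_in = ids("""
    SELECT t.sltrxid FROM glsltransaction t
    WHERE t.sltrxid NOT IN (SELECT ardocdbid FROM arapplyitem)
""")

print(one_subquery, two_subqueries, not_in)  # [4, 5] [4, 5] []
```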

Null value is considered an existing value

I have a validation in an IF condition like:
IF(EXISTS
(SELECT TOP 1 [TaskAssignationId]
FROM [Task] AS [T]
INNER JOIN #TaskIdTableType AS [TT] ON [T].[TaskId] = [TT].[Id]
))
But the subquery returns a row whose TaskAssignationId is NULL, so the IF condition is true: the row exists, even though its value is NULL. I don't want NULL to count as a value. How can I exclude NULLs?
If you don't want to include rows where [TaskAssignationId] is null then add that to a WHERE clause.
IF (EXISTS (
SELECT TOP 1 [TaskAssignationId]
FROM [Task] AS [T]
INNER JOIN #TaskIdTableType AS [TT] ON [T].[TaskId] = [TT].[Id]
WHERE [TaskAssignationId] IS NOT NULL
))
EXISTS asks "did the (sub)query return at least one (correlated) row?", not "did the (sub)query return a non-null value?"
These are all perfectly valid uses of EXISTS:
SELECT * FROM person p
WHERE EXISTS (SELECT null FROM address a WHERE a.personid = p.id)
SELECT * FROM person p
WHERE EXISTS (SELECT 1 FROM address a WHERE a.personid = p.id)
SELECT * FROM person p
WHERE EXISTS (SELECT * FROM address a WHERE a.personid = p.id)
It doesn't matter what values you return, or how many columns: EXISTS only cares whether the row count is zero or greater than zero.
Hence you have to make sure your (sub)query returns no rows if you want the EXISTS check to fail. If addresses that have a null type are unacceptable, the (sub)query has to exclude them with WHERE a.type IS NOT NULL, so that only rows with a non-null type are considered.
There's also little point in using TOP 1 in the (sub)query; the optimiser knows that the only thing that matters is zero rows versus at least one row, so it effectively does a TOP 1 automatically (i.e. it stops retrieving data as soon as it knows there is at least one row).
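A small sketch of the point made above, assuming a simplified, hypothetical task table with a single row whose assignation id is NULL. It uses SQLite via Python's sqlite3 rather than SQL Server, so TOP and the temp table are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified version of the Task table.
    CREATE TABLE task (task_id INTEGER PRIMARY KEY, task_assignation_id INTEGER);
    INSERT INTO task VALUES (1, NULL);
""")

def exists(subquery):
    """Return the result of EXISTS (subquery) as a Python bool."""
    return bool(conn.execute(f"SELECT EXISTS ({subquery})").fetchone()[0])

# A row is returned, so EXISTS is true even though the selected value is NULL.
row_with_null = exists("SELECT task_assignation_id FROM task WHERE task_id = 1")

# The selected expression is irrelevant; only the row count matters.
select_null = exists("SELECT NULL FROM task WHERE task_id = 1")

# Filtering the NULLs out leaves zero rows, so EXISTS becomes false.
non_null_only = exists(
    "SELECT task_assignation_id FROM task "
    "WHERE task_id = 1 AND task_assignation_id IS NOT NULL"
)

print(row_with_null, select_null, non_null_only)  # True True False
```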
If you only want to check for existence, there is no need to select a column; you can use SELECT 1:
IF (EXISTS (
SELECT TOP 1 1
FROM [Task] AS [T]
INNER JOIN #TaskIdTableType AS [TT] ON [T].[TaskId] = [TT].[Id]
))
begin
----code---
end

Oracle SQL XOR condition with > 14 tables

I have a question on SQL design.
Context:
I have a table called t_master and 13 other tables (let's call them a, b, c... for simplicity) that it needs to be compared against.
Logic:
t_master is compared to table 'a' on t_master.gen_val = a.value.
If the record exists only in t_master, retrieve the t_master record; if it exists only in 'a', retrieve the 'a' record.
I do not need the records that exist in both tables (t_master and a), i.e. an XOR condition.
Repeat this comparison with the remaining 12 tables.
I have an idea for doing this: use WITH to subquery the non-master tables (a, b, c...) first, each with its own WHERE clause.
Then use an XOR condition to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Since I have 13 tables to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause, and I'm not sure whether that would hurt performance, as t_master is a sort of log table.
**Assume I can't change any schema.
I'm not sure yet whether this SQL works correctly, so I'm hoping someone can point me in the right direction.
Update, following used_by_already's suggestion:
This is what I'm trying to do (comparing 2 tables first, before I add more), but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR because it is in the NOT EXISTS subquery.
How do I overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.
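A scaled-down sketch of the pattern above with two comparison tables instead of 13, run on SQLite through Python's sqlite3 as a stand-in for Oracle. The column names gen_val and value come from the question; the sample rows and the extra source column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_master (gen_val TEXT);
    CREATE TABLE a (value TEXT);
    CREATE TABLE b (value TEXT);
    INSERT INTO t_master VALUES ('m-only'), ('in-a'), ('in-b');
    INSERT INTO a VALUES ('in-a'), ('a-only');
    INSERT INTO b VALUES ('in-b'), ('b-only');
""")

# Rows found on exactly one side: master rows unmatched in every other table,
# plus rows of each other table that are unmatched in master.
one_sided = conn.execute("""
    SELECT 't_master' AS source, m.gen_val AS val
    FROM t_master m
    WHERE NOT EXISTS (SELECT NULL FROM a WHERE a.value = m.gen_val)
      AND NOT EXISTS (SELECT NULL FROM b WHERE b.value = m.gen_val)
    UNION ALL
    SELECT 'a', a.value FROM a
    WHERE NOT EXISTS (SELECT NULL FROM t_master m WHERE m.gen_val = a.value)
    UNION ALL
    SELECT 'b', b.value FROM b
    WHERE NOT EXISTS (SELECT NULL FROM t_master m WHERE m.gen_val = b.value)
    ORDER BY source, val
""").fetchall()

print(one_sided)
# [('a', 'a-only'), ('b', 'b-only'), ('t_master', 'm-only')]
```

The literal source column is only there to show which table each surviving row came from; it is not part of the original answer.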

sql function case returns more than one row

I am going to use this query as a subquery; the problem is that it returns many duplicate rows. I tried COUNT() instead of EXISTS, but it still returns multiple rows.
Every table can contain only one record per superRef.
I'll use the query below as SELECT col_a, [the CASE] FROM MyTable:
SELECT CASE
WHEN
EXISTS (SELECT 1 FROM A WHERE
A_superRef = myTable.sysno AND A_specAttr = 'value')
THEN 3
WHEN EXISTS (SELECT 1 FROM B
INNER JOIN C ON C_ReferenceForB = B_sysNo WHERE C_superRef = myTable.sysno AND b_type = 2)
THEN 2
ELSE (SELECT C_intType FROM C
WHERE C_superRef = myTable.sysno)
END
FROM A, B, C
result:
3
3
3
3
3
3...
What if you did this? I'm guessing you are getting an implicit cross join (A x B x C), and the CASE expression is then evaluated once for each row of that result set.
SELECT CASE
WHEN
EXISTS (SELECT 1 FROM A WHERE
A_superRef = 1000001838012)
THEN 3
WHEN EXISTS (SELECT 1 FROM B
INNER JOIN C ON C_ReferenceForB = B_sysNo AND C_superRef = 1000001838012 )
THEN 2
ELSE (SELECT C_type FROM C
WHERE C_superRef = 1000001838012)
END
FROM ( SELECT COUNT(*) AS n FROM A ) one_row -- A hack to get exactly one row; should work in ANSI SQL.
-- Your mileage may vary with different RDBMS flavors.
DUAL is what I needed, thanks to Thorsten Kettner
SELECT CASE
WHEN
EXISTS (SELECT 1 FROM A WHERE
A_superRef = 1000001838012)
THEN 3
WHEN EXISTS (SELECT 1 FROM B
INNER JOIN C ON C_ReferenceForB = B_sysNo AND C_superRef = 1000001838012 )
THEN 2
ELSE (SELECT C_type FROM C
WHERE C_superRef = 1000001838012)
END
FROM DUAL
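To see where the duplicates came from, the sketch below evaluates a simplified CASE expression once over the cross join and once over a single row. The table contents are made up, and SQLite (via Python's sqlite3) stands in for the real database; SQLite allows a SELECT with no FROM clause, which plays the role of FROM DUAL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows: 2 x 3 x 2 = 12 combinations in the cross join.
    CREATE TABLE A (A_superRef INTEGER);
    CREATE TABLE B (B_sysNo INTEGER);
    CREATE TABLE C (C_superRef INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (10), (20), (30);
    INSERT INTO C VALUES (100), (200);
""")

# Simplified CASE: the subquery does not depend on the outer FROM clause.
case_sql = "CASE WHEN EXISTS (SELECT 1 FROM A WHERE A_superRef = 1) THEN 3 ELSE 0 END"

# FROM A, B, C is a cross join, so the CASE is evaluated once per combination.
cross_join_rows = conn.execute(f"SELECT {case_sql} FROM A, B, C").fetchall()

# With a one-row source the CASE is evaluated exactly once.
single_row = conn.execute(f"SELECT {case_sql}").fetchall()

print(len(cross_join_rows), set(cross_join_rows))  # 12 {(3,)}
print(single_row)                                  # [(3,)]
```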