Eliminating NOT IN from query - sql

I'm trying to eliminate the need to use NOT IN in my query:
select count(*)
FROM TABLE1 T1
LEFT OUTER JOIN TABLE2 T2
ON T1.DATAID = T2.EXISTING_DOCUMENT
AND T1.ownerid = -2000
AND T1.SUBTYPE = 144
AND T1.dataid NOT IN (SELECT T3.dataid
FROM TABLE3 T3
WHERE T3.ID = 123)
Reason: I read that NOT IN is slow on large tables (500k+ rows) and doesn't use indices.
I tried:
select count(*)
FROM TABLE1 T1
LEFT OUTER JOIN TABLE2 T2
ON T1.DATAID = T2.EXISTING_DOCUMENT
AND T1.ownerid = -2000
AND T1.SUBTYPE = 144
left outer join TABLE3 T3
on T3.ancestorid = T1.dataid
where T3.ID = 123

NOT IN does use indices, at least in a competent database such as Oracle. However, you can write this using joins if you prefer.
But, why doesn't this do what you want?
select count(*)
FROM TABLE1 T1
WHERE T1.ownerid = -2000 AND T1.SUBTYPE = 144;
You are using a LEFT JOIN, so the only difference is that your version counts duplicates in TABLE2. But that may not really apply.
Your query doesn't really make sense, because you are comparing T1.dataid to T1.dataid. But, further, the comparison to Table3 has no impact on the result. So, you can just remove it:
select count(*)
FROM TABLE1 T1 LEFT OUTER JOIN
TABLE2 T2
ON T1.DATAID = T2.EXISTING_DOCUMENT AND
T1.ownerid = -2000 AND
T1.SUBTYPE = 144 ;
Because of the LEFT JOIN, filtering in the ON clause will not remove any rows. And because it is NOT IN, there is no possibility of duplication.
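The point about ON-clause filters can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the rows are hypothetical; the table and column names follow the question. With the filters in ON, every TABLE1 row is still counted, while moving them to WHERE actually restricts the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (dataid INTEGER, ownerid INTEGER, subtype INTEGER);
CREATE TABLE table2 (existing_document INTEGER);
-- Hypothetical rows: only dataid 1 satisfies the ownerid/subtype filters.
INSERT INTO table1 VALUES (1, -2000, 144), (2, -2000, 1), (3, 7, 144);
INSERT INTO table2 VALUES (1), (1), (2);
""")

# Filters in the ON clause: every table1 row survives, matched or not.
on_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1 t1
    LEFT OUTER JOIN table2 t2
      ON t1.dataid = t2.existing_document
     AND t1.ownerid = -2000
     AND t1.subtype = 144
""").fetchone()[0]

# Same filters moved to WHERE: only rows that pass them are counted.
where_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1 t1
    LEFT OUTER JOIN table2 t2
      ON t1.dataid = t2.existing_document
    WHERE t1.ownerid = -2000
      AND t1.subtype = 144
""").fetchone()[0]

print(on_count, where_count)
```

Here the ON version counts all three TABLE1 rows plus one duplicate from TABLE2, while the WHERE version counts only the rows for dataid 1.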

Use a WHERE x IS NULL filter to emulate a NOT IN.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE t1 ( ownerid int, subtype int, dataid int, note varchar(100) ) ;
INSERT INTO t1 ( ownerid, subtype, dataid, note )
SELECT 1 as ownerid, 1 as subtype, 1 as dataid, 'WHERE Filter' as note FROM DUAL UNION ALL
SELECT -2000, 1,1, 'IN WHERE Filter' FROM DUAL UNION ALL
SELECT -2000,144,1, 'IN WHERE, NOT IN t3' FROM DUAL UNION ALL
SELECT -2000,144,2, 'IN WHERE, IN t3' FROM DUAL UNION ALL
SELECT -2000,144,3, 'IN WHERE, NOT IN t3' FROM DUAL
;
CREATE TABLE t2 ( existing_document int, note varchar(100) ) ;
INSERT INTO t2 (existing_document, note)
SELECT 1 as existing_document, 'JOIN t1' as note FROM DUAL UNION ALL
SELECT 2, 'JOIN t1' FROM DUAL UNION ALL
SELECT 2, 'JOIN t1, DUPE' FROM DUAL UNION ALL
SELECT 3, 'JOIN t1' FROM DUAL UNION ALL
SELECT 3, 'JOIN t1, DUPE' FROM DUAL UNION ALL
SELECT 4, 'NOT JOIN t1' FROM DUAL
;
CREATE TABLE t3 ( id int, dataid int, note varchar(100) ) ;
INSERT INTO t3 (id, dataid, note)
SELECT 1 as id, 1 as dataid, 'No filter. No match.' as note FROM DUAL UNION ALL
SELECT 1, 4, 'No filter. No match t1.' FROM DUAL UNION ALL
SELECT 123,2,'Match JOIN filter. Match t1' FROM DUAL
;
Read the notes in the setup to view how I'm building up the data. It's very simple and not a lot to count, but it should give you an idea on how this data works together.
Query:
SELECT * /* Not counting here so you can see what's supposed to be counted. */
FROM t1
INNER JOIN t2 ON t1.dataid = t2.EXISTING_DOCUMENT
LEFT OUTER JOIN t3 ON t1.dataid = t3.dataid
AND t3.ID = 123
WHERE t1.ownerid = -2000
AND t1.subtype = 144
AND t3.dataid IS NULL /* This is the NOT IN */
Results:
| OWNERID | SUBTYPE | DATAID | NOTE | EXISTING_DOCUMENT | NOTE | ID | DATAID | NOTE |
|---------|---------|--------|---------------------|-------------------|---------------|--------|--------|--------|
| -2000 | 144 | 1 | IN WHERE, NOT IN t3 | 1 | JOIN t1 | (null) | (null) | (null) |
| -2000 | 144 | 3 | IN WHERE, NOT IN t3 | 3 | JOIN t1 | (null) | (null) | (null) |
| -2000 | 144 | 3 | IN WHERE, NOT IN t3 | 3 | JOIN t1, DUPE | (null) | (null) | (null) |
The optimizer usually runs very well with the WHERE x IS NULL syntax and indexes should still apply, but if Oracle is able to make use of the indexes in the NOT IN, that is a big plus. If you're dealing with a lot of data, the IS NULL method can be a lot faster. The best check is to just test it with your actual data.

How about NOT EXISTS?
select count(*)
FROM TABLE1 T1
LEFT OUTER JOIN TABLE2 T2
ON T1.DATAID = T2.EXISTING_DOCUMENT
AND T1.ownerid = -2000
AND T1.SUBTYPE = 144
AND NOT EXISTS (SELECT 1
FROM TABLE3 T3
WHERE T3.ID = 123
AND T3.dataid = T1.dataid)
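As a rough comparison of the three forms discussed here (NOT IN, LEFT JOIN ... IS NULL, NOT EXISTS), the sketch below runs them side by side. It uses Python's sqlite3 module instead of Oracle, the rows are hypothetical, and only the columns needed for the anti-join are modelled. It also shows one standard SQL behaviour worth knowing when swapping forms: a NULL in the subquery makes NOT IN return no rows, while the other two forms are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (dataid INTEGER);
CREATE TABLE table3 (id INTEGER, dataid INTEGER);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table3 VALUES (123, 2), (1, 1), (1, 4);
""")

QUERIES = {
    "not_in": """
        SELECT t1.dataid FROM table1 t1
        WHERE t1.dataid NOT IN (SELECT t3.dataid FROM table3 t3 WHERE t3.id = 123)
        ORDER BY t1.dataid""",
    "left_join_is_null": """
        SELECT t1.dataid FROM table1 t1
        LEFT OUTER JOIN table3 t3 ON t3.dataid = t1.dataid AND t3.id = 123
        WHERE t3.dataid IS NULL
        ORDER BY t1.dataid""",
    "not_exists": """
        SELECT t1.dataid FROM table1 t1
        WHERE NOT EXISTS (SELECT 1 FROM table3 t3
                          WHERE t3.id = 123 AND t3.dataid = t1.dataid)
        ORDER BY t1.dataid""",
}

def run_all():
    """Run each query and return {name: list of dataid values}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}

before = run_all()

# Add a NULL dataid to the rows the subquery sees, then rerun.
conn.execute("INSERT INTO table3 VALUES (123, NULL)")
after = run_all()

print(before)
print(after)
```

If TABLE3.dataid is nullable, the three forms are therefore not interchangeable without checking the data first.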

Related

How to find the same elements in two arrays of two different tables in HIVE?

I have two tables like the following:
table1:
id | sid
1 | ['101', '102', '103']
2 | ['102', '101', '103']
3 | ['103', '101', '102']
table2:
id | sid
1 | ['101', '102', '103']
3 | ['102', '103']
and I wish to get the following table:
id | sid
1 | ['101', '102', '103']
2 | ['102', '101', '103']
3 | ['103', '102']
Explanation: I want to keep the elements that appear in both table1.sid and table2.sid, in the order they have in table1. If an id in table1 doesn't exist in table2, the sid should be kept as it is in table1. What should I do?
You can use posexplode() to basically do what you want. Of course, all this array stuff is more complex in Hive than in other databases, particularly getting the results in the order you want:
select t.id, collect_list(t.sid2) as sid
from (select t1.id, t2.sid2, t1.pos1
from (select id, pos1, sid1
from table1 lateral view
posexplode(sid) e1 as pos1, sid1
) t1 left join
(select id, sid2
from table2 lateral view
posexplode(sid) e2 as pos2, sid2
) t2
on t2.id = t1.id and t2.sid2 = t1.sid1
distribute by t1.id
-- sort by (rather than order by) is what Hive pairs with distribute by
sort by t1.id, t1.pos1
) t
group by t.id
Something like this:
with t as (
select x.id, collect_list(x.sid2) as sid
from (select t1.id, t2.sid2, t1.pos1
from (select id, pos1, sid1
from table1 lateral view
posexplode(sid) e1 as pos1, sid1
) t1 left join
(select id, sid2
from table2 lateral view
posexplode(sid) e2 as pos2, sid2
) t2
on t2.id = t1.id and t2.sid2 = t1.sid1
distribute by t1.id
sort by t1.id, t1.pos1
) x
group by x.id
)
-- fall back to the original array when nothing matched
select t1.id,
(case when size(t.sid) = 0 then t1.sid else t.sid end) as sid
from table1 t1 left join
t
on t1.id = t.id
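To make the intended result concrete, here is a minimal Python sketch of the semantics the question asks for. It is not Hive code: the dictionaries stand in for the two tables, and the helper name common_in_order is made up for illustration.

```python
# The two tables from the question, keyed by id.
table1 = {
    1: ['101', '102', '103'],
    2: ['102', '101', '103'],
    3: ['103', '101', '102'],
}
table2 = {
    1: ['101', '102', '103'],
    3: ['102', '103'],
}

def common_in_order(sid1, sid2):
    """Keep the elements of sid1 that also occur in sid2, in sid1's order.

    If sid2 is None (the id is missing from table2), sid1 is kept as is.
    """
    if sid2 is None:
        return list(sid1)
    wanted = set(sid2)
    return [s for s in sid1 if s in wanted]

result = {id_: common_in_order(sid, table2.get(id_))
          for id_, sid in table1.items()}

for id_, sid in result.items():
    print(id_, sid)
```

One difference from the query above: the query falls back to table1's array whenever the collected array is empty, which also covers an id that exists in table2 but shares no elements, whereas this sketch returns an empty list in that case.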

SQL Query - Without using nested subqueries

Table1
ID SystemID Description
---------------------------
1 25 Test1
1 25 Test2
2 40 Test1
2 40 Test3
3 26 Test9
3 36 Test5
4 70 Test2
4 70 Test9
Table2
ID Department
------------------
1 Sales
2 Marketing
3 Accounting
4 Purchasing
I have these 2 tables, Table1 and Table2.
I need to select all the distinct ids from Table1 that have the same description as ID = 1 and SystemID = 25, and then select all the rows from Table2 from the query result.
Is there a better way to query for this, without using nested subqueries?
select *
from Table2
where ID in (select distinct(ID)
from Table1
where SystemID = 25
and Description in (select Description
from Table1
where ID = 1 and SystemID = 25))
Final output is
1 Sales
2 Marketing
4 Purchasing
Any help is appreciated. Thank you.
I think you want:
select t1.id, t2.department
from table1 t1 join
table2 t2
on t1.id = t2.id
where t1.description in (select tt1.description from table1 tt1 where tt1.id = 1 and tt1.systemid = 25);
This is standard SQL and should work in both SQL Server and Oracle.
You can also use a modification of an outer join to detect presence of a value.
SELECT DISTINCT t2.ID, t2.DEPARTMENT
FROM
table2 AS t2
INNER JOIN table1 AS t1a ON t2.ID = t1a.ID
LEFT OUTER JOIN table1 AS t1b ON t1b.id = 1 AND t1b.systemID = 25 AND t1b.description = t1a.description
WHERE t1b.ID IS NOT NULL
AND t1a.systemID = 25
This will filter out all entries that don't have a description matching an entry with id 1 and systemID 25.
I believe this should give you the same result. Instead of using an IN I used an EXISTS, and instead of a further subquery you can then use a JOIN:
SELECT *
FROM Table2 T2
WHERE EXISTS (SELECT 1
FROM Table1 T1
JOIN Table1 T1t ON T1.[Description] = T1t.[Description]
WHERE T1.ID = T2.ID
AND T1t.ID = 1 AND T1t.SystemID = 25);
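The EXISTS version can be tried against the sample data from the question. This is a sketch using Python's sqlite3 module rather than SQL Server or Oracle, so the bracketed identifiers are written as plain names; the tables and rows are copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER, SystemID INTEGER, Description TEXT);
CREATE TABLE Table2 (ID INTEGER, Department TEXT);
INSERT INTO Table1 VALUES
    (1, 25, 'Test1'), (1, 25, 'Test2'),
    (2, 40, 'Test1'), (2, 40, 'Test3'),
    (3, 26, 'Test9'), (3, 36, 'Test5'),
    (4, 70, 'Test2'), (4, 70, 'Test9');
INSERT INTO Table2 VALUES
    (1, 'Sales'), (2, 'Marketing'), (3, 'Accounting'), (4, 'Purchasing');
""")

# A Table2 row qualifies if any of its Table1 rows shares a description
# with the reference rows (ID = 1, SystemID = 25).
rows = conn.execute("""
    SELECT T2.ID, T2.Department
    FROM Table2 T2
    WHERE EXISTS (SELECT 1
                  FROM Table1 T1
                  JOIN Table1 T1t ON T1.Description = T1t.Description
                  WHERE T1.ID = T2.ID
                    AND T1t.ID = 1 AND T1t.SystemID = 25)
    ORDER BY T2.ID
""").fetchall()

for row in rows:
    print(row)
```

Because EXISTS only tests for presence, each Table2 row appears at most once, so no DISTINCT is needed.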
SELECT DISTINCT T2.* --Use a distinct for simplicity but a group by is better
FROM Table2 AS T2
INNER JOIN Table1 AS T1_Source ON T1_Source.SystemID = 25 AND T1_Source.ID = 1
/*^ Find table1 with System and ID
Expected Result
ID SystemID Description
1 25 Test1
1 25 Test2
Note Rows are duplicated use distinct or group by
*/
INNER JOIN Table1 AS T1_Target ON T1_Target.Description = T1_Source.Description
/*^ Find table1 with all the Description matching the result we found
Expected Result
ID SystemID Description
1 25 Test1
1 25 Test2
2 40 Test1
4 70 Test2
Note Rows are duplicated use distinct or group by
*/
WHERE T2.ID = T1_Target.ID -- tie each Table2 row to a matching Table1 ID

Get groups that are exactly equal to a table

I have a query that groups easily. I need to get the groups that have exactly the same records as another table (relationship).
I'm using ANSI-SQL under SQL Server, but I accept an answer of any implementation.
For example:
Table1:
Id | Value
---+------
1 | 1
1 | 2
1 | 3
2 | 4
3 | 2
4 | 3
Table2:
Value | ...
------+------
1 | ...
2 | ...
3 | ...
In my example, the result is:
Id |
---+
1 |
This is how I imagine the code might look:
SELECT Table1.Id
FROM Table1
GROUP BY Table1.Id
HAVING ...? -- The group that has exactly the same elements of Table2
Thanks in advance!
You can try the following:
select t1.Id
from Table2 t2
join Table1 t1 on t1.value = t2.value
group by t1.Id
having count(distinct t1.value) = (select count(*) from Table2)
SQLFiddle
To get the same sets use an inner join:
SELECT Table1.Id
FROM Table1
INNER JOIN Table2 ON Table1.Value = Table2.Value
GROUP BY Table1.Id
HAVING ...? --
CREATE TABLE #T1 (ID INT , [Values] INT)
INSERT INTO #T1 VALUES (1,1),(1,2),(1,3),(2,4),(2,5),(3,6)
CREATE TABLE #T2 ([Values] INT)
INSERT INTO #T2 VALUES (1),(2),(3),(4)
SELECT * FROM #T1
SELECT * FROM #T2
SELECT A.ID
FROM
( SELECT ID , COUNT(DISTINCT [Values]) AS Count FROM #T1
GROUP BY ID
) A
JOIN
(
SELECT T1.ID, COUNT(DISTINCT T2.[Values]) Count
FROM #T1 T1
JOIN #t2 T2
ON T1.[Values] = T2.[Values]
GROUP BY T1.ID
) B
ON A.ID = B.ID AND A.Count = B.Count
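The difference between "contains every value of Table2" and "exactly equal to Table2" can be seen in a small sketch. It uses Python's sqlite3 module instead of SQL Server. Ids 1 to 4 come from the question; ids 5 and 6 are hypothetical rows added to expose the difference. The stricter query is one possible formulation, not taken from the answers, and it assumes the values in Table2 are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER, Value INTEGER);
CREATE TABLE Table2 (Value INTEGER);
INSERT INTO Table1 VALUES (1, 1), (1, 2), (1, 3), (2, 4), (3, 2), (4, 3);
-- Hypothetical extra groups: 5 is a superset of Table2, 6 has a stray value.
INSERT INTO Table1 VALUES (5, 1), (5, 2), (5, 3), (5, 4);
INSERT INTO Table1 VALUES (6, 1), (6, 2), (6, 9);
INSERT INTO Table2 VALUES (1), (2), (3);
""")

# Inner-join version: the group must contain every Table2 value,
# but extra values in the group go unnoticed.
contains_all = [r[0] for r in conn.execute("""
    SELECT t1.Id
    FROM Table2 t2
    JOIN Table1 t1 ON t1.Value = t2.Value
    GROUP BY t1.Id
    HAVING COUNT(DISTINCT t1.Value) = (SELECT COUNT(*) FROM Table2)
    ORDER BY t1.Id
""")]

# Stricter version: same number of distinct values as Table2,
# and every row of the group finds a partner in Table2.
exactly_equal = [r[0] for r in conn.execute("""
    SELECT t1.Id
    FROM Table1 t1
    LEFT JOIN Table2 t2 ON t2.Value = t1.Value
    GROUP BY t1.Id
    HAVING COUNT(DISTINCT t1.Value) = (SELECT COUNT(*) FROM Table2)
       AND COUNT(t2.Value) = COUNT(*)
    ORDER BY t1.Id
""")]

print(contains_all, exactly_equal)
```

On the question's own data both queries return only Id 1; they diverge once a group carries values that Table2 does not have.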

Multiple join not joining as expected

I have the following tables
table 2 with a fk to table 1
table 3 with a fk to table 1
In table 2 I have two rows linked to table 1
In table 3 I have one row linked to table 1
I am trying to produce a table that has
| table1 pk | table 2 pk | null |
| table1 pk | table 2 pk | null |
| table1 pk | null | table 3 pk |
However when I try the following I get
select tab1.id, tab2.id, tab3.id
from table1 tab1
left join table2 tab2 on tab1.id = tab2.tab1_id
left join table3 tab3 on tab1.id = tab3.tab1_id
gives this table
| table1 pk | table 2 pk | table 3 pk |
| table1 pk | table 2 pk | table 3 pk |
Can anyone help with this SQL please?
Thanks in advance
EDIT
I think I may have simplified this a bit too much. Ideally output would be
| table1 pk | table 2 pk |
| table1 pk | table 3 pk |
| table1 pk | table 3 pk |
Once I get this join working it will be added to another massive query...
One unusual solution is
select coalesce(t2.tab1_id, t3.tab1_id) pk,
t2.pk, t3.pk
from table2 t2
full join table3 t3
on t3.tab1_id = t2.tab1_id
where exists (select * from table1
where pk in (t2.tab1_id, t3.tab1_id))
NOTE (Edit) As noted by Andriy M in comment, the where clause only eliminates rows from table2 and table3 where the FK does not exist in table1, which cannot exist if FK constraints have been properly applied to table2 and table3.
I don't see anything crazy here
WITH TAB1
AS (SELECT 1 AS ID FROM DUAL
UNION ALL
SELECT 2 AS ID FROM DUAL
UNION ALL
SELECT 3 AS ID FROM DUAL),
TAB2
AS (SELECT 1 AS TAB1_ID, 'A' AS ID FROM DUAL
UNION ALL
SELECT 2 AS TAB1_ID, 'B' AS ID
FROM DUAL),
TAB3
AS (SELECT 3 AS TAB1_ID, 'C' AS ID
FROM DUAL)
SELECT
TAB1.ID,
COALESCE ( TAB2.ID,
TAB3.ID )
FROM
TAB1
LEFT JOIN TAB2
ON TAB1.ID = TAB2.TAB1_ID
LEFT JOIN TAB3
ON TAB1.ID = TAB3.TAB1_ID;
1 A
2 B
3 C
You could use a join to the result of a union:
SELECT
t1.table1pk,
t23.table2pk,
t23.table3pk
FROM table1 t1
INNER JOIN
(
SELECT table2pk, NULL AS table3pk, table1fk
FROM table2
UNION ALL
SELECT NULL AS table2pk, table3pk, table1fk
FROM table3
) t23
ON t1.table1pk = t23.table1fk
;
Or you could use a union of two joins' results:
SELECT
t1.table1pk,
t2.table2pk,
NULL AS table3pk
FROM table1 t1
INNER JOIN table2 t2
ON t1.table1pk = t2.table1fk
UNION ALL
SELECT
t1.table1pk,
NULL AS table2pk,
t3.table3pk
FROM table1 t1
INNER JOIN table3 t3
ON t1.table1pk = t3.table1fk
;
Both methods could be adapted to produce the two-column version of the desired output:
a join to a union:
SELECT
t1.table1pk,
t23.otherpk
FROM table1 t1
INNER JOIN
(
SELECT table2pk AS otherpk, table1fk
FROM table2
UNION ALL
SELECT table3pk AS otherpk, table1fk
FROM table3
) t23
ON t1.table1pk = t23.table1fk
;
a union of joins:
SELECT
t1.table1pk,
t2.table2pk AS otherpk
FROM table1 t1
INNER JOIN table2 t2
ON t1.table1pk = t2.table1fk
UNION ALL
SELECT
t1.table1pk,
t3.table3pk AS otherpk
FROM table1 t1
INNER JOIN table3 t3
ON t1.table1pk = t3.table1fk
;
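To see why the two chained LEFT JOINs merge the rows while a UNION ALL keeps them apart, the sketch below runs both against a tiny data set. It uses Python's sqlite3 module; the column names pk and tab1_id and the key values are assumptions chosen to match the question's description (two table2 rows and one table3 row linked to the same table1 row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pk INTEGER);
CREATE TABLE table2 (pk INTEGER, tab1_id INTEGER);
CREATE TABLE table3 (pk INTEGER, tab1_id INTEGER);
INSERT INTO table1 VALUES (1);
INSERT INTO table2 VALUES (10, 1), (11, 1);
INSERT INTO table3 VALUES (20, 1);
""")

# Two left joins: each table2 row is paired with the table3 row.
chained = conn.execute("""
    SELECT t1.pk, t2.pk, t3.pk
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.tab1_id = t1.pk
    LEFT JOIN table3 t3 ON t3.tab1_id = t1.pk
    ORDER BY t2.pk
""").fetchall()

# Union of two joins: one output row per child row, NULL in the other column.
unioned = conn.execute("""
    SELECT t1.pk AS table1pk, t2.pk AS table2pk, NULL AS table3pk
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.tab1_id = t1.pk
    UNION ALL
    SELECT t1.pk, NULL, t3.pk
    FROM table1 t1
    INNER JOIN table3 t3 ON t3.tab1_id = t1.pk
    ORDER BY 3, 2
""").fetchall()

print(chained)
print(unioned)
```

The chained joins produce two rows with all three keys filled in, and the union produces the three-row shape the question asked for.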

Within a SQL Server view - how to combine multiple column results into one column

I have a SQL Server database with the following 2 tables:
I have created a view with the following query and results:
My question is: what query would combine the three ID columns in 'Table2' into one master ID list, so that the final result looks like this:
ID Table1ID
test1 1
test1 4
test2 1
test2 2
test3 1
test3 2
test3 3
Note: here is the view as shown above:
SELECT
dbo.Table1.Description, Table2_1.ID AS Table2ID_1, Table2_2.ID AS Table2ID_2,
dbo.Table2.ID AS Table2ID_3
FROM
dbo.Table1
LEFT OUTER JOIN
dbo.Table2 ON dbo.Table1.ID = dbo.Table2.Table1ID3
LEFT OUTER JOIN
dbo.Table2 AS Table2_2 ON dbo.Table1.ID = Table2_2.Table1ID2
LEFT OUTER JOIN
dbo.Table2 AS Table2_1 ON dbo.Table1.ID = Table2_1.Table1ID1
My suggestion would be to UNPIVOT the data in Table2 so you can easily join on the data, then you can return the table1 description and the table2 id. The UNPIVOT portion of this query using CROSS APPLY:
select col, value, t2.Id
from table2 t2
cross apply
(
select 'table1id1', table1id1 union all
select 'table1id2', table1id2 union all
select 'table1id3', table1id3
) c (col, value);
See SQL Fiddle with Demo. This gives a result:
| COL | VALUE | ID |
|-----------|--------|----|
| table1id1 | 1 | 1 |
| table1id2 | 2 | 1 |
| table1id3 | 3 | 1 |
| table1id1 | 2 | 2 |
| table1id2 | 3 | 2 |
| table1id3 | (null) | 2 |
| table1id1 | 3 | 3 |
Now that you have the data in rows, you can easily join on the value column to return the id:
select t1.description,
d.id
from table1 t1
inner join
(
select col, value, t2.Id
from table2 t2
cross apply
(
select 'table1id1', table1id1 union all
select 'table1id2', table1id2 union all
select 'table1id3', table1id3
) c (col, value)
) d
on t1.id = d.value
order by t1.description, d.id;
See SQL Fiddle with Demo
If you really want to use UNPIVOT, then you can use the following which doesn't join on each table multiple times to get the result:
select t1.description, t2.id
from table1 t1
inner join
(
select id, col, value
from
(
select id, [Table1ID1], [Table1ID2], [Table1ID3]
from table2
) d
unpivot
(
value for col in ([Table1ID1], [Table1ID2], [Table1ID3])
) unpiv
) t2
on t1.id = t2.value
order by t1.description, t2.id;
See SQL Fiddle with Demo.
The UNPIVOT and the CROSS APPLY are doing the same thing as a UNION ALL query:
select t1.description, t2.id
from table1 t1
inner join
(
select id, 'table1id1' col, table1id1 value
from table2
union all
select id, 'table1id2' col, table1id2
from table2
union all
select id, 'table1id3' col, table1id3
from table2
) t2
on t1.id = t2.value
order by t1.description, t2.id;
See SQL Fiddle with Demo
Microsoft SQL Server 2005 and higher support the UNPIVOT operator, making the CROSS APPLY unnecessary.
SELECT Description AS [ID], Table1ID
FROM (SELECT Table1.Description, Table2_1.ID AS Table2ID_1, Table2_2.ID AS Table2ID_2, Table2.ID AS Table2ID_3
FROM Table1 LEFT OUTER JOIN
Table2 ON Table1.ID = Table2.Table1ID3 LEFT OUTER JOIN
Table2 AS Table2_2 ON Table1.ID = Table2_2.Table1ID2 LEFT OUTER JOIN
Table2 AS Table2_1 ON Table1.ID = Table2_1.Table1ID1) AS pvttbl
UNPIVOT ( Table1ID FOR ID IN (Table2ID_1, Table2ID_2, Table2ID_3)) AS unpvttbl
ORDER BY Description, Table1ID
See Using PIVOT and UNPIVOT on MSDN.
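For readers without a SQL Server instance at hand, the UNION ALL form of the unpivot can be tried with Python's sqlite3 module, which has neither CROSS APPLY nor UNPIVOT. This is only a sketch: the rows below are hypothetical, since the original table contents were posted as images, and only the column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, description TEXT);
CREATE TABLE table2 (id INTEGER, table1id1 INTEGER,
                     table1id2 INTEGER, table1id3 INTEGER);
-- Hypothetical sample rows.
INSERT INTO table1 VALUES (1, 'test1'), (2, 'test2'), (3, 'test3');
INSERT INTO table2 VALUES (1, 1, 2, 3), (2, 2, 3, NULL);
""")

# Turn the three table1id columns into rows, then join on the value.
rows = conn.execute("""
    SELECT t1.description, u.id
    FROM table1 t1
    INNER JOIN (
        SELECT id, table1id1 AS value FROM table2
        UNION ALL
        SELECT id, table1id2 FROM table2
        UNION ALL
        SELECT id, table1id3 FROM table2
    ) u ON t1.id = u.value
    ORDER BY t1.description, u.id
""").fetchall()

for row in rows:
    print(row)
```

The NULL in the second table2 row simply drops out, because a NULL value never satisfies the join condition.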