Oracle Toad SQL queries leading to inconsistent id counts - sql

I'm using Oracle Toad with SQL commands in the editor window.
I created two new tables (PIDS1 and PIDS2) that contain only one column of ID numbers from two related tables.
I had expected that PIDS2 would contain a superset of the IDs in PIDS1. When I tried to identify the IDs in PIDS2 that are not in PIDS1, I ended up on a wild goose chase.
Let's say that it is a given that there is something unexpected going on with the data in my tables. But I cannot make any sense of the two simplified queries described below. The numbers are inconsistent. Can someone explain what is going on?
-- PIDS1 IS A SINGLE-COLUMN TABLE THAT CONTAINS 1638061 DISTINCT ID'S
-- PIDS2 IS A SINGLE-COLUMN TABLE THAT CONTAINS 3510272 DISTINCT ID'S
SELECT COUNT(T2.ID)
FROM PIDS2 T2
WHERE T2.ID NOT IN (
SELECT T1.ID
FROM PIDS1 T1);
-- RESULT IS ZERO!
-- WTF? PIDS2 HAS MORE ID'S THAN PIDS1!
SELECT COUNT(T1.ID)
FROM PIDS1 T1
WHERE T1.ID NOT IN (
SELECT T2.ID
FROM PIDS2 T2);
-- RESULT IS 786690
-- WHERE DID THAT NUMBER COME FROM? LOOKS ARBITRARY

Never use NOT IN with a subquery. If any of the values returned by the subquery is NULL, the NOT IN condition can never evaluate to true (a comparison with NULL is unknown, not true), so all rows are filtered out.
For this reason, I always advise NOT EXISTS:
SELECT T2.ID
FROM PIDS2 T2
WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE t1.ID = T2.ID);
Of course, you can also add WHERE T1.ID IS NOT NULL to the subquery in the NOT IN version. In my experience, you'll forget it at some inopportune time in the future. Just use NOT EXISTS.
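To see the effect on a small scale, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained; the NULL handling shown is standard SQL), and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PIDS1 (ID INTEGER);
    CREATE TABLE PIDS2 (ID INTEGER);
    INSERT INTO PIDS1 VALUES (1), (2), (NULL);  -- hypothetical data with one NULL
    INSERT INTO PIDS2 VALUES (1), (2), (3), (4);
""")

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

# NOT IN: the NULL in PIDS1 makes the condition unknown for every row.
not_in = count("""
    SELECT COUNT(T2.ID) FROM PIDS2 T2
    WHERE T2.ID NOT IN (SELECT T1.ID FROM PIDS1 T1)
""")

# NOT EXISTS: the NULL row simply never matches, so it does no harm.
not_exists = count("""
    SELECT COUNT(T2.ID) FROM PIDS2 T2
    WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE T1.ID = T2.ID)
""")

# NOT IN with the NULLs filtered out inside the subquery.
not_in_filtered = count("""
    SELECT COUNT(T2.ID) FROM PIDS2 T2
    WHERE T2.ID NOT IN (SELECT T1.ID FROM PIDS1 T1 WHERE T1.ID IS NOT NULL)
""")

print(not_in, not_exists, not_in_filtered)  # prints: 0 2 2
```

IDs 3 and 4 are the ones missing from PIDS1, so 2 is the expected answer; only the plain NOT IN version loses them.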

Related

TSQL - LEFT JOIN strange results

I am coming across an issue with the results of a seemingly basic query (on SQL Server 2017 CU17), and am hoping that people could suggest some things that I may not have checked or tried to get the correct results out.
The premise of the issue is that I am attempting to identify rows in one table whose ID does not exist in another. This can usually be done with a LEFT JOIN; in this case, the query is as simple as follows:
SELECT t1.id,
t2.id
FROM Table1 AS t1
LEFT JOIN Table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
This query should identify rows that exist in Table1 that do not exist in Table2, based on the 'id' columns, and is running against static data that isn't being manipulated in any way when I run the query.
I am getting a strange result from this: rows come back with a value for t1.id but NULL for t2.id, as if there are rows that exist in t1 but not in t2.
However, if I take one of the IDs returned from the first query, and manually check if it exists in both tables, it looks like the id does exist in both - even if I put that id into a query with an inner join such as follows:
SELECT t1.id,
t2.id
FROM Table1 AS t1
INNER JOIN Table2 t2 ON t2.id = t1.id
WHERE t1.id = 761179370
If I run the LEFT JOIN query a number of times, I get a different number of rows returned each time.
Important to note that the id columns are both int datatypes, and the tables have the exact same collation.
What I have tried:
I have rebuilt statistics on all columns and indexes for each table to see if that was causing some issues.
I have restored the database onto another server and run the same query there, and I do not get the problem described above. The DB is in an availability group and I have also run the same query on the readable secondary, and am not seeing the same behaviour.
The server I ran the LEFT JOIN query on is a busy server overall - could this be a factor in why the query is not returning the correct results?
I have tried with ANSI_NULLS both on and off, no difference.
Any idea what the problem may be, or what I could check to figure out why I am getting these results - any guidance would be appreciated!
If you post the real query, we may spot a mistake in it. Check the query again: an extra condition in the LEFT JOIN can change the "usual logic".
If still wrong, try this:
SELECT
t1.id,
t2.id
FROM Table1 AS t1
LEFT JOIN Table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
OPTION (RECOMPILE) -- compiles a fresh query plan for this run instead of reusing a cached one
Or create two temp tables to store some of the data and test there; a production database may change the results very quickly.
I had a similar problem a while ago. The reason was that SQL Server Management Studio had cached some of the results. The famous "off and on again" solved my problem. Your experiments pretty much say this wasn't the case, so...
Here I would suggest comparing the results to the following:
SELECT t1.id FROM Table1 AS t1
where t1.id not in (select id from table2)
If these results don't match those of your query, I would suggest a restart.
PS: Sorry for not posting this as a comment; I don't have enough reputation for that. :-\
Sometimes if you have that issue you can paste the value on Notepad++ to validate that there is no other char or something similar added to the second one.
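You can try the (NOLOCK) hint to check whether locking plays a part in this behavior, but be aware that it allows dirty reads and can itself cause rows to be missed or read twice. Something like this: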
SELECT
*
FROM TABLE1 T1 (NOLOCK)
LEFT JOIN TABLE2 T2 (NOLOCK) -- LEFT JOIN IS THE SAME AS LEFT OUTER JOIN
ON T1.ID = T2.ID

SQL Query - Select Value from T1 where second value fully met in T2

I can do this in an ugly stored procedure with temp tables and whatnot, but I know an experienced developer could do this SO much more elegantly than what I've come up with. In fact, I'd kind of rather not have to call the sproc at all, but just have one query that gives me what I need.
I'm working with two tables:
T1 BillingDirectivesNeeded
T2 BillingDirectives.
T1 Has two fields relevant to this task -
PKey
WBS1.
There will be many PKeys associated with each WBS1.
T2 has only one field of interest
PKey.
The task I'm trying to address is getting a list of WBS1s from T1 that have ALL of their needed directives in T2 before I enable their import.
We want to import a WBS1 ONLY when all of the PKeys for that WBS1 are found in T2. If not, I'll just leave them grayed out.
I've tried a dozen different ways to get this to happen over the last few hours, and I seem to have a mental block. The pseudo-code would look something like this:
select T1.WBS1 from BillingDirectiveNeeded T1
where [all the T1.PKeys for T1.WBS1 can be found in BillingDirectives T2]
You can try using a Where Exists clause:
Select T1.WBS1
From BillingDirectiveNeeded T1
Where Exists
(
Select 1
From BillingDirectives T2
Where T2.PKey = T1.PKey
)
select DISTINCT T1.WBS1 from BillingDirectiveNeeded T1 where T1.PKey in (SELECT T2.PKey FROM BillingDirectives T2)
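As I read the question, a WBS1 should be listed only when every one of its PKeys is found in T2, which is a slightly stricter test than "at least one PKey matches". One possible way to express that is to count matches per WBS1. The sketch below is an illustration under assumptions, not a definitive answer: it runs on Python's built-in sqlite3 module instead of the asker's database, the sample rows are invented, and the table name follows the spelling used in the pseudo-code.

```python
import sqlite3

# In-memory SQLite with made-up rows (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BillingDirectiveNeeded (PKey INTEGER, WBS1 TEXT);
    CREATE TABLE BillingDirectives (PKey INTEGER);
    -- 'A' needs PKeys 1 and 2; 'B' needs PKeys 3 and 4
    INSERT INTO BillingDirectiveNeeded VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B');
    -- PKey 4 is missing, so 'B' is incomplete
    INSERT INTO BillingDirectives VALUES (1), (2), (3);
""")

# Keep a WBS1 only if every needed row found a partner in T2:
# COUNT(*) counts all joined rows, COUNT(T2.PKey) skips unmatched (NULL) ones.
complete = [row[0] for row in conn.execute("""
    SELECT T1.WBS1
    FROM BillingDirectiveNeeded T1
    LEFT JOIN BillingDirectives T2 ON T2.PKey = T1.PKey
    GROUP BY T1.WBS1
    HAVING COUNT(*) = COUNT(T2.PKey)
    ORDER BY T1.WBS1
""")]

# For comparison: the "any PKey matches" form also returns 'B'.
any_match = [row[0] for row in conn.execute("""
    SELECT DISTINCT T1.WBS1
    FROM BillingDirectiveNeeded T1
    WHERE T1.PKey IN (SELECT T2.PKey FROM BillingDirectives T2)
    ORDER BY T1.WBS1
""")]

print(complete)   # prints: ['A']
print(any_match)  # prints: ['A', 'B']
```

The GROUP BY / HAVING form should carry over to SQL Server unchanged, since it uses only standard SQL.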

In SQL Server, how to filter lots of elements across multiple columns

I have a table, t1, with columns such as name, code1, code2,..., code20
There are, say, 100K rows.
I have another look up table, t2, which has one column, code; it has 10k rows and each row has a code. So, totally there are 10K codes in this 1-column table.
I need to filter out all the rows in t1 that have the codes in t2 from any column, i.e. columns code1 to code20. In other words, in each row in t1, once a column has one of the codes in t2, it should be captured.
Is there an easy way to do this? Thanks a lot!
Here is a way to do it using not exists:
select t1.*
from t1
where not exists (select 1
from t2
where t2.code = t1.code1 or
t2.code = t1.code2 or
. . .
t2.code = t1.code20
);
It is tempting to use in as the condition in the nested select, but this behaves in a funky way with NULLs. The sequence of direct comparisons is easier.
That said, having 20 columns with the same type of data is usually a sign of poor table design. More typically, the data would be in some sort of association/junction table, with the 20 columns each appearing in their own row.
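A cut-down sketch of the NOT EXISTS approach, with three code columns instead of twenty. It runs on Python's built-in sqlite3 module rather than SQL Server (an assumption made to keep it self-contained), and the rows are invented. The question can be read either way (keep the matching rows or drop them), so both directions are shown; swapping NOT EXISTS for EXISTS flips the result.

```python
import sqlite3

# In-memory SQLite with hypothetical rows; three code columns stand in for twenty.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT, code1 TEXT, code2 TEXT, code3 TEXT);
    CREATE TABLE t2 (code TEXT);
    INSERT INTO t1 VALUES
        ('a', 'X1', 'X2', NULL),
        ('b', 'Y1', 'Y2', 'Y3'),
        ('c', NULL, NULL, 'X9');
    INSERT INTO t2 VALUES ('X2'), ('X9');
""")

# Shared correlated subquery: does any t2 code appear in any column of this row?
subquery = """
    (SELECT 1 FROM t2
     WHERE t2.code = t1.code1 OR
           t2.code = t1.code2 OR
           t2.code = t1.code3)
"""

# Rows with no lookup code in any column (the NOT EXISTS form from the answer).
no_match = [r[0] for r in conn.execute(
    "SELECT t1.name FROM t1 WHERE NOT EXISTS " + subquery + " ORDER BY t1.name")]

# Rows with at least one lookup code in some column.
has_match = [r[0] for r in conn.execute(
    "SELECT t1.name FROM t1 WHERE EXISTS " + subquery + " ORDER BY t1.name")]

print(no_match)   # prints: ['b']
print(has_match)  # prints: ['a', 'c']
```

The NULL columns in rows 'a' and 'c' do not disturb either result, because each comparison is a plain equality that is simply not true for NULL.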
Sounds like you need to unpivot the data in table t1 and then join on t2.
So instead of t1 having name, code1, code2, ..., code20, you would unpivot t1 to just name and code columns and then join on t2.
Alternatively, you could perform a separate join of t1 on t2 for each of t1's columns code1 to code20 and union the results.
That's if I understand your problem correctly.
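The reshape-then-join idea above (usually called an unpivot) can be sketched with UNION ALL, which works on most engines. This is an illustration only: it uses Python's built-in sqlite3 module instead of SQL Server, three code columns instead of twenty, and invented rows.

```python
import sqlite3

# In-memory SQLite with hypothetical rows (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT, code1 TEXT, code2 TEXT, code3 TEXT);
    CREATE TABLE t2 (code TEXT);
    INSERT INTO t1 VALUES
        ('a', 'X1', 'X2', NULL),
        ('b', 'Y1', 'Y2', 'Y3'),
        ('c', NULL, NULL, 'X9');
    INSERT INTO t2 VALUES ('X2'), ('X9');
""")

# Turn the wide table into (name, code) pairs, one row per code column,
# then join the pairs to the lookup table.
matched = [r[0] for r in conn.execute("""
    SELECT DISTINCT u.name
    FROM (
        SELECT name, code1 AS code FROM t1
        UNION ALL
        SELECT name, code2 FROM t1
        UNION ALL
        SELECT name, code3 FROM t1
    ) u
    JOIN t2 ON t2.code = u.code
    ORDER BY u.name
""")]

print(matched)  # prints: ['a', 'c']
```

DISTINCT is there because a row with several matching codes would otherwise be listed once per match.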

SQL Statements Display More Rows Than Table Has

So I've just come across something interesting, and I don't know if it's been answered before, as I have no clue what it's called. Let's say that you have two tables; the first has one row and the second has two rows. If you run the following statement:
SELECT t1.*
FROM table1 t1, table2 t2
it returns two rows, and both have the same value, but the first table only has one row! Why does this occur? I didn't think having another table in the from clause changed anything if you didn't change the select clause accordingly.
You are selecting a cartesian product of the two tables.
It will return COUNT(t1) * COUNT(t2) records: all possible combinations of records from t1 with records from t2.
Using ANSI syntax, your query would read as:
SELECT t1.*
FROM table1 t1
CROSS JOIN
table2 t2
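A quick way to convince yourself, sketched with Python's built-in sqlite3 module (an assumption; the question does not name a database) and made-up rows: one row times two rows gives two result rows, and the comma form and CROSS JOIN behave the same.

```python
import sqlite3

# In-memory SQLite: table1 has one row, table2 has two (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (val TEXT);
    CREATE TABLE table2 (val TEXT);
    INSERT INTO table1 VALUES ('only row');
    INSERT INTO table2 VALUES ('x'), ('y');
""")

# Old-style comma syntax: every table1 row is paired with every table2 row.
comma_rows = conn.execute(
    "SELECT t1.* FROM table1 t1, table2 t2").fetchall()

# The same query written with an explicit CROSS JOIN.
cross_rows = conn.execute(
    "SELECT t1.* FROM table1 t1 CROSS JOIN table2 t2").fetchall()

print(comma_rows)  # prints: [('only row',), ('only row',)]
print(cross_rows)  # prints: [('only row',), ('only row',)]
```

The SELECT list only decides which columns are shown; the number of rows is fixed by the FROM clause.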

Select proper columns from JOIN statement

I have two tables: table1, table2. Table1 has 10 columns, table2 has 2 columns.
SELECT * FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
I want to select all columns from table1 and only 1 column from table2. Is it possible to do that without enumerating all columns from table1 ?
Yes, you can do the following:
SELECT t1.*, t2.my_col FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
Even though you can do the t1.*, t2.col1 thing, I would not recommend it in production code.
I would never ever use a SELECT * in production - why?
you're telling SQL Server to get all columns - do you really, really need all of them?
by not specifying the column names, SQL Server has to go figure that out itself - it has to consult the data dictionary to find out what columns are present which does cost a little bit of performance
most importantly: you don't know what you're getting back. Suddenly, the table changes, another column or two are added. If you have any code which relies on e.g. the sequence or the number of columns in the table without explicitly checking for that, your code can break
My recommendation for production code: always (no exceptions!) specify exactly those columns you really need - and even if you need all of them, spell it out explicitly. Fewer surprises, fewer bugs to hunt for, if anything ever changes in the underlying table.
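The "you don't know what you're getting back" point can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption), and the column names a, b and c are hypothetical. The same query text returns a different shape once the table gains a column.

```python
import sqlite3

# In-memory SQLite; columns a, b, c are hypothetical names for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, a TEXT, b TEXT);
    CREATE TABLE table2 (ID INTEGER, my_col TEXT);
    INSERT INTO table1 VALUES (1, 'a1', 'b1');
    INSERT INTO table2 VALUES (1, 'wanted');
""")

query = ("SELECT T1.*, T2.my_col "
         "FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID")

def column_names(sql):
    """Return the result column names of a query, in order."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    cur.fetchall()  # drain the cursor so no statement stays open
    return names

before = column_names(query)

# Someone adds a column to table1 later on.
conn.execute("ALTER TABLE table1 ADD COLUMN c TEXT")

after = column_names(query)

print(before)
print(after)
```

After the ALTER TABLE, the result has one more column and my_col has moved one position to the right, so any code that reads it by position now picks up the new column instead.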
Use table1.* in place of all columns of table1 ;)