TSQL - LEFT JOIN strange results

I am coming across an issue with the results of a seemingly basic query (on SQL Server 2017 CU17), and I am hoping people can suggest things I may not have checked or tried that would help me get correct results.
The premise is that I am trying to identify rows in one table whose ID does not exist in another. This can usually be done with a LEFT JOIN; in this case the query is as simple as:
SELECT t1.id,
t2.id
FROM Table1 AS t1
LEFT JOIN Table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
This query should return the rows in Table1 whose id does not exist in Table2. It runs against static data that is not being modified in any way while I run the query.
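For reference, here is a minimal sketch of the expected behaviour, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names match the question, but the sample ids are hypothetical. On static data, the LEFT JOIN ... IS NULL anti-join and the equivalent NOT EXISTS query should return the same ids on every run.

```python
import sqlite3

def anti_join_ids(conn):
    """Return the ids in Table1 with no matching id in Table2, computed two ways."""
    left_join = conn.execute(
        """
        SELECT t1.id
        FROM Table1 AS t1
        LEFT JOIN Table2 AS t2 ON t2.id = t1.id
        WHERE t2.id IS NULL
        """
    ).fetchall()
    not_exists = conn.execute(
        """
        SELECT t1.id
        FROM Table1 AS t1
        WHERE NOT EXISTS (SELECT 1 FROM Table2 AS t2 WHERE t2.id = t1.id)
        """
    ).fetchall()
    return sorted(r[0] for r in left_join), sorted(r[0] for r in not_exists)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER)")
conn.execute("CREATE TABLE Table2 (id INTEGER)")
# Hypothetical data: 761179370 exists in both tables, 2 and 3 only in Table1.
conn.executemany("INSERT INTO Table1 VALUES (?)", [(761179370,), (2,), (3,)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(761179370,), (4,)])

lj, ne = anti_join_ids(conn)
print(lj, ne)  # [2, 3] [2, 3]
```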
The strange result is that the LEFT JOIN query returns rows with a t1.id value and a NULL t2.id, as if those ids exist in Table1 but not in Table2.
However, if I take one of the ids returned by the first query and check both tables manually, the id does exist in both. It is even returned by an inner join such as:
SELECT t1.id,
t2.id
FROM Table1 AS t1
INNER JOIN Table2 t2 ON t2.id = t1.id
WHERE t1.id = 761179370
If I run the LEFT JOIN query a number of times, I get a different number of rows returned each time.
It is important to note that both id columns are of type int, and the tables have exactly the same collation.
What I have tried:
I have rebuilt statistics on all columns and indexes for each table to see if that was causing some issues.
I restored the database onto another server and ran the same query there, and the problem does not occur. The database is in an availability group; I also ran the same query on the readable secondary and did not see the same behaviour.
The server I ran the LEFT JOIN query on is busy overall. Could that be a factor in why the query is not returning correct results?
I have tried with ANSI_NULLS both ON and OFF, with no difference.
Any idea what the problem may be, or what I could check to figure out why I am getting these results - any guidance would be appreciated!

If you post the real query, we may be able to spot a mistake in it. Check the query again: an extra condition in the LEFT JOIN can change the "usual logic".
If it is still wrong, try this:
SELECT
t1.id,
t2.id
FROM Table1 AS t1
LEFT JOIN Table2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
OPTION (RECOMPILE) -- compiles a fresh plan instead of reusing a cached one
Or copy the data into two temp tables and test against those, because data in a production database can change the results very quickly.
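A minimal sketch of the temp-table idea, using Python's built-in sqlite3 module as a stand-in (in SQL Server you would use something like SELECT id INTO #t1 FROM Table1). The snapshot table names and the data are hypothetical; the point is that once the data has been copied, later changes to the live tables no longer affect the test query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER)")
conn.execute("CREATE TABLE Table2 (id INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(1,)])

# Copy the current contents into temp tables (hypothetical names).
conn.execute("CREATE TEMP TABLE t1_snapshot AS SELECT id FROM Table1")
conn.execute("CREATE TEMP TABLE t2_snapshot AS SELECT id FROM Table2")

# Simulate activity on the live table after the snapshot was taken.
conn.execute("INSERT INTO Table2 VALUES (2)")

def missing_ids(conn, left, right):
    # Table names are interpolated into the SQL, so only pass trusted identifiers.
    sql = (f"SELECT a.id FROM {left} AS a LEFT JOIN {right} AS b "
           f"ON b.id = a.id WHERE b.id IS NULL ORDER BY a.id")
    return [r[0] for r in conn.execute(sql)]

print(missing_ids(conn, "Table1", "Table2"))            # [3]
print(missing_ids(conn, "t1_snapshot", "t2_snapshot"))  # [2, 3]
```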

I had a similar problem a while ago. The reason was that SQL Server Management Studio had cached some of the results, and the famous "turn it off and on again" solved it. Your experiments suggest that isn't the case here, so...
Here I would suggest comparing the results with the following:
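SELECT t1.id FROM Table1 AS t1
WHERE t1.id NOT IN (SELECT id FROM Table2 WHERE id IS NOT NULL) -- without the NULL filter, NOT IN returns no rows if Table2.id contains a NULL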
If these results don't match your query's results, I would suggest a restart.

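When you hit this kind of issue, it can help to paste the values into Notepad++ to check that no extra characters (or something similar) have been added to one of them.
You can try (NOLOCK) to avoid this kind of behavior, something like the query below. Be aware, though, that NOLOCK reads uncommitted data and can itself skip rows or read them twice while other sessions are modifying the table.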
SELECT
*
FROM TABLE1 T1 WITH (NOLOCK)
LEFT JOIN TABLE2 T2 WITH (NOLOCK) -- LEFT JOIN IS THE SAME AS LEFT OUTER JOIN
ON T1.ID = T2.ID
WHERE T2.ID IS NULL

Related

Oracle Toad SQL queries leading to inconsistent id counts

I'm using Oracle Toad with SQL commands in the editor window.
I created two new tables (PIDS1 and PIDS2) that contain only one column of ID numbers from two related tables.
I had expected that PIDS2 would contain a superset of the IDs in PIDS1. When I tried to identify the IDs in PIDS2 that are not in PIDS1, I got started on a wild goose chase.
Let's say that it is a given that there is something unexpected going on with the data in my tables. But I cannot make any sense of the two simplified queries described below. The numbers are inconsistent. Can someone explain what is going on?
-- PIDS1 IS A SINGLE-COLUMN TABLE THAT CONTAINS 1638061 DISTINCT ID'S
-- PIDS2 IS A SINGLE-COLUMN TABLE THAT CONTAINS 3510272 DISTINCT ID'S
SELECT COUNT(T2.ID)
FROM PIDS2 T2
WHERE T2.ID NOT IN (
SELECT T1.ID
FROM PIDS1 T1);
-- RESULT IS ZERO!
-- WTF? PIDS2 HAS MORE ID'S THAN PIDS1!
SELECT COUNT(T1.ID)
FROM PIDS1 T1
WHERE T1.ID NOT IN (
SELECT T2.ID
FROM PIDS2 T2);
-- RESULT IS 786690
-- WHERE DID THAT NUMBER COME FROM? LOOKS ARBITRARY
Never use NOT IN with a subquery. If any of the values returned by the subquery are NULL, then all rows are filtered out.
For this reason, I always advise NOT EXISTS:
SELECT T2.ID
FROM PIDS2 T2
WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE t1.ID = T2.ID);
Of course, you can also add WHERE T1.ID IS NOT NULL to the subquery in the NOT IN version. In my experience, though, you'll forget it at some inopportune time in the future. Just use NOT EXISTS.
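The NULL behaviour is easy to reproduce. This sketch uses Python's built-in sqlite3 module as a stand-in for Oracle (SQLite applies the same three-valued logic to NOT IN), with hypothetical data in which PIDS1 contains one NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PIDS1 (ID INTEGER)")
conn.execute("CREATE TABLE PIDS2 (ID INTEGER)")
# Hypothetical data: PIDS1 contains a NULL; ids 3 and 4 exist only in PIDS2.
conn.executemany("INSERT INTO PIDS1 VALUES (?)", [(1,), (2,), (None,)])
conn.executemany("INSERT INTO PIDS2 VALUES (?)", [(1,), (2,), (3,), (4,)])

def count(sql):
    return conn.execute(sql).fetchone()[0]

# "3 NOT IN (1, 2, NULL)" evaluates to UNKNOWN, so every row is filtered out.
not_in = count("SELECT COUNT(T2.ID) FROM PIDS2 T2 "
               "WHERE T2.ID NOT IN (SELECT T1.ID FROM PIDS1 T1)")

# The same NOT IN with NULLs excluded from the subquery.
not_in_filtered = count("SELECT COUNT(T2.ID) FROM PIDS2 T2 "
                        "WHERE T2.ID NOT IN (SELECT T1.ID FROM PIDS1 T1 "
                        "WHERE T1.ID IS NOT NULL)")

# NOT EXISTS: the NULL row never matches T1.ID = T2.ID, so it does no harm.
not_exists = count("SELECT COUNT(T2.ID) FROM PIDS2 T2 "
                   "WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE T1.ID = T2.ID)")

print(not_in, not_in_filtered, not_exists)  # 0 2 2
```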

SQL Query - Select Value from T1 where second value fully met in T2

I can do this in an ugly stored procedure with temp tables and whatnot, but I know an experienced developer could do this SO much more elegantly than what I've come up with. In fact, I'd kind of rather not have to call the sproc at all, but just have one query that gives me what I need.
I'm working with two tables:
T1: BillingDirectivesNeeded
T2: BillingDirectives
T1 has two fields relevant to this task, PKey and WBS1. There will be many PKeys associated with each WBS1.
T2 has only one field of interest, PKey.
The task I'm trying to address is getting a list of WBS1s from T1 that have ALL of their needed directives in T2 before I enable their import.
We want to import a WBS1 ONLY when all of the PKeys for that WBS1 are found in T2. If not, I'll just leave them grayed out.
I've tried a dozen different ways to get this to happen over the last few hours, and I seem to have a mental block. The pseudo-code would look something like this:
select T1.WBS1 from BillingDirectiveNeeded T1
where [all the T1.PKeys for T1.WBS1 can be found in BillingDirectives T2]
You can try using a Where Exists clause:
Select Distinct T1.WBS1
From BillingDirectiveNeeded T1
Where Exists
(
Select 1
From BillingDirectives T2
Where T2.PKey = T1.PKey
)
select DISTINCT T1.WBS1
from BillingDirectiveNeeded T1
where T1.PKey in (SELECT T2.PKey FROM BillingDirectives T2)
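Note that both queries above return a WBS1 as soon as any one of its PKeys is found in T2, whereas the question asks for WBS1s whose PKeys are all present. The sketch below uses Python's built-in sqlite3 module with hypothetical data to contrast that with a double NOT EXISTS ("there is no needed PKey for this WBS1 that is missing from T2"), which is one common way to express an "all" check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BillingDirectiveNeeded (PKey TEXT, WBS1 TEXT)")
conn.execute("CREATE TABLE BillingDirectives (PKey TEXT)")
# Hypothetical data: WBS1 'A' has both of its directives, 'B' is missing k4.
conn.executemany("INSERT INTO BillingDirectiveNeeded VALUES (?, ?)",
                 [("k1", "A"), ("k2", "A"), ("k3", "B"), ("k4", "B")])
conn.executemany("INSERT INTO BillingDirectives VALUES (?)",
                 [("k1",), ("k2",), ("k3",)])

# "Any" semantics: a WBS1 qualifies as soon as one of its PKeys is in T2.
any_match = conn.execute("""
    SELECT DISTINCT T1.WBS1 FROM BillingDirectiveNeeded T1
    WHERE T1.PKey IN (SELECT T2.PKey FROM BillingDirectives T2)
    ORDER BY T1.WBS1""").fetchall()

# "All" semantics: keep a WBS1 only if none of its PKeys is missing from T2.
all_match = conn.execute("""
    SELECT DISTINCT T1.WBS1 FROM BillingDirectiveNeeded T1
    WHERE NOT EXISTS (
        SELECT 1 FROM BillingDirectiveNeeded N
        WHERE N.WBS1 = T1.WBS1
          AND NOT EXISTS (SELECT 1 FROM BillingDirectives T2
                          WHERE T2.PKey = N.PKey))
    ORDER BY T1.WBS1""").fetchall()

print(any_match, all_match)  # [('A',), ('B',)] [('A',)]
```

An equivalent formulation is a LEFT JOIN from T1 to T2 with GROUP BY T1.WBS1 HAVING COUNT(T2.PKey) = COUNT(*).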

Select proper columns from JOIN statement

I have two tables: table1, table2. Table1 has 10 columns, table2 has 2 columns.
SELECT * FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
I want to select all columns from table1 and only 1 column from table2. Is it possible to do that without enumerating all columns from table1 ?
Yes, you can do the following:
SELECT t1.*, t2.my_col FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
Even though you can do the t1.*, t2.col1 thing, I would not recommend it in production code.
I would never ever use SELECT * in production. Why?
You're telling SQL Server to get all columns. Do you really, really need all of them?
By not specifying the column names, SQL Server has to figure them out itself: it has to consult the data dictionary to find out which columns are present, which costs a little bit of performance.
Most importantly, you don't know what you're getting back. Suddenly the table changes and another column or two are added. If any of your code relies on, e.g., the order or the number of columns in the result without explicitly checking for it, that code can break.
My recommendation for production code: always (no exceptions!) specify exactly the columns you really need, and even if you need all of them, spell them out explicitly. Fewer surprises, fewer bugs to hunt for if anything ever changes in the underlying table.
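The schema-change point is easy to reproduce. This sketch uses Python's built-in sqlite3 module with hypothetical tables (the name and created columns are made up): a caller that unpacks exactly three columns keeps working with an explicit column list, but breaks once table1 gains a column and T1.* expands accordingly.

```python
import sqlite3

def make_db(with_extra_column):
    """Build table1/table2; optionally give table1 an extra column to simulate a schema change."""
    conn = sqlite3.connect(":memory:")
    extra = ", created TEXT" if with_extra_column else ""
    conn.execute(f"CREATE TABLE table1 (ID INTEGER, name TEXT{extra})")
    conn.execute("CREATE TABLE table2 (ID INTEGER, my_col TEXT)")
    conn.execute("INSERT INTO table1 (ID, name) VALUES (1, 'alpha')")
    conn.execute("INSERT INTO table2 VALUES (1, 'x')")
    return conn

star_sql = ("SELECT T1.*, T2.my_col FROM table1 AS T1 "
            "INNER JOIN table2 AS T2 ON T1.ID = T2.ID")
explicit_sql = ("SELECT T1.ID, T1.name, T2.my_col FROM table1 AS T1 "
                "INNER JOIN table2 AS T2 ON T1.ID = T2.ID")

def unpack(conn, sql):
    # Caller code that silently assumes exactly three columns in a fixed order.
    id_, name, my_col = conn.execute(sql).fetchone()
    return id_, name, my_col

old, new = make_db(False), make_db(True)
print(unpack(old, star_sql))      # (1, 'alpha', 'x')
print(unpack(new, explicit_sql))  # (1, 'alpha', 'x')
try:
    unpack(new, star_sql)         # T1.* now yields three columns, four in total
except ValueError as exc:
    print("SELECT * broke the caller:", exc)
```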
Use table1.* in place of all columns of table1 ;)

Nested sql joins process explanation needed

I want to understand how nested join clauses in SQL queries are processed. Can you explain this example with pseudocode? (What is the order of joining the tables?)
FROM
table1 AS t1 (nolock)
INNER JOIN table2 AS t2 (nolock)
INNER JOIN table3 as t3 (nolock)
ON t2.id = t3.id
ON t1.mainId = t2.mainId
In SQL Server there are basically three ways to physically join two tables:
Nested Loops (good when one input has a small number of rows)
Hash Join (good when both inputs have very many rows; building the hash table in memory is expensive)
Merge Join (good when both inputs are already sorted on the join key)
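For comparison, the hash and merge strategies from the list above can be sketched in plain Python over lists of dicts. This is a simplified, in-memory illustration of the idea, not how SQL Server actually implements them; the sample rows and the mainId/v keys are hypothetical.

```python
from collections import defaultdict

def hash_join(build, probe, key):
    """Hash join: build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for row in build:                  # build phase: held entirely in memory
        table[row[key]].append(row)
    return [(b, p) for p in probe      # probe phase: one hash lookup per row
            for b in table.get(p[key], [])]

def merge_join(left, right, key):
    """Merge join: both inputs must already be sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair it with each equal left row.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

t1 = [{"mainId": 1}, {"mainId": 2}, {"mainId": 2}]
t2 = [{"mainId": 2, "v": "a"}, {"mainId": 3, "v": "b"}, {"mainId": 2, "v": "c"}]

hashed = hash_join(t1, t2, "mainId")
by_key = lambda r: r["mainId"]
merged = merge_join(sorted(t1, key=by_key), sorted(t2, key=by_key), "mainId")
print(len(hashed), len(merged))  # 4 4
```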
From your question it seems you are asking about Nested Loops.
Let us say t1 has 20 rows and t2 has 500 rows. Then it works like this:
For each row in t1
    Find the rows in t2 where t1.MainId = t2.MainId
The output of that is then joined to t3 in the same way.
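The nested-loops pseudocode above can be sketched in Python with hypothetical tuple rows (t1 = (mainId,), t2 = (mainId, id), t3 = (id, label)). It joins t1 to t2 first and then joins that output to t3, as described; a real optimizer may choose a different order.

```python
def nested_loop_join(outer, inner, predicate):
    """For each outer row, scan every inner row and keep the matching pairs."""
    result = []
    for o in outer:                    # outer loop: ideally the smaller input
        for i in inner:                # inner loop: rescanned once per outer row
            if predicate(o, i):
                result.append(o + i)   # rows are tuples, so + concatenates the columns
    return result

# Hypothetical rows: t1 = (mainId,), t2 = (mainId, id), t3 = (id, label)
t1 = [(1,), (2,)]
t2 = [(1, 10), (2, 20), (3, 30)]
t3 = [(10, "x"), (20, "y")]

# Join t1 with t2 on mainId.
t1_t2 = nested_loop_join(t1, t2, lambda a, b: a[0] == b[0])
# Each row is now (t1.mainId, t2.mainId, t2.id), so t2.id is at index 2.
t1_t2_t3 = nested_loop_join(t1_t2, t3, lambda a, b: a[2] == b[0])
print(t1_t2_t3)  # [(1, 1, 10, 10, 'x'), (2, 2, 20, 20, 'y')]
```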
The join order depends on the optimizer, the expected row counts, etc.
Try running EXPLAIN on the query.
It tells you exactly what's going on. :)
Of course that doesn't work in SQL Server. For that you can try Razor SQLServer Explain Plan
Or even SET SHOWPLAN_ALL ON
If you're using SQL Server Query Analyzer, look for "Show Execution Plan" under the "Query" menu, and enable it.
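EXPLAIN syntax differs by engine. As a small illustration, SQLite (used here through Python's built-in sqlite3 module as a stand-in) uses EXPLAIN QUERY PLAN. The sketch runs it on the question's three-table join with hypothetical empty tables; the NOLOCK hints are dropped because SQLite has no table hints, and the nested ON clauses are flattened, which is equivalent for inner joins. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (mainId INTEGER)")
conn.execute("CREATE TABLE table2 (mainId INTEGER, id INTEGER)")
conn.execute("CREATE TABLE table3 (id INTEGER)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT *
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.mainId = t2.mainId
    INNER JOIN table3 AS t3 ON t2.id = t3.id
""").fetchall()

# The last column of each plan row is a readable description of one step,
# listed in the order the loops are nested.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```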

Execute MySQL update query on 750k rows

I've added a field to a MySQL table. I need to populate the new column with the value from another table. Here is the query that I'd like to run:
UPDATE table1 t1
SET t1.user_id =
(
SELECT t2.user_id
FROM table2 t2
WHERE t2.usr_id = t1.usr_id
)
I ran that query locally on 239K rows and it took about 10 minutes. Before I run it on the live environment, I wanted to ask whether what I am doing looks OK, i.e. does 10 minutes sound reasonable? Or should I do it another way: a PHP loop, or a better query?
Use an UPDATE JOIN! This will provide you a native inner join to update from, rather than run the subquery for every bloody row. It tends to be much faster.
update table1 t1
inner join table2 t2 on
t1.usr_id = t2.usr_id
set t1.user_id = t2.user_id
Ensure that you have an index on each of the usr_id columns, too. That will speed things up quite a bit.
If you have some rows that don't match up, and you want to set t1.user_id = null, you will need to do a left join in lieu of an inner join. If the column is null already, and you're just looking to update it to the values in t2, use an inner join, since it's faster.
I should mention, for posterity, that this is MySQL-only syntax. Other RDBMSs have different ways of doing an update join.
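The difference between the inner- and left-join variants described above can be checked in a small sketch. It uses Python's built-in sqlite3 module; because SQLite does not accept MySQL's UPDATE ... JOIN syntax, the inner-join behaviour is emulated with a WHERE EXISTS filter. The sample rows are hypothetical.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (usr_id INTEGER, user_id INTEGER)")
    conn.execute("CREATE TABLE table2 (usr_id INTEGER, user_id INTEGER)")
    # usr_id 3 has no match in table2; its user_id starts out as 99.
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, None), (2, None), (3, 99)])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 100), (2, 200)])
    return conn

def rows(conn):
    return conn.execute("SELECT usr_id, user_id FROM table1 ORDER BY usr_id").fetchall()

# Correlated subquery (the question's form): unmatched rows are set to NULL,
# the same outcome as the LEFT JOIN version of the MySQL UPDATE.
a = setup()
a.execute("""UPDATE table1 SET user_id =
             (SELECT t2.user_id FROM table2 t2 WHERE t2.usr_id = table1.usr_id)""")

# Inner-join behaviour: only rows that have a match are touched.
b = setup()
b.execute("""UPDATE table1 SET user_id =
             (SELECT t2.user_id FROM table2 t2 WHERE t2.usr_id = table1.usr_id)
             WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.usr_id = table1.usr_id)""")

print(rows(a))  # [(1, 100), (2, 200), (3, None)]
print(rows(b))  # [(1, 100), (2, 200), (3, 99)]
```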
There are two rather important pieces of information missing:
What type of tables are they?
What indexes exist on them?
If table2 has an index with usr_id and user_id as its first two columns (usr_id first, so the lookup on usr_id can use it) and table1 is indexed on user_id, it shouldn't be that bad.
You don't have an index on t2.usr_id.
Create this index and run your query again, or use the multiple-table UPDATE proposed by @Eric (with a LEFT JOIN, of course).
Note that MySQL (at least before version 8.0.18, which added hash joins) has no join method other than NESTED LOOPS, so it's the index that matters, not the UPDATE syntax.
However, the multiple table UPDATE is more readable.
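To check whether the per-row lookup on t2.usr_id can use an index, look at the query plan. This sketch uses SQLite through Python's built-in sqlite3 module rather than MySQL, whose EXPLAIN output looks different; the tables are empty and the index name idx_t2_usr_id is hypothetical. With the index in place, the subquery's step should name the index.

```python
import sqlite3

def make_db(with_index):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (usr_id INTEGER, user_id INTEGER)")
    conn.execute("CREATE TABLE table2 (usr_id INTEGER, user_id INTEGER)")
    if with_index:
        # Hypothetical index name; usr_id leads because it is the column being looked up.
        conn.execute("CREATE INDEX idx_t2_usr_id ON table2 (usr_id, user_id)")
    return conn

# The per-row lookup that the question's correlated subquery performs.
lookup = """SELECT t1.usr_id,
                   (SELECT t2.user_id FROM table2 t2 WHERE t2.usr_id = t1.usr_id)
            FROM table1 t1"""

def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(make_db(False), lookup)
after = plan(make_db(True), lookup)
print(before)
print(after)
```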