How to find the rows not matched by a LEFT JOIN and perform operations on them in SQL Server?

I'm trying to delete rows using a LEFT JOIN in SQL Server Management Studio. How do I get the list of IDs that are deleted as part of the left join? I would also like to compare the difference between the sums from both tables.
Table A:
ID  NAME  LOC  SUM
4   abc   NY   500
5   seq   CA   100
15  juv   TX   120
Table B:
ID  NAME  LOC  SUM  INFO
5   seq   CA   90   x
18  jay   AL   94   x
15  juv   CL   190  x
I want to get the number of rows that are removed as part of the left join, and I want to see the difference in the sums.
DELETE MYDB
FROM MYDB.A
LEFT JOIN MYDB.B
ON A.ID=B.ID
WHERE A.ID=B.ID

It is not clear why you would be using a LEFT JOIN for the JOIN. Your WHERE clause -- which is otherwise redundant -- is turning the outer join into an inner join.
I would suggest using exists:
delete from mydb.a
where exists (select 1 from mydb.b where b.id = a.id);
For a count, you can use:
select count(*)
from mydb.a
where exists (select 1 from mydb.b where b.id = a.id);
Do note: If you run these as two separate operations, the underlying data can change between the operations.
After running the delete, you can use @@ROWCOUNT to get the number of records deleted.
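To make the steps above concrete, here is a minimal sketch that runs them against the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, so this is an assumption-laden illustration rather than SQL Server code: the SUM column is renamed to amount (a hypothetical name) and the cursor's rowcount plays the role of @@ROWCOUNT. In SQL Server itself, the OUTPUT clause of DELETE (OUTPUT deleted.ID) can return the deleted IDs in the same statement.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables. The SUM column is
# renamed to "amount" (an assumption) to avoid confusion with the aggregate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT, loc TEXT, amount INTEGER);
    CREATE TABLE b (id INTEGER, name TEXT, loc TEXT, amount INTEGER, info TEXT);
    INSERT INTO a VALUES (4, 'abc', 'NY', 500), (5, 'seq', 'CA', 100), (15, 'juv', 'TX', 120);
    INSERT INTO b VALUES (5, 'seq', 'CA', 90, 'x'), (18, 'jay', 'AL', 94, 'x'), (15, 'juv', 'CL', 190, 'x');
""")

# Rows of a that have a match in b (the ones the delete will remove),
# together with the per-row difference between the two amounts.
to_delete = conn.execute("""
    SELECT a.id, a.amount - b.amount
    FROM a
    JOIN b ON b.id = a.id
    ORDER BY a.id
""").fetchall()

# Rows of a with no match in b (the ones a LEFT JOIN pads with NULLs).
unmatched = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON b.id = a.id
    WHERE b.id IS NULL
""").fetchall()

# The delete itself; cursor.rowcount reports how many rows were removed.
cur = conn.execute(
    "DELETE FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)"
)
deleted_count = cur.rowcount
conn.commit()

print(to_delete, unmatched, deleted_count)
```

Listing the IDs before deleting is only safe here because a single connection owns the database; on a shared server the caveat about data changing between the two operations still applies.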

Related

Why does my SQL inner join return much more data than table 1?

I need to join three tables to get all the info I need. Table a has 70 million rows; after joining a with b, I got 40 million rows. But after I join table c, which has only 1.7 million rows, the result grows to 300 million rows.
In table c, the same (pt_id, fi_id) pair appears more than once. One pt_id can connect to many different fi_id values, but one fi_id only ever connects to the same pt_id.
I'm wondering if there is any way to get rid of the duplicate rows, because I join table c only to get the pt_id.
Thanks for any help!
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
You can use GROUP BY
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
group by c.pt_id,b.fi_id,a.zq_id
to remove all duplicate rows, as in the question below:
How do I (or can I) SELECT DISTINCT on multiple columns?
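As a rough illustration of where the extra rows come from, the sketch below rebuilds the three tables with a handful of made-up rows in an in-memory SQLite database (via Python's sqlite3 module) and compares the plain join with one that de-duplicates c before joining. De-duplicating inside the subquery is a variation on the GROUP BY answer above, not the answerer's query; the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (zq_id INTEGER);
    CREATE TABLE b (zq_id INTEGER, fi_id INTEGER);
    CREATE TABLE c (fi_id INTEGER, pt_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (2, 20);
    -- c repeats the same (fi_id, pt_id) pair several times, as in the question
    INSERT INTO c VALUES (10, 100), (10, 100), (10, 100), (20, 200), (20, 200);
""")

# Plain join: every duplicate pair in c produces another output row.
plain = conn.execute("""
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    INNER JOIN b ON a.zq_id = b.zq_id
    INNER JOIN c ON b.fi_id = c.fi_id
""").fetchall()

# De-duplicate c first, so each fi_id contributes exactly one pt_id.
deduped = conn.execute("""
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    INNER JOIN b ON a.zq_id = b.zq_id
    INNER JOIN (SELECT DISTINCT fi_id, pt_id FROM c) c ON b.fi_id = c.fi_id
    ORDER BY a.zq_id
""").fetchall()

print(len(plain), deduped)
```

Removing the duplicates before the join keeps the intermediate result small, whereas GROUP BY on the final result first builds all the duplicated rows and then collapses them.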

MSQL get first 100 rows of not intersection

I have 2 tables in MSQL
The first one is called tbl_file and it has the following properties:
id
hash_id
description
mime-type
creation date
etc..
The second one is called tbl_sys_filecontent and it only has:
id
content (base64)
These two tables are supposed to have the same number of rows (hash_id of tbl_file is the same as id in tbl_sys_filecontent).
However, something went wrong and now if we execute:
select count(*) from tbl_sys_filecontent;
We get about 1 000 000 rows
And we're supposed to get the same row count after executing the query:
select count(*) from tbl_file as f
JOIN tbl_sys_filecontent as sf on f.hash_id = sf.id;
But it returns only roughly 100 000 rows.
Let's call the first query collection A and the second B
The question is: how do I get the first 100 rows of A - B (that is, the rows whose hashes have no match)?
Thanks in advance.
With not exists
select *
from tbl_file as f
where not exists (select 'x' from tbl_sys_filecontent as sf where f.hash_id = sf.id);
With not in
select *
from tbl_file as f
where f.hash_id not in (select sf.id from tbl_sys_filecontent as sf);
However, not exists is preferable to not in, because not in returns no rows at all if the subquery yields a NULL: https://stackoverflow.com/a/11074428/59119
With a left join
select f.* from tbl_file as f
LEFT JOIN tbl_sys_filecontent as sf on f.hash_id = sf.id
where sf.id is null;
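The difference between the variants above can be checked on a toy dataset. The following sketch uses an in-memory SQLite database through Python's sqlite3 module, with invented rows and only the columns needed for the comparison. It limits the result to 100 rows with LIMIT, which is the SQLite and MySQL spelling; SQL Server uses SELECT TOP 100 instead, and which one applies to the asker's "MSQL" is an open assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_file (id INTEGER, hash_id TEXT);
    CREATE TABLE tbl_sys_filecontent (id TEXT, content TEXT);
    INSERT INTO tbl_file VALUES (1, 'h1'), (2, 'h2'), (3, 'h3');
    INSERT INTO tbl_sys_filecontent VALUES ('h1', 'AAAA');
""")

NOT_EXISTS = """
    SELECT f.hash_id
    FROM tbl_file AS f
    WHERE NOT EXISTS (SELECT 1 FROM tbl_sys_filecontent AS sf WHERE sf.id = f.hash_id)
    ORDER BY f.id
    LIMIT 100
"""
NOT_IN = """
    SELECT f.hash_id
    FROM tbl_file AS f
    WHERE f.hash_id NOT IN (SELECT sf.id FROM tbl_sys_filecontent AS sf)
    ORDER BY f.id
    LIMIT 100
"""

# With no NULLs in the subquery, both forms agree.
before = (conn.execute(NOT_EXISTS).fetchall(), conn.execute(NOT_IN).fetchall())

# A single NULL id makes NOT IN evaluate to unknown for every unmatched row,
# so that query returns nothing; NOT EXISTS is unaffected.
conn.execute("INSERT INTO tbl_sys_filecontent VALUES (NULL, 'BBBB')")
after = (conn.execute(NOT_EXISTS).fetchall(), conn.execute(NOT_IN).fetchall())

print(before, after)
```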

Additional inner join modifying results of previous calculations

I am having issues with the count() function in a SQL*Plus query. Say I run:
SELECT B.ID, COUNT(S.BRANCH_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
GROUP BY B.ID;
From this I get the results:
b.id count
1 6
2 6
3 6
4 7
5 6
which is fine. However, if I add even one extra inner join, I get completely different and wrong results. For example:
SELECT B.ID, COUNT(S.BRANCH_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN TOOL_STOCK TS ON TS.BRANCH_ID = B.ID
GROUP BY B.ID;
Now the results I get will be...
b.id count
1 96
2 96
3 96
4 112
5 96
Why is this and how do I stop it? Cheers!
Try
SELECT B.ID, COUNT(DISTINCT S.STAFF_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN TOOL_STOCK TS ON TS.BRANCH_ID = B.ID
GROUP BY B.ID;
replacing S.STAFF_ID with the primary key field from the STAFF table.
Your problem is that the COUNT function counts the rows in each GROUP BY group after all the joins have been applied.
In your initial query you are finding the number of employees for each branch. In the second, the number of employees is multiplied by the number of stock items.
When you add the second join, every STAFF row is paired with every TOOL_STOCK row of the same branch, so you are counting STAFF x TOOL_STOCK combinations at each branch (96 = 6 x 16 would mean 16 stock rows for branch 1).
You will likely need to add a subquery if you want all the data returned, but only counts of one record type.
I think the key to your issue is, which are you actually trying to count?
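The multiplication effect and the two fixes mentioned above (COUNT(DISTINCT ...) and a pre-aggregating subquery) can be reproduced on a small example. This is a sketch using Python's sqlite3 module with an in-memory database; the staff_id and tool_id columns and all row values are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (id INTEGER);
    CREATE TABLE staff (staff_id INTEGER, branch_id INTEGER);
    CREATE TABLE tool_stock (tool_id INTEGER, branch_id INTEGER);
    INSERT INTO branch VALUES (1), (2);
    INSERT INTO staff VALUES (1, 1), (2, 1), (3, 2);                          -- 2 and 1 staff
    INSERT INTO tool_stock VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);     -- 3 and 2 tools
""")

JOINS = """
    FROM branch b
    INNER JOIN staff s ON s.branch_id = b.id
    INNER JOIN tool_stock ts ON ts.branch_id = b.id
    GROUP BY b.id
    ORDER BY b.id
"""

# Counts joined rows: staff x tools per branch.
inflated = conn.execute("SELECT b.id, COUNT(s.branch_id)" + JOINS).fetchall()

# Counts each staff member once, however many tool rows it was paired with.
distinct = conn.execute("SELECT b.id, COUNT(DISTINCT s.staff_id)" + JOINS).fetchall()

# Pre-aggregate staff in a subquery, then join: tool rows keep a correct count.
per_tool = conn.execute("""
    SELECT b.id, ts.tool_id, sc.staff_count
    FROM branch b
    INNER JOIN (SELECT branch_id, COUNT(*) AS staff_count
                FROM staff GROUP BY branch_id) sc ON sc.branch_id = b.id
    INNER JOIN tool_stock ts ON ts.branch_id = b.id
    ORDER BY b.id, ts.tool_id
""").fetchall()

print(inflated, distinct, per_tool)
```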

SQL INNER JOIN duplicating data on output?

I am using SQL Server 2008 and trying to join data from 4 different tables. The tables are related such that the 2nd table is a child of the 1st, and the 3rd and 4th tables are BOTH children of the 2nd. I am using the following statement to output the results, but rather than getting 9 distinct records (5 from the CR table and 4 from the CX table) I am getting 20 records, where data from the 3rd and 4th tables is duplicated.
If I omit references to table CX I get the desired 5 results, and omitting references to CR gives the desired 4 results. However, I require the 9 results from both tables combined instead of the 20 records I get. I would post screenshots but am unable to due to reputation, sorry.
SELECT
LS.SITECODE,
ep.EP_KEY,
C0.LEASEID,
C0.SDATE AS LeaseStart,
C0.EDATE AS LeaseExpiry,
CR.EFFDATE AS RenewalDate,
CX.SDATE AS ReviewDate
FROM LS
INNER JOIN FMEP AS ep ON LS.SITECODE = ep.SITECODE
INNER JOIN C0 ON ep.EP_KEY = C0.EP_KEY
INNER JOIN CR ON C0.LEASEID = CR.LEASEID
INNER JOIN CX ON C0.LEASEID = CX.LEASEID
WHERE ls.SITECODE = 2121
I have searched around for the last couple of hours for a solution however I'm obviously not using the correct search terms due to my lack of familiarity with SQL. I am new to SQL so please be patient if I struggle to understand your responses and thank you in advance for taking the time to look at this.
If I understand you correctly, you want C0 joined with CR
and then C0 joined with CX (so you get 5 and 4 rows).
Instead, you get 5 rows from one join and 4 rows from the other. The DBMS has no condition relating CR rows to CX rows, so it pairs each of the 5 rows with each of the 4 rows (a small cross join), resulting in 20 rows (if I understand your description correctly).
Here is a solution that returns the 9 rows you want.
SELECT
LS.SITECODE,
ep.EP_KEY,
C0.LEASEID,
C0.SDATE AS LeaseStart,
C0.EDATE AS LeaseExpiry,
CR.EFFDATE AS RenewalDate,
'CR'
FROM LS
INNER JOIN FMEP AS ep ON LS.SITECODE = ep.SITECODE
INNER JOIN C0 ON ep.EP_KEY = C0.EP_KEY
INNER JOIN CR ON C0.LEASEID = CR.LEASEID
WHERE ls.SITECODE = 2121
UNION
SELECT
LS.SITECODE,
ep.EP_KEY,
C0.LEASEID,
C0.SDATE,
C0.EDATE,
CX.SDATE,
'CX'
FROM LS
INNER JOIN FMEP AS ep ON LS.SITECODE = ep.SITECODE
INNER JOIN C0 ON ep.EP_KEY = C0.EP_KEY
INNER JOIN CX ON C0.LEASEID = CX.LEASEID
WHERE ls.SITECODE = 2121
But if you want each C0 row only once (with either a CX or a CR row), you will need a more specific join.
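The 20-versus-9 arithmetic can be verified with a stripped-down version of the schema. The sketch below keeps only C0, CR and CX (LS and FMEP are left out for brevity), fills them with invented dates in an in-memory SQLite database via Python's sqlite3 module, and compares joining both child tables at once with stacking the two joins. UNION ALL is used here instead of UNION so that no rows can be merged by accident; that substitution and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c0 (leaseid INTEGER);
    CREATE TABLE cr (leaseid INTEGER, effdate TEXT);
    CREATE TABLE cx (leaseid INTEGER, sdate TEXT);
    INSERT INTO c0 VALUES (1);
    INSERT INTO cr VALUES (1, '2010-01-01'), (1, '2011-01-01'), (1, '2012-01-01'),
                          (1, '2013-01-01'), (1, '2014-01-01');
    INSERT INTO cx VALUES (1, '2010-06-01'), (1, '2011-06-01'),
                          (1, '2012-06-01'), (1, '2013-06-01');
""")

# Joining both children at once pairs every CR row with every CX row.
both_joined = conn.execute("""
    SELECT COUNT(*)
    FROM c0
    INNER JOIN cr ON c0.leaseid = cr.leaseid
    INNER JOIN cx ON c0.leaseid = cx.leaseid
""").fetchone()[0]

# Stacking the two joins keeps the 5 CR rows and the 4 CX rows separate.
stacked = conn.execute("""
    SELECT c0.leaseid, cr.effdate AS event_date, 'CR' AS source
    FROM c0 INNER JOIN cr ON c0.leaseid = cr.leaseid
    UNION ALL
    SELECT c0.leaseid, cx.sdate, 'CX'
    FROM c0 INNER JOIN cx ON c0.leaseid = cx.leaseid
""").fetchall()

print(both_joined, len(stacked))
```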

How do I join a one-to-many table with results only appended to the smaller table?

Basically I have a one-to-many table. I want to append the columns of the larger table to the result set of the smaller table. I will end up with a result set the same size of the smaller table, but again, with the relevant information from the many sided table.
One catch: the many sided table doesn't have a primary key defined, although a composite key could be established (again it isn't there as constraint).
Since more than one record in t_large may correspond to a record in t_small, you need to choose what exactly you want to pick from t_large.
This means that you'll either need to aggregate all corresponding records from t_large (how?), or select a single record out of many (which?).
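-- Option 1: aggregate all matching rows from t_large
SELECT s.id, s.value, SUM(l.value) AS total
FROM t_small s
LEFT JOIN
t_large l
ON l.small_id = s.id
GROUP BY s.id, s.value
or
-- Option 2: pick a single row from t_large (here, the one with the lowest id)
SELECT s.*, l.*
FROM t_small s
LEFT JOIN
t_large l
ON l.id =
(
SELECT MIN(id)
FROM t_large li
WHERE li.small_id = s.id
)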
Now, imagine this table layout:
t_small
id  value
--  -------
1   Small 1
2   Small 2
t_large
id  small_id  value
--  --------  -----
1   1         1
2   1         4
3   2         9
4   2         16
Could you please post the resultset which you'd like to see as a result of the query?
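For reference, here is a sketch of what the two approaches above (aggregating, or picking one row) would return on that layout, using Python's sqlite3 module with an in-memory database. It assumes the foreign-key column is named small_id, as in the layout, and it only illustrates the two options; which result the asker actually wants is still their call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_small (id INTEGER, value TEXT);
    CREATE TABLE t_large (id INTEGER, small_id INTEGER, value INTEGER);
    INSERT INTO t_small VALUES (1, 'Small 1'), (2, 'Small 2');
    INSERT INTO t_large VALUES (1, 1, 1), (2, 1, 4), (3, 2, 9), (4, 2, 16);
""")

# Option 1: one row per t_small record, with all matching values summed.
aggregated = conn.execute("""
    SELECT s.id, s.value, SUM(l.value)
    FROM t_small s
    LEFT JOIN t_large l ON l.small_id = s.id
    GROUP BY s.id, s.value
    ORDER BY s.id
""").fetchall()

# Option 2: one row per t_small record, joined to the t_large row
# with the lowest id among its matches.
single = conn.execute("""
    SELECT s.id, s.value, l.id, l.value
    FROM t_small s
    LEFT JOIN t_large l
      ON l.id = (SELECT MIN(li.id) FROM t_large li WHERE li.small_id = s.id)
    ORDER BY s.id
""").fetchall()

print(aggregated)
print(single)
```

Both results have exactly as many rows as t_small, which is the size the question asks for.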
Isn't this just a left join on the key?
select * from small_table s left join large_table l on s.id = l.small_table_id
Every record in small_table, but only the relevant rows in large_table.
I'm missing something. Please elaborate or provide scrubbed sample data.