Duplication of records due to Left Join - sql

I have two tables named Product Category and Transactions. The Transactions table has two columns, Transaction ID and Product_category_code. The Product Category table has two columns, Product_category_code and Product Name. I'm trying to combine the two tables so that the corresponding product name appears next to each transaction ID. I'm using LEFT JOIN, but it gives me the same results as a RIGHT JOIN.
The code I'm using is:
SELECT *
FROM [dbo].[Transactions] AS T1
LEFT JOIN [dbo].[Product category] AS T2
ON T1.prod_cat_code=T2.prod_cat_code
order by transaction_id desc
I'm getting 5 records for the first transaction after the left join when I should be getting only one. How can I fix this?
First few entries of T1 is
First few entries of T2 is
First few entries of output is
Thanks

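You need to join on both the category and sub-category codes. The category code alone is not unique in the Product category table, so each transaction matches every sub-category row of its category.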
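A minimal sketch of why the extra rows appear, using Python's built-in sqlite3 module and made-up data. The sub-category column name `prod_subcat_code`, the table layouts, and the sample values are assumptions, since the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transaction_id INTEGER, prod_cat_code INTEGER, prod_subcat_code INTEGER);
    CREATE TABLE product_category (
        prod_cat_code INTEGER, prod_subcat_code INTEGER, prod_name TEXT);
    INSERT INTO transactions VALUES (1001, 1, 3);
    -- Category 1 has five sub-categories, so the category code alone is not unique.
    INSERT INTO product_category VALUES
        (1, 1, 'A'), (1, 2, 'B'), (1, 3, 'C'), (1, 4, 'D'), (1, 5, 'E');
""")

# Joining on the category code only: one transaction matches five rows.
by_category = conn.execute("""
    SELECT t.transaction_id, p.prod_name
    FROM transactions AS t
    LEFT JOIN product_category AS p
      ON t.prod_cat_code = p.prod_cat_code
""").fetchall()

# Joining on category and sub-category: one transaction matches one row.
by_both = conn.execute("""
    SELECT t.transaction_id, p.prod_name
    FROM transactions AS t
    LEFT JOIN product_category AS p
      ON t.prod_cat_code = p.prod_cat_code
     AND t.prod_subcat_code = p.prod_subcat_code
""").fetchall()

print(len(by_category))  # 5
print(by_both)           # [(1001, 'C')]
```

If the real tables behave like this, the five rows per transaction come from the five sub-category rows, not from the choice of LEFT versus RIGHT JOIN.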

Related

Left Join two tables with one common column and other diff columns

I have two tables: the main table has 10+ columns and the second table has 3 columns, with one field in common. I can't get the exact count when I left outer join from the main table; I see more rows than the main table actually has. It might be because one of the fields I'm selecting exists only in the second table, not in the main table.
Table 1: master_table
Table 2: manager_table
Master_table :
ID,
Column1,
Column2,
...
Column10
manager_table:
ID,
Column2_different,
Column3_different
I am using a left join to get the same records as are present in the master table.
Select table1.columns, table2.columns
From table1
Left join table2 on table1.ID = table2.ID
The above does not give me the same record count as the master table (table1). It gives me a higher count, which I assume is because the extra field from table 2 is not present in table 1.
Can someone help me here ?
TIA
Need some sample data to be sure, but a higher count than expected after the join usually means that manager_table has more than one row for some ID values: a LEFT JOIN returns every row from the left table, repeated once for each matching row in the right table. An INNER JOIN returns only rows that match in both tables, so it drops unmatched master_table rows, but it repeats rows for duplicate IDs in exactly the same way.

Find the number of records that left outer join, right outer join and full outer join return

Is it possible to find the number of records that a left outer join, right outer join and full outer join return, given the number of records in the left-hand table, the number in the right-hand table, and the number of matching records?
I am trying to work out the relationship between them. I tried entering sample data into two tables but could not find any relationship.
However, if I know the number of unmatched entries in the left-hand table, I can add that number to the matching records to get the left outer join count. Likewise, if I know the number of unmatched records in the right-hand table, adding that number to the matched records gives the right outer join count.
Is it possible without knowing the unmatched records? Can we find the number of records that a left outer join, right outer join and full outer join return?
CREATE table table1(
id integer,
name varchar(40)
);
CREATE table table2(
id integer,
name varchar(40)
);
insert into table1(id,name)values(1,'ABC');
insert into table1(id,name)values(2,'DEF');
insert into table1(id,name)values(3,'GHI');
insert into table1(id,name)values(4,'JKL');
insert into table1(id,name)values(5,'JKL');
insert into table1(id,name)values(6,'JKL');
insert into table2(id,name)values(2,'ABC');
insert into table2(id,name)values(2,'ABC');
insert into table2(id,name)values(1,'ABC');
insert into table2(id,name)values(1,'ABC');
insert into table2(id,name)values(3,'ABC');
insert into table2(id,name)values(3,'ABC');
insert into table2(id,name)values(4,'ABC');
insert into table2(id,name)values(4,'ABC');
insert into table2(id,name)values(5,'ABC');
insert into table2(id,name)values(5,'ABC');
insert into table2(id,name)values(11,'ABC');
insert into table2(id,name)values(12,'ABC');
insert into table2(id,name)values(13,'ABC');
insert into table2(id,name)values(14,'ABC');
select count(*) from table1; -- 6
select count(*) from table2; -- 14
select count(*) from table1 inner join table2
on table1.id=table2.id; -- 10
select count(*) from table1 left outer join table2
on table1.id=table2.id; -- 11
select count(*) from table1 right outer join table2
on table1.id=table2.id; -- 14
select count(*) from table1 full outer join table2
on table1.id=table2.id; -- 15
-- Unmatched records
select count(*) from table1 left outer join table2
on table1.id=table2.id
where table2.id is null; -- 1
select count(*) from table1 right outer join table2
on table1.id=table2.id
where table1.id is null; -- 4
Since you know the total number of records, knowing the number of unmatched records is equivalent to knowing the number of matched records. (Perhaps!)
The answer to your question is NO, you can't determine the number of records in different types of joins by only knowing the cardinalities of the base tables, without knowing how many records are matched (or unmatched). Simple mental exercise: both tables have 100 records. If all are perfectly matched, one-to-one, then all joins are the same as the inner join, and they all have 100 records. If there are no matches at all, the inner join has zero rows, the one-sided joins have 100 rows and the full outer join has 200 rows. And the only difference between these cases is the number of matched (or unmatched) records, there is absolutely nothing else that you might know that will allow you to get the answer without this piece of information.
ADDED after the OP asked a follow-up question:
In fact knowing "how many records are matched" is not well defined, and insufficient anyway. Suppose ALL records from both tables are matched. At one extreme, the matches may be in pairs: there is an id column in both tables, and the values in both tables are all the possible values from 1 to 100. Then the result has 100 rows. On the other hand, suppose the "id" is not unique in either table. Instead, it has the value 1 IN ALL 100 ROWS IN BOTH TABLES. Then every row in the first table matches every row in the second table, and the result set will have 100 x 100 = 10,000 rows.
This is just to suggest the following: "how many rows are matched" is not a well-defined concept. To have a count of the resulting joins (of different kinds), one needs to know what the joins are on, and for each tuple in the join condition, how many rows have that specific tuple in each table. Then the number of rows in the result set of an inner join is a sum of products of such tuple-grouped counts, and additional rows for unmatched rows from the left (or right, or both) table(s) for outer joins.
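The "sum of products of tuple-grouped counts" can be sketched in a few lines of Python. This is only an illustration: the helper name `join_counts` is made up, and it assumes a single-column equality join with no NULL keys (in SQL a NULL key never matches anything, which this sketch does not model). The input is the id data from the question.

```python
from collections import Counter

def join_counts(left_keys, right_keys):
    """Row counts of inner/left/right/full joins from per-key row counts."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each key shared by both tables contributes (rows on left) x (rows on right).
    inner = sum(n * right[k] for k, n in left.items() if k in right)
    # Rows whose key does not appear at all on the other side.
    left_only = sum(n for k, n in left.items() if k not in right)
    right_only = sum(m for k, m in right.items() if k not in left)
    return {
        "inner": inner,
        "left": inner + left_only,
        "right": inner + right_only,
        "full": inner + left_only + right_only,
    }

table1_ids = [1, 2, 3, 4, 5, 6]
table2_ids = [2, 2, 1, 1, 3, 3, 4, 4, 5, 5, 11, 12, 13, 14]
counts = join_counts(table1_ids, table2_ids)
print(counts)  # {'inner': 10, 'left': 11, 'right': 14, 'full': 15}

# The extreme case from the answer: the value 1 in all 100 rows of both tables.
print(join_counts([1] * 100, [1] * 100)["inner"])  # 10000
```

The output agrees with the counts in the question's comments, and it shows that the totals of the two tables alone are not enough: the per-key counts are what determine the result.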

SQL MS Access Summing Results Based on Two Table Criteria

I've been asked to find if totals from one table exceed a certain value. However the identifier I need to group these is stored in another table. So I've figured how to isolate what I need and then copy into excel. However I do understand the principle of summing in SQL, as I've made my own queries that look like this:
SELECT * FROM
(SELECT ID, SUM(Table1.Amount) AS Subtotal FROM Table1 WHERE LineNumber = 1 GROUP BY ID) AS a, Table2 AS b
WHERE a.ID = b.ID
AND a.Subtotal > b.Threshold
In this case, the totals I need to sum all have the same LineNumber in the original table (Table1), so it's easy to compare them to a value in a different table. What I want to do is join a subquery back to the original table, then GROUP BY the identifier and sum from the other table, adding the criterion that the original ID from Table1 must be a duplicate:
SELECT * FROM
(Table1 as t
INNER JOIN
(SELECT ID FROM Table1
GROUP BY ID HAVING COUNT(ID) >1) as b
ON t.ID = b.ID)
I need to join the criteria back to the original table because the table I need to sum doesn't contain ID:
INNER JOIN Table2 as c
ON t.ID2 = c.ID2
WHERE c.LineNumber = 4
This is where I'm stuck. I want to sum all the LineNumber = 4 for each ID. Again, I had to join using ID2 because ID1 isn't in Table2
Perhaps I'm making this too complicated? Any suggestions would be welcome
Thanks!

How to use sql join where table1 has rows not present in table2

I have two tables, record and share. record has the columns name and id. share has one column, id.
I want to find the rows which are present in record but not present in share.
How can I do this?
SQL LEFT JOIN returns all rows from the left table even if there are no matches in the right table, so the rows without a match can be picked out by testing the right-hand column for NULL:
SELECT name, id
FROM record r LEFT JOIN share s on r.id = s.id
WHERE s.id is null
You have these tables:
RECORD (ID, NAME)
SHARE(ID,VALUE)
If your SQL engine supports LEFT OUTER JOINS, the best way is:
SELECT RECORD.* FROM RECORD LEFT OUTER JOIN SHARE ON RECORD.ID=SHARE.ID
WHERE SHARE.ID IS NULL
Important
place an index on SHARE.ID
How it works:
The SQL engine scans the whole RECORD table, looking up each record in SHARE. If a record in RECORD has a "linked" record in SHARE, the WHERE clause is false, so the record is not included in the result set. If no record is found in SHARE, the SHARE columns are NULL, the WHERE clause is true, and RECORD.* is included in the result set. This works thanks to the LEFT OUTER JOIN.
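The behaviour described above can be checked with a small sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER, name TEXT);
    CREATE TABLE share (id INTEGER);
    INSERT INTO record VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- id 2 is missing from share; id 3 appears twice.
    INSERT INTO share VALUES (1), (3), (3);
""")

# Keep only the record rows for which the join found no partner in share.
missing = conn.execute("""
    SELECT r.id, r.name
    FROM record AS r
    LEFT JOIN share AS s ON r.id = s.id
    WHERE s.id IS NULL
    ORDER BY r.id
""").fetchall()

print(missing)  # [(2, 'b')]
```

Duplicate ids in share do not matter here, because every row that found a match is filtered out by the IS NULL test.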
Note:
Another way of doing the same thing is to use WHERE RECORD.ID NOT IN (SELECT ID FROM SHARE).
Be aware that, depending on the SQL engine you are using, this may lead to serious performance issues, because the engine may run the (SELECT ID FROM SHARE) once per record in the RECORD table.
select Id from t1 where id not in (select id from t2)
SELECT * from RECORD where ID not in (SELECT DISTINCT ID FROM SHARE);
SELECT DISTINCT ID FROM SHARE - will get all the distinct IDs in the table share
SELECT * from RECORD where ID not in (SELECT DISTINCT ID FROM SHARE); would display all records whose ID is not in the first query.
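One caveat with the NOT IN form that the answers above do not mention: if the subquery can return NULL, standard SQL three-valued logic makes NOT IN evaluate to unknown for every non-matching row, so nothing comes back. A small sketch with Python's built-in sqlite3 module and invented sample rows, comparing it with NOT EXISTS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER, name TEXT);
    CREATE TABLE share (id INTEGER);
    INSERT INTO record VALUES (1, 'a'), (2, 'b');
    -- share contains a NULL id alongside a real one.
    INSERT INTO share VALUES (1), (NULL);
""")

# "2 NOT IN (1, NULL)" is unknown rather than true, so row 2 is filtered out.
not_in_rows = conn.execute(
    "SELECT id FROM record WHERE id NOT IN (SELECT id FROM share)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT r.id
    FROM record AS r
    WHERE NOT EXISTS (SELECT 1 FROM share AS s WHERE s.id = r.id)
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,)]
```

If share.id is declared NOT NULL, the two forms return the same rows.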

How can a LEFT OUTER JOIN return more records than exist in the left table?

I have a very basic LEFT OUTER JOIN to return all results from the left table and some additional information from a much bigger table. The left table contains 4935 records yet when I LEFT OUTER JOIN it to an additional table the record count is significantly larger.
As far as I'm aware it is absolute gospel that a LEFT OUTER JOIN will return all records from the left table with matched records from the right table and null values for any rows which cannot be matched, as such it's my understanding that it should be impossible to return more rows than exist in the left table, but it's happening all the same!
SQL Query follows:
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits LEFT OUTER JOIN
DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
Perhaps I have made a mistake in the syntax or my understanding of LEFT OUTER JOIN is incomplete, hopefully someone can explain how this could be occurring?
The LEFT OUTER JOIN will return all records from the LEFT table joined with the RIGHT table where possible.
If there are matches though, it will still return all rows that match, therefore, one row in LEFT that matches two rows in RIGHT will return as two ROWS, just like an INNER JOIN.
EDIT:
In response to your edit, I've just had a further look at your query and it looks like you are only returning data from the LEFT table. Therefore, if you only want data from the LEFT table, and you only want one row returned for each row in the LEFT table, then you have no need to perform a JOIN at all and can just do a SELECT directly from the LEFT table.
Table1 Table2
_______ _________
1 2
2 2
3 5
4 6
SELECT Table1.Id,
Table2.Id
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.Id=Table2.Id
Results:
1,null
2,2
2,2
3,null
4,null
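For anyone who wants to reproduce the result above, here is the same example as a runnable sketch using Python's built-in sqlite3 module. The single `Id` column per table is taken from the query; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER);
    CREATE TABLE Table2 (Id INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);
    INSERT INTO Table2 VALUES (2), (2), (5), (6);
""")

rows = conn.execute("""
    SELECT Table1.Id, Table2.Id
    FROM Table1
    LEFT OUTER JOIN Table2 ON Table1.Id = Table2.Id
    ORDER BY Table1.Id
""").fetchall()

# Table1 has 4 rows, but Id 2 matches two rows in Table2, so 5 rows come back.
print(rows)  # [(1, None), (2, 2), (2, 2), (3, None), (4, None)]
```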
It isn't impossible. The number of records in the left table is the minimum number of records it will return. If the right table has two records that match to one record in the left table, it will return two records.
In response to your postscript, that depends on what you would like.
You are getting (possible) multiple rows for each row in your left table because there are multiple matches for the join condition. If you want your total results to have the same number of rows as there is in the left part of the query you need to make sure your join conditions cause a 1-to-1 match.
Alternatively, depending on what you actually want, you can use aggregate functions. For example, if you just want a string from the right part, you could generate a column that is a comma-delimited string of the right-side results for that left row.
If you are only looking at 1 or 2 columns from the outer join you might consider using a scalar subquery since you will be guaranteed 1 result.
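Both suggestions can be sketched with Python's built-in sqlite3 module. The `visits` and `members` tables, their columns, and the data are hypothetical stand-ins for the tables in the question, and GROUP_CONCAT is SQLite's string aggregate (other engines name theirs differently, for example STRING_AGG).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (visit_id INTEGER, mem_id INTEGER);
    CREATE TABLE members (membership_num INTEGER, tier TEXT);
    INSERT INTO visits VALUES (1, 10), (2, 20), (3, 30);
    -- Member 10 has two rows; member 30 has none.
    INSERT INTO members VALUES (10, 'gold'), (10, 'silver'), (20, 'bronze');
""")

# Aggregate: collapse all right-side matches into one comma-delimited string.
# The order of items inside the string is not guaranteed.
aggregated = conn.execute("""
    SELECT v.visit_id, GROUP_CONCAT(m.tier, ',') AS tiers
    FROM visits AS v
    LEFT JOIN members AS m ON v.mem_id = m.membership_num
    GROUP BY v.visit_id
    ORDER BY v.visit_id
""").fetchall()

# Scalar subquery: MIN() guarantees exactly one value per left row.
scalar = conn.execute("""
    SELECT v.visit_id,
           (SELECT MIN(m.tier)
            FROM members AS m
            WHERE m.membership_num = v.mem_id) AS tier
    FROM visits AS v
    ORDER BY v.visit_id
""").fetchall()

print(len(aggregated))  # 3, one row per visit
print(scalar)           # [(1, 'gold'), (2, 'bronze'), (3, None)]
```

Wrapping the subquery in an aggregate such as MIN is a deliberate choice here: some engines raise an error when a scalar subquery returns more than one row.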
Each record from the left table will be returned as many times as there are matching records on the right table -- at least 1, but could easily be more than 1.
Could it be a one to many relationship between the left and right tables?
LEFT OUTER JOIN, just like INNER JOIN (a normal join), returns as many results for each row in the left table as it finds matches in the right table. Hence you can get a lot of results: up to N x M, where N is the number of rows in the left table and M is the number of rows in the right table.
The minimum number of results from a LEFT OUTER JOIN is always guaranteed to be at least N.
If you need just any one row from the right side
SELECT SuspReason, SiteID FROM(
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID, ROW_NUMBER()
OVER(PARTITION BY SUSP.Susp_Visits.SiteID ORDER BY SUSP.Susp_Visits.MemID) AS rn
FROM SUSP.Susp_Visits
LEFT OUTER JOIN DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
) AS t
WHERE rn=1
or just
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits WHERE EXISTS(
SELECT 1 FROM DATA.Dim_Member WHERE SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
)
Pay attention if you have a WHERE clause on the 'right side' table of a query containing a left outer join.
If no record on the right side satisfies the WHERE clause, the corresponding record of the 'left side' table will not appear in the result of your query.
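A small sketch of that pitfall, using Python's built-in sqlite3 module. The tables `left_t` and `right_t`, the `status` column, and the data are all made up for illustration. Moving the condition from WHERE into the ON clause keeps the unmatched left rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t (id INTEGER);
    CREATE TABLE right_t (id INTEGER, status TEXT);
    INSERT INTO left_t VALUES (1), (2), (3);
    INSERT INTO right_t VALUES (1, 'active'), (2, 'closed');
""")

# Condition in WHERE: applied after the join, so rows whose right side is
# NULL or fails the test are removed, and the join acts like an inner join.
filtered_in_where = conn.execute("""
    SELECT l.id, r.status
    FROM left_t AS l
    LEFT JOIN right_t AS r ON l.id = r.id
    WHERE r.status = 'active'
    ORDER BY l.id
""").fetchall()

# Condition in ON: it only restricts which right rows may match, so every
# left row is still returned.
filtered_in_on = conn.execute("""
    SELECT l.id, r.status
    FROM left_t AS l
    LEFT JOIN right_t AS r ON l.id = r.id AND r.status = 'active'
    ORDER BY l.id
""").fetchall()

print(filtered_in_where)  # [(1, 'active')]
print(filtered_in_on)     # [(1, 'active'), (2, None), (3, None)]
```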
It seems as though there are multiple rows in the DATA.Dim_Member table per SUSP.Susp_Visits row.
If multiple (x) rows in Dim_Member are associated with a single row in Susp_Visits, there will be x rows in the result set.
Since the left table contains 4935 records, I suspect you want your results to return 4935 records. Try this:
create table table1
(siteID int,
SuspReason int)
create table table2
(siteID int,
SuspReason int)
insert into table1(siteID, SuspReason) values
(1, 678),
(1, 186),
(1, 723)
insert into table2(siteID, SuspReason) values
(1, 678),
(1, 965)
select distinct t1.siteID, t1.SuspReason
from table1 t1 left join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason
union
select distinct t2.siteID, t2.SuspReason
from table1 t1 right join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason
The only way your query can return more rows than the left table (SUSP.Susp_Visits in your case) is if the condition (SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum) matches multiple rows in the right table, DATA.Dim_Member. That means DATA.Dim_Member has multiple rows with identical values of DATA.Dim_Member.MembershipNum. You can verify this by executing the query below:
select DATA.Dim_Member.MembershipNum, count(DATA.Dim_Member.MembershipNum) from DATA.Dim_Member group by DATA.Dim_Member.MembershipNum
Simply, LEFT OUTER JOIN is the Cartesian product within each join key, along with the unmatched rows of the left table
(i.e. for each key_x that has N records in table_L and M records in table_R the result will have N*M records if M>0, or N records if M=0)