using LEFT JOIN produces different results than using IN


I am wondering whether you can uncover this mystery of why these two queries, that should be producing the same exact results, are in fact producing different results. The first query is producing the incorrect result.
The following query shows that o.OwnerId=b.systemuserid is never true:
select distinct o.assignedto,b.systemuserid
from Opportunity o
left join crmtestdb.dm1_mscrm.dbo.systemuserbase b
on o.OwnerId = b.systemuserid
because it returns all nulls on the right side:
Whereas this query shows that o.OwnerId=b.systemuserid is indeed true for some records:
select distinct assignedto
from Opportunity
where assignedto in (select distinct systemuserid
from crmtestdb.dm1_mscrm.dbo.systemuserbase)
and this shows that they do have those fields in common:
What's going on here? What am I doing wrong? Please let me know if you need clarification on anything.

select distinct
o.assignedto,
b.systemuserid
from Opportunity o
left join crmtestdb.dm1_mscrm.dbo.systemuserbase b
on o.OwnerId=b.systemuserid
--here you are matching OwnerId to systemuserid
select distinct
assignedto
from Opportunity
where assignedto in (
select distinct
systemuserid
--here you are matching assignedto to systemuserid
--this is not equivalent
from crmtestdb.dm1_mscrm.dbo.systemuserbase
)
You are not matching on the same columns. In the first query you select assignedto but match OwnerId to systemuserid; in the second you check whether assignedto is IN systemuserid. Try changing the second query to:
select distinct
assignedto
from Opportunity
where OwnerId in (
select distinct
systemuserid
from crmtestdb.dm1_mscrm.dbo.systemuserbase
)
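To make the mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are invented for illustration and the linked-server prefix is dropped, so treat it as an assumption-laden toy rather than the asker's real data.

```python
import sqlite3

# Hypothetical miniature data: names mirror the question, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Opportunity (OwnerId TEXT, assignedto TEXT);
    CREATE TABLE systemuserbase (systemuserid TEXT);
    INSERT INTO Opportunity VALUES ('x1', 'u1'), ('x2', 'u2'), ('x3', 'u9');
    INSERT INTO systemuserbase VALUES ('u1'), ('u2');
""")

# LEFT JOIN on OwnerId: no OwnerId equals a systemuserid, so the right side is NULL.
join_on_owner = conn.execute("""
    SELECT DISTINCT o.assignedto, b.systemuserid
    FROM Opportunity o
    LEFT JOIN systemuserbase b ON o.OwnerId = b.systemuserid
    ORDER BY o.assignedto
""").fetchall()

# IN on assignedto: a different column is compared, so matches appear.
in_on_assigned = conn.execute("""
    SELECT DISTINCT assignedto
    FROM Opportunity
    WHERE assignedto IN (SELECT systemuserid FROM systemuserbase)
    ORDER BY assignedto
""").fetchall()

# IN on OwnerId: the same column as the join, and it agrees with the join.
in_on_owner = conn.execute("""
    SELECT DISTINCT assignedto
    FROM Opportunity
    WHERE OwnerId IN (SELECT systemuserid FROM systemuserbase)
""").fetchall()

print(join_on_owner, in_on_assigned, in_on_owner)
```

Once both queries compare the same column, the join and the IN version tell the same story.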

Related

I expect these 2 sql statements to return same number of rows

In my mind these 2 SQL statements are equivalent.
My understanding is:
In the first one I am pulling all rows from tmpPerson and keeping those that do not have a matching person id. This query returns 211 records.
The second one says: give me all tmpPerson rows whose id isn't in person. This returns null.
Obviously they are not equivalent or they'd have the same results. So what am I missing? Thanks.
select p.id, bp.id
From person p
right join(
select distinct id
from tmpPerson
) bp
on p.id= bp.id
where p.id is null
select id
from tmpPerson
where id not in (select id from person)
I pulled some ids from the first result set and found no matching records for them in person, so I'm guessing the first one is accurate, but I'm still surprised they're different.

I much prefer left joins to right joins, so let's write the first query as:
select p.id, bp.id
From (select distinct id
from tmpPerson
) bp left join
person p
on p.id = bp.id
where p.id is null;
(The preference is because the result set keeps all the rows in the first table rather than the last table. When reading the from clause, I immediately know what the first table is.)
The second is:
select id
from tmpPerson
where id not in (select id from person);
These are not equivalent for two reasons. The most likely reason in your case is that you have duplicate ids in tmpPerson. The first version removes the duplicates. The second doesn't. This is easily fixed by putting distincts in the right place.
The more subtle reason has to do with the semantics of not in. If any person.id has a NULL value, then all rows will be filtered out. I don't think that is the case with your query, but it is a difference.
I strongly recommend using not exists instead of not in for the reason just described:
select tp.id
from tmpPerson tp
where not exists (select 1 from person p where p.id = tp.id);
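Both differences can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with invented rows (a duplicate id in tmpPerson and one NULL id in person); the table names come from the question, everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmpPerson (id INTEGER);
    CREATE TABLE person (id INTEGER);
    INSERT INTO tmpPerson VALUES (1), (1), (2), (3);  -- id 1 is duplicated
    INSERT INTO person VALUES (2), (NULL);            -- one NULL id
""")

# Anti-join over DISTINCT ids: duplicates are collapsed first.
anti_join = conn.execute("""
    SELECT bp.id
    FROM (SELECT DISTINCT id FROM tmpPerson) bp
    LEFT JOIN person p ON p.id = bp.id
    WHERE p.id IS NULL
    ORDER BY bp.id
""").fetchall()

# NOT IN against a list containing NULL: the predicate is never true.
not_in = conn.execute("""
    SELECT id FROM tmpPerson
    WHERE id NOT IN (SELECT id FROM person)
""").fetchall()

# NOT EXISTS is unaffected by the NULL, but keeps the duplicate row.
not_exists = conn.execute("""
    SELECT tp.id FROM tmpPerson tp
    WHERE NOT EXISTS (SELECT 1 FROM person p WHERE p.id = tp.id)
    ORDER BY tp.id
""").fetchall()

print(anti_join, not_in, not_exists)
```

The NOT IN version comes back empty because `id NOT IN (2, NULL)` evaluates to unknown rather than true for every row.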

select id
from tmpPerson
where id not in (select id from person)
If there is a NULL id in tmpPerson, that row will not be returned by this query, but it will be returned by your first query. Wrapping the column in isnull resolves the difference:
where isnull(id, 'N') not in (select id from person)
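A small sketch of that effect, run through Python's sqlite3 module. SQLite has no two-argument ISNULL like SQL Server, so COALESCE stands in for it here; the rows and the text type of id are assumptions, and the sentinel 'N' only works if it can never be a real person id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmpPerson (id TEXT);
    CREATE TABLE person (id TEXT);
    INSERT INTO tmpPerson VALUES ('a'), ('b'), (NULL);
    INSERT INTO person VALUES ('a');
""")

# Plain NOT IN: the NULL row is dropped because NULL NOT IN (...) is unknown.
plain_not_in = conn.execute("""
    SELECT id FROM tmpPerson
    WHERE id NOT IN (SELECT id FROM person)
""").fetchall()

# COALESCE replaces NULL with the sentinel 'N' before the comparison.
with_sentinel = conn.execute("""
    SELECT id FROM tmpPerson
    WHERE COALESCE(id, 'N') NOT IN (SELECT id FROM person)
""").fetchall()

# The outer-join form from the question keeps the NULL row as well.
anti_join = conn.execute("""
    SELECT bp.id
    FROM (SELECT DISTINCT id FROM tmpPerson) bp
    LEFT JOIN person p ON p.id = bp.id
    WHERE p.id IS NULL
""").fetchall()

print(plain_not_in, with_sentinel, anti_join)
```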

SQL Group By Throwing Up Error (SQL Server)

I have SQL code that throws up an error saying
Error: SQLCODE=-119, SQLSTATE=42803, SQLERRMC=WONUM
The code works fine until I add the group by:
select *
from workorder
left join labtrans on labtrans.refwo=workorder.wonum and labtrans.siteid=workorder.siteid
left join matusetrans on workorder.wonum=matusetrans.refwo and workorder.siteid=matusetrans.tositeid and linetype not in (select value from synonymdomain where domainid='LINETYPE' and maxvalue='TOOL')
left join locations on locations.location = workorder.location and locations.siteid=workorder.siteid
left join person on personid in (select personid from labor where laborcode = labtrans.laborcode)
left join po on workorder.wonum=po.hflwonum and workorder.siteid=po.siteid and workorder.orgid=po.orgid
left join companies on companies.company = po.vendor and companies.orgid=po.orgid
left join pluspcustomer on pluspcustomer.customer=workorder.pluspcustomer
where workorder.wonum='10192'
group by personid
;

If you only GROUP BY personid, you cannot select anything other than personid itself or columns wrapped in aggregate functions such as SUM, MAX, etc. That is why select * fails here.
UPDATE
If you just want to see the duplicate personid, you could use:
select personid
from table
group by personid
But be careful here: if you write the query like this, the only field used to determine duplicate records is personid. If you need to distinguish the same personid across different CompanyId values, you need to group by personid, CompanyId; otherwise the same personid from different companies will be treated as duplicate records.
If you want to delete those duplicate records, you should use ROW_NUMBER() OVER (PARTITION BY personid ORDER BY your_criteria). Search for examples to see how that works; I usually prefer to use that function together with a CTE (common table expression).
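A rough sketch of the ROW_NUMBER approach, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The staff table, its columns, and the "keep the latest hired row" rule are all hypothetical. SQL Server lets you delete directly from the CTE, whereas SQLite needs the rowid round trip shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (personid TEXT, companyid TEXT, hired TEXT);
    INSERT INTO staff VALUES
        ('p1', 'c1', '2020-01-01'),
        ('p1', 'c1', '2021-06-01'),
        ('p1', 'c2', '2019-03-01'),
        ('p2', 'c1', '2022-02-02');
""")

# Grouping by personid alone treats p1 at c1 and p1 at c2 as duplicates.
dup_by_person = conn.execute("""
    SELECT personid, COUNT(*) FROM staff
    GROUP BY personid HAVING COUNT(*) > 1
""").fetchall()

# Grouping by both columns only flags the true duplicate pair.
dup_by_person_company = conn.execute("""
    SELECT personid, companyid, COUNT(*) FROM staff
    GROUP BY personid, companyid HAVING COUNT(*) > 1
""").fetchall()

# Number the rows inside each (personid, companyid) group, newest first,
# then delete everything except row number 1.
conn.execute("""
    WITH ranked AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY personid, companyid
                   ORDER BY hired DESC
               ) AS rn
        FROM staff
    )
    DELETE FROM staff
    WHERE rowid IN (SELECT rid FROM ranked WHERE rn > 1)
""")

remaining = conn.execute("""
    SELECT personid, companyid, hired FROM staff
    ORDER BY personid, companyid
""").fetchall()
print(dup_by_person, dup_by_person_company, remaining)
```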

if you just need to remove duplicates, use DISTINCT with your query like this:
your query:
SELECT * FROM .....
modify it:
SELECT DISTINCT * FROM .....
Hope it helps.

SQL: Why is distinct and max not removing duplicates?

Shouldn't the following query remove duplicates:
SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount AS ACC RIGHT OUTER JOIN
(SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType
From my understanding (and what I want to happen) is, the query in brackets is selecting the property ID and of the ones with a status of current returning the highest tenancy start date (albeit several times). This is then joined to the original table by start date and property id, to get the most recent tenancytype.
Why is it still returning duplicate lines!?
(By the way, this relates to another question I had yesterday, but apparently replies are not supposed to descend into conversation, so I thought I'd separate this off. I hope that is the right thing to do. I have searched, but clearly there is something missing in my understanding!)

First, you almost never need select distinct when using group by.
The problem with your query is the group by clause in the subquery.
SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
(SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID
) Relevant
ON ACC.PropertyID = Relevant.PropertyID AND
ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;
It should not have TenancyStartDate. Also, your outer query had ACC.TenancyType twice in the group by.
That said, it is easier to write the query using analytic functions:
select a.*
from (select a.*,
max(tenancystartdate) over (partition by propertyid) as max_tsd
from dimaccount a
where accountstatus = 'Current'
) a
where tenancystartdate = max_tsd;
This is not exactly the same as your query, because your query will take non-current records into account. I am guessing that this might be the intention, however.
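Here is a small sketch of why the extra grouping column matters, using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The column names come from the question; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimAccount (
        AccountID INTEGER, PropertyID TEXT, TenancyStartDate TEXT,
        TenancyType TEXT, AccountStatus TEXT
    );
    INSERT INTO DimAccount VALUES
        (1, 'P1', '2019-01-01', 'Secure', 'Current'),
        (2, 'P1', '2021-05-01', 'Intro',  'Current'),
        (3, 'P2', '2020-02-02', 'Secure', 'Current'),
        (4, 'P2', '2022-03-03', 'Intro',  'Former');
""")

# Grouping by the date as well: MAX() is taken over a single date per group,
# so every (property, date) pair survives.
grouped_by_both = conn.execute("""
    SELECT PropertyID, MAX(TenancyStartDate)
    FROM DimAccount
    WHERE AccountStatus = 'Current'
    GROUP BY PropertyID, TenancyStartDate
    ORDER BY 1, 2
""").fetchall()

# Grouping by the property only: one row per property with its latest date.
grouped_by_property = conn.execute("""
    SELECT PropertyID, MAX(TenancyStartDate)
    FROM DimAccount
    WHERE AccountStatus = 'Current'
    GROUP BY PropertyID
    ORDER BY 1
""").fetchall()

# Window-function form: keep the rows whose date equals the per-property max.
latest_rows = conn.execute("""
    SELECT x.AccountID, x.PropertyID, x.TenancyStartDate
    FROM (SELECT a.*,
                 MAX(TenancyStartDate) OVER (PARTITION BY PropertyID) AS max_tsd
          FROM DimAccount a
          WHERE AccountStatus = 'Current') x
    WHERE x.TenancyStartDate = x.max_tsd
    ORDER BY x.PropertyID
""").fetchall()
print(grouped_by_both, grouped_by_property, latest_rows)
```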

To answer your question: Yes, you are right, there can be no duplicates. And I am pretty sure there are none. I am also pretty sure that your query does not do what you think it does.
This is your derived table:
SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate
As you group by PropertyID and TenancyStartDate, you get one line per PropertyID and TenancyStartDate. For each such line you want the MAX(TenancyStartDate), which is the TenancyStartDate itself of course. There is no other field you aggregate, so you don't aggregate at all, but only make the rows distinct, for which one would use DISTINCT. Then you do use DISTINCT to get unique result records, but your records are already unique, by your obfuscated way of doing it. So you say: select the distinct records of distinct records. Your subquery can be re-written as:
SELECT DISTINCT PropertyID, TenancyStartDate
FROM DimAccount
WHERE AccountStatus = 'Current'
Then you outer-join the DimAccount table. So you would keep your found records, even in case there is no matching DimAccount record. But: You've selected from DimAccount, so of course there is always at least the one record you already found. Your outer join is actually an inner join. Then the only field from the derived query shown is PropertyID which always equals ACC.PropertyID. This means: You are only selecting records from ACC and the derived table is just to make sure a 'Current' record exists for PropertyID and TenancyStartDate. Your query could thus be re-written as:
SELECT DISTINCT
PropertyID, TenancyStartDate, AccountID, TenancyType
FROM DimAccount AS ACC
WHERE EXISTS
(
SELECT *
FROM DimAccount CurrentAccount
WHERE CurrentAccount.AccountStatus = 'Current'
AND CurrentAccount.PropertyID = ACC.PropertyID
AND CurrentAccount.TenancyStartDate = ACC.TenancyStartDate
);
In case PropertyID + TenancyStartDate + AccountID + TenancyType are unique (is AccountID the table's ID?) then you can even remove DISTINCT.
This query gets all 'Current' DimAccount records first and then gives you all records with the same PropertyID and TenancyStartDate. However, from your explanation it seems you want to select the latest 'Current' DimAccount record per PropertyID. This is something else entirely. There are different solutions to such a task depending on the DBMS you are using (you haven't specified yours in your tags).
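The claim that the original derived table is just a SELECT DISTINCT in disguise can be checked on toy data. This is a sketch with Python's sqlite3 module; the rows are made up and only the columns needed for the derived table are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimAccount (
        AccountID INTEGER, PropertyID TEXT,
        TenancyStartDate TEXT, AccountStatus TEXT
    );
    INSERT INTO DimAccount VALUES
        (1, 'P1', '2019-01-01', 'Current'),
        (2, 'P1', '2019-01-01', 'Current'),
        (3, 'P1', '2021-05-01', 'Current'),
        (4, 'P2', '2020-02-02', 'Current'),
        (5, 'P2', '2022-03-03', 'Former');
""")

# The derived table exactly as written in the question.
original = conn.execute("""
    SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
    FROM DimAccount
    WHERE AccountStatus = 'Current'
    GROUP BY PropertyID, TenancyStartDate
    ORDER BY 1, 2
""").fetchall()

# The simplified form: no aggregation at all, just distinct pairs.
simplified = conn.execute("""
    SELECT DISTINCT PropertyID, TenancyStartDate
    FROM DimAccount
    WHERE AccountStatus = 'Current'
    ORDER BY 1, 2
""").fetchall()

print(original == simplified, original)
```

On this data P1 still appears twice, once per start date, which is why the outer query can return more than one row per property.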

Why Does This subquery only return one record?

this is a subquery that I have. I am having a hard time understanding why this keeps popping back to me saying ("at most this subquery can only return one record")
SELECT COUNT(*)
FROM SoftwareAssigned
GROUP BY SoftID
By my understanding this is saying "get a count of all records where the SoftID (softwareID) is the same"
What is really going on and how do I keep from making this mistake in the future?
The context is within this (attempted query:)
SELECT Software.Description, Software.QtyPurchased
, (
SELECT COUNT(*)
FROM SoftwareAssigned
GROUP BY SoftID
) AS Assigned
,( Software.QtyPurchased -
(
SELECT COUNT(*)
FROM SoftwareAssigned
GROUP BY SoftID
)
) AS Remaining
FROM Software
;

The query gets, for each distinct SoftID value, a count of the rows that have that id.
The query will return one row for each specific SoftID value that exists in the table.
If you want to count how many different SoftID values there are, you would use:
select count(distinct SoftID)
from SoftwareAssigned
Edit:
To get a count from one table of the records that correspond to a record in another table, you would join the tables together and group on the values from the other table:
select
Software.Description, Software.QtyPurchased,
count(SoftwareAssigned.SoftID) as Assigned,
Software.QtyPurchased - count(SoftwareAssigned.SoftID) as Remaining
from
Software
left join SoftwareAssigned on SoftwareAssigned.SoftID = Software.SoftID
group by
Software.SoftID, Software.Description, Software.QtyPurchased
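To see the difference on concrete rows, here is a sketch using Python's sqlite3 module. The table and column names follow the question (with the assumption that Software has a SoftID column); the UserName column and all data are invented. Note that SQLite itself does not raise the "at most one record" error for a multi-row scalar subquery, so the sketch simply shows how many rows the grouped query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Software (SoftID INTEGER, Description TEXT, QtyPurchased INTEGER);
    CREATE TABLE SoftwareAssigned (SoftID INTEGER, UserName TEXT);
    INSERT INTO Software VALUES (1, 'Editor', 5), (2, 'Compiler', 3), (3, 'Debugger', 2);
    INSERT INTO SoftwareAssigned VALUES (1, 'ann'), (1, 'bob'), (2, 'cy');
""")

# The subquery from the question: one row per SoftID, not a single value.
per_id_counts = conn.execute("""
    SELECT COUNT(*) FROM SoftwareAssigned
    GROUP BY SoftID
    ORDER BY SoftID
""").fetchall()

# Number of different SoftID values that have assignments.
distinct_ids = conn.execute(
    "SELECT COUNT(DISTINCT SoftID) FROM SoftwareAssigned"
).fetchone()[0]

# Join and group: one row per Software record with its own count.
joined = conn.execute("""
    SELECT Software.Description, Software.QtyPurchased,
           COUNT(SoftwareAssigned.SoftID) AS Assigned,
           Software.QtyPurchased - COUNT(SoftwareAssigned.SoftID) AS Remaining
    FROM Software
    LEFT JOIN SoftwareAssigned ON SoftwareAssigned.SoftID = Software.SoftID
    GROUP BY Software.SoftID, Software.Description, Software.QtyPurchased
    ORDER BY Software.SoftID
""").fetchall()
print(per_id_counts, distinct_ids, joined)
```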

I'm assuming Software has a SoftID column. It looks like you are hoping SQL will link the subquery to the main query. It will only do this if you tell it how:
select
s.Description,
s.QtyPurchased, (
select
count(*)
from
SoftwareAssigned a
where
-- link to outer query
a.SoftID = s.SoftID
) as Assigned,
s.QtyPurchased - (
select
count(*)
from
SoftwareAssigned a
where
-- link to outer query.
a.SoftID = s.SoftID
) as Remaining
from
Software s;
As it happens, there is a more compact way of writing this:
select
s.Description,
s.QtyPurchased,
count(a.SoftID) as assigned,
s.QtyPurchased - count(a.SoftID) as Remaining
from
software s
left outer join
SoftwareAssigned a
on s.SoftID = a.SoftID
group by
s.SoftID,
s.Description,
s.QtyPurchased;
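As a quick check that the two forms agree, and why the join version counts a.SoftID rather than using count(*), here is a sketch with Python's sqlite3 module. The data and the UserName column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Software (SoftID INTEGER, Description TEXT, QtyPurchased INTEGER);
    CREATE TABLE SoftwareAssigned (SoftID INTEGER, UserName TEXT);
    INSERT INTO Software VALUES (1, 'Editor', 5), (2, 'Compiler', 3), (3, 'Debugger', 2);
    INSERT INTO SoftwareAssigned VALUES (1, 'ann'), (1, 'bob'), (2, 'cy');
""")

# Correlated subqueries: each one is linked to the outer row by SoftID.
correlated = conn.execute("""
    SELECT s.Description, s.QtyPurchased,
           (SELECT COUNT(*) FROM SoftwareAssigned a
            WHERE a.SoftID = s.SoftID) AS Assigned,
           s.QtyPurchased - (SELECT COUNT(*) FROM SoftwareAssigned a
                             WHERE a.SoftID = s.SoftID) AS Remaining
    FROM Software s
    ORDER BY s.SoftID
""").fetchall()

# Left join plus GROUP BY: COUNT(a.SoftID) skips the NULLs from unmatched rows.
joined = conn.execute("""
    SELECT s.Description, s.QtyPurchased,
           COUNT(a.SoftID) AS Assigned,
           s.QtyPurchased - COUNT(a.SoftID) AS Remaining
    FROM Software s
    LEFT OUTER JOIN SoftwareAssigned a ON s.SoftID = a.SoftID
    GROUP BY s.SoftID, s.Description, s.QtyPurchased
    ORDER BY s.SoftID
""").fetchall()

# COUNT(*) in the join form would count the NULL-extended row too.
count_star = conn.execute("""
    SELECT s.Description, COUNT(*)
    FROM Software s
    LEFT OUTER JOIN SoftwareAssigned a ON s.SoftID = a.SoftID
    GROUP BY s.SoftID, s.Description
    ORDER BY s.SoftID
""").fetchall()
print(correlated == joined, count_star)
```

With count(*), software that has no assignments would be reported as having one, because the outer join still produces a row for it.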

how can i eliminate duplicates in gridview?

I am retrieving data from three tables, so I wrote the following query.
I am getting the correct result, but the problem is that records are repeated. What is wrong with
that query? I am binding the result of the query to a GridView control. Please help.
SELECT DISTINCT (tc.coursename), ur.username, uc. DATE, 'Paid' AS Status
FROM tblcourse tc, tblusereg ur, dbo.UserCourse uc
WHERE tc.courseid IN (SELECT ur1.courseid
FROM dbo.UserCourse ur1
WHERE ur1.userid = #userid)
AND ur.userid = #userid
AND uc. DATE IS NOT NULL
AND ur.course - id = uc.course - id

There is no JOIN between tblcourse tc,tblusereg ur. So you get a cross join despite the IN (which is actually a JOIN)
DISTINCT works on the whole row too: not one column.
Note: you mention dbo.UserCourse twice but use different column names courseid and [course-id]
Rewritten with JOINs.
select distinct
tc.coursename, ur.username, uc.[date], 'Paid' as [Status]
from
dbo.tblcourse tc
JOIN
dbo.tblusereg ur ON tc.courseid = ur.[course-id]
JOIN
dbo.UserCourse uc ON ur.[course-id] = uc.[course-id]
where
ur.userid=#userid
and
uc.[date] is not null
This may fix your problem...
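The row multiplication from a missing join condition is easy to reproduce. The sketch below uses Python's sqlite3 module with a simplified, hypothetical schema (a plain courseid column instead of [course-id], and only two of the three tables), so it illustrates the mechanism rather than the asker's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblcourse (courseid INTEGER, coursename TEXT);
    CREATE TABLE tblusereg (userid INTEGER, username TEXT, courseid INTEGER);
    INSERT INTO tblcourse VALUES (10, 'SQL'), (20, 'VBA'), (30, 'Pandas');
    INSERT INTO tblusereg VALUES (1, 'ann', 10), (2, 'bob', 20);
""")

# Comma-separated tables with no condition linking them: a cross join.
# ann is paired with every course, and DISTINCT cannot collapse those rows
# because each row differs in coursename.
cross_joined = conn.execute("""
    SELECT DISTINCT tc.coursename, ur.username
    FROM tblcourse tc, tblusereg ur
    WHERE ur.userid = 1
    ORDER BY tc.coursename
""").fetchall()

# Explicit JOIN with a condition: only the course ann registered for.
properly_joined = conn.execute("""
    SELECT DISTINCT tc.coursename, ur.username
    FROM tblcourse tc
    JOIN tblusereg ur ON tc.courseid = ur.courseid
    WHERE ur.userid = 1
""").fetchall()
print(cross_joined, properly_joined)
```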

Change that first part of your query
select distinct (tc.coursename),
TO
select distinct tc.coursename,
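Note that the parentheses do not limit DISTINCT to tc.coursename: DISTINCT always applies to every selected column, so this change only makes the intent clearer.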
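For what it is worth, a quick experiment with Python's sqlite3 module suggests the parentheses are just an ordinary parenthesised expression. The enrolment table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolment (coursename TEXT, username TEXT);
    INSERT INTO enrolment VALUES ('SQL', 'ann'), ('SQL', 'bob'), ('SQL', 'ann');
""")

# DISTINCT with the first column in parentheses.
with_parens = conn.execute("""
    SELECT DISTINCT (coursename), username
    FROM enrolment ORDER BY username
""").fetchall()

# DISTINCT without the parentheses.
without_parens = conn.execute("""
    SELECT DISTINCT coursename, username
    FROM enrolment ORDER BY username
""").fetchall()

# Only selecting fewer columns collapses the rows further.
one_column = conn.execute(
    "SELECT DISTINCT coursename FROM enrolment"
).fetchall()
print(with_parens, without_parens, one_column)
```

Both two-column queries return the same rows, with 'SQL' listed once per user.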
