I expect these 2 sql statements to return same number of rows - sql

In my mind these 2 sql statements are equivalent.
My understanding is:
the first one i am pulling all rows from tmpPerson and filtering where they do not have an equivalent person id. This query returns 211 records.
The second one says give me all tmpPersons whose id isnt in person. this returns null.
Obviously they are not equivalent or theyd have the same results. so what am i missing? thanks
select p.id, bp.id
From person p
right join(
select distinct id
from tmpPerson
) bp
on p.id= bp.id
where p.id is null
select id
from tmpPerson
where id not in (select id from person)
I pulled some ids from the first result set and found no matching records for them in Person so im guessing the first one is accurate but im still surprised they're different

I much prefer left joins to right joins, so let's write the first query as:
select p.id, bp.id
From (select distinct id
from tmpPerson
) bp left join
person p
on p.id = bp.id
where p.id is null;
(The preference is because the result set keeps all the rows in the first table rather than the last table. When reading the from clause, I immediately know what the first table is.)
The second is:
select id
from tmpPerson
where id not in (select id from person);
These are not equivalent for two reasons. The most likely reason in your case is that you have duplicate ids in tmpPerson. The first version removes the duplicates. The second doesn't. This is easily fixed by putting distincts in the right place.
The more subtle reason has to do with the semantics of not in. If any person.id has a NULL value, then all rows will be filtered out. I don't think that is the case with your query, but it is a difference.
I strongly recommend using not exists instead of not in for the reason just described:
select tp.id
from tmpPerson tp
where not exists (select 1 from person p where p.id = tp.id);

select id
from tmpPerson
where id not in (select id from person)
If there is a null id in tmp person then they will not be captured in this query. But in your first query they will be captured. So using an isnull will be resolve the issue
where isnull(id, 'N') not in (select id from person)

Related

SQL Server : get unique ID from one table, another table, or both

I have two tables, TA and CMI, that contain a person_ID. The ID may exist in TA, it may exist in CMI, or it may exist in both. I want a distinct list of ALL person_ID's regardless whether they are in TA, CMI, or both tables.
I also want to be able to select them where their question_ID's are the same. However, the question_id's have different column names: TA.question and CMI.sco = question_id.
EDIT:
So, if I also wanted to do the select on question as I stated earlier AND a join to the person table, it would look something like:
select ta.person_id, person_key
from ta
left join person on person.person_id = ta.person_id
where question=7033
union -- on purpose to remove duplicates
select cmi.person_id, person_key
from cmi
left join person on person.person_id = cmi.person_id
where sco=7033
You would use union:
select person_id
from ta
union -- on purpose to remove duplicates
select person_id
from cmi;
You can use this as a CTE or subquery in a query.

SQL grouping. How to select row with the highest column value when joined. No CTEs please

I've been banging my head against the wall for something that I think should be simple but just cant get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result but only if I leave out the value_id row
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retreive but not quite notice how it returns unique name_id and the highest multi_flag but I also need the value_id that belongs to such multi_flag / name_id grouping...
If I include the value_id in my SQL statement then it returns all rows and is no longer grouped
Notic ein the results below how it no longer returns the row for the highest multi_flag and how all the different values for name_id (Ex. name_id 1) are also returned
You can choose to use a sub-query, derived table or CTE to solve this problem. Performance will be depending on the amount of data you are querying. To achieve your goal of getting the max multiflag you must first get the max value based on the grouping you want to achieve this you can use a CTE or sub query. The below CTE will give the max multi_flag by value that you can use to get the max multi_flag and then you can use that to join back to your other tables. I have three joins in this example but this can be reduce and as far a performance it may be better to use a subquery but you want know until you get the se the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on a.VALUE_ID =b. m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE
You can use Lateral too, its an other solution
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(B.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON M.maxmultiflag = B.VALUE

SQL: Why is distinct and max not removing duplicates?

SHouldn't the following query remove duplicates:
SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount AS ACC RIGHT OUTER JOIN
(SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType
From my understanding (and what I want to happen) is, the query in brackets is selecting the property ID and of the ones with a status of current returning the highest tenancy start date (albeit several times). This is then joined to the original table by start date and property id, to get the most recent tenancytype.
Why is it still returning duplicate lines!?
(by the way this is relating to another question I had yesterday, but apparently replies are not supposed to descend into conversation so I thought I'd seperate this off... I hope that is the right thing to do... I have searched but clearly there is something missing in my understanding of something!)
First, you almost never need select distinct when using group by.
The problem with your query is the group by clause in the subquery.
SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
(SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID
) Relevant
ON ACC.PropertyID = Relevant.PropertyID AND
ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;
It should not have TenancyStartDate. Also, your outer query had ACC.TenancyType twice in the group by.
That said, it is easier to write the query using analytic functions:
select a.*
from (select a.*,
max(tenancystartdate) over (partition by propertyid) as max_tsd
from dimaccount a
where accountstatus = 'Current'
) a
where tenancystartdate = max_tsd;
This is not exactly the same as your query, because your query will take non-current records into account. I am guessing that this might be the intention, however.
To answer your question: Yes, you are right, there can be no duplicates. And I am pretty sure there are none. I am also pretty sure that your query does not what you think it does.
This is your derived table:
SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate
As you group by PropertyID and TenancyStartDate, you get one line per PropertyID and TenancyStartDate. For each such line you want the MAX(TenancyStartDate), which is the TenancyStartDate itself of course. There is no other field you aggregate, so you don't aggregate at all, but only make the rows distinct, for which one would use DISTINCT. Then you do use DISTINCT to get unique result records, but your records are already unique, by your obfuscated way of doing it. So you say: select the distinct records of distinct records. Your subquery can be re-written as:
SELECT DISTINCT PropertyID, TenancyStartDate
FROM DimAccount
WHERE AccountStatus = 'Current'
Then you outer-join the DimAccount table. So you would keep your found records, even in case there is no matching DimAccount record. But: You've selected from DimAccount, so of course there is always at least the one record you already found. Your outer join is actually an inner join. Then the only field from the derived query shown is PropertyID which always equals ACC.PropertyID. This means: You are only selecting records from ACC and the derived table is just to make sure a 'Current' record exists for PropertyID and TenancyStartDate. Your query could thus be re-written as:
SELECT DISTINCT
PropertyID, TenancyStartDate, AccountID, TenancyType
FROM DimAccount AS ACC
WHERE EXISTS
(
SELECT *
FROM DimAccount CurrentAccount
WHERE CurrentAccount.AccountStatus = 'Current'
AND CurrentAccount.PropertyID = ACC.PropertyID
AND CurrentAccount.TenancyStartDate = ACC.TenancyStartDate
);
In case PropertyID + TenancyStartDate + AccountID + TenancyType are unique (is AccountID the table's ID?) then you can even remove DISTINCT.
This query gets all 'Current' DimAccount records first and then gives you all records with the same PropertyID and TenancyStartDate. However, from your explanation it seems you want to select the latest 'Current' DimAccount record per PropertyID. This is something entirely else. There are different solutions to such a task depending on the dbms you are using (you haven't specified yours in your tags).

using LEFT JOIN produces different results than using IN

I am wondering whether you can uncover this mystery of why these two queries, that should be producing the same exact results, are in fact producing different results. The first query is producing the incorrect result.
The following query shows that o.OwnerId=b.systemuserid is never true:
select distinct o.assignedto,b.systemuserid
from Opportunity o
left join crmtestdb.dm1_mscrm.dbo.systemuserbase b
on o.OwnerId = b.systemuserid
because it returns all nulls on the right side:
Whereas this query shows that o.OwnerId=b.systemuserid is indeed true for some records:
select distinct assignedto
from Opportunity
where assignedto in (select distinct systemuserid
from crmtestdb.dm1_mscrm.dbo.systemuserbase)
and this shows that they do have those fields in common:
What's going on here? What am I doing wrong? Please let me know if you need clarification on anything.
select distinct
o.assignedto,
b.systemuserid
from Opportunity o
left join crmtestdb.dm1_mscrm.dbo.systemuserbase b
on o.OwnerId=b.systemuserid
--here you are matching Onwer to sysuser
select distinct
assignedto
from Opportunity
where assignedto in (
select distinct
systemuserid
--here you are matching assignedto to sysuser
--this is not equivilant
from crmtestdb.dm1_mscrm.dbo.systemuserbase
)
You are not matching on the same things. In one query you are selecting Assignedto and matching OwnerID to sytemuserID in the second you are searching where assigned to is IN systemuserId Try changing the second query to
select distinct
assignedto
from Opportunity
where OwnerId in (
select distinct
systemuserid
from crmtestdb.dm1_mscrm.dbo.systemuserbase
)

how can i eliminate duplicates in gridview?

i am retrieving data from three tables for my requirement so i wrote the following query
i was getting correct result but the problem is records are repeated whats the problem in
that query. i am binding result of query to grid view control. please help me
SELECT DISTINCT (tc.coursename), ur.username, uc. DATE, 'Paid' AS Status
FROM tblcourse tc, tblusereg ur, dbo.UserCourse uc
WHERE tc.courseid IN (SELECT ur1.courseid
FROM dbo.UserCourse ur1
WHERE ur1.userid = #userid)
AND ur.userid = #userid
AND uc. DATE IS NOT NULL
AND ur.course - id = uc.course - id
There is no JOIN between tblcourse tc,tblusereg ur. So you get a cross join despite the IN (which is actually a JOIN)
DISTINCT works on the whole row too: not one column.
Note: you mention dbo.UserCourse twice but use different column names courseid and [course-id]
Rewritten with JOINs.
select distinct
tc.coursename, ur.username, uc.[date], 'Paid' as [Status]
from
dbo.tblcourse tc
JOIN
dbo.tblusereg ur ON tc.courseid = ur.[course-id]
JOIN
dbo.UserCourse uc ON ur.[course-id] = uc.[course-id]
where
ur.userid=#userid
and
uc.[date] is not null
This may fix your problem...
Change that first part of your query
select distinct (tc.coursename),
TO
select distinct tc.coursename,
to make all the columns distinct not just tc.coursename