SELECT Statement in CASE - sql

Please don't downvote this; it is a bit complex for me to explain. I'm working on a data migration, so some of the structures look odd because that is how someone originally designed them.
For example, I have a table Person with PersonID and PersonName as columns. The table contains duplicate PersonName values.
I have a Details table where PersonName is stored in a column. This PersonName may or may not exist in the Person table. I need to retrieve PersonID from the matching records, and otherwise put a hardcoded value in PersonID.
I can't use the query below because PersonName is duplicated in the Person table, so the join multiplies the Details rows whenever there is a match (and, being an inner join, drops the rows without one).
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The query below works, but I don't know how to replace NULL with a value of my choosing:
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So, some PersonNames in the Details table do not exist in the Person table. How do I write CASE WHEN in this case?
I tried the query below, but it didn't work:
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query still shows the same output as the second query. Please advise. Let me know if I'm unclear anywhere. Thanks

Well, I figured out that I can wrap the SELECT in ISNULL to make it work. Note that the subquery needs its own parentheses inside the function call.
SELECT d.Fields,
ISNULL((SELECT TOP 1 p.PersonID
FROM Person p WHERE p.PersonName = d.PersonName), 124) id
FROM Details d
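To see why the CASE ... = null attempt changes nothing while wrapping the subquery in a null-replacing function does, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up, COALESCE replaces ISNULL, and MIN(PersonID) replaces TOP 1 so the result is deterministic; those substitutions are my assumptions, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (PersonID INTEGER, PersonName TEXT);
    CREATE TABLE Details (Fields TEXT, PersonName TEXT);
    -- Hypothetical data: 'Ann' is duplicated in Person, 'Zed' has no Person row.
    INSERT INTO Person  VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');
""")

# "= NULL" never evaluates to true, so the CASE always falls through to ELSE.
broken = conn.execute("""
    SELECT d.Fields,
           CASE WHEN (SELECT MIN(p.PersonID) FROM Person p
                      WHERE p.PersonName = d.PersonName) = NULL
                THEN 123
                ELSE (SELECT MIN(p.PersonID) FROM Person p
                      WHERE p.PersonName = d.PersonName)
           END
    FROM Details d
    ORDER BY d.Fields
""").fetchall()

# COALESCE substitutes the default when the scalar subquery finds no row.
fixed = conn.execute("""
    SELECT d.Fields,
           COALESCE((SELECT MIN(p.PersonID) FROM Person p
                     WHERE p.PersonName = d.PersonName), 123)
    FROM Details d
    ORDER BY d.Fields
""").fetchall()

print(broken)  # [('d1', 1), ('d2', 3), ('d3', None)]
print(fixed)   # [('d1', 1), ('d2', 3), ('d3', 123)]
```

The CASE version would also work if the test were written as IS NULL instead of = NULL, but it repeats the subquery, so the single-call form is shorter.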

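A simple left outer join to pull back all persons with an optional match on the details table should work with a case statement to get your desired result.
SELECT
*
FROM
(
SELECT
-- ROW_NUMBER() requires an ORDER BY; PersonName is qualified because both tables have it
Instance=ROW_NUMBER() OVER (PARTITION BY p.PersonName ORDER BY p.PersonID),
-- cast so the 'XXXX' placeholder and a numeric PersonID share one type (assuming PersonID is numeric)
PersonID=CASE WHEN d.PersonName IS NULL THEN 'XXXX' ELSE CAST(p.PersonID AS varchar(20)) END,
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1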

Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
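A quick way to convince yourself that the double LEFT JOIN keeps exactly one row per Details record is to run it on a tiny dataset. This is only a sketch: it uses Python's sqlite3 module rather than SQL Server, swaps ISNULL for COALESCE, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (PersonID INTEGER, PersonName TEXT);
    CREATE TABLE Details (Fields TEXT, PersonName TEXT);
    -- Hypothetical data: 'Ann' appears twice in Person, 'Zed' not at all.
    INSERT INTO Person  VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');
""")

rows = conn.execute("""
    SELECT d.Fields, COALESCE(p1.PersonID, 123)
    FROM Details d
    LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
    -- p2 looks for a same-named person with a smaller ID than p1 ...
    LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
                       AND p2.PersonID < p1.PersonID
    -- ... and only the p1 rows for which none exists survive.
    WHERE p2.PersonID IS NULL
    ORDER BY d.Fields
""").fetchall()

print(rows)  # [('d1', 1), ('d2', 3), ('d3', 123)]
```

One caveat: if two Person rows share both the name and the ID, neither is "smaller", so both would survive and the Details row would still be doubled.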

You could use common table expressions to build up the missing datasets, i.e. your complete Person table, then join that to your Details table as follows:
declare @n int;
-- set your default PersonID here;
set @n = 123;
-- Make sure the previous SQL statement is terminated with a semicolon so the with clause parses successfully.
-- First build our unique list of names from table Details.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third join unique datasets to get the PersonID when there is a match, otherwise use our default id @n.
-- NB, this would also include records when a Person exists with no Detail rows (they are filtered out with the final inner join)
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],@n) as [PersonID]
, coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = upp.[PersonName]
)
-- Fourth, join detail to the pseudo person table (cteSudoPerson) that includes either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];

Related

Assign lowest record ID with Outer Apply

Assume I have those tables:
CREATE TABLE Employee (ID int, EmployeeIdentifier varchar(100),ManagerIdentifier varchar(100))
CREATE TABLE EmployeeManager (ID int, EmployeeID varchar(100))
INSERT Employee
VALUES
(1,'apple','apple'),
(2,'banana','apple'),
(3,'citrus','apple'),
(4,'grape','grape'),
(5,'grape','grape'),
(6,'grape','grape')
INSERT EmployeeManager
VALUES
(1,1),
(2,1),
(3,1),
(4,4),
(5,5),
(6,5)
For Employee.ID IN (1,2,3), records in EmployeeManager look fine.
But for Employee.ID IN (4,5,6) we can see many duplicates. We are not allowed to delete any records from the Employee table, but we are free to assign the EmployeeManager.EmployeeID value. Since there is only one actual record for grape and the rest are duplicates, I want to set EmployeeManager.EmployeeID to the minimum Employee.ID among all the duplicated grape records in the Employee table, i.e. to 4.
I have this query,
UPDATE d SET EmployeeID = l.ID
FROM dbo.EmployeeManager d
INNER JOIN Employee s on d.ID=s.ID
OUTER APPLY (
SELECT ID
FROM Employee l
WHERE s.ManagerIdentifier=l.EmployeeIdentifier
) l
WHERE
EXISTS (
SELECT d.EmployeeID
EXCEPT
SELECT l.ID
)
If you keep running it you will see that EmployeeManager.EmployeeID values for ID (4,5,6) will keep changing.
How can I change the update statement above so that it assigns the lowest Employee.ID to all of EmployeeManager.ID (4,5,6), i.e. 4?
We are not allowed to run a one-time fix script, because corrupted data can keep arriving in the table above.
The desired output after running the update statement is EmployeeID = 4 for EmployeeManager.ID 4, 5 and 6.
You need TOP (1) and ORDER BY in the subquery to pick out a specific row
UPDATE d SET EmployeeID = l.ID
FROM dbo.EmployeeManager d
INNER JOIN Employee s on d.ID=s.ID
OUTER APPLY (
SELECT TOP (1) ID
FROM Employee l
WHERE s.ManagerIdentifier = l.EmployeeIdentifier
ORDER BY ID
) l
WHERE
EXISTS (
SELECT d.EmployeeID
EXCEPT
SELECT l.ID
)
You appear to have a normalization issue, as the manager is defined in two places.
I suggest you use better aliases for your tables; d, s and l are not very memorable.
You can change your OUTER to CROSS, and then you can use a standard <> instead of the EXISTS/EXCEPT. Note that, unlike EXISTS/EXCEPT, <> will not pick up rows where d.EmployeeID is NULL.
CROSS APPLY (
SELECT TOP (1) ID
FROM Employee l
WHERE s.ManagerIdentifier = l.EmployeeIdentifier
ORDER BY ID
) l
WHERE d.EmployeeID <> l.ID
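To check that pinning the subquery to the lowest ID makes the update stable across repeated runs, here is a small sketch using Python's sqlite3 module. SQLite has no APPLY, so a correlated subquery with ORDER BY ... LIMIT 1 stands in for TOP (1) ... ORDER BY, and EmployeeID is declared as an integer for simplicity; both are my substitutions, while the data is the question's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER, EmployeeIdentifier TEXT, ManagerIdentifier TEXT);
    CREATE TABLE EmployeeManager (ID INTEGER, EmployeeID INTEGER);
    INSERT INTO Employee VALUES
        (1,'apple','apple'), (2,'banana','apple'), (3,'citrus','apple'),
        (4,'grape','grape'), (5,'grape','grape'), (6,'grape','grape');
    INSERT INTO EmployeeManager VALUES (1,1),(2,1),(3,1),(4,4),(5,5),(6,5);
""")

# For each EmployeeManager row, look up the employee's manager identifier and
# take the lowest Employee.ID that carries that identifier.
update_sql = """
    UPDATE EmployeeManager
    SET EmployeeID = (
        SELECT l.ID
        FROM Employee s
        JOIN Employee l ON l.EmployeeIdentifier = s.ManagerIdentifier
        WHERE s.ID = EmployeeManager.ID
        ORDER BY l.ID
        LIMIT 1
    )
"""

# Run it several times: the result should not drift between runs.
for _ in range(3):
    conn.execute(update_sql)

result = conn.execute(
    "SELECT ID, EmployeeID FROM EmployeeManager ORDER BY ID"
).fetchall()
print(result)  # [(1, 1), (2, 1), (3, 1), (4, 4), (5, 4), (6, 4)]
```

Without the ORDER BY, the engine is free to return any of the matching rows, which is why the original statement kept flipping the values.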

Will this left join on same table ever return data?

In SQL Server, on a re-engineering project, I'm walking through some old sprocs, and I've come across this bit. I've hopefully captured the essence in this example:
Example Table
SELECT * FROM People
Id | Name
-------------------------
1 | Bob Slydell
2 | Jim Halpert
3 | Pamela Landy
4 | Bob Wiley
5 | Jim Hawkins
Example Query
SELECT a.*
FROM (
SELECT DISTINCT Id, Name
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.Name = b.Name
WHERE b.Name IS NULL
Please disregard formatting, style, and query efficiency issues here. This example is merely an attempt to capture the exact essence of the real query I'm working with.
After looking over the real, more complex version of the query, I boiled it down to the example above, and I cannot for the life of me see how it would ever return any data. The LEFT JOIN should always exclude everything that was just selected because of the b.Name IS NULL check, right (given that it is the same table)? If a row were returned where b.Name IS NULL evaluates to true, shouldn't that mean that a row found in People a was never found in People b? (impossible?)
Just to be very clear, I'm not looking for a "solution". The code is what it is. I'm merely trying to understand its behavior for the purpose of re-engineering it.
If this code indeed never returns results, then I'll conclude it was written incorrectly and use that knowledge during the re-engineering.
If there is a valid data scenario where it would/could return results, then that will be news to me and I'll have to go back to the books on SQL Joins! #DrivenCrazy
Yes. There are circumstances where this query will retrieve rows.
The query
SELECT a.*
FROM (
SELECT DISTINCT Id, PName
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.PName = b.PName
WHERE b.PName IS NULL;
is roughly (maybe even exactly) equivalent to...
select distinct Id, PName
from People
where Id > 3 and PName is null;
Why?
Tested it using this code (mysql).
create table People (Id int, PName varchar(50));
insert into People (Id, PName)
values (1, 'Bob Slydell'),
(2, 'Jim Halpert'),
(3,'Pamela Landy'),
(4,'Bob Wiley'),
(5,'Jim Hawkins');
insert into People (Id, PName) values (6,null);
Now run the query. You get
6, Null
I don't know if your schema allows null Name.
What value can a.PName have such that a.PName = b.PName finds no match and b.PName is Null?
Well, it's written right there: Null.
Can we prove that there is no other case where a row is returned?
Suppose there is a value for (Id,PName) such that PName is not null and a row is returned.
In order to satisfy the condition...
where b.PName is null
...such a value must include a PName that does not match any PName in the People table.
All a.PName and all b.PName values are drawn from People.PName ...
So a.PName would have to fail to match itself.
The only scalar value in SQL that does not equal itself is Null.
Therefore if there are no rows with Null PName this query will not return a row.
That's my proposed casual proof.
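The same experiment can be reproduced without a MySQL server. Here is a minimal sketch using Python's built-in sqlite3 module (my choice of engine, not the one used above) that runs the query before and after the NULL-named row is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER, PName TEXT);
    INSERT INTO People (Id, PName) VALUES
        (1, 'Bob Slydell'), (2, 'Jim Halpert'), (3, 'Pamela Landy'),
        (4, 'Bob Wiley'),   (5, 'Jim Hawkins');
""")

query = """
    SELECT a.*
    FROM (SELECT DISTINCT Id, PName FROM People WHERE Id > 3) a
    LEFT JOIN People b ON a.PName = b.PName
    WHERE b.PName IS NULL
"""

# Every non-null name matches at least itself, so nothing is left unmatched.
without_null = conn.execute(query).fetchall()

# A NULL name never satisfies a.PName = b.PName, so that row stays unmatched.
conn.execute("INSERT INTO People (Id, PName) VALUES (6, NULL)")
with_null = conn.execute(query).fetchall()

print(without_null)  # []
print(with_null)     # [(6, None)]
```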
This is very confusing code. So #DrivenCrazy is appropriate.
The meaning of the query is exactly "return people with id > 3 and a null as name", i.e. it may return data but only if there are null-values in the name:
SELECT DISTINCT Id, PName
FROM People
WHERE Id > 3 and PName is null
The proof for this is rather simple if we consider the meaning of the left join condition ... LEFT JOIN People b ON a.PName = b.PName together with the (overall) condition where b.PName is null:
Generally, a condition where PName = PName is true if and only if PName is not null, and it has exactly the same meaning as where PName is not null. Hence, the left join will match only tuples where pname is not null, but any matching row will subsequently be filtered out by the overall condition where pname is null.
Hence, the left join cannot introduce any new rows in the query, and it cannot reduce the set of rows of the left hand side (as a left join never does). So the left join is superfluous, and the only effective condition is where PName is null.
LEFT JOIN ON returns the rows that INNER JOIN ON returns plus unmatched rows of the left table extended by NULL for the right table columns. If the ON condition does not allow a matched row to have NULL in some column (like b.NAME here being equal to something) then the only NULLs in that column in the result are from unmatched left hand rows. So keeping rows with NULL for that column as the result gives exactly the rows unmatched by the INNER JOIN ON. (This is an idiom. In some cases it can also be expressed via NOT IN or EXCEPT.)
In your case the left table has distinct People rows with a.Id > 3 and the right table has all People rows. So the only a rows unmatched in a.Name = b.Name are those where a.Name IS NULL. So the WHERE returns those rows extended by NULLs.
SELECT * FROM
(SELECT DISTINCT * FROM People WHERE Id > 3 AND Name IS NULL) a
LEFT JOIN People b ON 1=0;
But then you SELECT a.*. So the entire query is just
SELECT DISTINCT * FROM People WHERE Id > 3 AND Name IS NULL;
Sure. A left join will return data even if the join is done on the same table.
According to your query
"SELECT a.*
FROM (
SELECT DISTINCT Id, Name
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.Name = b.Name
WHERE b.Name IS NULL"
it returns no rows because of the final filter "b.Name IS NULL". Without that filter it would return the 2 records with Id > 3.

SQL Server Compare two rows to identify ID

Here is what I am trying to do.
I have one column of data that is the ID of every person. I have a second column of data that is the ID of just supervisors. I also have a third column of data that identifies an ID as staff or faculty.
I need to take the Supervisor column and compare it against the ID column. When the supervisor's ID is located, I need to identify that row as a staff or faculty supervisor in a separate column. If the ID is not in the supervisor column, then they just need to be marked as staff or faculty based on the third column.
So the three columns that I have are ID, Supervisor ID and Class Type.
Any help would be appreciated.
Here is the code that I currently have
select distinct ODS_PERSON.ID "Cient_ID",
ODS_PERSON.LAST_NAME "Last_Name",
CASE
WHEN H17_PERSON.NICKNAME is not null
THEN H17_PERSON.NICKNAME
ELSE ODS_PERSON.FIRST_NAME
END "First_Name",
H17_PERSON.H17_PER_USERNAME + '@highpoint.edu' "Email",
CASE
WHEN ODS_HRPER.HRP_EFFECT_TERM_DATE is null
THEN '1'
ELSE '0'
END "User_Status",
CASE
WHEN SPT_POSITION.POS_CLASS = 'FACL' AND (ODS_PERSON.ID = SPT_PERPOS.PERPOS_SUPERVISOR_HRP_ID)
THEN 'FACSUP'
ELSE 'NOPE'
END "Employee_Type",
SPT_PERPOS.PERPOS_SUPERVISOR_HRP_ID "Manager",
SPT_POSITION.DEPARTMENT_DESC "Department",
SPT_PERPOS.PERPOS_POS_SHORT_TITLE "Position_Title",
SPT_POSITION.POS_CLASS "Position_Class"
from ( ( ( ( (
SPT_PERPOSWG SPT_PERPOSWG
left join ODS_HRPER ODS_HRPER on SPT_PERPOSWG.PPWG_HRP_ID = ODS_HRPER.HRPER_ID )
left join SPT_PERPOS SPT_PERPOS on SPT_PERPOSWG.PPWG_HRP_ID = SPT_PERPOS.PERPOS_HRP_ID )
left join SPT_PERSTAT SPT_PERSTAT on SPT_PERPOSWG.PPWG_HRP_ID = SPT_PERSTAT.PERSTAT_HRP_ID )
left join ODS_PERSON ODS_PERSON on SPT_PERPOSWG.PPWG_HRP_ID = ODS_PERSON.ID )
left join SPT_POSITION SPT_POSITION on SPT_PERPOS.PERPOS_POSITION_ID = SPT_POSITION.POSITION_ID )
left join H17_PERSON H17_PERSON on SPT_PERPOSWG.PPWG_HRP_ID = H17_PERSON.ID
where ODS_HRPER.HRP_EFFECT_TERM_DATE is null
and SPT_PERPOS.PERPOS_END_DATE is null
order by ODS_PERSON.ID
SELECT ID, Supervisor_ID, Class_Type,
CASE
WHEN SuperVisor_ID is null and Class_Type is not null THEN Class_Type
WHEN (SuperVisor_ID = ID or SuperVisor_ID is not null) THEN 'Supervisor'
END
from tableID
Not sure if this is what you're looking for. In the future it would be helpful to know how many tables are involved, and whether the data types in each column can be compared directly or have to be cast.
You may also want to state whether joins are necessary to get the information, and which columns the tables would join on.
Hope this helps
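If the goal is "this person's ID appears in anyone's supervisor column", one way to express that is an EXISTS test against the same table. The sketch below is only an illustration under assumptions: the table name staff, its three columns, the sample rows and the 'SUP' suffix are all hypothetical, and it runs on Python's sqlite3 module, where || concatenates strings (SQL Server would use +).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the three columns described above.
    CREATE TABLE staff (id INTEGER, supervisor_id INTEGER, class_type TEXT);
    INSERT INTO staff VALUES
        (1, NULL, 'FAC'),    -- supervises 2 and 3
        (2, 1,    'FAC'),
        (3, 1,    'STAFF'),  -- supervises 4
        (4, 3,    'STAFF');
""")

rows = conn.execute("""
    SELECT e.id,
           CASE WHEN EXISTS (SELECT 1 FROM staff s
                             WHERE s.supervisor_id = e.id)
                THEN e.class_type || 'SUP'   -- someone reports to this person
                ELSE e.class_type            -- plain staff or faculty
           END AS employee_type
    FROM staff e
    ORDER BY e.id
""").fetchall()

print(rows)  # [(1, 'FACSUP'), (2, 'FAC'), (3, 'STAFFSUP'), (4, 'STAFF')]
```

The difference from comparing ID and Supervisor ID on the same row is that EXISTS looks across all rows, so a person is flagged whenever anyone reports to them.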

SQL joins with multiple records into one with a default

My 'people' table has one row per person, and that person has a division (not unique) and a company (not unique).
I need to join people to p_features, c_features, d_features on:
people.person=p_features.num_value
people.division=d_features.num_value
people.company=c_features.num_value
... in a way that if there is a record match in p_features/d_features/c_features only, it would be returned, but if it was in 2 or 3 of the tables, the most specific record would be returned.
From my test data below, for example, query for person=1 would return
'FALSE'
person 3 returns maybe, person 4 returns true, and person 9 returns default
The biggest issue is that there are 100 features, and I have queries that need to return all of them in one row. My previous attempt was a function that queried each table on (feature, num_value) and looped over the features, but 100 features * 4 tables meant 400 reads; once I loaded a few million rows of data, it was slow enough to bring the database to a halt.
create table p_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table c_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table d_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table default_features (
feature varchar(20),
feature_value varchar(128)
);
create table people (
person int8 not null,
division int8 not null,
company int8 not null
);
insert into people values (4,5,6);
insert into people values (3,5,6);
insert into people values (1,2,6);
insert into p_features values (4,'WEARING PANTS','TRUE');
insert into c_features values (6,'WEARING PANTS','FALSE');
insert into d_features values (5,'WEARING PANTS','MAYBE');
insert into default_features values('WEARING PANTS','DEFAULT');
You need to transpose the features into rows with a ranking. Here I used a common-table expression. If your database product does not support them, you can use temporary tables to achieve the same effect.
;With RankedFeatures As
(
Select 1 As FeatureRank, P.person, PF.feature, PF.feature_value
From people As P
Join p_features As PF
On PF.num_value = P.person
Union All
Select 2, P.person, PF.feature, PF.feature_value
From people As P
Join d_features As PF
On PF.num_value = P.division
Union All
Select 3, P.person, PF.feature, PF.feature_value
From people As P
Join c_features As PF
On PF.num_value = P.company
Union All
Select 4, P.person, DF.feature, DF.feature_value
From people As P
Cross Join default_features As DF
)
-- Take the best (lowest) rank per person and per feature, so each feature resolves independently
, HighestRankedFeature As
(
Select Min(FeatureRank) As FeatureRank, person, feature
From RankedFeatures
Group By person, feature
)
Select RF.person, RF.FeatureRank, RF.feature, RF.feature_value
From people As P
Join HighestRankedFeature As HRF
On HRF.person = P.person
Join RankedFeatures As RF
On RF.FeatureRank = HRF.FeatureRank
And RF.person = HRF.person
And RF.feature = HRF.feature
Order By P.person
I don't know if I have understood your question correctly, but to use JOIN you need your tables loaded already, and then you use a SELECT statement with INNER JOIN, LEFT JOIN or whatever you need.
If you post some more information, it may be easier to understand.
There are some aspects of your schema I'm not understanding, like how to relate to the default_features table if there's no match in any of the specific tables. The only possible join condition is on feature, but if there's no match in the other 3 tables, there's no value to join on. So, in my example, I've hard-coded the DEFAULT since I can't think of how else to get it.
Hopefully this can get you started and if you can clarify the model a bit more, the solution can be refined.
select p.person, coalesce(pf.feature_value, df.feature_value, cf.feature_value, 'DEFAULT')
from people p
left join p_features pf
on p.person = pf.num_value
left join d_features df
on p.division = df.num_value
left join c_features cf
on p.company = cf.num_value

Get latest record from second table left joined to first table

I have a candidate table, say candidates, with only an id field, and I left join the profiles table to it. The profiles table has 2 fields, namely candidate_id and name.
e.g. Table candidates:
id
----
1
2
and Table profiles:
candidate_id name
----------------------------
1 Foobar
1 Foobar2
2 Foobar3
I want the latest name of each candidate in a single query. My attempt is given below:
SELECT C.id, P.name
FROM candidates C
LEFT JOIN profiles P ON P.candidate_id = C.id
GROUP BY C.id
ORDER BY P.name;
But this query returns:
1 Foobar
2 Foobar3
...Instead of:
1 Foobar2
2 Foobar3
The problem is that your PROFILES table doesn't provide a reliable means of figuring out what the latest name value is. There are two options for the PROFILES table:
Add a datetime column, e.g. created_date
Define an auto_increment column
The first option is the best - it's explicit, meaning the use of the column is absolutely obvious, and handles backdated entries better.
ALTER TABLE PROFILES ADD COLUMN created_date DATETIME
If you want the value to default to the current date & time when inserting a record if no value is provided, tack the following on to the end:
DEFAULT CURRENT_TIMESTAMP
With that in place, you'd use the following to get your desired result:
SELECT c.id,
p.name
FROM CANDIDATES c
LEFT JOIN PROFILES p ON p.candidate_id = c.id
JOIN (SELECT x.candidate_id,
MAX(x.created_date) AS max_date
FROM PROFILES x
GROUP BY x.candidate_id) y ON y.candidate_id = p.candidate_id
AND y.max_date = p.created_date
GROUP BY c.id
ORDER BY p.name
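As a rough check of the "join to the MAX(created_date) per candidate" idea, here is a sketch using Python's sqlite3 module in place of MySQL. The created_date values and candidate 3 (who has no profile) are invented for illustration, and the derived table is LEFT JOINed so that such candidates are still listed, which is a small variation on the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (id INTEGER);
    CREATE TABLE profiles (candidate_id INTEGER, name TEXT, created_date TEXT);
    INSERT INTO candidates VALUES (1), (2), (3);
    -- Hypothetical timestamps; ISO-8601 strings sort chronologically.
    INSERT INTO profiles VALUES
        (1, 'Foobar',  '2020-01-01 09:00:00'),
        (1, 'Foobar2', '2020-02-01 09:00:00'),
        (2, 'Foobar3', '2020-01-15 09:00:00');
""")

rows = conn.execute("""
    SELECT c.id, p.name
    FROM candidates c
    -- latest created_date per candidate
    LEFT JOIN (SELECT x.candidate_id, MAX(x.created_date) AS max_date
               FROM profiles x
               GROUP BY x.candidate_id) y ON y.candidate_id = c.id
    -- the profile row that carries that date
    LEFT JOIN profiles p ON p.candidate_id = y.candidate_id
                        AND p.created_date = y.max_date
    ORDER BY c.id
""").fetchall()

print(rows)  # [(1, 'Foobar2'), (2, 'Foobar3'), (3, None)]
```

If two profiles of one candidate share the same created_date, both would be returned, so an auto-increment column makes a useful tie-breaker.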
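Use a subquery (ordering by name descending only stands in for "latest" with this sample data; order by a date or auto-increment column if you have one):
SELECT C.id, (SELECT P.name FROM profiles P WHERE P.candidate_id = C.id ORDER BY P.name DESC LIMIT 1) AS name FROM candidates C;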