SQL COALESCE entire rows?

I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table with matching id's, but gets them from the second table if no matching id is found.
So the query result would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick were taken from the first table, and Mary from the second.

SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInCompany). It then adds to the "bottom" of this bunch of rows the results of the second query. The second query gets all tbl_Customers rows with Ids in your target list, excluding any with Ids that appear in tbl_Employees.
The total list contains the people you want: all Ids from tbl_PeopleInCompany, with preference given to Employees and the missing records pulled from Customers.
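To try the UNION ALL approach without a SQL Server instance, here is a minimal sketch that runs it against an in-memory SQLite database from Python. It assumes the target list is the question's tbl_PeopleInCompany table, and it leaves out the Email and Etc columns for brevity.

```python
import sqlite3

# Throwaway in-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Employees (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tbl_Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER);
INSERT INTO tbl_Employees VALUES (1, 'Sue'), (2, 'Rick');
INSERT INTO tbl_Customers VALUES (1, 'Bob'), (2, 'Dan'), (3, 'Mary');
INSERT INTO tbl_PeopleInCompany VALUES (1, 1), (2, 1), (3, 1);
""")

# Employees first, then only those customers whose Id has no employee row.
rows = conn.execute("""
SELECT Id, Name FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
  AND Id NOT IN (SELECT Id FROM tbl_Employees)
ORDER BY Id
""").fetchall()

print(rows)
```

With the sample data this should list Sue, Rick and Mary, matching the result the question asks for.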

You can also do this:
1) Full outer join the two tables on tbl_Employees.Id = tbl_Customers.Id. This will give you all the rows from both tables and leave the columns of one table null wherever the other table has no matching row.
2) Use CASE WHEN to select either the tbl_Employees column or the tbl_Customers column, based on whether tbl_Employees.Id IS NULL, like this:
CASE WHEN tbl_Employees.Id IS NULL THEN tbl_Customers.Name ELSE tbl_Employees.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
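The join idea can be sketched as follows. This is a variation rather than the exact join described above: it starts from tbl_PeopleInCompany and left-joins both tables, so every target Id is kept. It runs on SQLite through Python, and the email values are made up for illustration. The extra CoalescedEmail column shows why a per-column COALESCE is not quite "coalescing the row": when an employee's own column is NULL, it quietly borrows the customer's value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Employees (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
CREATE TABLE tbl_Customers (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER);
-- Hypothetical emails; Sue deliberately has none.
INSERT INTO tbl_Employees VALUES (1, 'Sue', NULL), (2, 'Rick', 'rick@example.com');
INSERT INTO tbl_Customers VALUES (1, 'Bob', 'bob@example.com'),
                                 (2, 'Dan', 'dan@example.com'),
                                 (3, 'Mary', 'mary@example.com');
INSERT INTO tbl_PeopleInCompany VALUES (1, 1), (2, 1), (3, 1);
""")

rows = conn.execute("""
SELECT P.Id,
       -- Row-level choice: take every column from the employee row if one exists.
       CASE WHEN E.Id IS NOT NULL THEN E.Name  ELSE C.Name  END AS Name,
       CASE WHEN E.Id IS NOT NULL THEN E.Email ELSE C.Email END AS Email,
       -- Column-level choice: falls through to the customer when the value is NULL.
       COALESCE(E.Email, C.Email) AS CoalescedEmail
FROM tbl_PeopleInCompany P
LEFT JOIN tbl_Employees E ON E.Id = P.Id
LEFT JOIN tbl_Customers C ON C.Id = P.Id
ORDER BY P.Id
""").fetchall()

for row in rows:
    print(row)
```

For Sue, the CASE version keeps her NULL email while the COALESCE version returns Bob's address, so which one you want depends on whether mixing columns from the two tables is acceptable.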

This should be pretty performant. It uses a CTE to build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records:
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
SELECT C.Id, C.Name, C.Email, C.Etc
FROM tbl_Customers C
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
LEFT JOIN tbl_Employees E
ON C.Id = E.Id
WHERE E.Id IS NULL
)
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
ON E.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN operator can be rather taxing on large queries, as it might have to evaluate the subquery for each record being processed.

I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except it allows you to pass in multiple columns, and will return the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain its application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.
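To see the "first non-null value" behaviour in isolation, here is a small sketch on SQLite via Python. The Product table and its rows are hypothetical stand-ins for Production.Product, not real AdventureWorks data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Name TEXT, Class TEXT, Color TEXT, ProductNumber TEXT);
-- Hypothetical rows with NULLs in different positions.
INSERT INTO Product VALUES ('Chain', NULL, 'Silver', 'CH-0234'),
                           ('Bolt',  NULL, NULL,     'BL-0001'),
                           ('Frame', 'H',  'Black',  'FR-0001');
""")

# COALESCE scans its arguments left to right and returns the first non-NULL one.
rows = conn.execute("""
SELECT Name, COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Product
ORDER BY rowid
""").fetchall()

print(rows)
```

Each row yields a single value, which is why COALESCE works per column and cannot pick a whole row from one table or another.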

How to return all names that appear multiple times in table [duplicate]

Suppose I have the following schema:
student(name, siblings)
The table holds names and siblings. Note that a name appears in as many rows as that individual has siblings. For instance, a table could be as follows:
Jack, Lucy
Jack, Tim
Meaning that Jack has Lucy and Tim as his siblings.
I want to identify an SQL query that reports the names of all students who have 2 or more siblings. My attempt is the following:
select name
from student
where count(name) >= 1;
I'm not sure I'm using count correctly in this SQL query. Can someone please help with identifying the correct SQL query for this?
You're almost there:
select name
from student
group by name
having count(*) > 1;
HAVING is a WHERE clause that runs after grouping is done. In it you can use things that grouping makes available (like counts and other aggregates). By grouping on the name and filtering on the count (> 1 if you want two or more; >= 1 would also include names that appear only once), you get the names you want.
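A quick way to check the GROUP BY / HAVING behaviour is to run it on a small table. The sketch below uses SQLite through Python and made-up sample rows (jack, jane, john); the count is included in the output only to make the filter visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, siblings TEXT);
INSERT INTO student VALUES ('jack', 'lucy'), ('jack', 'tim'),
                           ('jane', 'bill'), ('jane', 'fred'), ('jane', 'tom'),
                           ('john', 'dave');
""")

# HAVING filters the groups after they are formed; WHERE cannot see count(*).
rows = conn.execute("""
SELECT name, count(*) AS n
FROM student
GROUP BY name
HAVING count(*) > 1
ORDER BY name
""").fetchall()

print(rows)
```

john has only one row, so his group is dropped by the HAVING filter.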
This will just deliver "Jack" as a single result (in the example data from the question). If you then want all the detail, like who Jack's siblings are, you can join your grouped, filtered list of names back to the table:
select *
from
student
INNER JOIN
(
select name
from student
group by name
having count(*) > 1
) morethanone ON morethanone.name = student.name
You can't avoid doing this "joining back" because the grouping has thrown the detail away in order to create the group. The only way to get the detail back is to take the name list the group gave you and use it to filter the original detail data again.
Full disclosure; it's a bit of a lie to say "can't avoid doing this": SQL Server supports something called a window function, which will effectively perform a grouping in the background and join it back to the detail. Such a query would look like:
select student.*, count(*) over(partition by name) n
from student
And for a table like this:
jack, lucy
jack, tim
jane, bill
jane, fred
jane, tom
john, dave
It would produce:
jack, lucy, 2
jack, tim, 2
jane, bill, 3
jane, fred, 3
jane, tom, 3
john, dave, 1
The rows with jack show 2 because there are two jack rows; likewise there are three jane rows and one john row. You could then wrap all that in a subquery and filter for n > 1, which would remove john:
select *
from
(
select student.*, count(*) over(partition by name) n
from student
) x
where x.n > 1
If SQL Server didn't have window functions, it would look more like:
select *
from
student
INNER JOIN
(
select name, count(*) as n
from student
group by name
) x ON x.name = student.name
The COUNT(*) OVER(PARTITION BY name) is like a mini "group by name and return the count, then auto join back to the main detail using the name as key", i.e. a short form of the latter query.
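The window-function version can be tried out the same way. This sketch assumes an SQLite build with window-function support (3.25 or later, which recent Python distributions normally bundle) and reuses the sample rows shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, siblings TEXT);
INSERT INTO student VALUES ('jack', 'lucy'), ('jack', 'tim'),
                           ('jane', 'bill'), ('jane', 'fred'), ('jane', 'tom'),
                           ('john', 'dave');
""")

# The inner query attaches the per-name row count to every detail row;
# the outer query keeps only names that occur more than once.
rows = conn.execute("""
SELECT name, siblings, n
FROM (
    SELECT name, siblings, count(*) OVER (PARTITION BY name) AS n
    FROM student
) x
WHERE n > 1
ORDER BY name, siblings
""").fetchall()

for row in rows:
    print(row)
```

The detail rows survive because the window function never collapses them into groups; it only annotates each row with its group's count.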
You can do:
select name
from student as s1
where exists (
select 1
from student as s2
where s1.name = s2.name and s1.siblings != s2.siblings
)
I think the best approach is the one 'Caius Jard' mentioned. However, here is an additional way if you also want to see how many siblings each name has:
SELECT name, COUNT(*) AS Occurrences
FROM student
GROUP BY name
HAVING (COUNT(*) > 1)
I wanted to share another solution I came up with:
select distinct s1.name
from student s1, student s2
where s1.name = s2.name and s1.siblings != s2.siblings;

Select all related records

I have a table (in SQL Server) that stores records as shown below. The purpose for Old_Id is for change tracking.
Meaning that when I want to update a record, the original record has to stay unchanged, and a new record has to be inserted with a new Id, the updated values, and the modified record's Id in the Old_Id column.
Id Name Old_Id
---------------------
1 Paul null
2 Paul 1
3 Jim null
4 Paul 2
5 Tim null
My question is:
When I search for id = 1 or 2 or 4, I want to select all related records.
In this case I want to see the records with the following ids: 1, 2, 4
How can it be written in a stored procedure?
Even if it's bad practice to go with this, I can't change this logic because it's a legacy database and it's quite a large one.
Can anyone help with this?
You can do that with a recursive common table expression (CTE):
WITH cte_history AS (
SELECT
h.id,
h.name,
h.old_id
FROM
history h
WHERE old_id IS NULL
and id in (1,2,4)
UNION ALL
SELECT
e.id,
e.name,
e.old_id
FROM
history e
INNER JOIN cte_history o
ON o.id = e.old_id
)
SELECT * FROM cte_history;
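Note that the anchor above only picks up a chain when its root Id (the row with Old_Id null) is itself in the search list. If you need to start from any Id in the chain, one possible sketch is to walk up to the root first and then back down. The version below runs on SQLite through Python, so it uses WITH RECURSIVE (SQL Server writes plain WITH); the table name history follows the answer and is an assumption, and related_ids is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (id INTEGER PRIMARY KEY, name TEXT, old_id INTEGER);
INSERT INTO history VALUES (1, 'Paul', NULL), (2, 'Paul', 1), (3, 'Jim', NULL),
                           (4, 'Paul', 2), (5, 'Tim', NULL);
""")

SQL = """
WITH RECURSIVE
up(id, old_id) AS (
    -- Start at the requested row and follow old_id towards the root.
    SELECT id, old_id FROM history WHERE id = :start
    UNION ALL
    SELECT h.id, h.old_id FROM history h JOIN up ON h.id = up.old_id
),
chain(id, name, old_id) AS (
    -- The root is the row reached by "up" that has no predecessor.
    SELECT h.id, h.name, h.old_id
    FROM history h JOIN up ON h.id = up.id
    WHERE up.old_id IS NULL
    UNION ALL
    -- From the root, follow every row that points back at the chain.
    SELECT h.id, h.name, h.old_id FROM history h JOIN chain c ON h.old_id = c.id
)
SELECT id FROM chain ORDER BY id
"""

def related_ids(start):
    """Return all ids in the same change chain as `start` (hypothetical helper)."""
    return [row[0] for row in conn.execute(SQL, {"start": start})]

print(related_ids(2))
print(related_ids(3))
```

In a stored procedure the :start parameter would become the procedure's input parameter; the two-step shape of the query stays the same.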

Sum of two columns alongside separate select statement

I am working on an SQL task and I cannot figure out how to get the sum of two columns from the same table while displaying information from another table.
I have tried multiple things and have spent probably about two hours trying to figure this out.
I have two tables: Employees and Fuel. I displayed all of the employees' information. First SQL statement I had to make:
SELECT firstname, lastname, title, registrationyear, make, model FROM Employees ORDER BY make;
My Employees table has the following columns: firstname, lastname, employeeid, make, model, registrationyear, title
My Fuel table has the following columns: currentprice, fueltype, fuelcost, mileage, mileagecount, fuelamount, employeeid, date
My instructions state: "A list that shows what cars the employees currently use (first SQL statement I made, so this one is DONE!)
Like the above report but also the total amount of kilometers that the employees have driven and the total fuel cost." (this is the task that I am trying to make a statement for)
I have tried using LIKE, UNION, UNION ALL, etc. and the best that I have been able to do is listing the employee information and the totals ON TOP of the information instead of in two separate columns of their own alongside the other data in the query.
I am really stuck here. Could anyone please help me?
This second task is much more complex than the first one.
First of all, combining in a single row the columns from two or more tables is what join is for, so you will have to join the two tables based on employeeid. This will return you a table like this:
employeeid | other emp fields | fuel date | other fuel fields
1 | ... | 01/01/2017 | ...
1 | ... | 01/02/2017 | ...
2 | ... | 01/01/2017 | ...
2 | ... | 02/01/2017 | ...
2 | ... | 04/03/2017 | ...
From here, you want the data from each employee combined with the sum of the rows from fuel related to that employee, and that's what group by is for.
When using group by you define a set of columns that defines the grouping criteria; everything else in your select statement will have to be grouped somehow (in your case with a sum), so that the columns in the group by stay unique.
Your final query would look like this:
select t1.firstname, t1.lastname, t1.title, t1.registrationyear, t1.make, t1.model,
sum(t2.mileage) as total_mileage,
sum(t2.fuelcost * t2.fuelamount) as total_fuel_cost
from Employees t1
join Fuel t2
on t1.employeeid = t2.employeeid
group by t1.firstname, t1.lastname, t1.title, t1.registrationyear, t1.make, t1.model
Note: I don't know the difference between mileage and mileagecount, so the part of my query involving those fields may need some tweaking.
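To see the join plus group by in action, here is a cut-down sketch on SQLite via Python. The rows are invented, only a few of the columns are included, and summing mileage and fuelcost directly is an assumption about what those columns mean. Grouping by employeeid as well keeps two employees with identical names and cars apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employeeid INTEGER PRIMARY KEY, firstname TEXT, make TEXT);
CREATE TABLE Fuel (employeeid INTEGER, mileage INTEGER, fuelcost REAL);
-- Hypothetical sample data: two fill-ups for Ann, one for Bo.
INSERT INTO Employees VALUES (1, 'Ann', 'Volvo'), (2, 'Bo', 'Saab');
INSERT INTO Fuel VALUES (1, 100, 20.0), (1, 50, 10.0), (2, 30, 5.0);
""")

# The join repeats each employee once per fuel row; GROUP BY folds those
# repeats back into one row per employee while SUM adds up the fuel columns.
rows = conn.execute("""
SELECT e.firstname, e.make,
       SUM(f.mileage)  AS total_mileage,
       SUM(f.fuelcost) AS total_fuel_cost
FROM Employees e
JOIN Fuel f ON f.employeeid = e.employeeid
GROUP BY e.employeeid, e.firstname, e.make
ORDER BY e.employeeid
""").fetchall()

print(rows)
```

An employee with no Fuel rows would disappear from this result; switching to LEFT JOIN would keep them, with NULL totals.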
You can use Inner join & Group By clause as mentioned below. Let me know if you mean something else.
SELECT A.firstname, A.lastname, A.title, A.registrationyear, A.make, A.model,
SUM(B.Column_Having_Kilometer_Driven_Value)
FROM
Employees A
INNER JOIN Fuel B ON A.EmployeeID = B.EmployeeID
Group By A.EmployeeID, A.firstname, A.lastname, A.title, A.registrationyear, A.make, A.model

write a query to identify discrepancy

I have a table with Student IDs and Student Names. There have been issues with assigning unique Student IDs to students, and hence I want to find the duplicates.
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display the count of duplicate records.
For example, "Duplicate_Count" would display "3" for Student ID = 1 and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId; this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold dupcount, this query can be used in an update statement to update that column in the table.
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by @HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.
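For anyone who does want the stored column, the correlated-subquery update can be tried on a throwaway SQLite database from Python. The table name mytable and the columns id, name and duplicate_count are assumptions carried over from the answer, and the rows are a subset of the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, name TEXT, duplicate_count INTEGER);
INSERT INTO mytable (id, name) VALUES (1, 'Jack'), (1, 'John'), (1, 'Bill'),
                                      (2, 'Amanda'), (2, 'Molly'), (3, 'Ron');
""")

# For each row, count how many rows in the same table share its id.
conn.execute("""
UPDATE mytable
SET duplicate_count = (SELECT count(*) FROM mytable t WHERE t.id = mytable.id)
""")

rows = conn.execute(
    "SELECT id, name, duplicate_count FROM mytable ORDER BY rowid"
).fetchall()

for row in rows:
    print(row)
```

The stored count goes stale as soon as rows are inserted or deleted, which is one reason computing it in a query is usually preferred.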

Select a subgroup of records by one distinct column

Sorry if this has been answered before, but all the related questions didn't quite seem to match my purpose.
I have a table that looks like the following:
ID POSS_PHONE CELL_FLAG
=======================
1 111-111-1111 0
2 222-222-2222 0
2 333-333-3333 1
3 444-444-4444 1
I want to select only one row per distinct ID value for an insert, but I don't care which specific row gets pulled out of the duplicates.
For example, a valid SELECT result would be:
1 111-111-1111 0
2 222-222-2222 0
3 444-444-4444 1
Before I had the CELL_FLAG column, I was just using an aggregate function like so:
SELECT ID, MAX(POSS_PHONE)
FROM TableA
GROUP BY ID
But I can't do:
SELECT ID, MAX(POSS_PHONE), MAX(CELL_FLAG)...
because I would lose integrity within the row, correct?
I've seen some similar examples using CTEs, but once again, nothing that quite fit.
So maybe this is solvable by a CTE or some type of self-join subquery? I'm at a block right now, so I can't see any other solutions.
Just get your aggregation in a subquery and join to it:
SELECT a.ID, sub.Poss_Phone, CELL_FLAG
FROM TableA as a
INNER JOIN (SELECT ID, MAX(POSS_PHONE) as [Poss_Phone]
FROM TableA
GROUP BY ID) Sub
ON Sub.ID = a.ID and SUB.Poss_Phone = A.Poss_Phone
This will keep integrity between your non-aggregated fields but still give you the MAX(Poss_Phone) per ID.
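Here is a small sketch of that subquery join on the question's sample rows, run on SQLite through Python. It picks the row with the largest POSS_PHONE per ID, so for ID 2 it returns the 333 number rather than the 222 one; the question says either is acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, POSS_PHONE TEXT, CELL_FLAG INTEGER);
INSERT INTO TableA VALUES (1, '111-111-1111', 0), (2, '222-222-2222', 0),
                          (2, '333-333-3333', 1), (3, '444-444-4444', 1);
""")

# The subquery chooses one phone per ID; joining on both ID and phone pulls
# CELL_FLAG from that same row instead of aggregating it independently.
rows = conn.execute("""
SELECT a.ID, a.POSS_PHONE, a.CELL_FLAG
FROM TableA a
JOIN (SELECT ID, MAX(POSS_PHONE) AS POSS_PHONE
      FROM TableA
      GROUP BY ID) sub
  ON sub.ID = a.ID AND sub.POSS_PHONE = a.POSS_PHONE
ORDER BY a.ID
""").fetchall()

for row in rows:
    print(row)
```

If the same ID could list the same phone number twice, this join would still return both rows, so a DISTINCT or a unique constraint would be needed in that case.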