How the following SQL query is working? - sql

How is the given query correct as it is using T1 inside the with clause and T1 is declared after the clause within WITH is completed.
WITH T1(Emp,Manager,Salary) AS
(
SELECT tt2.[Emp],tt2.[Manager],tt2.[Salary]
FROM [YourTable] AS tt1
RIGHT OUTER JOIN [YourTable] AS tt2 ON tt1.[Emp]=tt2.[Manager]
WHERE tt1.[Emp] is NULL
UNION ALL
SELECT r.[Emp],T1.[Manager],r.[Salary]
FROM [YourTable] AS r
INNER JOIN T1 ON r.[Manager]=T1.[Emp]
)
SELECT [Manager],SUM([Salary]) AS Salary
FROM T1
GROUP BY [Manager]
ORDER BY SUM([Salary]) DESC
Above query is answer to following question -
I have table with columns (Employee, Manager, Salary). Need to calculate aggregate salary for all employees corresponding to top-level managers in one SQL. For example
Input table is :
Emp Manager Salary
A T 10
B A 11
C F 13
D B 5
Result should be :
Top-Lvl Manager Salary(agg)
T 26
F 13
Manager-Employee layering can go multiple levels.

This is a recursive query. The part before UNION ALL gets the base records. The part after it recursively gets more rows attatched to the former.
The first part is confusingly written. It is an anti-join pattern inplemented even with a right outer join which many consider hard to read. It merely means this:
select emp, manager, salary
from yourtable
where manager not in (select emp from yourtable);
So you get all employees that have no manager (i.e. the super managers).
With the part after UNION ALL you get their subordinates and the subordinates of these etc. A hierarchical query.
At last in
SELECT [Manager],SUM([Salary]) AS Salary
FROM T1
GROUP BY [Manager]
ORDER BY SUM([Salary]) DESC
you use those rows to get a cumulated salary per manager.
You can read up on recursive queries in SQL Server here: https://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx.

EDIT - Less over-kill
Declare #YourTable table (Emp varchar(25),Manager varchar(25),Salary int)
Insert into #YourTable values
('A','T',10),
('B','A',11),
('C','F',13),
('D','B',5)
;with cteP as (
Select Seq = cast(1000+Row_Number() over (Order by Emp) as varchar(500))
,Emp=Manager
,Manager=cast(null as varchar(25))
,Lvl=1
,Salary = 0
From #YourTable
Where Manager Not In (Select Distinct Emp From #YourTable)
Union All
Select Seq = cast(concat(p.Seq,'.',1000+Row_Number() over (Order by r.Emp)) as varchar(500))
,r.Emp
,r.Manager
,p.Lvl+1
,r.Salary
From #YourTable r
Join cteP p on r.Manager = p.Emp)
Select TopLvl = A.Emp
,Salary = sum(B.Salary)
from cteP A
Join cteP B on (B.Seq Like A.Seq+'%')
Where A.Lvl=1
Group By A.Emp
Returns
TopLvl Salary
F 13
T 26

Inside with T1 references to INNER JOIN T1 ON r.[Manager]=T1.[Emp] which is probably table in your DB. Outside with T1 references to result of with statement.

Related

SQL inner join on a column based on max value from another column [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 6 years ago.
I have two tables, one "master" is a master list of names and the second "scenario" is a list of multiple scenarios for each name from the master list. I want my INNER JOIN query to fetch the master list of ID with the column status from "scenario" table but only the most recent status based on scenarioID. Here's the code that I've tried and tables with desired output
SELECT DISTINCT a.[user], a.ID, a.Name, b.status
from master a
INNER JOIN scenario b ON a.ID = b.ID
WHERE
b.scenarioID = (
SELECT max(scenarioID) FROM scenario c2 WHERE c2.ID=c.ID)
Master
ID user Name
425 John Skyline
426 John Violin
427 Joe Pura
Scenario
ID ScenarioID status
425 1 active
425 2 active
425 3 done
426 1 active
426 2 active
427 1 done
Desired output
ID user Name status
425 John Skyline done
426 John Violin active
427 Joe Pura done
You can do this with a CROSS APPLY looking up the most recent for each value:
Select M.ID, M.[User], M.Name, X.Status
From [Master] M
Cross Apply
(
Select Top 1 S.Status
From Scenario S
Where S.ID = M.ID
Order By S.ScenarioID Desc
) X
Another way you could do it is with a ROW_NUMBER() PARTITIONED on the ID and ORDERED by the ScenarioID DESC:
;With OrderedStatuses As
(
Select M.Id, M.[User], M.Name, S.Status,
Row_Number() Over (Partition By S.Id Order By S.ScenarioID Desc) RN
From [Master] M
Join Scenario S On S.Id = M.Id
)
Select Id, [User], Name, Status
From OrderedStatuses
Where RN = 1
Here's a slightly different formulation that uses a CTE, which I generally find easier to read than a subquery (though of course, your mileage may vary).
declare #Master table
(
ID bigint,
[user] varchar(16),
Name varchar(16)
);
declare #Scenario table
(
ID bigint,
ScenarioID bigint,
[status] varchar(16)
);
insert #Master values
(425, 'John', 'Skyline'),
(426, 'John', 'Violin'),
(427, 'Joe', 'Pura');
insert #Scenario values
(425, 1, 'active'),
(425, 2, 'active'),
(425, 3, 'done'),
(426, 1, 'active'),
(426, 2, 'active'),
(427, 1, 'done');
with ReversedScenarioCTE as
(
select
ID,
[status],
rowNumber = row_number() over (partition by ID order by ScenarioID desc)
from
#Scenario
)
select
M.ID,
M.[user],
M.Name,
S.[status]
from
#Master M
inner join ReversedScenarioCTE S on
M.ID = S.ID and
S.rowNumber = 1;
If you have SQL Server 2008 or later you can use the ROW_NUMBER() function to achieve what you want. It will avoid querying the same table twice or performing joins.
SELECT *
FROM (
SELECT a.[user]
,a.ID
,a.Name
,b.status
,ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.scenarioID DESC) AS VersionRank
from [master] a INNER JOIN scenario b ON a.ID = b.ID
) Result
WHERE Result.VersionRank = 1

Limit the data set of a single table within a multi-table sql select statement

I'm working in an Oracle environment.
In a 1:M table relationship I want to write a query that will bring me each row from the "1" table and only 1 matching row from the "many" table.
To give a made up example... ( * = Primary Key/Foreign Key )
EMPLOYEE
*emp_id
name
department
PHONE_NUMBER
*emp_id
num
There are many phone numbers for one employee.
Let's say I wanted to return all employees and only one of their phone numbers. (Please forgive the far-fetched example. I'm trying to simulate a workplace scenario)
I tried to run:
SELECT emp.*, phone.num
FROM EMPLOYEE emp
JOIN PHONE_NUMBER phone
ON emp.emp_id = phone.emp_id
WHERE phone.ROWNUM <= 1;
It turns out (and it makes sense to me now) that ROWNUM only exists within the context of the results returned from the entire query. There is not a "ROWNUM" for each table's data set.
I also tried:
SELECT emp.*, phone.num
FROM EMPLOYEE emp
JOIN PHONE_NUMBER phone
ON emp.emp_id = phone.emp_id
WHERE phone.num = (SELECT MAX(num)
FROM PHONE_NUMBER);
That one just returned me one row total. I wanted the inner SELECT to run once for each row in EMPLOYEE.
I'm not sure how else to think about this. I basically want my result set to be the number of rows in the EMPLOYEE table and for each row the first matching row in the PHONE_NUMBER table.
Obviously there are all sorts of ways to do this with procedures and scripts and such but I feel like there is a single-query solution in there somewhere...
Any ideas?
I'd use a rank (or dense_rank or row_number depending on how you want to handle ties)
SELECT *
FROM (SELECT emp.*,
phone.num,
rank() over (partition by emp.emp_id
order by phone.num) rnk
FROM EMPLOYEE emp
JOIN PHONE_NUMBER phone
ON emp.emp_id = phone.emp_id)
WHERE rnk = 1
will rank the rows in phone for each emp_id by num and return the top row. If there could be two rows for the same emp_id with the same num, rank would assign both a rnk of 1 so you'd get duplicate rows. You could add additional conditions to the order by to break the tie. Or you could use row_number rather than rank to arbitrarily break the tie.
All above answers will work beautifully with the scenario you described.
But if you have some employees which are missing in phone tables, then you need to do a left outer join like below. (I faced similar scenario where I needed isolated parents also)
EMP
---------
emp_id Name
---------
1 AA
2 BB
3 CC
PHONE
----------
emp_id no
1 7555
1 7777
2 5555
select emp.emp_id,ph.no from emp left outer join
(
select emp_id,no,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY emp_id) as rnum
FROM phone) ph
on emp.emp_id = ph.emp_id
where ph.rnum = 1 or ph.rnum is null
Result
EMP_ID NO
1 7555
2 5555
3 (null)
If you want only one phone number, then use row_number():
SELECT e.*, p.num
FROM EMPLOYEE emp JOIN
(SELECT p.*,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY emp_id) as seqnum
FROM PHONE_NUMBER p
) p
ON e.emp_id = p.emp_id and seqnum = 1;
Alternatively, you can use aggregation, to get the minimum or maximum value.
This is my solution. Simple but maybe wont scale well for lot of columns.
Sql Fiddle Demo
select e.emp_id, e.name, e.dep, min(p.phone_num)
from
EMPLOYEE e inner join
PHONE_NUMBER p on e.emp_id = p.emp_id
group by e.emp_id, e.name, e.dep
order by e.emp_id;
And this fix the query you try
Sql Fiddle 2
SELECT emp.*, phone.num
FROM EMPLOYEE emp
JOIN PHONE_NUMBER phone
ON emp.emp_id = phone.emp_id
WHERE phone.num = (SELECT MAX(num)
FROM PHONE_NUMBER p
WHERE p.emp_id = emp.emp_id );

sql get max of multiple results

In SQL server 2008 R2, how do you get the MAX result if you have two rows that have the same value? I have a table that I use to store the data about the number of times a client is seen at a location.
create table #templocation
(
ClientID Int,
Location Varchar(100),
LocationCount Int,
MaxCount Int
)
I only fill the first three columns with an insert statement then I get the max for each client like this:
update t
set t.MaxCount = t2.locationcount
from #templocation t
join (select tl.ClientID, MAX(tl.LocationCount) as 'locationcount'
from #templocation tl
group by tl.ClientID) t2
on t2.ClientID = t.ClientID
update t
set t.PrimaryLocation = tl.Location
from #temp t
join #templocation tl
on t.ClientId = tl.ClientID
where tl.LocationCount = tl.MaxCount
I then join this to the main table to get the final results. My problem is if a client is has two or more max counts. Then it wants to display all of the max locations like this:
ClientId Location LocationCount
12502 Main St. 4
12502 Lake Ave 4
12502 Tracy Rd 2
The results I get are:
ClientId ClientName Location
12502 John Smith Main St.
12502 John Smith Lake Ave
I only want the first or top one displayed though preferably by alphabetical order.
I do things like this through a couple of CTEs. One will give us the max(locationcount) for each client, and one just adds a row_number() for every clientid so we can pick the "first" location".
SQL Fiddle
It's a bit of a hack, but it works.
;with maxrows as
(select
ClientID as ClientID,
max(LocationCount) as MaxValue
--row_number() OVER (partition by ClientID order by ClientID)
from
table1
group by ClientID)
,
rowcounts as
(
select
t1.ClientID,
t1.Location,
row_number() over (PARTITION BY t1.ClientID order by t1.Location) as RowNum
from
table1 t1
)
select
t2.ClientID,
t2.Location
from
maxrows t1
inner join rowcounts t2
on t1.ClientID = t2.ClientID
and t2.rownum = 1
From how I understood the issue, you could use MAX() OVER () to achieve what you want like this:
WITH maxvalues AS (
SELECT
ClientId,
Location,
LocationCount,
MaxCount = MAX(LocationCount) OVER (PARTITION BY ClientId)
FROM atable
)
SELECT
ClientId,
Location
FROM maxvalues
WHERE LocationCount = MaxCount
;
Join maxvalues to whatever table holds the client names to include those into the output too.
Learn more about the OVER clause in this manual:
OVER clause (Transact-SQL)

SQL Loop/Crawler

I am trying to figure out some ways to accomplish this script. I import an excel sheet and then I need to populate 5 different tables based on this excel sheet. However for this example I just need help with the initial loop then I think I can work through the rest.
select distinct Department from IPACS_New_MasterList
where Department is not null
This provides me a list of 7 different departments.
Dep1, Dep2, Dep3, Dep4, Dep5, Dep6, Dep7
For each of these departments I need to perform some code.
Step #1:
Insert the department into table_one
I then need to keep the SCOPE_IDENTITY() for the rest of the code.
Step #2
perform the second loop (inserting all functions in that department into table2.
I'm not sure how to really do a foreach row in this select statement loop, or if I need to do something completely different. I've looked at several answers but can't seem to find exactly what I'm looking for.
Sample Data:
Source Table
Dep1, func1, process1, procedure1
dep1, func1, process1, procedure2
dep1, func1, process2, procedure3
dep1, func1, process2, procedure4
dep1, func1, process2, procedure5
dep1, func2, process3, procedure6
dep2, func3, process4, procedure7
My Tables:
My first table is a list of every department from the above query. With a key on the departmentID. Each department can have many functions.
My second table is a list of all functions with a key on functionID and a foreign key on departmentID. Each function must have 1 department and can have many processes
My third table is a list of all processes with a key on processID and a foreign key on functionID. Each process must have 1 function and can have many procedures.
There are two approaches you can use without a loop.
1) If you have candidate keys in your source (department name) just join your source table back to the table you inserted
e.g.
INSERT INTO Department
(Name)
SELECT DISTINCT Dep1
FROM SOURCE;
INSERT INTO Functions
(
Name,
DepartmentID)
SELECT DISTINCT
s.Func1,
d.DepartmentID
FROM
source s
INNER JOIN Department d
on s.dep1 = d.name;
INSERT INTO
processes
(
name,
FunctionID,
[Procedure]
)
SELECT
s.process1,
f.FunctionID,
s.procedure1
FROM
source s
INNER JOIN Department d
on s.dep1 = d.name
INNER JOIN Functions f
on d.DepartmentID = f.departmentID
and s.func1 = f.name;
SQL Fiddle
2) If you don't have candidate keys in your source then you can use the output clause
For example here if a department weren't guaranteed to be unique this would correctly find only the newly add
DECLARE #Department TABLE
(
DepartmentID INT
)
DECLARE #Functions TABLE
(
FunctionID INT
)
INSERT INTO Department
(Name)
OUTPUT INSERTED.DepartmentID INTO #Department
SELECT DISTINCT Dep1
FROM SOURCE
INSERT INTO Functions
(
Name,
DepartmentID)
OUTPUT INSERTED.FunctionID INTO #FunctionID
SELECT DISTINCT
s.Func1,
d.DepartmentID
FROM
source s
INNER JOIN Department d
on s.dep1 = d.name
INNER JOIN #Department d2
ON d.departmentID = d2.departmentID;
INSERT INTO
processes
(
name,
FunctionID,
[Procedure]
)
SELECT
s.process1,
f.FunctionID,
s.procedure1
FROM
source s
INNER JOIN Department d
on s.dep1 = d.name
INNER JOIN Functions f
on d.DepartmentID = f.departmentID
and s.func1 = f.name
INNER JOIN #Functions f2
ON f.Functions = f2.Functions
SELECT * FROM Department;
SELECT * FROm Functions;
SELECT * FROM processes;
SQL Fiddle
If I am understanding what you are trying to do... yes you can use a loop. Its not really talked about and I bet I am going to get some feedback from other SQL developers that its not a best practice. But if you really need to do a loop
DECLARE #rowcount as int
DECLARE #numberOfRows as int
SET #rowcount = 0
SET #numberOfRows = SELECT COUNT(*) from tablename --put in anything to get the number of times to loop.
WHILE #numberOfRows <= #rowcount
BEGIN
--Put whatever process you need to repeat here
SET #rowcount = #rowcount + 1
END
Assuming you have tables set up with an IDENTITY field set for the Primary Key, you can populate each successive table's foreign key by joining to the previous table and the source table, something like:
INSERT INTO Table1
SELECT DISTINCT Department
FROM SourceTable
GO
INSERT INTO Table2
SELECT DISTINCT b.Deptartment_ID, a.Function
FROM SourceTable a
JOIN Table1 b
ON a.Department = b.Department
GO
INSERT INTO Table3
SELECT DISTINCT b.Function_ID, a.Process
FROM SourceTable a
JOIN Table2 b
ON a.Function = b.Function
GO
INSERT INTO Table4
SELECT DISTINCT b.Process_ID, a.Procedure
FROM SourceTable a
JOIN Table3 b
ON a.Process = b.Process
GO

how can i rewrite a select query in this situation

Here are two table in parent/child relationship.
What i need to do is to select students with there average mark:
CREATE TABLE dbo.Students(
Id int NOT NULL,
Name varchar(15) NOT NULL,
CONSTRAINT PK_Students PRIMARY KEY CLUSTERED
(
CREATE TABLE [dbo].[Results](
Id int NOT NULL,
Subject varchar(15) NOT NULL,
Mark int NOT NULL
)
ALTER TABLE [dbo].[Results] WITH CHECK ADD CONSTRAINT [FK_Results_Students] FOREIGN KEY([Id])
REFERENCES [dbo].[Students] ([Id])
I wrote a query like this :
SELECT name , coalesce(avg(r.[mark]),0) as Avmark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name]
ORDER BY ISNULL(AVG(r.[mark]),0) DESC;
But the result is that all of students with there avg mark in desc order.What i need is to restrict result set with students that have the highest average mark agaist other,i.e.if the are two students with avg mark 50 and 1 with 25 i need to display only those students with 50.If there are only one student with highest avg mark- only he must appear in result set.How can i do this in best way?
SQL Server 2005+, using CTEs:
WITH grade_average AS (
SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id),
highest_average AS (
SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM grade_average ga)
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN grade_average ga ON ha.id = s.id
JOIN highest_average ha ON ha.highest_avg_mark = ga.avg_mark
Non-CTE equivalent:
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga ON ha.id = s.id
JOIN SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga) ha ON ha.highest_avg_mark = ga.avg_mark
If you're using a relatively new version of MS SQL server, you can use WITH to make this simple to write:
WITH T AS (
SELECT
name,
coalesce(avg(r.[mark]),0) as mark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name])
SELECT name as 'ФИО', mark as 'Средний бал'
FROM T
WHERE T.mark = (SELECT MAX(mark) from T)
Is it as simple as this? For all versions of SQL Server 2000+
SELECT TOP 1 WITH TIES
name, ISNULL(avg(r.[mark]),0) as AvMark
FROM
students s
LEFT JOIN
results r ON s.[id]=r.[id]
GROUP BY
s.[name]
ORDER BY
ISNULL(avg(r.[mark]),0) DESC;
SELECT name as 'ФИО',
coalesce(avg(r.[mark]),0) as 'Средний бал'
FROM students s
LEFT JOIN results r
ON s.[id]=r.[id]
GROUP BY s.[name]
HAVING AVG(r.[mark]) >= 50
ORDER BY ISNULL(AVG(r.[mark]),0) DESC
about HAVING clause