Self-joining SQL without aggregation

I have a table with the below structure:
ID  EmployeeType  Name
1   Contract      John, Baxter
2   Contract      Will, Smith
3   Full          Josh, Stevens
4   Full          Sitar, Zhang
All I need to do is pivot it so I get the output below:
Contract_Employee  FullTime_Employee
John, Baxter       Josh, Stevens
Will, Smith        Sitar, Zhang
Any idea how I can do this in one query?

That's kind of a funny request. Here's how I would do it:
(Basically, derive a fake key to "join" two derived tables on: one for contractors, one for full-time employees.)
CREATE TABLE #Table1
([ID] int, [EmployeeType] varchar(8), [Name] varchar(13))
;
INSERT INTO #Table1
([ID], [EmployeeType], [Name])
VALUES
(1, 'Contract', 'John, Baxter'),
(2, 'Contract', 'Will, Smith'),
(3, 'Full', 'Josh, Stevens'),
(4, 'Full', 'Sitar, Zhang'),
(5, 'Full','Bob, Bob'),
(6, 'Contract','Bob, Bob')
;
select
c.name as ContractEmployee,
f.name as FullTime_Employee
from
(
select
row_number() over (order by id) as RN,
name
from
#table1
where
employeetype = 'Contract'
) c
full join (
select
row_number() over (order by id) as RN,
name
from
#table1
where
employeetype = 'Full'
) f
on
c.name = f.name OR
c.rn = f.rn

One method of doing this is to use aggregation:
select max(case when employeetype = 'Contract' then Name end) as ContractEmployees,
max(case when employeetype = 'Full' then Name end) as FullEmployees
from (select t.*,
row_number() over (partition by employeetype order by id) as seqnum
from table t
) t
group by seqnum;
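The conditional-aggregation query above can be tried end to end. Below is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, and the table name employees is an assumption, since the answer uses the placeholder name table.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; "employees" is a
# hypothetical table name replacing the answer's placeholder "table".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, employeetype TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Contract", "John, Baxter"),
        (2, "Contract", "Will, Smith"),
        (3, "Full", "Josh, Stevens"),
        (4, "Full", "Sitar, Zhang"),
    ],
)

# Number the rows inside each employee type, then fold rows that share a
# sequence number into one output row with MAX(CASE ...).
pivot_sql = """
SELECT MAX(CASE WHEN employeetype = 'Contract' THEN name END) AS contract_employee,
       MAX(CASE WHEN employeetype = 'Full'     THEN name END) AS fulltime_employee
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY employeetype ORDER BY id) AS seqnum
      FROM employees AS e) AS numbered
GROUP BY seqnum
ORDER BY seqnum
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

MAX is used only to pick the single non-NULL name in each group, so no real aggregation of values takes place.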

Related

How to fetch records from two tables without common column using CTE

Users table details
userid values (abc,xyz,abc,sdf)
master table details
(mid,priority)values(101,1),(102,2),(101,1),(103,1)
I need the count of mid for each userid (userid holds the users' names), grouped by priority (an int), with the priority shown as a label, e.g. CASE WHEN priority = 1 THEN 'Open' WHEN priority = 2 THEN 'Closed', and so on, using a CTE (common table expression).
Select * from users
userid
abc
xyz
abc
sdf
Select * from master
mid Priority
101 1
102 2
101 1
103 1
(Priority 1= Open 2=Closed)
OUTPUT expected:
Userid count(mid) Priority
abc 2 Open
xyz 1 Closed
sdf 1 Open
Try this:
use db_test;
go
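if object_id('dbo.users') is not null drop table dbo.users;
if object_id('dbo.master') is not null drop table dbo.master;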
create table dbo.users
(
userid varchar(max) not null
)
;
insert into dbo.users
values
('abc'),
('xyz'),
('sdf')
;
create table dbo.master
(
mid int not null,
Priority int not null
)
;
insert into dbo.master
values
(101, 1),
(102, 2),
(101, 1),
(103, 1)
;
with cte1 as (
select userid, row_number() over(order by userid asc) as rn
from dbo.users
), cte2 as (
select mid, priority, dense_rank() over(order by mid asc) as rn
from dbo.master
)
select a.userid, count(*) as [count(mid)], b.priority
from cte1 a join cte2 b on a.rn = b.rn
group by a.userid, b.priority

How to select top 3 values from each group in a table with SQL which have duplicates [duplicate]

Assume we have a table with two columns: one contains the names of some people and the other contains numeric values related to each person. One person can have more than one value. We want to select the top 3 values for each person. If a person has fewer than 3 values, we select all of that person's values.
If the table has no duplicates, the query in the article "Select top 3 values from each group in a table with SQL" solves the problem. But what is the solution if there are duplicates?
For example, suppose John has 5 values: 20, 7, 7, 7, 4. I need to return the name/value pairs below, ordered by value descending for each name:
+------+-------+
| name | value |
+------+-------+
| John |    20 |
| John |     7 |
| John |     7 |
+------+-------+
Only 3 rows should be returned for John even though John has three 7s.
In many modern DBMSs (e.g. Postgres, Oracle, SQL Server, DB2 and many others), the following will work just fine. It uses a CTE and the ranking function ROW_NUMBER(), which has been part of the SQL standard since SQL:2003:
WITH cte AS
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
)
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
Without CTE, only ROW_NUMBER():
SELECT name, value, rn
FROM
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
Postgres
Oracle
SQL-Server
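To see why ROW_NUMBER() rather than RANK() handles the duplicates, here is a small sketch using Python's built-in sqlite3 module with John's values from the question. SQLite 3.25 or later is assumed as a stand-in engine, and the helper name top3 is hypothetical.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) stands in for the engines above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4)],
)


def top3(ranking_function):
    """Return rows whose per-name rank is at most 3 under the given function."""
    sql = f"""
        WITH cte AS (
            SELECT name, value,
                   {ranking_function}() OVER (PARTITION BY name
                                              ORDER BY value DESC) AS rn
            FROM t
        )
        SELECT name, value, rn FROM cte WHERE rn <= 3 ORDER BY name, rn
    """
    return conn.execute(sql).fetchall()


row_number_rows = top3("ROW_NUMBER")  # ties get distinct numbers: 3 rows
rank_rows = top3("RANK")              # all three 7s share rank 2: 4 rows
print(row_number_rows)
print(rank_rows)
```

Because ROW_NUMBER() numbers tied rows arbitrarily, which of the three 7s is dropped is not defined, but all of them carry the same name and value, so the output is the same either way.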
In MySQL before version 8.0 and other DBMSs that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.
The column tid is assumed to be the primary key of the table:
SELECT t.tid, t.name, t.value, -- self join and GROUP BY
COUNT(*) AS rn
FROM t
JOIN t AS t2
ON t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;
SELECT t.tid, t.name, t.value, rn
FROM
( SELECT t.tid, t.name, t.value,
( SELECT COUNT(*) -- inline, correlated subquery
FROM t AS t2
WHERE t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
) AS rn
FROM t
) AS t
WHERE rn <= 3
ORDER BY name, rn ;
Tested in MySQL
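The correlated-subquery variant needs no window functions, so it can be checked on almost any engine. The sketch below runs it in SQLite through Python's sqlite3 module. The tid values and the alias ranked are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tid INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "John", 20), (2, "John", 7), (3, "John", 7), (4, "John", 7), (5, "John", 4)],
)

# rn counts the rows of the same name that sort at or before the current row:
# those with a larger value, plus equal values with a smaller or equal tid.
# The tid comparison is what breaks ties between the duplicate 7s.
fallback_sql = """
SELECT tid, name, value, rn
FROM (SELECT t.tid, t.name, t.value,
             (SELECT COUNT(*)
              FROM t AS t2
              WHERE t2.name = t.name
                AND (t2.value > t.value
                     OR (t2.value = t.value AND t2.tid <= t.tid))) AS rn
      FROM t) AS ranked
WHERE rn <= 3
ORDER BY name, rn
"""
fallback_rows = conn.execute(fallback_sql).fetchall()
print(fallback_rows)
```

The subquery is evaluated per row, so on large tables an index on (name, value, tid) is likely to matter.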
I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.
Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3)
However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.
Here is an example of it working in SQL Server:
with t as (
select 1 as personid, 5 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 7 as val union all
select 1 as personid, 8 as val
)
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3);
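Note that COUNT(DISTINCT ...) ranks distinct values, so this approach returns the top 3 distinct values per person rather than the top 3 rows. A minimal sketch with the same sample data, run in SQLite through Python's sqlite3 module (the alias rnk is a hypothetical rename of rank, chosen to avoid clashing with the RANK function name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 5), (1, 6), (1, 6), (1, 7), (1, 8)],
)

# For each row, count the distinct values of the same person that are
# greater than or equal to the row's value (a dense rank).
dense_sql = """
SELECT DISTINCT personid, val, rnk
FROM (SELECT t.*,
             (SELECT COUNT(DISTINCT t2.val)
              FROM t AS t2
              WHERE t2.personid = t.personid
                AND t2.val >= t.val) AS rnk
      FROM t) AS ranked
WHERE rnk IN (1, 2, 3)
ORDER BY personid, rnk
"""
dense_rows = conn.execute(dense_sql).fetchall()
print(dense_rows)
```

The duplicate 6 collapses into a single output row, which is the behavior to expect from SELECT DISTINCT combined with a dense rank.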
Using GROUP_CONCAT and FIND_IN_SET you can do that:
SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
FROM tbl t1
WHERE t1.name = t.name
GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc
If your result set is not too large, you can write a stored procedure (or an anonymous PL/SQL block) that iterates over the result set and finds the biggest three values with a simple comparison algorithm.
Try this -
CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);
WITH cte
AS (
SELECT NAME
,value
,ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY (value) DESC
) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC
This works for MS SQL. It should be workable in any other SQL dialect that has the ability to assign row numbers in a GROUP BY or OVER clause (or equivalent).
if object_id('tempdb..#Data') is not null drop table #Data;
GO
create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO
with D as (
SELECT
name
,Value
,row_number() over (partition by name order by value desc) rn
From
#Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc
Name Value
Jack 5
Jane 30
Jane 21
Jane 18
John 20
John 7
John 7

SQL cross apply

I have a SQL table which contains audit information:
GroupId  AuditDate   ID   FirstName  LastName
1        01/06/2011  123  Michael    Jackson
1        01/09/2010  123  M          J
1        01/06/2009  123  Mike       J
and I am trying to show the differences between the audit records:
GroupId  AuditDate   ID   Attribute  From  To
1        01/06/2011  123  FirstName  M     Michael
1        01/06/2011  123  LastName   J     Jackson
1        01/09/2010  123  FirstName  Mike  M
1        01/06/2009  123  FirstName  NULL  Mike
1        01/06/2009  123  LastName   NULL  J
I am using the following SQL query:
WITH result AS (
    SELECT [Current].Id,
           [Current].GroupId,
           [Current].AuditDate,
           [Current].FirstName,
           [Current].LastName,
           Previous.FirstName AS PFirstName,
           Previous.LastName AS PLastName
    FROM
        (SELECT
             *, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY AuditDate ASC) AS RowNumber
         FROM
             AuditTable
         WHERE
             Id = @ID
        ) AS [Current]
    LEFT JOIN
        (SELECT
             *, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY AuditDate ASC) AS RowNumber
         FROM
             AuditTable
         WHERE
             Id = @ID
        ) AS [Previous]
    ON
        [Current].RowNumber = [Previous].RowNumber + 1
)
SELECT r.Id, r.GroupId, r.AuditDate,
       x.Attribute,
       x.[From],
       x.[To]
FROM result r
CROSS APPLY
(
    VALUES
        ('FirstName', r.FirstName, r.PFirstName),
        ('LastName', r.LastName, r.PLastName)
) x (Attribute, [To], [From])
WHERE
    ISNULL(x.[From],'') <> ISNULL(x.[To],'')
ORDER BY r.AuditDate ASC;
Is it possible to merge two select queries to improve the performance?
Try this query
WITH result AS (
SELECT Id,
GroupId,
AuditDate,
FirstName,
LastName,
ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY AuditDate ASC) AS RowNumber
FROM AuditTable
WHERE Id = @ID
)
SELECT r.Id,r.GroupId, r.AuditDate,
x.Attribute,
x.[From],
x.[To]
FROM result r LEFT JOIN result r2 ON r.RowNumber = r2.RowNumber + 1
CROSS APPLY (
VALUES ('FirstName', r.FirstName, r2.FirstName),
('LastName', r.LastName, r2.LastName)
) x (Attribute, [To], [From])
WHERE ISNULL(x.[From],'') <> ISNULL(x.[To],'')
ORDER BY r.AuditDate ASC;
You can eliminate both subqueries entirely by using lag():
WITH result AS (
SELECT Id,
GroupId,
AuditDate,
FirstName,
LastName,
lag(FirstName) over (PARTITION BY GroupId ORDER BY AuditDate ASC)
AS PFirstName,
lag(LastName) over (PARTITION BY GroupId ORDER BY AuditDate ASC)
AS PLastName
FROM AuditTable
WHERE Id = @ID
)
...
Update: However, lag() is only available in SQL Server 2012 and later, unfortunately. If you have an earlier version, you will need some sort of self join.
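As a rough illustration of the lag() idea, the sketch below runs it against the question's three audit rows using Python's built-in sqlite3 module. SQLite 3.25 or later is assumed as a stand-in for SQL Server 2012, the dates are written in ISO format so they sort as text, and the final unpivot is done in plain Python rather than with CROSS APPLY (VALUES ...), which SQLite does not have.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server 2012; dates are ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditTable "
    "(GroupId INTEGER, AuditDate TEXT, Id INTEGER, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO AuditTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2011-06-01", 123, "Michael", "Jackson"),
        (1, "2010-09-01", 123, "M", "J"),
        (1, "2009-06-01", 123, "Mike", "J"),
    ],
)

# LAG() fetches the previous row's value within the group, so no self join
# or row numbering is needed.
lag_sql = """
SELECT AuditDate,
       FirstName,
       LAG(FirstName) OVER (PARTITION BY GroupId ORDER BY AuditDate) AS PFirstName,
       LastName,
       LAG(LastName)  OVER (PARTITION BY GroupId ORDER BY AuditDate) AS PLastName
FROM AuditTable
WHERE Id = ?
ORDER BY AuditDate
"""
pairs = conn.execute(lag_sql, (123,)).fetchall()

# Plain Python replaces the CROSS APPLY step: emit one
# (date, attribute, from, to) row per attribute whose value changed.
changes = [
    (audit_date, attribute, previous, current)
    for audit_date, first, p_first, last, p_last in pairs
    for attribute, previous, current in (
        ("FirstName", p_first, first),
        ("LastName", p_last, last),
    )
    if (previous or "") != (current or "")
]
for change in changes:
    print(change)
```

The five resulting rows correspond to the five differences listed in the question, here in ascending date order.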
If you can't use lag(), you should at least be able to reduce your code from 3 queries to 2: include the row number in your first select statement, and left join one subquery to it rather than having two subqueries. I'm not sure whether this way or Chris Moutray's way would be faster.
WITH result AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY [Current].GroupId ORDER BY [Current].AuditDate ASC) AS RowNumber,
           [Current].Id,
           [Current].GroupId,
           [Current].AuditDate,
           [Current].FirstName,
           [Current].LastName,
           [Previous].FirstName AS PFirstName,
           [Previous].LastName AS PLastName
    FROM AuditTable AS [Current]
    LEFT JOIN
        (SELECT
             *, ROW_NUMBER() OVER(PARTITION BY GroupId ORDER BY AuditDate ASC) AS RowNumber
         FROM
             AuditTable
         WHERE
             Id = @ID
        ) AS [Previous]
    ON
        -- Caveat: [Current] is the base table here, so it has no RowNumber column,
        -- and a select-list alias cannot be referenced in the ON clause. The
        -- numbering for [Current] still has to come from a derived table or CTE.
        [Current].RowNumber = [Previous].RowNumber + 1
)
You can use LAG in SQL Server 2012. I've used UNION ALL here to unpivot columns into rows.
Depending on how you filter and what your group level is, add or modify the PARTITION BY.
DECLARE @foo TABLE (GroupId tinyint, AuditDate date, ID tinyint, FirstName varchar(100), LastName varchar(100));
INSERT @foo VALUES (1, '20110601', 123, 'Michael', 'Jackson'), (1, '20100901', 123, 'M', 'J'), (1, '20090601', 123, 'Mike', 'J');
SELECT
    X.GroupId, X.AuditDate, X.ID, X.Attribute, X.[From], X.[To]
FROM
    (
    SELECT
        F.GroupId, F.AuditDate, F.ID, 'FirstName' AS Attribute, LAG(F.FirstName) OVER (/*PARTITION BY GroupId, ID*/ ORDER BY AuditDate) AS [From], F.FirstName AS [To]
    FROM
        @foo F
    UNION ALL
    SELECT
        F.GroupId, F.AuditDate, F.ID, 'LastName' AS Attribute, LAG(F.LastName) OVER (/*PARTITION BY GroupId, ID*/ ORDER BY AuditDate) AS [From], F.LastName AS [To]
    FROM
        @foo F
    ) X
WHERE
    ISNULL(X.[From], '') <> ISNULL(X.[To], '')
ORDER BY
    X.AuditDate DESC, X.Attribute

retrieve the most recent record for each customer

I have this data:
ID NAME DATE
3 JOHN 2011-08-08
2 YOKO 2010-07-07
1 JOHN 2009-06-06
Code (for SQL Server 2005):
DECLARE @TESTABLE TABLE (id int, name char(4), date smalldatetime)
INSERT INTO @TESTABLE VALUES (3, 'JOHN', '2011-08-08')
INSERT INTO @TESTABLE VALUES (2, 'YOKO', '2010-07-07')
INSERT INTO @TESTABLE VALUES (1, 'JOHN', '2009-06-06')
I want to get, for each NAME, the ID that has the most recent DATE. Like this:
3 JOHN 2011-08-08
2 YOKO 2010-07-07
What is the most elegant way of accomplishing this?
;WITH x AS
(
SELECT ID, NAME, [DATE],
rn = ROW_NUMBER() OVER
(PARTITION BY NAME ORDER BY [DATE] DESC)
FROM @TESTABLE
)
SELECT ID, NAME, [DATE] FROM x WHERE rn = 1
ORDER BY [DATE] DESC;
Try to avoid reserved words (and vague column names) like [DATE]...
SELECT <fields>
FROM SourceTable st
INNER JOIN (SELECT name, MAX(Datefield) as Datefield
FROM SourceTable
GROUP BY name) x
ON x.Name = st.name
AND x.datefield = st.datefield
Below is a possible solution:
Select c.CustomerID, c.CustomerName, c.CustomerOrder, c.CustomerOrderDate, c.CustomerQty
from tblCustomer c
inner join (select c2.CustomerName, MAX(c2.CustomerOrderDate) as MaxDate from tblCustomer c2 group by c2.CustomerName) c2
on c.CustomerName = c2.CustomerName
where c.CustomerOrderDate = c2.MaxDate
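The ROW_NUMBER() and MAX()-join approaches above differ when two rows of the same name share the latest date. The sketch below shows this with Python's built-in sqlite3 module. SQLite 3.25 or later is assumed as a stand-in for SQL Server, the names testable and order_date are hypothetical, and the row with id 4 is invented to create a tie.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; the row with id 4 is
# hypothetical and shares YOKO's latest date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testable (id INTEGER, name TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO testable VALUES (?, ?, ?)",
    [
        (3, "JOHN", "2011-08-08"),
        (2, "YOKO", "2010-07-07"),
        (1, "JOHN", "2009-06-06"),
        (4, "YOKO", "2010-07-07"),
    ],
)

# ROW_NUMBER() keeps exactly one row per name; id DESC breaks the tie.
row_number_sql = """
WITH x AS (
    SELECT id, name, order_date,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY order_date DESC, id DESC) AS rn
    FROM testable
)
SELECT id, name, order_date FROM x WHERE rn = 1 ORDER BY name
"""

# Joining back on MAX(order_date) keeps every row that carries the latest date.
max_join_sql = """
SELECT st.id, st.name, st.order_date
FROM testable AS st
INNER JOIN (SELECT name, MAX(order_date) AS order_date
            FROM testable
            GROUP BY name) AS x
        ON x.name = st.name AND x.order_date = st.order_date
ORDER BY st.name, st.id
"""

row_number_rows = conn.execute(row_number_sql).fetchall()
max_join_rows = conn.execute(max_join_sql).fetchall()
print(row_number_rows)
print(max_join_rows)
```

Which behavior is wanted depends on the data: if dates are unique per name, both queries return the same rows.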

SQL Server Distinct Question

I need to be able to select only the first row for each name that has the greatest value.
I have a table with the following:
id name value
0 JOHN 123
1 STEVE 125
2 JOHN 127
3 JOHN 126
So I am looking to return:
id name value
1 STEVE 125
2 JOHN 127
Any idea on the MSSQL Syntax on how to perform this operation?
While you specified SQL Server, you did not specify the version. If you are using SQL Server 2005 or later, you can do something like:
With RankedItems As
(
Select id, name, value
, Row_Number() Over ( Partition By name Order By value Desc, id Asc ) As ItemRank
From Table
)
Select id, name, value
From RankedItems
Where ItemRank = 1
try:
SELECT
MIN(id) as id,dt.name,dt.value
FROM (SELECT
name,MAX(value) as value
FROM YourTable
GROUP BY name
) dt
INNER JOIN YourTable t ON dt.name=t.name and dt.value=t.value
GROUP BY dt.name,dt.value
try it out:
DECLARE @YourTable table (id int, name varchar(10), value int)
INSERT @YourTable VALUES (0, 'JOHN', 123)
INSERT @YourTable VALUES (1, 'STEVE', 125)
INSERT @YourTable VALUES (2, 'JOHN', 127)
INSERT @YourTable VALUES (3, 'JOHN', 126)
--extra data not in the question, shows why you need the outer group by
INSERT @YourTable VALUES (4, 'JOHN', 127)
INSERT @YourTable VALUES (5, 'JOHN', 127)
INSERT @YourTable VALUES (6, 'JOHN', 127)
INSERT @YourTable VALUES (7, 'JOHN', 127)
SELECT
    MIN(id) as id,dt.name,dt.value
    FROM (SELECT
              name,MAX(value) as value
              FROM @YourTable
              GROUP BY name
         ) dt
        INNER JOIN @YourTable t ON dt.name=t.name and dt.value=t.value
    GROUP BY dt.name,dt.value
    ORDER BY id
output:
id name value
----------- ---------- -----------
1 STEVE 125
2 JOHN 127
(2 row(s) affected)
You could do something like
SELECT id, name, value
FROM (SELECT id, name, value,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS r
      FROM table) AS x
WHERE x.r = 1 ;
This will not work in SQL Server 2000 and earlier, because ROW_NUMBER() was introduced in SQL Server 2005, but it should be fast in SQL Server 2005 and 2008.
How about:
SELECT a.id, a.name, b.maxvalue
FROM mytbl a
INNER JOIN (SELECT id, max(value) as maxvalue
FROM mytbl
GROUP BY id) b ON b.id = a.id
Grouping by id returns every row when each id is unique, so group by name and join on the maximum value instead:
SELECT a.id, a.name, a.value
FROM mytbl a
INNER JOIN (SELECT name, max(value) as maxvalue
FROM mytbl
GROUP BY name) b ON b.name = a.name and b.maxvalue = a.value