Remove duplicate rows as an additional column

I have a SQL table of student records, and some students appear in duplicate rows because they have more than one major, so now I have something like this:
ID Major
----------
1 CS
1 Mgt
What I want is to combine these two rows into this form:
ID Major Major2
----------
1 CS Mgt

You need a sequence number to pivot on. Then you can pivot using either PIVOT or conditional aggregation:
select id,
max(case when seqnum = 1 then major end) as major_1,
max(case when seqnum = 2 then major end) as major_2
from (select t.*,
row_number() over (partition by id order by (select null)) as seqnum
from t
) t
group by id;
Note: you should verify that two columns are enough to hold all of the majors. You can get the maximum number of majors per ID using:
select top 1 id, count(*)
from t
group by id
order by count(*) desc;
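To try the ROW_NUMBER plus conditional-aggregation query above without SQL Server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions); the table name t and the sample rows follow the question, and ordering the sequence number by major instead of (select null) is an assumption made so the output is deterministic.

```python
import sqlite3

# Same shape as the answer's query; "order by major" replaces
# "order by (select null)" so the numbering is deterministic.
PIVOT_SQL = """
select id,
       max(case when seqnum = 1 then major end) as major_1,
       max(case when seqnum = 2 then major end) as major_2
from (select t.*,
             row_number() over (partition by id order by major) as seqnum
      from t
     ) t
group by id
order by id
"""


def pivot_majors(rows):
    """Load (id, major) pairs into an in-memory table t and pivot them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (id integer, major text)")
        conn.executemany("insert into t (id, major) values (?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


print(pivot_majors([(1, "CS"), (1, "Mgt"), (2, "Math")]))
# [(1, 'CS', 'Mgt'), (2, 'Math', None)]
```

A third major for any ID would simply be dropped by this query, which is why the maximum-count check still matters.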

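If you have at most two different values of major:
select a.id as id,
a.major as major,
b.major as major2
from YOUR_TABLE a
left join YOUR_TABLE b on
a.id = b.id
and a.major < b.major
-- keep only the row that starts from each id's smallest major
where not exists (select 1 from YOUR_TABLE c
where c.id = a.id and c.major < a.major)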
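When there are at most two distinct majors per ID, a plain GROUP BY with MIN and MAX gives the same output shape without any join. This is an alternative technique, not the self-join answer itself; the sketch runs it through Python's sqlite3 module, and YOUR_TABLE plus the sample rows are placeholders.

```python
import sqlite3

# Alternative to the self-join: with at most two distinct majors per id,
# MIN gives the first one and MAX gives the second (or NULL if only one).
MIN_MAX_SQL = """
select id,
       min(major) as major,
       case when max(major) <> min(major) then max(major) end as major2
from YOUR_TABLE
group by id
order by id
"""


def two_majors(rows):
    """Collapse (id, major) rows into (id, major, major2)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table YOUR_TABLE (id integer, major text)")
        conn.executemany("insert into YOUR_TABLE values (?, ?)", rows)
        return conn.execute(MIN_MAX_SQL).fetchall()
    finally:
        conn.close()


print(two_majors([(1, "CS"), (1, "Mgt"), (2, "Math")]))
# [(1, 'CS', 'Mgt'), (2, 'Math', None)]
```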

This will help you:
Select
ID,
(select top 1 Major from <Your_Table> where id=T.Id order by Major) Major,
(case when count(Id)>1 then (select top 1 Major from <Your_Table> where id=T.Id order by Major desc) else null end) Major2
from <Your_Table> T
Group By
ID

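If the set of majors is known in advance, you can use the PIVOT operator directly (note that this produces one column per listed major value, rather than numbering each student's majors by position):
SELECT [ID], [CS] AS Major, [Mgt] AS Major2 FROM Your_Table_Name
PIVOT
(MAX(Major) FOR [Major] IN ([CS], [Mgt])) AS p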

Related

Identify duplicates rows based on multiple columns

SQL experts,
I am trying to fetch duplicate records from a SQL table where the 1st and 2nd column values are the same but the 3rd column values are different.
Below is my table
ID NAME DEPT
--------------------
1 VRK CSE
1 VRK ECE
2 AME MEC
3 BMS CVL
From the above table, I am trying to fetch the first 2 rows. Below is my query; can you suggest why it isn't giving correct results?
SELECT A.ID, A.NAME, A.DEPT
FROM TBL A
INNER JOIN TBL B ON A.ID = B.ID
AND A.NAME = B.NAME
AND A.DEPT <> B.DEPT
Somehow I am not getting the expected results.

Your sample data does not make it completely clear what you want here. Assuming you want to target groups of records having duplicate first/second columns but more than one distinct third column value, then we may try:
SELECT ID, NAME, DEPT
FROM
(
SELECT ID, NAME, DEPT,
COUNT(*) OVER (PARTITION BY ID, NAME) cnt,
MIN(DEPT) OVER (PARTITION BY ID, NAME) min_dept,
MAX(DEPT) OVER (PARTITION BY ID, NAME) max_dept
FROM yourTable
) t
WHERE cnt > 1 AND min_dept <> max_dept;

UPDATE
select *
from
(
select *,
COUNT(*) over (partition by id, [name]) cnt1,
COUNT(*) over (partition by id, [name], dept) cnt2
from dbo.T
) x
where x.cnt1 > 1 and x.cnt2 < x.cnt1;
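The two-count idea above can be checked on the question's sample data with a small sketch that runs it in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for window functions; the table name T follows the answer, the rest is illustrative). An exact duplicate row is added in the second test to show that it is not reported.

```python
import sqlite3

# cnt1 = rows per (id, name); cnt2 = rows per (id, name, dept).
# cnt2 < cnt1 means the group contains at least one other dept value.
MIXED_DUPES_SQL = """
select id, name, dept
from (
    select *,
           count(*) over (partition by id, name) as cnt1,
           count(*) over (partition by id, name, dept) as cnt2
    from T
) x
where x.cnt1 > 1 and x.cnt2 < x.cnt1
order by id, dept
"""

SAMPLE = [(1, "VRK", "CSE"), (1, "VRK", "ECE"), (2, "AME", "MEC"), (3, "BMS", "CVL")]


def mixed_duplicates(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table T (id integer, name text, dept text)")
        conn.executemany("insert into T values (?, ?, ?)", rows)
        return conn.execute(MIXED_DUPES_SQL).fetchall()
    finally:
        conn.close()


print(mixed_duplicates(SAMPLE))
# [(1, 'VRK', 'CSE'), (1, 'VRK', 'ECE')]
```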

To find the duplicated ID/NAME pairs:
select x.id, x.name, count(*)
from
(select distinct a.id, a.name, a.dept
from tab a) x
group by x.id, x.name
having count(*) > 1

If you want the original rows, I would just go for exists:
select t.*
from tbl t
where exists (select 1
from tbl t2
where t2.id = t.id and t2.name = t.name and
t2.dept <> t.dept
);
If you just want the id/name pairs:
select t.id, t.name
from tbl t
group by t.id, t.name
having min(t.dept) <> max(t.dept);
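Both EXISTS and GROUP BY/HAVING variants can be exercised together in a minimal sketch using Python's sqlite3 module; the table name tbl follows the answer, the inner alias is t2, and the rows are the question's sample data.

```python
import sqlite3

# Original rows whose (id, name) pair also appears with a different dept.
EXISTS_SQL = """
select t.*
from tbl t
where exists (select 1
              from tbl t2
              where t2.id = t.id and t2.name = t.name and
                    t2.dept <> t.dept)
order by t.id, t.dept
"""

# Just the (id, name) pairs that have more than one distinct dept.
PAIRS_SQL = """
select t.id, t.name
from tbl t
group by t.id, t.name
having min(t.dept) <> max(t.dept)
"""


def run_both(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tbl (id integer, name text, dept text)")
        conn.executemany("insert into tbl values (?, ?, ?)", rows)
        return (conn.execute(EXISTS_SQL).fetchall(),
                conn.execute(PAIRS_SQL).fetchall())
    finally:
        conn.close()


rows = [(1, "VRK", "CSE"), (1, "VRK", "ECE"), (2, "AME", "MEC"), (3, "BMS", "CVL")]
print(run_both(rows))
# ([(1, 'VRK', 'CSE'), (1, 'VRK', 'ECE')], [(1, 'VRK')])
```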

Find the latest 3 records with the same status

I need to find the users whose latest 3 records all have the status 'Fail'. At first it seems easy, but I just can't seem to get it right.
So in a table of:
ID Date Status
1 2017-01-01 Fail
1 2017-01-02 Fail
1 2017-02-04 Fail
1 2015-03-21 Pass
1 2014-02-19 Fail
1 2016-10-23 Pass
2 2017-01-01 Fail
2 2017-01-02 Pass
2 2017-02-04 Fail
2 2016-10-23 Fail
I would expect ID 1 to be returned, as its most recent 3 records are all fails, but not ID 2, as it has a pass among its three most recent records. Each user may have any number of Pass and Fail records, and there are thousands of different IDs.
So far I've tried a CTE with ROW_NUMBER() to order the attempts but can't think of a way to ensure that the latest three results all have the same status of Fail.
Expected Results
ID Latest Fail Date Count
1 2017-02-04 3

Maybe try something like this:
WITH cte
AS
(
SELECT id,
date,
status,
ROW_NUMBER () OVER (PARTITION BY id ORDER BY date DESC) row
FROM #table
),cte2
AS
(
SELECT id, max(date) as date, count(*) AS count
FROM cte
WHERE status = 'Fail'
AND row <= 3
GROUP BY id
)
SELECT id,
date AS latest_fail,
count
FROM cte2
WHERE count = 3
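The two-CTE approach above can be sanity-checked against the question's data with a sketch run in SQLite via Python's sqlite3 module. Assumptions: SQLite 3.25+ for ROW_NUMBER, a hypothetical table name attempts in place of #table, and rn and fail_count as column names to avoid clashing with keywords.

```python
import sqlite3

LATEST_FAILS_SQL = """
with cte as (
    select id, date, status,
           row_number() over (partition by id order by date desc) as rn
    from attempts
),
cte2 as (
    -- keep only fails among each id's three most recent records
    select id, max(date) as latest_fail, count(*) as fail_count
    from cte
    where status = 'Fail' and rn <= 3
    group by id
)
select id, latest_fail, fail_count
from cte2
where fail_count = 3
order by id
"""

SAMPLE = [
    (1, "2017-01-01", "Fail"), (1, "2017-01-02", "Fail"), (1, "2017-02-04", "Fail"),
    (1, "2015-03-21", "Pass"), (1, "2014-02-19", "Fail"), (1, "2016-10-23", "Pass"),
    (2, "2017-01-01", "Fail"), (2, "2017-01-02", "Pass"), (2, "2017-02-04", "Fail"),
    (2, "2016-10-23", "Fail"),
]


def latest_three_fails(rows):
    conn = sqlite3.connect(":memory:")
    try:
        # ISO-8601 date strings sort chronologically as text.
        conn.execute("create table attempts (id integer, date text, status text)")
        conn.executemany("insert into attempts values (?, ?, ?)", rows)
        return conn.execute(LATEST_FAILS_SQL).fetchall()
    finally:
        conn.close()


print(latest_three_fails(SAMPLE))
# [(1, '2017-02-04', 3)]
```

Numbering all rows first and filtering to 'Fail' afterwards is what lets a Pass inside the latest three disqualify an ID.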

Check this:
with CTE as
(
-- number all attempts first; filtering to 'Fail' here would skip over passes
select *, ROW_NUMBER() over (partition by id order by date desc) rnk
from temp
)
select ID, max(DATE) as Latest_Fail_Date, COUNT(rnk) as count
from CTE
where rnk <= 3 and Status = 'Fail'
group by ID
having COUNT(rnk) = 3

I think you can do this using cross apply:
select i.id
from (select distinct id from t) i cross apply
(select sum(case when t.status = 'Fail' then 1 else 0 end) as numFails
from (select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
) ti
where numFails = 3;
Note: You probably have a table with all the ids. If so, you can use that instead of the select distinct subquery.
Or, similarly:
select i.id
from (select distinct id from t) i cross apply
(select top 3 t.*
from t
where t.id = i.id
order by date desc
) ti
group by i.id
having min(ti.status) = 'Fail' and max(ti.status) = 'Fail' and
count(*) = 3;

Here you go:
declare @numOfTries int = 3;
with fails_nums as
(
select *, row_number() over (partition by ID order by [Date] desc) as rn
from #fails
)
select ID, max([Date]) [Date], count(*) as [count]
from fails_nums fn1
where fn1.rn <= @numOfTries
group by ID
having count(case when [Status]='Fail' then [Status] end) = @numOfTries

Teradata SQL: Max (greatest), 2nd and 3rd greatest column names

I have a table with 6 columns in Teradata as follows:
ID Feature1 Feature2 Feature3 Feature4 Feature5
1 12 15 1 22 350
2 121 0.9 999 756 879
...
I need to get the column names for the greatest, 2nd greatest and 3rd greatest values per row, so, I need output that looks like this:
ID Greatest 2nd_Greatest 3rd_Greatest
1 Feature5 Feature4 Feature2
2 Feature3 Feature5 Feature4
Can someone help please.
Thank you!

You can do this with a massive case statement, which gets even more complicated if any of the values are NULL. That would be the fastest way, though.
The easiest method might be to unpivot the data and re-summarize it:
select id,
max(case when seqnum = 1 then feature end) as greatest_feature,
max(case when seqnum = 2 then feature end) as greatest_feature2,
max(case when seqnum = 3 then feature end) as greatest_feature3,
max(case when seqnum = 1 then which end) as which_1,
max(case when seqnum = 2 then which end) as which_2,
max(case when seqnum = 3 then which end) as which_3
from (select id, feature, which, row_number() over (partition by id order by feature desc) as seqnum
from ((select id, feature1 as feature, 'feature1' as which from tab) union all
(select id, feature2 as feature, 'feature2' as which from tab) union all
(select id, feature3 as feature, 'feature3' as which from tab) union all
(select id, feature4 as feature, 'feature4' as which from tab) union all
(select id, feature5 as feature, 'feature5' as which from tab)
) t
) t
group by id;
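To see the unpivot-and-rank approach produce the column names the question asks for, here is a sketch that runs it in SQLite via Python's sqlite3 (not Teradata). Assumptions: SQLite 3.25+, a hypothetical table named features, and bare UNION ALL members because SQLite rejects parenthesized ones. Only the which_n columns are kept, since those hold the names.

```python
import sqlite3

# Unpivot with UNION ALL, rank per id, then pick the names of ranks 1..3.
TOP3_SQL = """
select id,
       max(case when seqnum = 1 then which end) as which_1,
       max(case when seqnum = 2 then which end) as which_2,
       max(case when seqnum = 3 then which end) as which_3
from (select id, feature, which,
             row_number() over (partition by id order by feature desc) as seqnum
      from (select id, feature1 as feature, 'Feature1' as which from features
            union all select id, feature2, 'Feature2' from features
            union all select id, feature3, 'Feature3' from features
            union all select id, feature4, 'Feature4' from features
            union all select id, feature5, 'Feature5' from features) u
     ) r
group by id
order by id
"""


def top_three_names(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table features (id integer, feature1 real, feature2 real,"
                     " feature3 real, feature4 real, feature5 real)")
        conn.executemany("insert into features values (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(TOP3_SQL).fetchall()
    finally:
        conn.close()


print(top_three_names([(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)]))
# [(1, 'Feature5', 'Feature4', 'Feature2'), (2, 'Feature3', 'Feature5', 'Feature4')]
```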

Refining Gordon's query:
Instead of several passes over the source table for those UNIONs, you can create a list of features and then cross join it:
SELECT t.id, f.feature,
CASE f.feature
WHEN 'feature1' THEN t.feature1
WHEN 'feature2' THEN t.feature2
WHEN 'feature3' THEN t.feature3
WHEN 'feature4' THEN t.feature4
WHEN 'feature5' THEN t.feature5
END AS val
FROM tab AS t CROSS JOIN
(
SELECT * FROM (SELECT 'feature1' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature2' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature3' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature4' AS feature) AS dt
UNION ALL
SELECT * FROM (SELECT 'feature5' AS feature) AS dt
) AS f
You can create the list on the fly like above using UNIONs or as a real table.
Starting with TD14.10 there's also a TD_UNPIVOT table operator (but still no PIVOT):
SELECT *
FROM TD_UNPIVOT
(
ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
USING
VALUE_COLUMNS('val')
UNPIVOT_COLUMN('feature')
COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
) AS dt
Also starting with TD14.10, there's LAST_VALUE, which can be combined with ROW_NUMBER to find the nth-greatest value, thus avoiding the final aggregation:
SELECT id,
feature AS "Greatest",
LAST_VALUE(feature)
OVER (PARTITION BY id ORDER BY val DESC
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS "2nd_Greatest",
LAST_VALUE(feature)
OVER (PARTITION BY id ORDER BY val DESC
ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS "3rd_Greatest"
FROM TD_UNPIVOT
(
ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
USING
VALUE_COLUMNS('val')
UNPIVOT_COLUMN('feature')
COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
) AS dt
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) = 1;
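The cross-join unpivot and single-row LAST_VALUE frames can be tried outside Teradata with the sketch below, which runs in SQLite via Python's sqlite3. This is an approximation under assumptions: SQLite 3.25+, no TD_UNPIVOT (so the cross-join list from the earlier query is used), and no QUALIFY (so the ROW_NUMBER() = 1 filter moves to an outer query). The table name tab follows the answer.

```python
import sqlite3

# Unpivot with a cross-joined feature list, then read the 2nd and 3rd names
# from single-row frames 1 and 2 rows after the greatest value.
NTH_SQL = """
with unpivoted as (
    select t.id, f.feature,
           case f.feature
               when 'feature1' then t.feature1
               when 'feature2' then t.feature2
               when 'feature3' then t.feature3
               when 'feature4' then t.feature4
               when 'feature5' then t.feature5
           end as val
    from tab as t
    cross join (select 'feature1' as feature
                union all select 'feature2'
                union all select 'feature3'
                union all select 'feature4'
                union all select 'feature5') as f
),
ranked as (
    select id,
           feature as greatest,
           last_value(feature) over (partition by id order by val desc
               rows between 1 following and 1 following) as second_greatest,
           last_value(feature) over (partition by id order by val desc
               rows between 2 following and 2 following) as third_greatest,
           row_number() over (partition by id order by val desc) as rn
    from unpivoted
)
-- SQLite has no QUALIFY, so the rn = 1 filter is applied here instead
select id, greatest, second_greatest, third_greatest
from ranked
where rn = 1
order by id
"""


def nth_greatest_names(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tab (id integer, feature1 real, feature2 real,"
                     " feature3 real, feature4 real, feature5 real)")
        conn.executemany("insert into tab values (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(NTH_SQL).fetchall()
    finally:
        conn.close()


print(nth_greatest_names([(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)]))
```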

Duplicate Counts - TSQL

I want to get all records that have duplicate values for SOME of the fields (i.e. key columns).
My code:
CREATE TABLE #TEMP (ID int, Descp varchar(5), Extra varchar(6))
INSERT INTO #Temp
SELECT 1,'One','Extra1'
UNION ALL
SELECT 2,'Two','Extra2'
UNION ALL
SELECT 3,'Three','Extra3'
UNION ALL
SELECT 1,'One','Extra4'
SELECT ID, Descp, Extra FROM #TEMP
;WITH Temp_CTE AS
(SELECT *
, ROW_NUMBER() OVER (PARTITION BY ID, Descp ORDER BY (SELECT 0))
AS DuplicateRowNumber
FROM #TEMP
)
SELECT * FROM Temp_cte
DROP TABLE #TEMP
The last column tells me how many times each row has appeared based on ID and Descp values.
I want that column, but I ALSO need another column that indicates that both rows for ID = 1 and Descp = 'One' have shown up more than once.
So an extra column (i.e. MultipleOccurances (bool)) which has 1 for the two rows with ID = 1 and Descp = 'One' and 0 for the other rows, as they only show up once.
How can I achieve that? (I want to avoid using Count(1)>1 or something similar, if possible.)
Edit:
Desired output:
ID Descp Extra DuplicateRowNumber IsMultiple
1 One Extra1 1 1
1 One Extra4 2 1
2 Two Extra2 1 0
3 Three Extra3 1 0

You say "I want to avoid using Count", but it is probably the best way. It uses the same partitioning you already have on the ROW_NUMBER:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, Descp
ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE
WHEN COUNT(*) OVER (PARTITION BY ID, Descp) > 1 THEN 1
ELSE 0
END AS IsMultiple
FROM #Temp
And the execution plan shows just a single sort.
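The single-pass COUNT(*) OVER idea can be reproduced with a sketch run in SQLite via Python's sqlite3. Assumptions: SQLite 3.25+, a hypothetical table temp_rows in place of #TEMP (SQLite has no # temp tables), and ordering ROW_NUMBER by Extra instead of (SELECT 0) so the numbering is deterministic.

```python
import sqlite3

# ROW_NUMBER numbers each row inside its (ID, Descp) group and COUNT(*)
# over the same partition flags groups with more than one row.
IS_MULTIPLE_SQL = """
select ID, Descp, Extra,
       row_number() over (partition by ID, Descp order by Extra) as DuplicateRowNumber,
       case when count(*) over (partition by ID, Descp) > 1 then 1 else 0 end as IsMultiple
from temp_rows
order by ID, Extra
"""


def flag_multiples(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table temp_rows (ID integer, Descp text, Extra text)")
        conn.executemany("insert into temp_rows values (?, ?, ?)", rows)
        return conn.execute(IS_MULTIPLE_SQL).fetchall()
    finally:
        conn.close()


sample = [(1, "One", "Extra1"), (2, "Two", "Extra2"), (3, "Three", "Extra3"), (1, "One", "Extra4")]
for row in flag_multiples(sample):
    print(row)
```

The printed rows match the desired output table in the question.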

Well, I have this solution, but using a Count...
SELECT T1.*,
ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.Descp ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE WHEN T2.C = 1 THEN 0 ELSE 1 END MultipleOcurrences FROM #temp T1
INNER JOIN
(SELECT ID, Descp, COUNT(1) C FROM #TEMP GROUP BY ID, Descp) T2
ON T1.ID = T2.ID AND T1.Descp = T2.Descp

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such a query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?

select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as a primary key and are unique, so in a regular database you'd expect both queries to return an empty set.
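Both queries can be compared on a small made-up data set with a sketch that runs them in SQLite via Python's sqlite3. The table name YourTable follows the answer; the names alpha, beta, gamma and delta are illustrative assumptions. ID 2 repeats the same name, so neither query should report it.

```python
import sqlite3

# IDs that map to more than one distinct ID_name.
GROUP_SQL = """
select ID
from YourTable
group by ID
having count(distinct ID_name) > 1
order by ID
"""

# The rows themselves, for IDs whose names disagree.
EXISTS_SQL = """
select *
from YourTable yt1
where exists (select *
              from YourTable yt2
              where yt1.ID = yt2.ID
                and yt1.ID_name <> yt2.ID_name)
order by ID, ID_name
"""


def check_names(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table YourTable (ID integer, ID_name text)")
        conn.executemany("insert into YourTable values (?, ?)", rows)
        return (conn.execute(GROUP_SQL).fetchall(),
                conn.execute(EXISTS_SQL).fetchall())
    finally:
        conn.close()


sample = [(1, "alpha"), (1, "beta"), (2, "gamma"), (2, "gamma"), (3, "delta")]
print(check_names(sample))
# ([(1,)], [(1, 'alpha'), (1, 'beta')])
```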

select tt.ID, max(tt.myRank)
from
(
select
ip.ID, ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its total number of distinct ID_name values.
If you want only those IDs which have more than one name associated, just add a where clause,
e.g.
select tt.ID, max(tt.myRank)
from
(
select
ip.ID, ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID
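Which window function actually counts distinct names per ID is easy to check with a sketch in SQLite via Python's sqlite3 (SQLite 3.25+ assumed; the sample names are made up). It contrasts ROW_NUMBER partitioned by (ID, ID_name) with DENSE_RANK partitioned by ID alone.

```python
import sqlite3

# DENSE_RANK over (partition by ID order by ID_name) gives equal names the
# same rank, so the max rank per ID is its number of distinct names.
DENSE_SQL = """
select tt.ID, max(tt.myRank)
from (select ip.ID, ip.ID_name,
             dense_rank() over (partition by ip.ID order by ip.ID_name) as myRank
      from YourTable ip) tt
group by tt.ID
order by tt.ID
"""

# ROW_NUMBER partitioned by (ID, ID_name) restarts for every name, so its max
# counts repeated rows of a single name instead.
ROW_SQL = """
select tt.ID, max(tt.myRank)
from (select ip.ID, ip.ID_name,
             row_number() over (partition by ip.ID, ip.ID_name order by ip.ID) as myRank
      from YourTable ip) tt
group by tt.ID
order by tt.ID
"""


def compare_rankings(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table YourTable (ID integer, ID_name text)")
        conn.executemany("insert into YourTable values (?, ?)", rows)
        return (conn.execute(DENSE_SQL).fetchall(),
                conn.execute(ROW_SQL).fetchall())
    finally:
        conn.close()


sample = [(1, "alpha"), (1, "beta"), (2, "gamma"), (2, "gamma"), (3, "delta")]
print(compare_rankings(sample))
# ([(1, 2), (2, 1), (3, 1)], [(1, 1), (2, 2), (3, 1)])
```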

We Keep Coding
