Eliminating duplicate rows except one column with condition

Eliminating duplicate rows except one column with condition - sql

I am having trouble trying to find an appropriate query(SQL-SERVER) for selecting records with condition however, the table I will be using has more than 100,000 rows and more than 20 columns.
So I need a code that satisfies the following condition:
1.)If [policy] and [plan] column is unique between rows then I will select that record
2.)If [policy] and [plan] return 2 or more rows then I will select the record which 'code' column isn't 999
3.)In some cases the unwanted rows may not have '999' in [code] column but may be other specifics
In other words, I would like to get row number 1,2,4,5,7.
Here is an example of what the table looks like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
3 | b | bb |999
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
6 | c | cc |999
-----------------------
7 | d | dd |999
-----------------------
I'm expecting to see something like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
7 | d | dd |999
-----------------------
Thank you in advance

This sounds like a prioritization query. You an use row_number():
select t.*
from (select t.*,
row_number() over (partition by policy, plan
order by code
) as seqnum
from t
) t
where seqnum = 1;
The expected output makes this a bit clearer:
select t.*
from (select t.*,
rank() over (partition by policy, plan
order by (case when code = 999 then 1 else 2 end) desc
) as seqnum
from t
) t
where seqnum = 1;
The OP wants all codes that are not 999 unless the only codes are 999. So, another approach is:
select t.*
from t
where t.code <> 999
union all
select t.*
from t
where t.code = 999 and
not exists (select 1
from t t2
where t2.policy = t.policy and t2.plan = t.plan and
t2.code <> 999
);

May be you want this (eliminate the last row if more than one)?
select t.*
from (select t.*
, row_number() over (partition by policy, plan
order by code desc
) AS RN
, COUNT(*) over (partition by policy, plan) AS RC
from t
) t
where RN > 1 OR RN=RC;
Output:
row policy plan code RN RC
1 1 a aa 111 1 1
2 2 b bb 112 2 2
3 5 c cc 112 2 3
4 4 c cc 111 3 3
5 7 d dd 999 1 1

CREATE TABLE #Table2
([row] int, [policy] varchar(1), [plan] varchar(2), [code] int)
;
INSERT INTO #Table2
([row], [policy], [plan], [code])
VALUES
(1, 'a', 'aa', 111),
(2, 'b', 'bb', 112),
(3, 'b', 'bb', 999),
(4, 'c', 'cc', 111),
(5, 'c', 'cc', 112),
(6, 'c', 'cc', 999),
(7, 'd', 'dd', 999)
;
with cte
as
(
select *,
row_number() over (partition by policy, [plan]
order by code
) as seqnum
from #Table2
)
select [row], [policy], [plan], [code] from cte where seqnum=1

Related

Multiple counts on a single column SQL

I am currently running a query like the following:
SELECT a.ID, a.ContactID, a.Code,
FROM tableA a
JOIN (SELECT ContactID, Code
FROM tableA
WHERE ContactID IS NOT NULL
GROUP BY Code, ContactID
HAVING COUNT(Code) > 1) b
ON (a.Code = b.Code AND a.ContactID = b.ContactID)
WHERE a.ContactID IS NOT NULL
ORDER BY a.Code
This returns data that looks like the folloing:
table : a
+-------+-----------+-----------+
| ID | ContactID | Code |
+-------+-----------+-----------+
| 1 | 111 | abcd2 |
| 2 | 111 | abcd2 |
| 3 | 222 | abcd1 |
| 4 | 222 | abcd1 |
| 5 | 222 | abcd1 |
| 6 | 222 | abcd1 |
+-------+-----------+-----------+
So as you can see I get ContactID's that have more then one of the same Code.
The problem with this is, is that I don't want all this output (real table is much larger). I want a COUNT to go along side the Code column and just show one row for each iteration of Code. Like the following:
+-------+-----------+-----------+------+
| ID | ContactID | Code |COUNT |
+-------+-----------+-----------+------+
| 1 | 111 | abcd2 | 2 |
| 3 | 222 | abcd1 | 4 |
+-------+-----------+-----------+------+
Any help on this would be great and I hope I have explained my problem well enough. If not please ask for more information and if this has been answered before please point in that direction.
Thanks.

Your solution and other answers are way to complicated, you don't need the self join when you're simply aggregating with HAVING Count(x) > 1:
SELECT MIN(ID), ContactID, Code, COUNT(Code) AS [COUNT]
FROM tableA
WHERE ContactID IS NOT NULL
GROUP BY Code, ContactID
HAVING COUNT(Code) > 1
Full solution:
SQL Fiddle
CREATE TABLE TableA
([ID] int, [ContactID] int, [Code] varchar(5))
;
INSERT INTO TableA
([ID], [ContactID], [Code])
VALUES
(1, 111, 'abcd2'),
(2, 111, 'abcd2'),
(3, 222, 'abcd1'),
(4, 222, 'abcd1'),
(5, 222, 'abcd1'),
(6, 222, 'abcd1')
;
Query 1:
SELECT min(id), ContactID, Code, count(Code) as [COUNT]
FROM tableA
WHERE ContactID IS NOT NULL
GROUP BY Code, ContactID
HAVING COUNT(Code) > 1
Results:
| | ContactID | Code | |
|---|-----------|-------|---|
| 1 | 111 | abcd2 | 2 |
| 3 | 222 | abcd1 | 4 |

I would use exists instead of subquery :
select min(a.id) as id, a.ContactID, a.Code, count(*) as Cnt
from tableA a
where exists (select 1
from tableA a1
where a1.ContactID = a.ContactID and
a1.Code = a.Code and
a1.id <> a.id
)
group by a.ContactID, a.Code;

sub-query
select min(ID) as id, ContactID,Code,count(*) as cnt from
(SELECT a.ID, a.ContactID, a.Code
FROM tableA a
JOIN (SELECT ContactID, Code
FROM tableA
WHERE ContactID IS NOT NULL
GROUP BY Code, ContactID
HAVING COUNT(Code) > 1) b
ON (a.Code = b.Code AND a.ContactID = b.ContactID)
WHERE a.RetailContactID IS NOT NULL
ORDER BY a.Code
) t group ContactID,Code

;WITH CTE AS
(
SELECT a.ID, a.ContactID, a.Code,
FROM tableA a
JOIN (SELECT ContactID, Code
FROM tableA
WHERE ContactID IS NOT NULL
GROUP BY Code, ContactID
HAVING COUNT(Code) > 1) b
ON (a.Code = b.Code AND a.ContactID = b.ContactID)
WHERE a.RetailContactID IS NOT NULL
)
SELECT ID, ContactID, Code, COUNT(*) AS Cnt
FROM CTE
GROUP BY ID, ContactID, Code
ORDER BY 1, 2, 3

Extend your SQL query with one more grouping:
SELECT min(a.ID), a.ContactID, a.Code, count(*)
...
GROUP BY a.ContactID, a.Code
ORDER BY a.Code

Use group by in your select query
select x.ContactID, x.Code, [count] = count(x.id) from (
select id = 1 , ContactID = 111 , Code = 'abcd2' union all
select 2 , 111 , 'abcd2' union all
select 3 , 222 , 'abcd1' union all
select 4 , 222 , 'abcd1' union all
select 5 , 222 , 'abcd1' union all
select 6 , 222 , 'abcd1') x group by x.Code, x.ContactID

How to join two tables when there's no coincidence?

I have two tables that I want to join. I've tried an usual left and right join but neither gives the result I want.
TABLE A
ID_A VALUE_A
-----------------
A 1
B 2
TABLE B
ID_B ID_A VALUE_B
-------------------------
90 A 1
90 C 1
90 E 1
91 A 1
91 B 1
92 B 1
92 E 1
92 F 1
I want to get this result:
ID_A VALUE_A ID_B ID_A VALUE_B
-------------------------------------------------
A 1 90 A 1
B 2 90 NULL NULL
A 1 91 A 1
B 2 91 B 1
A 1 92 NULL NULL
B 2 92 B 1

If I understand correctly, you want all combinations of id_a and value_a from the first table along with all distinct id_b from the second table. If so:
select iv.id_a, iv.value_a, ib.id_b, b.id_a, b.value_b
from (select distinct id_a, value_a from a) iv cross join
(select distinct id_b from b) ib left join
b
on b.id_b = ib.id_b and b.id_a = iv.id_a;
The cross join generates the rows. The left join brings in the additional columns.

I usually break things like this down into CTEs:
DDL
use tempdb
CREATE TABLE Table1
([ID_A] varchar(1), [VALUE_A] int)
;
INSERT INTO Table1
([ID_A], [VALUE_A])
VALUES
('A', 1),
('B', 2)
;
CREATE TABLE Table2
([ID_B] int, [ID_A] varchar(1), [VALUE_B] int)
;
INSERT INTO Table2
([ID_B], [ID_A], [VALUE_B])
VALUES
(90, 'A', 1),
(90, 'C', 1),
(90, 'E', 1),
(91, 'A', 1),
(91, 'B', 1),
(92, 'B', 1),
(92, 'E', 1),
(92, 'F', 1)
;
Answer
with a as (
select distinct id_b
from Table2
),
b as (
select id_a, value_a, id_b
from Table1 cross join a
)
select b.id_a, b.value_a, b.id_b, t2.id_a, t2.value_b
from b left join Table2 t2
on b.id_a = t2.id_a
and b.id_b = t2.id_b
Results
+------+---------+------+------+---------+
| id_a | value_a | id_b | id_a | value_b |
+------+---------+------+------+---------+
| A | 1 | 90 | A | 1 |
| B | 2 | 90 | NULL | NULL |
| A | 1 | 91 | A | 1 |
| B | 2 | 91 | B | 1 |
| A | 1 | 92 | NULL | NULL |
| B | 2 | 92 | B | 1 |
+------+---------+------+------+---------+

I couldn't resolve the exact logic and couldn't match the results exactly as desired , but presume you'd like to get something like :
SELECT a.ID_A, COALESCE(a.VALUE_A,b.VALUE_B) VALUE_A, b.ID_B, a.ID_A,
(CASE WHEN a.ID_A IS NULL THEN a.ID_A ELSE CAST(b.VALUE_B as VARCHAR(1)) END)
as VALUE_B
FROM TABLE_A a FULL OUTER JOIN TABLE_B b
ON ( a.ID_A = b.ID_A )
GROUP BY a.ID_A, a.VALUE_A, b.ID_B, a.ID_A, b.VALUE_B
ORDER BY 3, 2, 1;
SQL Fiddle Demo

Try this:
SELECT A.ID_A , A.VALUE_A , B.ID_B , B.ID_A , B.VALUE_B
FROM TABLE_A A
LEFT OUTER JOIN TABLE_B B
ON A.ID_A = B.ID_A ;
EDIT: Typos corrected following sticky bit note (thanks!!).

SQL recursive id nodes

I have a table structure like so
Id Desc Node
---------------------
1 A
2 Aa 1
3 Ab 1
4 B
5 Bb 4
6 Bb1 5
these Desc values are presented in a listview to the user, if the user chooses Bb, I want the ID 5 and also the ID 4 becuase thats the root node of that entry, simular to that if the user chooses Bb1, I need ID 6, 5 and 4
I am only able to query one level up, but there could be n levels, so my query at the moment looks like this
SELECT Id
FROM tbl
WHERE Desc = 'Bb1'
OR Id = (SELECT Node FROM tbl WHERE Desc = 'Bb1');

You can do this with Recursive CTE like below
Schema:
CREATE TABLE #TAB (ID INT, DESCS VARCHAR(10), NODE INT)
INSERT INTO #TAB
SELECT 1 AS ID, 'A' DESCS, NULL NODE
UNION ALL
SELECT 2 , 'AA', 1
UNION ALL
SELECT 3, 'AB', 1
UNION ALL
SELECT 4, 'B', NULL
UNION ALL
SELECT 5, 'BB', 4
UNION ALL
SELECT 6, 'BB1', 5
Now do recursive CTE for picking node value and apply it again on #TAB with a Join.
;WITH CTE AS(
SELECT ID, DESCS, NODE FROM #TAB WHERE ID=6
UNION ALL
SELECT T.ID, T.DESCS, T.NODE FROM #TAB T
INNER JOIN CTE C ON T.ID = C.NODE
)
SELECT * FROM CTE
When you pass 6 to the first query in CTE, the result will be
+----+-------+------+
| ID | DESCS | NODE |
+----+-------+------+
| 6 | BB1 | 5 |
| 5 | BB | 4 |
| 4 | B | NULL |
+----+-------+------+

Count Top 5 Elements spread over rows and columns

Using T-SQL for this table:
+-----+------+------+------+-----+
| No. | Col1 | Col2 | Col3 | Age |
+-----+------+------+------+-----+
| 1 | e | a | o | 5 |
| 2 | f | b | a | 34 |
| 3 | a | NULL | b | 22 |
| 4 | b | c | a | 55 |
| 5 | b | a | b | 19 |
+-----+------+------+------+-----+
I need to count the TOP 3 names (Ordered by TotalCount DESC) across all rows and columns, for 3 Age groups: 0-17, 18-49, 50-100. Also, how do I ignore the NULLS from my results?
If it's possible, how I can also UNION the results for all 3 age groups into one output table to get 9 results (TOP 3 x 3 Age groups)?
Output for only 1 Age Group: 18-49 would look like this:
+------+------------+
| Name | TotalCount |
+------+------------+
| b | 4 |
| a | 3 |
| f | 1 |
+------+------------+

You need to unpivot first your table and then exclude the NULLs. Then do a simple COUNT(*):
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
)
SELECT TOP 3
Name, COUNT(*) AS TotalCount
FROM CteUnpivot
WHERE Age BETWEEN 18 AND 49
GROUP BY Name
ORDER BY COUNT(*) DESC
ONLINE DEMO
If you want to get the TOP 3 for each age group:
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
),
CteRn AS (
SELECT
AgeGroup =
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name,
COUNT(*) AS TotalCount
FROM CteUnpivot
GROUP BY
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name
)
SELECT
AgeGroup, Name, TotalCount
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY AgeGroup, Name ORDER BY TotalCount DESC)
FROM CteRn
) t
WHERE rn <= 3;
ONLINE DEMO
The unpivot technique using CROSS APPLY and VALUES:
An Alternative (Better?) Method to UNPIVOT (SQL Spackle) by Dwain Camps

You can check below multiple-CTE SQL select statement
Row_Number() with Partition By clause is used ordering records within each group categorized by ages
/*
CREATE TABLE tblAges(
[No] Int,
Col1 VarChar(10),
Col2 VarChar(10),
Col3 VarChar(10),
Age SmallInt
)
INSERT INTO tblAges VALUES
(1, 'e', 'a', 'o', 5),
(2, 'f', 'b', 'a', 34),
(3, 'a', NULL, 'b', 22),
(4, 'b', 'c', 'a', 55),
(5, 'b', 'a', 'b', 19);
*/
;with cte as (
select
col1 as col, Age
from tblAges
union all
select
col2, Age
from tblAges
union all
select
col3, Age
from tblAges
), cte2 as (
select
col,
case
when age < 18 then '0-17'
when age < 50 then '18-49'
else '50-100'
end as grup
from cte
where col is not null
), cte3 as (
select
grup,
col,
count(grup) cnt
from cte2
group by
grup,
col
)
select * from (
select
grup, col, cnt, ROW_NUMBER() over (partition by grup order by cnt desc) cnt_grp
from cte3
) t
where cnt_grp <= 3
order by grup, cnt

select rows where ID has variation of values

I have two columns, an ID, and the other a value which is either 0 or 1. I am trying to select all Rows for the ID where it has a 0 and a 1, for example,
RowNumber ------------- ID ------- value
1 ------------------- 001 ------- 1
2 ------------------- 001 ------- 1
3 ------------------- 001 ------- 1
4 ------------------- 002 ------- 1
5 ------------------- 002 ------- 0
6 ------------------- 003 ------- 1
7 ------------------- 003 ------- 1
8 --------------------004 ------- 1
9 -------------------- 004 ------- 0
10 ------------------- 004 ------- 1
The result should select rows 4, 5, 8, 9, 10

You can use window version of COUNT:
SELECT RowNumber, ID, value
FROM (
SELECT RowNumber, ID, value,
COUNT(CASE WHEN value = 1 THEN 1 END) OVER (PARTITION BY ID) AS cntOnes,
COUNT(CASE WHEN value = 0 THEN 1 END) OVER (PARTITION BY ID) AS cntZeroes
FROM test
WHERE value IN (0,1) ) AS t
WHERE cntOnes >= 1 AND cntZeroes >= 1
COUNT(DISTINCT value) has a value of 2 if both 0, 1 values exist within the same ID slice.

DISTINCT is indeed not allowed in a windowed version of the COUNT, so you can use MIN and MAX instead.
DECLARE #T TABLE(RN int, ID int, value int);
INSERT INTO #T (RN, ID, value) VALUES
(1, 001, 1),
(2, 001, 1),
(3, 001, 1),
(4, 002, 1),
(5, 002, 0),
(6, 003, 1),
(7, 003, 1),
(8, 004, 1),
(9, 004, 0),
(10, 004, 1);
WITH
CTE
AS
(
SELECT
RN, ID, value
,MIN(value) OVER (PARTITION BY ID) AS MinV
,MAX(value) OVER (PARTITION BY ID) AS MaxV
FROM #T AS T
)
SELECT RN, ID, value
FROM CTE
WHERE MinV <> MaxV
;
Result
+----+----+-------+
| RN | ID | value |
+----+----+-------+
| 4 | 2 | 1 |
| 5 | 2 | 0 |
| 8 | 4 | 1 |
| 9 | 4 | 0 |
| 10 | 4 | 1 |
+----+----+-------+

create table #shadowTemp (
RowNumber int not null,
Id char(3) not null,
value bit not null
)
insert into #shadowTemp values ( 1,'001', 0 )
insert into #shadowTemp values ( 2,'001', 1 )
insert into #shadowTemp values ( 3,'001', 1 )
insert into #shadowTemp values ( 4,'002', 0 )
insert into #shadowTemp values ( 5,'003', 0 )
insert into #shadowTemp values ( 6,'003', 1 )
select * from #shadowTemp;
;with cte ( Id ) As (
select Id
from #shadowTemp
group by Id
having sum( value + 1 ) >= 3
)
select a.*
from
#shadowTemp a
inner join cte b on ( a.Id = b.Id )
drop table #shadowTemp

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Eliminating duplicate rows except one column with condition - sql

Related

Multiple counts on a single column SQL

How to join two tables when there's no coincidence?

SQL recursive id nodes

Count Top 5 Elements spread over rows and columns

select rows where ID has variation of values

Categories

Resources