Filter unique records from a database while removing double not-null values

This is hard to explain in words, so here is an example of what I am trying to do in SQL. I have a query that returns the following records:
ID Z
--- ---
1 A
1 <null>
2 B
2 E
3 D
4 <null>
4 F
5 <null>
I need to filter this query so that each unique record (based on ID) appears only once in the output and if there are multiple records for the same ID, the output should contain the record with the value of Z column being non-null. If there is only a single record for a given ID and it has value of null for column Z the output still should return that record. So the output from the above query should look like this:
ID Z
--- ---
1 A
2 B
2 E
3 D
4 F
5 <null>
How would you do this in SQL?

You can use GROUP BY for that:
SELECT
ID, MAX(Z) -- Could be MIN(Z)
FROM MyTable
GROUP BY ID
Aggregate functions ignore NULLs and return NULL only when all values in the group are NULL. Note that this returns a single row per ID, so only one of the two rows for ID = 2 survives.
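To see the NULL handling concretely, here is a minimal sketch that runs the same GROUP BY on the question's sample rows. It uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made purely for a quick check; the question itself targets SQL Server.

```python
import sqlite3

# In-memory table holding the sample rows from the question (SQLite is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Z TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "A"), (1, None), (2, "B"), (2, "E"),
     (3, "D"), (4, None), (4, "F"), (5, None)],
)

# MAX skips NULLs, so a group yields NULL only when every Z in it is NULL.
max_rows = conn.execute(
    "SELECT ID, MAX(Z) FROM MyTable GROUP BY ID ORDER BY ID"
).fetchall()
print(max_rows)
```

ID 5 keeps its NULL, while ID 2 collapses to the single value E, so this fits only when one row per ID is acceptable.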

If you need to return both 2-B and 2-E rows:
SELECT *
FROM YourTable t1
WHERE Z IS NOT NULL
OR NOT EXISTS
(SELECT * FROM YourTable t2
WHERE T2.ID = T1.id AND T2.z IS NOT NULL)
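
A quick way to check this query against the expected output is to run it on the sample rows. The sketch below does that with Python's sqlite3 module (an assumption, chosen only because it needs no server); the ORDER BY is added just to make the result deterministic.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Z TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "A"), (1, None), (2, "B"), (2, "E"),
     (3, "D"), (4, None), (4, "F"), (5, None)],
)

# Keep every non-NULL row; keep a NULL row only if its ID has no non-NULL row.
kept_rows = conn.execute(
    """
    SELECT t1.ID, t1.Z
    FROM YourTable t1
    WHERE t1.Z IS NOT NULL
       OR NOT EXISTS (SELECT 1 FROM YourTable t2
                      WHERE t2.ID = t1.ID AND t2.Z IS NOT NULL)
    ORDER BY t1.ID, t1.Z
    """
).fetchall()
print(kept_rows)
```

Both rows for ID 2 are kept, and ID 5 still appears with its NULL.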

SELECT ID
,Z
FROM YourTable
WHERE Z IS NOT NULL

DECLARE @T TABLE ( ID INT, Z CHAR(1) )
INSERT INTO @T
( ID, Z )
VALUES ( 1, 'A' ),
( 1, NULL ),
( 2, 'B' ),
( 2, 'E' ),
( 3, 'D' ),
( 4, NULL ),
( 4, 'F' ),
( 5, NULL )
SELECT *
FROM @T
; WITH c AS (SELECT ID, r=COUNT(*) FROM @T GROUP BY ID)
SELECT t.ID, Z
FROM @T t JOIN c ON t.ID = c.ID
WHERE c.r = 1
UNION ALL
SELECT t.ID, Z
FROM @T t JOIN c ON t.ID = c.ID
WHERE c.r >= 2
AND Z IS NOT NULL
This example assumes you want two rows returned for ID = 2.

with tmp (id, cnt_val) as
(select id,
sum(case when z is not null then 1 else 0 end)
from t
group by id)
select t.id, t.z
from t
inner join tmp on t.id = tmp.id
where tmp.cnt_val > 0 and t.z is not null
or tmp.cnt_val = 0 and t.z is null

WITH CTE
AS (
SELECT id
,z
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY coalesce(z, '') DESC
) rn
FROM #T
)
SELECT id
,z
FROM CTE
WHERE rn = 1

Related

Case when duplicate add one more letter

For example: I have a table with these records below
1 A
2 A
3 B
4 C
...
and I need to migrate these records into another table
1 AA
2 AB
3 B
4 C
...
That is, if a value is duplicated, each copy automatically gets one more letter appended, in alphabetical order.

Just a slightly different approach
Example
Declare @YourTable Table (ID int,[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'A')
,(2,'A')
,(3,'B')
,(4,'C')
Select *
,NewVal = concat(SomeCol,IIF(sum(1) over (partition by SomeCol)=1,'',char(64+row_number() over ( partition by SomeCol order by ID ))) )
From @YourTable
Returns
ID SomeCol NewVal
1 A AA
2 A AB
3 B B
4 C C
EDIT - Requested UPDATE
Declare @YourTable Table (ID int,[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'A')
,(2,'A')
,(3,'B')
,(4,'C')
Select *
,NewVal = concat(SomeCol,IIF(sum(1) over (partition by SomeCol)=1,'',replace(char(63+row_number() over ( partition by SomeCol order by ID )),'@','')) )
From @YourTable
Returns
ID SomeCol NewVal
1 A A
2 A AA
3 B B
4 C C

We might be able to handle this requirement with the help of a lookup table that maps each duplicate sequence number to a secondary letter:
WITH letters AS (
SELECT 1 AS seq, 'A' AS let UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'C' UNION ALL
...
SELECT 26, 'Z' UNION ALL
...
),
cte AS (
SELECT id, let, ROW_NUMBER() OVER (PARTITION BY let ORDER BY id) rn,
COUNT(*) OVER (PARTITION BY let) cnt
FROM yourTable
)
SELECT t1.id, t1.let + CASE WHEN t1.cnt > 1 THEN t2.let ELSE '' END AS let
FROM cte t1
LEFT JOIN letters t2
ON t1.rn = t2.seq
ORDER BY t1.id;
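
The lookup-table idea can be tried end to end with the sketch below. It uses Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or later), builds the 26-row letters table in Python instead of spelling out every UNION ALL, and joins on the per-value row number rn so the suffixes start at A for every duplicated value. The row (5, 'C') is a hypothetical addition to the question's sample, included only to show that the suffix does not depend on id.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, let TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    # First four rows are the question's sample; (5, 'C') is a hypothetical extra.
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C")],
)

# Lookup table: sequence number 1..26 mapped to A..Z.
conn.execute("CREATE TABLE letters (seq INTEGER, let TEXT)")
conn.executemany(
    "INSERT INTO letters VALUES (?, ?)",
    list(enumerate(string.ascii_uppercase, start=1)),
)

suffixed = conn.execute(
    """
    WITH cte AS (
        SELECT id, let,
               ROW_NUMBER() OVER (PARTITION BY let ORDER BY id) AS rn,
               COUNT(*)     OVER (PARTITION BY let)             AS cnt
        FROM yourTable
    )
    SELECT t1.id,
           t1.let || CASE WHEN t1.cnt > 1 THEN t2.let ELSE '' END AS new_let
    FROM cte t1
    LEFT JOIN letters t2 ON t1.rn = t2.seq
    ORDER BY t1.id
    """
).fetchall()
print(suffixed)
```

With more than 26 duplicates of one value the join finds no letter, so a real migration would need a rule for that case.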

Count pair-wise occurrences in a T-SQL table

How can I count pair-wise occurrences in a SQL Server table? Please note that the order of the given sequence has to be accounted for and shouldn't be changed.
Original table:
1 2 3 4
--------
1 | A A A B
2 | A # don't count
3 | B A A
4 | B # don't count
Result:
1 | AA = 3
2 | AB = 1
3 | BB = 0
4 | BA = 1
In addition, the code has to work for large datasets.
Edit:
A pair in this context is a set of two values {x[ij], x[(i+1)j]}, where i=1,...,4 and j=1,...,4. Further, pairs that have the form A null or B null shouldn't be counted. Moreover, null A or null B can't happen, therefore they don't have to be accounted for.

I just want to point out a pretty easy way to express this logic:
with vals as (
select 'A' as val union all select 'B'
),
pairs as (
select t1.val as val1, t2.val as val2
from vals t1 cross join vals t2
)
select p.*,
(select count(*)
from original
where [1] = val1 and [2] = val2 or
[2] = val1 and [3] = val2 or
[3] = val1 and [4] = val2
) as cnt
from pairs p
order by cnt desc;
This doesn't have great performance characteristics, but that is easily fixed by using three subqueries (one per pair of adjacent columns) and indexes on the data columns.

CREATE TABLE #tab([1] NVARCHAR(100), [2] NVARCHAR(100),
[3] NVARCHAR(100), [4] NVARCHAR(100));
INSERT INTO #tab
VALUES ('A', 'A', 'A', 'B') ,('A' , NULL ,NULL ,NULL )
,('B' ,'A' ,'A', NULL),('B', NULL, NULL, NULL);
WITH cte AS
(
SELECT pair = [1] + [2] FROM #tab
UNION ALL
SELECT pair = [2] + [3] FROM #tab
UNION ALL
SELECT pair = [3] + [4] FROM #tab
), cte2 AS
(
SELECT [1] AS val FROM #tab
UNION ALL SELECT [2] FROM #tab
UNION ALL SELECT [3] FROM #tab
UNION ALL SELECT [4] FROM #tab
), all_pairs AS
(
SELECT DISTINCT a.val + b.val AS pair
FROM cte2 a
CROSS JOIN cte2 b
WHERE a.val IS NOT NULL and b.val IS NOT NULL
)
SELECT a.pair, result = COUNT(c.pair)
FROM all_pairs a
LEFT JOIN cte c
ON a.pair = c.pair
GROUP BY a.pair;
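How it works:
cte builds the adjacent pairs from columns (1,2), (2,3) and (3,4); a pair that contains a NULL concatenates to NULL.
cte2 collects every value from all four columns.
all_pairs builds every possible pair of non-NULL values: AA, AB, BA, BB.
The final query left-joins all_pairs to cte and uses COUNT to get the number of occurrences, so a pair that never occurs shows 0.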
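
The steps above can be checked on the question's data with a small sketch. It runs on Python's sqlite3 module (an assumption), so the columns are named c1 to c4 instead of [1] to [4] and concatenation uses || instead of +; in both dialects a NULL operand makes the whole pair NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [("A", "A", "A", "B"), ("A", None, None, None),
     ("B", "A", "A", None), ("B", None, None, None)],
)

pair_counts = conn.execute(
    """
    WITH pairs AS (              -- adjacent pairs; NULL if either side is NULL
        SELECT c1 || c2 AS pair FROM tab
        UNION ALL SELECT c2 || c3 FROM tab
        UNION ALL SELECT c3 || c4 FROM tab
    ), vals AS (                 -- distinct values across all four columns
        SELECT c1 AS val FROM tab
        UNION SELECT c2 FROM tab
        UNION SELECT c3 FROM tab
        UNION SELECT c4 FROM tab
    ), all_pairs AS (            -- every possible pair of non-NULL values
        SELECT a.val || b.val AS pair
        FROM vals a CROSS JOIN vals b
        WHERE a.val IS NOT NULL AND b.val IS NOT NULL
    )
    SELECT a.pair, COUNT(p.pair)
    FROM all_pairs a
    LEFT JOIN pairs p ON a.pair = p.pair
    GROUP BY a.pair
    ORDER BY a.pair
    """
).fetchall()
print(pair_counts)
```

COUNT(p.pair) counts only matched rows, which is why BB is reported with 0 instead of being dropped.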
EDIT:
You can concatenate result as below:
...
, final AS
(
SELECT a.pair, result = COUNT(c.pair), rn = ROW_NUMBER() OVER(ORDER BY a.pair)
FROM all_pairs a
LEFT JOIN cte c
ON a.pair = c.pair
GROUP BY a.pair
)
SELECT rn, [result] = pair + ' = ' + CAST(result AS NVARCHAR(100))
FROM final

with cte as (
select 1 as id, 'A' as [1], 'A' as [2], 'A' as [3], 'B' as [4]
union all select 2 , 'A', NULL,NULL,NULL
union all select 3 , 'B', 'A','A',NULL
union all select 4 , 'B',NULL,NULL,NULL
)
, Vals as (
select 'AA' as Val
union all select 'AB'
union all select 'BB'
union all select 'BA'
)
, UNPVT as (
/*UNPIVOT to convert the columns to be rows*/
SELECT id , VAL + LEAD(VAL) OVER (PARTITION BY ID ORDER BY SEQ) as Code
FROM (
select ID,[1],[2],[3],[4] from cte
) P
UNPIVOT (Val FOR Seq IN ([1],[2],[3],[4])
) AS UNPVT
)
select Vals.Val, count(UNPVT.Code) from UNPVT right join Vals on UNPVT.Code = Vals.Val
group by Vals.Val
CTE: contains your data.
Vals: contains the returned code.
UnPVT: to convert the columns to be rows.

Select No Rows If Any Row Meets A Condition?

How can I select no rows if any row in the result set meets a certain condition?
For instance:
Id|SomeColumn|Indicator
1 | test | Y
1 | test1 | Y
1 | test2 | X
2 | test1 | Y
2 | test2 | Y
3 | test1 | Y
Say I wanted to select all rows where Id = 1 unless there is a row with an indicator = X
Currently I am doing something like this
SELECT * FROM SOMETABLE WHERE ID = 1 AND INDICATOR = 'Y' AND ID NOT IN (SELECT ID FROM SOMETABLE WHERE INDICATOR = 'X')
But that feels really clunky and I feel like there could be a better way to do this. Is there, or am I just being overly sensitive?

Something like this ?
SELECT *
FROM SOMETABLE
WHERE ID = 1
AND NOT EXISTS (SELECT 1 FROM SOMETABLE WHERE INDICATOR = 'X')
or, if you want the X to discriminate only on the same id:
SELECT *
FROM SOMETABLE t1
WHERE t1.ID = 1
AND NOT EXISTS (SELECT 1 FROM SOMETABLE t2 WHERE t2.ID = t1.ID AND t2.INDICATOR = 'X')
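
To illustrate the per-id variant, the sketch below runs a correlated NOT EXISTS on the question's sample rows using Python's sqlite3 module (an assumption made for a quick local check). The helper name rows_unless_flagged is hypothetical. The subquery compares the inner alias t2 with the outer alias t1, which is what restricts the X check to the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SOMETABLE (ID INTEGER, SomeColumn TEXT, Indicator TEXT)")
conn.executemany(
    "INSERT INTO SOMETABLE VALUES (?, ?, ?)",
    [(1, "test", "Y"), (1, "test1", "Y"), (1, "test2", "X"),
     (2, "test1", "Y"), (2, "test2", "Y"), (3, "test1", "Y")],
)

def rows_unless_flagged(wanted_id):
    """Return all rows for wanted_id, or nothing if any of its rows has Indicator 'X'."""
    return conn.execute(
        """
        SELECT t1.ID, t1.SomeColumn, t1.Indicator
        FROM SOMETABLE t1
        WHERE t1.ID = ?
          AND NOT EXISTS (SELECT 1 FROM SOMETABLE t2
                          WHERE t2.ID = t1.ID AND t2.Indicator = 'X')
        ORDER BY t1.SomeColumn
        """,
        (wanted_id,),
    ).fetchall()

flagged = rows_unless_flagged(1)    # ID 1 has an X row, so nothing comes back
unflagged = rows_unless_flagged(2)  # ID 2 has no X row, so both rows come back
print(flagged, unflagged)
```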

There are not too many options to do this. Another option is to use EXISTS.
SELECT *
FROM SOMETABLE s1
WHERE ID = 1 AND INDICATOR = 'Y'
AND NOT EXISTS (SELECT TOP 1 ID FROM SOMETABLE s2 WHERE s1.ID = s2.ID AND INDICATOR = 'X')

Another option, assuming the Indicator values have a reliable sort order in which every disqualifying value (here 'X') sorts before 'Y':
DECLARE @T TABLE
(
ID INT
, someColumn VARCHAR(5)
, Indicator CHAR(1)
)
INSERT INTO @T
( ID, someColumn, Indicator )
VALUES ( 1, 'test', 'Y' ),
( 1, 'test1', 'Y' ),
( 1, 'test2', 'X' ),
( 2, 'test1', 'Y' ),
( 2, 'test2', 'Y' ),
( 3, 'test1', 'Y' )
SELECT t.ID
, t.someColumn
, t.Indicator
FROM @T t
JOIN (SELECT ID
FROM @T t2
GROUP BY t2.ID
HAVING MIN(indicator) >= 'Y') q ON q.ID = t.ID
Not sure if it's any less clunky, but it may perform better since it's using positive exclusion rather than negative.

Is it possible to write a sql query that is grouped based on a running total of a column?

It would be easier to explain with an example. Suppose I wanted to get at most 5 items per group.
My input would be a table looking like this:
Item Count
A 2
A 3
A 3
B 4
B 4
B 5
C 1
And my desired output would look like this:
Item Count
A 5
A>5 3
B 4
B>5 9
C 1
An alternative output that I could also work with would be
Item Count RunningTotal
A 2 2
A 3 5
A 3 8
B 4 4
B 4 8
B 5 13
C 1 1
I can use ROW_NUMBER() to get the top X records in each group, however my requirement is to get the top X items for each group, not X records. My mind is drawing a blank as to how to do this.

declare @yourTable table (item char(1), [count] int)
insert into @yourTable
select 'A', 2 union all
select 'A', 3 union all
select 'A', 3 union all
select 'B', 4 union all
select 'B', 4 union all
select 'B', 5 union all
select 'C', 1
;with cte(item, count, row) as (
select *, row_number() over ( partition by item order by item, [count])
from @yourTable
)
select t1.Item, t1.Count, sum(t2.count) as RunningTotal from cte t1
join cte t2 on t1.item = t2.item and t2.row <= t1.row
group by t1.item, t1.count, t1.row
Result:
Item Count RunningTotal
---- ----------- ------------
A 2 2
A 3 5
A 3 8
B 4 4
B 4 8
B 5 13
C 1 1
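
As an alternative to the self-join, a windowed SUM can produce the same running total. This is a swapped-in technique, not the method used in the answer above, and the sketch assumes Python's sqlite3 module with SQLite 3.25 or later. The column is named cnt to avoid the reserved word, and SQLite's implicit rowid serves as a tiebreaker because the two A rows share the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("A", 2), ("A", 3), ("A", 3), ("B", 4), ("B", 4), ("B", 5), ("C", 1)],
)

# ROWS (not the default RANGE) framing keeps tied counts from sharing one total.
running = conn.execute(
    """
    SELECT item, cnt,
           SUM(cnt) OVER (PARTITION BY item
                          ORDER BY cnt, rowid
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM items
    ORDER BY item, cnt, rowid
    """
).fetchall()
print(running)
```

SQL Server 2012 and later accept the same OVER clause, although there the tiebreaker would have to be a real column since rowid is specific to SQLite.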

Considering the clarifications from your comment, you should be able to produce the second kind of output from your post by running this query:
select t.Item
, t.Count
, (select sum(tt.count)
from mytable tt
where t.item=tt.item and (tt.creating_user_priority < t.creating_user_priority or
( tt.creating_user_priority = t.creating_user_priority and tt.created_date < t.createdDate))
) as RunningTotal
from mytable t

declare @yourTable table (item char(1), [count] int)
insert into @yourTable
select 'A', 2 union all
select 'A', 3 union all
select 'A', 3 union all
select 'B', 4 union all
select 'B', 4 union all
select 'B', 5 union all
select 'C', 1
;with cte(item, count, row) as (
select *, row_number() over ( partition by item order by item, [count])
from @yourTable
)
)
select t1.row, t1.Item, t1.Count, sum(t2.count) as RunningTotal
into #RunTotal
from cte t1
join cte t2 on t1.item = t2.item and t2.row <= t1.row
group by t1.item, t1.count, t1.row
alter table #RunTotal
add GrandTotal int
update rt
set GrandTotal = gt.Total
from #RunTotal rt
left join (
select Item, sum(Count) Total
from #RunTotal rt
group by Item) gt
on rt.Item = gt.Item
select Item, max(RunningTotal)
from #RunTotal
where RunningTotal <= 5
group by Item
union
select a.Item + '>5', total - five
from (
select Item, max(GrandTotal) total
from #RunTotal
where GrandTotal > 5
group by Item
) a
left join (
select Item, max(RunningTotal) five
from #RunTotal
where RunningTotal <= 5
group by Item
) b
on a.Item = b.Item
I've updated the accepted answer and got your desired result.

SELECT Item, SUM(Count)
FROM mytable t
GROUP BY Item
HAVING SUM(Count) <=5
UNION
SELECT Item, 5
FROM mytable t
GROUP BY Item
HAVING SUM(Count) >5
UNION
SELECT t2.Item + '>5', Sum(t2.Count) - 5
FROM mytable t2
GROUP BY Item
HAVING SUM(Count) > 5
ORDER BY 1, 2

select 'A' as Name, 2 as Cnt
into #tmp
union all select 'A',3
union all select 'A',3
union all select 'B',4
union all select 'B',4
union all select 'B',5
union all select 'C',1
select Name, case when sum(cnt) > 5 then 5 else sum(cnt) end Cnt
from #tmp
group by Name
union
select Name+'>5', sum(cnt)-5 Cnt
from #tmp
group by Name
having sum(cnt) > 5
Here is what I have so far. I know it's not complete but... this should be a good starting point.
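
The desired output in the question keeps whole rows together (B stays at 4 because adding the next 4 would exceed 5), which a plain cap on the group sum does not reproduce. The sketch below shows that rule in plain Python, outside the database; the function name split_by_limit is hypothetical, and it assumes the rows arrive already sorted by item and in the order they should be consumed.

```python
from itertools import groupby

rows = [("A", 2), ("A", 3), ("A", 3), ("B", 4), ("B", 4), ("B", 5), ("C", 1)]

def split_by_limit(rows, limit):
    """Per item: the largest running total not exceeding limit, then the remainder."""
    result = []
    for item, group in groupby(rows, key=lambda r: r[0]):  # rows must be sorted by item
        within = running = 0
        for _, count in group:
            running += count
            if running <= limit:
                within = running  # last running total that still fits
        result.append((item, within))
        if running > limit:
            result.append((f"{item}>{limit}", running - within))
    return result

summary = split_by_limit(rows, 5)
print(summary)
```

An item whose very first row already exceeds the limit comes out with 0 in its first line here; whether that row should be shown at all is a choice the question leaves open.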

I can get your second output by using a temp table and an update pass:
DECLARE @Data TABLE
(
ID INT IDENTITY(1,1) PRIMARY KEY
,Value VARCHAR(5)
,Number INT
,Total INT
)
INSERT INTO @Data (Value, Number) VALUES ('A',2)
INSERT INTO @Data (Value, Number) VALUES ('A',3)
INSERT INTO @Data (Value, Number) VALUES ('A',3)
INSERT INTO @Data (Value, Number) VALUES ('B',4)
INSERT INTO @Data (Value, Number) VALUES ('B',4)
INSERT INTO @Data (Value, Number) VALUES ('B',5)
INSERT INTO @Data (Value, Number) VALUES ('C',1)
DECLARE
@Value VARCHAR(5)
,@Count INT
UPDATE @Data
SET
@Count = Total = CASE WHEN Value = @Value THEN Number + @Count ELSE Number END
,@Value = Value
FROM @Data AS D
SELECT
Value
,Number
,Total
FROM @Data
There may be better ways, but this should work.

Problem in counting nulls and then merging them with the existing rows

Input:
ID groupId RowID Data
1 1 1 W
2 1 1 NULL
3 1 1 NULL
4 1 1 Z
5 1 2 NULL
6 1 2 NULL
7 1 2 X
8 1 2 NULL
9 1 3 NULL
10 1 3 NULL
11 1 3 Y
12 1 3 NULL
Expected Output
GroupId NewData
1 2Y1,2X1,W2Z
For every Null there will be a numeric count. That is if there are two nulls then the numeric value will be 2.
The ddl is as under
DECLARE @t TABLE(ID INT IDENTITY(1,1) , GroupId INT, RowID INT, Data VARCHAR(10))
INSERT INTO @t (GroupId, RowID,DATA)
SELECT 1,1,'W' UNION ALL SELECT 1,1,NULL UNION ALL SELECT 1,1,NULL UNION ALL SELECT 1,1,'Z' UNION ALL SELECT 1,2,NULL UNION ALL
SELECT 1,2,NULL UNION ALL SELECT 1,2,'X' UNION ALL SELECT 1,2,NULL UNION ALL SELECT 1,3,NULL UNION ALL SELECT 1,3,NULL UNION ALL
SELECT 1,3,'Y' UNION ALL SELECT 1,3,NULL
select * from @t
My version is below, but it does not produce the correct output
;with t as (
select GroupID, id, RowID, convert(varchar(25), case when Data is null then '' else Data end) Val,
case when Data is null then 1 else 0 end NullCount from @t where id = 1
union all
select t.GroupID, a.id,a.RowID, convert(varchar(25), Val +
case when Data is not null or (t.RowID <> a.RowID and NullCount > 0) then ltrim(NullCount) else '' end +
case when t.RowID <> a.RowID then ',' else '' end + isnull(Data, '')),
case when Data is null then NullCount + 1 else 0 end NullCount
from t inner join @t a on t.GroupID = a.GroupID and t.id + 1 = a.id
)
select GroupID, Data = Val + case when NullCount > 0 then ltrim(NullCount) else '' end from t
where id = (select max(id) from @t where GroupID = t.GroupId)
Is yielding the below output
GroupID Data
1 W2Z,2X1,3Y1
Please help me out
Thanks in advance

Kind of messy and most likely can be improved
;With RawData AS
(
select * from @t
)
,Ranked1 as
(
select *, RANK() OVER (PARTITION BY GroupId, RowID ORDER BY ID, GroupId, RowID) R from @t
)
,Ranked2 as
(
select *, R - RANK() OVER (PARTITION BY GroupId, RowID ORDER BY ID, GroupId, RowID) R2 from Ranked1
where Data is null
)
,Ranked3 as
(
select MIN(ID) as MinID, GroupId, RowID, R2, COUNT(*) C2 from Ranked2
group by GroupId, RowID, R2
)
,Ranked4 as
(
select RD.ID, RD.GroupId, RD.RowID, ISNULL(Data, C2) as C3 from RawData RD
left join Ranked3 R3 on RD.ID = R3.MinID and RD.GroupId = R3.GroupId and RD.RowID = R3.RowID
where ISNULL(Data, C2) is not null
)
,Grouped as
(
select GroupId, RowID,
(
select isnull(C3, '') from Ranked4 as R41
where R41.GroupId = R42.GroupId and R41.RowID = R42.RowID
order by GroupId, RowID for xml path('')
) as C4
from Ranked4 as R42
group by GroupId, RowID
)
select GroupId,
stuff((
select ',' + C4 from Grouped as G1
where G1.GroupId = G2.GroupId
order by GroupId for xml path('')
), 1, 1, '')
from Grouped G2
group by GroupId
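
For comparison, the rule itself (replace each run of consecutive NULLs with its length, per RowID) is short when written outside SQL. The sketch below is plain Python, not a translation of the query above; the function names are hypothetical, and joining the rows in descending RowID order is an assumption read off the expected output in the question.

```python
from itertools import groupby

# (GroupId, RowID, Data) in ID order, as in the question's sample.
data = [
    (1, 1, "W"), (1, 1, None), (1, 1, None), (1, 1, "Z"),
    (1, 2, None), (1, 2, None), (1, 2, "X"), (1, 2, None),
    (1, 3, None), (1, 3, None), (1, 3, "Y"), (1, 3, None),
]

def encode_row(values):
    """Replace each run of consecutive None values with the run length."""
    parts = []
    for is_null, run in groupby(values, key=lambda v: v is None):
        run = list(run)
        parts.append(str(len(run)) if is_null else "".join(run))
    return "".join(parts)

def encode_groups(rows):
    """Encode every RowID of each GroupId and join them, highest RowID first."""
    out = {}
    for group_id, group_rows in groupby(rows, key=lambda r: r[0]):
        encoded = [
            (row_id, encode_row([d for _, _, d in row_rows]))
            for row_id, row_rows in groupby(group_rows, key=lambda r: r[1])
        ]
        encoded.sort(key=lambda pair: pair[0], reverse=True)
        out[group_id] = ",".join(text for _, text in encoded)
    return out

encoded_groups = encode_groups(data)
print(encoded_groups)
```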
