SQL: Remove duplicates based on a criterion - sql

I'm mostly new to SQL, so I don't know much about the advanced options it provides. I currently work with MS SQL Server 2016 (Developer edition).
I have the following result:
| Type | Role | GUID |
|--------|--------|--------------------------------------|
| B | 0 | ABC |
| B | 0 | KLM |
| A | 0 | CDE |
| A | 0 | EFG |
| A | 1 | CDE |
| B | 1 | ABC |
| B | 1 | GHI |
| B | 1 | IJK |
| B | 1 | KLM |
From the following SELECT :
SELECT DISTINCT
Type,
Role,
GUID
I'm looking to count the GUIDs under these constraints:
-> If there are multiple rows with the same GUID, only count the row with "Role" set to 1; otherwise, count the one with "Role" set to 0.
-> If there is only one row, count it as either "Role 0" or "Role 1", according to its own Role value.
My objective is to get the following result :
| Type | Role | COUNT(GUID) |
|--------|--------|--------------------------------------|
| A | 0 | 1 | => counted EFG as there was no other row with a "Role" set to 1
| A | 1 | 1 | => counted CDE with "Role" set to 1, but the row with "Role" set to 0 is ignored
| B | 1 | 4 |

Your query is not implementing the logic you mention. Here is a method that uses subqueries and window functions:
select type, role, count(*)
from (select t.*,
count(*) over (partition by GUID) as guid_cnt
from t
) t
-- keep a GUID's only row whatever its role, or the role = 1 row when it has several
where guid_cnt = 1 or role = 1
group by type, role;
The subquery gets the count of rows that match a GUID. The outer where then uses that for filtering according to your conditions.
Note: role is not a good choice for a column name. It is a keyword and may be reserved in the future.

A NOT EXISTS could be used for this.
For example:
declare @T table ([Type] char(1), [Role] int, [GUID] varchar(3));
insert into @T ([Type], [Role], [GUID]) values
('A',0,'CDE'),
('A',0,'EFG'),
('A',1,'CDE'),
('B',0,'ABC'),
('B',0,'KLM'),
('B',1,'ABC'),
('B',1,'GHI'),
('B',1,'IJK'),
('B',1,'KLM');
select [Type], [Role], COUNT(DISTINCT [GUID]) as TotalUniqueGuid
from @T t
where not exists (
    select 1
    from @T t1
    where t.[Type] = t1.[Type]
      and t.[Role] = 0 and t1.[Role] > 0
      and t.[GUID] = t1.[GUID]
)
group by [Type], [Role];
Returns:
Type Role TotalUniqueGuid
A 0 1
A 1 1
B 1 4
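To try the NOT EXISTS approach without a SQL Server instance, here is a minimal sketch that replays the same sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name t and the lowercase column names are assumptions made for this illustration, and SQLite stands in for SQL Server here, so treat it as a check of the logic rather than of T-SQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT, role INTEGER, guid TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", 0, "CDE"), ("A", 0, "EFG"), ("A", 1, "CDE"),
        ("B", 0, "ABC"), ("B", 0, "KLM"), ("B", 1, "ABC"),
        ("B", 1, "GHI"), ("B", 1, "IJK"), ("B", 1, "KLM"),
    ],
)

# A role = 0 row is dropped when the same type/guid also has a row with role > 0.
rows = conn.execute(
    """
    SELECT type, role, COUNT(DISTINCT guid)
    FROM t
    WHERE NOT EXISTS (
        SELECT 1
        FROM t AS t1
        WHERE t.type = t1.type
          AND t.role = 0 AND t1.role > 0
          AND t.guid = t1.guid
    )
    GROUP BY type, role
    ORDER BY type, role
    """
).fetchall()

for row in rows:
    print(row)
```

The three printed tuples should match the Type/Role/TotalUniqueGuid rows listed above.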

Related

Count amount of same value

I have a simple task which, to be honest, I have no idea how to accomplish. I have these values from a SQL query:
| DocumentNumber | CustomerID |
------------------------------
| AAA | 1 |
| BBB | 1 |
| CCC | 2 |
| DDD | 3 |
-------------------------------
I would like to display a bit modified table like this:
| DocumentNumber | CustomerID | Repeate |
-----------------------------------------
| AAA | 1 | Multiple |
| BBB | 1 | Multiple |
| CCC | 2 | Single |
| DDD | 3 | Single |
------------------------------------------
So, the idea is simple: I need to append a new column and set it to 'Multiple' or 'Single',
depending on whether the customer ID appears multiple times.
Use window functions:
select t.*,
(case when count(*) over (partition by CustomerId) = 1 then 'Single'
else 'Multiple'
end) as repeate
from t;
You can also achieve the same result using GROUP BY and a subquery:
DECLARE @T TABLE(
    DocumentNumber VARCHAR(10),
    CustomerID INT)
INSERT INTO @T VALUES ('AAA', 1), ('BBB', 1), ('CCC', 2), ('DDD', 3)
SELECT M.DocumentNumber, M.CustomerID,
       CASE WHEN Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
FROM @T M
LEFT JOIN (SELECT CustomerID, COUNT(*) AS Repeated_Row FROM @T GROUP BY CustomerID) S ON S.CustomerID = M.CustomerID
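As a quick check of the GROUP BY plus join idea, the sketch below runs an equivalent query against the four sample rows in an in-memory SQLite database via Python's sqlite3 module. The table name docs and the alias names are assumptions for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (DocumentNumber TEXT, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("AAA", 1), ("BBB", 1), ("CCC", 2), ("DDD", 3)],
)

# Count rows per customer once, then join the count back onto every row.
labelled = conn.execute(
    """
    SELECT m.DocumentNumber,
           m.CustomerID,
           CASE WHEN s.Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
    FROM docs AS m
    JOIN (
        SELECT CustomerID, COUNT(*) AS Repeated_Row
        FROM docs
        GROUP BY CustomerID
    ) AS s ON s.CustomerID = m.CustomerID
    ORDER BY m.DocumentNumber
    """
).fetchall()

for row in labelled:
    print(row)
```

Every customer in the outer table also appears in the grouped subquery, so an inner join and the LEFT JOIN used above give the same rows here.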

SQL group by under some conditions

I have a big table with tons of duplicated rows (among those columns that I care about). Let me start with the following example:
|field1 | field2| field3| field4| field5|
| aa | 1 | NULL | 1 | 0 |
| aaa | 1 | NULL | 1 | 1 |
| aaa | 1 | NULL | 1 | 2 |
| a | 2 | 0 | 1 | 3 |
| a | 2 | 0 | NULL | 4 |
| a | 2 | NULL | 2 | 5 |
| b | 3 | NULL | 2 | 6 |
| b2 | 3 | NULL | NULL | 7 |
| c | 4 | NULL | NULL | 8 |
I am interested in an efficient query to get the following table:
|field1 | field2| field3| field4|
| aaa | 1 | NULL | 1 |
| a | 2 | 0 | 1 |
| b | 3 | NULL | 2 |
| c | 4 | NULL | NULL |
Basically, it follows the following rules:
For each value of field2, there should be exactly one row present.
Among all the rows with the same value of field2, select the row that satisfies the following, in order:
Prefer a row where field4 is not NULL (if possible).
Among those with a non-NULL field4, prefer a row with a non-NULL field3.
Among those with non-NULL field4 and field3, prefer the row with the longest string value in field1.
Among those that satisfy all of the above, select only one row (the value of field5 does not matter).
I could do it with a bunch of joins, but it becomes very slow. Any better suggestions?
EDIT
The field2 values may not be in any specific order. I just put 1,2,3,4 in the example, but this is not generally true in my case. I did not change it directly in the table since one of the suggested solutions actually assumes sequential values for field2, so I kept it for future readers who may be interested in that.
This type of prioritization is challenging. I think the simplest method in MySQL uses variables:
select t.*
from (select t.*,
             (@rn := if(@f2 = field2, @rn + 1,
                        if(@f2 := field2, 1, 1)
                       )
             ) as seqnum
      from t cross join
           (select @rn := 0, @f2 := '') params
      order by field2,
               (field4 is not null) desc,
               (field3 is not null) desc,
               length(field1) desc
     ) t
where seqnum = 1;
I'm not 100% sure I have the conditions right (the third seems to conflict with the first two). But whatever the prioritization, the idea is the same: use order by to get the rows in the right order and use variables to get the first one.
EDIT:
In SQL Server -- or any other reasonable database -- you do this with row_number():
select t.*
from (select t.*,
row_number() over (partition by field2
order by (case when field4 is not null then 0 else 1 end),
(case when field3 is not null then 0 else 1 end),
len(field1) desc
) as seqnum
from t
) t
where seqnum = 1;
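The ordering used in these queries can be checked against the question's sample rows without a database. The sketch below is plain Python, not SQL: it expresses the same three-level priority as a sort key and keeps the best row per field2. The names priority, by_field2 and picked are made up for this illustration.

```python
from itertools import groupby

# (field1, field2, field3, field4, field5) rows from the question
rows = [
    ("aa", 1, None, 1, 0),
    ("aaa", 1, None, 1, 1),
    ("aaa", 1, None, 1, 2),
    ("a", 2, 0, 1, 3),
    ("a", 2, 0, None, 4),
    ("a", 2, None, 2, 5),
    ("b", 3, None, 2, 6),
    ("b2", 3, None, None, 7),
    ("c", 4, None, None, 8),
]


def priority(row):
    """Smaller tuples win: non-NULL field4, then non-NULL field3, then longest field1."""
    field1, _field2, field3, field4, _field5 = row
    return (field4 is None, field3 is None, -len(field1))


def by_field2(row):
    return row[1]


# Equivalent in spirit to "partition by field2 ... where seqnum = 1".
picked = [
    min(group, key=priority)
    for _, group in groupby(sorted(rows, key=by_field2), key=by_field2)
]

for row in picked:
    print(row[:4])
```

When several rows tie on the key, min keeps the first one it sees, which mirrors the "select only one row, field5 does not matter" rule.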

Set-based way to calculate family ranges in SQL?

I have a table that contains parents and 0 or more children for each parent, with a flag indicating which records are parents. All of the members of a given family have the same parent id, and the parent always has the lowest id in a given family. Also, each child has a value associated with it. (Specifically, this is a database of emails and attachments, where each parent is an email and the children are the attachments.)
I have two fields I need to calculate:
Range = {lowest id in family} - {highest id in family} [populated for all members]
Value-list = {delimited list of the values of each child, in id order} [only for parent]
So, given this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | |
2 | 1 | 0 | a | |
3 | 1 | 0 | b | |
4 | 4 | 1 | | |
5 | 4 | 0 | c | |
6 | 6 | 0 | | |
I would like to end up with this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | 1-3 | a;b
2 | 1 | 0 | a | 1-3 |
3 | 1 | 0 | b | 1-3 |
4 | 4 | 1 | | 4-5 | c
5 | 4 | 0 | c | 4-5 |
6 | 6 | 0 | | 6-6 |
How can I do this efficiently? Ideally, I'd like to do this with just set-based logic, without cursors, or even stored procedures. Temporary tables are fine.
I'm working in T-SQL, if that makes a difference, though I'd be curious to see platform agnostic answers.
The following SQLFiddle solution should do the job for you; however, as @Allan mentioned, you might want to revise your database structure.
Using CTEs:
Note: my query uses table1 as the name of your table.
with cte as(
select parent
,ValueList= stuff(( select ';' +isnull(t2.Value, '')
from table1 t2
where t1.parent=t2.parent
order by t2.value
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
from table1 t1
group by parent
),
cte2 as (select parent
, min(id) as firstID
, max(id) as LastID
from table1
group by parent)
select *
,cast((select FirstID from cte2 t2 where t2.parent=t1.parent) as varchar(20))+'-'+cast((select LastID from cte2 t2 where t2.parent=t1.parent) as varchar(20)) as [Range]
,(select ValueList from cte t2 where t1.parent=t2.parent and t1.[haschildren]='1') as [Value -List]
from table1 t1
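For readers who want to check the expected Range and Value-list columns, here is a rough sketch using Python's sqlite3 module rather than T-SQL. It is not the answer's FOR XML PATH technique: the range comes from a grouped MIN/MAX query, and the value list is assembled in Python from rows read in id order. The dictionary and variable names are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, parent INTEGER, haschildren INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, None), (2, 1, 0, "a"), (3, 1, 0, "b"),
        (4, 4, 1, None), (5, 4, 0, "c"),
        (6, 6, 0, None),
    ],
)

# Lowest and highest id per family, formatted as "low-high".
ranges = {
    parent: f"{low}-{high}"
    for parent, low, high in conn.execute(
        "SELECT parent, MIN(id), MAX(id) FROM table1 GROUP BY parent"
    )
}

# Child values per family, collected in id order.
value_lists = {}
for parent, value in conn.execute(
    "SELECT parent, value FROM table1 WHERE value IS NOT NULL ORDER BY parent, id"
):
    value_lists.setdefault(parent, []).append(value)

result = []
for id_, parent, has_children in conn.execute(
    "SELECT id, parent, haschildren FROM table1 ORDER BY id"
):
    # Only rows flagged as having children carry the delimited list.
    value_list = ";".join(value_lists.get(parent, [])) if has_children else ""
    result.append((id_, ranges[parent], value_list))

for row in result:
    print(row)
```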

Select query where record count = 2 and column contains either two values

Example 1
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 12345 | A |
| 23456 | 12345 | B |
| 34567 | 12345 | C |
+--------------------------+
Example 2
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 55555 | A |
| 23456 | 55555 | B |
+--------------------------+
I'm trying to write a select query that shows all records where the CURRENT count = 2 and SOURCE contains both A and B (NOT C).
Example 1 should not show up, as there are 3 entries for the CURRENT because a record is linked to SOURCE C.
Example 2 is what I want the query to find: CURRENT has two records and is only linked to SOURCE 'A' and 'B'.
Currently, if I run something similar to "where SOURCE = A or SOURCE = B", the results include records that have only SOURCE A, or A+C.
NOTES: IDENT is always a unique value. CURRENT links multiple IDENTs from different SOURCEs.
We're clearly missing more information. Let's take example data (thanks gloomy for the initial fiddle).
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 1 | 111 | A |
| 2 | 111 | B |
| 3 | 111 | C |
| 4 | 222 | A |
| 5 | 222 | B |
| 6 | 333 | A |
| 7 | 333 | C |
| 8 | 444 | B |
| 9 | 444 | C |
| 10 | 555 | B |
| 11 | 666 | A |
| 12 | 666 | A |
| 13 | 666 | B |
| 14 | 777 | A |
| 15 | 777 | A |
I assume you only need this as the result:
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 4 | 222 | A |
| 5 | 222 | B |
This query will work with any number of sources and produce the expected output:
SELECT * FROM test
WHERE CURRENT IN (
SELECT CURRENT FROM test
WHERE CURRENT NOT IN (
SELECT CURRENT FROM test
WHERE SOURCE NOT IN ('A', 'B')
)
GROUP BY CURRENT
HAVING count(SOURCE) = 2 AND count(DISTINCT SOURCE) = 2
)
If SOURCE values are guaranteed to be unique per CURRENT:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique per CURRENT but CURRENTs with duplicate entries of 'A' or 'B' are allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique and groups with duplicate SOURCE entries aren't allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
Every query returns only distinct CURRENT values matching the requirements. Use the query as a derived dataset and join it back to your table to get the details.
All the above options assume that either SOURCE is a NOT NULL column or that NULLs can just be disregarded.
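To see how the strictest of these HAVING variants behaves, the sketch below loads the 15-row sample from the earlier answer into an in-memory SQLite database through Python's sqlite3 module. The column is named cur_id instead of CURRENT purely to avoid keyword clashes in this illustration; that name and the table name test are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, cur_id INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, 111, "A"), (2, 111, "B"), (3, 111, "C"),
        (4, 222, "A"), (5, 222, "B"),
        (6, 333, "A"), (7, 333, "C"),
        (8, 444, "B"), (9, 444, "C"),
        (10, 555, "B"),
        (11, 666, "A"), (12, 666, "A"), (13, 666, "B"),
        (14, 777, "A"), (15, 777, "A"),
    ],
)

# Exactly two rows, two distinct sources, and both of them from ('A', 'B').
matching = [
    cur_id
    for (cur_id,) in conn.execute(
        """
        SELECT cur_id
        FROM test
        GROUP BY cur_id
        HAVING COUNT(source) = 2
           AND COUNT(DISTINCT source) = 2
           AND COUNT(DISTINCT CASE WHEN source IN ('A', 'B') THEN source END) = 2
        ORDER BY cur_id
        """
    )
]

print(matching)
```

Only group 222 passes all three conditions; 666 fails the row count and 777 fails the distinct count.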
Records where current count = 2:
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
Records where C is in SOURCE values:
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
Global query:
SELECT t.*
FROM TABLE t
WHERE t.CURRENT IN (
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
) AND t.CURRENT NOT IN (
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
)
http://sqlfiddle.com/#!2/69be9/8/0
select * from test where current in (
select test_a.current
from
(select *
from test
where source = 'A') as test_a
join (select *
from test
where source = 'B') as test_b
on test_b.current = test_a.current
where test_a.current not in
(select current from test where source='C')
)
SELECT *
FROM TABLE mainTbl,
(SELECT CURRENT
FROM TABLE
WHERE source IN ('A', 'B')
GROUP BY CURRENT
HAVING COUNT(1) = 2
) selectedSet
WHERE mainTbl.current = selectedSet.current
AND mainTbl.source IN ('A', 'B');

How to write a Sql query to find distinct values that have never met the following "Where Not(a=x and b=x)"

I have the following table called Attributes
* AttId * CustomerId * Class * Code *
| 1 | 1 | 1 | AA |
| 2 | 1 | 1 | AB |
| 3 | 1 | 1 | AC |
| 4 | 1 | 2 | AA |
| 5 | 1 | 2 | AB |
| 6 | 1 | 3 | AB |
| 7 | 2 | 1 | AA |
| 8 | 2 | 1 | AC |
| 9 | 2 | 2 | AA |
| 10 | 3 | 1 | AB |
| 11 | 3 | 3 | AB |
| 12 | 4 | 1 | AA |
| 13 | 4 | 2 | AA |
| 14 | 4 | 2 | AB |
| 15 | 4 | 3 | AB |
Where each Class, Code pairing represents a specific Attribute.
I'm trying to write a query that returns all customers that are NOT linked to the Attribute pairing Class = 1, Code = AB.
This would return Customer Id values 2 and 4.
I started to write Select Distinct A.CustomerId From Attributes A Where (A.Class = 1 and A.Code = 'AB') but stopped when I realised I was writing a SQL query and there is not an operator available to place before the parentheses to indicate the clause within must Not be met.
What am I missing? Or which operator should I be looking at?
Edit:
I'm trying to write a query that only returns those Customers (ie distinct Customer Id's) that have NO link to the Attribute pairing Class = 1, Code = AB.
This could only be Customer Id values 2 and 4 as the table does Not contain the rows:
* AttId * CustomerId * Class * Code *
| x | 2 | 1 | AB |
| x | 4 | 1 | AB |
Changed Title from:
How to write "Where Not(a=x and b=x)"in Sql Query
To:
How to write a Sql query to find distinct values that have never met the following "Where Not(a=x and b=x)"
As the previous title was a question in its own right; however, the detail of the question added an extra dimension, which led to confusion.
One way would be
SELECT DISTINCT CustomerId FROM Attributes a
WHERE NOT EXISTS (
SELECT * FROM Attributes forbidden
WHERE forbidden.CustomerId = a.CustomerId AND forbidden.Class = _forbiddenClassValue_ AND forbidden.Code = _forbiddenCodeValue_
)
or with join
SELECT DISTINCT a.CustomerId FROM Attributes a
LEFT JOIN (
SELECT CustomerId FROM Attributes
WHERE Class = _forbiddenClassValue_ AND Code = _forbiddenCodeValue_
) havingForbiddenPair ON a.CustomerId = havingForbiddenPair.CustomerId
WHERE havingForbiddenPair.CustomerId IS NULL
Yet another way is to use EXCEPT, as per ypercube's answer
SELECT CustomerId
FROM Attributes
EXCEPT
SELECT CustomerId
FROM Attributes
WHERE Class = 1
AND Code = 'AB' ;
Since no one has posted the simple logical statement, here it is:
select . . .
where A.Class <> 1 OR A.Code <> 'AB'
The negative of (X and Y) is (not X or not Y).
I see, this is a grouping thing. For this, you use aggregation and having:
select customerId
from Attributes a
group by CustomerId
having sum(case when A.Class = 1 and A.Code = 'AB' then 1 else 0 end) = 0
I always prefer to solve "is it in a set" type questions using this technique.
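The difference between negating the condition per row and testing it per customer is easy to see on the question's data. The sketch below, using Python's sqlite3 module with an in-memory database, runs both forms side by side; the variable names row_level and group_level are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Attributes (AttId INTEGER, CustomerId INTEGER, Class INTEGER, Code TEXT)"
)
conn.executemany(
    "INSERT INTO Attributes VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "AA"), (2, 1, 1, "AB"), (3, 1, 1, "AC"),
        (4, 1, 2, "AA"), (5, 1, 2, "AB"), (6, 1, 3, "AB"),
        (7, 2, 1, "AA"), (8, 2, 1, "AC"), (9, 2, 2, "AA"),
        (10, 3, 1, "AB"), (11, 3, 3, "AB"),
        (12, 4, 1, "AA"), (13, 4, 2, "AA"), (14, 4, 2, "AB"), (15, 4, 3, "AB"),
    ],
)

# Row-level negation: keeps any customer that has at least one other row.
row_level = [
    customer_id
    for (customer_id,) in conn.execute(
        """
        SELECT DISTINCT CustomerId
        FROM Attributes
        WHERE NOT (Class = 1 AND Code = 'AB')
        ORDER BY CustomerId
        """
    )
]

# Group-level test: keeps only customers with zero matching rows.
group_level = [
    customer_id
    for (customer_id,) in conn.execute(
        """
        SELECT CustomerId
        FROM Attributes
        GROUP BY CustomerId
        HAVING SUM(CASE WHEN Class = 1 AND Code = 'AB' THEN 1 ELSE 0 END) = 0
        ORDER BY CustomerId
        """
    )
]

print(row_level)
print(group_level)
```

The row-level form still returns customers 1 and 3 because they own other rows that pass the filter, while the aggregated form returns only 2 and 4.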
Select Distinct A.CustomerId From Attributes A Where not (A.Class = 1 and A.Code = 'AB')
Try this:
SELECT DISTINCT A.CustomerId From Attributes A Where
0 = CASE
WHEN A.Class = 1 and A.Code = 'AB' THEN 1
ELSE 0
END
Edit: of course this still gives you cust 1 (doh!), you should probably use pjotrs NOT EXISTS query ideally, serves me right for not looking at the data closely enough :)