Count occurrences of the same value - SQL

I have a simple task which, to be honest, I have no idea how to accomplish. I have these values from an SQL query:
| DocumentNumber | CustomerID |
|----------------|------------|
| AAA            | 1          |
| BBB            | 1          |
| CCC            | 2          |
| DDD            | 3          |
I would like to display a slightly modified table like this:
| DocumentNumber | CustomerID | Repeate  |
|----------------|------------|----------|
| AAA            | 1          | Multiple |
| BBB            | 1          | Multiple |
| CCC            | 2          | Single   |
| DDD            | 3          | Single   |
So, the idea is simple: I need to append a new column and set its value to 'Multiple' or 'Single' depending on whether the CustomerID appears multiple times.

Use window functions:

select t.*,
       (case when count(*) over (partition by CustomerId) = 1 then 'Single'
             else 'Multiple'
        end) as repeate
from t;
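Since SQLite (3.25+) supports the same window function, the logic can be sanity-checked from Python's built-in sqlite3 module; the table name `t` and the sample rows come from the question, everything else is test scaffolding:

```python
import sqlite3

# In-memory table with the sample data from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (DocumentNumber TEXT, CustomerID INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("AAA", 1), ("BBB", 1), ("CCC", 2), ("DDD", 3)])

# COUNT(*) OVER a per-customer partition: one occurrence -> 'Single',
# more than one -> 'Multiple'
rows = conn.execute("""
    SELECT t.*,
           CASE WHEN COUNT(*) OVER (PARTITION BY CustomerID) = 1
                THEN 'Single' ELSE 'Multiple' END AS Repeate
    FROM t
    ORDER BY DocumentNumber
""").fetchall()
print(rows)
```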

You can also achieve the same result using GROUP BY and a subquery:

DECLARE @T TABLE(
    DocumentNumber VARCHAR(10),
    CustomerID INT)

INSERT INTO @T VALUES ('AAA', 1), ('BBB', 1), ('CCC', 2), ('DDD', 3)

SELECT M.DocumentNumber, M.CustomerID,
       CASE WHEN Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
FROM @T M
LEFT JOIN (SELECT CustomerID, COUNT(*) AS Repeated_Row
           FROM @T GROUP BY CustomerID) S ON S.CustomerID = M.CustomerID
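The GROUP BY variant can be checked the same way in SQLite via Python's sqlite3 module (a sketch: the per-customer counts are pre-aggregated, then joined back to the detail rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (DocumentNumber TEXT, CustomerID INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("AAA", 1), ("BBB", 1), ("CCC", 2), ("DDD", 3)])

# Aggregate once per customer, then join the count back to every row
rows = conn.execute("""
    SELECT m.DocumentNumber, m.CustomerID,
           CASE WHEN s.Repeated_Row > 1 THEN 'Multiple' ELSE 'Single' END AS Repeate
    FROM t m
    LEFT JOIN (SELECT CustomerID, COUNT(*) AS Repeated_Row
               FROM t GROUP BY CustomerID) s
           ON s.CustomerID = m.CustomerID
    ORDER BY m.DocumentNumber
""").fetchall()
print(rows)
```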

SQL: Remove duplicates based on a criterion

I'm mostly new to SQL, so I don't know a lot about all the advanced options it provides. I currently work with MS SQL Server 2016 (Developer edition).
I have the following result:
| Type | Role | GUID |
|--------|--------|--------------------------------------|
| B | 0 | ABC |
| B | 0 | KLM |
| A | 0 | CDE |
| A | 0 | EFG |
| A | 1 | CDE |
| B | 1 | ABC |
| B | 1 | GHI |
| B | 1 | IJK |
| B | 1 | KLM |
From the following SELECT:
SELECT DISTINCT
Type,
Role,
GUID
I'm looking to count the GUIDs under these constraints:
-> If there are multiple rows with the same GUID, only count the row with "Role" set to "1"; otherwise, count the one with "Role" set to 0.
-> If there is only one row, count it as either "Role 0" or "Role 1", according to its own Role value.
My objective is to get the following result:
| Type | Role | COUNT(GUID) |
|--------|--------|--------------------------------------|
| A | 0 | 1 | => counted EFG as there was no other row with a "Role" set to 1
| A | 1 | 1 | => counted CDE with "Role" set to 1, but the row with "Role" set to 0 is ignored
| B | 1 | 4 |
Your query is not implementing the logic you mention. Here is a method that uses subqueries and window functions:
select type, role, count(*)
from (select t.*,
             count(*) over (partition by GUID) as guid_cnt
      from t
     ) t
where (guid_cnt > 1 and role = 1) or
      guid_cnt = 1
group by type, role;
The subquery gets the count of rows that match a GUID. The outer where then uses that for filtering according to your conditions.
Note: role is not a good choice for a column name. It is a keyword (see here) and may be reserved in the future (see here).
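The desired counts can be reproduced in SQLite from Python's sqlite3 module; this is a sketch with the question's data, where the singleton arm counts a one-off GUID under whichever Role it has (that is what yields the B/1 count of 4):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Type TEXT, Role INTEGER, GUID TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("B", 0, "ABC"), ("B", 0, "KLM"), ("A", 0, "CDE"),
                  ("A", 0, "EFG"), ("A", 1, "CDE"), ("B", 1, "ABC"),
                  ("B", 1, "GHI"), ("B", 1, "IJK"), ("B", 1, "KLM")])

# Attach each GUID's row count, keep Role-1 rows of duplicated GUIDs
# plus every GUID that occurs only once, then count per (Type, Role)
rows = conn.execute("""
    SELECT Type, Role, COUNT(*)
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY GUID) AS guid_cnt
          FROM t) t
    WHERE (guid_cnt > 1 AND Role = 1) OR guid_cnt = 1
    GROUP BY Type, Role
    ORDER BY Type, Role
""").fetchall()
print(rows)
```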
A NOT EXISTS could be used for this.
For example:
declare @T table ([Type] char(1), [Role] int, [GUID] varchar(3));

insert into @T ([Type], [Role], [GUID]) values
('A', 0, 'CDE'),
('A', 0, 'EFG'),
('A', 1, 'CDE'),
('B', 0, 'ABC'),
('B', 0, 'KLM'),
('B', 1, 'ABC'),
('B', 1, 'GHI'),
('B', 1, 'IJK'),
('B', 1, 'KLM');

select [Type], [Role], COUNT(DISTINCT [GUID]) as TotalUniqueGuid
from @T t
where not exists (
    select 1
    from @T t1
    where t.[Type] = t1.[Type]
      and t.[Role] = 0 and t1.[Role] > 0
      and t.[GUID] = t1.[GUID]
)
group by [Type], [Role];
Returns:
Type Role TotalUniqueGuid
A 0 1
A 1 1
B 1 4
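The NOT EXISTS version ports to SQLite almost verbatim; a quick check from Python's sqlite3 module (plain identifiers instead of the bracketed T-SQL ones):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Type TEXT, Role INTEGER, GUID TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("A", 0, "CDE"), ("A", 0, "EFG"), ("A", 1, "CDE"),
                  ("B", 0, "ABC"), ("B", 0, "KLM"), ("B", 1, "ABC"),
                  ("B", 1, "GHI"), ("B", 1, "IJK"), ("B", 1, "KLM")])

# Drop each Role-0 row that has a Role>0 twin with the same Type and GUID,
# then count distinct GUIDs per (Type, Role)
rows = conn.execute("""
    SELECT Type, Role, COUNT(DISTINCT GUID) AS TotalUniqueGuid
    FROM t
    WHERE NOT EXISTS (
        SELECT 1 FROM t t1
        WHERE t.Type = t1.Type
          AND t.Role = 0 AND t1.Role > 0
          AND t.GUID = t1.GUID)
    GROUP BY Type, Role
    ORDER BY Type, Role
""").fetchall()
print(rows)
```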

SQL group by under some conditions

I have a big table with tons of duplicated rows (among those columns that I care about). Let me start with the following example:
| field1 | field2 | field3 | field4 | field5 |
|--------|--------|--------|--------|--------|
| aa     | 1      | NULL   | 1      | 0      |
| aaa    | 1      | NULL   | 1      | 1      |
| aaa    | 1      | NULL   | 1      | 2      |
| a      | 2      | 0      | 1      | 3      |
| a      | 2      | 0      | NULL   | 4      |
| a      | 2      | NULL   | 2      | 5      |
| b      | 3      | NULL   | 2      | 6      |
| b2     | 3      | NULL   | NULL   | 7      |
| c      | 4      | NULL   | NULL   | 8      |
I am interested in an efficient query to get the following table:

| field1 | field2 | field3 | field4 |
|--------|--------|--------|--------|
| aaa    | 1      | NULL   | 1      |
| a      | 2      | 0      | 1      |
| b      | 3      | NULL   | 2      |
| c      | 4      | NULL   | NULL   |
Basically, it follows these rules:
- for each value of field2, there should be one and exactly one row present
- among all the rows with the same value of field2, select the row that satisfies the following, in order:
  - select the row whose field4 is not NULL (if possible)
  - among those that have a non-NULL value for field4, select the row that has a non-NULL value for field3
  - among those that have non-NULL values for field4 and field3, select the row that has the longest string value for field1
  - among those that satisfy all of the above, select only one row (it does not matter what the value of field5 is)

I could do it with a bunch of joins, but it becomes very slow. Any better suggestions?
EDIT
The field2 values may not be in a specific order. I just put 1, 2, 3, 4 in the example, but this is not generally true in my case. I did not change it directly in the table since one of the suggested solutions actually relies on sequential values for field2, so I kept it for future readers who may be interested in that.
This type of prioritization is challenging. I think the simplest method in MySQL uses variables:
select t.*
from (select t.*,
             (@rn := if(@f2 = field2, @rn + 1,
                        if(@f2 := field2, 1, 1)
                       )
             ) as seqnum
      from t cross join
           (select @rn := 0, @f2 := '') params
      order by field2,
               (field4 is not null) desc,
               (field3 is not null) desc,
               length(field1) desc
     ) t
where seqnum = 1;
I'm not 100% sure I have the conditions right (the third seems to conflict with the first two). But whatever the prioritization, the idea is the same: use order by to get the rows in the right order and use variables to get the first one.
EDIT:
In SQL Server -- or any other reasonable database -- you do this with row_number():
select t.*
from (select t.*,
             row_number() over (partition by field2
                                order by (case when field4 is not null then 0 else 1 end),
                                         (case when field3 is not null then 0 else 1 end),
                                         len(field1) desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
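The row_number() approach runs essentially unchanged on SQLite (3.25+), where `(field4 IS NULL)` ascending plays the role of the CASE expressions; a sketch via Python's sqlite3 module with the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t
    (field1 TEXT, field2 INTEGER, field3 INTEGER,
     field4 INTEGER, field5 INTEGER)""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [("aa", 1, None, 1, 0), ("aaa", 1, None, 1, 1), ("aaa", 1, None, 1, 2),
     ("a", 2, 0, 1, 3), ("a", 2, 0, None, 4), ("a", 2, None, 2, 5),
     ("b", 3, None, 2, 6), ("b2", 3, None, None, 7), ("c", 4, None, None, 8)])

# Rank inside each field2 group: non-NULL field4 first, then non-NULL
# field3, then the longest field1; keep the top-ranked row per group
rows = conn.execute("""
    SELECT field1, field2, field3, field4
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY field2
                     ORDER BY (field4 IS NULL), (field3 IS NULL),
                              LENGTH(field1) DESC) AS seqnum
          FROM t) t
    WHERE seqnum = 1
    ORDER BY field2
""").fetchall()
print(rows)
```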

Delete first rows after qualified ones

Let's suppose that I have the following table called Orders:
| OrderId | Status | CustomerId |
|---------|--------|------------|
| 1       | +      | 2          |
| 2       | -      | 1          |
| 3       | +      | 2          |
| 4       | +      | 1          |
| 5       | -      | 3          |
| 6       | +      | 4          |
| 7       | +      | 3          |
The question is: how can I delete the next order after a cancelled one for each customer? I basically want to delete the orders with id = 4 and 7.
So the result should be:
| OrderId | Status | CustomerId |
|---------|--------|------------|
| 1       | +      | 2          |
| 2       | -      | 1          |
| 3       | +      | 2          |
| 5       | -      | 3          |
| 6       | +      | 4          |
I use SQL Server, but I'm really curious about writing it in ANSI SQL.
You can get the first cancelled order for each customer, then delete the orders after it:

with todelete as (
      select t.*,
             min(case when status = '-' then orderid end) over
                 (partition by customerid) as deleted_orderid
      from orders t
     )
delete from todelete
where orderid > deleted_orderid;
EDIT:
To delete just the next one, take the smallest order id that follows the cancelled one:

with todelete as (
      select t.*,
             min(case when orderid > deleted_orderid then orderid end) over
                 (partition by customerid) as orderid_to_delete
      from (select t.*,
                   min(case when status = '-' then orderid end) over
                       (partition by customerid) as deleted_orderid
            from orders t
           ) t
     )
delete from todelete
where orderid = orderid_to_delete;
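The two-step logic can be sketched in SQLite from Python's sqlite3 module; since SQLite cannot delete through a CTE, the doomed id is computed in a subquery instead (table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderId INTEGER, Status TEXT, CustomerId INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "+", 2), (2, "-", 1), (3, "+", 2), (4, "+", 1),
     (5, "-", 3), (6, "+", 4), (7, "+", 3)])

# Step 1: per customer, find the first cancelled order id.
# Step 2: delete the smallest order id above it.
conn.execute("""
    DELETE FROM orders WHERE OrderId IN (
        SELECT OrderId FROM (
            SELECT OrderId,
                   MIN(CASE WHEN OrderId > cancelled_id THEN OrderId END)
                       OVER (PARTITION BY CustomerId) AS orderid_to_delete
            FROM (SELECT o.*,
                         MIN(CASE WHEN Status = '-' THEN OrderId END)
                             OVER (PARTITION BY CustomerId) AS cancelled_id
                  FROM orders o))
        WHERE OrderId = orderid_to_delete)
""")
remaining = [r[0] for r in conn.execute(
    "SELECT OrderId FROM orders ORDER BY OrderId")]
print(remaining)
```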
EDIT II:
If you want to delete the next order after any cancelled one, the query is a bit simpler:

with todelete as (
      select t.*,
             lag(status) over (partition by customerid order by orderid) as prev_status
      from orders t
     )
delete from todelete
where prev_status = '-';
The window functions here are ANSI SQL, although deleting through a CTE is a SQL Server feature. If you are using SQL Server 2008, you need to use a correlated subquery or maybe cross apply (I'm not 100% sure that cross apply will work in a delete through a CTE, but it should).
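The lag() variant can be tried in SQLite from Python's sqlite3 module; again SQLite won't delete through a CTE, so this sketch collects the ids first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderId INTEGER, Status TEXT, CustomerId INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "+", 2), (2, "-", 1), (3, "+", 2), (4, "+", 1),
     (5, "-", 3), (6, "+", 4), (7, "+", 3)])

# A row goes when the customer's previous order (by OrderId) was cancelled
conn.execute("""
    DELETE FROM orders WHERE OrderId IN (
        SELECT OrderId
        FROM (SELECT OrderId,
                     LAG(Status) OVER (PARTITION BY CustomerId
                                       ORDER BY OrderId) AS prev_status
              FROM orders)
        WHERE prev_status = '-')
""")
remaining = [r[0] for r in conn.execute(
    "SELECT OrderId FROM orders ORDER BY OrderId")]
print(remaining)
```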

Rolling up remaining rows into one called "Other"

I have written a query which selects lets say 10 rows for this example.
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Store9 | 1 |
| Store2 | 1 |
| Store3 | 1 |
| Store4 | 1 |
| Store5 | 0 |
| Store6 | 0 |
| Store10 | 0 |
+-----------+------------+
How would I go about displaying the top 3 rows, but having the remaining rows roll up into a row called "Other" that adds all of their complaints together?
So like this for example:
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Other | 4 |
+-----------+------------+
So what has happened above is that it displays the top 3, then adds the complaints of the remaining rows into a row called "Other".
I have exhausted all my resources and cannot find a solution. Please let me know if this makes sense.
I have created a SQLfiddle of the above tables that you can edit if it is possible :)
Here's hoping this is possible :)
Thanks,
Mike
Something like this may work:

select *, row_number() over (order by complaints desc) as sno
into #temp
from
(
    SELECT a.StoreName,
           COUNT(b.StoreID) AS [Complaints]
    FROM Stores a
    LEFT JOIN
    (
        SELECT StoreName, Complaint, StoreID
        FROM Complaints
        WHERE Complaint = 'yes'
    ) b ON b.StoreID = a.StoreID
    GROUP BY a.StoreName
) as t
ORDER BY [Complaints] DESC

select storename, complaints from #temp where sno < 4
union all
select 'other', sum(complaints) as complaints from #temp where sno >= 4
I do this with double aggregation and row_number():
select (case when seqnum <= 3 then storename else 'Other' end) as StoreName,
       sum(numcomplaints) as numcomplaints
from (select c.storename, count(*) as numcomplaints,
             row_number() over (order by count(*) desc) as seqnum
      from complaints c
      where c.complaint = 'Yes'
      group by c.storename
     ) s
group by (case when seqnum <= 3 then storename else 'Other' end);
From what I can see, you don't really need any additional information from stores, so this version just leaves that table out.
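The double aggregation can be reproduced in SQLite from Python's sqlite3 module; this sketch fabricates a complaints table matching the counts in the question (one 'Yes' row per complaint) and, for portability, computes the per-store counts in an extra subquery before ranking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (storename TEXT, complaint TEXT)")
# One 'Yes' row per complaint, matching the question's counts
for store, n in [("Store1", 4), ("Store7", 2), ("Store8", 1), ("Store9", 1),
                 ("Store2", 1), ("Store3", 1), ("Store4", 1)]:
    conn.executemany("INSERT INTO complaints VALUES (?, ?)",
                     [(store, "Yes")] * n)

# Rank stores by complaint count, then re-aggregate with every rank
# past 3 renamed to 'Other'
rows = conn.execute("""
    SELECT CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END AS StoreName,
           SUM(numcomplaints) AS numcomplaints
    FROM (SELECT storename, numcomplaints,
                 ROW_NUMBER() OVER (ORDER BY numcomplaints DESC) AS seqnum
          FROM (SELECT c.storename, COUNT(*) AS numcomplaints
                FROM complaints c
                WHERE c.complaint = 'Yes'
                GROUP BY c.storename)) s
    GROUP BY CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END
""").fetchall()
print(rows)
```

As with row_number() in SQL Server, which of the tied one-complaint stores lands in the top 3 is arbitrary; the 'Other' total is the same either way.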

Find and update specific duplicates in MS SQL

Given the table below:
+----+---------+-----------+-------------+-------+
| ID | NAME | LAST NAME | PHONE | STATE |
+----+---------+-----------+-------------+-------+
| 1 | James | Vangohg | 04333989878 | NULL |
| 2 | Ashly | Baboon | 09898788909 | NULL |
| 3 | James | Vangohg | 04333989878 | NULL |
| 4 | Ashly | Baboon | 09898788909 | NULL |
| 5 | Michael | Foo | 02933889990 | NULL |
| 6 | James | Vangohg | 04333989878 | NULL |
+----+---------+-----------+-------------+-------+
I want to use MS SQL to find and update duplicates (based on name, last name, and phone) but only the earlier one(s). So the desired result for the above table is:
+----+---------+-----------+-------------+-------+
| ID | NAME | LAST NAME | PHONE | STATE |
+----+---------+-----------+-------------+-------+
| 1 | James | Vangohg | 04333989878 | DUPE |
| 2 | Ashly | Baboon | 09898788909 | DUPE |
| 3 | James | Vangohg | 04333989878 | DUPE |
| 4 | Ashly | Baboon | 09898788909 | NULL |
| 5 | Michael | Foo | 02933889990 | NULL |
| 6 | James | Vangohg | 04333989878 | NULL |
+----+---------+-----------+-------------+-------+
This query uses a CTE to apply a row number, where any number > 1 is a dupe of the row with the highest ID.
;WITH x AS
(
    SELECT ID, NAME, [LAST NAME], PHONE, STATE,
           ROW_NUMBER() OVER (PARTITION BY NAME, [LAST NAME], PHONE
                              ORDER BY ID DESC) AS rn
    FROM dbo.YourTable
)
UPDATE x SET STATE = CASE rn WHEN 1 THEN NULL ELSE 'DUPE' END;
Of course, I see no reason to actually update the table with this information; every time the table is touched, this data is stale and the query must be re-applied. Since you can derive this information at run-time, this should be part of a query, not constantly updated in the table. IMHO.
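The marking logic is easy to exercise in SQLite via Python's sqlite3 module; SQLite cannot UPDATE through a CTE, so this sketch feeds the rn > 1 ids to a plain UPDATE (a hypothetical `people` table stands in for the question's unnamed one; rn = 1 is the highest id in each name/last-name/phone group, so only the earlier rows get DUPE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people
    (id INTEGER, name TEXT, last_name TEXT, phone TEXT, state TEXT)""")
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, NULL)",
    [(1, "James", "Vangohg", "04333989878"),
     (2, "Ashly", "Baboon", "09898788909"),
     (3, "James", "Vangohg", "04333989878"),
     (4, "Ashly", "Baboon", "09898788909"),
     (5, "Michael", "Foo", "02933889990"),
     (6, "James", "Vangohg", "04333989878")])

# rn = 1 marks the latest id per (name, last_name, phone);
# every other row in the group is an earlier duplicate
conn.execute("""
    UPDATE people SET state = 'DUPE' WHERE id IN (
        SELECT id
        FROM (SELECT id,
                     ROW_NUMBER() OVER (PARTITION BY name, last_name, phone
                                        ORDER BY id DESC) AS rn
              FROM people)
        WHERE rn > 1)
""")
states = [r[0] for r in conn.execute("SELECT state FROM people ORDER BY id")]
print(states)
```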
Try this statement.
LAST UPDATE:
update t1
set t1.STATE = 'DUPE'
from TableName t1
join
(
    select name, last_name, phone, max(id) as id, count(id) as cnt
    from TableName
    group by name, last_name, phone
    having count(id) > 1
) t2 on t1.name = t2.name
    and t1.last_name = t2.last_name
    and t1.phone = t2.phone
    and t1.id < t2.id
If my understanding of your requirements is correct, you want to update the STATE value to DUPE wherever there exists another row with a higher ID that has the same NAME, LAST NAME, and PHONE. If so, use this:
update t
set STATE = (case when sorted.RowNbr = 1 then null else 'DUPE' end)
from yourtable t
join (select ID,
             row_number() over (partition by name, [last name], phone
                                order by id desc) as RowNbr
      from yourtable) sorted
  on sorted.ID = t.ID