How to select sequential duplicates in SQL Server

I would like to select duplicate entries from a SQL Server table, but only if the id is consecutive.
I have been trying to twist this answer to my needs, but I can't get it to work.
The above answer is for Oracle, but I see that SQL Server also has lead and lag functions.
Also, I think that the answer above puts a * next to duplicates, but I only want to select the duplicates.
select
id, companyName,
case
when companyName in (prev, next)
then '*'
end match,
prev,
next
from
(select
id,
companyName,
lag(companyName, 1) over (order by id) prev,
lead(companyName, 1) over (order by id) next
from
companies)
order by
id;
Example:
So from this data set:
id companyName
-------------------
1 dogs ltd
2 cats ltd
3 pigs ltd
4 pigs ltd
5 cats ltd
6 cats ltd
7 dogs ltd
8 pigs ltd
I want to select:
id companyName
-------------------
3 pigs ltd
4 pigs ltd
5 cats ltd
6 cats ltd
Update
Every now and again I am taken aback by the quantity and quality of answers I get on SO. This is one of those times. I don't have the level of expertise to judge one answer as being better than another, so I've gone for SqlZim as this was the first working answer I saw. But it's great to see the different approaches. Especially when only an hour ago I was wondering "is this even possible?".

You are very close to what you want:
select id, companyName
from (select c.*,
lag(companyName, 1) over (order by id) prev,
lead(companyName, 1) over (order by id) next
from companies c
) a
where CompanyName in (prev, next)
order by id;
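A minimal sketch for trying the lag/lead filter on the question's sample data. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is needed for window functions), and the column aliases prev_name and next_name are my own.

```python
import sqlite3

# SQLite stands in for SQL Server here; the table and rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, companyName TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [(1, "dogs ltd"), (2, "cats ltd"), (3, "pigs ltd"), (4, "pigs ltd"),
     (5, "cats ltd"), (6, "cats ltd"), (7, "dogs ltd"), (8, "pigs ltd")],
)

# Keep a row only when its name matches the previous or the next row by id.
# At the edges LAG/LEAD return NULL, which never satisfies the IN test.
query = """
SELECT id, companyName
FROM (SELECT id, companyName,
             LAG(companyName)  OVER (ORDER BY id) AS prev_name,
             LEAD(companyName) OVER (ORDER BY id) AS next_name
      FROM companies) AS a
WHERE companyName IN (prev_name, next_name)
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this should return ids 3 to 6, matching the output requested in the question.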

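This is a gaps-and-islands style problem, but instead of using two row_number() calls, we use the id and a single row_number() in the innermost subquery. Within a run of consecutive ids for the same company, id - row_number() stays constant, so it works as a group key (grp). count(*) over() then gives the number of rows per grp, and finally we return only the rows with cnt > 1.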
select id, companyname
from (
select
id
, companyName
, grp
, cnt = count(*) over (partition by companyname, grp)
from (
select *
, grp = id - row_number() over (partition by companyname order by id)
from
companies
) islands
) d
where cnt > 1
order by id
rextester demo: http://rextester.com/ACP73683
returns:
+----+-------------+
| id | companyname |
+----+-------------+
| 3 | pigs ltd |
| 4 | pigs ltd |
| 5 | cats ltd |
| 6 | cats ltd |
+----+-------------+
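To see why id - row_number() works as a group key, the sketch below prints the intermediate grp value for each row before applying the count filter. It is an illustration only, run through Python's sqlite3 module rather than SQL Server (an assumption; SQLite 3.25+ is required), with the T-SQL "alias = expression" form rewritten as "expression AS alias".

```python
import sqlite3

# SQLite stands in for SQL Server; the data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, companyName TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [(1, "dogs ltd"), (2, "cats ltd"), (3, "pigs ltd"), (4, "pigs ltd"),
     (5, "cats ltd"), (6, "cats ltd"), (7, "dogs ltd"), (8, "pigs ltd")],
)

# Step 1: the island key. Rows of the same company with consecutive ids share a grp.
islands_sql = """
SELECT id, companyName,
       id - ROW_NUMBER() OVER (PARTITION BY companyName ORDER BY id) AS grp
FROM companies
"""
islands = conn.execute(islands_sql + " ORDER BY id").fetchall()
for row in islands:
    print(row)

# Step 2: count rows per (companyName, grp) and keep islands with more than one row.
result_sql = f"""
SELECT id, companyName
FROM (SELECT id, companyName,
             COUNT(*) OVER (PARTITION BY companyName, grp) AS cnt
      FROM ({islands_sql}) AS islands) AS d
WHERE cnt > 1
ORDER BY id
"""
result = conn.execute(result_sql).fetchall()
print(result)
```

Only ids 3 and 4 (pigs ltd) and ids 5 and 6 (cats ltd) end up sharing a grp with another row of the same company.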

One more alternate form, using LEAD() and LAG() (SQL Server 2012 and up):
SELECT id, CompanyName
FROM (
SELECT *,
LEAD(CompanyName, 1) OVER(ORDER BY id) as nc,
LAG(CompanyName, 1) OVER(ORDER BY id) AS pc
FROM #t t
) x
WHERE nc = companyName
OR pc = companyName
Here is the test data, so you can check it out yourself.
CREATE TABLE #T (id int not null PRIMARY KEY, companyName varchar(16) not null)
INSERT INTO #t Values
(1, 'dogs ltd'),
(2, 'cats ltd'),
(3, 'pigs ltd'),
(4, 'pigs ltd'),
(5, 'cats ltd'),
(6, 'cats ltd'),
(7, 'dogs ltd'),
(8, 'pigs ltd')

In the WHERE clause you just need to keep the rows where companyName is the same as prev or next:
select id, companyName
from (
select id, companyName,
lag(companyName, 1) over (order by id) as prev,
lead(companyName, 1) over (order by id) as next
from companies
) q
where companyName in (prev, next)
order by id;
To make sure the ids are really consecutive (without gaps), you can do it like this:
select id, companyName
from (
select id, companyName,
lag(concat(id+1,companyName), 1) over (order by id) as prev,
lead(concat(id-1,companyName), 1) over (order by id) as next
from companies
) q
where concat(id,companyName) in (prev, next)
order by id;
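As a variation on the gap check above, the neighbouring id and name can be compared separately instead of being concatenated into one string. This is my own sketch, not part of the answer; it runs on SQLite via Python's sqlite3 (an assumption, SQLite 3.25+), and the sample rows with gaps in the ids are invented for illustration.

```python
import sqlite3

# Hypothetical data with gaps: ids 3 and 5 share a name but are not consecutive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, companyName TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [(1, "dogs ltd"), (3, "pigs ltd"), (5, "pigs ltd"), (7, "cats ltd"), (8, "cats ltd")],
)

# A row qualifies only if a neighbour has the same name AND an id exactly one away.
query = """
SELECT id, companyName
FROM (SELECT id, companyName,
             LAG(id)           OVER (ORDER BY id) AS prev_id,
             LAG(companyName)  OVER (ORDER BY id) AS prev_name,
             LEAD(id)          OVER (ORDER BY id) AS next_id,
             LEAD(companyName) OVER (ORDER BY id) AS next_name
      FROM companies) AS q
WHERE (prev_id = id - 1 AND prev_name = companyName)
   OR (next_id = id + 1 AND next_name = companyName)
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The two pigs ltd rows are skipped because their ids are two apart, while the cats ltd rows at ids 7 and 8 are returned.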

You can use Row_Number() and get the duplicates based on the partition by clause. Note that this returns every repeat of a companyName after its first occurrence, whether or not the ids are consecutive:
;with cte as (
SELECT id, companyName,
RowN = Row_Number() over (partition by companyName order by id) from #yourTable
)
Select * from cte where RowN > 1
Can you provide your input and expected output to verify this query?

Related

Top 10 of total amount paid aggregated by provider, partitioned by state - PostgreSQL

I have a database of medicare data with three tables: provider metadata (doctor's unique number, name, city, state, credentials, etc); hcpcs metadata (code, description, if it's for drugs or not); provider_services (doctor's unique number, hcpcs code, number of services completed by that doctor, average cost)
I'm trying to get the top 10 payments by state, aggregated by provider. However I'm running into an issue where 1) I can't figure out how to rank by the total payment and 2) I can't figure out how to aggregate the providers. Here's the best query I've gotten so far:
SELECT *
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state ORDER BY ps.average_medicare_payment_amt desc) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.last_name, t.npi, t.first_name, t.city, t.state, t.total_amount, t.rank
ORDER BY state ASC;
This results in something like:
| LAST | FIRST| STATE | TOTAL | RANK |
|-------|------|----|---------|---|
| DOE | JANE | AK | 3000.41 | 10|
| SMITH | JOHN | AK | 6000.98 | 7 |
| COLE | ANN | AK | 1000 | 4 |
| SMITH | JOHN | AK | 1560.32 | 1 |
So my issues are 1. the providers aren't aggregating (John Smith with the same unique number showing up multiple times) and 2. I can only get it to compile with that average_payment_amt and not total_amt so the rankings are really screwed up.

Consider the following adjustments:
Avoid using SELECT * in aggregate queries with GROUP BY. This query runs in PostgreSQL without error only because every column that SELECT * expands to is also listed in the GROUP BY.
Use the calculated expression for total_amount in the window function's ORDER BY clause.
Apply an aggregate function such as SUM to total_amount and do not include it as a grouping column. In fact, you do not mention how you want to aggregate by provider.
Ranking by state conflicts with aggregating by a different column, provider. Right now it appears you want to use rank only for filtering records and not for display.
The query below achieves the following:
Sums total payment amounts by provider for the top 10 payment amounts per state.
SELECT t.npi, t.last_name, t.first_name, t.city, t.state,
SUM(t.total_amount) AS total_amount
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.npi, t.last_name, t.first_name, t.city, t.state
ORDER BY t.state ASC;
Now, below achieves the following if this is your intention:
Displays records of top 10 payments per state in state and rank order (where providers can repeat if they ranked multiple times within or between states).
SELECT t.*
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
ORDER BY t.state, t.rank;

I am guessing that you actually want to aggregate in the subquery and rank by the total amount:
SELECT t.*
FROM (SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_state AS state,
SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state ORDER BY SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) DESC) as rnk
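FROM provider_services ps JOIN
provider p
ON ps.npi = p.npi
GROUP BY p.npi, p.nppes_provider_last_org_name, p.nppes_provider_first_name, p.nppes_provider_state
) t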
WHERE rnk <= 10
ORDER BY state ASC, total_amount DESC;
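The aggregate-then-rank idea can be checked on a toy data set. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 instead of PostgreSQL, uses shortened hypothetical column names (last_name, state, avg_amt, srvc_cnt), invented rows, and a cutoff of 2 instead of 10. It also splits the aggregation and the ranking into two CTEs rather than ranking by SUM() inside one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the provider and provider_services tables.
conn.execute("CREATE TABLE provider (npi INTEGER PRIMARY KEY, last_name TEXT, state TEXT)")
conn.execute("CREATE TABLE provider_services (npi INTEGER, avg_amt INTEGER, srvc_cnt INTEGER)")
conn.executemany("INSERT INTO provider VALUES (?, ?, ?)",
                 [(1, "SMITH", "AK"), (2, "DOE", "AK"), (3, "COLE", "AK"), (4, "LEE", "AL")])
conn.executemany("INSERT INTO provider_services VALUES (?, ?, ?)",
                 [(1, 100, 2), (1, 50, 4), (2, 300, 1), (3, 10, 5), (4, 20, 3)])

query = """
WITH totals AS (          -- one row per provider with the summed payment
    SELECT p.npi, p.last_name, p.state,
           SUM(ps.avg_amt * ps.srvc_cnt) AS total_amount
    FROM provider_services ps
    JOIN provider p ON ps.npi = p.npi
    GROUP BY p.npi, p.last_name, p.state
), ranked AS (            -- rank providers within each state by that total
    SELECT state, last_name, total_amount,
           RANK() OVER (PARTITION BY state ORDER BY total_amount DESC) AS rnk
    FROM totals
)
SELECT state, last_name, total_amount, rnk
FROM ranked
WHERE rnk <= 2            -- top 2 here; the question asks for the top 10
ORDER BY state, rnk
"""
top = conn.execute(query).fetchall()
for row in top:
    print(row)
```

Each provider now appears once per state, and the rank follows the summed total rather than a single service line.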

How to give the serial number if data is repeating

If my table has these values, I need to generate the seqno column:
ClientId clinetLocation seqno
001 Abc 1
001 BBc 2
001 ccd 3
002 Abc 1
002 BBc 2
003 ccd 1

You are looking for the row_number() function:
select ClientId, clinetLocation,
row_number() over (partition by ClientId order by clinetLocation) as seqnum
from t;
This is a standard function available in most databases.
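A quick way to try this is sketched below with Python's sqlite3 module (an assumption; any database with window functions should behave the same, SQLite needs 3.25+). The table name t comes from the answer, the column is spelled ClientLocation here, and ClientId is stored as text so the leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ClientId TEXT, ClientLocation TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("001", "Abc"), ("001", "BBc"), ("001", "ccd"),
     ("002", "Abc"), ("002", "BBc"), ("003", "ccd")],
)

# ROW_NUMBER restarts at 1 for every ClientId partition.
query = """
SELECT ClientId, ClientLocation,
       ROW_NUMBER() OVER (PARTITION BY ClientId ORDER BY ClientLocation) AS seqno
FROM t
ORDER BY ClientId, seqno
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```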

Another option is a running count over the grouped rows, restarted for each client:
select count(1) over ( partition by ClientId order by ClientLocation ) as seqno,
ClientId, ClientLocation
from tab
group by ClientId, ClientLocation;
This assumes that each ClientId and ClientLocation combination is unique.

countif type function in SQL where total count could be retrieved in other column

I have 36 columns in a table, but one of the columns contains the same value multiple times, as below:
ID Name Ref
abcd john doe 123
1234 martina 100
123x brittany 123
ab12 joe 101
and I want results like:
ID Name Ref cnt
abcd john doe 123 2
1234 martina 100 1
123x brittany 123 2
ab12 joe 101 1
As 123 appears twice, I want the cnt column to show 2 for those rows, and so on.

select ID, Name, Ref, (select count(ID) from [table] where Ref = A.Ref) as cnt
from [table] A
Edit:
As mentioned in comments below, this approach may not be the most efficient in all cases, but should be sufficient on reasonably small tables.
In my testing:
a table of 5,460 records and 976 distinct 'Ref' values returned in less than 1 second.
a table of 600,831 records and 8,335 distinct 'Ref' values returned in 6 seconds.
a table of 845,218 records and 15,147 distinct 'Ref' values returned in 13 seconds.

You should say which SQL product you are using, since capabilities differ:
1) If your DB supports window functions:
Select
*,
count(*) over ( partition by ref ) as cnt
from your_table
2) If not:
Select
T.*, G.cnt
from
( select * from your_table ) T inner join
( select ref, count(*) as cnt from your_table group by ref ) G
on T.ref = G.ref

You can use COUNT with OVER as follows:
QUERY
select ID,
Name,
ref,
count(ref) over (partition by ref) cnt
from #t t
SAMPLE DATA
create table #t
(
ID NVARCHAR(400),
Name NVARCHAR(400),
Ref INT
)
insert into #t values
('abcd','john doe', 123),
('1234','martina', 100),
('123x','brittany', 123),
('ab12','joe', 101)
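The window-function form and the join-against-a-grouped-subquery form should give the same counts. The sketch below compares the two on the question's sample rows; it runs on SQLite via Python's sqlite3 (an assumption, SQLite 3.25+), and the table name t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID TEXT, Name TEXT, Ref INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("abcd", "john doe", 123), ("1234", "martina", 100),
     ("123x", "brittany", 123), ("ab12", "joe", 101)],
)

# Variant 1: window function, no self-join needed.
window_sql = "SELECT ID, Name, Ref, COUNT(*) OVER (PARTITION BY Ref) AS cnt FROM t"

# Variant 2: join each row to a per-Ref count (for databases without window functions).
join_sql = """
SELECT t.ID, t.Name, t.Ref, g.cnt
FROM t
JOIN (SELECT Ref, COUNT(*) AS cnt FROM t GROUP BY Ref) AS g ON t.Ref = g.Ref
"""

# Map each ID to its count so the comparison does not depend on row order.
window_counts = {row[0]: row[3] for row in conn.execute(window_sql)}
join_counts = {row[0]: row[3] for row in conn.execute(join_sql)}
print(window_counts == join_counts)
```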

SQL Split Multiple Columns into Multiple Rows

I'm having difficulty with this problem.
I have a table with this structure:
OrderID | Manager | Worker
1 | John | Sally
2 | Tim | Kristy
I need a SQL query to get a result set like this:
OrderID | Employee
1 | John
1 | Sally
2 | Tim
2 | Kristy
Is this possible to perform?

Simplest way I can think of is (assuming you don't care if Tim is listed before or after Kristy):
SELECT OrderID, Employee = Manager FROM dbo.table
UNION ALL
SELECT OrderID, Employee = Worker FROM dbo.table
ORDER BY OrderID;
If order matters, and you want manager first always, then:
SELECT OrderID, Employee FROM
(
SELECT r = 1, OrderID, Employee = Manager
FROM dbo.Table
UNION ALL
SELECT r = 2, OrderID, Employee = Worker
FROM dbo.table
) AS x
ORDER BY OrderID, r;
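The manager-first variant can be tried with the small sketch below. It uses Python's sqlite3 module instead of SQL Server (an assumption), a hypothetical table name orders, and "expression AS alias" in place of the T-SQL "alias = expression" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, Manager TEXT, Worker TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "John", "Sally"), (2, "Tim", "Kristy")])

# Each SELECT turns one column into rows; r records which column a row came from
# so that the manager sorts before the worker within an order.
query = """
SELECT OrderID, Employee
FROM (SELECT 1 AS r, OrderID, Manager AS Employee FROM orders
      UNION ALL
      SELECT 2 AS r, OrderID, Worker AS Employee FROM orders) AS x
ORDER BY OrderID, r
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```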

You can use UNPIVOT for this.
SELECT p.OrderID, p.Employee
FROM (SELECT OrderID, Manager, Worker FROM table) a
UNPIVOT (Employee FOR FieldName IN (Manager, Worker)) p

Try something like
SELECT OrderID, Manager AS Employee, 'Manager' AS EmployeeRole FROM Employees
UNION ALL
SELECT OrderID, Worker AS Employee, 'Worker' AS EmployeeRole FROM Employees

How do I select a row from nearly duplicate rows based on a field value?

If I have rows with this data:
ID |Name |ContractType|
---|------------|------------|
1 |Aaron Shatz | 6-month |
2 |Jim Smith |12-month |
3 |Jim Smith | 6-month |
4 |Mark Johnson|12-month |
I can't use Id to determine which record to use: I have to use ContractType. I want to select all records from a table, but if there are records with the same Name value, I want to pick the 12-month contract record.
The result of the query should be:
ID |Name |ContractType|
---|------------|------------|
1 |Aaron Shatz | 6-month |
2 |Jim Smith |12-month |
4 |Mark Johnson|12-month |

Hard coded version
This solution assumes that there are only two contract types, namely 6-month and 12-month. See the dynamic version further down for a more general approach.
Script:
CREATE TABLE contracts
(
id INT NOT NULL IDENTITY
, name VARCHAR(30) NOT NULL
, contracttype VARCHAR(30) NOT NULL
);
INSERT INTO contracts (name, contracttype) VALUES
('Aaron Shatz', '6-month'),
('Jim Smith', '12-month'),
('Jim Smith', '12-month'),
('Mark Johnson', '12-month'),
('John Doe', '6-month'),
('Mark Johnson', '6-month'),
('Aaron Shatz', '6-month');
SELECT id
, name
, contracttype
FROM
(
SELECT id
, name
, contracttype
, ROW_NUMBER() OVER(PARTITION BY name ORDER BY contracttype) AS rownum
FROM contracts
) T1
WHERE rownum = 1
ORDER BY id;
Output:
id name contracttype
-- ------------ ------------
1 Aaron Shatz 6-month
2 Jim Smith 12-month
4 Mark Johnson 12-month
5 John Doe 6-month
Dynamic version
This moves the contract type data into a table of its own with a sequence column. Based on how the contract types are ordered, the query will fetch the appropriate records.
Script:
CREATE TABLE contracts
(
id INT NOT NULL IDENTITY
, name VARCHAR(30) NOT NULL
, contracttypeid INT NOT NULL
);
CREATE TABLE contracttypes
(
id INT NOT NULL IDENTITY
, contracttype VARCHAR(30) NOT NULL
, sequence INT NOT NULL
);
INSERT INTO contracttypes (contracttype, sequence) VALUES
('12-month', 1),
('6-month', 3),
('15-month', 2);
INSERT INTO contracts (name, contracttypeid) VALUES
('Aaron Shatz', 2),
('Jim Smith', 2),
('Jim Smith', 3),
('Mark Johnson', 1),
('John Doe', 2),
('Mark Johnson', 2),
('Aaron Shatz', 2);
SELECT id
, name
, contracttype
FROM
(
SELECT c.id
, c.name
, ct.contracttype
, ROW_NUMBER() OVER(PARTITION BY name ORDER BY ct.sequence) AS rownum
FROM contracts c
LEFT OUTER JOIN contracttypes ct
ON c.contracttypeid = ct.id
) T1
WHERE rownum = 1
ORDER BY id;
Output:
id name contracttype
-- ------------ ------------
1 Aaron Shatz 6-month
3 Jim Smith 15-month
4 Mark Johnson 12-month
5 John Doe 6-month

This works only because the OP has confirmed that only two contract types are possible, and the one he wants (for each contractor) happens to be the one that sorts first alphabetically. So a couple of coincidences make this solution straightforward.
;WITH x AS
(
SELECT ID, Name, ContractType, rn = ROW_NUMBER() OVER
(PARTITION BY Name ORDER BY ContractType)
FROM dbo.some_table
)
SELECT ID, Name, ContractType
FROM x
WHERE rn = 1
ORDER BY ID;
If you need to make this more dynamic, I suppose you could say:
DECLARE @PreferredContractType VARCHAR(32);
SET @PreferredContractType = '12-month';
;WITH x AS
(
SELECT ID, Name, ContractType, rn = ROW_NUMBER() OVER
(PARTITION BY Name ORDER BY CASE ContractType
WHEN @PreferredContractType THEN 1 ELSE 2 END
)
FROM dbo.some_table
)
SELECT ID, Name, ContractType
FROM x
WHERE rn = 1
ORDER BY ID;
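The preferred-type ordering can be exercised with a bound parameter, as in the sketch below. It is only an illustration: it runs on SQLite through Python's sqlite3 (an assumption, SQLite 3.25+), the table name contracts and the helper pick_contracts are hypothetical, and ID is added as a tie-breaker so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (ID INTEGER PRIMARY KEY, Name TEXT, ContractType TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [(1, "Aaron Shatz", "6-month"), (2, "Jim Smith", "12-month"),
     (3, "Jim Smith", "6-month"), (4, "Mark Johnson", "12-month")],
)

QUERY = """
WITH x AS (
    SELECT ID, Name, ContractType,
           ROW_NUMBER() OVER (
               PARTITION BY Name
               -- preferred type sorts first; ID breaks ties (my addition)
               ORDER BY CASE ContractType WHEN :preferred THEN 1 ELSE 2 END, ID
           ) AS rn
    FROM contracts
)
SELECT ID, Name, ContractType FROM x WHERE rn = 1 ORDER BY ID
"""

def pick_contracts(preferred):
    """Return one row per Name, favouring the given contract type (hypothetical helper)."""
    return conn.execute(QUERY, {"preferred": preferred}).fetchall()

twelve = pick_contracts("12-month")
six = pick_contracts("6-month")
print(twelve)
print(six)
```

Switching the parameter to '6-month' flips the choice for Jim Smith only, since he is the only person with both contract types.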
