SQL filter if ID doesn't have a value in another column

I have this sample table:
Payment ID  Type
123         Fee
123         Service
123         Finance
456         Fee
456         Service
I'm trying to produce a table that filters out every row whose ID has a row with type "Finance".
The expected result would be:
Payment ID  Type
456         Fee
456         Service

A readable alternative is:
with
filter_table as (
select payment_id
from your_table
where type = 'Finance'
)
select *
from your_table
where payment_id not in (select payment_id from filter_table)
An alternative without the CTE could be:
select *
from your_table
where payment_id not in (select payment_id from your_table where type='Finance')

NOT EXISTS should work fine for you.
select * from TEST t
where not exists (
select 1 from TEST
where PAYMENTID = t.PAYMENTID and type = 'Finance')
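As a minimal sketch, the two approaches can be compared on the sample data with Python's built-in sqlite3 module. The table name payments and the column name payment_id are assumptions made for illustration. The last part adds a hypothetical Finance row with a NULL id to show a known difference: a NULL in the subquery makes NOT IN return no rows, while NOT EXISTS is unaffected.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (payment_id INTEGER, type TEXT);
    INSERT INTO payments VALUES
        (123, 'Fee'), (123, 'Service'), (123, 'Finance'),
        (456, 'Fee'), (456, 'Service');
""")

NOT_IN = """
    SELECT payment_id, type FROM payments
    WHERE payment_id NOT IN
        (SELECT payment_id FROM payments WHERE type = 'Finance')
    ORDER BY type
"""
NOT_EXISTS = """
    SELECT payment_id, type FROM payments p
    WHERE NOT EXISTS (
        SELECT 1 FROM payments f
        WHERE f.payment_id = p.payment_id AND f.type = 'Finance')
    ORDER BY type
"""

rows_not_in = conn.execute(NOT_IN).fetchall()
rows_not_exists = conn.execute(NOT_EXISTS).fetchall()
print(rows_not_in)      # the two rows for payment 456
print(rows_not_exists)  # same rows

# Hypothetical extra row: a Finance payment whose id is NULL.
conn.execute("INSERT INTO payments VALUES (NULL, 'Finance')")
rows_not_in_null = conn.execute(NOT_IN).fetchall()
rows_not_exists_null = conn.execute(NOT_EXISTS).fetchall()
print(rows_not_in_null)      # empty: "456 NOT IN (123, NULL)" is unknown
print(rows_not_exists_null)  # still has the 456 rows (plus the NULL-id row)
```

If payment_id can be NULL in your data, NOT EXISTS is usually the safer of the two.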

Related

Pulling all ID with the latest date

Say I have a table with several columns and multiple rows per ID. I want to pull, for each ID, the row with the latest date (20220205 for Ann and 20220208 for Lima) to get the correct package code. How do I write that in the WHERE clause?
ID  Name  pkg_date  package
11  Ann   20220205  R
11  Ann   20220101  A
11  Ann   20211101  U
22  Lima  20210708  B
22  Lima  20220208  A
You can use one of the rank-related window functions (rank, dense_rank, or row_number). Here is an example which uses dense_rank function.
SELECT *
FROM (
select id,
name,
pkg_date,
package,
dense_rank() OVER(PARTITION BY id ORDER BY pkg_date DESC) rk
from table_name) t
WHERE rk = 1
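Here is a small sketch that runs the window-function approach against the sample rows using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). Storing pkg_date as YYYYMMDD text is an assumption for this demo; that format sorts correctly as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id INTEGER, name TEXT, pkg_date TEXT, package TEXT);
    INSERT INTO table_name VALUES
        (11, 'Ann',  '20220205', 'R'),
        (11, 'Ann',  '20220101', 'A'),
        (11, 'Ann',  '20211101', 'U'),
        (22, 'Lima', '20210708', 'B'),
        (22, 'Lima', '20220208', 'A');
""")

# Rank each id's rows from newest to oldest, then keep only rank 1.
latest = conn.execute("""
    SELECT id, name, pkg_date, package
    FROM (
        SELECT *,
               dense_rank() OVER (PARTITION BY id ORDER BY pkg_date DESC) AS rk
        FROM table_name
    ) AS t
    WHERE rk = 1
    ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```

With dense_rank or rank, two rows sharing the same latest date for an id would both be returned; row_number would return exactly one of them.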
You can use an inner join with the max date:
SELECT t.ID, t.NAME, t.PKG_DATE, t.PACKAGE
FROM table_name t INNER JOIN
(SELECT max(PKG_DATE) AS MAX_PKG_DATE, ID FROM table_name GROUP BY ID) i
on i.ID = t.ID AND i.MAX_PKG_DATE = t.PKG_DATE;
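For the join approach, the join condition has to match on the date as well as the ID. The sketch below (Python's sqlite3, with the sample rows and an assumed alias max_date) contrasts a join on the ID alone, which returns every row, with a join on both the ID and the max date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id INTEGER, name TEXT, pkg_date TEXT, package TEXT);
    INSERT INTO table_name VALUES
        (11, 'Ann',  '20220205', 'R'),
        (11, 'Ann',  '20220101', 'A'),
        (11, 'Ann',  '20211101', 'U'),
        (22, 'Lima', '20210708', 'B'),
        (22, 'Lima', '20220208', 'A');
""")

# Derived table with one row per id: the latest pkg_date.
MAX_PER_ID = "(SELECT id, max(pkg_date) AS max_date FROM table_name GROUP BY id)"

# Joining on id alone keeps all of an id's rows.
id_only = conn.execute(f"""
    SELECT t.id, t.pkg_date, t.package
    FROM table_name t
    INNER JOIN {MAX_PER_ID} i ON i.id = t.id
""").fetchall()

# Joining on id AND date keeps only the latest row per id.
id_and_date = conn.execute(f"""
    SELECT t.id, t.pkg_date, t.package
    FROM table_name t
    INNER JOIN {MAX_PER_ID} i ON i.id = t.id AND i.max_date = t.pkg_date
    ORDER BY t.id
""").fetchall()

print(len(id_only))   # all five rows survive the id-only join
print(id_and_date)    # one row per id
```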
Try the new MAX_BY function:
select
id
, name
, max_by(package, pkg_date) as latest_pkg_code
from test_table
group by 1,2;
Result:
ID  NAME  LATEST_PKG_CODE
11  Ann   R
22  Lima  A
Here is a sample script:
-- create table
create or replace transient table test_table (
id int
, name varchar(50)
, pkg_date date
, package varchar(1)
);
-- insert data
insert into test_table
values
(11,'Ann','2022-02-05','R')
,(11,'Ann','2022-01-01','A')
,(11,'Ann','2021-11-01','U')
,(22,'Lima','2021-07-08','B')
,(22,'Lima','2022-02-08','A');
-- test table
select * from test_table;
-- get package code for max date for each name/id
select
id
, name
, max_by(package, pkg_date) as latest_pkg_code
from test_table
group by 1,2;

SQL find the 2 highest scores for each country

I have two tables, maps_query and map_time, as shown below:
CREATE TABLE maps_query (
id int,
day varchar,
search_query varchar,
country varchar,
query_score int
);
CREATE TABLE map_time (
id int,
am_pm varchar
);
The task is to find the 2 highest scores for each country. The desired output looks like this:
country search_query query_score
CA Target 1000
CA Store 900
FR Eiffiel 1500
FR London 800
I was trying to use row_number() over but don't know how to complete the query.
Select t1.country, t1.search_query, t1.score_rank,
from (select *, (row_number() over(partition by country order by query_score) as score_rank from maps_search) t1
where score_rank = 1 and score_rank=2
This can be achieved by rank() instead of row_number().
select
*
from
(
select
*,
rank() over (
PARTITION by country
order by
query_score desc
)
from
maps_query
) q
where
rank <= 2;
A good reference article: https://spin.atomicobject.com/2016/03/12/select-top-n-per-group-postgresql/
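The choice between rank() and row_number() matters when scores tie. The sketch below uses Python's sqlite3 with a reduced maps_query table (only three columns) and one hypothetical extra row, 'Mall', that ties with 'Store' at 900, to show that rank() can return more than two rows per country while row_number() never does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maps_query (country TEXT, search_query TEXT, query_score INTEGER);
    INSERT INTO maps_query VALUES
        ('CA', 'Target', 1000),
        ('CA', 'Store',   900),
        ('CA', 'Mall',    900),   -- hypothetical row that ties with 'Store'
        ('CA', 'Park',    100),
        ('FR', 'Eiffiel', 1500),
        ('FR', 'London',   800),
        ('FR', 'Metro',    300);
""")

def top2(window_function):
    """Return the top-2 rows per country using the given ranking function."""
    sql = f"""
        SELECT country, search_query, query_score
        FROM (
            SELECT *,
                   {window_function}() OVER (
                       PARTITION BY country ORDER BY query_score DESC
                   ) AS rnk
            FROM maps_query
        ) AS q
        WHERE rnk <= 2
        ORDER BY country, query_score DESC, search_query
    """
    return conn.execute(sql).fetchall()

top2_rank = top2("rank")              # ties share a rank, so CA gets 3 rows
top2_row_number = top2("row_number")  # exactly 2 rows per country

print(top2_rank)
print(top2_row_number)
```

Which behaviour is right depends on whether tied scores should all be shown or the output must be capped at two rows.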

Remove duplicated rows in sql query

I have a table with the following structure:
[id] [int] IDENTITY(1,1) NOT NULL,
[account_number] [int] NOT NULL,
[account_name] [varchar](100) NULL,
[account_chapter] [varchar](20) NULL,
There can be many rows with the same account_number but different account_name and account_chapter values.
For example, we can have something like the following:
id account_number account_name account_chapter
12 1111 Name01 chapter01
13 1111 Name02 chapter02
14 2222 Name03 chapter07
15 2222 Name05 chapter11
16 7777 Name06 chapter44
What I want is a query that keeps, for each account_number, only its first occurrence in the table. For example, the data above must be transformed into the following:
id account_number account_name account_chapter
12 1111 Name01 chapter01
14 2222 Name03 chapter07
16 7777 Name06 chapter44
Here is the query I wrote:
with req01 as (select distinct account_number from accounts)
select * from req01 full join (select * from accounts) as p on p.account_number = req01.account_number
It does not produce the expected result.
Any help?
Thanks.
Use ROW_NUMBER:
SELECT TOP 1 WITH TIES *
FROM accounts
ORDER BY ROW_NUMBER() OVER (PARTITION BY account_number ORDER BY account_chapter);
Or, using ROW_NUMBER in a more typical way:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY account_number
ORDER BY account_chapter) rn
FROM accounts
)
SELECT id, account_number, account_name, account_chapter
FROM cte
WHERE rn = 1;
Note that both of these queries assume that the account_chapter value determines which of the "duplicates" is actually first.
Normally, the "first" occurrence in the table would be determined by a date/time or identity column. You have only the identity, so you seem to want:
select a.*
from accounts a
where a.id = (select min(a2.id)
from accounts a2
where a2.account_number = a.account_number
);
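A quick way to check this against the question's sample rows is the sketch below, which uses Python's sqlite3 and compares the min(id) subquery with a ROW_NUMBER variant ordered by id. The IDENTITY column is modelled as a plain integer here, which is an assumption of the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        account_number INTEGER NOT NULL,
        account_name TEXT,
        account_chapter TEXT
    );
    INSERT INTO accounts VALUES
        (12, 1111, 'Name01', 'chapter01'),
        (13, 1111, 'Name02', 'chapter02'),
        (14, 2222, 'Name03', 'chapter07'),
        (15, 2222, 'Name05', 'chapter11'),
        (16, 7777, 'Name06', 'chapter44');
""")

# Keep the row whose id is the smallest within its account_number.
by_min_id = conn.execute("""
    SELECT a.*
    FROM accounts a
    WHERE a.id = (SELECT min(a2.id)
                  FROM accounts a2
                  WHERE a2.account_number = a.account_number)
    ORDER BY a.id
""").fetchall()

# Same idea with ROW_NUMBER, ordering by id instead of account_chapter.
by_row_number = conn.execute("""
    SELECT id, account_number, account_name, account_chapter
    FROM (
        SELECT *,
               row_number() OVER (PARTITION BY account_number ORDER BY id) AS rn
        FROM accounts
    ) AS numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in by_min_id:
    print(row)
```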

Finding first transaction after death of date

Could anyone please help to enhance the query below,
select
columns
from
(
select t.*,
sum (case when TRAN_DATE >= '20170701' then 1 end)
over (partition by acct_no order by TRAN_DATE, TRAN_TIME ) as sm_i
from (
Select
Columns
FROM
#BASEtable DTRAN
INNER JOIN sometable
where condition) t) t
where sm_i = 1
order by acc_no
Here is a data example:
Company Acct_no Tran_Date Death_of_date
1 123 20170725 20170702
1 123 20170825 20170702
1 123 20170925 20170702
2 456 20191025 20200101
2 456 20191125 20200101
2 456 20191225 20200101
Expected result: row 1, as that is the first transaction for that account after the death_of_date.
I am currently filtering on the fixed date 20170701, so the query above picks the first transaction that happened after that date, and that works.
Now I want to replace '20170701' with a dynamic value, i.e. I need the first transaction of every account after its own death_of_date.
I replaced the window expression with the code below,
sum(case when tran_Date >= (select death_of_date from #basetable a where a.acct_no = t.acct_no) then 1 end)
over (partition by acct_no order by tran_Date, tran_Time) as sm_i
but I get an error saying that the subquery returned more than one value, which is not permitted when the subquery follows >, = and so on.
Please help me enhance this code in SQL Server. I appreciate your help in advance!
Assuming two things:
You have data with the four columns you have specified.
For each account, you want the first row meeting your date condition.
Then you can use window functions and filtering:
select t.*
from (select t.*,
row_number() over (partition by Company, Acct_no order by Tran_Date) as seqnum
from t
where tran_date > death_of_date
) t
where seqnum = 1;
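The query above can be tried on the question's six sample rows with the sketch below (Python's sqlite3). The table name transactions is an assumption, and the dates are stored as YYYYMMDD text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        company INTEGER, acct_no INTEGER, tran_date TEXT, death_of_date TEXT
    );
    INSERT INTO transactions VALUES
        (1, 123, '20170725', '20170702'),
        (1, 123, '20170825', '20170702'),
        (1, 123, '20170925', '20170702'),
        (2, 456, '20191025', '20200101'),
        (2, 456, '20191125', '20200101'),
        (2, 456, '20191225', '20200101');
""")

# Filter to transactions after each row's own death_of_date first,
# then number the survivors per account and keep the earliest one.
first_after_death = conn.execute("""
    SELECT company, acct_no, tran_date, death_of_date
    FROM (
        SELECT tr.*,
               row_number() OVER (
                   PARTITION BY company, acct_no ORDER BY tran_date
               ) AS seqnum
        FROM transactions tr
        WHERE tran_date > death_of_date
    ) AS x
    WHERE seqnum = 1
""").fetchall()

print(first_after_death)
```

Account 456 produces no row because all of its sample transactions predate its death_of_date, which matches the expected result in the question.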

Finding duplicate rows in SQL Server

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.
A statement like:
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
Will return something like
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to duplicate orgs). I would like to do part of this manually so I don't screw anything up, but I still need a statement that returns the IDs of all the dupe orgs so I can go through the list of users.
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
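To see the shape of the result, here is a small sketch that runs this join with Python's sqlite3. The rows are invented for illustration (only the ids 34, 5, 10 and 2 appear in the question; 'Solo Inc' is a hypothetical non-duplicate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT);
    INSERT INTO organizations VALUES
        (34, 'ABC Corp'),
        (5,  'ABC Corp'),
        (10, 'Widget Company'),
        (2,  'Widget Company'),
        (7,  'Solo Inc');          -- hypothetical org with no duplicate
""")

# Count rows per name, keep names that occur more than once,
# then join back to get every id that carries such a name.
dupes_with_ids = conn.execute("""
    SELECT o.orgName, oc.dupeCount, o.id
    FROM organizations o
    INNER JOIN (
        SELECT orgName, COUNT(*) AS dupeCount
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) oc ON o.orgName = oc.orgName
    ORDER BY o.orgName, o.id
""").fetchall()

for row in dupes_with_ids:
    print(row)
```

Note that dupeCount here is the total number of rows sharing the name, not a running 1, 2, 3 counter per row.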
You can run the following query to find the duplicates along with their max(id), and then delete those rows.
SELECT orgName, COUNT(*) AS dupes, MAX(ID) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
But you'll have to run this query a few times, because each run returns only one id per duplicated orgName.
You can do it like this:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you want to return just the records that can be deleted (leaving one of each), you can use:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
You can try this:
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) FROM organizations
)
select * from CTE where RN>1
go
The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
If you want to delete duplicates:
WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1
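Since the question mentions a users table that links to these organizations, the users usually need to be re-pointed before any duplicate is deleted. The sketch below is one possible way to do that, not the method from any answer here: it uses Python's sqlite3, a hypothetical users table with an org_id column, and keeps the smallest id per orgName. SQLite cannot delete through a CTE the way SQL Server can, so the delete is written with NOT IN over MIN(id) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, org_id INTEGER);
    INSERT INTO organizations VALUES
        (2, 'Widget Company'), (10, 'Widget Company'),
        (5, 'ABC Corp'), (34, 'ABC Corp'),
        (7, 'Solo Inc');
    -- Hypothetical users; the table layout is an assumption.
    INSERT INTO users VALUES
        (1, 'ann', 34), (2, 'bob', 5), (3, 'cy', 10), (4, 'di', 7);
""")

with conn:  # one transaction: both statements commit together or not at all
    # Step 1: point every user at the surviving (smallest-id) org of the same name.
    conn.execute("""
        UPDATE users
        SET org_id = (
            SELECT MIN(keep.id)
            FROM organizations AS dup
            JOIN organizations AS keep ON keep.orgName = dup.orgName
            WHERE dup.id = users.org_id
        )
    """)
    # Step 2: delete every org row that is not the survivor for its name.
    conn.execute("""
        DELETE FROM organizations
        WHERE id NOT IN (SELECT MIN(id) FROM organizations GROUP BY orgName)
    """)

users_after = conn.execute("SELECT name, org_id FROM users ORDER BY id").fetchall()
orgs_after = conn.execute("SELECT id, orgName FROM organizations ORDER BY id").fetchall()
print(users_after)
print(orgs_after)
```

Running the SELECT version of the duplicate query first, and reviewing its output, is a reasonable precaution before executing the UPDATE and DELETE on real data.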
select * from [Employees]
To find duplicate records:
1) Using a CTE
with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte where Duplicate > 1
2) Using GROUP BY
select Name,EmailId,COUNT(*) as Duplicate from [Employees] group by Name,EmailId having COUNT(*) > 1
Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1
The records with Rownum > 1 are the duplicate records in your table. PARTITION BY first groups the records by orgName, and ROW_NUMBER then numbers the rows within each group.
So the rows with Rownum > 1 are the duplicates, which can be deleted.
select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;
Src : https://stackoverflow.com/a/59242/1465252
select a.orgName, b.duplicate, a.id
from organizations a
inner join (
SELECT orgName, COUNT(*) AS duplicate
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) b on a.orgName = b.orgName
select orgname, count(*) over (partition by orgname) as dupes, id
from organizations
where orgname in (
select orgname
from organizations
group by orgname
having (count(*) > 1)
)
There are several ways to select duplicate rows.
For my solutions, first consider this example table:
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution:
SELECT DISTINCT *
FROM #Employee;
WITH DeleteEmployee AS (
SELECT ID, FIRST_NAME, LAST_NAME,
ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
SELECT *
FROM DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Second solution: use an identity field
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
SELECT *
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
At the end of either solution, run this command:
DROP TABLE #Employee
I think I know what you need.
I had to combine several of the answers, and I think this is the solution the asker wanted:
select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
Taking the max id gives you the id of the duplicate as well as the id of the original, which is what was asked for:
id, org name, duplicate count (omitted in this case)
duplicate id, duplicate org name, duplicate count (omitted again because it does not help in this case)
The only downside is that the output comes in this form:
id, name, duplicate id, name
I hope it still helps.
Suppose we have the table 'Student' with 2 columns:
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see the duplicate records.
Use this query:
select student_name,student_id ,count(*) c from student group by student_id,student_name having count(*) > 1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+
I found a better option for getting the duplicate records in a table:
SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname
The result of the above query shows all the duplicate names with their unique student ids and the number of duplicate occurrences.
/*To get duplicate data in table */
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
I use two methods to find duplicate rows.
The first method is the best-known one, using GROUP BY and HAVING.
The second method uses a CTE (Common Table Expression).
The approach mentioned by @RedFilter is also right. I often find the CTE method useful as well.
WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName)
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1
In the example above, we found repeat occurrences using ROW_NUMBER and PARTITION BY, then applied a WHERE clause to select only the rows whose repeat count is greater than 1. The result is collected in the CTE and joined with the organizations table.
Source : CodoBee
Try
SELECT orgName, id, dupes
FROM (
SELECT orgName, id, COUNT(*) OVER (PARTITION BY orgName) AS dupes
FROM organizations
) AS t
WHERE dupes > 1;