How to find all matching rows in 2 columns in SQL?

My table has 2 columns containing code pairs (Parentcodes and Childcodes). They are unique pairings, but each code can be, and often is, repeated in each column. I'm trying to pull a list of each instance of each code and all of the associated values from the other column.
So basically, something like:
Select ParentCode, Childcode
from TABLE
where count(ParentCode)>1
(and vice versa)
It seems like I have to include both columns in the group by if I want them both in the select. I've tried subqueries but with no luck. I know I can set up a script in VBA to loop through each code and return the results (running a basic select where count > 1), but that seems like the least efficient approach.
Sample data:

To get every row whose Parentcode or Childcode is repeated more than once, you can use IN:
select Parentcode, Childcode
from Table
where Parentcode in (
select Parentcode
from Table
group by Parentcode
having count(Parentcode) > 1
)
or Childcode in (
select Childcode
from Table
group by Childcode
having count(Childcode) > 1
)
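To check the IN approach against concrete rows, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name codes and the sample pairs are hypothetical (Table itself is a reserved word). The query body is the one above, with an ORDER BY added so the output is deterministic.

```python
import sqlite3


def rows_with_repeated_codes(pairs):
    """Return every (Parentcode, Childcode) row whose parent or child code
    appears more than once in its own column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE codes (Parentcode TEXT, Childcode TEXT)")
    con.executemany("INSERT INTO codes VALUES (?, ?)", pairs)
    query = """
        SELECT Parentcode, Childcode
        FROM codes
        WHERE Parentcode IN (SELECT Parentcode FROM codes
                             GROUP BY Parentcode
                             HAVING COUNT(Parentcode) > 1)
           OR Childcode IN (SELECT Childcode FROM codes
                            GROUP BY Childcode
                            HAVING COUNT(Childcode) > 1)
        ORDER BY Parentcode, Childcode
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical data: A repeats as a parent, Z repeats as a child, D/W is unique.
sample = [("A", "X"), ("A", "Y"), ("B", "Z"), ("C", "Z"), ("D", "W")]
print(rows_with_repeated_codes(sample))
# [('A', 'X'), ('A', 'Y'), ('B', 'Z'), ('C', 'Z')]
```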

You should be just about there with that.
select ParentCode, count(ParentCode) count
from TABLE
group by ParentCode
having count(Parentcode)>1

You can use EXISTS:
select t.* from tablename t
where
exists (select 1 from tablename where parentcode <> t.parentcode and childcode = t.childcode)
or
exists (select 1 from tablename where parentcode = t.parentcode and childcode <> t.childcode)
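The EXISTS version can be tried the same way. This is a sketch under the same assumptions (sqlite3 in-memory database, hypothetical table codes and sample data). It relies on the pairs being unique, as the question states: an exact duplicate row would not satisfy either <> condition.

```python
import sqlite3

EXISTS_QUERY = """
    SELECT t.Parentcode, t.Childcode
    FROM codes t
    WHERE EXISTS (SELECT 1 FROM codes            -- same child, different parent
                  WHERE Parentcode <> t.Parentcode AND Childcode = t.Childcode)
       OR EXISTS (SELECT 1 FROM codes            -- same parent, different child
                  WHERE Parentcode = t.Parentcode AND Childcode <> t.Childcode)
    ORDER BY t.Parentcode, t.Childcode
"""


def rows_sharing_a_code(pairs):
    """Return rows that share their parent or child code with another row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE codes (Parentcode TEXT, Childcode TEXT)")
    con.executemany("INSERT INTO codes VALUES (?, ?)", pairs)
    result = con.execute(EXISTS_QUERY).fetchall()
    con.close()
    return result


sample = [("A", "X"), ("A", "Y"), ("B", "Z"), ("C", "Z"), ("D", "W")]
print(rows_sharing_a_code(sample))
# [('A', 'X'), ('A', 'Y'), ('B', 'Z'), ('C', 'Z')]
```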

Related

SQL remove a row based on condition

I have a stored procedure in BigQuery and a resulting table where 2 rows are not exactly duplicates, but I want to filter out one of the rows based on a condition.
SQL query:
Results:
WITH DupCodes AS (
SELECT AccCode
FROM table
GROUP BY AccCode
HAVING COUNT(*) > 1
)
SELECT *
FROM table
WHERE (AccCode IN (SELECT AccCode FROM DupCodes) AND AccountName IS NOT NULL)
OR (AccCode NOT IN (SELECT AccCode FROM DupCodes))
One method uses not exists logic:
select t.*
from t
where t.accountname is not null or
not exists (select 1
from t t2
where t2.accCode = t.accCode and t2.accountname is not null
);
That is, return every row whose accountname is not NULL, and return a row with a NULL accountname only when no row with the same accCode has a non-NULL accountname.
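Here is a minimal sketch of the NOT EXISTS logic. It runs against Python's sqlite3 in-memory database rather than BigQuery. The table t, its columns accCode and accountname, and the sample rows are assumptions modeled on the answer.

```python
import sqlite3


def drop_unnamed_duplicates(rows):
    """Keep rows with an accountname; keep a NULL-name row only if its
    accCode has no named row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (accCode TEXT, accountname TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT t.accCode, t.accountname
        FROM t
        WHERE t.accountname IS NOT NULL
           OR NOT EXISTS (SELECT 1 FROM t t2
                          WHERE t2.accCode = t.accCode
                            AND t2.accountname IS NOT NULL)
        ORDER BY t.accCode
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical rows: code 100 has a named and an unnamed row, 200 only an unnamed one.
sample = [("100", "Cash"), ("100", None), ("200", None)]
print(drop_unnamed_duplicates(sample))
# [('100', 'Cash'), ('200', None)]
```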

How to delete duplicate data in a table (Postgres)

I want to delete the duplicated data in a table. I know I can use
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
to find them, but I need to check that every column's value is equal, i.e. tableA.* = tableA.* (except id, which is the auto-increment primary key).
and I tried this:
SELECT
*,
COUNT( * )
FROM
myTable
GROUP BY
*
HAVING
COUNT( * )> 1
ORDER BY
id;
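but it says I can't use GROUP BY *. So how can I find and delete the duplicated data (where every column's value is equal except id)?
Using
SELECT DISTINCT *
DISTINCT removes duplicated rows from the result. It does not delete them from the table, and the id column has to be left out of the select list, otherwise every row is distinct.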
Try something similar to the query below. Apply PARTITION BY to the columns other than Id (Id is an auto-incrementing unique value, so including it would make every row unique). PARTITION BY should list the columns on which you want to check for duplicates.
Also refer to ROW_NUMBER in Postgres and Common Table Expressions in Postgres.
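-- Number the rows in each group of identical values; the lowest Id gets rn = 1 and is kept
WITH DuplicateTableRows AS
(
SELECT Id, ROW_NUMBER() OVER (PARTITION BY col1, col2... ORDER BY Id) AS rn
FROM
Table1
)
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM DuplicateTableRows WHERE rn > 1)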
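The corrected query can be run end to end with Python's sqlite3 module. SQLite supports ROW_NUMBER() (version 3.25 or later) and a WITH clause in front of DELETE. The basket table, its fruit and color columns, and its rows are hypothetical. In the real table you would list every column except id in PARTITION BY.

```python
import sqlite3


def delete_duplicates(rows):
    """Delete rows that repeat every column except id, keeping the lowest id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE basket (id INTEGER PRIMARY KEY AUTOINCREMENT, fruit TEXT, color TEXT)"
    )
    con.executemany("INSERT INTO basket (fruit, color) VALUES (?, ?)", rows)
    con.execute("""
        WITH ranked AS (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY fruit, color ORDER BY id) AS rn
            FROM basket
        )
        DELETE FROM basket
        WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """)
    con.commit()
    remaining = con.execute("SELECT id, fruit, color FROM basket ORDER BY id").fetchall()
    con.close()
    return remaining


sample = [("apple", "red"), ("apple", "red"), ("pear", "green"),
          ("apple", "green"), ("pear", "green")]
print(delete_duplicates(sample))
# [(1, 'apple', 'red'), (3, 'pear', 'green'), (4, 'apple', 'green')]
```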
You can do this using JSON:
select (to_jsonb(b) - 'id')
from basket b
group by 1
having count(*) > 1;
The result is returned as JSON. Unfortunately, to extract the values back into a record, you need to list the columns individually.

remove duplicate records based on a criterion

I am using a script which requires only unique values, and I have a table which has duplicates like the ones below. I need to keep only the unique values (first occurrence), irrespective of what is inside the brackets.
Can I delete the records and keep the unique records using a single query?
Input table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)testing
4 (Delete)tester
5 (Del)tst
6 (Delete)tst
So the output table should be something like:
Output table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)tester
4 (Del)tst
SELECT DISTINCT * FROM FOO;
It depends on how much data you have to retrieve. If you only have to change Delete -> Del, you can try REPLACE:
http://technet.microsoft.com/en-us/library/ms186862.aspx
Grouping functions should also help you.
I don't think this would be an easy query.
Assumption: The name column always has all strings in the format given in the sample data.
Try this:
;with cte as
(select *, rank() over
(partition by substring(name, charindex(')',name)+1,len(name)+1 - charindex(')',name))
order by id) rn
from tbl
),
filtered_cte as
(select * from cte
where rn = 1
)
select rank() over (partition by getdate() order by id,getdate()) id , name
from filtered_cte
How this works:
The first CTE cte uses rank() to rank the occurrence of the string outside brackets in the name column.
The second CTE, filtered_cte, only returns the first row for each occurrence of the specified string. At this point we have the expected rows, but not in the desired format.
In the final step we partition by and order by the getdate() function. This function is chosen as a dummy to give us continuous values for the id column while using the rank function as we did in step 1.
Note that this solution will return filtered values, but not delete anything in the source table. If you wish, you can delete from the CTE created in step 1 to remove data from the source table.
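SQLite has no CHARINDEX or GETDATE(), so the following sketch translates the idea rather than the exact T-SQL. instr() and substr() extract the text after the first ')', and RANK() keeps the first row per suffix. ROW_NUMBER() OVER (ORDER BY id) is swapped in for the getdate() trick to renumber the result. Python's sqlite3 module (SQLite 3.25+ for window functions) and the tbl table are assumptions. The rows are the question's sample data.

```python
import sqlite3


def first_per_suffix(rows):
    """Keep the first row (lowest id) per text after ')' and renumber ids 1..n."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT id, name,
                   RANK() OVER (PARTITION BY substr(name, instr(name, ')') + 1)
                                ORDER BY id) AS rn
            FROM tbl
        ),
        filtered_cte AS (
            SELECT id, name FROM cte WHERE rn = 1
        )
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS new_id, name
        FROM filtered_cte
        ORDER BY new_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [(1, "(Del)testing"), (2, "(Del)test"), (3, "(Delete)testing"),
          (4, "(Delete)tester"), (5, "(Del)tst"), (6, "(Delete)tst")]
print(first_per_suffix(sample))
# [(1, '(Del)testing'), (2, '(Del)test'), (3, '(Delete)tester'), (4, '(Del)tst')]
```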
First, use this UPDATE to make the names uniform:
Update table set name = replace(Name, '(Del)' , '(Delete)')
Then delete the repeated names:
Delete from table where id in
(Select id from (Select Row_Number() over(Partition by Name order by id) as rn,* from table) x
where rn > 1)
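Here is a sketch of this two-step approach in Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER()). It uses the question's sample rows in a hypothetical tbl table. Because of the UPDATE, every surviving name ends up with the (Delete) prefix, unlike the output the question asks for.

```python
import sqlite3


def normalize_and_dedupe(rows):
    """Unify the '(Del)' prefix, then delete every repeated name except the lowest id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    # Step 1: '(Del)' -> '(Delete)'; '(Delete)...' is untouched because it has no '(Del)'.
    con.execute("UPDATE tbl SET name = replace(name, '(Del)', '(Delete)')")
    # Step 2: number rows per name by id and delete all but the first.
    con.execute("""
        DELETE FROM tbl WHERE id IN (
            SELECT id FROM (
                SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
                FROM tbl
            ) AS x
            WHERE rn > 1
        )
    """)
    con.commit()
    remaining = con.execute("SELECT id, name FROM tbl ORDER BY id").fetchall()
    con.close()
    return remaining


sample = [(1, "(Del)testing"), (2, "(Del)test"), (3, "(Delete)testing"),
          (4, "(Delete)tester"), (5, "(Del)tst"), (6, "(Delete)tst")]
print(normalize_and_dedupe(sample))
# [(1, '(Delete)testing'), (2, '(Delete)test'), (4, '(Delete)tester'), (5, '(Delete)tst')]
```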
First, create the input data table:
CREATE TABLE test
(ID int,Name varchar(20));
INSERT INTO test
(ID, Name)
VALUES
(1, '(Del)testing'),
(2, '(Del)test'),
(3, '(Delete)testing'),
(4, '(Delete)tester'),
(5, '(Del)tst'),
(6, '(Delete)tst');
Select query:
select id, name
from (
select id, name,
ROW_NUMBER() OVER(PARTITION BY substring(name,PATINDEX('%)%',name)+1,20) ORDER BY id) rn
from test ) t
where rn= 1
order by 1
SQL Fiddle Link
http://www.sqlfiddle.com/#!6/a02b0/34

SQL query: exclude rows having two different values

Basically, my SELECT statement returns the following:
ID Status
100 1
100 2
101 1
What I'm looking for: return an ID that has status 1, but if the same ID also has status 2, exclude both rows.
In short, the result should be:
ID Status
101 1
Thanks in advance!
The following query returns ID values that occur only once.
SELECT ID
FROM t
GROUP BY ID
HAVING COUNT(*) = 1
It should be sufficient for the sample data you provided. If there are other cases then let me know.
You're going to need a subquery and NOT IN here.
The following works if the status column is of an INT data type:
SELECT *
FROM table
WHERE status = 1
AND ID NOT IN (
SELECT ID
FROM table
WHERE status = 2
);
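The NOT IN query can be checked against the question's sample rows with Python's sqlite3 module. The table name t is a hypothetical stand-in for table, which is a reserved word.

```python
import sqlite3


def status_1_without_status_2(rows):
    """Return (ID, Status) rows with status 1 whose ID never has status 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, Status INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT ID, Status
        FROM t
        WHERE Status = 1
          AND ID NOT IN (SELECT ID FROM t WHERE Status = 2)
        ORDER BY ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(status_1_without_status_2([(100, 1), (100, 2), (101, 1)]))
# [(101, 1)]
```

If any status-2 row has a NULL ID, NOT IN returns no rows at all, because comparing against NULL is unknown. NOT EXISTS does not have that pitfall.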
A more generic query, which removes every duplicated ID, not only a particular one:
select ID
from table where ID NOT IN
(select ID from table GROUP BY ID HAVING count(Status) > 1)
/* Subquery fetches the IDs having multiple entries */
The CTE IDs retrieves all IDs which have a single record in the table. It is then joined to the original table to return the result as (ID, Status) pairs.
;with IDs as
(
select ID
from yourtable
group by ID
having count(*) = 1
)
select i.ID, y.Status
from yourtable y
inner join IDs i on y.ID = i.ID
order by i.ID
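Here is the CTE-and-join version, sketched with Python's sqlite3 module. The extra row (102, 3) is a hypothetical addition to the question's data. It shows that this query keeps any ID with exactly one row, whatever its status.

```python
import sqlite3


def single_row_ids(rows):
    """Return (ID, Status) for IDs that appear exactly once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (ID INTEGER, Status INTEGER)")
    con.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    query = """
        WITH IDs AS (
            SELECT ID FROM yourtable GROUP BY ID HAVING COUNT(*) = 1
        )
        SELECT i.ID, y.Status
        FROM yourtable y
        INNER JOIN IDs i ON y.ID = i.ID
        ORDER BY i.ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(single_row_ids([(100, 1), (100, 2), (101, 1), (102, 3)]))
# [(101, 1), (102, 3)]
```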

sql query - filtering duplicate values to create report

I am trying to list all the duplicate records in a table. This table does not have a primary key and has been created specifically for a report listing duplicates. It contains both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works correctly and gives me all the duplicated combinations (each one listed once, regardless of how many times it occurs).
But I want my report to display the values of all the columns. How can I do that without querying each record separately?
For example:
The table has 10 fields, and [NewID] is the field that occurs multiple times. I need to create a report with all the data in all the fields wherever NewID is duplicated.
Please help.
Thank you.
You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.
Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)
select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
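To confirm that the join returns every column of every row whose NewID repeats, here is a sketch with Python's sqlite3 module. Only four of the ten columns are modeled, and the rows are hypothetical.

```python
import sqlite3


def rows_with_duplicate_newid(rows):
    """Return full rows whose NewID occurs more than once."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Duplicates (OfficeCD TEXT, NewID INTEGER, Year INTEGER, Type TEXT)"
    )
    con.executemany("INSERT INTO Duplicates VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT d.OfficeCD, d.NewID, d.Year, d.Type
        FROM Duplicates d
        INNER JOIN (
            SELECT NewID FROM Duplicates GROUP BY NewID HAVING COUNT(*) > 1
        ) dd ON d.NewID = dd.NewID
        ORDER BY d.NewID, d.OfficeCD
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [("A", 1, 2020, "x"), ("B", 1, 2021, "y"), ("C", 2, 2020, "x"),
          ("D", 3, 2022, "z"), ("E", 3, 2022, "z")]
print(rows_with_duplicate_newid(sample))
# [('A', 1, 2020, 'x'), ('B', 1, 2021, 'y'), ('D', 3, 2022, 'z'), ('E', 3, 2022, 'z')]
```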