SQL -- Remove duplicate pairs

I'm using SQLite to store a set of undirected edges of a graph using two columns, u and v. For example:
u v
1 2
3 2
2 1
3 4
I have already been through it with SELECT DISTINCT * FROM edges and removed all duplicate rows.
However, there are still duplicates if we remember these are undirected edges. In the above example, the edge (1,2) appears twice, once as (1,2) and once as (2,1) which are both equivalent.
I wish to remove all such duplicates leaving only one of them, either (1,2) or (2,1) -- it doesn't really matter which.
Any ideas how to achieve this? Thanks!

If the same pair (reversed) exists take the one where u>v.
SELECT DISTINCT u, v
FROM edges t1
WHERE t1.u > t1.v
   OR NOT EXISTS (
      SELECT * FROM edges t2
      WHERE t2.u = t1.v AND t2.v = t1.u
   )
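As a quick check of the query above, here is a minimal sketch that runs it against the sample edges from the question. It assumes Python's built-in sqlite3 module as the driver and an in-memory table named edges; those choices are for illustration only.

```python
import sqlite3

# In-memory database holding the sample edges from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (u INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (3, 2), (2, 1), (3, 4)],
)

# Keep a row if it is the u > v orientation, or if its reverse is absent.
dedupe_query = """
SELECT DISTINCT u, v
FROM edges t1
WHERE t1.u > t1.v
   OR NOT EXISTS (
        SELECT 1 FROM edges t2
        WHERE t2.u = t1.v AND t2.v = t1.u
   )
"""
deduped = sorted(conn.execute(dedupe_query).fetchall())
print(deduped)
```

Of the pair (1,2) and (2,1), only the (2,1) orientation survives; (3,2) and (3,4) have no reversed twin and are kept unchanged.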

This will find all the duplicates:
SELECT t1.u, t1.v FROM edges t1 INNER JOIN edges t2
ON t1.u = t2.v AND t1.v = t2.u
This will delete the duplicates:
DELETE FROM edges WHERE
EXISTS (SELECT * FROM edges t2 WHERE t2.u = edges.v AND t2.v = edges.u AND edges.u > t2.u)
Note that this will not delete duplicates like (2, 2) but I think you got those already with SELECT DISTINCT.
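The delete can be tried out on the sample data with a small sketch like the following. It assumes Python's sqlite3 module and a table named edges, and it refers to the target table by name inside the subquery instead of through an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (u INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (3, 2), (2, 1), (3, 4)],
)

# Delete the u > v orientation of any edge whose reverse is also stored.
conn.execute("""
DELETE FROM edges
WHERE EXISTS (
    SELECT 1 FROM edges t2
    WHERE t2.u = edges.v AND t2.v = edges.u AND edges.u > t2.u
)
""")
remaining = sorted(conn.execute("SELECT u, v FROM edges").fetchall())
print(remaining)
```

Only (2,1) is removed here, because it is the larger-first copy of an edge that is stored in both directions.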

To test with 9 numbers, I add the numbers 1 to 9 to two tables:
declare @num int
set @num = 1
while @num < 10
begin
    insert into t2 values (@num)
    insert into t1 values (@num)
    set @num += 1
end
Then coupling uniques without any repetition:
select t1.u, t2.v
from t1 cross join t2
where t1.u>t2.v

Related

Equivalent Subquery for a Join

I am looking for an answer to this question:
Is it possible to rewrite every join as an equivalent subquery?
I know that columns from a subquery cannot be selected in the outer query.
I ran these two queries in SQL Server:
select distinct A.*, B.ParentProductCategoryID
from [SalesLT].[Product] as A
inner join [SalesLT].[ProductCategory] as B
on A.ProductCategoryID = B.ProductCategoryID
select A.*
from [SalesLT].[Product] as A
where EXISTS (select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID = B.ProductCategoryID)
Both queries return the 293 rows I expected.
Now the problem is: how do I select the [SalesLT].[ProductCategory] column in the second case?
Do I need to correlate this subquery in the select clause for the column to appear in the output?
Is it possible to rewrite every join as an equivalent subquery?
No, because joins can 1) remove rows or 2) multiply rows
ex 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen with a scalar subquery in the SELECT list, which leaves the outer rows in place.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A scalar subquery in the SELECT list would not change the number of rows returned from the table being queried.
In your example, you do an exists... this is not going to return the value from the 2nd table.
This is how I would subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ProductCategoryID]
from [SalesLT].[Product] as A
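To make the row-count difference concrete, here is a small sketch using the t1 and t2 tables from the examples above. It runs on SQLite through Python's sqlite3 module instead of SQL Server, and it wraps the subquery in MAX so that it stays single-valued; both are assumptions made for the sake of a runnable illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (num INTEGER);
CREATE TABLE t2 (num INTEGER);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (2), (3), (3);
""")

# The inner join drops t1's unmatched row (1) and repeats the row for 3.
join_rows = conn.execute("""
SELECT t1.num, t2.num
FROM t1 INNER JOIN t2 ON t1.num = t2.num
ORDER BY 1, 2
""").fetchall()

# A scalar subquery in the SELECT list returns exactly one row per t1 row.
subquery_rows = conn.execute("""
SELECT t1.num,
       (SELECT MAX(t2.num) FROM t2 WHERE t2.num = t1.num)
FROM t1
ORDER BY t1.num
""").fetchall()

print(join_rows)
print(subquery_rows)
```

The join returns three rows, none of them for the value 1, while the subquery version returns one row for each of the three t1 rows, with NULL where t2 has no match.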
You might use
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
however, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.

Deleting equal number of records with positive and negative values in a table

I have a table with multiple negative and positive values. I want to delete only those records that have a negative value, together with the same number of records that have the matching positive value. I'm not sure how to explain this scenario...
I will give a brief example:
I have a table with 6 records, of which 2 have a negative value and 4 have a positive value.
Name | number
A | 1
A |-1
A | 1
A |-1
A | 1
A | 1
So here I want to delete an equal number of records with negative and positive values,
so my output should be
Name | Number
A | 1
A | 1
By using Row_number
;WITH CTE AS (
select *,ROW_NUMBER()OVER(PARTITION BY number ORDER BY (SELECT NULL)) -1 RN from Table1 )
Select Name, number from CTE WHERE RN NOT IN (1,0)
The following query assumes that your table has a column called id which is either a primary key or some other means of ordering your records. Without any order, your question cannot be answered, and in fact the data sample you showed us would have no meaning, since internally records have no order in a SQL database.
WITH cte1 AS (
SELECT t1.id, t1.number, SUM(t2.number) as sum
FROM yourTable t1
INNER JOIN yourTable t2 on t1.id >= t2.id
GROUP BY t1.id, t1.number
),
cte2 AS (
SELECT MAX(id) AS cutoff
FROM cte1
WHERE sum = 0
)
SELECT t.*
FROM yourTable t
WHERE t.id > (SELECT cutoff FROM cte2)
Note that I used the old school way of computing a running sum because you never told us the version of SQL Server which you are using. Hence, I didn't want to make assumptions about what you have available.
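The running-sum idea can be exercised on the sample data with a sketch like this one. It assumes SQLite via Python's sqlite3 module, an auto-assigned id column that reflects insertion order, and a column alias of running_sum; none of these come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is assumed to give the row order; the question's table has no such column.
conn.execute(
    "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, name TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable (name, number) VALUES (?, ?)",
    [("A", 1), ("A", -1), ("A", 1), ("A", -1), ("A", 1), ("A", 1)],
)

cutoff_query = """
WITH cte1 AS (
    -- Running sum computed with a triangular self-join.
    SELECT t1.id, t1.number, SUM(t2.number) AS running_sum
    FROM yourTable t1
    INNER JOIN yourTable t2 ON t1.id >= t2.id
    GROUP BY t1.id, t1.number
),
cte2 AS (
    -- Last row at which positives and negatives have cancelled out.
    SELECT MAX(id) AS cutoff FROM cte1 WHERE running_sum = 0
)
SELECT t.name, t.number
FROM yourTable t
WHERE t.id > (SELECT cutoff FROM cte2)
ORDER BY t.id
"""
survivors = conn.execute(cutoff_query).fetchall()
print(survivors)
```

The running sums are 1, 0, 1, 0, 1, 2, so the cutoff is id 4 and the last two rows remain. If the running sum never reaches zero, the cutoff is NULL and the query returns no rows, so that case would need separate handling.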
declare @negvalrecs int = (select COUNT(*) from tab where Number < 0)
delete
from tab
where Number < 0
delete top (@negvalrecs)
from tab
where Number > 0
Thanks for all your inputs!
I have a solution for it. We will be needing row number function for it.
--Providing row number to rows
select *,row_number () over (partition by name,number order by name) R into #1 from Table
--Taking negative values
select * into #2 from #1 where number<0
--Now Deleting those records from the main table by joining this table
delete #1 from #1 a inner join #2 b on a.name=b.name and a.number=b.number and a.r<=b.r
delete #1 from #1 a inner join #2 b on a.name=b.name and a.number=-(b.number) and a.r<=b.r
Hope it helps!
I recently encountered a similar problem and this is how I resolved it.
I also had records in the table where there were no negatives for a given name; the UNION ALL is there to bring in such records.
SELECT t1.name, t1.number
FROM table t1
LEFT OUTER JOIN
(SELECT name, number FROM table where number < 0) t2
ON
t1.name = t2.name and t1.number = t2.number
WHERE t1.number > 0 and t2.number IS NOT NULL
UNION ALL
SELECT t1.name, t1.number
FROM table t1
LEFT OUTER JOIN
(SELECT name, number FROM table where number < 0) t2
ON
t1.name = t2.name
WHERE t1.number > 0 and t2.number IS NULL;
Try this,
delete from table_name
where substring(ltrim(rtrim(number)),1,1)='-'

SQLServer join two tables

I've got a question for you. I'm having a hard time trying to combine two tables, and I can't manage to find the correct query.
I have two tables:
T1: 1column, Has X records
T2: 1column, Has Y records
Note: Y can never be greater than X, but it is often less than X.
I want to join those tables in order to have a table with two columns
t3: ColumnFromT1, columnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once all the values from T2 have been used.
How could I achieve that?
Thanks
You could give each table a row number in a subquery. Then you can left join on that row number. To recycle rows from the second table, take the modulus % of the first table's row number.
Example:
select Sub1.col1
, Sub2.col1
from (
    select row_number() over (order by col1) as rn
    , *
    from @t1
) Sub1
left join
(
    select row_number() over (order by col1) as rn
    , *
    from @t2
) Sub2
on (Sub1.rn - 1) % (select count(*) from @t2) + 1 = Sub2.rn
Test data:
declare @t1 table (col1 int)
declare @t2 table (col1 datetime)
insert @t1 values (1), (2), (3), (4), (5)
insert @t2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
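The row-number join can be reproduced outside SQL Server with a sketch like the one below. It assumes SQLite 3.25 or later (for window functions) driven through Python's sqlite3 module, with ordinary tables standing in for the table variables. The second join condition, a plain match on row number, is a variation added here to show the NULL-padded result the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (col1 INTEGER);
CREATE TABLE t2 (col1 TEXT);
INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
INSERT INTO t2 VALUES ('2010-01-01'), ('2012-02-02');
""")

# Both variants number the rows of each table and differ only in the ON clause.
template = """
SELECT s1.col1, s2.col1
FROM (SELECT row_number() OVER (ORDER BY col1) AS rn, col1 FROM t1) s1
LEFT JOIN (SELECT row_number() OVER (ORDER BY col1) AS rn, col1 FROM t2) s2
  ON {condition}
ORDER BY s1.rn
"""

# Modulus on the row number recycles t2's rows.
recycled = conn.execute(
    template.format(
        condition="(s1.rn - 1) % (SELECT count(*) FROM t2) + 1 = s2.rn"
    )
).fetchall()

# Matching row numbers directly leaves NULL once t2 runs out.
padded = conn.execute(template.format(condition="s1.rn = s2.rn")).fetchall()

print(recycled)
print(padded)
```

The first result repeats the two dates across all five rows, matching the output printed above; the second pairs the first two rows and fills the remaining three with NULL.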
You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have a column CustomerID in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns if more than one is necessary to perform the join). If not, you are doing a cartesian join ( http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672 )

SQL Inner Join On Null Values

I have a Join
SELECT * FROM Y
INNER JOIN X ON ISNULL(X.QID, 0) = ISNULL(y.QID, 0)
Isnull in a Join like this makes it slow. It's like having a conditional Join.
Is there any work around to something like this?
I have a lot of records where QID is Null
Anyone have a work around that doesn't entail modifying the data
You have two options
INNER JOIN x
ON x.qid = y.qid OR (x.qid IS NULL AND y.qid IS NULL)
or easier
INNER JOIN x
ON x.qid IS NOT DISTINCT FROM y.qid
If you want the rows where Y.QID is NULL to be included, then the fastest way is
SELECT * FROM Y
LEFT JOIN X ON y.QID = X.QID
Note: this solution is applicable only if you need null values from Left table i.e. Y (in above case).
Otherwise
INNER JOIN x ON x.qid IS NOT DISTINCT FROM y.qid
is the right way to do it.
This article has a good discussion on this issue. You can use
SELECT *
FROM Y
INNER JOIN X ON EXISTS(SELECT X.QID
INTERSECT
SELECT y.QID);
Are you committed to using the Inner join syntax?
If not you could use this alternative syntax:
SELECT *
FROM Y,X
WHERE (X.QID = Y.QID) or (X.QID is null and Y.QID is null)
I'm pretty sure that the join doesn't even do what you want. If there are 100 records in table a with a null qid and 100 records in table b with a null qid, then the join as written should make a cross join and give 10,000 results for those records. If you look at the following code and run the examples, I think that the last one is probably more the result set you intended:
create table #test1 (id int identity, qid int)
create table #test2 (id int identity, qid int)
Insert #test1 (qid)
select null
union all
select null
union all
select 1
union all
select 2
union all
select null
Insert #test2 (qid)
select null
union all
select null
union all
select 1
union all
select 3
union all
select null
select * from #test2 t2
join #test1 t1 on t2.qid = t1.qid
select * from #test2 t2
join #test1 t1 on isnull(t2.qid, 0) = isnull(t1.qid, 0)
select * from #test2 t2
join #test1 t1 on
t1.qid = t2.qid OR ( t1.qid IS NULL AND t2.qid IS NULL )
select t2.id, t2.qid, t1.id, t1.qid from #test2 t2
join #test1 t1 on t2.qid = t1.qid
union all
select null, null,id, qid from #test1 where qid is null
union all
select id, qid, null, null from #test2 where qid is null
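The cross-join effect is easy to see by counting rows. The sketch below loads the same test data into SQLite through Python's sqlite3 module, with regular tables named test1 and test2 in place of the temp tables and COALESCE standing in for ISNULL; those substitutions are assumptions made so the example runs without SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test1 (id INTEGER PRIMARY KEY, qid INTEGER);
CREATE TABLE test2 (id INTEGER PRIMARY KEY, qid INTEGER);
INSERT INTO test1 (qid) VALUES (NULL), (NULL), (1), (2), (NULL);
INSERT INTO test2 (qid) VALUES (NULL), (NULL), (1), (3), (NULL);
""")

def joined_row_count(on_clause):
    """Count the rows produced by joining test2 to test1 on the given condition."""
    sql = f"SELECT COUNT(*) FROM test2 t2 JOIN test1 t1 ON {on_clause}"
    return conn.execute(sql).fetchone()[0]

# Plain equality never matches NULL to NULL.
plain = joined_row_count("t2.qid = t1.qid")

# Treating NULL as equal to NULL pairs every NULL row with every other NULL row.
null_matching = joined_row_count(
    "t1.qid = t2.qid OR (t1.qid IS NULL AND t2.qid IS NULL)"
)
coalesced = joined_row_count("COALESCE(t2.qid, 0) = COALESCE(t1.qid, 0)")

print(plain, null_matching, coalesced)
```

With three NULL rows on each side, the NULL-matching joins produce 3 x 3 = 9 rows for the NULLs plus 1 row for qid = 1, which is the small-scale version of the 10,000-row blow-up described above.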
Hey, it is kind of late to answer this, but I had the same question. What I realized is that you must have a record with an ID of 0 in your second table for this to work:
SELECT * FROM Y
INNER JOIN X ON ISNULL(Y.QID, 0) = ISNULL(X.QID, 0)
It actually says: if there is none, then use 0. BUT what if the Y table does NOT have a record with an ID of 0?
So, I found this method, (and worked for my case):
SELECT
ISNULL(Y.QName, 'ThereIsNone') AS YTableQName
FROM
X
LEFT OUTER JOIN Y ON X.QID = Y.QID
This way you DON'T need a record with an ID of 0 in your second table (which is Y in this case and Customers in my case), or any record at all.
UPDATE:
You can also take a look at this post for better understanding.
Basically you want to join two tables together where their QID columns are both not null, correct? However, you aren't enforcing any other conditions, such as that the two QID values match (which seems strange to me, but ok). Something as simple as the following (tested in MySQL) seems to do what you want:
SELECT * FROM `Y` INNER JOIN `X` ON (`Y`.`QID` IS NOT NULL AND `X`.`QID` IS NOT NULL);
This gives you every non-null row in Y joined to every non-null row in X.
Update: Rico says he also wants the rows with NULL values, why not just:
SELECT * FROM `Y` INNER JOIN `X`;
You could also use the coalesce function. I tested this in PostgreSQL, but it should also work for MySQL or MS SQL server.
INNER JOIN x ON coalesce(x.qid, -1) = coalesce(y.qid, -1)
This will replace NULL with -1 before evaluating it. Hence there must be no -1 in qid.

How do I compare 2 rows from the same table (SQL Server)?

I need to create a background job that processes a table looking for rows matching on a particular id with different statuses. It will store the row data in a string to compare the data against a row with a matching id.
I know the syntax to get the row data, but I have never tried comparing 2 rows from the same table before. How is it done? Would I need to use variables to store the data from each? Or some other way?
(Using SQL Server 2008)
You can join a table to itself as many times as you require, it is called a self join.
An alias is assigned to each instance of the table (as in the example below) to differentiate one from another.
SELECT a.SelfJoinTableID
FROM dbo.SelfJoinTable a
INNER JOIN dbo.SelfJoinTable b
ON a.SelfJoinTableID = b.SelfJoinTableID
INNER JOIN dbo.SelfJoinTable c
ON a.SelfJoinTableID = c.SelfJoinTableID
WHERE a.Status = 'Status to filter a'
AND b.Status = 'Status to filter b'
AND c.Status = 'Status to filter c'
OK, after 2 years it's finally time to correct the syntax:
SELECT t1.value, t2.value
FROM MyTable t1
JOIN MyTable t2
ON t1.id = t2.id
WHERE t1.id = @id
AND t1.status = @status1
AND t2.status = @status2
Some people find the following alternative syntax easier to see what is going on:
select t1.value,t2.value
from MyTable t1
inner join MyTable t2 on
t1.id = t2.id
where t1.id = @id
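SELECT COUNT(*) FROM (SELECT col1, col2 FROM tbl WHERE id=1 UNION SELECT col1, col2 FROM tbl WHERE id=2) a
If you get two rows, they are different; if one, they are the same. (col1 and col2 stand for whichever columns you want to compare. List them explicitly instead of using *, because a column that differs between the two rows by definition, such as the id used to pick them, would always make the count two.)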
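Here is a minimal sketch of the UNION comparison, run on SQLite through Python's sqlite3 module. The table layout (id, status, value) and the helper rows_differ are hypothetical; the compared columns are listed explicitly so that the id, which differs between the two rows anyway, does not influence the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER, status TEXT, value TEXT);
INSERT INTO tbl VALUES (1, 'open', 'x'), (2, 'open', 'x'), (3, 'closed', 'x');
""")

def rows_differ(id_a, id_b):
    """Return True if the two rows differ in any compared column."""
    # UNION removes duplicates, so identical rows collapse into one.
    count = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT status, value FROM tbl WHERE id = ?
            UNION
            SELECT status, value FROM tbl WHERE id = ?
        ) a
        """,
        (id_a, id_b),
    ).fetchone()[0]
    return count == 2

print(rows_differ(1, 2))
print(rows_differ(1, 3))
```

Rows 1 and 2 carry the same status and value, so the union collapses to a single row; rows 1 and 3 differ in status, so two rows come back.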
SELECT * FROM A AS b INNER JOIN A AS c ON b.a = c.a
WHERE b.a = 'some column value'
I had a situation where I needed to compare each row of a table with the next row. ("Next" here is relative to my problem specification; in the example, the next row is determined by the ORDER BY clause inside the row_number() function.)
So I wrote this:
DECLARE @T TABLE (col1 nvarchar(50));
insert into @T VALUES ('A'),('B'),('C'),('D'),('E')
select I1.col1 Instance_One_Col, I2.col1 Instance_Two_Col from (
    select col1, row_number() over (order by col1) as row_num
    FROM @T
) AS I1
left join (
    select col1, row_number() over (order by col1) as row_num
    FROM @T
) AS I2 on I1.row_num = I2.row_num - 1
After that I can compare each row to the next one as needed.
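The pairing of each row with its successor can be checked with a sketch like this, assuming SQLite 3.25 or later (for row_number()) via Python's sqlite3 module. An ordinary table named T replaces the table variable, and the numbered subquery is written once as a CTE instead of twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (col1 TEXT);
INSERT INTO T VALUES ('A'), ('B'), ('C'), ('D'), ('E');
""")

# Number the rows once, then join row n to row n + 1.
pairs = conn.execute("""
WITH numbered AS (
    SELECT col1, row_number() OVER (ORDER BY col1) AS row_num
    FROM T
)
SELECT i1.col1, i2.col1
FROM numbered i1
LEFT JOIN numbered i2 ON i1.row_num = i2.row_num - 1
ORDER BY i1.row_num
""").fetchall()
print(pairs)
```

Each row is paired with its successor, and the last row is paired with NULL because the left join finds nothing after it.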