Tricky delete. How do I? - sql

I have a table with 2 columns (both INT) and about 400 000 records.
The first column holds random numbers in ascending order. The second column follows a rule (which is not important right now).
About 1000 records in the table are exceptions: instead of a value that follows the rule, the second column holds -1.
How can I delete ~399 000 records, so that the table keeps only the rows with -1 and their "neighbors" (the rows immediately before and after each row with -1)?
UPDATE:
SQL Server 2005
The first column values are unique, but they are not sequential IDs (not auto-incremented :D)
example:
before:
20022518 13
20022882 364
20022885 -1
20022887 5
20022905 18
20023200 295
20023412 212
20023696 284
20024112 416
20025015 903
20025400 385
20025401 -1
20025683 283
20025981 298
20025989 8
20026752 763
20027779 1027
20028344 565
20028350 6
20028896 546
20028921 25
20028924 -1
20028998 77
20029031 33
20029051 20
20029492 441
20029530 38
20029890 360
after:
20022882 364
20022885 -1
20022887 5
20025400 385
20025401 -1
20025683 283
20028921 25
20028924 -1
20028998 77

If I understand correctly, you want to keep all records with col2 = -1 plus, for each of them, the record with the closest smaller col1 and the record with the closest larger col1. Assuming no duplicates in col1, I would do something like this:
delete from table where not col1 in
(
(select col1 from table where col2 = -1)
union
(select (select max(t2.col1) from table t2 where t2.col1 < t1.col1) from table t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from table t4 where t4.col1 > t3.col1) from table t3 where t3.col2 = -1)
)
Edit:
t4.col1 < t3.col1 should be t4.col1 > t3.col1
I created a test table with col1 and col2, both int; col1 is the PK, but not an autonumber.
SELECT * from adjacent
Gives
col1 col2
1 5
3 4
4 2
7 -1
11 8
12 2
With the above subselects:
SELECT * from adjacent
where
col1 in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
gives
col1 col2
4 2
7 -1
11 8
With NOT IN instead
SELECT * from adjacent
where
col1 not in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
gives
col1 col2
1 5
3 4
12 2
Finally a delete and select
delete from adjacent
where
col1 not in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
select * from adjacent
gives
col1 col2
4 2
7 -1
11 8
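One caveat with the NOT IN version, offered as a minimal sketch rather than as part of the answer above: if a -1 row is the very first or last row, the MAX/MIN subquery yields NULL, and `col1 NOT IN (..., NULL)` is never true, so nothing gets deleted. The sketch below uses Python's sqlite3 module only as a convenient test bed (an assumption; the question targets SQL Server, where NOT IN treats NULL the same way under default settings). The helper name `remaining_after_delete` and the `keep` alias are made up for illustration.

```python
import sqlite3

# The three-part "keep" set from the answer; the MAX/MIN subqueries
# return NULL when a -1 row has no previous or next row.
KEEP_SET = """
    SELECT col1 AS keep FROM adjacent WHERE col2 = -1
    UNION
    SELECT (SELECT MAX(t2.col1) FROM adjacent t2 WHERE t2.col1 < t1.col1)
    FROM adjacent t1 WHERE t1.col2 = -1
    UNION
    SELECT (SELECT MIN(t4.col1) FROM adjacent t4 WHERE t4.col1 > t3.col1)
    FROM adjacent t3 WHERE t3.col2 = -1
"""


def remaining_after_delete(rows, filter_nulls):
    """Run the delete on a fresh in-memory table and return what is left."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE adjacent (col1 INTEGER PRIMARY KEY, col2 INTEGER)")
    con.executemany("INSERT INTO adjacent VALUES (?, ?)", rows)
    keep = KEEP_SET
    if filter_nulls:
        # Strip NULLs so NOT IN compares only against real keys.
        keep = f"SELECT keep FROM ({KEEP_SET}) WHERE keep IS NOT NULL"
    con.execute(f"DELETE FROM adjacent WHERE col1 NOT IN ({keep})")
    return con.execute("SELECT col1, col2 FROM adjacent ORDER BY col1").fetchall()


# The -1 row is the first row here, so it has no "previous" neighbour.
edge_case = [(7, -1), (11, 8), (12, 2)]
print(remaining_after_delete(edge_case, filter_nulls=False))
print(remaining_after_delete(edge_case, filter_nulls=True))
```

Wrapping the union in an outer query that drops NULLs keeps the rest of the statement unchanged, which is why it was chosen here over rewriting the delete with NOT EXISTS.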

Assuming SQL Server here. Your best bet, if you are keeping a very small dataset, is to insert into a new table. I.E.:
SELECT *
INTO MyTable2
FROM MyTable
WHERE ColumnB = -1
DROP TABLE MyTable
EXEC sp_rename 'MyTable2', 'MyTable'
SELECT ... INTO can be a minimally logged operation (under the simple or bulk-logged recovery model), in which case it will run in a fraction of the time of a DELETE.
Without another key there is no way to ensure you get the "neighbors" since this is not really a valid concept in a relational DB. If the first column is "random" you can't tell which ones are "before" and "after" a row with a -1 value.
If by "random" you mean it's like an IDENTITY column that increases automatically, AND YOU HAVE NO MISSING VALUES IN THE SEQUENCE you can do something like:
SELECT *
INTO MyTable2
FROM MyTable mt
WHERE ColumnB = -1
OR EXISTS (
SELECT * FROM MyTable mt2
WHERE mt2.ColumnB = -1
AND (mt2.id = mt.id + 1
OR mt2.id = mt.id - 1))
DROP TABLE MyTable
EXEC sp_rename 'MyTable2', 'MyTable'

The solution is to number the records first, identify those adjacent to the -1 rules and then use UNION to assemble the final result:
WITH Numbered(seq, id, ruleno) AS (
SELECT
ROW_NUMBER() OVER (ORDER BY id), id, ruleno
FROM
Tricky
),
Brothers(id, ruleno) AS (
SELECT
b.id, b.ruleno
FROM
Numbered a INNER JOIN Numbered b
ON a.ruleno = -1 AND
abs(a.seq - b.seq) = 1
),
Triplets(id, ruleno) AS (
SELECT
id, ruleno
FROM
Tricky
WHERE
ruleno = -1
UNION ALL
SELECT
id, ruleno
FROM
Brothers
)
-- Display results
SELECT
id, ruleno
FROM
Triplets
ORDER BY
id
Result:
id ruleno
20022882 364
20022885 -1
20022887 5
20025400 385
20025401 -1
20025683 283
20028921 25
20028924 -1
20028998 77
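Finally, the delete. A CTE is only visible to the single statement that follows it, so the WITH definitions above must precede this DELETE in place of the final SELECT: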
DELETE FROM
Tricky
WHERE
id NOT IN (
SELECT
id
FROM
triplets
)
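The numbering idea above can be sketched outside the database as well. This is a minimal Python illustration, not the answer's own code: the list index plays the role of ROW_NUMBER(), and the function name `keep_triplets` is hypothetical. It collects positions in a set, so two adjacent -1 rows would not produce duplicate entries.

```python
def keep_triplets(rows):
    """Return the -1 rows plus their immediate neighbours, ordered by id."""
    ordered = sorted(rows)  # same role as ORDER BY id
    keep = set()
    for seq, (_, ruleno) in enumerate(ordered):  # seq stands in for ROW_NUMBER()
        if ruleno == -1:
            # Keep the row itself and the rows one position before and after it,
            # skipping positions that fall outside the table.
            keep.update(s for s in (seq - 1, seq, seq + 1) if 0 <= s < len(ordered))
    return [ordered[s] for s in sorted(keep)]


# A shortened version of the sample data from the question.
sample = [
    (20022518, 13), (20022882, 364), (20022885, -1), (20022887, 5),
    (20022905, 18), (20025400, 385), (20025401, -1), (20025683, 283),
    (20025981, 298),
]
for row in keep_triplets(sample):
    print(row)
```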

Use this tricky query.
For this I created a table with the statement below:
create table t1 (val int, val2 int)
GO
-- below is the exact stmt:
-- it deletes every row that is not within one position of a -1 row
With CTE as(select val, val2, row_number() over (order by val ASC) as rnum
from t1)
DELETE t1
From t1 inner join cte a
ON t1.val = a.val
WHERE NOT EXISTS (SELECT * fROM cte b
where b.val2 = -1
and a.rnum between b.rnum - 1 and b.rnum + 1)
For more information about CTEs please see this post:
http://blog.sqlauthority.com/2009/08/08/sql-server-multiple-cte-in-one-select-statement-query/

Related

Return rows where specific column has duplicate values

From the table below I want to show the two rows where the values in column 3 are duplicates:
ID Col2 Col3
1 a 123
2 b 123
3 c 14
4 d 65
5 e 65
This means that the query I need should return the rows with ID 1 and 2, and 4 and 5.
I wrote a query using HAVING:
SELECT *
FROM t1
INNER JOIN (SELECT col3 FROM t1
GROUP BY col3
HAVING COUNT(*) > 1) a
ON t1.col3 = a.col3
This query, though, only returns rows 1 and 4, for example, not all the duplicates.
I would appreciate any help.
Your query should work, but I would suggest window functions:
select t1.*
from (select t1.*, count(*) over (partition by col3) as cnt
from t1
) t1
where cnt > 1;
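As a quick check of the claim that the join on a GROUP BY ... HAVING subquery returns every duplicated row, here is a minimal sketch that runs the asker's query against the sample data. It uses Python's sqlite3 module as a stand-in database (an assumption, since the thread does not name the engine), and the variable name `dup_ids` is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INTEGER, col2 TEXT, col3 INTEGER)")
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "a", 123), (2, "b", 123), (3, "c", 14), (4, "d", 65), (5, "e", 65)],
)

# The asker's query: join every row to the col3 values that occur more than once.
dup_ids = [
    row[0]
    for row in con.execute(
        """
        SELECT t1.id
        FROM t1
        INNER JOIN (SELECT col3 FROM t1
                    GROUP BY col3
                    HAVING COUNT(*) > 1) a
        ON t1.col3 = a.col3
        ORDER BY t1.id
        """
    )
]
print(dup_ids)
```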

Get the data repeated n number of times

Below is my data.
ID Col1 Col2 Col3 Col4
1 101 1000 0 10000
1 102 0 1000 10000
2 101 1000 0 10000
2 102 0 1000 10000
3 103 2000 0 500
3 104 0 250 500
4 101 1000 0 10000
4 102 0 1000 10000
4 103 500 0 10000
I am unable to get the IDs that have the same data and occur 2 times.
For the data above, the expected IDs are 1 and 2, since both have the same set of rows.
Please help.
Try the SQL below:
select ID from
(
select ID,Col1,Col2,Col3,Col4, count(*) as cnt from tbl
Group by ID,Col1,Col2,Col3,Col4
Having count(*) > 1
) as UniqueData
select distinct id
from tbl x
join (select col1,
col2,
col3,
col4
from tbl
group by col1,
col2,
col3,
col4
having count(distinct id) > 1) y
on x.col1 = y.col1
and x.col2 = y.col2
and x.col3 = y.col3
and x.col4 = y.col4
Fiddle: http://www.sqlfiddle.com/#!6/8bce9/3/0
Your comments seem to suggest you want IDs that have rows that are identical in columns 1, 2, 3, and 4, on ALL rows, not just some rows. However, if that's the case, ID #s 1 and 2 are not even a match. Regardless, if that really is what you want -- perhaps it is, I don't know -- this should do that. It returns no results for your sample data above because, like I said, the 2nd row for ID # 1 does not match the 2nd row for ID # 2.
select id
from tbl t
where exists (select 1
from tbl x
where x.col1 = t.col1
and x.col2 = t.col2
and x.col3 = t.col3
and x.col4 = t.col4
and x.id <> t.id
and not exists (select 1
from tbl z
where z.id = x.id
and (z.col1 <> x.col1
or z.col2 <> x.col2
or z.col3 <> x.col3
or z.col4 <> x.col4)));
General idea
For each ID build a long string by concatenating all values from all columns and rows of that ID together.
IDs that have the same data will have the same concatenated string.
Group by this string to find those IDs that have the same data.
SQL Fiddle
WITH
CTE_Groups
AS
(
SELECT DISTINCT ID
FROM tbl
)
,CTE_GroupsAggregated
AS
(
SELECT
CTE_Groups.ID
,CA_Data.XML_Value AS DataValues
FROM
CTE_Groups
CROSS APPLY
(
SELECT
CAST(Col1 AS varchar(10))+','+
CAST(Col2 AS varchar(10))+','+
CAST(Col3 AS varchar(10))+','+
CAST(Col4 AS varchar(10))+','
FROM tbl
WHERE tbl.ID = CTE_Groups.ID
ORDER BY Col1, Col2, Col3, Col4 FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
)
,CTE_Duplicates
AS
(
SELECT DataValues
FROM CTE_GroupsAggregated
GROUP BY DataValues
HAVING COUNT(*) > 1
)
SELECT
CTE_GroupsAggregated.ID
FROM
CTE_GroupsAggregated
INNER JOIN CTE_Duplicates ON CTE_Duplicates.DataValues = CTE_GroupsAggregated.DataValues
;
result set
ID
1
2
To better understand how it works, include DataValues in the output and examine the intermediate results of each CTE.
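The general idea above (build one signature per ID from all of its rows, then group IDs by signature) can be sketched in a few lines of Python. This is only an illustration under the assumption that rows arrive as (ID, Col1, Col2, Col3, Col4) tuples; it uses a sorted tuple as the signature instead of a concatenated string, and the function name `ids_with_identical_data` is hypothetical.

```python
from collections import defaultdict


def ids_with_identical_data(rows):
    """Return the IDs whose full set of (Col1..Col4) rows matches another ID's."""
    per_id = defaultdict(list)
    for id_, *values in rows:
        per_id[id_].append(tuple(values))

    # The signature is the ordered collection of an ID's rows, which plays the
    # same role as the concatenated string in the SQL version.
    by_signature = defaultdict(list)
    for id_, values in per_id.items():
        by_signature[tuple(sorted(values))].append(id_)

    return sorted(i for ids in by_signature.values() if len(ids) > 1 for i in ids)


data = [
    (1, 101, 1000, 0, 10000), (1, 102, 0, 1000, 10000),
    (2, 101, 1000, 0, 10000), (2, 102, 0, 1000, 10000),
    (3, 103, 2000, 0, 500), (3, 104, 0, 250, 500),
    (4, 101, 1000, 0, 10000), (4, 102, 0, 1000, 10000),
    (4, 103, 500, 0, 10000),
]
print(ids_with_identical_data(data))
```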

require to form a sql query

I was working on preparing a query where I was stuck.
Consider tables below:
table1
id key col1
-- --- -----
1 1 abc
2 2 d
3 3 s
4 4 xyz
table2
id col1 foreignkey
-- ---- ----------
1 12 1
2 13 1
3 14 1
4 12 2
5 13 2
Now what I need is to select only those records from table1 for which none of the corresponding entries in table2 has a given col1 value, say 12.
The challenge is that after applying a join, the row with col1 equal to 12 is skipped, but other rows with the same foreignkey (say with values 13 and 14) still match. What I want is that if even a single row has the value 12, that id should not be picked from table1 at all.
How can I form a query for this?
As an example of the output I need: from the table structure above, I want the records from table1 for which no col1 value in table2 is 14.
So my query should return only row 2 from table1, and not row 1.
Another way of doing that. The first two queries are just for making the sample data.
;WITH t1(id ,[key] ,col1) AS
(
SELECT 1 , 1 , 'abc' UNION ALL
SELECT 2 , 2 , 'd' UNION ALL
SELECT 3 , 3 , 's' UNION ALL
SELECT 4 , 4 , 'xyz'
)
,t2(id ,col1, foreignkey) AS
(
SELECT 1 , 12 , 1 UNION ALL
SELECT 2 , 13 , 1 UNION ALL
SELECT 3 , 14 , 1 UNION ALL
SELECT 4 ,12 , 2 UNION ALL
SELECT 5 ,13 , 2
)
SELECT id, [key], col1
FROM t1
WHERE id NOT IN (SELECT t2.foreignkey
FROM t2
WHERE t2.col1 = 14)
This is a typical case for NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE NOT EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id AND t2.col1 = 14)
The above query will not select a row from table1 if there is a single correlated row in table2 having col1 = 14.
Output:
id key col1
-------------
2 2 d
3 3 s
4 4 xyz
If you want to return records that, in addition to the criterion set above, also have correlated records in table2, then you can use the following query:
SELECT t1.id, MAX(t1.[key]) AS [key], MAX(t1.col1) AS col1
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.foreignkey
GROUP BY t1.id
HAVING COUNT(CASE WHEN t2.col1 = 14 THEN 1 END) = 0
Output:
id key col1
-------------
2 2 d
You can also achieve the same result with the second query using a combination of EXISTS and NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id)
AND
NOT EXISTS (SELECT 1
FROM table2 t3
WHERE t3.foreignkey = t1.id AND t3.col1 = 14)
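To see the difference between the plain NOT EXISTS query and the EXISTS plus NOT EXISTS variant, here is a minimal sketch that runs both against the sample tables. It uses Python's sqlite3 module as a stand-in engine (an assumption; the answers are written for SQL Server), and the helper `ids` and the result names are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE table1 (id INTEGER, "key" INTEGER, col1 TEXT)')
con.execute("CREATE TABLE table2 (id INTEGER, col1 INTEGER, foreignkey INTEGER)")
con.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, "abc"), (2, 2, "d"), (3, 3, "s"), (4, 4, "xyz")],
)
con.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, 12, 1), (2, 13, 1), (3, 14, 1), (4, 12, 2), (5, 13, 2)],
)


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in con.execute(sql)]


# Rows of table1 with no related table2 row where col1 = 14
# (this also keeps rows that have no related rows at all).
no_14 = ids("""
    SELECT id FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.foreignkey = t1.id AND t2.col1 = 14)
    ORDER BY id
""")

# Same condition, but additionally requiring at least one related row.
linked_no_14 = ids("""
    SELECT id FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.foreignkey = t1.id)
      AND NOT EXISTS (SELECT 1 FROM table2 t3
                      WHERE t3.foreignkey = t1.id AND t3.col1 = 14)
    ORDER BY id
""")
print(no_14, linked_no_14)
```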
select t1.id,t1.key,
(select ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col1 DESC) AS Row,* into
#Temp from table1)
from table1 t1
inner join table2 t2 on t1.id=t2.foreignkey
where t2.col1=(select col1 from #temp where row>1)

Find matching column data between two rows in the same table

I want to find the matching value between two rows in the same sqlite table. For example, if I have the following table:
rowid, col1, col2, col3
----- ---- ---- ----
1 5 3 1
2 3 6 9
3 9 12 5
So comparing row 1 and 2, I get the value 3.
Row 2 and 3 will give 9.
Row 3 and 1 will give 5.
There will always be one and only one matching value between any two rows in the table.
What is the correct sqlite query for this?
I hardcoded the values for the rows because I do not know how to declare variables in SQLite.
select t1.rowid as r1, t2.rowid as r2, t2.col as matchvalue from <yourtable> t1 join
(
select rowid, col1 col from <yourtable> where rowid = 3 union all
select rowid, col2 from <yourtable> where rowid = 3 union all
select rowid, col3 from <yourtable> where rowid = 3
) t2
on t2.col in (t1.col1, t1.col2, t1.col3)
and t1.rowid < t2.rowid -- you don't need this if you have two specific rows
and t1.rowid = 1
select col from
(
select rid, c1 as col from yourtable
union
select rid, c2 from yourtable
union
select rid, c3 from yourtable
) v
where rid in (3,2)
group by col
order by COUNT(*) desc
limit 1
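Instead of hardcoding the row numbers, the two rowids can be passed as bound parameters from the calling code. The following is a minimal sketch using Python's sqlite3 module; the table name `t`, the parameter names `:a` and `:b`, and the function `matching_value` are assumptions made for illustration. It unpivots the first row into single values and keeps the one that also appears in the second row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
con.executemany(
    "INSERT INTO t (rowid, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [(1, 5, 3, 1), (2, 3, 6, 9), (3, 9, 12, 5)],
)

MATCH_SQL = """
    SELECT v.col
    FROM (
        SELECT col1 AS col FROM t WHERE rowid = :a
        UNION ALL
        SELECT col2 FROM t WHERE rowid = :a
        UNION ALL
        SELECT col3 FROM t WHERE rowid = :a
    ) v
    JOIN t ON t.rowid = :b
          AND v.col IN (t.col1, t.col2, t.col3)
"""


def matching_value(a, b):
    """Return the value shared by rows a and b, or None if there is none."""
    row = con.execute(MATCH_SQL, {"a": a, "b": b}).fetchone()
    return row[0] if row else None


print(matching_value(1, 2), matching_value(2, 3), matching_value(3, 1))
```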

problem with select

I have a table with rows with two columns
A 1
A 2
B 1
B 3
C 1
C 2
C 3
and I want to get only the ID (A, B or C) that has exactly 2 rows, with the values 1 and 2. So from this table I should get A, because B has no row with 2, and C has rows with 1 and 2 but also has a row with 3.
What is the simplest way to get this row?
SELECT col1
FROM YourTable
GROUP BY col1
HAVING COUNT(DISTINCT col2) =2 AND MIN(col2) = 1 AND MAX(col2) = 2
Or another way extendible to more than 2 numbers
SELECT col1
FROM yourtable
GROUP BY col1
HAVING MIN(CASE
WHEN col2 IN ( 1, 2 ) THEN 1
ELSE 0
END) = 1
AND COUNT(DISTINCT col2) = 2
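select t1.col1
from yourtable as t1
inner join yourtable as t2 on (t1.col1 = t2.col1)
where t1.col2 = 1 and t2.col2 = 2
and not exists (select 1 from yourtable as t3
where t3.col1 = t1.col1 and t3.col2 not in (1, 2));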
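The two GROUP BY ... HAVING queries above can be checked against the sample rows with a minimal sketch like the one below. It runs them through Python's sqlite3 module as a stand-in engine (an assumption; the thread does not name the database), and the variable names `min_max` and `case_based` are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (col1 TEXT, col2 INTEGER)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 1), ("B", 3), ("C", 1), ("C", 2), ("C", 3)],
)

# Variant 1: exactly two distinct values, the smallest being 1 and the largest 2.
min_max = [row[0] for row in con.execute("""
    SELECT col1
    FROM yourtable
    GROUP BY col1
    HAVING COUNT(DISTINCT col2) = 2 AND MIN(col2) = 1 AND MAX(col2) = 2
""")]

# Variant 2: every value is in the allowed list, and there are two distinct values.
case_based = [row[0] for row in con.execute("""
    SELECT col1
    FROM yourtable
    GROUP BY col1
    HAVING MIN(CASE WHEN col2 IN (1, 2) THEN 1 ELSE 0 END) = 1
       AND COUNT(DISTINCT col2) = 2
""")]
print(min_max, case_based)
```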