Delete duplicate rows based on two values in sql [duplicate]

This question already has answers here:
Removing duplicate rows from table in Oracle
(24 answers)
Closed 6 months ago.
I'm new to SQL and I can't work out how to delete duplicate rows. I have a table like this, called 'till_total':
till_id  total
-------  -----
1        80
1        80
1        60
2        30
2        30
2        50
I want to delete only full duplicate rows, so the table ends up like this:
till_id  total
-------  -----
1        80
1        60
2        30
2        50
I wrote this code to try and do it
SELECT till_id, total, COUNT(*) AS CNT
FROM till_total
GROUP BY till_id, total
HAVING COUNT(*) > 1
ORDER BY till_id;
But that seems to delete all rows where the till_id is repeated. Could anyone help me with this?

Good old ROWID approach:
Before:
SQL> select * from till_total;

   TILL_ID      TOTAL
---------- ----------
         1         80
         1         80
         1         60
         2         30
         2         30
         2         50

6 rows selected.
Delete duplicates:
SQL> delete from till_total a
  2  where a.rowid > (select min(b.rowid)
  3                   from till_total b
  4                   where b.till_id = a.till_id
  5                   and b.total = a.total
  6                  );
2 rows deleted.
After:
SQL> select * from till_total;

   TILL_ID      TOTAL
---------- ----------
         1         80
         1         60
         2         30
         2         50

SQL>
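If you want to check which rows that DELETE would touch before running it, the same correlated subquery can drive a plain SELECT first (a hedged preview, using only the till_total table shown above):

-- Preview the duplicate copies the DELETE above would remove.
select a.rowid, a.till_id, a.total
from till_total a
where a.rowid > (select min(b.rowid)
                 from till_total b
                 where b.till_id = a.till_id
                 and b.total = a.total);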

WITH numbered AS (
    SELECT till_id,
           total,
           ROW_NUMBER() OVER (PARTITION BY till_id, total ORDER BY till_id) AS rn
    FROM till_total
)
DELETE FROM numbered
WHERE rn > 1;
This might work for you: it numbers the rows inside each (till_id, total) group and deletes every copy after the first, so exact duplicates go away while single rows stay. Note that deleting through a CTE like this is SQL Server syntax.
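On PostgreSQL, which has no ROWID and does not allow deleting through a CTE, a hedged sketch of the same idea can lean on the ctid system column (assuming the table has no unique key to join on):

-- Hedged PostgreSQL sketch: ctid identifies each physical row, much like
-- Oracle's ROWID; keep the first row of every (till_id, total) group.
DELETE FROM till_total a
USING (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY till_id, total ORDER BY till_id) AS rn
    FROM till_total
) d
WHERE a.ctid = d.ctid
  AND d.rn > 1;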

Related

SQL: Count rows where column value changed from previous row [duplicate]

This question already has an answer here:
increment row number when value of field changes in Oracle
(1 answer)
Closed 2 years ago.
Suppose I have an Oracle or PostgreSQL database.
ID  IdExample  OrderByColumn  What I want
--  ---------  -------------  -----------
1   1          1300           1
2   1          2450           1
3   2          5000           2
4   2          4800           2
5   1          5100           3
6   1          6000           3
7   4          7000           4
8   1          8000           5
How do I count the changes in IdExample when the data is sorted by OrderByColumn?
I need to output a new column, represented by "What I want" above.
Pay attention to the "1" in IdExample: it repeats, but I want the counter to keep incrementing.
The query should execute quickly, with the table having tens of thousands of records.
Thanks
You need to use the LAG and SUM analytic functions as follows:
select t.*,
       sum(case when lg is null or lg <> idexample then 1 else 0 end)
           over (order by id) as result
from (select t.*,
             lag(idexample) over (order by id) as lg
      from your_table t) t
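A hedged variant, in case the rows should be walked in OrderByColumn order (as the question describes) rather than by id; only the ORDER BY inside the window functions changes:

select t.*,
       sum(case when lg is null or lg <> idexample then 1 else 0 end)
           over (order by orderbycolumn) as result
from (select t.*,
             lag(idexample) over (order by orderbycolumn) as lg
      from your_table t) t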

sql best strategy to partition same values based on temporal sequence

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID  LEVEL  DATE
1   10     10/1/2000
1   10     11/20/2001
1   10     12/01/2001
1   30     02/15/2002
1   30     02/15/2002
1   20     05/17/2002
1   20     01/04/2003
1   30     07/20/2003
1   30     03/16/2004
1   30     04/15/2004
I want to acquire a count for each ID/LEVEL/DATE block that looks like this:
ID  LEVEL  COUNT
1   10     3
1   30     2
1   20     2
1   30     3
The problem is that if I use the COUNT window function and partition by level, it groups the 30s together regardless of the temporal sequence. I want the count for level 30 both before and after 20 to be distinct. Does anyone know how to do that?
A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS set_ordinal,
        ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
    FROM
        yourData
)
SELECT
    id,
    level,
    set_ordinal - grp_ordinal,
    MIN(date),
    COUNT(*)
FROM
    ordered
GROUP BY
    id,
    level,
    set_ordinal - grp_ordinal
ORDER BY
    id,
    MIN(date)
Visualising the effect of the two row numbers...
ID  LEVEL  DATE        set_ordinal  grp_ordinal  set-grp  GROUP
--  -----  ----------  -----------  -----------  -------  ------
1   10     10/01/2000  1            1            0        1,10,0
1   10     11/20/2001  2            2            0        1,10,0
1   10     12/01/2001  3            3            0        1,10,0
1   30     02/15/2002  4            1            3        1,30,3
1   30     02/15/2002  5            2            3        1,30,3
1   20     05/17/2002  6            1            5        1,20,5
1   20     01/04/2003  7            2            5        1,20,5
1   30     07/20/2003  8            3            5        1,30,5
1   30     03/16/2004  9            4            5        1,30,5
1   30     04/15/2004  10           5            5        1,30,5
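If LAG is also available, a hedged alternative to the double ROW_NUMBER trick (same assumed yourData table) is to flag every change of LEVEL within an ID and turn a running sum of those flags into an island number:

WITH
flagged AS
(
    SELECT
        *,
        -- 1 when there is no previous row for this id or level changed
        CASE WHEN level = LAG(level) OVER (PARTITION BY id ORDER BY date)
             THEN 0 ELSE 1 END AS chg
    FROM
        yourData
),
islands AS
(
    SELECT
        *,
        -- running total of the change flags = island number
        SUM(chg) OVER (PARTITION BY id ORDER BY date
                       ROWS UNBOUNDED PRECEDING) AS island
    FROM
        flagged
)
SELECT
    id,
    level,
    COUNT(*)
FROM
    islands
GROUP BY
    id,
    level,
    island
ORDER BY
    id,
    MIN(date)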

Add order within group and mark whether a row is the last in its group

I have a table in an SQL Server database of the following form, sorted according to id.
id   group
1    10
17   10
24   10
2    20
16   20
72   20
104  20
8    30
9    30
I would like to select every row grouped according to the row group and add the following information to this table: the order (as sorted) within the group and whether the row is the last row in the group. In other words, something similar to this:
id   group  order  last
1    10     1      0
17   10     2      0
24   10     3      1
2    20     1      0
16   20     2      0
72   20     3      0
104  20     4      1
8    30     1      0
9    30     2      1
I've tried fiddling around with ROW_NUMBER, but I'm not all that experienced with SQL Server and I can't get it to work. Does anyone have a suggestion?
Use the ROW_NUMBER window function:
select id,[group],
row_number()over(partition by [group] order by id) as [order],
case when row_number()over(partition by [group] order by id desc) = 1 then 1 else 0 end as Last
From yourtable
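On SQL Server 2012 or later (an assumption, since the question does not give a version), LEAD offers an equivalent way to flag the last row: it is the one with no following id inside its group.

select id, [group],
       row_number() over (partition by [group] order by id) as [order],
       -- no next id within the group means this is the last row
       case when lead(id) over (partition by [group] order by id) is null
            then 1 else 0 end as last
from yourtable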

SQL Server GROUP BY COUNT Consecutive Rows Only

I have a table called DATA on Microsoft SQL Server 2008 R2 with three non-nullable integer fields: ID, Sequence, and Value. Sequence values with the same ID will be consecutive, but can start with any value. I need a query that will return a count of consecutive rows with the same ID and Value.
For example, let's say I have the following data:
ID  Sequence  Value
--  --------  -----
1   1         1
5   1         100
5   2         200
5   3         200
5   4         100
10  10        10
I want the following result:
ID  Start  Value  Count
--  -----  -----  -----
1   1      1      1
5   1      100    1
5   2      200    2
5   4      100    1
10  10     10     1
I tried
SELECT ID, MIN([Sequence]) AS Start, Value, COUNT(*) AS [Count]
FROM DATA
GROUP BY ID, Value
ORDER BY ID, Start
but that gives
ID  Start  Value  Count
--  -----  -----  -----
1   1      1      1
5   1      100    2
5   2      200    2
10  10     10     1
which groups all rows with the same values, not just consecutive rows.
Any ideas? From what I've seen, I believe I have to left join the table with itself on consecutive rows using ROW_NUMBER(), but I am not sure exactly how to get counts from that.
Thanks in advance.
You can use Sequence - ROW_NUMBER() OVER (ORDER BY ID, Value, Sequence) AS g to create a group:
SELECT
    ID,
    MIN(Sequence) AS Sequence,
    Value,
    COUNT(*) AS cnt
FROM
(
    SELECT
        ID,
        Sequence,
        Sequence - ROW_NUMBER() OVER (ORDER BY ID, Value, Sequence) AS g,
        Value
    FROM
        yourtable
) AS s
GROUP BY
    ID, Value, g
Please see a fiddle here.
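To see why the Sequence - ROW_NUMBER() difference isolates consecutive runs, here is the inner query worked through against the sample data above (rn is the row number over ID, Value, Sequence):

ID  Sequence  Value  rn  g = Sequence - rn
1   1         1      1    0
5   1         100    2   -1
5   2         200    4   -2
5   3         200    5   -2
5   4         100    3    1
10  10        10     6    4

Grouping by ID, Value and g then yields exactly the five rows of the desired result: only the two consecutive 200s share a group.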

SQL involving MAX of two columns and GROUP BY [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 7 years ago.
So... I got a table like this:
id  group  number  year
1   1      1       2000
2   1      2       2000
3   1      1       2001
4   2      1       2000
5   2      2       2000
6   2      1       2001
7   2      2       2001
8   2      3       2001
And I need to select the biggest number of the biggest year for each group, so I expect the result for the example to be:
3   1      1       2001
8   2      3       2001
Any ideas?
Note: using Postgres.
SELECT *
FROM (
SELECT *,
row_number() over (partition by "group" order by "year" desc, "number" desc ) x
FROM table1
) x
WHERE x = 1;
demo: http://sqlfiddle.com/#!15/cd78e/2
If it's just certain whole rows you want back, you can use DISTINCT ON. If you want per-column maximums that don't have to come from the same row, you could use GROUP BY (see the sketch after the query below):
SELECT DISTINCT ON ("group") * FROM tbl
ORDER BY "group", "year" DESC, "number" DESC;