Remove multiple postings but keep first - sql

Over the course of the week, a table of mine has received numerous duplicate postings that I need to remove. The timestamps differ, so I need to keep the first entry and remove all the others that came after it.
What technique would you advise?
SQL Server 2008
Many thanks
J

You can delete through a CTE. The result looks something like this:
with todelete as (
select p.*,
row_number() over (partition by post_id order by datetimecol asc) as seqnum
from posts p
)
delete from todelete
where seqnum > 1;
You can run the CTE's SELECT on its own to see what is happening: every row with seqnum > 1 is one that will be deleted.
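Deleting through a CTE works in SQL Server, but not every DBMS accepts it. Below is a minimal, self-contained sketch of the same ROW_NUMBER() idea written as a plain DELETE ... WHERE id IN over a numbered derived table. The posts table, its id key and the sample rows are hypothetical, and the sketch assumes a DBMS with window functions (for example SQL Server 2008+ or SQLite 3.25+).

```sql
-- Hypothetical sample table: several postings of the same post_id
CREATE TABLE posts (
    id          INT PRIMARY KEY,
    post_id     INT NOT NULL,
    datetimecol DATETIME NOT NULL
);

INSERT INTO posts (id, post_id, datetimecol) VALUES
    (1, 100, '2024-01-01T09:00:00'),
    (2, 100, '2024-01-02T09:00:00'),
    (3, 100, '2024-01-03T09:00:00'),
    (4, 200, '2024-01-01T10:00:00'),
    (5, 200, '2024-01-04T10:00:00');

-- Number the postings of each post_id, oldest first, and delete every row after the first
DELETE FROM posts
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY datetimecol ASC) AS seqnum
        FROM posts
    ) AS numbered
    WHERE seqnum > 1
);

-- Remaining rows: ids 1 and 4 (the first posting of each post_id)
SELECT id, post_id, datetimecol FROM posts ORDER BY id;
```

The derived table gets an explicit alias (numbered) because SQL Server requires one.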

Delete all the posts except the single oldest one in the table:
DELETE FROM tbl
WHERE ID NOT IN
(
select top 1 id
from tbl
order by TimeStampColumn
)
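The statement above keeps exactly one row for the whole table. If the repeated postings are grouped by some key and you want to keep the oldest posting per group, one sketch that needs no window functions is a correlated EXISTS. The post_id column and the sample rows are assumptions for illustration; tbl, ID and TimeStampColumn come from the answer.

```sql
-- Hypothetical sample data; post_id (assumed) groups the repeated postings
CREATE TABLE tbl (
    ID              INT PRIMARY KEY,
    post_id         INT NOT NULL,
    TimeStampColumn DATETIME NOT NULL
);

INSERT INTO tbl (ID, post_id, TimeStampColumn) VALUES
    (1, 100, '2024-01-03T09:00:00'),
    (2, 100, '2024-01-01T09:00:00'),
    (3, 200, '2024-01-02T10:00:00'),
    (4, 200, '2024-01-05T10:00:00');

-- Delete a row when an older posting with the same post_id exists.
-- Rows that tie exactly on the oldest timestamp are all kept.
DELETE FROM tbl
WHERE EXISTS (
    SELECT 1
    FROM tbl AS older
    WHERE older.post_id = tbl.post_id
      AND older.TimeStampColumn < tbl.TimeStampColumn
);

SELECT ID, post_id, TimeStampColumn FROM tbl ORDER BY ID;   -- ids 2 and 3 remain
```

Inside the subquery the inner copy is aliased as older, so tbl.post_id refers to the row being deleted.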

Related

SQL: Deleting Duplicates using Not in and Group by

I have the following SQL to delete duplicate rows, but no rows are ever affected.
DELETE FROM content_stacks WHERE id NOT IN (
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
);
On its own, the subquery correctly returns the list of ids of the first entries:
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
When I paste the resulting list in as literal values, it works too:
DELETE FROM content_stacks WHERE id NOT IN (239,231,217,218,219,232,233,220,230,226,234,235,224,225,221,223,222,227,228,229,236,237,238,216,208,209,210,204,211,212,242,203,240,201,241,205,206,207,213,214,215);
I checked many similar examples, and as far as I can tell this should work. What am I missing?
First number the rows of each group using ROW_NUMBER, then delete the records with a row number greater than 1:
WITH CTE AS (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id ORDER BY id) rn
FROM content_stacks
)
DELETE cs
FROM content_stacks cs
INNER JOIN CTE ON CTE.id = cs.id
WHERE rn > 1
Sorry to ask, but if you're deleting, why would you need to group the records?
Aren't you just increasing the runtime?
The code from Meyssam Toluie did not work for me as posted, but I wrote a similar solution based on the same idea of row numbers:
DELETE FROM content_stacks WHERE id IN
(SELECT id FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id ORDER BY id) row_num
FROM content_stacks
) sub
WHERE row_num > 1)
This works for me now.
My first command did not work because GROUP BY does not show all the ids in its output, but they are still there, so in effect all ids were returned in the NOT IN list. Row numbering seems to be the easiest way to solve this problem.
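If you would rather stay with GROUP BY, a common fix is to make the subquery return exactly one well-defined id per group, such as MIN(id), instead of a bare id column. A sketch follows, with hypothetical sample rows. The extra derived table around the GROUP BY is an assumption aimed at MySQL, which typically refuses a subquery that reads directly from the table being deleted from (error 1093) unless it is wrapped like this; other DBMSs accept it either way.

```sql
-- Hypothetical sample data: (user_id, content_id) pairs with duplicates
CREATE TABLE content_stacks (
    id         INT PRIMARY KEY,
    user_id    INT NOT NULL,
    content_id INT NOT NULL
);

INSERT INTO content_stacks (id, user_id, content_id) VALUES
    (1, 10, 500),
    (2, 10, 500),
    (3, 10, 501),
    (4, 11, 500),
    (5, 11, 500);

-- Keep the lowest id of every (user_id, content_id) group and delete the rest.
-- The GROUP BY is wrapped in a derived table (keep_ids) so that it is
-- materialized before rows are deleted from the same table.
DELETE FROM content_stacks
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM content_stacks
        GROUP BY user_id, content_id
    ) AS keep_ids
);

SELECT id, user_id, content_id FROM content_stacks ORDER BY id;   -- ids 1, 3, 4 remain
```

Because id is a primary key, MIN(id) is never NULL, so the NOT IN list cannot be silently emptied by a NULL value.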

Delete duplicate records on SQL Server

I have a table with duplicate records. I've already written a script that merges the values of the duplicate records into the original ones, but I can't manage to delete the duplicates.
I'm trying it this way:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN (SELECT MAX(COD_PLANO_PAGAMENTO) COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
GROUP BY COD_PLANO_PAGAMENTO)
The idea was to take the last record of each COD_PLANO_PAGAMENTO and delete it, but this way all the records get deleted. What am I doing wrong?
The table is structured as follows:
For example, I need to delete the second record with COD_MOVIMENTO = 405 and COD_PLANO_PAGAMENTO = 9: each COD_MOVIMENTO should contain only one record per distinct COD_PLANO_PAGAMENTO.
You can use an updatable CTE with row-numbering to calculate which rows to delete.
You may need to adjust the partitioning and ordering clauses; it's not clear exactly what you need.
WITH cte AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO ORDER BY (SELECT 1))
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO mp
)
DELETE FROM cte
WHERE rn > 1;
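Your DELETE statement takes MAX(), but MAX() returns a value even when a group contains only one record. Since you group by the same column you take the MAX() of, the subquery returns every COD_PLANO_PAGAMENTO value, so every row matches and is deleted.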
Also note that your group by should be on COD_MOVIMENTO.
As a fix, make sure there are at least two items:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN
(SELECT MAX(COD_PLANO_PAGAMENTO)COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE cod_plano_pagamento in
(select cod_plano_pagamento
from TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
group by COD_PLANO_PAGAMENTO
having count(*) > 1)
GROUP BY COD_MOVIMENTO )
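To see why the original DELETE removed every row, it helps to run its subquery on its own. The sketch below uses a hypothetical, trimmed-down copy of the table that keeps only the two key columns. It contrasts the original subquery with a count per (COD_MOVIMENTO, COD_PLANO_PAGAMENTO), which is what actually identifies the duplicates.

```sql
-- Hypothetical, simplified copy of the table (only the two key columns)
CREATE TABLE TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO (
    COD_MOVIMENTO       INT NOT NULL,
    COD_PLANO_PAGAMENTO INT NOT NULL
);

INSERT INTO TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO (COD_MOVIMENTO, COD_PLANO_PAGAMENTO) VALUES
    (405, 9),
    (405, 9),
    (405, 3),
    (406, 9);

-- The original subquery: one row per distinct COD_PLANO_PAGAMENTO (here 3 and 9),
-- i.e. every value present in the table, so IN (...) matches all four rows
SELECT MAX(COD_PLANO_PAGAMENTO) AS COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
GROUP BY COD_PLANO_PAGAMENTO;

-- Counting rows per (COD_MOVIMENTO, COD_PLANO_PAGAMENTO) shows the real duplicates
SELECT COD_MOVIMENTO, COD_PLANO_PAGAMENTO, COUNT(*) AS copies
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
GROUP BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO
HAVING COUNT(*) > 1;   -- returns only (405, 9, 2)
```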
In your comment you say you want to remove duplicate rows with the same COD_MOVIMENTO, COD_PLANO_PAGAMENTO and VAL_TOTAL_APURADO. Try this:
delete f1 from
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO, VAL_TOTAL_APURADO ORDER BY COD_MOVIMENTO) rang
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
) f1
where f1.rang>1

Delete records with a row number greater than a given value for each group in SQL

I have a table of people and their comments on a blog. I need to keep the last 10 comments for each person in the table and delete the older ones. Let's say the columns are:
personId
commentId
dateFromComment
I know how to do it with several queries, but not with a single query (any subqueries allowed) that works on any database.
With:
select personId from PeopleComments
group by personId
having count(*) >10
I would get the ids of the people who have more than 10 comments, but I don't know how to get the comment ids from there and delete them.
Thanks!
In my other answer, the DBMS must find and count rows for every row in the table, which can be slow. It is better to find all the rows we want to keep in one pass and then delete the others. Hence this additional answer.
The following works for Oracle as of version 12c:
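delete from peoplecomments
where rowid not in
(
select rowid
from peoplecomments
-- the 10 newest rows per person get sort key 1, so they all tie with the first row
order by case when row_number() over (partition by personid order by datefromcomment desc) <= 10 then 1 else 2 end
fetch first 1 row with ties
);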
Apart from ROWID this is standard SQL.
In other DBMS that support window functions and FETCH WITH TIES:
If your table has a single-column primary key, you'd replace ROWID with it.
If your table has a composite primary key, you'd use where (col1, col2) not in (select col1, col2 ...) provided your DBMS supports this syntax.
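Following the single-column-key suggestion above, here is a sketch that assumes commentId is the primary key. It filters a numbered derived table on rn <= N instead of using FETCH ... WITH TIES, so it should also run on DBMSs without that clause, such as SQL Server or SQLite 3.25+ (Oracle would need the AS before the derived-table alias removed). To keep the sample small it keeps the newest 2 comments per person; the data are hypothetical.

```sql
-- Hypothetical sample data
CREATE TABLE PeopleComments (
    commentId       INT PRIMARY KEY,
    personId        INT NOT NULL,
    dateFromComment DATETIME NOT NULL
);

INSERT INTO PeopleComments (commentId, personId, dateFromComment) VALUES
    (1, 1, '2024-01-01T08:00:00'),
    (2, 1, '2024-01-02T08:00:00'),
    (3, 1, '2024-01-03T08:00:00'),
    (4, 2, '2024-01-01T08:00:00'),
    (5, 2, '2024-01-05T08:00:00');

-- Keep each person's newest N comments (N = 2 here; use 10 for the question)
DELETE FROM PeopleComments
WHERE commentId NOT IN (
    SELECT commentId
    FROM (
        SELECT commentId,
               ROW_NUMBER() OVER (PARTITION BY personId
                                  ORDER BY dateFromComment DESC) AS rn
        FROM PeopleComments
    ) AS ranked
    WHERE rn <= 2
);

SELECT commentId, personId FROM PeopleComments ORDER BY commentId;   -- 2, 3, 4, 5 remain
```

If two comments share the same dateFromComment, ROW_NUMBER() breaks the tie arbitrarily, so which of them survives is not defined.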
You need a correlated subquery that counts each person's newer comments:
delete from peoplecomments pc
where
(
select count(*)
from peoplecomments pc2
where pc2.personid = pc.personid
and pc2.datefromcomment > pc.datefromcomment
) >= 10; -- at least 10 newer comments for the person
BTW: While it seems we could simply number our rows and delete accordingly via
delete from
(
select
pc.*, row_number() over (partition by personid order by datefromcomment desc) as rn
from peoplecomments pc
)
where rn > 10;
Oracle doesn't allow this and gives us ORA-01732: data manipulation operation not legal on this view.

Removing duplicate rows based on one column same values but keep one record

SQL Server Version
Remove all duplicate rows (rows 3 through 18) with service_date = '2018-08-29 13:05:00.000' but keep the oldest one (row 2), and of course keep row 1, since it has a different service_date. Ignore create_timestamp and document_file; it's the same customer. Any ideas?
In SQL Server, we can try deleting using a CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
The strategy here is to assign a row number to each group of records sharing the same service_date, with 1 being assigned to the oldest record in that group. Then, we can phrase the delete by just targeting all records which have a row number greater than 1.
You don't need a PARTITION BY clause. Please use the query below for better performance; I have tested it and it works fine.
with result as
(
select *, row_number() over(order by create_timestamp) as Row_To_Delete from TableName
)
delete from result where result.Row_To_Delete>2
I think you will want to remove this data on a per-customer basis.
That is, if the customers are different, you will want to keep the entries even when they share the same date.
If so, you need to add the customer column to the PARTITION BY clause that identifies the duplicate rows.
Copying and modifying Tim's solution, you can try the following:
;WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer, service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
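To make the difference concrete, here is a sketch with hypothetical rows in which two customers share the same service_date. Partitioning by service_date alone would keep only one of the three rows for that date, while partitioning by customer and service_date keeps the oldest row per customer. It uses the DELETE ... WHERE id IN form with an assumed unique id column so that it also runs on DBMSs that cannot delete through a CTE.

```sql
-- Hypothetical sample rows; id is an assumed unique key
CREATE TABLE yourTable (
    id               INT PRIMARY KEY,
    customer         VARCHAR(20) NOT NULL,
    service_date     DATETIME NOT NULL,
    create_timestamp DATETIME NOT NULL
);

INSERT INTO yourTable (id, customer, service_date, create_timestamp) VALUES
    (1, 'A', '2018-08-28T10:00:00', '2018-08-28T10:05:00'),
    (2, 'A', '2018-08-29T13:05:00', '2018-08-29T13:10:00'),
    (3, 'A', '2018-08-29T13:05:00', '2018-08-29T13:20:00'),
    (4, 'B', '2018-08-29T13:05:00', '2018-08-29T14:00:00');

-- Number rows per (customer, service_date), oldest create_timestamp first,
-- and delete everything after the first row of each group
DELETE FROM yourTable
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY customer, service_date
                                  ORDER BY create_timestamp) AS rn
        FROM yourTable
    ) AS numbered
    WHERE rn > 1
);

SELECT id, customer, service_date FROM yourTable ORDER BY id;   -- ids 1, 2, 4 remain
```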

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My table looks like this:
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last one (ordered by primary key), grouped by common. That means that for each distinct common value, the miles are summed, excluding the last row.
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer, but for a large table it should be faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
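Here is a runnable sketch of the addendum query with hypothetical data. The derived table is given an alias (t), which SQL Server requires and Oracle also accepts when written without AS.

```sql
-- Hypothetical sample data
CREATE TABLE myTable (
    prikey INT PRIMARY KEY,
    common VARCHAR(10) NOT NULL,
    miles  INT NOT NULL
);

INSERT INTO myTable (prikey, common, miles) VALUES
    (1, 'x', 10),
    (2, 'x', 20),
    (3, 'y', 5),
    (4, 'x', 30),
    (5, 'y', 7);

-- Rank rows per common by descending prikey; RowRank = 1 is the last row of each group
SELECT common, SUM(miles) AS total_miles
FROM (
    SELECT common,
           miles,
           ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
    FROM myTable
) t
WHERE RowRank <> 1
GROUP BY common
ORDER BY common;   -- x: 30 (10 + 20), y: 5
```

A common value with only one row has nothing left after its last row is excluded, so it does not appear in the result at all.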
Looks like I am a little late, but here is my contribution. It is similar to Ed Gibbs' first solution, but instead of calculating the max id for each value in the table and then comparing, I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least, I think this is more efficient if you want to do without analytic functions.
Query to retrieve all the records in the table except the first and last rows:
select * from table_name
where primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column desc
)
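TOP is SQL Server syntax, while the question is about Oracle. A sketch that avoids vendor-specific row limiting compares the key against its MIN and MAX instead. table_name and primary_id_column are the answer's placeholders, and the sample rows are hypothetical. Single-row INSERTs are used so that the setup also works on Oracle versions without multi-row VALUES.

```sql
-- Hypothetical sample data
CREATE TABLE table_name (
    primary_id_column INT PRIMARY KEY,
    payload           VARCHAR(10)
);

INSERT INTO table_name (primary_id_column, payload) VALUES (1, 'first');
INSERT INTO table_name (primary_id_column, payload) VALUES (2, 'b');
INSERT INTO table_name (primary_id_column, payload) VALUES (3, 'c');
INSERT INTO table_name (primary_id_column, payload) VALUES (4, 'last');

-- Everything strictly between the smallest and the largest key
SELECT *
FROM table_name
WHERE primary_id_column > (SELECT MIN(primary_id_column) FROM table_name)
  AND primary_id_column < (SELECT MAX(primary_id_column) FROM table_name)
ORDER BY primary_id_column;   -- returns rows 2 and 3
```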