I'm trying to implement a rolling version system, where users can keep several versions of something, but once it goes past 10 versions, the oldest item(s) are deleted.
This is what I want to do, but the syntax is invalid (can't use the row count in LIMIT):
with version_ids as (
select rd.id from reels_data rd, reels r where r.owner_id = '7f92dcc6-f906-418a-aee0-074b297bfb52' and reel_id = 40 order by version
)
delete from reels_data where id in (select id, count(*) as rows from version_ids limit 10 - rows);
There's still a lot I don't know about Postgres, so I imagine there's some better way to do this.
If I understood the question correctly, you could list the existing versions for a user in descending order, skip the first 10 (i.e. the 10 latest) and delete the rest:
DELETE FROM reels_data
WHERE id IN (
    SELECT rd.id
    FROM reels_data rd
    JOIN reels r ON r.id = rd.reel_id  -- assuming reel_id references reels.id
    WHERE r.owner_id = '7f92dcc6-f906-418a-aee0-074b297bfb52'
    ORDER BY rd.version DESC /* assuming no null versions */
    OFFSET 10
)
If a user has fewer than 10 versions, the OFFSET means nothing is returned (and nothing is deleted).
If you're looking to do this for multiple users in a single query you'll need to use a window function (presumably rank() or row_number()).
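For example, a rough multi-owner sketch might look like this (assuming reels_data.reel_id references reels.id and version lives on reels_data; partition by rd.reel_id instead if versions are scoped per reel rather than per owner):
DELETE FROM reels_data
WHERE id IN (
    SELECT id
    FROM (
        SELECT rd.id,
               row_number() OVER (PARTITION BY r.owner_id
                                  ORDER BY rd.version DESC) AS rn
        FROM reels_data rd
        JOIN reels r ON r.id = rd.reel_id
    ) ranked
    WHERE rn > 10
);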
If you want to just keep the 10 newest:
Select the ids of the 10 newest rows by ordering by version and using LIMIT
Delete everything from your table where the id is not in what you selected (a rough sketch follows below)
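A minimal sketch of that approach against the question's reels_data table (assuming version lives there and you scope it to a single reel):
DELETE FROM reels_data
WHERE reel_id = 40
  AND id NOT IN (
      SELECT id
      FROM reels_data
      WHERE reel_id = 40
      ORDER BY version DESC
      LIMIT 10
  );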
I think you want:
delete from reels_data rd
where rd.id < (select rd2.id
from reels_data rd2
where rd2.owner = rd.owner
order by rd2.id desc
limit 1 offset 9
);
The subquery gets the 10th largest id for each owner. Anything smaller is deleted. If there are not 10, then the subquery returns NULL and nothing gets deleted for that owner.
A much quicker NOT IN version than Marth's:
DELETE FROM tableName
WHERE id NOT IN (
SELECT id
FROM tableName
ORDER BY id DESC
LIMIT 10
)
My problem is this: in this database the junction table contains some rows where both the kha_id and the icd_fk are the same. While it's OK for a kha_id to appear in icd_junction more than once, each occurrence has to have a different icd_fk. I can run a query to get all of the IDs and codes that are listed more than once, but is there an industry-standard way of deleting all but one occurrence of each?
Example of what I have:
KHA_ID   ICD_FK
123456   V23
123456   V23
123456   V24
I need one of the rows with kha_id=123456 and ICD_FK=V23 taken out.
This:
DELETE j1
FROM ICD_Junction AS j1
WHERE EXISTS
( SELECT 1
FROM ICD_Junction AS j2
WHERE j2.KHA_ID = j1.KHA_ID
AND j2.ICD_FK = j1.ICD_FK
AND j2.ID < j1.ID
)
;
will delete, for each KHA_ID and ICD_FK, all but one relevant row of ICD_Junction. (Specifically, it will keep the one with the least ID, and delete the rest.)
Once you've run the above, you should fix whatever code caused the duplication, and add a unique constraint to prevent this from happening again.
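For example, something along these lines (the constraint name is just an example):
ALTER TABLE ICD_Junction
ADD CONSTRAINT UQ_ICD_Junction_KHA_ICD UNIQUE (KHA_ID, ICD_FK);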
(Disclaimer: Not tested, and it's been a while since I last used SQL Server.)
Edited to add: If I'm understanding your comment correctly, you also need help with the query to find duplicates? For that, you can write:
SELECT KHA_ID,
       ICD_FK,
       COUNT(1) -- the number of duplicates
FROM ICD_Junction
GROUP BY KHA_ID, ICD_FK
HAVING COUNT(1) > 1
;
The original question asked how to delete the duplicates, but the comment asked how to find them:
SELECT jDup.*
FROM ICD_Junction AS j
JOIN ICD_Junction AS jDup
  ON j.KHA_ID = jDup.KHA_ID
 AND j.ICD_FK = jDup.ICD_FK
 AND j.ID < jDup.ID

SELECT MAX(jDup.ID), MIN(jDup.ID), COUNT(*), jDup.KHA_ID, jDup.ICD_FK
FROM ICD_Junction AS jDup
GROUP BY jDup.KHA_ID, jDup.ICD_FK
HAVING COUNT(*) > 1
You want something that uses ROW_NUMBER() with PARTITION BY. The reason is that it lets you pick one row to keep even from a table that doesn't have a unique id. If this were a pure intersection table with no identity column, you could use a variation on this to delete all rows where RowID > 1, leaving you just the unique rows. It works just as well when you do have a unique id, where you can choose to preserve the earliest id.
SELECT *
FROM (SELECT KHA_ID, ICD_FK,
             ROW_NUMBER() OVER (PARTITION BY KHA_ID, ICD_FK
                                ORDER BY ID ASC) AS RowID
      FROM ICD_Junction) ordered
WHERE RowID > 1
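And the delete variation of the same idea, sketched with a CTE (SQL Server lets you delete through an updatable CTE; the alias dups is illustrative):
WITH dups AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY KHA_ID, ICD_FK
                              ORDER BY ID ASC) AS RowID
    FROM ICD_Junction
)
DELETE FROM dups
WHERE RowID > 1;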
I have a loanTable that contains two fields, loan_id and status
loan_id   status
================
1         0
2         9
1         6
5         3
4         5
1         4      <-- How do I select this?
4         6
In this situation I need to show the last status of loan_id 1, i.e. status 4. Can you please help me with this query?
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that by "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add a timestamp column to your table that stores the date and time when each record was added. RDBMSs don't keep any internal order of the records.
But if "last" means "last inserted", that's not possible with the current schema until you add a PK:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
Use a data reader and capture the value on each iteration; when the while loop exits, the variable holds the last row's value. As the other posters stated, unless you put a sort on the query the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
string lastVal = null;
while (rdr.Read())
{
    lastVal = rdr[0].ToString(); // overwritten each row; holds the last row's value when the loop ends
}
rdr.Close();
You could also use ROW_NUMBER(), but that requires a sort and you cannot use ROW_NUMBER() directly in the WHERE clause. You can get around that by creating a derived table. The rdr solution above is faster.
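If you did go the ROW_NUMBER() route, a sketch of the derived-table trick could look like this (insert_ts is a hypothetical column recording insert time, since the table as described has nothing to order by):
SELECT t.loan_id, t.status
FROM (
    SELECT loan_id, status,
           ROW_NUMBER() OVER (ORDER BY insert_ts DESC) AS rn
    FROM loanTable
    WHERE loan_id = 1
) t
WHERE t.rn = 1;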
In an Oracle database this is very simple:
select * from (select * from loanTable order by rownum desc) where rownum=1
Hi, if this has not been solved yet:
To get the last record for any field from a table, the easiest way would be to add an ID to each record, say pID. Say that in your table you would like to get the last record for each 'Name'; run the simple query
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY Name -- or any other field you would like your last records grouped by
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table; say this is some price or date, then run the following:
SELECT a.*, b.Price -- or b.date, or whatever other fields you want
FROM [TableName] a LEFT JOIN [YourTableName] b
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last record for each Name; for the first record, run the same queries as above but replace MAX with MIN.
This should be easy to follow and should run quickly as well.
If you don't have any identifying columns you could use to get the insert order, you can always do it like this. But it's hacky, and not very pretty.
select
    t.row1,
    t.row2,
    ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum
from (
    select
        tab.row1,
        tab.row2,
        1 as [count]
    from table tab) t
So basically you get the 'natural order', if you can call it that, and add a column where every row has the same value. That column can be used to sort by the 'natural order', giving you an opportunity to place a row number column in the outer query.
Personally, if the system you are using hasn't got a timestamp/identity column, and the current users rely on the 'natural order', I would quickly add a column and use this query to create some sort of timestamp/incremental key, rather than risk having some automation mechanism change the 'natural order' and break the data you need.
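For example (SQL Server syntax; the column names are just examples), adding such a column after the fact could look like this. Note that the identity values assigned to existing rows won't necessarily reflect the original insert order, and the default timestamp only helps for rows inserted from now on:
ALTER TABLE loanTable ADD seq_id INT IDENTITY(1,1);
-- or, to record insert time for future rows:
ALTER TABLE loanTable ADD created_at DATETIME NOT NULL DEFAULT GETDATE();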
I think this code may help you:
WITH cte_Loans
AS
(
    SELECT LoanID
          ,[Status]
          ,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
    FROM LoanTable
)
SELECT LoanID
      ,[Status]
FROM cte_Loans L1
WHERE RN = ( SELECT MAX(RN)
             FROM cte_Loans L2
             WHERE L2.LoanID = L1.LoanID)
I have a table, let's call it 'entries' that looks like this (simplified):
id [pk]
user_id [fk]
created [date]
processed [boolean, default false]
and I want to create an UPDATE query which will set the processed flag to true on all entries except for the latest 3 for each user (latest in terms of the created column). So, for the following entries:
1,456,2009-06-01,false
2,456,2009-05-01,false
3,456,2009-04-01,false
4,456,2009-03-01,false
Only entry 4 would have its processed flag changed to true.
Anyone know how I can do this?
I don't know postgres, but this is standard SQL and may work for you.
update entries set
processed = true
where (
select count(*)
from entries as E
where E.user_id = entries.user_id
and E.created > entries.created
) >= 3
In other words, update the processed column to true whenever there are three or more entries for the same user_id on later dates. I'm assuming the [created] column is unique for a given user_id. If not, you'll need an additional criterion to pin down what you mean as "latest".
In SQL Server you can do this, which is a little easier to follow and will probably be more efficiently executed:
with T(id, user_id, created, processed, rk) as (
select
id, user_id, created, processed,
row_number() over (
partition by user_id
order by created desc, id
)
from entries
)
update T set
    processed = 1  -- SQL Server has no boolean literal; use 1 for a bit column
where rk > 3;
Updating a CTE is a non-standard feature, and not all database systems support row_number.
First, let's start with a query that will list all the rows to be updated:
select e.id
from entries as e
where (
select count(*)
from entries as e2
where e2.user_id = e.user_id
and e2.created > e.created
) > 2
This lists the ids of all records that have more than 2 other records with the same user_id but a later created value.
That is, it will list all records except the last 3 per user.
Now, we can:
update entries as e
set processed = true
where (
select count(*)
from entries as e2
where e2.user_id = e.user_id
and e2.created > e.created
) > 2;
One thing though - it can be slow. In this case you might be better off with a custom aggregate, or (if you're on 8.4) window functions.
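A sketch of the window-function variant (PostgreSQL 8.4+), using the question's column names; the id tie-breaker is an assumption for rows that share a created date:
UPDATE entries
SET processed = true
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY user_id
                                  ORDER BY created DESC, id DESC) AS rn
        FROM entries
    ) ranked
    WHERE rn > 3
);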
I want to display an image gallery, and on the view page, one should be able to have a look at a bunch of thumbnails: the current picture, wrapped with the two previous entries and the two next ones.
The problem with fetching the two next/prev entries is that I can't (unless I'm mistaken) select something like MAX(id) WHERE id < xx.
Any idea?
note: of course the ids are not consecutive, as they are the result of multiple WHERE filters.
Thanks
Marshall
You'll have to forgive the SQL Server style variable names, I don't remember how MySQL does variable naming.
(SELECT *
 FROM photos
 WHERE photo_id = #current_photo_id)
UNION ALL
(SELECT *
 FROM photos
 WHERE photo_id > #current_photo_id
 ORDER BY photo_id ASC
 LIMIT 2)
UNION ALL
(SELECT *
 FROM photos
 WHERE photo_id < #current_photo_id
 ORDER BY photo_id DESC
 LIMIT 2);
This query assumes that you might have non-contiguous IDs. It could become problematic in the long run, though, if you have a lot of photos in your table, since the LIMIT is often applied only after the rest of the result set has been produced. YMMV.
In a high load scenario, I would probably use these queries, but I would also prematerialize them on a regular basis so that each photo had a PreviousPhotoOne, PreviousPhotoTwo, etc column. It's a bit more maintenance, but it works well when you have a lot of static data and need performance.
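A rough sketch of that refresh pass in standard SQL, extending the answer's PreviousPhotoOne example with a hypothetical NextPhotoOne column (PreviousPhotoTwo and friends would follow the same pattern; MySQL would need its usual derived-table workaround, since it won't let you update a table you also read from in a subquery):
UPDATE photos p
SET PreviousPhotoOne = (SELECT MAX(photo_id) FROM photos WHERE photo_id < p.photo_id),
    NextPhotoOne     = (SELECT MIN(photo_id) FROM photos WHERE photo_id > p.photo_id);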
If your IDs are contiguous you could do
where id >= #id-2 and id <= #id+2
Otherwise I think you'd have to union 3 queries: one to get the record with the given id and two others messing about with TOP and ORDER BY, like this:
select *
from table
where id = #id
union
select top 2 *
from table
where id < #id
order by id desc
union
select top 2 *
from table
where id > #id
order by id
Performance will not be too bad as you aren't retrieving massive sets of data but it won't be great due to using a union.
If you find performance starts being a problem you could add columns to hold the ids of the previous and next items; calculating the ids using a trigger or overnight process or something. This will mean you only do the hard query once rather than each time you need it.
I think this method should work fine for non-contiguous IDs and should be more efficient than using UNIONs. currentID would be set either using a constant in SQL or passed from your program.
SELECT * FROM photos WHERE ID = currentID OR ID IN (
    -- MySQL won't accept LIMIT directly inside an IN subquery, hence the extra derived tables
    SELECT ID FROM (SELECT ID FROM photos WHERE ID < currentID ORDER BY ID DESC LIMIT 2) prev2
) OR ID IN (
    SELECT ID FROM (SELECT ID FROM photos WHERE ID > currentID ORDER BY ID ASC LIMIT 2) next2
) ORDER BY ID ASC
If you are just interested in the previous and next records by id, couldn't you just have a WHERE clause that restricts to id = xx-2, xx-1, xx, xx+1, xx+2, using multiple WHERE conditions or a WHERE ... IN?
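For example (with 42 standing in for the current id; note this only returns five rows when the ids really are contiguous, which the question says they are not):
SELECT * FROM photos WHERE id IN (40, 41, 42, 43, 44) ORDER BY id;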
Assume a table with the following columns:
pri_id, item_id, comment, date
What I want is a SQL query that will delete any records for a specific item_id that are older than a given date, BUT only as long as there are more than 15 rows for that item_id.
This will be used to purge out comment records older than 1 year for the items but I still want to keep at least 15 records at any given time. This way if I had one comment for 10 years it would never get deleted but if I had 100 comments over the last 5 days I'd only keep the newest 15 records. These are of course arbitrary record counts and date timeframes for this example.
I'd like to find a very generic way of doing this that would work in MySQL, Oracle, Postgres, etc. I'm using PHP's ADOdb library for DB abstraction, so I'd like it to work well with that if possible.
Something like this should work for you:
delete
from
MyTable
where
item_id in
(
select
item_id
from
MyTable
group by
item_id
having
count(item_id) > 15
)
and
Date < #tDate
You want to keep at least 15 of them always, correct? So:
DELETE
FROM CommentTable
WHERE CommentId NOT IN (
SELECT TOP 15 CommentId
FROM CommentTable
WHERE ItemId=#ItemId
AND CommentDate < #Date
ORDER BY CommentDate DESC
)
AND ItemId=#ItemId
AND CommentDate < #Date
Is this what you're looking for?
DELETE
[MyTable]
WHERE
[item_id] = 100 and
(SELECT COUNT(*) FROM [MyTable] WHERE [item_id] = 100) > 15
I'm an MS SQL Server guy, but I think it should work elsewhere.