How to do an update based on a count - SQL (postgres)

I have a table, let's call it 'entries' that looks like this (simplified):
id [pk]
user_id [fk]
created [date]
processed [boolean, default false]
and I want to create an UPDATE query which will set the processed flag to true on all entries except for the latest 3 for each user (latest in terms of the created column). So, for the following entries:
1,456,2009-06-01,false
2,456,2009-05-01,false
3,456,2009-04-01,false
4,456,2009-03-01,false
Only entry 4 would have its processed flag changed to true.
Does anyone know how I can do this?

I don't know postgres, but this is standard SQL and may work for you.
update entries set
processed = true
where (
select count(*)
from entries as E
where E.user_id = entries.user_id
and E.created > entries.created
) >= 3
In other words, update the processed column to true whenever there are three or more entries for the same user_id with later dates. I'm assuming the [created] column is unique for a given user_id. If not, you'll need an additional criterion to pin down what you mean by "latest".
In SQL Server you can do this, which is a little easier to follow and will probably be more efficiently executed:
with T(id, user_id, created, processed, rk) as (
select
id, user_id, created, processed,
row_number() over (
partition by user_id
order by created desc, id
)
from entries
)
update T set
processed = 1 -- SQL Server has no boolean literal; bit columns take 1/0
where rk > 3;
Updating a CTE is a non-standard feature, and not all database systems support row_number.

First, let's start with a query that will list all rows to be updated:
select e.id
from entries as e
where (
select count(*)
from entries as e2
where e2.user_id = e.user_id
and e2.created > e.created
) > 2
This lists the ids of all records for which more than 2 other records exist with the same user_id but a later created value.
That is, it will list all records except the last 3 per user.
Now, we can:
update entries as e
set processed = true
where (
select count(*)
from entries as e2
where e2.user_id = e.user_id
and e2.created > e.created
) > 2;
One thing, though: it can be slow. In this case you might be better off with a custom aggregate or (if you're on 8.4) window functions.
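As a rough sketch of the window-function route mentioned above, the ranking can be done once in a subquery and the UPDATE can pick up every row ranked past the third. The example below runs the SQL through Python's sqlite3 module purely so it is self-contained; this is an assumption for illustration (it needs SQLite 3.25 or later for window functions), and the statement may need small adjustments for Postgres. The function name mark_all_but_latest is made up for this example.

```python
import sqlite3

def mark_all_but_latest(conn, keep=3):
    """Set processed = 1 on every entry except the `keep` newest per user."""
    conn.execute(
        """
        UPDATE entries
        SET processed = 1
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY user_id
                           ORDER BY created DESC, id DESC
                       ) AS rk
                FROM entries
            ) AS ranked
            WHERE rk > ?
        )
        """,
        (keep,),
    )

# In-memory stand-in for the 'entries' table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT, "
    "processed INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO entries (id, user_id, created) VALUES (?, ?, ?)",
    [
        (1, 456, "2009-06-01"),
        (2, 456, "2009-05-01"),
        (3, 456, "2009-04-01"),
        (4, 456, "2009-03-01"),
        (5, 789, "2009-06-01"),  # extra user with a single entry
    ],
)

mark_all_but_latest(conn)
processed_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM entries WHERE processed = 1 ORDER BY id")
]
print(processed_ids)  # [4]
```

The ranking runs once over the table instead of once per row, which is where the speed-up over the correlated count would be expected to come from.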

Related

How to delete the X oldest items in a table in Postgres?

I'm trying to implement a rolling version system, where users can keep several versions of something, but once it goes past 10 versions, the oldest item(s) are deleted.
This is what I want to do, but the syntax is invalid (can't use the row count in LIMIT):
with version_ids as (
select rd.id from reels_data rd, reels r where r.owner_id = '7f92dcc6-f906-418a-aee0-074b297bfb52' and reel_id = 40 order by version
)
delete from reels_data where id in (select id, count(*) as rows from version_ids limit 10 - rows);
There's still a lot I don't know about Postgres, so I imagine there's some better way to do this.

If I understood the question correctly, you could list the existing versions for a user in descending order, skip the first 10 (i.e. the 10 latest), and delete the rest:
DELETE FROM reels_data
WHERE id IN (
SELECT id
FROM reels
WHERE reels.owner_id = '7f92dcc6-f906-418a-aee0-074b297bfb52'
ORDER BY version DESC /* assuming no null versions */
OFFSET 10
)
If a user has fewer than 10 versions, the OFFSET skips every row, so nothing is returned (or deleted).
If you're looking to do this for multiple users in a single query you'll need to use a window function (presumably rank() or row_number()).
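A minimal sketch of that multi-user variant, assuming a simplified, hypothetical table versions(id, owner_id, version) rather than the real reels/reels_data schema: row_number() numbers each owner's rows from newest to oldest, and everything numbered past the cutoff is deleted. It is run here through Python's sqlite3 module (SQLite 3.25+) only to keep the example self-contained.

```python
import sqlite3

def prune_versions(conn, keep=10):
    """Delete every row beyond the `keep` highest versions of each owner."""
    cur = conn.execute(
        """
        DELETE FROM versions
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY owner_id
                           ORDER BY version DESC
                       ) AS rn
                FROM versions
            ) AS ranked
            WHERE rn > ?
        )
        """,
        (keep,),
    )
    return cur.rowcount  # number of rows deleted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id INTEGER PRIMARY KEY, owner_id TEXT, version INTEGER)"
)
# 'alice' has 12 versions, 'bob' only 3.
rows = [("alice", v) for v in range(1, 13)] + [("bob", v) for v in range(1, 4)]
conn.executemany("INSERT INTO versions (owner_id, version) VALUES (?, ?)", rows)

deleted = prune_versions(conn, keep=10)
remaining = conn.execute(
    "SELECT owner_id, COUNT(*), MIN(version) FROM versions "
    "GROUP BY owner_id ORDER BY owner_id"
).fetchall()
print(deleted, remaining)  # 2 [('alice', 10, 3), ('bob', 3, 1)]
```

Owners with fewer than `keep` versions never get a row number above the cutoff, so they are left untouched.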

If you just want to keep the 10 newest:
Select the ids of the 10 newest rows by ordering by version and using LIMIT.
Delete every row from your table whose id is not in that selection.

I think you want:
delete from reels_data rd
where rd.id < (select rd2.id
from reels_data rd2
where rd2.owner = rd.owner
order by rd2.id desc
limit 1 offset 9
);
The subquery gets the 10th largest id for each owner. Anything smaller is deleted. If there are not 10, then the subquery returns NULL and nothing gets deleted for that owner.

Much quicker NOT IN than Marth's:
DELETE FROM tableName
WHERE id NOT IN (
SELECT id
FROM tableName
ORDER BY id DESC
LIMIT 10
)

if-then-else construction in complex stored procedure

I am relatively new to sql queries and I was wondering how to create a complex stored procedure. My database runs on SQL server.
I have a table customer (id, name) and a table customer_events (id, customer_id, timestamp, action_type). I want to add a calculated field customer_status to table customer, which is:
0: (if there is no event for this customer in customer_events) or (the most recent event is > 5 minutes ago)
1: if the most recent event is < 5 minutes ago and action_type=0
2: if the most recent event is < 5 minutes ago and action_type=1
Can I use if-then-else constructions or should I solve this challenge differently?

As you mentioned in comments, you actually want to add a field to a select query, and in a general sense what you want is a CASE statement. They work like this:
SELECT field1,
field2,
CASE
WHEN some_condition THEN some_result
WHEN another_condition THEN another_result
END AS field_alias
FROM table
Applied to your specific scenario, it's not totally straightforward. You'll need to left join your events table, and you also need to find the most recent event per customer along with that event's action type. Once you have that information, the CASE expression is straightforward.
It's always hard to write SQL without access to your data, but something like:
SELECT c.id,
c.name,
CASE
WHEN e.id IS NULL OR DATEDIFF(minute,e.timestamp,getDate())>=5 THEN 0
WHEN DATEDIFF(minute,e.timestamp,getDate())<5 AND e.action_type=0 THEN 1
WHEN DATEDIFF(minute,e.timestamp,getDate())<5 AND e.action_type=1 THEN 2
END as customer_status
FROM customer c
LEFT JOIN (
SELECT id, customer_id, timestamp, action_type,
rank() OVER(partition by customer_id order by timestamp desc) AS r
FROM customer_events
) e
ON c.id=e.customer_id AND e.r=1
The core of this is the subquery in the middle: it uses a rank function to number each event per customer_id, ordered by timestamp descending. Every record with a rank of 1 is therefore the most recent one for that customer. You then join it to the customer table and use it to determine the right value for customer_status.
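To make the rank-then-CASE idea concrete, here is a small runnable sketch. It is not SQL Server code: it uses Python's sqlite3 module (SQLite 3.25+), stores event times in a hypothetical integer column ts holding seconds, and takes the current time as a parameter so the result is repeatable. The 300-second cutoff stands in for the 5-minute rule, and ROW_NUMBER() is used instead of rank() so that ties on ts still yield a single row per customer.

```python
import sqlite3

STATUS_SQL = """
SELECT c.id,
       c.name,
       CASE
           WHEN e.id IS NULL OR :now - e.ts >= 300 THEN 0  -- no event, or too old
           WHEN e.action_type = 0 THEN 1
           WHEN e.action_type = 1 THEN 2
       END AS customer_status
FROM customer AS c
LEFT JOIN (
    SELECT id, customer_id, ts, action_type,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY ts DESC, id DESC
           ) AS r
    FROM customer_events
) AS e
  ON e.customer_id = c.id AND e.r = 1   -- keep only the latest event
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_events (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        ts INTEGER,          -- assumed: event time in seconds
        action_type INTEGER
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');
    INSERT INTO customer_events (customer_id, ts, action_type) VALUES
        (2, 900, 0),   -- Bob: recent, action_type 0
        (3, 100, 0),   -- Cy: old event, superseded by the next one
        (3, 950, 1),   -- Cy: recent, action_type 1
        (4, 600, 1);   -- Di: latest event is 400 seconds old
    """
)

statuses = conn.execute(STATUS_SQL, {"now": 1000}).fetchall()
print(statuses)  # [(1, 'Ann', 0), (2, 'Bob', 1), (3, 'Cy', 2), (4, 'Di', 0)]
```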

Presuming you get the age of the most recent event into "Most_Recent_Event_Mins_Ago" (it will be NULL if there is none):
SELECT Id, Name,
CASE
WHEN Most_Recent_Event_Mins_Ago IS NULL THEN 0
WHEN Most_Recent_Event_Mins_Ago <5 AND Action_type = 0 THEN 1
WHEN Most_Recent_Event_Mins_Ago <5 AND Action_type = 1 THEN 2
-- ...other scenarios
ELSE yourDefaultValueForStatus
END as Status
FROM customer
WHERE
...
...

Delete multiple occurrences of the same ID # and code in a junction table

My problem is this: in this database the junction table contains some rows where the kha_id and the icd_fk are the same. While it's OK for a kha_id to appear in icd_junction more than once, each occurrence has to have a different icd_fk. I can run a query and get all of the ID#s and the codes which are listed more than once, but is there an industry-standard way of going about deleting all but one occurrence of each?
Example of what I have:
KHA_ID: 123456 V23
123456 V23
123456 V24
I need one of the rows kha_id=123456 and ICD_FK=V23 taken out.

This:
DELETE j1
FROM ICD_Junction AS j1
WHERE EXISTS
( SELECT 1
FROM ICD_Junction AS j2
WHERE j2.KHA_ID = j1.KHA_ID
AND j2.ICD_FK = j1.ICD_FK
AND j2.ID < j1.ID
)
;
will delete, for each KHA_ID and ICD_FK, all but one relevant row of ICD_Junction. (Specifically, it will keep the one with the least ID, and delete the rest.)
Once you've run the above, you should fix whatever code caused the duplication, and add a unique constraint to prevent this from happening again.
(Disclaimer: Not tested, and it's been a while since I last used SQL Server.)
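The clean-up-then-constrain sequence can be sketched end to end as follows. This is an illustration under assumptions, not the SQL Server statement itself: it uses Python's sqlite3 module, lower-case column names, and a made-up index name ux_icd_junction_pair. SQLite has no DELETE alias FROM form, so the correlated subquery refers to the outer table by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icd_junction (id INTEGER PRIMARY KEY, kha_id INTEGER, icd_fk TEXT)"
)
conn.executemany(
    "INSERT INTO icd_junction (kha_id, icd_fk) VALUES (?, ?)",
    [(123456, "V23"), (123456, "V23"), (123456, "V24")],
)

# Step 1: delete every row for which an identical pair with a lower id exists,
# so the lowest id of each (kha_id, icd_fk) pair survives.
conn.execute(
    """
    DELETE FROM icd_junction
    WHERE EXISTS (
        SELECT 1
        FROM icd_junction AS j2
        WHERE j2.kha_id = icd_junction.kha_id
          AND j2.icd_fk = icd_junction.icd_fk
          AND j2.id < icd_junction.id
    )
    """
)

# Step 2: a unique index stops the duplicates from coming back.
conn.execute(
    "CREATE UNIQUE INDEX ux_icd_junction_pair ON icd_junction (kha_id, icd_fk)"
)

rows = conn.execute(
    "SELECT id, kha_id, icd_fk FROM icd_junction ORDER BY id"
).fetchall()

try:
    conn.execute("INSERT INTO icd_junction (kha_id, icd_fk) VALUES (123456, 'V23')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```

The index has to be created after the clean-up; creating it while duplicates still exist would fail.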
Edited to add: If I'm understanding your comment correctly, you also need help with the query to find duplicates? For that, you can write:
SELECT KHA_ID,
ICD_FK,
COUNT(1) -- the number of duplicates
FROM ICD_Junction
GROUP
BY KHA_ID,
ICD_FK
HAVING COUNT(1) > 1
;

The original question asked how to delete, but the comment asked how to find:
Select jDup.*
FROM ICD_Junction AS j
JOIN ICD_Junction AS jDup
On j.KHA_ID = jDup.KHA_ID
AND j.ICD_FK = jDup.ICD_FK
AND j.ID < jDup.ID
Select max(jDup.ID), min(jDup.ID), count(*), jDup.KHA_ID, jDup.ICD_FK
FROM ICD_Junction AS jDup
Group By jDup.KHA_ID, jDup.ICD_FK
Having Count(*) > 1

You want something that uses ROW_NUMBER() with PARTITION BY. It lets you pick one row to keep even from a table that doesn't have a unique id: if this were a pure intersection table with no identity column, you could use a variation on this to delete all rows where RowID > 1, leaving just the unique rows. It works just as well when you do have a unique id, where you can choose to preserve the earliest id.
select * from (select KHA_ID, ICD_FK, ROW_NUMBER()
OVER(PARTITION BY KHA_ID, ICD_FK
ORDER BY ID ASC) AS RowID
from ICD_Junction ) ordered where RowID > 1

How do I check if all posts from a joined table has the same value in a column?

I'm building a BI report for a client where there is a 1-n related join involved.
The joined table has a field for employee ID (EmplId).
The query that I've built for this report is supposed to give a 1 in its field "OneEmployee" if all the related posts have the same employee in the EmplId field, null if it's different employees, i.e:
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'John'
This should give a 1 in the said field in the query
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'George'
This should leave the said field blank
The idea is to create a field where a CASE expression checks this and returns the correct value. My problem is whether there is a way to check for this in SQL.

select not count(*) from your_table
where employee_id = GIVEN_ID
and your_field not in ( select min(your_field)
from your_table
where employee_id = GIVEN_ID);
Note: my first idea was to use LIMIT 1 in the inner query, but MySQL didn't like it, so MIN it was. The point is to pick any one value, but only one. MIN should work, but the field should be indexed; then this query will execute quite fast, as only indexes are used (obviously employee_id should also be indexed).
Note 2: don't get confused by the NOT in front of count(*). You want 1 when no row differs; I count the differing rows and return NOT count(*), which is 1 if the count is 0 and 0 otherwise.

Seems a job for a window COUNT():
SELECT
…,
CASE COUNT(DISTINCT TaskTransHours.EmplId) OVER () WHEN 1 THEN 1 END
AS OneEmployee
FROM …
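Not every engine accepts DISTINCT inside a window aggregate (SQLite, used below, does not), so here is a sketch of the same check written with a plain GROUP BY instead. This is a swapped-in technique, not the window query above. The table and column names (task_trans, task_trans_hours, task_id, empl_id) are assumptions made up for the example, since the real schema isn't shown.

```python
import sqlite3

ONE_EMPLOYEE_SQL = """
SELECT t.task_id,
       -- 1 when exactly one distinct employee booked hours, otherwise NULL
       CASE WHEN COUNT(DISTINCT h.empl_id) = 1 THEN 1 END AS one_employee
FROM task_trans AS t
LEFT JOIN task_trans_hours AS h ON h.task_id = t.task_id
GROUP BY t.task_id
ORDER BY t.task_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_trans (task_id INTEGER PRIMARY KEY);
    CREATE TABLE task_trans_hours (task_id INTEGER, empl_id TEXT);
    INSERT INTO task_trans VALUES (1), (2);
    INSERT INTO task_trans_hours VALUES
        (1, 'John'), (1, 'John'),     -- same employee twice
        (2, 'John'), (2, 'George');   -- two different employees
    """
)

result = conn.execute(ONE_EMPLOYEE_SQL).fetchall()
print(result)  # [(1, 1), (2, None)]
```

The grouped result can then be joined back to the report query on the task key to supply the OneEmployee field.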

Getting the last record in SQL in WHERE condition

I have a loanTable that contains two fields, loan_id and status:
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this situation I need to show the last status of loan_id 1, i.e. status 4. Can you please help me with this query?

Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
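A minimal sketch of the auto-incremented column approach, assuming a status_id column is added as described above. It uses Python's sqlite3 module and a hypothetical table name loan_table so that it runs standalone; the helper last_status is likewise made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_table ("
    "status_id INTEGER PRIMARY KEY AUTOINCREMENT, "  # records insertion order
    "loan_id INTEGER, status INTEGER)"
)
# Same rows, in the same order, as in the question.
conn.executemany(
    "INSERT INTO loan_table (loan_id, status) VALUES (?, ?)",
    [(1, 0), (2, 9), (1, 6), (5, 3), (4, 5), (1, 4), (4, 6)],
)

def last_status(conn, loan_id):
    """Return the most recently inserted status for a loan, or None."""
    row = conn.execute(
        """
        SELECT status
        FROM loan_table
        WHERE status_id = (SELECT MAX(status_id)
                           FROM loan_table
                           WHERE loan_id = ?)
        """,
        (loan_id,),
    ).fetchone()
    return row[0] if row else None

print(last_status(conn, 1))  # 4
```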

Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC

I assume that by "last status" you mean the record that was inserted most recently? AFAIK there is no way to write such a query unless you add a timestamp column to your table that stores the date and time when the record was added. An RDBMS doesn't keep any internal order of the records.

But if last = last inserted, that's not possible with the current schema until a PK is added:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK

Use a data reader and remember the value from each row as you loop; when the while loop exits, the variable holds the value from the last row. As the other posters stated, unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
string lastVal = null;
while (rdr.Read())
{
// the reader is only positioned on a row while Read() returns true
lastVal = rdr[0].ToString();
}
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.

In Oracle this is very simple:
select * from (select * from loanTable order by rownum desc) where rownum=1

Hi, in case this has not been solved yet.
To get the last record for any field from a table, the easiest way is to add an ID to each record, say pID. Say that in your table you would like to get the last record for each 'Name'; run this simple query:
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName] b
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last record for each Name. For the first record, run the same queries but replace MAX with MIN.
This should be easy to follow and should run quickly as well.
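The two steps can also be folded into one statement by using the grouped query as a derived table instead of writing it to an intermediate table. The sketch below assumes a hypothetical table prices(pid, name, price) and runs through Python's sqlite3 module for the sake of a self-contained example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prices (pid INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO prices (name, price) VALUES
        ('apple', 1.0), ('pear', 2.0), ('apple', 1.5),
        ('pear', 2.5), ('apple', 1.25);
    """
)

LAST_PER_NAME_SQL = """
SELECT p.name, p.pid, p.price
FROM (
    -- step 1: highest pid per name
    SELECT name, MAX(pid) AS last_id
    FROM prices
    GROUP BY name
) AS latest
-- step 2: join back to fetch the rest of that row
JOIN prices AS p
  ON p.name = latest.name AND p.pid = latest.last_id
ORDER BY p.name
"""

last_rows = conn.execute(LAST_PER_NAME_SQL).fetchall()
print(last_rows)  # [('apple', 5, 1.25), ('pear', 4, 2.5)]
```

Swapping MAX for MIN in the derived table returns the first record per name instead.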

If you don't have any identifying columns you could use to get the insert order, you can always do it like this, but it's hacky and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are relying on the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key, rather than risk having some automated mechanism change the 'natural order' and break the data.

I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT L1.LoanID
,L1.[Status]
FROM cte_Loans L1 -- read from the CTE, which is where RN is defined
WHERE L1.RN = ( SELECT max(L2.RN)
FROM cte_Loans L2
WHERE L2.LoanID = L1.LoanID)
