Postgres: remove gaps in row numbers

I have a table called assignment with index and unit_id columns. Every assignment in a unit has a unique index. This way, assignments in a unit can be reordered by swapping their indexes.
I'm trying to figure out the best way to remove potential gaps in the indexes (if a row is deleted, the indexes will go 0, 1, 3 for example). My current solution is to loop through each assignment in a unit programmatically and run an UPDATE query if its index doesn't match the loop index. Like this:
let i = 0;
for (const assignmentId of assignmentIds) {
  await Assignment.query()
    .patch({ index: i })
    .where('id', assignmentId);
  i++;
}
I'm trying to figure out how to do this with a single query using the ROW_NUMBER function like this:
UPDATE assignment SET
    index = subquery.new_index - 1
FROM (
    SELECT ROW_NUMBER() OVER () as new_index
    FROM assignment
    WHERE assignment.unit_id = 35
    ORDER BY assignment.index
) as subquery
WHERE unit_id = 35;
But when I run this, it just sets all the indexes in unit 35 to 1. Why is this happening?

You need to provide a new index for each row by identifying the rows by old index:
update assignment
set index = subquery.new_index - 1
from (
    select index as old_index,
           row_number() over (order by index) as new_index
    from assignment
    where assignment.unit_id = 35
) as subquery
where unit_id = 35
  and old_index = index
  and new_index <> old_index + 1; -- eliminate unnecessary updates
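For illustration, a sketch with hypothetical data (the schema is assumed from the question):
create table assignment (id serial primary key, unit_id int, index int);
insert into assignment (unit_id, index) values (35, 0), (35, 1), (35, 3);
-- The subquery pairs (old_index, new_index) = (0, 1), (1, 2), (3, 3).
-- Only the row at old_index 3 passes the new_index <> old_index + 1 filter,
-- so only it is rewritten, to 3 - 1 = 2, leaving the indexes 0, 1, 2.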

You need to join the subquery on the PK column (you need to include that column in the sub-query to be able to do that). The ORDER BY needs to go into the window function, not the overall query:
UPDATE assignment
SET index = subquery.new_index - 1
FROM (
    SELECT pk_column,
           ROW_NUMBER() OVER (partition by unit_id order by index) as new_index
    FROM assignment
) as subquery
WHERE subquery.pk_column = assignment.pk_column
If you only want to do that for a single unit, you can add AND unit_id = 35 to the WHERE clause of the UPDATE statement.
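For instance, a sketch of the single-unit variant, assuming the primary key column is named id (the question does not name it):
UPDATE assignment
SET index = subquery.new_index - 1
FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY unit_id ORDER BY index) AS new_index
    FROM assignment
) AS subquery
WHERE subquery.id = assignment.id
  AND assignment.unit_id = 35;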

If you don't care about preserving the original indexes, you can run the update with a temp sequence:
CREATE TEMP SEQUENCE new_index;
SELECT setval('new_index', 1, false); -- false: the first nextval() call returns 1
And run the update based on your unit:
UPDATE assignment SET index = nextval('new_index') WHERE unit_id = 35;
This updates the index column of every row with unit_id = 35.
What's the difference from your FROM clause? nextval() is evaluated for every row, returning the next number in the sequence.
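If the numbering should start at 0 as in the question, a sequence created with MINVALUE 0 avoids the SETVAL step entirely. A minimal sketch (note that UPDATE makes no guarantee about the order in which rows are visited, so the assigned order is arbitrary):
CREATE TEMP SEQUENCE new_index MINVALUE 0 START WITH 0;
UPDATE assignment SET index = nextval('new_index') WHERE unit_id = 35;
DROP SEQUENCE new_index;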

Since you specifically indicated unit_id is not unique ("sets all the indexes in unit 35"), you can use row_number() to project the expected index for each item (as others have indicated). But I also dislike magic numbers, so the following does not depend on them.
with expectations as (
    select *
    from (
        select unit_id,
               unit_index,
               row_number() over (partition by unit_id order by unit_id, unit_index) as exp_index
        from units
    ) au
    where unit_index != exp_index
)
update units u
set unit_index = exp_index
from expectations e
where (u.unit_id, u.unit_index) = (e.unit_id, e.unit_index);
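The fiddle is not reproduced here, but a self-contained demo might look like this (table and data are illustrative):
create table units (unit_id int, unit_index int);
insert into units values (35, 1), (35, 2), (35, 4), (36, 2), (36, 3);
-- After running the update above, unit 35 holds 1, 2, 3 and unit 36 holds 1, 2.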

Related

Postgresql update query taking too long to complete every time

I have a PostgreSQL DB table user_book_details with 451007 records. The user_book_details table is populated on a daily basis with around 1K new records.
I have the following query that is taking a long time (13 hours) to complete every time.
update user_book_details as A1
set min_date = (select min(A2.acc_date) as min_date
                from user_book_details A2
                where A2.user_id = A1.user_id
                  and A2.book_id = A1.book_id)
where A1.min_date is null;
How can I rewrite the query to improve its performance?
FYI, there is no index on the user_id and book_id columns.
Your query is okay:
update user_book_details ubd
set min_date = (select min(ubd2.acc_date)
                from user_book_details ubd2
                where ubd2.user_id = ubd.user_id and
                      ubd2.book_id = ubd.book_id
               )
where ubd.min_date is null;
For performance you want an index on user_book_details(user_id, book_id). I also think it would be faster written like this:
update user_book_details ubd
set min_date = min_acc_date
from (select ubd2.user_id, ubd2.book_id, min(ubd2.acc_date) as min_acc_date
      from user_book_details ubd2
      group by ubd2.user_id, ubd2.book_id
     ) ubd2
where ubd2.user_id = ubd.user_id and
      ubd2.book_id = ubd.book_id and
      ubd.min_date is null;
The first method uses the index to look up the values for each row (something that can be a little tricky when the update reads from the same table it is updating). The second method aggregates the data once and then joins in the values.
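For reference, the suggested index might be created like this (the index name is arbitrary; appending acc_date lets the aggregate be computed from the index alone):
create index idx_user_book_details_user_book on user_book_details (user_id, book_id, acc_date);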
I should note that this value is easily calculated on the fly:
select ubd.*,
min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;
This might be preferable to trying to keep it up-to-date in the table.
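For example, the on-the-fly version could be wrapped in a view (a sketch; the view name is an assumption, not from the original answer):
create view user_book_details_with_min as
select ubd.*,
       min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;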

UPDATE … FROM syntax with sub-query

I have a table, items, with a priority column, which is just an integer. In trying to do some bulk operations, I'm trying to reset the priority to be a sequential number.
I've been able to use ROW_NUMBER() to successfully generate a table that has the new priority values I want. Now, I just need to get the values from that SELECT query into the matching records in the actual items table.
I've tried something like this:
UPDATE "items"
SET "priority" = tempTable.newPriority
FROM (
    SELECT ROW_NUMBER() OVER (
               ORDER BY /* pile of sort conditions here */
           ) AS "newPriority"
    FROM "items"
) AS tempTable
WHERE "items"."id" = "tempTable"."id";
I keep getting a syntax error "near FROM".
How can I correct the syntax here?
SQLite is not as flexible as other RDBMSs; it does not even support joins in an UPDATE statement.
What you can do instead is something like this:
update items
set priority = 1 + (
    select count(*)
    from items i
    where i.id < items.id
)
With this, the ordering is derived only from the ids, so the priority column is filled with sequential numbers 1, 2, 3, ....
If you can apply that pile of sort conditions in this manner, you will make the update work.
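For example, if the pile of sort conditions were an assumed name column, the count can compare row values, with id as a tie-breaker so the ranking is strict (SQLite supports row-value comparisons since 3.15):
update items
set priority = 1 + (
    select count(*)
    from items i
    where (i.name, i.id) < (items.name, items.id)
)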
Edit.
Something like this may do what you need, although I'm not sure about its efficiency:
UPDATE items
SET priority = (
SELECT newPriority FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY /* pile of sort conditions here */) AS newPriority
FROM items
) AS tempTable
WHERE tempTable.id = items.id
)
It turns out that the root answer to this specific question is that SQLite doesn't support UPDATE ... FROM. Therefore, some alternative methods are needed.
https://www.sqlite.org/lang_update.html
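Note that SQLite 3.33.0 and later do support UPDATE ... FROM, so on a newer version the original approach works once the subquery also selects id:
UPDATE items
SET priority = tempTable.newPriority
FROM (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY /* pile of sort conditions here */) AS newPriority
    FROM items
) AS tempTable
WHERE items.id = tempTable.id;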

Update columns in DB2 using randomly chosen static values provided at runtime

I would like to update rows with values chosen randomly from a set of possible values.
Ideally I would be able to provide these values at runtime, using JdbcTemplate from a Java application.
Example:
In a table, column "name" can contain any name. The goal is to run through the table and change all names to equal to either "Bob" or "Alice".
I know that this can be done by creating a SQL function. I tested it and it was fine, but I wonder if it is possible to just use a simple query?
This does not work; it seems the value is computed once and applied to all rows:
UPDATE test.table
SET first_name =
    (SELECT a.name
     FROM (SELECT a.name, RAND() idx
           FROM (VALUES ('Alice'), ('Bob')) AS a(name)
           ORDER BY idx
           FETCH FIRST 1 ROW ONLY) AS a);
I tried using MERGE INTO, but it won't even run (possible_names is not found in the SET query). I have yet to figure out why:
MERGE INTO test.table
USING (SELECT names.fname
       FROM (VALUES ('Alice'), ('Bob'), ('Rob')) AS names(fname)) AS possible_names
ON (test.table.first_name IS NOT NULL)
WHEN MATCHED THEN UPDATE SET
    -- select random name
    first_name = (SELECT fname FROM possible_names ORDER BY idx FETCH FIRST 1 ROW ONLY);
EDIT: If possible, I would like to only focus on fields being updated and not depend on knowing primary keys and such.
Db2 seems to be optimizing away the subselect that returns your supposedly random name, materializing it only once, hence all rows in the target table receive the same value.
To force subselect execution for each row you need to somehow correlate it to the table being updated, for example:
UPDATE test.table
SET first_name =
    (SELECT a.name
     FROM (VALUES ('Alice'), ('Bob')) AS a(name)
     ORDER BY RAND(ASCII(SUBSTR(first_name, 1, 1)))
     FETCH FIRST 1 ROW ONLY)
or maybe even:
UPDATE test.table
SET first_name =
    (SELECT a.name
     FROM (VALUES ('Alice'), ('Bob')) AS a(name)
     ORDER BY first_name, RAND()
     FETCH FIRST 1 ROW ONLY)
Now that the result of subselect seems to depend on the value of the corresponding row in the target table, there's no choice but to execute it for each row.
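A quick way to confirm per-row evaluation is to check the resulting distribution; the candidate names should come out roughly evenly split:
SELECT first_name, COUNT(*) AS cnt
FROM test.table
GROUP BY first_name;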
If your table has a primary key, this would work. I've assumed the PK is column id.
UPDATE test.table t
SET first_name =
    (SELECT name
     FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY R) AS RN
           FROM (SELECT *, RAND() R
                 FROM test.table, TABLE(VALUES ('Alice'), ('Bob')) AS d(name)
                )
          ) AS u
     WHERE t.id = u.id AND rn = 1
    );
There might be a nicer/more efficient solution, but I'll leave that to others.
FYI I used the following DDL and data to test the above.
create table test.table(id int not null primary key, first_name varchar(32));
insert into test.table values (1,'Flo'),(2,'Fred'),(3,'Sue'),(4,'John'),(5,'Jim');

Oracle Update using Lag function

I am trying to use the LAG function to update effective start dates in an SCD2 dimension, using a subquery to self-join the table on the PK. The update does not pick up the previous end date and instead only sets the default value (which is the column itself). When I remove the default value I get an error because the effective start date cannot be null. When I run just the SELECT, I get the desired result.
Any assistance would be greatly appreciated, I'm sure its something simple!
update schema.d_account ac1
set effective_start_dt = (select lag(effective_end_dt, 1, effective_start_dt)
                                 over (partition by id order by effective_end_dt asc)
                          from schema.d_account ac2
                          where ac1.account_dim_key = ac2.account_dim_key),
    audit_last_update_dt = sysdate
where id in '0013600000USKLqAAP'
(The sample table and desired results were provided as screenshots in the original question.)
There could be one of these reasons why your update is not working (I do not know the exact structure of your table, and you have not provided sample data):
If your account_dim_key is unique for each row, then select lag() ... from schema.d_account ac2 where ac1.account_dim_key = ac2.account_dim_key will return one row, and you will effectively update effective_start_dt to effective_start_dt (the default value of the lag function).
If your account_dim_key is the same for all the rows you provided as a sample, then the same subquery will return multiple rows and Oracle will complain that the UPDATE is not possible (ORA-01427: single-row subquery returns more than one row).
To make your query work you need to use a different approach:
update schema.d_account ac1
set effective_start_dt =
        (select prev_dt
         from (select lag(effective_end_dt, 1, effective_start_dt)
                      over (partition by id order by effective_end_dt asc) as prev_dt,
                      ROWID as rid
               from schema.d_account ac2) a
         where a.rid = ac1.ROWID),
    audit_last_update_dt = sysdate
where id in '0013600000USKLqAAP'
So, basically you have a sub-query a with the ROWID column where you build the previous date. For the UPDATE statement you join this sub-query by the ROWID.
Note: if your account_dim_key is unique for each row, you can use it in place of the ROWID; you may get better performance depending on the indexes on your table.
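A sketch of that variant, assuming account_dim_key uniquely identifies a row:
update schema.d_account ac1
set effective_start_dt =
        (select prev_dt
         from (select lag(effective_end_dt, 1, effective_start_dt)
                      over (partition by id order by effective_end_dt asc) as prev_dt,
                      account_dim_key
               from schema.d_account) a
         where a.account_dim_key = ac1.account_dim_key),
    audit_last_update_dt = sysdate
where id in '0013600000USKLqAAP'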
UPDATE: the queries above may give you bad performance. You will be better off with the MERGE statement below:
MERGE INTO (SELECT id, effective_start_dt, ROWID rid, audit_last_update_dt
            FROM schema.d_account
            WHERE id in '0013600000USKLqAAP') ac1
USING (SELECT lag(effective_end_dt, 1, effective_start_dt)
                  OVER (PARTITION BY id ORDER BY effective_end_dt ASC) AS prev_dt,
              ROWID AS rid
       FROM schema.d_account) ac2
ON (ac1.rid = ac2.rid)
WHEN MATCHED THEN UPDATE SET ac1.effective_start_dt = ac2.prev_dt,
                             ac1.audit_last_update_dt = sysdate;

How can I iterate through SQL results like a for loop and an array?

I have a table, and I want to select only the single column of row IDs from it, but in a specific order. Then, I want to loop through that column like below:
for (i = 0; i < rows.length; i++)
{
    if (i == rows.length - 1)
        UPDATE myTable SET nextID = NULL WHERE ID = rows[i]
    ELSE
        UPDATE myTable SET nextID = rows[i+1] WHERE ID = rows[i]
}
I just don't know how to access the results of my SELECT statement with an index like that. Is there a way of doing this in SQL Server?
Since you didn't provide many details, let's pretend your table looks something like this:
create table MyTable (
    Id int not null primary key,
    Name varchar(50) not null,
    NextId int
)
I want to select only the single column of row IDs from it, but in a specific order
Let's just say that in this case, you decide to order the rows alphabetically by Name. So let's pretend that the select statement that you want to loop through looks like this:
select Id
from MyTable
order by Name
That being the case, instead of looping through the rows and attempting to update each row using the pseudo-code you provided, you can replace the whole thing with a single update statement that will perform the exact same work:
with cte as (
    select *,
           NewNextId = lead(Id) over (order by Name)
    from MyTable
)
update cte
set NextId = NewNextId
Just make sure to adjust the order by clause to whatever your specific order really is. I just used Name in my example, but it might be something else in your case.
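For instance, with three illustrative rows (names made up for the demo):
insert into MyTable (Id, Name) values (10, 'Alpha'), (20, 'Charlie'), (30, 'Bravo');
-- Ordered by Name: Alpha (10), Bravo (30), Charlie (20).
-- After the update: row 10 gets NextId = 30, row 30 gets NextId = 20, row 20 gets NULL.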
You could use a cursor, or you could use something a bit smarter.
Your example should be able to be written fairly easily along the lines of the following (SQL Server does not allow a window function directly in an UPDATE's SET clause, so it goes through an updatable CTE):
with ordered as (
    select nextID, lead(id, 1) over (order by id) as newNextID
    from mytable
)
update ordered set nextID = newNextID
lead(id, 1) grabs the next id, one row ahead in the record set, and the update writes it into the nextID field. If there is no next row it returns NULL. No looping or conditional logic needed!
edit: I forgot the OVER clause at first. That is the part that tells LEAD how you would like the rows ordered.