I have a PostgreSQL table user_book_details with 451,007 records. The table is populated daily with around 1K new records.
I have the following query that takes a long time (13 hours) to complete every time.
update user_book_details as A1 set min_date=
(select min(A2.acc_date) as min_date from user_book_details A2 where A2.user_id=A1.user_id
and A2.book_id=A1.book_id) where A1.min_date is null;
How can I rewrite the query to improve its performance?
FYI, there is no index on the user_id and book_id columns.
Your query is okay:
update user_book_details ubd
set min_date = (select min(ubd2.acc_date)
from user_book_details ubd2
where ubd2.user_id = ubd.user_id and
ubd2.book_id = ubd.book_id
)
where ubd.min_date is null;
For performance you want an index on user_book_details(user_id, book_id).
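If the index does not already exist, creating it looks like this (the index name is my own choice):
create index idx_user_book_details_user_book
    on user_book_details (user_id, book_id);
I also think it would be faster written like this: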
update user_book_details ubd
set min_date = min_acc_date
from (select ubd2.user_id, ubd2.book_id, min(ubd2.acc_date) as min_acc_date
from user_book_details ubd2
group by ubd2.user_id, ubd2.book_id
) ubd2
where ubd2.user_id = ubd.user_id and
ubd2.book_id = ubd.book_id and
ubd.min_date is null;
The first method uses the index to look up the values for each row (something that might be a little complicated when the table being updated is the same one being read). The second method aggregates the data and then joins in the values.
I should note that this value is easily calculated on the fly:
select ubd.*,
min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;
This might be preferable to trying to keep it up-to-date in the table.
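For example, wrapping that query in a view (the view name here is just a suggestion) keeps the value current without any updates:
create view user_book_details_v as
select ubd.*,
       min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;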
I have a table called assignment with index and unit_id columns. Every assignment in a unit has a unique index; this way, assignments in a unit can be reordered by swapping their indexes.
I'm trying to figure out the best way to remove potential gaps in the indexes (if a row is deleted, the indexes will go 0, 1, 3, for example). My current solution is to loop through each assignment in a unit programmatically and run an UPDATE query if its index doesn't match the loop index, like this:
let i = 0;
for (const assignmentId of assignmentIds) {
    // one UPDATE round trip per assignment
    await Assignment.query()
        .patch({ index: i })
        .where('id', assignmentId);
    i++;
}
I'm trying to figure out how to do this with a single query using the ROW_NUMBER function like this:
UPDATE assignment SET
index = subquery.new_index - 1
FROM (
SELECT
ROW_NUMBER() OVER () as new_index
FROM assignment
WHERE assignment.unit_id = 35
ORDER BY assignment.index
) as subquery
WHERE unit_id=35;
But when I run this, it just sets all the indexes in unit 35 to 1. Why is this happening?
You need to provide a new index for each row, identifying the rows by their old index:
update assignment
set index = subquery.new_index - 1
from (
select index as old_index, row_number() over (order by index) as new_index
from assignment
where assignment.unit_id = 35
) as subquery
where unit_id = 35
and old_index = index
and new_index <> old_index + 1; -- eliminate unnecessary updates
You need to join the subquery on the PK column (you need to include that in the sub-query to be able to do that). The ORDER BY needs to go into the window function, not the overall query:
UPDATE assignment
SET index = subquery.new_index - 1
FROM (
SELECT pk_column,
ROW_NUMBER() OVER (partition by unit_id order by index) as new_index
FROM assignment
) as subquery
WHERE subquery.pk_column = assignment.pk_column
If you only want to do that for a single unit, you can add AND unit_id = 35 to the WHERE clause of the UPDATE statement.
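For example (a sketch; pk_column still stands in for the real primary key):
UPDATE assignment
SET index = subquery.new_index - 1
FROM (
    SELECT pk_column,
           ROW_NUMBER() OVER (PARTITION BY unit_id ORDER BY index) AS new_index
    FROM assignment
) AS subquery
WHERE subquery.pk_column = assignment.pk_column
  AND assignment.unit_id = 35;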
If you don't mind losing the original order of the indexes, you can run the update with a temp sequence:
CREATE TEMP SEQUENCE new_index;
SELECT SETVAL('new_index', 1, false); -- so the first nextval() returns 1
And run the update based on your unit:
UPDATE assignment SET index=nextval('new_index') WHERE unit_id=35;
This will update the index column of every row with unit_id = 35.
What's the difference from your FROM clause? nextval() executes a function for every row, returning the next number in the sequence.
Since you specifically indicated unit_id was not unique ("sets all the indexes in unit 35"), you can use row_number() to project the expected index for each item (as others have indicated). But I also dislike using magic numbers, and the following does not depend on them:
with expectations as
(select *
from (
select unit_id
, unit_index
, row_number() over (partition by unit_id order by unit_id, unit_index) exp_index
from units
) au
where unit_index != exp_index
)
update units u
set unit_index = exp_index
from expectations e
where (u.unit_id, u.unit_index) = (e.unit_id,e.unit_index);
I want to update a column (variableA) in a table (myTable) only when
there is no row with this @variableA value in the variableA column, and
there is already a row with @variableB in the variableB column and with 'DUMMY' in the variableA column.
FYI: another interface inserts the 'DUMMY' rows beforehand, and I later need to update them with the real values/numbers.
The code below already works fine, but I am wondering if there is a more "elegant" solution. I want to avoid/change the last line (the SELECT COUNT(*) subquery).
DECLARE @variableA nvarchar(10) = '12345'
DECLARE @variableB nvarchar(10) = '67890'
UPDATE TOP (1) myTable
SET variableA = @variableA,
    timestamp = GETDATE()
WHERE variableB = @variableB
AND variableA = 'DUMMY'
AND (SELECT COUNT(*) FROM myTable WHERE variableA = @variableA) = 0
Can you please help me to find a smarter solution instead of this last line?
You can use the NOT EXISTS operator like this:
NOT EXISTS (SELECT 1 FROM myTable WHERE variableA = @variableA)
And if it is still slow, you can create an index (e.g. I_myTable_variableA) on the variableA column to make it faster; since the values in that column are almost unique, it will be a good, selective index.
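Creating such an index is a one-liner (using the name suggested above):
CREATE INDEX I_myTable_variableA ON myTable (variableA);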
Well, I would write it like this:
UPDATE myTable
SET variableA = @variableA,
    timestamp = GETDATE()
WHERE variableB = @variableB
AND variableA = 'DUMMY'
AND NOT EXISTS (
    SELECT 1
    FROM myTable
    WHERE variableA = @variableA
)
First, using TOP without specifying ORDER BY is a mistake: database tables are unsorted by nature, so you might get unexpected results.
Second, changing the (SELECT COUNT(*) ...) = 0 to NOT EXISTS (SELECT ...) might improve performance (unless the optimizer is smart enough to use the same execution plan for both cases).
Also, for your future questions, please avoid using images to show us sample data and desired results. Use DDL+DML for the sample data and text for the desired results. If you do that, we can copy your sample data into a test environment and actually test the answers before posting them.
I am trying to use a LAG function to update effective start dates in an SCD2 dimension, using a subquery to self-join the table on the PK. The update does not pick up the previous end date and instead only sets the default value (which is the row's own effective_start_dt). When I remove the default value I get an error because the effective start date cannot be null. When I just run the SELECT I get the desired result.
Any assistance would be greatly appreciated; I'm sure it's something simple!
update schema.d_account ac1
set effective_start_dt = (select lag(effective_end_dt, 1, effective_start_dt) over (partition by id order by effective_end_dt asc)
                          from schema.d_account ac2
                          where ac1.account_dim_key = ac2.account_dim_key),
    audit_last_update_dt = sysdate
where id in ('0013600000USKLqAAP')
There could be several reasons why your update is not working; I do not know the exact structure of your table, and you have not provided usable sample data. If your account_dim_key is unique for each row, then the select lag() ... from schema.d_account ac2 where ac1.account_dim_key = ac2.account_dim_key subquery will return one row, and you will effectively update effective_start_dt to effective_start_dt (the default value of the LAG function).
If your account_dim_key is the same for all the rows you have provided as a sample, then that subquery will return multiple rows and Oracle will complain that the UPDATE is not possible (there is a specific error message; I do not remember the exact wording).
To make your query work you need to use a different approach:
update schema.d_account ac1
set effective_start_dt = (select prev_dt
                          from (select lag(effective_end_dt, 1, effective_start_dt) over (partition by id order by effective_end_dt asc) as prev_dt
                                     , ROWID as rid
                                from schema.d_account) a
                          where a.rid = ac1.ROWID),
    audit_last_update_dt = sysdate
where id in ('0013600000USKLqAAP')
So, basically you have a sub-query a with the ROWID column where you build the previous date. For the UPDATE statement you join this sub-query by the ROWID.
Note: if your account_dim_key is unique for each row, you can use it in place of the ROWID; you may get better performance, depending on the indexes you have on your table.
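A sketch of that variant (assuming account_dim_key really is unique per row):
update schema.d_account ac1
set effective_start_dt = (select prev_dt
                          from (select lag(effective_end_dt, 1, effective_start_dt) over (partition by id order by effective_end_dt asc) as prev_dt
                                     , account_dim_key
                                from schema.d_account) a
                          where a.account_dim_key = ac1.account_dim_key),
    audit_last_update_dt = sysdate
where id in ('0013600000USKLqAAP')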
UPDATE: the queries above may give you bad performance. You will be better off with the MERGE statement below:
MERGE INTO (SELECT id, effective_start_dt, ROWID rid, audit_last_update_dt
            FROM schema.d_account WHERE id in ('0013600000USKLqAAP')) ac1
USING (select lag(effective_end_dt, 1, effective_start_dt) over (partition by id order by effective_end_dt asc) as prev_dt
            , ROWID as rid
       from schema.d_account) ac2
ON (ac1.rid = ac2.rid)
WHEN MATCHED THEN UPDATE SET ac1.effective_start_dt = ac2.prev_dt,
                             ac1.audit_last_update_dt = sysdate;
So I have tables with the following structure:
TimeStamp,
var_1,
var_2,
var_3,
var_4,
var_5,...
This contains about 600 columns named var_##. The user parses some data stored by a machine, and I have to update all NULL values inside that table to the last valid value. At the moment I use the following query:
update tableName
set var_## =
    (select b.var_## from tableName as b
     where b.timeStamp <= tableName.timeStamp and b.var_## is not null
     order by b.timeStamp desc limit 1)
where tableName.var_## is null;
The problem right now is the time it takes to run this query for all columns. Is there any way to optimize it?
UPDATE: this is the query when executing it for one column:
update wme_test2
set var_6 =
(select b.var_6 from wme_test2 as b
where b.timeStamp <= wme_test2.timeStamp and b.var_6 is not null
order by timeStamp desc limit 1)
where wme_test2.var_6 is null;
Having 600 indexes on the data columns would be silly. (But not necessarily more silly than having 600 columns.)
All queries can be sped up with an index on the timeStamp column.
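For example (using the table name from the update above; the syntax is the same in most engines):
CREATE INDEX idx_wme_test2_timestamp ON wme_test2 (timeStamp);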
Is there any way to update a table within the select_expr part of a MySQL select query? Here is an example of what I am trying to achieve:
SELECT id, name, (UPDATE tbl2 SET currname = tbl.name WHERE tbl2.id = tbl.id) FROM tbl;
This gives me an error in MySQL, but I don't see why this shouldn't be possible as long as I am not changing tbl.
Edit:
I will clarify why I can't use an ordinary construct for this.
Here is a more complex example of the problem I am working on:
SELECT id, (SELECT #var = col1 FROM tbl2), #var := #var+1,
(UPDATE tbl2 SET col1 = #var) FROM tbl WHERE ...
So I am basically in a situation where I am incrementing a variable during the SELECT statement and want that change reflected in the rows as I select them, because I use the variable's value during execution. The example given here can probably be implemented by other means, but the real example, which I won't post here because there is too much unnecessary code, needs this functionality.
If your goal is to update tbl2 every time you query tbl1, then the best way to do that is to create a stored procedure to do it and wrap it in a transaction, possibly changing isolation levels if atomicity is needed.
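A minimal sketch of that idea in MySQL (the procedure name is mine; tbl/tbl2 and their columns are taken from the question):
DELIMITER //
CREATE PROCEDURE sync_and_select()
BEGIN
    START TRANSACTION;
    -- apply the update first...
    UPDATE tbl2 JOIN tbl ON tbl2.id = tbl.id SET tbl2.currname = tbl.name;
    -- ...then return the rows the caller asked for
    SELECT id, name FROM tbl;
    COMMIT;
END //
DELIMITER ;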
You can't nest updates in selects.
What results do you want: the results of the select, or of the update?
If you want to update based on the results of a query, you can do it like this:
update table1
set value1 = x.value1
from (select value1, id from table2 where value1 = something) as x
where id = x.id
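Note that the UPDATE ... FROM form above is not MySQL syntax; in MySQL the same idea is written as a join update (a sketch with the placeholder names from above):
update table1 t1
join (select id, value1 from table2 where value1 = something) x
  on t1.id = x.id
set t1.value1 = x.value1;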
If what you actually need is the read-and-increment pattern from the question, a transaction with SELECT ... FOR UPDATE keeps the read and the write consistent:
START TRANSACTION;
-- Let's get the current value
SELECT value FROM counters WHERE id = 1 FOR UPDATE;
-- Increment the counter
UPDATE counters SET value = value + 1 WHERE id = 1;
COMMIT;