Fast UPDATE query on a table with millions of rows - SQL

I'm updating a table with millions of records using a simple query, but it's taking a huge amount of time. I'm wondering if someone could work some magic with an alternative to speed up the query below:
UPDATE sources.product
SET partial=left(full,7);

You need to narrow the number of rows to make it go faster. Try a few things:
Reduce the number of indexes that cover the partial column. Each such index must be updated whenever you change partial, so one row update may cause 2 or 3 additional index updates.
Timestamp your rows so you only update new ones.
Create a trigger to update partial when a row is inserted or updated.
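For the trigger option, a minimal sketch, assuming PostgreSQL 11+ (the question doesn't name the database; sources.product and the column names come from the question, and "full" is quoted because FULL is a reserved word in PostgreSQL):

```sql
-- Sketch: keep partial in sync automatically (assumes PostgreSQL 11+).
CREATE OR REPLACE FUNCTION sources.set_partial() RETURNS trigger AS $$
BEGIN
  -- "full" is quoted because FULL is a reserved word in PostgreSQL.
  NEW.partial := left(NEW."full", 7);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_set_partial
BEFORE INSERT OR UPDATE ON sources.product
FOR EACH ROW EXECUTE FUNCTION sources.set_partial();
```

With the trigger in place, only the existing rows need a one-off backfill, and new or changed rows stay correct on their own.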

Indexing is necessary for a table that contains big data. I think you should try re-indexing and then running this command again.

Related

How to delete records without generating redo log

I have a table like following:
Create Table Txn_History (
ID number,
Comment varchar2(300),
... (Another 20 columns),
Std_hash raw(1000)
) nologging;
This table is 8GB with 19 Million rows with a growth of around 50,000 rows daily.
I need to delete 300,000 rows and update 100,000 rows. I know that normally delete and update statement will cause Oracle database to generate redo log. The only way I know to avoid this is to create a new table with the updated result.
However, considering that the delete and update statements touch only about 2% of the entire table, it hardly seems worth creating a new table, followed by all the corresponding indexes.
Do you have any new idea?
To be honest, I don't think that redo generation is a big problem here: just 300k rows to delete and 100k rows to update. For such batch operations Oracle uses the fast "array update" redo operation. You probably need to trace your operation to find the real bottlenecks and load profile (IO/CPU, access paths, triggers, indexes, etc.).
Basically, it's better to use the partitioning option properly so you can update/delete (or truncate) whole partitions.
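A sketch of the partition-level approach; this assumes txn_history were range-partitioned on a date column, which the table as shown does not have, and the partition name p_2023 is hypothetical:

```sql
-- Sketch: dropping old rows partition-wise, with no row-by-row DML and
-- minimal redo. Assumes a (hypothetical) range partition p_2023 exists.
ALTER TABLE txn_history TRUNCATE PARTITION p_2023 UPDATE INDEXES;
```

Truncating a partition is a DDL operation, so it avoids generating per-row undo and redo entirely.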
There is also the new alter table ... move including rows where ... feature starting from Oracle 12.2:
https://blogs.oracle.com/sql/how-to-delete-millions-of-rows-fast-with-sql
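A sketch of that 12.2 filtered move; the predicate here is hypothetical, since the real filter depends on which 300,000 rows you want gone:

```sql
-- Sketch of the Oracle 12.2 "filtered move": rows NOT matching the WHERE
-- clause are discarded as the table is rewritten. The predicate is a
-- placeholder for the real delete condition.
ALTER TABLE txn_history MOVE
  INCLUDING ROWS WHERE std_hash IS NOT NULL;
```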

Update time of a big table in PostgreSQL

I have a question about the performance of an UPDATE on a big table that is around 8 to 10 GB in size.
I have a task where I'm supposed to detect distinct values in a table of the mentioned size, with about 4.3 million rows, and insert them into another table. That part is not really a problem; it's the update that follows. I need to update a column based on the id of the rows created in the table I imported into. An example of the query I'm executing is:
UPDATE billinglinesstagingaws as s
SET product_id = p.id
FROM product AS p
WHERE p.key=(s.data->'product'->>'sku')::varchar(75)||'-'||(s.data->'lineitem'->>'productcode')::varchar(75) and cloudplatform_id = 1
So, as mentioned, the staging table is around 4.3 million rows and 8-10 GB and, as can be seen from the query, it has a JSONB field; the product table has around 1500 rows.
This takes about 12 minutes, which I'm not really sure is OK, and I'm wondering what I can do to speed it up. There are no foreign key constraints; there is a unique constraint on two columns together. There are no indexes on the staging table.
I attached the query plan of the query, so any advice would be helpful. Thanks in advance.
This is a bit too long for a comment.
Updating 4.3 million rows in a table is going to take some time. Updates take time because the ACID properties of databases require that something be committed to disk for each update -- typically log records. And that doesn't count the time for reading the records, updating indexes, and other overhead.
So, about 6,000 updates per second isn't so bad.
There might be ways to speed up your query. However, you describe these as new rows, which makes me wonder whether you can just insert the correct values when loading the table. Can you look up the appropriate value during the insert rather than doing so afterwards in an update?
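A sketch of that idea, resolving product_id at load time so no follow-up UPDATE is needed; the source table raw_import is hypothetical, while the other names come from the question:

```sql
-- Sketch: do the product lookup while loading the staging table.
-- raw_import is a hypothetical source holding the unprocessed JSONB rows.
INSERT INTO billinglinesstagingaws (data, cloudplatform_id, product_id)
SELECT r.data,
       1,
       p.id
FROM raw_import r
LEFT JOIN product p
  ON p.key = (r.data->'product'->>'sku')::varchar(75)
             || '-' ||
             (r.data->'lineitem'->>'productcode')::varchar(75);
```

With only ~1500 products, the join side is tiny; the expensive part becomes a single pass over the incoming data instead of a full-table rewrite afterwards.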

SQL indexes and update timing

I have a table with 50 columns and non-clustered indexes for about 10 columns (FKs). The table contains about 10 million records.
My question is: when are the indexes updated during an update of 10k rows in the table (where the update touches indexed columns)? Does it happen after each row update or after the whole update is complete?
The problem is that the update is very long and we receive a DB connection timeout. How can I improve update time? I cannot remove indexes before update and rebuild them later, because the table is used heavily during the update too.
You should partition the table and try to use local indexes.
By partitioning you are dividing the table data so that you can operate on relevant data.
Local indexes also mean that the index is partitioned as well, so speed will improve dramatically.
Have a look at this link: http://msdn.microsoft.com/en-in/library/ms190787.aspx
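A minimal sketch of a partitioned table with aligned (local) indexes in SQL Server; every object name here is hypothetical:

```sql
-- Sketch, SQL Server: range-partition by date and keep indexes aligned.
-- All names (pf_created, ps_created, dbo.big_table, ...) are hypothetical.
CREATE PARTITION FUNCTION pf_created (datetime2)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME ps_created
AS PARTITION pf_created ALL TO ([PRIMARY]);

CREATE TABLE dbo.big_table (
    id         bigint        NOT NULL,
    created_at datetime2     NOT NULL,
    payload    nvarchar(200) NULL
) ON ps_created (created_at);

-- An index created on the same scheme is "aligned": each partition gets
-- its own b-tree, so an update touches only the relevant partition.
CREATE NONCLUSTERED INDEX ix_big_table_payload
ON dbo.big_table (payload) ON ps_created (created_at);
```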
We have a massive system whose main table also has a couple of million records... at least it used to. We move data older than 6 months out to an archive table; typically data that old is only needed for reporting. By doing this we were able to greatly improve performance on our live system. That being said, this may not be a viable solution for you.
Are the indexes fragmented (on both the updated table and the FK tables)?
What type is the column, and is it nullable?
Can you live with dirty reads (NOLOCK)?
Can you live with not checking the FK constraint?
Please post the update statement.
I trust you are checking not to update rows with the same value:
update tt
set col1 = 'newVal'
where col1 <> 'newVal'
If those indexes are in good condition, an update of 10K rows should be pretty fast.
A fill factor might help.
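A sketch of the fill-factor idea: rebuilding with free space left on each page gives in-place updates room to grow without page splits. The index and table names are hypothetical, and ONLINE = ON requires an edition that supports online rebuilds:

```sql
-- Sketch: rebuild a fragmented index with a lower fill factor so updates
-- find free space on each page. Names are hypothetical; ONLINE = ON needs
-- an edition with online index operations.
ALTER INDEX ix_orders_customer_id ON dbo.orders
REBUILD WITH (FILLFACTOR = 80, ONLINE = ON);
```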

How can I efficiently manipulate 500k records in SQL Server 2005?

I am getting a large text file of updated information from a customer that contains updates for 500,000 users. However, as I am processing this file, I often am running into SQL Server timeout errors.
Here's the process I follow in my VB application that processes the data (in general):
Delete all records from the temporary table (to remove last month's data), e.g. DELETE FROM tempTable
Rip text file into the temp table
Fill in extra information into the temp table, such as their organization_id, their user_id, group_code, etc.
Update the data in the real tables based on the data computed in the temp table
The problem is that I often run commands like UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id) and these commands frequently time out. I have tried bumping the timeouts as high as 10 minutes, but they still fail. Now, I realize that 500k rows is no small number of rows to manipulate, but I would think that a database purported to handle millions and millions of rows should cope with 500k pretty easily. Am I doing something wrong in how I am processing this data?
Please help. Any and all suggestions welcome.
Subqueries like the one you give in the question:
UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id)
are evaluated one row at a time, so you are effectively looping. Think set-based:
UPDATE t
SET user_id = u.user_id
FROM tempTable t
inner join myUsers u ON t.external_id=u.external_id
and remove your loops; this will update all rows in one statement and be significantly faster!
Needs more information. I regularly manipulate 3-4 million rows in a 150-million-row table and I am NOT thinking this is a lot of data. I have a "products" table that contains about 8 million entries -- including full-text search. No problems either.
Can you elaborate on your hardware? I assume a "normal desktop PC" or "low-end server", both with an absolutely non-optimal disk layout, and thus tons of IO problems -- on updates.
Make sure you have indexes on the tables you are selecting from. In your example UPDATE command, you look up rows in the myUsers table by external_id, so make sure you have an index on the external_id column of myUsers. The downside of indexes is that they increase the time for inserts/updates, so make sure you don't have unnecessary indexes on the tables you are trying to update. If the tables you are updating do have indexes, consider dropping them and then rebuilding them after your import.
Finally, run your queries in SQL Server Management Studio and have a look at the execution plan to see how the query is being executed. Look for things like table scans to see where you might be able to optimize.
Look at the KM's answer and don't forget about indexes and primary keys.
Are you indexing your temp table after importing the data?
temp_table.external_id should definitely have an index since it is in the where clause.
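Since external_id drives every lookup, something like this after the bulk load (table and column names taken from the question, index name hypothetical):

```sql
-- Sketch: index the lookup column once the data is loaded, so the
-- UPDATE ... JOIN can seek instead of scanning 500k rows repeatedly.
CREATE INDEX ix_tempTable_external_id ON tempTable (external_id);
```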
There are more efficient ways of importing large blocks of data. Look in SQL Server Books Online under BCP (the bulk copy program).
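For the "rip text file into the temp table" step, a set-based load might look like this; the file path and terminators are assumptions about the input format:

```sql
-- Sketch: load the customer file in one set-based operation instead of
-- row-by-row inserts. Path and terminators are assumptions about the file.
BULK INSERT tempTable
FROM 'C:\imports\customer_updates.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);
```

TABLOCK lets SQL Server take a bulk update lock on the empty table, which enables minimally logged inserts under the simple or bulk-logged recovery model.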

Oracle SQL technique to avoid filling trans log

I'm newish to Oracle programming (coming from Sybase and MS SQL Server). What is the "Oracle way" to avoid filling the transaction log with large updates?
In my specific case, I'm doing an update of potentially a very large number of rows. Here's my approach:
UPDATE my_table
SET a_col = null
WHERE my_table_id IN
(SELECT my_table_id FROM my_table WHERE some_col < some_val and rownum < 1000)
...where I execute this inside a loop until the updated row count is zero.
Is this the best approach?
Thanks,
The amount of redo and undo generated will not be reduced at all if you break the UPDATE up into multiple runs of, say, 1000 records. On top of that, the total query time will most likely be higher than running a single large SQL statement.
There's no real way to address the UNDO/REDO issue for UPDATEs. With INSERTs and CREATE TABLE you can use the DIRECT (aka APPEND) option, but I guess that doesn't easily work for you.
It depends on the percentage of rows almost as much as on the number, and also on whether the update makes the rows longer than before, e.g. going from NULL to 200 bytes in every row. That could hurt performance through row chaining.
Either way, you might want to try this.
Build a new table with the column corrected as part of the select instead of an update. You can build that new table via CTAS (Create Table as Select) which can avoid logging.
Drop the original table.
Rename the new table.
Reindex, repoint constraints, rebuild triggers, recompile packages, etc.
You can avoid a lot of logging this way.
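A sketch of those steps; my_table, a_col, some_col and the predicate come from the question, while my_table_new and the assumption that the table has only these columns are placeholders for the real definition:

```sql
-- Sketch of the CTAS approach: rewrite the table with the column already
-- corrected, with minimal redo thanks to NOLOGGING (direct-path load).
CREATE TABLE my_table_new NOLOGGING AS
SELECT my_table_id,
       CASE WHEN some_col < some_val THEN NULL ELSE a_col END AS a_col,
       some_col
FROM   my_table;

DROP TABLE my_table;
ALTER TABLE my_table_new RENAME TO my_table;
-- then: recreate indexes, constraints, triggers, grants; recompile packages
```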
Any UPDATE is going to generate redo. Realistically, a single UPDATE that updates all the rows is going to generate the smallest total amount of redo and run for the shortest period of time.
Assuming you are updating the vast majority of the rows in the table, if there are any indexes that use A_COL, you may be better off disabling those indexes before the update and then doing a rebuild of those indexes with NOLOGGING specified after the massive UPDATE statement. In addition, if there are any triggers or foreign keys that would need to be fired/ validated as a result of the update, getting rid of those temporarily might be helpful.
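A sketch of the disable-then-rebuild sequence; the index name is hypothetical, and this relies on skip_unusable_indexes being TRUE (the default in modern Oracle) so DML can proceed while the index is unusable:

```sql
-- Sketch: skip index maintenance during the mass UPDATE, then rebuild
-- without redo logging. Index name is hypothetical.
ALTER INDEX my_table_a_col_idx UNUSABLE;

UPDATE my_table SET a_col = NULL WHERE some_col < some_val;
COMMIT;

ALTER INDEX my_table_a_col_idx REBUILD NOLOGGING;
```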