SQL indexes and update timing

I have a table with 50 columns and non-clustered indexes for about 10 columns (FKs). The table contains about 10 million records.
My question is: when are the indexes updated while updating 10k rows in the table (the update touches indexed columns)? After each row is updated, or after the whole update completes?
The problem is that the update takes very long and we receive a DB connection timeout. How can I improve the update time? I cannot remove the indexes before the update and rebuild them afterwards, because the table is also used heavily during the update.

You should partition the table and try to use local indexes.
By partitioning you divide the table data so that each statement only has to operate on the relevant partitions.
Local (aligned) indexes mean that each index is partitioned the same way, so index maintenance during the update is confined to the affected partitions and speed should improve considerably.
Have a look at this link: http://msdn.microsoft.com/en-in/library/ms190787.aspx
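For what it's worth, here is a minimal T-SQL sketch of a partitioned table with an aligned ("local") index, assuming SQL Server as the MSDN link suggests; every object name, column, and boundary value below is made up for illustration:
CREATE PARTITION FUNCTION pf_CreatedDate (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01');

CREATE PARTITION SCHEME ps_CreatedDate
AS PARTITION pf_CreatedDate ALL TO ([PRIMARY]);

-- The table is (re)built on the partition scheme (hypothetical columns).
CREATE TABLE dbo.BigTable
(
    Id          int IDENTITY(1,1) NOT NULL,
    CustomerId  int NOT NULL,
    CreatedDate datetime NOT NULL,
    -- ... remaining columns ...
    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id, CreatedDate)
) ON ps_CreatedDate (CreatedDate);

-- An index created on the same scheme is aligned (local), so an update only
-- touches the index partitions belonging to the rows it modifies.
CREATE NONCLUSTERED INDEX IX_BigTable_CustomerId
ON dbo.BigTable (CustomerId)
ON ps_CreatedDate (CreatedDate);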

We have a massive system; the main table also has a couple of million records... at least it used to. We move data older than 6 months out to an archive table. Typically, data that old is only needed for reporting purposes. By doing this we were able to greatly improve performance on our live system. That being said, this may not be a viable solution for you.
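In case it helps, a rough T-SQL sketch of that kind of archive move; the table and column names are invented, and in practice you would run it in smaller batches (e.g. DELETE TOP (10000) ... in a loop):
-- Move rows older than 6 months into the archive table in a single statement;
-- OUTPUT ... INTO writes each deleted row to the archive as it is removed.
DELETE FROM dbo.MainTable
OUTPUT DELETED.* INTO dbo.MainTableArchive
WHERE CreatedDate < DATEADD(MONTH, -6, GETDATE());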

Are the indexes fragmented (on both the updated table and the FK tables)?
What is the data type of the column, and is it nullable?
Can you live with dirty reads (NOLOCK)?
Can you live with not checking the FK constraints?
Please post the update statement.
I trust you are checking that you don't update rows to the same value:
update tt
set col1 = 'newVal'
where col1 <> 'newVal'
If those indexes are in good condition, an update of 10K rows should be pretty fast.
A lower fill factor might also help.
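If it helps, here is how you might check fragmentation and rebuild with a lower fill factor in SQL Server; I am reusing the tt example table from above, and the index name is a placeholder:
-- Check fragmentation of every index on the table
SELECT i.name, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tt'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

-- Rebuild a fragmented index, leaving 20% free space per page for future updates
ALTER INDEX IX_tt_col1 ON dbo.tt REBUILD WITH (FILLFACTOR = 80);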

Related

Update time of a big table in PostgreSQL

I have a question about the performance of an update on a big table that is around 8 to 10 GB in size.
I have a task where I'm supposed to detect distinct values in a table of the mentioned size, with about 4.3 million rows, and insert them into another table. That part is not really a problem; it's the update that follows afterwards. I need to update a column based on the id of the rows created in the table I imported into. An example of the query I'm executing is:
UPDATE billinglinesstagingaws as s
SET product_id = p.id
FROM product AS p
WHERE p.key=(s.data->'product'->>'sku')::varchar(75)||'-'||(s.data->'lineitem'->>'productcode')::varchar(75) and cloudplatform_id = 1
So, as mentioned, the staging table is around 4.3 million rows and 8-10 GB and, as can be seen from the query, it has a JSONB field; the product table has around 1,500 rows.
This takes about 12 minutes, which I'm not sure is reasonable, and I'm wondering what I can do to speed it up. There are no foreign key constraints; there is a unique constraint on two columns together. There are no indexes on the staging table.
I attached the query plan of the query, so any advice would be helpful. Thanks in advance.
This is a bit too long for a comment.
Updating 4.3 million rows in a table is going to take some time. Updates take time because the ACID properties of databases require that something be committed to disk for each update -- typically log records. And that doesn't count the time for reading the records, updating indexes, and other overhead.
So, about 6,000 updates per second isn't so bad.
There might be ways to speed up your query. However, you describe these as new rows, which makes me wonder whether you can simply insert the correct values when creating them. Can you look up the appropriate value during the insert rather than doing so afterwards in an update?
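For example, something along these lines (a sketch only; raw_import stands in for whatever currently feeds the staging table, and the column list is guessed from the query above):
-- Resolve product_id while loading the staging table instead of updating it afterwards
INSERT INTO billinglinesstagingaws (data, cloudplatform_id, product_id)
SELECT r.data,
       1,
       p.id
FROM raw_import r
LEFT JOIN product p
  ON p.key = (r.data->'product'->>'sku')::varchar(75)
             || '-' ||
             (r.data->'lineitem'->>'productcode')::varchar(75);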

Fast update query on millions table

I'm updating a table with millions of records with a simple query, but it's taking a huge amount of time. I'm wondering if someone could bring some magic with an alternative to speed up the query below:
UPDATE sources.product
SET partial=left(full,7);
You need to narrow the number of rows to make it go faster. Try a few things:
Reduce the number of indexes that include the partial column. Each such index has to be maintained when you change partial, so one row update may cause 2 or 3 additional index updates.
Timestamp your rows so you only update new ones.
Create a trigger to update partial when a row is inserted or updated.
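If you go the trigger route, here is a minimal sketch assuming PostgreSQL 11+ (the question doesn't say which DBMS; adjust the syntax for yours, and note that full is quoted because it is a reserved word in several dialects):
CREATE OR REPLACE FUNCTION sources.set_partial()
RETURNS trigger AS $$
BEGIN
    -- keep partial in sync whenever the source column changes
    NEW.partial := left(NEW."full", 7);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_set_partial
BEFORE INSERT OR UPDATE OF "full" ON sources.product
FOR EACH ROW EXECUTE FUNCTION sources.set_partial();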
Indexing is necessary for a table that contains a lot of data. I think you should try re-indexing and then running the update again.

Will deleting rows improve SELECT performance in Oracle?

I have a huge table (200 million records). About 70% of it is not needed now (there is a column ACTIVE in the table and those records have the value 'N'). There are a lot of multi-column indexes, but none of them includes that column. Will removing that 70% of records improve SELECT (ACTIVE='Y') performance (since Oracle has to read table blocks containing no active records and then exclude them from the final result)? Is SHRINK SPACE necessary?
It's really impossible to say without knowing more about your queries.
At one extreme, access by primary key would only improve if the height of the supporting index was reduced, which would probably require deletion of the rows and then a rebuild of the index.
At the other extreme, if you're selecting nearly all active records then a full scan of the table with 70% of the rows removed (and the table shrunk) would take only 30% of the pre-deletion time.
There are many cases in between -- for example, selecting a set of data and accessing the table via indexes, then needing to reject 99% of rows after reading the table because it turns out that there's a positive correlation between the rows the index identifies and an inactive status.
One way of dealing with this would be through list partitioning the table on the ACTIVE column. That would move inactive records to a partition that could be eliminated from many queries, with no need to index the column, and would keep the time for full scans of active records down.
If you really do not need these inactive records, why do you not just delete them instead of marking them inactive?
Edit: Furthermore, although indexing a column with a 70/30 split is not generally helpful, you could try a couple of other indexing tricks.
For example, if you have an indexed column which is frequently used in queries (client_id?) then you can add the active flag to that index. You could also construct a partial index:
create index my_table_active_clients
on my_table (case when active = 'Y' then client_id end);
... and then query on:
select ...
from ...
where (case when active = 'Y' then client_id end) = :client_id
This would keep the index smaller, and both indexing approaches would probably be helpful.
Another edit: A beneficial side effect of partitioning could be that it keeps the inactive records and active records "physically" apart, and every block read into memory from the "active" partition of course only has active records. This could have the effect of improving your cache efficiency.
Partitioning, putting the ACTIVE='N' records in a separate partition, might be a good option.
http://docs.oracle.com/cd/B19306_01/server.102/b14223/parpart.htm
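A minimal sketch of the list-partitioning idea (Oracle syntax; the column list is trimmed and the names are invented):
CREATE TABLE my_table_part
(
    id        NUMBER,
    client_id NUMBER,
    active    CHAR(1)   -- plus the remaining columns
)
PARTITION BY LIST (active)
(
    PARTITION p_active   VALUES ('Y'),
    PARTITION p_inactive VALUES ('N')
);

-- Queries with WHERE active = 'Y' are then pruned to the p_active partition
-- and never read blocks that hold only inactive rows.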
Yes, it most likely will. But depending on your access patterns, the improvement will most likely not be that big. Adding an index that includes the column would be a better solution for the future, IMHO.
Most probably no. A delete will not reduce the size of the table's segment. Additional maintenance might help. After the DELETE, also execute:
ALTER TABLE <tablename> SHRINK SPACE COMPACT;
ALTER INDEX <indexname> SHRINK SPACE COMPACT; -- for every table's index
Alternatively you can use old school approach:
ALTER TABLE <tablename> MOVE;
ALTER INDEX <indexnamename> REBUILD;
When deleting 70% of a table, also consider CTAS (CREATE TABLE AS SELECT) as a possible solution. It will be much faster.
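The CTAS approach in outline (a hedged sketch; re-creating indexes, constraints and grants on the new table is left out):
-- Copy only the rows you want to keep, using a direct-path, minimally logged load
CREATE TABLE my_table_keep NOLOGGING PARALLEL
AS SELECT * FROM my_table WHERE active = 'Y';

-- After recreating indexes, constraints and grants on my_table_keep:
DROP TABLE my_table;
RENAME my_table_keep TO my_table;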
Indexing plays a vital role in SELECT queries. Performance will increase drastically if you use the indexed columns in the query. Yes, deleting rows will enhance performance somewhat, but not drastically.

ORACLE 11g SET COLUMN NULL for specific Partition of large table

I have a Composite-List-List partitioned table with 19 Columns and about 400 million rows. Once a week new data is inserted in this table and before the insert I need to set the values of 2 columns to null for specific partitions.
Obvious approach would be something like the following where COLUMN_1 is the partition criteria:
UPDATE BLABLA_TABLE
SET COLUMN_18 = NULL, COLUMN_19 = NULL
WHERE COLUMN_1 IN (VALUE1, VALUE2…)
Of course this would be awfully slow.
My second thought was to use CTAS for every partition where I need to set those two columns to null, and then use EXCHANGE PARTITION to update the data in my big table. Unfortunately that wouldn't work because it's a composite partition.
I could use the same approach with subpartitions, but then I would have to use CTAS about 8,000 times and drop those tables afterwards every week. I guess that would not pass the upcoming code review.
Does somebody have another idea how to solve this performantly?
PS: I’m using ORACLE 11g as database.
PPS: Sorry for my bad English…..
You've ruled out updating through DDL (switching partitions), so that leaves us with only DML to consider.
I don't think such an update is actually that bad with a table so heavily partitioned. You can easily split the update into 8k mini-updates (each on a single tiny subpartition):
UPDATE BLABLA_TABLE SUBPARTITION (partition1) SET COLUMN_18 = NULL, COLUMN_19 = NULL
Each subpartition would contain 15k rows to be updated on average so the update would be relatively tiny.
While it still represents a very big amount of work, it should be easy to run in parallel, hopefully during hours when database activity is light. The individual updates are also easy to restart if one of them fails (rows locked?), whereas a 120M-row update would take a very long time to roll back in case of error.
If I were to update almost 90% of the rows in a table, I would check the feasibility/duration of just inserting into another table of the same structure (less redo, no row chaining/migration, bypassing the cache and so on via direct-path insert; drop indexes and triggers first; exclude columns to leave them null in the target table), renaming the tables to "swap" them, rebuilding indexes and triggers, and then dropping the old table.
From my experience in data warehousing, a plain direct-path insert is better than update/delete. More steps are needed, but it's done in less time overall. I agree that a partition swap is easier said than done when you have to process most of the table, and it just makes things more complex for the ETL developer (logic/algorithm bound to what's in the physical layer); we haven't encountered a need to do partition swaps so far.
I would also isolate this table in its own tablespaces, then alternate storage between two such tablespaces (insert into the 2nd, drop the table from the 1st, and vice versa in the next run; resize the empty tablespace to reclaim space).
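Roughly, the insert-and-swap idea looks like this in Oracle (a sketch with placeholder column lists and values; the CASE expressions, which keep the existing values outside the affected partitions, are my addition rather than part of the answer above):
-- Direct-path insert into an empty copy of the table (no indexes/triggers created yet)
INSERT /*+ APPEND */ INTO blabla_table_new (column_1, /* ..., */ column_18, column_19)
SELECT column_1, /* ..., */
       CASE WHEN column_1 IN ('VALUE1', 'VALUE2') THEN NULL ELSE column_18 END,
       CASE WHEN column_1 IN ('VALUE1', 'VALUE2') THEN NULL ELSE column_19 END
FROM blabla_table;

COMMIT;

-- Swap the tables, then rebuild indexes and triggers on the new one
RENAME blabla_table TO blabla_table_old;
RENAME blabla_table_new TO blabla_table;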

How can I efficiently manipulate 500k records in SQL Server 2005?

I am getting a large text file of updated information from a customer that contains updates for 500,000 users. However, as I am processing this file, I often am running into SQL Server timeout errors.
Here's the process I follow in my VB application that processes the data (in general):
Delete all records from the temporary table (to remove last month's data), e.g. DELETE FROM tempTable
Rip text file into the temp table
Fill in extra information into the temp table, such as their organization_id, their user_id, group_code, etc.
Update the data in the real tables based on the data computed in the temp table
The problem is that I often run commands like UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id) and these commands frequently time out. I have tried bumping the timeout up to as much as 10 minutes, but they still fail. Now, I realize that 500k rows is no small number of rows to manipulate, but I would think that a database purported to handle millions and millions of rows should be able to cope with 500k pretty easily. Am I doing something wrong in how I am going about processing this data?
Please help. Any and all suggestions welcome.
Subqueries like the one you give us in the question:
UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id)
are evaluated one row at a time, so you are effectively looping. Think set-based:
UPDATE t
SET user_id = u.user_id
FROM tempTable t
inner join myUsers u ON t.external_id=u.external_id
and remove your loops; this will update all rows in one statement and be significantly faster!
Needs more information. I regularly manipulate 3-4 million rows in a 150-million-row table and I do NOT think this is a lot of data. I have a "products" table that contains about 8 million entries - including full-text search. No problems there either.
Can you elaborate on your hardware? I assume a "normal desktop PC" or "low-end server", both with an absolutely non-optimal disk layout, and thus tons of IO problems - on updates.
Make sure you have indexes on the tables you are selecting from. In your example UPDATE command, you look up the user_id in the myUsers table by external_id, so make sure myUsers has an index on the external_id column. The downside of indexes is that they increase the time for inserts/updates. Make sure you don't have indexes on the tables you are trying to update. If the tables you are trying to update do have indexes, consider dropping them and then rebuilding them after your import.
Finally, run your queries in SQL Server Management Studio and have a look at the execution plan to see how the query is being executed. Look for things like table scans to see where you might be able to optimize.
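For instance, the lookup index suggested above could be created like this (a hedged one-liner using the names from the question; INCLUDE is available from SQL Server 2005 on):
-- Supports the lookup on external_id and covers the selected user_id column
CREATE NONCLUSTERED INDEX IX_myUsers_external_id
ON myUsers (external_id) INCLUDE (user_id);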
Look at the KM's answer and don't forget about indexes and primary keys.
Are you indexing your temp table after importing the data?
tempTable.external_id should definitely have an index, since it is in the WHERE clause.
There are more efficient ways of importing large blocks of data. Look in SQL Server Books Online under bcp (the bulk copy program).
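For example, BULK INSERT (the T-SQL cousin of bcp) can load the text file straight into the temp table; the file path and format below are invented:
BULK INSERT tempTable
FROM 'C:\imports\customer_updates.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    TABLOCK          -- allows a minimally logged, faster load
);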