avoiding write conflicts while re-sorting a table - sql

I have a large table that I need to re-sort periodically. I am partly basing this on a suggestion I was given to stay away from using cluster keys since I am inserting data ordered differently (by time) from how I need it clustered (by ID), and that can cause re-clustering to get a little out of control.
Since I am writing to the table on an hourly basis, I am wary of these two processes conflicting: if I CTAS to a newly sorted temp table and then swap the table name, it seems like I am opening the door to a write on the source table not making it to the temp table.
I figure I can trigger a flag when I am re-sorting that causes the ETL to pause writing, but that seems a bit hacky and maybe fragile.
I was considering leveraging locking and transactions, but this doesn't seem to be the right use case for this since I don't think I'd be locking the table I am copying from while I write to a new table. Any advice on how to approach this?

I've asked some clarifying questions in the comments regarding the clustering that you are avoiding, but in regards to your sort, have you considered creating a nice 4XL warehouse and leveraging the INSERT OVERWRITE option back into itself? It'd look something like:
INSERT OVERWRITE INTO table SELECT * FROM table ORDER BY id;
Assuming that your table isn't hundreds of TB in size, this will complete rather quickly (inside an hour, I would guess), and any inserts into the table during that period will queue up and wait for it to finish.

There are some reasons to avoid the automatic reclustering, but they're basically all the same reasons why you shouldn't set up a job to re-cluster frequently. You're making the database do all the same work, but without the built in management of it.
If your table is big enough that you are seeing performance issues with the clustering by time, and you know that the ID column is the main way that this table is filtered (in JOINs and WHERE clauses) then this is probably a good candidate for automatic clustering.
So I would recommend at least testing out a cluster key on the ID and then monitoring/comparing performance.
To give a brief answer to the question about resorting without conflicts as written:
I might recommend using a time column to re-sort records older than a given time (probably in a separate table). While it's sorting, you may get some new records. But you will be able to use that time column to marry up those new records with the, now sorted, older records.
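A minimal sketch of that watermark idea, written in Python against an in-memory SQLite database as a stand-in for your warehouse. The table name events, the loaded_at column and the cutoff value are all hypothetical, and it assumes loaded_at only ever increases for new writes (a late write stamped before the cutoff would be missed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, loaded_at INTEGER, payload TEXT)")
# Rows arrive in time order, not in id order.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(3, 1, "c"), (1, 2, "a"), (2, 3, "b")])

cutoff = 4  # hypothetical watermark: rows loaded before this get re-sorted

# Sort the older rows into a separate table; the live table is only read.
conn.execute("CREATE TABLE events_sorted (id INTEGER, loaded_at INTEGER, payload TEXT)")
conn.execute("INSERT INTO events_sorted "
             "SELECT * FROM events WHERE loaded_at < ? ORDER BY id", (cutoff,))

# A write lands on the live table while (or after) the sort runs.
conn.execute("INSERT INTO events VALUES (0, 5, 'late')")

# Marry up: rows at or after the watermark were never copied, so fetch them by time.
conn.execute("INSERT INTO events_sorted "
             "SELECT * FROM events WHERE loaded_at >= ? ORDER BY id", (cutoff,))
conn.commit()

sorted_ids = [r[0] for r in conn.execute("SELECT id FROM events_sorted ORDER BY rowid")]
print(sorted_ids)
```

The older block ends up sorted by id and the late row is appended after it, so nothing written during the sort is lost; the next periodic re-sort would fold it into place.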

You might consider revoking INSERT, UPDATE, DELETE privileges on the original table within the same script or procedure that performs the CTAS creating the newly sorted copy of the table. After a successful swap you can re-enable the privileges for the roles that are used to perform updates.

Related

Does it affect performance to frequently repopulate a highly read database table?

I have a database table with about 2500 rows in, which is frequently read by my web application. Will it affect the performance of reading from that table if all of the data in it is frequently (e.g. every 1-5 minutes) deleted and re-inserted?
By that I mean:
DELETE FROM MyTable
INSERT INTO MyTable SELECT ...
Probably not, at the given numbers ...
However, if you have one or more index(es) on your table (to help with read/select, or automatically on any PK/UK ...) you should consider that every delete/insert may result in re-calculation of any such index (on top of the delete/insert as such), not directly affecting table-reads as such, but adding to the overall load on the DB server.
There is no source code, but it appears you are using this table as an intermediate/interface to something else, so while 'updating' you'd probably want to bundle your delete(s)/insert(s) in a single transaction as best you can, rather than executing them all individually, e.g. in a loop. Or see if you can keep your PKs and just update instead?
This could also help reduce fragmentation in the underlying storage ...
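To illustrate the transaction advice, here is a rough sketch in Python with the standard sqlite3 module (SQLite stands in for whatever server you run; my_table and its columns are made up for the example). The point is that the DELETE and the INSERTs commit together or not at all:

```python
import sqlite3

def refresh(conn, rows):
    """Replace the contents of my_table atomically: all new rows or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM my_table")
        conn.executemany("INSERT INTO my_table (id, value) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
refresh(conn, [(1, "a"), (2, "b")])

try:
    # The second row violates the primary key, so the whole refresh fails.
    refresh(conn, [(1, "x"), (1, "duplicate key")])
except sqlite3.IntegrityError:
    pass  # the DELETE was rolled back together with the failed INSERT

rows_after_failure = conn.execute(
    "SELECT id, value FROM my_table ORDER BY id").fetchall()
print(rows_after_failure)
```

A failed refresh leaves the previous contents in place, so the web application never reads an empty or half-filled table.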

How to perform data archive in SQL Server with many tables?

Let's say I have a database with many tables in it. I want to perform data archiving on certain tables, that is create a same table with same structures (same constraint, indexes, columns, triggers, etc) as a new table and insert specific data into the new table from the old table.
Example, current table has data from 2008-2017 and I want to move only data from 2010-2017 into the new table. Then after that, I can delete the old table and rename the new table with naming conventions similar to old table.
How should I approach this?
For the sort of clone-rename-drop logic you're talking about, the basics are pretty straightforward. Really the only time this is a good idea is if you have a table with a large amount of data, which you can't afford downtime or blocking on, and you only plan to do this once. The process looks something like this:
1. Insert all the data from your original table into the clone table.
2. In a single transaction, sp_rename the original table from (for example) myTable to myTable_OLD (just something to distinguish it from the real table). Then sp_rename the clone table from (for example) myTable_CLONE to myTable.
3. Drop myTable_OLD when you're happy everything has worked how you want. If it didn't work how you want, just sp_rename the objects back.
A couple of considerations to think about if you go that route:
Identity columns: If your table has any identities on it, you'll have to set IDENTITY_INSERT ON for the copy, then reseed the identity so it picks up where the old identity left off.
Do you have the luxury of blocking the table while you do this? Generally if you need to do this sort of thing, the answer is no. What I find works well is to insert all the rows I need using (nolock), or however you need to do it so the impact of the select from the original table is mitigated. Then, after I've moved 99% of the data, I will then open a transaction, block the original table, insert just the new data that's come in since the bulk of the data movement, then do the sp_rename stuff
That way you don't lock anything for the bulk of the data movement, and you only block the table for the very last bit of data that came into the original table between your original insert and your sp_rename
How you determine what's come in "since you started" will depend on how your table is structured. If you have an identity or a datestamp column, you can probably just pick rows which came in after the max of those fields you moved over. If your table does NOT have something you can easily hook into, you might need to get creative.
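Here is a small sketch of that bulk-copy, catch-up, rename sequence. It is in Python against SQLite purely so it can be run anywhere; on SQL Server the renames would be sp_rename calls and the locking details differ. The table names, the year column and the use of id as the high-water mark are assumptions for the example:

```python
import sqlite3

# isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, 2008), (2, 2010), (3, 2015)])
conn.execute("CREATE TABLE my_table_clone (id INTEGER PRIMARY KEY, year INTEGER)")

# Step 1: bulk copy of the rows to keep, outside any long transaction.
conn.execute("INSERT INTO my_table_clone SELECT * FROM my_table WHERE year >= 2010")
high_water = conn.execute(
    "SELECT COALESCE(MAX(id), 0) FROM my_table_clone").fetchone()[0]

# A new row arrives in the original table after the bulk copy.
conn.execute("INSERT INTO my_table VALUES (4, 2017)")

# Step 2: one short transaction that copies the delta and swaps the names.
conn.execute("BEGIN IMMEDIATE")
try:
    conn.execute("INSERT INTO my_table_clone "
                 "SELECT * FROM my_table WHERE id > ? AND year >= 2010",
                 (high_water,))
    conn.execute("ALTER TABLE my_table RENAME TO my_table_old")
    conn.execute("ALTER TABLE my_table_clone RENAME TO my_table")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

ids = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY id")]
print(ids)
```

The old table survives as my_table_old until you are happy to drop it, and the only window where writers must be held off is the short final transaction.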
Alternatives
A couple other alternatives that came to mind:
Table Partitioning:
This shards a single table across multiple partitions (which can be managed sort of like individual tables). You can, say, partition your data by year, then when you want to purge the trailing year of data, you "switch out" that partition to a special table which you can then truncate. All those operations are meta-data only, so they're super fast. This also works really well for huge amounts of data where deletes and all their pesky transaction logging aren't feasible.
The downside to table partitioning is it's kind of a pain to set up and manage.
Batched Deletes:
If your data isn't too big, you could just do batched deletes on the trailing end of your data. If you can find a way to get clustered index seeks for your deletes, they should be reasonably lightweight. As long as you're not accumulating data faster than you can get rid of it, the benefit of this kind of thing is that you just run it semi-continuously and it nibbles away at the trailing end of your data.
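A rough sketch of such a batched delete loop, in Python with SQLite standing in for SQL Server (there you would more likely loop on DELETE TOP (n) until no rows are affected). The readings table, its year column and the batch size are invented for the example:

```python
import sqlite3

def delete_in_batches(conn, cutoff_year, batch_size):
    """Delete rows older than cutoff_year in small transactions.

    Returns the number of non-empty batches that were deleted.
    """
    batches = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM readings WHERE id IN ("
                "  SELECT id FROM readings WHERE year < ? ORDER BY id LIMIT ?)",
                (cutoff_year, batch_size),
            )
        if cur.rowcount == 0:
            return batches
        batches += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, year INTEGER)")
conn.execute("CREATE INDEX readings_year_ix ON readings (year)")
conn.executemany("INSERT INTO readings (year) VALUES (?)",
                 [(2008,)] * 5 + [(2012,)] * 3)
conn.commit()

batches = delete_in_batches(conn, 2010, 2)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(batches, remaining)
```

Each batch commits on its own, so locks are held only briefly and the job can be stopped and resumed at any point.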
Snapshot Isolation:
If deletes cause too much blocking, you can also set up something like snapshot isolation, which basically stores historical versions of rows in tempdb. Any query running under snapshot isolation (or under read committed, once READ_COMMITTED_SNAPSHOT is enabled for the database) will then read those pre-change rows instead of contending for locks on the "real" table. You can then do batched deletes to your heart's content and know that any queries that hit the table will never get blocked by a delete (or any other DML operation) because they'll either read the pre-delete snapshot, or they'll read the post-delete snapshot. They won't wait for an in-process delete to figure out whether it's going to commit or roll back. This is not without its drawbacks as well, unfortunately. For large data sets, it can put a big burden on tempdb, and it too can be a little bit of a black box. It's also going to require buy-in from your DBAs.

Is there a more elegant way to detect changes in a large SQL table without altering it? [duplicate]

This question already has answers here:
How can I get a hash of an entire table in postgresql?
Suppose you have a reasonably large (for local definitions of “large”), but relatively stable table.
Right now, I want to take a checksum of some kind (any kind) of the contents of the entire table.
The naïve approach might be to walk the entire table, taking the checksum (say, MD5) of the concatenation of every column on each row, and then perhaps concatenate them and take its MD5sum.
From the client side, that might be optimized a little by progressively appending columns' values into the MD5 sum routine, progressively mutating the value.
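For concreteness, the client-side variant might look roughly like the sketch below. It uses Python with SQLite as a stand-in for Postgres; the function name, the accounts table and the choice of repr() to serialise each row are assumptions for the example, and the rows must be read in a stable order (here the key column) or the digest is meaningless:

```python
import hashlib
import sqlite3

def table_checksum(conn, table, key):
    """Fold every row of `table`, read in `key` order, into one MD5 digest.

    `table` and `key` are trusted identifiers supplied by the caller,
    not user input.
    """
    digest = hashlib.md5()
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"'):
        # repr() keeps NULL, 1 and '1' distinct and delimits the columns.
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "ann"), (2, "bob")])

baseline = table_checksum(conn, "accounts", "id")
unchanged = table_checksum(conn, "accounts", "id")

conn.execute("UPDATE accounts SET name = 'rob' WHERE id = 2")
after_update = table_checksum(conn, "accounts", "id")
print(baseline == unchanged, baseline == after_update)
```

This still reads the whole table on every check, which is exactly the cost I would like to avoid.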
The reason for this, is that at some point in future, we want to re-connect to the database, and ensure that no other users may have mutated the table: that includes INSERT, UPDATE, and DELETE.
Is there a nicer way to determine if any change/s have occurred to a particular table? Or a more efficient/faster way?
Update/clarification:
We are not able/permitted to make any alterations to the table itself (e.g. adding a “last-updated-at” column or triggers or so forth)
(This is for Postgres, if it helps. I'd prefer to avoid poking transaction journals or anything like that, but if there's a way to do so, I'm not against the idea.)
Adding columns and triggers is really quite safe
While I realise you've said it's a large table in a production DB so you say you can't modify it, I want to explain how you can make a very low impact change.
In PostgreSQL, an ALTER TABLE ... ADD COLUMN of a nullable column takes only moments and doesn't require a table re-write. It does require an exclusive lock, and the main consequence of that is that it can take a long time before the ALTER TABLE can actually proceed. Be aware that while it waits, later queries on the table queue up behind the pending lock request, so it's worth setting a lock_timeout and retrying rather than letting it wait indefinitely.
The same is true of creating a trigger on the table.
This means that it's quite safe to add a modified_at or created_at column and an associated trigger function to maintain them to a live table that's in intensive real-world use. Rows added before the column was created will be null, which makes perfect sense since you don't know when they were added/modified. Your trigger will set the modified_at field whenever a row changes, so they'll get progressively filled in.
For your purposes it's probably more useful to have a trigger-maintained side-table that tracks the timestamp of the last change (insert/update/delete) anywhere in the table. That'll save you from storing a whole bunch of timestamps on disk and will let you discover when deletes have happened. A single-row side-table with a row you update on each change using a FOR EACH STATEMENT trigger will be quite low-cost. It's not a good idea for most tables because of contention - it essentially serializes all transactions that attempt to write to the table on the row update lock. In your case that might well be fine, since the table is large and rarely updated.
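A rough sketch of the single-row side-table idea follows, in Python against SQLite so it is runnable as-is. Note the loudly stated difference: SQLite only has row-level triggers, so the counter below bumps once per affected row, whereas a PostgreSQL FOR EACH STATEMENT trigger would fire once per statement. The names big_table, big_table_changes and change_count are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, value TEXT)")
# Side table constrained to a single row that records the latest change.
conn.execute("""CREATE TABLE big_table_changes (
    only_row INTEGER PRIMARY KEY CHECK (only_row = 1),
    change_count INTEGER NOT NULL,
    changed_at TEXT)""")
conn.execute("INSERT INTO big_table_changes VALUES (1, 0, NULL)")

# One trigger per event type, all touching the same side-table row.
for event in ("INSERT", "UPDATE", "DELETE"):
    conn.execute(f"""CREATE TRIGGER big_table_{event.lower()}
        AFTER {event} ON big_table
        BEGIN
            UPDATE big_table_changes
               SET change_count = change_count + 1,
                   changed_at = CURRENT_TIMESTAMP;
        END""")

def change_count(conn):
    """Read the counter; a client compares it with the value it saw last time."""
    return conn.execute("SELECT change_count FROM big_table_changes").fetchone()[0]

seen = [change_count(conn)]
conn.execute("INSERT INTO big_table VALUES (1, 'a')")
seen.append(change_count(conn))
conn.execute("UPDATE big_table SET value = 'b' WHERE id = 1")
seen.append(change_count(conn))
conn.execute("DELETE FROM big_table WHERE id = 1")
seen.append(change_count(conn))
print(seen)
```

A client only needs to remember the last counter (or timestamp) it saw; any difference on reconnect means the table was written to, deletes included.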
A third alternative is to have the side table accumulate a running log of the timestamps of insert/update/delete statements or even the individual rows. This allows your client to read the change-log table instead of the main table and make small changes to its cached data rather than invalidating and re-reading the whole cache. The downside is that you have to have a way to periodically purge old and unwanted change log records.
So... there's really no operational reason why you can't change the table. There may well be business policy reasons that prevent you from doing so even though you know it's quite safe, though.
... but if you really, really, really can't:
Another option is to use the existing "md5agg" extension: http://llg.cubic.org/pg-mdagg/ . Or, if you build PostgreSQL from source, you could apply the patch currently circulating on pgsql-hackers that adds an "md5_agg" aggregate for the next release.
Logical replication
The bi-directional replication for PostgreSQL project has produced functionality that allows you to listen for and replay logical changes (row inserts/updates/deletes) without requiring triggers on tables. The pg_receivellog tool would likely suit your purposes well when wrapped with a little scripting.
The downside is that you'd have to run a patched PostgreSQL 9.3, so I'm guessing if you can't change a table, running a bunch of experimental code that's likely to change incompatibly in future isn't going to be high on your priority list ;-) . It's included in the stock release of 9.4 though, see "changeset extraction".
Testing the relfilenode timestamp won't work
You might think you could look at the modified timestamp(s) of the file(s) that back the table on disk. This won't be very useful:
The table is split into extents, individual files that by default are 1GB each. So you'd have to find the most recent timestamp across them all.
Autovacuum activity will cause these timestamps to change, possibly quite a while after corresponding writes happened.
Autovacuum must periodically do an automatic 'freeze' of table contents to prevent transaction ID wrap-around. This involves progressively rewriting the table and will naturally change the timestamp. This happens even if nothing's been added for potentially quite a long time.
Hint-bit setting results in small writes during SELECT. These writes will also affect the file timestamps.
Examine the transaction logs
In theory you could attempt to decode the transaction logs with pg_xlogreader and find records that affect the table of interest. You'd have to try to exclude activity caused by vacuum, full page writes after hint bit setting, and of course the huge amount of activity from every other table in the entire database cluster.
The performance impact of this is likely to be huge, since every change to every database on the entire system must be examined.
All in all, adding a trigger on a table is trivial in comparison.
What about creating a trigger on insert/update/delete events on the table? The trigger could call a function that inserts a timestamp into another table which would mark the time for any table-changing event.
The only concern would be an update event updated using the same data currently in the table. The trigger would fire, even though the table didn't really change. If you're concerned about this case, you could make the trigger call a function that generates a checksum against just the updated rows and compares against a previously generated checksum, which would usually be more efficient than scanning and checksumming the whole table.
Postgres documentation on triggers here: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
If you simply just want to know when a table has last changed without doing anything to it, you can look at the actual file(s) timestamp(s) on your database server.
SELECT relfilenode FROM pg_class WHERE relname = 'your_table_name';
If you need more detail on exactly where it's located, you can use:
select t.relname,
t.relfilenode,
current_setting('data_directory')||'/'||pg_relation_filepath(t.oid)
from pg_class t
join pg_namespace ns on ns.oid = t.relnamespace
where relname = 'your_table_name';
Since you did mention that it's quite a big table, it will definitely be broken into segments and TOAST tables, but you can use the relfilenode as your base point and do an ls -ltr relfilenode.* or relfilenode_*, where relfilenode is the actual relfilenode from above.
These files get updated at every checkpoint if something occurred on that table, so the timestamps update as often as your checkpoints occur, which, if you haven't changed the default checkpoint interval, is within a few minutes.
Another trivial, but imperfect way to check if INSERTS or DELETES have occurred is to check the table size:
SELECT pg_total_relation_size('your_table_name');
I'm not entirely sure why a trigger is out of the question though, since you don't have to make it retroactive. If your goal is to ensure nothing changes in it, a trivial trigger that just catches an insert, update, or delete event could be routed to another table just to timestamp an attempt but not cause any activity on the actual table. Note, though, that merely knowing something changed doesn't ensure that nothing changes.
Anyway, hope this helps you with this wacky problem you have...
A common practice would be to add a modified column. If it were MySQL, I'd use timestamp as the datatype for the field (it updates to the current date on each update). Postgres must have something similar.

Making structural changes to very large tables in an online environment

So here's what I'm facing.
The Problem
A large table, with ~230,000,000 rows.
We want to change the clustering index and primary key of this table to a simple bigint identity field. There is one other empty field being added to the table, for future use.
The existing table has a composite key. For the sake of argument, let's say it's 2 bigints. The first one may have 1 or 10,000 'children' in the 2nd part of the key.
Requirements
Minimal downtime, like preferably the length of time it takes to run SP_Rename.
Existing rows may change while we're copying data. The updates must be reflected in the new table.
Ideas
1. Put a trigger on existing table, to update row in new table if it already exists there.
2. Iterate through original table, copying data into new table ~10,000 at a time. Maybe 2,000 of the first part of the old key.
3. When the copy is complete, rename the old table to "ExistingTableOld" and the new one from "NewTable" to "ExistingTable". This should allow stored procs to continue to run without intervention.
Are there any glaring omissions in the plan, or best practices I'm ignoring?
Difficult problem. Your plan sounds good, but I'm not totally sure you really need to batch the query as long as you run it in a transaction isolation level of READ UNCOMMITTED to stop locks being generated.
My experience making big schema changes is big changes are best done during a maintenance window—at night/over a weekend—when users are booted off the system. Just like running dbcc checkdb with the repair option. Then, when things go south, you have the option to roll back to the full backup that you providentially made right before starting the upgrade.
Item #3 on your list: Renaming the old/new tables. You'll probably want to recompile the stored procedures/views. My experience is that execution plans are bound against the object ids rather than object names.
Consider table dbo.foo: if it is renamed to dbo.foo_old, any stored procedures or user-defined functions won't necessarily error out until the dependent object is recompiled and its execution plan rebound. Cached execution plans continue to work perfectly fine.
sp_recompile is your friend.

Deleting rows from a contended table

I have a DB table in which each row has a randomly generated primary key, a message and a user. Each user has about 10-100 messages but there are 10k-50k users.
I write the messages daily for each user in one go. I want to throw away the old messages for each user before writing the new ones to keep the table as small as possible.
Right now I effectively do this:
delete from table where user='mk'
Then write all the messages for that user. I'm seeing a lot of contention because I have lots of threads doing this at the same time.
I do have an additional requirement to retain the most recent set of messages for each user.
I don't have access to the DB directly. I'm trying to guess at the problem based on some second hand feedback. The reason I'm focusing on this scenario is that the delete query is showing a lot of wait time (again - to the best of my knowledge) plus it's a newly added bit of functionality.
Can anyone offer any advice?
Would it be better to:
select key from table where user='mk'
Then delete individual rows from there? I'm thinking that might lead to less brutal locking.
If you do this every day for every user, why not just delete every record from the table in a single statement? Or even
truncate table whatever reuse storage
/
edit
The reason why I suggest this approach is that the process looks like a daily batch upload of user messages preceded by a clearing out of the old messages. That is, the business rule seems to me to be "the table will hold only one day's worth of messages for any given user". If this process is done for every user then a single operation would be the most efficient.
However, if users do not get a fresh set of messages each day and there is a subsidiary rule which requires us to retain the most recent set of messages for each user then zapping the entire table would be wrong.
No, it is always better to perform a single SQL statement on a set of rows than a series of "row-by-row" (or what Tom Kyte calls "slow-by-slow") operations. When you say you are "seeing a lot of contention", what are you seeing exactly? An obvious question: is column USER indexed?
(Of course, the column name can't really be USER in an Oracle database, since it is a reserved word!)
EDIT: You have said that column USER is not indexed. This means that each delete will involve a full table scan of up to 50K*100 = 5 million rows (or at best 10K * 10 = 100,000 rows) to delete a mere 10-100 rows. Adding an index on USER may solve your problems.
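If you want to see the effect for yourself, here is a small sketch that asks the database for its plan before and after adding the index. It uses Python and SQLite only because that runs anywhere (on Oracle you would look at EXPLAIN PLAN instead); the table layout and the names messages, user_name and messages_user_ix are made up:

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (msg_key TEXT PRIMARY KEY, user_name TEXT, body TEXT)")
delete_sql = "DELETE FROM messages WHERE user_name = ?"

# Without an index on user_name, every delete has to scan the whole table.
before = plan(conn, delete_sql, ("mk",))

conn.execute("CREATE INDEX messages_user_ix ON messages (user_name)")

# With the index, the delete can go straight to the rows for one user.
after = plan(conn, delete_sql, ("mk",))

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the second plan should name the new index while the first cannot.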
Are you sure you're seeing lock contention? It seems more likely that you're seeing disk contention due to too many concurrent (but unrelated) updates. The solution to that is simply to reduce the number of threads you're using: less disk contention will mean higher total throughput.
I think you need to define your requirements a bit more clearly...
For instance, if you know all of the users you want to write messages for, insert their IDs into a temp table, index it on ID and batch delete. The threads you are firing off then do two things: write the ID of the user to one temp table, and write the messages to another temp table. When the threads have finished executing, the main thread should run
DELETE FROM Messages WHERE ID IN (SELECT TEMP_ID FROM TEMP_MEMBERS)
INSERT INTO Messages SELECT * FROM TEMP_MESSAGES
I'm not familiar with Oracle syntax, but that is the way I would approach it IF the users' messages are all done in rapid succession.
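A runnable sketch of that staging-table approach, in Python with SQLite standing in for Oracle. The column names are guesses, and the worker threads are replaced by plain inserts to keep the example short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (user_id TEXT, body TEXT);
    CREATE INDEX messages_user_ix ON messages (user_id);
    CREATE TEMP TABLE temp_members (temp_id TEXT PRIMARY KEY);
    CREATE TEMP TABLE temp_messages (user_id TEXT, body TEXT);
    INSERT INTO messages VALUES ('mk', 'old 1'), ('mk', 'old 2'), ('zz', 'keep me');
""")

# The worker threads would fill the staging tables; done inline here.
conn.execute("INSERT INTO temp_members VALUES ('mk')")
conn.executemany("INSERT INTO temp_messages VALUES (?, ?)",
                 [("mk", "new 1"), ("mk", "new 2"), ("mk", "new 3")])
conn.commit()

with conn:  # one transaction: old rows out, new rows in
    conn.execute(
        "DELETE FROM messages WHERE user_id IN (SELECT temp_id FROM temp_members)")
    conn.execute("INSERT INTO messages SELECT user_id, body FROM temp_messages")

result = conn.execute(
    "SELECT user_id, body FROM messages ORDER BY user_id, body").fetchall()
print(result)
```

Users who got no new messages today are not in temp_members, so their most recent set is left alone, which covers the retention requirement in the question.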
Hope this helps
TALK TO YOUR DBA
He is there to help you. When we DBAs take access away from the developers for something such as this, it is assumed we will provide the support for you for that task. If your code is taking too long to complete and that time appears to be tied up in the database, your DBA will be able to look at exactly what is going on and offer suggestions or possibly even solve the problem without you changing anything.
Just glancing over your problem statement, it doesn't appear you'd be looking at contention issues, but I don't know anything about your underlying structure.
Really, talk to your DBA. He will probably enjoy looking at something fun instead of planning the latest CPU deployment.
This might speed things up:
Create a lookup table:
create table rowid_table (row_id ROWID ,user VARCHAR2(100));
create index rowid_table_ix1 on rowid_table (user);
Run a nightly job:
truncate table rowid_table;
insert /*+ append */ into rowid_table
select ROWID row_id , user
from table;
exec dbms_stats.gather_table_stats('SCHEMAOWNER','ROWID_TABLE');
Then when deleting the records:
delete from table
where ROWID IN (select row_id
from rowid_table
where user = 'mk');
Your own suggestion seems very sensible. Locking in small batches has two advantages:
the transactions will be smaller
locking will be limited to only a few rows at a time
Locking in batches should be a big improvement.