I would like to populate a table from a (potentially large) view on a scheduled basis.
My process would be:
Disable indexes on table
Truncate table
Copy data from view to table
Enable indexes on table
In SQL Server, I can wrap the process in a transaction such that when I truncate the table a schema modification lock will be held until I commit. This effectively means that no other process can insert/update/whatever until the entire process is complete.
However I am aware that in Oracle the truncate table statement is considered DDL and will thus issue an implicit commit.
So my question is how can I mimic the behaviour of SQL Server here? I don't want any other process trying to insert/update/whatever whilst I am truncating and (re)populating the table. I would also prefer my other process to be unaware of any locks.
Thanks in advance.
Make your table a partitioned table with a single partition and local indexes only. Then whenever you need to refresh:
Copy data from view into a new temporary table
CREATE TABLE tmp AS SELECT ... FROM some_view;
Exchange the partition with the temporary table:
ALTER TABLE some_table
EXCHANGE PARTITION part WITH TABLE tmp
WITHOUT VALIDATION;
The table is only locked for the duration of the partition exchange, which, without validation and global index update, should be instant.
Related
I am trying to do a full load of a table in big query daily, as part of ETL. The target table has dummy partition column of type integer and is clustered. I want to have the statement to be atomic i.e either it will completely overwrite the new data or rollback to old data in case of failure for any reason in between and it will serve user queries with old data until it completely overwritten.
One way of doing this is delete and insert but big query does not support multi statement transactions.
I am thinking to use the below statement. Please let me know if this is atomic.
create or replace table_1 partition by dummy_int cluster dummy_column
as select col1,col2,col3 from stage_table1
I am trying to create a stored procedure to recreate a table from scratch, with a possible change of schema (including possible additions/removals of columns), by using a DROP TABLE followed by a SELECT INTO, like this:
BEGIN TRAN
DROP TABLE [MyTable]
SELECT (...) INTO [MyTable] FROM (...)
COMMIT
My concern is that errors could be generated if someone tries to access the table after it has been dropped but before the SELECT INTO has completed. Is there a way to lock [MyTable] in a way that will persist through the DROP?
Instead of DROP/SELECT INTO, I could TRUNCATE/INSERT INTO, but this would not allow the schema to be changed. SELECT INTO is convenient in my situation because it allows the new schema to be automatically determined. Is there a way to make this work safely?
Also, I would like to be sure that the source tables in "FROM (...)" are not locked during this process.
If you try to make a significant change to the table (like adding a column in the middle of existing columns, not at the end) using SSMS and see what script it generates, you'll see that SSMS uses sp_rename.
The general structure of the SSMS's script:
create a new table with temporary name
populate the new table with data
drop the old table
rename the new table to the correct name.
All this in a transaction.
This should keep the time when tables are locked to a minimum.
BEGIN TRANSACTION
SELECT (...) INTO dbo.Temp_MyTable FROM (...)
DROP TABLE dbo.MyTable
EXECUTE sp_rename N'dbo.Temp_MyTable', N'dbo.MyTable', 'OBJECT'
COMMIT
DROP TABLE MyTable acquires a schema modification (Sch-M) lock on it until the end of transaction, so all other queries using MyTable would wait. Even if other queries use the READ UNCOMMITTED isolation level (or the infamous WITH (NOLOCK) hint).
See also MSDN Lock Modes:
Schema Locks
The Database Engine uses schema modification (Sch-M)
locks during a table data definition language (DDL) operation, such as
adding a column or dropping a table. During the time that it is held,
the Sch-M lock prevents concurrent access to the table. This means the
Sch-M lock blocks all outside operations until the lock is released.
I want to create a temp table so as to be able to join it to a few tables because joining those tables with the content of the proposed temporary table takes a lot of time (fetching the content of the temporary table is time consuming.Repeating it over and over takes more and more time). I am dropping the temporary table when my needs are accomplished.
I want to know if these temporary tables would be visible over other client session(my requirement is to make them visible only for current client session). I am using postgresql. It would be great if you could suggest better alternatives to the solution I am thinking of.
PostgreSQL then is the database for you. Temporary tables done better than the standard. From the docs,
Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL standard, the effect is not the same. In the standard, temporary tables are defined just once and automatically exist (starting with empty contents) in every session that needs them. PostgreSQL instead requires each session to issue its own CREATE TEMPORARY TABLE command for each temporary table to be used. This allows different sessions to use the same temporary table name for different purposes, whereas the standard's approach constrains all instances of a given temporary table name to have the same table structure.
Pleas read the documentation.
Temporary tables are only visible in the current session and are automatically dropped when the database session ends.
If you specify ON COMMIT, the temporary table will automatically be dropped at the end of the current transaction.
If you need good table statistics on a temporary table, you have to call ANALYZE explicitly, as these statistics are not collected automatically.
By default , temporary tables are visible for current session only and temporary tables are dropped automatically after commit or closing that transaction, so you don't need to drop explicitly.
The auto vacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables. For this reason, appropriate vacuum and analyze operations should be performed via session SQL commands. For example, if a temporary table is going to be used in complex queries, it is wise to run ANALYZE on the temporary table after it is populated.
Optionally, GLOBAL or LOCAL can be written before TEMPORARY or TEMP. This presently makes no difference in PostgreSQL and is deprecated
In my project having 23 million records and around 6 fields has been indexed of that table.
Earlier I tested to add delta column for Thinking Sphinx search but it turns in holding the whole database lock for an hour. Afterwards when the file is added and I try to rebuild indexes this is the query that holds the database lock for around 4 hours:
"update user_messages set delta = false where delta = true"
Well for making the server up I created a new database from db dump and promote it as database so server can be turned live.
Now what I am looking is that adding delta column in my table with out table lock is it possible? And once the column delta is added then why is the above query executed when I run the index rebuild command and why does it block the server for so long?
PS.: I am on Heroku and using Postgres with ika db model.
Postgres 11 or later
Since Postgres 11, only volatile default values still require a table rewrite. The manual:
Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.
Bold emphasis mine. false is immutable. So just add the column with DEFAULT false. Super fast, job done:
ALTER TABLE tbl ADD column delta boolean DEFAULT false;
Postgres 10 or older, or for volatile DEFAULT
Adding a new column without DEFAULT or DEFAULT NULL will not normally force a table rewrite and is very cheap. Only writing actual values to it creates new rows. But, quoting the manual:
Adding a column with a DEFAULT clause or changing the type of an
existing column will require the entire table and its indexes to be rewritten.
UPDATE in PostgreSQL writes a new version of the row. Your question does not provide all the information, but that probably means writing millions of new rows.
While doing the UPDATE in place, if a major portion of the table is affected and you are free to lock the table exclusively, remove all indexes before doing the mass UPDATE and recreate them afterwards. It's faster this way. Related advice in the manual.
If your data model and available disk space allow for it, CREATE a new table in the background and then, in one transaction: DROP the old table, and RENAME the new one. Related:
Best way to populate a new column in a large table?
While creating the new table in the background: Apply all changes to the same row at once. Repeated updates create new row versions and leave dead tuples behind.
If you cannot remove the original table because of constraints, another fast way is to build a temporary table, TRUNCATE the original one and mass INSERT the new rows - sorted, if that helps performance. All in one transaction. Something like this:
BEGIN
SET temp_buffers = 1000MB; -- or whatever you can spare temporarily
-- write-lock table here to prevent concurrent writes - if needed
LOCK TABLE tbl IN SHARE MODE;
CREATE TEMP TABLE tmp AS
SELECT *, false AS delta
FROM tbl; -- copy existing rows plus new value
-- ORDER BY ??? -- opportune moment to cluster rows
-- DROP all indexes here
TRUNCATE tbl; -- empty table - truncate is super fast
ALTER TABLE tbl ADD column delta boolean DEFAULT FALSE; -- NOT NULL?
INSERT INTO tbl
TABLE tmp; -- insert back surviving rows.
-- recreate all indexes here
COMMIT;
You could add another table with the one column, there won't be any such long locks. Of course there should be another column, a foreign key to the first column.
For the indexes, you could use "CREATE INDEX CONCURRENTLY", it doesn't use too heavy locks on this table http://www.postgresql.org/docs/9.1/static/sql-createindex.html.
Does the time it takes to drop a table in SQL reflect the quantity of data within the table?
Lets say for dropping an identical table one with 100000 rows and 1000 rows.
Is this MySQL? The ext3 Linux filesystem is known to be slow to drop large tables. The time to drop will have to do more with the size of the table, more than the rows in the table.
http://www.mysqlperformanceblog.com/2009/06/16/slow-drop-table/
It will definitely depend on the dbserver and on the specific storage params in that db server and upon the existence of LOBs in the table. For MOST scenarios, it's very fast.
In Oracle, dropping a table is very quick and unlikely to matter compared to other operations you would be performing (creating the table and populating it).
It would not be idomatic to be creating and dropping tables enough in Oracle for this to be any sort of a performance factor. You would instead consider using global temporary tables.
From http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/schema.htm
In addition to permanent tables,
Oracle Database can create temporary
tables to hold session-private data
that exists only for the duration of a
transaction or session.
The CREATE GLOBAL TEMPORARY TABLE
statement creates a temporary table
that can be transaction-specific or
session-specific. For
transaction-specific temporary tables,
data exists for the duration of the
transaction. For session-specific
temporary tables, data exists for the
duration of the session. Data in a
temporary table is private to the
session. Each session can only see and
modify its own data. DML locks are not
acquired on the data of the temporary
tables. The LOCK statement has no
effect on a temporary table, because
each session has its own private data.