I have a process that runs every 60 minutes. On one table I need to remove all data and then insert records from a different table. The problem is that the delete and reinsert take a long time, and I am afraid users will see the table empty in the meantime. Is there a way to refresh the data without users noticing?
If you want to remove all data from the table, use TRUNCATE TABLE instead of DELETE; it is much faster.
As for the insert, it is hard to say because you did not give any details, but here is what you can try:
Option 1 - Using a temp table
SELECT * INTO table_temp FROM original_table WHERE 1 = 0;
-- insert into table_temp
DROP TABLE original_table;
EXEC sp_rename 'table_temp', 'original_table';
Option 2 - Use two tables, "active-passive"
Keep two tables for the data and a view that selects over them. The view joins with a third table that specifies which of the two tables to select from, a kind of "active-passive" concept.
To demonstrate the concept:
with active_table as ( select 'table1_active' active_table )
select 1 data
where 'table1_active' in (select * from active_table)
union all
select 2
where 'table2_active' in (select * from active_table)
-- This returns only one row, with the value 1
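Here is a fuller sketch of the same active-passive idea, using an in-memory SQLite database as a stand-in for the real server (an assumption; the table names table1, table2, active_table and current_data are hypothetical). Users only query the view, you reload the passive table at leisure, and a single-row update flips which table the view reads.

```python
import sqlite3

# In-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, val TEXT);
    CREATE TABLE table2 (id INTEGER, val TEXT);
    -- One-row control table naming the table users should read from.
    CREATE TABLE active_table (name TEXT NOT NULL);
    INSERT INTO active_table VALUES ('table1');
    INSERT INTO table1 VALUES (1, 'old'), (2, 'old');

    -- Users only ever query this view; it returns the rows of whichever
    -- table the control row points at.
    CREATE VIEW current_data AS
        SELECT id, val FROM table1
        WHERE 'table1' IN (SELECT name FROM active_table)
        UNION ALL
        SELECT id, val FROM table2
        WHERE 'table2' IN (SELECT name FROM active_table);
""")

def read_view():
    return conn.execute("SELECT val FROM current_data ORDER BY id").fetchall()

before = read_view()

# Reload the passive table while the view keeps serving the active one.
conn.execute("DELETE FROM table2")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "new"), (2, "new"), (3, "new")])
during = read_view()

# Flip the control row; from now on the view reads table2.
conn.execute("UPDATE active_table SET name = 'table2'")
conn.commit()
after = read_view()
```

On the next refresh you would reload table1 and flip the control row back.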
Are you truncating instead of deleting? A truncate (while still logged) is much, much faster than a delete.
If you cannot truncate, try deleting 1000-10000 rows at a time: the log builds up less, and when deleting large numbers of rows this can greatly increase speed.
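A minimal sketch of batched deletion, assuming SQLite as a stand-in (in T-SQL the usual form is `DELETE TOP (n)` in a loop). SQLite has no `DELETE ... LIMIT` in default builds, so each batch is picked by rowid. The table name big_table and the batch size are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("x",)] * 25_000)
conn.commit()

BATCH_SIZE = 5_000  # within the 1000-10000 range suggested above
batches = 0
while True:
    # Delete at most BATCH_SIZE rows, chosen by rowid.
    cur = conn.execute(
        "DELETE FROM big_table WHERE rowid IN "
        "(SELECT rowid FROM big_table LIMIT ?)",
        (BATCH_SIZE,),
    )
    conn.commit()  # commit per batch so each transaction stays small
    if cur.rowcount == 0:
        break
    batches += 1

remaining = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
```

Committing per batch is the point: each transaction stays small, at the cost of the table being partially emptied while the loop runs.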
If you really want fast performance, you can create a second table, fill it with data, then drop the first table and rename the second table to the first table's name. Note that you will lose all permissions on the original table when you do this, so be sure to reapply them to the renamed table.
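The build-then-rename swap can be sketched like this, again with SQLite standing in for the real server (assumption). In SQL Server the rename step would be `sp_rename`, and, as noted above, permissions on the dropped table are not carried over. The table names are hypothetical.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT ourselves so the swap is atomic.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE main_table (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO main_table VALUES (?, ?)", [(1, "old"), (2, "old")])

# 1. Build the replacement table off to the side; readers still use main_table.
conn.execute("CREATE TABLE main_table_new (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO main_table_new VALUES (?, ?)", [(1, "new"), (2, "new"), (3, "new")]
)

# 2. Swap in one short transaction: drop the old table, rename the new one.
conn.execute("BEGIN")
conn.execute("DROP TABLE main_table")
conn.execute("ALTER TABLE main_table_new RENAME TO main_table")
conn.execute("COMMIT")

rows = conn.execute("SELECT val FROM main_table ORDER BY id").fetchall()
tables = [
    r[0]
    for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
]
```

The slow part (filling the new table) happens before the swap, so the window in which readers could be affected is only the drop and rename.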
If you are deleting all rows in a table, you can consider using a TRUNCATE statement against the table instead of a DELETE. It will speed up part of your process. Keep in mind that this will reset any identity seeds you may have on the table.
As suggested, you can wrap this process in a transaction; depending on how you set the transaction isolation level, you control what your users see if they query the data during the transaction.
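To illustrate the transaction approach, here is a sketch using a file-backed SQLite database in WAL mode as a stand-in (assumption). In WAL mode a reader keeps seeing the last committed data while the writer's delete-and-reload transaction is open, which is roughly comparable to snapshot-based isolation (for example READ_COMMITTED_SNAPSHOT in SQL Server); under plain locking READ COMMITTED, readers would typically wait for the reload to commit instead. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections need a file-backed database; WAL mode lets readers see the
# last committed data while a writer transaction is still open.
path = os.path.join(tempfile.mkdtemp(), "refresh_demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # manual BEGIN/COMMIT
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE report (id INTEGER, val TEXT)")
writer.executemany("INSERT INTO report VALUES (?, ?)", [(1, "old"), (2, "old")])

reader = sqlite3.connect(path, isolation_level=None)

def what_users_see():
    return reader.execute("SELECT val FROM report ORDER BY id").fetchall()

writer.execute("BEGIN")
writer.execute("DELETE FROM report")   # empty inside the writer's transaction...
seen_while_empty = what_users_see()    # ...but the reader still sees the old rows
writer.executemany("INSERT INTO report VALUES (?, ?)", [(1, "new"), (2, "new"), (3, "new")])
writer.execute("COMMIT")
seen_after = what_users_see()          # new rows become visible only now
```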
Make it sequence based. All of your copied-in records carry a series number (the same for every record in one load), another table holds which series is active, and every query joins to that table. When you copy in new records they get a new series number that is not yet active; once they are all copied in, the active-series table is updated to the new number. The superseded records can then be deleted at your leisure.
Example
Let's suppose your table has a SeriesNo field added, and a table ActiveSeries has a SeriesNo field.
All queries against your table:
SELECT *
FROM YourTable Y
JOIN ActiveSeries A
ON A.SeriesNo = Y.SeriesNo
Then updating SeriesNo in ActiveSeries makes the new series of records available instantly.
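A runnable sketch of the whole cycle, using SQLite as a stand-in (assumption) and the YourTable/ActiveSeries names from the example; the id and val columns and the MAX(SeriesNo) + 1 numbering are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (SeriesNo INTEGER, id INTEGER, val TEXT);
    CREATE TABLE ActiveSeries (SeriesNo INTEGER);
    INSERT INTO ActiveSeries VALUES (1);
    INSERT INTO YourTable VALUES (1, 1, 'old'), (1, 2, 'old');
""")

# Every user query joins to ActiveSeries, as in the example above.
QUERY = """
    SELECT Y.val FROM YourTable Y
    JOIN ActiveSeries A ON A.SeriesNo = Y.SeriesNo
    ORDER BY Y.id
"""

# Copy the new batch in under the next, not-yet-active series number.
new_series = conn.execute("SELECT MAX(SeriesNo) + 1 FROM YourTable").fetchone()[0]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(new_series, i, "new") for i in (1, 2, 3)],
)
conn.commit()
during = conn.execute(QUERY).fetchall()  # users still see series 1

# Activate the new series with one single-row update.
conn.execute("UPDATE ActiveSeries SET SeriesNo = ?", (new_series,))
conn.commit()
after = conn.execute(QUERY).fetchall()

# Delete the superseded series whenever convenient.
conn.execute("DELETE FROM YourTable WHERE SeriesNo < ?", (new_series,))
conn.commit()
leftover = conn.execute("SELECT COUNT(*) FROM YourTable").fetchone()[0]
```

With several concurrent loaders you would want a proper sequence rather than MAX + 1.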
I would follow the approach below while troubleshooting why the delete and reinsert is taking so long.
Create a new table (t1) which has the same data as the old table (maintable).
Now do your work on t1.
When your work is done, drop maintable and rename t1 to maintable.
Related
I'm getting syntax errors when trying to create a temp table in BigQuery.
CREATE TABLE sleep_day select distinct *
FROM `<project>.<dataset>.sleepDay`
I also tried duplicating the table so I could drop duplicate rows in the new table while keeping the original, but that did not work either.
SELECT * INTO sleep_day
FROM `<project>.<dataset>.sleepDay`
My goal is to remove duplicated values without losing the original data. I want to be able to go back to the original data if I need to.
It seems you should use a table that expires instead of a temp table.
Keep one thing in mind:
Temporary tables let you save intermediate results to a table. These temporary tables exist at the session level, so you don't need to save or maintain them in a dataset. They are automatically deleted some time after the script completes.
This means you have to use it inside a script or session. That may not be what you need, since the table data may be deleted some time after the script finishes. A table with an expiration of a few days may fit better.
You can use the following query to create a table that will expire in 3 days.
CREATE TABLE
`<project>.<dataset>.<temp_table_name>`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
) AS
SELECT DISTINCT * FROM `<project>.<dataset>.<original_table>`
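The core of that query, writing the de-duplicated rows into a separate table so the original stays intact, can be tried locally with SQLite (assumption: SQLite has no expiration option, so the OPTIONS(...) part is BigQuery-only and left out). The columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only the shape of the technique matters here.
conn.execute("CREATE TABLE sleepDay (id INTEGER, day TEXT, minutes_asleep INTEGER)")
conn.executemany(
    "INSERT INTO sleepDay VALUES (?, ?, ?)",
    [
        (1, "2016-04-12", 327),
        (1, "2016-04-12", 327),  # exact duplicate
        (2, "2016-04-13", 384),
    ],
)

# The de-duplicated copy goes to a new table; sleepDay itself is not modified.
conn.execute("CREATE TABLE sleep_day_dedup AS SELECT DISTINCT * FROM sleepDay")
conn.commit()

original_count = conn.execute("SELECT COUNT(*) FROM sleepDay").fetchone()[0]
dedup_count = conn.execute("SELECT COUNT(*) FROM sleep_day_dedup").fetchone()[0]
```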
To load the rows back from the temp table into the original one:
INSERT INTO `<project>.<dataset>.<original_table>`
(field1, field2, ...)
SELECT
field1, field2, ...
FROM
`<project>.<dataset>.<temp_table_name>`
What does it mean to create a partition switching table?
Then how can I fill these tables?
A partition switching table is a normal table. But it must be identical to the table you want to switch into.
As for how to fill the table, you might need to post more info: how are you filling your other tables? It's just another table, so fill it however you see fit.
The trick is this:
once your table is filled and you want to switch it in against another table, you run:
TRUNCATE TABLE targettable;
ALTER TABLE sourcetable SWITCH TO targettable;
You can also add the following, though I have not tested it yet (I will today; I only just found it, here: https://littlekendra.com/2017/01/19/why-you-should-switch-in-staging-tables-instead-of-renaming/)
TRUNCATE TABLE targettable;
ALTER TABLE sourcetable SWITCH TO targettable
WITH ( WAIT_AT_LOW_PRIORITY
(MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS)
);
This replaces targettable with the data in sourcetable
As always, someone has done all this before you and me and blogged about it, and it's only a Google search away:
https://sqlsunday.com/2014/08/24/reloading-fact-tables-with-zero-downtime/
The basic idea: it's just a simple means of replacing an empty table with a populated table, without having to drop the empty table and rename the populated one. The only caveat is that both tables MUST have exactly the same structure, including any and all indexes.
So, say you have 10 million rows of data and you want to delete 9 million of them. Deleting 9 million rows in one go is likely to blow up your tempdb and your transaction log. As an alternative, you can do a "SELECT INTO" to put the 1 million rows you want to keep into a new table (minimally logged), add indexes that match the original table, truncate the original table (minimally logged), and then switch the partitions (minimally logged).
I am using postgres and have 2 tables Transaction and Backup.
I would like to transfer rows of data from Transaction to Backup.
There will be new rows of data in Transaction table.
How do I transfer only the rows whose values differ from the data already in Backup, so that I do not end up with duplicate rows?
As long as the data in one of the columns is different, I will transfer the row from Transaction to Backup.
For example:
Day 1: Transaction (20 rows), Backup (20 rows) [all Transaction rows are backed up to Backup at night]
Day 2: Transaction (40 rows), Backup (20 rows) [the additional 20 rows in Transaction may duplicate the previous 20 rows; I only want to transfer the non-duplicate rows to Backup]
Reading between the lines I think this is a harder question than you know.
The real issue here is that you don't really know what has changed. If the data is append-only and we can assume all rows were visible when the last backup was made, then we can just select rows inserted after a point in time. If these are not good assumptions, you are going to have a long-term issue. a_horse_with_no_name has an OK solution above, assuming your data is append-only (all inserts, no updates), but it will not perform very well as these tables get bigger.
A few options you might consider instead:
A table audit trigger would allow you to specify columns, values, etc as well as when they are changed and it could do this real-time. That would be my first solution.
Even if it is insert-only, you may want to store information in the backup table, such as max IDs, and look back only one backup when checking. Then you could use a_horse_with_no_name's solution as a template.
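A minimal sketch of the audit-trigger option, written in SQLite trigger syntax as a stand-in (assumption; in PostgreSQL the trigger body would live in a PL/pgSQL trigger function instead). The table and column names are hypothetical. Inserts are always logged; updates are logged only when the audited column actually changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE backup (
        audit_id   INTEGER PRIMARY KEY,
        op         TEXT,
        id         INTEGER,
        amount     INTEGER,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Record every new row as it is inserted.
    CREATE TRIGGER trg_txn_insert AFTER INSERT ON transactions
    BEGIN
        INSERT INTO backup (op, id, amount) VALUES ('I', NEW.id, NEW.amount);
    END;

    -- Record updates only when the audited column really changes
    -- (IS NOT is SQLite's NULL-safe inequality).
    CREATE TRIGGER trg_txn_update AFTER UPDATE OF amount ON transactions
    WHEN OLD.amount IS NOT NEW.amount
    BEGIN
        INSERT INTO backup (op, id, amount) VALUES ('U', NEW.id, NEW.amount);
    END;
""")

conn.executemany("INSERT INTO transactions (id, amount) VALUES (?, ?)", [(1, 10), (2, 20)])
conn.execute("UPDATE transactions SET amount = 25 WHERE id = 2")  # logged
conn.execute("UPDATE transactions SET amount = 10 WHERE id = 1")  # unchanged, not logged
conn.commit()

audit = conn.execute("SELECT op, id, amount FROM backup ORDER BY audit_id").fetchall()
```

Because the backup is written as each change happens, there is no nightly comparison to get slower as the tables grow.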
Suppose you want to transfer rows from table Source to table Result. From your question, I understand that they have the same columns.
As you mentioned, you need the values from Source that differ from the ones already in Result.
SELECT * FROM Source
WHERE some_column NOT IN (SELECT some_column FROM Result)
It will return the "new" records. Now you need to insert them:
INSERT INTO Result
SELECT * FROM Source
WHERE some_column NOT IN (SELECT some_column FROM Result)
Try this
insert into Backup (field1, field2, ...)
select t.field1, t.field2, ...
from Transaction t
where <your date condition here>
  and not exists (select 1 from Backup b
                  where t.field1 = b.field1
                    and t.field2 = b.field2
                    and ...)
This inserts a row whenever something changes in the transaction table: if you change an existing row there (including setting a value to NULL), the changed row will be inserted into the backup table. However, you shouldn't have a primary key on the backup table, because then you won't be able to insert that row and you will get:
duplicate key value violates unique constraint "tb_backup_pkey"
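One detail worth checking with the `=` comparisons above is NULL: `NULL = NULL` is not true, so a row with a NULL in a compared column never matches and is inserted again on every run. The sketch below uses SQLite (assumption) where `IS` is a NULL-safe equality; the PostgreSQL equivalent is `IS NOT DISTINCT FROM`. The table name txn (standing in for Transaction) and the created column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (field1 INTEGER, field2 TEXT, created TEXT);
    CREATE TABLE backup (field1 INTEGER, field2 TEXT, created TEXT);
    INSERT INTO backup VALUES (1, 'a', '2024-01-01'), (2, NULL, '2024-01-01');
    INSERT INTO txn VALUES
        (1, 'a',  '2024-01-01'),  -- already backed up
        (2, NULL, '2024-01-01'),  -- already backed up, contains NULL
        (3, 'c',  '2024-01-02');  -- new
""")

# NULL-safe NOT EXISTS: the (2, NULL) row is recognised as already present.
INSERT_NEW = """
    INSERT INTO backup (field1, field2, created)
    SELECT t.field1, t.field2, t.created
    FROM txn t
    WHERE t.created >= ?
      AND NOT EXISTS (
          SELECT 1 FROM backup b
          WHERE b.field1 IS t.field1
            AND b.field2 IS t.field2
      )
"""
conn.execute(INSERT_NEW, ("2024-01-01",))
conn.execute(INSERT_NEW, ("2024-01-01",))  # rerunning adds nothing
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM backup").fetchone()[0]
```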
This should work for the entire row:
insert into backup (col1, col2, col3, col4)
select t.* from transactions t
EXCEPT
select * from backup b
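The EXCEPT approach can be checked quickly in SQLite (assumption: it stands in for PostgreSQL, and both tables have exactly the four columns col1..col4 with hypothetical sample data). Note that EXCEPT also collapses duplicates within transactions, so a row appearing twice there is inserted at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (col1 INTEGER, col2 TEXT, col3 TEXT, col4 INTEGER);
    CREATE TABLE backup       (col1 INTEGER, col2 TEXT, col3 TEXT, col4 INTEGER);
    INSERT INTO transactions VALUES (1, 'a', 'x', 10), (2, 'b', 'y', 20);
    INSERT INTO backup SELECT * FROM transactions;      -- day 1 backup
    INSERT INTO transactions VALUES (1, 'a', 'x', 10),  -- duplicate of day 1
                                    (3, 'c', 'z', 30);  -- genuinely new
""")

# Whole-row difference: only rows not already in backup are inserted.
conn.execute("""
    INSERT INTO backup (col1, col2, col3, col4)
    SELECT * FROM transactions
    EXCEPT
    SELECT * FROM backup
""")
conn.commit()

rows = conn.execute("SELECT * FROM backup ORDER BY col1").fetchall()
```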
If I understand you correctly you want to copy data from the transactions table to the backup table.
Something like this should do it. As you haven't shown us the actual table definitions I had to use dummy names for the columns. pk_col is the primary key column of the table.
insert into backup (pk_col, col1, col2, col3, col4)
select t.*
from transactions t
left join backup b on t.pk_col = b.pk_col
where t is distinct from b;
This assumes that the target table does not have a unique key. If it does, you need to use an ON CONFLICT clause.
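If backup does have a primary key, the ON CONFLICT variant overwrites the existing backup row instead of failing. PostgreSQL spells it `INSERT ... ON CONFLICT (pk_col) DO UPDATE SET col1 = EXCLUDED.col1, ...`; the sketch below runs the same shape in SQLite as a stand-in (assumption; it needs SQLite 3.24 or newer), with hypothetical columns. This keeps one row per key rather than a history of changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (pk_col INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);
    CREATE TABLE backup       (pk_col INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);
    INSERT INTO backup VALUES (1, 'a', 10), (2, 'b', 20);
    INSERT INTO transactions VALUES (1, 'a', 10),  -- unchanged
                                    (2, 'b', 99),  -- changed
                                    (3, 'c', 30);  -- new
""")

# "WHERE true" avoids a documented SQLite parsing ambiguity between the
# SELECT and the ON CONFLICT clause in INSERT ... SELECT upserts.
conn.execute("""
    INSERT INTO backup (pk_col, col1, col2)
    SELECT pk_col, col1, col2 FROM transactions WHERE true
    ON CONFLICT(pk_col) DO UPDATE
        SET col1 = excluded.col1,
            col2 = excluded.col2
""")
conn.commit()

rows = conn.execute("SELECT * FROM backup ORDER BY pk_col").fetchall()
```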
We have a table with 10 billion rows, interval partitioned on date. In one subpartition we need to update the date of 500 million rows that match some criteria to a new value. Because the table is partitioned on that same date, this will definitely affect partitioning (for example, creating new partitions). Could anyone give me pointers to the best approach?
Thanks in advance!
If you are going to update the partitioning key and the source rows are in a single (sub)partition, the reasonable approach would be to:
Create a temporary table for the updated rows. If possible, perform the update on the fly:
CREATE TABLE updated_rows
AS
SELECT add_months(partition_key, 1) AS partition_key, other_columns...
FROM original_table PARTITION (xxx)
WHERE ...;
Drop the original (sub)partition:
ALTER TABLE original_table DROP PARTITION xxx;
Reinsert the updated rows:
INSERT /*+append*/ INTO original_table
SELECT * FROM updated_rows;
In case you have issues with CTAS or INSERT INTO SELECT for 500M rows, consider partitioning the temporary table and moving the data in batches.
Hmm... If you have enough space, I would create a "copy" of the source table with the correctly updated rows, check the results, then drop the source table, and finally rename the "copy" to the source table's name. Yes, this takes a long time to execute, but it could be a painless way; of course, a PARALLEL hint is needed.
You may consider adding a new flag column, 'updated', to your table, defaulting to NULL (or 0; I prefer NULL). Using the date criteria you need, you can then update the data group by group in the same way Kombajn described; once a group has been updated, set the 'updated' flag to 1 for that group.
For example, let's start by making groups of data. Say the grouping criterion is the year, so we process the data year by year.
1. Create a temporary table for the first year (2001):
CREATE TABLE updated_rows
AS
SELECT columns...
FROM original_table PARTITION (p2001) -- p2001: hypothetical partition name
WHERE YEAR = 2001
...;
2. Drop the original (sub)partition:
ALTER TABLE original_table DROP PARTITION p2001;
3. Reinsert the updated rows:
INSERT /*+append*/ INTO original_table (columns..., updated)
SELECT columns..., 1 FROM updated_rows;
Hopefully this helps you process the data step by step, instead of waiting for the whole table to be updated at once. You might consider a cursor that loops over the years.
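The year-by-year loop with the 'updated' flag can be sketched in SQLite as a stand-in (assumptions: SQLite has no partitions, so a DELETE of the year's unflagged rows stands in for DROP PARTITION, and "add one month" stands in for whatever update you actually need; all names are hypothetical). The flag matters here: a December row shifted into the next year must not be shifted again when that year is processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER, event_date TEXT, updated INTEGER)")
conn.executemany(
    "INSERT INTO original_table (id, event_date) VALUES (?, ?)",
    [(1, "2001-03-15"), (2, "2001-12-20"), (3, "2002-06-01")],
)
conn.commit()

years = [r[0] for r in conn.execute(
    "SELECT DISTINCT strftime('%Y', event_date) FROM original_table ORDER BY 1")]

for year in years:
    conn.execute("CREATE TABLE updated_rows (id INTEGER, event_date TEXT)")
    # 1. Stage this year's not-yet-updated rows, applying the update on the fly.
    conn.execute("""
        INSERT INTO updated_rows (id, event_date)
        SELECT id, date(event_date, '+1 month') FROM original_table
        WHERE strftime('%Y', event_date) = ? AND updated IS NULL
    """, (year,))
    # 2. Remove the originals (stand-in for ALTER TABLE ... DROP PARTITION).
    conn.execute("""
        DELETE FROM original_table
        WHERE strftime('%Y', event_date) = ? AND updated IS NULL
    """, (year,))
    # 3. Reinsert the updated rows with the flag set, then clean up.
    conn.execute("""
        INSERT INTO original_table (id, event_date, updated)
        SELECT id, event_date, 1 FROM updated_rows
    """)
    conn.execute("DROP TABLE updated_rows")
    conn.commit()

result = conn.execute("SELECT * FROM original_table ORDER BY id").fetchall()
```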
I have a table (A) in a database that doesn't have PKs; it has about 300k records.
I have a subset copy (B) of that table in another database; it has only 50k records and contains a backup for a given time range (July data).
I want to copy the missing records from table B into table A, without duplicating existing records of course. (I can create a database link to make things easier.)
What strategy can I follow to successfully insert the missing rows from B into A?
These are the table columns:
IDLETIME NUMBER
ACTIVITY NUMBER
ROLE NUMBER
DURATION NUMBER
FINISHDATE DATE
USERID NUMBER
.. 40 extra varchar columns here ...
My biggest concern is the lack of PK. Can I create something like a hash or a PK using all the columns?
What could be a possible way to proceed in this case?
I'm using Oracle 9i in table A and Oracle XE ( 10 ) in B
The approximate number of elements to copy is 20,000
Thanks in advance.
If the data volumes are small enough, I'd go with the following
CREATE DATABASE LINK A CONNECT TO ... IDENTIFIED BY ... USING ....;
INSERT INTO COPY
SELECT * FROM table@A
MINUS
SELECT * FROM COPY;
You say there are about 20,000 to copy, but not how many in the entire dataset.
The other option is to delete the current contents of the copy and insert the entire contents of the original table.
If the full datasets are large, you could go with a hash, but I suspect that it would still try to drag the entire dataset across the DB link to apply the hash in the local database.
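To make the hash idea concrete, here is a client-side sketch in Python (assumption: rows have already been fetched as tuples; the sample values are hypothetical). Each whole row is hashed, and rows from B are kept only if there is no unmatched identical row in A. Unlike MINUS, which collapses duplicates, this version respects how many times an identical row occurs, a design choice that only matters when a table has no key.

```python
import hashlib
from collections import Counter

def row_hash(row):
    # Canonical text form of the whole row; repr keeps None distinct from 'None'.
    return hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()

# Hypothetical rows: (IDLETIME, ACTIVITY, ROLE, DURATION, FINISHDATE, USERID)
table_a = [
    (5, 1, 2, 30, "2009-07-01", 100),
    (0, 3, 1, 45, "2009-07-02", 101),
]
table_b = [
    (5, 1, 2, 30, "2009-07-01", 100),  # already in A
    (7, 2, 2, 15, "2009-07-03", 102),  # missing from A
    (7, 2, 2, 15, "2009-07-03", 102),  # identical row appears twice in B
]

def missing_rows(source, target):
    """Rows of source not present in target, respecting duplicate counts."""
    remaining = Counter(row_hash(r) for r in target)
    out = []
    for r in source:
        h = row_hash(r)
        if remaining[h] > 0:
            remaining[h] -= 1  # matched an existing copy in target
        else:
            out.append(r)
    return out

to_insert = missing_rows(table_b, table_a)
```

As the answer points out, computing hashes on one side still means pulling the other side's rows across the link; doing this in application code has the same cost.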
As long as no duplicate rows should exist in the table, you could apply a unique or primary key across all columns. If the overhead of maintaining a key/index would be too much, you could instead query the database from your application to see whether the row exists, and only perform the insert if it is absent.
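A small illustration of the unique-key-on-all-columns idea, using SQLite's `INSERT OR IGNORE` as a stand-in (assumption: this syntax is SQLite-specific, and Oracle would need its own approach, for example the existence check described above). Only a hypothetical subset of the columns is shown. In SQLite, NULLs count as distinct for UNIQUE constraints, so rows containing NULLs would not be caught by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the columns; the UNIQUE constraint spans all of them.
conn.execute("""
    CREATE TABLE table_a (
        IDLETIME INTEGER, ACTIVITY INTEGER, ROLE INTEGER,
        DURATION INTEGER, FINISHDATE TEXT, USERID INTEGER,
        UNIQUE (IDLETIME, ACTIVITY, ROLE, DURATION, FINISHDATE, USERID)
    )
""")
rows_a = [(5, 1, 2, 30, "2009-07-01", 100), (0, 3, 1, 45, "2009-07-02", 101)]
rows_b = [(5, 1, 2, 30, "2009-07-01", 100), (7, 2, 2, 15, "2009-07-03", 102)]
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?, ?, ?)", rows_a)

# Rows that would violate the all-column UNIQUE constraint are skipped silently.
conn.executemany("INSERT OR IGNORE INTO table_a VALUES (?, ?, ?, ?, ?, ?)", rows_b)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
```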