Teradata - Which approach is faster for large tables: update, or delete and insert?

I have a very large table in Teradata.
I have to run a UDF on a column and store the output of the UDF back into the same column.
We can do it in two ways:
Do a direct update of that column using an UPDATE query.
Create a staging table and load it by selecting the rows with the UDF applied; then delete all the rows from the original table and load the data back from the staging table.
Which is a better option performance-wise?
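For concreteness, the two options look roughly like this (a minimal sketch; big_table, id, col1, other_col, and my_udf are all hypothetical names standing in for the real ones):

-- Option 1: update the column in place
UPDATE big_table SET col1 = my_udf(col1);

-- Option 2: stage the transformed rows, empty the table, reload
CREATE TABLE big_table_stg AS (
    SELECT id, my_udf(col1) AS col1, other_col  -- column list/order must match big_table
    FROM big_table
) WITH DATA;
DELETE FROM big_table ALL;
INSERT INTO big_table SELECT * FROM big_table_stg;
DROP TABLE big_table_stg;

Worth noting when weighing the two: in Teradata, an unconstrained DELETE ... ALL and an INSERT ... SELECT into an empty table are both fast-path operations that largely bypass the transient journal, which is part of why the staging route can win despite touching the data twice.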

Related

What is an efficient way to bulk copy data from a CLOB column to a VARCHAR2 column in Oracle

I have a table TEST that has 41 million+ records in it.
I have two main columns in this table that I am interested in:
MESSAGE of type CLOB
MESSAGE_C of type VARCHAR2(2048)
The table TEST is range partitioned on a column named PART_DATE, with one partition holding one day's data.
I tried using the below to get the job done:
ALTER TABLE TEST ADD MESSAGE_C VARCHAR2(2048);        -- step 1: add the VARCHAR2 column
UPDATE TEST SET MESSAGE_C = MESSAGE;                  -- step 2: copy the CLOB values across
COMMIT;
ALTER TABLE TEST DROP COLUMN MESSAGE;                 -- step 3: drop the old CLOB column
ALTER TABLE TEST RENAME COLUMN MESSAGE_C TO MESSAGE;  -- step 4: take over the old name
But I got stuck on step 2 for around 4 hours. Our DBA said there was blocking due to full table scans.
Can someone please tell me:
What would be a better/more efficient way to get this done?
Would using the PART_DATE field in the where clause of the update query help?
Consider using a CREATE TABLE ... AS SELECT (CTAS) to build the new table on the fly with a new name, then add the indexes after creating the table, drop the old table, and rename the new table to the old name.
A single bulk CTAS is significantly faster than a row-by-row UPDATE, and it isn't slowed down by the same logging overhead.
I've used this approach recently to alter tables with 500 million records.
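A sketch of that approach for the table above (DBMS_LOB.SUBSTR truncates each CLOB to fit the VARCHAR2(2048) column; the remaining columns of TEST would need to be listed too, the range partitioning on PART_DATE re-declared, and NOLOGGING/PARALLEL are optional and assume your environment permits them):

CREATE TABLE TEST_NEW NOLOGGING PARALLEL AS
SELECT PART_DATE,
       DBMS_LOB.SUBSTR(MESSAGE, 2048, 1) AS MESSAGE  -- CLOB -> VARCHAR2, truncated at 2048 chars
FROM TEST;
-- recreate indexes and constraints on TEST_NEW here, then swap names:
DROP TABLE TEST;
RENAME TEST_NEW TO TEST;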

Is it possible to update one table from another table without a join in Vertica?

I have two tables A(i,j,k) and B(m,n).
I want to update the 'm' column of table B with sum(j) taken from table A. Is it possible to do this in Vertica?
The following code can be used in Teradata, but does Vertica have this kind of flexibility?
Update B from (select sum(j) as m from A)a1 set m=a1.m;
The Teradata SQL syntax won't work in Vertica, but the following query should do the same thing:
update B set m = (select sum(j) from A)
Depending on the size of your tables, this may not be an efficient way to update data. Vertica is a WORM (write once, read many times) store and is not optimized for updates or deletes.
An alternative is to first move the data in the target table to another intermediate (but not temporary) table, then write a join query against the other table to produce the desired result and use it to reload the target table, and finally drop the intermediate table. Of course, this assumes you have partitioned your table in a way suitable for your update logic.
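A sketch of that staging approach using the tables from the question (the /*+ DIRECT */ hint, which loads straight to disk storage, is optional):

-- stage the current contents of B
CREATE TABLE B_stage AS SELECT * FROM B;
-- rebuild B, pulling m from the aggregate over A
TRUNCATE TABLE B;
INSERT /*+ DIRECT */ INTO B
SELECT a1.m, s.n
FROM B_stage s
CROSS JOIN (SELECT SUM(j) AS m FROM A) a1;
DROP TABLE B_stage;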

TSQL - If Exists Update Else Insert

I have a data file with data in the below format.
--removed the data and query
Thank you!
Reverse the order of operations: first update the lookup tables, then do your import merge. You can remove your temp table and the inserts and updates for those lookup tables. You just need a merge statement to bring in the new and/or updated rows, along the lines of the sketch below.
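Since the data and query were removed from the question, the following is only a shape sketch; every table and column name in it is hypothetical:

-- 1) upsert the lookup table first
MERGE dbo.LookupTable AS tgt
USING (SELECT DISTINCT LookupCode, LookupName FROM dbo.ImportStaging) AS src
    ON tgt.LookupCode = src.LookupCode
WHEN MATCHED THEN UPDATE SET tgt.LookupName = src.LookupName
WHEN NOT MATCHED THEN
    INSERT (LookupCode, LookupName)
    VALUES (src.LookupCode, src.LookupName);

-- 2) the import merge can then join to the fully populated lookup table
MERGE dbo.TargetTable AS tgt
USING dbo.ImportStaging AS src
    ON tgt.BusinessKey = src.BusinessKey
WHEN MATCHED THEN UPDATE SET tgt.SomeColumn = src.SomeColumn
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, SomeColumn)
    VALUES (src.BusinessKey, src.SomeColumn);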

Oracle SQL merge tables without specifying columns

I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately I am lazy, and the structure of people may change over time; while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT directs SQL to insert values by order; can the same methodology be applied here, and is it safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply
DELETE FROM people t
 WHERE EXISTS( SELECT 1
                 FROM people_backup b
                WHERE t.id = b.id );

INSERT INTO people
SELECT *
  FROM people_backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.

Merge Statement VS Lookup Transformation

I am stuck with a problem with different views.
Present Scenario:
I am using SSIS packages to get data from Server A to Server B every 15 minutes. I created 10 packages for 10 different tables, and also created 10 staging tables, one for each. In the Data Flow Task, the package selects data from Server A with ID greater than the last imported ID and dumps it into the staging table (each table has its own staging table). After the Data Flow Task, I use a MERGE statement to merge records from the staging table into the destination table where the ID is NOT MATCHED.
Problem:
This takes care of all newly inserted records, but once a record has been picked up by the SSIS job, if it is later updated at the source I am not able to pick it up again and grab the updated data.
Questions:
How will I be able to achieve the update without impacting the source database server too much?
Do I use a MERGE statement and select 10,000 records every single run (every 15 minutes)?
Do I use a Lookup transformation to do the updates?
Some tables have more than 2 million records and growing, so what is the best approach for them?
NOTE:
I can truncate the tables in the destination and reinsert the complete data for the first run.
Edit:
The source has a column LAST_UPDATE_DATE which I can use in my query.
If I'm understanding your statements correctly, it sounds like you're pretty close to your solution. If you currently have a MERGE statement that includes the insert (where source does not match destination), you should be able to easily include the update logic for the case where source matches destination.
example:
MERGE target_table AS destination_table_alias
USING (
    SELECT <column_name(s)>
    FROM source_table
) AS source_alias
ON source_alias.[table_identifier] = destination_table_alias.[table_identifier]
WHEN MATCHED THEN UPDATE
    SET destination_table_alias.[column_name1] = source_alias.[column_name1],
        destination_table_alias.[column_name2] = source_alias.[column_name2]
WHEN NOT MATCHED THEN
    INSERT ([column_name1], [column_name2])
    VALUES (source_alias.[column_name1], source_alias.[column_name2]);
So, to your points:
The update can be achieved via the WHEN MATCHED logic within the MERGE statement.
If you have the last ID of the table that you're loading, you can include it as a filter on your SELECT statement so that the dataset is incremental.
No Lookup transformation is needed when WHEN MATCHED is utilized.
For the larger, growing tables, keep the load incremental by filtering in the SELECT portion of the MERGE statement, as in the snippet below.
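For example, using the LAST_UPDATE_DATE column mentioned in the edit, the USING block of the merge above could become the following (a sketch; @LastRunTime is a hypothetical variable holding the previous run's timestamp):

USING (
    SELECT <column_name(s)>
    FROM source_table
    WHERE LAST_UPDATE_DATE > @LastRunTime  -- only rows changed since the last run
) AS source_alias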
Hope this helps