merging data from old table into new for a monthly archive - sql

I have a sql statement to insert data into a table for archiving, but I need a merge statement to run on a monthyl basis to update the new table(2) with any data that changed in the old table(1) that should now be moved into archive.
Part of the issue is to remove the moved data from the old table. My insert is not doing that, but I need to have it to where the saved data is purged from the original table.
Is there a single sql statement that will move data out of one table into another in this way? Or does it need to be a two step operation?
the initial statement moved data depending on age and a few other relative factors.
insert is:
INSERT /*+ append */
INTO tab1
SELECT *
FROM tab2
WHERE (Postingdate < TO_DATE ('2001/07/01', 'yyyy/mm/dd')
OR jobname IS NULL)
AND STATUS <> '45';
All help appreciated...

The merge statement will let you do this in one statement by adding a delete statement in the update clause. See Oracle Documentation on Merge.

I think you should try this with a partition table. My idea is to create table which have range partition on date:
create table(id number primary key,name varchar,J_date date )
partition by range(J_date)(PARTITION one_mnth VALUES LESS THAN(sysdate-30)),
partition by range(J_date)(PARTITION one_mnth VALUES LESS THAN(maxvalue)));
then move that partition in to another table and and truncate that partition

Related

How to replace all the NULL values in postgresql?

I found the similar question and solution for the SQL server. I want to replace all my null values with zero or empty strings. I can not use the update statement because my table has 255 columns and using the update for all columns will consume lots of time.
Can anyone suggest to me, how to update all the null values from all columns at once in PostgreSQL?
If you want to replace the data on the fly while selecting the rows you need:
SELECT COALESCE(maybe_null_column, 0)
If you want the change to be saved on the table you need to use an UPDATE. If you have a lot of rows you can use a tool like pg-batch
You can also create a new table and then swap the old one and the new one:
# Create new table with updated values
CREATE TABLE new_table AS
SELECT COALESCE(maybe_null_column, 0), COALESCE(maybe_null_column2, '')
FROM my_table;
# Swap table
ALTER TABLE my_table RENAME TO obsolete_table;
ALTER TABLE new_table RENAME TO my_table;

How to append unique values from temp_tbl into original_tbl (SQL Server)?

I have a table that I'm trying to append unique values to. Every month I get list of user logins to import into this table. I would like to keep all the original values and just append the new and unique values onto the existing table. Both the table and the flatfile have a single column, with unique values, built like this:
_____
login
abcde001
abcde002
...
_____
I'm bulk ingesting the flat file into a temp table, with this:
IF OBJECT_ID('tempdb..#FLAT_FILE_TBL') IS NOT NULL
DROP TABLE #FLAT_FILE_TBL
CREATE TABLE #FLAT_FILE_TBL
(
ntlogin2 nvarchar(15)
)
BULK INSERT #FLAT_FILE_TBL
FROM 'C:\ImportFiles\logins_Dec2021.csv'
WITH (FIELDTERMINATOR = ' ');
Is there a join that would give me the table with existing values + new unique values appended? I'd rather not hard code a loop to evaluate it line by line.
Something like (pseudocode):
append unique {login} from temp_tbl into original_tbl
Hopefully it's an easy answer for someone out there.
Thanks!
Poster on Reddit r/sql provided this answer, which I'm pursuing:
Merge statement?
It looks like using a merge statement will do exactly what I want. Thanks for those who already posted replies.
You can check if a record exists using 'EXISTS' clause and insert if it doesn't exist in the target table. You can also use MERGE statement to achieve the same. Depending on what you want to do to the existing records in the target table, you can modify the Merge statement. Here since you only want to insert new records, you need to specify only what you want to do when a new record comes in. Here is an example
MERGE original_tbl T
USING temp_tbl S
ON T.login = S.login
WHEN NOT MATCHED THEN
INSERT (login)
VALUES(S.login)
Another solution would be to left join the target table to the temp table and insert only when the record doesn't exist.
INSERT INTO original_tbl(login)
SELECT S.Login
FROM temp_tbl S
LEFT JOIN original_tbl T
ON S.Login = T.Login
WHERE T.Login IS NULL

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily that's to be partitioned by the date it's executed. The results of this query should be appended to a the same table.
My idea was ideally having something similar to the CREATE TABLE IF NOT EXISTS command for adding data by a new partition every day to the existing table if the partition doesn't already exist, but I can't figure out how I'd be able to integrate this in my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table for the first day it's executed but nothing happens for subsequent days, I'm assuming because of the CREATE TABLE IF NOT EXISTS validating that the table already exists and not proceeding with the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include hte column like this: SELECT ... CURRENT_DATE as date_executed
See: INSERT INTO - Amazon Athena

Need help to optimize my stored procedure

I need help optimizing my stored procedure. This is for our fact table, and currently the stored procedure truncates the table, and then loads the data back in. I want to get rid of truncating and instead append new rows or delete rows by a last_update column which currently does not exist. There also is a last_update table with one column, which changes at every stored procedure run, but I'd rather the last_update be a column in the table itself, rather than a separate column.
I've created a trigger that should update the last_updated column with the current date when the stored procedure runs, but I would also like to get rid of truncating and instead append/delete rows as well. The way the stored procedure is currently structured is making it difficult for me to figure out how best to do it.
The stored procedure begins by adding data into 2 temp tables, then adds the data from the two temp tables into a 3rd temp table, then truncates the current FACT TABLE and then the 3rd temp table finally inserts into the FACT table.
--CLEAR LAST UPDATE TABLE
TRUNCATE TABLE ADM.LastUpdate;
--SET NEW LAST UPDATE TIME
INSERT INTO ADM.LastUpdate(TABLE_NAME, UPDATE_TIME)
VALUES('FactBP', CONVERT(VARCHAR, GETDATE(), 100)+' (CST)');
--CHECK TO SEE IF TEMP TABLES EXISTS THEN DROP
IF OBJECT_ID('tempdb.dbo.#TEMP_CARTON', 'U') IS NOT NULL
DROP TABLE #TEMP_CARTON;
IF OBJECT_ID('tempdb.dbo.#TEMP_ORDER', 'U') IS NOT NULL
DROP TABLE #TEMP_ORDER;
--CREATE TEMP TABLES
SELECT *
INTO #TEMP_CARTON
FROM [dbo].[FACT_CARTON_V];
SELECT *
INTO #TEMP_ORDER
FROM [dbo].[FACT_ORDER_V];
--CHECK TO SEE IF DATA EXISTS IN #TEMP_CARTON AND #TEMP_ORDER
IF EXISTS(SELECT * FROM #TEMP_CARTON)
AND EXISTS(SELECT * FROM #TEMP_ORDER)
--CODE HERE joins the data from #TEMP_CARTON and #TEMP ORDER and puts it into a 3rd temp table #TEMP_FACT.
--CLEAR ALL DATA FROM FACTBP
TRUNCATE TABLE dbo.FactBP;
--INSERT DATA FROM TEMP TABLE TO FACTBP
INSERT INTO dbo.FactBP
SELECT
[SOURCE]
,[DC_ORDER_NUMBER]
,[CUSTOMER_PURCHASE_ORDER_ID]
,[BILL_TO]
,[CUSTOMER_MASTER_RECORD_TYPE]
,[SHIP_TO]
,[CUSTOMER_NAME]
,[SALES_ORDER]
,[ORDER_CARRIER]
,[CARRIER_SERVICE_ID]
,[CREATE_DATE]
,[CREATE_TIME]
,[ALLOCATION_DATE]
,[REQUESTED_SHIP_DATE]
,[ADJ_REQ_SHIP]
,[CANCEL_DATE]
,[DISPATCH_DATE]
,[RELEASED_DATE]
,[RELEASED_TIME]
,[PRIORITY_ORDER]
,[SHIPPING_LOAD_NUMBER]
,[ORDER_HDR_STATUS]
,[ORDER_STATUS]
,[DELIVERY_NUMBER]
,[DCMS_ORDER_TYPE]
,[ORDER_TYPE]
,[MATERIAL]
,[QUALITY]
,[MERCHANDISE_SIZE_1]
,[SPECIAL_PROCESS_CODE_1]
,[SPECIAL_PROCESS_CODE_2]
,[SPECIAL_PROCESS_CODE_3]
,[DIVISION]
,[DIVISION_DESC]
,[ORDER_QTY]
,[ORDER_SELECTED_QTY]
,[CARTON_PARCEL_ID]
,[CARTON_ID]
,[SHIP_DATE]
,[SHIP_TIME]
,[PACKED_DATE]
,[PACKED_TIME]
,[ADJ_PACKED_DATE]
,[FULL_CASE_PULL_STATUS]
,[CARRIER_ID]
,[TRAILER_ID]
,[WAVE_NUMBER]
,[DISPATCH_RELEASE_PRIORITY]
,[CARTON_TOTE_COUNT]
,[PICK_PACK_METHOD]
,[RELEASED_QTY]
,[SHIP_QTY]
,[MERCHANDISE_STYLE]
,[PICK_WAREHOUSE]
,[PICK_AREA]
,[PICK_ZONE]
,[PICK_AISLE]
,EST_DEL_DATE
FROM #TEMP_FACT;
Currently, since I've added the last_updated column into my FACT TABLE and created a trigger, I don't actually pass any value via the stored procedure for it, so I get an error
An object or column name is missing or empty.
I am not sure as to where I'm supposed to pass any value for the LAST_UPDATED column.
Here is the trigger I've created for updating the last_updated column:
CREATE TRIGGER last_updated
ON dbo.factbp
AFTER UPDATE
AS
UPDATE dbo.factbp
SET last_updated = GETDATE()
FROM Inserted i
WHERE dbo.factbp.id = i.id
The first thing I would try is to create primary keys on the two temp tables #TEMP_CARTON and #TEMP_ORDER and use the intersect command to get the rows that are common to both tables:
select * from #TEMP_CARTON
intersect
SELECT * FROM #TEMP_ORDER
Figured out the answer. I just had to put "null" for the last_updated value during Insert, and then the Trigger took care of adding the timestamp on its own.

Insert overwrite in Hive

I am trying to use Insert overwrite in Hive. Basically I would like to insert overwrite not the complete partition but only a few records in the partition. I am not finding any solution to do it (Insert overwrite in destination table based on a filter on non partition column also).
Is there any way I can achieve it?
Hive is not as Regular RDBMS, If you want to update the record simple do INSERT OVERWRITE TABLE Table_Name...simple change your data in one temporary table or by using WITH clause simply insert overwrite..by using table partioning..it is safe.
QUERY[HIVE]:
WITH TEMP_TABLE AS (SELECT * FROM SOURCE_TABLE_NAME) INSERT OVERWRITE TABLE TARGET_TABLE_NAME SELECT * FROM TEMP_TABLE
Hive is not an RDBMS. What you are trying to achieve with Hive is not recommended. Hive is better suited for batch processing over very large sets of immutable data.
However, from what I could deduce, you are trying to update an existing record in your table. To do so, enable ACID support on the table that needs to be updated and your update queries will start working.
UPDATE <TABLE>
SET <COL1>='Value1',
SET <COL2>='Value2'
WHERE <Some Condition That Only Evaluates To The Rows You Need Updated>