Hope you can help.
In [DB1].[TableA] I have a list of data, all with unique IDs.
The same data is in [DB2].[TableB], but that table is constantly updated and has new data inserted into it.
I want to be able to insert any new data from [DB2].[TableB] into [DB1].[TableA].
I would like it to run as a SQL job on a schedule, maybe 3 times a day, to check whether there are any new rows and insert them into TableA from TableB.
Hope someone can help!
Thanks in advance :)
A simple approach would be to create a scheduled job with an insert script like:
INSERT INTO TestDB.DB1.TableA (id, firstname)
SELECT tfr.Id, tfr.firstname
FROM TestDB.DB2.TableB tfr
LEFT JOIN TestDB.DB1.TableA tto ON tfr.Id = tto.Id
WHERE tto.Id IS NULL
Set the task's schedule to repeat daily and either set the frequency to every x hours or set three schedules at specific times.
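If you would rather script the schedule than click through the SQL Server Agent UI, something along these lines could work; the job name is hypothetical and the job itself must already exist:

-- Runs the (already existing) Agent job every 8 hours, i.e. 3 times a day.
-- Job and schedule names are placeholders.
EXEC msdb.dbo.sp_add_jobschedule
    @job_name = N'Sync TableA from TableB',
    @name = N'Every 8 hours',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every 1 day
    @freq_subday_type = 8,       -- sub-day unit: hours
    @freq_subday_interval = 8,   -- every 8 hours
    @active_start_time = 000000; -- starting at midnight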
If you need something more complex to manage TableB data changes and logging, maybe consider creating an SSIS package to encompass your data flow logic.
I have 3 tables like the picture below. When a row in the Conditional table reaches its ExpireDate, it must be moved to the Unconditional table (all in SQL code). How can I do this?
Maybe the easiest way is to make a job that runs daily with:
insert into unconditional_table
select *
from conditional_table a
where a.expiration_date = trunc(sysdate, 'dd')
When the condition is not met, no data will be transferred.
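If the rows should also disappear from conditional_table after being copied (a true move, as the question asks), a delete with the same filter in the same transaction would complete it. A rough sketch reusing the names above:

-- Copy today's expiring rows, then remove them from the source, as one unit of work.
insert into unconditional_table
select *
from conditional_table a
where a.expiration_date = trunc(sysdate, 'dd');

delete from conditional_table a
where a.expiration_date = trunc(sysdate, 'dd');

commit;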
You want to move records from one table to another.
When do you want this to happen?
You can set up a job. This job will check the dates and move the records to the other table at the beginning of every day.
I have a live production table which has more than 1 million records. I don't want to touch this table at all, and would like to create another table which holds all records from this live production table. I would schedule a job which takes entries from my main table and inserts them into my new table. But I don't want to copy all the records every day; I just need the records added to the production table each day to be added to my new table.
Please suggest a faster and efficient approach.
You could do this with an INSERT/UPDATE/DELETE trigger that sends the inserted/updated/deleted rows to the new table; however, this feels like reinventing the wheel at the most basic level.
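For the insert case, a minimal sketch of such a trigger might look like the following; the table and column names are hypothetical:

-- Copies every newly inserted production row into the copy table.
-- dbo.ProductionTable, dbo.CopyTable and the column list are placeholders.
CREATE TRIGGER trg_ProductionTable_CopyInsert
ON dbo.ProductionTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.CopyTable (id, col1, col2)
    SELECT id, col1, col2
    FROM inserted;
END;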
You could just use asynchronous replication rather than hand-rolling it all yourself; this is probably safer, more sustainable and more scalable. You could add as many tables as you like to the replicated source.
Copying one million records from an existing table to a new table should not take very long -- and might even be faster than figuring out what records to copy. You could do something like:
truncate table copytable;
insert into copytable
select *
from productiontable;
Note that you should explicitly list the columns when doing the insert.
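For example, assuming the table has columns id, col1 and col2 (hypothetical names):

-- explicit column lists protect the load against column-order changes
insert into copytable (id, col1, col2)
select id, col1, col2
from productiontable;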
You can also readily add new records -- assuming you have some form of id on the production table, such as an id assigned by a sequence. Then you can do:
insert into copytable
select *
from productiontable p
where p.id > (select max(id) from copytable);
I am stuck with a problem with different views.
Present Scenario:
I am using SSIS packages to get data from Server A to Server B every 15 minutes. I created 10 packages for 10 different tables and also created 10 staging tables for them. The Data Flow Task selects data from Server A with an ID greater than the last imported ID and dumps it into a staging table (each table has its own staging table). After the Data Flow Task I use a MERGE statement to merge records from the staging table into the destination table when the ID is NOT MATCHED.
Problem:
This takes care of all newly inserted records, but once a record has been picked up by the SSIS job and is later updated at the source, I am not able to pick it up again and grab the updated data.
Questions:
How will I be able to achieve the update without impacting the source database server too much?
Do I use a MERGE statement and select 10,000 records every single run (every 15 minutes)?
Do I use a Lookup transformation to do the updates?
Some tables have more than 2 million records and growing, so what is the best approach for them?
NOTE:
I can truncate tables in destination and reinsert complete data for the first run.
Edit:
The source has a column 'LAST_UPDATE_DATE' which I can use in my query.
If I'm understanding your statements correctly, it sounds like you're pretty close to your solution. If you currently have a MERGE statement that includes the insert (where the source does not match the destination), you should be able to easily include the update for the case where the source matches the destination.
Example:
MERGE target_table AS destination_table_alias
USING (
    SELECT <column_name(s)>
    FROM source_table
) AS source_alias
ON source_alias.[table_identifier] = destination_table_alias.[table_identifier]
WHEN MATCHED THEN
    UPDATE SET
        destination_table_alias.[column_name1] = source_alias.[column_name1],
        destination_table_alias.[column_name2] = source_alias.[column_name2]
WHEN NOT MATCHED THEN
    INSERT ([column_name1], [column_name2])
    VALUES (source_alias.[column_name1], source_alias.[column_name2]);
So, to your points:
Update can be achieved via the 'WHEN MATCHED' logic within the merge statement
If you have the last ID of the table that you're loading, you can include this as a filter on your select statement so that the dataset is incremental.
No lookup is needed when the 'WHEN MATCHED' clause is utilized.
For the large, growing tables, filter the SELECT portion of the MERGE statement (for example on the LAST_UPDATE_DATE column you mentioned) so that only recently changed rows are processed each run, as sketched below.
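A rough sketch of that incremental filter, assuming the destination also stores LAST_UPDATE_DATE so it can act as the watermark (all table and column names other than LAST_UPDATE_DATE are hypothetical):

-- Only rows changed since the newest row already loaded are pulled from the source.
DECLARE @last_loaded DATETIME;
SET @last_loaded = ISNULL((SELECT MAX(LAST_UPDATE_DATE) FROM dbo.DestinationTable), '19000101');  -- fall back for the first run

MERGE dbo.DestinationTable AS dest
USING (
    SELECT ID, column_name1, LAST_UPDATE_DATE
    FROM dbo.SourceTable
    WHERE LAST_UPDATE_DATE > @last_loaded   -- incremental filter
) AS src
ON src.ID = dest.ID
WHEN MATCHED THEN
    UPDATE SET dest.column_name1 = src.column_name1,
               dest.LAST_UPDATE_DATE = src.LAST_UPDATE_DATE
WHEN NOT MATCHED THEN
    INSERT (ID, column_name1, LAST_UPDATE_DATE)
    VALUES (src.ID, src.column_name1, src.LAST_UPDATE_DATE);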
Hope this helps
Quick Version: I have 4 tables (TableA, TableB, TableC, TableD) identical in design. TableC is a complete History of TableA & B. I want to periodically update TableC with new data from TableA & B. TableD contains a copy of the row most recently transferred from A/B to C. I need to select all records from TablesA/B that are more recent than the record in TableD. Any advice?
Long Version: I'm trying to ETL (Extract, Transform, Load) some information from a few different tables into some other tables for quicker, easier reporting... kind of like a data warehouse but within the same database (don't ask).
Basically we want to record and report on system performance. Oracle has logs for this in the tables flows_030100.wwv_flow_activity_log1$ and flows_030100.wwv_flow_activity_log2$. I believe these tables are filled and cleared every two weeks or something...
I have created a table:
CREATE TABLE dw_log_hist AS
SELECT * FROM flows_030100.wwv_flow_activity_log WHERE 1=0
and filled it with the current information:
INSERT INTO dw_log_hist
SELECT *
FROM flows_030100.wwv_flow_activity_log1$;

INSERT INTO dw_log_hist
SELECT *
FROM flows_030100.wwv_flow_activity_log2$;
HOWEVER, these log files record EVERY click in the APEX screens. As such, they are continually growing.
I want to periodically update my DW_Log_Hist table with only new information (I am fully aware my history table will grow to be ridiculously sized but I'll deal with that later).
Unfortunately, these tables have no primary key, so I've had to create another table to store marker records that will tell me the latest logs I copied over -_-
CREATE TABLE dw_log_temp AS
SELECT * FROM flows_030100.wwv_flow_activity_log
WHERE time_stamp = (SELECT MAX (time_stamp)
FROM flows_030100.wwv_flow_activity_log2$)
NOW THEN after all that waffle... this is what I need your help with:
Does anyone know whether one of the log tables (wwv_flow_activity_log1$ or wwv_flow_activity_log2$) always has the latest logs? Is it a case of log1$ filling up, log2$ filling then log1$ being overwritten with log2$ so that log2$ always has the latest data? Or do they both fill up and then get filled up again?
Can anyone advise how I would go about populating the DW_Log_Hist table using the DW_Log_Temp marker records?
Conceptually it would be something like:
insert everything into dw_log_hist from activity_log1$ and activity_log2$ where the time_stamp is > (time_stamp of the record in dw_log_temp)
Super sorry for such a long post.
Got the answer :-)
A chap on Reddit helped me realise my over complication...
insert into dw_log_hist
select *
from flows_030100.wwv_flow_activity_log1$
where time_stamp > (select max(time_stamp)
from dw_log_hist)
union
select *
from flows_030100.wwv_flow_activity_log2$
where time_stamp > (select max(time_stamp)
from dw_log_hist)
Hurrah! Always feel like such an idiot when you see the simple answer...
I have 2 tables with identical schemas. I need to move rows older than 90 days (based on a datetime column present in the table) from table A to table B. Here is the pseudo code for what I want to do:
DECLARE @Criteria datetime
SET @Criteria = GETDATE() - 90

SELECT *
INTO TableB
FROM TableA
WHERE ColumnX < @Criteria

-- now clean up the records we just moved to table B, in table A
DELETE FROM TableA WHERE ColumnX < @Criteria
My questions are:
What is the most efficient way to do this (will SELECT ... INTO perform well under high volumes)? Table A will have ~180,000,000 rows in it, and I will need to move ~4,000,000 rows at a time to table B.
How do I encapsulate this in one transaction so that I will not delete rows from Table A if there was an error inserting them into Table B? I just want to make sure that I don't accidentally delete a row from table A unless I have successfully written it to table B.
Are there any good SQL Server 2005 books that you recommend?
Thanks,
Chris
I think that SSIS is probably the best solution for your needs.
I think you can just use SSIS tasks like the Data Flow Task to achieve your needs. There doesn't seem to be any need to create a separate procedure for the logic.
Transactions can be set for any Data Flow Task using the TransactionOption property. Check out this article on how to use transactions in SSIS.
Some basic tutorials on SSIS packages and how to create them can be found here and here.
Regarding:
How do I encapsulate this under one transaction so that I will not delete rows from Table A if there was an error inserting them to Table B.
You can delete all rows from A that are in B using a join. Then, if the copy to B failed, nothing will be deleted from A, as sketched below.
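A minimal sketch of that pattern combined with an explicit transaction (which also covers the earlier question about encapsulating the move); the Id join key and ColumnX names are assumptions, not taken from the actual schema:

-- Copy rows older than 90 days into TableB, then delete from TableA
-- only the rows that actually exist in TableB. Everything runs in one transaction.
BEGIN TRY
    BEGIN TRANSACTION;

    DECLARE @Criteria datetime;
    SET @Criteria = DATEADD(DAY, -90, GETDATE());

    INSERT INTO TableB
    SELECT a.*
    FROM TableA a
    WHERE a.ColumnX < @Criteria;

    DELETE a
    FROM TableA a
    INNER JOIN TableB b ON b.Id = a.Id;   -- the join guarantees the row was copied

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- nothing copied, nothing deleted
END CATCH;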