I have a staging table in a SQL Server job that dumps data into another table; once the data has been copied to the transaction table, I truncate the staging table.
The problem occurs when the job fails: the inserts into the transaction table are rolled back and all of the data is left in the staging table. So the staging table already contains data, and if I rerun the job it merges the new data with the existing data in the staging table.
I want my staging table to be empty when the job runs.
Can I make use of a temp table in this scenario?
This is a common scenario in data warehousing projects, and the answer is logical rather than technical. You have two approaches to deal with this scenario:
If your data is important, first check whether the staging table is empty. If it is not empty, the last job failed; in that case, instead of a plain insert into staging, do an insert-update (upsert) operation and then continue with the job steps (a sketch follows after this list). If the table is empty, the last job succeeded and the new data will be insert-only.
If you can afford to lose the data from the last job, make it a habit to truncate the staging table before running your package.
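A minimal sketch of the first approach, where StagingOrders, the src.Orders extract and the OrderId/Amount columns are assumptions used only for illustration:
IF EXISTS (SELECT 1 FROM dbo.StagingOrders)
BEGIN
    -- Previous run failed and left rows behind: upsert the new extract so it merges with the leftovers
    MERGE dbo.StagingOrders AS s
    USING src.Orders AS n
        ON s.OrderId = n.OrderId
    WHEN MATCHED THEN
        UPDATE SET s.Amount = n.Amount
    WHEN NOT MATCHED THEN
        INSERT (OrderId, Amount) VALUES (n.OrderId, n.Amount);
END
ELSE
BEGIN
    -- Previous run succeeded: staging is empty, so a plain insert is enough
    INSERT INTO dbo.StagingOrders (OrderId, Amount)
    SELECT OrderId, Amount FROM src.Orders;
END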
I usually follow this strategy to load fact tables via ETL:
Truncate the staging table
Insert new rows (those added after the previous ETL run) into the fact table and changed rows into the staging table
Perform updates on the fact table based upon the data in the staging table
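A rough sketch of those three steps, where FactSales, StagingFact, src.Sales and the SourceId/Amount/CreatedDate/ModifiedDate columns (plus the @LastEtlRun cutoff) are assumptions for illustration:
-- 1. Truncate the staging table
TRUNCATE TABLE dbo.StagingFact;
-- 2. New rows go straight into the fact table, changed rows into staging
INSERT INTO dbo.FactSales (SourceId, Amount)
SELECT SourceId, Amount
FROM src.Sales
WHERE CreatedDate > @LastEtlRun;
INSERT INTO dbo.StagingFact (SourceId, Amount)
SELECT SourceId, Amount
FROM src.Sales
WHERE ModifiedDate > @LastEtlRun
  AND CreatedDate <= @LastEtlRun;
-- 3. Update the fact table from the staging table
UPDATE f
SET f.Amount = s.Amount
FROM dbo.FactSales AS f
JOIN dbo.StagingFact AS s
  ON s.SourceId = f.SourceId;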
The challenge I am facing is that rows can be deleted from the source table. How do I handle these deletions in the ETL? I want the deleted rows to be removed from the fact table as well. I cannot use a merge between the source OLTP table and the target data warehouse table because that puts additional load on every ETL run.
Note: the source table has a last-modified date column, but this is of no use to me because the record disappears from the source table upon deletion.
I have a simple package where I pull down every table from a remote source DB into my local server DB.
Every data flow task is simple source to destination.
The problem I have is that occasionally the download stops and I don't get all of the tables, or some tables don't pull down all of their data.
All I want to do is have a table with all table names that I need to pull down.
After each table in my data flow completes I need to update a flag in my new table of table names so that there is a 1 for every table that fully downloads from the source to the destination. Any help would be appreciated.
A simple approach is to add an Execute SQL Task after each Data Flow Task that updates an audit table containing a LastExecutionDate column with the time of the last successful load:
UPDATE AuditTable
SET LastExecutionDate = GETDATE()
WHERE TableName = @TableName
Here @TableName is a parameter mapped to the name of the destination table of the data flow in question (the exact parameter marker depends on the connection type used by the Execute SQL Task).
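If a simple completion flag is preferred over a timestamp, the audit table and the per-table update might look roughly like this; the table and column names are assumptions:
-- Hypothetical audit table: one row per source table that must be pulled down
CREATE TABLE dbo.AuditTable (
    TableName sysname PRIMARY KEY,
    IsComplete bit NOT NULL DEFAULT (0),
    LastExecutionDate datetime NULL
);
-- Reset the flags at the start of the package
UPDATE dbo.AuditTable SET IsComplete = 0;
-- Run in the Execute SQL Task after each Data Flow Task completes
UPDATE dbo.AuditTable
SET IsComplete = 1,
    LastExecutionDate = GETDATE()
WHERE TableName = @TableName;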
I need to perform some calculations using a few columns from a table. This table gets updated every couple of hours, and every other day it generates duplicates on a couple of columns. There is no way to tell which row was inserted first, which affects my calculations.
Is there a way to automatically copy these rows into a new table as data gets added every couple of hours and perform the calculations on the fly? That way, whichever row comes first is captured in the new table for a dashboard and for other business use cases.
I thought of creating a stored procedure and using a job scheduler to run it, but I do not have admin access and cannot schedule jobs. Is there another way of doing this efficiently? Much appreciated!
Edit: My request for admin access is being approved.
Another way, in addition to what is stated in the other answers, is the following:
Make a temp table.
Make a prod table.
Use a stored procedure to copy everything from the temp table into the prod table after each load has been done.
Use the same stored procedure to clean out the temp table once the copy is done.
I don't know if this will work for your case, but this is in general how we deal with a huge amount of load on a daily basis.
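A rough sketch of such a procedure, where TempLoad and ProdData are hypothetical names and Id/Value are assumed columns:
CREATE PROCEDURE dbo.MoveTempToProd
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        -- Copy everything loaded into the temp (staging) table into the prod table
        INSERT INTO dbo.ProdData (Id, Value)
        SELECT Id, Value
        FROM dbo.TempLoad;
        -- Clean the temp table once the copy is done
        TRUNCATE TABLE dbo.TempLoad;
    COMMIT TRANSACTION;
END;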
How do I create a trigger in Microsoft SQL Server to keep track of all deleted data from any table in the database in a single audit table? I do not want to write a trigger for each and every table in the database. There will be only one audit table that keeps track of the deleted data from every table.
For example:
If data is deleted from a Person table, capture all of that deleted data and store it in XML format in the audit table.
Please check the solution I tried to describe at SQL Server Log Tool for Capturing Data Changes.
The solution is built on dynamically creating triggers on selected tables to capture data changes (after insert, update, delete) and store them in a general table.
A job then executes periodically and parses the data captured in this general table. Once the data is parsed, it is easy for humans to see which table and field changed, along with the old and new values.
I hope this proposed solution helps you build your own.
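For deletes specifically, the generated per-table triggers follow roughly this shape; the DeleteAudit table, the dbo.Person example table and the column names are assumptions for illustration:
-- Single audit table that collects deleted rows from every audited table
CREATE TABLE dbo.DeleteAudit (
    AuditId int IDENTITY(1, 1) PRIMARY KEY,
    TableName sysname NOT NULL,
    DeletedAt datetime NOT NULL DEFAULT (GETDATE()),
    DeletedData xml NOT NULL
);
GO
-- Trigger generated for one table (here dbo.Person); the deleted rows are stored as XML
CREATE TRIGGER dbo.trg_Person_Delete
ON dbo.Person
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.DeleteAudit (TableName, DeletedData)
    SELECT 'dbo.Person',
           (SELECT * FROM deleted FOR XML PATH('row'), ROOT('deleted'), TYPE);
END;
GO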
I am working on a transaction table where I store tasks and their status. Now I am building a dashboard where I need to know how many tasks are in each status.
I thought of having staging tables corresponding to each status so that the dashboard does not affect any transactional activity. To push data into the staging table I have two options:
Have a trigger on the transactional table so that on each status update the staging table is updated.
Have a SQL job that runs every 5 minutes to update the data in the staging table.
Please suggest which way to go.
Thanks
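For reference, a minimal sketch of what the trigger option could maintain; the Tasks table, its Status column and the TaskStatusCounts summary table are assumptions for illustration:
-- Hypothetical summary table read by the dashboard
CREATE TABLE dbo.TaskStatusCounts (
    Status varchar(50) PRIMARY KEY,
    TaskCount int NOT NULL
);
GO
-- Keep the summary in sync whenever task rows change
CREATE TRIGGER dbo.trg_Tasks_StatusCounts
ON dbo.Tasks
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- A full recount keeps the sketch simple; an incremental version would touch only the affected statuses
    DELETE FROM dbo.TaskStatusCounts;
    INSERT INTO dbo.TaskStatusCounts (Status, TaskCount)
    SELECT Status, COUNT(*)
    FROM dbo.Tasks
    GROUP BY Status;
END;
GO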