My application has the following procedure.
A database of products (20,000 rows) exists.
Our client has an 'import' feature where he imports an Excel file.
This is implemented by deleting all rows from the products table, then doing the import - which is a long process, since we programmatically perform calculations on the data.
The obvious problem is that if the 'import' action fails (IO stuff), they now have no data, partial data, or corrupt data in the products table.
We wish that if the 'import' operation fails, the original data remains.
This is an ASP.NET application, written in C#, using SQL Server 2005 and an XSD which we created through the VS2005 design tools.
Start transaction
Delete data from table
Insert new data
If no problem happens, commit changes
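A minimal T-SQL sketch of that flow (the table and column names here are hypothetical; the same pattern can also be driven from C# with a SqlTransaction):

BEGIN TRY
    BEGIN TRANSACTION;

    DELETE FROM dbo.Products;

    -- the calculated rows produced by the import go in here
    INSERT INTO dbo.Products (ProductId, Name, Price)
    SELECT ProductId, Name, Price
    FROM dbo.Products_Import;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- any failure rolls everything back, so the original data survives
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;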
I would import the data onto a table with the same structure as your products table, and then replace the data on your products table once you're happy the import has been successful.
This will mean that users can carry on using the system while the import is underway, as well as minimizing the downtime while you update the products table.
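A rough sketch of that idea, assuming a hypothetical staging table dbo.Products_Staging with the same structure as dbo.Products:

-- load and validate dbo.Products_Staging first (outside any long transaction),
-- then swap the data in one short transaction so users barely notice
BEGIN TRANSACTION;

DELETE FROM dbo.Products;

INSERT INTO dbo.Products (ProductId, Name, Price)
SELECT ProductId, Name, Price
FROM dbo.Products_Staging;

COMMIT TRANSACTION;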
Using a transaction would be the obvious choice here.
I guess the first thing I would ask is do you really need to clear the entire table? Is there a timestamp or something you could use to limit the amount of data that needs to be refreshed? Could you re-work the logic to use updates instead of all the deletes and inserts? That way your transactions will be smaller.
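If an incremental refresh is feasible, here is a hedged sketch of the update-then-insert pattern (it works on SQL Server 2005, which predates MERGE; the table and key names are hypothetical):

-- update rows that already exist
UPDATE p
SET p.Name  = s.Name,
    p.Price = s.Price
FROM dbo.Products AS p
JOIN dbo.Products_Staging AS s ON s.ProductId = p.ProductId;

-- insert rows that are new
INSERT INTO dbo.Products (ProductId, Name, Price)
SELECT s.ProductId, s.Name, s.Price
FROM dbo.Products_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Products AS p WHERE p.ProductId = s.ProductId);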
-Dan
I would go with the transaction approach as outlined above. The only problem I can see is that you might end up locking the whole table for the entire period the import process is taking place; you might need to think about that. The separate table approach can be one of the solutions.
I was told that there is a way to do this but I have no idea. I am hoping that someone more educated than I can help me understand the answer...
There is a table that has been imported from an external source via SSIS. The destination in SSIS needs to be updated frequently from the external source. The users are complaining about performance problems in their queries.
How would you update this destination in SSIS to achieve these goals?
Anyone have a clue? I'm "dry"...
If your users are complaining about performance then it is not an SSIS issue. You need to look at what queries are running against the table. Make sure your table has a primary key and appropriate indexes based on the columns used to sort and filter the data.
Can you give us a listing of the table definition?
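In the meantime, once you know which columns the slow queries filter and sort on, an index along these lines might help (the table and column names below are purely hypothetical):

-- hypothetical example: queries filter on CustomerId and sort on OrderDate
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
    ON dbo.Orders (CustomerId, OrderDate)
    INCLUDE (TotalAmount);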
I'll give you some advice that may improve your SSIS performance:
Use a SQL statement rather than the table dropdown when you import data from the DB into SSIS (using a SQL statement you can import, filter, convert, and sort at once).
Import and filter only the columns you need in SSIS.
Some references say you can customize your default buffer settings (DefaultBufferSize & DefaultBufferMaxRows); this can improve your data throughput.
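For the first point, the SQL command you give the source might look something like this (names are hypothetical); it pushes the filtering, conversion, and sorting down to the database instead of doing them in the data flow:

SELECT  ProductId,
        ProductName,
        CONVERT(decimal(10, 2), UnitPrice) AS UnitPrice
FROM    dbo.Products
WHERE   IsActive = 1
ORDER BY ProductId;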
I have a huge schema containing billions of records. I want to purge data older than 13 months from it and maintain it as a backup in such a way that it can be recovered again whenever required.
Which is the best way to do it in SQL - can we create a separate copy of this schema and add a delete trigger on all tables, so that when the trigger fires, the purged data gets inserted into this new schema?
Will there be only one record per delete statement if we use triggers? Or will all records be inserted?
Can we somehow use bulk copy?
I would suggest this is a perfect use case for the Stretch Database feature in SQL Server 2016.
More info: https://msdn.microsoft.com/en-gb/library/dn935011.aspx
The cold data can be moved to the cloud with your given date criteria without any applications or users being aware of it when querying the database. No backups required and very easy to set up.
There is no need for triggers; you can use a job running every day that will put outdated data into archive tables.
The best way, I guess, is to create a copy of the current schema. In the main part, delete everything that is older than 13 months; in the archive part, delete everything from the last 13 months.
Then create an SP (or several SPs) that will collect the data, put it into the archive, and delete it from the main table. Put this into a daily job.
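A minimal sketch of such a procedure, assuming hypothetical dbo.Orders and archive.Orders tables with an OrderDate column (with billions of rows you would want to run the copy and delete in batches rather than in one statement):

CREATE PROCEDURE dbo.ArchiveOldOrders
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cutoff datetime;
    SET @cutoff = DATEADD(MONTH, -13, GETDATE());

    BEGIN TRANSACTION;

    -- copy the old rows into the archive schema
    INSERT INTO archive.Orders (OrderId, OrderDate, Amount)
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderDate < @cutoff;

    -- then remove them from the main table
    DELETE FROM dbo.Orders
    WHERE OrderDate < @cutoff;

    COMMIT TRANSACTION;
END;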
The cleanest and fastest way to do this (with billions of rows) is to create a partitioned table, probably based on a date column by month. Moving data in a given partition is a metadata operation and is extremely fast (if the partition scheme and its function are set up properly). I have managed 300GB tables using partitioning and it has been very effective. Be careful with the partition function so dates at each edge are handled correctly.
Some of the other proposed solutions involve deleting millions of rows which could take a long, long time to execute. Model the different solutions using profiler and/or extended events to see which is the most efficient.
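A hedged sketch of the monthly partitioning and partition-switch approach described above (all object names are hypothetical, and the boundary values are just examples):

-- monthly boundaries; RANGE RIGHT means partition 1 holds rows before the first boundary
CREATE PARTITION FUNCTION pf_MonthlyDate (datetime)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

CREATE PARTITION SCHEME ps_MonthlyDate
AS PARTITION pf_MonthlyDate ALL TO ([PRIMARY]);

-- the big table, partitioned on the date column
CREATE TABLE dbo.Events
(
    EventId   bigint       NOT NULL,
    EventDate datetime     NOT NULL,
    Payload   varchar(200) NULL
) ON ps_MonthlyDate (EventDate);

-- an empty archive table with the same structure, on the same filegroup
CREATE TABLE dbo.Events_Archive
(
    EventId   bigint       NOT NULL,
    EventDate datetime     NOT NULL,
    Payload   varchar(200) NULL
) ON [PRIMARY];

-- switching out the oldest partition is a metadata-only operation
ALTER TABLE dbo.Events SWITCH PARTITION 1 TO dbo.Events_Archive;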
I agree with the above to not create a trigger. Triggers fire with every insert/update/delete making them very slow.
You may be best served with a data archive stored procedure.
Consider using multiple databases: the current database that holds your current data, and an archive database (or multiple archive databases) that you move your records into from the current database, with some sort of, say, nightly or monthly stored procedure process that moves the data over.
You can use the exact same schema as your production system.
If the data is already in the database, there is no need for a bulk copy. From there you can back up your archive database so it is off the SQL Server. Restore the database if needed to make the data available again. This is much faster and more manageable than bulk copy.
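A rough sketch of that nightly/monthly move (the database, table, and column names are entirely hypothetical):

-- run from a scheduled job
INSERT INTO ArchiveDB.dbo.Orders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM CurrentDB.dbo.Orders
WHERE OrderDate < DATEADD(MONTH, -13, GETDATE());

DELETE FROM CurrentDB.dbo.Orders
WHERE OrderDate < DATEADD(MONTH, -13, GETDATE());

-- then back up ArchiveDB so it can be taken off the server
-- and restored later if the data is ever needed again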
According to Microsoft's documentation on Stretch DB (found here - https://learn.microsoft.com/en-us/azure/sql-server-stretch-database/), you can't update or delete rows that have been migrated to cold storage or rows that are eligible for migration.
So while Stretch DB does look like a capable technology for archive, the implementation in SQL 2016 does not appear to support archive and purge.
When I am using the above syntax in the "Execute row script" step... it is showing success, but the temporary table is not getting created. Please help me out with this.
Yes, the behavior you're seeing is exactly what I would expect. It works fine from the TSQL prompt, throws no error in the transform, but the table is not there after transform completes.
The problem here is the execution model of PDI transforms. When a transform is run, each step gets its own thread of execution. At startup, any step that needs a DB connection is given its own unique connection. After processing finishes, all steps disconnect from the DB. This includes the connection that defined the temp table. Once that happens (the defining connection goes out of scope), the temp table vanishes.
Note that this means in a transform (as opposed to a Job), you cannot assume a specific order of completion of anything (without Blocking Steps).
We still don't have many specifics about what you're trying to do with this temp table and how you're using its data, but I suspect you want its contents to persist outside your transform. In that case, you have some options, but a global temp table like this simply won't work.
Options that come to mind:
Convert the temp table to a permanent table. This is the simplest solution; you're basically making a staging table, loading it with a Table Output step (or whatever), and then reading it with Table Input steps in other transforms.
Write the table contents to a temp file with something like a Text File Output or Serialize to File step, then read it back in from the other transforms.
Store rows in memory. This involves wrapping your transforms in a Job, and using the Copy Rows to Results and Get Rows from Results steps.
Each of these approaches has its own pros and cons. For example, storing rows in memory will be faster than writing to disk or network, but memory may be limited.
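If you go with the first option, the permanent staging table is just ordinary DDL created once up front; a minimal sketch, with made-up names and columns:

-- created once, ahead of time, instead of a temp table
CREATE TABLE dbo.staging_rows
(
    id         int          NOT NULL,
    some_value varchar(100) NULL,
    loaded_at  datetime     NOT NULL DEFAULT (GETDATE())
);

-- run at the start of each load, before the Table Output step fills it again
TRUNCATE TABLE dbo.staging_rows;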
Another step it sounds like you might need depending on what you're doing is the ETL Metadata Injection step. This step allows you in many cases to dynamically move the metadata from one transform to another. See the docs for descriptions of how each of these work.
If you'd like further assistance here, or I've made a wrong assumption, please edit your question and add as much detail as you can.
I am inserting data into a database using millions of insert statements stored in a file. Is it better to insert this row by row or in bulk ? I am not sure what the implications can be.
Any suggestions on the approach? Right now, I am executing 50K of these statements at a time.
Generally speaking, you're much better off inserting in bulk, provided you know that the inserts won't fail for some reason (i.e. invalid data, etc). If you're going row by row, what you're doing is opening the data connection, adding the row, closing the data connection. Rinse, wash, repeat - in your case tens of thousands of times (or more?). It's a huge performance hit as opposed to opening the connection once, dumping all the data in one shot, then closing the connection once. If your data ISN'T a clean set of data, you might be better off going row by row, so that a single bad row only fails that one insert instead of aborting the whole bulk load.
If you are using SSIS, I would suggest a data flow task as another possible avenue. This will allow you to move data from a flat text file, SQL table or other source and map it into your new table. Performance, I have found, is always pretty good and I use it regularly.
If your table is not created before the insert, what I do is drag an Execute SQL Task function into my process with the table creation query (CREATE TABLE....etc.) and update the properties on the data flow function to delay validation.
As long as my data structure is consistent, this works. Here are a couple screenshots.
You should definitely use BULK INSERT instead of inserting row by row. BULK INSERT is the in-process method designed for bringing data from a text file into SQL Server, and it is the fastest among the approaches described in The Data Loading Performance Guide online article.
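Note that BULK INSERT expects a data file rather than a file full of INSERT statements, so you would export the source as a flat file first. A minimal sketch, with a hypothetical target table and file path:

BULK INSERT dbo.TargetTable
FROM 'C:\data\rows.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 50000,   -- commit in batches to keep the log manageable
    TABLOCK                    -- allows minimal logging under the right recovery model
);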
The other alternative is to use a batch process that uses set-based processing over a smaller set of records (say 5,000 at a time). This can keep the server from getting totally locked up and is faster than one record at a time.
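A hedged sketch of that batching idea against a hypothetical staging table, assuming Id is the key:

-- move rows across in chunks of 5000 until nothing is left
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.TargetTable (Id, Payload)
    SELECT TOP (5000) s.Id, s.Payload
    FROM dbo.StagingTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id);

    IF @@ROWCOUNT = 0
        BREAK;
END;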
I am implementing an A/B/View scenario, meaning that the View points to table A, while table B is updated, then a switch occurs and the view points to table B while table A is loaded.
The switch occurs daily. There are millions of rows to update and thousands of users looking at the view. I am on SQL Server 2012.
My questions are:
how do I insert data into a table from another table in the fastest possible way? (within a stored proc)
Is there any way to use BULK INSERT? Or, is using regular insert/select the fastest way to go?
You could do a SELECT ColA, ColB INTO DestTable_New FROM SrcTable. Once DestTable_New is loaded, recreate indexes and constraints.
Then rename DestTable to DestTable_Old and rename DestTable_New to DestTable. Renaming is extremely quick. If something turns out to have gone wrong, you also have a backup of the previous table close by (DestTable_Old).
I did this scenario once where we had to have the system running 24/7 and needed to load tens of millions of rows each day.
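Roughly, inside the stored proc it can look like this (object names are placeholders):

-- build the new copy; SELECT ... INTO is minimally logged under
-- the simple or bulk-logged recovery models
SELECT ColA, ColB
INTO dbo.DestTable_New
FROM dbo.SrcTable;

-- recreate indexes/constraints on DestTable_New here, then swap
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.DestTable',     'DestTable_Old';
EXEC sp_rename 'dbo.DestTable_New', 'DestTable';
COMMIT TRANSACTION;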
I'd be inclined to use SSIS.
Make table A an OLEDB source and table B an OLEDB destination. You will bypass the transaction log, reducing the load on the DB. The only way (I can think of) to do this using T-SQL is to change the recovery model for your entire database, which is far from ideal because it means no transactions are logged, not just the ones for your transfer.
Setting up SSIS Transfer
Create a new project and drag a dataflow task to your design surface
Double click on your dataflow task which will take you through to the Data Flow tab. Then drag and drop an OLE DB source from the "Data flow Sources" menu, and an OLE DB destination from the "Data flow Destinations" menu
Double click on the OLE DB source, set up the connection to your server, choose the table you want to load from and click OK. Drag the green arrow from the OLE DB source to the destination then double click on the destination. Set up your connection manager, destination table name and column mappings and you should be good to go.
OLE DB Source docs on MSDN
OLE DB Destination docs on MSDN
You could do the
SELECT fieldnames
INTO DestinationTable
FROM SourceTable
as a couple answers suggest, that should be as fast as it can get (depending on how many indexes you'd need to recreate, etc).
But I would suggest using synonyms in order to change the pointer from one table to another. They're very transparent and, in my opinion, cleaner than updating the view or renaming tables.
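A small sketch of the synonym flip (the object names are hypothetical):

-- one-time setup: the application always queries dbo.CurrentProducts
CREATE SYNONYM dbo.CurrentProducts FOR dbo.ProductsA;

-- after dbo.ProductsB has been reloaded, flip the pointer
DROP SYNONYM dbo.CurrentProducts;
CREATE SYNONYM dbo.CurrentProducts FOR dbo.ProductsB;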
I know the question is old, but I was hunting for an answer to the same question and didn't find anything really helpful. Yes the SSIS approach is a possibility, but the question wanted a stored proc.
To my delight I have discovered (almost) the solution that the original question wanted; you can do it with a CLR SP.
Select the data from TableA into a DataTable and then use the WriteToServer(DataTable dt) method of the SqlBulkCopy class with TableB as the DestinationTableName.
The only slight drawback is that the CLR procedure must use external access in order to use SqlBulkCopy, and it does not work with the context connection, so you need to fiddle a little bit with permissions and connection strings. But hey! Nothing is ever perfect.
INSERT... SELECT... functions fairly similarly to BULK INSERT. You could use SSIS, like @GarethD says, but that might be overly complex if you're just copying rows from table1 to table2.
If you are copying serious quantities of data, keep an eye on the transaction log -- it can bloat up pretty fast when doing huge inserts. One work-around is to "chunkify" the data you are inserting, by looping over an insert statement that processes, say, only 100,000 or 10,000 rows at a time (depending on how wide your rows are, i.e. how many MB per pass).
(Just curious, are you doing ALTER VIEW to reset the view? I did something similar once, though we had to have four tables and four views to support past/present/next/swap sets.)
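If you are, the flip itself is just a quick statement like this sketch (the view and table names are hypothetical):

-- after TableB has been reloaded, repoint the view the application reads
ALTER VIEW dbo.ProductsView
AS
SELECT ColA, ColB
FROM dbo.TableB;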
You can simply do it like this:
SELECT * INTO A FROM B WHERE [criteria]
This selects the data from B based on the criteria and creates table A with that data. Note that SELECT ... INTO creates A, so it must not already exist; if A already exists, use INSERT INTO A SELECT ... instead. You can also specify column names instead of * if you only need some of the columns.