SQL Server SSIS Transaction Handling - sql

My organization is upgrading to SQL Server 2008 from 2000 (yay!) and I am unfamiliar with the inter workings of Integration Services.
I currently manage very large databases that store business process transactions that amount to between 500000 and 1000000 transactions daily. We've had very poor management of the databases in the past and they have thus grown to an unmaintainable size. I'm working to provide daily archival of the databases so that the working databases are more manageable. I wrote several stored procedures to do an archive and subsequently prune the working databases. However, in dabbling with Integration Services, I've found great built in functionality for the job that my SPs currently do.
What I've created are several SSIS packages that perform an export/import. Since I'm only interested in certain data, I use a custom query in the packages that is of the form:
DELETE TransactionTable
OUTPUT
DELETED.*
WHERE (EventTimestamp >=
DATEADD(D, 0, DATEDIFF(D, 0, (SELECT MIN(EventTimestamp) FROM TransactionTable)))
)
AND (EventTimestamp <
DATEADD(HH, 0, (DATEADD(YY, -1, DATEDIFF(D, 1, GETDATE()))))
);
This query grabs the data I'm interested in and deletes it from the working table. Using SSIS, this query produces the output that is placed into the archive table.
My question(s) are:
Since I want to import records into my archive and delete those records from the working database within the same SSIS package to ensure consistency, this query seems to be the way to do it. However, I'm concerned about the structure of the transaction. I'm deleting records from my working database as output to be inserted into my archive database. How does SQL Server handle errors in this case? Is running this package safe? What happens if the output generated by the statement is invalid and an error occurs? Does the statement get rolled back? Will the DELETE only be committed if all of the output was able to be transferred to the archive? If not, how might I be able to achieve a fail-safe condition?

The good news is, you can set the SSIS package to rollback a set of tasks if any failure occurs.
There is a TransactionOption property that is available on pretty much every container/task, including the package itself. You can set it to Required, Supported, and NotSupported.
You can details on each option here: http://msdn.microsoft.com/en-us/library/ms137690.aspx
Obviously play around with this a bit, forcing errors on different steps to see what the result is.

Related

How do I copy data from one Azure database table to a different Azure database table and also convert data types?

I have to copy data from one table to another, the tables are held in two different databases within Azure. I did a quick search for answers to this and whilst a query seems fairly straight forward i.e.
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, ref_no
FROM database2.dbo.table2
I encountered issues because I'm using Azure.
Msg 40515, Level 15, State 1, Line 16 Reference to database and/or
server name in 'database2.dbo.table2' is not supported in this version of
SQL Server.
The above issue led me to the Cross-Database Queries articles. My requirements are a little more complicated than some of the scenarios provided and I need some help in making it work.
I also need to convert some columns such as reg_no which is a 'string' to an 'int' and then copy the value to the 'serial' column.
My question is, what the best way to create a script for this that allows me to reference both databases without any errors, copy the data and convert the columns at the same time? I tried the simple way of exporting data and importing it, editing the mappings for the columns, it wasn't that good I found and was causing problems all over the place.
Any guidance is appreciated on this.
You're getting this error because there's no linked server by default. You'll need to add it, in order to access the secondary db server. Here's a link about how to do it:
https://www.sqlshack.com/create-linked-server-azure-sql-database/
In terms of the transformation. It depends on many factors e.g. amount of rows, frequency, etc..
Usually the best alternative is by using an external tool (ETL) such as SSIS / Azure Data Factory because you can schedule it's execution and get the status of each execution.

Executing T-sql query from hard drive

I have T-SQL queries stored on a hard drive: I:\queries\query1.sql and I:\queries\query2.sql.
I usually work in a way that I execute a query from a drive, and then I copy results into Excel, and then I work on it.
My problem here is that query1.sql is already long, and now I would like to extend it by getting a result of query2.sql, and join it with a result of query1.sql.
What I could do is appending a code from query2.sql to query1.sql. But then the query is getting really long and hard to maintain.
I would like to do something like this:
SELECT * FROM ("Result of I:\queries\query1.sql") q1
LEFT JOIN ("Result of I:\queries\query2.sql") q2 ON q1.ID=q2.ID
Is there any way to write a query or stored procedure, which will be again stored on a drive to do this?
Basically, you need to ask your DBA for a database when you are able to store things in the database. This can be on the same system where the data is stored. Or, it could be on a linked system. Gosh, you could run SQL Server locally and store the information and data there.
Then, the queries that you are storing in files should be views in the database. You can then run the queries and store and combine the results locally.
You are essentially recreating database functionality using text files and data files -- going through a lot of effort when SQL Server already supports this functionality.
To expand on Gordon's comment (+1), why are you running scripts off of a drive? Most DBA's I've known would treaten bodily harm over this as executing code that they can't control / troubleshoot / see source code control on brings a whole host of security and supportability issues.
Far better to store this code in a Stored Procedure, which will have a saved query execution plan, can be tracked using various DMV's, and have permissions assigned to it, then your outside Excel doc can just set a connection and execute the SP.

Azure SQL "select" query not showing all rows

I just used the SQLAzureMW (SQL Azure Migration Wizard Tool) to migrate my SQL Server database to Azure SQL. It went off without a hitch - all my tables are there, the website is running fine off it, etc.
Here's what's odd: if I execute a simple SELECT statement against my tables, I get only a few of the rows. I assumed they were missing, but my website is using some of those records as if they're there. So I queried with a WHERE clause and BAM - they showed up. How the... what the... why isn't my select showing me everything? This applies to many of the tables I've tested.
SQL Azure
On-Premise
I gave up on MS SQL Management Studio and am instead using SQL Server Object Explorer from Visual Studio 2012/2013. It functions properly and allows inline editing of data.
Consider this SELECT statement:
SELECT
SvcTimeID,
LoginName,
MeanSeconds,
MedianSeconds,
RequestCount,
StdDevSeconds,
SvcDate,
CAST (TS AS INT) AS TS
FROM dbo.SvcTime
WHERE SvcDate >= #SvcDate
Where the parameter is set:
cmd.Parameters["#SvcDate"].Value = DateTime.UtcNow - new TimeSpan(31, 0, 0, 0);
Execute that statement in an Azure Web Role - brought back, say 24 rows.
Now, insert two new rows; wait at least one minute; execute the statement again. Do the recently inserted rows appear? In my case, they did not. Note: the default value of SvcDate in the database is getutcdate().
Move the SQL Azure database from the web edition to the standard (S2) edition. Rows magically appear.
Here is my theory. The issue you had was not with MS SQL Management Studio but with SQL Azure itself where, under certain circumstances, the same query will return the original rows from a cache someplace and will miss the new rows in the database.
This has blown any remaining confidence I had with Azure.
I was scared at first, but I think this has an explanation:
If you inserted some rows in connection "A" and can't find them in other sessions, maybe you have a uncommited transaction. By default, in SQL Server on premise, your second connections would hung until transaction is commited or rolled back. (Isolation level read committed)
Somehow, using the same isolation level, Azure acts differently. I seems to work in some cases as a snapshot isolation. Because of that, you can read from the table, but results are not updated. Or maybe the lock are set in a different way.
To solve this, check sysprocesses for sessions with open_tran > 0 or just be careful commmiting trans. In the example, running commit in your session "A" should do it.
Good luck!

How do I create and synchronize a combined reporting-only db from two live dbs?

I need to quickly implement a read-only database containing data pulled from two identically structured live databases.
The live dbs are actually company dbs from a Dynamics accounting system so I'm happy for any Dynamics specific advice but this is mostly a SQL question. It's a fairly old version of Dynamics from before Great Plains was acquired by Microsoft. This is on SQL Server 2000.
We have reports and applications which access the Dynamics data. These apps are designed to look at one company db. Now we need to add another. It's appropriate that most of these reports and apps see combined data. They don't really care which company an order or invoice exists in. They only look at a small number of the tables.
It seems to me that the simplest solution is to create a reports only db with combined data. Preferably, we need an efficient way to update this db with changes several times a day.
I'm a developer, not a db expert but here's my plan:
Create the combined reporting db with the required tables initially with the same table structure as the live dbs.
All Dynamics tables seem to have an int identity column called DEX_ROW_ID. I'm not sure what it's used for, (it's not indexed) but that seems like the obvious generic way to uniquely identify rows. On the reporting db I will change it to a normal int (not an identity). I will create a unique index on DEX_ROW_ID in all dbs.
Dynamics does not have timestamps so I will add a timestamp column to tables in the live dbs and a corresponding binary(8) column in the reporting db. I'm assuming and hoping that Dynamics won't be upset by the additional index and column.
Add an int CompanyId column to the reporting db tables and add it to the end of any unique indexes. Most data will be naturally unique even without that. ie, order and invoice numbers etc will be different for the two live dbs. We may need to make some minor changes to the applications but I'm not expecting to do much other than point them to the new reporting db.
Assuming my reporting db is called Reports, the live dbs are Live1 and Live2, the timestamp column is called TS and all dbs are on the same server ... here's my first attempt at an update script for copying the changes in one table called MyTable in Live1 to the reporting db.
USE Reports
CREATE TABLE #Changes
(
ReportId int,
LiveId int
)
/* Collect in a temp table the ids or rows which have been deleted or changed
in the live db L.DEX_ROW_ID will be null if the row has been deleted */
INSERT INTO #Changes
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM MyTable R LEFT OUTER JOIN Live1.dbo.MyTable L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = 1 AND L.DEX_ROW_ID IS NULL OR L.TS <> R.TS
/* Delete rows that have been deleted or changed on the live db
I wonder if using join syntax would run better than the subquery. */
DELETE FROM MyTable
WHERE CompanyId = 1 AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)
/* Recopy rows that have changed in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable L
WHERE L.DEX_ROW_ID IN (SELECT ReportId FROM #Changes WHERE LiveId IS NOT NULL)
/* Copy the rows that are new in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable
WHERE DEX_ROW_ID > (SELECT MAX(DEX_ROW_ID) FROM MyTable WHERE CompanyId = 1)
Then do the same for the Live2 db. Repeat for every table in Reports. I know I should use a parameter #CompanyId instead of the literal but I can't do that for the live db name some I might generate these dynamically with a C# program or something.
I'm looking for any advice, suggestions or critique on what I'm doing here. I know it won't be atomic. Things could be happening on the live db while this script runs. I think we can live with that. We'll probably do a full copy either nightly or weekly when nothing is happening on the live dbs.
We need to favor performance over elegance or perfection. Some initial testing has the first query with the TS comparisons running at about 30 seconds for the biggest table so I'm optimistic that this is going to work but I'd also like to know if I'm missing something obvious or not seeing the forest for the trees.
We don't really want to deal with log files on the reporting db. Can we just set that to simple recovery model and forget about logs?
Thanks
I think there are a couple open questions here.
Do you need these reports to be near-real-time? Or is this this sort of reporting that could live with daily updates? But assume you need up-to-the-minute data.
Have you considered querying the databases directly and merging the data per-report on the fly? You'll have to do a lot of reporting to duplicate the effort that's going to go into designing, creating, and supporting a real-time merged replicated database.
Thirty seconds is (IMHO) unacceptable for any single query against a production database. There could be any number of tuning-related reasons for taking this long, but it at least means you're going to need serious professional SQL Server optimization resources (i.e. people). And if this is a problem for the queries for reports, it doesn't bode well for the queries to maintain a separate database for reporting.
Tuck into the back of your mind the consideration that, if you need to consolidate to a single database, it's worth considering whether you should make it an OLAP database rather than a mirror. The mirror will be quicker and easier, but the OLAP would be far more flexible and powerful in the long term; and it might be well to go the whole way from the beginning.
The last thing I'd want to do is write a custom update script. Try these bulletproof methods first:
Let's hope your production databases are backed up. Restore those backups every night to the reporting server. You can automate restores with the RESTORE command, which will work with a file on a network server.
Use SQL Server replication to push data from the live servers to the backend.
Schedule a DTS package every night to import the entire production database.
This might seem like brute force. But since you're copying a 2000-era database, brute force cannot be a problem with today's hardware. As an added advantage, these methods can be supported by a sysadmin instead of a developer.
Method 1 has the added added advantage of serving as backup verification. :)

best way in producing a master script for SQL

i want to extract specific database tables & stored procedures into one master script. Do you know any software that can help me do this faster? I've tried using the SQL Database publishing tool, but it's not that efficient since its gathering tables that I didn't select.
In SQL Server 2005, right click on the database, then select Tasks, and then select Generate Scripts.
Generating SQL Scripts in SQL Server 2005
As mentioned in that link, I'm fairly sure you have to generate the DROP and CREATE statements separately.
Try DBSourceTools. http://dbsourcetools.codeplex.com
Its open source, and specifically designed to script databases - tables, views, procs to disk.
It also allows you to select which tables, views, db-objects to script.
I use Redgate SQL compare for this (by comparing to an empty DB), as well as for doing upgrades between all my DB versions (I save a copy of the DB for each released version, and then just do a compare between current and previous to get a change script for that version).
I have found the "Generate Scripts" does a bad job in some cases with dependencies - eg, it will try to create a stored procedure that uses a table before the table is created, causing the script to fail. I'll accept I'm possibly using it wrong, but SQL Compare "just works". The scripts it generates are also enclosed in a transaction -- so if something fails, the whole change is rolled back. You don't end up with a half-populated or half-upgraded database.
Downside is that this is a commercial tool, but IMHO worth the money.