How can I use transactions that span procedures chained across multiple servers? - sql-server-2005

I'm trying to test a proposition that one of our vendors presented to us for accessing their product database, and it concerns queries and transactions that span multiple servers. I've never done this directly on the database before and, to be frank, I'm clueless, so I'm trying to mock up a proof that this works at least conceptually.
I've got two SQL Server 2005 servers. Let's for argument's sake call them Server1 and Server2 [hold your applause], each containing a dummy database. The dummy database on Server1 is called Source and the one on Server2 is called Destination, just to keep things simple. The databases each hold a single table, called Input and Output respectively, so the structure looks like this:
Server1.Source.dbo.Input
Server2.Destination.dbo.Output
I have a stored procedure on Server2 called WriteDataToOutput that receives a single varchar argument and writes its content to the Output table.
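For concreteness, that procedure might look something like this (the column name and varchar length here are just assumptions to make the example self-contained):
CREATE PROCEDURE dbo.WriteDataToOutput
    @value varchar(100) -- length is an assumption
AS
BEGIN
    -- OutputValue is a hypothetical column name
    INSERT INTO dbo.Output (OutputValue) VALUES (@value);
END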
Now the trickiness starts:
I want to create a stored procedure on Server1.Source that calls the WriteDataToOutput stored procedure defined on Server2, which seems like the simple step.
I want this call to be part of a transaction so that if the procedure that invokes it fails, the entire transaction is rolled back.
And here endeth my knowledge of what to do. Can anyone point me in the right direction? I tried this on two different databases on the same server, and it worked just fine, leading me to assume that it will work on different servers, the question is, how do I go about doing such a thing? Where do I start?

As others have noted, I agree that a linked server is the best way to go.
Here are a couple of pointers that snagged me the first time I dealt with linked servers:
If the linked server is an instance, make sure you bracket the name. For example [SERVERNAME\INSTANCENAME].
Use an alias for the table or view from the linked server or you will get a "multi-part identifier cannot be bound" error. Names are limited to four parts. For example, SERVER.DATABASE.dbo.TABLE.FIELD has five parts and will give an error. However, SELECT linked.FieldName FROM SERVER.DATABASE.dbo.TABLE AS linked will work fine.
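To make the contrast concrete, using the placeholder names above:
-- fails: five-part name (server.database.schema.table.column)
SELECT SERVER.DATABASE.dbo.TABLE.FIELD FROM SERVER.DATABASE.dbo.TABLE
-- works: alias the four-part table name, then qualify columns with the alias
SELECT linked.FieldName FROM SERVER.DATABASE.dbo.TABLE AS linked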

You will want to link the servers:
http://msdn.microsoft.com/en-us/library/aa213778.aspx
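If you haven't done this before, a minimal sketch using the server names from the question (run on Server1; the login mapping below is the simplest possible choice, adjust for your security setup):
EXEC sp_addlinkedserver @server = 'Server2', @srvproduct = 'SQL Server';
-- map your local logins to themselves on the remote side
EXEC sp_addlinkedsrvlogin @rmtsrvname = 'Server2', @useself = 'true';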

For step 2 you need to have the Distributed Transaction Coordinator (MSDTC) service running, and you also need to use SET XACT_ABORT ON to make sure it will all roll back.
You also need to enable RPC, which is turned off by default in 2005 and up.
There is a whole bunch of stuff that can bite you in the neck here.
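For the RPC part, a minimal sketch, again assuming the linked server from the question is named Server2 (run on Server1):
-- allow remote procedure calls in both directions over the linked server
EXEC sp_serveroption 'Server2', 'rpc', 'true';
EXEC sp_serveroption 'Server2', 'rpc out', 'true';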

MSDN says you can have transactions across linked servers if you use the command BEGIN DISTRIBUTED TRANSACTION.
I remember, though, that I had problems calling a stored procedure on a linked server, but I worked around it rather than solving it.
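Putting the pieces together, a hedged sketch of what the procedure on Server1 could look like; the procedure name CopyInputToOutput and the column SomeColumn are hypothetical, everything else comes from the question:
CREATE PROCEDURE dbo.CopyInputToOutput
AS
BEGIN
    SET XACT_ABORT ON; -- any runtime error aborts and rolls back the whole distributed transaction
    BEGIN DISTRIBUTED TRANSACTION;
        DECLARE @data varchar(100);
        SELECT TOP 1 @data = SomeColumn FROM dbo.Input; -- local read on Server1.Source
        EXEC Server2.Destination.dbo.WriteDataToOutput @data; -- remote call, needs 'rpc out' and DTC
    COMMIT TRANSACTION;
END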

Using linked servers, you can run stored procedures on either server within a single transaction using the DTC (Distributed Transaction Coordinator). You will definitely want to do some performance analysis: I have found that some SPs using links can drastically slow down database performance, especially if you try to join result sets from the two servers.

Set up a linked server, then you should be able to execute selects/inserts/updates across the servers. Something like:
INSERT INTO Server2.Destination.dbo.Output
SELECT * FROM Input
WHERE <Criteria>
This assumes you are running the query from Server1.Source, so you wouldn't need to fully qualify.

Related

Best practice: sending a stored procedure for "SQL Command from Variable" in OLE DB Source?

In a SSIS ETL, I have a query that I need to run on a server/db that does not allow us to create stored procedures.
I would normally use the stored procedure in my variable as the source for my OLE DB Source.
However, since we can't put the stored procedure on this server, I was going to store the code for the stored procedure in a variable by executing a SQL statement that retrieves the text from our home database, then use the text stored in this variable as the SQL command for the source.
This way, I can still remotely change the SSIS OLE DB Source object WHERE clause (as long as I don't change the SELECT portion).
I can't imagine that this is very common, so I wanted to get some opinions - is there a better way to do this? I don't want to put all of the code for this SP into the OLE DB Source editor directly because we can't afford to redeploy in case of a WHERE clause update.
You've got the part down that many folks don't do and that's using Variables to drive your package execution. You are further correct in that you can't exactly swap out your columns. To be pedantic, which I am, you can completely change out the query as long as the same metadata is presented.
So, then this question becomes how best to accomplish allowing a package to have a query's filter driven by an external force. Factoring in maintainability, ease of debugging, etc.
My gut reaction is 3 Variables
QueryBase: String. Hardcoded. SELECT * FROM MyTable except of course I'd enumerate my columns
Query: String. EvaluateAsExpression = True. Expression: @[User::QueryBase] + @[User::QueryFilter]
QueryFilter: String
So, we use Query in the OLE DB Source much as you have your longer variable name in there. The only downside to this approach, pre-2012 SSIS, is the limitation on string length in an expression. It was ... 4k I believe. If you assign a value of 5k characters to a variable, it's fine; it's just that in the expression language, adding two strings together can't exceed 4k.
I didn't specify what QueryFilter is going to have in it or the magic to get it there. That, I would base on the bigger picture of your environment, usage, etc. but the general concept is that it will eventually turn into WHERE Condition1 IS NOT NULL but maybe in a full reload situation, it becomes an empty string.
So, what are our options for changing the value of QueryFilter?
/SET is an optional parameter passed to the invoking process (dtexec.exe) that makes SSIS packages go. If you have a very limited set of choices and aren't interested in building additional infrastructure out to support the parameters, just hard code some examples. Approximately dtexec /file p1.dtsx /set \Package.Variables[User::QueryFilter].Properties[Value];" WHERE Condition1 IS NOT NULL" Save it into .bat files, different sql agent jobs, whatever. Click and run and you're done.
Configuration approach. SSIS offers native ability to use configurations from a SQL Server table, XML, Registry, Parent Package and Environment Variable for 2005 to current edition. The only downside to this approach is that it would not support concurrent execution with different parameters like the first would.
Environment approach. 2012 and 2014, with their new Project Deployment Model, give us the concept of Environments within the SSISDB catalog which is similar to configuration with a SQL Server table but it is done after development is complete and the packages are deployed. It's rather nice as it builds out a history of values used so if someone asks why is the data all wrong, you can write a query to pull back the parameters used and Oh look someone used the initial load filter instead of the daily. Whoopsidaisy. Same concern over concurrent execution and changing values.
Table driven approach. Instead of using the Configuration with SQL Server table backing, you roll your own table and then add into your package an Execute SQL Task to retrieve the filter, Single Row, into our QueryFilter Variable (see the sketch after this list).
Script Task. Use whatever floats your boat to determine what the filter should be.
Message Queue. They have built in a Message Queue Task and might be of use here if you're already doing it. Otherwise, too much effort to manage
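For the table-driven approach mentioned above, a minimal sketch; the table dbo.PackageFilter and its columns are hypothetical:
CREATE TABLE dbo.PackageFilter
(
    PackageName varchar(128) NOT NULL PRIMARY KEY,
    FilterText varchar(4000) NOT NULL -- empty string means full reload, no filter
);
INSERT INTO dbo.PackageFilter (PackageName, FilterText)
VALUES ('MyPackage', ' WHERE Condition1 IS NOT NULL');
-- the Execute SQL Task runs this with ResultSet = Single Row, mapped to User::QueryFilter
SELECT FilterText FROM dbo.PackageFilter WHERE PackageName = 'MyPackage';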

Query SQL Linked Server only pulling data from one server

I have four SQL Servers that are named in the following way:
dbs
dbs2
dbs3
dbs4
I have a table that is on dbs3 called table1 in database1. This table does not exist on the other servers. However when I run the query:
select *
from dbs.database1.dbo.table1 (or any of the database servers)
it returns the results as if I had queried the existing table on dbs3. It is like the DBMS is ignoring the four-part naming in the query and returning the results from the table on dbs3 no matter which server I designate in the four-part name. Any ideas what could be going on here? The servers appear in the linked servers list.
If you can make changes without breaking stuff (or if it's already broken enough in your opinion), I recommend recreating your linked servers. If your linked server is another SQL Server, you can do
exec sp_dropserver 'dbs'; -- pass 'droplogins' as a second argument if the definition has remote logins attached
exec sp_addlinkedserver 'dbs';
This creates a linked server definition with the default configuration, which is appropriate for most applications (and can still be tweaked afterwards).
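Before dropping anything, it is worth checking where each definition actually points. A quick sanity check:
-- data_source is the machine each linked server actually connects to;
-- if all four rows point at dbs3, that explains the behavior
SELECT name, data_source, provider FROM sys.servers WHERE is_linked = 1;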

SQL Server: is it possible to get data from another SQL server without setting linked server?

I need to do the following query (for example):
SELECT c1.CustomerName FROM Customer as c1
INNER JOIN [ExternalServer].[Database].[dbo].[Customer] as c2
ON c2.RefId = c1.RefId
For some security reason my client doesn't allow me to create a linked server. The user under whom I execute this query has access to both tables. Is it possible to make it work without using linked server? Thanks.
You could use OPENROWSET, which'll require the connection info, username & password...
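A hedged sketch of what that could look like; the provider, connection string, and credentials are placeholders, and the server must have the 'Ad Hoc Distributed Queries' option enabled:
SELECT c1.CustomerName
FROM Customer AS c1
INNER JOIN OPENROWSET('SQLNCLI',
        'Server=ExternalServer;Database=Database;Uid=someUser;Pwd=somePassword;',
        'SELECT RefId FROM dbo.Customer') AS c2
    ON c2.RefId = c1.RefId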
While I understand that the client believes that having an always-on connection to their data is risky, that's why you lock down the account. OPENROWSET means including the connection info in plain text.
'Linked Server' is a very specific thing -- basically, a permanent connection between servers. I can think of all sorts of reasons not to want that, while at the same time having no problem with folks writing queries that combine data from the two different data sources.
Anyway, depending on your requirement -- if this is just for ad hoc querying, OPENROWSET is good if inside of SQL-Server, or if you want to do this in MS Access, just link to the two tables, and your Access query won't care that one comes from one server, and one comes from another.
Alternatively, with a web or Windows front-end, you could independently query each table into a data object, and then build a separate query on top of that.
Http Endpoints...
WebServices...
There's a million ways. I wouldn't be so quick to assume, as @Lasse suggests, that any form of 'linking' this data together would make you some kind of rogue data linker.

Local vs Global temp tables - When to use what?

I have a report which on execution connects to the database with my_report_user username. There can be many end-users of the report. And in each execution a new connection to the database will be made with my_report_user (there is no connection pooling)
I have a result set which I think can just be created once (may be on the first run of the report) and other report executions can just reuse that stuff. Basically each report execution should check whether this result set (stored as temp table) exists or not. If it does not exist then create that result set else just reuse whats available.
Should I use local temp tables (#) or global temp tables (##)?
Has anyone tried such stuff and if yes, please let me know what all things should I care about? (Almost simultaneous report runs, etc.)
EDIT: I am using SQL Server 2005
Neither
If you want to cache result sets under your own control, then you cannot use temp tables of any kind. You should use ordinary user tables, stored either in tempdb, or even have your own result-set cache database.
Temp tables, both #local and ##global, have a lifetime controlled by the connection(s). If your application disconnects, the temp table is deleted, and that does not work well with what you describe.
The really difficult problem will be populating these cached result sets under concurrent runs without mixing things up (ending up with result sets containing duplicate items from concurrent report runs that each believed it was the 'first' run).
As a side note SQL Server Reporting Services already does this out-of-the-box. You can cache and share datasets, you can cache and share reports, it already works and was tested for you.
I find #temp tables can be useful in certain scenarios, but not as a best practice. I have yet to find a valid use for global ##temp tables, either in my own work, or in the work of anyone else who has written about them. The only case I can think of is BCP or other external process which needs to build a temporary data store and then retrieve it in some subsequent step. In that case I would prefer to use a permanent table with some kind of key and a background process to handle cleanup.
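Following that suggestion, a minimal sketch of the check-then-populate pattern against a permanent table; the table, columns, and key are hypothetical, and the locking hints are one way to stop two simultaneous 'first' runs from double-populating it:
BEGIN TRANSACTION;
IF NOT EXISTS (SELECT 1 FROM dbo.ReportCache WITH (UPDLOCK, HOLDLOCK)
               WHERE CacheKey = @CacheKey)
BEGIN
    -- the expensive report query goes here; Payload stands in for its columns
    INSERT INTO dbo.ReportCache (CacheKey, CreatedAt, Payload)
    SELECT @CacheKey, GETDATE(), SomeExpensiveExpression FROM dbo.SourceData;
END
COMMIT TRANSACTION;
SELECT Payload FROM dbo.ReportCache WHERE CacheKey = @CacheKey;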
It sounds like you are moving out of OLTP territory here. Reading up on data warehousing will definitely help you.

SQL Timeouts and SSIS

I've an SSIS package that runs a stored proc for exporting to an Excel file. Everything worked like a champ until I needed to do a bit of rewriting on the stored proc. The proc now takes about 1 minute to run and the exported columns are different, so my problems are the following:
1) SSIS complains when I hit the preview button "No column information returned by command"
2) It times out after about 30 seconds.
What I've done.
Tried to clean up/optimize the query. That helped a bit, but it still is doing some major calculations and it runs just fine in SSMS.
Changed the timeout values to 90 seconds. Didn't seem to help. Maybe someone here can?
Thanks,
Found this little tidbit which helped immensely.
No Column Names
Basically all you need to do is add the following to your SQL query text in SSIS.
SET FMTONLY OFF -- return real result-set metadata instead of the format-only pass SSIS triggers by default
SET NOCOUNT ON -- suppress "N rows affected" messages, which can confuse SSIS when it reads results
Only problem now is it runs slow as molasses :-(
EDIT: It's running just too damn slow.
Changed from using #tempTable to tempTable. Added in appropriate drop statements. Argh...
Although it appears you may have answered part of your own question, you are probably getting the "No column information returned by command" error because the table doesn't exist at the time it tries to validate the metadata. Creating the tables as non-temporary tables resolves this issue.
If you insist on using temporary tables, you can create the temporary tables in the step preceding the data flow. You would need to create it as a ## table and turn off connection sharing for the connection for this to work, but it is an alternative to creating permanent tables.
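A sketch of what that preceding Execute SQL Task might run; the table name and columns are hypothetical stand-ins for whatever the proc returns:
IF OBJECT_ID('tempdb..##ExportStage') IS NOT NULL
    DROP TABLE ##ExportStage;
CREATE TABLE ##ExportStage
(
    Id int NOT NULL,
    ExportValue varchar(255) NULL
);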
A shot in the dark based on something obscure I hit years ago: When you modified the procedure, did you add a call to a second procedure? This might mess up SSIS's ability to determine the returned data set.
As for (2), does the procedure take 30+ or 90+ seconds to run in SSMS? If not, do you know that the query is actually getting into SQL from SSIS? Might be worth firing up SQL Profiler to see what's actually being sent to SQL Server. [Which was the way I found out my obscure factoid.]