Local vs Global temp tables - When to use what?

I have a report that, on execution, connects to the database as the user my_report_user. There can be many end users of the report, and each execution opens a new connection to the database as my_report_user (there is no connection pooling).
I have a result set that I think can be created just once (maybe on the first run of the report) and then reused by other report executions. Basically, each report execution should check whether this result set (stored as a temp table) exists. If it does not exist, create it; otherwise, reuse what is already there.
Should I use local temp tables (#) or global temp tables (##)?
Has anyone tried something like this? If so, what should I watch out for (for example, nearly simultaneous report runs)?
EDIT: I am using SQL Server 2005.

Neither.
If you want to cache result sets under your own control, you cannot use temp tables of any kind. You should use ordinary user tables, stored either in tempdb or in a dedicated result-set cache database of your own.
Temp tables, both #local and ##shared (global), have a lifetime controlled by the connection(s). A #local table is visible only to the connection that created it and is dropped when that connection closes; a ##global table is dropped once its creating connection closes and no other connection is still referencing it. Because every report run opens and closes its own connection, the cached table would disappear between runs, which does not work with what you describe.
The really difficult problem will be populating these cached result sets under concurrent runs without mixing things up (for example, ending up with result sets containing duplicate items from concurrent report runs that both believed they were the 'first' run).
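A minimal T-SQL sketch of that advice, assuming SQL Server 2005 or later: the result set lives in a permanent table, a status row records that it has been built, and sp_getapplock makes every report run take an exclusive lock before the check-and-populate step, so two runs can never both act as the 'first' run or read a half-filled cache. All object names (dbo.ReportResultCache, dbo.ReportCacheStatus, dbo.GetReportData, and the source table dbo.SourceItems) are hypothetical.

```sql
-- Hypothetical permanent cache objects (ordinary user tables, not #/## temp tables).
CREATE TABLE dbo.ReportResultCache (
    ItemId   INT           NOT NULL PRIMARY KEY,
    ItemName NVARCHAR(100) NOT NULL
);
CREATE TABLE dbo.ReportCacheStatus (
    CacheName   SYSNAME  NOT NULL PRIMARY KEY,
    PopulatedAt DATETIME NOT NULL
);
GO

CREATE PROCEDURE dbo.GetReportData
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a runtime error rolls back a partial populate

    DECLARE @rc INT;

    BEGIN TRANSACTION;

    -- Every report run serializes on this named lock. The lock is owned by
    -- the transaction and released automatically at COMMIT or ROLLBACK.
    EXEC @rc = sp_getapplock
         @Resource    = N'ReportResultCache.populate',
         @LockMode    = 'Exclusive',
         @LockOwner   = 'Transaction',
         @LockTimeout = 30000;  -- milliseconds

    IF @rc < 0  -- negative return codes: timeout, deadlock victim, or error
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Could not acquire the cache lock (return code %d).', 16, 1, @rc);
        RETURN;
    END;

    IF NOT EXISTS (SELECT 1 FROM dbo.ReportCacheStatus
                   WHERE CacheName = N'ReportResultCache')
    BEGIN
        -- First run only: build the result set (dbo.SourceItems is hypothetical).
        INSERT INTO dbo.ReportResultCache (ItemId, ItemName)
        SELECT ItemId, ItemName
        FROM dbo.SourceItems;

        INSERT INTO dbo.ReportCacheStatus (CacheName, PopulatedAt)
        VALUES (N'ReportResultCache', GETDATE());
    END;

    COMMIT TRANSACTION;

    -- Every run, first or not, reads the shared cached rows.
    SELECT ItemId, ItemName FROM dbo.ReportResultCache;
END;
GO
```

Using a separate status row, rather than testing whether the cache table has any rows, means an empty result set still counts as populated. Invalidating or refreshing the cache is not covered here; deleting the status row and the cached rows while holding the same app lock would be one way to do it.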
As a side note, SQL Server Reporting Services already does this out of the box. You can cache and share datasets and cache and share reports; it already works and has been tested for you.

I find #temp tables useful in certain scenarios, but not as a best practice. I have yet to find a valid use for global ##temp tables, either in my own work or in the work of anyone else who has written about them. The only case I can think of is BCP or another external process that needs to build a temporary data store and then retrieve it in some subsequent step. In that case I would prefer to use a permanent table with some kind of key and a background process to handle cleanup.
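A sketch of that permanent-table alternative, assuming SQL Server 2005 or later and a hypothetical dbo.ReportWorkData table: each producer tags its rows with its own RunId (a GUID), a later step reads only that key, and a scheduled cleanup (for example, a SQL Server Agent job step) removes stale rows. The column names, the illustrative row, and the one-day retention are placeholder choices, not requirements.

```sql
-- Hypothetical work table: every process tags its rows with its own RunId,
-- so concurrent runs never see or delete each other's data.
CREATE TABLE dbo.ReportWorkData (
    RunId     UNIQUEIDENTIFIER NOT NULL,
    ItemId    INT              NOT NULL,
    Amount    DECIMAL(18, 2)   NULL,
    CreatedAt DATETIME         NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_ReportWorkData PRIMARY KEY (RunId, ItemId)
);
GO

-- Step 1 (producer): stage data under a fresh key and pass the key along.
DECLARE @RunId UNIQUEIDENTIFIER;
SET @RunId = NEWID();

INSERT INTO dbo.ReportWorkData (RunId, ItemId, Amount)
VALUES (@RunId, 1, 10.00);  -- illustrative values only

-- Step 2 (consumer, possibly a different process): read only that run's rows.
SELECT ItemId, Amount
FROM dbo.ReportWorkData
WHERE RunId = @RunId;
GO

-- Background cleanup, e.g. a scheduled job step: drop rows older than a day.
DELETE FROM dbo.ReportWorkData
WHERE CreatedAt < DATEADD(DAY, -1, GETDATE());
```

Unlike a ##global table, the data survives the producer disconnecting, and the RunId key keeps simultaneous runs apart.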

It sounds like you are getting into an OLTP mode now. Reading up on data warehousing will definitely help you.


SQL temp table join between servers

So I have a summary I need to return to the end-user application.
It should accept 3 parameters: DateType, StartDate, EndDate.
DateType determines which date field I use to filter the data.
The way I accomplished this was to put the IDs of all records for a DateType into a TEMP table and then join my summary to that list of IDs.
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled it into a stored proc on the server that holds the rest of the application data, the query slowed down: 2 seconds vs. 50 seconds.
I think the cross-server join between the temp table created on the application's SQL Server and the tables on the replication server is causing the slowdown.
Are there any methods or techniques I can use to get around this and still build everything in one stored procedure?
If I create 3 stored procedures, each with its own date range, they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (before 2012 SP1, that is anyone who doesn't own the table and isn't a member of sysadmin, db_owner, or db_ddladmin; see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
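If upgrading is not an option, the "granting more permissions" route might look like this sketch. It runs on the remote server and assumes a hypothetical database user, LinkedReportUser, mapped to the login your linked server connects with; db_ddladmin is one of the roles that could run DBCC SHOW_STATISTICS before 2012 SP1.

```sql
-- Run on the REMOTE server, in the database the linked-server queries read.
-- 'LinkedReportUser' is a hypothetical user mapped to the linked server's login.
-- Caution: db_ddladmin also allows creating, altering, and dropping objects.
EXEC sp_addrolemember @rolename = N'db_ddladmin', @membername = N'LinkedReportUser';
```

Because db_ddladmin is far broader than "read statistics", this is a trade-off to weigh against upgrading.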
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible" (only if both servers really do use the same collation). If this option is off, SQL Server can't assume strings can be compared across the servers, and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
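Setting the option is a single call on the local server; REMOTESRV below is a hypothetical linked server name.

```sql
-- Run on the LOCAL server. Only enable this when both servers use the same
-- collation; otherwise character comparisons can return wrong results.
EXEC sp_serveroption
     @server   = N'REMOTESRV',
     @optname  = N'collation compatible',
     @optvalue = N'true';
```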
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
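A sketch of the copy-locally technique, using hypothetical names (linked server REMOTESRV, remote table ReportDb.dbo.Orders, local table dbo.Customers). Unlike the plain SELECT * in the text, it pulls only the needed columns and a date range; whether that WHERE clause is evaluated on the remote server depends on the plan SQL Server picks, so check the execution plan.

```sql
DECLARE @StartDate DATETIME, @EndDate DATETIME;
SET @StartDate = '20140101';  -- unambiguous yyyymmdd literals
SET @EndDate   = '20140201';

-- 1. Copy the remote rows across the link once.
SELECT o.OrderId, o.CustomerId, o.Amount, o.OrderDate
INTO #localOrders
FROM REMOTESRV.ReportDb.dbo.Orders AS o
WHERE o.OrderDate >= @StartDate
  AND o.OrderDate <  @EndDate;

-- 2. Optional: index the local copy so the join below can seek.
CREATE CLUSTERED INDEX IX_localOrders ON #localOrders (CustomerId);

-- 3. The rest is an ordinary, non-distributed query.
SELECT c.CustomerName, SUM(lo.Amount) AS TotalAmount
FROM #localOrders AS lo
JOIN dbo.Customers AS c ON c.CustomerId = lo.CustomerId
GROUP BY c.CustomerName;

DROP TABLE #localOrders;
```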
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX), since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
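A sketch of the XML-through-NVARCHAR(MAX) idea, assuming a hypothetical linked server REMOTESRV with "rpc out" enabled, a remote database ReportDb containing dbo.Orders, and a local temp table #ids holding the keys to send. The remote procedure shreds the string with the xml type's nodes() and value() methods, so the join runs entirely on the remote server.

```sql
-- On the REMOTE server, in ReportDb (hypothetical procedure name):
CREATE PROCEDURE dbo.GetOrderSummaryForIds
    @IdsXml NVARCHAR(MAX)  -- XML text such as <r id="1"/><r id="2"/>
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @x XML;
    SET @x = CAST(@IdsXml AS XML);  -- untyped XML accepts a fragment

    SELECT o.OrderId, o.Amount
    FROM @x.nodes('/r') AS t(c)
    JOIN dbo.Orders AS o
      ON o.OrderId = t.c.value('@id', 'INT');
END;
GO

-- On the LOCAL server: pack the keys into one string and make a single RPC call.
DECLARE @payload NVARCHAR(MAX);
SET @payload = (SELECT OrderId AS [@id]
                FROM #ids
                FOR XML PATH('r'));

EXEC REMOTESRV.ReportDb.dbo.GetOrderSummaryForIds @IdsXml = @payload;
```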

SSAS Multidimensional - Table Value Function as a Query for Partition

@GregGalloway was able to answer the question I should have asked. I am adding a more concise question here, while keeping the original, lengthier text.
How do I use a table-valued function as the query for a partition, when the function is in a separate database from my fact and referenced dimensions?
Overview: I am building an SSAS multidimensional cube on a single fact table in our application's data warehouse, and I want to use the result set from a table-valued function as my fact table's partition query. We are using SQL Server (and SSAS) 2014.
Condition: For each environment (Dev, Tst, Prd) there are 2 separate databases on the same server: one for the application data warehouse, [DW_App], and the other for custom objects, [DW_Custom]. I cannot create any objects in [DW_App], but I have a lot of freedom in [DW_Custom].
Background info: I have not been able to find much information on using a TVF with partitions in this way. My thinking is that it will help streamline future development by giving me a single place to update the SQL if/when I modify the fact table.
So, in testing out my crazy idea of using a TVF as the query for my partitions, I have run into a bit of a conundrum. I am able to use my TVF when I explicitly state the database in my FROM clause:
SELECT * FROM [DW_Custom].[dbo].[CubePartition](@StartDate, @EndDate)
However, that will not work, because the cube will be deployed in multiple environments before production, and it needs to point to different DBs in each. So I tried adding a new data source, pointing my partition query at the new data source, and removing the database name, i.e.:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
I get this error:
The SQL syntax is not valid. The relational database returned the following error message: Deferred prepare could not be completed. Invalid object name 'dbo.CubePartition'
If I click through this error and the subsequent warnings that the cube will not be able to process, I am able to build and deploy the cube. However, I cannot process it, because I get an error that one of my dimensions does not exist.
Looking at the generated query, it is clear that it queries my dimensions as well as the fact table, and those do not exist inside [DW_Custom], which explains that error perfectly well.
So I guess I have 2 questions:
Is it possible to query another DB (on the same server) from inside an SSAS partition query?
If not, is there any way I can use a variable as the database name in the query, and update that variable based on the project configuration (Dev, Tst, Prd)?
Bonus question: Is the reason I cannot find much about doing it this way that it is an obviously bad idea I am overlooking, and if so, why?
How about creating a second SSAS data source pointing to the DW_Custom database (or whatever it's called in the particular environment you're deploying to)? Then, when you deploy from Dev to Prod, you only need to change that connection string. When you create your partitions, specify the DW_Custom data source and then specify the query without the database name:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
As long as the query plan for that table-valued function is efficient compared to a plain SELECT statement, then I don't see a problem with that.

Best practice: sending a stored procedure for "SQL Command from Variable" in OLE DB Source?

In an SSIS ETL package, I have a query that I need to run on a server/db that does not allow us to create stored procedures.
I would normally put the stored procedure call in a variable and use that variable as the SQL command for my OLE DB source.
However, since we can't put the stored procedure on this server, I was going to load the stored procedure's code into a variable by executing a SQL statement that retrieves the text from our home database, and then use the text stored in this variable as the SQL command for the source.
This way, I can still change the WHERE clause of the SSIS OLE DB Source remotely (as long as I don't change the SELECT portion).
I can't imagine that this is very common, so I wanted to get some opinions: is there a better way to do this? I don't want to put all of the code for this SP directly into the OLE DB Source editor, because we can't afford to redeploy whenever the WHERE clause changes.
You've already got down the part that many folks miss: using variables to drive your package execution. You are also correct that you can't simply swap out your columns. To be pedantic, which I am, you can completely change out the query as long as the same metadata is presented.
So the question becomes how best to let an external force drive a package's query filter, factoring in maintainability, ease of debugging, etc.
My gut reaction is 3 variables:
QueryBase: String. Hard-coded: SELECT * FROM MyTable (except that, of course, I'd enumerate my columns).
Query: String. EvaluateAsExpression = True. Expression: @[User::QueryBase] + @[User::QueryFilter]
QueryFilter: String
So we use Query in the OLE DB Source, much as you have your longer variable name in there. The only downside to this approach, before SSIS 2012, is the limit on string length in an expression. It was ... 4k characters, I believe. Assigning a value of 5k characters to a variable is fine; it's only in the expression language that adding two strings together can't produce more than 4k.
I didn't specify what QueryFilter will contain or the magic to get it there. That I would base on the bigger picture of your environment, usage, etc., but the general concept is that it will eventually turn into " WHERE Condition1 IS NOT NULL" (note the leading space, since it is appended directly to QueryBase), while in a full-reload situation it might become an empty string.
So, what are our options for changing the value of QueryFilter?
/SET is an optional parameter of dtexec.exe, the process that runs SSIS packages. If you have a very limited set of choices and aren't interested in building out additional infrastructure to support the parameters, just hard-code some examples. Approximately: dtexec /file p1.dtsx /set \Package.Variables[User::QueryFilter].Properties[Value];" WHERE Condition1 IS NOT NULL" Save it into .bat files, different SQL Agent jobs, whatever. Click and run and you're done.
Configuration approach. SSIS has native support for configurations stored in a SQL Server table, an XML file, the registry, a parent package variable, or an environment variable, from 2005 up to the current edition. The only downside to this approach is that it does not support concurrent execution with different parameters the way the first approach does.
Environment approach. SQL Server 2012 and 2014, with their new Project Deployment Model, give us the concept of Environments within the SSISDB catalog. This is similar to configuration with a SQL Server table, but it is set up after development is complete and the packages are deployed. It's rather nice because it builds a history of the values used, so if someone asks why the data is all wrong, you can write a query to pull back the parameters used and, oh look, someone used the initial-load filter instead of the daily one. Whoopsidaisy. The same concern applies about concurrent execution with changing values.
Table-driven approach. Instead of using a configuration backed by a SQL Server table, you roll your own table and add an Execute SQL Task to your package that retrieves the filter (ResultSet: Single row) into the QueryFilter variable.
Script Task. Use whatever floats your boat to determine what the filter should be.
Message Queue. SSIS has a built-in Message Queue Task that might be of use here if you're already using message queues. Otherwise, it's too much effort to manage.
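A sketch of the table-driven approach, with a hypothetical control table dbo.PackageQueryFilter in the home database. The final SELECT is what the Execute SQL Task would run (ResultSet set to Single row, result column mapped to User::QueryFilter); the literal package name stands in for what would usually be a ? parameter marker on an OLE DB connection.

```sql
-- Hypothetical control table in the "home" database.
CREATE TABLE dbo.PackageQueryFilter (
    PackageName SYSNAME        NOT NULL PRIMARY KEY,
    QueryFilter NVARCHAR(4000) NOT NULL  -- keep it short: it is concatenated in an SSIS expression
);
GO

-- Changing the WHERE clause becomes an UPDATE instead of a redeploy.
-- The leading space matters because the expression appends this to QueryBase.
INSERT INTO dbo.PackageQueryFilter (PackageName, QueryFilter)
VALUES (N'p1', N' WHERE Condition1 IS NOT NULL');
GO

-- Query for the Execute SQL Task (single row, single column).
SELECT QueryFilter
FROM dbo.PackageQueryFilter
WHERE PackageName = N'p1';
```

A full reload could then be triggered by setting QueryFilter to an empty string for that package.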

How to overwrite table structure and data from db1 to db2

I am developing a Grails application that uses several databases: one is the app's "main" database and the others are read-only. Additionally, there are multiple environments: dev, qa, and prod. qa is used for release testing and is identical to prod.
Before every release test, I need to overwrite the "main" qa database with the "main" prod database. I have nothing more than SQL-user access to the server running the MS SQL instance.
What I need is the magic that drops everything in the qa database without dropping the database itself, and then imports everything from the prod database. The databases contain a lot of foreign key constraints.
How can I achieve this?
P.S.
I did this on MySQL, but now we've migrated to MS SQL. My MySQL script goes roughly like this (pseudocode):
SET foreign_key_checks = 0;
-- Drop all tables..
SET foreign_key_checks = 1;
-- Import prod-dump to DB..
You shouldn't do this in straight T-SQL.
You really should use something like SMO Scripting in .NET to export objects in this way. There is NO clean way to do what you are asking in pure SQL code.
There are too many variables to account for if you plan to just build dynamic SQL from system tables, which is the only way to approach this in T-SQL.
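To illustrate why the answer warns against plain T-SQL, here is roughly what only the "drop everything" half looks like as dynamic SQL built from the catalog views (SQL Server 2005 or later). Unlike MySQL's foreign_key_checks = 0, disabling constraints with NOCHECK is not enough, because SQL Server will not drop a table that is still referenced by a foreign key. This sketch handles foreign keys and tables only; schema-bound views and functions, triggers, and every other object type are ignored, and importing the prod data is not covered at all.

```sql
-- Run in the qa database. Builds, but does not execute, the drop script.
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';

-- 1. Drop every foreign key first, so tables can then be dropped in any order.
SELECT @sql = @sql
    + N'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id))
    + N'.' + QUOTENAME(t.name)
    + N' DROP CONSTRAINT ' + QUOTENAME(fk.name) + N';' + NCHAR(13) + NCHAR(10)
FROM sys.foreign_keys AS fk
JOIN sys.tables AS t ON t.object_id = fk.parent_object_id;

-- 2. Then drop every user table.
SELECT @sql = @sql
    + N'DROP TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id))
    + N'.' + QUOTENAME(t.name) + N';' + NCHAR(13) + NCHAR(10)
FROM sys.tables AS t
WHERE t.is_ms_shipped = 0;

-- Review first; PRINT shows at most 4,000 Unicode characters.
PRINT @sql;
-- EXEC sp_executesql @sql;
```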
I think the tool "xSQL Data Compare" matches your requirements exactly. You will need "sa" access, at least for the qa DB, though.

How can I use transactions that span procedures chained across multiple servers?

I'm trying to test a proposition that one of our vendors presented to us for accessing their product database; it involves queries and transactions that span multiple servers. I've never done this directly on the database before and, to be frank, I'm clueless, so I'm trying to mock up a proof that this works, at least conceptually.
I've got two SQL Server 2005 servers. For argument's sake, let's call them Server1 and Server2 [hold your applause], each containing a dummy database. The dummy database on Server1 is called Source and the one on Server2 is called Destination, just to keep things simple. Each database holds a single table, Input and Output respectively, so the structure is roughly:
Server1.Source.dbo.Input
Server2.Destination.dbo.Output
I have a stored procedure on Server2 called WriteDataToOutput that receives a single VARCHAR argument and writes its contents to the Output table.
Now the trickiness starts:
I want to create a stored procedure on Server1.Source that calls the WriteDataToOutput stored procedure defined on Server2, which seems like the simple step.
I want this call to be part of a transaction so that if the procedure that invokes it fails, the entire transaction is rolled back.
And here endeth my knowledge of what to do. Can anyone point me in the right direction? I tried this with two different databases on the same server and it worked just fine, which leads me to assume it will work across different servers. The question is: how do I go about doing such a thing? Where do I start?
As others have noted, I agree that a linked server is the best way to go.
Here are a couple of things that snagged me the first time I dealt with linked servers:
If the linked server is a named instance, make sure you bracket the name, for example [SERVERNAME\INSTANCENAME].
Use an alias for the table or view from the linked server, or you will get a "multi-part identifier could not be bound" error. Names are limited to four parts: SERVER.DATABASE.dbo.TABLE.FIELD has five parts and will give an error, whereas SELECT linked.FieldName FROM SERVER.DATABASE.dbo.TABLE AS linked works fine.
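Both pointers in one sketch, assuming a hypothetical linked server for a named instance (added with @srvproduct = N'SQL Server', so the linked server name is the instance's network name) and placeholder database, table, and column names.

```sql
-- Hypothetical linked server, created once with:
--   EXEC sp_addlinkedserver @server = N'SERVERNAME\INSTANCENAME', @srvproduct = N'SQL Server';
-- The backslash means the name must be bracketed wherever it appears.
SELECT linked.FieldName                       -- two-part column reference via the alias
FROM [SERVERNAME\INSTANCENAME].DatabaseName.dbo.TableName AS linked
WHERE linked.FieldName IS NOT NULL;
```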
You will want to link the servers:
http://msdn.microsoft.com/en-us/library/aa213778.aspx
For step 2, you need the Distributed Transaction Coordinator (MSDTC) running, and you also need SET XACT_ABORT ON to make sure everything rolls back.
You also need to enable RPC (the "rpc out" linked server option), which is turned off by default in 2005 and later.
There is a whole bunch of stuff that can bite you in the neck.
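Putting those steps together for the Server1/Server2 example from the question, a sketch might look like the following. It assumes SQL Server 2005 or later, MSDTC running and allowed network access on both machines, and a hypothetical Payload column in Source.dbo.Input; the procedure name dbo.WriteInputAndOutput is also made up.

```sql
-- Run on Server1.
-- 1. Link Server2 (with @srvproduct = 'SQL Server', @server is its network name).
EXEC sp_addlinkedserver @server = N'Server2', @srvproduct = N'SQL Server';

-- 2. Allow outgoing remote procedure calls ("rpc out" is off by default).
EXEC sp_serveroption @server = N'Server2', @optname = N'rpc out', @optvalue = N'true';
GO

USE Source;
GO

-- 3. Local write plus remote procedure call in one distributed transaction.
CREATE PROCEDURE dbo.WriteInputAndOutput
    @Value VARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a runtime error rolls back the work on both servers

    BEGIN DISTRIBUTED TRANSACTION;

    INSERT INTO dbo.Input (Payload) VALUES (@Value);          -- local write
    EXEC Server2.Destination.dbo.WriteDataToOutput @Value;     -- remote write

    COMMIT TRANSACTION;
END;
GO
```

If Server2's procedure raises an error, XACT_ABORT should abort the batch and roll back the local INSERT into Input as well. By default, sp_addlinkedserver maps every local login to itself on the remote server; if that does not fit your security setup, sp_addlinkedsrvlogin lets you map logins explicitly.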
MSDN says you can have transactions across linked servers if you use the command BEGIN DISTRIBUTED TRANSACTION.
I remember, though, that I had problems calling a stored procedure on a linked server, but I worked around them rather than solving them.
Using linked servers, you can run stored procedures on either server within a single transaction using DTC (Distributed Transaction Coordinator). You will definitely want to do some performance analysis. I have found that some SPs using links can drastically slow down database performance, especially if you try to join result sets from the two servers.
Set up a linked server, then you should be able to execute selects/inserts/updates across the servers. Something like:
INSERT INTO Server2.Destination.dbo.Output
SELECT * FROM Input
WHERE <Criteria>
This assumes you are running the query from Server1.Source, so you don't need to fully qualify Input.