I am using SSDT 2017 and I am working on a solution that basically gets a full result set from a query into a variable (one column only: AccountID). I need to include the values in that object variable in a query, something like this:
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
I tried with an expression but I get an error, so I am not sure if there's a better way. I also tried a For Each Loop Container, but since I have millions of records in the object variable I don't think that's the best way.
Any ideas?
It doesn't work that way, where "it" is going to be a host of things.
The SSIS data types are primitive types (boolean, date, numbers) or Object. The only supported operations for Object are a null check and enumeration.
SSIS parameterization is only for equality-based substitutions. There is no concept of a list data type in SQL, so there's no analog in SSIS.
I have millions of records in the object variable
Even if you converted your list to a string and used string concatenation, the next problem you're going to run into is the string length limit of 4000 characters.
What is the way?
Let's reset the problem: You have a non-trivial set of identities from a source system. That set of ids needs to be used as the basis for a subsequent extract.
Are the source of identities and the actual data on the same server?
While you can empty the ocean with a teaspoon, it's not the correct tool. Same holds true here. Move the query that identifies the recordset to be extracted into a filter condition for your source.
i.e.
Load dataset into @AccountIDObjectVariable
SELECT
OA.AccountId
FROM
dbo.OutstandingAccount AS OA;
Extract that isn't working
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
is rewritten as
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM dbo.OutstandingAccount AS OA WHERE OA.AccountID = A.AccountID);
There are two reasonable approaches for solving this
Pull it all
If the source ids list and the source table are of similar orders of magnitude, it might be easier to just bring it all down and use the account id generating query in a Lookup Task. If AccountID exists, then it's the data you want. Yes you pulled more than you wanted but you likely would have burned more cycles and complexity trying to selectively pull what you wanted.
Push and pull
This approach will work for SQL Server; I have no idea about other databases. Well, I suppose Sybase would behave the same, given the database's paternity.
Open SSMS and create a global temporary table on the database where dbo.account lives. Do not disconnect from SSMS.
IF OBJECT_ID('tempdb..##SO_66961235') IS NOT NULL
BEGIN
DROP TABLE ##SO_66961235;
END
GO
CREATE TABLE ##SO_66961235
(
AccountID int NOT NULL
);
Modify the connection manager for the database holding dbo.account to set its RetainSameConnection property to True.
Execute SQL Task - Make Temp Table
Use the connection to the account database and the above query. This ensures the table exists for the later steps of the package to work against.
DataFlow Load IDs
In the dataflow properties, set DelayValidation to True
Use your source query to generate the list of IDs and select the temporary table as the destination. You might need a second connection manager to this system pointed at tempdb; it's been a long time since I've done this. The same rule about RetainSameConnection holds true, though.
When this data flow completes, we will have a temporary table on the data source server that we can reference.
Dataflow 2 Get Data
Again, DelayValidation to true.
Source will be a query
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM ##SO_66961235 AS OA WHERE OA.AccountID = A.AccountID);
What's with all the delay validation?
When the SSIS package starts, the first thing it does is ensure all the pieces are in place for it to run successfully, and not only that the pieces are in place, but that the shape of the data is still the same. A temporary table won't exist when the package starts, so the package would fail with a VS_NEEDSNEWMETADATA error. Setting DelayValidation tells SSIS not to check metadata until the component actually gets the signal to start. Since we defined the precursor Execute SQL Task to create the table, the validation should succeed.
I used global temporary tables here. You can use locally scoped temporary tables, but that makes the already fiddly design process much more so. Were it me, I'd have a package parameter controlling a boolean that uses a global temp table for development sessions and a local temp table for actual run-time operations, but that's beyond the scope of this question.
Our database is set up so that each of our clients is hosted in a separate schema (the organizational level above a table in Postgres/Redshift, not the database structure definition). We have a table in the public schema that has metadata about our clients. I want to use some of this metadata in a view I am creating.
Say I have 2 tables:
public.clients
    name_of_schema_for_client
    metadata_of_client
client_name.usage_info
    (whatever columns; they aren't important here)
I basically want to get the metadata for the client I'm running my query on and use it later:
SELECT *
FROM client_name.usage_info
INNER JOIN public.clients
ON CURRENT_SCHEMA() = public.clients.name_of_schema_for_client
This does not work because CURRENT_SCHEMA() is a leader-node function: such functions return an error if the query references a user-created table, an STL or STV system table, or an SVV or SVL system view (see https://docs.aws.amazon.com/redshift/latest/dg/r_CURRENT_SCHEMA.html).
Is there another way to do this? Or am I just barking up the wrong tree?
Your best bet is probably to just manually set the search path within the transaction from whatever source you call this from. See this:
https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
Let's say you only want to use the table matching your best client:
set search_path to your_best_clients_schema, whatever_other_schemas_you_need_for_this;
Then you can just do:
select * from clients;
This will match the first clients table available on the search path, which, by happy coincidence, you just set to your client's schema!
You can manually revert afterwards if need be, or just reset the connection to return to the default; up to you.
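For instance, a minimal sketch using the table names from the question (acme_corp is a hypothetical client schema name):

set search_path to acme_corp, public;  -- acme_corp stands in for the real client schema

select u.*, c.metadata_of_client
from usage_info as u                   -- resolves to acme_corp.usage_info via search_path
inner join public.clients as c
    on c.name_of_schema_for_client = 'acme_corp';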
Here's the situation:
I have an SSRS report that uses an SP as a dataset. The SP creates a temp table, inserts a bunch of data into it, and selects it back out for SSRS to report. Pretty straightforward.
Question:
If multiple users run the report with different parameters selected, will the temp table created by the SP collide in the tempdb and potentially not return the data set expected?
Most likely not. If the temp table is defined as #temp or @temp, then you're safe: those kinds of temp tables can only be accessed by the creating connection and only last for the duration of the stored procedure's execution. If, however, you're using ##temp tables (two "pound" signs), those are exposed to and accessible by all connections to that SQL instance, and they persist until the creating session disconnects and any other sessions using them finish.
Odds are good that you're not using ##tables, so you are probably safe.
A temp table with a single # is a local temporary table and its scope is limited to the session that created it, so collisions should not be a problem.
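A quick sketch of the scoping difference (the table names are hypothetical):

-- Session A:
CREATE TABLE #report_data (id int);   -- local temp table: visible only to this connection
CREATE TABLE ##shared_data (id int);  -- global temp table: visible to all connections

-- Session B (while session A is still connected):
SELECT * FROM ##shared_data;          -- works
SELECT * FROM #report_data;           -- fails: Invalid object name '#report_data'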
I have read about temporary tables, global temporary tables, and table variables. I understood them, but I could not imagine a situation in which I would have to use them. Please elaborate on when I should use a temporary table.
The most common scenario for using temporary tables is from within a stored procedure.
If there is logic inside a stored procedure that involves manipulation of data that cannot be done within a single query, then the output of one query / the intermediate results can be stored in a temporary table, which then participates in further manipulation via joins etc. to achieve the final result.
One common scenario for using temporary tables is to store the results of a SELECT INTO statement, as sketched below.
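For example, a minimal sketch (the table names are hypothetical):

SELECT AccountID, SUM(Amount) AS Total
INTO #account_totals              -- creates the temp table from the result set
FROM dbo.Transactions
GROUP BY AccountID;

-- #account_totals can now be joined in later statements in the same session
SELECT a.AccountID, a.Total
FROM #account_totals AS a
WHERE a.Total > 1000;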
The table variable is newer (introduced in SQL Server 2000) and can be used instead of a temp table in most cases. Some differences between the two are discussed here.
In a lot of cases, especially in OLTP applications, usage of temporary tables within your procedures means that you MAY have business processing logic in your database, which might be a cue to re-examine your design, especially in the case of n-tier systems having a separate business layer.
The main difference between the three is a matter of lifetime and scope.
By a global table, I assume you mean a standard, run-of-the-mill table. Tables are used for storing persistent data. They are accessible to all logged-in users. Any changes you make are visible to other users and vice versa.
A temporary table exists solely for storing data within a session. The best time to use temporary tables is when you need to store information within SQL Server for use over a number of SQL transactions. Like a normal table, you create it, interact with it (insert/update/delete), and when you are done, you drop it. There are two differences between a table and a temporary table.
The temporary table is only visible to you. Even if someone else creates a temporary table with the same name, no one else will be able to see or affect your temporary table.
The temporary table exists for as long as you are logged in, unless you explicitly drop it. If you log out or are disconnected, SQL Server will automatically clean it up for you. This also means the data is not persistent: if you create a temporary table in one session and log out, it will not be there when you log back in.
A table variable works like any variable within SQL Server. It is used for storing data within a single batch or scope. It is a relatively new feature of T-SQL and is generally used for passing data between procedures, like passing an array. There are three differences between a table and a table variable.
Like a temporary table, it is only visible to you.
Because it is a variable, it can be passed around between stored procedures.
The table variable only exists within the current batch or scope. Once SQL Server finishes the batch (for example, at a GO separator) or the variable goes out of scope, it will be deallocated, as the sketch below shows.
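A minimal sketch of a table variable (the names are hypothetical):

DECLARE @ids TABLE (AccountID int NOT NULL);  -- table variable, scoped to this batch

INSERT INTO @ids (AccountID)
VALUES (1), (2), (3);

SELECT AccountID FROM @ids;  -- works within the declaring batch
GO
-- After the batch separator, @ids no longer exists.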
I personally avoid using temporary tables and table variables, for a few reasons. First, the syntax for them is Microsoft-specific; if your program is going to interact with more than one RDBMS, don't use them. Also, temporary tables and table variables tend to increase the complexity of some SQL queries. If your code can be accomplished using a simpler method, I'd recommend going with the simpler method.
Suppose I am about to start a project using ASP.NET and SQL Server 2005. I have to design the concurrency handling for this application. I am planning to add a TimeStamp column to each table. While updating the tables, I will check that the TimeStamp column is the same as it was when selected.
Will this approach suffice? Or are there any shortcomings to this approach under any circumstances?
Please advise.
Thanks,
Lijo
First of all, the way you describe in your question is, in my opinion, the best way for an ASP.NET application with MS SQL as the database. There is no locking in the database, which is perfect for permanently disconnected clients like web clients.
As one can read in some answers, there is a misunderstanding in the terminology. We all mean using Microsoft SQL Server 2008 or higher to hold the database. If you open the topic "rowversion (Transact-SQL)" in the MS SQL Server 2008 documentation, you will find the following:
"timestamp is the synonym for the
rowversion data type and is subject to
the behavior of data type synonym." …
"The timestamp syntax is deprecated.
This feature will be removed in a
future version of Microsoft SQL
Server. Avoid using this feature in
new development work, and plan to
modify applications that currently use
this feature."
So the timestamp data type is the synonym for the rowversion data type in MS SQL. It holds a 64-bit counter which exists internally in every database and can be seen as @@DBTS. After a modification of any row in any table of the database, the counter will be incremented.
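For example:

-- Returns the current rowversion counter for the current database
SELECT @@DBTS AS CurrentRowVersion;

-- Any insert or update in the database increments the counter,
-- so running the SELECT again after a modification shows a higher value.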
As I read your question, I took "TimeStamp" to be the name of a column of the rowversion data type. I personally prefer the name RowUpdateTimeStamp. In AzManDB (see Microsoft Authorization Manager with the store as a DB) I could see such a name. Sometimes ChildUpdateTimeStamp was also used, to trace hierarchical RowUpdateTimeStamp structures (with respect to triggers).
I implemented this approach in my last project and am very happy with it. Generally, you do the following:
Add a RowUpdateTimeStamp column with the rowversion type to every table of your database (it will be shown in Microsoft SQL Server Management Studio as timestamp, which is the same).
Construct all of your SQL SELECT queries that send results to the client so that you send an additional RowVersion value together with the main data. If you have a SELECT with JOINs, you should send the RowVersion of the maximum RowUpdateTimeStamp value from both tables, like
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN s.RowUpdateTimeStamp
ELSE m.RowUpdateTimeStamp
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
Or cast the data like the following:
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN CAST(s.RowUpdateTimeStamp AS bigint)
ELSE CAST(m.RowUpdateTimeStamp AS bigint)
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
to hold RowUpdateTimeStamp as bigint, which corresponds to the ulong data type of C#. If you use OUTER JOINs or JOINs across many tables, the MAX(RowUpdateTimeStamp) construct over all tables looks a little more complex. Because MS SQL doesn't support a function like MAX(a,b,c,d,e), the corresponding construct could look like the following:
(SELECT MAX(rv)
FROM (SELECT table1.RowUpdateTimeStamp AS rv
UNION ALL SELECT table2.RowUpdateTimeStamp
UNION ALL SELECT table3.RowUpdateTimeStamp
UNION ALL SELECT table4.RowUpdateTimeStamp
UNION ALL SELECT table5.RowUpdateTimeStamp) AS maxrv) AS RowUpdateTimeStamp
All disconnected clients (web clients) receive and hold not only the rows of data, but also the RowVersion (type ulong) of each data row.
When the disconnected client tries to modify data, it should send the RowVersion corresponding to the original data back to the server. The spSoftwareUpdate stored procedure could look like
CREATE PROCEDURE dbo.spSoftwareUpdate
    @Id int,
    @SoftwareName varchar(100),
    @originalRowUpdateTimeStamp bigint, -- used for optimistic concurrency mechanism
    @NewRowUpdateTimeStamp bigint OUTPUT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    -- ExecuteNonQuery() returns -1, but it is not an error;
    -- one should test @NewRowUpdateTimeStamp for DBNull
    SET NOCOUNT ON;

    UPDATE dbo.Software
    SET Name = @SoftwareName
    WHERE Id = @Id AND RowUpdateTimeStamp <= @originalRowUpdateTimeStamp;

    SET @NewRowUpdateTimeStamp = (SELECT RowUpdateTimeStamp
                                  FROM dbo.Software
                                  WHERE (@@ROWCOUNT > 0) AND (Id = @Id));
END
The code of the dbo.spSoftwareDelete stored procedure looks much the same. If you don't switch on NOCOUNT, you can get a DBConcurrencyException automatically generated in a lot of scenarios. Visual Studio gives you the possibility to use optimistic concurrency via the "Use optimistic concurrency" checkbox in the Advanced Options of the TableAdapter or DataAdapter.
If you look at the dbo.spSoftwareUpdate stored procedure carefully, you will find that I use RowUpdateTimeStamp <= @originalRowUpdateTimeStamp in the WHERE clause instead of RowUpdateTimeStamp = @originalRowUpdateTimeStamp. I do so because the value of @originalRowUpdateTimeStamp which the client holds is typically constructed as MAX(RowUpdateTimeStamp) over more than one table, so it can happen that RowUpdateTimeStamp < @originalRowUpdateTimeStamp. Either you use strict equality (=) and reproduce here the same complex JOIN statement that you used in the SELECT, or you use the <= construct like me and stay exactly as safe as before.
By the way, one can construct a very good value for an ETag based on RowUpdateTimeStamp, which can be sent in an HTTP header to the client together with the data. With the ETag you can implement intelligent data caching on the client side.
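A minimal sketch of deriving such a value, reusing the dbo.Software table from above (the exact string format is just one possible choice):

-- Expose the rowversion as a string usable as an HTTP ETag value
SELECT CONVERT(varchar(20), CAST(s.RowUpdateTimeStamp AS bigint)) AS ETagValue
FROM dbo.Software AS s
WHERE s.Id = @Id;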
I can't write all the code here, but you can find a lot of examples on the Internet. I only want to repeat one more time that, in my opinion, optimistic concurrency based on rowversion is the best way for most ASP.NET scenarios.
In SQL Server, a recommended approach for this type of situation is to create a column of type rowversion and use that to check whether any of the fields in the row have changed.
SQL Server guarantees that if any value in the row changes (or a new row is inserted), its rowversion column will automatically be updated to a different value. Letting the database handle this for you is much more reliable than trying to do it yourself.
In your update statements, you simply need to add a WHERE clause to check that the rowversion value is the same as it was when you first retrieved the row. If not, someone else has changed the row (i.e., it's dirty), as sketched below.
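A minimal sketch (the table and parameter names are hypothetical):

UPDATE dbo.Account
SET Balance = @NewBalance
WHERE AccountID = @AccountID
  AND RowVer = @OriginalRowVer;  -- RowVer is the rowversion column

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user.', 16, 1);  -- optimistic concurrency violation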
Also, from that page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
I'm not sure that concurrency should be handled in the database like this. The database itself should be able to manage isolation and transactional behavior, but the threading behavior ought to be done in code.
The rowversion suggestion is correct, I would say, but it's disappointing to see that timestamp will be deprecated soon. Some of my old applications use it for reasons other than concurrency checking.
I have a temporary table that isn't going away. I want to see what is in the table to determine what bad data might be in there. How can I view the data in the temporary table?
I can see it in tempdb. I ran
SELECT * FROM tempdb.dbo.sysobjects WHERE Name LIKE '#Return_Records%'
to get the name of the table.
I can see its columns and its object id in
select c.*
from tempdb.sys.columns c
inner join tempdb.sys.tables t ON c.object_id = t.object_id
where t.name like '#Return_Records%'
How can I get at the data?
By the way, this doesn't work
SELECT * FROM #Return_Records
One way of getting at the data in a low-level and not particularly easy to manipulate manner is to use the DBCC PAGE command as described in a blog post by Paul Randal:
http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/10/625659.aspx
You should be able to find the file id and page number of the first page in the object by querying sysindexes; the last time I did this was on SQL Server 7.
If the data is in the database, then DBCC PAGE will be able to dump it.
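A sketch of the invocation (the file and page ids below are placeholders you would look up first):

DBCC TRACEON (3604);              -- route DBCC output to the client
DBCC PAGE ('tempdb', 1, 153, 3);  -- database, file id, page id, print option (3 = row detail)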
pjjH
SQL Server limits access to local temp tables (#TableName) to the connection that created the table. Global temp tables (##TableName) can be accessed by other connections as long as the connection that created them is still connected.
Even though you can see the table in the table catalog, it is not accessible when trying to do a SELECT. It gives you an "Invalid Object Name" error.
There's no documented way of accessing the data in Local Temp Tables created by other connections. I think you may be out of luck in this case.
This is something that it seems you obviously tried, but since you didn't mention it, I thought I would mention it just in case:
Did you try "SELECT * FROM #Return_Records"?
Like José Basilio says, that's a temporary table belonging to another connection. If it's been there for a long time, it must belong to a connection that has been open for a long time. Check Maintenance -> Activity Monitor; you can sort by Login Time.
Check if the Login Time, or Last Batch Time, matches with the create date of the temporary table. That can be retrieved with:
select crdate from tempdb.dbo.sysobjects WHERE Name LIKE '#Return_Records%'
You can shoot down suspect connections (right-click and Kill Process). If the table is gone after killing a process, you've found the culprit.
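If you prefer T-SQL to the GUI, a sketch using the sys.dm_exec_sessions DMV (the session id passed to KILL is a placeholder):

-- List user sessions with their login times to compare against the table's crdate
SELECT session_id, login_name, login_time, last_request_end_time
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;

-- Kill the suspect session (53 is a placeholder session id)
KILL 53;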
To just remove the table, restart the SQL Server service. You can attach SQL Profiler right after with a filter to start looking for the connection that creates the temporary table.