This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Parameterizing an SQL IN clause?
Every now and then I work on a system that allows the user to select multiple items and then perform a bulk action on them. Typically, I resorted to building the SQL at runtime, something like this:
string inClause = String.Join(", ", selectedIds);
string command = "SELECT * FROM Customer WHERE CustomerId IN ({0})";
command = String.Format(command, inClause);
Of course, this style of code is insecure because of SQL injection. I could solve that by putting in parameter placeholders and creating parameters.
Still, I am wondering if there is another approach that I've just not considered. I certainly don't want to execute the command once for each ID.
There are two good approaches:
Build the string with parameter placeholders (like you said)
Join to the values of a table-valued parameter (TVP)
Burning the IDs into the SQL is not good because it prevents plan caching and opens the potential for injection.
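A minimal sketch of the placeholder approach, assuming Python's built-in sqlite3 module as a stand-in for the real driver and a hypothetical Customer table with a Name column. Only the placeholders are formatted into the SQL text; the IDs themselves are passed as parameters.

```python
import sqlite3

def select_customers(conn, selected_ids):
    """Fetch customers whose ID is in selected_ids, one placeholder per ID."""
    if not selected_ids:
        return []  # many databases reject an empty "IN ()", so short-circuit
    placeholders = ", ".join("?" for _ in selected_ids)
    sql = (
        "SELECT CustomerId, Name FROM Customer "
        f"WHERE CustomerId IN ({placeholders}) ORDER BY CustomerId"
    )
    # The values are bound by the driver, never concatenated into the SQL
    return conn.execute(sql, list(selected_ids)).fetchall()

# Hypothetical in-memory data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")]
)
rows = select_customers(conn, [1, 3])
```

The SQL text still varies with the number of IDs, so each distinct list length produces a distinct statement for the plan cache.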
You can build an XML string and then pass it to a stored proc. Executing it would look like:
EXECUTE getLocationTypes '<IDList><ID>1</ID><ID>3</ID></IDList>'
The stored proc would look something like:
create proc [dbo].[getLocationTypes](@locationIds XML)
as
begin
    set nocount on
    SELECT locationId, typeId
    FROM xrefLocationTypes
    WHERE locationId
        IN (SELECT Item.value('.', 'int' )
            FROM @locationIds.nodes('IDList/ID') AS x(Item))
    ORDER BY 1, 2
end
Notice that the data type of the parameter is XML. This is a little more complicated than what you are doing; I guess you could do it all in a single SQL string.
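On the client side, the XML argument can be generated rather than hand-written. The following is a sketch only, assuming Python and the standard xml.etree.ElementTree module; build_id_list_xml is a hypothetical helper name, and the element names follow the IDList/ID shape used above.

```python
import xml.etree.ElementTree as ET

def build_id_list_xml(ids):
    """Serialize integer IDs as <IDList><ID>..</ID>..</IDList>."""
    root = ET.Element("IDList")
    for value in ids:
        # int() raises ValueError for non-numeric input instead of passing it on
        ET.SubElement(root, "ID").text = str(int(value))
    return ET.tostring(root, encoding="unicode")

xml_param = build_id_list_xml([1, 3])
```

The resulting string would then be bound as the single XML parameter of the stored procedure.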
I am dealing with a specific problem of identifying the dependent db objects for any SSRS RDL.
I understand that if a dataset in an RDL uses a stored procedure as its query, I can reference that stored procedure and get all of its dependent objects (details can be found here: Different Ways to Find SQL Server Object Dependencies).
But I am looking specifically at datasets that use a text (inline) query. I am able to extract the CommandText from the RDL's XML, but I am not sure how to extract db objects such as stored procedures, tables, views, and columns from a command text that is an inline query.
For example, suppose I extract the query below from the XML CommandText (this is a hypothetical query; names in the database are not standardized with prefixes like vw_ for views or udf_ for functions):
-----This query serves Report ABC
SELECT DATE
,[amount]
,teamID = (SELECT TeamID FROM Sales.[getSalesPerson](r.date) as s WHERE R.[SalesPersonName] = S.[SalesPersonName])
,[channel]
,[product]
,[Item]
,r.[M_ID]
,Amount
,M.[Type]
FROM dbo.FactTable AS R
LEFT JOIN sp_Channel C ON R.[Channel_ID] = C.[Channel_ID]
LEFT JOIN Marketing.vw_M M ON R.[M_ID] = M.[M_ID]
Is there a way to identify that this query has the following dependent objects?
ObjectName               ObjectType
------------------------------------------
dbo.FactTable            Table
sp_Channel               Stored Procedure
Marketing.vw_M           View
Sales.[getSalesPerson]   Function
It is not easy to extract object names from a SQL command, since they may be written in different ways (with or without the schema, with the database name included, ...).
But there are several options for extracting objects from a SQL query that you can try:
Using regular expressions. For example, you have to search for the words that follow these keywords:
TRUNCATE TABLE
FROM
UPDATE
JOIN
The following code is a C# example:
Regex regex = new Regex(@"\bJOIN\s+(?<Retrieve>[a-zA-Z\._\d\[\]]+)\b|\bFROM\s+(?<Retrieve>[a-zA-Z\._\d\[\]]+)\b|\bUPDATE\s+(?<Update>[a-zA-Z\._\d]+)\b|\bINSERT\s+(?:\bINTO\b)?\s+(?<Insert>[a-zA-Z\._\d]+)\b|\bTRUNCATE\s+TABLE\s+(?<Delete>[a-zA-Z\._\d]+)\b|\bDELETE\s+(?:\bFROM\b)?\s+(?<Delete>[a-zA-Z\._\d]+)\b");
var obj = regex.Matches(sql);
foreach (Match m in obj)
{
    Console.WriteLine(m.ToString().Substring(m.ToString().IndexOf(" ")).Trim());
}
Then you have to clean the result and join it with the sys.objects catalog view in the SQL Server database.
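The same keyword-based idea can be sketched in Python. This is a simplified pattern, not a port of the C# expression: it uses one alternation of keywords and a single capture group, and extract_object_names is a hypothetical helper name. Like any regex approach, it can be fooled by comments, string literals, and aliases, and it does not handle DELETE without FROM.

```python
import re

# Keywords that introduce an object name; the name may contain letters,
# digits, underscores, dots and square brackets.
OBJECT_AFTER_KEYWORD = re.compile(
    r"\b(?:FROM|JOIN|UPDATE|INTO|TRUNCATE\s+TABLE)\s+([\w.\[\]]+)",
    re.IGNORECASE,
)

def extract_object_names(sql):
    """Return candidate object names in order of first appearance."""
    seen = []
    for match in OBJECT_AFTER_KEYWORD.finditer(sql):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen

# Shortened version of the sample query from the question
sample = """
SELECT DATE, [amount],
       teamID = (SELECT TeamID FROM Sales.[getSalesPerson](r.date) AS s
                 WHERE R.[SalesPersonName] = S.[SalesPersonName])
FROM dbo.FactTable AS R
LEFT JOIN sp_Channel C ON R.[Channel_ID] = C.[Channel_ID]
LEFT JOIN Marketing.vw_M M ON R.[M_ID] = M.[M_ID]
"""
names = extract_object_names(sample)
```

The candidates still carry no type information; that comes from matching them against the database catalog.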
Using a SQL parser, for example:
SQL Parser
SQL Parser - Code Project
You can refer to the following very helpful links for additional information:
Regular expression to find all table names in a query
Parsing SQL code in C#
If your reports connect to SQL Server and you have access, you could try to get the execution plan with SET SHOWPLAN_XML ON and parse it.
Relevant thread for the parsing: extracting-data-from-sql-servers-xml-execution-plan
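A rough sketch of the parsing step in Python, assuming the plan XML has already been captured as a string. The fragment below is hand-written and heavily trimmed for illustration (a real plan has many more elements and attributes), and referenced_objects is a hypothetical helper name. It relies on the showplan namespace and on Object elements carrying Schema and Table attributes.

```python
import xml.etree.ElementTree as ET

SHOWPLAN_NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}

def referenced_objects(plan_xml):
    """Collect distinct (schema, table) pairs from Object elements in a plan."""
    root = ET.fromstring(plan_xml)
    found = set()
    for obj in root.iterfind(".//sp:Object", SHOWPLAN_NS):
        table = obj.get("Table")
        if table:  # skip Object nodes that carry no Table attribute
            found.add((obj.get("Schema", ""), table))
    return sorted(found)

# Illustrative fragment only, not real SHOWPLAN_XML output
plan = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp><IndexScan><Object Database="[Db]" Schema="[dbo]" Table="[FactTable]"/></IndexScan></RelOp>
  <RelOp><IndexScan><Object Database="[Db]" Schema="[Marketing]" Table="[M]"/></IndexScan></RelOp>
</ShowPlanXML>"""
objects = referenced_objects(plan)
```

Keep in mind that a plan describes what the optimizer actually touches, so a view may show up as its underlying tables rather than under its own name.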
I am using SSIS in VS 2013.
I need to get a list of IDs from 1 database, and with that list of IDs, I want to query another database, ie SELECT ... from MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).
There are three methods to achieve this:
1st method - Using Lookup Transformation
First, add a Lookup Transformation as @TheEsisia answered, but there are further requirements:
In the Lookup, write the query that returns the ID list (e.g. SELECT ID FROM MyFirstDB WHERE ...)
Select at least one column from the lookup table
By itself this does not filter rows; it only adds values from the second table
To filter rows as in WHERE ID IN ({list of IDs from MyFirstDB}), you have to handle the rows that have no match in the lookup. There are two ways:
Set Error handling to Ignore Row, so the columns added by the lookup are NULL for rows without a match; then add a Conditional Split that keeps only the rows where the value is not NULL.
Assuming you have chosen col1 as the lookup column, use an expression similar to
ISNULL([col1]) == False
Or set Error handling to Redirect Row, so every row without a match is sent to the error output, which can be left unconnected; the data is thereby filtered
The disadvantage of this method is that all the data is loaded and then filtered during execution.
Also, when working over a network, the filtering happens on the local machine after all the data has been loaded into memory (with the 2nd method it happens on the server).
2nd method - Using Script Task
To avoid loading all the data, you can use a workaround with a Script Task (answer written in VB.NET):
Assume that the connection manager name is TestAdo, that "Select [ID] FROM dbo.MyTable" is the query that gets the list of IDs, and that User::MyVariableList is the variable that will hold the generated query containing the list of IDs
Note: This code reads the connection from the connection manager
Public Sub Main()
    Dim lst As New Collections.Generic.List(Of String)

    ' Get the ADO.NET connection from the connection manager
    Dim myADONETConnection As SqlClient.SqlConnection
    myADONETConnection = _
        DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), _
        SqlClient.SqlConnection)
    If myADONETConnection.State = ConnectionState.Closed Then
        myADONETConnection.Open()
    End If

    ' Read all IDs into the list
    Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)
    Dim dr As SqlClient.SqlDataReader
    dr = myADONETCommand.ExecuteReader
    While dr.Read
        lst.Add(dr(0).ToString)
    End While
    dr.Close()

    ' Store the complete query, including the IN list, in the variable
    Dts.Variables.Item("User::MyVariableList").Value = "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"
    Dts.TaskResult = ScriptResults.Success
End Sub
User::MyVariableList should then be used as the source (SQL command from variable)
3rd method - Using Execute Sql Task
Similar to the second method, but this builds the IN clause using an Execute SQL Task and then uses the whole query as the OLE DB Source:
Add an Execute SQL Task before the DataFlow Task
Set the ResultSet property to Single row
Select User::MyVariableList as the Result Set variable
Use the following SQL command
DECLARE @str AS VARCHAR(4000)
SET @str = ''
SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable
SET @str = 'SELECT * FROM MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'
SELECT @str
If the column has a string data type, you should add a quotation mark before and after each value, as below:
SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
FROM dbo.MyTable
Make sure that you have set the DataFlow Task Delay Validation property to True
This is a classic case for using a Lookup Transformation. First, use an OLE DB Source to get data from the first database. Then, use a Lookup Transformation to filter this data set based on the ID values from the second data set. Here are the steps for using a Lookup Transformation:
In the General tab, select Full cache, OLE DB connection manager, and Redirect rows to no match output, as shown in the following picture. Notice that using Full cache provides great performance for your package.
General Setting
In the Connection tab, use OLE DB Connection Manager to connect to your second server. Then, you can either directly select the data-set with ID values or (as is shown in the picture below) you can use SQL code to select the IDs from the filtering data-set.
Connection:
Go to the Columns tab and select the ID columns from both data sets. For each record from your first data set, the Lookup checks whether its ID is in the Available Lookup Columns. If it is, the record goes to the Match output; otherwise it goes to the No Match output.
Match ID columns:
Click on OK to close the LookUp. Then you need to select the LookUp Match Output.
Match Output:
The "best" answer depends on data volumes and source systems involved.
Many of the other answers propose building out a list of values based on clever concatenation within SQL Server. That doesn't work so well if the referenced system is Oracle, MySQL, DB2, Informix, PostgreSQL, etc. There may be an equivalent concept but there might not be.
For best performance, you need to filter against the second db before any of those rows ever hit the data flow. That means adding a filtering condition, as the others have suggested, to your source query. The challenge with this approach is that your query is going to be limited by some practical bounds that I don't remember. Ten, one hundred, a thousand values in your where clause is probably fine. A lakh, a million - probably not so much.
In the cases where you have large volumes of values to filter against the source table, it can make sense to create a table on that server and truncate and reload that table (execute sql task + data flow). This allows you to have all of the data local and then you can index the filter table and let the database engine do what it's really good at.
But, you say the source database is some custom solution that you can't make tables in. You can look at the above approach with temporary tables; within SSIS you then need to set RetainSameConnection to True on the connection manager so that the temporary table survives between tasks. I don't much care for temporary tables with SSIS as debugging them is a nightmare I'd not wish upon my mortal enemy.
If you're still reading, we've identified why filtering in the source system might not be "doable", even if it will provide the best performance.
Now we're stuck with purely SSIS solutions. To get the best performance, do not select the table name in the drop down - unless you absolutely need every column. Also, pay attention to your data types. Pulling LOBs (XML, text, image, (n)varchar(max), varbinary(max)) into the data flow is a recipe for bad performance.
The default suggestion is to use a Lookup Component to filter the data within the data flow, as long as your source system supports an OLE DB provider (or you can coerce the data into a Cache Connection Manager).
If you can't use a Lookup component for some reason, then you can explicitly sort your data in your source systems, mark your source components as such, and then use a Merge Join of type Inner Join in the data flow to only bring in matched data.
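What a merge join does with two pre-sorted inputs can be sketched in a few lines of Python. This is an illustration of the algorithm, not of the SSIS component itself; merge_inner_join and the sample rows are hypothetical. It only works if both inputs really are sorted by the join key under the same ordering rules.

```python
def merge_inner_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples already sorted ascending by key."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1  # left key has no partner yet, advance left
        elif lk > rk:
            j += 1  # right key has no partner yet, advance right
        else:
            # Emit every right row sharing this key, then advance the left side
            k = j
            while k < len(right) and key(right[k]) == lk:
                result.append(left[i] + right[k])
                k += 1
            i += 1
    return result

source_rows = [(1, "a"), (2, "b"), (4, "d")]  # hypothetical rows from the second database
filter_ids = [(2,), (3,), (4,)]               # hypothetical ID list from the first database
matched = merge_inner_join(source_rows, filter_ids)
```

Because each input is read once from front to back, the join never needs the whole data set in memory, which is why the sort order matters so much.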
However, be aware that sorts in source systems are going to be sorted according to native rules. I ran into a situation where SQL Server was sorting based on the default ASCII sort and my DB2 instance, running on zOS, provided an EBCDIC sort. Which was great when my domain was only integers but went to hell in a handbasket when the keys became alphanumeric (AAA, A2B, and AZZ will sort differently based on this).
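The difference is easy to reproduce. In this small Python sketch, plain string sorting stands in for an ASCII-style sort, and the standard cp037 codec (one EBCDIC code page, chosen here as an assumption) stands in for the mainframe's ordering. In ASCII, digits sort before letters; in EBCDIC, they sort after.

```python
keys = ["AAA", "A2B", "AZZ"]

# Code point order: '2' (0x32) sorts before 'A' (0x41)
ascii_order = sorted(keys)

# EBCDIC byte order: 'A' is 0xC1, 'Z' is 0xE9, '2' is 0xF2
ebcdic_order = sorted(keys, key=lambda s: s.encode("cp037"))
```

Here ascii_order is ['A2B', 'AAA', 'AZZ'] while ebcdic_order is ['AAA', 'AZZ', 'A2B'], so a merge join fed one input of each kind would silently miss matches.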
Finally, excluding the final paragraph, the above assumes you have integers. If you're performing string matching, you get an extra level of ugliness because different components may or may not perform a case sensitive match (sorting with case sensitive systems can also be a factor).
I would first create a String variable e.g. SQL_Select, at the Scope of the Package. Then I would assign that a value using an Execute SQL Task against the 1st database. The ResultSet property on the General page should be set to Single row. Add an entry to the Result Set tab to assign it to your Variable.
The SQL Statement used needs to be designed to return the required SELECT statement for your 2nd database, in a single row of text. An example is shown below:
SELECT
    'SELECT * from MySecondDB WHERE ID IN ( '
    + STUFF ( (
          SELECT TOP 5
              ' , ''' + [name] + ''''
          FROM dbo.spt_values
          FOR XML PATH(''), TYPE).value('(./text())[1]', 'VARCHAR(4000)'
      ) , 1 , 3, '' )
    + ' ) '
    AS SQL_Select
Remove the TOP 5 and replace [name] and dbo.spt_values with your column and table names.
Then you can use the variable SQL_Select in a downstream task e.g. an OLE DB Source against database 2. OLE DB Sources and OLE DB Command Tasks both let you specify a Variable as the SQL Statement source.
You could add a LinkedServer between the two servers. The SQL command would be something like this:
EXEC sp_addlinkedserver @server='SRV' --or any name you want
EXEC sp_addlinkedsrvlogin 'SRV', 'false', null, 'username', 'password'
SELECT * FROM SRV.CatalogNameInSecondDB.dbo.SecondDBTableName s
INNER JOIN FirstDBTableName f on s.ID = f.ID
WHERE f.ID IN (list of values)
EXEC sp_dropserver 'SRV', 'droplogins'
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
SQL : in clause in storedprocedure:how to pass values
I'm using MS SQL Server 2005, and trying to basically script a 2-step process:
Query a table for a list of IDs matching certain criteria
Update a field in that table, where the ID is in the list of IDs returned by the first
With the catch being that steps 1 and 2 might be separated by a considerable time delay and executed in different sessions. Essentially, the list of IDs used in #2 is historical data: the values which #1 returned at a past point in time.
What I've attempted to do is write all of IDs from #1 into a varchar(8000) in "##, ##, ##, ##," format (this part is working great), and then use that string like:
UPDATE table SET field=newValue WHERE (id IN (@varcharOfCommaSeparatedIDs))
But this is giving me a syntax error, stating that it cannot convert that varchar value into whatever is needed (the error message is being truncated)
Is there a way to do this without putting the entire SQL command into a string and executing that (using EXEC or sp_executesql)? After years of avoiding injection attacks I have a somewhat instinctive (and perhaps irrational) aversion to "dynamic SQL"
If you're passing the values around between SP's on the SQL Server, I highly recommend storing the values in tables...
- Temp Tables (#mytable)
- Table Variables (@table)
- Real Tables
In SQL Server 2008 onwards you can have table valued input parameters...
If you're passing the values in from an app, the dreaded comma-separated string is indeed useful. There are many answers on SO that give table-valued functions for turning a string into a table of IDs, ready to be joined on.
SELECT
*
FROM
foo
INNER JOIN
dbo.bar(@mystring) AS bar
ON foo.id = bar.id
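What such a splitting function has to do can be sketched outside the database. The Python below is only an illustration of the logic (split_ids is a hypothetical name, and dbo.bar above is the answer's own placeholder); it accepts the "##, ##, ##," format, including the trailing comma, and refuses anything that is not an integer.

```python
def split_ids(csv_text):
    """Turn '12, 15, 27,' into [12, 15, 27]; raise ValueError on non-numeric tokens."""
    ids = []
    for token in csv_text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma or doubled separators
        ids.append(int(token))  # int() raises ValueError for anything non-numeric
    return ids

parsed = split_ids("12, 15, 27,")
```

Converting each token to an integer is also what makes the string safe to use: nothing other than numbers can come out of the function.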
Just write it out to a table.
-- Drop the holding table only if it exists
IF OBJECT_ID('Database.dbo.MyHoldingTable', 'U') IS NOT NULL
    DROP TABLE Database.dbo.MyHoldingTable
SELECT <fields>
INTO Database.dbo.MyHoldingTable
FROM <other table>
WHERE <conditions>
Then, later:
UPDATE OtherTable
Set Column=NewValue
WHERE ID IN (SELECT id FROM Database.dbo.MyHoldingTable)
Also note that you could use an INNER JOIN on your table instead of an IN clause if you prefer.
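The two-step flow can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for SQL Server, the Status and Col columns and their values are made up for illustration, and a single in-memory connection plays the role of both sessions (separate sessions would need a file-based or server database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OtherTable (ID INTEGER PRIMARY KEY, Status TEXT, Col TEXT);
    INSERT INTO OtherTable VALUES (1, 'old', 'x'), (2, 'new', 'x'), (3, 'old', 'x');
""")

# Step 1: snapshot the IDs that match the criteria right now
conn.executescript("""
    DROP TABLE IF EXISTS MyHoldingTable;
    CREATE TABLE MyHoldingTable AS
        SELECT ID AS id FROM OtherTable WHERE Status = 'old';
""")

# Time passes and the source data changes; the snapshot does not
conn.execute("UPDATE OtherTable SET Status = 'new' WHERE ID = 3")
conn.execute("UPDATE OtherTable SET Status = 'old' WHERE ID = 2")

# Step 2: update using the historical list, not the current criteria
conn.execute(
    "UPDATE OtherTable SET Col = 'updated' "
    "WHERE ID IN (SELECT id FROM MyHoldingTable)"
)
result = conn.execute("SELECT ID, Col FROM OtherTable ORDER BY ID").fetchall()
```

Rows 1 and 3 are updated because they matched when the snapshot was taken, even though row 3 no longer matches and row 2 now does.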
I am currently writing a VBA-based Excel add-in that's heavily based on a Jet database backend (I use the Office 2003 suite -- the problem would be the same with a more recent version of Office anyway).
During the initialization of my app, I create stored procedures that are defined in a text file. Those procedures are called by my app when needed.
Let me take a simple example to describe my issue: suppose that my app allows end-users to select the identifiers of orders for which they'd like details. Here's the table definition:
Table tblOrders: OrderID LONG, OrderDate DATE, (other fields)
The end-user may select one or more OrderIDs, displayed in a form - s/he just has to tick the checkbox of the relevant OrderIDs for which s/he'd like details (OrderDate, etc).
Because I don't know in advance how many OrderID s/he will select, I could dynamically create the SQL query in the VBA code by cascading WHERE clauses based on the choices made on the form:
SELECT * FROM tblOrders WHERE OrderID = 1 OR OrderID = 2 OR OrderID = 3
or, much simpler, by using the IN keyword:
SELECT * FROM tblOrders WHERE OrderID IN (1,2,3)
Now, if I turn this simple query into a stored procedure so that I can dynamically pass the list of OrderIDs I want displayed, how should I do it? I already tried things like:
CREATE PROCEDURE spTest (@OrderList varchar) AS
SELECT * FROM tblOrders WHERE OrderID IN (@OrderList)
But this does not work (as I expected), because @OrderList is interpreted as a string (e.g. "1,2,3") and not as a list of long values. (I adapted this from code found here: Passing a list/array to SQL Server stored procedure)
I'd like to avoid dealing with this issue via pure VBA code (i.e. dynamically assigning list of values to a query that is hardcoded in my application) as much as possible. I'd understand if ever this is not possible.
Any clue?
You can create the query-statement string dynamically. In SQL Server you can have a function whose return value is a TABLE, and invoke that function inline as if it were a table. Or in JET you could also create a kludge -- a temporary table (or persistent table that serves the function of a temporary table) that contains the values in your in-list, one per row, and join on that table. The query would thus be a two-step process: 1) populate temp table with INLIST values, then 2) execute the query joining on the temp table.
MYTEMPTABLE
autoincrementing id
QueryID [some value to identify the current query, perhaps a GUID]
myvalue one of the values in your in-list, string
select * from foo
inner join MYTEMPTABLE on foo.column = MYTEMPTABLE.myvalue and MYTEMPTABLE.QueryId = ?
[cannot recall if JET allows ANDs in INNER JOIN as SQL Server does --
if not, adjust syntax accordingly]
instead of
select * from foo where foo.column IN (... )
In this way you could have the same table handle multiple queries concurrently, because each query would have a unique identifier. You could delete the in-list rows after you're finished with them:
DELETE FROM MYTEMPTABLE where QueryID = ?
P.S. There would be several ways of handling data type issues for the join. You could cast the string value in MYTEMPTABLE as required, or you could have multiple columns in MYTEMPTABLE of varying datatypes, inserting into and joining on the correct column:
MYTEMPTABLE
id
queryid
mytextvalue
myintvalue
mymoneyvalue
etc
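The populate, join, and clean-up cycle described above might look like this in practice. The sketch uses Python's sqlite3 module in place of Jet, which is an assumption made purely so the example is runnable; the table layout follows the MYTEMPTABLE description, and query_with_in_list is a hypothetical helper name.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (col TEXT);
    INSERT INTO foo VALUES ('a'), ('b'), ('c');
    CREATE TABLE MYTEMPTABLE (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        QueryID TEXT NOT NULL,
        myvalue TEXT NOT NULL
    );
""")

def query_with_in_list(conn, values):
    """Run the join for one in-list, tagged with its own QueryID."""
    query_id = str(uuid.uuid4())
    # 1) populate the in-list rows for this query only
    conn.executemany(
        "INSERT INTO MYTEMPTABLE (QueryID, myvalue) VALUES (?, ?)",
        [(query_id, v) for v in values],
    )
    try:
        # 2) join on the in-list instead of using IN (...)
        return conn.execute(
            "SELECT foo.col FROM foo "
            "INNER JOIN MYTEMPTABLE t ON foo.col = t.myvalue AND t.QueryID = ? "
            "ORDER BY foo.col",
            (query_id,),
        ).fetchall()
    finally:
        # 3) remove this query's rows whether or not the select succeeded
        conn.execute("DELETE FROM MYTEMPTABLE WHERE QueryID = ?", (query_id,))

rows = query_with_in_list(conn, ["a", "c"])
leftover = conn.execute("SELECT COUNT(*) FROM MYTEMPTABLE").fetchone()[0]
```

Duplicate values in the in-list would produce duplicate result rows with a join, so the list should be de-duplicated first if that matters.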
I have a query that looks like this:
SELECT last_name,
first_name,
middle_initial
FROM names
WHERE last_name IN ('smith', 'jones', 'brown')
I need to be able to parameterize the list in the IN clause to write it as a JDBC PreparedStatement. This list could contain any number of names in it.
Is the correct way to do this:
SELECT last_name,
first_name,
middle_initial
FROM names
WHERE last_name IN (?)
and then build a list of parameters? Or is there a better (more correct) way to do that?
In short, you can't out of the box. However, with Spring you can do what you want. See How to generate a dynamic "in (...)" sql list through Spring JdbcTemplate?
Standard SQL doesn't allow the IN clause to be parameterized with a single variable. Only dynamic SQL is supported, where the query is constructed as a string, including the comma-separated list of values, prior to execution.
I'm going to research this topic, as well. I've been guilty of writing similar code and never felt 100% comfortable with it. I suppose I'd like to find something on "variable SQL parameter lists".
In code, using hibernate, and given a String of comma-delimited order Ids, I've used:
Session s = getSession();
Criteria crit = s.createCriteria(this.getOrderListingClass());
crit.add(Expression.sql(String.format("{alias}.orderId in (%s)", orderIds)));
crit.add(Expression.eq("status", OrderInfo.Order_STATUS_UNFILLED));
orders = crit.list();
Whereas orderId is really part of a "SELECT x FROM y WHERE IN (%s)".
I did run the orderIds String through a validator prior to passing it to hibernate - being fearful of injections, etc.
Something else that I've been meaning to do is check the limit on SQL parameters and number of characters in the query. I seem to recall hitting a limit somewhere around 2000+ (with MS SQL). That's something to consider if you go with this approach.
I think this is kludgy... to be passing off that many Ids in a Where-clause, but it's a section of code that needs refactoring. Thankfully, the use case has only seen a handful of Ids queried at any one time.
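If the list can grow past whatever limit applies (SQL Server documents a maximum of 2,100 parameters, which fits the "2000+" recollection above), one way around it is to split the IDs into batches and run the query once per batch. A minimal Python sketch, with chunked as a hypothetical helper and a batch size of 1000 chosen arbitrarily:

```python
def chunked(ids, size=1000):
    """Yield successive slices of ids, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# 2501 hypothetical order IDs split into batches of at most 1000
batches = list(chunked(list(range(1, 2502)), size=1000))
batch_sizes = [len(batch) for batch in batches]
```

Each batch then gets its own IN list, and the per-batch results are concatenated by the caller.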
You could also construct your query as a stored procedure that takes the parameterized list as a varchar. For example, in sql server:
CREATE PROCEDURE dbo.[procedure_name]
    @IN_LIST VARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL VARCHAR(MAX)
    SET @SQL = '
        SELECT last_name,
               first_name,
               middle_initial
        FROM names
        WHERE last_name IN (' + @IN_LIST + ')'
    EXECUTE(@SQL)
END
Just make sure your @IN_LIST is formatted as a string that includes the single quotes and commas. For example in java:
String inList = "'smith','jones','brown'";
If you use MS SQL Server, try reshaping your T-SQL to use a UDF. Maybe this post of mine can help you.