Query a database based on the result of a query from another database - sql

I am using SSIS in VS 2013.
I need to get a list of IDs from one database and, with that list of IDs, query another database, i.e. SELECT ... FROM MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).

There are three methods to achieve this:
1st method - Using Lookup Transformation
First you have to add a Lookup Transformation as @TheEsisia answered, but there are more requirements:
In the Lookup you have to write the query that returns the ID list (e.g. SELECT ID FROM MyFirstDB WHERE ...)
You have to select at least one column from the lookup table
This alone will not filter rows; it only adds values from the second table
To filter rows to WHERE ID IN ({list of IDs from MyFirstDB}) you have to do some work on the Lookup's error output, and there are two ways:
Set Error handling to Ignore Row, so the columns added from the Lookup will be NULL for rows with no match; then add a Conditional Split that filters out rows having NULL values. Assuming you have chosen col1 as the lookup column, you would use an expression similar to
ISNULL([col1]) == False
Or set Error handling to Redirect Row, so all non-matching rows are sent to the error output, which you can leave unconnected; only matching rows continue down the main path, so the data is filtered.
The disadvantage of this method is that all data is loaded and only then filtered during execution.
Also, if you are working over a network, the filtering is done on the local machine after all the data has been loaded into memory, while with the 2nd method the filtering is done on the server.
2nd method - Using Script Task
To avoid loading all the data, you can use a workaround based on a Script Task (the example below is written in VB.NET).
Assume that the connection manager is named TestAdo, that "Select [ID] FROM dbo.MyTable" is the query that returns the list of IDs, and that User::MyVariableList is the variable in which you want to store the resulting query.
Note: This code will read the connection from the connection manager
Public Sub Main()

    Dim lst As New Collections.Generic.List(Of String)

    ' Read the connection from the connection manager
    Dim myADONETConnection As SqlClient.SqlConnection
    myADONETConnection = _
        DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), _
        SqlClient.SqlConnection)

    If myADONETConnection.State = ConnectionState.Closed Then
        myADONETConnection.Open()
    End If

    ' Collect the IDs returned by the query
    Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)
    Dim dr As SqlClient.SqlDataReader
    dr = myADONETCommand.ExecuteReader

    While dr.Read
        lst.Add(dr(0).ToString)
    End While

    dr.Close()

    ' Build the final query and store it in the package variable
    Dts.Variables.Item("User::MyVariableList").Value = _
        "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"

    Dts.TaskResult = ScriptResults.Success

End Sub
Then User::MyVariableList should be used as the source (SQL command from variable). Note that the variable must be listed in the Script Task's ReadWriteVariables so the script can write to it.
3rd method - Using Execute Sql Task
Similar to the second method, but this builds the whole query (including the IN clause) in an Execute SQL Task and then uses it as the OLE DB Source.
Just add an Execute SQL Task before the Data Flow Task
Set the ResultSet property to Single row
On the Result Set tab, map the result to User::MyVariableList
Use the following SQL command
DECLARE @str AS VARCHAR(4000)
SET @str = ''

SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable

SET @str = 'SELECT * FROM MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'

SELECT @str
If the column has a string data type you should add quotation marks before and after the values, as below:
SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
FROM dbo.MyTable
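As a side note, on SQL Server 2017 and later the same list can be built with STRING_AGG, which avoids the trailing-comma trimming; a minimal sketch, assuming the same dbo.MyTable:
DECLARE @str AS VARCHAR(MAX)

-- STRING_AGG concatenates the IDs with a comma separator in one pass;
-- the VARCHAR(MAX) casts avoid the 8000-character limit for long lists
SELECT @str = 'SELECT * FROM MySecondDB WHERE ID IN ('
            + STRING_AGG(CAST([ID] AS VARCHAR(MAX)), ',')
            + ')'
FROM dbo.MyTable

SELECT @str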
Make sure that you have set the Data Flow Task's DelayValidation property to True.

This is a classic case for using a Lookup Transformation. First, use an OLE DB Source to get data from the first database. Then, use a Lookup Transformation to filter this data set based on the ID values from the second data set. Here are the steps for using a Lookup Transformation:
In the General tab, select Full cache, OLE DB connection manager, and Redirect rows to no match output, as shown in the following picture. Notice that using Full cache provides great performance for your package.
[screenshot: General settings]
In the Connection tab, use an OLE DB Connection Manager to connect to your second server. Then, you can either directly select the data set with the ID values or (as shown in the picture below) use SQL code to select the IDs from the filtering data set.
[screenshot: Connection settings]
Go to the Columns tab and map the ID columns from both data sets. For each record from your first data set, the Lookup checks whether its ID is in the Available Lookup Columns; if it is, the row goes to the Match output, otherwise to the No Match output.
[screenshot: matching the ID columns]
Click OK to close the Lookup. Then you need to select the Lookup Match Output.
[screenshot: Match output]

The "best" answer depends on data volumes and source systems involved.
Many of the other answers propose building out a list of values based on clever concatenation within SQL Server. That doesn't work so well if the referenced system is Oracle, MySQL, DB2, Informix, Postgres, etc. There may be an equivalent concept, but there might not be.
For best performance, you need to filter against the second db before any of those rows ever hit the data flow. That means adding a filtering condition, as the others have suggested, to your source query. The challenge with this approach is that your query is going to be limited by some practical bounds that I don't remember: ten, a hundred, a thousand values in your WHERE clause are probably fine; a lakh, a million - probably not so much.
In the cases where you have large volumes of values to filter against the source table, it can make sense to create a table on that server and truncate and reload that table (execute sql task + data flow). This allows you to have all of the data local and then you can index the filter table and let the database engine do what it's really good at.
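A minimal sketch of that pattern, assuming a hypothetical filter table dbo.FilterIDs that you are allowed to create and index on the second server (all names here are illustrative):
-- Execute SQL Task: empty the filter table before each run
TRUNCATE TABLE dbo.FilterIDs;

-- A Data Flow Task then loads the IDs from the first database into dbo.FilterIDs.

-- OLE DB Source for the main Data Flow: the join lets the engine use the
-- index on the filter table instead of a giant IN list
SELECT s.*
FROM dbo.MySecondTable AS s
INNER JOIN dbo.FilterIDs AS f
    ON f.ID = s.ID;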
But, you say, the source database is some custom solution that you can't make tables in. You can still look at the above approach with temporary tables; within SSIS you just need to mark the connection as retained by setting RetainSameConnection to True on the connection manager, so the temporary table survives between tasks. I don't much care for temporary tables with SSIS though, as debugging them is a nightmare I'd not wish upon my mortal enemy.
If you're still reading, we've identified why filtering in the source system might not be "doable", even if it will provide the best performance.
Now we're stuck with purely SSIS solutions. To get the best performance, do not just select the table name in the drop-down unless you absolutely need every column. Also, pay attention to your data types: pulling LOBs (XML, text, image, (n)varchar(max), varbinary(max)) into the data flow is a recipe for bad performance.
The default suggestion is to use a Lookup Component to filter the data within the data flow. This works as long as your source system supports an OLE DB provider (or you can coerce the data into a Cache Connection Manager).
If you can't use a Lookup component for some reason, then you can explicitly sort your data in your source systems, mark your source components as such, and then use a Merge Join of type Inner Join in the data flow to only bring in matched data.
However, be aware that sorts in source systems are going to be sorted according to native rules. I ran into a situation where SQL Server was sorting based on the default ASCII sort and my DB2 instance, running on zOS, provided an EBCDIC sort. Which was great when my domain was only integers but went to hell in a handbasket when the keys became alphanumeric (AAA, A2B, and AZZ will sort differently based on this).
Finally, excluding the previous paragraph, the above assumes you have integers. If you're performing string matching, you get an extra level of ugliness because different components may or may not perform a case-sensitive match (sorting with case-sensitive systems can also be a factor).
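As an illustration of that last point, in T-SQL you can pin the comparison semantics with an explicit collation instead of relying on each system's defaults; a sketch with hypothetical table and column names:
-- Force a case-sensitive, accent-sensitive comparison on both sides of the join
SELECT a.ID
FROM dbo.SourceKeys AS a
INNER JOIN dbo.FilterKeys AS b
    ON a.KeyCol COLLATE Latin1_General_CS_AS = b.KeyCol COLLATE Latin1_General_CS_AS;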

I would first create a String variable, e.g. SQL_Select, scoped to the Package. Then I would assign it a value using an Execute SQL Task against the 1st database. The ResultSet property on the General page should be set to Single row. Add an entry on the Result Set tab to assign the result to your variable.
The SQL Statement used needs to be designed to return the required SELECT statement for your 2nd database, in a single row of text. An example is shown below:
SELECT
    'SELECT * from MySecondDB WHERE ID IN ( '
    + STUFF( ( SELECT TOP 5
                   ' , ''' + [name] + ''''
               FROM dbo.spt_values
               FOR XML PATH(''), TYPE
             ).value('(./text())[1]', 'VARCHAR(4000)')
           , 1, 3, '')
    + ' ) '
    AS SQL_Select
Remove the TOP 5 and replace [name] and dbo.spt_values with your column and table names.
Then you can use the variable SQL_Select in a downstream task e.g. an OLE DB Source against database 2. OLE DB Sources and OLE DB Command Tasks both let you specify a Variable as the SQL Statement source.
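For reference, the single row returned by the query above looks something like this (the values come from whatever your table holds; these are illustrative), ready to be run against database 2:
SELECT * from MySecondDB WHERE ID IN ( 'value1' , 'value2' , 'value3' , 'value4' , 'value5' )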

You could add a linked server between the two servers. The SQL commands would be something like this:
EXEC sp_addlinkedserver @server='SRV' --or any name you want
EXEC sp_addlinkedsrvlogin 'SRV', 'false', null, 'username', 'password'
SELECT * FROM SRV.CatalogNameInSecondDB.dbo.SecondDBTableName s
INNER JOIN FirstDBTableName f on s.ID = f.ID
WHERE f.ID IN (list of values)
EXEC sp_dropserver 'SRV', 'droplogins'
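One caveat: with four-part names, the local server may pull the whole remote table across the network before joining. If that becomes a problem, OPENQUERY lets you run the filter on the remote server so only matching rows travel back; a sketch, assuming the same linked server (OPENQUERY requires a literal query string, so a dynamic ID list would have to be built with dynamic SQL as in the other answers):
-- The pass-through query executes entirely on SRV
SELECT *
FROM OPENQUERY(SRV, 'SELECT * FROM CatalogNameInSecondDB.dbo.SecondDBTableName WHERE ID IN (1, 2, 3)')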

Related

How to use a table's content for querying other tables in BigQuery

My team and I use a query on a daily basis to retrieve specific results from a large dataset. This query is constantly updated with different terms that I would like to match in the dataset.
To make this job more scalable, I built a table of arrays, each containing the terms and conditions for the query. That way the query can lean on the table, and changes I make in the table will affect the query without the need to change the query itself.
The thing is, I can't seem to find a way to reference the table in the actual query without selecting from it. I want to use the content of the table as a WHERE condition. For example:
table1:
terms
[term1, term2, term3]
query:
select * from dataset
where dataset.column like '%term1'
or dataset.column like '%term2'
or dataset.column like '%term3'
etc.
If you have any ideas please let me know (a solution involving Python or JS would also be great).
Thanks!
You can "build" the syntax you want using Procedural Language in BigQuery and then execute it. Here is a way of doing it without "leaving" BQ (meaning, without using external code):
BEGIN
  DECLARE statement STRING DEFAULT 'SELECT col FROM dataset.table WHERE';

  FOR record IN (SELECT * FROM UNNEST(['term1', 'term2', 'term3']) AS term)
  DO
    SET statement = CONCAT(statement, ' col LIKE "%', record.term, '" OR');
  END FOR;

  -- close the trailing OR with a condition that is always false
  SET statement = CONCAT(statement, ' 1=2');
  EXECUTE IMMEDIATE statement;
END;
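Since the goal in the question is to lean on the table itself rather than a hard-coded array, the loop can read the terms from it directly; a sketch, assuming table1 stores them in an ARRAY&lt;STRING&gt; column named terms, so that the FOR loop above becomes:
FOR record IN (
  -- flatten the array column into one term per row
  SELECT term FROM dataset.table1, UNNEST(terms) AS term
)
DO
  SET statement = CONCAT(statement, ' col LIKE "%', record.term, '" OR');
END FOR;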

SqlQuery and SqlFieldsQuery

It looks like SqlQuery only supports SQL that starts with select *? Doesn't it support other SQL that selects only some columns, like
select id, name from person
and maps the columns to the corresponding POJO?
If I use SqlFieldsQuery to run SQL, the result is a QueryCursor of Lists (each List contains one record of the result). But if the SQL starts with select *, then the List's contents differ from those of a fields query like:
select id, name, age from person
For select *, each List is constructed of three parts:
the first element is the key of the cache
the second element is the POJO object that contains the data
the trailing elements are the values for each column
Why was it designed this way? If I don't know what SQL the SqlFieldsQuery runs, then I need additional effort to figure out what the List contains.
SqlQuery returns key and value objects, while SqlFieldsQuery allows you to select specific fields. Which one to use depends on your use case.
Currently select * does indeed include the predefined _key and _val fields, and this will be improved in the future. However, it's generally good practice to list the fields you want to fetch when running SQL queries (this is true for any SQL database, not only Ignite). This way your code is protected from unexpected behavior in case the schema is changed, for example.

Transferring several similar named tables in SSIS

I want to create an interface between 2 databases on SQL Server 2008+ to copy several similarly named tables into one.
I have n tables that all follow the same naming convention, for example:
SalesInvoicePlanning2014ver1
SalesInvoicePlanning2015ver1
SalesInvoicePlanning2015ver2
etc.
The numbers can vary and do not have a set start (or end), but they are always integers.
I also have a table tabledir that contains all the table names as a list (one field). There are a total of 30-40 entries in that list, including (for me) undesired entries. In the above example I would need 3 of the 30 tables.
The plan is to use a loop container to
select Top 1([name]) from [tabledir] where name like 'SalesinvoicePlanning%'
and then use the result as variable in the following SSIS Data transfer task:
Select * from [variable]
However, I'm stuck on the SQL statement that gives me the desired table name on each iteration.
Performance is not really an issue. Any advice? Am I wrong to try a loop container?
You can follow the steps below:
Step 1 - Create an Execute SQL Task to get all the table names into one variable, let's say TableNames, of type Object (recordset), using your query, e.g.:
select ([name]) as TableName from [tabledir] where name like 'SalesinvoicePlanning%'
Step 2 - Add a Foreach Loop Container to iterate over the TableNames variable, taking a single table name into a new variable, current_table, and add a Data Flow Task inside the container to import the data into the destination table. Your source query will be an expression like:
Select column_names from current_table
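For instance, you could base the source on a second string variable with EvaluateAsExpression set to True and an expression such as the following (a sketch; current_table is the loop variable from step 2):
"SELECT * FROM [" + @[User::current_table] + "]"
Then point the OLE DB Source at that variable (data access mode: SQL command from variable), and set the Data Flow Task's DelayValidation to True, since the table name changes on every iteration.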

Oracle SQL "meta" query for records which have specific column values

I'd like to get all the records from a huge table where any of the number columns contains a value greater than 0. What's the best way to do it?
E.g.:
/* table structure*/
create table sometable (id number,
somestring varchar2(12),
some_amount_1 number(17,3),
some_amount_2 number(17,3),
some_amount_3 number(17,3),
...
some_amount_xxx number(17,3));
/* "xxx" > 100, and yeah I did not designed that table structure... */
And I want any row where any of the some_amount_n > 0 (an even better solution would also add a field showing which field(s) are greater than zero).
I know I can write this with a huge some_amount_1 > 0 OR some_amount_2 > 0 OR ... block (and get the field names with some CASE WHEN), but there should be some more elegant solution, shouldn't there?
Possible solutions:
Normalize the table. You said you are not allowed to. Try to convince those that forbid such a change by explaining the benefits (performance, ease of writing queries, etc.).
Write the huge ugly OR query. You could also print it along with the version of the query for the normalized tables. Add performance tests (you are allowed to create another test table or database, I hope).
Write a program (either in PL/SQL or in another procedural language) that produces the horrible OR query. (Again, print it along with the elegant version.)
Add a new column, say called Any_x_bigger_than_zero, which is automatically filled with either 0 or 1 via a trigger (that uses the huge ugly OR). Then you just need to check WHERE Any_x_bigger_than_zero = 1 to see if any of the values is > 0.
Similar to the previous, but even better: create a materialized view with such a column.
First, create a table to sort the data into something more easily read from... something simple like id, column_name, column_value. You'll have to bear with me, it's been a while since I've operated in Oracle, so this is heavy pseudocode at best.
A quick dynamic SQL blurb: you can set a variable to a SQL statement and then execute that variable. There are some security risks, and it's possible this feature is disabled in your environment... so confirm you can run this first. Declare a variable, set the variable to 'select 1', and then use 'execute immediate' to execute the SQL stored in your variable.
set var = 'select id, ''some_amount_' || 1 || ''', some_amount_' || 1 || ' from sometable where some_amount_' || 1 || ' <> 0'
Assuming I've got my Oracle syntax right (pipe is append, right? I believe three single quotes as ''' should result in one ' when in a variable too; you may have to trial-and-error this line until you have the var set to):
select id, 'some_amount_1', some_amount_1
from sometable
where some_amount_1 <> 0
This should select the ID and the value in some_amount_1 for each id in your database. You can turn this into an insert statement pretty easily.
I'm assuming some_amount_xxx has an upper limit... the next trick is to loop this giant statement. Once again, horrible pseudocode:
declare sql_string
declare i and set to 1
for i = 1 to xxx (whatever your xxx is)
    set sql_string to the first "set var" statement we made, replacing the 1 with the i var here
    execute sql_string
    increment i
loop
Hopefully it makes sense... it's one of the very few scenarios where you would ever want to loop dynamic SQL. Now you have a relatively straightforward table to read from, and this should be a relatively easy query from here.
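A rough PL/SQL version of that loop, as a sketch: it assumes a hypothetical target table unpivoted_amounts(id, column_name, column_value) created beforehand, and 120 as the upper bound for xxx:
BEGIN
  FOR i IN 1 .. 120 LOOP
    -- build and run one INSERT ... SELECT per amount column
    EXECUTE IMMEDIATE
      'INSERT INTO unpivoted_amounts (id, column_name, column_value) ' ||
      'SELECT id, ''some_amount_' || i || ''', some_amount_' || i ||
      ' FROM sometable WHERE some_amount_' || i || ' <> 0';
  END LOOP;
  COMMIT;
END;
/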

T-SQL - Select records into concatenated text?

I'm trying to investigate when & why certain rows are getting deleted in a SQL 2005 database. I've started building a trigger to log some information when a row is deleted.
My trigger is activated when row(s) are deleted from a certain table. I have it set up to log a timestamp in another logging table when the delete occurs. I'd also like to log the data that was deleted, but would prefer not to hassle with writing code for each field and value.
I know when data is deleted it can be seen (temporarily) in the "Deleted" table in SQL Server. So right after a delete, I could "SELECT * FROM Deleted" and see the data. I would like to take the contents of this table, and turn it into one large text blob that I can just save into a TEXT field in my logging table.
So... in simpler terms, is there a way I can take a recordset of one or more rows and turn it into a single string variable, all within SQL commands in my trigger? Bonus points if I can include column names.
Thanks
I would stay away from anything that would run too long when working in a trigger. That includes any query run just to determine a static table layout (because you don't want to write the code yourself) so you can build a string.
I do this type of thing all the time, but mostly with stored procedure parameters. I have most of this in a template I use.
Create this function; it will display the column value nicely within quotes, or show NULL:
CREATE FUNCTION [dbo].[QuoteNull]
(
    @InputStr varchar(8000) -- value to pad
)
RETURNS varchar(8000)
AS
/*
TEST WITH:
----------
PRINT ' dbo.QuoteNull(null)      ->'+dbo.QuoteNull(null)+'<-'
PRINT ' dbo.QuoteNull(''apple'') ->'+dbo.QuoteNull('apple')+'<-'
PRINT ' dbo.QuoteNull(123)       ->'+dbo.QuoteNull(123)+'<-'
PRINT ' dbo.QuoteNull(GETDATE()) ->'+dbo.QuoteNull(GETDATE())+'<-'
PRINT ' dbo.QuoteNull(GETDATE()) ->'+dbo.QuoteNull(CONVERT(varchar(23),GETDATE(),121))+'<-'
*/
BEGIN
    RETURN COALESCE('''' + @InputStr + '''', 'null')
END
GO
Then paste this into your code:
INSERT INTO YourLogTable
(xxx,yyy,zzz,ColumnTextValue)
SELECT
xxx,yyy,zzz,'values:'
+' '+RTRIM('ColumnNameInt ')+'='+dbo.QuoteNull( ColumnNameInt )
+', '+RTRIM('ColumnNameVarchar ')+'='+dbo.QuoteNull( ColumnNameVarchar )
+', '+RTRIM('ColumnNameChar ')+'='+dbo.QuoteNull( ColumnNameChar )
+', '+RTRIM('ColumnNameDate ')+'='+dbo.QuoteNull(CONVERT(varchar(23),ColumnNameDate ,121))
FROM DELETED
Make sure you have one row for each column in your table (if you have more, just delete the extra ones later). If you want to see any dates in detail, use the CONVERT as shown above.
run this query:
select sc.name
FROM syscolumns sc INNER JOIN sysobjects so ON sc.id = so.id
where UPPER(so.name)=UPPER('YourTableName') order by sc.colorder desc
Take the output in SQL Server Management Studio (in text output mode) and Alt+Left-Click-Drag a square over the column names, then copy this column-based selection.
Go back to your code and Alt+Click-Drag a square over the complete "ColumnName..." values in the left column of the insert statement, then paste. If you made a column selection, it will replace the column only and leave the code unchanged to the left and right. Do the same thing for the "ColumnName..." values on the right of the insert, and you now have an INSERT that will build the data you want but will not waste too much time in the trigger.