While this seems like a basic problem, I've been ripping my hair out FOR DAYS trying to get an efficient solution to this.
I have a lookup table of values on a server that I read from and assemble into a string using a C# Script task. I write this string into a variable that I want to pass in as my WHERE parameters inside a large SQL query on a ADO.NET data source (from a different server which I only have read access to) in my data flow. For example, this string would just be something like
('Frank', 'John', 'Markus', 'Tom')
and I would append that as my WHERE clause.
I can't read from a variable directly for an ADO.NET data source AND I can't use the 'Expression' property to set my SQL either as my SQL query is over 4000 characters. I could use an Execute SQL Task to run my query, load the results into a recordset and I assume, then loop through the recordset but that's extremely inefficient.
What would be the best way to do this? My end goal is to put these results inside a table on the first server.
You could try to set up Script Component as source - variables and strings inside scripts can be longer than 4000 characters so you can fit your query inside.
Setup your component similar to this article: http://beyondrelational.com/modules/2/blogs/106/posts/11119/script-componentsource-part1.aspx
In this one you have example how to fetch data using ExecuteReader and put it to output of script component: http://beyondrelational.com/modules/2/blogs/106/posts/11124/ssis-script-component-as-source-adonet.aspx In this one you have instructions how to aquire connection properly: http://www.toadworld.com/platforms/sql-server/b/weblog/archive/2011/05/30/use-connections-properly-in-an-ssis-script-task
By joining this pieces of information you should be able to write your source Script Component which can fetch data using any length dynamically constructed query.
Good luck :)
You can do a simple select statement to return a list of values that will include ('Frank', 'John', 'Markus', 'Tom'). So your select would return :
Name
----------
Frank
John
Markus
Tom
Then, in SSIS, use a Merge Join Component (that will act as a INNER JOIN) instead of a where clause in your main query.
This is the cleaniest way to achieve what you want.
Related
Is it possible to reference a SQL field in your SSIS variable?
For instance, I would like use the field from the "table" below
Select '999999' AS Physician_Profile_ID
as a dynamic variable (named "CMSPhysProID" in our example) here
I plan on concatenating multiple IDs into a In statement.
Possible by using execute sql taskIn left side pan of Execute SQL task, general tab 1.Select result set as single row2. Connection type ole db 3. Set connection and form SQL statement, As you mentioned Select '999999' AS Physician_Profile_ID 4.Go to result set in your left side pan 5. Add your variable where you want to store '999999' 6. Click ok
If you are looking to store the value within the variable to be used later, you can simply use an Execute SQL Task with a single row result set. More details in the following article:
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
If you are looking to add a computed column while importing data, you must use a Derived Column Transformation within the data flow task to add a column based on another one, you can refer to the following article for more details about this component:
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
What are you trying to accomplish by concatenating the IDs into an "IN" statement? If the idea is to use the values of the IDs to limit the results, as a dynamic WHERE clause, you may have better luck just using a lookup against either a table you maintain with the desired IDs or even a static list generated in the package with a script task. (If you can use the lookup table method it will be much easier to maintain as you only have to update a table, not your source code.)
Alternatively, you may even be able to accomplish the goal with a join. Create a temp table from the profile IDs you want to keep and join to it, or, again, use it as a lookup component. Dynamically creating a where clause using IN will come in a lot slower and will be cumbersome to maintain.
I am currently using a function in SQL Server to get the max-value of a certain column. I Need this value to generate a specific number of dummy files to insert flowfiles that are created later on.
Is there a way of calling this function via a nifi-processor?
By using ExecuteSQL I Always get error like unable to execute SQL select query or the column "ab" was not found, when using select ab.functionname() (ab is the loginname of the db)
In SQL Server I can just use select ab.functionname() and get the desired results.
If there is no possible way of calling this function, is there another way to create #flowfiles dummyfiles to reserve this place for them in the DB so that no one else could insert or use this ids (not autoincremt, because it is not possible) while the flowfiles are getting processed?
I tried using $flowfile.count and the Counterprocessor, but this did not solve the Problem.
It should look like: INSERT INTO table (id,nr) values (max(id)+1,anynumber) for every flowfiles, unfortunately the ExecuteSQL is not able to do this.
Think this conversation can help you:
https://community.hortonworks.com/questions/26170/does-executesql-processor-allow-to-execute-stored.html
Gist:
You can use ExecuteScript or ExecuteProcess to call appropriate script. For example for ExecuteProcess just call sqlplus command. Choose type of command "sqlplus". In command arguments set something like: user_id/password#dbname #"script_path/someScript.sql". In someScript.sql you put something like:
execute spname(param)
You can write your own processor :) Of course it's more difficulty and often unnecessary
I am using pentaho for data migration testing. I have set a "table input" step where many parts of the query inside "table inputs" are variables. I have been looking for a way to capture that query after it gets executed during runtime.
I was wondering if there is any specific system log variables for sql or is it to do with metadata. need help! Thanks
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process so to say). From the same input we copy the output to another path. There we add the current time as a field (use element "Get system data") and we add the generated SQL statement, join them as a cartesian product and group the result by the sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
Thank you Marcus! I did some thing similar.
It works. awesome.
I gathered parts of queries from table field where they were kept and formed a full query in javascript. After that full query will be sent as parameter to a transformation that will run and log the query.
My project is in Visual Foxpro and I use MS SQL server 2008. When I fire sql queries in batch, some of the queries don't execute. However, no error is thrown. I haven't used BEGIN TRAN and ROLLBACK yet. What should be done ??
that all depends... You don't have any sample of your queries posted to give us an indication of possible failure. However, one thing I've had good response with from VFP to SQL is to build into a string (I prefer using TEXT/ENDTEXT for readabilty), then send that entire value to SQL. If there are any "parameter" based values that are from VFP locally, you can use "?" to indicate it will come from a variable to SQL. Then you can batch all in a single vs multiple individual queries...
vfpField = 28
vfpString = 'Smith'
text to lcSqlCmd noshow
select
YT.blah,
YT.blah2
into
#tempSqlResult
from
yourTable YT
where
YT.SomeKey = ?vfpField
select
ost.Xblah,
t.blah,
t.blah2
from
OtherSQLTable ost
join #tempSqlResult t
on ost.Xblah = t.blahKey;
drop table #tempSqlResult;
endtext
nHandle = sqlconnect( "your connection string" )
nAns = sqlexec( nHandle, lcSqlCmd, "LocalVFPCursorName" )
No I don't have error trapping in here, just to show principle and readability. I know the sample query could have easily been done via a join, but if you are working with some pre-aggregations and want to put them into temp work areas like Localized VFP cursors from a query to be used as your next step, this would work via #tempSqlResult as "#" indicates temporary table on SQL for whatever the current connection handle is.
If you want to return MULTIPLE RESULT SETs from a single SQL call, you can do that too, just add another query that doesn't have an "into #tmpSQLblah" context. Then, all instances of those result cursors will be brought back down to VFP based on the "LocalVFPCursorName" prefix. If you are returning 3 result sets, then VFP will have 3 cursors open called
LocalVFPCursorName
LocalVFPCursorName1
LocalVFPCursorName2
and will be based on the sequence of the queries in the SqlExec() call. But if you can provide more on what you ARE trying to do and their samples, we can offer more specific help too.
Using SQL Server 2000 and Microsoft SQL Server MS is there a way to create a delimited string based upon an unknown number of columns per row?
I'm pulling one row at a time from different tables and am going to store them in a column in another table.
A simple SQL query can't do anything like that. You need to specify the fields you are concatenating.
The only method that I'm aware of is to dynamincally build a query for each table.
I don't recall the structure of MSSQL2000, so I won't try to give an exact example, maybe someone else can. But there -are- system tables that contain table defintions. By parsing the contents of those system tables you can dynamically build the necessary query for each source data table.
TSQLthat writes TSQL, however, can be a bit tricky to debug and maintain :) So be careful how you structure everything...
Dems.
EDIT:
Or just do it in your client application.