apologies if I've phrased this terribly. I only started using SSIS today.
I've written a FOREACH which loops through all the files in a folder and loads each file into my table f_actuals together with the filename without the extension - this filename is a combination of a PeriodKey and Business Unit. It works well.
However, this is intended to be a daily upload from our system for the entire month for each business unit (the month-to-date data refreshes daily until we close that period), so what I really need is for the FOREACH to do the following:
Checks the filenames due for import in the designated folder against the filenames already in the f_actuals table
Removes all the matches from the f_actuals table
Continues with the FOREACH I've already built
I know this is probably a massively inefficient way to do this (preference would be daily incremental uploads), but the files need to be month-to-date, as our system cannot provide anything else easily.
Hope this makes sense.
Any help greatly appreciated.
You can use an Execute SQL Task within the For Each Loop to do this.
You can either use an SQL statement:
DELETE
FROM f_actuals
WHERE filename = ?
Or perhaps a stored procedure (accepting your filename as a parameter and doing the same thing as the statement above), e.g.:
EXEC DeleteFromActuals ?
For each filename in your loop, you would store this in a variable, and pass the variable as a parameter in the Execute SQL Task (this is what the ? is).
To map the parameter in the Execute SQL Task, go to 'Parameter Mapping', and add a new parameter. Select the variable containing the filename from the dropdown list, choose a data type of VARCHAR, and set the 'Parameter Name' to 0. The 'Direction' should be 'Input', which is the default.
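If you go the stored procedure route, a minimal sketch might look like this (the parameter length and the filename column are assumptions based on the DELETE statement above):

```sql
-- Hypothetical sketch: table and column names taken from the question,
-- parameter size is an assumption.
CREATE PROCEDURE dbo.DeleteFromActuals
    @FileName VARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove any rows previously loaded from this file so the
    -- month-to-date file can be re-imported cleanly.
    DELETE FROM f_actuals
    WHERE filename = @FileName;
END
```

The Execute SQL Task would then call `EXEC dbo.DeleteFromActuals ?` with the filename variable mapped to parameter 0 exactly as described above.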
This is an SSIS question for advanced programmers. I have a SQL table that holds clientid, clientname, Filename, Ftplocationfolderpath, and filelocationfolderpath.
This table holds a unique record for each of my clients. As my client list grows I add a new row in my sql table for that client.
My question is this: Can I use the values in my sql table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the SQL table is that sometimes we get requests to change the delivery location or file name of a file we send externally. We would like to be able to change those things dynamically on the fly in the SQL table, instead of having to export the package, manually change it, and re-import it each time. Each client has its own SSIS package.
Let me know if this is feasible. I'd appreciate any insight.
Yes, it is possible. There are two ways to approach this, depending on whether a single job run covers one client or multiple clients.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client. This is fairly straightforward: in the Result Set, select the Single row option, map the single row's result to the package variables, and go about your processing.
You are running for multiple clients. In the Result Set, select Full result set and assign the result to a single package variable of type Object - give it a meaningful name like ObjectRs. You will then add a Foreach Loop with the following settings:
Type: Foreach ADO Enumerator
ADO object source variable: Select the ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable mappings, map all of the columns in their sequential order to the package variables. This effectively transforms the package into a series of single transactions that are looped.
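As a sketch, the Execute SQL Task feeding the loop might run a query like the one below (the table name ClientConfig is an assumption; the columns are the ones listed in the question, and their order here is the order you would use in Variable Mappings, starting at index 0):

```sql
-- Hypothetical config table name; one row per client.
-- Map the columns in this order in the Foreach Loop's Variable Mappings
-- (index 0 = clientid, index 1 = clientname, and so on).
SELECT clientid,
       clientname,
       Filename,
       Ftplocationfolderpath,
       filelocationfolderpath
FROM dbo.ClientConfig
ORDER BY clientid;
```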
Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code, read all required values from the database into SSIS variables, and then use those variables to define what you need. You should not hardcode client-specific information in the package.
I'm working with SSIS and need to read all the Customer Numbers out of the Customer table, then search through a directory to see if a PDF file exists with the Customer Number (e.g. A000134) in the filename. If it does, attach it to an email and send it.
Does anyone have any suggestions on how this could be achieved?
How I thought of approaching it was:
1) Loop through the directory and get all the filenames/path and write to a table.
2) Using String Functions, pull the Customer Number out of the filename.
3) Call a stored procedure, and within the stored proc loop through all customers who have a file to send (joining CustomerTable and FilesTable on Customer Number), using @file_attachments in sp_send_dbmail to reference the file to send.
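Step 3 could be sketched roughly like this (CustomerTable, FilesTable, and their columns are assumptions from the description above; sp_send_dbmail is the real msdb procedure, but the mail profile and subject are placeholders):

```sql
-- Hypothetical sketch: table and column names are assumptions.
DECLARE @CustomerNumber VARCHAR(20),
        @FilePath       NVARCHAR(260),
        @Email          NVARCHAR(128);

-- One row per customer who has a matching PDF in the FilesTable.
DECLARE file_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.CustomerNumber, f.FilePath, c.Email
    FROM CustomerTable c
    JOIN FilesTable f ON f.CustomerNumber = c.CustomerNumber;

OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @CustomerNumber, @FilePath, @Email;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name     = 'MailProfile',   -- placeholder profile name
        @recipients       = @Email,
        @subject          = 'Your statement',
        @body             = 'Please find your statement attached.',
        @file_attachments = @FilePath;

    FETCH NEXT FROM file_cursor INTO @CustomerNumber, @FilePath, @Email;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;
```

Note that sp_send_dbmail requires Database Mail to be configured and the attachment path to be accessible from the SQL Server machine.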
Nice problem, but I think SSIS is not the right tool to solve this.
Still, you can achieve it using SSIS:
Load the file data into a text column and use a full-text search query in SSIS.
Use custom code (C# or VB) in SSIS to look into the files.
But my suggestion would be to write a standalone utility to get this job done.
I have a job that I want to run that passes a variable to an SSIS package. The variable is a filename, but the filename changes daily. I have an Access front end that the user enters the filename into. The Access program runs a stored procedure which writes the filename to a temp table and then runs the job. I would like the job to query that table for the filename and pass it along to my package variable.
I can get the job to work using a static filename. On the Set Values tab I used the property path \Package.Variables[User::FileName] and the value \myserver......\filename.txt. But I don't know how to replace that filename with the results of the query.
Thanks in advance.
Scott
I may have spoken too soon. The data source saved in my job step was still a value in my package. I had removed the value but hadn't re-imported the package to SQL Server. Now that I have done that, it is not importing anything at all.
I ended up creating an Execute SQL Task in my package that assigns the value in the temp table to my package variable.
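For anyone doing the same, the Execute SQL Task can run a single-row query like the one below (the staging table name is an assumption), with its Result Set set to Single row and the result mapped to the User::FileName variable:

```sql
-- Hypothetical table name: the table the Access front end writes into.
-- Returns the one filename to be used for today's run.
SELECT TOP (1) FileName
FROM dbo.FileNameStaging;
```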
I have a series of task that are very similar:
SELECT a,b FROM c
Look up a value in another table and change the value in column b.
Save the new value back to c; if there is no match, send the row on to an error table.
That part is pretty straight forward and illustrated here:
Source ==> Lookup =match=> SQL Update command
=No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: when I insert the Script Component, it asks me whether it is a Source, Destination, or Transformation script. And by only setting the variable, it does not produce any rows for output and cannot connect to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup, and Error update as well, so it would not be simpler just to change the SQL in the initial Source. But if I can do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable as an expression in your Dataflow. You can parametrize the query used in the lookup or any other parameters of your dataflow.
If your dataflows really have always the same structure, you could even generate a list of queries and call your dataflow task in a loop, preventing the duplication of the same tasks.
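For example, the "list of queries" could itself come from a query whose full result set feeds a Foreach ADO enumerator, with each row assigned to the User::Sql variable on each iteration (the queries below are made up for illustration):

```sql
-- Each row is one source query for one loop iteration.
SELECT 'SELECT a, b FROM c' AS SourceQuery
UNION ALL
SELECT 'SELECT d, e FROM f';
```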
I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in a folder. My requirement is to develop a transformation where the Microsoft Excel Input step in PDI (Pentaho Data Integration) picks a file based on the output of a SQL query. If the output of the query is abc.xlsx, the Microsoft Excel Input should pick up abc.xlsx for further processing. How do I achieve this? Would really appreciate your help. Thanks.
Transformations in Kettle run asynchronously, so you're probably looking into needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the variable populated as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
The Excel input step has an "Accept filenames from previous step" option. You can have a Table Input step build the full path of the file you want to read (or build it later from the base directory and the short filename), pass the filename to the Excel input, tick that box, and specify the step and the field you want to use for the filename.
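As a sketch, the Table Input step could build the full path in SQL before passing it to the Excel input (the base directory, table, and column names below are assumptions):

```sql
-- Hypothetical: a config table holding the short filename your query returns.
-- The Excel input step receives full_path via "Accept filenames from previous step".
SELECT CONCAT('/data/incoming/', filename) AS full_path
FROM file_config
WHERE is_current = 1;
```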