I have a transformation that is successfully writing the first row to the log file.
However, the same transformation is not writing the first row to a text file.
The text file remains blank.
Does anyone know why this may be?
Edit: I'm only focusing on the "applications to run" and "set pm variable" transformations, as the other transformations are copies of "set pm variable" for different fields.
It looks like your Set Variables step is distributing its rows over the two follow-up steps in a round-robin way, which is the default setting in PDI.
Right-click the Set Variables step and, under Data Movement, select Copy. That will send all rows to BOTH steps; you should then see a documents (copy) icon on the hops.
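As a rough illustration of the difference (plain Python, not PDI; the step names and the row content are made up), with a single input row only one downstream step receives it under round-robin distribution, while Copy reaches both:

```python
# Sketch only: how one row reaches two follow-up steps under each
# Data Movement setting. Step names and the row are hypothetical.
rows = [{"pm_variable": "value1"}]            # the single row from Set Variables
targets = ["Write to log", "Text file output"]

# Distribute (the default): rows are dealt out round-robin, one target each,
# so with a single row the text file step receives nothing.
distributed = {t: [] for t in targets}
for i, row in enumerate(rows):
    distributed[targets[i % len(targets)]].append(row)
print(distributed)   # {'Write to log': [{...}], 'Text file output': []}

# Copy: every row goes to every target, so both outputs see the row.
copied = {t: list(rows) for t in targets}
print(copied)
```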
I'm working on a package to import data from a raw text file to a table in SQL Server. My package contains:
1) An Execute Process Task that runs a batch file to compile .txt files
2) An Execute SQL Task that Truncates the table I want to import
3) A Data Flow Task that takes the data from the raw text file and puts it in the table in SQL Server
I was able to run each step individually and they worked as expected. However, when I run the package from inside SSIS itself, it reports success but nothing actually happens. Even worse, the components of the Data Flow Task are now missing.
Has anyone experienced this and found a workaround?
Sorry for the lack of specifics! I actually figured it out. Let me clarify my second paragraph:
The batch portion and the Execute SQL Task work perfectly when I disable the Data Flow Task! However, with the Data Flow Task enabled, the package would "run" without completing the Data Flow Task, and it would delete the Data Flow Task's components completely. Within the Data Flow Task I had:
1) Flat File Source
2) Conditional split that ignored rows in the first column if the value was "".
3) OLE DB destination table
What I found is that changing the Conditional Split criterion from explicitly matching "" to testing the value's length instead worked, and the components in the Data Flow Task were no longer deleted.
TL;DR: For whatever reason, the solution I built didn't like the Conditional Split criterion being based on the "" value. Once I removed that, the solution worked perfectly.
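For illustration only, here is the same filtering logic outside SSIS (plain Python; the column name is hypothetical). In SSIS expression terms this is roughly the difference between comparing the column to "" and testing its length with LEN(), though the exact expression depends on your package:

```python
# Sketch of the two equivalent "ignore empty first column" criteria.
rows = [{"col0": "A123"}, {"col0": ""}, {"col0": "B456"}]

# Criterion that caused trouble here: compare against the literal "".
kept_literal = [r for r in rows if r["col0"] != ""]

# Criterion that worked: base the test on the value's length instead.
kept_by_length = [r for r in rows if len(r["col0"]) > 0]

assert kept_literal == kept_by_length     # logically the same filter
print(kept_by_length)                     # rows A123 and B456 survive
```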
Step 1- This is X job that creates the (b) job.dat file
Step 2- This is an SSIS package that splits the output dat file into 4 different files to send to Destination
Step 3-Moves the four files from the workarea to another location where MOVEIT can pick them up from
***Step two is not restartable
***There is no reversing out if any of the steps fails
Note: What if I add an exception handler, or should I add a conditional split... any other ideas?
Batch Persistence
One thing you can do for starters is to append the file names with a timestamp that reflects the date and time of the last record processed (if timestamps do not apply, you can use an incrementing primary key value instead). The batch identifiers could also be stored in a database. If your SSIS package can smartly name the files in chronological sequence, then the third step can safely ignore files that it has already processed. In fact, you could do that at each step. This would give you the ability to start the whole process from scratch, if you must do it that way.
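As a sketch of the naming idea (plain Python; the prefix, timestamp format and example names are hypothetical):

```python
# Sketch only: name each batch file after the timestamp of the last record
# it contains, so a later step can skip everything it has already handled.
from datetime import datetime

def batch_filename(prefix, last_record_ts):
    # e.g. job_20240131_235959.dat -- chronological names sort lexically
    return f"{prefix}_{last_record_ts:%Y%m%d_%H%M%S}.dat"

def unprocessed(files, last_done=None):
    # Restarting is safe: anything sorting after the last processed name is new work.
    files = sorted(files)
    return files if last_done is None else [f for f in files if f > last_done]

print(batch_filename("job", datetime(2024, 1, 31, 23, 59, 59)))
print(unprocessed(["job_20240130_120000.dat", "job_20240131_235959.dat"],
                  last_done="job_20240130_120000.dat"))
```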
Ignorant and Hassle Free Dumping
Another suggestion would be to dump all data each day. If the files do not get super large, then just dump all data for whatever it is you are dumping. This way each step would not have to maintain state and the process could start/stop at any time.
I have a requirement where we get a list of file names from SQL and need to pass those file names as a variable to a step that polls a folder for them as text files. Please advise how to set the SQL output of file names as an array variable and pass it to the polling-folder step.
Don't use variables. Variables are only suitable if your input has a single row.
Instead, use two transformations inside a parent job. The first transformation gets a list of filenames and passes those to a step Copy Rows to Result;
The second transformation can do one of two things:
Process all files at once: just use a Get Rows from Result step as your entry point to the transformation;
Process one file at a time: create a parameter for the filename on the transformation. Then open the parent job, and in the transformation entry's properties go to Advanced and tick "Execute for every input row"; on the Parameters tab, map the child transformation's parameter name to the stream column coming from the first transformation (a rough sketch of both patterns follows).
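Here is that sketch in plain Python (the function names stand in for the two transformations and are not PDI calls):

```python
# Sketch only: first transformation produces filenames, second consumes them.
def get_filenames_from_sql():
    # First transformation: run the query, then "Copy Rows to Result".
    return ["abc.txt", "def.txt", "ghi.txt"]

def process_all(filenames):
    # Pattern 1: one run handles all rows ("Get Rows from Result").
    for name in filenames:
        print("processing", name)

def process_one(filename):
    # Pattern 2: the transformation takes a single filename parameter.
    print("processing", filename)

result_rows = get_filenames_from_sql()

process_all(result_rows)        # process all files at once

for row in result_rows:         # "Execute for every input row": the parent job
    process_one(row)            # launches the child once per result row
```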
I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in some folder. My requirement is to develop a transformation where the Microsoft Excel Input step in PDI (Pentaho Data Integration) should only pick the file based on the output of a SQL query. If the query output is abc.xlsx, the Microsoft Excel Input should pick up abc.xlsx for further processing. How do I achieve this? Would really appreciate your help. Thanks.
Transformations in Kettle run all their steps asynchronously, so you're probably looking at needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the variable populated as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
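To make the sequencing concrete, here is a plain-Python sketch of that job (the variable name, query result and directory are hypothetical; the dict merely stands in for Kettle's variable space):

```python
# Sketch only: set a variable in one transformation, use it in the next.
import os

variables = {}   # stand-in for the variable space shared across the job

def first_transformation():
    # Run the SQL query and set a variable from its single result row.
    query_result = "abc.xlsx"                 # e.g. SELECT filename FROM control_table
    variables["EXCEL_FILE"] = query_result

def second_transformation():
    # The Excel input resolves its filename from the variable; this is only
    # safe because the first transformation has already finished.
    path = os.path.join("/data/incoming", variables["EXCEL_FILE"])
    print("reading", path)

# The job runs its entries sequentially, so the variable is populated
# before the second transformation starts.
first_transformation()
second_transformation()
```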
The Excel input step has an "accept filenames from previous step" option. You can have a Table Input step build the full path of the file you want to read (or build it later from the base directory and the short filename), pass the filename to the Excel input, tick that box, and specify the step and the field you want to use for the filename.
I'm new to the Pentaho suite and its automation functionality. I have files that come in on a daily basis, and two columns need to be put in place. I have figured out how to add the columns, but now I am stuck on the automation side of things. The filename is constant but it has a datestamp at the end, e.g. LEAVER_REPORT_NEW_20110623.csv. The file will always be in the same directory. How do I go about using Pentaho Data Integration to solve this issue? I've tried Get Files but that doesn't seem to work.
Create a variable in a previous transformation that contains 20110623 (easy with a Get System Info step to get the date, then a Select Values step to format it to a string, then a Set Variables step).
Then change the filename of the Text File Input step to use:
LEAVER_REPORT_NEW_${variablename}.csv
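As a sketch of what that variable chain produces (plain Python; the directory is hypothetical):

```python
# Sketch only: today's date formatted as yyyyMMdd and substituted into the
# filename pattern, i.e. what ${variablename} resolves to at run time.
from datetime import date

datestamp = date.today().strftime("%Y%m%d")       # e.g. "20110623"
filename = f"LEAVER_REPORT_NEW_{datestamp}.csv"
print("/data/leavers/" + filename)
```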