We have a script task that processes a flat file, inserts the data into the database, and then records any duplicates (via a stored procedure) into a SQL table, which then passes it to a workflow task that looks up that table, writes all of its data into a file, and truncates the table.
The problem is that even when there are 0 errors recorded, it always writes a log flat file.
Is there a way to write the flat file only when there are > 0 records in the duplicate log table?
Here is a possible option that might give you an idea of how to get rid of error files that have no records.
Here is a step-by-step process for how to do this. In this example, I have used a csv file named Country_State.csv containing countries and states as the source file.
Scenario:
The sample package will read the file and then write to a text file named Destination.txt. In this scenario, the error file Error.txt will be created but later deleted if there are no errors. Here, I have the files stored in the path c:\temp\
Step by step flow:
On the connection manager section, create three flat file connections namely Source, Destination and Error. Refer screenshot #1.
Source connection should point to the csv file path c:\temp\Country_State.csv. Refer screenshot #2 for the contents of this file.
Destination connection should point to a text file named c:\temp\Destination.txt.
Error connection should point to a text file named c:\temp\Error.txt.
Create a variable of data type Int32 named ErrorCount.
On the Control Flow tab, place a Data Flow Task and then place a File System Task.
Connect the Data Flow Task to File System Task.
Right click on the connector between Data Flow Task and File System Task.
On the Precedence Constraint Editor, change the Evaluation operation to Expression and set the Expression to @[User::ErrorCount] == 0.
Your control flow should look like as shown in screenshot #3.
Inside the data flow task on the data flow tab, drag and drop a Flat File Source and configure it to use Source connection manager.
Place a Flat File Destination and configure it to use Destination connection manager.
Connect the green output arrow from the Flat File Source to the Flat File Destination.
Place a Row Count transformation on the data flow tab and configure it to use the variable User::ErrorCount.
Connect the red output arrow from the Flat File Source to the Row Count transformation.
Place a Flat File Destination and configure it to use Error connection manager.
Connect the output from the Row Count transformation to the Flat File Destination that uses the Error connection.
Your data flow task should look like as shown in screenshot #4.
On the Control Flow tab, double-click on the File System Task.
On the File System Task Editor, set the Operation to Delete file and set the SourceConnection to Error. Refer screenshot #5.
Contents of the folder path C:\temp before package execution are shown in screenshot #6.
Data flow tab execution is shown in screenshot #7.
Control flow execution is shown in screenshot #8.
Contents of the folder path C:\temp after package execution are shown in screenshot #9.
To show that this actually works, I changed the data type of the second column on the Source connection manager to integer (even though the state names are strings) so that the rows are redirected to the error output.
Scenario 2 Data flow tab execution is shown in screenshot #10.
Scenario 2 Control flow execution is shown in screenshot #11. Notice that the File System Task is not executed because the error file is not empty.
Contents of the folder path C:\temp after Scenario 2 package execution are shown in screenshot #12. Notice that the file Destination.txt is present even though there were no successful rows. This is because the example deletes only the Error file if it is empty.
Similar logic can be used to delete an empty Destination file.
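In concrete terms (the variable name User::GoodCount is an assumption, not part of the original package): add a second Row Count transformation on the successful path of the data flow, store its result in User::GoodCount, and guard a second File System Task that deletes Destination.txt with a precedence constraint expression along the lines of
@[User::GoodCount] == 0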
Hope that helps.
Screenshots #1 through #12 (referenced above): images not available.
You could also try this: put a Script Task before the file-writing task to check the file size, and then add a precedence constraint that only proceeds when the check evaluates to TRUE. With a System.IO.FileInfo object you can obtain the file length via its Length property.
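A minimal sketch of what the Script Task body could look like (VB.NET; the path C:\temp\Error.txt and the variable name User::ErrorFileIsEmpty are assumptions, and the variable must be listed as a ReadWriteVariable on the task):

Public Sub Main()
    ' Check the error file that the data flow may have written (assumed path).
    Dim errorFile As New System.IO.FileInfo("C:\temp\Error.txt")

    ' Treat a missing or zero-byte file as "no errors logged".
    Dim isEmpty As Boolean = (Not errorFile.Exists) OrElse (errorFile.Length = 0)

    ' Hand the result back to the package so a precedence constraint
    ' such as @[User::ErrorFileIsEmpty] == TRUE can decide what runs next.
    Dts.Variables("User::ErrorFileIsEmpty").Value = isEmpty

    ' The exact success enumeration depends on the Script Task template version.
    Dts.TaskResult = ScriptResults.Success
End Sub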
I need to archive a txt file using Pentaho PDI by giving it a dynamic timestamp and appending a variable to the output filename. I used Get System Info, which automatically assigns the variable as well as the value. So my job was Start -> Get System Info -> Zip File. In the Zip File step, I tried calling the variable in the output filename with ${Variable}, but the output filename is not coming out properly. It should be of the form filename__timestamp__variable. Can someone please help me with this?
What I want to do is the following...
I want to divide the input file into records, convert each record into a file, and leave all the files in a directory.
My .csv file has the following structure:
ERP,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ,85708,267-3352,,ALLENTON,MI,48002,810,710-0470,369-98-6555,462-11-4610,1953-05-00,F,
ERP,FRANK,DIETSCH,5064 E METAIRIE AVE.,BRANDSVILLA,MO,65687,252-5592,1176 E THAYER ST.,COLUMBIA,MO,65215,557,291-9571,217-38-5525,129-10-0407,1/13/35,M,
As you can see, it doesn't have a header row.
Here is my flow.
My problem is that when the Split processor divides my csv into flowfiles of 400 lines, they aren't saved in my output directory.
It's my first time using NiFi, sorry.
Make sure your RecordReader controller service is configured correctly (delimiter, etc.) to read the incoming flowfile.
Set the Records Per Split value to 1 so that each record ends up in its own flowfile.
You need to use an UpdateAttribute processor before the PutFile processor to change the filename to a unique value (like a UUID), unless you have configured the PutFile processor's Conflict Resolution Strategy as Ignore.
The reason for changing the filename is that the SplitRecord processor assigns the same filename to all of the split flowfiles.
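For example (the property value here is only an illustration in NiFi Expression Language), the UpdateAttribute processor could set
filename = ${UUID()}
or, to keep the original name plus the split number that SplitRecord adds,
filename = ${filename}.${fragment.index}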
Flow:
I tried your case and the flow worked as expected. Use this template for reference, upload it to your NiFi instance, and make changes as per your requirements.
I experienced an error in SAP ABAP which says DATASET_CANT_CLOSE with error number 32 (Broken Pipe). The question is: what triggers this kind of error?
As far as I know, this error was triggered by:
CLOSE DATASET dset
But I can't reproduce the error since I don't know what exactly triggers it.
This is the code I use:
METHOD generate_txt_file.

  DATA:
    lwa_data TYPE t_line,
    lv_param TYPE sxpgcolist-parameters.

  " Upload file to server
* Open dataset
  OPEN DATASET im_file_name FILTER 'dos2ux'
       FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.

  CLEAR lwa_data.
  LOOP AT it_data INTO lwa_data.
    CATCH SYSTEM-EXCEPTIONS file_access_errors = 4
                            OTHERS             = 8.
      TRANSFER lwa_data-lines TO im_file_name.
    ENDCATCH.
    IF sy-subrc <> 0.
      CLEAR lwa_data.
      EXIT.
    ENDIF.
    CLEAR lwa_data.
  ENDLOOP.

* Close dataset
  CLOSE DATASET im_file_name.

ENDMETHOD.
Investigating the background job log, it seems that the server that ran the background job had not yet been mapped to the text file folder. The solution is to re-map the server to the text file folder.
You are using the FILTER extension to OPEN DATASET - which can be a HUGE security issue as well as raise loads of portability issues unless you know what you're doing, but that's not what the question is about. From the documentation:
When the statement OPEN DATASET is executed, a process is started in the operating system for the specified statement. When the file is opened for reading, a channel (pipe) is linked with STDOUT of the process, from which the data is read during file reading. The file itself is linked with STDIN of the process. When the file is opened for writing, a channel (pipe) is linked to STDIN of the process, to which data is passed when writing. The output of the process is diverted to this file.
In your case, the filter command probably decided to bail out - see this answer among many. Why it did so is hard to investigate - you may have to go through various system logs to find out. If the problem really is some unmapped network folder, you could try switching to UNC paths.
I've got a Visual Studio 'web performance test' to run from the command line. The plan is to create a scheduled task to run it. How do I trigger an email on failure? Either I wire that logic up in the test itself, or it's external and depends on the return code, but I don't think there is a return value - i.e. failure is shown in the output text or by checking the saved results file.
You can use the /resultsfile:[ file name ] option with mstest.exe to create a ".trx" file. Its contents are XML and include a section similar to:
<ResultSummary outcome="Completed">
<Counters total="1" executed="1" passed="1" error="0" failed="0"
timeout="0" aborted="0" inconclusive="0" passedButRunAborted="0"
notRunnable="0" notExecuted="0" disconnected="0" warning="0"
completed="0" inProgress="0" pending="0" />
</ResultSummary>
(Extra white space added for clarity).
It should be a simple matter to examine the TRX file after the run and send an email if anything failed.
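For example, a small VB.NET console program along these lines could read the counters and decide whether to send the mail (a sketch only; the results file name, the namespace handling, and the SMTP settings are assumptions to adapt):

Imports System.Xml

Module CheckTrx
    Sub Main()
        ' Load the results file produced by mstest /resultsfile:results.trx
        Dim doc As New XmlDocument()
        doc.Load("results.trx")

        ' The .trx schema uses an XML namespace, so match the element by local name.
        Dim counters As XmlNode = doc.SelectSingleNode("//*[local-name()='Counters']")
        Dim failed As Integer = Integer.Parse(counters.Attributes("failed").Value)
        Dim errored As Integer = Integer.Parse(counters.Attributes("error").Value)

        If failed + errored > 0 Then
            ' Placeholder SMTP host and addresses - replace with real values.
            Dim smtp As New System.Net.Mail.SmtpClient("smtp.example.com")
            smtp.Send("tests@example.com", "team@example.com", _
                      "Web performance test failed", _
                      failed.ToString() & " failed, " & errored.ToString() & " errored - see results.trx")
        End If
    End Sub
End Module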
I'm attempting to create a process to import data. I created the entire process and it works, but I'm having trouble creating the variable to find the file name of the csv I want to import automatically. Each time a new csv is uploaded to me it has a timestamp on it. I want to be able to grab that file no matter what the name is and do work on it.
So for example this week the file name would be
filename_4-14-2014.csv
And next week
filename_4_21_2014.csv
And so on into eternity. . .
Is there a way to create a variable that picks up the full file name even though its changing?
After doing some poking around, I've discovered the following...
You can use a File System Task to perform the copy operation I was referring to. You can set the input file and the output file as variables. This way the file you use for the import is always named the same and has the right data.
You just need to add the variables and a File System Task to your package.
OK, so to accomplish what I wanted, I created a Foreach Loop Container and had it look for any files ending with .csv in my specified folder by using a wildcard (*.csv).
The flow within the Foreach Loop Container is as follows.
Step 1: File System Task - rename file.
Step 2: Data Flow Task - Import data to sql
Step 3: File System Task - Copy the file to another folder, append datetime to filename
Step 4: File System Task - Delete source file.
I used variables to get all the file and folder names plus datetimes.
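For step 3, the destination path of the copy can be built with an expression on a string variable, something along these lines (User::ArchiveFolder and User::FileName are assumed variable names, and the Foreach enumerator is assumed to return the file name with its extension):

@[User::ArchiveFolder] + REPLACE(@[User::FileName], ".csv", "") + "_"
    + (DT_WSTR, 4) YEAR(GETDATE())
    + RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2)
    + RIGHT("0" + (DT_WSTR, 2) DAY(GETDATE()), 2) + ".csv"

Set EvaluateAsExpression to True on that variable and use it as the destination variable of the File System Task.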