Pentaho - JSON Input not looping over files

So I've got about 10 JSON files that I have to stuff into an Elasticsearch setup. I currently have 3 steps: "Get file names", "JSON Input", and "Elasticsearch bulk insert". When I look at the Step Metrics, I see that Get File Names correctly reads the 10 files, but when it comes to JSON Input, only the first file is processed. What could be going on?
Here is an image of my setup, and I've attached the ktr file.
Link to the ktr file as it stands currently
Any help is greatly appreciated.

In the Content tab of the step, the "Limit" attribute is set to 1. You can edit it by unchecking the "Source is from a previous step" option in the File tab, then setting "Limit" to 0.

Related

How to read (downloaded / result) filename from job

In a Pentaho 9.1 Job, using the "Get File(s) from SFTP" step, I download a CSV file. I would like to use that downloaded file name in the subject line of the email message in the "Mail" step.
I have tried calling it as a variable, but it is not really a variable, it is a "results" value. Example of what I tried below...
Downloaded file name = "somefile.csv"
Syntax in the "Mail" step for "Subject" = "File Processing Complete: ${short_filename}"
When the email is sent, the subject is exactly "File Processing Complete: ${short_filename}", when I need it to be "File Processing Complete: somefile.csv".
We can get the file information from the result, but unfortunately that step is only available in a transformation. Thus, we need the help of a transformation to read the file name. I have prepared a SOLUTION for you. You need to supply the right information in the SFTP & Mail configuration. Also, please run the job "getFromSFTP".
[getFromSFTP.kjb] Here, I download the CSV file from SFTP and send the file information to the transformation.
[getFileName.ktr] Here, I read the file information and send the filename to another job for mail sending.
[sendMail.kjb] This job is only used to send the mail with filename = ${filename1}.

JMeter - testing unique links taken from CSV file

I want to test unique links taken from a CSV file in JMeter. I have a CSV file with unique values - "value 1", "value 2" .....
For each thread, I need to append it as part of the URL (or path).
Ex: BaseURL: example.com
For thread 1, example.com/value1
For thread 2, example.com/value2
How can I do this in JMeter? Any help will be highly appreciated. I have created the CSV and all; I just need to know how to take that value and set it as part of the path.
Put your "base url" into "Server name or IP" tab of the HTTP Request Defaults:
Configure CSV Data Set Config to read the values from the file and store them into a JMeter Variable
Use the variable from the step 2 in "Path" field of the HTTP Request sampler:
That should be it, each thread will read the next value from the CSV file on each iteration:
In the CSV Data Set Config, add path as the Variable Names and use Sharing mode: All Threads.
In the HTTP Request Path field, add ${path}.
Each thread will send a different path taken from a CSV line.
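To illustrate how the answers above fit together, here is a minimal sketch; the file name values.csv and the sample values are placeholders, not taken from the original question:
<pre><code>
# values.csv (one value per line, no header row)
value1
value2
value3

# CSV Data Set Config
#   Filename:        values.csv
#   Variable Names:  path
#   Sharing mode:    All threads

# HTTP Request Defaults -> Server Name or IP: example.com
# HTTP Request sampler  -> Path: /${path}

# Resulting requests, one per iteration:
#   example.com/value1
#   example.com/value2
#   example.com/value3
</code></pre>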

In Kettle, using Text File Input to read a CSV file from a tar.gz file didn't work. Where might it be wrong?

I have a CSV file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through Text File Input.
I tried tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But sometimes it can't read it successfully.
It throws an exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I use Windows 8.1 and PDI 5.2.
Where might it be wrong?
For reading a CSV from a compressed file, the "Text File Input" step in Pentaho Kettle only supports the first file inside the compressed archive (whether a Zip or GZip file). Check the Pentaho Wiki in the compression section.
Now for your issue, try removing the wildcard entry, since only the first file inside the zip/gzip file will be read (as explained above).
I have placed sample code for reading both zip and gzip files. Check it here.
Hope it helps :)
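If you want to check the layered VFS URI itself, outside of Kettle, below is a minimal standalone sketch using the Apache Commons VFS 2 API. Your stack trace shows the older org.apache.commons.vfs (VFS 1) that PDI 5.2 ships, so this is only for verifying that the URI form and the archive contents are what you expect; the inner file name test.csv is an assumption.
<pre><code>
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class TarGzVfsCheck {
    public static void main(String[] args) throws Exception {
        // Layered URI: gz unwraps the .tar.gz, tar opens the resulting .tar,
        // and the path after the second "!" points at a file inside the tar.
        // "test.csv" is an assumed file name inside the archive.
        String uri = "tar:gz:file:///C:/test/test.tar.gz!/test.tar!/test.csv";

        FileSystemManager fsManager = VFS.getManager();
        FileObject csv = fsManager.resolveFile(uri);

        if (!csv.exists()) {
            System.out.println("Not found: " + uri);
            return;
        }

        // Print the first few lines to confirm the file is readable.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(csv.getContent().getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null && count++ < 5) {
                System.out.println(line);
            }
        }
    }
}
</code></pre>
If the file resolves and its first lines print, the archive layout and URI form are correct, and the remaining issue is in the step configuration rather than the path.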

Make JMeter Summary Report output in CSV what it shows in the table

The Filename given to store the results of the JMeter Summary Report should (as I understand it) store the same info I see on the screen. But instead it stores a short record of each HTTP request sent, like this:
<httpSample t="72" lt="66" ts="1305479685437" s="true" lb="login" rc="200" rm="OK" tn="Virtual users 1-1" dt="text" by="12978">
I defined the Filename as a .csv file
Any idea how to turn it into a replica of the on-screen Summary Report (samples, average, Min, Max, Std. Dev., etc.)?
To make a summary report of the JMeter output, you need to do the following:
Make sure that in the jmeter.properties file the 'Results file configuration' section is not commented out and all the fields you want to use are set to true, as shown below.
<pre><code>
#---------------------------------------------------------------------------
# Results file configuration
#---------------------------------------------------------------------------
# This section helps determine how result data will be saved.
# The commented out values are the defaults.
# legitimate values: xml, csv, db. Only xml and csv are currently supported.
jmeter.save.saveservice.output_format=csv
# true when field should be saved; false otherwise
# assertion_results_failure_message only affects CSV output
#jmeter.save.saveservice.assertion_results_failure_message=false
#
# legitimate values: none, first, all
#jmeter.save.saveservice.assertion_results=none
#
jmeter.save.saveservice.data_type=false
jmeter.save.saveservice.label=true
jmeter.save.saveservice.response_code=true
</code></pre>
Then configure the Summary Report by clicking on the Configure button.
It's possible: if you want to turn it into a replica of the on-screen Summary Report (samples, average, Min, Max, Std. Dev., etc.), just click the Save Table Data button and save it in .csv format.
You will get a .csv file like this:
jp@gc (the JMeter Plugins project) has a command-line tool for exporting JMeter reports to CSV. Feed the saved .jtl file to it and use the Aggregate Report mode.
See the JMeterPluginsCMD Command Line Tool docs for help.
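For reference, a typical invocation looks something like the following; results.jtl and summary.csv are placeholder file names, on Linux/macOS the script is JMeterPluginsCMD.sh, and the exact options are described in the docs linked above:
<pre><code>
JMeterPluginsCMD.bat --generate-csv summary.csv --input-jtl results.jtl --plugin-type AggregateReport
</code></pre>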

How to write an SQL statement output to a CSV file?

We have a script task that processes a flat file, inserts data into the database, then records any duplicates (via a stored procedure) into a SQL table, which then passes it to a workflow task that looks up that table & writes all the data into a file, then truncates the table.
The problem is that even when there are 0 errors recorded, it always writes a log flat file.
Is there a way to write the flat file only when there are > 0 records in the duplicate log table?
Here is a possible option that might give you an idea of how to get rid of an error file that has no records.
Here is a step-by-step process on how to do this. In this example, I have used a CSV file named Country_State.csv containing countries and states as the source file.
Scenario:
The sample package will read the file and then write to a text file named Destination.txt. In this scenario, the error file Error.txt will be created but later deleted if there are no errors. Here, I have the files stored in the path c:\temp\
Step by step flow:
In the connection managers section, create three flat file connections named Source, Destination and Error. Refer to screenshot #1.
The Source connection should point to the csv file path c:\temp\Country_State.csv. Refer to screenshot #2 for the contents of this file.
Destination connection should point to a text file named c:\temp\Destination.txt.
Error connection should point to a text file named c:\temp\Error.txt.
Create a variable of data type Int32 named ErrorCount.
On the Control Flow tab, place a Data Flow Task and then place a File System Task.
Connect the Data Flow Task to File System Task.
Right click on the connector between Data Flow Task and File System Task.
On the Precedence Constraint Editor, change the Evaluation operation to Expression and paste the value @[User::ErrorCount] == 0 in the Expression textbox.
Your control flow should look like as shown in screenshot #3.
Inside the data flow task on the data flow tab, drag and drop a Flat File Source and configure it to use Source connection manager.
Place a Flat File Destination and configure it to use Destination connection manager.
Connect the green output arrow from the Flat File Source to the Flat File Destination.
Place a Row Count transformation on the data flow tab and configure it to use the variable User::ErrorCount.
Connect the red output arrow from the Flat File Source to the Row Count transformation.
Place a Flat File Destination and configure it to use Error connection manager.
Connect the output from Row Count to the Flat File Destination using Error connection.
Your data flow task should look like as shown in screenshot #4.
On the Control Flow tab, double-click on the File System Task.
On the File System Task Editor, set the Operation to Delete file and set the SourceConnection to Error. Refer screenshot #5.
Contents of the folder path C:\temp before package execution are shown in screenshot #6.
Data flow tab execution is shown in screenshot #7.
Control flow execution is shown in screenshot #8.
Contents of the folder path C:\temp after package execution are shown in screenshot #9.
To show this actually works, I changed the second column on the Source connection manager to integer (even though state names are strings) so that the data flow task redirects to the Error output.
Scenario 2 Data flow tab execution is shown in screenshot #10.
Scenario 2 Control flow execution is shown in screenshot #11. Notice that the File System Task is not executed because the error file is not empty.
Contents of the folder path C:\temp after Scenario 2 package execution are shown in screenshot #12. Notice that the file Destination.txt is present even though there were no successful rows. This is because the example deletes only the Error file if it is empty.
Similar logic can be used to delete an empty Destination file.
Hope that helps.
You could also try this: put a Script Task before the file task to check the file size, then add a Precedence Constraint so the flow only proceeds when the check is TRUE. In the script, something like
Dim fi As New System.IO.FileInfo("C:\temp\Error.txt") ' e.g. the error file from the answer above
lets you obtain the file length from fi.Length (0 means the file is empty).