Archiving files using Pentaho PDI - pentaho

I need to archive a txt file using Pentaho PDI, giving it a dynamic timestamp and appending a variable to the output filename. I used Get System Info, which assigns both the variable and its value automatically, so my job was Start -> Get System Info -> Zip file. In the Zip file component, I tried calling the variable as ${Variable} in the output filename, but the output filename does not come out properly. It should be of the form filename__timestamp__variable. Can someone please help me with this?

Related

pentaho data integration dynamic file name

I'm new to PDI. I need to output data from a view in a PostgreSQL database to a file daily. The output file will be named like xxxx_20160427.txt, so I need to append a dynamic date to the file name. How can I do it?
EDIT-----------------
I was not clear when I asked how to add a dynamic date. I am trying to add not just the date but also other optional parts to the file name, e.g. a serial number (01) at the end: xxxx_2016042701.txt. So my real question is how to build a dynamic file name. In other ETL tools, e.g. SSIS, it would be a simple expression. I'm not sure how it is done in PDI.
In your Text file output step, simply check "Include date in filename?" under the files tab.
You can build a dynamic filename field with a Modified Java Script Value step, and then in the Text file output step check "Accept file name from field" and select the field created in the previous step (filename_var in this example).
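For illustration only, here is the string logic such a step would need to produce, sketched in Python rather than in PDI's JavaScript. The function name build_filename and its parameters are hypothetical; only the xxxx_2016042701.txt shape comes from the question.

```python
from datetime import date


def build_filename(prefix: str, day: date, serial: int, ext: str = "txt") -> str:
    """Return a name shaped like prefix_YYYYMMDDNN.ext."""
    # %Y%m%d yields a zero-padded date; :02d zero-pads the serial number.
    return f"{prefix}_{day.strftime('%Y%m%d')}{serial:02d}.{ext}"


print(build_filename("xxxx", date(2016, 4, 27), 1))  # xxxx_2016042701.txt
```

The same concatenation (prefix, formatted date, padded serial, extension) is what the filename field has to carry on each row.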

Check if Windows batch variable starts with a specific string

How can I find out (with a Windows batch command) whether a variable starts with, for example, ABC?
I know that I can compare a variable when I know its whole content (if "%variable%"=="abc"), but I want to check only the beginning.
I also need to find out where the batch file is located, so if there is another command that reveals the file's location, please let me know.
Use the variable substring syntax:
IF "%variable:~0,3%"=="ABC" [...]
If you need the path to the batch file without the batch file name, you can use the variable:
%~dp0
Syntax for this is explained in the help for the for command, although this variable syntax extends beyond just the for command syntax.
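To find the batch file's location, use %0 (the path to the batch file as it was invoked; %~f0 expands it to the full path) or the %CD% variable, which gives the current working directory. Note that the current directory is not necessarily the folder the batch file lives in.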
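To make the substring semantics concrete: in %variable:~0,3% the 0 is the starting offset and the 3 is the number of characters to take. Below is a minimal sketch of the same comparison in Python (not batch); the function name starts_with is made up for this example. Like IF without the /I switch, the comparison is case-sensitive.

```python
def starts_with(value: str, prefix: str) -> bool:
    # Take as many leading characters as the prefix has, then compare,
    # which mirrors comparing %variable:~0,N% against the prefix.
    return value[0:len(prefix)] == prefix


print(starts_with("ABCDEF", "ABC"))  # True
print(starts_with("abcdef", "ABC"))  # False, the comparison is case-sensitive
```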

Storing from wildcard input path

I’m having issues using wildcard input paths in Pig.
If I run the following commands:
A = load '/something/*.csv' using PigStorage(',');
dump A;
I see the output from all csv files in the something folder printed to my console after the job is run.
If, however, I run a store instead:
A = load '/something/*.csv' using PigStorage(',');
store A into 'somedestination';
The job fails with the following error message:
Input(s):
Failed to read data from "/something/*.csv"
It looks like the store is attempting to load from the literal path instead of globbing using the wildcard, but if that’s the case then why does it work during the dump? Is there another way to accomplish this?
You may not have the permission to write to that folder.
The dump essentially writes to the tmp folder (or another folder if the configuration is different) and then prints that to the screen.
Do a dump. Look at the log. It should say something like:
Input(s):
Successfully read 0 records from: "‘/something/*.csv’"
Output(s):
Successfully stored 0 records in: "file:/tmp/temp1865628879/tmp-1573237939"
Then next time try and store to the folder that you saw when you did the dump. If that works fine, then you have a permissions problem.

BIDS Import from changing file name [wildcard?]

I'm attempting to create a process to import data. I created the entire process and it works, but I'm having trouble creating the variable that finds the name of the CSV file I want to import automatically. Each time a new CSV is uploaded to me, it has a timestamp in its name. I want to be able to pick up that file no matter what it is called and process it.
So for example this week the file name would be
filename_4-14-2014.csv
And next week
filename_4_21_2014.csv
And so on into eternity. . .
Is there a way to create a variable that picks up the full file name even though it's changing?
After doing some poking around, I've discovered the following...
You can use a file system task to perform the copy operation I was referring to. You can set the input file and the output file as variables. This way you can always know that the file you use for import is always named the same, and has the right data.
You just need to add the variables and a File System Task to your package.
To accomplish what I wanted, I created a Foreach Loop Container and had it look for any files ending in .csv in my specified folder by using a wildcard (*.csv).
The contents of the Foreach Loop Container are as follows.
Step 1: File System Task - rename file.
Step 2: Data Flow Task - Import data to sql
Step 3: File System Task - Copy the file to another folder, append datetime to filename
Step 4: File System Task - Delete source file.
I used variables to get all the file and folder names plus datetimes.
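The four steps above can be sketched outside SSIS as a plain loop. This is a Python illustration under stated assumptions, not the package itself: the function name process_incoming, the working file name current_import.csv, the timestamp format, and the import_file callback (standing in for the Data Flow Task) are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def process_incoming(incoming: Path, archive: Path, import_file) -> list:
    """Process every *.csv in `incoming` and return the archived copies."""
    archive.mkdir(parents=True, exist_ok=True)
    archived = []
    # sorted() snapshots the listing, so renames inside the loop are safe.
    for source in sorted(incoming.glob("*.csv")):
        # Step 1: rename to a fixed name so the import always reads the same path.
        working = source.rename(incoming / "current_import.csv")
        # Step 2: import the data (placeholder for the Data Flow Task).
        import_file(working)
        # Step 3: copy to the archive folder with a datetime appended to the name.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        target = archive / f"{source.stem}_{stamp}.csv"
        shutil.copy2(working, target)
        # Step 4: delete the source file.
        working.unlink()
        archived.append(target)
    return archived
```

Calling process_incoming(Path("incoming"), Path("archive"), my_loader) with any callable that accepts a path would run the loop once per matching file.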

Kettle Spoon - variable in file name input

Does anyone know how to set a variable for the file name in 'Text File Input'?
I want the file name to depend on when I execute the transformation, for example:
D:\input_file_<variable>.txt
today = D:\input_file_20131128.txt
tomorrow = D:\input_file_20131129.txt
FYI, I'm using Kettle Spoon - 4.2.0
In the step's settings dialog, you can reference a variable in the file name as ${Variable_Name}.
Take note of the warning that the tool displays:
Please remember that the variables you define with this step can't be used in this transformation. This is simply because all steps in a transformation run in parallel without a certain order of execution.
The correct alternative is to set the variables you want to use in the first transformation of a job, so that they are available to the transformations that run after it.
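As an analogy only, Python's string.Template happens to use the same ${name} placeholder syntax, so the substitution can be sketched as below. This is not Kettle code: the variable name TODAY and the function resolve are assumptions made for the example, and the path pattern is taken from the question.

```python
from datetime import date
from string import Template

# Hypothetical variable name TODAY; the path shape comes from the question.
pattern = Template(r"D:\input_file_${TODAY}.txt")


def resolve(day: date) -> str:
    # The value must exist before the pattern is resolved, which is the same
    # ordering constraint the warning above describes for transformations.
    return pattern.substitute(TODAY=day.strftime("%Y%m%d"))


print(resolve(date(2013, 11, 28)))  # D:\input_file_20131128.txt
```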