Pentaho skipping headers, adding filename to output - pentaho

I need to read a .vcf.gz file in Pentaho.
I can read it with the "Text file input" step by setting "compressed" to "GZ" on the "Content" tab.
- First, I need to skip the headers (basically every row that begins with #).
- Second, I need to add a new column that holds the file name on every row.
E.g.
My file is:
#header
#header
#header
# chr pos ref alt
chr1 3 A A
What I want is:
chr1 3 A A id_001 (taken from the file name)
How can I achieve this?

If you've found the Content tab, you should also see the Header checkbox there. It lets you specify the number of header lines to skip.
As for the file name, the "Additional output fields" tab is what you need.
Here's the preview of output:
If you need to remove the file extension from the filename, there are a few ways to do that.
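For comparison, the intended result can be sketched outside Pentaho in a few lines of Python. This is only an illustration of the logic, not a Pentaho step: the function name `read_vcf_with_filename` is hypothetical, the file is assumed to be named like `id_001.vcf.gz`, and fields are split on any whitespace as in the example above (real VCF files are tab-delimited). Unlike a fixed header-line count, it skips every line that starts with `#`.

```python
import gzip
from pathlib import Path


def read_vcf_with_filename(path):
    """Yield data rows of a gzipped VCF-like file with the file's base name appended."""
    path = Path(path)
    # Strip the compound extension, e.g. "id_001.vcf.gz" -> "id_001".
    sample_id = path.name
    for suffix in (".gz", ".vcf"):
        if sample_id.endswith(suffix):
            sample_id = sample_id[: -len(suffix)]
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and every header line starting with '#'.
            if not line or line.startswith("#"):
                continue
            # Assumption: whitespace-separated fields, as in the example.
            yield line.split() + [sample_id]
```

Calling `list(read_vcf_with_filename("id_001.vcf.gz"))` on the sample file from the question would give one row whose last field is `id_001`.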

Related

Rename txt file name in Pentaho

I have a problem. I have created a few .txt files in a directory:
file1.txt
file2.txt
file3.txt
Next, I write the file names to a text file, filenames.txt, using the Shell step:
ls D:\test\prep\ > filename.txt
This gives me the names of all the files in the directory. My filenames.txt looks like this:
file1.txt
file2.txt
file3.txt
Then I read the values from that file in a Text file input step and pass them to a Copy rows to result step.
Next I use Get rows from result and a Transformation Executor.
For each file, I would like to derive a new file name from the value returned by Get rows from result: instead of file1.txt I want file.txt. I think the transformation run by the Transformation Executor needs a Table input step that uses the name from Get rows from result, but I don't know what comes next.
Does anyone have an idea?
Use the steps below if you want to read the files in a directory based on a separate configuration file (one that lists the directory's files).
Step-1:
Step-2:
Step-3:
You can find the complete transformation/job HERE.
Please let me know if this works for you.

Get file name from SAP Data service

I'm unable to read a file in SAP Data Services whose name contains a date_time stamp. I can build the date part, but the time part varies. I've tried *.csv in the File name(s) property of the flat file, but that is for a static file name.
Example: File_20180520_200003.csv, File_20180519_192503.csv, etc.
My script:
$Filename= 'File_'|| to_char(sysdate()-1, 'YYYYMMDD')|| '_'|| '*.csv';
I want a way to match the 6 digits (any numbers) where the * is.
Finally, I found a solution by using:
$Csv = word(exec('cmd','dir /b [$Filename]*.csv',8),2) ;
In the flat file's File name property, I then set $Csv.
It works fine.
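The matching rule itself (fixed prefix, yesterday's date, then any six digits) can be illustrated with a short Python sketch. This is not Data Services script, and the function name `match_yesterdays_files` is hypothetical; it only shows which of the example file names would be selected, assuming "yesterday" is computed from a given date the way `sysdate()-1` is used above.

```python
import re
from datetime import date, timedelta


def match_yesterdays_files(names, today):
    """Return the names of the form File_<yesterday as YYYYMMDD>_<six digits>.csv."""
    yesterday = today - timedelta(days=1)
    # The doubled braces produce a literal {6} quantifier inside the f-string.
    pattern = re.compile(rf"File_{yesterday:%Y%m%d}_\d{{6}}\.csv")
    return [name for name in names if pattern.fullmatch(name)]


names = ["File_20180520_200003.csv", "File_20180519_192503.csv"]
matches = match_yesterdays_files(names, date(2018, 5, 21))
```

With the two example names and a run date of 2018-05-21, only `File_20180520_200003.csv` is selected.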

Upload csv files with comma inside it

As per my requirement, I need to upload a .csv file into the application. I am trying to simulate this using LoadRunner. The issue I am encountering is that my CSV file is in the following format:
Header - AA,BB,CC
Data-xyz,"yyx,zzy",xxz
When I use the statement below to upload the file, I get the error "line 2 contains 4 columns instead of 3":
web_submit_data("upload",
"Action=xxx/upload",
"Method=POST",
"EncType=multipart/form-data",
"RecContentType=text/html",
"Referer=xxx",
"Snapshot=t86.inf",
"Mode=HTML",
ITEMDATA,
"Name=utf8", "Value=✓", ENDITEM,
"Name=token", "Value={token_1}", ENDITEM,
"Name=upload_file", "Value={NewParam_5}", "File=yes", "ContentType=text/csv", ENDITEM,
"Name=Button1", "Value=Upload", ENDITEM,
LAST);
Following the advice in "How to deal with a string with comma in it from a csv, when we have to read the data by using loadrunner?", I tried updating the .prm file to use a pipe (|) as the new delimiter, but I still get the error.
[parameter:NewParam_5]
Delimiter="|"
ParamName="NewParam_5"
TableLocation="C:\temp"
ColumnName="Col 1"
I also noticed that, even though I set the delimiter to pipe, when I right-click web_submit_data() and go to Parameter properties, there is a column delimiter option there as well, and it is set to comma rather than pipe. This suggests that this setting takes precedence over the one in the .prm file.
Can someone please guide me on the right way to set a new delimiter so that VuGen recognizes and parses the CSV file the way I want?
I am using LoadRunner 12.5.
Thanks for your help.
Do you need to upload a file, or a line of comma-separated values? Right now you appear to be reading a line of CSV values rather than a file. For a file upload, your parameter file would contain a list of file names, or a single file reference, pointing to files in the virtual user's directory (extra files, transferred with the user) or to files created by the virtual user and then uploaded.
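The error message in the question is consistent with the data line being split on every comma, regardless of the quotes. Where that split happens (in the parameter handling or in the application) cannot be told from the post. The following Python sketch, which is not LoadRunner code, only shows the difference between a naive split and a quote-aware CSV parser on the sample data:

```python
import csv
import io

sample = 'AA,BB,CC\nxyz,"yyx,zzy",xxz\n'

# Naive splitting treats the comma inside the quotes as a separator,
# so the data line falls apart into 4 pieces.
naive = [line.split(",") for line in sample.splitlines()]

# A quote-aware CSV reader keeps "yyx,zzy" together as one field,
# so the data line has the expected 3 columns.
parsed = list(csv.reader(io.StringIO(sample)))
```

If the file is uploaded as a whole, as the answer suggests, the parameter delimiter no longer matters, because the parameter holds a file name rather than the file's contents.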

CSV to CSV datamapper in mule

I am trying to transform one CSV file into another using Mule.
For example, the source CSV file is defined with 4 headers:
header1, header2, header3, header4
The client may pass only the first 3 headers and their values in the CSV file. I get an error if the Mule DataMapper does not find all the headers in the source CSV:
Parsing error: Unexpected end of file in record 1, field 2 ("test2"),
metadata "headertest"; value: '<Raw record data is not available,
please turn on verbose mode.>'
How can I configure the DataMapper to work when the source file does not contain all the headers/values?
I haven't found a clean way to do that yet, but you could add a pre-processing step that appends a field separator to each line of the input CSV (i.e. adds a comma at the end of each line).
This way the last field will be assumed empty.
HTH,
Marcos
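A minimal sketch of such a pre-processing step, written in Python for illustration only (inside a Mule flow it would more likely be a script or Java component). The function name `add_trailing_separator` is hypothetical, and the sketch assumes the file is missing exactly one trailing field.

```python
def add_trailing_separator(lines, separator=","):
    """Append one field separator to each non-empty line."""
    result = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        # Leave blank lines untouched so they do not turn into empty records.
        result.append(stripped + separator if stripped else stripped)
    return result


rows = add_trailing_separator(["header1,header2,header3\n", "test1,test2,test3\n"])
```

Each returned line now ends with a comma, so a parser expecting four fields reads the last one as empty.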

Use awk to split one file into many files

Have a Master file (Master.txt) where each row is a string defining an HTML page and each field is tab delimited.
The record layout is as follows:
<item_ID> <field_1> <field_2> <field_3>
1 1.html <html>[content for 1.html in HTML format]</html> <EOF>
2 2.html <html>[content for 2.html in HTML format]</html> <EOF>
3 3.html <html>[content for 3.html in HTML format]</html> <EOF>
The HTML page is defined in <field_2>. <field_3> may not be necessary, but included here to indicate the logical location of end_of_file.
How to use awk to generate a file for each row (which begins with <item_ID>) where the content of the new file is <field_2> and the name of the new file is <field_1>?
Am running GNUwin32 under Windows 7 and will configure an awk solution to execute in a .bat file. Unfortunately can't do pipelining in Windows, so hoping for a single-awk-program solution.
TY in advance.
Assuming the HTML in field 3 may or may not contain tabs:
awk -F'\t' 'match($0,/<html>.*<\/html>/){print substr($0,RSTART,RLENGTH) > $2}' file
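For readers who want to follow the logic step by step, here is a rough Python equivalent of the awk one-liner. It is a sketch, not a replacement: the function name `split_master` and the output directory argument are assumptions, and unlike awk's `>` it overwrites a target file if the same name appears on more than one row.

```python
import re
from pathlib import Path

# Greedy match from the first <html> to the last </html> on the line,
# like the awk regular expression.
HTML_RE = re.compile(r"<html>.*</html>")


def split_master(master_path, out_dir):
    """Write the HTML part of each row to a file named after the row's second field."""
    out_dir = Path(out_dir)
    written = []
    with open(master_path, encoding="utf-8") as master:
        for line in master:
            fields = line.rstrip("\n").split("\t")
            match = HTML_RE.search(line)
            # Skip rows without a file name or without an HTML block.
            if len(fields) < 2 or match is None:
                continue
            target = out_dir / fields[1]
            # Assumption: one row per file name, so overwriting is acceptable.
            target.write_text(match.group(0) + "\n", encoding="utf-8")
            written.append(target.name)
    return written
```

Matching on the whole line rather than on the third field is what makes both versions tolerate tabs inside the HTML.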