Pentaho Spoon Text File Output Additional Information Header - pentaho

I am using the Text File Output step to create a CSV file, but I need to insert some additional rows of information at the top of the file. I have been able to have another transformation write this data in a previous job step, but doing so prevents me from outputting column headers in the appended CSV output.
The end result I am looking for would look something like this:
EXTRACT TYPE: XYZ
DATE: 20110520
FIRST NAME,LAST NAME,AMOUNT
charlie, chaplain, 2345
someone, else, 1234
Any help would be greatly appreciated. Thanks.

You can write the text file with the header option disabled. See the KTR file linked below.
Here is the KTR file: http://pentaho.phi-integration.com/kettle/kettle-files/csv_header_solution.ktr and the sample source CSV file: http://pentaho.phi-integration.com/kettle/kettle-files/source.csv.
Hope this helps.
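The linked KTR file is not reproduced here. Purely as an illustration of the target layout (free-form information lines first, then the column header written as an ordinary row, then the data), here is a minimal Python sketch. The function name write_extract and its parameters are hypothetical and are not part of PDI.

```python
import csv
import io


def write_extract(out, info_lines, columns, rows):
    """Write free-form info lines, then a CSV column header, then data rows."""
    # The info lines are written verbatim; they are not CSV records.
    for line in info_lines:
        out.write(line + "\n")
    writer = csv.writer(out, lineterminator="\n")
    # The column header is written as a normal row, after the info lines.
    writer.writerow(columns)
    writer.writerows(rows)


buf = io.StringIO()
write_extract(
    buf,
    ["EXTRACT TYPE: XYZ", "DATE: 20110520"],
    ["FIRST NAME", "LAST NAME", "AMOUNT"],
    [("charlie", "chaplain", 2345), ("someone", "else", 1234)],
)
result = buf.getvalue()
print(result, end="")
```

The point of the sketch is the ordering: once the header is treated as just another row, it can be placed after any number of preamble lines.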


Store xml result as .xml file

By using the query:
SELECT CAST([RESULT] as xml)
FROM [dbo].[EVENT_RESULTS]
WHERE RESULT_TYPE=21
AND ANNUALIZATION_ID = 1
This produces XML output spanning multiple lines. Now I simply want to store this output as an .xml file in the C: folder.
If I click on the result, it is shown as an .xml file. But how can I store this result automatically when I run the SQL script? I don't want to save it by hand, since the SQL code will be part of a batch program.
Thanks very much!
This works in MS SQL Server:
SELECT * FROM TableName FOR XML RAW('Rec'), ELEMENTS XSINIL, ROOT('Data')

importing training data to CloudML with images that do not have a file-extension

I created some training data and put the CSV in Google Cloud Storage, but it looks like the import does not work when the files do not have a proper .jpg extension:
Error: INVALID_ROW: Invalid input found at row 1 of gs://weg-li-production/training/test.csv: "Unsupported file extension."
The values look like this:
TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel
Is there a way to work around this issue?
It seems you put the whole "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel" string into a single cell in your CSV file. Each comma should start a new cell. You can open the CSV file in Excel to check it; the correct format should show three columns.
Assuming gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj is the image file and Opel is the label, it all looks fine, except that the image file name does not have a valid extension.
See https://cloud.google.com/vision/automl/docs/prepare for the valid file types (extensions) during training and prediction.
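To find the offending rows before importing, a small pre-check along these lines may help. This is a minimal sketch: the ALLOWED_EXTENSIONS set is an assumption for illustration only (consult the linked documentation for the actual list), and the function name is hypothetical. It assumes the image URI is in the second column, as in the sample row.

```python
import csv
import io
import posixpath

# Assumed set, for illustration only; the real list is in the linked docs.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def rows_with_bad_extension(csv_text, allowed=ALLOWED_EXTENSIONS):
    """Return (row_number, uri) for rows whose image URI lacks an allowed extension."""
    bad = []
    for number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < 2:
            # Fewer than two columns: there is no URI to check, so flag the row.
            bad.append((number, ""))
            continue
        uri = row[1]
        extension = posixpath.splitext(uri)[1].lower()
        if extension not in allowed:
            bad.append((number, uri))
    return bad


sample = "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel\n"
print(rows_with_bad_extension(sample))
```

Rows flagged this way would need their objects copied or renamed in the bucket so that the name ends in a supported extension, with the CSV updated to match.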

How should the data in my text file be formatted for the following script?

How should the data in my text file be formatted for the following script?
How does Pig determine the delimiter in this script?
Could you give me one sample row of input?
A = LOAD 'mydata.txt' AS (P:int, T1:tuple(f1:int, f2:int), B:{T2:(t1:int,t2:int)}, M:[] );
First, see the documentation on Load/Store Functions.
Also see this question: Apache Pig - Not able to read the bag
Sample data:
30|(1,2)|{(3,4)}|[]
Sample code:
A = LOAD 'mydata.txt' USING PigStorage('|') AS (P:int, T1:tuple(f1:int, f2:int), B:{T2:(t1:int,t2:int)}, M:[] );
DUMP A;
It seems PigStorage cannot correctly handle commas inside a bag. I guess it's a bug.
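If the input file is generated by another program, a row in the layout of the sample above can be assembled as follows. This is a minimal sketch in Python with hypothetical helper names. The key#value notation for map entries is an assumption based on Pig's usual text notation for maps, and the sample row itself only exercises an empty map.

```python
def pig_tuple(values):
    """Render a tuple as (v1,v2,...)."""
    return "(" + ",".join(str(v) for v in values) + ")"


def pig_bag(tuples):
    """Render a bag of tuples as {(..),(..)}."""
    return "{" + ",".join(pig_tuple(t) for t in tuples) + "}"


def pig_map(mapping):
    """Render a map as [key#value,...]; assumed notation, see the note above."""
    return "[" + ",".join(f"{k}#{v}" for k, v in mapping.items()) + "]"


def pig_row(p, t1, bag, mapping, delimiter="|"):
    """Join the four fields of the schema (P, T1, B, M) with the delimiter."""
    return delimiter.join([str(p), pig_tuple(t1), pig_bag(bag), pig_map(mapping)])


row = pig_row(30, (1, 2), [(3, 4)], {})
print(row)
```

The delimiter passed here has to match the one given to PigStorage in the LOAD statement.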

pentaho data integration dynamic file name

I am new to PDI. I need to output data from a view in a PostgreSQL database to a file daily. The output file will be named like xxxx_20160427.txt, so I need to append a dynamic date to the file name. How can I do this?
EDIT-----------------
I was not clear when I asked how to add a dynamic date: I want to add not just the date but also other optional parts to the file name, e.g. a serial number (01) at the end: xxxx_2016042701.txt. So my real question is how to build a dynamic file name. In other ETL tools such as SSIS this would be a simple expression. How is it done in PDI?
In your Text file output step, simply check "Include date in filename?" under the files tab.
You can create a dynamic filename variable with a Modified Java Script Value step, and then in the Text File Output step check "Accept file name from field" and select the variable declared in the previous step (filename_var in this example).
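The string-building logic itself is small. In PDI it would be written in JavaScript inside the Modified Java Script Value step; the sketch below shows the same idea in Python purely for illustration. The function build_filename and its parameters are hypothetical names.

```python
from datetime import date


def build_filename(prefix, day, serial=None, extension=".txt"):
    """Build names like xxxx_20160427.txt or xxxx_2016042701.txt."""
    # Date part in YYYYMMDD form, joined to the prefix with an underscore.
    name = f"{prefix}_{day:%Y%m%d}"
    if serial is not None:
        # Optional two-digit, zero-padded serial number appended after the date.
        name += f"{serial:02d}"
    return name + extension


filename_var = build_filename("xxxx", date(2016, 4, 27), serial=1)
print(filename_var)
```

Any other optional parts can be appended the same way before the extension is added.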

CSV to CSV datamapper in mule

I am trying to transform one CSV file into another using Mule.
For example, the source CSV file has 4 headers:
header1, header2, header3, header4
However, a client may pass only the first 3 headers and their values in the CSV file. I get an error if the Mule DataMapper does not find all the headers in the source CSV:
Parsing error: Unexpected end of file in record 1, field 2 ("test2"),
metadata "headertest"; value: '<Raw record data is not available,
please turn on verbose mode.>'
How can I configure the DataMapper to work when the source file does not contain all the headers/values?
I haven't found a clean way to do that yet, but you could add a preprocessing step that appends a field separator to the end of each line in the input CSV (i.e. adds a comma at the end of each line).
This way the last field will be treated as empty.
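Such a preprocessing step could look roughly like this. It is a minimal sketch in Python run outside Mule, with a hypothetical function name. It assumes exactly one trailing field is missing and that no quoted field spans multiple lines.

```python
def add_trailing_separator(lines, separator=","):
    """Append a field separator to every non-empty line, dropping line endings."""
    result = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        # Leave blank lines untouched so no spurious one-field records appear.
        result.append(stripped + separator if stripped else stripped)
    return result


source = ["header1,header2,header3\n", "test1,test2,test3\n"]
padded = add_trailing_separator(source)
print("\n".join(padded))
```

The padded lines can then be written to a temporary file that the DataMapper reads in place of the original.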
HTH,
Marcos