CSV file input not working together with set field value step in Pentaho Kettle - pentaho

I have a very simple Pentaho Kettle transformation that causes a strange error. It consists of reading a field X from a CSV, add a field Y, set Y=X and finally write it back to another CSV.
Here you can see the steps and the configuration for them:
You can also download the ktr file from here. The input data is just this:
1
2
3
When I run this transformation, I get this error message:
ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : Unexpected error
ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
Error writing line
Error writing field content to file
Y Number : There was a data type error: the data type of [B object [[B©b4136a] does not correspond to value meta [Number]
at org.pentaho.di.trans.steps.textfiIeoutput.TextFiIeOutput.writeRowToFile(TextFiIeOutput.java:273)
at org.pentaho.di.trans.steps.textfiIeoutput.TextFileOutput.processRow(TextFiIeOutput.java:195)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
atjava.Iang.Thread.run(Unknown Source)
Caused by: org.pentaho.di.core.exception.KettleStepException:
Error writing field content to file
Y Number : There was a data type error: the data type of [B object [[B©b4136a] does not correspond to value meta [Number]
at org.pentaho.di.trans.steps.textfiIeoutput.TextFiIeOutput.writeField(TextFileOutput.java:435)
at org.pentaho.di.trans.steps.textfiIeoutput.TextFiIeOutput.writeRowToFile(TextFiIeOutput.java:249)
3 more
Caused by: org.pentaho.di.core.exception.KettleVaIueException:
Y Number : There was a data type error: the data type of [B object [[B©b4136a] does not correspond to value meta [Number]
at org.pentaho.di.core.row.vaIue.VaIueMetaBase.getBinaryString(VaIueMetaBase.java:2185)
at org.pentaho.di.trans.steps.textfiIeoutput.TextFiIeOutput.formatField(TextFiIeOutput.java:290)
at org.pentaho.di.trans.steps.textfiIeoutput.TextFiIeOutput.writeField(TextFileOutput.java:392)
4 more
All of the above lines start with 2015/09/23 12:51:18 - Text file output.0 -, but I edited it out for brevity. I think the relevant, and confusing, part of the error message is this:
Y Number : There was a data type error: the data type of [B object [[B©b4136a] does not correspond to value meta [Number]
Some further notes:
If I bypass the set field value step by using the lower hop instead, the transformation finish without errors. This leads me to believe that it is the set field value step that causes the problem.
If I replace the CSV file input with a data frame with the same data (1,2,3) everything works just fine.
If I replace the file output step with a dummy the transformation finish without errors. However, if I preview the dummy, it causes a similar error and the field Y has the value <null> on all three rows.
Before I created this MCVE I got the error on all sorts of seemingly random steps, even when there was no file output present. So I do not think this is related to the file output.
If I change the format from Number to Integer, nothing changes. But if I change it to string the transformations finish without errors, and I get this output:
X;Y
1;[B#49e96951
2;[B#7b016abf
3;[B#1a0760b0
Is this a bug? Am I doing something wrong? How can I make this work?

It's because of lazy conversion. Turn it off. This is behaving exactly as designed - although admittedly the error and user experience could be improved.
Lazy conversion must not be used when you need to access the field value in your transformation. That's exactly what it does. The default should probably be off rather than on.
If your field is going directly to a database, then use it and it will be faster.
You can even have "partially lazy" streams, where you use lazy conversion for speed, but then use select values step, to "un-lazify" the fields you want to access, whilst the remainder remain lazy.
Cunning huh?

Related

rpubchem error - collecting data from pubchem

This is my first try at using R to collect data from pubchem. However i am getting the following error every time for every cid that i have used.
library("rpubchem")
get.cid(46926545)
Error in do.call(cbind, unlist(Filter(function(x) !is.null(x), cvals), :
second argument must be a list
Can anyone point out what am i doing wrong. The link for the cid is "https://pubchem.ncbi.nlm.nih.gov/compound/46926545#section=Top"

Text was truncated or one or more characters had no match in the target code page ole db source to flat file destination

I'm exporting a table output to a CSV file. I'm doing it using SSIS package which has OLE DB Source and Flat File Destination. I'm getting the following errors:
[Flat File Destination [2]] Error: Data conversion failed. The data conversion for column "Address" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
[Flat File Destination [2]] Error: Cannot copy or convert flat file data for column "Address".
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Flat File Destination" (2) failed with error code 0xC02020A0 while processing input "Flat File Destination Input" (6). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
[OLE DB Source [9]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on OLE DB Source returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Can anyone please advise?
The output column for Address is specified as smaller than your original table column.
See this SO: SSIS data conversion failed
Summary:
(1) Right Click on Flat File Source and choose “Show Advanced Editor” Go to “Input and Output Properties “ Tab Expand “Flat File Source Output” and choose “External Columns”
(2) Select column "Address" and on right hand side, increase length to be same size as column in your original table
Double check anywhere in your Export wizard that allows you to set column sizes. Make sure those of your output file match those of your original table columns.
#user7396598
Thank you for pointing me in the right direction. So I ran a comparison the records seem to be inserting in the same order only until a point then after they are not matching. I could captured the bad data. by running the following:
select * from table where address != cast(address as varchar(1000)), when I removed the bad data my SSIS packaged worked.
Now I need to figure out how to convert the bad data into acceptable format for the CSV.
Reference - https://stackoverflow.com/a/2683496/8452633
SO i had a similar problem of bad data in one of my columns causing this error even after increasing the size of the output column. In my case I solved this problem by replacing the bad data in my columns by using replace function.
I exported the data by writing a query and in that query instead of "select *" I wrote all the column names and used the replace function on the columns that were causing the problems. I replaced all the characters that could potentially cause truncation e.g. comma, pipe, tabs etc with an empty space.

Importing data from multi-value D3 database into SQL issues

Trying to use the mv.NET by bluefinity tools. Made some integration packages with it for importing data from a d3 multi-value database into MS SQL 2012 but seem to be having some trouble with the mapping.
For the VOYAGES table have some commentX fields in the D3 application that are acting quite unwieldy and the INSERT fails after a certain number of rows with the following message
>Error: 0xC0047062 at INSERT, mvNET Source[354]: System.Exception: Error #8: dataReader[0] = LTPAC002 ci.BufferColumnIndex = 52, ci.ColumnName = COMMGROUP(Error #8: dataReader[0] = LTPAC002 ci.BufferColumnIndex = 52, ci.ColumnName = COMMGROUP(The value is too large to fit in the column data area of the buffer.))
at mvNETDataSource.mvNETSource.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)
Error: 0xC0047038 at INSERT, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.The PrimeOutput method on mvNET Source returned error code 0x80131500.The component returned a failure code when the pipeline engine called PrimeOutput().The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.There may be error messages posted before this with more information about the failure.
The value is too large to fit in the column data area of the buffer. -> tried changing the input / outputs types but can't seem to get it right.
In the SQL table the columns are of type ntext.
In the .dtsx job the data type for the columns are of type Unicode String [DT_WSTR] with length 4000 , I guess these are auto-detected.
The import worked for other D3 files like this not sure why it fails for these comment fields.
Running the query on the mv.NET Data Manager ( on the d3 server) times out after 240 seconds so maybe this is the underlying issue?
Any ideas how to proceed? Thank you ~
Most like reason is column COMMGROUP does not have correct data type or some record in source do not fit in output type
To find error record (causing) you have to use on redirect row (property of component failing component ) and get the result set in some txt.csv /or tsv file .
then check data
The exception is being thrown from mv.NET so I suggest you call (or ask your reseller) to call Bluefinity support and ask them about this. You're paying for support, might as well use it. Those programs shouldn't be allowed to throw exceptions like that.
D3 doesn't export Unicode, that might be one issue. But if the Data Manager times-out then I suspect something is wrong in the connectivity into D3. Open a Connection Monitor from the Session Monitor and watch the connection when you make the request. I'm guessing it's either hanging or more probably it's falling into BASIC Debug.
Make sure all D3-side programs related to this are either all Flash-compiled, or all Not Flashed. Your app code will fall into Debug if it's not Flashed but MVNET.BP is.
If it's your program that's in Debug, fix it. If you're not sure which program it is, LIST-RUNTIME-ERRORS in DM.
If it's a MVNET.BP program, again work with Bluefinity. If you are using MVSP for connectivity then the Connection Monitor may be useless, you'll need to change that to an IP (Telnet) connection to see the raw data exchange.

SSIS XML Source Error - Input string was not in a correct format

I have an attribute tlost with the definition below in the XSD file. I have tried both use="required" and use="optional".
<xs:attributeGroup name="defense">
<xs:attribute name="tlost" use="required" type="xs:decimal"/>
</xs:attributeGroup>
In the XML document I am trying to import I will get a value like the following:
<defense ast="0" category="special_team" tlost="0" int="0"/>
I am executing an SSIS package that takes the tlost value and inserts it into a sql database table. The column in the database table has a datatype of DECIMAL(28,10) and allows nulls.
When I execute the package, the previous values work perfectly and the data is inserted. However, when I get a value where tlost="" in the XML file, the package fails and the record is not inserted.
In the data flow path editor, the data type for tlost is DT_DECIMAL. When I check the Advanced Editor for the XML Source, the Input and Output properties have a data type for tlost as decimal [DT_DECIMAL].
I can't figure out why this is failing. I tried to create a derived column and cast it as a (DT_DECIMAL, 10) data type. That didn't work. I tried to check for a null value and replace with 0 if null, that didn't work. So I just ignored the column all together and in the Derived Column task, I replaced the tlost column value with (DT_DECIMAL, 10) 0 to just insert a 0 value and ignore whatever is in the xml file, and the job still failed with the following error message:
Error: 0xC020F444 at Load Play Summary Tables, XML Source [1031]: The error "Input string was not in a correct format." occurred while processing "XML Source.Outputs[defense].Columns[tlost]".
Error: 0xC02090FB at Load Play Summary Tables, XML Source [1031]: The "XML Source" failed because error code 0x80131537 occurred, and the error row disposition on "XML Source.Outputs[defense].Columns[tlost]" at "XML Source.Outputs[defense]" specifies failure on error. An error occurred on the specified object of the specified component.
Error: 0xC02092AF at Load Play Summary Tables, XML Source [1031]: The XML Source was unable to process the XML data. Pipeline component has returned HRESULT error code 0xC02090FB from a method call.
Error: 0xC0047038 at Load Play Summary Tables, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on XML Source returned error code 0xC02092AF. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Please help. I have exhausted everything I can think of to fix this issue. I am processing hundreds of files, and I can't keep fixing bad data files every time this issue occurs.
Can you please try these
1 - Change to data type to string in xsd and before loading into tables take care of data type conversion.
2 - If possible generate the xsd by passing your xml and then verify the data type and use it accordingly ...
rest of the xsd can be changed accordingly...
below is screen grab of what I tried. hope it helps]1

SSIS export to CSV file failing

I am trying to export the contents of a SQL Server 2005 table to a csv file using SSIS. In the Data Flow Task I have a OLE DB Source for the table and a Flat File Destination for the file.
When copying the data I started getting a failure on one of the column on a certain row and following some investigation found the problem was with comma's in the data below
Data Issue (nvarchar255)
errors code l075 showing,,,re test.
OLE DB Source for Comment col
Derived Column
Given that this was the issue I created a Derived Column object between the source and destination and destination objects and tried filtering out the comma's using a replace REPLACE(Comment,","," ") but the same column is still failing with the below errors.
Destination Component
Exception
[Inspection Failures Destination [206]] Error: Data conversion failed.
The data conversion for column "Comment" returned status value 4 and
status text "Text was truncated or one or more characters had no
match in the target code page.".
[Inspection Failures Destination [206]] Error: Cannot copy
or convert flat file data for column "Comment".
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED.
The ProcessInput method on component "Inspection Failures
Destination" (206) failed with error code 0xC02020A0 while
processing input "Flat File Destination Input" (207). The
identified component returned an error from the ProcessInput
method. The error is specific to the component, but the error
is fatal and will cause the Data Flow task to stop running.
There may be error messages posted before this with more
information about the failure.
[Inspecton Failures Source [128]] Error: The attempt to
add a row to the Data Flow task buffer failed with error
code 0xC0047020.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.
The PrimeOutput method on component "Inspecton Failures Source"
(128) returned error code 0xC02020C4. The component returned
a failure code when the pipeline engine called PrimeOutput().
The meaning of the failure code is defined by the component,
but the error is fatal and the pipeline stopped executing.
There may be error messages posted before this with more
information about the failure.
Ok, the problem actually appears to be a hidden illegal character in the text
In the image below the top line shows a square before the re test string. The comment column in the database is an nvarchar which apparently uses a different character set so I can not just use the CHAR(13) + CHAR(10) to replace the carriage return.
The fix involved converting the field from an nvarchar to a varchar then performing a replace on the converter ? character resulting in the corrected second ling in the image
SELECT ID,
REPLACE(REPLACE(CAST(Comment AS varchar(255)),'?',' '),',',' ') Comment
FROM tblInspectionFailures WHERE (ID = 216899)
The conversion requirement is detailed here
This does not should like an ideal solution to me but it does work. Does anyone have any other options.
Without replacing comment column can you create another column and map the new derived column to destination column and see.