Assigning an integer value to an output row, pentaho - pentaho

I'm using kettle in a very basic way. What I want to do is read from csv file, do some kind of transformation in User Defined Java Class step and write output to a text file.
a picture http://imageshack.com/a/img34/1669/vo18.png
When I run this I essentially get this error:
value Integer<binary-string> : There was a data type error: the data type of java.lang.Long object [100] does not correspond to value meta [Integer<binary-string>]
This is the line in UDJC step that seems to make the problem (field "value" is of Integer type):
get(Fields.Out, "value").setValue(out_row,new Long(100));
How can I solve this?

So the answer is: Turn lazy conversion off.

Related

Vega Visualization where Data Element Name Contains At Symbol

We have data created by an external source (i.e. I cannot just change the names used so it works) -- the datetime field is named #timestamp and I cannot figure out how to address that element within a transformation expression.
Sample data is available on Vega.GitHub.IO and in a Gist with the data -- I added the "timestamp" element to verify the issue I am experiencing is related to the at sign in the name. Using the 'timestamp' data field, I am able to transform and graph the data as desired:
But I have been unable to use the #timestamp field. I get a parse failure if I use "expr": "datetime(datum.#timestamp)" and an invalid date if I use "expr": "datetime('datum.#timestamp')". Attempting to escape the at sign (or the quotes) gives me parse errors as well. How exactly can I use the data element named #timestamp in the expression?
I finally figured this out -- I don't think it has anything to do with Vega but, rather, is a JavaScript limitation. I can use the array subscript method of accessing the data and the dates parse.

ERROR: function regexp_matches(jsonb, unknown) does not exist in Tableau but works elsewhere

I have a column called "Bakery Activity" whose values are all JSONs that look like this:
{"flavors": [
{"d4js95-1cc5-4asn-asb48-1a781aa83": "chocolate"},
{"dc45n-jnsa9i-83ysg-81d4d7fae": "peanutButter"}],
"degreesToCook": 375,
"ingredients": {
"d4js95-1cc5-4asn-asb48-1a781aa83": [
"1nemw49-b9s88e-4750-bty0-bei8smr1eb",
"98h9nd8-3mo3-baef-2fe682n48d29"]
},
"numOfPiesBaked": 1,
"numberOfSlicesCreated": 6
}
I'm trying to extract the number of pies baked with a regex function in Tableau. Specifically, this one:
REGEXP_EXTRACT([Bakery Activity], '"numOfPiesBaked":"?([^\n,}]*)')
However, when I try to throw this calculated field into my text table, I get an error saying:
ERROR: function regexp_matches(jsonb, unknown) does not exist;
Error while executing the query
Worth noting is that my data source is PostgreSQL, which Tableau regex functions support; not all of my entries have numOfPiesBaked in them; when I run this in a simulator I get the correct extraction (actually, I get "numOfPiesBaked": 1" but removing the field name is a problem for another time).
What might be causing this error?
In short: Wrong data type, wrong function, wrong approach.
REGEXP_EXTRACT is obviously an abstraction layer of your client (Tableau), which is translated to regexp_matches() for Postgres. But that function expects text input. Since there is no assignment cast for jsonb -> text (for good reasons) you have to add an explicit cast to make it work, like:
SELECT regexp_matches("Bakery Activity"::text, '"numOfPiesBaked":"?([^\n,}]*)')
(The second argument can be an untyped string literal, Postgres function type resolution can defer the suitable data type text.)
Modern versions of Postgres also have regexp_match() returning a single row (unlike regexp_matches), which would seem like the better translation.
But regular expressions are the wrong approach to begin with.
Use the simple json/jsonb operator ->>:
SELECT "Bakery Activity"->>'numOfPiesBaked';
Returns '1' in your example.
If you know the value to be a valid integer, you can cast it right away:
SELECT ("Bakery Activity"->>'numOfPiesBaked')::int;
I found an easier way to handle JSONB data in Tableau.
Firstly, make a calculated field from the JSONB field and convert the field to a string by using str([FIELD_name]) command.
Then, on the calculated field, make another calculated field and use function:
REGEXP_EXTRACT([String_Field_Name], '"Key_to_be_extracted":"?([^\n,}]*)')
The required key-value pair will form the second caluculated field.

Lossless assignment between Field-Symbols

I'm currently trying to perform a dynamic lossless assignment in an ABAP 7.0v SP26 environment.
Background:
I want to read in a csv file and move it into an internal structure without any data losses. Therefore, I declared the field-symbols:
<lfs_field> TYPE any which represents a structure component
<lfs_element> TYPE string which holds a csv value
Approach:
My current "solution" is this (lo_field is an element description of <lfs_field>):
IF STRLEN( <lfs_element> ) > lo_field->output_length.
RAISE EXCEPTION TYPE cx_sy_conversion_data_loss.
ENDIF.
I don't know how precisely it works, but seems to catch the most obvious cases.
Attempts:
MOVE EXACT <lfs_field> TO <lfs_element>.
...gives me...
Unable to interpret "EXACT". Possible causes: Incorrect spelling or comma error
...while...
COMPUTE EXACT <lfs_field> = <lfs_element>.
...results in...
Incorrect statement: "=" missing .
As the ABAP version is too old I also cannot use EXACT #( ... )
Example:
In this case I'm using normal variables. Lets just pretend they are field-symbols:
DATA: lw_element TYPE string VALUE '10121212212.1256',
lw_field TYPE p DECIMALS 2.
lw_field = lw_element.
* lw_field now contains 10121212212.13 without any notice about the precision loss
So, how would I do a perfect valid lossless assignment with field-symbols?
Don't see an easy way around that. Guess that's why they introduced MOVE EXACT in the first place.
Note that output_length is not a clean solution. For example, string always has output_length 0, but will of course be able to hold a CHAR3 with output_length 3.
Three ideas how you could go about your question:
Parse and compare types. Parse the source field to detect format and length, e.g. "character-like", "60 places". Then get an element descriptor for the target field and check whether the source fits into the target. Don't think it makes sense to start collecting the possibly large CASEs for this here. If you have access to a newer ABAP, you could try generating a large test data set there and use it to reverse-engineer the compatibility rules from MOVE EXACT.
Back-and-forth conversion. Move the value from source to target and back and see whether it changes. If it changes, the fields aren't compatible. This is unprecise, as some formats will change although the values remain the same; for example, -42 could change to 42-, although this is the same in ABAP.
To-longer conversion. Move the field from source to target. Then construct a slightly longer version of target, and move source also there. If the two targets are identical, the fields are compatible. This fails at the boundaries, i.e. if it's not possible to construct a slightly-longer version, e.g. because the maximum number of decimal places of a P field is reached.
DATA target TYPE char3.
DATA source TYPE string VALUE `123.5`.
DATA(lo_target) = CAST cl_abap_elemdescr( cl_abap_elemdescr=>describe_by_data( target ) ).
DATA(lo_longer) = cl_abap_elemdescr=>get_by_kind(
p_type_kind = lo_target->type_kind
p_length = lo_target->length + 1
p_decimals = lo_target->decimals + 1 ).
DATA lv_longer TYPE REF TO data.
CREATE DATA lv_longer TYPE HANDLE lo_longer.
ASSIGN lv_longer->* TO FIELD-SYMBOL(<longer>).
<longer> = source.
target = source.
IF <longer> = target.
WRITE `Fits`.
ELSE.
WRITE `Doesn't fit, ` && target && ` is different from ` && <longer>.
ENDIF.

Any idea about this weird type error in Pentaho Data Integration?

I have this :
Insertion des données dans table some_table.0 - SOME_AUTO_GENERATED_DB_KEY Integer : There was a data type error: the data type of java.lang.Boolean object [true] does not correspond to value meta [Integer]
What boolean??? Where do you see a boolean? I have added a trace writing to the step just before this failing inserting step, and I see a perfectly fine integer as value of SOME_AUTO_GENERATED_DB_KEY .
How can this be possible? I am very new to Kettle, if you have any idea or tips it would be awesome.
Here a screenshot of the transformation :
Just before the failed insert, you have a filter that splits the stream. On one half of the stream, it looks like you have an Add Constant step. If I'm reading this right, then the two inputs to the Insert step don't have the same fields in the same order. A few steps earlier, there is a similar splitting of the paths that goes off to the right, which could have the same effect.
Whenever you remerge streams like this without being very careful, strange errors like this can pop up. Pentaho usually tries to warn you when you create the hop to remerge the streams, but there are ways to miss that warning.
Suggestion: For each time the stream remerges, right-click on each of the two previous steps, and have it show you the output fields. Compare the two lists side-by-side to verify they are the same. If not, then you will have to add or remove fields as appropriate to make them the same on both sides.

Get Text Symbol Programmatically With ID

Is there any way of programmatically getting the value of a Text Symbol at runtime?
The scenario is that I have a simple report that calls a function module. I receive an exported parameter in variable LV_MSG of type CHAR1. This indicates a certain status message created in the program, for instance F (Fail), X (Match) or E (Error). I currently use a CASE statement to switch on LV_MSG and fill another variable with a short description of the message. These descriptions are maintained as text symbols that I retrieve at compile time with text-MS# where # is the same as the possible returns of LV_MSG, for instance text-MSX has the value "Exact Match Found".
Now it seems to me that the entire CASE statement is unnecessary as I could just assign to my description variable the value of the text symbol with ID 'MS' + LV_MSG (pseudocode, would use CONCATENATE). Now my issue is how I can find a text symbol based on the String representation of its ID at runtime. Is this even possible?
If it is, my code would look cleaner and I wouldn't have to update my actual code when new messages are added in the function module, as I would simply have to add a new text symbol. But would this approach be any faster or would it in fact degrade the report's performance?
Personally, I would probably define a domain and use the fixed values of the domain to represent the values. This way, you would even get around the string concatenation. You can use the function module DD_DOMVALUE_TEXT_GET to easily access the language-dependent text of a domain value.
To access the text elements of a program, use a function module like READ_TEXT_ELEMENTS.
Be aware that generic programming like this will definitely slow down your program. Whether it would make your code look cleaner is in the eye of the beholder - if the values change rarely, I don't see why a simple CASE statement should be inferior to some generic text access.
Hope I understand you correctly but here goes. This is possible with a little trickery, all the text symbols in a report are defined as variables in the program (with the name text-abc where abc is the text ID). So you can use the following:
data: lt_all_text type standard table of textpool with default key,
lsr_text type ref to textpool.
"Load texts - you will only want to do this once
read textpool sy-repid into lt_all_text language sy-langu.
sort lt_all_Text by entry.
"Find a text, the field KEY is the text ID without TEXT-
read table lt_all_text with key entry = i_wanted_text
reference into lsr_text binary search.
If you want the address you can add:
field-symbols: <l_text> type any.
data l_name type string.
data lr_address type ref to data.
concatenate 'TEXT-' lsr_text->key into l_name.
assign (l_name) to <l_text>.
if sy-subrc = 0.
get reference of <l_text> into lr_address.
endif.
As vwegert pointed out this is probably not the best solution, for error handling rather use message classes or exception objects. This is useful in other cases though so now you know how.