How to look up technical key of dimension using its natural key? - pentaho

According to the Wiki:
"The Dimension Lookup/Update step allows you to implement Ralph Kimball's slowly changing dimension for both types: Type I (update) and Type II (insert) ..."
"To do the lookup it uses not only the specified natural keys (with an "equals" condition) but also the specified "Stream datefield" (see below)."
"As a result of the lookup or update operation of this step type, a field is added to the stream containing the technical key of the dimension."
So if I understand that correctly, it should be possible to have the "Dimension Lookup/Update" step look up a dimension's technical/surrogate key using a natural key. If no entry exists yet, the step could also be configured to add the requested natural key to the dimension table with a newly generated technical key. But for now I would like to use only the lookup functionality - no update and no insert.
Here's my setup:
This is my dimension table (SCD Type 1) named "dims":
The transformation looks as follows:
But if I run this in Preview mode I get:
What I would like to see instead are the values of id (1, 2, 3) next to the natural keys (a, b, c).
What am I doing wrong here?
Effectively I could achieve this using a join step - but I would like to use the advanced dimension handling functionality once I get this working.
Kind regards
Raffael
http://www.joyofdata.de/blog/a-stackoverflow-but-for-business-intelligence/

This step expects a table with 3 more attributes:
start_date (date)
end_date (date)
version (int)
Check that the date settings in the "Dimension Lookup / Update" step match your data. Check the version field too.
Below an example:
Table:
Settings for the "Dimension Lookup / Update" step:
Preview table (the ids that match the date are returned)
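For reference, a minimal sketch of what such a dims table could look like with those technical columns added; column names and types are illustrative only and depend on your database and step settings:

-- Sketch only: a dimension table carrying the technical columns the step expects.
CREATE TABLE dims (
    id          INTEGER PRIMARY KEY,  -- technical / surrogate key returned by the step
    natural_key VARCHAR(10),          -- e.g. 'a', 'b', 'c'
    version     INTEGER,              -- row version maintained by the step
    start_date  DATE,                 -- validity interval used for the date lookup
    end_date    DATE
);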


Calculated Attribute - Min and Max Valid Date

We have some data inside a table (Dimension) with historical values.
Like this (small example):
ProductId is our Primary Key (and then is unique)
Code is our Business Key
Color and Type are our historical values
In Analysis Services (Tabular mode), our users want to build a report on that values.
Client usage could be:
(1) If they only want to see the code ('CAR' in our example) the result would be:
(2) If they want to see the code and the Color:
Same for all the attributes that we can have and all the combinations.
Do you know how to solve this?
Can we add some logic in a calculated attribute?
Thank you,
Arnaud
In essence, you want to aggregate by date? So, for any set of attributes you put in your pivot table, you want to show the earliest ValidFrom date and the latest ValidTo date that applies?
To accomplish this in SSAS Tabular, import the table and hide the columns ValidFrom & ValidTo. (To hide a column, right click it in Visual Studio and select Hide from Client Tools.)
Then, create 2 measures. For example:
Valid From := MIN([ValidFrom])
Valid To := MAX([ValidTo])
Note the extra space in the names to distinguish them from the column names. You could also call them something completely different. (E.g. Earliest Valid From Date)
When people connect to your cube, they will use these 2 measures rather than the columns from the original table. (They won't even see the columns because you've hidden them.)
If their pivot table includes all the attributes above (Product ID, Code, Color, Type), then the table will look exactly like your original table. If they only show Code, the table will look like your (1). If they only show Code & Color, it will look like your (2).
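For intuition, the relational equivalent of what those two measures compute over whatever attributes are on the pivot is a plain GROUP BY; a sketch, assuming the imported table is simply called Dimension:

-- Illustration only: what "Valid From" / "Valid To" return when Code and Color
-- are the attributes shown in the pivot (case (2) in the question).
SELECT Code,
       Color,
       MIN(ValidFrom) AS ValidFrom,
       MAX(ValidTo)   AS ValidTo
FROM   Dimension
GROUP BY Code, Color;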

talend - output of tMap to another query

I have a query on a view (which is quite heavy), so I want to avoid querying it twice.
The output of this query is transformed and written to a file. There is a unique reference number in this file (the reference field in the query).
I need these references as input to the WHERE clause of my second query.
I'm thinking of this flow:
1st subjob:
tOutputFile
/
tOracleInput -> tMap -> tReplicate
\
tMap (will only map the reference field)
\
tSetGlobalVar
(set to a list, and add to globalMap)
And upon completion of that subjob, the next subjob will run:
tOracleInput (build the where clause from the list from globalMap) -> tMap -> tOutputFile
Does this design look okay? Or am I better off using a subquery on the reference numbers in my 2nd tOracleInput?
SELECT ... FROM table1 WHERE references IN (SELECT references from BIGVIEW WHERE ...)
Depending on how many distinct values are retrieved for the reference field, the query could exceed the limits imposed by Oracle (an IN list of literals, for example, is capped at 1,000 expressions - ORA-01795).
You should consider joining these values with the 2nd tOracleInput using the facilities offered by the "Reload at each row" lookup model.
Learn how it works here.
Hope this helps.
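To make the idea concrete: in "Reload at each row" mode the lookup query of the 2nd tOracleInput is re-executed for every incoming row, with the current reference value injected from globalMap. A sketch of that per-row query, with table1 taken from the question and :current_reference standing in for the value Talend substitutes:

-- Sketch only: executed once per incoming row; the placeholder is filled from
-- globalMap by the lookup configuration, it is not a literal bind variable name.
SELECT t.*
FROM   table1 t
WHERE  t.reference = :current_reference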

SSAS 2016: "The attribute key cannot be found" error, when processing a dimension after adding an attribute

Suppose I have a dimension DIM_Users with two attributes UserId [bigint] and Reputation [int]. In this case I can successfully process the table.
But after I add a DisplayName [nvarchar(255)] attribute to the dimension, processing fails with the following message:
Errors in the OLAP storage engine: The attribute key cannot be found
when processing: Table: 'cube_DIM_Users', Column: 'DisplayName',
Value: 'Justin ᚅᚔᚈᚄᚒᚔ'. The attribute is 'Display Name'.
Comparing the screenshots I've noticed that the first time 5987286 UserIds were processed (which is the correct value), but the second time only 70000.
And also I see that the value "Justin ᚅᚔᚈᚄᚒᚔ" looks strange, but I can't figure out how it can affect processing of the Attribute Key.
Any ideas about what's wrong with my dimension?
I've found this article but it doesn't help.
It seems this problem is caused by a collation mismatch between your data source and SSAS. You will get a better understanding of possible collation issues if you fire a SQL select like SELECT DISTINCT DisplayName FROM yourTable WHERE DisplayName LIKE 'Justin%'. There will probably be more than one entry, which potentially causes the collation issue.
Please try the following workaround if your "User Id" attribute is unique: add an artificial unique key for each UserId row to your dimension table, e.g. an incrementing integer. Assign this created key to the key column of your attribute and assign your UserId to the name column.
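A minimal sketch of such a dimension query, assuming a SQL Server source and using the table name from the error message (the UserKey alias is just an illustrative name):

-- Sketch only: generate a dense, collation-independent surrogate key per row,
-- to be used as the attribute's key column while the original value stays as the name column.
SELECT ROW_NUMBER() OVER (ORDER BY UserId) AS UserKey,
       UserId,
       Reputation,
       DisplayName
FROM   dbo.cube_DIM_Users;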
Hint: If you expand the key column properties of an attribute in an SSAS dimension, you can also change the collation SSAS uses for processing. I've tried this in the past, but sometimes it didn't resolve collation-based issues for me.

Implementing Pure SCD Type 6 in Pentaho

I have an interesting task: creating a Kettle transformation to load a table which is a pure Type 6 dimension. This is driving me crazy.
Assume the below table structure
|CustomerId|Name|Value|startdate|enddate|
|1|A|value1|01-01-2001|31-12-2199|
|2|B|value2|01-01-2001|31-12-2199|
Then comes my input file
Name,Value,startdate
A,value4,01-01-2010
C,value3,01-01-2010
After the Kettle transformation the data must look like:
|CustomerId|Name|Value|startdate|enddate|
|1|A|value1|01-01-2001|31-12-2009|
|1|A|value4|01-01-2010|31-12-2199|
|2|B|value2|01-01-2001|31-12-2199|
|3|C|value3|01-01-2010|31-12-2199|
Check against the existing data to find out whether each incoming record is an insert or an update.
Then generate surrogate keys only for the insert records and perform the inserts.
For the update records, retain the existing surrogate key, insert the incoming data as a new record with an open end date (a very high value), and close the previous corresponding record by setting its end date to the new record's start date - 1.
Can someone please suggest the best way of doing this? I could only see Type 1 and Type 2 covered by the Dimension Lookup/Update option.
I did this using a mixed approach of ETLT.
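A minimal sketch of the post-load SQL that the extra "T" in such an ETLT approach might run to close the previous version; dim_customer and stg_customer are hypothetical table names, and the date arithmetic will differ per database:

-- Sketch only (PostgreSQL-style syntax): for every changed customer arriving in
-- staging, close the currently open dimension row one day before the new start date.
UPDATE dim_customer AS d
SET    enddate = s.startdate - 1
FROM   stg_customer s
WHERE  d.Name = s.Name
  AND  d.enddate = DATE '2199-12-31'
  AND  d.Value <> s.Value;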

How to implement a key lookup for generated keys table in pentaho Kettle

I just started to use Pentaho Kettle for integration. Seems great so far, quite intuitive compared to Talend, which I was also investigating.
I am trying to migrate some customers without their keys, so all I have to identify them is their email addresses.
The customer may already exist in the database, so what I need to do is:
If the customer exists, add its id to the imported record and continue.
But if the customer doesn't exist I need to get the next Hibernate key from the table Hibernate_Sequences and set it as the id.
But I don't want to always allocate a key, so I want to conditionally execute a step to allocate the next key.
So what I want to do, is in the flow execute the db procedure, which allocates the next key and returns it, only if there's no value in id from the "lookup id" step.
Is this possible?
Just posting my updated flow - so the answer was to use a Filter Rows step, which splits the data on true/false. I really had trouble getting the id out of the database stored procedure because of a bug, so I had to use a decimal and then convert back to an integer (which I also couldn't figure out how to do, so I used a JavaScript step).
Yes it is. As per the official documentation (I kept only the relevant part), "Lookup values are added as new fields onto the stream". So you just need to put a "Filter rows" step (from the Flow section) after the lookup and check the "id" field, which is supposed to be added by the "Existing Id Lookup" step.
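For the key allocation itself, a minimal sketch of what the stored procedure (or a SQL step) could do against Hibernate_Sequences - assuming the common sequence_name / next_val layout of Hibernate's table generator, which may differ in your schema:

-- Sketch only: reserve the next key inside one transaction so that two
-- concurrent loads cannot hand out the same id.
UPDATE Hibernate_Sequences
SET    next_val = next_val + 1
WHERE  sequence_name = 'customer';

SELECT next_val - 1 AS allocated_id
FROM   Hibernate_Sequences
WHERE  sequence_name = 'customer';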