Pentaho merge two tables

I'm new to Pentaho.
This is the transformation I'm trying to do:
Import Transactions and Merchant and merge these two tables using MERCH_KEY
However, this isn't working, see below:
This is table Transactions:
And this is table Merchant:
And this is the Merge join:
This all seems good, but it's not working and I have no idea why.
From the preview, I can see the two tables are being imported, so how can I merge them?

Remove all fields except "MERCH_KEY" from the right-side list.
The Merge join step is a little misleading here: its Get Fields button pulls in ALL fields, and there is no dropdown for picking a single one.
As the name suggests, these are the key fields that will be compared in the join, so they need to be listed in the same order on both sides. In your case, you only need the one. The other input fields (from both tables) are added to the output stream automatically, with fields from the second table renamed if their names are duplicated.
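To make the role of the key field concrete, here is a minimal Python sketch of a sort-merge join on a single key. It is not Pentaho code: the helper name merge_join, the sample rows, and the "_1" rename suffix are assumptions for illustration only. It also assumes both inputs are already sorted on MERCH_KEY, which the Merge join step expects of its input streams.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are both sorted ascending on `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-hand row carrying this key with that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    row = dict(left[i])
                    for name, value in r.items():
                        # Duplicated names from the second input get a suffix
                        # (the "_1" suffix is an assumption of this sketch).
                        row[name + "_1" if name in row else name] = value
                    out.append(row)
                i += 1
            j = j_end
    return out


# Hypothetical sample data, both lists sorted on MERCH_KEY.
transactions = [
    {"MERCH_KEY": 1, "AMOUNT": 10},
    {"MERCH_KEY": 1, "AMOUNT": 5},
    {"MERCH_KEY": 3, "AMOUNT": 7},
]
merchants = [
    {"MERCH_KEY": 1, "NAME": "Acme"},
    {"MERCH_KEY": 2, "NAME": "Bolt"},
]

joined = merge_join(transactions, merchants, "MERCH_KEY")
for row in joined:
    print(row)
```

Only MERCH_KEY is compared. AMOUNT and NAME ride along into the output without being listed as keys, which is why the key lists in the step should contain just the one field.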

Related

(SQL) Crystal Report, Linking two tables into one without breaking report, picture attached

Tables in Question
I don't think I am going to be able to do what I am trying to, but thought it would be worth asking the question. In my point of sale system there is no report available for a specific type of order so I am writing a crystal report to be able to view the order type in question. Unfortunately the POS system only saves the document sequence field in one table. The document sequence is the number that is shown on a document when viewing it, all of the DocIDInternal numbers are gibberish that are not visible to employees when pulling up a document to view it.
I would like to be able to view the document sequence for both the order and the associated purchase order on my report, but I cannot find a way to link both into the Documents table to access the document sequence without breaking the report. As far as I can tell, I can link either the order or the purchase order to the Documents table, which lets me view the document sequence field for one or the other. Neither the order table nor the purchase order table contains the document sequence; it is only available through the Documents table. I have searched and been unable to locate any other table available to me that contains the document sequence field.
Open Database Expert and go to the Links tab, where you took the screenshot above. Double-click the line linking your OrdersDtl and Document tables to edit the link. On the left side is a set of radio buttons that define the type of join to use between the tables. If the current selection is "Inner Join", try changing it to "Left Outer Join". This should stop the link from breaking the report, but I suspect there is more to troubleshoot before you have the data you want from the Document table.
When Inner Join is used as the type of join, any row on the OrdersDtl table whose DocIdInternal value has no corresponding record on the Documents table is dropped from the result, so if nothing matches, the query returns no rows at all. When Left Outer Join is used, the query always returns all rows from the table on the left side of the join, even when no corresponding records are found on the right side.
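The difference between the two join types can be reproduced outside Crystal Reports. The sketch below uses Python's built-in sqlite3 module with made-up data. The table names follow the question, but the OrderNo and DocumentSequence columns and all values are assumptions, not the real POS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrdersDtl (OrderNo INTEGER, DocIdInternal TEXT);
    CREATE TABLE Documents (DocIdInternal TEXT, DocumentSequence INTEGER);
    -- Order 2 deliberately has no matching row in Documents.
    INSERT INTO OrdersDtl VALUES (1, 'a1'), (2, 'b2');
    INSERT INTO Documents VALUES ('a1', 5001);
""")

# Inner join: unmatched orders disappear from the result.
inner_rows = conn.execute("""
    SELECT o.OrderNo, d.DocumentSequence
    FROM OrdersDtl o
    INNER JOIN Documents d ON d.DocIdInternal = o.DocIdInternal
    ORDER BY o.OrderNo
""").fetchall()

# Left outer join: every order is kept, and missing values come back as NULL.
left_rows = conn.execute("""
    SELECT o.OrderNo, d.DocumentSequence
    FROM OrdersDtl o
    LEFT OUTER JOIN Documents d ON d.DocIdInternal = o.DocIdInternal
    ORDER BY o.OrderNo
""").fetchall()

print(inner_rows)
print(left_rows)
```

With the left outer join, order 2 still appears, with an empty document sequence. An empty value like that on the report is the cue to check whether DocIdInternal is really the right linking field.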
Once the report is returning data again, the next step is to determine why no records are being linked from the Documents table. To do this, I would place the DocIdInternal database field on your report so you can verify it has a value on the OrdersDtl table. You may need to ask your Database Administrator whether this is the proper way to link these tables.

SAP Business objects how to create different kinds of join between different data providers

I have two data providers. One is a universe, one is an excel file. Excel file has column ID. I want to find ID,JOB_ID, Cost
I have created a merged dimension on ID. When I create a report with ID and Cost, I get an outer-joined result, which is what I want. But when I add another attribute from the universe, the result becomes inner-joined. Where can I control this behavior?
You are ever so close. Here are the basics when working with a zero or one to many relationships. Credit for this goes to this blog post. I am copying it here if perchance that link goes dead.
As a rule of thumb, when trying to merge DPs with a 1xN relationship:
1. Merge the common fields.
2. Use the dimension coming from the N side query.
3. Create detail variables from the 1 side query for each dimension needed, with the associated dimension set to the merged dimension.
4. Check "Show rows with empty dimension values" on Table formatting for each table using dimensions coming from both queries.
Here is a screen shot to highlight where to find the setting in step #4.

How to implement an add if not available in the database in Pentaho?

How do I implement, or what steps do I use to create, a transformation that compares a table with a list? For example, a database table named Schools and an Excel file with a huge list of school names.
If an entry in the Excel file is not found in the database, it should be added to the database table.
I'm not sure I can use the Database lookup step, since it does not seem to tell me when a lookup fails. The Insert / Update step doesn't seem to be a solution either, because it requires some ID value and the list of schools in the Excel file has no ID.
Based on the information you provided, a simple comparison followed by a Table output step will do the job. You can use the Merge rows step to compare the two data streams (Excel and database). The Merge rows step uses the key to compare the two streams and adds a flag field that marks each row as new, identical, changed, or deleted. In your case, you would insert all the rows flagged as new using the Table output step.
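The flagging idea can be sketched in a few lines of Python. This is not how the step is implemented: the function name flag_rows, the field names, and the sample schools are hypothetical, and a real transformation would feed the step two sorted streams rather than in-memory dictionaries.

```python
def flag_rows(reference, compare, key, value_fields):
    """Compare two row sets by `key` and flag each key as
    new, deleted, identical, or changed."""
    ref_by_key = {row[key]: row for row in reference}
    cmp_by_key = {row[key]: row for row in compare}
    flagged = []
    for k in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        old, new = ref_by_key.get(k), cmp_by_key.get(k)
        if old is None:
            flag = "new"        # only in the compare stream (the Excel list)
        elif new is None:
            flag = "deleted"    # only in the reference stream (the database)
        elif all(old[f] == new[f] for f in value_fields):
            flag = "identical"
        else:
            flag = "changed"
        flagged.append((k, flag))
    return flagged


# Hypothetical data: the database is the reference, the Excel list is compared to it.
db_schools = [
    {"name": "Alpha High", "city": "X"},
    {"name": "Beta Prep", "city": "Y"},
]
excel_schools = [
    {"name": "Alpha High", "city": "X"},
    {"name": "Gamma School", "city": "Z"},
]

flags = flag_rows(db_schools, excel_schools, key="name", value_fields=["city"])
to_insert = [name for name, flag in flags if flag == "new"]
print(flags)
print(to_insert)
```

Only the rows flagged "new" go on to the insert. For this use case, rows flagged "deleted" (present in the database but not in the list) are simply ignored.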
Please check the below links for more reference.
Merge rows, Synchronize after merge
This is what worked for me:
excel file -->
select values (to delete unnecessary fields) -->
database lookup (this will create a new field, and will set null if not found) -->
filter rows (get the fields with null output from lookup) -->
table output (insert the filtered records)
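The same lookup, filter, insert chain can be sketched with Python's built-in sqlite3 module. The Schools table layout, the helper name lookup_id, and the sample names are assumptions for illustration; the Excel input is replaced by a plain list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Schools (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO Schools (name) VALUES (?)",
                 [("Alpha High",), ("Beta Prep",)])

# Stand-in for the rows read from the Excel file (already reduced to one field).
excel_names = ["Alpha High", "Gamma School", "Delta Academy"]


def lookup_id(name):
    """Database lookup: return the school's id, or None when it is not found."""
    row = conn.execute("SELECT id FROM Schools WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


# Lookup adds a new field to each row; it is None when the lookup fails.
looked_up = [(name, lookup_id(name)) for name in excel_names]

# Filter rows: keep only the rows whose lookup result is null.
missing = [name for name, school_id in looked_up if school_id is None]

# Table output: insert the filtered records.
conn.executemany("INSERT INTO Schools (name) VALUES (?)", [(n,) for n in missing])
conn.commit()

all_names = [r[0] for r in conn.execute("SELECT name FROM Schools ORDER BY id")]
print(missing)
print(all_names)
```

One caveat of this sketch: if the list contains the same missing name twice, both copies are inserted, so the list would need de-duplicating first.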

SQL Reporting Service - Data Model

I created a data model to replace two cut-down tables, and I defined a many-to-one relationship between the two tables.
When I create a report that uses just a single table (the destination table in the data source view relationship), it is missing records that should be displayed. The missing records are the ones that do not have a linked record in the other table.
Help
You probably need to create a query or a view that includes an outer join. Search your docs for OUTER JOIN. That should take you where you need to be. Topics should include LEFT, RIGHT, and FULL outer joins.

How can i design a DB where the user can define the fields and types of a detail table in a M-D relationship?

My application has one table called 'events' and each event has approx 30 standard fields, but also user defined fields that could be any name or type, in an 'eventdata' table. Users can define these event data tables, by specifying x number of fields (either text/double/datetime/boolean) and the names of these fields. This 'eventdata' (table) can be different for each 'event'.
My current approach is to create a lookup table for the definitions. So if I need to query all 'event' and 'eventdata' per record, I do so in an M-D relationship using two queries (i.e. select * from events, then for each record in 'events', select * from 'some table').
Is there a better approach to doing this? I have implemented this so far, but most of my queries require two distinct calls to the DB. I cannot simply join my master 'events' table with a different 'eventdata' table for each record in 'events'.
I guess my main question is: can i join my master table with different detail tables for each record?
E.g.
SELECT E.*, E.Tablename
FROM events E
LEFT JOIN 'E.tablename' T ON E._ID = T.ID
If not, is there a better way to design my database considering i have no idea on how many user defined fields there may be and what type they will be.
There are four ways of handling this:
1. Add several additional fields named "Custom1", "Custom2", "Custom3", etc. These should have a datatype of varchar(?) or similar.
2. Add a field to hold the unstructured data (like an XML column).
3. Create a table of name/value pairs which are associated with some type of template. Let the users manage the template. You'll have to use pivot tables or similar to get the data out.
4. Use a database like MongoDB or another NoSQL-style product to store this.
That said, the first one has the advantage of being fast but limits the number of custom fields to the number you defined. Older mainframe-type applications work this way. SalesForce CRM used to.
The second option means that each record can have its own custom fields. However, depending on your database there are definite challenges here. Tried this, don't recommend it.
The third one is generally harder to code for but allows for extreme flexibility. SalesForce and other applications have gone this route; including a couple I'm responsible for. The downside is that Microsoft apparently acquired a patent on doing things this way and is in the process of suing a few companies over it. Personally, I think that's bullcrap; but whatever. Point is, use at your own risk.
The fourth option is interesting. We've played with it a bit and the performance is great while coding is pretty darn simple. This might be your best bet for the unstructured data.
That type of join won't work, because you will need to pivot the eventdata table to turn its rows into columns. How you do that depends on which database technology you are using.
Here is an example with MySQL: How to pivot a MySQL entity-attribute-value schema
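As a rough illustration of what pivoting means here, the sketch below uses Python's built-in sqlite3 module and conditional aggregation (MAX over CASE), one common way to pivot name/value rows. The eventdata layout (event_id, name, value), the field names, and the data are assumptions, and the approach only works when the field names are known at the time the query is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (ID INTEGER PRIMARY KEY, title TEXT);
    -- One row per custom field value (hypothetical name/value layout).
    CREATE TABLE eventdata (
        event_id INTEGER NOT NULL REFERENCES events(ID),
        name     TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (event_id, name)
    );
    INSERT INTO events VALUES (1, 'Launch'), (2, 'Audit');
    INSERT INTO eventdata VALUES
        (1, 'venue', 'Hall A'),
        (1, 'budget', '1500.0'),
        (2, 'venue', 'Room 7');
""")

# Each CASE picks out one field name; MAX collapses the group to that value
# (or NULL when the event has no such field).
rows = conn.execute("""
    SELECT E.ID, E.title,
           MAX(CASE WHEN D.name = 'venue'  THEN D.value END) AS venue,
           MAX(CASE WHEN D.name = 'budget' THEN D.value END) AS budget
    FROM events E
    LEFT JOIN eventdata D ON D.event_id = E.ID
    GROUP BY E.ID, E.title
    ORDER BY E.ID
""").fetchall()

for row in rows:
    print(row)
```

Every value comes back as text, so type conversion (for example, budget to a number) is left to the application.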
My approach would be to avoid using a different table for each event, if that's possible.
I would use something like:
Event (EventId, ..., ...)
EventColumnType (EventColumnTypeId, EventTypeId, ColumnName)
EventColumnData (EventColumnTypeId, Data)
You are then limited in the type of data you can store (everything would have to be strings, for example), but the number of events and columns is unrestricted.
What I'm getting from your description is you have an event table, and then a separate EventData table for each and every event.
Rather than that, why not have a single EventCustomFields table that contains a foreign key to the event table, a field name (event + field being the PK), and a field value?
Sure, it's not the best. You'd be stuck serializing the value or storing everything as a string. And you'd still be stuck doing two queries, one for the event table and one to get its custom fields, but at least you wouldn't have a new table for every event in the system (yuck x10).
Another (arguably worse) option is to serialize the custom fields into a single column of the table and then deserialize them when needed. So your query would be something like:
Select E.*, C.*
From events E, customFields C
Where E.ID = C.ID
Is it possible to just impose a limit on your users? I know the tables underneath Sharepoint 2007 had a bunch of columns for custom data that were just named like CustomString1, CustomDate2, etc. That may end up easier than some of the approaches above, where everything is in one column (though that's an approach I've taken as well), and I would think it would scale up better.
The answer to your main question is: no. You can't have different rows in the result set with different columns. The result set is kind of like a table, so each row has to have the same columns. You can fake it with padding and dummy columns, but that's probably not much better.
You could try defining a fixed event data table, with (say) ten of each type of column. Then you'd store the usage metadata in a separate table and just read that in at system startup. The metadata would tell you that event type "foo" has a field "name" mapped to column string0 in the event data table, a field named "reporter" mapped to column string1, and a field named "reportDate" mapped to column date0. It's ugly and wastes space, but it's reasonably flexible. If you're in charge of the database, you can even define a view on the table so to the client it looks like a "normal" table. If the clients create their own tables and just stick the table name in the event record, then obviously this won't fly.
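A minimal Python sketch of that metadata mapping is shown below. The dictionary FIELD_MAP stands in for the metadata table read at startup, and the helper names and sample values are hypothetical; only the "foo" event type and its three field-to-column mappings come from the description above.

```python
# Metadata loaded once at startup: event type -> {logical field -> physical column}.
FIELD_MAP = {
    "foo": {"name": "string0", "reporter": "string1", "reportDate": "date0"},
}


def to_storage_row(event_type, fields):
    """Translate logical field names into the fixed physical column names."""
    mapping = FIELD_MAP[event_type]
    return {mapping[name]: value for name, value in fields.items()}


def from_storage_row(event_type, row):
    """Translate a physical row back into logical field names."""
    mapping = FIELD_MAP[event_type]
    return {name: row.get(column) for name, column in mapping.items()}


# Hypothetical event of type "foo".
logical = {"name": "Outage", "reporter": "kim", "reportDate": "2020-01-02"}
stored = to_storage_row("foo", logical)
restored = from_storage_row("foo", stored)
print(stored)
print(restored)
```

Unused columns (string2 through string9, and so on) simply stay empty for this event type, which is the wasted space the answer mentions.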
If you're really hardcore, you can write a database procedure to query the table structures, serialize everything to a list of key/type/value tuples, and return that in one long string as the last column, but that's probably not much handier than what you're doing now.