Create Database Source - Define Joins - Join by multiple columns - pentaho

I'm trying to create a Data Source in Pentaho, but I can't define a join on two or more columns.
For example: my Invoice table has its PK defined as [ClientId, InvoiceId], so different clients can have the same InvoiceId. The join with the InvoiceProduct table should therefore be based on those two columns.
Yet Pentaho allows me to select only one column from each table to define the join.
This is the official documentation from Pentaho: Create Database Sources. In #9 it talks about Join Definition, but it never mentions PKs that have more than one column (which IMHO is quite common), so I'm probably doing something wrong.
Can anyone please point out how to define a join that involves more than one column?
Hope I made myself clear.
Best regards,
Federico.
Pentaho 8
MySQL 5.6
Windows 10

I have not tried with two keys, but if it is not working, you can generate a checksum value from the key columns using the checksum step and then join on that value instead.
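The checksum idea can be sketched outside Pentaho in a few lines of Python. This is only an illustration of the technique, not Pentaho's API: the `composite_key` helper, the SHA-256 hash, the `|` separator and the sample rows are all assumptions made for the example.

```python
import hashlib


def composite_key(client_id, invoice_id):
    """Derive a single join key from the two PK columns (hypothetical helper).

    The separator keeps (1, 23) from colliding with (12, 3).
    """
    raw = f"{client_id}|{invoice_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Made-up sample rows: two clients share InvoiceId 100.
invoices = [
    {"ClientId": 1, "InvoiceId": 100, "Total": 50.0},
    {"ClientId": 2, "InvoiceId": 100, "Total": 75.0},
]
invoice_products = [
    {"ClientId": 1, "InvoiceId": 100, "Product": "pen"},
    {"ClientId": 2, "InvoiceId": 100, "Product": "ink"},
    {"ClientId": 2, "InvoiceId": 100, "Product": "paper"},
]

# Index the invoices by the derived key, then join on that one column.
invoice_by_key = {
    composite_key(row["ClientId"], row["InvoiceId"]): row for row in invoices
}

joined = []
for product in invoice_products:
    key = composite_key(product["ClientId"], product["InvoiceId"])
    invoice = invoice_by_key.get(key)
    if invoice is not None:
        joined.append({**invoice, "Product": product["Product"]})
```

Because the key is derived from both columns, the two invoices numbered 100 stay separate even though only one column is compared in the join.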

Related

Is a generic ID column in a SQL table a bad idea?

In our database we have many tables with a 'Notes' column. This is important functionality, but for most rows the value of Notes is null. These tables have many columns and we would like to remove some columns for better legibility.
We could add one Notes table for every table that has a notes column. But this would create clutter of a different kind: too many small tables.
My idea is to create a generic Notes table and also a reference table. The Notes table would have a column for the notes text, a column for the id of the row being linked to, and a foreign key to the reference table. The reference table would have a text value for each table for which we need notes. Using these two tables we should be able to link the note back to whichever table and column it came from.
By using this solution, we remove any cases of null values from notes and also slim down some of our tables. All at the modest price of two additional tables. It feels very 'hacky' to me however. Is there a reason why using a 'generic' id column or a reference table of other tables is a bad idea from a DB management perspective?
Managing the references to disparate entities can be really challenging in SQL Server. Postgres, by contrast, supports inheritance which makes this much simpler.
So, my recommendation is to add a notes column to every entity where you want notes. You can add a view to bring all the notes together if you need a single view of all of them.
This has minimal impact on performance or data size. There is no additional overhead for a varchar column, other than the additional NULL bit -- and that is pretty minimal.
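A minimal sketch of the "notes column plus a combining view" layout, run against an in-memory SQLite database so it is self-contained. The `customers` and `orders` tables, their columns and the `all_notes` view name are hypothetical; on SQL Server the view would use the same UNION ALL shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical entities, each with its own nullable notes column.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, notes TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, total REAL, notes TEXT);

    INSERT INTO customers VALUES (1, 'Acme', 'Prefers email'), (2, 'Globex', NULL);
    INSERT INTO orders    VALUES (10, 99.5, NULL), (11, 12.0, 'Rush delivery');

    -- One view that brings every non-null note together.
    CREATE VIEW all_notes AS
        SELECT 'customers' AS source_table, id AS source_id, notes
        FROM customers WHERE notes IS NOT NULL
        UNION ALL
        SELECT 'orders', id, notes
        FROM orders WHERE notes IS NOT NULL;
    """
)

all_notes = conn.execute(
    "SELECT source_table, source_id, notes FROM all_notes "
    "ORDER BY source_table, source_id"
).fetchall()
```

Each base table keeps ordinary foreign keys and constraints, and the view supplies the cross-table listing that the generic Notes table was meant to provide.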
IMO, the other solution of managing two tables doesn't bring much efficiency but adds complexity. You should probably stick with the notes column in the original table, with a varchar data type.
A generic id column is not inherently bad, but using one generally smells of bad/hacky design.
Additionally, in SQL Server you can declare the notes columns as SPARSE to reduce their size.
But I used a similar approach myself (a note column was needed for many columns to write info / change request / lock comment, but it was normally never used).
It works fine and can be programmed generically in the source.
But if you need only one comment column per table, I would prefer sparse.

Find all tables I need to join to get the relationship between two tables

I'm using SQL Server 2012. I want to join two tables that have no columns in common to join on. How can I find all the intermediate tables needed to get from one to the other?
For example: I need to join table A to table D, and to do that I need to connect A to B, then B to C, and finally C to D.
My question is: can I find the tables B and C among thousands of tables in the database without searching table by table?
Thanks a lot,
Ohad
Assuming that:
You want to automate this process
You have FOREIGN KEY constraints that you can rely on
You should proceed as follows:
Query sys.foreign_keys and create a directed graph structure that will contain the links between tables.
Then implement a graph search algorithm that will start from table A and try to find a path to table D and from D to A.
Once you have found the path, it will be easy to construct dynamic SQL containing the join of all tables on the path. You will need to query sys.foreign_key_columns as well to be able to construct the ON clauses of the JOINs.
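The graph-search step might look roughly like this in Python. It is a sketch under assumptions: `find_join_path` is a hypothetical helper, the edge list stands in for (referencing table, referenced table) pairs read from sys.foreign_keys, and the edges are treated as undirected so that one breadth-first search covers both the A-to-D and D-to-A directions.

```python
from collections import deque


def find_join_path(fk_edges, start, goal):
    """Return the shortest chain of tables linking start to goal, or None.

    fk_edges: iterable of (referencing_table, referenced_table) name pairs.
    A JOIN can follow a foreign key in either direction, so each edge is
    stored both ways.
    """
    neighbours = {}
    for child, parent in fk_edges:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    previous = {start: None}  # table -> table it was reached from
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while table is not None:
                path.append(table)
                table = previous[table]
            return path[::-1]
        for nxt in sorted(neighbours.get(table, ())):
            if nxt not in previous:
                previous[nxt] = table
                queue.append(nxt)
    return None


# Made-up foreign keys: B references A, C references B, D references C.
edges = [("B", "A"), ("C", "B"), ("D", "C"), ("E", "A")]
path = find_join_path(edges, "A", "D")
```

Breadth-first search returns a path with the fewest joins; with thousands of tables there may be several such paths, and not all of them will be meaningful for the business question.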
Let me know if you need help with more detail.
There are a couple of things you can do to help your cause, but for the most part there's no direct way, and you would need to know how your database is structured and what the tables are for. Furthermore, depending on the database's design, it might be very difficult to find the answer intuitively, and you might just need guidance from someone who is knowledgeable about the design. Regardless:
Fields in your tables A & D:
You can look at the primary key or unique fields in the tables to determine what other tables may link to those tables. Usually they are named in a way that matches the other tables, so you can tell which table they come from.
Information_Schema Views
You can use the information_schema.tables and information_schema.columns views to easily search for names of tables and columns across the entire database and narrow your search to fewer tables.

left join using comma separated column using sql

I am working on an ASP.NET application with a SQL Server database. This db has two tables, Vacancies and dutystations. Vacancies table has a column named dutystationId which stores ids of dutystations in comma separated list like this:
2,12,15,18,19,23
Now I want to show this vacancy in grid and I have used left join like this:
QUERY
SELECT * FROM dbo.hr_Vacancies
CROSS APPLY dbo.hr_Split(dbo.hr_Vacancies.DutyStationID, ',') AS s
LEFT OUTER JOIN dbo.hr_DutyStations
ON s.Data = dbo.hr_DutyStations.DutyStationID
In the XSD, I have set vacancyid as the primary key, but I get this error:
ERROR
Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.
If I remove this constraint, I get 6 rows. I want to show one row only. How can I do this?
I stopped reading here:
Vacancies table has a column named dutystationId which stores ids of dutystations in comma separated list
That is your problem right there. If you have comma separated values in an RDBMS, specifically if they contain foreign keys to other tables, you should halt full stop whatever you're doing and start redesigning your database. Many-to-many relations in an RDBMS are implemented with junction tables, and if you use them all your problems will suddenly solve themselves.
Your current design is not only hell to write SQL queries for, as this question illustrates perfectly since you cannot solve a trivial task, but it also kills performance: those calls to hr_Split are far more computationally expensive than just doing proper joins.
Don't fall into the XY trap, solve the real problem first. Which is that you're even violating First Normal Form right now.
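A junction-table version of this schema could be sketched as follows, using an in-memory SQLite database so the example runs on its own. The table and column names are hypothetical simplifications of the ones in the question, and GROUP_CONCAT is SQLite's aggregate; on SQL Server 2012 a FOR XML PATH expression would play that role instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vacancies (vacancy_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE duty_stations (duty_station_id INTEGER PRIMARY KEY, name TEXT);

    -- Junction table: one row per (vacancy, duty station) pair.
    CREATE TABLE vacancy_duty_stations (
        vacancy_id      INTEGER NOT NULL REFERENCES vacancies (vacancy_id),
        duty_station_id INTEGER NOT NULL REFERENCES duty_stations (duty_station_id),
        PRIMARY KEY (vacancy_id, duty_station_id)
    );

    INSERT INTO vacancies VALUES (1, 'Analyst');
    INSERT INTO duty_stations VALUES (2, 'Station 2'), (12, 'Station 12'), (15, 'Station 15');
    INSERT INTO vacancy_duty_stations VALUES (1, 2), (1, 12), (1, 15);
    """
)

# One row per vacancy; the station names are aggregated for display only.
rows = conn.execute(
    """
    SELECT v.vacancy_id, v.title, GROUP_CONCAT(d.name, ', ') AS stations
    FROM vacancies AS v
    LEFT JOIN vacancy_duty_stations AS vd ON vd.vacancy_id = v.vacancy_id
    LEFT JOIN duty_stations AS d ON d.duty_station_id = vd.duty_station_id
    GROUP BY v.vacancy_id, v.title
    """
).fetchall()
```

Grouping by the vacancy keeps vacancy_id unique in the result, which is what the grid's primary-key constraint needs. The order of names inside the aggregated string is not guaranteed.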

How can I create a relationship in excel for multiple columns?

I'm trying to create a relationship between two tables in PowerPivot. However, my tables don't have any keys. What I would like to do is create a relationship like a SQL unique constraint, where multiple values combined form the key.
For example:
Table1 columns are First, Last, Address, Phone
Table2 columns are the same.
I want to create a relationship in Excel that is the equivalent of
select * from Table1 t1 full join Table2 t2 on t1.First = t2.First and t1.Last = t2.Last and t1.Address = t2.Address
However, the create relationship dialogue doesn't allow multiple columns to be selected. I tried going the route of just creating multiple 1-column relationships. However, relationships also cannot include columns where there are duplicate values in the column.
I have a feeling I may just be approaching accomplishing this from the wrong direction. Any help is appreciated! Thank you.
Zee,
You are right that PowerPivot does not natively support multi-column relationships. There are, however, two workarounds:
Add a key to each table made of the respective columns concatenated together; provided this key is unique in at least one of the tables, the relationship can be created. If you have a situation where neither table has unique keys, then an intermediate table of unique keys could be created using SQL.
Technically multiple relationships can be created between tables but only one can be active. There is a DAX function called USERELATIONSHIP() which can use inactive relationships. This is an advanced technique.
Your solution may well be to combine the two tables in your source SQL query.
Jacob
If all you want to do is inner join using 2 or more columns, please consider creating a calculated column that concatenates the 2 or 3 columns in each of the 2 tables and then create a relationship between them.
I have had similar cases and used this technique.
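Before building the concatenated column, it can help to check which table, if either, has unique key values, since the relationship needs uniqueness on at least one side. The following Python sketch shows that check on made-up rows; `concat_key`, `duplicate_keys`, the `|` separator and the sample data are assumptions for illustration, not part of PowerPivot.

```python
from collections import Counter

SEPARATOR = "|"  # keeps ("ab", "c") distinct from ("a", "bc")


def concat_key(row, columns):
    """Build the concatenated key that a calculated column would hold."""
    return SEPARATOR.join(str(row[column]) for column in columns)


def duplicate_keys(rows, columns):
    """Return the concatenated keys that occur more than once."""
    counts = Counter(concat_key(row, columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


key_columns = ["First", "Last", "Address"]

# Made-up sample data: table2 contains one person twice.
table1 = [
    {"First": "Ann", "Last": "Lee", "Address": "1 Main St"},
    {"First": "Bob", "Last": "Ray", "Address": "2 Oak Ave"},
]
table2 = [
    {"First": "Ann", "Last": "Lee", "Address": "1 Main St"},
    {"First": "Ann", "Last": "Lee", "Address": "1 Main St"},
    {"First": "Bob", "Last": "Ray", "Address": "2 Oak Ave"},
]

table1_duplicates = duplicate_keys(table1, key_columns)
table2_duplicates = duplicate_keys(table2, key_columns)
```

Here table1 has no duplicate keys, so it could sit on the "one" side of the relationship even though table2 repeats a key.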

Mapping the fields of two database tables

The scenario is that I have two database tables, A and B. Table B is an upgraded version of table A (i.e. it might have different field names and some extra fields). I need to compare these two tables to inform the user about the extra fields and propose a mapping of the fields between the tables.
Currently I am thinking of comparing them using info like field name, data element and domain in that order.
Is there a standard way to do this? Thanks in advance.
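To make the idea concrete, the comparison order described above (field name, then data element, then domain) could be sketched like this in Python. The `Field` type, the `propose_mapping` helper and the sample field definitions are all hypothetical; reading the real definitions from the data dictionary is not shown.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    data_element: str
    domain: str


def propose_mapping(old_fields, new_fields):
    """Map each field of the new table to a field of the old table, or None.

    Runs one pass per criterion, strongest first, so a weak domain match
    never takes an old field that another new field matches by name.
    """
    mapping = {new.name: None for new in new_fields}
    unused = list(old_fields)
    for attribute in ("name", "data_element", "domain"):
        for new in new_fields:
            if mapping[new.name] is not None:
                continue
            for old in unused:
                if getattr(old, attribute) == getattr(new, attribute):
                    mapping[new.name] = old.name
                    unused.remove(old)
                    break
    return mapping


# Made-up definitions for tables A (old) and B (new).
table_a = [
    Field("CUST_ID", "ZCUST_ID", "ZID"),
    Field("CUST_NAME", "ZNAME", "ZTEXT40"),
]
table_b = [
    Field("CUST_ID", "ZCUST_ID", "ZID"),
    Field("FULL_NAME", "ZNAME", "ZTEXT40"),
    Field("EMAIL", "ZEMAIL", "ZTEXT80"),
]

mapping = propose_mapping(table_a, table_b)
extra_fields = [name for name, source in mapping.items() if source is None]
```

Fields of B that end up mapped to None are the extra fields to report to the user; the proposed pairs are only suggestions for the user to confirm.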
There is no standard tool to do this - why would you want to do it this way anyway? The canonical way is to extend the original table and fill the new fields in place.