I have three tables in third normal form, populated by a Java application.
MA_COMPANY_PROFILE (table 1)
MA_ACCOUNT (table 2)
SEC_USER (table 3)
The hierarchy is MA_COMPANY_PROFILE, then MA_ACCOUNT, then SEC_USER.
The relationship between MA_COMPANY_PROFILE and MA_ACCOUNT is 1:n.
The relationship between MA_COMPANY_PROFILE and SEC_USER is n:n.
The relationship between MA_ACCOUNT and SEC_USER is n:1.
When we use the SQL below in Informatica to load this data in denormalized format,
select *
from
MA_COMPANY_PROFILE MA_CMY_PRF,
MA_ACCOUNT MA_AC,
ACCOUNT_STATUS AC_ST,
SEC_USER SEC_USR,
SEC_USERS_LASTLOGIN SEC_USR_LL
where
MA_CMY_PRF.PROFILE_ID=MA_AC.PROFILE_ID(+) and
MA_CMY_PRF.PROFILE_ID =SEC_USR.PROFILE_ID(+)
we get a different number of accounts in the source tables than in the warehouse table, and the same happens when we try to match the number of security users between source and warehouse.
How should we approach this, or write the Oracle SQL, so that the account and user counts in the source match the warehouse tables?
If we're talking just about the 3 tables concerned and ignoring the 2 tables which don't have join conditions (as written, those are cross-joined with everything else), you should flow from the first table to the intermediate table and from the intermediate table to the final table, so the final condition would look like
MA_CMY_PRF.PROFILE_ID=MA_AC.PROFILE_ID AND MA_AC.USER_ID = SEC_USR.USER_ID
This assumes that PROFILE_ID is the primary key of MA_COMPANY_PROFILE, that USER_ID is the primary key of SEC_USER, and that MA_ACCOUNT has foreign keys to both. You presumably used (+) to ensure that, where no match is found, you still get a record populated with the info from MA_CMY_PRF and nulls from the related tables. I've left it off as I don't know your requirement.
I found the reason why the source count and my target count did not match for profile_id, account_id and user_id: the source tables have NULL values in my join column, i.e. profile_id.
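To make the two effects in this thread concrete, here is a minimal sketch, not the poster's actual load. It uses Python's sqlite3 as a stand-in for Oracle, ANSI LEFT JOIN in place of (+), and invented sample rows; only the table and column names come from the question. Joining both child tables to the profile on PROFILE_ID multiplies accounts by users, and accounts with a NULL PROFILE_ID never reach the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ma_company_profile (profile_id INTEGER PRIMARY KEY);
CREATE TABLE ma_account (account_id INTEGER PRIMARY KEY, profile_id INTEGER);
CREATE TABLE sec_user (user_id INTEGER PRIMARY KEY, profile_id INTEGER);
INSERT INTO ma_company_profile VALUES (1);
-- Hypothetical data: two accounts on profile 1, one account with a NULL profile_id.
INSERT INTO ma_account VALUES (10, 1), (11, 1), (12, NULL);
-- Hypothetical data: three users on profile 1.
INSERT INTO sec_user VALUES (100, 1), (101, 1), (102, 1);
""")

# Both child tables hang off the profile, so every account pairs with every user.
joined = """
FROM ma_company_profile p
LEFT JOIN ma_account a ON a.profile_id = p.profile_id
LEFT JOIN sec_user u ON u.profile_id = p.profile_id
"""

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

row_count = scalar("SELECT COUNT(*) " + joined)                    # fanned-out rows
distinct_accounts = scalar("SELECT COUNT(DISTINCT a.account_id) " + joined)
source_accounts = scalar("SELECT COUNT(*) FROM ma_account")
orphan_accounts = scalar("SELECT COUNT(*) FROM ma_account WHERE profile_id IS NULL")

print(row_count, distinct_accounts, source_accounts, orphan_accounts)
```

Comparing COUNT(DISTINCT ...) on the warehouse side with the source count, plus a separate count of rows whose join key is NULL, is one way to reconcile the numbers.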
I am currently building a tabular cube in SSAS, however I'm having issues when creating relationships.
I have 3 tables
Master
Supplier
Customer
In the master table I have a list of unique IDs. These IDs appear in the other 2 tables (there can be duplicate records for the ID in the Supplier and Customer tables).
What I'm trying to do is create a one-to-many relationship between the Master table and the Supplier and Customer tables. But SSAS gives me an error: "The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values."
Any advice would be greatly appreciated.
Based on the error message, it seems there are some duplicate values in the Master table within the column that is being selected for the relationship.
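One way to confirm that suspicion before building the relationship is to look for repeated IDs in the source of the Master table. The sketch below is only an illustration: it runs the check in SQLite through Python, and the table name master, the column id and the sample rows are assumptions, not the poster's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id INTEGER);
-- Hypothetical data: id 2 appears twice, which would block a one-to-many relationship.
INSERT INTO master VALUES (1), (2), (2), (3);
""")

# Any id returned here is not unique in the supposedly unique column.
duplicates = conn.execute("""
    SELECT id, COUNT(*) AS occurrences
    FROM master
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)
```

If the query returns no rows, the duplicates are being introduced somewhere between the source and the model, for example by the import query.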
Operational databases of identical structure work in several countries.
country A has table Users with column user_id
country B has table Users with column user_id
country C has table Users with column user_id
When data from all three databases is brought to the staging area for further data warehousing purposes, all three operational tables are integrated into a single table Users with dwh_user_id.
The logic looks like the following:
if record comes from A then dwh_user_id = 1000000 + user_id
if record comes from B then dwh_user_id = 4000000 + user_id
if record comes from C then dwh_user_id = 8000000 + user_id
I have a strong feeling that it is a very bad approach. What would be a better approach?
(user_id + country_iso_code maybe?)
In general, it's a terrible idea to inject logic into your primary key in this way. It really sets you up for failure: what if country A gets more than 3000000 user records? Its IDs (1000000 + user_id) would then run into the range reserved for country B.
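The failure mode is plain arithmetic. The sketch below reuses the offsets from the question; the function name and the specific user_id values are made up for illustration.

```python
# Offsets as described in the question.
OFFSETS = {"A": 1_000_000, "B": 4_000_000, "C": 8_000_000}

def dwh_user_id(country, user_id):
    """Offset scheme from the question: fixed base per country plus the local id."""
    return OFFSETS[country] + user_id

# Hypothetical ids: a late user in country A and the first user in country B.
id_from_a = dwh_user_id("A", 3_000_001)
id_from_b = dwh_user_id("B", 1)

collides = id_from_a == id_from_b
print(id_from_a, id_from_b, collides)
```

Two different people end up with the same warehouse key as soon as one country outgrows the gap to the next offset.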
There are a variety of solutions.
Ideally, you include the column "country" in all tables, and use that together with the ID as the primary key. This keeps the logic identical between master and country records.
If you're working with a legacy system, and cannot modify the country tables, but can modify the master table, add the key there, populate it during load, and use the combination of country and ID as the primary key.
The way we handle this scenario in Ajilius is to add metadata columns to the load. Values like SERVER_NAME or DATABASE_NAME might provide enough unique information to make a compound key unique.
An alternative scenario is to generate a GUID for each row at extract or load time, which would then uniquely identify each row.
The data vault guys like to use a hash across the row, but in this case it would only work if no row was ever a complete duplicate.
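As a rough sketch of the compound-key option described above, assuming a staging table named users with a country_code column (both names are illustrative), the same user_id can coexist for every country while true duplicates are still rejected. SQLite via Python stands in for whatever database the staging area actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        country_code TEXT    NOT NULL,  -- assumed column identifying the source system
        user_id      INTEGER NOT NULL,  -- id as it exists in the operational database
        PRIMARY KEY (country_code, user_id)
    )
""")

# user_id 1 exists in all three countries and no longer clashes.
conn.executemany("INSERT INTO users VALUES (?, ?)", [("A", 1), ("B", 1), ("C", 1)])
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Loading the same source row twice is still caught by the key.
try:
    conn.execute("INSERT INTO users VALUES ('A', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row_count, duplicate_rejected)
```

No arithmetic is baked into the key, so no country can outgrow its range.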
This is why they made the uniqueidentifier data type.
If you can't change to that, I would put each one in a different table and then union them in a view. Something like:
create view vWorld
as
select 1 as CountryId, user_id
from SpainUsers
UNION ALL
select 2 as CountryId, user_id
from USUsers
The most efficient way to do this would be:
If record from Country A, then user * 0 = Hence dwh_user_id = 0.
If record from Country B, then (user * 0)- 1 = Hence dwh_user_id = -1.
If record from Country C, then (user * 0)+ 1 = Hence dwh_user_id = 1.
Suggesting this logic assuming the dwh_user_id is supposed to be a number field.
I'm trying to build a query where I'm able to get names of clients. So I have two tables, the 1st table has a column AppointmentNO, and this field is a number (there are other columns but they're irrelevant). In the 2nd table I have an ID as primary key, FirstName, LastName. ID is what matches the AppointmentNO in the first table.
Basically what I'm trying to do is link the two tables so that when I have an AppointmentNO in one column, I can see the LASTNAME associated with it in the 2nd column (need to include this in my report). I'm trying to link the AppointmentNO to ID and on JOIN PROPERTIES -> include all records from left table (1st table) and only those from right table (2nd table) where the joined fields are equal.
If I try to run the query it gives me a MISMATCH error. What am I doing wrong?
The type mismatch error could be happening:
because the two fields that you're trying to join aren't set as the same data type (e.g. one is Number and the other is Text) - check this in the Properties tabs for the relevant fields in each table;
or because Access has a join between the tables involving other fields (it will sometimes do this with AutoIDs). You can check the relationships (and establish them) in the Tools -> Relationships window (where this is located might depend on your version). You can also use this tool to explicitly build the relationship, by connecting your 'ID' to 'AppointmentNO', though you should still ensure that the fields are of the same data type.
ADDITION:
Based on what you're describing, I think this is the situation (correct me if I'm wrong, though):
Three tables - Client, AppointmentNO, Children
In each table, there is a 'MemberID' - this is primary key in Client Table, and is Foreign Key in the other tables.
The Children and AppointmentNO tables are linked to Client table by one-to-many relationships (a client can have >1 children and >1 appointment).
I'd set this up so that the Member ID is the same datatype in each table, and join all tables on that field. Then set up a query that gives you MemberID, ClientName, ClientDOB (and anything else you want from the client table), ChildName, and AppointmentID. Once the query is working and giving you the desired output, you can build a report and group the output by Client and Client Description, so you'll get "Client A" followed by a list of appointments and children, then "Client B" etc.
Hope that's clear-ish.
I am trying to use Pentaho Kettle to do some data migration. I would like to create a transformation to accomplish the following:
I have the following tables in the source:
table 1
id [PK]
name
table 2
id [PK]
source_id [FK with table 1.id]
state
I have the same structures in the destination server. Let's say I would like to migrate 10 rows from table 1, along with their related rows from table 2, to the destination server.
How would I do that with a Kettle transformation?
Thanks
You would do it in 2 transformations, with a job wrapped around them. Do table1 first, then table2.
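The ordering matters because of the foreign key: child rows can only be inserted once their parents exist. The following is not Kettle itself, just a sketch of the same two steps using two in-memory SQLite databases from Python; the sample data and the choice of "first 10 by id" are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table2 (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES table1(id),
    state     TEXT
);
"""

def make_db():
    """Create an empty database with the schema and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

source, dest = make_db(), make_db()

# Hypothetical source data: 20 parents, 40 children spread across them.
source.executemany("INSERT INTO table1 VALUES (?, ?)",
                   [(i, f"name{i}") for i in range(1, 21)])
source.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                   [(i, (i % 20) + 1, "ok") for i in range(1, 41)])

# Step 1 (first transformation): copy the 10 parent rows.
parents = source.execute(
    "SELECT id, name FROM table1 ORDER BY id LIMIT 10").fetchall()
dest.executemany("INSERT INTO table1 VALUES (?, ?)", parents)

# Step 2 (second transformation): copy only the children of those parents.
parent_ids = [row[0] for row in parents]
placeholders = ",".join("?" * len(parent_ids))
children = source.execute(
    f"SELECT id, source_id, state FROM table2 WHERE source_id IN ({placeholders})",
    parent_ids).fetchall()
dest.executemany("INSERT INTO table2 VALUES (?, ?, ?)", children)
dest.commit()

migrated_parents = dest.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
migrated_children = dest.execute("SELECT COUNT(*) FROM table2").fetchone()[0]
print(migrated_parents, migrated_children)
```

Running step 2 before step 1 would fail on the foreign key, which is why the job sequences the two transformations.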
How to migrate tables with foreign keys in Pentaho Kettle?
Create 3 tables “USER”, “USER_STATE”, “USER_MIGRATE”
USER
Create 2 fields “ID” and “NAME” in USER table as displayed in the screen shot
USER_STATE
Create 3 fields “ID”, “USER_ID”, “STATE” in the USER_STATE table as displayed in the screen shot. Here USER_ID is the foreign key of the “USER” table.
USER_MIGRATE
This is the table where we will migrate the data from the other two tables “USER” and “USER_STATE”. Create 5 fields “ID”, “USER_ID”, “USER_STATE_ID”, “USER_NAME”, “USER_STATE” as displayed in the screen shot
In this table “USER_STATE_ID” is the foreign key of the table USER_STATE
We can do it in one transformation. We will use a join query to select the data from the two tables “USER” and “USER_STATE”. Then we can put this data into our third table, which is the migrate table.
Please find the join query below
The screenshot below shows how to map the table fields
This is the transformation used to migrate the data from the source tables to destination tables
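The join query itself is not reproduced in this post. A join of the kind described might look like the sketch below, which is an assumption built only from the table and field names listed above, run in SQLite through Python rather than in Kettle, with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "USER" (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE USER_STATE (
    ID      INTEGER PRIMARY KEY,
    USER_ID INTEGER REFERENCES "USER"(ID),
    STATE   TEXT
);
CREATE TABLE USER_MIGRATE (
    ID            INTEGER PRIMARY KEY,
    USER_ID       INTEGER,
    USER_STATE_ID INTEGER REFERENCES USER_STATE(ID),
    USER_NAME     TEXT,
    USER_STATE    TEXT
);
-- Hypothetical sample rows.
INSERT INTO "USER" VALUES (1, 'alice'), (2, 'bob');
INSERT INTO USER_STATE VALUES (10, 1, 'active'), (11, 2, 'blocked');

-- Assumed join: one output row per USER_STATE row, carrying the user's name.
INSERT INTO USER_MIGRATE (USER_ID, USER_STATE_ID, USER_NAME, USER_STATE)
SELECT u.ID, s.ID, u.NAME, s.STATE
FROM "USER" u
JOIN USER_STATE s ON s.USER_ID = u.ID
ORDER BY u.ID;
''')

migrated = conn.execute("""
    SELECT USER_ID, USER_STATE_ID, USER_NAME, USER_STATE
    FROM USER_MIGRATE
    ORDER BY USER_ID
""").fetchall()
print(migrated)
```

In Kettle the SELECT part would sit in a Table Input step and the field mapping in the output step; the mapping shown here follows the column names only.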
I have a table which has employee relationship defined within itself.
i.e.
EmpID Name SeniorId
-----------------------
1 A NULL
2 B 1
3 C 1
4 D 3
and so on...
Where SeniorId is a foreign key that references EmpID in the same table.
I want to clear all rows from this table without removing any constraint. How can I do this?
The deletion needs to be performed in this order:
4, 3, 2, 1
EDIT:
Jhonny's answer works for me, but which of the answers is more efficient?
I don't know if I am missing something, but maybe you can try this.
UPDATE employee SET SeniorID = NULL
DELETE FROM employee
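To show why this order works with the constraint left in place, here is a small sketch that replays the two statements against the sample rows from the question. It uses SQLite from Python with foreign keys enforced, which is a stand-in for SQL Server and may differ from it in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE employee (
    EmpID    INTEGER PRIMARY KEY,
    Name     TEXT,
    SeniorID INTEGER REFERENCES employee(EmpID)
);
INSERT INTO employee VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 1), (4, 'D', 3);
""")

# Step 1: break every self-reference, so no row points at another row.
conn.execute("UPDATE employee SET SeniorID = NULL")
# Step 2: with nothing referenced any more, all rows can go in any order.
conn.execute("DELETE FROM employee")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(remaining)
```

The constraint is never dropped; it simply has nothing left to object to after the UPDATE.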
If the table is very large (cardinality of millions), and there is no need to log the DELETE transactions, dropping the constraint and TRUNCATEing and recreating constraints is by far the most efficient way. Also, if there are foreign keys in other tables (and in this particular table design it would seem to be so), those rows will all have to be deleted first in all cases, as well.
Normalization says nothing about recursive/hierarchical/tree relationships, so I believe that is a red herring in your reply to DVK's suggestion to split this into its own table - it certainly is viable to make a vertical partition of this table already and also to consider whether you can take advantage of that to get any of the other benefits I list below. As DVK alludes to, in this particular design, I have often seen a separate link table to record self-relationships and other kinds of relationships. This has numerous benefits:
have many to many up AND down instead of many-to-one (uncommon, but potentially useful)
track different types of direct relationships - manager, mentor, assistant, payroll approver, expense approver, technical report-to - with rows in the relationship and relationship type tables instead of new columns in the employee table
track changing hierarchies in a temporally consistent way (including terminated employee hierarchy history) by including active indicators and effective dates on the relationship rows - this is only fully possible when normalizing the relationship into its own table
no NULLs in the SeniorID (actually on either ID) - this is a distinct advantage in avoiding bad logic, but NULLs will usually appear in views when you have to left join to the relationship table anyway
a better dedicated indexing strategy - as opposed to adding SeniorID to selected indexes you already have on Employee (especially as the number of relationship types grows)
And of course, the more information you relate to this relationship, the more strongly is indicated that the relationship itself merits a table (i.e. it is a "relation" in the true sense of the word as used in relational databases - related data is stored in a relation or table - related to a primary key), and thus a normal form for relationships might strongly indicate that the relationship table be created instead of a simple foreign key relationship in the employee table.
Benefits also include its straightforward delete scenario:
DELETE FROM EmployeeRelationships;
DELETE FROM Employee;
You'll note a striking equivalence to the accepted answer here on SO, since, in your case, employees with no senior relationship have a NULL - so in that answer the poster set all to NULL first to eliminate relationships and then remove the employees.
There is a possibly appropriate usage of TRUNCATE depending upon constraints (EmployeeRelationships is typically able to be TRUNCATEd since its primary key is usually a composite and not a foreign key in any other table).
Try this
DELETE FROM employee;
Inside a loop, run a command that deletes all rows with an unreferenced EmpID until there are zero rows left. There are a variety of ways to write that inner DELETE command:
-- Exclude NULLs: NOT IN matches nothing if the subquery returns a NULL.
DELETE FROM employee WHERE EmpID NOT IN (SELECT SeniorID FROM employee WHERE SeniorID IS NOT NULL)
DELETE FROM employee e1 WHERE NOT EXISTS
(SELECT * FROM employee e2 WHERE e2.SeniorID = e1.EmpID)
and probably a third one using a JOIN, but I'm not familiar with the SQL Server syntax for that.
One solution is to normalize this by splitting out "senior" relationship into a separate table. For the sake of generality, make that second table "empID1|empID2|relationship_type".
Barring that, you need to do this in a loop. One way to do it:
declare @count int
select @count = count(1) from employee
while (@count > 0)
BEGIN
delete employee WHERE NOT EXISTS
(select 1 from employee e_senior
where employee.EmpID = e_senior.SeniorID)
select @count = count(1) from employee
END
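The same leaf-first loop can be tried out on the sample rows from the question. This sketch runs it in SQLite from Python with foreign keys enforced, so it illustrates the idea rather than the exact SQL Server behaviour; it stops when a pass deletes nothing, which also avoids spinning forever if the data contained a reference cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE employee (
    EmpID    INTEGER PRIMARY KEY,
    Name     TEXT,
    SeniorID INTEGER REFERENCES employee(EmpID)
);
INSERT INTO employee VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 1), (4, 'D', 3);
""")

passes = 0
while True:
    # Delete every employee that nobody else lists as their senior.
    cursor = conn.execute("""
        DELETE FROM employee
        WHERE NOT EXISTS (
            SELECT 1 FROM employee AS e_senior
            WHERE e_senior.SeniorID = employee.EmpID
        )
    """)
    if cursor.rowcount == 0:
        break  # nothing deletable left: table is empty or only cycles remain
    passes += 1
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(passes, remaining)
```

Each pass removes the current bottom level of the hierarchy, so the number of passes grows with the depth of the tree rather than with the number of rows.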