How to migrate tables with foreign keys in Pentaho Kettle?

I am trying to use Pentaho Kettle to do some data migration. I would like to create a transformation to accomplish the following:
I have the following tables in the source:
table 1
  id [PK]
  name
table 2
  id [PK]
  source_id [FK to table 1.id]
  state
I have the same structures on the destination server. Let's say I would like to migrate 10 rows from table 1, along with their related rows from table 2, to the destination server.
How would I do that with a Kettle transformation?
Thanks

You would do it in 2 transformations, with a job wrapped around them. Do table1 first, then table2.
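The ordering matters because the child rows reference the parent keys. A minimal sketch of the same idea in Python/SQLite (table and column names taken from the question; in Kettle each half would be a Table input → Table output transformation):

```python
import sqlite3

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
dst.execute("PRAGMA foreign_keys = ON")  # enforce the FK on the destination

for con in (src, dst):
    con.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, "
                "source_id INTEGER REFERENCES table1(id), state TEXT)")

# some made-up source data
src.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(i, f"name{i}") for i in range(1, 21)])
src.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                [(i, i, "active") for i in range(1, 21)])

# transformation 1: parents first (the 10 rows to migrate)
rows1 = src.execute("SELECT id, name FROM table1 LIMIT 10").fetchall()
dst.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)

# transformation 2: only children whose parent was migrated
ids = [r[0] for r in rows1]
ph = ",".join("?" * len(ids))
rows2 = src.execute(
    f"SELECT id, source_id, state FROM table2 WHERE source_id IN ({ph})",
    ids).fetchall()
dst.executemany("INSERT INTO table2 VALUES (?, ?, ?)", rows2)
```

Running the steps in the opposite order would fail on the foreign key, which is exactly why the job wraps the two transformations in sequence.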

Create 3 tables: "USER", "USER_STATE", "USER_MIGRATE".
USER
Create 2 fields, "ID" and "NAME", in the USER table.
USER_STATE
Create 3 fields, "ID", "USER_ID" and "STATE", in the USER_STATE table. Here USER_ID is a foreign key referencing the "USER" table.
USER_MIGRATE
This is the table into which we will migrate the data from the other two tables, "USER" and "USER_STATE". Create 5 fields: "ID", "USER_ID", "USER_STATE_ID", "USER_NAME" and "USER_STATE".
In this table, "USER_STATE_ID" is a foreign key referencing USER_STATE.
We can do it in one transformation. We use a join query to select the data from the two tables "USER" and "USER_STATE", then write those rows into the third table, the migrate table. The transformation maps the joined fields onto the USER_MIGRATE columns and writes them to the destination.
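The original join query was shown only as a screenshot; a sketch of what it presumably looked like, assuming USER_STATE.USER_ID joins to USER.ID (demonstrated here in Python/SQLite with made-up rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE USER (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE USER_STATE (ID INTEGER PRIMARY KEY,
                         USER_ID INTEGER REFERENCES USER(ID), STATE TEXT);
CREATE TABLE USER_MIGRATE (ID INTEGER PRIMARY KEY, USER_ID INTEGER,
                           USER_STATE_ID INTEGER,
                           USER_NAME TEXT, USER_STATE TEXT);

INSERT INTO USER VALUES (1, 'alice'), (2, 'bob');
INSERT INTO USER_STATE VALUES (10, 1, 'NY'), (11, 2, 'CA');

-- the join that feeds the migrate table
INSERT INTO USER_MIGRATE (USER_ID, USER_STATE_ID, USER_NAME, USER_STATE)
SELECT u.ID, s.ID, u.NAME, s.STATE
FROM USER u JOIN USER_STATE s ON s.USER_ID = u.ID;
""")
```

In the Kettle transformation this SELECT would sit in a Table input step, with a Table output step writing to USER_MIGRATE.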

Related

Insert distinct records from duplicate valued records in DATASTAGE

Hi, I am using DataStage to import Hive data into Oracle. I don't have any primary key constraints in Hive, but in Oracle I have a composite primary key.
For example, I have data with no duplicates across the whole record, but with duplicates on the primary key columns.
Table name: item_details (Hive; no primary key constraints)
Id  mfg_date    item  exp_date
1   12-01-2018  abc   31-03-2018
2   12-01-2018  cde   28-02-2018
3   15-01-2018  efg   10-04-2018
4   12-01-2018  abc   10-04-2018
The mfg_date and item together are the primary key of the target table (Oracle), which has the same structure.
I need to push the data into the target table, but the load fails with a primary key violation and aborts.
Can anybody give me a solution?
P.S. We cannot change the schema of the tables.
This is what I use [[twt]] for. It's faster than doing Lookup and Join stages.
Start by changing the output SQL from Insert to Custom SQL.
Then you can create a custom SQL statement like the one below:
insert into <<target table>>
select Id, mfg_date, item, exp_date from [[twt]]
where not exists (select 1 from <<target table>> where <<target table>>.id = [[twt]].id)
It will insert only the records that don't already exist (match on the composite key mfg_date, item instead of id if that is the target's primary key).
Because Custom SQL allows multiple statements, you can do your whole update this way.
By using [[twt]], you can change this to an ELT form of loading and control the inserts.
thanks.
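The not-exists pattern above is easy to check outside DataStage; a sketch in Python/SQLite, taking [[twt]] as a staging table named twt and keying on the composite (mfg_date, item) target key from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE twt (Id INT, mfg_date TEXT, item TEXT, exp_date TEXT);
CREATE TABLE item_details (Id INT, mfg_date TEXT, item TEXT, exp_date TEXT,
                           PRIMARY KEY (mfg_date, item));

-- the target already holds one of the incoming keys
INSERT INTO item_details VALUES (1, '12-01-2018', 'abc', '31-03-2018');

INSERT INTO twt VALUES
 (1, '12-01-2018', 'abc', '31-03-2018'),
 (2, '12-01-2018', 'cde', '28-02-2018'),
 (3, '15-01-2018', 'efg', '10-04-2018');
""")

# insert only staged rows whose composite key is not already in the target
con.execute("""
INSERT INTO item_details
SELECT Id, mfg_date, item, exp_date FROM twt
WHERE NOT EXISTS (SELECT 1 FROM item_details t
                  WHERE t.mfg_date = twt.mfg_date AND t.item = twt.item)
""")
```

Note that NOT EXISTS only guards against rows already committed to the target; duplicates within the staged batch itself (rows 1 and 4 in the question's sample) still need deduplication, e.g. with a Remove Duplicates stage, before the insert.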

How do I move a field to a different table and maintain relationships?

I figure this has to be easy, I'm just not sure how to ask the question.
I have thousands of records in a Microsoft Access table, imported from an Excel spreadsheet, with a field that I want to extract into a new table. How do I move the data from the field in the existing table to a new table and maintain the relationships to the records?
The goal is to move the existing data from the one field into a new table that will have a one-to-many relationship with the existing parent table.
For example, in a table called tblProperties I have the following fields:
Property_Address | Property_Owner | UtilityMeter_Number
I want to maintain a history of utility meters on properties as they are replaced, so I want to move the UtilityMeter_Number field from tblProperties into a new table called tblMeters and create a one-to-many relationship between the two, so I can have multiple meter records for each property record.
How do I move all the existing data from the UtilityMeter_Number field in tblProperties into tblMeters, and maintain the relationship?
What is what I'm trying to do called, and how do I do it?
This is called normalizing the data structure.
Use a SELECT DISTINCT query to create unique records, and use that dataset as the source to create a new table. Something like:
SELECT DISTINCT CustID, LName, FName, MName INTO Customers FROM Orders;
Now delete the unnecessary LName, FName, MName fields from the Orders table.
The tables are related on the common CustID fields; an autonumber primary key is not used. If you do want a relationship on an autonumber PK, continue with the following steps:
add an autonumber field in new table
create a number field in original table
run an UPDATE action SQL to populate the new number field with the autonumber value from the new table, joining the tables on the common CustID fields
then delete the CustID field from the original table
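The steps above can be sketched end to end; a Python/SQLite version with made-up rows (SQLite has no SELECT ... INTO, so the distinct extraction uses INSERT ... SELECT, and an INTEGER PRIMARY KEY stands in for Access's autonumber):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                     CustID TEXT, LName TEXT, FName TEXT, Item TEXT);
INSERT INTO Orders (CustID, LName, FName, Item) VALUES
 ('C1', 'Smith', 'Ann', 'meter-001'),
 ('C1', 'Smith', 'Ann', 'meter-002'),
 ('C2', 'Jones', 'Bob', 'meter-003');

-- step 1: SELECT DISTINCT into a new table with an autonumber key
CREATE TABLE Customers (CustNum INTEGER PRIMARY KEY,
                        CustID TEXT, LName TEXT, FName TEXT);
INSERT INTO Customers (CustID, LName, FName)
SELECT DISTINCT CustID, LName, FName FROM Orders;

-- step 2: number field in the original table, populated from the
-- new table's autonumber by joining on the common CustID field
ALTER TABLE Orders ADD COLUMN CustNum INTEGER;
UPDATE Orders SET CustNum =
  (SELECT c.CustNum FROM Customers c WHERE c.CustID = Orders.CustID);
""")
```

After this, the redundant name fields (and CustID, if you keep the autonumber relationship) can be dropped from Orders, leaving the one-to-many link on CustNum.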

develop oracle sql to build dimension

I have three tables in 3rd Normalized form and these tables are populated by java application.
MA_COMPANY_PROFILE (table 1)
MA_ACCOUNT (table 2)
SEC_USER (table 3)
The hierarchy is MA_COMPANY_PROFILE → MA_ACCOUNT → SEC_USER.
The relationship between MA_COMPANY_PROFILE and MA_ACCOUNT is 1:n.
The relationship between MA_COMPANY_PROFILE and SEC_USER is n:n.
The relationship between MA_ACCOUNT and SEC_USER is n:1.
When we use below sql in informatica to load this data in denormalized format,
select *
from
MA_COMPANY_PROFILE MA_CMY_PRF,
MA_ACCOUNT MA_AC,
ACCOUNT_STATUS AC_ST,
SEC_USER SEC_USR,
SEC_USERS_LASTLOGIN SEC_USR_LL
where
MA_CMY_PRF.PROFILE_ID=MA_AC.PROFILE_ID(+) and
MA_CMY_PRF.PROFILE_ID =SEC_USR.PROFILE_ID(+)
we get a different number of accounts in the source table and the warehouse table, and likewise when we try to match the number of security users in the source and the warehouse.
How do we approach this, or write the Oracle SQL correctly, so that the source accounts and users match the warehouse tables?
If we're talking just about the 3 tables concerned and ignore the 2 tables which don't have join conditions: you should be flowing from one table to the intermediate table and from the intermediate table to the final table, so the join condition would look like
MA_CMY_PRF.PROFILE_ID = MA_AC.PROFILE_ID AND MA_AC.USER_ID = SEC_USR.USER_ID
This assumes that PROFILE_ID is the primary key of MA_COMPANY_PROFILE, that USER_ID is the primary key of SEC_USER, and that MA_ACCOUNT has foreign keys to both. You've presumably used (+) to ensure that, where no match is found, you still get a record populated with just the info from MA_CMY_PRF and nulls from the related tables; I've left it off as I don't know your requirement.
I found the reason why the source table count and my target count are mismatched for profile_id, account_id and user_id: the source tables have null values in my join column, i.e. profile_id.
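That finding (NULLs in the join column silently dropping rows) is easy to reproduce; a minimal Python/SQLite sketch with made-up rows for two of the tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MA_COMPANY_PROFILE (PROFILE_ID INT, NAME TEXT);
CREATE TABLE MA_ACCOUNT (ACCOUNT_ID INT, PROFILE_ID INT, USER_ID INT);
INSERT INTO MA_COMPANY_PROFILE VALUES (1, 'acme'), (2, 'globex');
-- one account has a NULL PROFILE_ID
INSERT INTO MA_ACCOUNT VALUES (100, 1, 7), (101, NULL, 8);
""")

src_accounts = con.execute(
    "SELECT COUNT(*) FROM MA_ACCOUNT").fetchone()[0]

# the join used to load the warehouse: the NULL-key account never matches
joined = con.execute("""
  SELECT COUNT(a.ACCOUNT_ID) FROM MA_COMPANY_PROFILE p
  JOIN MA_ACCOUNT a ON a.PROFILE_ID = p.PROFILE_ID
""").fetchone()[0]
```

The source has 2 accounts but the joined load carries only 1; an outer join on the profile side does not help, because a row whose join key is NULL can never match any profile. Such rows need an explicit rule (e.g. a default dimension member) to survive the load.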

data sync between tables in 2 Databases by SSIS 2012

I have two tables in two databases with identical schemas. The two databases are on different servers in different locations. Data can be inserted and updated in either table. The requirement is to sync the two tables so that both always have the up-to-date information.
The primary key column will always be unique in either database table.
How can this be achieved via SSIS?
Kindly guide.
You can achieve it with 2 Script Tasks. In the first one:
-- what exists in A and not in B
SELECT * INTO DB1.temp.TBL_A_except FROM
(
SELECT pk FROM DB1.schema1.TBL_A
EXCEPT
SELECT pk FROM DB2.schema2.TBL_B
) AS d;
-- what exists in B and not in A
SELECT * INTO DB2.temp.TBL_B_except FROM
(
SELECT pk FROM DB2.schema2.TBL_B
EXCEPT
SELECT pk FROM DB1.schema1.TBL_A
) AS d;
Second one:
INSERT INTO DB2.schema2.TBL_B
SELECT * FROM DB1.temp.TBL_A_except;
INSERT INTO DB1.schema1.TBL_A
SELECT * FROM DB2.schema2.TBL_B_except;
DROP TABLE DB1.temp.TBL_A_except;
DROP TABLE DB2.temp.TBL_B_except;
If you really want to achieve this with SSIS transformation techniques, I'd use two data flows with two Cache Connection Managers as our temp tables 1 and 2: the first data flow saves the data into the caches, the second loads from the caches into the tables.
or
Two data flows: Source -> Lookup -> Destination.
Implement the lookup to check the second table for the existence of the PK. If for a record in Tbl_A there is no such PK in Tbl_B, the row has to be inserted into Tbl_B; the lookup's No Match Output directs the row to the Destination.
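The EXCEPT logic from the first approach can be checked outside SSIS; a minimal Python/SQLite sketch with both "databases" represented as tables TBL_A and TBL_B in one connection (the temp-table indirection from the answer is skipped, since each insert here already sees the key sets it needs):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TBL_A (pk INT PRIMARY KEY, val TEXT);
CREATE TABLE TBL_B (pk INT PRIMARY KEY, val TEXT);
INSERT INTO TBL_A VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO TBL_B VALUES (2, 'b'), (4, 'd');

-- copy over rows whose pk exists in A but not in B, and vice versa
INSERT INTO TBL_B SELECT * FROM TBL_A
 WHERE pk IN (SELECT pk FROM TBL_A EXCEPT SELECT pk FROM TBL_B);
INSERT INTO TBL_A SELECT * FROM TBL_B
 WHERE pk IN (SELECT pk FROM TBL_B EXCEPT SELECT pk FROM TBL_A);
""")
```

After both statements, each side holds the union of the keys. Note this handles inserts only; rows updated on both sides under the same pk would need a conflict-resolution rule on top.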

One way syncing of 4 columns in SQL Server

This is my problem:
I have an old database with no constraints whatsoever. There are a handful of tables I copy from the old database to the new database. This copy is simple, and I'm running it nightly in a job.
In my new database (nicely with constraints) I put all these loose tables in constraints with a main table. All these tables have a key made of 3 IDs and a string.
My main table translates these 3 IDs and a string to 1 ID, so this table has 5 columns.
In the loose tables some records can be duplicated, so to insert the IDs into the main table I take a distinct of the 3 IDs and the string, and insert those into my main table.
The tables in the old database are updated daily, the copy runs daily, and from the main table I'd like to make a one-to-many relation with the copied tables.
This gives me the problem:
How do I update the main table, leaving the already-inserted keys alone and adding only the new keys? Old keys should not be removed, even if they were removed in the old database.
I was thinking of making a distinct view of all keys in the old database, but how would I update the main table from it? This would need to run before the daily copy of the other tables (or the copy would fail on the constraints).
One other idea is to run this update of the main table in LINQ to SQL on my website, but that doesn't seem very clean.
So in short:
Old DB is SQL Server 2000
New DB is SQL Server 2008
The old db has no constraints; a copy of some tables happens daily.
There should be a main table translating the 3-ID-and-1-string key to a single ID key, with constraints to the other tables.
The main table must be updated before the copy job, else the constraints will fail. The main table will be a distinct of a few columns of one table; this distinct will be in a view on the old db.
Only new rows can be added to the main table.
Does anyone have some ideas, some guidance?
Visualize the DB:
These loose tables are details about a company: one table has address(es), one has contact persons, another has its username and login for our system (of which there can be more than one per company).
A company is identified by the 3 IDs and 1 string. The main table lists these unique IDs and strings so that they can be translated to 1 ID; this 1 ID is then used in the rest of my DB. The one-to-many relation is then made from the main table to all those loose tables. I hope this clears it up a bit :)
I think you could use EXCEPT to insert the IDs that aren't in your main table yet: http://msdn.microsoft.com/en-us/library/ms188055.aspx
So for example:
insert into MainTable (Id1, Id2, Id3, String1)
select Id1, Id2, Id3, String1 from DistinctOldTable
except
select Id1, Id2, Id3, String1 from MainTable
Leave the new 1-ID key out of the column lists, so it is generated by the identity/autonumber column for each newly inserted key.
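A runnable check of this pattern in Python/SQLite, with the table names from the answer and an INTEGER PRIMARY KEY standing in for the identity column (here named NewId as an illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MainTable (NewId INTEGER PRIMARY KEY,
                        Id1 INT, Id2 INT, Id3 INT, String1 TEXT);
CREATE TABLE DistinctOldTable (Id1 INT, Id2 INT, Id3 INT, String1 TEXT);

-- one key has already been translated; the old db has one new key
INSERT INTO MainTable (Id1, Id2, Id3, String1) VALUES (1, 1, 1, 'acme');
INSERT INTO DistinctOldTable VALUES (1, 1, 1, 'acme'), (2, 2, 2, 'globex');

-- add only keys not in the main table yet; NewId autonumbers itself
INSERT INTO MainTable (Id1, Id2, Id3, String1)
SELECT Id1, Id2, Id3, String1 FROM DistinctOldTable
EXCEPT
SELECT Id1, Id2, Id3, String1 FROM MainTable;
""")
```

Existing keys keep their NewId, the new key gets the next one, and nothing is ever deleted, which matches the "only new rows can be added" requirement.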