Pentaho: Kettle/Spoon: Combining multiple data after inserts

I'm using Pentaho's Kettle/Spoon to load Customer data. I can't figure out how to join two or more transformations together after they're complete:
    Source
   /  |  \
  A   |   B
   \  |  /
 Insert Data
(Database Alpha)
Source Data
ID, Name, SSN, Email, CanCall, EmailStatus
(Database Beta)
A) Inserts into the email status table if the row doesn't exist, then returns the ID
B) Inserts into the PII table if the row doesn't exist, then returns the ID
Insert Data
EmailStatusTable
1, can_email
2, can_not_email
PII Table
1, "Johnson, John", "todays_date"
2, "Jackson, Jillian", "todays_date"
CustomerTable
1, 1 (PII Table ID), "jjohnson@blah.com", true (can call), 1 (email status table ID)
2, 2 (PII Table ID), "jill_jack@home.com", false (can call), 2 (email status table ID)
I can't figure out how to make the "Insert Data" portion work. Help please.

The Combination lookup/update step will solve your problem very easily.
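For readers who have not used that step: it looks up a combination of key fields in a table, inserts a row when none is found, and returns the row's technical key. The following is only a rough sketch of that behaviour in Python with sqlite3, not Kettle itself; the table name email_status and the function lookup_or_insert are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database standing in for "Database Beta" (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_status (id INTEGER PRIMARY KEY, status TEXT UNIQUE)"
)


def lookup_or_insert(conn, status):
    """Return the id for `status`, inserting a new row only if none exists."""
    row = conn.execute(
        "SELECT id FROM email_status WHERE status = ?", (status,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO email_status (status) VALUES (?)", (status,)
    )
    return cur.lastrowid


# Each source row asks for its status id; repeated values reuse the same id.
source_statuses = ["can_email", "can_not_email", "can_email"]
status_ids = [lookup_or_insert(conn, s) for s in source_statuses]
print(status_ids)  # [1, 2, 1]
```

The returned IDs can then travel with each source row to the final customer insert, which is what the "returns the ID" branches A and B need.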

You can use flags: set variables inside the transformations and use their values when inserting data into the customer table. When you say a transformation has to return the ID, that means setting the ID as a result (or flag variable) inside that transformation. The requirement is very simple. If you need further help, please reply here.

Related

INSERT INTO with foreign key and data from another table

I have a table whose most important property is that its ID is auto-incremented; the rest of its fields are not relevant here. Before inserting data into the table, I created a "helper" table to store the newly created IDs.
I have a second table like this: again the ID is auto-incremented, and the other data is not relevant to this example. For it, too, I created an auxiliary table that stores the newly created ID values.
Now I would like to take the values from auxiliary tables 1 and 2 and insert them into a third table, pairing the smallest ID from auxiliary table 1 with the smallest ID from auxiliary table 2 as one record, for example:
Record ID of third table | Smallest ID from first table | Smallest ID from second table.
I have no idea how to build the query in my case. Could someone give me some advice, or ready-made (different) code to follow?
My code:
DECLARE @inserted1 TABLE (contact_id udt_id)
INSERT INTO t_usr_contact (contact_firstname, contact_lastname)
OUTPUT INSERTED.contact_id INTO @inserted1(contact_id)
SELECT
'Firma',
'Temporary_value'
FROM t_sup_supplier AS sup
WHERE sup.sup_id IN (175,176) AND sup.grp_id IS null
DECLARE @inserted2 TABLE (grp_id udt_id)
INSERT INTO t_usr_group (grp_label_en)
OUTPUT INSERTED.grp_id INTO @inserted2(grp_id)
SELECT
'Supplier contact'
FROM t_sup_supplier AS sup2
WHERE sup2.sup_id IN (175,176) AND sup2.grp_id IS null
I would like to go the easiest way, which is as below, but it doesn't work:
INSERT INTO t_usr_contact_group (grp_id, contact_id)
VALUES (@inserted2.grp_id, @inserted2.contact_id)
As an example of the data: after the inserts, the tables and the auxiliary tables will contain the following records:
**Table t_usr_contact:**
175 - Firma - Temporary_value
176 - Firma - Temporary_value
**Table @inserted1:**
175
176
**Table t_usr_group:**
201 - Supplier_contact
202 - Supplier_contact
**Table @inserted2:**
201
202
**Table t_usr_contact_group:**
201 - 175
202 - 176

I've got no idea what you're ultimately trying to do, but if you want two tables each with N rows to become one table made from the columns of the two input tables, like you've got in your example (where your table of 175,176 and your table of 201,202 shall become a table of 175|201,176|202) then you need to join them. To join them you need a key. You haven't got a key so you'll have to fake one:
INSERT INTO thirdtable
SELECT contact_id, grp_id
FROM
(SELECT *, ROW_NUMBER() OVER(ORDER BY contact_id) AS FakeKey FROM @inserted1) x
INNER JOIN
(SELECT *, ROW_NUMBER() OVER(ORDER BY grp_id) AS FakeKey FROM @inserted2) y
ON x.FakeKey = y.FakeKey
This, of course, joins the data in a fairly arbitrary fashion, based on the order of the assigned IDs. If you want some specific pairing, for example contact 175 exists first and has to get group 202, then make the query that inserts the group (e.g. 202) for the input 175 output both the 175 and the 202 together into a common (temp) table, and afterwards split that into the detail and middleman tables.
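A minimal sketch of the fake-key join, run against SQLite from Python so it can be tried without SQL Server. It assumes SQLite 3.25 or later (for window functions) and uses ordinary tables named inserted1 and inserted2 in place of the table variables; those stand-ins are assumptions for the demo, not part of the original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inserted1 (contact_id INTEGER);
    CREATE TABLE inserted2 (grp_id INTEGER);
    CREATE TABLE t_usr_contact_group (grp_id INTEGER, contact_id INTEGER);
    -- Rows are inserted out of order on purpose.
    INSERT INTO inserted1 VALUES (176), (175);
    INSERT INTO inserted2 VALUES (202), (201);
""")

# Number each side by its own ID order, then join on that position.
conn.execute("""
    INSERT INTO t_usr_contact_group (grp_id, contact_id)
    SELECT y.grp_id, x.contact_id
    FROM (SELECT contact_id,
                 ROW_NUMBER() OVER (ORDER BY contact_id) AS fake_key
          FROM inserted1) AS x
    INNER JOIN
         (SELECT grp_id,
                 ROW_NUMBER() OVER (ORDER BY grp_id) AS fake_key
          FROM inserted2) AS y
      ON x.fake_key = y.fake_key
""")

pairs = conn.execute(
    "SELECT grp_id, contact_id FROM t_usr_contact_group ORDER BY grp_id"
).fetchall()
print(pairs)  # [(201, 175), (202, 176)]
```

The pairing depends only on the sort order of each ID column, which is why it is arbitrary unless the two sets of IDs were generated in matching order.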

Validate if exist before insert into another table

I have to read data from a txt file that contains an initial load someone made, and insert the data into two tables. That means:
At the beginning, tab_data and tab_list are empty.
With the first txt record, I first have to check whether "C43R" exists in the "tab_list" table; if not, I have to insert it and get the new ID, and then insert that new ID into the "tab_data" table with the rest of the information.
With the second record, I first have to check whether "C43R" exists in the "tab_list" table; if it exists, I have to get its ID, and then insert that ID into the "tab_data" table with the rest of the information.
With the fourth txt record, I first have to check whether "M23K" exists in the "tab_list" table; if not, I have to insert it and get the new ID, and then insert that new ID into the "tab_data" table with the rest of the information.
And the same for all the rows from the txt file.
So how can I start with this?
Does anybody have a suggestion or a solution?
Many thanks, regards

You could do this with two queries. The logic would be to first feed tab_list, then tab_data. Note that you need an ordering column in txt_data for this to make sense - I assumed id.
This inserts into tab_list, while manually generating a sequence that starts at 10.
insert into tab_list(id, tab_id)
select 9 + row_number() over(order by min(id)), tab_id
from txt_data
group by tab_id
With this set-up at hand, you can then insert in tab_data:
insert into tab_data (id, tab_id, data)
select
99 + row_number() over(order by d.id),
l.id,
d.data
from txt_data d
inner join tab_list l on l.tab_id = d.tab_id
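The same two-step idea can be sketched in Python with sqlite3. This variant lets the database assign the IDs instead of generating them with row_number(), and it skips codes that already exist in tab_list, which is the "validate before insert" part of the question. The column name code and the sample rows are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txt_data (id INTEGER PRIMARY KEY, code TEXT, data TEXT);
    CREATE TABLE tab_list (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE tab_data (id INTEGER PRIMARY KEY, tab_id INTEGER, data TEXT);
    INSERT INTO txt_data (id, code, data) VALUES
        (1, 'C43R', 'first'), (2, 'C43R', 'second'),
        (3, 'C43R', 'third'), (4, 'M23K', 'fourth');
""")

# Step 1: insert each code once; codes already in tab_list are skipped.
conn.execute("""
    INSERT INTO tab_list (code)
    SELECT t.code
    FROM txt_data AS t
    WHERE NOT EXISTS (SELECT 1 FROM tab_list AS l WHERE l.code = t.code)
    GROUP BY t.code
    ORDER BY MIN(t.id)
""")

# Step 2: join back to tab_list to pick up the id for every input row.
conn.execute("""
    INSERT INTO tab_data (tab_id, data)
    SELECT l.id, t.data
    FROM txt_data AS t
    INNER JOIN tab_list AS l ON l.code = t.code
    ORDER BY t.id
""")

# Read the result back through the foreign key to check the mapping.
loaded = conn.execute("""
    SELECT l.code, d.data
    FROM tab_data AS d
    JOIN tab_list AS l ON l.id = d.tab_id
    ORDER BY d.id
""").fetchall()
print(loaded)
```

Because step 1 filters on NOT EXISTS, running it again for a later file only adds codes that are new.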

Inserting multiple records in database table using PK from another table

I have a DB2 table "organization" which holds organization data, including the following columns:
organization_id (PK), name, description
Some organizations have been deleted, so many "organization_id" values (i.e. rows) no longer exist; the IDs are not continuous like 1, 2, 3, 4, 5... but more like 1, 2, 5, 7, 11, 12, 21...
Then there is another table, "title", with some other data; it contains organization_id from the organization table as an FK.
Now I have some data to insert for all organizations: a title that is going to be shown for all of them in the web app.
In total there are approximately 3000 records to be added.
If I did it one by one, it would look like this:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
VALUES
(
'This is new title',
XXXX,
CURRENT TIMESTAMP,
1,
1,
1
);
where XXXX represents the "organization_id", which I should get from the "organization" table so that the insert is done only for existing organization_id values.
So only "organization_id" changes, matching the "organization_id" values in the "organization" table.
What would be the best way to do it?
I checked several similar questions, but none of them seems to match this one:
SQL Server 2008 Insert with WHILE LOOP
The while loop answer iterates over continuous IDs; the other answer also assumes that the ID is auto-incremented.
Same here:
How to use a SQL for loop to insert rows into database?
Not sure about this one (as question itself is not quite clear)
Inserting a multiple records in a table with while loop
Any advice on this? How should I do it?

If you seriously want a row in title for every organization record, all with the exact same data, something like this should work:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
;
You shouldn't need the column aliases in the SELECT, but I am including them for readability and good measure.
https://www.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafymultrow.htm
And for good measure, in case your process errors out or whatever, you can also do something like this to insert a record in title only if that organization_id and title combination does not already exist:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
LEFT JOIN Title t
ON o.organization_id = t.organization_id
AND t.name = 'This is new title'
WHERE
t.organization_id IS NULL
;
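A small sketch of why the LEFT JOIN ... IS NULL form is safe to re-run, using Python's sqlite3 in place of DB2. The tables are cut down to the columns that matter here, and the surrogate key title_id plus the sample organizations are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (organization_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE title (
        title_id INTEGER PRIMARY KEY,
        name TEXT,
        organization_id INTEGER REFERENCES organization (organization_id)
    );
    -- Non-continuous IDs, as in the question.
    INSERT INTO organization (organization_id, name) VALUES
        (1, 'A'), (2, 'B'), (5, 'C'), (7, 'D');
""")

# Insert the title only for organizations that do not have it yet.
INSERT_MISSING = """
    INSERT INTO title (name, organization_id)
    SELECT ?, o.organization_id
    FROM organization AS o
    LEFT JOIN title AS t
           ON t.organization_id = o.organization_id AND t.name = ?
    WHERE t.organization_id IS NULL
"""

new_title = "This is new title"
conn.execute(INSERT_MISSING, (new_title, new_title))
count_after_first = conn.execute("SELECT COUNT(*) FROM title").fetchone()[0]

# A second run finds a match for every organization and inserts nothing.
conn.execute(INSERT_MISSING, (new_title, new_title))
count_after_second = conn.execute("SELECT COUNT(*) FROM title").fetchone()[0]
print(count_after_first, count_after_second)  # 4 4
```

No loop over IDs is needed, so gaps in organization_id do not matter.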

Assign unique ID's to three tables in SELECT query, ID's should not overlap

I am working on SQL Server and I want to assign unique IDs to rows being pulled from three tables, but the IDs should not overlap.
Let's say table one contains car data, table two contains house data, and table three contains city data. I want to pull all this data into a single table with a unique ID for each row, say cars from 1-100, houses from 101-200 and cities from 300-400.
How can I achieve this using only SELECT queries? I can't use INSERT statements.
To be more precise,
I have one table with computer system/server host information, which has IDs from 500-700.
I have other tables: storage devices (IDs from 200-600) and routers (IDs from 700-900). I have already collected the systems data. Now I want to pull the storage and router data in such a way that the consolidated data at my end has a unique ID for every record. This needs to be done using only SELECT queries.
I was using SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS UniqueID and storing the result in temp tables (separate ones for storage and routers). But I believe that this may lead to some overlap. Please suggest another way to do this.
An extension to this question:
Creating a consistent integer from a string:
All I have is various strings like this:
String1
String2Hello123
String3HelloHowAreYou
I need to convert them into positive integers, something like:
String1 = 12
String2Hello123 = 25
String3HelloHowAreYou = 4567
Note that I do not expect the numbers to be in any order. The only requirement is that the number generated for one string must not conflict with another.
Now, later, after a reboot, suppose I no longer have the 2nd string and there is a new string instead:
String1 = 12
String3HelloHowAreYou = 4567
String2Hello123HowAreyou = 28
Note that the number 25, generated earlier for the 2nd string, cannot be used for the new string.
Using extra storage (temp tables) is not allowed.

If you don't care where the data comes from:
with dat as (
select 't1' src, id from table1
union all
select 't2' src, id from table2
union all
select 't3' src, id from table3
)
select *
, id2 = row_number() over( order by _some_column_ )
from dat
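As a runnable illustration of the UNION ALL plus row_number() approach, here is a sketch against SQLite from Python (SQLite 3.25 or later is assumed for window functions). The table names systems, storage and routers and their rows are made up for the demo; note the source IDs overlap on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE systems (id INTEGER, name TEXT);
    CREATE TABLE storage (id INTEGER, name TEXT);
    CREATE TABLE routers (id INTEGER, name TEXT);
    INSERT INTO systems VALUES (500, 'host-a'), (600, 'host-b');
    INSERT INTO storage VALUES (500, 'san-a'), (600, 'san-b');
    INSERT INTO routers VALUES (700, 'rtr-a');
""")

# Tag every row with its source, then number the combined set once.
rows = conn.execute("""
    WITH dat AS (
        SELECT 'systems' AS src, id, name FROM systems
        UNION ALL
        SELECT 'storage' AS src, id, name FROM storage
        UNION ALL
        SELECT 'routers' AS src, id, name FROM routers
    )
    SELECT ROW_NUMBER() OVER (ORDER BY src, id) AS unique_id, src, id, name
    FROM dat
    ORDER BY unique_id
""").fetchall()

for row in rows:
    print(row)
```

The generated numbers are unique within one result set, but they depend on the rows present at query time, so they shift when rows are added or removed. That makes this a fit for the first part of the question rather than for the "consistent integer from a string" extension.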

Insert into table some values which are selected from other table

My database structure is like this:
ATT_table- ActID(PK), assignedtoID(FK), assignedbyID(FK), Env_ID(FK), Product_ID(FK), project_ID(FK), Status
Product_table - Product_ID(PK), Product_name
Project_Table- Project_ID(PK), Project_Name
Environment_Table- Env_ID(PK), Env_Name
Employee_Table- Employee_ID(PK), Name
Employee_Product_projectMapping_Table -Emp_ID(FK), Project_ID(FK), Product_ID(FK)
Product_EnvMapping_Table - Product_ID(FK), Env_ID(FK)
I want to insert values into ATT_Table. That table has some columns, such as assignedtoID, assignedbyID, envID, ProductID and project_ID, which are FKs in this table but primary keys in other tables (they are simply numbers).
When I take input from the user, I get it as strings: the user enters a Name (Employee_Table) or a product_Name (Product_table), not an ID directly. So I want to let the user enter the name (of an employee, product, project or environment), then pick up the value of its primary key (Emp_ID, product_ID, project_ID, Env_ID) and insert those values into ATT_table as assignedtoID, assignedbyID, envID, ProductID and project_ID.
Please note that assignedtoID and assignedbyID are referenced from Emp_ID in Employee_Table.
How can I do this? I have got something like this, but it's not working:
INSERT INTO ATT_TABLE(Assigned_To_ID,Assigned_By_ID,Env_ID,Product_ID,Project_ID)
VALUES (A, B, Env_Table.Env_ID, Product_Table.Product_ID, Project_Table.Project_ID)
SELECT Employee_Table.Emp_ID AS A,Employee_Table.Emp_ID AS B, Env_Table.Env_ID, Project_Table.Project_ID, Product_Table.Product_ID
FROM Employee_Table, Env_Table, Product_Table, Project_Table
WHERE Employee_Table.F_Name= "Shantanu" or Employee_Table.F_Name= "Kapil" or Env_Table.Env_Name= "SAT11A" or Product_Table.Product_Name = "ABC" or Project_Table.Project_Name = "Project1";

The way this is handled is by using drop-down select lists. The list consists of (at least) two columns: one holds the IDs the database works with, the other(s) hold the strings the user sees. Like
1, "CA", "Canada"
2, "USA", "United States"
...
The user sees
CA | Canada
USA| United States
...
The value that gets stored in the database is 1, 2, ... whatever row the user selected.
You can never rely on users entering exact, correct input. Sooner or later they will make typos.
I am extending my answer, based on your remark.
The problem with the given solution (get the IDs from the parent tables by JOINing all those parent tables together on the entered text and combining the conditions with a number of ANDs) is that as soon as one parameter has a typo, you will not get a single record back. Imagine the consequences when the real F_name of the employee is "Shant*anu*" and the user entered "Shant*aun*".
The best way to cope with this is to get those IDs one by one from the parent tables. Suppose some FKs have a NOT NULL constraint. You can check whether the F_name is filled in and inform the user when the field is empty. But if the user entered "Shant*aun*" as the name, that check will not warn the user, because something is filled in. It is also not the check the database will do, because the NOT NULL constraints are defined on the IDs (FKs). When you get the IDs one by one from the parent tables, you can verify whether each one is NULL. When the text is filled in, like "Shant*aun*", but the returned ID is NULL, you can inform the user of the problem and let them correct the input: "No employee by the name 'Shantaun' could be found."
SELECT $Emp_ID_A = Emp_ID
FROM Employee_Table
WHERE F_Name= "Shantanu"
SELECT $Emp_ID_B = Emp_ID
FROM Employee_Table
WHERE F_Name= "Kapil"
SELECT $Env_ID = Env_ID
FROM Env_Table
WHERE Env_Table.Env_Name= "SAT11A"
SELECT $Product_ID = Product_ID
FROM Product_Table
WHERE Product_Table.Product_Name = "ABC"
SELECT $Project_ID = Project_ID
FROM Project_Table
WHERE Project_Name = "Project1"
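The one-by-one lookups with a per-field error message can be sketched like this in Python with sqlite3. Only the employee lookup is shown; the helper names find_employee_id and check_inputs are hypothetical, and the same pattern would apply to the environment, product and project tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Emp_ID INTEGER PRIMARY KEY, F_Name TEXT);
    INSERT INTO Employee_Table (Emp_ID, F_Name)
    VALUES (1, 'Shantanu'), (2, 'Kapil');
""")


def find_employee_id(conn, first_name):
    """Return Emp_ID for the given first name, or None when no row matches."""
    row = conn.execute(
        "SELECT Emp_ID FROM Employee_Table WHERE F_Name = ?", (first_name,)
    ).fetchone()
    return row[0] if row else None


def check_inputs(conn, names):
    """Collect one error message per name that cannot be resolved to an ID."""
    errors = []
    for name in names:
        if find_employee_id(conn, name) is None:
            errors.append(f"No employee by the name '{name}' could be found.")
    return errors


# 'Shantaun' is the typo from the text; 'Kapil' resolves normally.
errors = check_inputs(conn, ["Shantaun", "Kapil"])
print(errors)
```

Resolving each name separately is what makes it possible to tell the user which field was wrong, instead of silently inserting nothing.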

Please use AND instead of OR.
INSERT INTO ATT_TABLE(Assigned_To_ID,Assigned_By_ID,Env_ID,Product_ID,Project_ID)
SELECT A.Emp_ID, B.Emp_ID, Env_Table.Env_ID, Product_Table.Product_ID, Project_Table.Project_ID
FROM Employee_Table A, Employee_Table B, Env_Table, Product_Table, Project_Table
WHERE A.F_Name= "Shantanu"
AND B.F_Name= "Kapil"
AND Env_Table.Env_Name= "SAT11A"
AND Product_Table.Product_Name = "ABC"
AND Project_Table.Project_Name = "Project1";
But it is best practice to use a drop-down list in your scenario, I guess.
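To see why AND matters here, the following sketch compares the two WHERE clauses over the same comma-style cross join, using Python's sqlite3 and a reduced schema (two lookup tables instead of four). The extra rows 'Asha' and 'SAT11B' are invented so that the cross join has something to multiply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Emp_ID INTEGER PRIMARY KEY, F_Name TEXT);
    CREATE TABLE Env_Table (Env_ID INTEGER PRIMARY KEY, Env_Name TEXT);
    INSERT INTO Employee_Table VALUES (1, 'Shantanu'), (2, 'Kapil'), (3, 'Asha');
    INSERT INTO Env_Table VALUES (10, 'SAT11A'), (11, 'SAT11B');
""")

# Same cross join for both queries: 3 x 3 x 2 = 18 candidate combinations.
SELECT_FROM = (
    "SELECT A.Emp_ID, B.Emp_ID, E.Env_ID "
    "FROM Employee_Table AS A, Employee_Table AS B, Env_Table AS E "
)

# AND keeps only the combination where every name matches.
with_and = conn.execute(
    SELECT_FROM
    + "WHERE A.F_Name = 'Shantanu' AND B.F_Name = 'Kapil' "
    + "AND E.Env_Name = 'SAT11A'"
).fetchall()

# OR keeps every combination where at least one name matches.
with_or = conn.execute(
    SELECT_FROM
    + "WHERE A.F_Name = 'Shantanu' OR B.F_Name = 'Kapil' "
    + "OR E.Env_Name = 'SAT11A'"
).fetchall()

print(with_and)      # [(1, 2, 10)]
print(len(with_or))  # 14
```

With OR, an INSERT ... SELECT built on this query would add one row per surviving combination, which is almost never what is wanted.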
