Make sure steps run in sequence in Kettle? - pentaho

I have a Kettle transformation that inserts data in a number of database tables. For each table there is a separate transformation (with an injection step) that makes some calculations, checks the data and finally inserts. These sub-transformations are called using a single threader step.
The main transformation looks something like this:
Input from -----> Dummy -----> Dummy -----> Dummy -----> Done
file | | |
| | |
v v v
Select Select Select
values values values
| | |
| | |
v v v
Single Single Single
threader 1 threader 2 threader 3
My problem is that I want to make sure that Single threader 1 finishes for a specific row before Single threader 2 runs for that row, and so on. This is because the first single threader adds a post in one table that then should be referenced in later tables and the database will throw an error if a reference to a post that does not exist (yet) is inserted.
I can't put the single threaders in one line because I need to discard all but a few fields of about 50 in total to match the injectors of the single threaders. That is what the select values does.
How can I solve this?

Here is example of such case
create table a (id serial primary key, md5 text not null);
create table b (id serial primary key, md5 text not null, a_id int not null);
create table c (id serial primary key, md5 text not null, a_id int not null, b_id int not null);
And example of tranformation
I assume that data is flat so that each row at the beginning has data for all tables A, B and C.

Related

What is the best SQL query to populate an existing column on table A with a similar column from table B?

Let's say I have an existing table A with a column called contact_name and a ID column id as the primary key.
All the rows in A have the name value as "NULL" right now.
Another table B has different columns, but one of which is contact_name, and another is ref_id.
Each ref_id in B corresponds to a value of id in A, and there may be multiple rows in B that share the same value for ref_id (meaning they all correspond to a single entry in A).
Let me set up an example:
Table A
id | contact_name
1 | [NULL]
2 | [NULL]
Table B
ref_id | contact_name
1 | "John"
2 | "Helen"
2 | "Alex"
Note there are theoretically other values in each table but for the sake of brevity I'm just showing the values I'm interested in using.
I want to populate contact_name in table A with the first entry of the corresponding contact_name in B, where B.(first)ref_id = A.id, without adding any rows or editing the rest of the rows in either table. That is, I want A in my example to now be:
id | contact_name
1 | "John"
2 | "Helen"
Again, note how the first contact_name value, "Helen", in B is selected, not any other subsequent one, like "Alex".
The MERGE function is made for this. Give it a try:
MERGE INTO TABLEA a USING (
SELECT b.REF_ID, b.CONTACT_NAME
FROM TABLEB) b
ON (a.ID = b.REF_ID)
WHEN MATCHED THEN
UPDATE SET
a.CONTACT_NAME = b.CONTACT_NAME
WHERE a.CONTACT_NAME IS null;
You could try using Five, a low-code development environment that lets you manage your MySQL database, and build a web application on top of it.
Whenever You can add a new column in a Table, Five will give you a prompt where you can use a query to populate that particular column.
Here are the steps:
Import your existing MySQL database into Five
Add a new column to your database.
Five will prompt you to fill in values for the new column. Five gives you four options:
A. you can leave the new column empty,
B. assigned a Default value,
C. Copy Values to an existing field
D. Write a custom query.
Refer to This Screenshot to get an idea how things look inside Five
Hope this helps!
Disclaimer: I work for Five

Generating a decrementing ID while inserting data on a Teradata table

I'm trying to insert data from a query (or a volatile table) to another table which has a id column ( with only type smallint and not null constraint) which should be unique, on Teradata using teradata SQL Assistant the min(id) = -5 and i should insert the new data with a lower id.
This is a simple example:
table a
id| aa |bb
-3|text |text_2
-5|text_3|text_4
and the data i should insert is for example :
aa | bb
text_5|text_6
text_7|text_8
text_9|text_10
so the result should be like
id| aa |bb
-3|text |text_2
-5|text_3|text_4
-6|text_5|text_6
-7|text_7|text_8
-8|text_9|text_10
I tried to pass by creating volatile table with a generated id start by -5 increment by -1 no cycle.
But I get an error
Expected something like a name or a unicode delimited identifier or a cycle keyword between an integer and ','
There is any other way to do it please ?

Adding column to sqlite database and distribute rows based on primary key

I have some data elements containing a timestamp and information about Item X sales related to this timestamp.
e.g.
timestamp | items X sold
------------------------
1 | 10
4 | 40
7 | 20
I store this data in an SQLite table. Now I want to add to this table. Especially if I get data about another item Y.
The item Y data might or might not have different timestamps but I want to insert this data into the existing table so that it looks like this:
timestamp | items X sold | items Y sold
------------------------------------------
1 | 10 | 5
2 | NULL | 10
4 | 40 | NULL
5 | NULL | 3
7 | 20 | NULL
Later on additional sales data (columns) must be added with the same scheme.
Is there an easy way to accomplish this with SQLite?
In the end I want to fetch data by timestamp and get an overview which items were sold at this time. Most examples consider the usecase to add a complete row (one record) or a complete column if it perfectly matches to the other columns.
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
(Using pythons sqlite3 package to create and manipulate the DB)
Thanks!
Dynamically adding columns is not a good design. You could add them using
ALTER TABLE your_table ADD COLUMN the_column_name TEXT
the column, for existing rows would be populated with nulls, although you could specify a DEFAULT value and the existing rows would then be populated with that value.
e.g. the following demonstrates the above :-
DROP TABLE IF EXISTS soldv1;
CREATE TABLE IF NOT EXISTS soldv1 (timestamp INTEGER PRIAMRY KEY, items_sold_x INTEGER);
INSERT INTO soldv1 VALUES(1,10),(4,40),(7,20);
SELECT * FROM soldv1 ORDER BY timestamp;
ALTER TABLE soldv1 ADD COLUMN items_sold_y INTEGER;
UPDATE soldv1 SET items_sold_y = 5 WHERE timestamp = 1;
INSERT INTO soldv1 VALUES(2,null,10),(5,null,3);
SELECT * FROM soldv1 ORDER BY timestamp;
resulting in the first query returning :-
and the second query returning :-
However, as stated, the above is not considered a good design as the schema is dynamic.
You could alternately manage an equivalent of the above with the addition of either a new column (to also be part of the primary key) or by prefixing/suffixing the timestamp with a type.
Consider, as an example, the following :-
DROP TABLE IF EXISTS soldv2;
CREATE TABLE IF NOT EXISTS soldv2 (type TEXT, timestamp INTEGER, items_sold INTEGER, PRIMARY KEY(timestamp,type));
INSERT INTO soldv2 VALUES('x',1,10),('x',4,40),('x',7,20);
INSERT INTO soldv2 VALUES('y',1,5),('y',2,10),('y',5,3);
INSERT INTO soldv2 VALUES('z',1,15),('z',2,5),('z',9,25);
SELECT * FROM soldv2 ORDER BY timestamp;
This has replicated, data-wise, your original data and additionally added another type (column items_sold_z) without having to change the table's schema (nor having the additional complication of needing to update rather than insert as per when applying timestamp 1 items_sold_y 5).
The result from the query being :-
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
SQLite is a valid tool. What you then do with the data can probably be done as easy as in excel (perhaps simpler) and probably much simpler than trying to process the data in csv format.
For example, say you wanted the total items sold per timestamp and how many types were sold then :-
SELECT timestamp, count(items_sold) AS number_of_item_types_sold, sum(items_sold) AS total_sold FROM soldv2 GROUP by timestamp ORDER BY timestamp;
would result in :-

How can I update the table in SQL?

I've created a table called Youtuber, the code is below:
create table Channel (
codChannel int primary key,
name varchar(50) not null,
age float not null,
subscribers int not null,
views int not null
)
In this table, there are 2 channels:
|codChannel | name | age | subscribers | views |
| 1 | PewDiePie | 28 | 58506205 | 16654168214 |
| 2 | Grandtour Games | 15 | 429 | 29463 |
So, I want to edit the age of "Grandtour Games" to "18". How can I do that with update?
Is my code right?
update age from Grandtour Games where age='18'
No, in update, you'll have to follow this sequence:
update tableName set columnWanted = 'newValue' where columnName = 'elementName'
In your code, put this:
update Channel set age=18 where name='Grandtour Games'
Comments below:
/* Channel is the name of the table you'll update
set is to assign a new value to the age, in your case
where name='Grandtour Games' is referencing that the name of the Channel you want to update, is Grandtour Games */
alter table changes the the schema (adding, updating, or removing columns or keys, that kind of thing).
Update table changes the data in the table without changing the schema.
So the two are really quite different.
Here is your answer -
-> ALTER is a DDL (Data Definition Language) statement
UPDATE is a DML (Data Manipulation Language) statement.
->ALTER is used to update the structure of the table (add/remove field/index etc).
Whereas UPDATE is used to update data.
Hope this helps!

SQL - keep values with UPDATE statement

I have a table "news" with 10 rows and cols (uid, id, registered_users, ....) Now i have users that can log in to my website (every registered user has a user id). The user can subscribe to a news on my website.
In SQL that means: I need to select the table "news" and the row with the uid (from the news) and insert the user id (from the current user) to the column "registered_users".
INSERT INTO news (registered_users)
VALUES (user_id)
The INSERT statement has NO WHERE clause so i need the UPDATE clause.
UPDATE news
SET registered_users=user_id
WHERE uid=post_news_uid
But if more than one users subscribe to the same news the old user id in "registered_users" is lost....
Is there a way to keep the current values after an sql UPDATE statement?
I use PHP (mysql). The goal is this:
table "news" row 5 (uid) column "registered_users" (22,33,45)
--- 3 users have subscribed to the news with the uid 5
table "news" row 7 (uid) column "registered_users" (21,39)
--- 2 users have subscribed to the news with the uid 7
It sounds like you are asking to insert a new user, to change a row in news from:
5 22,33
and then user 45 signs up, and you get:
5 22,33,45
If I don't understand, let me know. The rest of this solution is an excoriation of this approach.
This is a bad, bad, bad way to store data. Relational databases are designed around tables that have rows and columns. Lists should be represented as multiple rows in a table, and not as string concatenated values. This is all the worse, when you have an integer id and the data structure has to convert the integer to a string.
The right way is to introduce a table, say NewsUsers, such as:
create table NewsUsers (
NewsUserId int identity(1, 1) primary key,
NewsId int not null,
UserId int not null,
CreatedAt datetime default getdaete(),
CreatedBy varchar(255) default sysname
);
I showed this syntax using SQL Server. The column NewsUserId is an auto-incrementing primary key for this table. The columns NewsId is the news item (5 in your first example). The column UserId is the user id that signed up. The columns CreatedAt and CreatedBy are handy columns that I put in almost all my tables.
With this structure, you would handle your problem by doing:
insert into NewsUsers
select 5, <userid>;
You should create an additional table to map users to news they have registeres on
like:
create table user_news (user_id int, news_id int);
that looks like
----------------
| News | Users|
----------------
| 5 | 22 |
| 5 | 33 |
| 5 | 45 |
| 7 | 21 |
| ... | ... |
----------------
Then you can use multiple queries to first retrieve the news_id and the user_id and store them inside variables depending on what language you use and then insert them into the user_news.
The advantage is, that finding all users of a news is much faster, because you don't have to parse every single idstring "(22, 33, 45)"
It sounds like you want to INSERT with a SELECT statement - INSERT with SELECT
Example:
INSERT INTO tbl_temp2 (fld_id)
SELECT tbl_temp1.fld_order_id
FROM tbl_temp1
WHERE tbl_temp1.fld_order_id > 100;