Pentaho Data Integration Define Dependencies

I am using Pentaho Data Integration to create a job where several sql tables are created.
Here is the job:
I would like to create TABLE_D only when both TABLE_C and TABLE_B have been created. However, it seems that the tables in the first branch are created first, then TABLE_D is created, and only afterwards does the job continue and create TABLE_B.
How can I enforce the creation of TABLE_B AND TABLE_C before TABLE_D?

The way you defined it, TABLE_D is called twice: once after successfully calling TABLE_C, and again when TABLE_B succeeds.
To do what you want you have two options:
Just put them on a single chain: TABLE_A->TABLE_B->TABLE_C->TABLE_D. Sure, it adds another constraint in that TABLE_C is only created after TABLE_B, but does what you need it to.
Put the first 3 statements in a sub-job, and have the parent job call TABLE_D after the successful end of the sub-job, which will only happen once.

Related

How to switch data from a Table A to Table B when there is no query running on Table B

I have two tables, A and B, which have the same columns. Table B is used for Tableau reports. Table A is a temporary table which receives new data from the source system.
How can I switch the data from Table A to Table B when there is no query running on Table B?
I need to do that to avoid downtime on Table B and make sure that Table B is always available for users.
Thank you very much!
As far as I know this is not something you can really do with straightforward methods, since any RDBMS will take care of this functionality by itself. The only thing you can do is put your insert statements in a transaction block, to make sure that the resulting Table B will never be seen (= queried) in an "unready" (= half-finished) state.
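For illustration, a minimal T-SQL sketch of that idea, assuming TableA and TableB live in the same database and share exactly the same columns (the names follow the question):
-- Reload TableB inside one transaction so readers never see it half-finished.
-- Under the default READ COMMITTED isolation level, queries against TableB
-- simply wait until the transaction commits.
BEGIN TRANSACTION;
    TRUNCATE TABLE dbo.TableB;
    INSERT INTO dbo.TableB
    SELECT * FROM dbo.TableA;
COMMIT TRANSACTION;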
If I understand you correctly, this is the scenario:
1. You load data into TableA, and the end users use TableB.
2. You want to switch the data into TableB from TableA without downtime.
This should solve your problem:
truncate table [dbo].[TableB]
alter table [dbo].[TableA] switch to [dbo].[TableB]
This script executes within milliseconds and should be enough for your requirements. One note though: your TableA and TableB have to be completely the same. Same indexes, same columns, etc.

Will one trigger activate another trigger in SQL Server?

There are three tables A, B, C and two triggers a, b.
When table A is updated, trigger a will be activated and update table B
When table B is updated, trigger b will be activated and update table C
When I update table A, will table C be updated?
If not, how?
DML triggers can be nested up to 32 levels; however, this can be switched off at the server level. So if it is important that tables B and C are updated, then you need to be certain that this setting will never be switched off, which could be difficult to ensure for the lifetime of an application.
See MSDN > Create Nested Triggers: https://msdn.microsoft.com/en-GB/library/ms190739.aspx
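As a rough T-SQL sketch of that chain (table and column names are made up for illustration; assume each table has an id and a qty column), together with the server-level setting mentioned above:
-- Check/enable the server option that allows one trigger to fire another.
EXEC sp_configure 'nested triggers', 1;
RECONFIGURE;
GO
-- Trigger a: when A is updated, push the change to B.
CREATE TRIGGER trg_a ON A AFTER UPDATE AS
BEGIN
    UPDATE B SET B.qty = i.qty
    FROM B INNER JOIN inserted i ON i.id = B.id;
END;
GO
-- Trigger b: when B is updated (including by trg_a), push the change to C.
CREATE TRIGGER trg_b ON B AFTER UPDATE AS
BEGIN
    UPDATE C SET C.qty = i.qty
    FROM C INNER JOIN inserted i ON i.id = C.id;
END;
With nesting enabled, an UPDATE on A fires trg_a, whose UPDATE on B in turn fires trg_b, so C ends up updated as well.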

A possible way to remove BigQuery column

I'm looking around for an approach to update an existing BigQuery table.
With the CLI I'm able to copy the table to a new one. Now I'm looking for an effective way to remove/rename a column.
It's said that it is not possible to remove a column. So is it possible, when copying table1 to table2, to exclude some columns?
Thanks,
You can do this by running a query that copies the old table to the new one. You should specify allowLargeResults:true and flattenResults:false. The former allows you to have query results larger than 128MB; the latter prevents repeated fields from being flattened in the result.
You can write the results to the same table as the source table, but use writeDisposition:WRITE_TRUNCATE. This will atomically overwrite the table with the results. However, if you'd like to test out the query first, you could always write the results to a temporary table, then copy the temporary table over the old table when you're happy with it (using WRITE_TRUNCATE to atomically replace the table).
(Note: the flags I'm describing here are their names in the underlying API, but they have analogues in both the query options in the Web UI and the bq CLI.)
For example, if you have a table t1 with schema {a, b, c, d} and you want to drop field c and rename b to b2, you can run
SELECT a, b as b2, d FROM t1
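For instance, with the bq CLI the same approach might look roughly like this (the dataset and table names are placeholders, and the exact flag spellings should be checked against your bq version):
bq query \
  --allow_large_results \
  --noflatten_results \
  --destination_table=mydataset.t1 \
  --replace \
  "SELECT a, b AS b2, d FROM mydataset.t1"
Here --replace corresponds to writeDisposition:WRITE_TRUNCATE, so the result atomically overwrites the original table.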

update data from one table to another (in a database)

DB gurus,
I am hoping someone can set me on the right direction.
I have two tables. Table A and Table B. When the system comes up, all entries from Table A are massaged and copied over to Table B (according to Table B's schema). Table A can have tens of thousands of rows.
While the system is up, Table B is kept in sync with Table A via DB change notifications.
If the system is rebooted, or my service restarted, I want to re-initialize Table B. However, I want to do this with the least possible DB updates. Specifically, I want to:
add any rows that are in Table A, but not in Table B, and
delete any rows that are not in Table A, but are in Table B
any rows that are common to Table A and Table B should be left untouched
Now, I am not a "DB guy", so I am wondering what the conventional way of doing this is.
Use exists to keep processing to a minimum.
Something along these lines, modified so the joins are correct (also verify that I didn't do something stupid and get TableA and TableB backwards from your description):
insert into TableB
select *
from TableA a
where not exists (select 1 from TableB b where b.ID = a.ID);

delete from TableB
where not exists (select 1 from TableA a where a.ID = TableB.ID);
Informix's Enterprise Replication features would do all this for you. ER works by shipping the logical logs from one server to another, and rolling them forward on the secondary.
You can configure it to be as finely-grained as you need (ie just a handful of tables).
You use the term "DB change notifications" - are you already using ER or is this some trigger-based arrangement?
If for some reason ER can't work for your configuration, I would suggest rewriting the notifications model to behave asynchronously, i.e.:
- write notifications to a table in server 'A' that contains a timestamp or serial field
- create a table on server 'B' that stores the timestamp/serial value of the last processed record
- run a daemon process on server 'B' that:
  - compares the 'A' and 'B' timestamps/serials
  - selects the 'A' records between the 'A' and 'B' timestamps
  - processes those records into 'B'
  - updates the 'B' timestamp/serial
  - sleeps for an appropriate time period, and loops
So Server 'B' is responsible for ensuring its copy is in sync with 'A'. 'A' is not inconvenienced by 'B' being unavailable.
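A rough sketch of those pieces in Informix-flavoured SQL (every name below is hypothetical, and the daemon itself would be a small script or stored procedure around the last two statements):
-- On server 'A': a change/notification table with a monotonically increasing key.
CREATE TABLE a_changes (
    change_id  SERIAL PRIMARY KEY,
    row_id     INTEGER,
    changed_at DATETIME YEAR TO SECOND
);

-- On server 'B': remember the last change that has been applied.
CREATE TABLE b_sync_state (
    last_change_id INTEGER
);

-- Daemon loop on 'B': fetch everything newer than the stored marker...
SELECT *
FROM   a_changes
WHERE  change_id > (SELECT last_change_id FROM b_sync_state)
ORDER  BY change_id;

-- ...apply those rows to B's copy, then advance the marker
-- (:new_high_water is a host-variable placeholder), sleep, and repeat.
UPDATE b_sync_state
SET    last_change_id = :new_high_water;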
A simple way would be to use a historic table where you put the changes from A that happened since the last update, and use that table to sync table B instead of doing a direct copy from A to B. Once the sync is done, you delete the whole historic table and start anew.
What I don't understand is how table A can be updated and not B if your service or computer is not running. Are they on two different databases or servers?
Join the data from both tables according to common columns; this gives you the rows that have a match in both tables, i.e. data that is in A and in B. Then use these values (let's call this set M) with set operations, i.e. set-minus operations, to get the differences.
first requirement: A minus M
second requirement: B minus A
third requirement: M
Do you get the idea?
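If your database supports set operators, the idea can be written almost literally (EXCEPT is the SQL-standard spelling, Oracle calls it MINUS, and some engines lack it, in which case the NOT EXISTS approach above does the same job):
-- First requirement: ids in A that are missing from B (rows to insert).
SELECT id FROM TableA
EXCEPT
SELECT id FROM TableB;

-- Second requirement: ids in B that no longer exist in A (rows to delete).
SELECT id FROM TableB
EXCEPT
SELECT id FROM TableA;

-- Third requirement (set M): ids present in both, which are left untouched.
SELECT id FROM TableA
INTERSECT
SELECT id FROM TableB;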
I am a SQL Server guy, but since SQL Server 2008, for this kind of operation, a feature called MERGE is available.
By using the MERGE statement we can perform insert, update and delete operations in a single statement.
So I googled and found that Informix also supports the same MERGE statement, but I am not sure whether it takes care of the delete too, though insert and update are taken care of. Moreover, this statement takes care of the transaction by itself.
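For what it's worth, a hedged T-SQL sketch of such a MERGE, assuming both tables key on ID and using some_col as a placeholder for the remaining columns:
MERGE TableB AS b
USING TableA AS a
    ON b.ID = a.ID
WHEN NOT MATCHED BY TARGET THEN        -- in A but not in B: add it
    INSERT (ID, some_col) VALUES (a.ID, a.some_col)
WHEN NOT MATCHED BY SOURCE THEN        -- in B but not in A: remove it
    DELETE;
-- Rows that match on ID are simply left alone, as the question requires.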

Reconciling a column across two tables in SQL Server

There are two databases. Database A has a table A with columns id, group and flag. Database B has a table B with columns ID and flag. Table B is essentially a subset of table A where group == 'B'.
They are updated/created in odd ways that are outside my understanding at this time, and are beyond the scope of this question (this is not the time to fix the basic setup and practices of this client).
The problem is that when the flag in Table A is updated, it is not reflected in table B, but should be. This is not a time-critical problem, so it was suggested I create a job to handle this. Maybe because it's the end of the week, or maybe because I've never written more than the most basic stored procedure (I'm a programmer, not a DBA), but I'm not sure how to go about this.
At a simplistic level, the stored procedure would be something along the lines of
Select * in table A where group == B
Then loop through the result set and, for each id, update the flag.
But I'm not even sure how to loop in a stored procedure like this. Suggestions? Example code would be preferred.
Complication: Alright, this gets a little harder too. For every group, Table B is in a separate database, and we need to update this flag for all groups. So, we would have to set up a separate trigger for each group to handle each DB name.
And yes, inserts to Table B are already handled - this is just to update flag status.
Assuming that ID is a unique key, and that you can use linked servers or some such to run a query across servers, this SQL statement should work (it works for two tables on the same server).
UPDATE Table_B
SET Table_B.Flag = Table_A.Flag
FROM Table_A inner join Table_B on Table_A.id = Table_B.id
(since Table_B already contains the subset of rows from Table_A where group = B, we don't have to include this condition in our query)
If you can't use linked servers, then I might try to do it with some sort of SSIS package. Or I'd use the method described in the linked question (comments, above) to get the relevant data from Database A into a temp table etc. in Database B, and then run this query using the temp table.
UPDATE B
SET B.[Flag] = A.[Flag]
FROM DatabaseA.dbo.Table_A A
inner join DatabaseB.dbo.Table_B B
on A.id = B.id
Complication:
For several groups, run one such update SQL per group.
Note you can use Flag without []. I'm using the brackets only because of syntax coloring on stackoverflow.
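Since each group's Table_B lives in its own database, one hedged way to run one UPDATE per group from a job or stored procedure is dynamic SQL; all database and table names below are illustrative:
-- Run the cross-database UPDATE for one group, substituting that group's database name.
DECLARE @db  sysname = N'DatabaseB';   -- the database holding this group's Table_B
DECLARE @sql nvarchar(max);

SET @sql = N'UPDATE B
             SET    B.[Flag] = A.[Flag]
             FROM   DatabaseA.dbo.Table_A AS A
                    INNER JOIN ' + QUOTENAME(@db) + N'.dbo.Table_B AS B
                        ON A.id = B.id;';

EXEC sp_executesql @sql;
-- Repeat (e.g. in a loop or cursor over a list of group databases) for each group.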
Create an update trigger on table A that pushes the necessary changes to B as A is modified.
Basically, something like the trigger below (the syntax may not be exactly right; I can't check it right now). I seem to recall that the inserted table contains all of the updated rows on an update, but you may want to check this to make sure. I think the trigger is the way to go, though.
create trigger update_b_trigger
on Table_A
for update
as
begin
    update Table_B
    set Table_B.flag = inserted.flag
    from inserted
    inner join Table_B
        on inserted.id = Table_B.id
        and inserted.[group] = 'B'
        and inserted.flag <> Table_B.flag
end
[EDIT] I'm assuming that inserts/deletes to Table B are already handled and it's just flag updates to Table B that need to be addressed.