pentaho spoon/kettle merge row diff step - pentaho

I want to update a new database's table based on an old one
this is the data in the old table:
id,type
1,bla
2,bla bla
The new table is empty. Currently I have the two table input steps connected to a Merge Rows (diff) step, which then feeds into a Synchronize after merge step.
The issue is that the flag field gets set to "deleted" because it cannot find any values in the compare stream (duh, it's an empty table!). Is my logic wrong, or should it not work like this:
not found in compare stream --> set flag to needs insert --> insert in compare table ??
How do I do this?

I set the "Insert when value equal" field in the Advanced tab of the Synchronize after merge step to "deleted". It now inserts the rows into the table.
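To see why the flag comes out as "deleted", here is a simplified Python model of a reference/compare diff. It is only a sketch of the flagging rules as I understand them (identical, changed, new, deleted), not Pentaho code, and the function name `merge_rows_diff` and its parameters are made up for illustration.

```python
def merge_rows_diff(reference, compare, key, value_fields):
    """Flag rows the way a reference/compare diff does (simplified model)."""
    ref_by_key = {row[key]: row for row in reference}
    cmp_by_key = {row[key]: row for row in compare}
    flagged = []
    for k in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        ref_row, cmp_row = ref_by_key.get(k), cmp_by_key.get(k)
        if cmp_row is None:
            flagged.append((ref_row, "deleted"))    # key only in the reference stream
        elif ref_row is None:
            flagged.append((cmp_row, "new"))        # key only in the compare stream
        elif all(ref_row[f] == cmp_row[f] for f in value_fields):
            flagged.append((cmp_row, "identical"))  # same key, same values
        else:
            flagged.append((cmp_row, "changed"))    # same key, different values
    return flagged


old_table = [{"id": 1, "type": "bla"}, {"id": 2, "type": "bla bla"}]
new_table = []  # the empty target table from the question

# Old table as reference, empty new table as compare: every row is "deleted".
as_wired = merge_rows_diff(old_table, new_table, "id", ["type"])

# Streams swapped: the same rows are flagged "new" instead.
swapped = merge_rows_diff(new_table, old_table, "id", ["type"])
```

Under this model, swapping which input is the reference and which is the compare stream should give the "new" flag the question expected, as an alternative to mapping "deleted" to insert.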

Related

Using Update on Delta table is changing the state of an intermediate DataFrame

I am facing a situation here.
So below are the steps I am using to transform my DataFrame
val filteringRecordsToExpire = collectAllActiveRecords.join(collectingSrcSysIdsToExpire, Seq("trans_id"), "leftsemi")
filteringRecordsToExpire contains the IDs that I need to invalidate.
val expiredList = filteringRecordsToExpire.select("trans_id").distinct().collect()
expiredList.foreach(v => expireRecords(v(0).toString)) // here I am updating each record
Now I want to take the same IDs that I expired and re-enter them into the same table with some new values.
But I get an empty DataFrame after I perform the expire (which is basically updating the existing table for those same IDs).
collectingSrcSysIdsToExpire is the DataFrame that holds all the IDs I want to modify and INSERT into the table.
But in this process the whole DataFrame ends up empty.
I have tried persisting this DataFrame, and also registering it as a temp table and using that, but nothing works.
Any help or suggestion would be appreciated. Thanks in advance.
-----------------------------solution----------------------------------
So here is how I solved this issue.
As suggested, I used MERGE INTO, which was a lot faster, and since I am using unique transaction IDs I didn't have any duplicate issues. Previously I was updating the table for those transaction IDs and then trying to use those same unique IDs, with modified values, to INSERT INTO the same table.
As a solution, I first picked the distinct transaction IDs from my source and INSERTed them INTO the table with my updated values, then stored that same list of transaction IDs and updated the existing older entries in the table.
val filteringRecordsToExpire = delta.join(collectingSrcSysIdsToExpire, Seq("trans_id"), "leftsemi")
.distinct()
collectingSrcSysIdsToExpire.select(TargetTable.schema.map(f => col(f.name)): _*).write.insertInto(Table)
val sqlUpdateQry =
  s"""MERGE INTO TargetTable AS tgtTable
      USING expireSrsIds AS source
      ON tgtTable.trans_id = source.trans_id
      AND <few more conditions>
      WHEN MATCHED
      THEN UPDATE SET <expiring older entries>"""
So somehow INSERT then UPDATE works sequentially.
But UPDATE then INSERT does not work.
foreach by definition doesn't return any data; you can see from the API docs that its return type is Unit. I also don't recommend updating individual records: it will be too slow, because it rewrites the data for each record separately. Instead, use the MERGE operation, with something like this (it's not exact Scala, just the algorithm):
sourceTable
  .as("source")
  .merge(
    dfUpdates.as("updates"),
    "source.id = updates.id")
  .whenMatched
  .updateExpr(
    Map(
      "status" -> "'expired'"
    ))
  .execute()
See MERGE documentation for full details. Also instead of updating records, you can delete them.
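The set-based idea can be tried out without a Spark cluster. Below is a minimal sketch using Python's built-in sqlite3 as a stand-in for the Delta table, so it only shows the shape of the approach, not the Delta API. The table names `target` and `updates`, the `amount` column, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (trans_id TEXT, amount INTEGER, status TEXT);
    INSERT INTO target VALUES
        ('t1', 10, 'active'), ('t2', 20, 'active'), ('t3', 30, 'active');
    CREATE TABLE updates (trans_id TEXT, amount INTEGER);
    INSERT INTO updates VALUES ('t1', 11), ('t3', 33);
""")

with conn:
    # Expire every matching active record in one statement,
    # instead of issuing one UPDATE per id in a foreach loop.
    conn.execute("""
        UPDATE target SET status = 'expired'
        WHERE status = 'active'
          AND trans_id IN (SELECT trans_id FROM updates)
    """)
    # Re-enter the same ids with their new values. The ids are read from the
    # separate updates table, so the UPDATE above cannot change that list.
    conn.execute("""
        INSERT INTO target (trans_id, amount, status)
        SELECT trans_id, amount, 'active' FROM updates
    """)

rows = conn.execute(
    "SELECT trans_id, amount, status FROM target ORDER BY trans_id, status"
).fetchall()
```

The point of the sketch is that the list of ids to re-insert lives in its own table rather than being derived from the table that is being updated.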

Automatically fill row with value based on inserted id

I have a table where the user is able to insert the ID of a node that corresponds to a title elsewhere in the database. I want this title to be automatically inserted into the row after the user has chosen the ID.
This is my table:
I need to have the "SommerhusNavn" column be automatically filled with values based on the "SommerhusId" inserted.
I am using a third party to handle the CRUD functionality, where the user picks the ID from a dropdown. I already know in which table the title for the ID is located, I'm just not sure how to fill the row with the insert statement. Would I need to run a separate query for this to happen?
Edit: Solution
CREATE TRIGGER [dbo].[BlokeredePerioderInsert]
ON dbo.BlokeredePerioder
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE BlokeredePerioder SET SommerhusNavn = text FROM umbracoNode AS umbNode
    WHERE SommerhusId = umbNode.id
END
GO
Yes, you need to run an additional UPDATE query. Let's assume that you have a TitlesTable with columns ID and Title. Then it should look like:
UPDATE MyTable SET SommerhusNavn = Title FROM TitlesTable AS A
WHERE SommerhusId = A.ID
AND SommerhusNavn IS NULL -- optional: only fill rows that have no name yet
Perhaps I'm not understanding, but why can't you send the value across in the initial update?
Can you use a trigger on the database side?
Alternatively, you'll need to send an update across, following the insert.
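For what it's worth, the trigger idea can be tried out locally. This is a minimal sketch in Python with SQLite rather than SQL Server, so the trigger syntax differs from T-SQL (SQLite uses NEW instead of the inserted table). The table names `titles` and `blocked_periods`, the `period_id` key, and the sample titles are assumptions for illustration. It touches only the newly inserted row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blocked_periods (
        period_id INTEGER PRIMARY KEY,
        SommerhusId INTEGER,
        SommerhusNavn TEXT
    );
    -- After each insert, look up the title for the chosen id
    -- and write it into the row that was just inserted.
    CREATE TRIGGER fill_name AFTER INSERT ON blocked_periods
    BEGIN
        UPDATE blocked_periods
        SET SommerhusNavn = (SELECT title FROM titles WHERE id = NEW.SommerhusId)
        WHERE period_id = NEW.period_id;
    END;
    INSERT INTO titles VALUES (1, 'Strandhuset'), (2, 'Skovhytten');
""")

# The client only supplies the id; the trigger fills in the name.
conn.execute("INSERT INTO blocked_periods (SommerhusId) VALUES (2)")
names = conn.execute(
    "SELECT SommerhusId, SommerhusNavn FROM blocked_periods"
).fetchall()
```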

Using MERGE in SQL Server 2012 to insert/update data

I am using SQL Server 2012 and have two tables with identical structure. I want to insert new records from table 1 to table 2 if they don't already exist in table 2.
If they already exist, I want to update all of the existing records in table 2.
There are some 30 columns in my tables and I want to update all of them.
Can someone please help with this? I had a look at various links posted on the internet, but I don't quite understand what my statement should look like.
It's really not that hard....
You need:
a source table (or query) to provide data
a target table to merge it into
a condition on which those two tables are checked
a statement what to do if a match (on that condition) is found
a statement what to do if NO match (on that condition) is found
So basically, it's something like:
-- this is your TARGET table - this is where the data goes into
MERGE dbo.SomeTable AS target
-- this is your SOURCE table where the data comes from
USING dbo.AnotherTable AS source
-- this is the CONDITION they have to "meet" on
ON (target.SomeColumn = source.AnotherColumn)
-- if there's a match, so if that row already exists in the target table,
-- then just UPDATE whatever columns in the existing row you want to update
WHEN MATCHED THEN
UPDATE SET Name = source.Name,
OtherCol = source.SomeCol
-- if there's NO match, that is the row in the SOURCE does *NOT* exist in the TARGET yet,
-- then typically INSERT the new row with whichever columns you're interested in
WHEN NOT MATCHED THEN
INSERT (Col1, Col2, ...., ColN)
VALUES (source.Val1, source.Val2, ...., source.ValN);
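If you want to experiment with the matched / not matched logic without a SQL Server instance, here is a rough sketch in Python with SQLite. It does not use MERGE at all; it emulates the two branches with a plain UPDATE followed by an INSERT ... WHERE NOT EXISTS. The tables `source` and `target` and their columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, other TEXT);
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, other TEXT);
    INSERT INTO source VALUES (1, 'alpha-new', 'x'), (2, 'beta', 'y');
    INSERT INTO target VALUES (1, 'alpha-old', 'w'), (3, 'gamma', 'z');
""")

with conn:
    # WHEN MATCHED: overwrite target rows whose key exists in source.
    conn.execute("""
        UPDATE target
        SET name  = (SELECT s.name  FROM source AS s WHERE s.id = target.id),
            other = (SELECT s.other FROM source AS s WHERE s.id = target.id)
        WHERE EXISTS (SELECT 1 FROM source AS s WHERE s.id = target.id)
    """)
    # WHEN NOT MATCHED: insert source rows whose key is missing from target.
    conn.execute("""
        INSERT INTO target (id, name, other)
        SELECT s.id, s.name, s.other FROM source AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
    """)

merged = conn.execute("SELECT id, name, other FROM target ORDER BY id").fetchall()
```

Row 1 is updated, row 2 is inserted, and row 3 (only in the target) is left alone, which matches a MERGE without a NOT MATCHED BY SOURCE clause.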

Delete record from DB warehouse after matching it with staging for the last 5 days

I have a data warehouse and a staging DB. The staging DB gets a new file every day over FTP, which is loaded into the staging DB. It is then inserted/updated/deleted in the DB warehouse. However, the staging file has only the last 5 days' records, on a rolling basis. That is, today's file would cover 8/8 to 8/13, but tomorrow's file would have data from 8/9 to 8/14, while the DB warehouse has all the history.
When I use
WHEN NOT MATCHED BY SOURCE THEN DELETE
it will delete all the records from the DB warehouse which do not match the staging. This would wipe out all the history. I want the script to look back only 5 days and check whether those records are missing from the source. Here is the query:
MERGE INTO [x].[y].[z] AS Target
USING [a].[y].[z] AS Source
ON Target.[PROBLEM_ID] = Source.[PROBLEM_ID]
WHEN MATCHED THEN
    UPDATE SET
        Target.[CUSTNO] = Source.[CUSTNO],
        Target.[SALESID] = Source.[SALESID],
        Target.[PCODE] = Source.[PCODE]
WHEN NOT MATCHED BY TARGET THEN
    INSERT
        ([CUSTNO]
        ,[SALESID]
        ,[PCODE])
    VALUES
        (Source.[CUSTNO]
        ,Source.[SALESID]
        ,Source.[PCODE])
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
Can I get a constraint on the delete statement to go only 5 days back on the DB warehouse? If yes, please help me with the constraint code.
I haven't tried this, but the documentation says that you can add an "AND" clause to "WHEN NOT MATCHED BY SOURCE". This would let you do this:
WHEN NOT MATCHED BY SOURCE AND Your_Date_Field > DateAdd(Day,-5,GetDate())
THEN DELETE;
Note that if your dates include times, you might need to truncate the time before you compare the dates.
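To check the windowed delete on toy data, here is a small sketch in Python with SQLite, not T-SQL. It assumes a date column on the warehouse table (called `record_date` here, a made-up name), and the dates and rows are invented. "Today" is fixed so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (problem_id INTEGER PRIMARY KEY, record_date TEXT);
    CREATE TABLE staging (problem_id INTEGER PRIMARY KEY, record_date TEXT);
    INSERT INTO warehouse VALUES
        (1, '2014-07-01'),   -- old history, absent from staging
        (2, '2014-08-10'),   -- recent, still in staging
        (3, '2014-08-11');   -- recent, gone from staging
    INSERT INTO staging VALUES (2, '2014-08-10');
""")

today = "2014-08-13"  # hypothetical fixed date for the example
with conn:
    # Delete only rows inside the 5-day window that no longer exist in staging.
    conn.execute("""
        DELETE FROM warehouse
        WHERE record_date >= date(?, '-5 days')
          AND NOT EXISTS (SELECT 1 FROM staging AS s
                          WHERE s.problem_id = warehouse.problem_id)
    """, (today,))

remaining = [r[0] for r in conn.execute(
    "SELECT problem_id FROM warehouse ORDER BY problem_id")]
```

Row 3 is removed because it is recent and missing from staging; row 1 survives because it is older than the window.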
Here's basically what you want to do. You use a common table expression to build up a more complex set to merge against. You can also do an "and" on when matched and when not matched, but I find it cleaner to start first with a data set built to purpose in a cte.
Peace
Katherine
with [merge_helper] ([id], [custno], [salesid], [pcode])
as (select [source].[id],
           [source].[custno],
           [source].[salesid],
           [source].[pcode]
    from [a].[y].[z] as [source]
    left join [x].[y].[z] as [target]
        on [target].[id] = [source].[id]
    union
    select [target].[id],
           [target].[custno],
           [target].[salesid],
           [target].[pcode]
    from [x].[y].[z] as [target]
    where [target].[id] not in (select [id]
                                from [a].[y].[z]))
merge into [x].[y].[z] as target
using [merge_helper] as source
on target.[id] = source.[id]
when matched then
update set target.[custno] = source.[custno],
target.[salesid] = source.[salesid],
target.[pcode] = source.[pcode]
when not matched by target then
insert ([custno],
[salesid],
[pcode])
values (source.[custno],
source.[salesid],
source.[pcode])
when not matched by source then
delete;

How to add lines from text file to sqlite db rows that already exist?

I have 12 columns with +/- 2000 rows in a sqlite DB.
Now I want to add a 13th column with the same amount of rows.
If I import the text from a CSV file, it gets added after the existing rows (so now I have a 4000-row table).
How can I avoid adding it underneath these rows?
Do I need to create a script that runs through each row of the table and adds the text from the CSV file for each row?
If you have the code that imported the original data, and if the data has not changed in the meantime, you could just drop the table and reimport it.
Otherwise, you indeed have to create a script that looks up the corresponding record in the table and updates it.
You could also import the new data into a temporary table, and then copy the values over with a command like this:
UPDATE MyTable
SET NewColumn = (SELECT NewColumn
FROM TempTable
WHERE ID = MyTable.ID)
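The "script that looks up the corresponding record and updates it" could look roughly like this in Python. It is a sketch under the assumption that the CSV carries a key column (`ID` here) matching the table; if the file only has the new values in row order, you would need to pair them with the keys some other way. The table contents and the in-memory CSV are stand-ins for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO MyTable VALUES (1, 'first'), (2, 'second');
""")

# Stand-in for the CSV file; with a real file use open(path, newline="").
csv_file = io.StringIO("ID,NewColumn\n1,red\n2,blue\n")

with conn:
    # Add the 13th column once, then fill it row by row using the key.
    conn.execute("ALTER TABLE MyTable ADD COLUMN NewColumn TEXT")
    pairs = [(row["NewColumn"], int(row["ID"])) for row in csv.DictReader(csv_file)]
    conn.executemany("UPDATE MyTable SET NewColumn = ? WHERE ID = ?", pairs)

result = conn.execute(
    "SELECT ID, Name, NewColumn FROM MyTable ORDER BY ID"
).fetchall()
```

The row count stays the same because the script only issues UPDATEs, never INSERTs.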
I ended up using RazorSQL, a great program.
http://www.razorsql.com/