Manage auto increment column values in Mosaic Decisions

I have a table in which an auto-incremental sequence is defined for a key column.
Problem statement:
Source_table has a column ID with values 101, 102, 103 and so on, while Destination_table has a column ID with its own incremental sequence: 1201, 1202, 1203, and so on.
Now, in Mosaic I am able to read the data from the Source_table, but while writing into the Destination_table, the values of Destination_table.ID get overwritten by the Source_table.ID values.
Is there any way that I can preserve the sequence of my Destination_table in Mosaic?

Yes. You can use the Skip Insert and Skip Update functionalities available in the writer node for the Upsert write mode; they skip the sequence values coming from the input and preserve the sequence of the destination column.
To achieve this, in the Writer Node Configuration menu, select Upsert as the Write Mode and drag the columns whose sequence you wish to preserve into Skip Insert and Skip Update.
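For intuition only, here is a plain-SQL sketch of the same idea (this is not Mosaic syntax; the table and column names are hypothetical): in an upsert, the destination ID is simply left out of both the insert and the update column lists, so the destination's own sequence keeps assigning it.
-- hypothetical PostgreSQL-style upsert; dest_table.id is backed by its own sequence
INSERT INTO dest_table (business_key, payload)      -- id is skipped on insert
SELECT business_key, payload FROM source_table
ON CONFLICT (business_key)
DO UPDATE SET payload = EXCLUDED.payload;           -- id is skipped on update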

Related

Liquibase run part of a changeset?

I have the changeset below, but I may want to add additional inserts in the future. They could number in the hundreds or thousands.
--changeset author:changesetid endDelimiter:;
INSERT INTO "MY_TABLE" (id, name) VALUES (1, 'bob');
INSERT INTO "MY_TABLE" (id, name) VALUES (2, 'jim');
INSERT INTO "MY_TABLE" (id, name) VALUES (3, 'mick');
I can't execute previous inserts because that would throw errors about that data already existing. Similarly, I don't want to end up with dozens or hundreds of individual changesets, each with its own combination of inserts, over time. I would like everything consolidated under one changeset.
Maybe I could use some type of precondition with this attribute, but I wouldn't want to keep checking the table size (or have to update the precondition value) every time I need to update the changeset.
--preconditions onFail:WARN
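For context, a complete precondition of that kind in Liquibase formatted SQL might look like the sketch below (the expectedResult value and the onFail action are only illustrative; as the question notes, such a hard-coded check would need maintaining as the data grows):
--changeset author:changesetid endDelimiter:;
--preconditions onFail:MARK_RAN
--precondition-sql-check expectedResult:0 SELECT COUNT(*) FROM "MY_TABLE"
INSERT INTO "MY_TABLE" (id, name) VALUES (1, 'bob');
INSERT INTO "MY_TABLE" (id, name) VALUES (2, 'jim');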
How about creating 2 changesets:
1st: insert your data into e.g. MY_TABLE_TMP (a table without a PK, allowing duplicates); add a TRUNCATE at the beginning of this changeset and the new inserts below it.
2nd: your final changeset (an SQL script); use a LOOP to get the data from MY_TABLE_TMP and insert it with dynamic SQL into MY_TABLE; when an error occurs, ignore it and continue the LOOP.
Rafał
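A rough sketch of what those two changesets could look like, assuming an Oracle-style PL/SQL block (the changeset ids, the runAlways choice for the loader, and the DUP_VAL_ON_INDEX handling are illustrative, not taken from the question):
--changeset author:stage-my-table runOnChange:true endDelimiter:;
TRUNCATE TABLE MY_TABLE_TMP;
INSERT INTO MY_TABLE_TMP (id, name) VALUES (1, 'bob');
INSERT INTO MY_TABLE_TMP (id, name) VALUES (2, 'jim');

--changeset author:load-my-table runAlways:true splitStatements:false
BEGIN
  FOR r IN (SELECT id, name FROM MY_TABLE_TMP) LOOP
    BEGIN
      EXECUTE IMMEDIATE 'INSERT INTO MY_TABLE (id, name) VALUES (:1, :2)'
        USING r.id, r.name;
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN NULL;  -- row already present, ignore and continue
    END;
  END LOOP;
END;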
Does your target database support an UPSERT or MERGE statement? If so, you could write your SQL script so that if the record is already found in the table nothing happens, and if it is not found the record is inserted. This would make your script more complicated, but it should achieve what you are looking for if you set the changeset to runOnChange.
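A minimal sketch of that idea, assuming an Oracle-style MERGE and Liquibase's formatted SQL syntax (the changeset id is made up); new rows are appended to the USING list over time, runOnChange re-executes the changeset, and rows already present are left untouched:
--changeset author:merge-my-table runOnChange:true endDelimiter:;
MERGE INTO MY_TABLE t
USING (
  SELECT 1 AS id, 'bob' AS name FROM dual UNION ALL
  SELECT 2, 'jim' FROM dual UNION ALL
  SELECT 3, 'mick' FROM dual
) src
ON (t.id = src.id)
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (src.id, src.name);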

Vertica sql overwrite data on insert

How do I overwrite the table each time there is an INSERT statement in Vertica?
Consider:
INSERT INTO table1 VALUES ('My Value');
This will give, say:
| MyCol    |
------------
| My Value |
How do I overwrite the same table on the next insert statement, say
INSERT INTO table1 VALUES ('My Value2');
so that it gives:
| MyCol     |
-------------
| My Value2 |
You can either DELETE or TRUNCATE your table; there is no overwrite method in Vertica. Use TRUNCATE, since you want one and only one value.
INSERT INTO table1 VALUES ('My Value');
TRUNCATE TABLE table1;
INSERT INTO table1 VALUES ('My Value2');
Or use DELETE inside a transaction (if the connection gets lost before you commit, the change will not take effect).
Rollback
An individual statement returns an ERROR message. In this case, Vertica rolls back the statement.
DDL errors, systemic failures, deadlocks, and resource constraints return a ROLLBACK message. In this case, Vertica rolls back the entire transaction.
INSERT INTO table1 VALUES ('My Value');
DELETE FROM table1
WHERE MyCol !='My Value2';
INSERT INTO table1 VALUES ('My Value2');
COMMIT;
I might suggest that you don't do such a thing.
The simplest method is to populate the table with a row, perhaps:
insert into table1 (value)
values (null);
Then use update, not insert:
update table1
set value = ?;
That fixes your problem.
If you insist on using insert, you could insert values with an identity column and use a view to get the most recent value:
create table table1 (
table1_id identity(1, 1),
value varchar(255)
);
Then access the table using a view:
create view v_table1 as
select value
from table1
order by table1_id desc
limit 1;
If the view becomes inefficient, you can periodically empty the table.
One advantage of this approach is that the table is never empty and not locked for very long -- so it is generally available. Deleting rows and inserting rows can be tricky in that respect.
If you really like triggers, you can use a table as above. Then use a trigger to update the row in another table that has a single row. This also maximizes availability, without overhead for fetching the most recent value.
If it is a single-row table, then there's no risk whatsoever in filling it with a single row that can be NULL, as @Gordon Linoff suggests.
Internally, you should be aware that Vertica always implements an UPDATE as a DELETE plus an INSERT: it adds a delete vector for the old row and then inserts the new row.
That is no problem with a single-row table: the Tuple Mover (the background daemon process that wakes up every 5 minutes to de-fragment the internal storage, to put it simply) will create a single Read Optimized Storage (ROS) container out of the previous value, the delete vector pointing to that previous value (thus deactivating it), and the newly inserted value it was updated to.
So:
CREATE TABLE table1 (
mycol VARCHAR(16)
) UNSEGMENTED ALL NODES; -- a small table, replicate it across all nodes
-- now you have an empty table
-- for the following scenario, I assume you commit the changes every time, as other connected
-- processes will want to see the data you changed
-- then, only once:
INSERT INTO table1 VALUES(NULL::VARCHAR(16));
-- now, you get a ROS container for one row.
-- Later:
UPDATE table1 SET mycol='first value';
-- a DELETE vector is created to mark the initial "NULL" value as invalid
-- a new row is added to the ROS container with the value "first value"
-- Then, before 5 minutes have elapsed, you go:
UPDATE table1 SET mycol='second value';
-- another DELETE vector is created, in a new delete-vector-ROS-container,
-- to mark "first value" as invalid
-- another new row is added to a new ROS container, containing "second value"
-- Now 5 minutes have elapsed since the start, the Tuple Mover sees there's work to do,
-- and:
-- - it reads the ROS containers containing "NULL" and "first value"
-- - it reads the delete-vector-ROS containers marking both "NULL" and "first value"
-- as invalid
-- - it reads the last ROS container containing "second value"
-- --> and it finally merges them all into a brand new ROS container that only contains
-- "second value"; at the end, the four other ROS containers are deleted.
With a single-row table, this works wonderfully. Don't do it like that for a billion rows.

Pentaho: execute insert only if there are no duplicates

Basically, I want to insert a set of rows only if there are no changes compared to the target rows.
I have implemented a blocking step to wait for all rows to be processed before proceeding. After this I want to add a condition that checks whether any data has changed: if it has, abort the process; otherwise insert all rows.
Any suggestions?
This seems to be very easy with only 2 steps.
Try this:
Step 1: Use a Database lookup step. Look up on the key columns and retrieve the columns you want to compare, including the key fields, from the target table.
Step 2: Use a Filter rows step. Compare all the fields you retrieved from the DB lookup with the fields from the stream / table / source input, e.g. id (from source input) = id (from target) and name (from source input) = name (from target). Point the false condition to the target Table output and the true condition to a Dummy step for testing.
Note: if you want to populate the table key with max + 1, use a Combination lookup/update step instead of the Table output step.
If I understand your question properly, you want to insert rows only if they are identical to the rows in the target? Would that not result in a PK violation?
Anyway, from your screenshot you seem to have used a Merge Rows (Diff) step, which will give you rows flagged with a 'new', 'changed', 'identical' or 'deleted' status.
From here you want to check for two things: changed or identical.
If any row is changed you have to abort, and if the rows are identical you will insert.
So use a simple Filter rows step with status = 'identical' as the true condition, which in your case points to the insert flow.
The false condition goes to the Abort step.
Do note, though, that even if a single row is found to be changed, the entire transformation will be aborted.
If I understand your use case properly, I would not use the "Table output" step for this kind of move.
"Table output" is a great step for data warehousing, where you usually insert data to tables which are supposed to be empty and are part of a much broader process.
Alternatively, I would use "Execute SQL script" to tweak the INSERT to your own needs.
Consider this to be your desired SQL statement (PostgreSQL syntax in this example):
INSERT INTO ${TargetTable}
(contact_id, request_id, event_time, channel_id)
SELECT '?', '?', '?', '?'
WHERE
NOT EXISTS (
SELECT 1 FROM ${TargetTable}
WHERE contact_id = '?' AND
-- and so on...
);
Then, in the step configuration:
Get the required fields for mapping (they will be referenced, in order, by the question marks as an argument sequence);
Check the "Variable substitution" check box in case you intend to use variables which were loaded and/or created along the broader process.
SQL-performance-wise, it may not be the most efficient way, but it looks to me like a better implementation for your use case.
The simplest way to do that is to use the Insert/Update step. There is no need to write any query: if the row exists it is updated; if it does not exist, a new row is created.

pentaho spoon/kettle merge row diff step

I want to update a new database's table based on an old one.
This is the data in the old table:
id,type
1,bla
2,bla bla
The new table is empty. Currently I have the two Table input steps connected to a Merge rows (diff) step, which is then funnelled into a Synchronize after merge step.
The issue is that the flag field gets set to 'deleted' because it cannot find any values in the compare stream (duh, it's an empty table!). Is my logic wrong, or should it not work like this:
not found in compare stream --> set flag to needs insert --> insert in compare table ??
How do I do this?
I set the "Insert when value equal" field in the Advanced tab of the Synchronize after merge step to "deleted". It now inserts the rows into the table.

Why are sequences not updated when COPY is performed in PostgreSQL?

I'm inserting bulk records using the COPY statement in PostgreSQL. What I realize is that the sequence IDs are not getting updated, and when I try to insert a record later it throws a duplicate sequence ID error. Should I manually update the sequence number to the number of records after performing COPY? Isn't there a solution that, while performing COPY, just increments the sequence variable, that is, the primary key field of the table? Please clarify this for me. Thanks in advance!
For instance, if I insert 200 records, COPY does fine and my table shows all the records. When I manually insert a record later, it gives a duplicate sequence ID error. This implies that the sequence wasn't incremented during COPYing the way it is during normal INSERTing. Instead of manually setting the sequence to the max number of records, isn't there a mechanism to tell the COPY command to increment the sequence IDs during its bulk copy?
You ask:
Should I manually update the sequence number to get the number of records after performing COPY?
Yes, you should, as documented here:
Update the sequence value after a COPY FROM:
BEGIN;
COPY distributors FROM 'input_file';
SELECT setval('serial', max(id)) FROM distributors;
END;
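If you would rather not hard-code the sequence name, PostgreSQL's pg_get_serial_sequence() can look it up from the table and column names (shown here with the same distributors table and an id column as in the example above; adjust to your schema):
BEGIN;
COPY distributors FROM 'input_file';
SELECT setval(pg_get_serial_sequence('distributors', 'id'), max(id)) FROM distributors;
END;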
You write:
the sequence wasn't incremented during COPYing the way it is during normal INSERTing
But that is not so! :) When you perform a normal INSERT, you typically do not specify an explicit value for the SEQUENCE-backed primary key. If you did, you would run into the same problems as you are having now:
postgres=> create table uh_oh (id serial not null primary key, data char(1));
NOTICE: CREATE TABLE will create implicit sequence "uh_oh_id_seq" for serial column "uh_oh.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "uh_oh_pkey" for table "uh_oh"
CREATE TABLE
postgres=> insert into uh_oh (id, data) values (1, 'x');
INSERT 0 1
postgres=> insert into uh_oh (data) values ('a');
ERROR: duplicate key value violates unique constraint "uh_oh_pkey"
DETAIL: Key (id)=(1) already exists.
Your COPY command, of course, is supplying an explicit id value, just like the example INSERT above.
I realize that this is a bit old but maybe someone might still be looking for the answer.
As others said, COPY works in a similar way to INSERT, so when inserting into a table that has a sequence, you simply don't mention the sequence field at all and it is taken care of for you. COPY works in exactly the same way. But doesn't COPY require ALL fields in the table to be present in the text file? The correct answer is NO, it doesn't, although that is the default behavior.
To COPY and leave the sequence out do the following:
COPY $YOURSCHEMA.$YOURTABLE(col1,col2,col3,col4) FROM '$your_input_file' DELIMITER ',' CSV HEADER;
No need to manually update the sequence afterwards; it works as intended and in my testing is just about as fast.
You could copy to a sister table, then insert into mytable select * from sister - that would increment the sequence.
If your loaded data has the id field, don't select it for the insert: insert into mytable (col1, col2, col3) select col1, col2, col3 from sister
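A minimal sketch of that approach, with hypothetical table and column names and assuming mytable.id is backed by a sequence default:
-- hypothetical staging table that simply has no id column
CREATE TABLE sister (col1 text, col2 text, col3 text);
COPY sister (col1, col2, col3) FROM '/path/to/input_file' CSV;
-- id is omitted from the column list, so the sequence assigns it
INSERT INTO mytable (col1, col2, col3)
SELECT col1, col2, col3 FROM sister;
DROP TABLE sister;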