Apache Ignite: how to insert into table with IDENTITY key (SQL Server)

I have a table in SQL Server where the primary key is autogenerated (identity column), i.e.
CREATE TABLE TableName
(
table_id INT NOT NULL IDENTITY (1,1),
some_field VARCHAR(20),
PRIMARY KEY (table_id)
);
Since table_id is an autogenerated column, I do not set any argument for table_id when I build the SqlFieldsQuery INSERT statement:
sql = new SqlFieldsQuery("INSERT INTO TableName (some_field) VALUES (?)");
cache.query(sql.setArgs("str"));
However, at runtime I get the following error:
Exception in thread "main" javax.cache.CacheException: class org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to execute DML statement [stmt=INSERT INTO TableName (some_field) VALUES (?), params=["str"]]
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:807)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:765)
...
Caused by: class org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to execute DML statement [stmt=INSERT INTO TableName (some_field) VALUES (?), params=["str"]]
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1324)
at org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1815)
at org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1813)
at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2293)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1820)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:795)
... 5 more
Caused by: class org.apache.ignite.IgniteCheckedException: Key is missing from query
at org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.createSupplier(UpdatePlanBuilder.java:331)
at org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForInsert(UpdatePlanBuilder.java:196)
at org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForStatement(UpdatePlanBuilder.java:82)
at org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.getPlanForStatement(DmlStatementsProcessor.java:438)
at org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFields(DmlStatementsProcessor.java:164)
at org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFieldsDistributed(DmlStatementsProcessor.java:222)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1321)
... 11 more
This is how I planned to implement the insertion, because getting the max table_id from the cache, incrementing it and inserting seemed more tedious. I thought I could omit table_id from the INSERT and let SQL Server generate the primary key, but it doesn't seem to work like this.
Can you please tell me how this is typically implemented in Ignite? I checked the ignite-examples, but unfortunately they are too simple (i.e. fixed keys only, like 1 or 2).
Moreover, how does Ignite support the use of sequences?
I am using ignite-core 2.2.0. Any help is appreciated! Thank you.

It's true that, as of now, auto-increment fields are not supported.
As a workaround, you could generate IDs manually, for example with Ignite's ID generator (an atomic sequence).
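A minimal sketch of that approach, assuming a node started with Ignition.start(); the sequence name "table_id_seq" is made up for illustration:
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicSequence;
import org.apache.ignite.Ignition;

public class IdGeneratorExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Cluster-wide sequence: initial value 0, created on first use ('true' flag).
            IgniteAtomicSequence seq = ignite.atomicSequence("table_id_seq", 0, true);

            // Each call returns a unique, monotonically increasing value across the cluster.
            long nextId = seq.incrementAndGet();
            System.out.println("Next key: " + nextId);
        }
    }
}
The generated value can then be supplied explicitly as the table_id argument of the INSERT.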

Ignite doesn't support identity columns [1] yet.
It may be non-obvious, but the Ignite SQL layer is built on top of a key-value store, which can in turn be backed by a CacheStore. Your SQL query will never go through to the CacheStore as is.
Ignite internals will execute your query and save the data in the cache; only then is the update propagated to the CacheStore, which creates a new SQL query for your SQL Server.
So Ignite needs the identity column value (which is actually the key) to be known before the data is saved in the cache.
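In other words, the INSERT from the question has to list the key column and pass a value for it explicitly. A rough sketch building on the question's snippet, assuming the key comes from a generator such as the atomic sequence above (cache and nextId are placeholders from the surrounding discussion):
// The key column must appear in the column list; Ignite will not generate it.
SqlFieldsQuery sql = new SqlFieldsQuery(
    "INSERT INTO TableName (table_id, some_field) VALUES (?, ?)");
cache.query(sql.setArgs(nextId, "str")).getAll();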
[1] https://issues.apache.org/jira/browse/IGNITE-5625

Related

SQL Server constraint enforcement being violated for split seconds

I have an on-premises table with about 21 million rows and a primary key constraint, and when I search that table, there are no duplicates. This table is in an OLTP application database that is constantly moving.
I have the exact same table in Azure which has the same primary key constraint. This table is not an application table, it's just a copy of the one that is on-premise (the goal is to use this one for ad hoc queries, as a source for other systems, etc.).
When I use Azure Data Factory to copy all columns from the on-premises table to the table in Azure, it returns a violation of the primary key constraint. No matter how many times I run this Data Factory pipeline, it comes back with a primary key violation for duplicate keys (though the offending keys are always different).
So I dropped the primary key constraint in Azure and ran the pipeline again, and sure enough, duplication exists.
Upon investigation, it appears that the on-premises database is inserting a new record and then updating the old record to inactivate it. So for a fraction of a second there are two active rows, which ADF grabs and then tries to insert into the table in Azure, which of course fails because of duplicate primary keys.
Now to the best of my knowledge, this shouldn't be possible. You can't insert a new row that violates the primary key constraint. But ADF seems to be grabbing all the data and some of those rows are mid-flight where the insert has happened and the update to inactivate the old row hasn't happened yet.
For those who are curious, the insert and the update of the old row happen within less than a second of each other... typically 10-20 microseconds. I don't know how this is possible and I don't know how to fix it (because I can't modify the application code). The on-premises database is SQL Server 2000, and the destination is an Azure SQL database.
Try the READPAST hint. It should not select any rows that are currently locked.
SELECT * FROM yourtable WITH (readpast)
Since you have created_date and updated_date columns, you can also select only rows older than 5 seconds to avoid the duplication.
select * from yourtable where created_date<=dateadd(second,-5,getdate()) and updated_date<=dateadd(second,-5,getdate());
You need to enable fault tolerance in the Azure Data Factory pipeline.
Copy data from a Source SQL to a Sink SQL database. A primary key is defined in the sink SQL database, but no such primary key is defined in the source SQL server. The duplicated rows that exist in the source cannot be copied to the sink. Copy activity copies only the first row of the source data into the sink. The subsequent source rows that contain the duplicated primary key value are detected as incompatible and are skipped.
To skip the incompatible rows, set "enableSkipIncompatibleRow": true in the copy activity's JSON definition.
Please Refer: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
If it is possible to modify your application, check for the primary key before insert or update using the EXISTS() function.
Example:
IF EXISTS(SELECT * FROM Table_Name WHERE primary key condition)
BEGIN
UPDATE Table_Name
SET Col_Name= value
WHERE condition
END
ELSE
BEGIN
INSERT INTO Table_Name ( col_Name1, col_Name2, ... )
VALUES ( '', '', ... )
END

Nifi generates wrong column name for insert constraint

I use NiFi 1.13.2 to build an ETL process between Oracle and PostgreSQL.
There is an ExecuteSQL processor for retrieving data from Oracle and a PutDatabaseRecord processor for inserting data into the PostgreSQL table. The PutDatabaseRecord processor is configured with the INSERT_IGNORE option. The name of the key column in both tables is DOC_ID, but during the insert operation NiFi for some reason generates a wrong column name, as can be seen from the following line: ON CONFLICT (DOCID) DO NOTHING
Here is the whole error:
Failed to put Records to database for StandardFlowFileRecord[uuid=7ff8189a-2685-4f
9a-bab6-d0bc9b4f7ae0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1623310567664-311, container=default, section=311], offset=604245, length=610377],offset=211592,name=7ff8189a-2685-4f9a-bab6-d0bc9b4f7ae0,size=6106].
Routing to failure.: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO src.rtl_sales(doc_id, complete, out_sale_sum_disc, kpp_num, org_id, kpp_status, im) VALUES (1830335807, '2020-06-12 +03', '530.67'::numeric, 565900, 62, 4, NULL
) ON CONFLICT (DOCID) DO NOTHING was aborted: ERROR: column "docid" does not exist
What is wrong with my setup, or with NiFi?
OK, so the fix is to set Translate Field Names -> False in PutDatabaseRecord.

Repeatable write to SQL Sink using Azure Data Factory is failing

Scenario: I'm copying data from Azure Table Storage to an Azure SQL DB using an upsert stored procedure like this:
CREATE PROCEDURE [dbo].[upsertCustomer] @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
MERGE customerTransactionstable WITH (HOLDLOCK) AS target_sqldb
USING @customerTransaction AS source_tblstg
ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
target_sqldb.Timestamp = source_tblstg.Timestamp)
WHEN MATCHED THEN
UPDATE SET
AccountId = source_tblstg.AccountId,
TransactionId = source_tblstg.TransactionId,
CustomerName = source_tblstg.CustomerName
WHEN NOT MATCHED THEN
INSERT (
AccountId,
TransactionId,
CustomerName,
CustomerReferenceId,
Timestamp
)
VALUES (
source_tblstg.AccountId,
source_tblstg.TransactionId,
source_tblstg.CustomerName,
source_tblstg.CustomerReferenceId,
source_tblstg.Timestamp
);
END
GO
where customerReferenceId & Timestamp constitute the composite key for the CustomerTransactionstable
However, when I update the rows in my source (Azure Table) and rerun the Azure Data Factory pipeline, I see this error:
"ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed with the following error: &apos;Violation of
PRIMARY KEY constraint &apos;PK_CustomerTransactionstable&apos;.
Cannot insert duplicate key in object
&apos;dbo.CustomerTransactionstable&apos;. The duplicate key value is
(Dec 31 1990 12:49AM, ABCDEFGHIGK).\r\nThe statement has been
terminated.&apos;,Source=.Net SqlClient Data
Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation
of PRIMARY KEY constraint &apos;PK_CustomerTransactionstable&apos;"
Now, I have verified that there's only one row in both the source and sink with a matching primary key, the only difference is that some columns in my source row have been updated.
This link in the Azure documentation speaks about repeatable copying; however, I don't want to delete rows for a time range from my destination before inserting any data, nor do I have the ability to add a new sliceIdentifierColumn to my existing table or make any schema change.
Questions:
Is there something wrong with my upsert logic? If yes, is there a better way to do upsert to Azure SQL DBs?
If I choose to use a SQL cleanup script, is there a way to delete only those rows from my Sink that match my primary key?
Edit:
This has now been resolved.
Solution:
The primary key violation will only occur if it's trying to insert a record which already has a matching primary key. In my case, although there was just one record in the sink, the condition on which the merge was being done wasn't getting satisfied, due to a mismatch between datetime and datetimeoffset fields.
Have you tried using ADF Mapping Data Flows instead of coding it through a stored procedure? It may make upserts to SQL much easier for you.
With an Alter Row transformation, you can perform Upsert, Update, Delete, Insert via UI settings and picking a PK: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
You would still just need a Copy Activity prior to your Data Flow activity to copy the data from Table Storage. Put it in a Blob folder and then Data Flow can read the Source from there.

Flyway: Can't insert the value NULL in 'installed_on', table 'schema_version' column does not allow nulls. INSERT fails

Using flyway-core 4.1.2 for database migration. Added a new DDL file for Flyway to execute. Flyway executes the DDL correctly and makes the corresponding changes to tables and columns (we're adding a table and altering some previous columns in the new DDL). However, Flyway fails to register this attempt in the schema_version table; I get the following error:
Current version of schema [dbo]: 2.1
Unable to insert row for version '3.0' in metadata table [dbo].[schema_version]
Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException:
Error creating bean with name 'flywayInitializer' defined in class path resource [org/springframework/boot/autoconfigure/flyway/FlywayAutoConfiguration$FlywayConfiguration.class]: Invocation of init method failed; nested exception is org.flywaydb.core.internal.dbsupport.FlywaySqlException:
Message : Cannot insert the value NULL into column 'installed_on', table 'dbo.schema_version'; column does not allow nulls. INSERT fails.
Flyway successfully executes the DDL; however, it fails to log it to the schema_version table due to the NULL in installed_on. Any help will be greatly appreciated. Thanks in advance!
In my case the error was that the database table flyway_schema_history had the column installed_on defined as DATETIME NOT NULL, while it should have been DATETIME DEFAULT GETDATE() NOT NULL.
The issue was resolved when I manually altered the column to include the default value definition.
My company has a number of databases which were created over the last 3 years, and I have noticed that the oldest and the newest of them have the column defined properly, while the ones from around 1.5 years ago have the column defined without the default. Perhaps it was a bug in some older versions of Flyway?
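For reference, a rough JDBC sketch of that manual fix on SQL Server; the connection string and constraint name are placeholders, and the table name should match your metadata table (schema_version on Flyway 4.x, flyway_schema_history on newer versions):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FixInstalledOnDefault {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; point it at the database Flyway migrates.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=mydb;user=sa;password=secret");
             Statement st = con.createStatement()) {
            // Add a default so Flyway's metadata insert no longer fails on NULL installed_on.
            st.execute("ALTER TABLE dbo.schema_version "
                    + "ADD CONSTRAINT DF_schema_version_installed_on "
                    + "DEFAULT GETDATE() FOR installed_on");
        }
    }
}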

How to tell HSQLDB to allow identity definition for SERIAL?

I'm currently writing tests for a Spring Boot application which is using a PostgreSQL database. During tests I want to replace the database with some in-memory variant like H2 or HSQLDB. Sadly, neither behaves the same as the PostgreSQL database.
I have migrations that look like
CREATE TABLE foo(id BIGSERIAL PRIMARY KEY, ...)
This results in HSQLDB telling me:
SQL State : 42525
Error Code : -5525
Message : identity definition not allowed: FOO_ID
So apparently creating the matching sequence for the primary key is forbidden. Is there a way to tell HSQLDB to accept this?
You need to set PostgreSQL compatibility mode in HSQLDB.
SET DATABASE SQL SYNTAX PGS TRUE
Your table definition is then accepted and converted internally to the SQL Standard equivalent.
CREATE TABLE FOO(ID BIGINT GENERATED BY DEFAULT AS IDENTITY(START WITH 1) NOT NULL PRIMARY KEY, ..
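For an in-memory test database, the same compatibility mode can also be switched on through the sql.syntax_pgs connection property in the JDBC URL, so the migration runs unchanged. A minimal sketch (database name and credentials are just the usual HSQLDB defaults, shown for illustration):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbPgsModeExample {
    public static void main(String[] args) throws Exception {
        // sql.syntax_pgs=true enables PostgreSQL compatibility, so BIGSERIAL is accepted.
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:mem:testdb;sql.syntax_pgs=true", "SA", "");
             Statement st = con.createStatement()) {
            st.execute("CREATE TABLE foo(id BIGSERIAL PRIMARY KEY, name VARCHAR(50))");
        }
    }
}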