In MySQL you can use the INSERT IGNORE syntax so that, when a duplicate row is inserted, the statement does not throw an error but simply ignores that row.
I would like to achieve the same in Presto, working on a Hive database, if possible.
I know Hive is not a true relational database in that sense, and the documentation for the INSERT statement in Presto is very basic.
I would just like to know if there is a simple workaround, as all I can think of is first doing a SELECT, looping through the results with a cursor, and inserting.
Before Hive 3 there is no concept of unique constraints, and even in Hive 3 the constraints are not enforced, to the best of my knowledge.
Therefore the Presto Hive connector does not enforce any unique constraints either, so your INSERT query will never fail when you insert duplicate rows; they will just be stored as independent copies of the data.
If you want to maintain uniqueness, this needs to be handled externally, on the application level.
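If filtering at write time is enough, one workaround is to exclude rows whose key already exists in the target table. A minimal sketch, assuming hypothetical tables target_table and source_table keyed on an id column (this is not atomic, so concurrent writers can still introduce duplicates):

-- skip source rows whose id is already present in the target
INSERT INTO target_table
SELECT s.*
FROM source_table s
WHERE NOT EXISTS (
    SELECT 1 FROM target_table t WHERE t.id = s.id
)

Depending on the Presto version, the correlated NOT EXISTS may need to be rewritten as a LEFT JOIN ... WHERE t.id IS NULL anti-join.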
Related
I am trying to do a full load of a table in BigQuery daily, as part of an ETL process. The target table has a dummy partition column of type integer and is clustered. I want the statement to be atomic, i.e. either it completely overwrites the old data with the new data, or it rolls back to the old data if it fails for any reason partway through, and it should keep serving user queries with the old data until the overwrite completes.
One way of doing this is delete and insert, but BigQuery does not support multi-statement transactions.
I am thinking of using the statement below. Please let me know if this is atomic.
CREATE OR REPLACE TABLE table_1
PARTITION BY dummy_int
CLUSTER BY dummy_column
AS SELECT col1, col2, col3 FROM stage_table1
Hello everyone, I wanted to know if it is possible to insert new values into multiple tables (in my case 22) in a single query. Once they are inserted, I want to delete these 22 newly inserted rows and move the deleted rows into a separate table.
It depends on your definition of "a single query". You cannot do it in a single SQL statement like SELECT, UPDATE, DELETE, MERGE. But you can create a single query with multiple SQL statements using BigQuery Scripting (currently in Beta).
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
Note that these multiple updates are not transactional, BigQuery uses single-statement transactions.
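For illustration, a minimal scripting sketch, assuming hypothetical tables dataset.table_1, dataset.table_2, and dataset.archive_table, and showing only two of the 22 inserts. Each statement commits independently, so a failure in a later statement does not roll back the earlier ones:

-- a BigQuery script is simply statements separated by semicolons
INSERT INTO dataset.table_1 (id, val) VALUES (101, 'a');
INSERT INTO dataset.table_2 (id, val) VALUES (101, 'b');
-- ... inserts into the remaining tables ...
INSERT INTO dataset.archive_table (id, val)
SELECT id, val FROM dataset.table_1 WHERE id = 101;
DELETE FROM dataset.table_1 WHERE id = 101;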
I am inserting record data in a collection in memory into postgres and want the database to ignore any record that already exists in the database (by virtue of having the same primary key) but keep going with the rest of my inserts.
I'm using clojure and hugsql, btw, but I'm guessing the answer might be language agnostic.
As I'm essentially treating the database as a set in this way I may be engaging in an antipattern.
If you're using Postgres 9.5 or newer (which I assume you are, since it was released back in January 2016), there's a very useful ON CONFLICT clause you can use:
INSERT INTO mytable (id, col1, col2)
VALUES (123, 'some_value', 'some_other_value')
ON CONFLICT (id) DO NOTHING
I had to solve this for an earlier version of Postgres, so instead of having a single INSERT statement with multiple rows, I used multiple INSERT statements and ran all of them in a script, making sure an error would not stop the script (I used Adminer with "stop on error" unchecked), so the statements that didn't throw an error were executed and all of the new entries got inserted.
Will it be possible to insert into two tables with the same INSERT command?
No, you cannot perform multiple inserts into two tables in one query.
No, you can't.
If you want to ensure the atomicity of an operation that requires data to be inserted into two tables, you should protect it with a transaction. You either use the SQL statements BEGIN TRAN and COMMIT TRAN, or you use a transaction boundary in whatever language you're using to develop the DB access layer, e.g. something like Connection.StartTransaction and Connection.Commit (or Connection.Rollback on an error).
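For example, a minimal MySQL-style sketch (the orders and order_audit tables and their columns are hypothetical; the exact transaction keywords vary by database):

START TRANSACTION;
INSERT INTO orders (order_id, customer_id) VALUES (1, 42);
INSERT INTO order_audit (order_id, action) VALUES (1, 'created');
COMMIT;
-- if either insert fails, issue ROLLBACK instead of COMMIT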
You can call a stored procedure with inserts into two tables.
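Along those lines, a minimal sketch of a MySQL stored procedure wrapping both inserts (the procedure, table, and column names are hypothetical):

DELIMITER //
CREATE PROCEDURE insert_into_both(IN p_order_id INT, IN p_customer_id INT)
BEGIN
  -- both inserts happen in one call from the application's point of view
  INSERT INTO orders (order_id, customer_id) VALUES (p_order_id, p_customer_id);
  INSERT INTO order_audit (order_id, action) VALUES (p_order_id, 'created');
END //
DELIMITER ;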
Maybe in a future release of MySQL you could create a View containing the 2 tables and insert into that.
But with MySQL 5.1.41 you'll get the error:
"Can not modify more than one base table through a join view"
But inserting into 2 tables with 1 query is a weird thing to do, and I don't recommend it.
For more on updatable views check out the MySQL reference.
How do we insert about 2 million rows into an Oracle database table that has many indexes on it?
I know that one option is disabling the indexes and then inserting the data. Can anyone tell me what the other options are?
Bulk load with the data presorted in index key order.
Check out SQL*Loader (especially the paragraph about performance optimization): it is the standard bulk-loading utility for Oracle, and it does a good job once you know how to use it (as always with Oracle).
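As an illustration only, a minimal SQL*Loader control file sketch (the file, table, and column names are hypothetical; direct=true on the sqlldr command line enables direct-path loading):

-- load.ctl (hypothetical)
LOAD DATA
INFILE 'rows.csv'
APPEND
INTO TABLE target_table
FIELDS TERMINATED BY ','
(col1, col2, col3)

-- invoked roughly as: sqlldr userid=scott control=load.ctl direct=true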
There are many tricks to speed up the insert; below I have written some of them (a combined sketch follows the list).
If you use sequence.nextval for the insert, make sure the sequence has a big cache value (1000 is usually enough).
Drop the indexes before the insert and recreate them afterwards (make sure you save the CREATE scripts of the indexes before dropping them); when recreating them you can use the PARALLEL option.
If the target table has FK dependencies, disable them before the insert and enable them again afterwards. If you are sure of your data you can use the NOVALIDATE option (NOVALIDATE is Oracle-specific; other RDBMS systems probably have a similar option).
If you select and insert, you can give a PARALLEL hint for the SELECT statement, and for the INSERT you can use the APPEND hint (direct-path insert). (Direct-path insert is an Oracle concept; other RDBMS systems probably have a similar option.)
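A minimal Oracle sketch combining these tricks; all object names (target_table, source_table, fk_target_parent, ix_target_col1) are hypothetical:

ALTER TABLE target_table DISABLE CONSTRAINT fk_target_parent;   -- disable FK before the load

INSERT /*+ APPEND */ INTO target_table                          -- APPEND hint: direct-path insert
SELECT /*+ PARALLEL(s, 4) */ col1, col2, col3
FROM source_table s;

COMMIT;

CREATE INDEX ix_target_col1 ON target_table (col1) PARALLEL 4;  -- recreate the dropped index in parallel
ALTER TABLE target_table ENABLE NOVALIDATE CONSTRAINT fk_target_parent;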
Not sure how you are inserting the records; if you can, insert the data in smaller chunks. In my experience, 50 sets of 20k records is often quicker than 1 x 1,000,000.
Make sure your database files are large enough before you start, to save yourself from database file growth during the insert.
If you are sure about the data, besides the indexes you can disable referential and constraint checks. You can also lower the transaction isolation level.
All these options come with a price, though. Each option increases your risk of having corrupt data, in the sense that you may end up with null FKs, etc.
As another option, one can use Oracle's more advanced and faster Data Pump utilities (expdp, impdp), available from 10g onward. Oracle still supports the old export/import utilities (exp, imp), though.
Oracle provides us with many choices for data loading, some way faster than others:
Oracle10 Data Pump Oracle import utility
SQL INSERT and MERGE statements
PL/SQL bulk loads using the FORALL PL/SQL operator (a sketch follows below)
SQL*Loader
The pros/cons of each can be found here ..
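As one illustration, a minimal PL/SQL sketch of a FORALL bulk load, assuming hypothetical tables source_table and target_table with the same row structure (for very large tables you would use BULK COLLECT with a LIMIT inside a loop):

DECLARE
  TYPE t_rows IS TABLE OF source_table%ROWTYPE;
  l_rows t_rows;
BEGIN
  SELECT * BULK COLLECT INTO l_rows FROM source_table;  -- load source rows into a collection
  FORALL i IN 1 .. l_rows.COUNT                         -- single bulk-bind insert
    INSERT INTO target_table VALUES l_rows(i);
  COMMIT;
END;
/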