Batch insert to database, skipping duplicates found in database - sql

I'd like to save a list of objects to the database using Spring. In my controller I receive a list of strings, say these are cities. I filter out duplicates and end up with a unique Set of cities. How do I insert that set into the database when I have a unique constraint on the city column?
I see three options here:
1. Before any insert I check whether there is already such a record; if there is none, I persist the domain object.
2. I try to persist the domain object without checking whether it exists in the database. When I encounter an SQL exception, I move on and try to save the next domain object.
3. I retrieve all the objects from the database (the list of records may well exceed 100 thousand), build a set of the unique (absent from the database) domain objects, and then save those to the database.
Which of these options should I go with? Is there (hopefully) a better one?

You could create a global temporary table, owned by the Oracle user that does the inserts, with the same structure as the destination table. The following pseudo code should give you an idea. The least fuss is to create it with ON COMMIT DELETE ROWS, but sometimes it is easier to use ON COMMIT PRESERVE ROWS and start off every session with
DELETE FROM <your global temporary table>;
Then insert all your data from Spring into this table. I will assume that there are no duplicate IDs or duplicate cities with the same ID in the destination table.
INSERT INTO <your destination table>
SELECT city_id, city_name, third_field
FROM <your global temporary table>
WHERE <your global temporary table>.city_name
      NOT IN (SELECT <your destination table>.city_name
              FROM <your destination table>);
If you have duplicate city names with different IDs, you can just add another NOT IN clause.
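For concreteness, here is a minimal sketch of the whole approach, assuming hypothetical names: cities as the destination table and cities_stage as the global temporary table.

-- Hypothetical staging table; ON COMMIT DELETE ROWS empties it at every commit.
CREATE GLOBAL TEMPORARY TABLE cities_stage (
    city_id     NUMBER,
    city_name   VARCHAR2(100),
    third_field VARCHAR2(100)
) ON COMMIT DELETE ROWS;

-- After Spring has batch-inserted the incoming rows into cities_stage:
INSERT INTO cities (city_id, city_name, third_field)
SELECT s.city_id, s.city_name, s.third_field
FROM cities_stage s
WHERE s.city_name NOT IN (SELECT c.city_name FROM cities c);

Barring concurrent inserts from other sessions, the unique constraint on cities.city_name is never violated, and the whole batch goes in as one set-based statement.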

Related

Use schema name in a JOIN in Redshift

Our database is set up so that each of our clients is hosted in a separate schema (the organizational level above a table in Postgres/Redshift, not the database structure definition). We have a table in the public schema that has metadata about our clients. I want to use some of this metadata in a view I am creating.
Say I have 2 tables:
public.clients
    name_of_schema_for_client
    metadata_of_client
client_name.usage_info
    whatever columns, this isn't that important
I basically want to get the metadata for the client I'm running my query on and use it later:
SELECT *
FROM client_name.usage_info
INNER JOIN public.clients
ON CURRENT_SCHEMA() = public.clients.name_of_schema_for_client
This is not possible because CURRENT_SCHEMA() is a leader-node function: it returns an error if the query also references a user-created table, an STL or STV system table, or an SVV or SVL system view (see https://docs.aws.amazon.com/redshift/latest/dg/r_CURRENT_SCHEMA.html).
Is there another way to do this? Or am I just barking up the wrong tree?
Your best bet is probably to set the search path manually, within the transaction, from whatever source you call this from. See:
https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
Let's say you only want to use the table matching your best client:
set search_path to your_best_clients_schema, whatever_other_schemas_you_need_for_this;
Then you can just do:
select * from clients;
This will match the first clients table available on the search path, which by happy coincidence you just set to your client's schema!
You can manually revert afterwards if need be, or just reset the connection to return to the default; up to you.
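Putting that together, a sketch with a hypothetical client schema name acme; the schema name is supplied by the caller, since the leader-node restriction rules out CURRENT_SCHEMA() in queries that touch user tables:

set search_path to acme, public;

-- usage_info now resolves to acme.usage_info, and the join key is
-- hard-coded by the caller instead of coming from CURRENT_SCHEMA().
select u.*, c.metadata_of_client
from usage_info u
join public.clients c
  on c.name_of_schema_for_client = 'acme';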

Data Agent - SELECT from one table and insert into another

Is there any type of product where I can write a SQL statement to select from one table and then insert into another database (the other database is out in the cloud)? It also needs to check whether a record already exists and update the row if anything has changed. Then it needs to run every 10-30 minutes to pick up changed or newly added records.
The source database and the destination database have different schemas (if that matters?). I've been looking, but it seems the only products out there are ones that just copy one table and insert into a table with the same schema.
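Whatever product you pick, the per-run logic it needs is a plain upsert. A minimal T-SQL sketch, assuming hypothetical tables source_orders and dest_orders keyed on order_id, with the differing schemas reconciled by the column mapping in the SELECT; a scheduler (cron, SQL Agent, etc.) would run it every 10-30 minutes:

-- Update rows whose content changed since the last run.
UPDATE d
SET d.status = s.status,
    d.amount = s.amount
FROM dest_orders d
JOIN source_orders s ON s.order_id = d.order_id
WHERE d.status <> s.status OR d.amount <> s.amount;

-- Insert rows that do not exist in the destination yet.
INSERT INTO dest_orders (order_id, status, amount)
SELECT s.order_id, s.status, s.amount
FROM source_orders s
WHERE NOT EXISTS (SELECT 1 FROM dest_orders d WHERE d.order_id = s.order_id);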

Method in SQL Server for making a copy of a table and refreshing it?

I'm trying to figure out whether there's a method for copying the contents of a table in a main schema into a table in another schema, and then somehow updating or "refreshing" that copy as the main table gets updated.
For example:
schema "BBLEARN", has table users
SELECT * INTO SIS_temp_data.dbo.bb_users FROM BBLEARN.dbo.users
This selects 23k rows and inserts them into the table bb_users in my placeholder schema SIS_temp_data.
Thing is, the users table in the BBLEARN schema gets updated constantly: new users get added, accounts get updated, enabled, or disabled, etc. The main reason for copying the table into a temp table is data integration, and is unrelated to the question at hand.
So, is there a method in SQL Server that will allow me to "update" my new table in the spare schema based on when the data in the main schema gets updated? Or do I just need to run a scheduled task that does a SELECT * INTO every few hours?
Thank you.
You could create a trigger which updates the spare table whenever an update or insert is performed on the main table.
see http://msdn.microsoft.com/en-us/library/ms190227.aspx
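A minimal sketch of such a trigger, created while connected to the BBLEARN database; the columns user_id and user_name are hypothetical stand-ins for the real ones:

-- Mirrors every insert/update on dbo.users into the copy table.
CREATE TRIGGER trg_users_sync
ON dbo.users
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Refresh rows that already exist in the copy.
    UPDATE t
    SET t.user_name = i.user_name
    FROM SIS_temp_data.dbo.bb_users t
    JOIN inserted i ON i.user_id = t.user_id;

    -- Add rows that are new.
    INSERT INTO SIS_temp_data.dbo.bb_users (user_id, user_name)
    SELECT i.user_id, i.user_name
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM SIS_temp_data.dbo.bb_users t
                      WHERE t.user_id = i.user_id);
END;

Deletes would need a separate AFTER DELETE trigger, and every write to users pays the extra cost, so a scheduled SELECT INTO refresh may still be the simpler option if some staleness is acceptable.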

Update and insert in a bulk move (SQL Server)

I have a pair of databases, one is a live database and one is for testing a configuration for that live database. Both reside on the same server.
I have three tables: Users (PK UserId, FK MainGroupId), Groups (PK GroupId), and GroupMembers (PK GroupMemberId, FKs GroupId and UserId).
The tables are the same schema on both databases however the test database has a set of special test users. Groups is mostly stable, but sometimes we add groups, and sometimes we change column data in the groups. GroupMembers is the same but in the test database refers to the test users.
I need to be able to update the Groups table from the live database to the test database programmatically. I want to use a bulk copy operation, but to do so I would have to empty the Groups table first, which will cause a constraint violation.
I could bulk copy the table to a dummy table and then post-process by inserting the new rows and updating the existing ones. However, my problem is that there are about 30 tables like Groups, and I don't want to hard-code all the column names into the UPDATE ... SET of a stored procedure. I'd also like to be able to do it in bulk.
The DBAs are dubious about granting ALTER TABLE permission to temporarily drop the constraints.
Any other suggestions?
Since both databases are on the same server, why not use a MERGE statement?
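A minimal sketch, assuming hypothetical database names LiveDb and TestDb and a single data column GroupName; statements like this can also be generated per table from sys.columns instead of hand-coding all 30:

-- Upsert live Groups rows into the test copy without emptying the table,
-- so the foreign keys from GroupMembers stay intact.
MERGE TestDb.dbo.Groups AS t
USING LiveDb.dbo.Groups AS s
    ON t.GroupId = s.GroupId
WHEN MATCHED THEN
    UPDATE SET t.GroupName = s.GroupName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (GroupId, GroupName) VALUES (s.GroupId, s.GroupName);

-- If GroupId is an IDENTITY column, wrap the statement in
-- SET IDENTITY_INSERT TestDb.dbo.Groups ON / OFF.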
Alternatively: select for export and import. If you do it in the right order it should work correctly.

There is already an object named 'tbltable1' in the database

I am trying to insert data from one table into another with the same structure:
select * into tbltable1 from tbltable1_Link
I am getting the following error message:
There is already an object named 'tbltable1' in the database.
The SELECT INTO statement creates a new table with the name you provide and populates it with the results of the SELECT statement.
I think you should be using INSERT INTO, since the table already exists. If your purpose is in fact to populate a temporary working table, then you should provide a table name that does not already exist in the database.
See MSDN for more information on this.
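A sketch of the INSERT INTO form, assuming the two tables really do have the same structure:

-- Appends the rows to the existing table instead of trying to create it.
INSERT INTO tbltable1
SELECT * FROM tbltable1_Link;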
If you are confident that tbltable1 is not required, you can drop the table first.
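For that route (assumption: nothing in the existing tbltable1 needs to be kept):

-- Remove the old table, then let SELECT INTO recreate and repopulate it.
DROP TABLE tbltable1;
SELECT * INTO tbltable1 FROM tbltable1_Link;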
You may also want to consider using temporary tables...
SELECT * INTO ##MyTemporaryTable FROM tblTable1_Link
You can then use the temporary table for the rest of the session. Note that the double hash (##) makes it a global temporary table, visible to other sessions; a single hash (#) would keep it local to your session. Either way it should be dropped automatically once the sessions using it end, if I remember correctly. (It's been a while since I've worked with SQL Server.)