I have a query:
INSERT INTO tweet_hashtags(hashtag_id, tweet_id)
VALUES(1, 1)
ON CONFLICT DO NOTHING
RETURNING id
which works fine and inserts a row with id = 1, but when there is a duplicate, say another (1, 1), it inserts a row with id = 2. I want to prevent this from happening, and I read that I can do ON CONFLICT (col_name), but that doesn't really help because I need to check two columns at a time.
The on conflict clause requires a unique constraint or index on the set of columns that you want to be unique - and it looks like you don't have that in place.
You can set it when you create the table:
create table tweet_hashtags(
id serial primary key,
hashtag_id int,
tweet_id int,
unique (hashtag_id, tweet_id)
);
Or, if the table already exists, you can create a unique index (but you need to get rid of the duplicates first):
create unique index idx_tweet_hashtags on tweet_hashtags(hashtag_id, tweet_id);
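If duplicates are already present, one way to remove them first (a sketch assuming the serial id column from the table definition above, keeping the row with the lowest id for each pair) is:

```sql
-- delete every row for which another row with the same
-- (hashtag_id, tweet_id) pair and a smaller id exists
DELETE FROM tweet_hashtags a
USING tweet_hashtags b
WHERE a.hashtag_id = b.hashtag_id
  AND a.tweet_id   = b.tweet_id
  AND a.id > b.id;
```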
Then your query should just work:
insert into tweet_hashtags(hashtag_id, tweet_id)
values(1, 1)
on conflict (hashtag_id, tweet_id) do nothing
returning id
Specifying the conflict target makes the intent clearer and should generally be preferred (although it is not mandatory with do nothing).
Note that the query returns nothing when the insert is skipped (that is, the existing id is not returned).
Here is a demo on DB Fiddle showing the behavior with and without the unique index.
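Also note: if you need the id back even when the insert is skipped, one common pattern (a sketch, assuming the unique constraint above; not fully race-proof under heavy concurrency) is a data-modifying CTE:

```sql
WITH ins AS (
    INSERT INTO tweet_hashtags (hashtag_id, tweet_id)
    VALUES (1, 1)
    ON CONFLICT (hashtag_id, tweet_id) DO NOTHING
    RETURNING id
)
SELECT id FROM ins   -- the freshly inserted row, if any
UNION ALL
SELECT id FROM tweet_hashtags
WHERE hashtag_id = 1 AND tweet_id = 1   -- the pre-existing row otherwise
LIMIT 1;
```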
Related
I want to add large amounts of data to the table.
Before adding, I check whether the data exists in the table or not.
I am dealing with the following:
Example:
Table users
id | name | address
.. | .... | .......
select id from users where id = ... and name = ...
if not exist
insert....
if exist
update ....
My problem is that this takes too long.
Does anyone have a faster solution to this problem?
You actually do not need to perform this check manually. It is rather the job of a constraint, e.g. via a primary key.
Table with a primary key constraint based on id and name:
CREATE TABLE users (
id INT, name TEXT, address TEXT,
PRIMARY KEY (id,name));
So, if you try to insert two records with the same id and name you will get an exception. The error message below is in German, but it says that the primary key constraint was violated:
INSERT INTO users VALUES (1,'foo','add 1');
INSERT INTO users VALUES (1,'foo','add 2');
FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »users_pkey«
DETAIL: Schlüssel »(id, name)=(1, foo)« existiert bereits.
(In English: ERROR: duplicate key value violates unique constraint "users_pkey" / DETAIL: Key (id, name)=(1, foo) already exists.)
In case you want to update address when id and name already exist, try using an UPSERT:
INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT (id, name)
DO UPDATE SET address = EXCLUDED.address;
If you want to simply ignore the conflicting insert without raising an exception, just do as follows:
INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT DO NOTHING;
See this answer for more details.
Regarding speed: you should rather check whether your table has a proper index, or even whether an index makes sense at all while performing the insert. Sometimes importing a large amount of data into a temporary UNLOGGED TABLE without an index, and then populating the target table with an SQL statement that removes the duplicates, is the best choice.
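For example, a sketch of that staging approach (the file path and COPY usage are placeholders for whatever bulk-load mechanism you use):

```sql
CREATE UNLOGGED TABLE users_staging (LIKE users);

COPY users_staging FROM '/path/to/data.csv' WITH (FORMAT csv);

-- collapse duplicates within the staging data, then upsert into the target;
-- DISTINCT ON is needed because ON CONFLICT cannot affect the same row twice
INSERT INTO users (id, name, address)
SELECT DISTINCT ON (id, name) id, name, address
FROM users_staging
ON CONFLICT (id, name) DO UPDATE SET address = EXCLUDED.address;

DROP TABLE users_staging;
```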
I have the following table:
Customer (Id, Name, employeeID)
The table is already created and is empty. I don't want to remove duplicate data; all I want is to ALTER the table to ensure that there will be no duplicate data.
I want to use ALTER to ensure that there are no duplicates in employeeID.
Will the following work?
ALTER TABLE Customers
UNIQUE(employeeID)
ADD CONSTRAINT
Is there a better way?
Adding a unique constraint will ensure that no duplicate entries will be added in future:
ALTER TABLE Customers
ADD CONSTRAINT choose_a_name_for_the_constraint UNIQUE (EmployeeID);
You had it basically right, just a bit of a keyword order problem.
If you're working with SQL Server, consider also that trivial operations like this can be done via the GUI in SSMS, which will guide the process. You can also have it turn the changes into scripts for you by right-clicking the table and choosing "Script Table As...", so you can use them elsewhere.
From my understanding, you can create a filtered unique index as follows:
create table unicondtional (
i int identity (1,1)
, j int
)
insert into unicondtional values (1), (1)
select * from unicondtional
-- assume 'unicondtional' is a table like the one you have, so far.
CREATE UNIQUE NONCLUSTERED INDEX unique_with_condition ON unicondtional
(
j
)
WHERE (i > 2) -- max (i)
-- create a unique index with a condition (a filtered index).
-- the WHERE clause makes the index enforce uniqueness only for rows where i > 2.
insert into unicondtional values (1), (2), (3) -- see the note below.
-- successful insert.
select * from unicondtional
insert into unicondtional values (2)
-- fails: the unique index does not allow a duplicate among the rows it covers.
update unicondtional
set j = 3
where j = 1
-- the duplicates among the first two rows (created before the index) still exist.
select * from unicondtional
So you don't need to delete the existing duplicate records.
Note: if, after creating the index, you want the value 1 to count as a duplicate as well, use a trigger instead of a unique index.
Since your table is empty, you can directly run
ALTER TABLE Customers
ADD CONSTRAINT UQ_EmployeeID UNIQUE(EmployeeId);
That will ensure there is no duplicate EmployeeId can be added in that table.
But if there is already data in the table and a duplicate EmployeeId exists, you will get an error message:
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'Customers' and the index name 'UQ_EmployeeId'. The duplicate key value is ("DuplicateValueHere").
For your question
Is there a better way?
You already have the better way to prevent inserting duplicates.
See
Create Unique Constraints
and
ALTER TABLE (Transact-SQL)
I have a table with a column position, which has unique and not-null constraints.
I need to move the selected table item up/down; for that, I take the selected indexes and swap them,
and then save both items to the DB.
Whenever I try to save the first item, it raises a UNIQUE constraint violation,
because that item's index is already in the DB.
One possibility is to use a temporary index, swap, and save; I think that would work.
But is there any other way to achieve this requirement?
If you do the update in one Update statement, it'll work fine.
create table t (id number primary key);
insert into t values (1);
insert into t values (2);
commit;
update t set id = case when id = 1 then 2 else 1 end
where id in (1,2);
The easiest way would be to use a temporary value like you say because the constraint will not let you have two rows with the same value at any time.
You can probably derive a temporary value that is in itself unique by basing it on the original value and looking at what kind of data you cannot normally have. For example, negative numbers might work.
Other than that, you could declare the constraint as deferred. Then it won't be enforced until the end of your transaction. But that is probably a bit too much effort/impact.
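In PostgreSQL, for instance, that deferred approach could look like this (a sketch; the table, column, and constraint names are placeholders for your schema):

```sql
ALTER TABLE items
    ADD CONSTRAINT items_position_uniq UNIQUE (position)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS items_position_uniq DEFERRED;
UPDATE items SET position = 2 WHERE id = 10;  -- temporarily duplicates position 2
UPDATE items SET position = 1 WHERE id = 11;
COMMIT;  -- uniqueness is checked only here, after the swap is complete
```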
If the field in question is really only used for sorting (and not for object identity), you could consider dropping the uniqueness altogether. You can use a unique primary key as a tie-breaker if necessary.
I have two tables which I insert into using JDBC, for example parcelsTable and filesTable, and I have some cases:
1. INSERT a new row in both tables.
2. INSERT a new row only in parcelsTable.
TABLES:
DROP TABLE parcelsTable;
CREATE TABLE parcelsTable(
num serial PRIMARY KEY,
parcel_name text,
filestock_id integer
);
DROP TABLE filesTable;
CREATE TABLE filesTable(
num serial PRIMARY KEY,
file_name text,
files bytea
);
I want to set parcelsTable.filestock_id = filesTable.num when I insert into both tables, using a TRIGGER.
Is this possible? How can the trigger know that I inserted into both tables?
You don't need to use a trigger to get the foreign key value in this case. Since you have num set as serial, you can access the latest value using currval. Run something like this from your app:
insert into filesTable (file_name, files) select 'f1', 'asdf';
insert into parcelsTable (parcel_name, filestock_id) select 'p1', currval('filesTable_num_seq');
Note that this should only be used when inserting one record at a time, to grab individual key values from currval. I'm using the default sequence name, table_column_seq, which you should be able to rely on unless you've explicitly declared something different.
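Alternatively, in PostgreSQL 9.1+ you can avoid session state entirely and do both inserts in one statement with a data-modifying CTE (a sketch using the sample values above):

```sql
WITH f AS (
    INSERT INTO filesTable (file_name, files)
    VALUES ('f1', 'asdf')
    RETURNING num
)
INSERT INTO parcelsTable (parcel_name, filestock_id)
SELECT 'p1', num
FROM f;
```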
I would also recommend explicitly declaring nullability and the relationship:
CREATE TABLE parcelsTable (
...
filestock_id integer NULL REFERENCES filesTable (num)
);
Here is a working demo at SqlFiddle.
This might not be an answer, but it may be what you need. I am making this an answer instead of a comment because I need the space.
I don't know if you can have a trigger on two tables. Typically this is not needed. As in your case, typically either you are creating a parent record and a child record, or you are just creating a child record of an existing record.
So, typically, if you need a trigger when creating both, it is sufficient to put the trigger on the parent record.
I don't think you can do what you need. What you are trying to do is populate the foreign key with the parent record primary key in the same transaction. I don't think you can do that. I think you will have to provide the foreign key in the insert for parcelsTable.
You will end up leaving this NULL when you are creating a record in the parcelsTable at times when you are not creating a record in filesTable. So I think you will want to set the foreign key in the INSERT statement.
The only idea I have for now is to create a function that does an indirect insert into the tables; then you can have whatever condition you need, and parallel inserts work too.
There is a certain job that will insert and update a table called ContactInfo (with 2 columns: Id, EmailId) several times a day.
What's a good way to write a trigger on this table to revert the EmailId for only specific Ids, whenever the EmailIds for those Ids get updated?
I don't mind hard-coding those Ids in the trigger, since the list is only about 40.
But I am specifically concerned about the trigger firing for every update, since updates happen all the time, and I don't want the trigger to cause resource issues.
Additional info: the table has about 600k entries and is indexed on Id.
In summary: is it possible for the trigger to fire only when certain values in the column are updated, and not on every update of the column?
One alternative mechanism you might consider would be adding another table, called, say, LockedValues. I'm a bit unsure from your narrative what values you're trying to prevent changes to, but here's an example:
Table T, contains two columns, ID and Val:
create table T (
ID int not null,
Val int not null,
constraint PK_T PRIMARY KEY (ID),
constraint UK_T_Lockable UNIQUE (ID,Val)
)
And we have 3 rows:
insert into T(ID,Val)
select 1,10 union all
select 2,20 union all
select 3,30
And we want to prevent the row with ID 2 from having its Val changed:
create table Locked_T (
ID int not null,
Val int not null,
constraint UQ_Locked_T UNIQUE (ID,Val), --Only really need an index here
constraint FK_Locked_T_T FOREIGN KEY (ID,Val) references T (ID,Val)
)
insert into Locked_T (ID,Val) select 2,20
And so now, of course, any application that is only aware of T will be unable to edit the row with ID 2, but can freely alter rows 1 and 3.
This has the benefit that the enforcement code is built into SQL Server already, so probably quite efficient. You don't need a unique key on Locked_T, but it should be indexed so that it's quite quick to detect that values aren't present.
This all assumes that you were going to write a trigger that rejected changes, rather than one that reverted changes. For that, you'd still have to write a trigger (though I'd still suggest having this separate table, and then writing your trigger to do an update inner joining inserted with Locked_T - which should be quite efficient still).
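For completeness, a rejecting trigger built on that Locked_T table might look like this (a sketch; the trigger name and error text are made up):

```sql
CREATE TRIGGER TR_T_ProtectLocked ON T
AFTER UPDATE
AS
BEGIN
    -- reject the statement if any locked row's Val would change
    IF EXISTS (
        SELECT 1
        FROM inserted i
        JOIN deleted d ON d.ID = i.ID
        JOIN Locked_T l ON l.ID = d.ID AND l.Val = d.Val
        WHERE i.Val <> d.Val
    )
    BEGIN
        RAISERROR('Val is locked for one or more of these rows.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```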
(Be warned, however: Triggers that revert changes are evil)