Problems Adding/Updating Big Data on PostgreSQL


I want to add large amounts of data to a table.
Before adding each row, I check whether it already exists in the table.
This is what I am doing:
Example:
Table users
id | name | address
.. | .... | .......
select id from users where id = ... and name = ...
if not exist
insert....
if exist
update ....
My problem is that this takes too long.
Does anyone have a faster solution?

You don't actually need to perform this check manually. That is the job of a constraint, e.g. a primary key.
Table with a primary key constraint based on id and name:
CREATE TABLE users (
id INT, name TEXT, address TEXT,
PRIMARY KEY (id,name));
So if you try to insert two records with the same id and name, you get an error saying that the primary key constraint was violated:
INSERT INTO users VALUES (1,'foo','add 1');
INSERT INTO users VALUES (1,'foo','add 2');
ERROR: duplicate key value violates unique constraint "users_pkey"
DETAIL: Key (id, name)=(1, foo) already exists.
If you want to update the address when id and name already exist, use an UPSERT (PostgreSQL 9.5+):
INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT (id, name)
DO UPDATE SET address = EXCLUDED.address;
If you want to simply ignore the conflicting insert without raising an exception, just do as follows:
INSERT INTO users VALUES (1,'foo','add x')
ON CONFLICT DO NOTHING;
See this answer for more details.
Regarding speed: check whether your table has a proper index, or whether an index makes sense at all while you are inserting. Sometimes the best choice is to import the large amount of data into an UNLOGGED (or temporary) table without indexes, and then populate the target table with a single SQL statement that removes the duplicates.
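A minimal sketch of that staging-table approach, assuming PostgreSQL 9.5+ (for ON CONFLICT) and the users table defined above; the staging table name users_staging and the sample rows are hypothetical. DISTINCT ON keeps one row per key because a single INSERT ... ON CONFLICT DO UPDATE raises an error if the same key appears twice in its input, and the tie-break in ORDER BY is an arbitrary choice you would adapt.

```sql
-- Target table with the key the upsert relies on (same as above).
CREATE TABLE IF NOT EXISTS users (
    id      INT,
    name    TEXT,
    address TEXT,
    PRIMARY KEY (id, name)
);

-- Hypothetical staging table: UNLOGGED skips WAL, and having no index keeps loading cheap.
CREATE UNLOGGED TABLE users_staging (
    id      INT,
    name    TEXT,
    address TEXT
);

-- In practice, bulk-load the batch here (e.g. COPY users_staging FROM STDIN,
-- or \copy in psql). These sample rows stand in for that load.
INSERT INTO users_staging (id, name, address) VALUES
    (1, 'foo', 'add 1'),
    (1, 'foo', 'add 2'),   -- same key twice in one batch
    (2, 'bar', 'add 3');

-- One set-based upsert replaces the per-row SELECT + INSERT/UPDATE loop.
INSERT INTO users (id, name, address)
SELECT DISTINCT ON (id, name) id, name, address
FROM users_staging
ORDER BY id, name, address DESC   -- hypothetical tie-break: decides which duplicate wins
ON CONFLICT (id, name)
DO UPDATE SET address = EXCLUDED.address;

DROP TABLE users_staging;
```

The gain comes from doing one statement per batch instead of two round trips per row; the unique index on the target is used only once per row during the upsert.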

Related

How do I ensure that a referencing table also has data

My Postgres database has the following schema, where each user can store multiple profile images.
CREATE TABLE users(
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE images(
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
url VARCHAR(50)
);
CREATE TABLE user_images(
user_id INT REFERENCES users(id),
image_id INT REFERENCES images(id)
);
How do I ensure that when I insert a user object, I also insert at least one user image?

You cannot do this very easily... and I wouldn't encourage you to enforce it. Why? It is a "chicken and egg" problem. You cannot insert a row into users because there is no image yet. You cannot insert a row into user_images because there is no user_id yet.
Although you can handle this situation with transactions or deferred constraint checking, that covers only half the issue, because you also have to prevent deletion of the last image.
Here are two alternatives.
First, you can simply add a main_image_id to the users table and insist that it be NOT NULL. Voila! At least one image is required.
Second, you can use a trigger to maintain a count of images in users. Then treat rows with no images as "deleted" so they are never seen.
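A minimal sketch of the first alternative, assuming PostgreSQL 10+ (identity columns). The column main_image_id comes from the answer; the sample URL and user name are made up. Data-modifying CTEs let the image, the user and the link row be created in one statement, with RETURNING passing the generated ids along.

```sql
CREATE TABLE images (
    id  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    url VARCHAR(50)
);

-- Every user must point at one image, so a user cannot exist without an image.
CREATE TABLE users (
    id            INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name          VARCHAR(50),
    main_image_id INT NOT NULL REFERENCES images (id)
);

CREATE TABLE user_images (
    user_id  INT REFERENCES users (id),
    image_id INT REFERENCES images (id),
    PRIMARY KEY (user_id, image_id)
);

-- Insert the image first, then the user referencing it, then the link row.
WITH new_image AS (
    INSERT INTO images (url) VALUES ('https://example.com/a.png')
    RETURNING id
), new_user AS (
    INSERT INTO users (name, main_image_id)
    SELECT 'alice', id FROM new_image
    RETURNING id, main_image_id
)
INSERT INTO user_images (user_id, image_id)
SELECT id, main_image_id FROM new_user;
```

Because users.main_image_id references images, the main image also cannot be deleted while the user still points at it, which addresses the "last image" half of the problem.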

When you insert data into a table, the database can return the id of the inserted row. So if an id (> 0) is returned, the row has been inserted. But first, add an id column (bigserial, auto-increment, unique) to every table.
INSERT INTO user_images VALUES (...) RETURNING id;

Avoid inserting duplicates when using autoincrementing index

I have a query:
INSERT INTO tweet_hashtags(hashtag_id, tweet_id)
VALUES(1, 1)
ON CONFLICT DO NOTHING
RETURNING id
which works fine and inserts a row with id = 1, but when there is a duplicate, say another (1, 1), it inserts it with id = 2. I want to prevent this. I read that I can use ON CONFLICT (col_name), but that doesn't really help because I need to check two columns at a time.

The on conflict clause requires a unique constraint or index on the set of columns that you want to be unique - and it looks like you don't have that in place.
You can declare it when you create the table:
create table tweet_hashtags(
id serial primary key,
hashtag_id int,
tweet_id int,
unique (hashtag_id, tweet_id)
);
Or, if the table already exists, you can create a unique index (but you need to get rid of the duplicates first):
create unique index idx_tweet_hashtags on tweet_hashtags(hashtag_id, tweet_id);
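If the table already contains duplicates, one way to remove them first in PostgreSQL is a self-join DELETE that keeps the row with the smallest id of each (hashtag_id, tweet_id) pair. The sample rows below are only for illustration.

```sql
-- Sample table without the unique constraint, containing one duplicate pair.
CREATE TABLE tweet_hashtags (
    id         serial PRIMARY KEY,
    hashtag_id int,
    tweet_id   int
);
INSERT INTO tweet_hashtags (hashtag_id, tweet_id) VALUES (1, 1), (1, 1), (2, 1);

-- Delete every row that has a twin with a smaller id (keeps the oldest row).
DELETE FROM tweet_hashtags a
USING tweet_hashtags b
WHERE a.hashtag_id = b.hashtag_id
  AND a.tweet_id   = b.tweet_id
  AND a.id > b.id;

-- The unique index can now be created without a duplicate-key error.
CREATE UNIQUE INDEX idx_tweet_hashtags ON tweet_hashtags (hashtag_id, tweet_id);
```

Rows with NULL in either column are never matched by `=`, so they are left alone; the unique index also allows several NULLs.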
Then your query should just work:
insert into tweet_hashtags(hashtag_id, tweet_id)
values(1, 1)
on conflict (hashtag_id, tweet_id) do nothing
returning id
Specifying the conflict target makes the intent clearer and should be generally preferred (although it is not mandatory with do nothing).
Note that the query returns nothing when the insert is skipped (that is, the existing id is not returned).
Here is a demo on DB Fiddle that demonstrates the behavior with and without the unique index.
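If you also need the existing id when the insert is skipped, a common PostgreSQL pattern wraps the insert in a CTE and falls back to a lookup. This is a sketch, not a concurrency-safe solution: if another transaction inserts the same pair at the same moment, the fallback SELECT may not see that row yet and the query can still return nothing.

```sql
CREATE TABLE tweet_hashtags (
    id         serial PRIMARY KEY,
    hashtag_id int,
    tweet_id   int,
    UNIQUE (hashtag_id, tweet_id)
);

-- Returns the id whether the row was just inserted or already existed.
WITH ins AS (
    INSERT INTO tweet_hashtags (hashtag_id, tweet_id)
    VALUES (1, 1)
    ON CONFLICT (hashtag_id, tweet_id) DO NOTHING
    RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM tweet_hashtags
WHERE hashtag_id = 1 AND tweet_id = 1
  AND NOT EXISTS (SELECT 1 FROM ins);
```

Note that a skipped insert still consumes a value from the serial sequence (the default is evaluated before the conflict is detected), so ids can have gaps; that is expected and harmless.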

How to ensure that there are no duplicates in field? MS SQL Server 2014

I have the following table:
Customer (Id, Name, employeeID)
The table is already created and is empty. I don't want to remove duplicate data; all I want is to ALTER the table to ensure that there will be no duplicate data.
I want to use ALTER to ensure that there are no duplicates in employeeID.
Will the following work?
ALTER TABLE Customers
UNIQUE(employeeID)
ADD CONSTRAINT
Is there a better way?

Adding a unique constraint will ensure that no duplicate entries will be added in future:
ALTER TABLE Customers
ADD CONSTRAINT choose_a_name_for_the_constraint UNIQUE (EmployeeID);
You had it basically right; it was just a keyword order problem.
If you're working with SQL Server, consider also that trivial operations like this can be done via the GUI in SSMS, which will guide you through the process. You can also get it to turn the changes into scripts for you by right-clicking the table and choosing "Script Table As...", so you can use them elsewhere.

As I understand it, you can create a filtered unique index as follows:
create table unicondtional (
i int identity (1,1)
, j int
)
insert into unicondtional values (1), (1)
select * from unicondtional
-- assume 'unicondtional' is a table like yours, which already contains duplicates.
CREATE UNIQUE NONCLUSTERED INDEX unique_with_condition ON unicondtional
(
j
)
WHERE (i > 2) -- 2 is the current MAX(i)
-- the filter makes the index enforce uniqueness only for rows added from now on (i > 2).
insert into unicondtional values (1), (2), (3) -- See the Note.
-- the insert succeeds.
select * from unicondtional
insert into unicondtional values (2)
-- fails: the unique index does not allow a duplicate among the covered rows.
update unicondtional
set j = 3
where j = 1 and i <= 2
-- succeeds: the rows that existed before the index (the first two) are not covered, so duplicates are allowed there.
select * from unicondtional
So you don't need to delete the existing duplicate records.
Note: after the index is created, the value 1 can still be inserted even though it already exists in the older rows. If you consider that a duplicate, use a trigger instead of a unique index.

Since your table is empty, you can directly run
ALTER TABLE Customers
ADD CONSTRAINT UQ_EmployeeID UNIQUE(EmployeeId);
That ensures that no duplicate EmployeeId can be added to the table.
But if the table already contains data with a duplicate EmployeeId, you will get an error message like this:
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'Customers' and the index name 'UQ_EmployeeId'. The duplicate key value is ("DuplicateValueHere").
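To find the values that would cause that error before adding the constraint, a grouping query like the one below can be used. It is written in plain SQL that should run on SQL Server and PostgreSQL alike; the sample rows are invented for illustration.

```sql
-- Sample data for illustration: EmployeeID 10 appears twice.
CREATE TABLE Customers (
    Id         INT PRIMARY KEY,
    Name       VARCHAR(50),
    EmployeeID INT
);
INSERT INTO Customers (Id, Name, EmployeeID) VALUES
    (1, 'A', 10),
    (2, 'B', 10),
    (3, 'C', 20);

-- Lists each EmployeeID that would make ADD CONSTRAINT ... UNIQUE fail.
SELECT EmployeeID, COUNT(*) AS occurrences
FROM Customers
GROUP BY EmployeeID
HAVING COUNT(*) > 1;
```

In SQL Server, a UNIQUE constraint also treats NULL as a value, so only one row may have a NULL EmployeeID; if several rows may lack one, a filtered unique index with WHERE EmployeeID IS NOT NULL is the usual workaround.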
For your question
Is there a better way?
You already have the better way to prevent inserting duplicates.
See Create Unique Constraints and ALTER TABLE (Transact-SQL).

How to make a field NOT NULL in a multi-tenant database

This is a multi-tenant app. All records have a client id to separate client data. Customers can insert their own data into this table and set their own fields as nullable or NOT NULL, so making the whole column NOT NULL will not work. I need to control a field's nullability for a specific client id.
I am currently querying the database to check whether the value is null: on INSERT, I check whether the inserted value is null and, if so, throw an error. I would like the database to do all these checks. Is this possible in a multi-tenant database like this?
Also, I need suggestions for SQL Server, Oracle and PostgreSQL. Thanks

With PostgreSQL, at least, you could do this with table inheritance.
You could define an inherited table for this specific client which included the required constraint.
Consider the following example:
psql=> CREATE TABLE a(client INT NOT NULL, id SERIAL, foo TEXT);
CREATE TABLE
psql=> CREATE TABLE b(foo TEXT NOT NULL, CHECK (CLIENT=1) ) INHERITS (a);
NOTICE: moving and merging column "foo" with inherited definition
DETAIL: User-specified column moved to the position of the inherited column.
CREATE TABLE
psql=> INSERT INTO b(client,foo) VALUES (1,'a');
INSERT 0 1
psql=> INSERT INTO b(client,foo) VALUES (1,NULL);
ERROR: null value in column "foo" violates not-null constraint
DETAIL: Failing row contains (1, 2, null).
The table 'b' in this case inherits from 'a' but has a different definition for column 'foo' including a not-null constraint. Also note that I have used a check constraint to ensure that only records for client 1 can go into this table.
For this to work, either your application would have to be updated to insert client records into the correct table, or you would need to write a trigger that does that automatically. Examples of how to do that are given in the manual section on partitioning.
Your application can still make queries against the parent table ('a' from my example) and get the records for all clients, including any in child tables.
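A sketch of the routing trigger mentioned above, following the inheritance-based partitioning pattern from the PostgreSQL manual and assuming PostgreSQL 11+ (for EXECUTE FUNCTION; older versions use EXECUTE PROCEDURE). The function and trigger names are hypothetical. Rows for client 1 inserted into a are redirected to b, where the NOT NULL on foo applies.

```sql
CREATE TABLE a (client INT NOT NULL, id SERIAL, foo TEXT);
CREATE TABLE b (foo TEXT NOT NULL, CHECK (client = 1)) INHERITS (a);

-- Hypothetical routing function: client 1 rows go to b, all others stay in a.
CREATE OR REPLACE FUNCTION a_route_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.client = 1 THEN
        -- b has the same column order as a (client, id, foo), so NEW.* lines up.
        INSERT INTO b VALUES (NEW.*);
        RETURN NULL;  -- skip the insert into the parent table
    END IF;
    RETURN NEW;       -- other clients are stored in a itself
END;
$$;

CREATE TRIGGER a_route_insert_trg
BEFORE INSERT ON a
FOR EACH ROW EXECUTE FUNCTION a_route_insert();
```

One side effect of RETURN NULL is that the INSERT on a reports 0 rows for client 1, and a RETURNING clause yields nothing for those rows.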

You won't be able to do this with a column constraint. I think you're going to have to write a trigger.
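A sketch of that trigger in PostgreSQL (11+ for EXECUTE FUNCTION), assuming a hypothetical client_settings table that records, per client, whether foo is required. Clients can then change the rule by updating a row rather than altering the schema.

```sql
-- Hypothetical names: table a as in the example above, plus a per-client settings table.
CREATE TABLE a (client INT NOT NULL, id SERIAL, foo TEXT);
CREATE TABLE client_settings (
    client       INT PRIMARY KEY,
    foo_required BOOLEAN NOT NULL DEFAULT false
);

CREATE OR REPLACE FUNCTION a_check_foo() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Reject a NULL foo only for clients that have marked the field as required.
    IF NEW.foo IS NULL AND EXISTS (
        SELECT 1 FROM client_settings
        WHERE client = NEW.client AND foo_required
    ) THEN
        RAISE EXCEPTION 'foo must not be null for client %', NEW.client
            USING ERRCODE = 'not_null_violation';
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER a_check_foo_trg
BEFORE INSERT OR UPDATE ON a
FOR EACH ROW EXECUTE FUNCTION a_check_foo();
```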

How to make trigger on two tables?

I have two tables that I insert into using JDBC, for example parcelsTable and filesTable, and I have these cases:
1. INSERT new row in both tables.
2. INSERT new row only in parcelsTable.
TABLES:
DROP TABLE IF EXISTS parcelsTable;
CREATE TABLE parcelsTable (
num serial PRIMARY KEY,
parcel_name text,
filestock_id integer
);
DROP TABLE IF EXISTS filesTable;
CREATE TABLE filesTable (
num serial PRIMARY KEY,
file_name text,
files bytea
);
I want to use a TRIGGER to set parcelsTable.filestock_id = filesTable.num when I INSERT into both tables.
Is this possible? How do I know that I am inserting into both tables?

You don't need a trigger to get the foreign key value in this case. Since the column is declared as serial, you can access the latest value using currval. Run something like this from your app:
insert into filesTable (file_name, files) select 'f1', 'asdf';
insert into parcelsTable (parcel_name, filestock_id) select 'p1', currval('filesTable_num_seq');
Note that this should only be used when inserting one record at a time, so that currval returns the key of that record. I'm using the default sequence name, table_column_seq, which should work unless you've explicitly declared something different (unquoted identifiers are folded to lower case, so the sequence is actually named filestable_num_seq).
I would also recommend explicitly declaring nullability and the relationship:
CREATE TABLE parcelsTable (
...
filestock_id integer NULL REFERENCES filesTable (num)
);
Here is a working demo at SqlFiddle.
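An alternative to currval, assuming PostgreSQL 9.1+ (data-modifying CTEs): RETURNING hands the new filesTable.num straight to the parcelsTable insert, so the code does not depend on the sequence name. The sample values 'f1', 'p1' and 'p2' are only for illustration.

```sql
CREATE TABLE filesTable (
    num       serial PRIMARY KEY,
    file_name text,
    files     bytea
);
CREATE TABLE parcelsTable (
    num          serial PRIMARY KEY,
    parcel_name  text,
    filestock_id integer NULL REFERENCES filesTable (num)
);

-- Case 1: insert into both tables in one statement.
WITH new_file AS (
    INSERT INTO filesTable (file_name, files)
    VALUES ('f1', 'asdf'::bytea)
    RETURNING num
)
INSERT INTO parcelsTable (parcel_name, filestock_id)
SELECT 'p1', num FROM new_file;

-- Case 2: parcel without a file; filestock_id stays NULL.
INSERT INTO parcelsTable (parcel_name) VALUES ('p2');
```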

This might not be an answer, but it may be what you need. I am making this an answer instead of a comment because I need the space.
I don't know if you can have a trigger on two tables. Typically this is not needed. As in your case, typically either you are creating a parent record and a child record, or you are just creating a child record of an existing record.
So, typically, if you need a trigger when creating both, it is sufficient to put the trigger on the parent table.
I don't think you can do what you need. What you are trying to do is populate the foreign key with the parent record primary key in the same transaction. I don't think you can do that. I think you will have to provide the foreign key in the insert for parcelsTable.
You will end up leaving this NULL when you are creating a record in the parcelsTable at times when you are not creating a record in filesTable. So I think you will want to set the foreign key in the INSERT statement.

The only idea I have so far is to create a function that performs the inserts into both tables. Then you can apply whatever conditions you need, and handle parallel inserts too.
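A sketch of such a function in PL/pgSQL; the name add_parcel and its parameters are hypothetical. It inserts the file only when one is supplied, then the parcel, and returns the new parcel's num, so the JDBC code could call it with something like `SELECT add_parcel(?, ?, ?)`.

```sql
CREATE TABLE filesTable (
    num       serial PRIMARY KEY,
    file_name text,
    files     bytea
);
CREATE TABLE parcelsTable (
    num          serial PRIMARY KEY,
    parcel_name  text,
    filestock_id integer NULL REFERENCES filesTable (num)
);

-- Hypothetical helper: inserts a parcel, and a file first when one is given.
CREATE OR REPLACE FUNCTION add_parcel(
    p_parcel_name text,
    p_file_name   text  DEFAULT NULL,
    p_files       bytea DEFAULT NULL
) RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
    v_file_num   integer;  -- stays NULL when no file is supplied
    v_parcel_num integer;
BEGIN
    IF p_file_name IS NOT NULL THEN
        INSERT INTO filesTable (file_name, files)
        VALUES (p_file_name, p_files)
        RETURNING num INTO v_file_num;
    END IF;

    INSERT INTO parcelsTable (parcel_name, filestock_id)
    VALUES (p_parcel_name, v_file_num)
    RETURNING num INTO v_parcel_num;

    RETURN v_parcel_num;
END;
$$;
```

Usage: `SELECT add_parcel('p1', 'f1', 'asdf'::bytea);` covers the first case and `SELECT add_parcel('p2');` the second. Both inserts run in the caller's transaction, so a failure in either rolls back both.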
