SQL conditional row insert - sql

Is it possible to insert a new row only if a condition is met?
For example, I have this table with no primary key and no uniqueness constraint:
+----------+--------+
| image_id | tag_id |
+----------+--------+
|       39 |      8 |
|        8 |     39 |
|        5 |     11 |
+----------+--------+
I would like to insert a row only if that combination of image_id and tag_id doesn't already exist,
for example:
INSERT ..... WHERE image_id!=39 AND tag_id!=8

I think you're saying: you need to avoid duplicate rows in this table.
There are many ways of handling this. One of the simplest:
INSERT INTO theTable (image_id, tag_id)
SELECT 39, 8
WHERE NOT EXISTS
    (SELECT * FROM theTable
     WHERE image_id = 39 AND tag_id = 8)
As @Henrik Opel pointed out, you can put a unique constraint on the combined columns, but then you need a try/catch block somewhere else, which adds irrelevant complexity.
Edit to explain that comment...
I'm assuming this is a table mapping a many-to-many relationship between Movies and Tags. I realize you're probably using PHP, but I hope the C# pseudocode below is clear enough anyway.
If I have a Movie class, the most natural way to add a tag is an AddTag() method:
class Movie
{
    public void AddTag(string tagname)
    {
        Tag mytag = new Tag(tagname); // creates new tag if needed
        JoinMovieToTag(this.id, mytag.id);
    }

    private void JoinMovieToTag(int movie_id, int tag_id)
    {
        /* database code to insert the record into MovieTags goes here */
        /* db connection happens here */
        SqlCommand sqlCmd = new SqlCommand("INSERT INTO theTable... /* etc */");
        /* if you have a unique constraint on Movie/Tag, this will
           throw an exception if the row already exists */
        sqlCmd.ExecuteNonQuery();
    }
}
There's no practical way to check for duplicates earlier in the process, because another user might Tag the Movie at any moment, so there's no way around this.
Note: If trying to insert a dupe record means there's a bug, then throwing an error is appropriate, but if not, you don't want extra complexity in your error handler.

Try using a database trigger: it's an efficient way to add rows to a different table whenever the original table changes.
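For illustration, a minimal sketch of such a trigger in MySQL; the table names image_tags and image_tags_log are made up here, and the question's actual table would take their place:

-- copy every newly inserted tagging into a log table
CREATE TRIGGER image_tags_after_insert
AFTER INSERT ON image_tags
FOR EACH ROW
INSERT INTO image_tags_log (image_id, tag_id, logged_at)
VALUES (NEW.image_id, NEW.tag_id, NOW());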

I'm assuming you mean you want to insert a row only if the table does not already contain those values.
I don't have MySQL in front of me, but in general SQL this should work:
INSERT INTO a_table (image_id, tag_id)
SELECT image_id, tag_id
FROM (SELECT ? AS image_id, ? AS tag_id) AS t
WHERE image_id != 39 AND tag_id != 8
If you mean to insert the row only if no such row exists at all, then you can do this:
INSERT INTO a_table (image_id, tag_id)
SELECT image_id, tag_id
FROM (SELECT ? AS image_id, ? AS tag_id) AS t
WHERE NOT EXISTS (SELECT 1 FROM a_table
                  WHERE a_table.image_id = t.image_id AND a_table.tag_id = t.tag_id)

If you're using InnoDB with transactions, you can simply query for the data first, and in your code execute the insert only if no rows were found with those values. Alternatively, add a unique constraint over both columns and just try the insert: it will fail if the row already exists, which you can ignore. (This is less preferred than my first approach.)
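A rough sketch of the first approach, reusing the theTable name from the answer above (the application decides whether to run the INSERT based on the SELECT's result; note that without a unique index or SELECT ... FOR UPDATE, two concurrent transactions can still both pass the check):

START TRANSACTION;

-- step 1: check whether the pair is already there
SELECT COUNT(*) FROM theTable WHERE image_id = 39 AND tag_id = 8;

-- step 2 (only if the count was 0): insert the new pair
INSERT INTO theTable (image_id, tag_id) VALUES (39, 8);

COMMIT;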

It is fair to give credit to Henrik Opel, as he spotted what we all overlooked (including me): a simple unique constraint on the two columns. That is ultimately the best solution.
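A sketch of that constraint, assuming MySQL and the theTable name used above (the constraint name is made up). With it in place, INSERT IGNORE skips the duplicate instead of raising an error, though it also silences other errors on that statement:

ALTER TABLE theTable ADD CONSTRAINT uq_image_tag UNIQUE (image_id, tag_id);

-- silently does nothing if (39, 8) is already present
INSERT IGNORE INTO theTable (image_id, tag_id) VALUES (39, 8);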

To avoid inserting a duplicate row, use a query along these lines:
INSERT INTO `tableName` (`image_id`, `tag_id`)
SELECT 39, 8 FROM DUAL
WHERE NOT EXISTS (SELECT 1
                  FROM `tableName`
                  WHERE `image_id` = 39
                    AND `tag_id` = 8
                 );

Related

How to not insert common data in a column in SQLite?

I have a table named category and the table looks like
cat_id | cat_name
-------+----------
     1 | Science
     2 | Arts
and another table named item which looks like
item_id | item_name  | cat_id
--------+------------+-------
      1 | physics    |      1
      2 | literature |      2
      3 | chemistry  |      1
Please note that cat_id is the foreign key in the item table.
Now I want that, if I put math as item_name under the Arts category, it will insert successfully, but if I try to put the same data again it won't insert. Note also that I only have cat_name and item_name; my query fetches the cat_id from the category table using cat_name and inserts into the item table like this:
insert into item (item_name,cat_id) select 'math',category.cat_id from category where category.cat_name = 'Arts'
But if I run this query again it inserts the same item math again, and I want to stop that from happening. What should I do?
Reyjohn, can't you define the column as unique? That will prevent duplicate values.
I don't use SQLite, but my impression is that it uses similar syntax to MySQL, though without some of the stricter checks.
As per this answer:
sqlite - How to get INSERT OR IGNORE to work
You could use the INSERT OR IGNORE statement.
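A minimal sketch of that approach for the tables above, assuming a unique index over (item_name, cat_id) is added first (the index name is made up):

-- the unique index is what makes OR IGNORE meaningful
CREATE UNIQUE INDEX IF NOT EXISTS idx_item_name_cat ON item (item_name, cat_id);

-- inserts once, then silently does nothing on repeated runs
INSERT OR IGNORE INTO item (item_name, cat_id)
SELECT 'math', category.cat_id
FROM category
WHERE category.cat_name = 'Arts';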
SQLite doesn't have "INSERT OR UPDATE" which would be perfect for your case.
But you can emulate it with "INSERT OR REPLACE" - which is supported by SQLite - together with a "UNION". I developed this technique because I wanted to issue one single query and let the SQLite engine do everything.
INSERT OR REPLACE INTO item (item_id, item_name, cat_id)
SELECT item_id, item_name, cat_id FROM (
    SELECT item_id, item_name, 'new_category' AS cat_id FROM item
    WHERE item_name = 'math'
    UNION
    SELECT NULL, 'math', 'new_category'
) ORDER BY item_id DESC LIMIT 1;
So you are basically making one "SELECT WHERE item_name='math'" to get the existing record (if it exists) and then you concatenate this result (UNION) with a SELECT that generates a new record with a NULL item_id.
The trick then is the "ORDER BY item_id DESC LIMIT 1" at the end, where the actual result for the INSERT statement is exactly 1 record: the one that existed, with same item_id and a 'new_category' or a new one with a NULL item_id, which forces SQLite to create one for you.
You can also check whether the record exists beforehand (with a SELECT) and decide whether you need an INSERT or an UPDATE. But this leads to 2 queries and additional code on the client side. That's why I developed the technique above.

Is it possible to use a PG sequence on a per record label?

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body      | ...
----------------------------------------
 - |       4 |      1 | "abc...."
 - |       4 |      2 | "def...."
 - |       5 |      1 | "ghi...."
 - |       5 |      2 | "xyz...."
 - |       5 |      3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement like @Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions; it will result in race conditions and eventually duplicate key violations.
You should rather rethink your approach. Just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
, user_id
, row_number() OVER (PARTITION BY user_id ORDER BY seq_id)
)
FROM tbl;
db<>fiddle here
Old sqlfiddle
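The query above only assumes a plain table along these lines (a sketch; the column names follow the question, the types are guesses):

CREATE TABLE tbl (
    seq_id  serial PRIMARY KEY,  -- one plain sequence for the whole table
    user_id integer NOT NULL,
    body    text
);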
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead in the setup, as you will need triggers to manage the DDL statements for the partitions, but it would effectively give each user their own table of posts, along with their own sequence, while still letting you treat all posts as one big table.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
 user_id | seq_id
---------+--------
       1 |      1
       1 |      2
       2 |      1
       2 |      2
(4 rows)
I left out some rather important CHECK constraints in the above setup; make sure you read the docs for how these kinds of setups are used.
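For completeness, the kind of CHECK constraints that were left out look roughly like this (a sketch), one per child table, so each partition only accepts its own user's rows:

ALTER TABLE posts_001 ADD CHECK (user_id = 1);
ALTER TABLE posts_002 ADD CHECK (user_id = 2);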
insert into t (user_id, seq_id) values
    (4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible (that is, a single sequence with the ordering done in the select query), it can be expensive.
If you don't use a sequence, there is no sequence whose nature can be defeated. Also, it will not result in duplicate key violations. To demonstrate, I created a table and wrote a Python script to insert into it. I launched 3 parallel instances of the script, inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
    user_id int,
    seq_id int,
    primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2, psycopg2.extensions

# insert the next seq_id for user 4, computed with max()+1 inside the transaction
query = """
begin;
insert into t (user_id, seq_id) values
    (4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
commit;
"""

conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()

for i in range(0, 1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            # another instance took this seq_id first; roll back and retry
            print(e.pgerror)
            cursor.execute("rollback;")

cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
 count | max
-------+------
  3000 | 3000
Just as expected. I have developed at least two applications using that logic, and one of them is more than 13 years old and has never failed. I concede that if you are Facebook or some other giant, then you could have a problem.
Yes:
CREATE TABLE your_table
(
    column type DEFAULT NEXTVAL(sequence_name),
    ...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html
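A concrete instance of that template might look like this (a sketch; the sequence and table names are made up, and note that every row still draws from one global sequence rather than one per user):

CREATE SEQUENCE post_seq;

CREATE TABLE user_posts
(
    user_id integer NOT NULL,
    seq_id  integer NOT NULL DEFAULT NEXTVAL('post_seq'),
    body    text
);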

Factor (string) to Numeric in PostgreSQL

Similar to this, is it possible to convert a string field to a numeric one in PostgreSQL? For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field that is
 name  | num
-------+-----
 amy   |   1
 bob   |   2
 bob   |   2
 celia |   3
The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
string_id serial PRIMARY KEY
,string text NOT NULL UNIQUE -- no dupes
,ct int NOT NULL DEFAULT 1 -- count instead of dupe rows
);
Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)
WITH x AS (SELECT 'abc'::text AS nu)
   , y AS (
    UPDATE string s
    SET    ct = ct + 1
    FROM   x
    WHERE  s.string = x.nu
    RETURNING TRUE
    )
INSERT INTO string (string)
SELECT nu
FROM   x
WHERE  NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.
Note, there is a super-tiny race condition here if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this related question.
Live demo at sqlfiddle.
How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgreSQL
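A commonly used trick for that conversion looks like this (a sketch; only 32 bits of the hash are kept, so collisions are possible and the result can be negative):

-- take the first 8 hex digits of the md5 and reinterpret them as a 32-bit integer
SELECT name, ('x' || substr(md5(name), 1, 8))::bit(32)::int AS num
FROM test;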
If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that only uses the code of the first character, so everything after the first letter is ignored if the string is more than a single character.
The exact case shown in your request can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
 name  | dense_rank
-------+------------
 amy   |          1
 bob   |          2
 bob   |          2
 celia |          3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
however I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.

Maintaining logical consistency with a soft delete, whilst retaining the original information

I have a very simple table students, structure as below, where the primary key is id. This table is a stand-in for about 20 multi-million row tables that get joined together a lot.
+----+----------+------------+
| id | name     | dob        |
+----+----------+------------+
|  1 | Alice    | 01/12/1989 |
|  2 | Bob      | 04/06/1990 |
|  3 | Cuthbert | 23/01/1988 |
+----+----------+------------+
If Bob wants to change his date of birth, then I have a few options:
Update students with the new date of birth.
Positives: 1 DML operation; the table can always be accessed by a single primary key lookup.
Negatives: I lose the fact that Bob ever thought he was born on 04/06/1990
Add a column, created date default sysdate, to the table and change the primary key to id, created. Every update becomes:
insert into students(id, name, dob) values (:id, :name, :new_dob)
Then, whenever I want the most recent information do the following (Oracle but the question stands for every RDBMS):
select id, name, dob
from ( select a.*, rank() over ( partition by id
                                 order by created desc ) as "rank"
       from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: All queries over the entire database take that little bit longer. If the table was the size indicated this doesn't matter but once you're on your 5th left outer join using range scans rather than unique scans begins to have an effect.
Add a different column, deleted date default to_date('2100/01/01','yyyy/mm/dd'), or whatever overly early, or futuristic, date takes my fancy. Change the primary key to id, deleted then every update becomes:
update students x
set deleted = sysdate
where id = :id
and deleted = ( select max(deleted) from students where id = x.id );
insert into students(id, name, dob) values ( :id, :name, :new_dob );
and the query to get out the current information becomes:
select id, name, dob
from ( select a.*, rank() over ( partition by id
                                 order by deleted desc ) as "rank"
       from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: Two DML operations; I still have to use ranked queries with the additional cost or a range scan rather than a unique index scan in every query.
Create a second table, say student_archive and change every update into:
insert into student_archive select * from students where id = :id;
update students set dob = :newdob where id = :id;
Positives: Never lose any information.
Negatives: 2 DML operations; if you ever want to get all the information ever you have to use union or an extra left outer join.
For completeness, have a horribly de-normalised data-structure: id, name1, dob, name2, dob2... etc.
Number 1 is not an option if I never want to lose any information and always do a soft delete. Number 5 can be safely discarded as causing more trouble than it's worth.
I'm left with options 2, 3 and 4 with their attendant negative aspects. I usually end up using option 2 and the horrific 150 line (nicely-spaced) multiple sub-select joins that go along with it.
tl;dr I realise I'm skating close to the line on a "not constructive" vote here but:
What is the optimal (singular!) method of maintaining logical consistency while never deleting any data?
Is there a more efficient way than those I have documented? In this context I'll define efficient as "less DML operations" and / or "being able to remove the sub-queries". If you can think of a better definition when (if) answering please feel free.
I'd stick with #4, with some modifications. There's no need to delete data from the original table; it's enough to copy the old values to the archive table before updating (or before deleting) the original record. That can easily be done with a row-level trigger. Retrieving all information is, in my opinion, not a frequent operation, and I don't see anything wrong with the extra join/union. Also, you can define a view, so all queries will be straightforward from the end user's perspective.
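A sketch of such a row-level trigger in Oracle, assuming student_archive has the same columns as students plus an archived_at date (the trigger and column names are made up):

CREATE OR REPLACE TRIGGER students_archive_trg
BEFORE UPDATE OR DELETE ON students
FOR EACH ROW
BEGIN
    -- copy the outgoing version of the row before it is changed or removed
    INSERT INTO student_archive (id, name, dob, archived_at)
    VALUES (:old.id, :old.name, :old.dob, SYSDATE);
END;
/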

Postgres: WHERE... AND syntax problem

Can anyone help with a Postgres syntax problem? I'm trying to insert a record, but before doing so, check it doesn't exist, using WHERE... AND.
=# \d domes_manor_place;
id | integer | not null default nextval('domes_manor_place_id_seq'::regclass)
manor_id | integer | not null
place_id | integer | not null
=# select * from domes_manor_place where place_id='13621';
24017 | 22276 | 13621
OK, so we know that there is no record with manor_id=22398 and place_id=13621. Let's try to insert it with our `WHERE... AND' syntax:
=# INSERT INTO domes_manor_place (manor_id, place_id) SELECT 22398, 13621
WHERE (22398 NOT IN (SELECT manor_id FROM domes_manor_place)) AND
(13621 NOT IN (SELECT place_id FROM domes_manor_place));
INSERT 0 0
It won't insert the record - so what's wrong with my syntax?
Try this:
WHERE (22398, 13621) NOT IN (SELECT manor_id, place_id FROM domes_manor_place)
By the way, a much better approach is to use a unique constraint on the pair of columns. This will cause the insert to fail if a row already exists.
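A sketch of that constraint (the constraint name is made up); on PostgreSQL 9.5 and later the constraint also enables INSERT ... ON CONFLICT DO NOTHING, which skips the duplicate instead of raising an error:

ALTER TABLE domes_manor_place
    ADD CONSTRAINT domes_manor_place_uq UNIQUE (manor_id, place_id);

-- PostgreSQL 9.5+: insert only if the pair is not already there
INSERT INTO domes_manor_place (manor_id, place_id)
VALUES (22398, 13621)
ON CONFLICT (manor_id, place_id) DO NOTHING;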
You need a UNIQUE constraint; the SELECT can't help because it can't see data that is not committed yet. Different transactions can insert new records at the same moment, and these are all "unique"... NOT.