Is it possible to create a Set as a table? - sql

On the coding side of things, a Set data structure has three distinctive characteristics:
1. Every item in the Set is unique
2. The elements have no ordering
3. Adding an element that already exists in the Set is essentially a no-op
(2) is easy enough in a SQL table, and (1) can be achieved by putting a unique constraint on the column(s) in question, but I wonder about (3). If you try to insert a value that is already there into a table constrained by a unique index, it will error out. Is there any way to design a table in SQL Server to ignore that error and just silently do nothing? Or does it have to be handled client-side, by catching that error and ignoring it?

You understand how to handle (1) and (2).
For (3), you just need to implement an instead of trigger. If the value is already in the table, then the trigger would do nothing (not attempt an insert).
You can read about instead of triggers in the documentation.
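For illustration, here is a minimal sketch of such a trigger, assuming a hypothetical single-column set table dbo.MySet (the names are made up, not from the original question):

CREATE TABLE dbo.MySet (Val int NOT NULL PRIMARY KEY);
GO
CREATE TRIGGER trg_MySet_Insert ON dbo.MySet
INSTEAD OF INSERT
AS
BEGIN
    -- Insert only the incoming values that are not already present,
    -- so inserting a duplicate becomes a silent no-op instead of an error.
    INSERT INTO dbo.MySet (Val)
    SELECT DISTINCT i.Val
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MySet AS s WHERE s.Val = i.Val);
END;

With this trigger in place, running the same INSERT twice simply does nothing the second time rather than raising a unique-constraint error.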

Related

Informix select trigger to update column

Is it possible to increase the value of a number in a column with a trigger every time it gets selected? We have special tables where we store the new id, and when we update it in the app it tends to run into conflicts before the update happens, even though it all takes less than a second. So I was wondering whether it is possible to have the database increase the value after every SELECT on that column. Don't ask me why we don't use autoincrement for ids, because I don't know.
Informix provides the SERIAL and BIGSERIAL types (and also SERIAL8, but don't use that) which provide autoincrement support. It also provides SEQUENCES with more sophisticated autoincrements. You should aim to use one of those.
Trying to use a SELECT trigger to update the table being selected from is, at best, fraught with problems about transactions and the like (problems which both the types and sequences carefully avoid).
If your design team needs help making effective use of these, ask a new question outlining what you want to achieve.
Normally, the correct way to proceed is to make the ID column in each table that defines 'something' (the Orders table, the Customer table, …) into a SERIAL column and either not insert a value into the ID column or insert 0 into it. The generated value can be retrieved and used when creating auxiliary information — order items, etc.
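For example, a sketch using a hypothetical orders table (table and column names are illustrative):

CREATE TABLE orders
(
    order_id SERIAL NOT NULL PRIMARY KEY,
    customer INTEGER NOT NULL
);

-- Either omit the SERIAL column or insert 0; Informix generates the next value.
INSERT INTO orders(customer) VALUES(42);
INSERT INTO orders(order_id, customer) VALUES(0, 43);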
Note that you could think about using:
CREATE TABLE xyz_sequence
(
xyz SERIAL NOT NULL PRIMARY KEY
);
and using:
INSERT INTO xyz_sequence VALUES(0);
and then retrieving the inserted value — in Informix ESQL/C, you'd use sqlca.sqlerrd[1], in other languages, other techniques. You can also delete the newly inserted record, or even all the records in the table. You can afford to ignore errors from the DELETE statement; sooner or later, the rows will be deleted. The next value inserted will continue where the prior ones left off.
In a stored procedure, you'd use DBINFO('sqlca.sqlerrd1') to get the inserted value. You'd use DBINFO('bigserial') to get the value if you use a BIGSERIAL type.
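A rough sketch of that pattern as an Informix stored procedure (the procedure name and the cleanup step are illustrative, not part of the original answer):

CREATE PROCEDURE next_xyz() RETURNING INTEGER;
    DEFINE new_id INTEGER;
    -- Inserting 0 makes the SERIAL column generate the next value.
    INSERT INTO xyz_sequence VALUES(0);
    LET new_id = DBINFO('sqlca.sqlerrd1');
    -- Optional housekeeping; as noted above, errors here can be ignored.
    DELETE FROM xyz_sequence;
    RETURN new_id;
END PROCEDURE;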
I found a possible answer in this question: update with return value instead of doing it with select. It seems better to return the value directly from the UPDATE; since UPDATE takes locks, it should be safer even in a multithreaded application. But these are just my assumptions. Hopefully this will help someone.

Ordered DELETE of records in self-referencing table

I need to delete a subset of records from a self referencing table. The subset will always be self contained (that is, records will only have references to other records in the subset being deleted, not to any records that will still exist when the statement is complete).
My understanding is that this might cause an error if one of the records is deleted before the record referencing it is deleted.
First question: does postgres do this operation one-record-at-a-time, or as a whole transaction? Maybe I don't have to worry about this problem?
Second question: is the order of deletion of records consistent or predictable?
I am obviously able to write specific SQL to delete these records without any errors, but my ultimate goal is to write a regression test to show the next person after me why I wrote it that way. I want to set up the test data in such a way that a simplistic delete statement will consistently fail because of the records referencing the same table. That way if someone else messes with the SQL later, they'll get notified by the test suite that I wrote it that way for a reason.
Anyone have any insight?
EDIT: just to clarify, I'm not trying to work out how to delete the records safely (that's simple enough). I'm trying to figure out what set of circumstances will cause such a DELETE statement to consistently fail.
EDIT 2: Abbreviated answer for future readers: this is not a problem. By default, postgres checks the constraints at the end of each statement (not per-record, not per-transaction). Confirmed in the docs here: http://www.postgresql.org/docs/current/static/sql-set-constraints.html And by the SQLFiddle here: http://sqlfiddle.com/#!15/11b8d/1
In standard SQL, and I believe PostgreSQL follows this, each statement should be processed "as if" all changes occur at the same time, in parallel.
So the following code works:
CREATE TABLE T (
    ID1 int not null primary key,
    ID2 int not null references T(ID1)
);
INSERT INTO T(ID1,ID2) VALUES (1,2),(2,1),(3,3);
DELETE FROM T WHERE ID2 in (1,2);
Where we've got circular references involved in both the INSERT and the DELETE, and yet it works just fine.
A single DELETE with a WHERE clause matching a set of records will delete those records in an implementation-defined order. This order may change based on query planner decisions, statistics, etc. No ordering guarantees are made. Just like SELECT without ORDER BY. The DELETE executes in its own transaction if not wrapped in an explicit transaction, so it'll succeed or fail as a unit.
To force order of deletion in PostgreSQL you must do one DELETE per record. You can wrap them in an explicit transaction to reduce the overhead of doing this and to make sure they all happen or none happen.
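For example, a minimal sketch assuming a hypothetical self-referencing table t where row 1 is referenced by rows 2 and 3:

BEGIN;
-- Delete rows before the rows they reference.
DELETE FROM t WHERE id = 3;
DELETE FROM t WHERE id = 2;
DELETE FROM t WHERE id = 1;
COMMIT;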
PostgreSQL can check foreign keys at three different points:
The default, NOT DEFERRABLE: checks for each row as the row is inserted/updated/deleted
DEFERRABLE INITIALLY IMMEDIATE: the same, but SET CONSTRAINTS DEFERRED switches it to checking at the end of the transaction, and SET CONSTRAINTS IMMEDIATE switches it back
DEFERRABLE INITIALLY DEFERRED: checks all rows at the end of the transaction
In your case, I'd define your FOREIGN KEY constraint as DEFERRABLE INITIALLY IMMEDIATE, and do a SET CONSTRAINTS DEFERRED before deleting.
(Actually if I vaguely recall correctly, despite the name IMMEDIATE, DEFERRABLE INITIALLY IMMEDIATE actually runs the check at the end of the statement instead of the default of after each row change. So if you delete the whole set in a single DELETE the checks will then succeed. I'll need to double check).
(The mildly insane meaning of DEFERRABLE is IIRC defined by the SQL standard, along with gems like a TIMESTAMP WITH TIME ZONE that doesn't have a time zone).
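A sketch of that setup, using a hypothetical self-referencing table named node (not from the original question):

CREATE TABLE node (
    id        int PRIMARY KEY,
    parent_id int REFERENCES node (id) DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
-- The self-contained subset can now be removed in any order;
-- the foreign key is only checked at COMMIT.
DELETE FROM node WHERE id IN (1, 2, 3);
COMMIT;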
If you issue a single DELETE that affects multiple records (like delete from x where id>100), that will be handled as a single transaction and either all will succeed or fail. If multiple DELETEs, you have to put them in a transaction yourself.
There will be problems. If you have a constraint with DELETE CASCADE, you might delete more than you want with a single DELETE. If you don't, the integrity check might stop you from deleting. Constraints other than NO ACTION are not deferrable, so you'd have to disable the constraint before delete and enable it afterwards (basically drop/create, which might be slow).
If you have multiple DELETEs, then the order is as the DELETE statements are sent. If a single DELETE, the database will delete in the order it happens to find them (index, oids, something else...).
So I would also suggest thinking about the logic and maybe handling the deletes differently. Can you elaborate on the actual logic? Is it a tree in the database?
1) It will run as a single transaction if enclosed within BEGIN/COMMIT. Otherwise, in general, no.
For more see http://www.postgresql.org/docs/current/static/tutorial-transactions.html
The answer to your question depends, in general, on how the self-referencing is implemented.
If it is handled in application logic, it is solely your responsibility to check it yourself.
Otherwise, it is generally possible to restrict or cascade deletes for rows with foreign keys and ON DELETE CASCADE. However, as far as the PG docs go, I understand they are talking about referencing columns in other tables; I am not sure whether same-table foreign keys are supported:
http://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
2) In general, the order of deletion will be the order in which you issue the DELETE statements. If you want them all to be "uninterruptible", with no other statements modifying the table in between, enclose them in a transaction.
As a warning (and I may be wrong), what you seem to be trying to do should not be done. You should not have to rely on some esoteric "order of deletion" or other undocumented and/or implicit database features. The underlying logic does not seem sound; there should be another way.

SQL Server : Attempting to Insert a Duplicate Record Costs an Id

I have the following table set up:
Id int pk, unique not null
Name varchar(50) not null
Other columns not relevant to this issue
With an index set up on Name to be unique and non-clustered.
The setup does EXACTLY what I want - that is, it only inserts new rows whose Name doesn't already exist in the table, and throws an error if the new row is a duplicate Name.
I might be nit-picky about it, but every attempt to add a duplicate will cause SQL Server to skip the next Id that would have been assigned, had the new row been a non-duplicate Name.
Is there a way to prevent this with some setting, without the need to query for existence first before deciding to insert or deny?
No, there is no setting to prevent the identity value from incrementing on a failed insert.
As you suggest, you can mitigate this by checking for a duplicate before performing the insert - I would do this not just to keep the identity from incrementing, but also to keep your SQL Server from raising errors as part of normal operation.
However, there may be other exceptional circumstances that would cause an insert to fail... so if gaps in the Ids pose more than an aesthetic problem, an identity column might not be the best solution for what you're trying to solve.
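For reference, a sketch of such an existence check in a single statement (the @Name variable and dbo.MyTable are illustrative, not from the original question):

INSERT INTO dbo.MyTable (Name)
SELECT @Name
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable WHERE Name = @Name);

Under concurrent inserts two sessions can still pass the check at the same time, so keep the unique index in place as the final safety net.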

Limit column value to 0 or to be unique on insert

I have a table where an int column should either be set to zero or to a value that does not already exist in the table. Can I prevent inserting non-zero duplicate values into such a column with a CHECK CONSTRAINT, or should I use a BEFORE INSERT trigger? In case I could do this with both, which design is better?
From the .NET Windows Forms application we are using a global transaction scope to wrap the save. In both cases I would like the insert to fail and the transaction to roll back completely, and I don't know whether I should put the rollback inside the trigger; that's why I would rather try a check if possible.
Database: SQL 2008
Thanks.
See the link in Andriy M's comment; it mentions a concept new in SQL Server 2008: filtered indexes.
CREATE UNIQUE INDEX indexName ON tableName (columns) INCLUDE (includeColumns) WHERE columnName <> 0
This will create an index of unique items that are not 0.
Any attempt to insert a duplicate non-zero value will breach the uniqueness of the index and cause an error.
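As a concrete sketch (table, column, and index names here are made up for illustration):

CREATE TABLE dbo.Example
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Code int NOT NULL
);

CREATE UNIQUE INDEX UX_Example_Code ON dbo.Example (Code) WHERE Code <> 0;

-- Multiple zeros are allowed because the filter excludes them from the index.
INSERT INTO dbo.Example (Code) VALUES (0), (0), (5);
-- A second non-zero 5 violates the unique filtered index and raises an error.
INSERT INTO dbo.Example (Code) VALUES (5);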
Why are you using zero instead of NULL? If you had it as NULL, then the db would handle it for you easily via a nullable unique constraint.
Check constraints, when used properly, prevent bad data. They do not change bad data into good data. For that reason, I would aim for a trigger instead. If you can get around the need for a 0 by using NULL, you could use a unique constraint, but supplying the answer would be the job of a trigger regardless.

Is there a disadvantage to blindly using INSERT in MySQL?

Often I want to add a value to a table or update the value if its key already exists. This can be accomplished in several ways, assuming a primary or unique key is set on the 'user_id' and 'pref_key' columns in the example:
1. Blind insert, update if receiving a duplicate key error:
-- Try to insert as a new value
INSERT INTO my_prefs
(user_id, pref_key, pref_value)
VALUES (1234, 'show_help', 'true');
-- If a duplicate-key error occurs, run an update query
UPDATE my_prefs
SET pref_value = 'true'
WHERE user_id=1234 AND pref_key='show_help';
2. Check for existence, then insert or update:
-- Check for existence
SELECT COUNT(*)
FROM my_prefs
WHERE user_id=1234 AND pref_key='show_help';
-- If count is zero, insert
INSERT INTO my_prefs
(user_id, pref_key, pref_value)
VALUES (1234, 'show_help', 'true');
-- If count is one, update
UPDATE my_prefs
SET pref_value = 'true'
WHERE user_id=1234 AND pref_key='show_help';
The first way seems to be preferable as it will require only one query for new inserts and two for an update, where as the second way will always require two queries. Is there anything I'm missing though that would make it a bad idea to blindly insert?
Have a look at the ON DUPLICATE KEY syntax in http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
There is the third MySQL way, which would be the preferred one in that RDBMS
INSERT INTO my_prefs
(user_id, pref_key, pref_value)
VALUES (1234, 'show_help', 'true')
ON DUPLICATE KEY
UPDATE pref_value = 'true'
Personally, I am never a fan of exception-based programming (expecting an exception in the normal operation of an application), and to me the second example is much more readable/maintainable.
There are situations where this would make a difference (very tight loops for example) but I think there should be a good reason to write code like this rather than it being the default.
If you want to avoid the exception from possibly inserting a duplicate, and you want to use standard SQL (and your programming language / database API returns the count of updated rows), then use the following SQL commands (pseudo-code):
int i = SQL("UPDATE my_prefs ...");
if(i==0) {
SQL("INSERT INTO my_prefs ...");
}
This also takes into account that - for most use cases - updates occur more often than inserts.
Will there be concurrent INSERTs to these rows? DELETEs?
"ON DUPLICATE" sounds great (the behavior is just what you want) provided that you're not concerned about portability to non-MySQL databases.
The "blind insert" seems reasonable and robust provided that rows are never deleted. (If the INSERT case fails because the row exists, the UPDATE afterward should succeed because the row still exists. But this assumption is false if rows are deleted - you'd need retry logic then.) On other databases without "ON DUPLICATE", you might consider an optimization if you find latency to be bad: you could avoid a database round trip in the already-exists case by putting this logic in a stored procedure.
The "check for existence" is tricky to get right if there are concurrent INSERTs. Rows could be added between your SELECT and your UPDATE. Transactions won't even really help - I think even at isolation level "serializable", you'll see "could not serialize access due to concurrent update" errors occasionally (or whatever the MySQL equivalent error message is). You'll need retry logic, so I'd say the person above who suggests using this method to avoid "exception-based programming" is wrong, as is the person who suggests doing the UPDATE first for the same reason.
You may be able to use REPLACE instead, or if you're using a more current MySQL you have the option of using "INSERT ... ON DUPLICATE KEY UPDATE".
The fact that several people brought this up in quick succession says "always check the MySQL docs" when you have an issue, as they're decent and in many cases lead directly to the solution.
The first way is the preferred way as far as I know.
In your DAO model you could have an id field.
If set to null / -1 / whatever, the data hasn't been persisted.
When you persist it (or retrieve from database), set it to the id value in the database.
Your persist method can check the ID and pass it onto the update() or add() implementation.
Flaws: Getting out of sync with the database, etc. I'm sure there are more, but I really should get some work done...
So long as you're using MySQL, you can use the ON DUPLICATE keyword. For example:
INSERT INTO my_prefs (user_id, pref_key, pref_value) VALUES (1234, 'show_help', 'true')
ON DUPLICATE KEY UPDATE pref_value = 'true';