Does SQLite support SCOPE_IDENTITY? - sql

I'm trying to perform a simple INSERT and return the identity (auto-incrementing primary key). I've tried
cmd.CommandText = "INSERT INTO Prototype ( ParentID ) VALUES ( NULL ); SELECT SCOPE_IDENTITY();";
and I receive the following error
EnvironmentError: SQLite error
no such function: SCOPE_IDENTITY
Does SQLite support SCOPE_IDENTITY?
If so, how do I use it?
If not, what are my (preferably "thread-safe") alternatives?

If you're not using the C interface and want to do this from an SQL command, try: SELECT last_insert_rowid()
http://www.sqlite.org/lang_corefunc.html
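As a minimal sketch of that suggestion, here is how it might look from Python's sqlite3 module. The Prototype schema (in particular the ID column name) is an assumption, since the question does not show the table definition.

```python
import sqlite3

def insert_and_get_id(conn: sqlite3.Connection) -> int:
    """Insert one row into Prototype and return its rowid."""
    cur = conn.execute("INSERT INTO Prototype (ParentID) VALUES (NULL)")
    # last_insert_rowid() is tracked per connection, so ask on the same one.
    (rowid,) = conn.execute("SELECT last_insert_rowid()").fetchone()
    # Python's sqlite3 exposes the same value as cursor.lastrowid.
    assert rowid == cur.lastrowid
    return rowid

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the original post does not show the real one.
conn.execute(
    "CREATE TABLE Prototype ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, ParentID INTEGER)"
)
first_id = insert_and_get_id(conn)
second_id = insert_and_get_id(conn)
```

Because the value is kept per connection, giving each thread its own connection is one way to avoid reading an id produced by another thread's insert.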

Check out the FAQ. The sqlite3_last_insert_rowid() function will do it. Be careful with triggers, though.

last_insert_rowid() returns the row id of the most recent successful INSERT into ANY table on the same database connection. It is not thread-safe when several threads share a connection, as mentioned in other answers.
If you absolutely need to make sure that you are getting the correct row id returned, regardless of threads, async etc (for example, if you intend to use the rowid as a foreign key in another table), here's a way:
1. Insert a text column into the desired table (this will hold a GUID).
2. Manually generate a GUID (using whatever libraries are available in your language) and hold it in memory.
3. Insert your data into the table along with the GUID you just generated.
4. Retrieve the (supposed) rowid via last_insert_rowid().
5. Retrieve the row (or just the GUID) from your table using this rowid.
6. Compare the GUID from the retrieved row with the GUID you still have in memory.
   - If they are the same, happy days, you have the correct rowid.
   - If they are different, you need to query the table for a match to the GUID you have in memory for the correct rowid.
Obviously this solution has considerable performance drawbacks and you need to make a judgement call on whether the risks of mismatched data outweigh the performance toll. In my case, I needed to be certain of the integrity of my data above all else so it was fine for performance to take the hit. (Just how big a hit, I can't say.)
You might get the urge to skip the 4th and 5th steps and get the rowid using your GUID; you might also get the urge to just use the GUID as the primary identifier and foreign key. After all, these would make the code simpler and mean fewer columns in your table... RESIST THAT URGE. Integers as primary identifiers are easily indexable, which makes them faster and more efficient in WHERE clauses and JOINs; a GUID or string just doesn't index as well as an integer.
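The steps described above can be sketched in Python roughly as follows. The table layout, the Guid column name and the helper name insert_with_guid are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3
import uuid

def insert_with_guid(conn: sqlite3.Connection, parent_id) -> int:
    """Insert a row and return its rowid, verified against a GUID."""
    guid = str(uuid.uuid4())  # generate the GUID and keep it in memory
    conn.execute(
        "INSERT INTO Prototype (ParentID, Guid) VALUES (?, ?)",
        (parent_id, guid),
    )
    # Retrieve the (supposed) rowid.
    (rowid,) = conn.execute("SELECT last_insert_rowid()").fetchone()
    # Retrieve the GUID stored under that rowid and compare.
    row = conn.execute(
        "SELECT Guid FROM Prototype WHERE ID = ?", (rowid,)
    ).fetchone()
    if row is not None and row[0] == guid:
        return rowid
    # Mismatch: fall back to looking the row up by GUID.
    match = conn.execute(
        "SELECT ID FROM Prototype WHERE Guid = ?", (guid,)
    ).fetchone()
    if match is None:
        raise LookupError("inserted row not found by GUID")
    return match[0]

conn = sqlite3.connect(":memory:")
# Hypothetical schema with the extra text column that holds the GUID.
conn.execute(
    "CREATE TABLE Prototype ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, ParentID INTEGER, Guid TEXT)"
)
id_a = insert_with_guid(conn, None)
id_b = insert_with_guid(conn, id_a)
```

If the fallback lookup by GUID is expected to run often, an index on the Guid column would probably be worth adding.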

Related

Informix select trigger to update column

Is it possible to use a trigger to increase the value of a number in a column every time it gets selected? We have special tables where we store the new id, and when we update it in the app, conflicts tend to occur before the update happens, even when it all takes less than a second. So I was wondering whether it is possible to set the database to increase the value after every select on that column. Do not ask me why we do not use autoincrement for ids, because I do not know.
Informix provides the SERIAL and BIGSERIAL types (and also SERIAL8, but don't use that) which provide autoincrement support. It also provides SEQUENCES with more sophisticated autoincrements. You should aim to use one of those.
Trying to use a SELECT trigger to update the table being selected from is, at best, fraught with problems about transactions and the like (problems which both the types and sequences carefully avoid).
If your design team needs help making effective use of these, ask a new question outlining what you want to achieve.
Normally, the correct way to proceed is to make the ID column in each table that defines 'something' (the Orders table, the Customer table, …) into a SERIAL column and either not insert a value into the ID column or insert 0 into it. The generated value can be retrieved and used when creating auxiliary information (order items, etc.).
Note that you could think about using:
CREATE TABLE xyz_sequence
(
xyz SERIAL NOT NULL PRIMARY KEY
);
and using:
INSERT INTO xyz_sequence VALUES(0);
and then retrieving the inserted value — in Informix ESQL/C, you'd use sqlca.sqlerrd[1], in other languages, other techniques. You can also delete the newly inserted record, or even all the records in the table. You can afford to ignore errors from the DELETE statement; sooner or later, the rows will be deleted. The next value inserted will continue where the prior ones left off.
In a stored procedure, you'd use DBINFO('sqlca.sqlerrd1') to get the inserted value. You'd use DBINFO('bigserial') to get the value if you use a BIGSERIAL type.
I found a possible answer in this question: update with return value. Instead of using a SELECT, it seems better to return the value directly from the UPDATE; because UPDATE takes locks, it should be safer even in a multithreaded application. But these are just my assumptions. Hopefully it will help someone.

SQLite: Autoincrement and insert or ignore will produce unused Autoincrement Keys

I'm using a table Mail with auto-increment Id and Mail Address. The table is used in 4 other tables and it is mainly used to save storage (String is only saved once and not 4 times). I'm using INSERT OR IGNORE to just blindly add the mail addresses to the table and if it exists ignore the update. This approach is MUCH faster than checking the existence with SELECT ... and do an INSERT if needed.
Every INSERT OR IGNORE increments the auto-increment Id, whether the insert is ignored or performed. In one run I have approx. 500k data sets to process, so after every run the last auto-increment key has increased by 500k. I know there are 2^63-1 possible keys, so it would take a long time to use them all up.
I also tried INSERT OR REPLACE, but this will increment the Id of the dataset on every run of the command, so this is not a solution at all.
Is there a way to prevent this increase of auto-increment key on every INSERT OR IGNORE?
Table Mail Example (replaced with pseudo Addresses)
mIdMail mMail
"1" ""
"7" "mail1#example.com"
"15" "mail2#example.com"
"17" "mail3#example.com"
"19" "mail4#example.com"
"23" "mail5#example.com"
...
Insert Query (Using Java Lib: org.apache.commons.dbutils)
INSERT OR IGNORE
INTO MAIL
( mMail )
VALUES ( ? );
Table Definition
CREATE TABLE IF NOT EXISTS MAIL (
mIdMail INTEGER PRIMARY KEY AUTOINCREMENT,
mMail CHAR(90) UNIQUE
);
To get autoincrementing values without gaps, drop the AUTOINCREMENT keyword. (Yes, you get autoincrementing values even without it.)
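A small sketch of that suggestion, using the MAIL table from the question with the AUTOINCREMENT keyword removed. The sample addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same table as in the question, minus the AUTOINCREMENT keyword.
conn.execute(
    "CREATE TABLE IF NOT EXISTS MAIL ("
    "mIdMail INTEGER PRIMARY KEY, mMail CHAR(90) UNIQUE)"
)

# Hypothetical input: the first address repeats, so two inserts are ignored.
addresses = ["a@example.com", "a@example.com", "a@example.com", "b@example.com"]
for address in addresses:
    conn.execute("INSERT OR IGNORE INTO MAIL (mMail) VALUES (?)", (address,))

ids = [row[0] for row in conn.execute("SELECT mIdMail FROM MAIL ORDER BY mIdMail")]
```

Without AUTOINCREMENT, SQLite normally picks one more than the largest existing rowid, so ignored inserts do not consume ids. The trade-off is that the id of a deleted row may be reused later, which matters if other tables could still hold stale references to it.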
Auto-increment keys behave the way they do specifically because the database guarantees their behavior -- regardless of concurrent transactions and transaction failures.
Auto-increment keys have two guarantees:
They are increasing, so later inserts have larger values than earlier ones.
They are guaranteed to be unique.
The mechanism for allocating the keys does not guarantee no gaps. Why not? Because no-gaps would incur a lot more overhead on the database. Basically, each transaction on the table would need to be completely serialized (that is completed and committed) before the next one can take place. Generally, that is a really bad idea from a performance perspective.
Unfortunately, SQLite versions before 3.25 don't have the simplest solution, which is simply to call row_number() on the auto-incremented keys. You could try to implement a gapless auto-increment using triggers, but that would significantly slow down your application.
My real suggestion is simply to live with the gaps. Accept them. Surrender. That is how the built-in method works, and for good reason. Now design the rest of the database/application keeping this in mind.
I had the same issue, and changing "INSERT OR IGNORE" into "INSERT OR FAIL" solved the problem, so now when it fails the id value doesn't increment.

How to prevent adding identical records to SQL database

I am writing a program that recovers structured data as individual records from a (damaged) file and collects the results into a sqlite database.
The program is invoked several times with slightly different recovery parameters. That often recovers the same data from the file, but sometimes different data.
Now, every time I run my program with different parameters, it's supposed to add just the newly (different) found items to the same database.
That means that I need a fast way to tell if each recovered record is already present in the DB or not, in order to add them only if they're not existing in the DB yet.
I understand that for each record I want to add, I could first do a SELECT on all columns to see if there is already a matching record in the DB, and only add the new one if no match is found.
But since I'm adding 10000s of records, doing a SELECT for each of these records feels pretty inefficient (slow) to me.
I wonder if there's a smarter way to handle this? I.e, is there a way I can tell sqlite that I do not want duplicate entries, and so it automatically detects and rejects them? I know about the UNIQUE modifier, but that's not it because it applies to single columns only, doesn't it? I'd need to be able to say that the combination of COL1+COL2+COL3 must be unique. Is there a way to do that?
Note: I never want to update any existing records. I only want to collect a set of different records.
Bonus part - performance
In a classic programming language, I'd use a key-value dictionary where the key is the sum of all a record's values. Similarly, I could calculate a Hash code for each added record and look that hash code up first. If there's no match, then the record is surely not in the DB yet; If there is a match I'd still have to search the DB for any duplicates. That'd surely be faster already, but I still wonder if sqlite can make this more efficient.
Try:
sqlite> create table foo (
...> a int,
...> b int,
...> unique(a, b)
...> );
sqlite>
sqlite> insert into foo values(1, 2);
sqlite> insert into foo values(2, 1);
sqlite> insert into foo values(1, 2);
Error: columns a, b are not unique
sqlite>
You could use a UNIQUE column constraint, or, to declare a unique constraint over multiple columns, a UNIQUE (...) ON CONFLICT table constraint:
CREATE TABLE name ( id int, col_name1 type, col_name2 type, UNIQUE (col_name1, col_name2) ON CONFLICT IGNORE )
SQLite has two ways of expressing uniqueness constraints: PRIMARY KEY and UNIQUE. Both of them create an index and so the lookup happens through the created index.
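As a rough sketch of how the composite UNIQUE constraint could be combined with INSERT OR IGNORE for the bulk-import case in the question: the table name records, the column names and the sample rows below are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the combination of the three columns must be unique.
conn.execute(
    "CREATE TABLE records ("
    "col1 TEXT, col2 TEXT, col3 TEXT, UNIQUE (col1, col2, col3))"
)

# Hypothetical recovered data; the third tuple duplicates the first.
recovered = [("a", "b", "c"), ("a", "b", "d"), ("a", "b", "c")]

# One transaction for the whole batch; duplicates are skipped via the index.
with conn:
    conn.executemany("INSERT OR IGNORE INTO records VALUES (?, ?, ?)", recovered)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

One caveat worth checking against your data: SQLite treats NULL values as distinct from each other in a UNIQUE constraint, so rows containing NULL in a constrained column would not be detected as duplicates.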
If you do not want to use an SQL approach (as mentioned in other answers), you can do a select for all your data when the program starts, store the data in a dictionary, and work with the dictionary to decide which records to insert into your DB.
The benefit of this approach is the single select is much faster than many small selects.
The disadvantage is that it won't work well if you don't have enough memory to store your data in.

Arrays in database tables and normalization

Is it smart to keep arrays in table columns? More precisely I am thinking of the following schema which to my understanding violates normalization:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
Permissions varchar(max) not null default(''),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
and this:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
PermissionID int not null default(-1),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
UPD3: I envision Permissions as a comma-delimited string since MSSQL is our primary deployment target.
UPD: Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only
UPD2: I envision the typical usage scenario as following:
int category_id=42;
int[] array_of_groups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
throw new StarveToDeathException();
}
Thoughts?
Thanks in advance!
I'd suggest taking the normalized road for the following reasons:
By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and delete the record only when you remove the last permission.
Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
Performance-wise, I think it will actually be faster to get a resultset of id's than having to parse a string to integers. To be measured with actual data and implementation...
Your second example should probably be:
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)
Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.
Is it smart
Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.
If you can see no way in which a table with one row for each item would ever be useful then I'd suggest that the encapsulate-in-a-string might be considered.
In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.
In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.
If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.
If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).
The problem with the first implementation is that it doesn't actually use an array but a concatenated string.
This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.
If you were using a database that natively supported arrays as an atomic value, such as PostgreSQL, then the argument would be different.
Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2 (assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.
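To illustrate the normalized lookup, here is a minimal sketch that uses SQLite through Python as a stand-in for MSSQL. The sample rows, the function name check_permission and the assumption that EatAndDrink has an ID of 2 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized table: one row per (group, category, permission).
conn.execute(
    "CREATE TABLE Permissions ("
    "GroupID INTEGER NOT NULL, CategoryID INTEGER NOT NULL, "
    "PermissionID INTEGER NOT NULL, "
    "PRIMARY KEY (GroupID, CategoryID, PermissionID))"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Permissions VALUES (?, ?, ?)",
    [(40, 42, 1), (2, 42, 2), (7, 42, 2)],
)

EAT_AND_DRINK = 2  # assumed ID, as in the answer above

def check_permission(conn, category_id, group_ids, permission_id) -> bool:
    """Return True if any of the groups holds the permission for the category."""
    if not group_ids:
        return False
    placeholders = ",".join("?" for _ in group_ids)
    sql = (
        "SELECT COUNT(*) FROM Permissions "
        "WHERE CategoryID = ? AND PermissionID = ? "
        f"AND GroupID IN ({placeholders})"
    )
    (n,) = conn.execute(sql, (category_id, permission_id, *group_ids)).fetchone()
    return n > 0

allowed = check_permission(conn, 42, [40, 2, 42], EAT_AND_DRINK)
denied = check_permission(conn, 42, [40], EAT_AND_DRINK)
```

The check is a single indexed query, with no string parsing in the application.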

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: This guarantees that either both inserts complete successfully or none at all. This is optional, but it is recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
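The same insert-then-reuse pattern can be sketched from application code. The example below is only an illustration and uses SQLite via Python's sqlite3 as a runnable stand-in, so cursor.lastrowid plays the role that LAST_INSERT_ID() plays in MySQL. The table definitions and the helper name add_file are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal versions of the two tables from the question.
conn.execute("CREATE TABLE files (file_id INTEGER PRIMARY KEY, url TEXT)")
conn.execute(
    "CREATE TABLE grades ("
    "file_id INTEGER REFERENCES files(file_id), grade TEXT)"
)

def add_file(conn: sqlite3.Connection, url: str, grade: str) -> int:
    """Insert a file and its grade atomically; return the new file_id."""
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        cur = conn.execute("INSERT INTO files (url) VALUES (?)", (url,))
        file_id = cur.lastrowid  # id generated by this connection's insert
        conn.execute(
            "INSERT INTO grades (file_id, grade) VALUES (?, ?)",
            (file_id, grade),
        )
    return file_id

new_id = add_file(conn, "text.doc", "some-grade")
```

As with LAST_INSERT_ID(), the generated id is read back on the same connection that performed the insert, so concurrent users on other connections do not affect it.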
In PHP to get the automatically generated ID of a MySQL record, use mysqli->insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes and to make joins easier on the eye. But don't let surrogate keys escape from the database.