How to construct an SQLite table that assigns and returns IDs for any name? - sql

I would like to have an SQLite table that maps names to unique IDs. I can create this table in the following way:
CREATE TABLE name_to_id (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)
With a SELECT statement I can get the row containing a given name and read the corresponding ID from that row.
The problem appears when I try to get the ID for a name that is not yet in the table. In this case, the expected behavior is that the new name is added and its newly generated ID is returned. I have two possible solutions for that.
The first solution is trivial:
1. Check whether the name is in the table.
2. If not, insert a row with the name.
3. Select the row with the name and read the needed ID from that row.
I do not like this solution because of a race condition: the first process checks whether the name is in the table and sees that it is not; meanwhile, another process adds the name to the table, and then the first process tries to add the same name again.
The second solution seems to be better:
1. For any name, use an INSERT that only inserts if the name does not exist yet.
2. Select the row containing the name from the table and get its ID.
Is the second solution optimal, or are there better solutions?

The normal way to avoid duplicate entries in a table is to create a UNIQUE constraint. The database will then check for you whether the record is already there and fail if it is. That should be the best approach in terms of both reliability and performance.
Next, the SQLite FAQ suggests using the function last_insert_rowid() to fetch the ID instead of running a second query. This is actually the very first question in the FAQ ;)
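A minimal sketch of this approach using Python's built-in sqlite3 module, assuming the table from the question with a UNIQUE constraint added on name; the function name get_or_create_id is hypothetical. Python exposes last_insert_rowid() as cursor.lastrowid, and an INSERT rejected by the UNIQUE constraint raises sqlite3.IntegrityError:

```python
import sqlite3

def get_or_create_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the ID for `name`, inserting the name first if it is new."""
    try:
        # The UNIQUE constraint makes this INSERT fail if the name already exists.
        cur = conn.execute("INSERT INTO name_to_id(name) VALUES (?)", (name,))
        # cursor.lastrowid is the same value SQL's last_insert_rowid() returns.
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The name is already present: look up its existing ID.
        row = conn.execute("SELECT id FROM name_to_id WHERE name = ?", (name,)).fetchone()
        return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name_to_id (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)"
)
alice = get_or_create_id(conn, "alice")
bob = get_or_create_id(conn, "bob")
print(alice, bob, get_or_create_id(conn, "alice"))  # 1 2 1
```

Because the database itself rejects the duplicate, two processes racing on the same new name cannot both insert it; the loser simply falls through to the lookup.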

In pseudocode, the first solution looks like this:
cursor = db.execute("SELECT id FROM name_to_id WHERE name = ?", name)
if cursor.has_some_row:
    id = cursor["id"]
else:
    db.execute("INSERT INTO name_to_id(name) VALUES(?)", name)
    id = db.last_insert_rowid
and the second like this:
db.execute("INSERT OR IGNORE INTO name_to_id(name) VALUES(?)", name)
cursor = db.execute("SELECT id FROM name_to_id WHERE name = ?", name)
id = cursor["id"]
The first solution requires a transaction around both commands, but this would be a good idea for the second solution, too, to avoid the overhead of multiple implicit transactions.
The second solution requires a unique constraint on name, but this would be a good idea for the first solution, too, for correctness and to speed up the name lookups.
Both solutions use two SQL statements and have similar speed.
(The second searches the row twice, but that data is cached.)
So there isn't anything obvious that makes one better than the other.
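Both solutions can be sketched as runnable code with Python's sqlite3 module; the helper and function names are hypothetical. The sketch assumes the connection is opened with isolation_level=None so the transaction can be started explicitly with BEGIN IMMEDIATE, which takes SQLite's write lock before the first SELECT and therefore closes the race described in the question:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def immediate_transaction(conn):
    """Run the block in BEGIN IMMEDIATE ... COMMIT; roll back on any exception."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other connection
    # can insert the same name between our SELECT and our INSERT.
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")

def id_check_then_insert(conn, name):
    """Solution 1: SELECT first, INSERT only if the name is missing."""
    with immediate_transaction(conn):
        row = conn.execute("SELECT id FROM name_to_id WHERE name = ?", (name,)).fetchone()
        if row is not None:
            return row[0]
        return conn.execute("INSERT INTO name_to_id(name) VALUES (?)", (name,)).lastrowid

def id_insert_or_ignore(conn, name):
    """Solution 2: INSERT OR IGNORE (relies on UNIQUE(name)), then SELECT."""
    with immediate_transaction(conn):
        conn.execute("INSERT OR IGNORE INTO name_to_id(name) VALUES (?)", (name,))
        return conn.execute("SELECT id FROM name_to_id WHERE name = ?", (name,)).fetchone()[0]

# isolation_level=None: the sqlite3 module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE name_to_id (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)")
a = id_check_then_insert(conn, "alice")
b = id_insert_or_ignore(conn, "bob")
print(a, b)  # 1 2
print(id_insert_or_ignore(conn, "alice"), id_check_then_insert(conn, "bob"))  # 1 2
```

In both functions, a repeated call with the same name returns the existing ID instead of creating a duplicate row.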

Related

SQL OR statement vs. multiple SELECT queries

I have a table with an id and a name.
I get a list of IDs and I need their names.
As far as I know, I have two options:
1. Create a for loop in my code that executes:
SELECT name from table where id=x
where x is always a number.
2. Or write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of IDs and names is enormous, so I don't think you would want that.
The problem is that the ID is not always a number but a randomly generated ID containing numbers and characters, so using ranges is not a solution.
I'm asking this from a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know which ID a name belongs to unless you also SELECT the ID or sort the results by ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
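A sketch of the batched approach with Python's sqlite3 module, assuming a hypothetical table items(id TEXT PRIMARY KEY, name TEXT) (the question's table is a reserved word). Each batch uses one IN (...) list of ? placeholders, stays well below SQLite's host-parameter limit (999 by default before SQLite 3.32.0), and selects the ID alongside the name so every name can be matched back to its ID:

```python
import sqlite3

# Conservative batch size: older SQLite builds allow at most 999 "?" parameters
# per statement (SQLITE_MAX_VARIABLE_NUMBER); newer builds allow more.
BATCH_SIZE = 500

def names_for_ids(conn, ids):
    """Return {id: name} for the given IDs, querying in batches with IN (...)."""
    result = {}
    for start in range(0, len(ids), BATCH_SIZE):
        batch = ids[start:start + BATCH_SIZE]
        placeholders = ", ".join("?" * len(batch))  # "?, ?, ..., ?"
        # SELECT the id too, so each name can be matched back to its ID.
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        for row_id, name in conn.execute(sql, batch):
            result[row_id] = name
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(f"id{i}x", f"name {i}") for i in range(1200)])
wanted = [f"id{i}x" for i in range(0, 1200, 2)] + ["missing"]
found = names_for_ids(conn, wanted)
print(len(found))  # 600
```

IDs that are not in the table, like "missing" here, are simply absent from the result.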
A "nice" solution is to use the IN operator:
SELECT name from table where id in (1,2,3)
Also, the IN operator is syntactic sugar built for exactly this purpose:
SELECT name from table where id IN (1,2,3,4,5,6.....)
Assuming you receive the list of IDs whose names you need as an input temp table #InputIDTable:
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)
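In SQLite, the closest equivalent of SQL Server's #InputIDTable is a TEMP table, which only the current connection can see. A minimal sketch with Python's sqlite3 module, using the hypothetical names items and input_ids:

```python
import sqlite3

def names_via_temp_table(conn, ids):
    """Load the IDs into a TEMP table and look up their names with IN (SELECT ...)."""
    # TEMP tables are private to this connection, like #InputIDTable in SQL Server.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS input_ids (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM input_ids")  # clear IDs left over from a previous call
    conn.executemany("INSERT OR IGNORE INTO input_ids (id) VALUES (?)", ((i,) for i in ids))
    return conn.execute(
        "SELECT id, name FROM items WHERE id IN (SELECT id FROM input_ids) ORDER BY id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("a1", "Ann"), ("b2", "Bob"), ("c3", "Cid")])
print(names_via_temp_table(conn, ["c3", "a1", "zz"]))  # [('a1', 'Ann'), ('c3', 'Cid')]
```

This avoids building a huge IN list in the SQL text, at the cost of one extra round of inserts into the temp table.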

Which is faster in SQL: SELECT + UPDATE or DELETE + INSERT?

I'm stuck on a problem that can be solved in two ways.
Let me first explain the case:
I have a DB table with some columns, say id, name, address, and priority. Here name and address are not unique, but the combination name + address + priority is unique.
The input provided to me is a name and a list of addresses. What I have to do is arrange the name and addresses in my DB table in the same order as given in the input.
There are two ways of solving this:
1. Select rows on the basis of name and address, then build UPDATE queries for the rows whose data has changed and execute them.
2. Delete the rows for that name and those addresses from the table and insert them again with the new priorities.
I know that one UPDATE is faster than DELETE + INSERT, but in this case there is also a SELECT query.
My intuition is that the first method will be faster, but I don't have any technical details to support that.
Am I missing something?
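Both approaches can be sketched with Python's sqlite3 module, assuming a hypothetical table entries, that priority encodes the position in the input list, and that each (name, address) pair occurs at most once. This only shows the mechanics; which one is faster depends on the data and is best measured:

```python
import sqlite3

def reorder_by_update(conn, name, addresses):
    """Approach 1: SELECT current priorities, UPDATE only the rows that changed."""
    with conn:  # commits on success, rolls back if an exception escapes
        current = dict(conn.execute(
            "SELECT address, priority FROM entries WHERE name = ?", (name,)))
        changes = [(prio, name, addr) for prio, addr in enumerate(addresses)
                   if current.get(addr) != prio]
        conn.executemany(
            "UPDATE entries SET priority = ? WHERE name = ? AND address = ?", changes)

def reorder_by_delete_insert(conn, name, addresses):
    """Approach 2: DELETE the affected rows, then INSERT them with new priorities."""
    with conn:
        placeholders = ", ".join("?" * len(addresses))
        conn.execute(
            f"DELETE FROM entries WHERE name = ? AND address IN ({placeholders})",
            [name, *addresses])
        conn.executemany(
            "INSERT INTO entries (name, address, priority) VALUES (?, ?, ?)",
            [(name, addr, prio) for prio, addr in enumerate(addresses)])

def addresses_in_order(conn, name):
    rows = conn.execute(
        "SELECT address FROM entries WHERE name = ? ORDER BY priority", (name,))
    return [address for (address,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entries (
    id INTEGER PRIMARY KEY, name TEXT, address TEXT, priority INTEGER,
    UNIQUE (name, address, priority))""")
reorder_by_delete_insert(conn, "alice", ["a", "b", "c"])  # initial load
reorder_by_update(conn, "alice", ["c", "a", "b"])
print(addresses_in_order(conn, "alice"))  # ['c', 'a', 'b']
```

One design difference worth noting: the delete-and-insert variant gives the reinserted rows new id values, which matters if other tables reference them.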

Can I insert a copy of a row from table T into table T without listing its columns and without a primary key error?

I want to do something like this:
INSERT INTO T SELECT * FROM T WHERE Column1 = 'MagicValue' -- (multiple rows may be affected)
The problem is that T has a primary key column, so this causes an error because it tries to set the primary key. And frankly, I don't want to set the primary key either. I want to create entirely new rows with new primary keys, but with the rest of the fields copied over from the original rows.
This is supposed to be generic code applicable to various tables. If there is no nice way of doing this, I will just write code to dynamically extract the column names, construct the list, etc. But maybe there is one? Am I the first guy trying to create duplicate rows in a database or something?
I'm assuming that by "Primary Key" you mean identity or GUID data types that auto-assign or auto-increment.
Without some very fancy dynamic SQL, you can't do what you are after. If you want to insert everything but the identity field, you need to specify the fields.
If you want to specify a value for that field, you need to specify all the fields in the SELECT and in the INSERT AND turn on IDENTITY_INSERT.
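For SQLite, the "dynamic SQL" route can be sketched in Python: read the column names with PRAGMA table_info, leave out the primary-key column(s), and build the INSERT ... SELECT column list. This assumes the primary key is an INTEGER PRIMARY KEY (so SQLite assigns new values automatically) and that where_sql is trusted SQL; duplicate_rows and quote_ident are hypothetical helpers, and IDENTITY_INSERT (SQL Server-specific) plays no role here:

```python
import sqlite3

def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def duplicate_rows(conn, table, where_sql, params=()):
    """Copy the rows matching where_sql; SQLite assigns the new primary keys."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column;
    # pk > 0 marks primary-key columns, which are left out of the copy.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")
               if row[5] == 0]
    col_list = ", ".join(quote_ident(c) for c in columns)
    target = quote_ident(table)
    sql = (f"INSERT INTO {target} ({col_list}) "
           f"SELECT {col_list} FROM {target} WHERE {where_sql}")  # where_sql must be trusted
    with conn:
        return conn.execute(sql, params).rowcount  # number of rows copied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, Column1 TEXT, qty INTEGER)")
conn.executemany("INSERT INTO T (Column1, qty) VALUES (?, ?)",
                 [("MagicValue", 1), ("MagicValue", 2), ("Other", 3)])
copied = duplicate_rows(conn, "T", "Column1 = ?", ("MagicValue",))
print(copied)  # 2
```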
You don't gain anything from duplicating a row in a database (considering you didn't try to set the primary key). It would be wiser, and would avoid problems, to have another column called "amount" or something similar.
Something like:
UPDATE T SET Amount = Amount + 1 WHERE Column1 = 'MagicValue'
or, if it can increase by more than 1 (depending on the number of returned rows):
UPDATE T SET Amount = Amount * 2 WHERE Column1 = 'MagicValue'
I'm not sure exactly what you're trying to do, but if the above doesn't work for you, I think your design requires a new table to insert the copies into.
EDIT: Also, as mentioned in the comments under your question, a generic insert doesn't really make sense. For it to work, you need the same number of fields, and they will hold the same values, suggesting that they should also have the same names (even if that isn't strictly required). It would basically be the same table structure twice.

How are these tasks done in SQL?

I have a table, and there is no column that stores when the record/row was added. How can I get the latest entry in this table? There are two cases:
1. If a numeric ID is used as the identifier, loop through the entire table and get the largest ID. But this would be very inefficient for a large table.
2. If a random string is used as the identifier (which is probably very, very bad practice), this would require more thinking (I personally have no idea beyond my first point above).
Also, if each row of my table has a numeric field and I want to add these up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, and I want to add all of these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
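Both answers can be checked with a small Python sqlite3 script that reuses the names mytable and mynumfield from the answer and the values 3 and 7 from the question. It assumes id is an INTEGER PRIMARY KEY, so SQLite hands out 1, 2, ... in insert order on a fresh table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mynumfield INTEGER)")
conn.executemany("INSERT INTO mytable (mynumfield) VALUES (?)", [(3,), (7,)])

# 1) The largest ID; the primary key is indexed, so no full table scan is needed.
latest = conn.execute("SELECT max(id) AS latest FROM mytable").fetchone()[0]
# 2) The total of the numeric column over all rows.
total = conn.execute("SELECT sum(mynumfield) AS total FROM mytable").fetchone()[0]
print(latest, total)  # 2 10
```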
For the last question, use SUM():
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
This assumes the values are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
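The same idea can be sketched in SQLite through Python's sqlite3 module (table, column and trigger names are hypothetical): a DEFAULT CURRENT_TIMESTAMP column records the insert time and an AFTER UPDATE trigger records the update time. CURRENT_TIMESTAMP yields UTC text in the form YYYY-MM-DD HH:MM:SS, so it has one-second resolution:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    value      INTEGER,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- filled in on INSERT
    updated_at TEXT                                       -- maintained by the trigger
);

-- After the value column of a row changes, stamp that row's update time.
CREATE TRIGGER items_touch AFTER UPDATE OF value ON items
BEGIN
    UPDATE items SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")
conn.execute("INSERT INTO items (value) VALUES (1)")
before = conn.execute("SELECT created_at, updated_at FROM items WHERE id = 1").fetchone()
conn.execute("UPDATE items SET value = 2 WHERE id = 1")
after = conn.execute("SELECT created_at, updated_at FROM items WHERE id = 1").fetchone()
print(before, after)
```

The trigger only fires on changes to value, so its own UPDATE of updated_at does not re-trigger it.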
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.

Universal SQL construct to retrieve the last row inserted

What would be the correct universal SQL construct to get the last row inserted (or its primary key)? The ID might be autogenerated by a sequence, but I do not want to deal with the sequence at all! I need to get the ID by querying the table. Alternatively, INSERT might somehow be extended to return the ID. Assume I am always inserting a single row. The solution should work with most RDBMSs!
The best way is to rely on the sequence, like:
select Max(ID) from tableName
But if you don't want to deal with it, you can add a new timestamp column to your table and then select the maximum from that column.
Like this:
select Max(TimestampField) from tableName
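A small SQLite-only illustration (not a universal construct) using Python's sqlite3 module and the placeholder name tableName from the answer: the driver reports the new key directly as cursor.lastrowid, the same value as last_insert_rowid(), while MAX(ID) only matches it as long as no other connection has inserted in the meantime:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (ID INTEGER PRIMARY KEY, label TEXT)")

for label in ("a", "b", "c"):
    cur = conn.execute("INSERT INTO tableName (label) VALUES (?)", (label,))

# Key of the row this connection just inserted.
print(cur.lastrowid)  # 3
# Largest key in the table; equal to the above only if nobody else inserted since.
print(conn.execute("SELECT max(ID) FROM tableName").fetchone()[0])  # 3
# The SQL-level equivalent of cursor.lastrowid, scoped to this connection.
print(conn.execute("SELECT last_insert_rowid()").fetchone()[0])  # 3
```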