Suppose I create a table of tasks.
CREATE TABLE todos (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
text STRING(1000)
)
An end-user creates some new todos (which don't yet have an id/UUID) and updates others, and we'd like to be able to update or insert those tasks (update if they have an id, insert otherwise). We could create or update one by one, using the right query.
-- $1 is "some task"
INSERT INTO todos (text)
VALUES ($1)
RETURNING id
-- $1 is a UUID, $2 is "some task"
UPDATE todos
SET text = $2
WHERE id = $1
Can "update or insert" be performed in one query? Or better, in a single query that works with a batch of todos?
UPSERT and INSERT INTO ... ON CONFLICT don't seem to be the right approach, because they try to create first, whereas I'd like to try to update first otherwise insert (to generate id).
I'm using CockroachDB, Go, and the pgx driver.
How would you typically handle this situation?
That can never work, because you have no way to identify a task. An automatically generated id is useful as an artificial primary key, that is, it prevents identical copies, but that artificial identifier has no meaning.
If you want to be able to update a task, you have to be able to identify it somehow. Since all you have is the task, all you can do is to have a unique constraint on task to identify when the same task comes a second time. Then you can use task in the ON CONFLICT clause.
However, it seems that task is not so much an identifier as a command line. In that case, you need to change your data model to have something like a task_name that uniquely identifies an individual task. If that task_name is never allowed to change, you might as well make it your primary key and do away with the id.
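As a minimal sketch of the unique-key approach described above: the snippet below uses Python's built-in sqlite3 module as a stand-in for CockroachDB (an assumption made only so the example is runnable; the ON CONFLICT ... DO UPDATE clause needs SQLite 3.24 or newer). The task_name column and the upsert_todos helper are hypothetical names, not part of the original schema.

```python
import sqlite3


def upsert_todos(conn, todos):
    """Insert each (task_name, text) pair, or update text when task_name already exists."""
    conn.executemany(
        """
        INSERT INTO todos (task_name, text) VALUES (?, ?)
        ON CONFLICT (task_name) DO UPDATE SET text = excluded.text
        """,
        todos,
    )
    conn.commit()


# In-memory database; task_name is the natural key, so no generated id is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todos (task_name TEXT PRIMARY KEY, text TEXT)")

# First batch inserts two tasks; second batch updates one and inserts another.
upsert_todos(conn, [("groceries", "buy milk"), ("laundry", "wash towels")])
upsert_todos(conn, [("groceries", "buy milk and eggs"), ("taxes", "file return")])

rows = conn.execute("SELECT task_name, text FROM todos ORDER BY task_name").fetchall()
```

The whole batch goes through one statement, so the caller never has to decide between INSERT and UPDATE per row.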
Let me describe my scenario here.
I have a table with multiple records that share the same name, because they are records for the same person, updated on a daily basis.
Right now, I am trying to find out the easiest way to update all the names accordingly.
Name is going to be updated (TestName -> RealName)
I want this change to be applied to all the records with the same name, in this case, "TestName"
I can do a single query, but I am trying to find if there's an automatic way to do this correctly.
I've been trying to use triggers, but in most cases I end up with an infinite loop: the trigger updates the same table it is bound to, so it invokes another update, and so on.
I don't need an exact solution; just give me some pointers on how this can be achieved, please.
The problem may be simply resolved by using the function pg_trigger_depth() in the trigger, e.g.:
create trigger before_update_on_my_table
before update on my_table
for each row
when (pg_trigger_depth() = 0) -- this prevents recursion
execute procedure before_update_on_my_table();
However, it seems that the table is poorly designed. It should not contain names. Create a table with names (say user_name) and in the old table store a reference to the new one, e.g.:
create table user_name(id serial primary key, name text);
create table my_table(id serial primary key, user_id int references user_name(id));
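A rough sketch of why the normalized layout removes the need for a trigger, using Python's sqlite3 module in place of PostgreSQL (so INTEGER PRIMARY KEY replaces serial). The recorded_on column and both helper functions are hypothetical, added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user_name(id),
        recorded_on TEXT  -- hypothetical column standing in for the daily data
    );
    INSERT INTO user_name (id, name) VALUES (1, 'TestName');
    INSERT INTO my_table (user_id, recorded_on) VALUES
        (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03');
""")


def rename_user(conn, user_id, new_name):
    """Change the name in exactly one place."""
    conn.execute("UPDATE user_name SET name = ? WHERE id = ?", (new_name, user_id))
    conn.commit()


def names_in_records(conn):
    """Return the name seen by each daily record, resolved through the join."""
    query = """
        SELECT u.name
        FROM my_table AS t
        JOIN user_name AS u ON u.id = t.user_id
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(query)]


rename_user(conn, 1, "RealName")
```

One UPDATE on user_name is enough: every daily record picks up the new name through the join, and no row of my_table is touched.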
You can use event triggers in postgresql https://www.postgresql.org/docs/9.3/sql-createeventtrigger.html
I am creating related tables in SQLite and am wondering what the most efficient way to make them relate to each other is.
CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
CREATE TABLE card_story (id INTEGER PRIMARY KEY, name_id INTEGER, story TEXT);
I have already entered some data for the first table, and I was wondering how to add data to the second table without having to look up the INTEGER PRIMARY KEY every time (perhaps by using the card's name?).
26|Armorsmith|Rare
27|Auchenai Soulpriest|Rare
28|Avenging Wrath|Epic
29|Bane of Doom|Epic
For instance, I would like to enter the story of Armorsmith as "She accepts guild funds for repairs!" into story TEXT by using her name(Armorsmith) instead of ID(26).
Thanks
The task you are describing should be taken care of on the application level, not on database level.
You can create a GUI where you can select the name of a card, but the underlying value sent back to the database is the card's id and that gets stored in the story table establishing the relationship between the card and the story.
I would like to enter the story of Armorsmith as "She accepts guild funds for repairs!" into story TEXT by using her name(Armorsmith) instead of ID(26).
You can insert into one table from another table. Instead of hard coding the values, you can get them from a select. So long as the columns returned by the select match the columns needed by the insert, it'll work.
insert into card_story
(name_id, story)
select id, :story
from cards_name
where name = :name
The insert needs an ID and a story. The select returns ids and we've added our own text field for the story.
This statement would be executed with two parameters, one containing the text of the story, and one containing the name of the card. So you might write something like this (the exact details depend on your programming language and SQL interface library).
sql.execute(
name: "Armorsmith",
story: "She accepts guild funds for repairs!"
)
This is equivalent to:
insert into card_story
(name_id, story)
select id, 'She accepts guild funds for repairs!'
from cards_name
where name = 'Armorsmith'
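For a concrete version of the parameterised statement, here is a small sketch using Python's sqlite3 module with the table definitions from the question. The add_story helper is a hypothetical name; the :name and :story placeholders are bound from a dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
    CREATE TABLE card_story (id INTEGER PRIMARY KEY, name_id INTEGER, story TEXT);
    INSERT INTO cards_name (id, name, rarity) VALUES
        (26, 'Armorsmith', 'Rare'),
        (27, 'Auchenai Soulpriest', 'Rare');
""")


def add_story(conn, name, story):
    """Insert a story for the card with the given name; return how many rows were added."""
    cur = conn.execute(
        """
        INSERT INTO card_story (name_id, story)
        SELECT id, :story
        FROM cards_name
        WHERE name = :name
        """,
        {"name": name, "story": story},
    )
    conn.commit()
    return cur.rowcount


inserted = add_story(conn, "Armorsmith", "She accepts guild funds for repairs!")
```

If no card has that name, the select returns no rows and nothing is inserted, so checking the returned count is a cheap way to catch a misspelled name.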
Note that you'll want to make a few changes to your schema...
Declare name unique else you might get multiple cards for one name.
Like name TEXT UNIQUE.
Since you're looking up cards by name, you probably want to prevent there being multiple cards with the same name. That's just complexity you don't need to deal with.
Declare your foreign keys.
Like name_id INTEGER REFERENCES cards_name(id).
This has multiple benefits. One is that the relationship is recorded in the schema itself. Be aware that SQLite does not automatically index the referencing column, so create an index on name_id yourself if you want looking up stories by name_id to be fast.
The other is that it enforces "referential integrity", which is a fancy way of saying it makes sure that every story has a card to go with it. If you try to delete a row from cards_name, it will balk unless the matching card_story rows are deleted first. You can also use things like on delete cascade to do the cleanup for you.
However, SQLite does not have foreign keys on by default. You have to turn them on. It's a very good idea to do so.
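A short sketch of the schema changes above, run through Python's sqlite3 module. The PRAGMA foreign_keys statement is SQLite's switch for enforcement and has to be issued on every new connection; the try_delete_card helper is a hypothetical name used for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and is a per-connection setting.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE cards_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE, rarity TEXT);
    CREATE TABLE card_story (
        id INTEGER PRIMARY KEY,
        name_id INTEGER REFERENCES cards_name(id),
        story TEXT
    );
    INSERT INTO cards_name (id, name, rarity) VALUES (26, 'Armorsmith', 'Rare');
    INSERT INTO card_story (name_id, story)
        VALUES (26, 'She accepts guild funds for repairs!');
""")


def try_delete_card(conn, card_id):
    """Attempt to delete a card; return False if a story still references it."""
    try:
        conn.execute("DELETE FROM cards_name WHERE id = ?", (card_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


deleted = try_delete_card(conn, 26)
```

With the pragma switched on, the delete is refused while the story exists. Without it, the same delete would succeed and leave an orphaned story behind.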
I am writing a program that recovers structured data as individual records from a (damaged) file and collects the results into a sqlite database.
The program is invoked several times with slightly different recovery parameters. That leads to recovering often the same, but sometimes different data from the file.
Now, every time I run my program with different parameters, it's supposed to add just the newly (different) found items to the same database.
That means that I need a fast way to tell if each recovered record is already present in the DB or not, in order to add them only if they're not existing in the DB yet.
I understand that for each record I want to add, I could first do a SELECT for all columns to see if there is already a matching record in the DB, and only add the new one if no same is found.
But since I'm adding 10000s of records, doing a SELECT for each of these records feels pretty inefficient (slow) to me.
I wonder if there's a smarter way to handle this? I.e, is there a way I can tell sqlite that I do not want duplicate entries, and so it automatically detects and rejects them? I know about the UNIQUE modifier, but that's not it because it applies to single columns only, doesn't it? I'd need to be able to say that the combination of COL1+COL2+COL3 must be unique. Is there a way to do that?
Note: I never want to update any existing records. I only want to collect a set of different records.
Bonus part - performance
In a classic programming language, I'd use a key-value dictionary where the key is the sum of all a record's values. Similarly, I could calculate a Hash code for each added record and look that hash code up first. If there's no match, then the record is surely not in the DB yet; If there is a match I'd still have to search the DB for any duplicates. That'd surely be faster already, but I still wonder if sqlite can make this more efficient.
Try:
sqlite> create table foo (
...> a int,
...> b int,
...> unique(a, b)
...> );
sqlite>
sqlite> insert into foo values(1, 2);
sqlite> insert into foo values(2, 1);
sqlite> insert into foo values(1, 2);
Error: columns a, b are not unique
sqlite>
You could use a UNIQUE column constraint, or, to declare a unique constraint over multiple columns, use a table-level UNIQUE (...) ON CONFLICT clause:
CREATE TABLE name ( id int, col_name1 type, col_name2 type, UNIQUE (col_name1, col_name2) ON CONFLICT IGNORE )
SQLite has two ways of expressing uniqueness constraints: PRIMARY KEY and UNIQUE. Both of them create an index and so the lookup happens through the created index.
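Putting the multi-column constraint together with ON CONFLICT IGNORE, a minimal sketch in Python's sqlite3 module might look like this. The records table, its column names, and the add_records helper are placeholders rather than anything from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        col1 TEXT,
        col2 TEXT,
        col3 TEXT,
        UNIQUE (col1, col2, col3) ON CONFLICT IGNORE
    )
""")


def add_records(conn, records):
    """Insert records, silently skipping duplicates; return how many were actually added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT INTO records (col1, col2, col3) VALUES (?, ?, ?)", records
    )
    conn.commit()
    return conn.total_changes - before


# Second run repeats one record and brings one new one.
first_run = add_records(conn, [("a", "b", "c"), ("a", "b", "d")])
second_run = add_records(conn, [("a", "b", "c"), ("x", "y", "z")])
```

The duplicate check happens through the index that backs the constraint, so no separate SELECT per record is needed. One caveat: SQLite treats NULL values as distinct from each other in a UNIQUE constraint, so rows containing NULLs are never considered duplicates.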
If you do not want to use an SQL approach (as mentioned in other answers), you can do a single select for all your data when the program starts, store the data in a dictionary, and work with the dictionary to decide which records to insert into your DB.
The benefit of this approach is the single select is much faster than many small selects.
The disadvantage is that it won't work well if you don't have enough memory to store your data in.
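The in-memory approach could be sketched as follows in Python, here with a set rather than a dictionary since only membership matters. The records table and the insert_new_records helper are hypothetical names, and records are assumed to be tuples of three values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.execute("INSERT INTO records VALUES ('a', 'b', 'c')")
conn.commit()


def insert_new_records(conn, recovered):
    """Insert only records not already stored; return how many were added."""
    # One SELECT loads every existing row into memory as a tuple.
    seen = set(conn.execute("SELECT col1, col2, col3 FROM records"))
    fresh = []
    for record in recovered:
        if record not in seen:
            seen.add(record)  # also filters duplicates within this batch
            fresh.append(record)
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", fresh)
    conn.commit()
    return len(fresh)


added = insert_new_records(
    conn, [("a", "b", "c"), ("x", "y", "z"), ("x", "y", "z")]
)
```

Set lookups are constant time on average, which is where the speed-up over per-record SELECTs comes from.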
Let's say I have a table defined as follows:
CREATE TABLE SomeTable
(
P_Id int PRIMARY KEY IDENTITY,
CompoundKey varchar(255) NOT NULL,
)
CompoundKey is a string with the primary key P_Id concatenated to the end, like Foo00000001, which comes from "Foo" + 00000001. At the moment, insertions into this table happen in 2 steps:
1) Insert a dummy record with a placeholder string for CompoundKey.
2) Update the CompoundKey column with the generated compound key.
I'm looking for a way to avoid the 2nd update entirely and do it all with one insert statement. Is this possible? I'm using MS SQL Server 2005.
p.s. I agree that this is not the most sensible schema in the world, and this schema will be refactored (and properly normalized) but I'm unable to make changes to the schema for now.
You could use a computed column; change the schema to read:
CREATE TABLE SomeTable
(
P_Id int PRIMARY KEY IDENTITY,
CompoundKeyPrefix varchar(255) NOT NULL,
CompoundKey AS CompoundKeyPrefix + CAST(P_Id AS VARCHAR(10))
)
This way, SQL Server will automagically give you your compound key in a new column, and will automatically maintain it for you. You may also want to look into the PERSISTED keyword for computed columns, which will cause SQL Server to materialise the value in the data files rather than having to compute it on the fly. You can also add an index against the column should you so wish.
A trigger would easily accomplish this
This is simply not possible.
The "next ID" doesn't exist and thus cannot be read to fulfill the UPDATE until the row is inserted.
Now, if you were sourcing your autonumbers from somewhere else you could, but I don't think that's a good answer to your question.
Even if you want to use triggers, an UPDATE is still executed even if you don't manually execute it.
You can obscure the population of the CompoundKey, but at the end of the day it's still going to be an UPDATE
I think your safest bet is just to make sure the UPDATE is in the same transaction as the INSERT or use a trigger. But, for the academic argument of it, an UPDATE still occurs.
Two things:
1) If you end up using two statements (the insert followed by the update), you must use a transaction! Otherwise other processes may see the database in an inconsistent state (i.e. seeing a record without its CompoundKey).
2) I would refrain from trying to paste the Id to the end of CompoundKey in transaction, trigger etc. It is much cleaner to do it at the output if you need it, e.g. in queries (select concat(CompoundKey, Id) as CompoundKeyId ...). If you need it as a foreign key in other tables, just use the pair (CompoundKey, Id).
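To illustrate building the key at query time instead of storing it, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server, so the syntax differs: || concatenates strings and SQLite's printf() does the zero-padding. The compound_keys helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SomeTable (
        P_Id INTEGER PRIMARY KEY AUTOINCREMENT,
        CompoundKey TEXT NOT NULL  -- holds only the prefix, e.g. 'Foo'
    )
""")
conn.execute("INSERT INTO SomeTable (CompoundKey) VALUES ('Foo')")
conn.execute("INSERT INTO SomeTable (CompoundKey) VALUES ('Bar')")
conn.commit()


def compound_keys(conn):
    """Build prefix + zero-padded id when reading, so no UPDATE is ever needed."""
    query = """
        SELECT CompoundKey || printf('%08d', P_Id)
        FROM SomeTable
        ORDER BY P_Id
    """
    return [row[0] for row in conn.execute(query)]


keys = compound_keys(conn)
```

Each row is written once with a single insert, and the combined value only exists in query results.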
Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: it guarantees that either both inserts complete successfully or neither does. This is optional, but it is recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the insert_id property of your mysqli object ($mysqli->insert_id).
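The same pattern exists in most database drivers. As a rough sketch, Python's sqlite3 module (used here in place of MySQL, which is an assumption for the sake of a runnable example) exposes the generated key as cursor.lastrowid. The add_file_with_grade helper is a hypothetical name, and grades is simplified to the two columns used in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (file_id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT);
    CREATE TABLE grades (file_id INTEGER REFERENCES files(file_id), grade TEXT);
""")


def add_file_with_grade(conn, url, grade):
    """Insert a file, then use its generated id for the related grade row."""
    with conn:  # commits both inserts together, or rolls both back on error
        cur = conn.execute("INSERT INTO files (url) VALUES (?)", (url,))
        file_id = cur.lastrowid  # id generated by the insert on this cursor
        conn.execute(
            "INSERT INTO grades (file_id, grade) VALUES (?, ?)", (file_id, grade)
        )
    return file_id


first_id = add_file_with_grade(conn, "text.doc", "some-grade")
second_id = add_file_with_grade(conn, "notes.doc", "another-grade")
```

Because the id is read from the cursor that ran the insert, there is no need for max(file_id) and no risk of picking up a row inserted by someone else.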
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let surrogate keys escape from the database.