SQL migration - move data to another table and get the primary key

Problem
Suppose a table like this:
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,
    col1 REAL,
    col2 REAL
);
Then the requirements of the system change: parent needs to hold col1 and col2 for two different points in time. One possible design would be to move col1 and col2 into a separate table:
CREATE TABLE child (
    child_id INTEGER PRIMARY KEY,
    col1 REAL,
    col2 REAL
);
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,
    current_child INTEGER,
    previous_child INTEGER,
    FOREIGN KEY (current_child) REFERENCES child (child_id),
    FOREIGN KEY (previous_child) REFERENCES child (child_id)
);
Questions
Is there a way to create a migration SQL script to move the data from the original parent table to child and set the correct foreign key at current_child?
previous_child could stay empty initially; this is not a problem.
I'm not concerned yet about removing the columns col1 and col2 from parent.
I'd like to avoid auxiliary scripts in other languages (like Python), to keep the migration process simple.
Is there a better design alternative?
I could have simply added two new columns previous_col1 and previous_col2, but I have many more columns in the real scenario.
P.S.: It is probably relevant to know that I'm using SQLite.
Update
@Felix.leg asked an excellent question, and I noticed that I was stuck on the idea of always using automatically generated PK values for both tables. The truth is that child_id could be derived from parent_id:
current_child = 2 * parent_id - 1
previous_child = 2 * parent_id
With this approach, I believe it would be possible to write a single SQL migration script to create the new table, move the data, and set the right foreign key values.
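Here is a sketch of what such a script could look like (my assumption of one way to do it in SQLite; note that SQLite only allows ALTER TABLE ... ADD COLUMN with a REFERENCES clause when the new column defaults to NULL, which is the case here):

BEGIN TRANSACTION;

CREATE TABLE child (
    child_id INTEGER PRIMARY KEY,
    col1 REAL,
    col2 REAL
);

-- Copy the current values into child, deriving child_id from parent_id.
INSERT INTO child (child_id, col1, col2)
SELECT 2 * parent_id - 1, col1, col2 FROM parent;

-- Add the new foreign key columns and point them at the derived ids.
ALTER TABLE parent ADD COLUMN current_child INTEGER REFERENCES child (child_id);
ALTER TABLE parent ADD COLUMN previous_child INTEGER REFERENCES child (child_id);

UPDATE parent SET current_child = 2 * parent_id - 1;

COMMIT;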

Related

sqlite text as primary key vs autoincrement integers

I'm currently debating between two strategies for using a text column as a key.
The first one is to simply use the text column itself as a key, as such:
create table a (
    key_a text primary key
);
create table b (
    key_b text primary key
);
create table c (
    key_a text,
    key_b text,
    foreign key ("key_a") references a ("key_a"),
    foreign key ("key_b") references b ("key_b")
);
I'm concerned that this would result in every key being duplicated: once in a and b, and again in c, since text isn't stored inline.
My second approach is to use an autoincrement id on the first two tables as a primary key, and use those ids on table c to refer to them, as such:
create table a (
    id_a integer,
    key_a text unique,
    primary key ("id_a" autoincrement)
);
create table b (
    id_b integer,
    key_b text unique,
    primary key ("id_b" autoincrement)
);
create table c (
    id_a integer,
    id_b integer,
    foreign key ("id_a") references a ("id_a"),
    foreign key ("id_b") references b ("id_b")
);
Am I right to be concerned about text duplication in the first case? Or does sqlite somehow intern these and just use an id for both, akin to what the second strategy does?
SQLite does not automatically compress text. So the answer to your question is "no".
Should you use text or an auto-incrementing id as the primary key? This can be a complex question. But happily, the answer is that it doesn't make much difference. That said, there are some considerations:
Integers are of fixed length. In general, fixed-length keys are slightly more efficient in B-tree indexes than variable-length keys.
If the strings are short (like 1 or 2 or 3 characters), then they may be shorter -- or no longer -- than integers.
If you change the string (say, if it is originally misspelled), then using an "artificial" primary key makes this easy: just change the value in one table. Using the string itself as a key can result in lots of updates to lots of tables.
Am I right to be concerned about text duplication in the first case? Or does sqlite somehow intern these and just use an id for both, akin to what the second strategy does?
Yes, you are right to be concerned. The text will be duplicated.
Also, even if you did not define an integer primary key in your 1st approach, there is one.
From Rowid Tables:
The PRIMARY KEY of a rowid table (if there is one) is usually not the true primary key for the table, in the sense that it is not the unique key used by the underlying B-tree storage engine. The exception to this rule is when the rowid table declares an INTEGER PRIMARY KEY. In the exception, the INTEGER PRIMARY KEY becomes an alias for the rowid. The true primary key for a rowid table (the value that is used as the key to look up rows in the underlying B-tree storage engine) is the rowid.
In your 2nd approach you are not actually creating a new column in each of the tables a and b by defining an integer primary key.
What you are doing is aliasing the existing rowid column:
id_a becomes the alias of the rowid of table a
id_b becomes the alias of the rowid of table b
So, defining these integer primary keys is not more expensive in terms of space in the parent tables.
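You can see the aliasing directly (a quick demonstration of my own, using table a from your 2nd approach in the sqlite3 shell):

insert into a (key_a) values ('hello');
select rowid, id_a from a;  -- both columns return the same value (1)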
Although with your 1st approach you could avoid explicit updates in the child tables when you update a value in the parent tables (by defining the foreign keys with ON UPDATE CASCADE), your 2nd approach is what I would suggest.
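For reference, a sketch of what those cascading foreign keys would look like in the 1st approach (note that SQLite only enforces them when PRAGMA foreign_keys is ON):

PRAGMA foreign_keys = ON;

create table c (
    key_a text,
    key_b text,
    foreign key ("key_a") references a ("key_a") on update cascade,
    foreign key ("key_b") references b ("key_b") on update cascade
);

-- Now e.g. update a set key_a = 'new' where key_a = 'old';
-- also rewrites the matching rows in c.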
An integer primary key whose value is assigned by the system, and which you never have to know or worry about, is common practice.
All you have to do is use that primary key and its corresponding foreign keys in the queries that access the parent tables whenever you want to fetch the text values from them.
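For example, a typical lookup under the 2nd approach could be (a sketch):

select a.key_a, b.key_b
from c
join a on a.id_a = c.id_a
join b on b.id_b = c.id_b;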
For performance (and as good database practice) you should stick to a numeric/integer value for the primary key.
As for the second approach, I'm not getting the concept you are after. Could you elaborate more on this?

Simple database table design structure

I have a situation in my database design. It is a simple issue, but I need working suggestions.
My database tables:
TableAees.
TableBees.
1. Aees can map to/contain one or more records of table Bees, or can exist without any Bees.
2. Aees can also map to one or more records of table Aees itself.
Here a normal primary key/foreign key relationship won't serve the purpose, and I'm also worried that a parent/child hierarchy may end up forming a loop between the tables and produce duplicate records on various joins.
I need a better mapping for the above tables (a, b) that satisfies points 1 and 2.
So, to avoid such a situation, what table relationship/hierarchy would be a better approach?
Database used: SQL Server
You seem to be describing a many-to-many relationship. If so, you would create a third table to store that relationship, like so:
create table a (
    a_id int primary key,
    ...
);
create table b (
    b_id int primary key,
    ...
);
create table ab (
    a_id int references a(a_id),
    b_id int references b(b_id),
    primary key (a_id, b_id)
);
Each a/b tuple is stored on a separate row in bridge table ab.
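For your second point (Aees mapped to one or more other Aees), the same pattern works with a bridge table from a to itself. A sketch, with column names of my choosing; the check constraint is one simple way to keep a row from mapping to itself:

create table aa (
    parent_a_id int references a(a_id),
    child_a_id int references a(a_id),
    primary key (parent_a_id, child_a_id),
    check (parent_a_id <> child_a_id)  -- a row may not map to itself
);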

Sql combine value of two columns as primary key

I have a SQL Server table into which I insert account-wise data. The same account number should not be repeated on the same day but can be repeated if the date changes.
The customer retrieves the data based on the date and account number.
In short, date + account number is unique and must not be duplicated.
As both are different fields, should I concatenate them and create a third field as the primary key, or is there an option of having a primary key on the combined value?
Please advise on the optimum way.
You can create a composite primary key. When you create the table, you can do this sort of thing in SQL Server:
CREATE TABLE TableName (
    Field1 varchar(20),
    Field2 INT,
    PRIMARY KEY (Field1, Field2)
);
Take a look at this question which helps with each flavour of SQL
How can I define a composite primary key in SQL?
Please have a look; this should clear up most of the doubts.
We can declare 2 or more columns combined as a primary key.
In that case the combination of columns is called a composite key.
And mind you, the columns of a composite key can never be null!
Now, first let me show you how to make 2 or more columns a primary key:
create table table_name ( col1 type, col2 type, primary key(col1, col2));
The benefits are:
If col1 has value (X) and col2 has value (Y) in one row, then no other row can have col1 as (X) and col2 as (Y) together.
col1 and col2 must always have values; they can't be null.
Hope this helps!
Not at all. Just use a primary key constraint:
alter table t add constraint pk_accountnumber_date primary key (accountnumber, date)
You can also include this in the create table statement.
I might suggest, however, that you use an auto-incrementing/identity/serial primary key -- a unique number for each row. Then declare the account number/date combination as a unique key. I prefer such synthetic primary keys for several reasons:
They make it easy to refer to a row in foreign key relationships.
They show the insert order into the table, so you can readily see the last inserted rows.
They make it simple to identify a single row for updates and deletes.
They hide the "id" information of the row from referring tables and applications.
The alternative is to have a PK which is an autoincrementing number and then put a unique index on the natural key. In this way uniqueness is preserved and you get the fastest possible joins to any child tables. If the table will not ever have child tables, the composite PK is a good idea. If there will be many child tables, this could be the better choice.
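A sketch of that alternative in SQL Server, with hypothetical table and column names:

CREATE TABLE AccountEntry (
    AccountEntryId INT IDENTITY(1,1) PRIMARY KEY,  -- synthetic key, cheap to join on
    AccountNumber VARCHAR(20) NOT NULL,
    EntryDate DATE NOT NULL,
    CONSTRAINT UQ_AccountEntry_AccountNumber_EntryDate
        UNIQUE (AccountNumber, EntryDate)          -- the natural key stays unique
);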

RDBMS primary key design for row versioning

I want to design the primary key for my table with row versioning. My table contains 2 main fields, ID and Timestamp, and a bunch of other fields. For a unique ID, I want to store previous versions of a record. Hence I am making the primary key of the table the combination of the ID and Timestamp fields.
To see all the versions of a particular ID, I can run:
Select * from table_name where ID=<ID_value>
To return the most recent version of an ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp DESC
and take the first row.
My question here is: will this query be efficient, running in O(1) rather than scanning the entire table to find all entries matching the same ID, given that the ID field is part of the primary key? Ideally, to get a result in O(1), I should have provided the entire primary key. If it does need to do an entire table scan, then how else can I design my primary key so that this request is done in O(1)?
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well as change_user, valid_from and valid_until columns with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
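A sketch of that layout, with hypothetical table and column names (the 'infinity' default is PostgreSQL-specific):

create table item (
    item_id int primary key
);

create table item_version (
    item_id int references item (item_id),
    change_user text not null default current_user,
    valid_from timestamp not null default now(),
    valid_until timestamp not null default 'infinity',
    price numeric,  -- stand-in for the actual versioned attributes
    primary key (item_id, valid_from)
);

-- The "current values view" exposed to developers.
create view item_current as
select item_id, price
from item_version
where now() >= valid_from and now() < valid_until;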
It's far easier and better to use the History Table pattern for this.
create table foo (
    foo_id int primary key,
    name text
);
create table foo_history (
    foo_id int,
    version int,
    name text,
    operation char(1) check ( operation in ('u','d') ),
    modified_at timestamp,
    modified_by text,
    primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
See https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with Postgres.
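A minimal sketch of such a trigger in PostgreSQL (the linked wiki page is far more thorough; the version-numbering subquery here is a simplistic assumption of mine):

create or replace function foo_audit() returns trigger as $$
begin
    insert into foo_history (foo_id, version, name, operation, modified_at, modified_by)
    select old.foo_id,
           coalesce(max(version), 0) + 1,                 -- next version for this foo_id
           old.name,
           case tg_op when 'UPDATE' then 'u' else 'd' end,
           now(),
           current_user
    from foo_history
    where foo_id = old.foo_id;
    return null;  -- return value is ignored for AFTER triggers
end;
$$ language plpgsql;

create trigger foo_audit_trg
after update or delete on foo
for each row execute procedure foo_audit();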

Storing arbitrary attributes on tables

I have 3 tables, x, y, and z. I want to be able to attach arbitrary attributes to each row in each table. x, y, and z have nothing in common other than the fact that they all have an integer primary key called id and should be able to have arbitrary attributes attached to them.
Is it better to make a single attributes table, like
create table attributes (
    `table` enum('x', 'y', 'z'),  -- back-ticked because TABLE is a reserved word
    xyz_id integer,
    name varchar(50),
    value text,
    primary key (`table`, xyz_id, name)
);
Or is it best to make separate tables, like
create table x_attributes (
    x_id integer,
    name varchar(50),
    value text,
    primary key (x_id, name),
    foreign key (x_id) references x (id)
);
create table y_attributes (...);
create table z_attributes (...);
The second option (separate tables) seems to be cleaner, but requires a lot
more boilerplate on both the database side and the application side.
I'm also open to suggestions other than those two.
Note: I've considered the possibility of using a document store like MongoDB, but
the data I'm working with is fundamentally relational.
Go with one table with an enum column; it will make grabbing all of the attributes for each row easier in the long run.
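For example, fetching everything attached to one row then becomes a single query (the id 42 is made up):

select name, value
from attributes
where `table` = 'x' and xyz_id = 42;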