Generate and Insert surrogate keys into already existing BigQuery table - sql

I have an existing table without any unique ID. I'm planning to generate surrogate keys using GENERATE_UUID() statement however I'm sure not how to insert this new column... What is the best option here?

One way is to use CREATE OR REPLACE TABLE ... AS SELECT
CREATE OR REPLACE TABLE table_a
AS SELECT GENERATE_UUID() uuid, * FROM table_a
The drawback is:
The metadata of the table is lost (table options, column descriptions etc.)
Nullability of the columns is lost, all columns becomes NULLABLE
If both are acceptable, then approach above is the simplest way
If not, then you need to add a column through UI or API, then do
UPDATE table_a
SET uuid = GENERATE_UUID()
WHERE uuid IS NULL

Related

Create a table with a foreign key referencing to a temporary table generated by a query

I need to create a table having a field, which is a foreign key referencing to another query rather than existing table. E.g. the following statement is correct:
CREATE TABLE T1 (ID1 varchar(255) references Types)
but this one throws a syntax error:
CREATE TABLE T2 (ID2 varchar(255) references SELECT ID FROM BaseTypes UNION SELECT ID FROM Types)
I cannot figure out how I can achieve my goal. In the case it’s needed to introduce a temporary table, how can I force this table being updated each time when tables BaseTypes and Types are changed?
I am using Firebird DB and IBExpert management tool.
A foreign key constraint (references) can only reference a table (or more specifically columns in the primary or unique key of a table). You can't use it to reference a select.
If you want to do that, you need to use a CHECK constraint, but that constraint would only be checked on insert and updates: it wouldn't prevent other changes (eg to the tables in your select) from making the constraint invalid while the data is at rest. This means that at insert time the value could meet the constraint, but the constraint could - unnoticed! - become invalid. You would only notice this when updating the row.
An example of the CHECK-constraint could be:
CREATE TABLE T2 (
ID2 varchar(255) check (exists(
SELECT ID FROM BaseTypes WHERE BaseTypes.ID = ID2
UNION
SELECT ID FROM Types WHERE Types.ID = ID2))
)
For a working example, see this fiddle.
Alternatively, if your goal is to 'unite' two tables, define a 'super'-table that contains the primary keys of both tables, and reference that table from the foreign key constraint. You could populate and update (eg insert and delete) this table using triggers. Or you could use a single table, and replace the existing views with an updatable view (if this is possible depends on the exact data, eg IDs shouldn't overlap).
This is more complex, but would give you the benefit that the foreign key is also enforced 'at rest'.

PostgreSQL Calculated Column with values of another table referenced by foreign key

I'm currently working on a simple dummy project to refresh my knowledge on SQL and to learn a few new things :)
I have a table Article with the columns:
aID, price
I have another table Storage:
sID, aID, count
The Storage table references the aID as a foreign key and the count column say how much of an article is stored.
Now I want to add a column value to my Storage table. This column should be calculated by Article.price * Storage.count.
I found after searching the web that you can have calculated columns like this
CREATE TABLE tbl
(
int1 INT,
int2 INT,
product BIGINT GENERATED ALWAYS AS (int1 * int2) STORED
);
But I haven't found an example how to this with columns from another table.
What do I have to do in order to use the price from the referenced aID in the calculation?
You cannot define a generated column based on values from other tables. Per the documentation:
The generation expression can refer to other columns in the table, but not other generated columns. Any functions and operators used must be immutable. References to other tables are not allowed.
You can achieve the expected behavior by creating two triggers on both tables but usually creating a view based on the tables is a simpler and more efficient solution.

RDBMS primary key design for row versioning

I want to design primary key for my table with row versioning. My table contains 2 main fields : ID and Timestamp, and bunch of other fields. For a unique "ID" , I want to store previous versions of a record. Hence I am creating primary key for the table to be combination of ID and timestamp fields.
Hence to see all the versions of a particular ID, I can give,
Select * from table_name where ID=<ID_value>
To return the most recent version of a ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is, will this query be efficient and run in O(1) instead of scanning the entire table to get all entries matching same ID considering ID field was a part of primary key fields? Ideally to get a result in O(1), I should have provided the entire primary key. If it does need to do entire table scan, then how else can I design my primary key so that I get this request done in O(1)?
Thanks,
Sriram
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well change_user, valid_from and valid_until colums with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres

Is there a smart way to append a number to an PK identity column in a Relational database w/o total catastrophe?

It's far from the ideal situation, but I need to fix a database by appending the number "1" to the PK Identiy column which has FK relations to four other tables. I'm basically making a four digit number a five digit number. I need to maintain the relations. I could store the number in a var, do a Set query and append the 1, and do that for each table...
Is there a better way of doing this?
You say you are using an identity data type for your primary key so before you update the numbers you will have to SET IDENTITY_INSERT ON (documentation here) and then turn it off again after the update.
As long as you have cascading updates set for your relations the other tables should be updated automatically.
EDIT: As it's not possible to change an identity value I guess you have to export the data, set the new identity values (+10000) and then import your data again.
Anyone have a better suggestion...
Consider adding another field to the PK instead of extending the length of the PK field. Your new field will have to cascade to the related tables, like a field length increase would, but you get to retain your original PK values.
My suggestion is:
Stop writing to the tables.
Copy the tables to new tables with the new PK.
Rename the old tables to backup names.
Rename the new tables to the original table name.
Count the rows in all the tables and double check your work.
Continue using the tables.
Changing a PK after the fact is not fun.
If the column in question has an identity property on it, it gets complicated. This is more-or-less how I'd do it:
Back up your database.
Put it in single user mode. You don't need anybody mucking around whilst you do the surgery.
Execute the ALTER TABLE statements necessary to
disable the primary key constraint on the table in question
disable all triggers on the table in question
disable all foreign key constraints referencing the table in question.
Clone your table, giving it a new name and a column-for-column identical definitions. Don't bother with any triggers, indices, foreign keys or other constraints. Omit the identity property from the table's definition.
Create a new 'map' table that will map your old id values to the new value:
create table dbo.pk_map
(
old_id int not null primary key clustered ,
new_id int not null unique nonclustered ,
)
Populate the map table:
insert dbo.pk_map
select old_id = old.id ,
new_id = f( old.id ) // f(x) is the desired transform
from dbo.tableInQuestion old
Populate your new table, giving the primary key column the new value:
insert dbo.tableInQuestion_NEW
select id = map.id ,
...
from dbo.tableInQuestion old
join dbo.pk_map map on map.old_id = old.id
Truncate the original table: TRUNCATE dbo.tableInQuestion. This should work—safely—since you've disabled all the triggers and foreign key constraints.
Execute SET IDENTITY_INSERT dbo.tableInQuestion ON.
Reload the original table:
insert dbo.tableInQuestion
select *
from dbo.tableInQuestion_NEW
Execute SET IDENTITY_INSERT dbo.tableInQuestion OFF
Execute drop table dbo.tableInQuestion_NEW. We're all done with it.
Execute DBCC CHECKIDENT( dbo.tableInQuestion , reseed ) to get the identity counter back in sync with the data in the table.
Now, use the map table to propagate the changed primary key column down the line. Depending on your E-R model, this can get complicated as foreign keys referencing the updated column may themselves be part of a composite primary key.
When you're all done, start re-enabling the constraints and triggers you disabled. Make sure you do this using the WITH CHECK option. Fix any problems thus uncovered.
Finally, drop the map table, and clear the single user flag and bring your system(s) back online.
Piece of cake! (or something.)
Consider this approach:
Reset the identity seed to the 10000 + the current seed.
Set identity insert on
Insert into the table from the values in the table and add 10000 to the identity column on the way.
EX:
Set identity insert on
Insert Table(identity, column1, eolumn2)
select identity + 10000, column1, column2
From Table
Where identity < origional max identity value
After the insert you know the identity is exactly 10000 more than the origional.
Update the foreign keys by addding 10000.

How to insert duplicate rows in SQLite with a unique ID?

This seems simple enough: I want to duplicate a row in a SQLite table:
INSERT INTO table SELECT * FROM table WHERE rowId=5;
If there were no explicit unique column declarations, the statement would work, but the table's first column is declared rowID INTEGER NOT NULL PRIMARY KEY. Is there any way to create a simple statement like the one above that works without knowing the schema of the table (aside from the first column)?
This can be done using * syntax without having to know the schema of the table (other than the name of the primary key). The trick is to create a temporary table using the "CREATE TABLE AS" syntax.
In this example I assume that there is an existing, populated, table called "src" with an INTEGER PRIMARY KEY called "id", as well as several other columns. To duplicate the rows of "src", use the following SQL in SQLite3:
CREATE TEMPORARY TABLE tmp AS SELECT * FROM src;
UPDATE tmp SET id = NULL;
INSERT INTO src SELECT * FROM tmp;
DROP TABLE tmp;
The above example duplicates all rows of the table "src". To only duplicate a desired row, simply add a WHERE clause to the first line. This example works because the table "tmp" has no primary key constraint, but "src" does. Inserting NULL primary keys into src causes them to be given auto-generated values.
From the sqlite documentation: http://www.sqlite.org/lang_createtable.html
A "CREATE TABLE ... AS SELECT" statement creates and populates a database table based on the results of a SELECT statement. A table created using CREATE TABLE AS has no PRIMARY KEY and no constraints of any kind.
If you want to get really fancy, you can add a trigger that updates a third table which maps old primary keys to newly generated primary keys.
No. You need to know the schema of the table to write the insert statement properly.
You need to be able to write the statement in the form of:
insert into Table (column1, column2, column3)
select column1, column2, column3
from OtherTable
where rowId = 5
Well, since I was unable to do this the way I wanted, I resorted to using the implicit row id, which handily enough has the same name as the rowId column I defined explicitly, so now I can use the query I had in the question, and it will insert all the data with a new rowId. To keep the rest of the program working, I just changed SELECT * FROM table to SELECT rowId,* FROM table and everything's fine.
Absolutely no way to do this. Primary Key declaration implies this field is unique. You can't have a non unique PK. There is no way to create a row with existing PK in the same table.