RDBMS: Generating a locally Unique Key for a Relational Table with Foreign Key - sql

Below is the schema of the database.
There are multiple screenplays, each identified by a globally unique key.
Multiple scenes can exist for a screenplay and are linked to it by a foreign key.
My obvious choice for scene_id in the Scene table was an auto-increment integer field, which ensures each scene in the Scene table has a globally unique key across all screenplays.
Now, the query is:
What is the best way to generate scene_id for the Scene table?
Isn't a globally unique key overkill when
scene_id only needs to be unique within a particular screenplay?
A sample table:
+----------+-----------------+---------+
| Scene_Id | Scene_Name      | Scrn_ID |
+----------+-----------------+---------+
| 1        | Opening Scene   | 1001    |
| 2        | Climax Scene    | 1001    |
| 3        | End Credits     | 1001    |
| 1        | Opening Scene 1 | 1002    |
| 2        | Character Intro | 1002    |
| 3        | Conflict        | 1002    |
| 4        | Climax Scene    | 1002    |
+----------+-----------------+---------+

Using an automatically generated primary key is actually the simplest solution:
scene_id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
There is very little overhead in this.
It would be much more complicated and expensive to use numbers that are relative to Scrn_ID – see the many questions about such a feature on this forum.
Keep it simple!
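To make this concrete, here is a minimal PostgreSQL sketch of the two tables; the table and column names (screenplay, scene, scene_name) are illustrative, following the schema described in the question:

```sql
-- Illustrative sketch (PostgreSQL syntax); names are assumptions.
CREATE TABLE screenplay (
    scrn_id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    title   text NOT NULL
);

CREATE TABLE scene (
    scene_id   bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    scrn_id    bigint NOT NULL REFERENCES screenplay (scrn_id),
    scene_name text NOT NULL
);

-- If a per-screenplay number is still wanted for display, derive it at
-- query time rather than storing and maintaining it:
SELECT scene_id,
       scene_name,
       row_number() OVER (PARTITION BY scrn_id
                          ORDER BY scene_id) AS scene_no
FROM scene;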

Related

Select unique combination of values (attributes) based on user_id

I have a table that has a user_id and a new record for each return reason for that user, as shown here:
| user_id | return_reason |
|---------|---------------|
| 1       | broken        |
| 2       | changed mind  |
| 2       | overpriced    |
| 3       | changed mind  |
| 4       | changed mind  |
What I would like to do is generate a key for each distinct value in a new table and apply that key to each user_id in another new table, effectively creating a many-to-many relationship. The result would look like so:
Dimension Table ->
| reason_id | return_reason |
|----------- |--------------- |
| 1 | broken |
| 2 | changed mind |
| 2 | overpriced |
| 3 | changed mind |
Fact Table ->
| user_id | reason_id |
|--------- |----------- |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
My thought process is to iterate through the table with a cursor, but this seems like a standard problem, so there is probably a more efficient way of doing it. Is there a specific name for this type of problem? I also thought about pivoting and unpivoting, but that didn't seem too clean either. Any help or reference to articles on how to approach this is appreciated.
This problem concerns data normalization and relational integrity. Your concept doesn't quite work as shown – the dimension table has two different reasons with the same ID, and the fact table loses a record. The conventional schema for this many-to-many relationship would be three tables:
Users table (info about users; UserID is unique)
Reasons table (info about reasons; ReasonID is unique)
UserReasons junction table (associates users with reasons – your existing table). If a user could be associated with the same reason multiple times, you probably also need ReturnDate and OrderID_FK fields in UserReasons.
So you need to replace the reason description in the first table (UserReasons) with a ReasonID. Add a long-integer field ReasonID_FK to that table to hold the ReasonID key.
To build Reasons table based on current data, use DISTINCT:
SELECT DISTINCT return_reason INTO Reasons FROM UserReasons
In new table, rename return_reason field to ReasonDescription and add an autonumber field ReasonID.
Now run UPDATE action to populate ReasonID_FK field in UserReasons.
UPDATE UserReasons INNER JOIN Reasons ON UserReasons.return_reason = Reasons.ReasonDescription SET UserReasons.ReasonID_FK = Reasons.ReasonID
When all looks good, delete return_reason field.
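Put together, the whole migration can be sketched as follows. This is a MySQL-flavoured sketch (the answer's SELECT ... INTO is SQL Server/Access syntax; CREATE TABLE ... AS SELECT is the MySQL equivalent), and the column types are assumptions:

```sql
-- 1. Build the Reasons dimension from the distinct descriptions.
CREATE TABLE Reasons AS
SELECT DISTINCT return_reason AS ReasonDescription
FROM UserReasons;

-- 2. Add a surrogate key (autonumber syntax is product-specific).
ALTER TABLE Reasons
  ADD COLUMN ReasonID int NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- 3. Add the foreign-key column and populate it by matching descriptions.
ALTER TABLE UserReasons ADD COLUMN ReasonID_FK int;

UPDATE UserReasons
INNER JOIN Reasons
        ON UserReasons.return_reason = Reasons.ReasonDescription
SET UserReasons.ReasonID_FK = Reasons.ReasonID;

-- 4. Once verified, drop the now-redundant description column.
ALTER TABLE UserReasons DROP COLUMN return_reason;
```

No cursor is needed: the DISTINCT projection builds the dimension and a single set-based UPDATE back-fills the keys.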

Relationship model of an inventory control

I'm trying to create an ER diagram for an inventory control with the following logic: I have 10 components of the same model (10 ACER 24 monitors with identical characteristics), and the only thing that differentiates these components is the serial number. So I came up with the following logic:
But I don't know if these relationships are correct, especially the inventory part. Would it be correct to add the serial number to the component entity and create 10 inventory records?
I'm having a hard time choosing the best path for the logic I described above.
In the relational model every relation must have a primary key, while in your example stock, component_has_component_category, and user_has_components do not.
To fix this, make both fields together the primary key of each of those relations.
The components relation, on the other hand, has a redundant id field, which could be replaced with the composite primary key component_id + serial_number. Then, if a user has a component (referenced by foreign key), the relation user_has_components carries both fields component_id + serial_number, which is logical, since the user has a specific component instance.
A hypothetical relation components_inventory would be similar: it has two fields, component_id and serial_number, which together form both the primary key and a foreign key to components, denoting that a specific component with a serial number is present. (In some sense components_inventory is a subset of components.)
EDIT: My view of how it would look data-wise:
component
| component_id | .. other stuff |
| 1 | .. |
| 2 | .. |
components (all existing components and serial numbers)
| component_id | serial_number |
| 1 | 101 |
| 1 | 102 |
| 1 | 103 |
| 1 | 104 |
| 2 | 201 |
| 2 | 202 |
user_has_component (refers to components)
| user_id | component_id | serial_number |
| mary | 1 | 101 |
| john | 1 | 104 |
| john | 2 | 202 |
category_inventory (refers to components; components we have, but no user does)
| component_id | serial_number | location |
| 1 | 102 | warehouse New York |
| 1 | 103 | warehouse New York |
| 2 | 201 | warehouse Paris |
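A hedged DDL sketch of this design, with names following the tables above; the column types and the varchar user key are assumptions:

```sql
CREATE TABLE component (
    component_id int PRIMARY KEY
    -- .. other stuff (model, description, ...)
);

-- Each physical instance is identified by (component_id, serial_number).
CREATE TABLE components (
    component_id  int NOT NULL REFERENCES component (component_id),
    serial_number int NOT NULL,
    PRIMARY KEY (component_id, serial_number)
);

CREATE TABLE user_has_component (
    user_id       varchar(50) NOT NULL,
    component_id  int NOT NULL,
    serial_number int NOT NULL,
    PRIMARY KEY (user_id, component_id, serial_number),
    FOREIGN KEY (component_id, serial_number)
        REFERENCES components (component_id, serial_number)
);

CREATE TABLE category_inventory (
    component_id  int NOT NULL,
    serial_number int NOT NULL,
    location      varchar(100),
    PRIMARY KEY (component_id, serial_number),
    FOREIGN KEY (component_id, serial_number)
        REFERENCES components (component_id, serial_number)
);
```

The composite foreign key is what guarantees that users and inventory always refer to a specific physical instance, not just a model.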

Which normal form or other formal rule does this database design choice violate?

The project I'm working on is an application that lets you design data entry forms, and automagically generates a schema in an underlying PostgreSQL database to persist them, as well as the browsing and editing UI.
The use case I've encountered this with is a store back-office database, but the app itself intends to be somewhat universal. The administrator creates the following entry forms with the given fields:
Customers
name (text box)
Items
name (text box)
stock (number field)
Order
customer (combo box selecting a customer)
order lines (a grid showing order lines)
OrderLine
item (combo box selecting an item)
count (number field)
When all this is done, the resulting database schema will be equivalent to this:
create table Customers(id serial primary key,
                       name varchar);
create table Items(id serial primary key,
                   name varchar,
                   stock integer);
create table Orders(id serial primary key);
create table OrderLines(id serial primary key,
                        count integer);
create table Links(id serial primary key,
                   fk1 integer references Customers (id),
                   fk2 integer references Items (id),
                   fk3 integer references Orders (id),
                   fk4 integer references OrderLines (id));
Links being a special table that stores all the relationships between entities; every row has (usually) two of the foreign keys set to a value, and the rest set to NULL. Whenever a new entry form is added to the application instance, a new foreign key referencing the table for this form is added to Links.
So, suppose our shop stocks some widgets, gizmos, and thingeys. A customer named Adam orders two widgets and three gizmos, and Betty orders four gizmos and five thingeys. The database will contain the following data:
Customers
/----+-------\
| ID | NAME |
| 1 | Adam |
| 2 | Betty |
\----+-------/
Items
/----+---------+-------\
| ID | NAME | STOCK |
| 1 | widget | 123 |
| 2 | gizmo | 456 |
| 3 | thingey | 789 |
\----+---------+-------/
Orders
/----\
| ID |
| 1 |
| 2 |
\----/
OrderLines
/----+-------\
| ID | COUNT |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
\----+-------/
Links
/----+------+------+------+------\
| ID | FK1 | FK2 | FK3 | FK4 |
| 1 | 1 | NULL | 1 | NULL |
| 2 | 2 | NULL | 2 | NULL |
| 3 | NULL | NULL | 1 | 1 |
| 4 | NULL | NULL | 1 | 2 |
| 5 | NULL | NULL | 2 | 3 |
| 6 | NULL | NULL | 2 | 4 |
| 7 | NULL | 1 | NULL | 1 |
| 8 | NULL | 2 | NULL | 2 |
| 9 | NULL | 2 | NULL | 3 |
| 10 | NULL | 3 | NULL | 4 |
\----+------+------+------+------/
(The tables also contain a bunch of timestamps for auditing and soft deletion, but I don't think they're relevant here; they just make writing the SQL by the administrator that much messier. The management app is also used to implement a bunch of different use cases, but they're generally primarily data entry, master-detail views, and either scalar fields or selection boxes.)
When I've had to write a join through this thing I'd grumbled about it to my coworker, who replied "well using separate tables for each relationship is one way to do it, this is another..." Leaving aside the obvious-to-me ugliness of the above and the practical issues, I also have a nagging feeling this has to be a violation of some normal form, but it's been a while since college and I'm struggling to figure out which of the criteria apply here.
Is there something stronger than "well, that's just your opinion" I can use when critiquing this design?
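For contrast, here is a hedged sketch of what the coworker calls "separate tables for each relationship", using the same entities; each relationship in the example data is one-to-many, so it becomes an ordinary foreign-key column instead of a row in a generic Links table (column names are illustrative):

```sql
create table Customers (
    id   serial primary key,
    name varchar
);
create table Items (
    id    serial primary key,
    name  varchar,
    stock integer
);
create table Orders (
    id          serial primary key,
    -- the customer↔order link, formerly a Links row with fk1 + fk3
    customer_id integer not null references Customers (id)
);
create table OrderLines (
    id       serial primary key,
    -- the order↔line and item↔line links, formerly fk3+fk4 and fk2+fk4 rows
    order_id integer not null references Orders (id),
    item_id  integer not null references Items (id),
    count    integer
);
```

Under this shape, the constraints that Links cannot express (every order line belongs to exactly one order and one item, every order to exactly one customer) are enforced by NOT NULL and the foreign keys themselves.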

Are there problems with this 'Soft Delete' solution using EAV tables?

I've read some information about the ugly side of just setting a deleted_at field in your tables to signify a row has been deleted.
Namely
http://richarddingwall.name/2009/11/20/the-trouble-with-soft-delete/
Are there any potential problems with taking a row from a table you want to delete and pivoting it into some EAV tables?
For instance:
Let's say I have two tables, deleted and deleted_rows, described as follows.
mysql> describe deleted;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| tablename | varchar(255) | YES | | NULL | |
| deleted_at | timestamp | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
mysql> describe deleted_rows;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| entity | int(11) | YES | MUL | NULL | |
| name | varchar(255) | YES | | NULL | |
| value | blob | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
Now when you wanted to delete a row from any table you would delete it from the table then insert it into these tables as such.
deleted
+----+-----------+---------------------+
| id | tablename | deleted_at |
+----+-----------+---------------------+
| 1 | products | 2011-03-23 00:00:00 |
+----+-----------+---------------------+
deleted_rows
+----+--------+-------------+-------------------------------+
| id | entity | name | value |
+----+--------+-------------+-------------------------------+
| 1 | 1 | Title | A Great Product |
| 2 | 1 | Price | 55.00 |
| 3 | 1 | Description | You guessed it... it's great. |
+----+--------+-------------+-------------------------------+
A few things I see off the bat:
You'll need to use application logic to do the pivot (Ruby, PHP, Python, etc.)
The table could grow pretty big because I'm using blob to handle the unknown size of the row value
Do you see any other glaring problems with this type of soft delete?
Why not mirror your tables with archive tables?
create table mytable(
     col_1 int
    ,col_2 varchar(100)
    ,col_3 date
    ,primary key(col_1)
);

create table mytable_deleted(
     delete_id  int not null auto_increment
    ,delete_dtm datetime not null
    -- All of the original columns
    ,col_1 int
    ,col_2 varchar(100)
    ,col_3 date
    ,index(col_1)
    ,primary key(delete_id)
);
And then simply add on-delete triggers on your tables that insert the current row into the mirrored table before the deletion? That would give you a dead-simple and very performant solution.
You could actually generate the tables and trigger code using the data dictionary.
Note that you might not want a unique index on the original primary key (col_1) in the archive table, because you may actually end up deleting the same row twice over time if you are using natural keys. Unless you plan to hook up the archive tables in your application (for undo purposes), you can drop the index entirely. Also, I added the time of delete (delete_dtm) and a surrogate key that can be used to delete the deleted (hehe) rows.
You may also consider range partitioning the archive table on delete_dtm. This makes it pretty much effortless to purge data from the tables.
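A minimal sketch of such a trigger in MySQL syntax, using the mytable/mytable_deleted pair above (generating this per table from the data dictionary is left out):

```sql
DELIMITER //

CREATE TRIGGER mytable_before_delete
BEFORE DELETE ON mytable
FOR EACH ROW
BEGIN
    -- Archive the row that is about to be deleted;
    -- delete_id is filled in by auto_increment.
    INSERT INTO mytable_deleted (delete_dtm, col_1, col_2, col_3)
    VALUES (NOW(), OLD.col_1, OLD.col_2, OLD.col_3);
END//

DELIMITER ;
```

After this, a plain DELETE FROM mytable WHERE col_1 = ... both removes the live row and archives it, with no application logic involved.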

Flag column or foreign key?

I have ENTERPRISES and DOMAINS tables. Each enterprise should have a single primary domain, but it can have more than one domain. I have come up with this table structure:
+---------------------------------------+
| ENTERPRISES |
+----+--------------+-------------------+
| ID | Name | Primary Domain ID |
+----+--------------+-------------------+
| 1 | Enterprise A | 2 |
| 2 | Enterprise B | 4 |
+----+--------------+-------------------+
+---------------------------------------+
| DOMAINS |
+----+------------------+---------------+
| ID | Domain Name | Enterprise ID |
+----+------------------+---------------+
| 1 | ent-a.com | 1 |
| 2 | enterprise-a.com | 1 |
| 3 | ent-b.com | 2 |
| 4 | enterprise-b.com | 2 |
+----+------------------+---------------+
My co-worker suggested this alternative structure:
+-------------------+
| ENTERPRISES |
+----+--------------+
| ID | Name |
+----+--------------+
| 1 | Enterprise A |
| 2 | Enterprise B |
+----+--------------+
+----------------------------------------------------+
| DOMAINS |
+----+------------------+---------------+------------+
| ID | Domain Name | Enterprise ID | Is Primary |
+----+------------------+---------------+------------+
| 1 | ent-a.com | 1 | False |
| 2 | enterprise-a.com | 1 | True |
| 3 | ent-b.com | 2 | False |
| 4 | enterprise-b.com | 2 | True |
+----+------------------+---------------+------------+
My question is: which one is more efficient/correct?
Also, in the first example, should I use an ID or a string value for the primary domain column, so that the ENTERPRISES table does not have a circular dependency on the DOMAINS table?
Both are correct. But go for the FK.
The one you suggest has less sparse data; in the second example you may have 100 domains belonging to the same company, all with Is Primary set to False and just one domain set to True.
Also, it's easier to enforce exactly one primary domain in the first scenario, while in the second you'll have to write a trigger or a check in your code to see that there is one, and only one, primary domain at all times.
Again, stick to the FK.
Circular references are OK. Circular dependencies are not. As long as Primary Domain ID is nullable, then you're fine. Otherwise you'll have a chicken-or-the-egg scenario, being unable to create a Domain without an Enterprise, but also unable to create an Enterprise without a Primary Domain ID.
I would choose the former (your proposed solution), because you're defining a one-to-one relationship. While the Enterprise->Domain relationship is one-to-many, the Enterprise->Primary Domain relationship is one-to-one.
In the first model you say an enterprise should have a single primary domain. Take that a step further and say it will have a single primary domain; at this point you'd be inclined to mark that column as not nullable.
The problem then is that you won't be able to insert data, since you've created a circular dependency: you can't insert an enterprise without a domain, and you can't insert a domain without an enterprise.
I prefer the first model as it is cleaner and more explicit. Your model enforces that there is a single primary domain where there is nothing in the second model so you'd be forced to enforce this rule using some other mechanism.
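A hedged DDL sketch of the first model (names taken from the tables above, types assumed): keeping primary_domain_id nullable avoids the chicken-or-the-egg problem, while the foreign keys enforce both directions of the relationship:

```sql
CREATE TABLE enterprises (
    id                int PRIMARY KEY,
    name              varchar(100) NOT NULL,
    -- Nullable, so an enterprise can be created before its domains exist.
    primary_domain_id int NULL
);

CREATE TABLE domains (
    id            int PRIMARY KEY,
    domain_name   varchar(255) NOT NULL,
    enterprise_id int NOT NULL REFERENCES enterprises (id)
);

-- Added after both tables exist, because of the circular reference.
ALTER TABLE enterprises
    ADD FOREIGN KEY (primary_domain_id) REFERENCES domains (id);

-- Typical workflow: insert the enterprise, insert its domains,
-- then set the primary domain.
```

Note that the FK alone only guarantees the primary domain exists; it does not guarantee it belongs to the same enterprise, so that rule still needs a check or a composite-key constraint.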