Find primary key in star schema table

Find primary key in star schema table - sql

I have encountered a situation while developing a star schema. I have a table like this
Name
Email
amy
amy#gmail.com
jess
amy#gmail.com
I want to find the key column as foreign key for Fact table as you can see there is a duplication of records if look individually but unique if consider both column as a key column
Your help will be highly regarded

I am assuming from your question that the table is a dimension? If that is the case then it should have a surrogate key as its PK (as every dimension table should) and you use this to join to fact tables.

Related

Which column for foreign key: id or any other column and why?

TL;DR
Should a foreign key always refer to the id column of another table? Why or why not? Is there a standard rule for this?
Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?
Example: this is the schema for my sample problem:
In my schema sometimes I use id column as the foreign key and sometimes use some other data column.
In vehicle_detail table I use a unique size column as a foreign key from vehicle_size table and unique color column as the foreign key from vehicle_color table.
But in vehicle_user I used the user_identifier_id as a foreign key which refers to id primary key column in user_identifier table.
Which is the correct way?
On a side note, I don't have id columns for the garage_level, garage_spaceid, vehicle_garage_status and vehicle_parking_status tables because they only have one column which is the primary key and the data they store is just at most 15 rows in each table and it is probably never going to change. Should I still have an id column in those ?

A foreign key has to target a primary key or unique constraint. It is normal to reference the primary key, because you typically want to reference an individual row in another table, and the primary key is the identifier of a table row.
From a technical point of view, it does not matter whether a foreign key references the primary key or another unique constraint, because in PostgreSQL both are implemented in the same way, using a unique index.
As to your concrete examples, there is nothing wrong with having the unique size column of vehicle_size be the target of a foreign key, although it begs the question why you didn't make size the primary key and omit the id column altogether. There is no need for each table to have an id column that is the automatically generated numeric primary key, except that there may be ORMs and other software that expect that.

A foreign key is basically a column of a different table(it is always of a different table, since that is the role it serves). It is used to join/ get data from a different table. Think of it like say school is a database and there are many different table for different aspects of student.
say by using Admission number 1234, from accounts table you can get the fees and sports table you can get the sports he play.
Now there is no rule that foreign key should be id column, you can keep it whatever you want. But,to use foreign key you should have a matching column in both tables therefore usually id column is only used. As I stated in the above example the only common thing in say sports table and accounts table would be admission number.
admn_no | sports |
+---------+------------+
| 1234 | basketball
+---------+---------+
| admn_no | fees |
+---------+---------+
| 1234 | 1000000 |
+---------+---------+
Now say using the query\
select * from accounts join sports using (admn_no);
you will get:
+---------+---------+------------+
| admn_no | fees | sports |
+---------+---------+------------+
| 1234 | 1000000 | basketball |
+---------+---------+------------+
PS: sorry for bad formatting

A foreign key is a field or a column that is used to establish a link between two tables. A FOREIGN KEY is a column (or collection of columns) in one table, that refers to the PRIMARY KEY in another table.
There is no rule that it should refer to a id column but the column it refers to should be the primary key. In real scenarios, it usually refers to Id column as in most cases it is the primary key in the tables.

OP question is about "correct way".
I will try to provide some kind of summary from existing comments and answers, general DO and general DONT for FKs.
What was already said
A. "A foreign key has to target a primary key or unique constraint"
Literally from Laurenz Albe answer and it was noted in comments
B. "stick with whatever you think will change the least"
It was noted by Adrian Klavier in comments.
Notes
There is no such general rule that PK or unique constraint must be defined on a single column.
So the question title itself must be corrected: "Which column(s) for foreign key: id or any other column(s) and why?"
Let's talk about "why".
Why: General DO, general DONT and an advice
Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?
General DO: Analyze requirements, use logic, use math (arithmetics is enough usually). There is no a single database design that's always good for all cases. Always ask yourself: "Can it be improved?". Never be content with design of existing FKs, if requirements changed or DBMS changed or storage options changed - revise design.
General DONT: Don't think that there is a single correct rule for all cases. Don't think: "if that worked in that database/table than it will work for this case too".
Let me illustrate this points with a common example.
Example: PK on id uuid field
We look into database and see a table has a unique constraint on two fields of types integer (4 bytes) + date (4 bytes)
Additionally: this table has a field id of uuid type (16 bytes)
PK is defined on id
All FKs from other tables are targeting id field
It this a correct design or not?
Case A. Common case - not OK
Let's use math:
unique constraint on int+date: it's 4+4=8 bytes
data is never changed
so it's a good candidate for primary key in this table
and nothing prevents to use it for foreign keys in related tables
So it looks like additional 16 bytes per each row + indexes costs is a mistake.
And that's a very common mistake especially in combination of MSSQL + CLUSTERED indexes on random uuids
Is it always a mistake?
No.
Consider latter cases.
Case B. Distributed system - OK
Suppose that you have a distributed system:
ServerA, ServerB, ServerC are sources of data
HeadServer - is data aggregator
data on serverA-ServerC could be duplicated: the same record could exists on several instances
aggregated data must not have duplicates
data for related tables can come from different instances: data for table with PK from serverA and data for tables with FKs from serverB-serverC
you need to log from where each record is originated
In such case existence of PK on id uuid is justified:
unique constraint allows to deduplicate records
surrogate key allows related data come from different sources
Case C. 'id' is used to expose data through API - OK
Suppose that you have an API to access data for external consumers.
There is a good unique constraint on:
client_id: incrementing integer in range 1..100000
invoice_date: dates '20100101'..'20210901'
And a surrogate key on id with random uuids.
You can create external API in forms:
/server/invoice/{client_id}/{invoice_date}
/server/invoice/{id}
From security POV /{id} is superior by reasons:
it's impossible to deduce from one uuid value existence of other
it's easier to implement authorization system for entities of different types. E.g. entityA has natural key on int, entityB on bigint' and entityC on int+ byte+date`
In such case surrogate key not only justified but becames essential.
Afterword
I hope that I was clear in explanation of main correct principle: "There is no such thing as a universal correct principle".
An additional advice: avoid CASCADE UPDATE/DELETEs:
Although it depends on DBMS you use.
But in general :
"explicit is better than implicit"
CASCADEs rarely works as intended
when CASCADES works - usually they have performance problems
Thank you for your attention.
I hope this helps somebody.

Postgresql: Primary key for table with one column

Sometimes, there are certain tables in an application with only one column in each of them. Data of records within the respective columns are unique. Examples are: a table for country names, a table for product names (up to 60 characters long, say), a table for company codes (3 characters long and determined by the user), a table for address types (say, billing, delivery), etc.
For tables like these, as the records are unique and not null, the only column can be used as the primary key, technically speaking.
So my question is, is it good enough to use that column as the primary key for the table? Or, is it still desirable to add another column (country_id, product_id, company_id, addresstype_id) as the primary key for the table? Why?
Thanks in advance for any advice.

there is always a debate between using surrogate keys and composite keys as primary key. using composite primary keys always introduces some complexity to your database design so to your application.
think that you have another table which is needed to have direct relationship between your resulting table (billing table). For the composite key scenario you need to have 4 columns in your related table in order to connect with the billing table. On the other hand, if you use surrogate keys, you will have one identity column (simplicity) and you can create unique constraint on (country_id, product_id, company_id, addresstype_id)
but it is hard to say this approach is better then the other one because they both have Pros and Cons.
You can check This for more information

One Primary Key Value in many tables

This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL developer). I have amongst other tables a table called: Manufacturer and a table called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
for both I have created columns, each having their own Primary Key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of Manufacturer and Parentcompany
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.

A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each
row/record in a database table. Primary keys must contain unique
values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or
multiple fields. When multiple fields are used as a primary key, they
are called a composite key.
If a table has a primary key defined on any field(s), then you cannot
have two records having the same value of that field(s).

Primary key is table-unique. You can use same value of PI for every separate table in DB. Actually that often happens as PI often incremental number representing ID of a row: 1,2,3,4...
For your case more common implementation would be to have hierarchical table called Company, which would have fields: company_name and parent_company_name. In case company has a parent, in field parent_company_name it would have some value from field company_name.

There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a 1 to 1 correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.

Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then I will create, as I go along specific company tables like Manufacturer and parent company that reference a certain company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique, and can't have the same value as another. So then this condition only goes for tables, not the whole database. Thanks for the clarification!

How does FOREIGN KEY references from one table to two table?

when I design a database for production, I got a problem.
I have 5 table's Product, SemiProduct, BOOM,BOOM_Details,Material.
Product(Code,Name,BOOM_Code,etc....)
SemiProduct(Code,Name,BOOM_Code,etc...)
BOOM(BOOM_Code,etc...)
BOOM_Details(ID,BOOM_Code,Component, Percent)
Material(Code,Name, etc..)
My data for each table is :
Product
{Pro1,Product1, BOOMPRO1}
{Pro2,Product2, BOOMPRO2}
-----------
SemiProduct
{SemiPro1,Semi Product1, BOOMSemi1}
{SemiPro2,Semi Product2, BOOMSemi2}
------------
BOOM
{BOOMPRO1}
{BOOMPRO2}
{BOOMSemi1}
{BOOMSemi2}
----------
Material
{M1,Material1}
{M2,Material12}
---------------
BOOM_Details
{1,BOOMPRO1,M1, 20}
{2,BOOMPRO1,M2, 25}
{3,BOOMPRO1,SemiPro1, 25}
{4,BOOMPRO1,SemiPro2, 30}
{5,BOOMPRO2,M1, 20}
{6,BOOMPRO2,M2, 25}
{7,BOOMPRO2,SemiPro1, 55}
The problem is col [Component] in [BOOM_Details].
It can be code of Material or Code of SemiProduct.
Please help me create FOREIGN KEY from Component REFERENCES to [Material].[Code] and [SemiProduct].[Code]
Or show me a different way to design the table with data like that.
Thank you very much !

This is a common SQL Anti-Pattern. You should change your table structures so that your Material and Code of SemiProduct tables have a column that is a foreign key to the BOOM_Details table. That way you can always join between the tables as needed.
So instead of a field in BOOM_Details that could point to either of the two tables, have a field in both of the tables that contains the primary key of the BOOM_Details table and is constrained by a foreign key constraint.

Foriegn key:- A FOREIGN KEY is a column which is used to join two tables. It points to a PRIMARY KEY in another table which we need to join. Here is the example for better understanding.
Read this for better understanding
Coming to the problem:- Here you need the foreign key between Boom_details and Material.So first you have to make sure that there is a primary key(ex:-id column with sequential no. id[1,2,3,4,5] or sequential alphabets[a,b,c,d] etc..) in the first table to which we have to point! Then add the foreign key column in your table from which we are pointing with same set of values as primary key in the first table.
EDIT:-
Here are the links for understanding how to use foreign key across multiple tables
LINK 1
LINK 2
Hope this helps!

Can I have 2 unique columns in the same table?

I have 2 tables:
roomtypes[id(PK),name,maxAdults...]
features(example: Internet in room, satelite tv)
Can both id and name field be unique in the same table in mysql MYISAM?
If the above is posible, I am thinking of changing the table to:
features[id(PK),name,roomtypeID] ==> features[id(PK),name,roomtypeNAME]
...because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.

Of course, you can make one of them PRIMARY and one UNIQUE. Or both UNIQUE. Or one PRIMARY and four UNIQUEs, if you like

Yes, you can define UNIQUE constraints to columns other than the primary key in order to ensure the data is unique between rows. This means that the value can only exist in that column once - any attempts to add duplicates will result in a unique constraint violation error.
I am thinking of changing the FEATURES table to features[id(PK), name, roomtypeNAME] because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.
There's two problems:
A unique constraint on the ROOM_TYPE_NAME wouldn't work - you'll have multiple instances of a given room type, and a unique constraint is designed to stop that.
Because of not using a foreign key to the ROOM_TYPES table, you risk getting values like "Double", "double", "dOUBle"
I recommend sticking with your original design for sake of your data; your application is what translates a room type into its respective ROOM_TYPE record while the UI makes it presentable.

I would hope so otherwise MySQL is not compliant with the SQL standard. You can only have one primary key but you can mark other columns as unique.
In SQL, this is achieved with:
create table tbl (
colpk char(10) primary key,
coluniq char(10) unique,
colother char(10)
);
There are other ways to do it (particularly with multi-part keys) but this is a simple solution.

Yes you can.
Also keep in mind that MySQL allow NULL values in unique columns, whereas a column that is a primary key cannot have a NULL value.

1 RoomType may have many Features
1 Feature may be assigned to many RoomTypes
So what type of relationship do i have? M:N ?
You have there a many-to-many relationship, which has to be represented by an extra table.
That relationship table will have 2 fields: the PK of RoomTypes and the PK of Features.
The PK of the relationship table will be made of those 2 fields.
If that's usefull, you can add extra fields like the Quantity.
I would like to encourage you to read about database Normalization, which is he process of creating a correct design for a relational database. You can Google for that, or look eventually here (there are plenty of books/web pages on this)

Thanks again for very helpful answers.
1 roomType may have many features
1 feature may be assigned to many roomTypes
So what type of relationship do i have? M:N ?
If yes the solution I see is changing table structure to
roomTypes[id,...,featuresIDs]
features[id(PK),name,roomtypeIDs] multiple roomTypesIDs separated with comma?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas