I have 5 tables with the same structure and the same columns: id, name, description. What is the best way to design this, or to avoid having 5 tables with identical columns? I see two options:
1. Create a category table that includes my three common columns plus another column "enum" that differentiates my categories (e.g. city, country, continent, etc.).
2. Create a category table that includes my three common columns, and create the other five tables so that they just include an id.
Note that I would have an association table that includes the ids of cities, countries, continents, etc., so I can display them in a report.
Thank you for your advice.
How many tables to have under these circumstances depends on how the data relates and how it will be used.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
There are other considerations. If your data is at the geographic level, then it might represent hierarchies . . . cities are in countries, countries are on continents. Some databases (such as MySQL before version 8.0) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
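To make "hierarchical query" concrete, here is a minimal sketch for a database that does support them, using Python's built-in sqlite3 module and a recursive common table expression. The self-referencing `place` table, its `parent_id` column, and the sample rows are assumptions made up for this illustration; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: every place points at its parent place.
    CREATE TABLE place (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES place(id)
    );
    INSERT INTO place (id, name, parent_id) VALUES
        (1, 'Europe', NULL),
        (2, 'France', 1),
        (3, 'Paris',  2),
        (4, 'Lyon',   2),
        (5, 'Asia',   NULL),
        (6, 'Japan',  5);
""")

# Start at one root row and repeatedly add the children of rows found so far.
rows = conn.execute("""
    WITH RECURSIVE subtree(id, name) AS (
        SELECT id, name FROM place WHERE name = ?
        UNION ALL
        SELECT p.id, p.name
        FROM place AS p
        JOIN subtree AS s ON p.parent_id = s.id
    )
    SELECT name FROM subtree ORDER BY name
""", ("Europe",)).fetchall()

europe_names = [row[0] for row in rows]
print(europe_names)
```

Without recursive queries, the same result needs one self-join per level, which is one reason denormalizing can look attractive.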
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy for providing culture-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If city description, country description and continent description hold different information, then you are already doing this the right way. The only time you would aim to reduce this data is if you were repeating information; repeating the column titles is fine.
In fact, you are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id; every table should have one. Name and description won't be the same across country and city.
Also, this way, if you want a country's name you don't have to go through every country, continent and city. You only have 192 or so entries to go through. If you had all of that in one massive table, you would have to use it for everything and go through every row every time you want data. You would also have to distinguish between cities, countries and continents in some way other than separate tables.
For example:
Method 1, with 5 tables:
SELECT * FROM country;
gives you the same countries as
Method 2, with 1 table:
SELECT * FROM category WHERE enumvalue = 'country';
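The comparison can be tried out with a small sketch like the one below, which uses Python's built-in sqlite3 module. The shared table is named `category` here (an assumed name, since `table` is a reserved word), and the sample rows and the CHECK constraint on `enumvalue` are likewise assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Method 1: one table per kind of thing (only country is shown).
    CREATE TABLE country (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        description TEXT
    );
    -- Method 2: one shared table with a discriminator column.
    CREATE TABLE category (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        description TEXT,
        enumvalue   TEXT NOT NULL
            CHECK (enumvalue IN ('city', 'country', 'continent'))
    );
    INSERT INTO country (name, description) VALUES
        ('France', 'sample row'),
        ('Japan',  'sample row');
    INSERT INTO category (name, description, enumvalue) VALUES
        ('Paris',  'sample row', 'city'),
        ('France', 'sample row', 'country'),
        ('Japan',  'sample row', 'country'),
        ('Asia',   'sample row', 'continent');
""")

method_1 = conn.execute("SELECT name FROM country ORDER BY name").fetchall()
method_2 = conn.execute(
    "SELECT name FROM category WHERE enumvalue = 'country' ORDER BY name"
).fetchall()
print(method_1 == method_2)
```

Both queries return the same names; the difference is that method 2 has to filter on the discriminator column every time, so that column is a natural candidate for an index.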
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.
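A minimal sketch of the city-to-country relationship described above, written with Python's built-in sqlite3 module. Only the `id`, `countryId` and `name` columns come from the answer; the sample rows are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE country (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE city (
        id        INTEGER PRIMARY KEY,
        countryId INTEGER NOT NULL REFERENCES country(id),
        name      TEXT NOT NULL
    );
    INSERT INTO country (id, name) VALUES (1, 'United States');
    INSERT INTO city (countryId, name) VALUES (1, 'Springfield');
""")

# A city that points at a country that does not exist should be rejected.
try:
    conn.execute("INSERT INTO city (countryId, name) VALUES (99, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

city_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
print(orphan_rejected, city_count)
```

With a single shared table for cities, countries and continents, a constraint like this is harder to express, because the database cannot tell that `countryId` must point at a row of the "country" kind.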
I am learning SQL and practicing table design on sample scenarios. There is one scenario for which I could not find a suitable table structure.
The scenario is as follows: I want to store the dependencies of user journeys in SQL. For example, in the customer creation journey, we need to create valid sector, language and country codes in the system. As another example, to create a new account (bank account), we need to create the sector, language and country, followed by the customer.
So far I have come up with the following table design, but I am sure it is not good, as it has no primary key and does not follow normalization standards.
journey     dependent   order
==============================
CUSTOMER    SECTOR      0
CUSTOMER    LANGUAGE    1
CUSTOMER    COUNTRY     2
ACCOUNT     CUSTOMER    0
I understand that this is a many-to-many relationship, as one journey can have many dependents and one dependent can be associated with multiple journeys. I need help designing the tables efficiently in SQL. Can anyone help with this?
You need an intermediate (join) table that looks like this:
Table name: journey_dependent
Column (journey FK)    Column (dependent FK)
journey_id             dependent_id
You can check more here - https://www.baeldung.com/jpa-many-to-many#1-modeling-a-many-to-many-relationship
If the journey and dependent values are primary keys in their own tables, the join table holds two foreign keys. You can create a composite primary key on the join table from those two columns.
The order column may belong in the dependent table. If it does not, then the join table carries information of its own (the order), so it is not a pure relationship table, and you could optionally add a technical primary key column (auto-increment) to it.
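Putting these suggestions together, a join table with a composite primary key might be sketched as follows with Python's built-in sqlite3 module. The rows mirror the sample data in the question. The `journey` and `dependent` lookup tables, their `name` columns, and the column name `step_order` (chosen because `order` is a reserved word) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE journey   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE dependent (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

    -- Join table: two foreign keys that together form the primary key.
    CREATE TABLE journey_dependent (
        journey_id   INTEGER NOT NULL REFERENCES journey(id),
        dependent_id INTEGER NOT NULL REFERENCES dependent(id),
        step_order   INTEGER NOT NULL,
        PRIMARY KEY (journey_id, dependent_id)
    );

    INSERT INTO journey (id, name) VALUES (1, 'CUSTOMER'), (2, 'ACCOUNT');
    INSERT INTO dependent (id, name) VALUES
        (1, 'SECTOR'), (2, 'LANGUAGE'), (3, 'COUNTRY'), (4, 'CUSTOMER');
    INSERT INTO journey_dependent (journey_id, dependent_id, step_order) VALUES
        (1, 1, 0), (1, 2, 1), (1, 3, 2), (2, 4, 0);
""")

# List the dependencies of one journey in the order they must be created.
rows = conn.execute("""
    SELECT d.name
    FROM journey_dependent AS jd
    JOIN journey   AS j ON j.id = jd.journey_id
    JOIN dependent AS d ON d.id = jd.dependent_id
    WHERE j.name = ?
    ORDER BY jd.step_order
""", ("CUSTOMER",)).fetchall()

customer_steps = [row[0] for row in rows]
print(customer_steps)
```

Here the order lives on the join table, which fits the case where the same dependent can appear at different positions in different journeys.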
I have country, region, county, town data and I'm currently deciding between 2 schema designs (if there's a better one, do tell).
I first thought
Country
    Id
    Name
Region
    Id
    CountryId
    Name
County
    Id
    RegionId
    Name
Town
    Id
    CountyId
    Name
This does the job; however, to get all towns in a country you have to do 3 inner joins for the filtering. I guess this could be OK, but potentially expensive?
The other design was:
Country
    Id
    Name
Region
    Id
    Name
County
    Id
    Name
Town
    Id
    CountryId
    RegionId
    CountyId
    Name
This way all the hierarchical data, so to speak, is at the bottom and you can go back up. However, if you want all regions in a country you're a bit stuck, which makes me wonder whether the first design is best.
What do you think is the best schema design?
The best database design depends on how the data is being used.
If this is pretty static data that will all be updated at one time and external references are all to towns, then I would probably go for a denormalized dimension. That is, store the information all in one row:
Town Id
Town name
County name
Region name
Country name
Under the above scenario, the ids for county, region, and country are not necessary (by assumption).
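As a rough sketch of that denormalized dimension, using Python's built-in sqlite3 module: the table name `town_dim`, the snake_case column names, and the placeholder rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per town; the whole hierarchy is repeated on every row.
    CREATE TABLE town_dim (
        town_id      INTEGER PRIMARY KEY,
        town_name    TEXT NOT NULL,
        county_name  TEXT NOT NULL,
        region_name  TEXT NOT NULL,
        country_name TEXT NOT NULL
    );
    INSERT INTO town_dim (town_name, county_name, region_name, country_name) VALUES
        ('Town A', 'County 1', 'Region X', 'Country P'),
        ('Town B', 'County 2', 'Region X', 'Country P'),
        ('Town C', 'County 3', 'Region Y', 'Country Q');
""")

# All towns in a country: a plain filter, no joins.
towns_in_p = [row[0] for row in conn.execute(
    "SELECT town_name FROM town_dim WHERE country_name = ? ORDER BY town_name",
    ("Country P",),
)]

# All regions in a country: still answerable, via DISTINCT over the same rows.
regions_in_p = [row[0] for row in conn.execute(
    "SELECT DISTINCT region_name FROM town_dim"
    " WHERE country_name = ? ORDER BY region_name",
    ("Country P",),
)]
print(towns_in_p, regions_in_p)
```

The trade-off is that a region with no towns cannot be represented, and renaming a region means updating every one of its town rows, which is why this suits static data loaded in one go.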
If the data is being provided as separate tables with separate ids, and these tables can be updated independently or row-by-row, then a separate table for each makes sense. Putting all the ids into the towns table may or may not be a good idea. You will have to verify and maintain the hierarchies when data is inserted and updated.
If ids for each level are necessary for you, then you should have an appropriate table structure for declaring foreign key constraints. But this can get complicated. Will an external entity have a "geography" attribute that can be at any level? Will an external entity always know which level it is referring to?
In other words, you need to know how the data is going to be used in order to define an appropriate data model.
Suppose I have the following:
Table region_city
id name parent_id
==============================
1 North null
2 South null
3 Manchester 1
4 London 2
In my user table I store the ID of the City that the user is in.
Now in my search form I need to be able to perform a top-level search, i.e. find all Users that belong to a given region (North or South).
Will it make life easier if I included a region_id field in my user table? Or is that going against the normalisation concept?
It does denormalize the table structures and it could introduce data update anomalies. Consider: the user moves from Manchester to London and the city_id changes. The region_id could still point to the North.
The region_id depends only on the city and can be derived from it, so it does not belong in the user table.
If the design absolutely calls for only two levels (region and city) and you are willing to forgo the possible addition of other levels in the future (not a decision I would be inclined to make, but you know your data better than I do) then do not include the regionID in your user table; that would denormalize your database. Instead, you have several choices for representing the data (including two related tables, region and city) and you would perform your search by JOINing the city table to the user table or using an IN clause in your search.
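The JOIN-based search could look roughly like this, using Python's built-in sqlite3 module and the `region_city` rows from the question. The `user` table columns other than the city id, the user names, and the helper `users_in_region` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region_city (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES region_city(id)
    );
    -- "user" is quoted because it is a reserved word in some databases.
    CREATE TABLE "user" (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES region_city(id)
    );
    INSERT INTO region_city (id, name, parent_id) VALUES
        (1, 'North', NULL), (2, 'South', NULL),
        (3, 'Manchester', 1), (4, 'London', 2);
    INSERT INTO "user" (name, city_id) VALUES
        ('alice', 3), ('bob', 4), ('carol', 3);
""")

def users_in_region(region_name):
    """Return the names of users whose city belongs to the given region."""
    rows = conn.execute("""
        SELECT u.name
        FROM "user" AS u
        JOIN region_city AS city   ON city.id = u.city_id
        JOIN region_city AS region ON region.id = city.parent_id
        WHERE region.name = ?
        ORDER BY u.name
    """, (region_name,)).fetchall()
    return [row[0] for row in rows]

print(users_in_region("North"))

# Moving a user only touches city_id; the region follows automatically.
conn.execute('UPDATE "user" SET city_id = 4 WHERE name = ?', ("alice",))
print(users_in_region("North"))
```

Because the region is derived through the city, the move from Manchester to London cannot leave a stale region behind.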
I am tracking exercises. I have a workout table with:
id
exercise_id (foreign key into exercise table)
Now, some exercises like weight training would have the fields:
weight, reps (I just lifted 100 lbs 10 times)
and other exercises like running would have the fields: time, distance (I just ran 5 miles and it took 1 hour)
Should I store these all in the same table and just have some records with 2 fields filled in and the other fields blank, or should this be broken down into multiple tables?
At the end of the day, I want to query for all exercises in a day (which will include both types of exercises), so I will need some "switch" somewhere to differentiate the different types of exercises.
What is the best database design for this situation?
There are a few different patterns for modelling object-oriented inheritance in database tables. The simplest is single table inheritance, which will probably work great in this case.
Implementing it is mostly according to your own suggestion to have some fields filled in and the others blank.
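A minimal sketch of single table inheritance for this case, using Python's built-in sqlite3 module. The `workout_date` column, the `type` values, and the unit-suffixed column names are assumptions; only `id`, `exercise_id` and the four measurement fields come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table for every kind of workout; unused columns stay NULL.
    CREATE TABLE workout (
        id             INTEGER PRIMARY KEY,
        exercise_id    INTEGER NOT NULL,
        workout_date   TEXT NOT NULL,
        type           TEXT NOT NULL CHECK (type IN ('weights', 'running')),
        weight_lbs     REAL,
        reps           INTEGER,
        duration_hours REAL,
        distance_miles REAL
    );
    INSERT INTO workout
        (exercise_id, workout_date, type,
         weight_lbs, reps, duration_hours, distance_miles)
    VALUES
        (1, '2024-01-01', 'weights', 100, 10, NULL, NULL),
        (2, '2024-01-01', 'running', NULL, NULL, 1, 5);
""")

# Everything done on one day, both kinds of exercise, in a single query.
day_rows = conn.execute("""
    SELECT type, weight_lbs, reps, duration_hours, distance_miles
    FROM workout
    WHERE workout_date = ?
    ORDER BY id
""", ("2024-01-01",)).fetchall()

for row in day_rows:
    print(row)
```

The `type` column is the "switch" the question asks about: application code reads it to decide which of the nullable columns are meaningful for a given row.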
One way to do it is to have an "exercise" table with a "type" field that names another table where the exercise-specific details are, and a foreign key into that table.
If you plan on keeping it to only 2 types, just have exercise_id, value1, value2, type.
You can filter on the type of exercise in the WHERE clause and alias the column names in the same statement, so that the results don't say value1 and value2, but weight and reps, or time and distance.
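The aliasing idea can be sketched like this with Python's built-in sqlite3 module. The `type` values 'weights' and 'running' and the sample rows are assumptions; `value1`, `value2` and `type` follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name
conn.executescript("""
    CREATE TABLE workout (
        id          INTEGER PRIMARY KEY,
        exercise_id INTEGER NOT NULL,
        value1      REAL,
        value2      REAL,
        type        TEXT NOT NULL
    );
    INSERT INTO workout (exercise_id, value1, value2, type) VALUES
        (1, 100, 10, 'weights'),
        (2, 1,   5,  'running');
""")

# Same two generic columns, renamed per exercise type in the SELECT list.
weights = conn.execute("""
    SELECT value1 AS weight, value2 AS reps
    FROM workout WHERE type = 'weights'
""").fetchall()
runs = conn.execute("""
    SELECT value1 AS time, value2 AS distance
    FROM workout WHERE type = 'running'
""").fetchall()

weight_columns = weights[0].keys()
run_columns = runs[0].keys()
print(weight_columns, run_columns)
```

The cost of this layout is that the meaning and unit of `value1` and `value2` live only in the queries, so every consumer has to apply the same aliases consistently.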
I have a few tables representing geographical entities (CITIES, COUNTIES, ZIPCODES, STATES, COUNTRIES, etc).
I need a way to represent sets of geographical entities. A set can contain records from more than one table. For example, one set may contain 3 records from CITIES, 1 record from COUNTIES and 4 from COUNTRIES.
Here are two possible solutions:
A table which contains three columns, with one record for each entity. The table will contain multiple records for each set, all sharing the set number.
set_id INT, foreign_table VARCHAR(255), foreign_id INT
Sample entries for set #5:
(5,'CITIES',4)
(5,'CITIES',12)
(5,'ZIPCODES',91)
(5,'ZIPCODES',92)
(5,'COUNTRIES',15)
A table which contains a TEXT column for each entity type, which will include a string set with the appropriate entries:
set_id INT,cities TEXT,counties TEXT,zipcodes TEXT,states TEXT,countries TEXT
So the above set will be represented with a single record
(5,'4,12','','91,92','','15')
Any other ideas? Would love to hear your input.
Thanks!
Both solutions you propose don't have real foreign keys. In the first solution, one foreign_id can point to many tables, which is hard (or at least inefficient) for a database to enforce. The second solution stores multiple values in one column, which is the one thing everyone agrees you shouldn't do (it breaks first normal form.)
What I would do is this: cities, zip codes, and states all "have a" geographical location. The normal way to implement that is a one to many relation. Create a geolocation table, and add a geolocation_id column to the cities, zip code, and state tables.
EDIT: Per your comment, to get from a geolocation to its cities:
select *
from geolocation g
left join cities c
on g.id = c.geolocation_id
left join zipcodes z
on g.id = z.geolocation_id
....
The database will resolve the joins using the foreign key index, which is very fast.
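Here is a small sketch of the geolocation design, using Python's built-in sqlite3 module and the ids from the question's sample set. It is a variation on the query above: it uses UNION ALL instead of chained LEFT JOINs, so that each member appears exactly once rather than once per city/zipcode combination. The column names, index names and placeholder values are assumptions. The indexes are created explicitly because not every database indexes foreign key columns automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geolocation (id INTEGER PRIMARY KEY);
    CREATE TABLE cities (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        geolocation_id INTEGER REFERENCES geolocation(id)
    );
    CREATE TABLE zipcodes (
        id             INTEGER PRIMARY KEY,
        code           TEXT NOT NULL,
        geolocation_id INTEGER REFERENCES geolocation(id)
    );
    -- Explicit indexes on the foreign key columns used by the joins.
    CREATE INDEX cities_geolocation_idx   ON cities(geolocation_id);
    CREATE INDEX zipcodes_geolocation_idx ON zipcodes(geolocation_id);

    INSERT INTO geolocation (id) VALUES (5);
    INSERT INTO cities (id, name, geolocation_id) VALUES
        (4, 'City 4', 5), (12, 'City 12', 5), (13, 'City 13', NULL);
    INSERT INTO zipcodes (id, code, geolocation_id) VALUES
        (91, '00091', 5), (92, '00092', 5);
""")

# Collect the members of one geolocation, one branch per entity table.
members = conn.execute("""
    SELECT 'CITIES' AS source, c.id
    FROM geolocation AS g
    JOIN cities AS c ON c.geolocation_id = g.id
    WHERE g.id = :geo
    UNION ALL
    SELECT 'ZIPCODES', z.id
    FROM geolocation AS g
    JOIN zipcodes AS z ON z.geolocation_id = g.id
    WHERE g.id = :geo
    ORDER BY 1, 2
""", {"geo": 5}).fetchall()
print(members)
```

Note that with a single `geolocation_id` column per row, each city or zip code can belong to at most one geolocation in this sketch.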
One Location Set can have many Geography items
One Geography item can belong to many Location Sets
Regarding the Geography item table, two approaches are possible. In the first case the super-type/subtype relationship is overlapping -- more than one sub-type can be linked to the super-type.
For example, there can be GeographyID = 5 in Geography and Zipcodes, Cities, States, Countries tables.
In the second case, we can consider the exclusive (disjoint) relationship, in which only one subtype can be connected to the super-type. The parent-child relationship is used to create paths, like ZIP/City/State/Country -- that is if actual administrative areas allow for this type of relationship.
In this example, there can be GeographyID = 5 in the Geography and only one more sub-type table.
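One possible reading of the exclusive (second) case, sketched with Python's built-in sqlite3 module. Every table and column name here is an assumption, since the answer does not spell out a schema. The `kind` column only records which sub-type table a row belongs to; this sketch does not enforce exclusivity in the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- Super-type: one row per geography item, with an optional parent.
    CREATE TABLE geography (
        geography_id INTEGER PRIMARY KEY,
        kind         TEXT NOT NULL CHECK (kind IN ('city', 'country')),
        parent_id    INTEGER REFERENCES geography(geography_id)
    );
    -- Sub-types: each shares its primary key with the super-type row.
    CREATE TABLE countries (
        geography_id INTEGER PRIMARY KEY REFERENCES geography(geography_id),
        name         TEXT NOT NULL
    );
    CREATE TABLE cities (
        geography_id INTEGER PRIMARY KEY REFERENCES geography(geography_id),
        name         TEXT NOT NULL
    );
    -- Many-to-many link between location sets and geography items.
    CREATE TABLE location_set (set_id INTEGER PRIMARY KEY);
    CREATE TABLE location_set_item (
        set_id       INTEGER NOT NULL REFERENCES location_set(set_id),
        geography_id INTEGER NOT NULL REFERENCES geography(geography_id),
        PRIMARY KEY (set_id, geography_id)
    );

    INSERT INTO geography (geography_id, kind, parent_id) VALUES
        (1, 'country', NULL), (2, 'city', 1), (3, 'city', 1);
    INSERT INTO countries (geography_id, name) VALUES (1, 'Country P');
    INSERT INTO cities (geography_id, name) VALUES (2, 'City A'), (3, 'City B');
    INSERT INTO location_set (set_id) VALUES (5), (6);
    INSERT INTO location_set_item (set_id, geography_id) VALUES
        (5, 1), (5, 2), (6, 2);
""")

# One set can hold many geography items of different kinds...
set_5 = conn.execute("""
    SELECT g.geography_id, g.kind
    FROM location_set_item AS i
    JOIN geography AS g ON g.geography_id = i.geography_id
    WHERE i.set_id = ?
    ORDER BY g.geography_id
""", (5,)).fetchall()

# ...and one geography item can belong to many sets.
sets_with_city_2 = [row[0] for row in conn.execute(
    "SELECT set_id FROM location_set_item WHERE geography_id = ? ORDER BY set_id",
    (2,),
)]
print(set_5, sets_with_city_2)
```

Unlike the (set_id, foreign_table, foreign_id) layout from the question, every reference here is a real foreign key that the database can check.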