RDBMS schema for unknown columns - schema

I have a project with a MySQL database, and I would like to be able to upload various datasets. Say I am building a restaurant reviews aggregator. So we would like to keep adding all sources of restaurant reviews we could get our hands on, and keeping all the information.
I have a table review_sources
===========================
| id  | name              |
===========================
| 1   | Zagat             |
| 2   | GoodEats Magazine |
| ... | ...               |
| 50  | Allergy News      |
===========================
Now say I have a table reviews
=================================================================
| id  | Restaurant Name | source_id | Star Rating | Description |
=================================================================
| 0   | Joey's Burgers  | 1         | 3.5         | Wow!        |
| 1   | Jamal's Steaks  | 1         | 3.5         | Yummy!      |
| 2   | Jenny's Crepes  | 1         | 4.5         | Sweet!      |
| ... | ...             | ...       | ...         | ...         |
| 253 | Jeeva's Curries | 3         | 4           | Spicy!      |
=================================================================
Now suppose someone wants to add reviews from "Allergy News", which has a field "nut-free". Or a source of reviews could describe the degree of kashrut compliance, halal compliance or vegan-friendliness. As the designer, I don't know which optional fields future data sources may have. I want to be able to answer queries such as:
What are all the fields in the Zagat reviews?
For review id=x, what is the value of the optional field "vegan-friendly"?
So how do I design a schema that can handle these disparate data sources and answer these queries? My reasons for not going for NoSQL are that I do want certain types of normalization, and that this is part of an existing MySQL based project.

I'd use a many-to-many relationship with a table containing a review_id, a field (e.g. "vegan-friendly") and the value of the field. Then of course a reviews_fields table to map one to the other.
Cheers
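One reading of the suggestion above, as a rough sketch rather than a definitive design: a fields table holds each optional field name once, and reviews_fields maps a review to a field and carries the value. The fields table name, the value column, review id 300 and all the field values are made up for illustration, and an in-memory SQLite database (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review_sources (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    restaurant_name TEXT NOT NULL,
    source_id INTEGER NOT NULL REFERENCES review_sources(id)
);
-- One row per distinct optional field name (hypothetical table).
CREATE TABLE fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Maps a review to a field and stores the value for that pair.
CREATE TABLE reviews_fields (
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    field_id  INTEGER NOT NULL REFERENCES fields(id),
    value     TEXT,
    PRIMARY KEY (review_id, field_id)
);
INSERT INTO review_sources VALUES (1, 'Zagat'), (50, 'Allergy News');
-- Review 300 and every optional value below are illustrative only.
INSERT INTO reviews VALUES (0, 'Joey''s Burgers', 1), (300, 'Jenny''s Crepes', 50);
INSERT INTO fields VALUES (1, 'nut-free'), (2, 'vegan-friendly');
INSERT INTO reviews_fields VALUES (0, 2, 'no'), (300, 1, 'yes'), (300, 2, 'yes');
""")


def fields_for_source(source_name):
    """All optional fields that appear in reviews from one source."""
    rows = conn.execute("""
        SELECT DISTINCT f.name
        FROM fields f
        JOIN reviews_fields rf ON rf.field_id = f.id
        JOIN reviews r ON r.id = rf.review_id
        JOIN review_sources s ON s.id = r.source_id
        WHERE s.name = ?
        ORDER BY f.name
    """, (source_name,)).fetchall()
    return [name for (name,) in rows]


def field_value(review_id, field_name):
    """Value of one optional field for one review, or None if absent."""
    row = conn.execute("""
        SELECT rf.value
        FROM reviews_fields rf
        JOIN fields f ON f.id = rf.field_id
        WHERE rf.review_id = ? AND f.name = ?
    """, (review_id, field_name)).fetchone()
    return row[0] if row else None


zagat_fields = fields_for_source("Zagat")
vegan_300 = field_value(300, "vegan-friendly")
```

Both queries from the question map onto plain joins here. The usual trade-off of this layout is that every optional value is stored as text, so type checking has to happen in the application.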

Related

How to define a relationship between two tables from different sources with different identifiers

Background:
I'm working on a project that does not allow me to share the data, but I'll do my best to give you some visualisation below. Before going further: I know (some) SQL, and I have done basic relationship work before, but that data was clean and simple, and for some reason I just can't figure out a solution this time.
Problem (?)
I'm trying to define a relationship between two tables from two different sources that each work with different identifiers. I do, however, have a mapping table from one of those sources, but again the identifiers do not align. Let me try to explain visually:
| TABLE 1 (cies) | | TABLE 2 (forms) |
| ------------ | | ------------- |
| id(PK) | | id(PK) |
| 4_digit_code | | 16_digit_code |
| ...more fields | | ...more fields |
The second source provided me a mapping table they use internally:
| MAPPING TABLE |
| ------------- |
| id(PK) |
| 4_digit_code | (= to the one in TABLE 1)
| 16_digit_code | (= to the one in TABLE 2)
My first thought was to create a script and just merge the info in the mapping table in TABLE 1 like so:
| TABLE 1 | | TABLE 2 |
| ------------ | | ------------- |
| id(PK) | | id(PK) |
| 16_digit_code | ==== | 16_digit_code |
| 4_digit_code |
The issue here is the 16_digit_code is not unique so I believe this does not work. Now comes something I have no experience with so I am just thinking out loud here:
Can I keep the mapping table and reference it each time to get from one table to the other? On the other hand, shouldn't all values in a mapping table be unique for that to work? The non-unique values exist because (some) very old numbers end up getting recycled.
For example get me all forms from company with id 1:
| TABLE 1 | | MAPPING TABLE | | TABLE 2 |
| ------------ | | ------------- | | ------------- |
| id(PK) | | id(PK) | | id(PK) |
| 16_digit_code | | 16_digit_code | ==== | 16_digit_code |
| 4_digit_code | ==== | 4_digit_code | | ...more fields |
With the above, I would not know how to approach this problem efficiently. I really don't know whether what I am saying makes sense, whether I am missing something, or whether I am making this far too complex.
Solution?
I'd love it if someone could point me in the right direction. And if you have the solution I'd love to know the reasoning, not just the solution as I'd love to learn from this for the future obviously.
Edit/Clarification:
Just for completeness' sake, the mapping combination (4-digit + 16-digit code) is unique, although, as I said earlier, one 16-digit code can be linked to multiple 4-digit codes.
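The "all forms for company id 1" lookup described above can be sketched as a two-step join through the mapping table, relying only on the stated fact that the (4-digit, 16-digit) pair is unique. This is a minimal sketch, not a vetted solution: the column names code_4 and code_16, all sample codes and ids are invented, and in-memory SQLite (Python's sqlite3 module) stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cies  (id INTEGER PRIMARY KEY, code_4 TEXT NOT NULL);
CREATE TABLE forms (id INTEGER PRIMARY KEY, code_16 TEXT NOT NULL);
CREATE TABLE mapping (
    id INTEGER PRIMARY KEY,
    code_4  TEXT NOT NULL,
    code_16 TEXT NOT NULL,
    UNIQUE (code_4, code_16)  -- the pair is unique, each column alone is not
);
-- Invented sample data: the first 16-digit code is "recycled"
-- and therefore mapped to two different 4-digit codes.
INSERT INTO cies VALUES (1, '1001'), (2, '1002');
INSERT INTO forms VALUES
    (10, '0000000000000001'),
    (11, '0000000000000002'),
    (12, '0000000000000001');
INSERT INTO mapping (code_4, code_16) VALUES
    ('1001', '0000000000000001'),
    ('1002', '0000000000000001'),
    ('1002', '0000000000000002');
""")


def forms_for_company(company_id):
    """Form ids reachable from a company via the mapping table."""
    rows = conn.execute("""
        SELECT DISTINCT f.id
        FROM cies c
        JOIN mapping m ON m.code_4 = c.code_4
        JOIN forms f ON f.code_16 = m.code_16
        WHERE c.id = ?
        ORDER BY f.id
    """, (company_id,)).fetchall()
    return [form_id for (form_id,) in rows]


company_1_forms = forms_for_company(1)
company_2_forms = forms_for_company(2)
```

Note that forms 10 and 12 come back for both companies, because their 16-digit code is mapped to both 4-digit codes. The join alone cannot tell which company a recycled code belongs to; that would need extra information, such as a validity period on the mapping rows.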

Calculate Equation From Separate Tables Data

I'm working on my senior High School Project and am reaching out to the community for help! (As my teacher doesn't know the answer to my question).
I have a simple "Products" table as shown below:
I also have a "Orders" table shown below:
Is there a way I can create a field in the "Orders" table named "Total Cost", and make that automatically calculate the total cost from all the products selected?
Firstly, I would advise against storing calculated values, and would also strongly advise against using calculated fields in tables. In general, calculations should be performed by queries.
I would also strongly advise against the use of multivalued fields, as your images appear to show.
In general, when following the rules of database normalisation, most sales databases are structured in a very similar manner, containing the following main tables (amongst others):
Products (aka Stock Items)
Customers
Order Header
Order Line (aka Order Detail)
A good example for you to learn from would be the classic Northwind sample database provided free of charge as a template for MS Access.
With the above structure, observe that each table serves a purpose with each record storing information pertaining to a single entity (whether it be a single product, single customer, single order, or single order line).
For example, you might have something like:
Products
Primary Key: Prd_ID
+--------+-----------+-----------+
| Prd_ID | Prd_Desc | Prd_Price |
+--------+-----------+-----------+
| 1 | Americano | $8.00 |
| 2 | Mocha | $6.00 |
| 3 | Latte | $5.00 |
+--------+-----------+-----------+
Customers
Primary Key: Cus_ID
+--------+--------------+
| Cus_ID | Cus_Name |
+--------+--------------+
| 1 | Joe Bloggs |
| 2 | Robert Smith |
| 3 | Lee Mac |
+--------+--------------+
Order Header
Primary Key: Ord_ID
Foreign Keys: Ord_Cust
+--------+----------+------------+
| Ord_ID | Ord_Cust | Ord_Date |
+--------+----------+------------+
| 1 | 1 | 2020-02-16 |
| 2 | 1 | 2020-01-15 |
| 3 | 2 | 2020-02-15 |
+--------+----------+------------+
Order Line
Primary Key: Orl_Order + Orl_Line
Foreign Keys: Orl_Order, Orl_Prod
+-----------+----------+----------+---------+
| Orl_Order | Orl_Line | Orl_Prod | Orl_Qty |
+-----------+----------+----------+---------+
| 1 | 1 | 1 | 2 |
| 1 | 2 | 3 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 1 | 4 |
| 3 | 2 | 3 | 2 |
+-----------+----------+----------+---------+
You might also opt to store the product description & price on the order line records, so that these are retained at the point of sale, as the information in the Products table is likely to change over time.
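With that structure, the total cost becomes a query rather than a stored field. Here is a small sketch using the sample rows above; the table names Order_Header and Order_Line (with underscores) are my own spelling, the Customers table is left out for brevity, prices are stored as plain numbers, and in-memory SQLite via Python's sqlite3 module stands in for MS Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    Prd_ID INTEGER PRIMARY KEY,
    Prd_Desc TEXT,
    Prd_Price REAL
);
CREATE TABLE Order_Header (
    Ord_ID INTEGER PRIMARY KEY,
    Ord_Cust INTEGER,
    Ord_Date TEXT
);
CREATE TABLE Order_Line (
    Orl_Order INTEGER REFERENCES Order_Header(Ord_ID),
    Orl_Line  INTEGER,
    Orl_Prod  INTEGER REFERENCES Products(Prd_ID),
    Orl_Qty   INTEGER,
    PRIMARY KEY (Orl_Order, Orl_Line)
);
INSERT INTO Products VALUES (1, 'Americano', 8.00), (2, 'Mocha', 6.00), (3, 'Latte', 5.00);
INSERT INTO Order_Header VALUES (1, 1, '2020-02-16'), (2, 1, '2020-01-15'), (3, 2, '2020-02-15');
INSERT INTO Order_Line VALUES (1, 1, 1, 2), (1, 2, 3, 1), (2, 1, 2, 1), (3, 1, 1, 4), (3, 2, 3, 2);
""")

# Total cost per order = sum over its lines of (unit price * quantity).
totals = conn.execute("""
    SELECT h.Ord_ID, SUM(p.Prd_Price * l.Orl_Qty) AS Total_Cost
    FROM Order_Header h
    JOIN Order_Line l ON l.Orl_Order = h.Ord_ID
    JOIN Products p ON p.Prd_ID = l.Orl_Prod
    GROUP BY h.Ord_ID
    ORDER BY h.Ord_ID
""").fetchall()

for order_id, total in totals:
    print(order_id, total)
```

For order 1 this works out as 2 x 8.00 + 1 x 5.00 = 21.00. Because the price is read from Products at query time, the total changes if a price changes later, which is the reason for copying the price onto the order line as mentioned above.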

Correct Database Design / Relationship

Below I have shown a basic example of my proposed database tables.
I have two questions:
Categories "Engineering", "Client" and "Vendor" will have exactly the same "Disciplines", "DocType1" and "DocType2", does this mean I have to enter these 3 times over in the "Classification" table, or is there a better way? Bear in mind there is the "Vendor" category that is also covered in the classification table.
In the "Documents" table I have shown "category_id" and "classification_id". I'm not sure if this will depend on the answer to the first question, but is "category_id" necessary, or should I just be using a JOIN to allow me to filter the category based on the classification_id?
Thank you in advance.
Table: Category
id | name
---|-------------
1 | Engineering
2 | Client
3 | Vendor
4 | Commercial
Table: Discipline
id | name
---|-------------
1 | Electrical
2 | Instrumentation
3 | Proposals
Table: DocType1
id | name
---|-------------
1 | Specifications
2 | Drawings
3 | Lists
4 | Tendering
Table: Classification
id | category_id | discipline_id | doctype1_id | doctype2
---|-------------|---------------|-------------|----------
1 | 1 | 1 | 2 | 00
2 | 1 | 1 | 2 | 01
3 | 2 | 1 | 2 | 00
4 | 4 | 3 | 4 | 00
Table: Documents
id | title | doc_number | category_id | classification_id
---|-----------------|------------|-------------|-------------------
1 | Electrical Spec | 0001 | 1 | 1
2 | Electrical Spec | 0002 | 2 | 3
3 | Quotation | 0003 | 3 | 4
From what you've provided, it looks like we have three simple lookup tables: category, discipline, and doctype1. The part that's not intuitively obvious to me and may also be causing confusion on your end, is that the last two tables are both serving as cross-references of the lookup tables. The classification table in particular seems like it might be out of place. If there are only certain combinations of category, discipline, and doctype that would ever be valid, then the classification table makes sense and the right thing to do would be to look up that valid combination by way of the classification ID from the document table. If this is not the case, then you would probably just want to reference the category, discipline, and document type directly from the document table.
In your example, the need to make this distinction is highlighted by the fact that the document table has a reference to the classification table and a reference to the category table. However, the row that is looked up in the classification table also references a category ID. This is not only redundant but also opens the door to conflicting category IDs.
I hope this helps.
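To make the redundancy concrete, here is a small sketch that loads the Classification and Documents rows from the question and checks them against each other. The Discipline and DocType1 lookup tables are omitted for brevity, and in-memory SQLite (Python's sqlite3 module) is only a stand-in for whatever database the project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Classification (
    id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES Category(id),
    discipline_id INTEGER,
    doctype1_id INTEGER,
    doctype2 TEXT
);
CREATE TABLE Documents (
    id INTEGER PRIMARY KEY,
    title TEXT,
    doc_number TEXT,
    category_id INTEGER REFERENCES Category(id),
    classification_id INTEGER REFERENCES Classification(id)
);
INSERT INTO Category VALUES (1, 'Engineering'), (2, 'Client'), (3, 'Vendor'), (4, 'Commercial');
INSERT INTO Classification VALUES
    (1, 1, 1, 2, '00'), (2, 1, 1, 2, '01'), (3, 2, 1, 2, '00'), (4, 4, 3, 4, '00');
INSERT INTO Documents VALUES
    (1, 'Electrical Spec', '0001', 1, 1),
    (2, 'Electrical Spec', '0002', 2, 3),
    (3, 'Quotation', '0003', 3, 4);
""")

# Documents whose own category_id disagrees with their classification's.
conflicts = conn.execute("""
    SELECT d.id, d.category_id, c.category_id
    FROM Documents d
    JOIN Classification c ON c.id = d.classification_id
    WHERE d.category_id <> c.category_id
""").fetchall()

# Category derived purely through the classification, ignoring
# Documents.category_id altogether.
derived = conn.execute("""
    SELECT d.id, cat.name
    FROM Documents d
    JOIN Classification c ON c.id = d.classification_id
    JOIN Category cat ON cat.id = c.category_id
    ORDER BY d.id
""").fetchall()
```

With the sample rows as posted, document 3 already shows the problem: it says category 3 (Vendor), while its classification row says category 4 (Commercial). If the classification table is kept, dropping Documents.category_id and deriving the category through the join removes that possibility.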

How to store Goals (think RPG Quest) in SQL

Someone asked me today how they should store quest goals in a SQL database. In this context, think of an RPG. Goals could include some of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The best I could come up with is:
Quest 1-* QuestStep
QuestStep 1-* MobsToKill
QuestStep 1-* PlacesToFind
QuestStep 1-* ThingsToAcquire
QuestStep 1-* etc.
This seems a little clunky - Should they be storing a query of some description instead (or a formula or ???)
Any suggestions appreciated
User can embark on many quests.
One quest belongs to one user only (in this model).
One quest has many goals, one goal belongs to one quest only.
Each goal is one of possible goals.
A possible goal is an allowed combination of an action and an object of the action.
PossibleGoals table lists all allowed combinations of actions and objects.
Goals are ordered by StepNo within a quest.
Quantity defines how many objects an action should act upon (e.g. kill 5 MOBs).
Object is a super-type for all possible objects.
Location, MOBType, and Skill are object sub-types, each with different properties (columns).
I would create something like this.
For the Quest table:
| ID | Title | FirstStep (Foreign key to QuestStep table) | etc.
The QuestStep table
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep) | etc.
Of course, this is where the hard part starts: how do we describe the goals? I'd say create one record for the goal in the Goal table and save each of the fields of the goal (e.g. how many mobs of what type to kill, what location to visit, etc.) in a GoalFields table, thus:
Goal table:
| ID | Type (type is one from an Enum of goal types) |
The GoalFields Table
| ID | Goal (Foreign key to goal) | Field | Value |
I understand that this can be a bit vague, so here is an example of what data in the database could look like.
Quest table
| 0 | "Opening quest" | 0 | ...
| 1 | "Time for a Sword" | 2 | ...
QuestStep table
| 0 | "Go to the castle" | 0 | 1 | ...
| 1 | "Kill two fireflies" | 1 | NULL | ...
| 2 | "Get a sword" | 2 | NULL | ...
Goal table
| 0 | PlacesToFind |
| 1 | MobsToKill |
| 2 | ThingsToAcquire |
GoalFields table
| 0 | 0 | Place | "Castle" |
| 1 | 1 | Type | "firefly" |
| 2 | 1 | Amount | 2 |
| 3 | 2 | Type | "sword" |
| 4 | 2 | Amount | 1 |
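As a rough sketch of how that sample data could be read back, the snippet below loads the rows above and walks a quest from FirstStep through NextStep, collecting each goal's type and fields. The helper name quest_steps is hypothetical, all field values are stored as text, and in-memory SQLite (Python's sqlite3 module) is just a convenient stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Goal (ID INTEGER PRIMARY KEY, Type TEXT NOT NULL);
CREATE TABLE GoalFields (
    ID INTEGER PRIMARY KEY,
    Goal INTEGER NOT NULL REFERENCES Goal(ID),
    Field TEXT NOT NULL,
    Value TEXT
);
CREATE TABLE QuestStep (
    ID INTEGER PRIMARY KEY,
    Title TEXT,
    Goal INTEGER REFERENCES Goal(ID),
    NextStep INTEGER REFERENCES QuestStep(ID)
);
CREATE TABLE Quest (
    ID INTEGER PRIMARY KEY,
    Title TEXT,
    FirstStep INTEGER REFERENCES QuestStep(ID)
);
INSERT INTO Goal VALUES (0, 'PlacesToFind'), (1, 'MobsToKill'), (2, 'ThingsToAcquire');
INSERT INTO GoalFields VALUES
    (0, 0, 'Place', 'Castle'),
    (1, 1, 'Type', 'firefly'), (2, 1, 'Amount', '2'),
    (3, 2, 'Type', 'sword'),   (4, 2, 'Amount', '1');
INSERT INTO QuestStep VALUES
    (0, 'Go to the castle', 0, 1),
    (1, 'Kill two fireflies', 1, NULL),
    (2, 'Get a sword', 2, NULL);
INSERT INTO Quest VALUES (0, 'Opening quest', 0), (1, 'Time for a Sword', 2);
""")


def quest_steps(quest_id):
    """Follow the FirstStep/NextStep chain and return (title, goal type, fields)."""
    step_id = conn.execute(
        "SELECT FirstStep FROM Quest WHERE ID = ?", (quest_id,)
    ).fetchone()[0]
    steps = []
    while step_id is not None:
        title, goal_id, next_id = conn.execute(
            "SELECT Title, Goal, NextStep FROM QuestStep WHERE ID = ?", (step_id,)
        ).fetchone()
        goal_type = conn.execute(
            "SELECT Type FROM Goal WHERE ID = ?", (goal_id,)
        ).fetchone()[0]
        # Field/Value pairs become a plain dict; values stay as text.
        fields = dict(conn.execute(
            "SELECT Field, Value FROM GoalFields WHERE Goal = ?", (goal_id,)
        ).fetchall())
        steps.append((title, goal_type, fields))
        step_id = next_id
    return steps


opening = quest_steps(0)
```

The linked-list layout keeps steps easy to reorder, at the cost of one lookup per step when reading a quest. Since every Value is text, an amount such as 2 comes back as the string '2' and has to be converted by the game code.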

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by the persons needs. The people who only have one need are sorted correctly, but the people who have multiple are sorted exactly as the string passed to the database (i.e. housing, rent, food, water) --> which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
    user_id int,
    name varchar(100)
);
CREATE TABLE users_needs (
    need varchar(100),
    user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, and as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, users_needs.need
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
ORDER BY users_needs.need;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need) as number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
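For the report in the question (each need occurrence counted as 1), a grouped count over the fully normalised tables might look like the sketch below. It uses the sample rows above, with in-memory SQLite through Python's sqlite3 module standing in for the real database; the primary keys are my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE needs (need_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users_needs (
    need_id INTEGER REFERENCES needs(need_id),
    user_id INTEGER REFERENCES users(user_id),
    PRIMARY KEY (need_id, user_id)
);
INSERT INTO users VALUES (1, 'joe'), (2, 'peter'), (3, 'steve'), (4, 'clint');
INSERT INTO needs VALUES (1, 'housing'), (2, 'water'), (3, 'food'), (4, 'rent');
INSERT INTO users_needs VALUES
    (1, 1), (2, 1), (3, 1),
    (1, 2), (4, 2), (2, 2),
    (1, 3);
""")

# One output row per need, counting how many users selected it.
# LEFT JOIN keeps needs that nobody has selected (count 0).
need_counts = conn.execute("""
    SELECT needs.name, COUNT(users_needs.user_id) AS people
    FROM needs
    LEFT JOIN users_needs ON users_needs.need_id = needs.need_id
    GROUP BY needs.need_id, needs.name
    ORDER BY people DESC, needs.name
""").fetchall()

for name, people in need_counts:
    print(name, people)
```

Each user/need pair is its own row, so no comma-separated strings ever reach the report.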
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
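For anyone who wants to sanity-check that arithmetic outside the database, here is the same calculation as a small Python sketch (the function name count_needs is made up): the number of items is the number of commas plus one.

```python
def count_needs(needs):
    """Count comma-delimited items: string length minus length
    without commas gives the comma count, plus one for the last item."""
    return len(needs) - len(needs.replace(",", "")) + 1


four = count_needs("housing,rent,food,water")
one = count_needs("housing")
# Caveat: an empty field has no commas, so the formula still reports 1.
empty = count_needs("")
```

The empty-string case is worth handling separately if the Needs field can be blank.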
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .