Stuck on SQL database normalisation

I'm creating a database of motorcycles and was wondering what the best way to go about setting it out is.
I would like to normalise the data as well as possible to save any headaches further down the line.
I anticipate having the following tables so far:
Manufacturer
ID, name, country, image
Model
ID, name, manufacturerID, engine_size, power, torque, description, weight + various other specifications
I'll also want to separate models by type, so should I have another table with the details below, or should I just include this in the Model table?
Type
ID, name (values such as Sports, Supersports, Touring, Cruiser, Off-road)
Similar to the type, I want to categorise motorcycles by licence type. Again, should I create a separate Licence table, or just have it as a string in the Model table?
I'll need front-end users to be able to search the database by type, licence, manufacturer and model.
I'll need them to be able to sort by things like weight, power, etc.
Is there a best practice approach to this?

Yes, you should create another table for Type and add a reference to it in the Model table.
Yes, you should add a Licence table as well.
To sort by weight or power, you can do that straight from your Model table.
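A minimal sketch of that layout (column types and the licence values are illustrative, not from the question):

-- Lookup tables: one row per manufacturer / type / licence category.
CREATE TABLE Manufacturer (
    id      INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    country VARCHAR(100),
    image   VARCHAR(255)
);

CREATE TABLE Type (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL   -- e.g. 'Sports', 'Touring', 'Cruiser'
);

CREATE TABLE Licence (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL   -- hypothetical licence classes, e.g. 'A1', 'A2'
);

CREATE TABLE Model (
    id             INT PRIMARY KEY,
    name           VARCHAR(100) NOT NULL,
    manufacturerID INT NOT NULL REFERENCES Manufacturer(id),
    typeID         INT NOT NULL REFERENCES Type(id),
    licenceID      INT NOT NULL REFERENCES Licence(id),
    engine_size    INT,           -- cc
    power          DECIMAL(6,1),
    torque         DECIMAL(6,1),
    weight         DECIMAL(6,1)   -- kg
);

-- Search by type/licence/manufacturer, sort by weight:
SELECT mo.name, ma.name AS manufacturer, mo.weight, mo.power
FROM Model mo
JOIN Manufacturer ma ON ma.id = mo.manufacturerID
JOIN Type t          ON t.id  = mo.typeID
JOIN Licence l       ON l.id  = mo.licenceID
WHERE t.name = 'Touring' AND l.name = 'A2'
ORDER BY mo.weight;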

Related

Trying to make my database more dynamic

I am trying to figure out what the best way to design this database would be. Currently what I have works, but it requires me to hard-code values where I would like it to be dynamic in the future.
Here is my current database design:
As you can see, for both the Qualities and the PressSettingsSet tables, there are many columns that are hard-coded such as BlownInsert, Blowout, Temperature1, Temperature2, etc.
What I am trying to accomplish is to have these be dynamic. Each job will have these same settings, but I would like to allow the users to define these settings. Would it be best to create a table with just a name field and have a one-to-one relationship to another table with a value for the field and a relation to the Job Number?
I hope this makes sense, any help is appreciated. I can make a database diagram of how I think it should work if that is more helpful to what I am trying to convey. I think that what I have in mind will work, but it just seems like it will be creating a lot of extra rows in the database, so I wanted to see if there is possibly a better way.
Would it be best to create a table with just a name field and have a one-to-one relationship to another table with a value for the field and a relation to the Job Number?
That would be the simplest - you could expand it by adding date-effective fields, or de-normalize it by putting everything in one table (with just a name and value field).
Are the values for the settings different per job? If so then yes, a "key" table with the setting names and a one-to-many relationship to a per-job value table would be best.
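A minimal sketch of that key/value arrangement, assuming the settings are numeric; the table names are hypothetical:

-- One row per user-defined setting name.
CREATE TABLE SettingDefinition (
    setting_id INT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL UNIQUE   -- e.g. 'BlownInsert', 'Temperature1'
);

-- One row per (job, setting) pair: the one-to-many value side.
CREATE TABLE JobSettingValue (
    job_number INT NOT NULL,
    setting_id INT NOT NULL REFERENCES SettingDefinition(setting_id),
    value      DECIMAL(10,2),
    PRIMARY KEY (job_number, setting_id)
);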

Database design advice needed on custom fields

I have a table that stores general information about a customer (name, address, etc.) that is common to all customers. I have a field called CustomerType (a list of types) that drives what other fields I need to capture. So if they are a government customer they will see a different set of custom fields than a non-profit customer would see. I need to create forms that each different CustomerType will fill out. On the SQL side, I need to figure out the best way to store the data so that when I do reporting it is simple. I don't know the best way to attack this problem.
On the SQL side, I need to figure out the best way to store the data so that when I do reporting it is simple. I don't know the best way to attack this problem.
There are many possible approaches, each with different strengths and weaknesses; here are some to think about:
Create separate customer detail tables for each of the customer types, each containing the fields specific to that customer type and keyed on the customer Id. The customer type does not have to be an attribute of the detail table, only of the parent Customer table.
(+) The correctly normalized solution (although you may find awkward situations where attributes are common to a subset of customer types). The tables will be fairly easy to maintain.
(-) Reports harder to write - you may find yourself using a LOT of unions or outer joins. Development against this schema is more complex, the extra logic to insert/update attributes in the correct tables for particular customer types must be encoded somewhere. This might become unmanageable if you have many customer types, or if you're adding/changing them frequently.
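A minimal sketch of this first approach, with hypothetical detail tables for two customer types:

CREATE TABLE Customer (
    customerId   INT PRIMARY KEY,
    name         VARCHAR(100) NOT NULL,
    customerType CHAR(1) NOT NULL          -- e.g. 'G' = government, 'N' = non-profit
);

-- One detail table per customer type, keyed on the customer Id.
CREATE TABLE GovernmentCustomerDetail (
    customerId INT PRIMARY KEY REFERENCES Customer(customerId),
    agencyCode VARCHAR(20),                -- illustrative fields only
    contractNo VARCHAR(20)
);

CREATE TABLE NonProfitCustomerDetail (
    customerId    INT PRIMARY KEY REFERENCES Customer(customerId),
    charityNumber VARCHAR(20)
);

-- Reporting across all types needs outer joins:
SELECT c.customerId, c.name, g.agencyCode, n.charityNumber
FROM Customer c
LEFT JOIN GovernmentCustomerDetail g ON g.customerId = c.customerId
LEFT JOIN NonProfitCustomerDetail  n ON n.customerId = c.customerId;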
Expand the customer table to contain the super-set of columns required by all customer types, keyed on the customer Id.
(+) Simple, very easy to report on, simple programming logic.
(-) The customer-type specific fields are only partially dependent on the key of the customer table (customer Id) - they are really dependent on the combination of customerId/customerType.
If there are many extra fields, and few fields are common between customer types, this denormalization may result in a very wide table with an unmanageable number of columns. It could be a maintenance nightmare - the table must be modified every time a new customer type is added or changed.
You might find this a good solution if the number of unique fields required by each customer type is small and they don't change often and if ease of programming and reporting is an overriding concern.
Store the customer specific values as name/value pairs in a generic customer Details table, keyed on customerId/customerType/key.
(+) Very simple to maintain - No data model changes are required to add a new customer type.
(-) Non-relational; makes pure SQL reporting near impossible and makes integrity constraints very difficult to add. You might see this in specialized use cases, e.g. where the data will only ever be consumed as JSON and direct reporting will never be a requirement, or in some corporate environments where database changes are very hard to push through, which can make it appealing.
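A minimal sketch of this name/value layout (column names and types are illustrative):

-- Generic details table: one row per customer attribute.
CREATE TABLE CustomerDetails (
    customerId   INT NOT NULL,
    customerType CHAR(1) NOT NULL,
    attrKey      VARCHAR(50) NOT NULL,   -- e.g. 'agencyCode'
    attrValue    VARCHAR(255),           -- everything stored as text
    PRIMARY KEY (customerId, customerType, attrKey)
);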
First of all, have a look at some good tutorials on database design and object-relational modelling (ORM), such as A beginner's guide to SQL database design.
My personal suggestion for your design would be to create one table to store all customers, together with some kind of unique customer id and the CustomerType. Next, create a separate table for each of the CustomerTypes; for each customer that belongs to that type, store that customer's unique id in a column together with the customer-type-specific fields.

How to make dynamic database structure for Commodities in SQL?

I want to create a database for online shops; my database has "Commodity" and types of "Commodities". Because I want to make it dynamic, I created 2 tables, Commodity and CommodityType.
For example, "Mobile" category has many details like:
cpu
ram
internal storage
battery
etc.
and an external "Hard Drive" category has details like:
capacity
waterfall
armor
hasAdapter
etc.
I want the admin of my website to be able to add a new category that may have new properties, and then add related "Commodity" items to it.
My problem is how to design my database and tables?
I think that when the admin adds a new category, the system must make a new table with the properties that the admin defines for this category.
Is that how to do it, or can you suggest a better way?
You're probably looking for an Entity-Attribute-Value (EAV) model.
I see two approaches:
Define a specialized table with the custom fields when the new product type is created.
Have a table CustomProperties, where you list the properties; a table TypeProperties, where you save which type has which property; and finally a table PropertyValues, where you store the values, one value per row (sketched below).
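A minimal sketch of the second approach, using the table names from the answer (column types are assumptions):

-- Catalogue of every property an admin can define.
CREATE TABLE CustomProperties (
    property_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL    -- e.g. 'cpu', 'ram', 'capacity'
);

-- Which commodity type has which property.
CREATE TABLE TypeProperties (
    commodity_type_id INT NOT NULL,
    property_id       INT NOT NULL REFERENCES CustomProperties(property_id),
    PRIMARY KEY (commodity_type_id, property_id)
);

-- The actual data: one row per (commodity, property) value.
CREATE TABLE PropertyValues (
    commodity_id INT NOT NULL,
    property_id  INT NOT NULL REFERENCES CustomProperties(property_id),
    value        VARCHAR(255),
    PRIMARY KEY (commodity_id, property_id)
);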

SQL: Best practice to store various fields in one table

I want to design a table for items.
There are many types of items, and all share several fields.
Each type of item has its own fields.
I want to store the uncommon fields in a separate table.
I thought of something like:
----Items
+Item_id
+Item_Type_Id
+Item_Serial
...
----Item_types
+Item_Type_Id
+Item_Name
...
----Item_Fields
+Item_Field_Id
+Item_Type_Id
+Field_Name
...
----Field_Values
+Field_Value_Id
+Item_Field_Id
+Item_Id
+Value
...
The pro is having the ability to add fields and values without changing the tables.
The con is that I have to transpose the field names and values in order to see all the info for an item.
Any better suggestions? Or perhaps a simple (not stored procedure) way to join the tables to get a flat info?
I tried to use PIVOT (I'm using SQL 2005) but with no luck.
Thanks.
I wrote a stored proc to make PIVOT more useful. Here is the source:
http://dot-dash-dot.com/files/pivot_query.sql
and some examples how to use it:
http://dot-dash-dot.com/files/pivot_query_examples.sql
For your data, the query would just join those tables above to produce a raw listing of:
declare @myQuery varchar(max);
set @myQuery = '
Select Item_Id, Item_Name, Field_Name, Value From ...
';
Then your call to pivot_query would be:
exec pivot_query @myQuery, 'Item_Id, Item_Name', 'Field_Name', 'max(Value)'
like that.
One other option is to store items in XML format in one single field. Depending on your usage scenario, it may work well. Or it may not.
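For instance, on SQL Server 2005 (which the question mentions), the single-XML-column option could be sketched like this - the table and element names are illustrative:

-- Keep the variable fields in a single xml column.
CREATE TABLE Items_Xml (
    Item_id      INT PRIMARY KEY,
    Item_Type_Id INT NOT NULL,
    Extra_Fields XML            -- e.g. '<fields><pages>320</pages></fields>'
);

-- Pull a single field back out with the xml value() method:
SELECT Item_id,
       Extra_Fields.value('(/fields/pages)[1]', 'int') AS pages
FROM Items_Xml;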
I believe there has to be some grouping of values.
For example, let's say your items are objects in a room. Then different types of objects have different attributes. For example, books have a publication date and number of pages, chairs have a color pattern and height, etc.
In this example, you make an item table, a book table and a chair table.
You could make an "additional values" table that holds generic information as above, but what you really want to do is figure out the "types" of the different groups of attributes and then make each of those types its own table.
Is there a set of values that all items have? There has to be at least one, which is a type field (this describes where the other information is stored). I expect every item will also have a name and a description. This is the information to go in the item table.
Then you make additional tables for the different types: itembook, itemchair, etc. There may even be some overlap. For example, itembook, itemhardback and itempaperback could be 3 tables used to describe books.
I believe this is the best solution to your problem. It will still allow you to extend, but it does put a framework around your data.
Of course there are systems that do it the way you describe, but unless you are building a tool that others are going to reuse for many different projects, it makes sense to design the system for the task at hand. You end up falling into the over-designing trap otherwise. (IMHO)
On the other hand, if you are going to go the totally generic direction, I suggest you use one of the systems that already exist and work this way (Entity Framework, app frameworks, etc.). Use someone else's; don't start from scratch.
I'm not too sure how you want to retrieve the info, but something like the below may work. (It's probably close to what Hogan mentioned.)
If you want to retrieve data for a type, you can just JOIN two tables.
If you want to retrieve data for all types (with all fields), you can LEFT JOIN all tables.
----Items
+Item_id
+Item_Type_Id
+Item_Common_Field1
+Item_Common_Field2
...
----Item_Type_A
+Item_id
+Item_Type_A_Specific_Field1
+Item_Type_A_Specific_Field2
...
----Item_Type_B
+Item_id
+Item_Type_B_Specific_Field1
...
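Using that layout, the two retrievals mentioned above would look roughly like this (field names taken from the sketch):

-- Data for one type: a plain two-table join.
SELECT i.Item_id, i.Item_Common_Field1, a.Item_Type_A_Specific_Field1
FROM Items i
JOIN Item_Type_A a ON a.Item_id = i.Item_id;

-- Data for all types, with all fields: LEFT JOIN every type table.
SELECT i.Item_id, i.Item_Common_Field1,
       a.Item_Type_A_Specific_Field1,
       b.Item_Type_B_Specific_Field1
FROM Items i
LEFT JOIN Item_Type_A a ON a.Item_id = i.Item_id
LEFT JOIN Item_Type_B b ON b.Item_id = i.Item_id;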
If you add these columns to the table, you can make them sparse columns to avoid the space taken by unspecified uncommon fields.
But I would not call this a best practice. (see comments under your question)
I don't want to be accused of being the always-uses-the-latest-useless-technology guy, but depending on your use case this might be a good case for a NoSQL database - Tokyo, Mongo, SimpleDB, etc. Or, as Developer Art suggested, you could just serialize the different fields into a single column. It's not the worst thing in the world.

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is only looking for solutions in SQL databases.
I simplified the problem a bit, since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc. on this data, so it is required that this data be stored in columns of the appropriate type.
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA views that let you inspect table schemas at runtime, and dynamic DDL lets you create and modify them. This gives you full type checking, constraints, transactions, calculations, and everything you need already built in - don't reinvent it.
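A hypothetical sketch against the exams table from the question - adding a user-defined column with dynamic DDL, then discovering the current columns:

-- Add a custom measurement column at runtime (the column name is illustrative):
ALTER TABLE exams ADD heart_rate INT NULL;

-- Enumerate the columns that currently exist, e.g. to drive the UI:
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'exams';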
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen your friend's idea used in a commercial accounting package. The table was split into two: the first contained fields solely defined by the system; the second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1, etc. The tables were linked by identity value (when a record is inserted into the main table, a record with the same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store data of an unknown type in a database field, I usually store it as a string, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably come from the UI as a string, and casting to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns: a 'table' string column and a 'column' string column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
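A minimal sketch of that Drupal-style layout, with hypothetical names - a schema table listing the fields, plus one narrow table per field:

-- One row per custom field, naming the table/column that stores it.
CREATE TABLE field_schema (
    field_table  VARCHAR(64) NOT NULL,
    field_column VARCHAR(64) NOT NULL,
    PRIMARY KEY (field_table, field_column)
);

-- One narrow table per field: just an id and the data.
CREATE TABLE field_blood_pressure (
    exam_id INT PRIMARY KEY,
    value   VARCHAR(20) NOT NULL
);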
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of setup and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
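A sketch of that arrangement, using the names from the answer (column types are guesses):

CREATE TABLE custom_exam_schema (
    name     VARCHAR(50) PRIMARY KEY,
    type     VARCHAR(20) NOT NULL,
    minValue FLOAT,
    maxValue FLOAT
);

CREATE TABLE custom_exam_data (
    exam_id INT PRIMARY KEY
    -- one column per custom attribute gets added below
);

-- Registering the fingerCount field from the example...
INSERT INTO custom_exam_schema (name, type, minValue, maxValue)
VALUES ('fingerCount', 'number', 0, 10);

-- ...and adding the matching column to the data table at run time:
ALTER TABLE custom_exam_data ADD fingerCount INT NULL;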
Let's say that your friend's database has to store data values from multiple sources, such as demographic values, diagnoses, interventions, physiognomic values, physiologic exam values, hospitalisation values, etc.
He might also have to define choices. Let's say his database is missing race and the unit staff need the race of the patient (some diseases are more likely in certain populations); they might want to use a drop-down with several choices.
I would propose using another table to hold these choices - essentially a "Custom_field_choices" table, which at some point is exactly the same thing under a different name.
Considering that the database:
- needs to be flexible
- that data from multiple tables can be added and customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purposes
- that data MUST have limits, alarms and warnings
- that data must have units (10 kg or 10 pounds)
- that data can have a selection of choices
- that data can come with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed for cross-reference analysis within the system without modifying the code
the custom table would be my solution; modifying each table would end up being too risky.
I would store those custom fields in a table where each record (dataType, dataValue, dataUnit) occupies one row. So there would be a one-to-many relation from one sample to the data. You can also create a table to record all the kinds of custom types you use. For example:
create table DataType
(
  id int primary key,
  name varchar(100) not null unique,
  description text,
  uri varchar(255)  -- can be used for an ONTOLOGY
);

create table DataRecord
(
  id int primary key,
  sample_id int not null,                            -- reference to the sample
  dataType_id int not null references DataType(id),  -- references DataType
  value varchar(100),                                -- the value as a string
  unit varchar(50)                                   -- g, mg/ml, etc... but it could also be a link to a table describing the units, just like DataType
);