Basic question: how to properly redesign this schema - sql

I am joining a project that sits on top of a SQL Server 2008 database with what seems to me like an inefficient schema. However, I'm no SQL expert, so I am seeking guidance.
In general, the schema has tables like this:
ID | A | B
ID is a unique identifier
A contains text, such as animal names. There's very little variety; maybe 3-4 different values in thousands of rows. This could vary with time, but still a small set.
B is one of two options, but stored as text. The set is finite.
My questions are as follows:
Should I create another table for names contained in A, with an ID and a value, and set the ID as the primary key? Or should I just put an index on that column in my table? Right now, to get a list of A's, it does "select distinct(a) from table" which seems inefficient to me.
The table has a multitude of columns for properties of A. It could be like: Color, Age, Weight, etc. I would think that this is better suited in a separate table with: ID, AnimalID, Property, Value. Each property is unique to the animal, so I'm not sure how this schema could enforce this (the current schema implies this as it's a column, so you can only have one value for each property).
Right now the DB is easily readable by a human, but its size is growing fast and I feel like the design is inefficient. There is currently no index anywhere. As I said, I'm not a pro, but I will read more on the subject. The goal is to have a fast system. Thanks for your advice!

This sounds like a database that might represent a veterinary clinic.
If the table you describe represents the various patients (animals) that come to the clinic, then the properties specific to them probably belong on the primary table. But since, as you say, column "A" contains a species name, it might be worthwhile to link it to a secondary table to avoid storing those names redundantly:
For example:
Patients
--------
ID  Name  SpeciesID  Color        DOB         Weight
1   Spot  1          Black/White  2008-01-01  20
Species
-------
ID  Species
1   Cocker Spaniel
If your main table should be instead grouped by customer or owner, then you may want to add an Animals table and link it:
Customers
---------
ID  Name
1   John Q. Sample
Animals
-------
ID  CustomerID  SpeciesID  Name  Color        DOB         Weight
1   1           1          Spot  Black/White  2008-01-01  20
...
As for your original column B, consider converting it to a boolean (BIT) if you only need to store two states. Barring that, consider CHAR to store a fixed number of characters.
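To make the lookup-table idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for SQL Server, so the types differ (SQLite has no BIT, hence the 0/1 CHECK). The `IsActive` column and the index name are assumptions made up for illustration; the original post never says what column B holds.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table names follow the example above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Species (
    ID      INTEGER PRIMARY KEY,
    Species TEXT NOT NULL UNIQUE
);
CREATE TABLE Patients (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    SpeciesID INTEGER NOT NULL REFERENCES Species(ID),
    Color     TEXT,
    DOB       TEXT,
    Weight    REAL,
    -- Hypothetical two-state column replacing the text column B.
    IsActive  INTEGER NOT NULL CHECK (IsActive IN (0, 1))
);
CREATE INDEX IX_Patients_SpeciesID ON Patients(SpeciesID);
""")
conn.execute("INSERT INTO Species (ID, Species) VALUES (1, 'Cocker Spaniel')")
conn.execute(
    "INSERT INTO Patients VALUES (1, 'Spot', 1, 'Black/White', '2008-01-01', 20, 1)"
)

# Listing the possible values reads the small lookup table,
# not SELECT DISTINCT over the large one.
species_names = [
    row[0] for row in conn.execute("SELECT Species FROM Species ORDER BY Species")
]

# The human-readable name comes back with a single join.
patient_rows = conn.execute("""
    SELECT p.Name, s.Species
    FROM Patients AS p
    JOIN Species AS s ON s.ID = p.SpeciesID
""").fetchall()
```

The point of the sketch is only that the list of species becomes a scan of a handful of rows, and the foreign key keeps the main table from drifting into misspelled variants.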

Like most things, it depends.
By having the animal names directly in the table, it makes your reporting queries more efficient by removing the need for many joins.
Going with something like 3rd normal form (having an ID/Name table for the animals) makes your database smaller, but requires more joins for reporting.
Either way, make sure to add some indexes.
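On the question of whether an ID, AnimalID, Property, Value table can still guarantee one value per property per animal: a composite UNIQUE constraint is one way to get that. A rough sketch follows, again using sqlite3 as a stand-in for SQL Server. The table name `AnimalProperty` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE AnimalProperty (
    ID       INTEGER PRIMARY KEY,
    AnimalID INTEGER NOT NULL,
    Property TEXT    NOT NULL,
    Value    TEXT,
    -- At most one row per (animal, property) pair.
    UNIQUE (AnimalID, Property)
)""")
conn.execute(
    "INSERT INTO AnimalProperty (AnimalID, Property, Value) VALUES (1, 'Color', 'Black')"
)

# A second 'Color' row for the same animal violates the constraint.
try:
    conn.execute(
        "INSERT INTO AnimalProperty (AnimalID, Property, Value) VALUES (1, 'Color', 'White')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM AnimalProperty").fetchone()[0]
```

In most engines the unique constraint is backed by an index, so it also speeds up lookups by animal and property.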

Related

SQL query: have results into a table named the results name

I have a very large database I would like to split up into tables. I would like to make it so when I run a distinct, it will make a table for every distinct name. The name of the table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, 1 named A and another named B. Then the entire row of data would be copied into that table.
select distinct [name] from [maintable]
-make table for each name
-select [name] from [maintable]
-copy into table name
-drop row from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective, because each index value would select a large share of the records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done once per table instead of a single time.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
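As a rough illustration of the single-table alternative, the sketch below keeps everything in `maintable` with an index on `name`. It uses Python's sqlite3 purely as a stand-in, since the question does not name a database. The `id` and `data` columns and the helper `rows_for` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintable (
    id   INTEGER PRIMARY KEY,   -- one primary key for all names
    name TEXT NOT NULL,
    data TEXT
);
CREATE INDEX ix_maintable_name ON maintable(name);
INSERT INTO maintable (name, data) VALUES
    ('A', 'Data 1'), ('A', 'Data 2'), ('B', 'Data 3'), ('B', 'Data 4');
""")

def rows_for(name):
    """Return the data for one name; no dynamic SQL or table lookup needed."""
    cur = conn.execute(
        "SELECT data FROM maintable WHERE name = ? ORDER BY id", (name,)
    )
    return [row[0] for row in cur]

rows_a = rows_for("A")

# A query across every name stays a single statement.
counts = dict(conn.execute("SELECT name, COUNT(*) FROM maintable GROUP BY name"))
```

With one table per name, the last query would need a UNION over every table, rebuilt each time a new name appears.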
What you want is
CREATE TABLE new_table
AS (SELECT ....); -- the data that you want in this table

Architecture of SQL tables

I am wondering whether it is more useful and practical (in terms of DB size) to create multiple tables in SQL with two columns each (one containing a foreign key and one containing the data), or to merge them into one table with multiple columns. I am asking because in my scenario a product, identified by its primary key, could have applicable data for only one column while the other columns would be empty.
example a. one table
productID  productname  weight  no_of_pages
1          book         130     500
2          watch        50      null
3          ring         null    null
example b. three tables
productID  productname
1          book
2          watch
3          ring
productID  weight
1          130
2          50
productID  no_of_pages
1          500
The multi-table approach is more "normal" (in database terms) because it avoids columns that commonly store NULLs. It's also something of a pain in programming terms because you have to JOIN a bunch of tables to get your original entity back.
I suggest adopting a middle way. Weight seems to be a property of most products, if not all (indeed, a ring has a weight even if small, and you'll probably want to know it for shipping purposes), so I'd leave that in the Products table. But number of pages applies only to a book, as do a slew of other unmentioned properties (author, ISBN, etc.). In this example, I'd use a Products table and a Books table. The Books table would extend the Products table in a fashion similar to class inheritance in object-oriented programming.
All book-specific properties go into the Books table, and you join only Products and Books to get a complete description of a book.
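A minimal sketch of that Products/Books split, using Python's sqlite3 as a neutral stand-in since the question does not name a database. Column names are taken from the question's example; the exact types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (
    productID   INTEGER PRIMARY KEY,
    productname TEXT NOT NULL,
    weight      INTEGER              -- shared by (almost) all products
);
-- Books "extends" Products: same key, book-only columns.
CREATE TABLE Books (
    productID   INTEGER PRIMARY KEY REFERENCES Products(productID),
    no_of_pages INTEGER NOT NULL
);
INSERT INTO Products VALUES (1, 'book', 130), (2, 'watch', 50), (3, 'ring', NULL);
INSERT INTO Books VALUES (1, 500);
""")

# A complete description of a book needs just one join.
books = conn.execute("""
    SELECT p.productID, p.productname, p.weight, b.no_of_pages
    FROM Products AS p
    JOIN Books AS b ON b.productID = p.productID
""").fetchall()
```

Because `Books.productID` is both primary key and foreign key, each product can have at most one book row, which mirrors the one-to-one relationship between a subclass and its parent.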
I think this all depends on how the tables will be used. Maybe your examples are oversimplifying things too much but it seems to me that the first option should be good enough.
You'd really use the second example if you're going to be doing extremely CPU intensive stuff with the first table and will only need the second and third tables when more information about a product is needed.
If you're going to need the information in the second and third tables most times you query the table, then there's no reason to join over every time and you should just keep it in one table.
I would suggest example a if there is a defined set of attributes for a product, and example c if you need a variable number of attributes (new attributes keep coming every now and then):
example c
productID  productName
1          book
2          watch
3          ring
attrID  productID  attrType     attrValue
1       1          weight       130
2       1          no_of_pages  500
3       2          weight       50
The table structure you have shown in example b is not normalized - there will be separate id columns required in second and third tables, since productId will be an fk and not a pk.
It depends on how many rows you are expecting in your PRODUCTS table. I would say that it would not make sense to normalize your tables to 3NF in this case, because product name, weight, and no_of_pages each describe the product. If you had repeating data such as manufacturers, it would make more sense to normalize your tables at that point.
Without knowing the background (data model), there is no way to tell which variant is more "correct". Both are fine in certain scenarios.
You want three tables, full stop. That's best because there's no chance of watches winding up with pages (no pun intended) and some books without. If you normalize, the server works for you. If you don't, you do the work instead, just not as well. Up to you.
I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
That's always true of nullable columns. Here's the rule: a nullable column has an optional relationship to the key. A nullable column can always be, and usually should be, in a separate table where it can be non-null.
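For what it's worth, here is how the three-table layout (example b) could look with each optional column in its own table, and how the wide view from example a is rebuilt with LEFT JOINs. This is a sketch in Python's sqlite3. The table names `product`, `product_weight` and `product_pages` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (
    productID   INTEGER PRIMARY KEY,
    productname TEXT NOT NULL
);
-- Each optional fact lives in its own table, where it can be NOT NULL.
CREATE TABLE product_weight (
    productID INTEGER PRIMARY KEY REFERENCES product(productID),
    weight    INTEGER NOT NULL
);
CREATE TABLE product_pages (
    productID   INTEGER PRIMARY KEY REFERENCES product(productID),
    no_of_pages INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'book'), (2, 'watch'), (3, 'ring');
INSERT INTO product_weight VALUES (1, 130), (2, 50);
INSERT INTO product_pages VALUES (1, 500);
""")

# LEFT JOINs rebuild the one-table view; missing facts show up as NULL (None).
wide_rows = conn.execute("""
    SELECT p.productID, p.productname, w.weight, g.no_of_pages
    FROM product AS p
    LEFT JOIN product_weight AS w ON w.productID = p.productID
    LEFT JOIN product_pages  AS g ON g.productID = p.productID
    ORDER BY p.productID
""").fetchall()
```

The NULLs still appear in the query result, but they are no longer stored, and a row in `product_pages` cannot exist without a page count.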

storing multiple formats in a table

So here's the basic problem: I'd like to be able to store various fields in a database. These can be short textfields (maybe 150 characters max, probably more like 50 in general) and long textfields (something that can store a whole page full of text). Ideally more types can be added later on.
These fields are grouped by common field_group ids, and their type shouldn't really have anything to do with categorization.
So what's the best way to represent this in MySQL? One table with a short_text and long_text columns of differing types, one of which is to be NULL? Or is there a more elegant solution?
(I'd like this to be primarily driven by ease to select all fields with a given field_group_id.)
Clarification
I'm essentially attempting to allow users to create their own tables, but without actually creating tables.
So you'd have a 'Book' field group, which would have the fields 'Name' (short text), 'Summary' (long text). Then you would be able to create entries into that book. I realize that this is essentially the whole point of MySQL, but I need to have a LOT of these and don't want users creating whole tables in my database.
What you are looking for is called an EAV (entity-attribute-value) model. With an EAV model you can build any freaking database in the world with only inserts. It's really horrible for a lot of reasons, but your case sounds so looney-tunes that it could work.
Build an Entity table
In here you'd list
Car
Person
Plant
Build an Attribute Table.
Here you'd list the PK from Entity and the list of attributes.
I'll use the word instead of a number PK.
Car | Engine Cylinders
Car | Doors
Car | Make
Person | First Name
Person | Last Name
Then in a third table you'd list the actual values for each one. Again I'm using the words here, but you'd have numeric keys.
Car | Engine Cylinders | 4
Car | Doors | 4
Car | Make | Honda
Person | First Name | Stephanie
Person | Last Name | Page
If you want to get tricky, instead of one column for the value you could have 4 columns:
a number
a varchar
a date
a clob
then in the Attribute table you could add a column that says which column to put the data.
If you plan on this database being multitenant, you'll need to add an OWNER table as the parent of the entity table, so you and I could both have a Car entity.
But this SUCKS to query, SUCKS to index, SUCKS to use for anything else but a toy app.
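To give a feel for both the flexibility and the query pain, here is a small EAV sketch using Python's sqlite3 as a stand-in for MySQL. The `record_id` column, which groups values into one logical "row", and all table and column names are assumptions added for the example. Values are stored as text in a single column rather than the four typed columns mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    name      TEXT NOT NULL,
    UNIQUE (entity_id, name)
);
CREATE TABLE attribute_value (
    record_id    INTEGER NOT NULL,  -- hypothetical: one logical "row" of an entity
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    val          TEXT,
    PRIMARY KEY (record_id, attribute_id)
);
INSERT INTO entity VALUES (1, 'Car'), (2, 'Person');
INSERT INTO attribute VALUES
    (1, 1, 'Engine Cylinders'), (2, 1, 'Doors'), (3, 1, 'Make'),
    (4, 2, 'First Name'), (5, 2, 'Last Name');
INSERT INTO attribute_value VALUES
    (1, 1, '4'), (1, 2, '4'), (1, 3, 'Honda'),
    (2, 4, 'Stephanie'), (2, 5, 'Page');
""")

# Turning the values back into columns takes one CASE per attribute.
cars = conn.execute("""
    SELECT v.record_id,
           MAX(CASE WHEN a.name = 'Make'  THEN v.val END) AS make,
           MAX(CASE WHEN a.name = 'Doors' THEN v.val END) AS doors
    FROM attribute_value AS v
    JOIN attribute AS a ON a.id = v.attribute_id
    JOIN entity    AS e ON e.id = a.entity_id
    WHERE e.name = 'Car'
    GROUP BY v.record_id
""").fetchall()
```

Note that `doors` comes back as the string '4', not a number. Adding a new "table" or "column" is just an insert, but every read pays for it with a pivot like the one above.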
I don't know exactly what you mean by "field group", but if the information (short text, long text) all belongs to a certain entry, you can create a single table and include all those columns.
Say you have a bunch of books with a title and a summary:
table: `books`
- id, int(11) // unique for each book
- title, varchar(255)
- writer, varchar(50)
- summary, text
- etc
Fields that don't necessarily need to be set can be set to NULL by default.
To retrieve the information, simply select all the fields:
SELECT * FROM books WHERE id = 1
Or some of the fields:
SELECT title, writer FROM books ORDER BY title ASC

what is the best database design for this table when you have two types of records

i am tracking exercises. i have a workout table with
id
exercise_id (foreign key into exercise table)
now, some exercises like weight training would have the fields:
weight, reps (i just lifted 10 times @ 100 lbs.)
and other exercises like running would have the fields: time, distance (i just ran 5 miles and it took 1 hour)
should i store these all in the same table and just have some records have 2 fields filled in and the other fields blank or should this be broken down into multiple tables.
at the end of the day, i want to query for all exercises in a day (which will include both types of exercises) so i will have to have some "switch" somewhere to differentiate the different types of exercises
what is the best database design for this situation
There are a few different patterns for modelling object oriented inheritance in database tables. The most simple being Single table inheritance, which will probably work great in this case.
Implementing it is mostly according to your own suggestion to have some fields filled in and the others blank.
One way to do it is to have an "exercise" table with a "type" field that names another table where the exercise-specific details are, and a foreign key into that table.
if you plan on keeping it only 2 types, just have exercise_id, value1, value2, type
you can filter the type of exercise in the where clause and alias the column names in the same statement so that the results don't say value1 and value2, but weight and reps or time and distance
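A rough sketch of the single table inheritance suggestion above, with a type column acting as the "switch" and a CHECK constraint keeping the two kinds of rows from mixing. It is written against Python's sqlite3. The column names (`workout_date`, `exercise_type`, `weight_lbs`, `minutes`, `distance_miles`) and the sample date are assumptions, and the type column is simplified from the question's `exercise_id` foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE workout (
    id             INTEGER PRIMARY KEY,
    workout_date   TEXT NOT NULL,
    exercise_type  TEXT NOT NULL,     -- the "switch" between record types
    weight_lbs     REAL,
    reps           INTEGER,
    minutes        REAL,
    distance_miles REAL,
    -- Each type must fill exactly its own pair of columns.
    CHECK (
        (exercise_type = 'weights'
            AND weight_lbs IS NOT NULL AND reps IS NOT NULL
            AND minutes IS NULL AND distance_miles IS NULL)
        OR
        (exercise_type = 'running'
            AND minutes IS NOT NULL AND distance_miles IS NOT NULL
            AND weight_lbs IS NULL AND reps IS NULL)
    )
)""")

insert_sql = """
    INSERT INTO workout
        (workout_date, exercise_type, weight_lbs, reps, minutes, distance_miles)
    VALUES (?, ?, ?, ?, ?, ?)
"""
conn.executemany(insert_sql, [
    ("2024-01-01", "weights", 100, 10, None, None),
    ("2024-01-01", "running", None, None, 60, 5),
])

# A running row carrying weight/reps values is rejected by the CHECK.
try:
    conn.execute(insert_sql, ("2024-01-02", "running", 100, 10, None, None))
    mixed_row_rejected = False
except sqlite3.IntegrityError:
    mixed_row_rejected = True

# Everything done on one day, both types, in a single query.
day_rows = conn.execute("""
    SELECT exercise_type, weight_lbs, reps, minutes, distance_miles
    FROM workout
    WHERE workout_date = ?
    ORDER BY id
""", ("2024-01-01",)).fetchall()
```

This stays manageable with two types; with many exercise types the CHECK grows quickly, and a table per type may become the better trade.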

How to display multiple values in a MySQL database?

I was wondering how you can store multiple values in a database. For example, let's say a user fills out a form that asks them to type in what types of foods they like: cookies, candy, apples, bread and so on.
How can I store it in the MySQL database under the same field called food?
What will the structure of the food field look like?
You may want to read the excellent Wikipedia article on database normalization.
You don't want to store multiple values in a single field. You want to do something like this:
form_responses
id
[whatever other fields your form has]
foods_liked
form_response_id
food_name
Where form_responses is the table containing things that are singular (like a person's name or address, or something where there aren't multiple values). foods_liked.form_response_id is a reference to the form_responses table, so the foods liked by the person who has response number six will have a value of six for the form_response_id field in foods_liked. You'll have one row in that table for each food liked by the person.
Edit: Others have suggested a three-table structure, which is certainly better if you are limiting your users to selecting foods from a predefined list. The three-table structure may be better in the case that you are allowing them the ability to enter their own foods, though if you go that route you'll want to be careful to normalize your input (trim whitespace, fix capitalization, etc.) so you don't end up with duplicate entries in that table.
Normally, we do NOT do it like this. Try to use a relation table.
Table 1: tbl_food
ID primary key, auto increment
FNAME varchar
Table 2: tbl_user
ID primary key, auto increment
USER varchar
Table 3: tbl_userfood
RID auto increment
USERID int
FOODID int
Use a similar format to store your data, instead of cramming a chunk of data into one field.
Querying these tables is also easier than parsing a chunk of data.
Use normalization.
More specifically, create a table called users. Create another called foods. Then link the two tables together with a many-to-many table called users_to_foods referencing each others foreign keys.
One way to do it would be to serialize the food data in your programming language, and then store it in the food field. This would then allow you to query the database, get the serialized food data, and convert it back into a native data structure (probably an array in this case) in your programming language.
The problem with this approach is that you will be storing a lot of the same data over and over, e.g. if a lot of people like cookies, the string "cookies" will be stored over and over. Another problem is searching for everyone who likes one particular food. To do that, you would have to select the food data for each record, unserialize it, and see if the selected food is contained within. This is very inefficient.
Instead you'll want to create 3 tables: a users table, a foods table, and a join table. The users and foods tables will contain one record for each user and food respectively. The join table will have two fields: user_id and food_id. For every food a user chooses as a favorite, it adds a record to the join table of the user's ID and the food ID.
As an example, to pull all the users who like a particular food with id FOOD_ID, your query would be:
SELECT users.id, users.name
FROM users
JOIN join_table ON join_table.user_id = users.id
WHERE join_table.food_id = FOOD_ID;
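Putting the three-table design together, here is a small end-to-end sketch using Python's sqlite3 as a stand-in for MySQL. The join table is called `users_to_foods` as in one of the answers. The sample users and the helper `users_who_like` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE foods (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE      -- each food is stored once
);
CREATE TABLE users_to_foods (
    user_id INTEGER NOT NULL REFERENCES users(id),
    food_id INTEGER NOT NULL REFERENCES foods(id),
    PRIMARY KEY (user_id, food_id) -- a user likes a given food at most once
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO foods VALUES (1, 'cookies'), (2, 'candy'), (3, 'apples');
INSERT INTO users_to_foods VALUES (1, 1), (1, 3), (2, 1), (2, 2);
""")

def users_who_like(food_name):
    """Names of all users who like the given food, alphabetically."""
    cur = conn.execute("""
        SELECT u.name
        FROM users AS u
        JOIN users_to_foods AS uf ON uf.user_id = u.id
        JOIN foods AS f ON f.id = uf.food_id
        WHERE f.name = ?
        ORDER BY u.name
    """, (food_name,))
    return [row[0] for row in cur]

cookie_fans = users_who_like("cookies")
apple_fans = users_who_like("apples")
```

The composite primary key on the join table doubles as protection against recording the same preference twice.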