I keep messing up 1NF - sql

The most understandable description of 1NF I have found so far is 'A primary key is a column (or group of columns) that uniquely identifies each row' on www.phlonx.com.
I understand redundancy to mean that, per key, there shouldn't be more than one value in each row; more than one value would then be 'redundant'. Right?
Still, I manage to screw up 1NF a lot of the time.
I posted a question for my online pizza shop here (http://foo.com), where I was confused about something in second normal form, only to notice I had started off wrong in 1NF.
Right now I'm thinking that I need 3 key columns in 1NF in order to uniquely identify each row.
In this case, I'm finding that order_id, pizza_id, and topping_id will do that for me, so that's 3 columns. Because if you want to know which particular pizza is which, you need to know which order it belongs to (order_id), what type of pizza it is (pizza_id), and what topping is on it (topping_id). If you know that, you can look up all the rest.
Yet, judging from an answer to my previous question, this seems to be wrong, because topping_id goes into a different table, which I don't understand.
Here’s the list of columns:
Order_id
Order_date
Customer_id
Customer_name
Phone
Promotion
Blacklist Y or N
Customer_address
ZIP_code
City
E_mail
Pizza_id
Pizza_name
Size
Pizza_price
Amount
Topping_id
Topping_name
Topping_prijs
Availabitly
Delivery_id
Delivery_zone
Deliveryguy_id
Deliveryguy_name
Delivery Y or N
Edit: the id columns (Order_id, Pizza_id, Topping_id) are the ones I marked for the first concatenated key. This is only a list of columns, unnormalized. It's not 1 table or 3 tables or anything.

Use Object Role Modelling (say, with NORMA) to capture your information about the design, press the button, and it spits out SQL.
This will be easier than going back and forth between 1NF, 2NF, etc. An ORM design is guaranteed to be in 5NF.
Some notes:
you can have composite keys
surrogate keys may be added after both conceptual and logical design: you have added them up front, which is bad. They are added for RDBMS performance, not at design time
have you read several sources on 1NF?
start with plain English and some facts, which is what ORM does with verbalisation.
So:
A Customer has many pizzas (zero to n)
A pizza has many toppings (zero to n)
A customer has an address
A pizza has a base
...

I'd use some more tables for this, to remove duplication for customers, orders, toppings and pizzas:
Table: Customer
Customer_id
Customer_name
Phone
Promotion
Blacklist Y or N
Customer_address
ZIP_code
City
E_mail
Table: Order
Order_id
Order_date
Customer_id
Delivery_zone
Deliveryguy_id
Deliveryguy_name
Delivery Y or N
Table: Order_Details
Order_ID (FK on Order)
Pizza_ID (FK on Pizza)
Amount
Table: Pizza
Pizza_id
Pizza_name
Size
Pizza_price
Table: Topping
Topping_id
Topping_name
Topping_prijs
Availabitly
Table: Pizza_Topping
Pizza_ID
Topping_ID
Pizza_Topping and Order_Details are so-called intersection tables ("helper" tables for modelling an m:n relationship between two tables).
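A minimal DDL sketch of these tables (data types, NOT NULL choices and the quoted "Order" name are my assumptions, not part of the answer; the Customer table from the list above is assumed to exist):

CREATE TABLE Pizza (
    Pizza_id    INT PRIMARY KEY,
    Pizza_name  VARCHAR(60),
    Size        VARCHAR(10),
    Pizza_price DECIMAL(6,2)
);

CREATE TABLE Topping (
    Topping_id    INT PRIMARY KEY,
    Topping_name  VARCHAR(60),
    Topping_prijs DECIMAL(6,2),
    Availabitly   CHAR(1)
);

CREATE TABLE "Order" (                          -- Order is a reserved word, hence the quotes
    Order_id       INT PRIMARY KEY,
    Order_date     DATE,
    Customer_id    INT REFERENCES Customer (Customer_id),
    Delivery_zone  VARCHAR(30),
    Deliveryguy_id INT,
    Delivery       CHAR(1)                      -- 'Y' or 'N'
);

CREATE TABLE Order_Details (
    Order_id INT REFERENCES "Order" (Order_id),
    Pizza_id INT REFERENCES Pizza (Pizza_id),
    Amount   INT,
    PRIMARY KEY (Order_id, Pizza_id)            -- one row per pizza per order
);

CREATE TABLE Pizza_Topping (
    Pizza_id   INT REFERENCES Pizza (Pizza_id),
    Topping_id INT REFERENCES Topping (Topping_id),
    PRIMARY KEY (Pizza_id, Topping_id)          -- one row per topping per pizza
);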
Now suppose we have just one pizza, some toppings, and our customer Billy Smith orders 2 quattro stagioni pizzas - our tables will contain this content:
Pizza(Pizza_ID, Pizza_name, Pizza_price)
1 Quattro stagioni 12€
Topping(Topping_id, topping_name, topping_price)
1 Mozzarella 0,50€
2 Prosciutto 0,70€
3 Salami 0,50€
Pizza_Topping(Pizza_ID, Topping_ID)
1 1
1 3
(here, a quattro stagioni pizza contains only Mozzarella and Salami).
Order(order_ID, Customer_name - rest omitted)
1 Billy Smith
Order_Details(order_id, Pizza_id, amount)
1 1 2
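As a sketch of how these pieces join back together, here is a query for the total price of Billy's order (adding the topping prices to the pizza price is my assumption about the pricing rule):

-- Total price of order 1: (pizza price + topping prices) * amount ordered.
SELECT od.Order_id,
       SUM(od.Amount * (p.Pizza_price + COALESCE(t.toppings_total, 0))) AS order_total
FROM Order_Details od
JOIN Pizza p ON p.Pizza_id = od.Pizza_id
LEFT JOIN (SELECT pt.Pizza_id, SUM(tp.Topping_prijs) AS toppings_total
           FROM Pizza_Topping pt
           JOIN Topping tp ON tp.Topping_id = pt.Topping_id
           GROUP BY pt.Pizza_id) t ON t.Pizza_id = od.Pizza_id
WHERE od.Order_id = 1
GROUP BY od.Order_id;
-- With the sample rows above: 2 * (12.00 + 0.50 + 0.50) = 26.00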
I've removed delivery ID, since for me, there is no distinction between an Order and a delivery - or do you support partial deliveries?

On 1NF, from wikipedia, quoting Date:
According to Date's definition of 1NF,
a table is in 1NF if and only if it is
"isomorphic to some relation", which
means, specifically, that it satisfies
the following five conditions:
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one
value from the applicable domain (and
nothing else).
All columns are regular [i.e. rows have no hidden components such as
row IDs, object IDs, or hidden
timestamps].
—Chris Date, "What First Normal Form Really Means", pp. 127–8[4]
The first two are guaranteed in any modern RDBMS.
Duplicate rows are possible in modern RDBMS - however, only if you don't have primary keys (or other unique constraints).
The fourth one is the hardest (and depends on the semantics of your model) - for example, your field Customer_address might be breaking 1NF. Might be, because if you make a contract with yourself (and any potential user of the system) that you will always look at the address as a whole and will never want to separate street name, street number or floor, you can still claim that 1NF is not broken.
It would be more proper to break up the customer address, but that introduces complexities you would then need to deal with, and it might bring no benefit (provided you will never have to look at a sub-atomic part of the address line).
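A sketch of the two choices (the table and column names here are invented for illustration):

-- Option A: the whole address is one atomic value, read and written only as a whole.
CREATE TABLE Customer_whole_address (
    Customer_id      INT PRIMARY KEY,
    Customer_address VARCHAR(200),
    ZIP_code         VARCHAR(10),
    City             VARCHAR(60)
);

-- Option B: split out the parts you might actually need to query or sort on.
CREATE TABLE Customer_split_address (
    Customer_id  INT PRIMARY KEY,
    Street_name  VARCHAR(100),
    House_number VARCHAR(10),        -- VARCHAR: house numbers like '12a' exist
    Floor_number VARCHAR(10),
    ZIP_code     VARCHAR(10),
    City         VARCHAR(60)
);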
The fifth one is broken by some modern RDBMSs; however, the real point is that neither your model nor your system should depend on hidden elements, which is normally true - even if your RDBMS uses OIDs internally for certain operations, unless you start to use them for non-administrative, non-maintenance tasks, you can consider it as not breaking 1NF.

The strengths of relational databases come from separating information into different tables. One useful way of looking at tables is first to identify as entity tables those concepts which are relatively permanent (in your case, probably Pizza, Customer, Topping, Deliveryguy). Then you think about the relations between them (in your case, Order, Delivery). The relation tables link the entity tables together by having foreign keys pointing to the relevant entities: an Order has foreign keys to Customer, Pizza and Topping; a Delivery has foreign keys to Deliveryguy and Order. And yes, relations can link relations, not just entities.
Only in such a context can you achieve anything like normalization. Tossing a bunch of attributes into one singular table does not make your database relational in any meaningful sense.

Related

Design SQL tables with common columns

I have 5 tables that have the same structure and the same columns: id, name, description. So I am wondering what is the best way to design this and avoid having 5 tables with the same columns:
1. Create a category table that will include my three common columns and another column "enum" that will differentiate my categories, e.g. city, country, continent, etc.
2. Create a category table that will include my three common columns and create the other five tables that will just include an id.
Note that I would have an association table that should include the id of cities, id of countries, id of continents, etc. so I can display them in a report.
Thank you for your advice.
The decision on how many tables to have under these circumstances simply depends.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
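A sketch of the two foreign-key styles this paragraph describes (table and column names are assumed):

-- One table per concept: the constraint says exactly what kind of place is referenced.
CREATE TABLE office (
    id     INT PRIMARY KEY,
    cityId INT NOT NULL REFERENCES city (id)
);

-- One shared geo table: the constraint can only say "some geographic thing".
CREATE TABLE office_generic (
    id    INT PRIMARY KEY,
    geoId INT NOT NULL REFERENCES geo (id)
);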
There are other considerations. If your data is at the geographic level, then it might represent hierarchies . . . cities are in countries, countries are on continents. Some databases (such as MySQL) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy -- for providing cultural-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (and perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If city description, country description and continent description are different information, then you are already doing this the right way. The only time you would aim to reduce this data would be if you were repeating information; for the titles of the data it's fine.
In fact, you are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id; every table should have one. Name and description won't be the same across country and city.
Also, this way, if you want a country's name you don't have to go through every country, continent and city - you only have 192 or so entries to go through. If you had all of that in one massive table, you would have to use it for everything and go through every result every time you want data. You would also have to distinguish between cities, countries and continents in some way other than by separate tables.
Eg:
method 1, with 5 tables:
SELECT * FROM country
does the same as
method 2, 1 table:
SELECT * FROM table WHERE enumvalue = 'country';
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.
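A minimal DDL sketch of that hierarchy (types are assumptions; note that this simple form forces every country into exactly one continent, which sidesteps the Russia problem rather than solving it):

CREATE TABLE continent (
    id   INT PRIMARY KEY,
    name VARCHAR(60) NOT NULL
);

CREATE TABLE country (
    id          INT PRIMARY KEY,
    continentId INT NOT NULL REFERENCES continent (id),
    name        VARCHAR(60) NOT NULL,
    description TEXT
);

CREATE TABLE city (
    id          INT PRIMARY KEY,
    countryId   INT NOT NULL REFERENCES country (id),
    name        VARCHAR(60) NOT NULL,
    description TEXT
);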

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option number 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering during use
First, on inflexibility. If your requirements expand (or you simply overlooked something), you will have to break up your schema, or you will end up with unnormalized tables. For example, let's say your sales now include the sales clerk/person who handled the transaction - that obviously has nothing to do with 'Receiving'. And if you do retail or wholesale sales, how would you accommodate that in your merged table? How about discounts or promos? I am only identifying the obvious here. Now let's go to receiving. What if we want to tie our receiving to a purchase order? Obviously, purchase order details like P.O. number, P.O. date, supplier name etc. would not belong under sales but relate to receiving.
Second, on unnecessary filtering during use. If you have merged tables and you want to use only the Sales (or Receiving) portion of the table, then you have to filter out the Receiving portion, either in your back-end or in your front-end program. Whereas if it is a separate table, you just deal with one table at a time.
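A sketch of that extra filtering (assuming a Type column holding values such as 'SALE' and 'RECEIVING'):

-- Merged table: every query has to remember the filter.
SELECT Product_ID, SUM(Quantity) AS total_sold
FROM Receivings_Sales
WHERE Type = 'SALE'              -- forget this and receivings pollute the result
GROUP BY Product_ID;

-- Separate tables: the table name itself is the filter.
SELECT Product_ID, SUM(Quantity) AS total_sold
FROM Sales
GROUP BY Product_ID;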
Additionally, you mentioned ORM; the first option fits that best, because an object or entity should be distinct from other entities/objects.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
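A sketch of that sign convention (the table name InventoryTransaction comes from the answer; the query itself is my illustration):

-- Receivings stored with positive quantities, sales with negative:
-- stock on hand per product is then just a SUM.
SELECT Product_ID, SUM(Quantity) AS stock_on_hand
FROM InventoryTransaction
GROUP BY Product_ID;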
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)

Database structure, one big entity for multiple entities

Suppose that I have a store-website where user can leave comments about any product.
Suppose that I have tables (entities) in my website database: let them be 'Shoes', 'Hats' and 'Skates'.
I don't want to create separate "comments" table for every entity (like 'shoes_comments', 'hats_comments', 'skates_comments').
My idea is to somehow store all the comments in one big table.
One way to do this, that I thought of, is to create a table:
table (comments):
ID (int, Primary Key),
comment (text),
Product_id (int),
isSkates (boolean),
isShoes (boolean),
isHats (boolean)
and a flag like this for every entity that could have comments.
Then when I want to get comments for some product the SELECT query would look like:
SELECT comment
FROM comments, ___SOMETABLE___
WHERE ____SOMEFLAG____ = TRUE
AND ___SOMETABLE___.ID = comments.Product_id
Is this an efficient way to implement database for needed functionality?
What other ways can I do this?
Sorry, this feels odd.
Do you indeed have one separate table for each product type? Don't they have common fields (e.g. name, description, price, product image, etc.)?
My recommendation for the tables: product for the common fields, comments with a foreign key to product but no isX flag columns, hat with only the fields that are specific to the hat product line. The primary key in hat is either the product PK or an individual unique value (then you'd need an extra field for the foreign key to product).
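A sketch of that layout (types and the hat-specific column are assumptions):

CREATE TABLE product (
    id          INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description TEXT,
    price       DECIMAL(10,2)
);

CREATE TABLE comments (
    id         INT PRIMARY KEY,
    product_id INT NOT NULL REFERENCES product (id),   -- no per-type flag columns needed
    body       TEXT NOT NULL
);

-- Subtype table: shares its primary key with product (one-to-zero-or-one).
CREATE TABLE hat (
    product_id INT PRIMARY KEY REFERENCES product (id),
    hat_size   VARCHAR(10)
);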
I would recommend making one table for the comments and referencing the other tables via a foreign key in the comments table.
The "normalized" way to do this is to add one more entity (say, "Product") that groups all characteristics common to shoes, hats and skates (including comments)
+-- 0..1 [Shoe]
|
[Product] 1 --+-- 0..1 [Hat]
1 |
| +-- 0..1 [Skate]
*
[Comment]
Besides performance considerations, the drawback here is that there is nothing in the data model preventing a row in Product from being referenced both by a row in Shoe and by one in Hat.
There are other alternatives too (each with perks & flaws) - you might want to read something about "jpa inheritance strategies" - you'll find java-specific articles that discuss your same issue (just ignore the java babbling and read the rest)
Personally, I often end up using a single table for all entities in a hierarchy (shoes, hats and skates in our case) and sacrificing constraints on the altar of performance and simplicity (e.g. NOT NULL on a field that is mandatory for shoes but not for hats and skates).
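A sketch of that single-table variant (column names are invented for illustration; note which constraints are given up):

-- Alternative to the product/hat split above: one row per product of any type.
CREATE TABLE product_flat (
    id           INT PRIMARY KEY,
    product_type VARCHAR(10) NOT NULL,   -- 'SHOE', 'HAT' or 'SKATE'
    name         VARCHAR(100) NOT NULL,
    price        DECIMAL(10,2),
    shoe_size    VARCHAR(10),            -- mandatory for shoes, but the schema cannot enforce it
    hat_size     VARCHAR(10),
    blade_length DECIMAL(5,2)            -- skates only
);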

How to create history fact table?

I have some entities in my Data Warehouse:
Person - with attributes personId, dateFrom, dateTo, and others that can change, e.g. last name, birth date and so on - a slowly changing dimension
Document - documentId, number, type
Address - addressId, city, street, house, flat
The relation between Person and Document is one-to-many, and between Person and Address it is many-to-many.
My goal is to create a history fact table that can answer the following questions:
1. What persons with what documents lived at a given address on a given date?
2. What history of residents does a given address have over a given interval of time?
This is not the only thing the DW is designed for, but I think it is the hardest part of its design.
For example, Miss Brown with personId=1 and documents with documentId=1 and documentId=2 lived at the address with addressId=1 from 01/01/2005 to 02/02/2010 and then moved to addressId=2, where she has lived from 02/03/2010 to the current date (NULL?). But she changed her last name to Mrs Green on 04/05/2006 and replaced her first document (documentId=1) with documentId=3 on 06/07/2007. Mr Black with personId=2 and documentId=4 has lived at addressId=1 from 02/03/2010 to the current date.
The expected result of our query for question 2, where addressId=1 and the time interval is 01/01/2000 to now, must be something like:
Rows:
last_name="Brown", documentId=1, dateFrom=01/01/2005, dateTo=04/04/2006
last_name="Brown", documentId=2, dateFrom=01/01/2005, dateTo=04/04/2006
last_name="Green", documentId=1, dateFrom=04/05/2006, dateTo=06/06/2007
last_name="Green", documentId=2, dateFrom=04/05/2006, dateTo=06/06/2007
last_name="Green", documentId=2, dateFrom=06/07/2007, dateTo=02/01/2010
last_name="Green", documentId=3, dateFrom=06/07/2007, dateTo=02/01/2010
last_name="Black", documentId=4, dateFrom=02/03/2010, dateTo=NULL
I had an idea to create a fact table with the composite key (personId, documentId, addressId, dateFrom), but I have no idea how to load this table and then get that expected result with this structure.
I would be grateful for any help!
Interesting question, @Argnist!
So to create some common language for my example, you want a
DimPerson (PK=kcPerson, surrogate key for unique Persons=kPerson, type 2 dim)
DimDocument (PK=kcDocument, surrogate key for unique Documents=kDocument, type 2 dim)
DimAddress (PK=kcAddress, surrogate key for unique Addresses=kAddress, type 2 dim)
A colleague has written a short blog on the usage of two surrogate keys to explain the above dims 'Using Two Surrogate Keys on Dimensions'.
I would always add a
DimDate with PK in the form yyyymmdd
and extra attribute columns to any data warehouse.
Then you would have your fact table as
FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDate)
plus any additional measures.
Then joining on the "kc"s you can show the current Person/Document/Address dimension information.
If you joined on the "k"s you can show the historic Person/Document/Address dimension information.
The downside of this is that this fact table needs one row for each person/document/address/date combination. But it really is a very narrow table, since the table just has a number of foreign keys.
The advantage of this is it is very easy to query for the sorts of questions you were asking.
Alternatively, you could have your fact table as
FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDateFrom, kDateTo)
plus any additional measures.
This is obviously much more compact, but the querying becomes more complex. You could also put a view over the Fact table to make it easier to query!
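A query sketch for question 2 against the from/to variant (dimension attribute names such as last_name, documentId and addressId are assumptions; which key, kc or k, you join on decides whether you see current or historic attribute values, as described above):

-- Residents of addressId = 1 whose stay overlaps 01/01/2000 .. today.
SELECT p.last_name, d.documentId, f.kDateFrom, f.kDateTo
FROM FactHistory f
JOIN DimPerson   p ON p.kcPerson   = f.kcPerson
JOIN DimDocument d ON d.kcDocument = f.kcDocument
JOIN DimAddress  a ON a.kcAddress  = f.kcAddress
WHERE a.addressId = 1
  AND f.kDateFrom <= 20991231                      -- upper bound of the interval ("now"), yyyymmdd placeholder
  AND COALESCE(f.kDateTo, 99991231) >= 20000101;   -- lower bound of the interval, 01/01/2000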
The choice of solution depends on the frequency of change of the data. I suspect that it will not be changing that quickly, so the alternative design of the fact table may be better.
Hope that helps.

Architecture of SQL tables

I am wondering whether it is more useful and practical (in terms of DB size) to create multiple SQL tables with two columns (one column containing a foreign key and one column containing the data), or to merge them and create one table containing multiple columns. I am asking this because in my scenario one product holding the primary key could have applicable data for only one column while the other columns would be empty.
example a. one table
productID productname weight no_of_pages
1 book 130 500
2 watch 50 null
3 ring null null
example b. three tables
productID productname
1 book
2 watch
3 ring
productID weight
1 130
2 50
productID no_of_pages
1 500
The multi-table approach is more "normal" (in database terms) because it avoids columns that commonly store NULLs. It's also something of a pain in programming terms because you have to JOIN a bunch of tables to get your original entity back.
I suggest adopting a middle way. Weight seems to be a property of most products, if not all (indeed, a ring has a weight, even if small, and you'll probably want to know it for shipping purposes), so I'd leave that in the Products table. But number of pages applies only to a book, as do a slew of other unmentioned properties (author, ISBN, etc.). In this example, I'd use a Products table and a Books table. The Books table extends the Products table in a fashion similar to class inheritance in object-oriented programming.
All book-specific properties go into the Books table, and you join only Products and Books to get a complete description of a book.
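A sketch of that middle way (types and the extra book columns are assumptions):

CREATE TABLE Products (
    productID   INT PRIMARY KEY,
    productname VARCHAR(100) NOT NULL,
    weight      INT                       -- applies to (nearly) every product
);

CREATE TABLE Books (
    productID   INT PRIMARY KEY REFERENCES Products (productID),
    no_of_pages INT,
    author      VARCHAR(100),
    isbn        VARCHAR(20)
);

-- Full description of a book:
SELECT p.productID, p.productname, p.weight, b.no_of_pages, b.author, b.isbn
FROM Products p
JOIN Books b ON b.productID = p.productID;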
I think this all depends on how the tables will be used. Maybe your examples are oversimplifying things too much but it seems to me that the first option should be good enough.
You'd really use the second example if you're going to be doing extremely CPU intensive stuff with the first table and will only need the second and third tables when more information about a product is needed.
If you're going to need the information in the second and third tables most times you query the table, then there's no reason to join over every time and you should just keep it in one table.
I would suggest example a in case there is a defined set of attributes for a product, and example c if you need a variable number of attributes (new attributes keep coming every now and then) -
example c
productID productName
1 book
2 watch
3 ring
attrID productID attrType attrValue
1 1 weight 130
2 1 no_of_pages 500
3 2 weight 50
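Reading a single attribute back out of example c then looks something like this (the table names product and attr are assumed, and the attribute value has to be stored as text):

SELECT p.productName, a.attrValue AS weight
FROM product p
JOIN attr a ON a.productID = p.productID
WHERE a.attrType = 'weight';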
The table structure you have shown in example b is not normalized - separate id columns would be required in the second and third tables, since productID would be an FK there and not a PK.
It depends on how many rows you are expecting on your PRODUCTS table. I would say that it would not make sense to normalize your tables to 3N in this case because product name, weight, and no_of_pages each describe the products. If you had repeating data such as manufacturers, it would make more sense to normalize your tables at that point.
Without knowing the background (the data model), there is no way to tell which variant is more "correct". Both are fine in certain scenarios.
You want three tables, full stop. That's best because there's no chance of watches winding up with pages (no pun intended) and some books without. If you normalize, the server works for you. If you don't, you do the work instead, just not as well. Up to you.
I am asking this because in my scenario one product holding the primary key could have applicable data for only one column while the other columns would be empty.
That's always true of nullable columns. Here's the rule: a nullable column has an optional relationship to the key. A nullable column can always be, and usually should be, in a separate table where it can be non-null.