Im currently working on a small project in which I need to model the following scenario:
Scenario
Customer calls, he want an quote on a new car.
Sales rep. register customer information.
Sales rep. create a quote in the system, and add a item to the quote (the car).
Sales rep. send the quote to the customer on email.
Customer accept the quote, and the quote is now not longer a quote but an order.
Sales rep. check the order, everything is OK and he invoice the order. The order is now not longer an order, but an invoice.
Thoughts
I need a bit of help finding out the ideal way to model this, but I have some thoughts.
I'm thinking that both draft/quote/invoice is basically an order.
Draft/quote/invoice need seperate unique numbers(id's) so there for i'm thinking separate tables for all of them.
Model
This is my data model v.1.0, please let me know what you think.
Concerns
I however have som concerns regarding this model:
Draft/quote/invoice might have different items and prices on the order lines. In this model all draft/quote/invoice is connected to the same order and also order lines, making it impossible to have separate quote lines/draft lines/invoice lines. Maybe I shall make new tables for this, but then basically the same information would be stored in multiple tables, and that is not good either.
Sometimes two or more quotes become an invoice, how would this model take care of this?
If you have any tips on how to model this better, please let me know!
EDIT: Data model v.1.4
It looks like you've modeled every one of these things--quote, order, draft, invoice--as structurally identical to all the others. If that's the case, then you can "push" all the similar attributes up into a single table.
create table statement (
stmt_id integer primary key,
stmt_type char(1) not null check (stmt_type in ('d', 'q', 'o', 'i')),
stmt_date date not null default current_date,
customer_id integer not null -- references customer (customer_id)
);
create table statement_line_items (
stmt_id integer not null references statement (stmt_id),
line_item_number integer not null,
-- other columns for line items
primary key (stmt_id, line_item_number)
);
I think that will work for the model you've described, but I think you'll be better served in the long run by modeling these as a supertype/subtype. Columns common to all subtypes get pushed "up" into the supertype; each subtype has a separate table for the attributes unique to that subtype.
This SO question and its accepted answer (and comments) illustrate a supertype/subtype design for blog comments. Another question relates to individuals and organizations. Yet another relating to staffing and phone numbers.
Later . . .
This isn't complete, but I'm out of time. I know it doesn't include line items. Might have missed something else.
-- "Supertype". Comments appear above the column they apply to.
create table statement (
-- Autoincrement or serial is ok here.
stmt_id integer primary key,
stmt_type char(1) unique check (stmt_type in ('d','q','o','i')),
-- Guarantees that only the order_st table can reference rows having
-- stmt_type = 'o', only the invoice_st table can reference rows having
-- stmt_type = 'i', etc.
unique (stmt_id, stmt_type),
stmt_date date not null default current_date,
cust_id integer not null -- references customers (cust_id)
);
-- order "subtype"
create table order_st (
stmt_id integer primary key,
stmt_type char(1) not null default 'o' check (stmt_type = 'o'),
-- Guarantees that this row references a row having stmt_type = 'o'
-- in the table "statement".
unique (stmt_id, stmt_type),
-- Don't cascade deletes. Don't even allow deletes. Every order given
-- an order number must be maintained for accountability, if not for
-- accounting.
foreign key (stmt_id, stmt_type) references statement (stmt_id, stmt_type)
on delete restrict,
-- Autoincrement or serial is *not* ok here, because they can have gaps.
-- Database must account for each order number.
order_num integer not null,
is_canceled boolean not null
default FALSE
);
-- Write triggers, rules, whatever to make this view updatable.
-- You build one view per subtype, joining the supertype and the subtype.
-- Application code uses the updatable views, not the base tables.
create view orders as
select t1.stmt_id, t1.stmt_type, t1.stmt_date, t1.cust_id,
t2.order_num, t2.is_canceled
from statement t1
inner join order_st t2 on (t1.stmt_id = t2.stmt_id);
There should be a table "quotelines", which would be similar to "orderlines". Similarly, you should have an 'invoicelines' table. All these tables should have a 'price' field (which nominally will be the part's default price) along with a 'discount' field. You could also add a 'discount' field to the 'quotes', 'orders' and 'invoices' tables, to handle things like cash discounts or special offers. Despite what you write, it is good to have separate tables, as the amount and price in the quote may not match what the customer actually orders, and again it may not be the same amount that you actually supply.
I'm not sure what the 'draft' table is - you could probably combine the 'draft' and 'invoices' tables as they hold the same information, with one field containing the status of the invoice - draft or final. It is important to separate your invoice data from order data, as presumably you will be paying taxes according to your income (invoices).
'Quotes', 'Orders' and 'Invoices' should all have a field (foreign key) which holds the value of the sales rep; this field would point to the non-existent 'SalesRep' table. You could also add a 'salesrep' field in the 'customers' table, which points to the default rep for the customer. This value would be copied into the 'quotes' table, although it could be changed if a different rep to the default gave the quote. Similarly, this field should be copied when an order is made from a quote, and an invoice from an order.
I could probably add much more, but it all depends on how complex and detailed a system you want to make. You might need to add some form of 'bill of materials' if the cars are configured according to their options and priced accordingly.
Add a new column to line_items ( ex:Status as smallint)
When a quote_line becomes an order_line then set bit you choose from 0 to 3 to 1.
But when qty changes then add a new line with new qte and keep last line unchanged.
Kad.
Related
I am new to DB Design and I've recently inherited the responsibility of adding some new attributes to an existing design.
Below is a sample of the current table in question:
Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
The new requirements are:
A Submission can be marked as valid or invalid
A Reason must be provided when a Submission is marked as invalid. (So a submission may have an InvalidReason)
Submissions can be associated with one another such that: Multiple valid Submissions can be set as "replacements" for an invalid Submission.
So I've currently taken the easy solution and simply added new attributes directly to the Submission Table so that it looks like this:
NEW Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
IsValid (bit)
InvalidReason (text)
ReplacedSubmissionID (int)
Everything works fine this way, but it just seems a little strange:
Having InvalidReason as a column that will be NULL for majority of submissions.
Having ReplacedSubmissionID as a column that will be NULL for majority of submissions.
If I understand normalization right, InvalidReason might be transitively dependent on the IsValid bit.
It just seems like somehow some of these attributes should be extracted to a separate table, but I don't see how to create that design with these requirements.
Is this single table design okay? Anyone have better alternative ideas?
Whether or not you should have a single table design really depends on
1) How you will be querying the data
2) How much data would end up being potentially NULL in the resulting table.
In your case its probably ok, but again it depends on #1. If you will be querying separately to get information on invalid submissions, you may want to create a separate table that references the Id of invalid submissions and the reason:
New table: InvalidSubmissionInfo
Id (int) (of invalid submissions; will have FK contraint on Submission table)
InvalidReason (string)
Additionally if you will be querying for replaced submissions separately you may want to have a table just for those:
New table: ReplacementSubmissions
Id (int) (of the replacement submissions; will have FK contraint on Submission table)
ReplacedSubmissionId (int) (of what got replaced; will have FK constraint on submission table)
To get the rest of the info you will still have to join with the Submissions table.
All this to say you do not need separate this out into multiple tables. Having a NULL value only takes up 1 bit of memory which isn't bad. And if you need to query and return an entire Submission record each time, it makes more sense to condense this info into one table.
Single table design looks good to me and it should work in your case.
If you do not like NULLS, you can give default value of an empty string and ReplacedSubmissionID to 0. Default values are always preferable in database design.
Having an empty string or default value will make your data look more cleaner.
Please remember if you add default values, you might need to change queries to get proper results.
For example:-
Getting submissions which have not been replaced>
Select * from tblSubmission where ReplacedSubmissionID = 0
Don't fear joins. Looking for ways to place everything in a single table is at best a complete waste of time, at worst results in a convoluted, unmaintainable mess.
You are correct about InvalidReason and IsValid. However, you missed SubmittedDate and Submitted.
Whenever modeling an entity that will be processed in some way and going through consecutive state changes, these states really should be placed in a separate table. Any information concerning the state change -- date, reason for the change, authorization, etc. -- will have a functional dependency on the state rather than the entity as a whole, therefore an attempt to make the state information part of the entity tuple will fail at 2nf.
The problem this causes is shown in your very question. You already incorporated Submitted and SubmittedDate into the tuple. Now you have another state you want to add. If you had normalized the submission data, you could have simply added another state and gone on.
create table StateDefs(
ID int auto_generated primary key,
Name varchar( 16 ) not null, -- 'Submitted', 'Processed', 'Rejected', etc.
... -- any other data concerning states
);
create table Submissions(
ID int auto_generated primary key,
Subject varchar( 128 ) not null,
... -- other data
);
create table SubmissionStates(
SubID int not null references Submissions( ID ),
State int not null references StateDefs( ID ),
When date not null,
Description varchar( 128 )
);
This shows that a state consists of a date and an open text field to place any other information. That may suit your needs. If different states require different data, you may have to (gasp) create other state tables. Whatever your needs require.
You could insert the first state of a submission into the table and update that record at state changes. But you lose the history of state changes and that is useful information. So each state change would call for a new record each time. Reading the history of a submission would then be easy. Reading the current state would be more difficult.
But not too difficult:
select ss.*
from SubmissionStates ss
where ss.SubID = :SubID
and ss.When =(
select Max( When )
from SubmissionStates
where SubID = ss.SubID
and When <= Today() );
This finds the current row, that is, the row with the most recent date. To find the state that was in effect on a particular date, change Today() to something like :AsOf and place the date of interest in that variable. Storing the current date in that variable returns the current state so you can use the same query to find current or past data.
Say you had a Model called Forest. Each object represents a forest on your continent. There is a set of data that is common to all these forests, like forest type, area etc., and these can be easily represented by columns on the SQL table, forest.
However, imagine that these forests had additional data about them that might not always be repeatable. For example the 20 coniferous forests have a pine-fir split ratio number, whereas the deciduous forests have a autumn-duration number. One way would be to store all these columns on the main table itself, but there will be too many columns on each row, with many columns remaining un-filled by definition.
The most obvious way around this is to make sub-classes of the Forest model and have separate table for each subclass. I feel that's a heavy handed approach that I would rather not follow. If I need some data about the generic forest I'll have to consult another table.
Is there a pattern to solve this problem? What solution do you usually prefer?
NOTE: I have seen the other questions about this. The solutions proposed were:
Subtyping, same as I proposed above.
Have all the columns on the same table.
Have separate tables for each kind of forest, with duplicated data like area and rainfall... duplicated.
Is there an inventive solution that I don't know of?
UPDATE: I have run into the EAV model, and also a modified version where the unpredictable fields are stored out in a NoSQL/JSON store, and the id for that is held in the RDB. I like both, but welcome suggestions in this direction.
On the database side, the best approach is often to store attributes common to all forests in one table, and to store unique attributes in other tables. Build updatable views for clients to use.
create table forests (
forest_id integer primary key,
-- Assumes forest names are not unique on a continent.
forest_name varchar(45) not null,
forest_type char(1) not null
check (forest_type in ('c', 'd')),
area_sq_km integer not null
check (area_sq_km > 0),
-- Other columns common to all forests go here.
--
-- This constraint lets foreign keys target the pair
-- of columns, guaranteeing that a row in each subtype
-- table references a row here having the same subtype.
unique (forest_id, forest_type)
);
create table coniferous_forests_subtype (
forest_id integer primary key,
forest_type char(1) not null
default 'c'
check (forest_type = 'c'),
pine_fir_ratio float not null
check (pine_fir_ratio >= 0),
foreign key (forest_id, forest_type)
references forests (forest_id, forest_type)
);
create table deciduous_forests_subtype (
forest_id integer primary key,
forest_type char(1) not null
default 'd'
check (forest_type = 'd'),
autumn_duration_days integer not null
check (autumn_duration_days between 20 and 100),
foreign key (forest_id, forest_type)
references forests (forest_id, forest_type)
);
Clients usually use updatable views, one for each subtype, instead of using the base tables. (You can revoke privileges on the base subtype tables to guarantee this.) You might want to omit the "forest_type" column.
create view coniferous_forests as
select t1.forest_id, t1.forest_type, t1.area_sq_km,
t2.pine_fir_ratio
from forests t1
inner join coniferous_forests_subtype t2
on t1.forest_id = t2.forest_id;
create view deciduous_forests as
select t1.forest_id, t1.forest_type, t1.area_sq_km,
t2.autumn_duration_days
from forests t1
inner join deciduous_forests_subtype t2
on t1.forest_id = t2.forest_id;
What you have to do to make these views updatable varies a little with the dbms, but expect to write some triggers (not shown). You'll need triggers to handle all the DML actions--insert, update, and delete.
If you need to report only on columns that appear in "forests", then just query the table "forests".
Well, the easiest way is putting all the columns into one table and then having a "type" field to decide which columns to use. This works for smaller tables, but for more complicated cases it can lead to a big messy table and issues with database constraints (such as NULLs).
My preferred method would be something like this:
A generic "Forests" table with: id, type, [generic_columns, ...]
"Coniferous_Forests" table with: id, forest_id (FK to Forests), ...
So, in order to get all the data for a Coniferous Forest with id of 1, you'd have a query like so:
SELECT * FROM Coniferous_Forests INNER JOIN Forests
ON Coniferous_Forests.forest_id = Forests.id
AND Coniferous_Forests.id = 1
As for inventive solutions, there is such a thing as an OODBMS (Object Oriented Database Management Sytem).
The most popular alternative to Relational SQL databases are Document-Oriented NoSQL databases like MongoDB. This is comparable to using JSON objects to store your data, and allows you to be more flexible with your database fields.
I have a huge existing Order Management Application.
Now, in the main ORDER Table, i am adding a new column: IS_HISTORICAL. If its value is: TRUE, means the Order is Historical now, and should not show up in application.
Now, i have to modify many SQL Queries in my existing application so that they select only those orders whose IS_HISTORICAL is 'FALSE' - i.e add following in WHERE clause:
AND IS_HISTORICAL='FALSE'
Question: *Is there a easier way - so that i do not have to modify so many application queries (to hide away historical orders)?
Essentially all ORDERS marked as IS_HISTORICAL='TRUE' should become invisible/un-available for read/updates!!*
Note: Right now the table sizes are not very huge, but ultimately i intend to partition the table by IS_HISTORICAL true/false.
If you're only going to use the historical data for analysis then I prefer Florin's solution as the amount of data you need to look at for each query remains smaller. It makes the analysis queries more difficult as you need to UNION ALL but everything else will run "quicker" (it may not be noticable).
If some applications/users require access to the historical data the better solution would be to rename your table and create a view on top of it with the query that you need.
The problem with re-writing all your queries is that you're going to forget one or get one incorrect, either now or in the future. A view removes that problem for you as the query is static, every time you query the view the additional conditions you require are automatically added.
Something like:
rename orders to order_history;
create or replace view orders as
select *
from order_history
where is_historical = 'FALSE';
Two further points.
I wouldn't bother with TRUE / FALSE, if the table gets large it's a lot of additional data to scan. Create your column as a VARCHAR2(1) and use T / F or Y / N, they are as immediately obvious but are smaller. Alternatively use a NUMBER(1,0) and 1 / 0.
Don't forget to put a constraint on your table so that the IS_HISTORICAL column can only have the values you've chosen.
If you're only ever going to have the two values then you may want to consider a CHECK CONSTRAINT:
alter table order_history
add constraint chk_order_history_historical
check ( is_historical in ('T','F') );
Otherwise, maybe you should do this anyway, use a FOREIGN KEY CONSTRAINT. Define an extra table, ORDER_HISTORY_TYPES
create table order_history_types (
id varchar2(1)
, description varchar2(4000)
, constraint pk_order_history_types primary key (id)
);
Fill it with your values and then add the foreign key:
alter table order_history
add constraint fk_order_history_historical
foreign key (is_historical)
references order_history_types (id)
You could look into using Virtual Private Database/row-level security. This can be used to automatically add the is_historical = 'FALSE' predicate when certain conditions are met (e.g. you're connected as the application user).
If the user only need nonhistorical records, an option is to create an ORDER_HIST table and move there the historical records. (delete and insert)
If some users/applications need both type of records then the partition aproach is the best.
I need a two retrieve data from the same table but divided in different columns.
First table "PRODUCTS" has the following columns:
PROD_ID
PRO_TYPE_ID
PRO_COLOR_ID
PRO_WEIGHT_ID
PRO_PRICE_RANGE_ID
Second table "COUNTRY_TRANSLATIONS" has the following columns:
ATTRIBUTE_ID
ATT_LANGUAGE_ID
ATT_TEXT_ID
Third and last table "TEXT_TRANSLATIONS" has the following columns:
TRANS_TEXT_ID
TRA_TEXT
PRO_TYPE_ID, PRO_COLOR_ID, PRO_WEIGHT_ID and PRO_PRICE_RANGE_ID are all integers and are found back in the column ATTRIBUTE_ID multiple times (depending on howmany translations are available). Then ATT_TEXT_ID is joined with TRANS_TEXT_ID from the TEXT_TRANSLATIONS table.
Basically I need to run a query so I can retreive information from TEXT_TRANSLATIONS multiple times. Right now I get an error saying that the correlation is not unique.
The data is available in more then 20 languages, therefore the need to work with intergers for each of the attributes.
Any suggestion on how I should build up the query? Thank you.
Hopefully, you're on an RDBMS that supports CTEs (pretty much everything except mySQL), or you'll have to modify this to refer to the joined tables each time...
WITH Translations (attribute_id, text)
as (SELECT c.attribute_id, t.tra_text
FROM Country_Translations c
JOIN Text_Translations t
ON t.trans_text_id = c.att_text_id
WHERE c.att_language_id = #languageId)
SELECT Products.prod_id,
Type.text,
Color.text,
Weight.text,
Price_Range.text
FROM Products
JOIN Translations as Type
ON Type.attribute_id = Products.pro_type_id
JOIN Translations as Color
ON Color.attribute_id = Products.pro_color_id
JOIN Translations as Weight
ON Weight.attribute_id = Products.pro_weight_id
JOIN Translations as Price_Range
ON Price_Range.attribute_id = Products.pro_price_range_id
Of course, personally I think the design of the localization table was botched in two ways -
Everything is in the same table (especially without an 'attribute type' column).
The language attribute is in the wrong table.
For 1), this is mostly going to be a problem because you now have to maintain system-wide uniqueness of all attribute values. I can pretty much guarantee that, at some point, you're going to run into 'duplicates'. Also, unless you've designed your ranges with a lot of free space, the data values are non-consecutive for type; if you're not careful there is the potential for update statements being run over the wrong values, simply because the start and end of the given range belong to the same attribute, but not every value in the range.
For 2), this is because a text can't be completely divorced from it's language (and country 'locale'). From what I understand, there are parts of some text that are valid as written in multiple languages, but mean completely different things when read.
You'd likely be better off storing your localizations in something similar to this (only one table shown here, the rest are an exercise for the reader):
Color
=========
color_id -- autoincrement
cyan -- smallint
yellow -- smallint
magenta -- smallint
key -- smallint
-- assuming CYMK palette, add other required attributes
Color_Localization
===================
color_localization_id -- autoincrement, but optional:
-- the tuple (color_id, locale_id) should be unique
color_id -- fk reference to Color.color_id
locale_id -- fk reference to locale table.
-- Technically this is also country dependent,
-- but you can start off with just language
color_name -- localized text
This should make it so that all attributes have their own set of ids, and tie the localized text to what it was localized to directly.
I have an issue where we have a customer table includes name, email, address and a skills table which is qts, first aid which is associated by an id. For example
Customer
id = 1
Name = James
Address = some address
Skills
1, qts
2, first aid
I am now trying to pair up the relationship. I first came to a quick solution just by creating a skills table which just has customerId and each skill has a true / false value. Then created a go between customer_skills with an customerId to SkillId. But I would not know how to update the records when values change as there is no unique id.
can anyone help on what would be the best way to do this?
thanks....
The solution you want really depends on your data, and is a question that has been asked thousands of times before. If you google
Entity Attribute Value vs strict relational model you will see countless articles
comparing and contrasting the methods available.
Strict Relational Model
You would add additional BIT or DATETIME fields (Where a NULL datetime represents the customer not having the skill)
to your customer table for each skill. This works well if you have few skills that are unlikely to change much over time.
This allows simple queries to locate customers with skills, especially with various combinations of skills ee.g (Using datetime fields)
SELECT *
FROM Customer
WHERE Skill1 >= '20120101' -- SKILL 1 AQUIRED AFTER 1ST JAN 2012
AND Skill2 IS NOT NULL -- HAS SKILL 2
AND Skill2 IS NULL -- DOES NOT POSSESS SKILL 3
Entity-Attribute-Value Model
This is a slight adaptation of a classic entity-attribute-value model, because the value is boolean represented by the existence of a record.
You would create a table like this:
CREATE TABLE CustomerSkills
( CustomerID INT NOT NULL,
SkillID INT NOT NULL
PRIMARY KEY (CustomerID, SkillID),
FOREIGN KEY (CustomerID) REFERENCES Customer (ID),
FOREIGN KEY (SkillID) REFERENCES Skills (ID)
)
You may want additional columns such as DateAdded, AddedBy etc to track when skills were added and who by etc, but the core principles can be gathered from the above.
With this method it is much easier to add skills, as it doesn't require adding columns, but can make simple queries much more complicated. The above query would have to be written as:
SELECT Customer.*
FROM Customer
INNER JOIN
( SELECT CustomerID
FROM CustomerSkills
WHERE SkillID IN (2, 3) -- SKILL2,SKILL3
OR (SkillID = 1 AND DateAdded >= '20120101')
GROUP BY CustomerID
HAVING COUNT(*) = 2
AND COUNT(CASE WHEN SkillID = 3 THEN 1 END) = 0
) skills
ON Skills.CustomerID = Customer.ID
This is much more complext and resource intensive than with the relational model, but the overall structure is much more flexible.
So to summarise, it really depends on your own particular situation, there are a few factors to consider, but there are plenty of resources out there to help you decide.
If you have a table linking the primary keys from two other tables together in order to form a many-to-many relationship (like in you example) you don't have to update that table. Instead you can just delete and reinsert values into it.
If you are editing a custiomer (customerId 46 for instance) and changing the skills for that customer, you can just delete all skills for the customer and then reinsert the new set of skills when storing the changes.
If your "link table" contains some additional information besides just the two primary key columns, then the situation might be different. But from your description it seems like you just want to link the table together using the primary keys from each table. In that case a delete + reinsert should be fine.
Also in this kind of table, you should make the combination of the two foreign key fields be the primary key of the binding table.