I have the following data:
Fabric Cost
time | No fabric|BangloreSilk|Chanderi|.... <- fabric types
--------------------------------------------
01/15 | 40 | 25 |...
02/15 | 45 | 30 |...
..... | ... | ... |...
Dyeing Cost
time | No fabric|BangloreSilk|Chanderi|.... <- fabric types
--------------------------------------------
01/15 | 40 | 25 |...
02/15 | 45 | 30 |...
..... | ... | ... |...
The list of fabric types will be the same for both data sets.
To store this data I created the following table:
fabric_type
id int
fabric_type_name varchar
And then I have two approaches.
Approach 1 :
fabric_cost
id int
fabric_type_id int (foreign key to fabric_type)
cost int
dyeing_cost
id int
fabric_type_id int (foreign key to fabric_type)
cost int
Approach 2 :
fabric_overall_cost
id int
fabric_type_id int (foreign key to fabric_type)
cost int
fabric_or_dyeing bit (to represent 0 for fabric cost and 1 for dyeing cost)
Now the question is: which approach is better?
Maybe you can create another table - cost_subjects
cost_subjects
id byte
subject varchar
costs
id int
fabric_type_id int (foreign key to fabric_type)
cost int
cost_subject byte (foreign key to cost_subjects table)
And then you can extend the table with more subjects to include in the cost of fabric.
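As a rough DDL sketch of that idea (tinyint stands in for the byte column; adjust the types to your RDBMS):

CREATE TABLE cost_subjects
(
    id      tinyint PRIMARY KEY,
    subject varchar(50) NOT NULL  -- e.g. 'fabric', 'dyeing', later 'printing', ...
);

CREATE TABLE costs
(
    id             int PRIMARY KEY,
    fabric_type_id int NOT NULL REFERENCES fabric_type (id),
    cost           int NOT NULL,
    cost_subject   tinyint NOT NULL REFERENCES cost_subjects (id)
);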
It really depends on your requirements. Are there other columns that are unique only for the fabric_cost table? Are there other columns that are unique only for the dyeing_cost table? Meaning will your 2 tables grow independently?
If yes, approach 1 is better. Otherwise, approach 2 is better because you won't need to do CRUD on 2 separate tables (for easier maintenance).
Another approach would be:
id int
fabric_type_id int (foreign key to fabric_type)
fabric_cost float/double/decimal
dyeing_cost float/double/decimal
This third approach is for when you always have both costs. You might not want to use int for cost. Again, it depends on your requirements.
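For reference, that third approach as DDL (the table name fabric_costs is just a placeholder; decimal is used because float/double can introduce rounding errors with money):

CREATE TABLE fabric_costs
(
    id             int PRIMARY KEY,
    fabric_type_id int NOT NULL REFERENCES fabric_type (id),
    fabric_cost    decimal(10, 2) NOT NULL,
    dyeing_cost    decimal(10, 2) NOT NULL
);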
Related
Suppose I have a student table with id as primary key, which holds information about all students.
id | name
---+------
1 | aaron
2 | bob
In addition there is a table where id and tid form a composite key, which holds the scores for each test.
id | tid | score
---| --- | -----
Note: Different students take different tests, in different numbers, with no correlation between students. tid does not identify a specific test at all; it is a per-student test sequence number. For id=1 and id=2, tid=1 does not mean the same test.
There are two ways to generate tid. One is globally unique, increasing by 1 for each record inserted, e.g.
id | tid | score
-- | --- | -----
1 | 1 | 99
1 | 2 | 98
2 | 3 | 97
2 | 4 | 96
The other is unique within a specific id, so different ids can have the same tid value, for example
id | tid | score
-- | --- | -----
1 | 1 | 99
1 | 2 | 98
2 | 1 | 97
2 | 2 | 96
With the first way, a student with id=2 could roughly guess how many tests the whole school ran in the meantime, based on how his tid changes. Since each student's tid advances globally, this is something I don't want. Of course I could use non-repeating random numbers or a similar scheme, but I would prefer a more compact incrementing integer.
For the latter way, is there an efficient and simple way to implement it?
Option 1.
create table student(id integer PRIMARY KEY, name varchar);
create table test(tid integer PRIMARY KEY, name varchar, test_date date);
create table test_score(sid integer references student, tid integer references test, score integer, PRIMARY KEY(sid, tid));
UPDATE
Option 2.
Create a BEFORE INSERT trigger that uses a function that does roughly:
CREATE OR REPLACE FUNCTION tid_incr()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    max_ct integer;
BEGIN
    -- next tid is one past this student's current maximum;
    -- coalesce handles the student's very first test (max is NULL then)
    SELECT coalesce(max(tid), 0) INTO max_ct FROM score WHERE id = NEW.id;
    NEW.tid := max_ct + 1;
    RETURN NEW;
END;
$function$;
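The trigger itself would then look roughly like this (assuming the score table name used in the function body; on PostgreSQL versions before 11, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

CREATE TRIGGER score_tid_incr
    BEFORE INSERT ON score
    FOR EACH ROW
    EXECUTE FUNCTION tid_incr();

Note that max(tid) + 1 is not safe under concurrent inserts for the same student: two simultaneous transactions can compute the same tid, so you'd need locking, or you can rely on the primary key rejecting the duplicate and retry.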
The correct design is option 1, because tid should refer (i.e. be a foreign key) to the test (i.e. a collection of questions) being taken (whether the test is custom for each student or not).
If students can take the same test more than once, add a date column (or timestamp, if multiple attempts may be made on the same day) to distinguish the results of repeat attempts at the same test.
I have a PostgreSQL database which heavily relies on events from the outside; e.g. an administrator changing or adding some fields or records might trigger a change in the overall field structure of other tables.
There lies the problem, however, as sometimes the fields changed by the trigger function are primary key fields. There is a table which uses two foreign key ids as its primary key, as in the example below:
# | PK id1 | PK id2 | data |
0 | 1 | 1 | ab |
1 | 1 | 2 | cd |
2 | 1 | 3 | ef |
However, within one transaction (if I may call it such, since, in fact, it is a plpgsql function), the structure might be changed to:
# | PK id1 | PK id2 | data |
0 | 1 | 3 | ab |
1 | 1 | 2 | cd |
2 | 1 | 1 | ef |
Which, as you might have noticed, changed the 0th record's second primary key to 3, and the 2nd's to 1, which is the opposite of what they were before.
It is 100% certain that after the function has taken its effect there will be no collisions whatsoever, but I'm wondering, how can this be implemented?
I could, in fact, use a synthetic BIGSERIAL primary key, yet there is still a need for those two ids to be UNIQUE constrained, so it wouldn't do the trick, unfortunately.
You can declare a constraint as deferrable, for example a primary key:
CREATE TABLE elbat (id int,
                    nmuloc int,
                    PRIMARY KEY (id)
                      DEFERRABLE);
You can then use SET CONSTRAINTS in a transaction to set deferrable constraints as deferred. That means that they can be violated temporarily during the transaction but must be fulfilled at the transaction's COMMIT.
Let's assume we have some data in our example table:
INSERT INTO elbat (id,
                   nmuloc)
       VALUES (1,
               1),
              (2,
               2);
We can now switch the IDs like this:
BEGIN TRANSACTION;
SET CONSTRAINTS ALL DEFERRED;
UPDATE elbat
SET id = 2
WHERE nmuloc = 1;
SELECT *
FROM elbat;
UPDATE elbat
SET id = 1
WHERE nmuloc = 2;
COMMIT;
There's no error even though the IDs are both 2 after the first UPDATE.
More on that can be found in the documentation, e.g. in CREATE TABLE (or ALTER TABLE) and SET CONSTRAINTS.
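Since the question also mentions a synthetic BIGSERIAL key plus a UNIQUE constraint on the two ids: UNIQUE constraints can be declared DEFERRABLE in exactly the same way, so that route works too. A sketch (table and column names are placeholders):

CREATE TABLE elbat2 (pk bigserial PRIMARY KEY,
                     id1 int,
                     id2 int,
                     UNIQUE (id1, id2) DEFERRABLE);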
When keying a FACT table in a data warehouse, is it best to use the primary key from the referenced dimension table, or the unique key or identifier used by the business?
For example (see below illustration), assume you have two dimension tables "DimStores" and "DimCustomers" and one FACT table named "FactSales". Both of the dimension tables have an indexed primary key field that is an integer data type and is named "ID". They also have an indexed unique business key field that is an alpha-numeric text data type named "Number".
Typically you'd use the primary key of dimension tables as the foreign keys in the FACT table. However, I'm wondering if that is the best approach.
By using the primary key, in order to look up or do calculations on the facts in the FACT table, you'd likely always have to join on the primary key while using the business key for the lookup. The reason is that most users won't know the primary key value to look up in the FACT table; they will, however, likely know the business key. Therefore, to use that business key you'd have to do a join query to make the relationship.
Since the business key is indexed anyway, would it be better to just use that as the foreign key in the FACT table? That way you wouldn't have to do a join and just do your lookup or calculations directly?
I guess it boils down to whether join queries are that expensive? Imagine you're dealing with a billion record FACT table and dimensions with tens of millions of records.
Example tables:
DimStores:
+------------+-------------+-------------+
| StoreId | StoreNumber | StoreName |
+------------+-------------+-------------+
| 1 | S001 | Los Angeles |
| 2 | S002 | New York |
+------------+-------------+-------------+
DimCustomers:
+------------+----------------+--------------+
| CustomerId | CustomerNumber | CustomerName |
+------------+----------------+--------------+
| 1 | S001 | Michael |
| 2 | S002 | Kareem |
| 3 | S003 | Larry |
| 4 | S004 | Erving |
+------------+----------------+--------------+
FactSales:
+---------+------------+------------+
| StoreId | CustomerId | SaleAmount |
+---------+------------+------------+
| 1 | 1 | $400 |
| 1 | 2 | $300 |
| 2 | 3 | $200 |
| 2 | 4 | $100 |
+---------+------------+------------+
In the above, to get the total sales for the Los Angeles store I'd have to do this:
Select Sum(SaleAmount)
From FactSales FT
Inner Join DimStores D1 ON FT.StoreId = D1.StoreId
Where D1.StoreNumber = 'S001'
Had I used the "StoreNumber" and "CustomerNumber" fields as the foreign keys in the "FactSales" table instead, I wouldn't have had to do a join and could have directly done this:
Select Sum(SaleAmount)
From FactSales
Where StoreNumber = 'S001'
The reason you use artificial primary keys is to isolate the data warehouse from business decisions.
Your business grows. Now you have more than 1000 stores. The keys for the stores change. How do you handle this?
If the store key is spread throughout your data warehouse, this is a painful operation. If the store key is just an attribute on a dimension table, then this is easy.
I should also note that in many cases, the dimensions might be type 2 dimensions -- meaning that they change over time. For instance, customers can change their names, but you might want to know what their name was at a particular point in time.
And a third reason: artificial primary keys are usually integers, which are better for indexing than strings (particularly variable-length strings). The difference in performance is minor, but it is a reason to use the primary keys. And if the business keys are strings longer than an integer, the artificial keys are also more space-efficient.
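To illustrate the type 2 point, here is a hypothetical sketch of such a dimension (the validity-interval columns are one common convention, not the only one):

CREATE TABLE DimCustomers
(
    CustomerId     int PRIMARY KEY,       -- surrogate key, referenced by FactSales
    CustomerNumber varchar(10) NOT NULL,  -- business key, repeats across versions
    CustomerName   varchar(100) NOT NULL,
    ValidFrom      date NOT NULL,
    ValidTo        date NULL              -- NULL marks the current version
);

With type 2 history the same CustomerNumber appears once per version, so it could not serve as the fact table's foreign key at all, while each fact row still pins the exact version that was current at the time of the sale.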
I am building a database for a company that sells warranty packages. There are a number of attributes associated with each package that factor into the final price, for example the age of the vehicle. Prices for the age of the vehicle need to be set at the package level, but can also be set at the state/province or dealership level, in order to allow overriding the base package pricing.
I would like to know if the table example below is an acceptable practice, or whether building separate tables for state/province and dealership pricing is a better decision.
Id | int | NOT NULL
PackageId | int | NOT NULL
StateProvinceId | int | NULL
DealershipId | int | NULL
MinAge | int | NOT NULL
MaxAge | int | NOT NULL
Premium | money | NOT NULL
When a customer is purchasing a warranty, I need to prioritize the qualification by 1) dealership, 2) state/province, and 3) package.
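One way to express that priority in a single query, sketched under some assumptions (the table name PackagePricing and the parameter names are placeholders; TOP 1 is SQL Server syntax, suggested by the money type, so use LIMIT 1 elsewhere): treat a NULL dealership or state/province as "applies to all", filter to the qualifying rows, and let the most specific one win:

SELECT TOP 1 Premium
FROM PackagePricing
WHERE PackageId = @PackageId
  AND (DealershipId = @DealershipId OR DealershipId IS NULL)
  AND (StateProvinceId = @StateProvinceId OR StateProvinceId IS NULL)
  AND @VehicleAge BETWEEN MinAge AND MaxAge
ORDER BY CASE WHEN DealershipId IS NOT NULL THEN 1      -- 1) dealership override
              WHEN StateProvinceId IS NOT NULL THEN 2   -- 2) state/province override
              ELSE 3                                    -- 3) base package price
         END;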
I have the following:
1. A table "patients" where I store patients data.
2. A table "tests" where I store data of tests done to each patient.
Now the problem comes in because I have 2 types of tests, "tests_1" and "tests_2".
So for each test done to a particular patient I store the type and the id of that type of test:
CREATE TABLE IF NOT EXISTS patients
(
id_patient INTEGER PRIMARY KEY,
name_patient VARCHAR(30) NOT NULL,
sex_patient VARCHAR(6) NOT NULL,
date_patient DATE
);
INSERT INTO patients values
(1,'Joe', 'Male' ,'2000-01-23');
INSERT INTO patients values
(2,'Marge','Female','1950-11-25');
INSERT INTO patients values
(3,'Diana','Female','1985-08-13');
INSERT INTO patients values
(4,'Laura','Female','1984-12-29');
CREATE TABLE IF NOT EXISTS tests
(
id_test INTEGER PRIMARY KEY,
id_patient INTEGER,
type_test VARCHAR(15) NOT NULL,
id_type_test INTEGER,
date_test DATE,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests values
(1,4,'test_1',10,'2004-05-29');
INSERT INTO tests values
(2,4,'test_2',45,'2005-01-29');
INSERT INTO tests values
(3,4,'test_2',55,'2006-04-12');
CREATE TABLE IF NOT EXISTS tests_1
(
id_test_1 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
data4 REAL,
data5 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_1 values
(10,4,100.7,1.8,10.89,20.04,5.29);
CREATE TABLE IF NOT EXISTS tests_2
(
id_test_2 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_2 values
(45,4,10.07,18.9,1.8);
INSERT INTO tests_2 values
(55,4,17.6,1.8,18.89);
Now I think this approach is redundant, or at least not too good...
So I would like to improve queries like
select * from tests WHERE id_patient=4;
select * from tests_1 WHERE id_patient=4;
select * from tests_2 WHERE id_patient=4;
Is there a better approach?
In this example I have 1 test of type tests_1 and 2 tests of type tests_2 for patient with id=4.
Add a table testtype (id_test, name_test) and use it as an FK to the id_type_test field in the tests table. Do not create separate tables for tests_1 and tests_2.
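A sketch of that change, reusing the naming style from the question (the revised tests table drops the free-text type_test column in favour of the foreign key):

CREATE TABLE IF NOT EXISTS testtype
(
    id_test   INTEGER PRIMARY KEY,
    name_test VARCHAR(15) NOT NULL
);

CREATE TABLE IF NOT EXISTS tests
(
    id_test      INTEGER PRIMARY KEY,
    id_patient   INTEGER REFERENCES patients (id_patient),
    id_type_test INTEGER REFERENCES testtype (id_test),
    date_test    DATE
);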
It depends on the requirements.
For OLTP I would do something like the following:
STAFF:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | JOB_TITLE | ...
-------------------------------------------------------------
1 | harry | potter | 2001-01-01 | consultant | ...
2 | ron | weasley | 2001-02-01 | pathologist | ...
PATIENT:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | ...
-----------------------------------------------
1 | hermiony | granger | 2013-01-01 | ...
TEST_TYPE:
ID | CATEGORY | NAME | DESCRIPTION | ...
--------------------------------------------------------
1 | haematology | abg | arterial blood gasses | ...
REQUEST:
ID | TEST_TYPE_ID | PATIENT_ID | DATE_REQUESTED | REQUESTED_BY | ...
----------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 1 | ...
RESULT_TYPE:
ID | TEST_TYPE_ID | NAME | UNIT | ...
---------------------------------------
1 | 1 | co2 | kPa | ...
2 | 1 | o2 | kPa | ...
RESULT:
ID | REQUEST_ID | RESULT_TYPE_ID | DATE_RESULTED | RESULTED_BY | RESULT | ...
-------------------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 2 | 5 | ...
2 | 1 | 2 | 2013-01-02 | 2 | 5 | ...
A concern I have with the above is the unit of the test result; these can sometimes (not often) change. It may be better to place the unit in the result table.
Also consider breaking these out into the major test categories, as my understanding is they can be quite different; e.g. histopathology and x-rays are not resulted in the same way as haematology and microbiology are.
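As a condensed DDL sketch of the core of that design (types and lengths are guesses, and staff and patient are assumed to exist as drawn above):

CREATE TABLE test_type
(
    id          INTEGER PRIMARY KEY,
    category    VARCHAR(30) NOT NULL,
    name        VARCHAR(30) NOT NULL,
    description VARCHAR(200)
);

CREATE TABLE request
(
    id             INTEGER PRIMARY KEY,
    test_type_id   INTEGER NOT NULL REFERENCES test_type (id),
    patient_id     INTEGER NOT NULL REFERENCES patient (id),
    date_requested DATE NOT NULL,
    requested_by   INTEGER NOT NULL REFERENCES staff (id)
);

CREATE TABLE result_type
(
    id           INTEGER PRIMARY KEY,
    test_type_id INTEGER NOT NULL REFERENCES test_type (id),
    name         VARCHAR(30) NOT NULL,
    unit         VARCHAR(10)  -- or move the unit to result, per the concern above
);

CREATE TABLE result
(
    id             INTEGER PRIMARY KEY,
    request_id     INTEGER NOT NULL REFERENCES request (id),
    result_type_id INTEGER NOT NULL REFERENCES result_type (id),
    date_resulted  DATE,
    resulted_by    INTEGER REFERENCES staff (id),
    result         REAL
);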
For OLAP I would combine request and result into one table, adding derived columns such as REQUEST_TO_RESULT_MINS, and make a single dimension from RESULT_TYPE and TEST_TYPE, etc.
You can do this in a few ways. Without knowing all the different types of cases you need to deal with, the simplest would be 5 tables:
Patients (like you described it)
Tests (like you described it)
TestType (like Declan_K suggested)
TestResultCode
TestResults
TestResultCode describes each value that is stored for each test. TestResults is a pivoted table that can store any number of test results per test:
Create table TestResultCode
(
  idTestResultCode int
, Code varchar(10)
, Description varchar(200)
, DataType int -- 1 = real, 2 = varchar, 3 = int, etc.
);
Create Table TestResults
(
  idPatient int          -- FK to patients
, idTest int             -- FK to tests
, idTestType int         -- FK to the test type lookup
, idTestResultCode int   -- FK to TestResultCode
, ResultsI real          -- value when DataType = 1 (real)
, ResultsV varchar(100)  -- value when DataType = 2 (varchar)
, Resultsb int           -- value when DataType = 3 (int)
, Created datetime
);
So, basically, you can fit the results you wanted to put into the tables "tests_1" and "tests_2", and any other tests you can think of.
The application reading this table can load each test and all its values. Of course the application needs to know how to deal with each case, but you can store any type of test in this structure.
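For example, the tests_1 row (10, 4, 100.7, ...) from the question would be stored as one TestResults row per measured value, roughly like this (the code values are invented purely for illustration):

-- one code per value that test type 1 produces (DataType 1 = real):
Insert Into TestResultCode values (1, 'T1_DATA1', 'test_1, first measurement', 1);
-- the value itself goes into the column matching its data type (ResultsI for real):
Insert Into TestResults (idPatient, idTest, idTestType, idTestResultCode, ResultsI, Created)
values (4, 1, 1, 1, 100.7, CURRENT_TIMESTAMP);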