Database table design: How to use multiple tables to separate related data so that querying the result set is simpler or requires less code? - sql

I have three tables, as illustrated in the screenshots below:
The primary design objective is to separate the values in the TAX, INSURANCE and BENEFIT columns of the LICENSE_REQ table from their descriptions in the LICENSE_REQ_DESC table.
The query below returns the TAX, INSURANCE and BENEFIT columns next to their descriptions from the LICENSE_REQ_DESC table.
Query:
SELECT A.LICENSE_ID,
B.TAX, T.REQ_CODE_DESC,
B.INSURANCE,
I.REQ_CODE_DESC,
B.BENEFIT,
J.REQ_CODE_DESC
FROM BUSINESS A INNER JOIN LICENSE_REQ B ON A.LICENSE_ID = B.LICENSE_ID
LEFT OUTER JOIN LICENSE_REQ_DESC T ON B.TAX = T.REQ_CODE
LEFT OUTER JOIN LICENSE_REQ_DESC I ON B.INSURANCE = I.REQ_CODE
LEFT OUTER JOIN LICENSE_REQ_DESC J ON B.BENEFIT = J.REQ_CODE
Tables:
BUSINESS - Primary Key LICENSE_ID
LICENSE_REQ - Foreign Key LICENSE_ID
LICENSE_REQ_DESC - Primary Key SEQ_NBR
And here is the resultset screenshot:
Is there a more effective and efficient way to separate data from its metadata (descriptions) while still querying the desired result set efficiently?

I like your way.
Some developers might create a stored function like get_desc(req_code) and write the SQL as:
SELECT A.LICENSE_ID,
B.TAX,
get_desc(B.TAX) TAX_DESC,
B.INSURANCE,
get_desc(B.INSURANCE) INSURANCE_DESC,
B.BENEFIT,
get_desc(B.BENEFIT) BENEFIT_DESC
FROM BUSINESS A INNER JOIN LICENSE_REQ B ON A.LICENSE_ID = B.LICENSE_ID
But I much prefer the way you've got it. Keep it all in pure-SQL when you can!

There are pros and cons to the single lookup table method.
The primary pro is that all the translations are in one place, so you don't need to go searching hither and yon to find each translation.
The cons are that you end up with larger lookup tables, usually more complex join requirements (since a discriminator column is normally included in the join), and you are limited to having the exact same attributes available for every lookup.
Right now your scheme doesn't include a discriminator column to determine, from the contents of LICENSE_REQ_DESC, which code column a description goes with. This can be a problem if for some reason there are REQ_CODE value conflicts between the TAX, INSURANCE, and BENEFIT code columns.
An improved version of your LICENSE_REQ_DESC table would be to:
ALTER TABLE LICENSE_REQ_DESC ADD (REQ_CODE_SOURCE VARCHAR2(15) );
Populate the new column with appropriate values e.g. 'TAX', 'INSURANCE', and 'BENEFIT' then:
ALTER TABLE LICENSE_REQ_DESC MODIFY (REQ_CODE_SOURCE NOT NULL);
ALTER TABLE LICENSE_REQ_DESC ADD CONSTRAINT LICENSE_REQ_DESC_UK UNIQUE
(
REQ_CODE_SOURCE
, REQ_CODE
)
ENABLE;
And finally change your query to use the following joins:
LEFT OUTER JOIN LICENSE_REQ_DESC T ON T.REQ_CODE_SOURCE = 'TAX' AND B.TAX = T.REQ_CODE
LEFT OUTER JOIN LICENSE_REQ_DESC I ON I.REQ_CODE_SOURCE = 'INSURANCE' AND B.INSURANCE = I.REQ_CODE
LEFT OUTER JOIN LICENSE_REQ_DESC J ON J.REQ_CODE_SOURCE = 'BENEFIT' AND B.BENEFIT = J.REQ_CODE
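To see why the discriminator matters, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the original database purely as an assumption, the table is cut down to two code columns, and the sample rows ('A', 'Tax exempt', 'Fully insured') are made up for illustration. The same REQ_CODE value 'A' exists for both TAX and INSURANCE, and each join still picks up the right description.

```python
import sqlite3

def fetch_with_descriptions(conn):
    """Join each code column to the lookup row that belongs to its own source."""
    return conn.execute(
        """
        SELECT B.LICENSE_ID,
               B.TAX,       T.REQ_CODE_DESC,
               B.INSURANCE, I.REQ_CODE_DESC
        FROM LICENSE_REQ B
        LEFT OUTER JOIN LICENSE_REQ_DESC T
               ON T.REQ_CODE_SOURCE = 'TAX' AND B.TAX = T.REQ_CODE
        LEFT OUTER JOIN LICENSE_REQ_DESC I
               ON I.REQ_CODE_SOURCE = 'INSURANCE' AND B.INSURANCE = I.REQ_CODE
        ORDER BY B.LICENSE_ID
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LICENSE_REQ (LICENSE_ID INTEGER, TAX TEXT, INSURANCE TEXT);
CREATE TABLE LICENSE_REQ_DESC (
    REQ_CODE_SOURCE TEXT NOT NULL,
    REQ_CODE        TEXT NOT NULL,
    REQ_CODE_DESC   TEXT,
    UNIQUE (REQ_CODE_SOURCE, REQ_CODE)
);
-- Hypothetical sample data: code 'A' is reused by two different sources.
INSERT INTO LICENSE_REQ VALUES (1, 'A', 'A');
INSERT INTO LICENSE_REQ_DESC VALUES ('TAX', 'A', 'Tax exempt');
INSERT INTO LICENSE_REQ_DESC VALUES ('INSURANCE', 'A', 'Fully insured');
""")

rows = fetch_with_descriptions(conn)
print(rows)
```

Without the REQ_CODE_SOURCE condition, each join would match both lookup rows for 'A' and the result would contain duplicated, mislabeled rows.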

There is some interesting reading here (in the negative):
https://www.simple-talk.com/sql/t-sql-programming/look-up-tables-in-sql-/ (although not everyone agrees with Joe Celko he always goes into depth)
Something to consider is your foreign keys. You can enforce a foreign key to this table but you cannot enforce that a tax transaction record may only use a tax description record.
Foreign keys enforce logic and can assist in generating an efficient query plan. When you are creating foreign keys that don't quite tell the whole story, it appears to indicate a design issue.
Looking ahead: Could this lookup table end up being extended to store many other types of descriptions? Could the distribution of description types be heavily skewed in some way? When you have a table whose purpose (and therefore content and cardinality) drifts over time, it can cause 'random performance issues' when statistics get stale.

Related

Can we join two parts of two composite primary keys together?

I have two tables, both of which have a composite primary key:
OrderNr + CustNr
OrderNr + ItemNr
Can I join the two tables on OrderNr, which is part of the composite primary key in each table?
Yes, but you may find that rows from each table repeat, because every row in one table is paired with every row in the other table that shares the same OrderNr. Within a single OrderNr, this is effectively a Cartesian product.
Table A
OrderNr, CustNr
1,C1
1,C2
2,C1
2,C2
Table B
OrderNr,ItemNr
1,i1
1,i2
SELECT * FROM a JOIN b ON a.OrderNr = b.OrderNr
1,C1,1,i1
1,C1,1,i2
1,C2,1,i1
1,C2,1,i2
This happens because composite primary keys can contain repeated elements so long as the combination of elements is unique. Joining on only one part of the PK, when that part is an element that repeats (OrderNr 1 appears twice in each table, even though the CustNr and ItemNr make the rows unique), results in a multiplied result set: 2 rows from A with OrderNr 1, multiplied by 2 rows from B with OrderNr 1, gives 4 rows in total.
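The example tables can be reproduced in a small sketch with Python's sqlite3 module (using SQLite here is my assumption; the table and column names are the ones from the example). It shows the four joined rows for OrderNr 1, while the OrderNr 2 rows of A drop out because B has no match for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (OrderNr INTEGER, CustNr TEXT, PRIMARY KEY (OrderNr, CustNr));
CREATE TABLE b (OrderNr INTEGER, ItemNr TEXT, PRIMARY KEY (OrderNr, ItemNr));
INSERT INTO a VALUES (1, 'C1'), (1, 'C2'), (2, 'C1'), (2, 'C2');
INSERT INTO b VALUES (1, 'i1'), (1, 'i2');
""")

# Join on only one part of each composite key.
joined = conn.execute("""
    SELECT a.OrderNr, a.CustNr, b.ItemNr
    FROM a JOIN b ON a.OrderNr = b.OrderNr
    ORDER BY a.CustNr, b.ItemNr
""").fetchall()

for row in joined:
    print(row)
```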
Does it work with the normal/natural join too?
Normal joins (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER) join the rows from two tables or subqueries when the ON condition is true. The clause in the ON is like a WHERE clause, yes, in that it represents a statement that is true or false (a predicate). If the statement is true, the rows are joined. It doesn't even have to be about data from the tables: you can say a JOIN b ON 1=1 and every row from A will be joined to every row from B. As commented, primary keys aren't involved in JOINs at all; primary keys are usually backed by indexes, and those indexes may be used to speed up a join, but they aren't vital to it.
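A quick way to convince yourself that the ON clause is just a predicate is to run the ON 1=1 case. This is a small sketch with Python's sqlite3 module; the tables a and b and their contents are hypothetical, and a CROSS JOIN is run alongside it for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (y INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (10), (20);
""")

# The predicate is always true, so every row of a pairs with every row of b.
on_true = conn.execute(
    "SELECT x, y FROM a JOIN b ON 1=1 ORDER BY x, y"
).fetchall()

# Same pairs, written as an explicit cross join.
cross = conn.execute(
    "SELECT x, y FROM a CROSS JOIN b ORDER BY x, y"
).fetchall()

print(len(on_true), on_true == cross)
```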
Other joins (CROSS, NATURAL, ...) exist. A CROSS join is like the 1=1 example above: you don't specify an ON, and every row from A is joined to every row from B, by design. NATURAL JOIN is one to avoid, IMHO: the database looks for column names that are the same in both tables and joins on them. The problem is that things can stop working in future if someone adds a column with the same name but different content/meaning to the two tables. No serious production system I've ever come across has used NATURAL join. You can save some typing if your join columns are named the same by writing USING: SELECT * FROM a JOIN b USING (col), where both A and B have a column called col. USING has some advantages, especially over NATURAL join, in that it doesn't fall apart if another column with the same name is added to both tables, but it has some drawbacks too: you can't say USING(col) AND .... Most people just stick to writing ON, and forget USING.
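The NATURAL join failure mode is easy to reproduce. Below is a sketch with Python's sqlite3 module (SQLite is an assumption; the tables, the note column, and the sample values are all hypothetical). It counts rows for NATURAL JOIN and for USING (col) before and after a same-named but unrelated column is added to both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (col INTEGER, a_val TEXT);
CREATE TABLE b (col INTEGER, b_val TEXT);
INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
""")

def count(sql):
    """Return the single COUNT(*) value produced by the query."""
    return conn.execute(sql).fetchone()[0]

natural_before = count("SELECT COUNT(*) FROM a NATURAL JOIN b")
using_before = count("SELECT COUNT(*) FROM a JOIN b USING (col)")

# Later, someone adds a column with the same name but unrelated content.
conn.executescript("""
ALTER TABLE a ADD COLUMN note TEXT;
ALTER TABLE b ADD COLUMN note TEXT;
UPDATE a SET note = 'from a';
UPDATE b SET note = 'from b';
""")

# NATURAL JOIN now silently joins on both col and note.
natural_after = count("SELECT COUNT(*) FROM a NATURAL JOIN b")
using_after = count("SELECT COUNT(*) FROM a JOIN b USING (col)")

print(natural_before, natural_after, using_before, using_after)
```

The query text never changed, yet the NATURAL join result did, which is the main argument for spelling out the join columns.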
NATURAL join also does NOT use primary keys. There is no join style (that I know of) that will look at a foreign key relationship between two tables and use that as the join condition
And then is it true that if I try to join Primary key and foreign key of two tables, that it works like a "where" command?
It's hard to understand what you mean by this, but if you mean A JOIN B ON A.primarykey = B.primary_key_in_a, then it'll work out, sure. If you mean A CROSS JOIN B WHERE A.primarykey = B.primary_key_in_a, then that will also work, but it's something I'd definitely avoid. Hardly anyone writes SQL this way; the general preference is to stop using WHERE for join conditions and put them in the ON instead (you do still see people writing the old-school FROM a,b WHERE a.col=b.col, but that is also heavily discouraged).
So, in summary:
SELECT * FROM a JOIN b ON a.col1 = b.col2
Joins all rows from a with all rows from b, where the values in col1 equal the values in col2. No primary keys are needed for any of this to work out
You can join any tables if there is a logical relationship between them:
select *
from t1
JOIN t2
on t1.OrderNr = t2.OrderNr
However, if OrderNr by itself is not unique in each table, your data will be multiplied.
Let's say that you have 2 rows with OrderNr 1 in t1 and 5 rows with OrderNr 1 in t2; when you join them, you will get 2 x 5 = 10 records.
Your data model is similar to a problem commonly referred to as a "fan trap". (If you had an "order" table keyed solely by OrderNr, it would be exactly a fan trap.)
Either way, it's the same problem -- the relationship between Order/Customers and Order/Items is ambiguous. You cannot tell which customers ordered which items.
It is technically possible to join these tables -- you can join on any columns regardless of whether they are key columns or not. The problem is that your results will probably not make sense, unless you have more conditions and other tables that you are not telling us about.
For example, a simple join just on t1.OrderNr = t2.OrderNr will return rows indicating every customer related to the order has ordered every item related to the order. If that is what you want, you have no problem here.

SQL JOIN from other JOIN

So, this is probably gonna sound like a weird or dumb question.
I have this application that I'm writing, which consists of four tables. Two are 'connected' to one main table, and one last one is connected to one of those two tables. I've included a database diagram to give you a better idea of what I mean.
Now, my goal is to get 'Bedrijfsnaam' from the 'Bedrijven' table into the 'Samenwerkingen' table. Problem is: I can't add more than two foreign keys, so I was assuming that I would have to create a FK in the 'Contactpersonen' table and pick it from the 'Bedrijven' table. It would basically mean I'd have a JOIN from the 'Contactpersonen' table to my 'Bedrijven' table, and then the 'Samenwerkingen' table would have a JOIN to the 'Contactpersonen' table and access the column from 'Bedrijven'.
Does that make any sense? Hope it does, because I could really use some help making this possible. xD
Since Samenwerkingen has a foreign key to Contactpersonen which is itself related to Bedrijven, you don't need an additional constraint: your data integrity is guaranteed and you can join the two tables.
Your query should look like this:
select s.*, b.* from Samenwerkingen s
inner join Contactpersonen c on s.Contactpersonen = c.Contactpersonen
inner join Bedrijven b on c.Bedrijfsnaam = b.Bedrijfsnaam
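Here is the same chained join as a runnable sketch with Python's sqlite3 module. The diagram from the question is not available, so every column except Bedrijfsnaam (the id columns, Naam, and the sample rows) is a hypothetical stand-in; only the shape of the join is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns: the real ones are in the question's diagram.
CREATE TABLE Bedrijven (
    BedrijfId    INTEGER PRIMARY KEY,
    Bedrijfsnaam TEXT
);
CREATE TABLE Contactpersonen (
    ContactpersoonId INTEGER PRIMARY KEY,
    Naam             TEXT,
    BedrijfId        INTEGER REFERENCES Bedrijven(BedrijfId)
);
CREATE TABLE Samenwerkingen (
    SamenwerkingId   INTEGER PRIMARY KEY,
    ContactpersoonId INTEGER REFERENCES Contactpersonen(ContactpersoonId)
);
INSERT INTO Bedrijven VALUES (1, 'Acme BV');
INSERT INTO Contactpersonen VALUES (10, 'Jan', 1);
INSERT INTO Samenwerkingen VALUES (100, 10);
""")

# Samenwerkingen reaches Bedrijven through Contactpersonen; no extra FK needed.
rows = conn.execute("""
    SELECT s.SamenwerkingId, c.Naam, b.Bedrijfsnaam
    FROM Samenwerkingen s
    INNER JOIN Contactpersonen c ON s.ContactpersoonId = c.ContactpersoonId
    INNER JOIN Bedrijven b ON c.BedrijfId = b.BedrijfId
""").fetchall()

print(rows)
```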

MS SQL Server 2014 | "Recursive" delete through a star schema

Good day all,
I'll try and be as succinct as possible.
I have a star schema with a fact table in the centre, surrounded by a set of dimension tables. Each dimension table snowflakes into further dimension tables.
Pseudo-visual example:
Super_FAC >--FK-- First_DIM --NK-- Snowflake_DIM
(>- indicates an n-to-1 relationship, -- indicates a 1-to-1 relationship, FK indicates a foreign key and NK indicates a natural key)
What I am trying to achieve: I'm trying to find an elegant solution to the problem by recursively deleting records from the fact table, up the chain into the first dimension table and then into the snowflake dimension table, with the minimal amount of scripting. (No triggers allowed)
What I have done:
DELETE Super_FAC
FROM Snowflake_DIM AS SDIM INNER JOIN
First_DIM AS FDIM ON SDIM.NK = FDIM.NK INNER JOIN -- For brevity, key is more complex
Super_FAC AS SFAC ON FDIM.FK = SFAC.FK
WHERE SDIM.ModifiedDate >= DeltaDate
DELETE First_DIM
FROM Snowflake_DIM AS SDIM INNER JOIN
First_DIM AS FDIM ON SDIM.NK = FDIM.NK -- For brevity, key is more complex
WHERE SDIM.ModifiedDate >= DeltaDate
DELETE Snowflake_DIM
FROM Snowflake_DIM AS SDIM
WHERE SDIM.ModifiedDate >= DeltaDate
Is there a more elegant solution? Using something like a CTE perhaps?
Thanks in advance.
M

Access: Updatable join query with 2 primary key fields that are both also foreign keys

In MS Access, I am trying to implement a many-to-many table that will store 2-way relationships, similar to Association between two entries in SQL table. This table stores info such as "Person A and Person B are coworkers" "C and D are friends", etc. The table is like this:
ConstitRelationships
LeftId (number, primary key, foreign key to Constituents.ConstitId)
RightId (number, primary key, foreign key to Constituents.ConstitId)
Description (text)
Note that the primary key is a composite of the two Id fields.
Also the table has constraints:
[LeftId]<>[RightId] AND [LeftId]<[RightId]
The table is working ok in my Access project, except that I cannot figure out how to make an updateable query that I want to use as a datasheet subform so users can easily add/delete records and change the descriptions. I currently have a non-updatable query:
SELECT Constituents.ConstituentId, Constituents.FirstName,
Constituents.MiddleName, Constituents.LastName,
ConstitRelationships.Description, ConstitRelationships.LeftId,
ConstitRelationships.RightId
FROM ConstitRelationships INNER JOIN Constituents ON
(Constituents.ConstituentId =
ConstitRelationships.RightId) OR (Constituents.ConstituentId =
ConstitRelationships.LeftId);
If I ignore the possibility that the constituentId I want is in the leftId column, I can do this, which is updatable. So the OR condition in the inner join above is what's messing it up.
SELECT Constituents.ConstituentId, Constituents.FirstName,
Constituents.MiddleName, Constituents.LastName,
ConstitRelationships.Description, ConstitRelationships.LeftId,
ConstitRelationships.RightId
FROM ConstitRelationships INNER JOIN Constituents ON
(Constituents.ConstituentId =
ConstitRelationships.RightId) ;
I also tried this wacky iif thing to collapse the two LeftId and RightId fields into FriendId, but it was not updateable either.
SELECT Constituents.ConstituentId, Constituents.FirstName,
Constituents.MiddleName,
Constituents.LastName, subQ.Description
FROM Constituents
INNER JOIN (
SELECT Description, Iif([Forms]![Constituents Form]![ConstituentId] <>
ConstitRelationships.LeftId, ConstitRelationships.LeftId,
ConstitRelationships.RightId) AS FriendId
FROM ConstitRelationships
WHERE ([Forms]![Constituents Form]![ConstituentId] =
ConstitRelationships.RightId)
OR ([Forms]![Constituents Form]![ConstituentId] =
ConstitRelationships.LeftId)
) subQ
ON (subQ.FriendId = Constituents.ConstituentId)
;
How can I make an updatable query on ConstitRelationships, including a JOIN with the Constituent.FirstName MiddleName LastName fields?
I am afraid that is not possible. Because your query uses joins across several tables, it is not updatable. There is no way around this.
Here is some detailed information about the topic: http://www.fmsinc.com/Microsoftaccess/query/non-updateable/index.html
As mentioned in the linked article, one possible solution, and in my opinion the best one for you, would be a temporary table. It is a lot of work compared to the easy "bind-form-to-a-query" approach, but it works best.
The alternative would be to alter your data schema so that you do not need joins. But then denormalized data and duplicates would run rampant, which makes the temporary table the more favorable choice.

Design : multiple visits per patient

Above is my schema. What you can't see in tblPatientVisits is the foreign key from tblPatient, which is patientid.
tblPatient contains a distinct copy of each patient in the dataset as well as their gender. tblPatientVisits contains their demographic information, where they lived at time of admission and which hospital they went to. I chose to put that information into a separate table because it changes throughout the data (a person can move from one visit to the next and go to a different hospital).
I don't get any strange numbers with my queries until I add tblPatientVisits. There are just under one million claims in tblClaims, but when I add tblPatientVisits so I can check where that person was from, it returns over a million rows. I think this is due to the fact that in tblPatientVisits the same patientID shows up more than once (because they had different admission/discharge dates).
For the life of me I can't see where this is incorrect design, nor do I know how to rectify it beyond doing one query with count(tblPatientVisits.PatientID)=1 and then a union with count(tblPatientVisits.patientid)>1.
Any insight into this type of design, or how I might more elegantly find a way to get the claimType from tblClaims to give me the correct number of rows when I associate a claim ID with a patientID?
EDIT: The biggest problem I'm having is the fact that if I include the admissionDate, dischargeDate or the patientState in the tblPatient table I can't use the patientID as a primary key.
It should be noted that tblClaims are NOT necessarily related to tblPatientVisits.admissionDate, tblPatientVisits.dischargeDate.
EDIT: sample queries to show that when tblPatientVisits is added, more rows are returned than claims
SELECT tblclaims.id, tblClaims.claimType
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID INNER JOIN
tblPatientVisits ON tblPatient.patientID = tblPatientVisits.patientID
more than one million query rows returned
SELECT tblClaims.id, tblPatient.patientID
FROM tblClaims INNER JOIN
tblPatientClaims ON tblClaims.id = tblPatientClaims.id INNER JOIN
tblPatient ON tblPatientClaims.patientid = tblPatient.patientID
less than one million query rows returned
I think this is crying for a better design. I really think that a visit should be associated with a claim, and that a claim can only be associated with a single patient, so I think the design should be (and eliminating the needless tbl prefix, which is just clutter):
CREATE TABLE dbo.Patients
(
PatientID INT PRIMARY KEY
-- , ... other columns ...
);
CREATE TABLE dbo.Claims
(
ClaimID INT PRIMARY KEY,
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID)
-- , ... other columns ...
);
CREATE TABLE dbo.PatientVisits
(
PatientID INT NOT NULL FOREIGN KEY
REFERENCES dbo.Patients(PatientID),
ClaimID INT NULL FOREIGN KEY
REFERENCES dbo.Claims(ClaimID),
VisitDate DATE
-- , ... other columns ...
-- not convinced on this key; note SQL Server rejects a PRIMARY KEY that includes a nullable column such as ClaimID
, PRIMARY KEY (PatientID, ClaimID, VisitDate)
);
There is some redundant information here, but it's not clear from your model whether a patient can have a visit that is not associated with a specific claim, or even whether you know that a visit belongs to a specific claim (this seems like crucial information given the type of query you're after).
In any case, given your current model, one query you might try is:
SELECT c.id, c.claimType
FROM dbo.tblClaims AS c
INNER JOIN dbo.tblPatientClaims AS pc
ON c.id = pc.id
INNER JOIN dbo.tblPatient AS p
ON pc.patientid = p.patientID
-- WHERE EXISTS tells SQL Server you don't care how many
-- visits took place, as long as there was at least one:
WHERE EXISTS (SELECT 1 FROM dbo.tblPatientVisits AS pv
WHERE pv.patientID = p.patientID);
This will still return one row for every patient / claim combination, but it will no longer return an extra row for every additional visit, because EXISTS only checks that at least one visit exists. Again, it really feels like the design isn't right here. You should also get in the habit of using table aliases: they make your query much easier to read, especially if you insist on the messy tbl prefix. You should also always use the dbo (or whatever schema you use) prefix when creating and referencing objects.
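The difference between joining to tblPatientVisits and only testing for a visit with EXISTS can be seen on a tiny dataset. This is a sketch with Python's sqlite3 module; SQLite is an assumption (so the dbo prefix is dropped), the tables are reduced to the columns the queries touch, and the sample rows are made up: one patient, one claim, two visits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblPatient (patientID INTEGER PRIMARY KEY);
CREATE TABLE tblClaims (id INTEGER PRIMARY KEY, claimType TEXT);
CREATE TABLE tblPatientClaims (id INTEGER, patientid INTEGER);
CREATE TABLE tblPatientVisits (patientID INTEGER, admissionDate TEXT);
-- Hypothetical data: one patient with one claim and two visits.
INSERT INTO tblPatient VALUES (1);
INSERT INTO tblClaims VALUES (500, 'inpatient');
INSERT INTO tblPatientClaims VALUES (500, 1);
INSERT INTO tblPatientVisits VALUES (1, '2012-01-05'), (1, '2012-03-17');
""")

# Shared FROM clause for both variants.
BASE = """
    FROM tblClaims AS c
    INNER JOIN tblPatientClaims AS pc ON c.id = pc.id
    INNER JOIN tblPatient AS p ON pc.patientid = p.patientID
"""

# Joining to visits repeats the claim once per visit.
join_rows = conn.execute(
    "SELECT c.id, c.claimType" + BASE +
    "INNER JOIN tblPatientVisits AS pv ON p.patientID = pv.patientID"
).fetchall()

# EXISTS only asks whether at least one visit is present.
exists_rows = conn.execute(
    "SELECT c.id, c.claimType" + BASE +
    "WHERE EXISTS (SELECT 1 FROM tblPatientVisits AS pv"
    " WHERE pv.patientID = p.patientID)"
).fetchall()

print(len(join_rows), len(exists_rows))
```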
I'm not sure I understand the concept of a claim but I suspect you want to remove the link table between claims and patient and instead make the association between patient visit and a claim.
Would that work out better for you?