Multiple Columns but only one join - sql

I have searched through quite a few answers on this but I can't seem to find anything specific to this situation. Apologies if I have overlooked something.
We have a calendar system we are re-coding to log events, holidays, absent days, MOTs, inspections, etc.
Initially our calendar focused around people, so we had a join from the person table to the calendarDay, but now other tables from the database are required to have days assigned to them.
My plan was either to have a junction table for each of the three tables that need joining (Person, Job and Vehicle), or to have a single Assignment junction table with Person_ID, Job_ID and Vehicle_ID foreign key columns:
CalendarDayAssignment
ID CalendarDayID PersonID JobID VehicleID
so this table would contain a CalendarDay_ID and either a Person_ID, Job_ID or Vehicle_ID depending on which table has a day assigned to it, leaving two columns having a NULL. (These could be moved directly to the CalendarDay table actually, as days will not be shared)
My preference would be the latter, as I would only need one table rather than three (potentially more if more objects need days assigned to them), and referential integrity stays intact.
I know this may be subjective, but is this a reasonable way of accomplishing this? It seems easier to include more objects should the time come.
Just as a note to the above, we will be pulling different data from the Person, Job and Vehicle tables, so I can't see how to implement a one-size-fits-all solution that doesn't end up with redundant data from a coding POV. The junction table will likely have only 10,000 rows added per year.
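As a rough illustration of the single-table option described above, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows and the CHECK constraint are assumptions added for illustration (the question only lists the column names); the CHECK is one possible way to make sure exactly one of the three foreign keys is filled in per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Person      (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Job         (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Vehicle     (ID INTEGER PRIMARY KEY, Registration TEXT);
CREATE TABLE CalendarDay (ID INTEGER PRIMARY KEY, Day TEXT NOT NULL);

CREATE TABLE CalendarDayAssignment (
    ID            INTEGER PRIMARY KEY,
    CalendarDayID INTEGER NOT NULL REFERENCES CalendarDay(ID),
    PersonID      INTEGER REFERENCES Person(ID),
    JobID         INTEGER REFERENCES Job(ID),
    VehicleID     INTEGER REFERENCES Vehicle(ID),
    -- assumption: exactly one owner column may be non-NULL per row
    CHECK ((PersonID IS NOT NULL) + (JobID IS NOT NULL) + (VehicleID IS NOT NULL) = 1)
);

-- hypothetical sample data
INSERT INTO Person      VALUES (1, 'Alice');
INSERT INTO Vehicle     VALUES (1, 'HYPO-001');
INSERT INTO CalendarDay VALUES (1, '2024-03-01'), (2, '2024-03-02');
""")

def try_insert(sql):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

person_day = try_insert(
    "INSERT INTO CalendarDayAssignment (CalendarDayID, PersonID) VALUES (1, 1)")
vehicle_day = try_insert(
    "INSERT INTO CalendarDayAssignment (CalendarDayID, VehicleID) VALUES (2, 1)")
# Two owners on one row, or no owner at all, violate the CHECK constraint.
both_owners = try_insert(
    "INSERT INTO CalendarDayAssignment (CalendarDayID, PersonID, VehicleID) VALUES (1, 1, 1)")
no_owner = try_insert(
    "INSERT INTO CalendarDayAssignment (CalendarDayID) VALUES (1)")

row_count = conn.execute("SELECT COUNT(*) FROM CalendarDayAssignment").fetchone()[0]
```

The trade-off is that every new kind of owner means one more nullable column and an updated CHECK, whereas separate junction tables would each stay free of NULLs.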

Related

Sql query to delete duplicate rows

I have three tables.
Diagnose, Patient and PatientDiagnose
The tables look like this
Diagnose:
uuid,text,date
Patient:
uuid,name
PatientDiagnose:
patientuuid,diagnoseuuid
One patient can of course have multiple diagnoses, and two patients can have the same diagnose, but the two diagnoses are represented uniquely in Diagnose with different uuids. The two patients are therefore represented in PatientDiagnose by their patient uuids, each paired with its own unique diagnose uuid.
Now I would like to fix something in my DB: I want to delete the diagnoses that are considered duplicates for a patient. Diagnoses are duplicates if they belong to the same patient and have the same text within the same year (use of a year function on date?). Exactly one of those diagnoses should be left intact.
I would like to remove those duplicates since I only want one diagnose per patient with the same text per year.
How can I do that in SQL?
Tommy
You say that a diagnose shall refer to exactly one patient. Your database, however, doesn't guarantee this, so you should fix that issue first. That would leave you with only two tables:
Patient: patientuuid, name
Diagnose: diagnoseuuid, text, date, patientuuid
Once you've converted your tables thus, you can easily do the cleanup:
delete from diagnose
where exists
(
  select *
  from diagnose other
  where other.diagnoseuuid < diagnose.diagnoseuuid
    and other.text = diagnose.text
    and year(other.date) = year(diagnose.date)
    and other.patientuuid = diagnose.patientuuid
);
You haven't mentioned which DBMS you are using. It may not feature the YEAR function. In that case try EXTRACT(YEAR FROM date) or look up date functions in your manual.
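To make the cleanup concrete, here is a small runnable sketch of the same delete against an in-memory SQLite database, driven from Python. It assumes the two-table layout suggested above and ISO-formatted date strings; SQLite has no YEAR function, so strftime('%Y', ...) stands in for it. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient  (patientuuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE diagnose (
    diagnoseuuid TEXT PRIMARY KEY,
    text         TEXT,
    date         TEXT,  -- assumption: ISO 'YYYY-MM-DD' strings
    patientuuid  TEXT REFERENCES patient(patientuuid)
);
INSERT INTO patient VALUES ('p1', 'Ann'), ('p2', 'Bob');
INSERT INTO diagnose VALUES
    ('d1', 'flu',  '2020-01-10', 'p1'),
    ('d2', 'flu',  '2020-11-02', 'p1'),  -- same patient, text and year as d1
    ('d3', 'flu',  '2021-02-01', 'p1'),  -- different year
    ('d4', 'flu',  '2020-05-05', 'p2'),  -- different patient
    ('d5', 'cold', '2020-06-06', 'p1');  -- different text
""")

# Delete every diagnose for which an "earlier" twin (smaller uuid) exists.
# The row with the smallest uuid in each duplicate group is the one that survives.
conn.execute("""
DELETE FROM diagnose
WHERE EXISTS (
    SELECT 1
    FROM diagnose AS other
    WHERE other.diagnoseuuid < diagnose.diagnoseuuid
      AND other.text = diagnose.text
      AND strftime('%Y', other.date) = strftime('%Y', diagnose.date)
      AND other.patientuuid = diagnose.patientuuid
)
""")

remaining = [row[0] for row in
             conn.execute("SELECT diagnoseuuid FROM diagnose ORDER BY diagnoseuuid")]
```

Only d2 is removed here, because it is the only row with a smaller-uuid twin for the same patient, text and year.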

I need help counting char occurrences in a row with SQL (using Firebird server)

I have a table where I have these fields:
id(primary key, auto increment)
car registration number
car model
garage id
and 31 fields in each row, one for each day of the month.
Each of these fields holds one or two characters representing the car's status on that date. I need a query that gets the number of each possible status; the field for any day can have the values D, I, R, TA, RZ, BV and LR.
For each row, I need to count how many times each value occurs in that row.
Like how many I, how many D and so on, for every row in the table.
What would the best approach be here? Also, maybe there is a better way than having a field in the table for each day, because that obviously makes over 30 fields.
There is a better way. You should structure the data so you have another table, with rows such as:
CarId
Date
Status
Then your query would simply be:
select status, count(*)
from CarStatuses
where date >= @month_start and date < @month_end
group by status;
For your data model, this is much harder to deal with. You can do something like this:
select status, count(*)
from ((select status_01 as status
       from t
      ) union all
      (select status_02
       from t
      ) union all
      . . .
      (select status_31
       from t
      )
     ) s
group by status;
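For comparison, here is a minimal sketch of the normalized layout in an in-memory SQLite database, driven from Python. The CarStatuses table and its three columns come from the answer above; the primary key, the text date format and the sample rows are assumptions. Grouping by CarId as well as Status gives the per-car counts the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarStatuses (
    CarId  INTEGER NOT NULL,
    Date   TEXT    NOT NULL,  -- assumption: ISO 'YYYY-MM-DD' strings
    Status TEXT    NOT NULL,
    PRIMARY KEY (CarId, Date) -- one status per car per day
);
-- hypothetical sample data
INSERT INTO CarStatuses VALUES
    (1, '2024-03-01', 'D'),
    (1, '2024-03-02', 'D'),
    (1, '2024-03-03', 'TA'),
    (2, '2024-03-01', 'I'),
    (2, '2024-03-02', 'D'),
    (2, '2024-04-01', 'R');   -- outside the month queried below
""")

def status_counts(month_start, month_end):
    """Count each status per car for dates in [month_start, month_end)."""
    return conn.execute("""
        SELECT CarId, Status, COUNT(*)
        FROM CarStatuses
        WHERE Date >= ? AND Date < ?
        GROUP BY CarId, Status
        ORDER BY CarId, Status
    """, (month_start, month_end)).fetchall()

march = status_counts("2024-03-01", "2024-04-01")
```

Adding a new status value later needs no schema change here, only new rows.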
You seem to need to start with the most basic tutorials about relational databases and SQL design. Classic works like Martin Gruber's "Understanding SQL" may help, or others. At the moment you are missing the basics.
A few hints.
Documents that you print for the user or receive from the user do not represent your internal data structures. They are created and parsed for that very purpose: the machine-to-human interface. Internally, your program should structure the data for ease of storing and processing.
You have to add a "dictionary table" for the statuses.
ID / abbreviation / human-readable description
You may have a "business rule" that from the "R" status you can transition to either the "D" status or the "BV" status, but not to any other. In other words, you had better draft the directed graph of possible status transitions. You would keep it in extra columns of that dictionary table or in one more specialized helper table: a dictionary of transitions for the dictionary of possible statuses.
Your paper form combines both totals and per-day details in the same row. That is easy for a human to look at, but for a computer it in a sense violates the single responsibility principle. A row should be responsible either for a primary record or for a derived total. You had better have two tables: one for the primary day-by-day records and another for the per-month totals.
A bonus would be that when you change values in the primary data table, you may ask the server to automatically recalculate the corresponding month totals. Read about SQL triggers.
Your triggers may also check whether the new state is a proper transition from the previous day's state, as described in the "business rules". They might also have to check that there are no gaps between days. If there is a record for "march 03" and a new record for "march 05" is inserted, then a record for "march 04" should exist, or the server would prohibit adding such a row. Well, maybe not; that depends on your business processes. The general idea is that the server should reject any data that is not valid, as far as the server can know it.
Your per-date and per-month tables should have proper UNIQUE CONSTRAINTs prohibiting duplicate rows. It also means the former should have a DATE-type column, and the latter should either have month and year INTEGER-type columns or have a DATE-type column whose day part is always "1"; you would want a CHECK CONSTRAINT for that.
If your company has some registry of cars (and it probably does; it does not look like those cars were driven in by random one-time customers driving by), you have to introduce a dictionary table of cars. Integer ID (PK), registration plate, engine factory number, vagon factory number, colour and whatever else.
The per-month totals table would not have many columns per every status. It would instead have a special row for every status! The structure would probably be like that: Month / Year / ID of car in the registry / ID of status in the dictionary / count. All columns would be integer type (some may be SmallInt or BigInt, but that is minor nuancing). All the columns together (without count column) should constitute a UNIQUE CONSTRAINT or even better a "compound" Primary Key. Adding a special dedicated PK column here in the totaling table seems redundant to me.
Consequently, your per-day and per-month tables would not have literal (textual and immediate) data for status and car id. Instead they would have integer IDs referencing proper records in the corresponding cars dictionary and status dictionary tables. That you would code as FOREIGN KEY.
Remember the rule of thumb: it is easy to add/delete a row to any table but quite hard to add/delete a column.
With a column-oriented design like yours, what would happen if next year the boss introduced some more statuses? You would have to redesign the table, change the program in many places, and so on.
With the row-oriented design you would just have to add one row to the statuses dictionary and maybe a few rows to the transition rules dictionary, and the rest would work without any change.
That way you would not
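The hints above could be sketched roughly as follows, using an in-memory SQLite database from Python. All table and column names (Status, Car, CarDayStatus, CarMonthTotal, DayCount) are hypothetical, the status descriptions are placeholders, and the transition-rule table and triggers are left out to keep the sketch short. Firebird syntax would differ in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Status (              -- dictionary of statuses
    ID           INTEGER PRIMARY KEY,
    Abbreviation TEXT NOT NULL UNIQUE,
    Description  TEXT NOT NULL
);
CREATE TABLE Car (                 -- dictionary (registry) of cars
    ID                INTEGER PRIMARY KEY,
    RegistrationPlate TEXT NOT NULL UNIQUE
);
CREATE TABLE CarDayStatus (        -- primary day-by-day records
    CarID    INTEGER NOT NULL REFERENCES Car(ID),
    Day      TEXT    NOT NULL,     -- assumption: ISO 'YYYY-MM-DD' strings
    StatusID INTEGER NOT NULL REFERENCES Status(ID),
    UNIQUE (CarID, Day)
);
CREATE TABLE CarMonthTotal (       -- derived per-month totals, one row per status
    Year     INTEGER NOT NULL,
    Month    INTEGER NOT NULL CHECK (Month BETWEEN 1 AND 12),
    CarID    INTEGER NOT NULL REFERENCES Car(ID),
    StatusID INTEGER NOT NULL REFERENCES Status(ID),
    DayCount INTEGER NOT NULL,
    PRIMARY KEY (Year, Month, CarID, StatusID)
);
INSERT INTO Status VALUES (1, 'D', 'placeholder text for D'),
                          (2, 'TA', 'placeholder text for TA');
INSERT INTO Car VALUES (1, 'HYPO-001');
INSERT INTO CarDayStatus VALUES (1, '2024-03-01', 1),
                                (1, '2024-03-02', 1),
                                (1, '2024-03-03', 2);
""")

def rejected(sql):
    """Return True if the server refused the row because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A second row for the same car and day violates the UNIQUE constraint.
duplicate_day = rejected("INSERT INTO CarDayStatus VALUES (1, '2024-03-01', 2)")
# A status that is not in the dictionary violates the FOREIGN KEY.
unknown_status = rejected("INSERT INTO CarDayStatus VALUES (1, '2024-03-04', 99)")

# Recalculate the month totals from the primary records.
conn.execute("""
INSERT INTO CarMonthTotal (Year, Month, CarID, StatusID, DayCount)
SELECT CAST(strftime('%Y', Day) AS INTEGER),
       CAST(strftime('%m', Day) AS INTEGER),
       CarID, StatusID, COUNT(*)
FROM CarDayStatus
GROUP BY strftime('%Y', Day), strftime('%m', Day), CarID, StatusID
""")
totals = conn.execute("""
SELECT s.Abbreviation, t.DayCount
FROM CarMonthTotal t JOIN Status s ON s.ID = t.StatusID
ORDER BY s.Abbreviation
""").fetchall()
```

In a real system the totals would more likely be maintained by a trigger or a view than by a one-off INSERT ... SELECT.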

How to deal with one single cell containing multiple values?

I have an exercise that requires creating two tables for a travel business:
Activity
Booking
It turns out that the activities column in the Booking table references the Activity table. However, it contains multiple values. How do I sort this out? If I insert multiple rows, the Booking primary key may be duplicated.
As Gordon mentioned, you should refactor your tables for better normalization. If I interpret your intent correctly, this is more like what your schema should look like. Booking should only contain an ID for adventure and an ID for Customer. You will add a row to [AdventureActivity] for each activity booked on a [Booking]. With this design you can JOIN tables and get all the data you require without having to try to parse out multiple values in a column.
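A simplified sketch of the one-row-per-booked-activity idea, using an in-memory SQLite database from Python. The table and column names here (Activity, Booking, BookingActivity, CustomerName) are hypothetical and do not reproduce the exact schema the answer refers to; the point is only that the junction table has a composite primary key, so several activities per booking never duplicate the Booking key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Activity (
    ActivityID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Booking (
    BookingID    INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
-- Junction table: one row per activity booked on a booking.
CREATE TABLE BookingActivity (
    BookingID  INTEGER NOT NULL REFERENCES Booking(BookingID),
    ActivityID INTEGER NOT NULL REFERENCES Activity(ActivityID),
    PRIMARY KEY (BookingID, ActivityID)
);
-- hypothetical sample data
INSERT INTO Activity VALUES (1, 'Kayaking'), (2, 'Hiking'), (3, 'Climbing');
INSERT INTO Booking  VALUES (100, 'Dana');
INSERT INTO BookingActivity VALUES (100, 1), (100, 3);
""")

def activities_for(booking_id):
    """List the activity names booked on one booking."""
    rows = conn.execute("""
        SELECT a.Name
        FROM BookingActivity ba
        JOIN Activity a ON a.ActivityID = ba.ActivityID
        WHERE ba.BookingID = ?
        ORDER BY a.Name
    """, (booking_id,)).fetchall()
    return [name for (name,) in rows]

booked = activities_for(100)
booking_rows = conn.execute("SELECT COUNT(*) FROM Booking").fetchone()[0]
```

Booking 100 still appears once in Booking, while its two activities live in the junction table.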

How to create history fact table?

I have some entities in my Data Warehouse:
Person - with attributes personId, dateFrom, dateTo, and others those can be changed, e.g. last name, birth date and so on - slowly changing dimension
Document - documentId, number, type
Address - addressId, city, street, house, flat
The relations between (Person and Document) is One-To-Many and (Person and Address) is Many-To-Many.
My target is to create history fact table that can answer us following questions:
1. What persons with what documents lived at defined address on defined date?
2. What history of residents does defined address have on defined interval of time?
This is not the only thing the DW is designed for, but I think it is the hardest part of the DW's design.
For example, Miss Brown with personId=1 and documents documentId=1 and documentId=2 lived at the address with addressId=1 from 01/01/2005 to 02/02/2010 and then moved to addressId=2, where she has lived from 02/03/2010 to the current date (NULL?). But she changed her last name to Mrs Green on 04/05/2006 and replaced her first document, documentId=1, with documentId=3 on 06/07/2007. Mr Black with personId=2 and documentId=4 has lived at addressId=1 from 02/03/2010 to the current date.
The expected result on our query for question 2 where addressId=1, and time interval is since 01/01/2000 to now, must be like:
Rows:
last_name="Brown", documentId=1, dateFrom=01/01/2005, dateTo=04/04/2006
last_name="Brown", documentId=2, dateFrom=01/01/2005, dateTo=04/04/2006
last_name="Green", documentId=1, dateFrom=04/05/2006, dateTo=06/06/2007
last_name="Green", documentId=2, dateFrom=04/05/2006, dateTo=06/06/2007
last_name="Green", documentId=2, dateFrom=06/07/2007, dateTo=02/01/2010
last_name="Green", documentId=3, dateFrom=06/07/2007, dateTo=02/01/2010
last_name="Black", documentId=4, dateFrom=02/03/2010, dateTo=NULL
I had an idea to create fact table with composite key (personId, documentId, addressId, dateFrom) but I have no idea how to load this table and then get that expected result with this structure.
I will be pleased for any help!
Interesting question @Argnist!
So to create some common language for my example, you want a
DimPerson (PK=kcPerson, surrogate key for unique Persons=kPerson, type 2 dim)
DimDocument (PK=kcDocument, surrogate key for unique Documents=kDocument, type 2 dim)
DimAddress (PK=kcAddress, surrogate key for unique Addresses=kAddress, type 2 dim)
A colleague has written a short blog on the usage of two surrogate keys to explain the above dims 'Using Two Surrogate Keys on Dimensions'.
I would always add
DimDate with PK in the form yyyymmdd
to any data warehouse with extra attribute columns.
Then you would have your fact table as
FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDate)
plus any additional measures.
Then joining on the "kc"s you can show the current Person/Document/Address dimension information.
If you joined on the "k"s you can show the historic Person/Document/Address dimension information.
The downside of this is that this fact table needs one row for each person/document/address/date combination. But it really is a very narrow table, since the table just has a number of foreign keys.
The advantage of this is it is very easy to query for the sorts of questions you were asking.
Alternatively, you could have your fact table as
FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDateFrom, kDateTo)
plus any additional measures.
This is obviously much more compact, but the querying becomes more complex. You could also put a view over the Fact table to make it easier to query!
The choice of solution depends on the frequency of change of the data. I suspect that it will not be changing that quickly, so the alternate design of the fact table may be better.
Hope that helps.
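Here is a rough sketch of the more compact, date-range variant of the fact table, using an in-memory SQLite database from Python. It is deliberately simplified and rests on several assumptions: only one surrogate key per dimension is modeled (the "kc" keys are left out), only DimPerson is created as a table, dates are yyyymmdd integers with both ends inclusive, a NULL kDateTo means "still current", and the rows are made-up values loosely based on the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPerson (
    kPerson   INTEGER PRIMARY KEY,  -- one row per version of a person
    personId  INTEGER NOT NULL,
    last_name TEXT    NOT NULL
);
CREATE TABLE FactHistory (
    kPerson   INTEGER NOT NULL REFERENCES DimPerson(kPerson),
    kDocument INTEGER NOT NULL,
    kAddress  INTEGER NOT NULL,
    kDateFrom INTEGER NOT NULL,     -- yyyymmdd
    kDateTo   INTEGER,              -- yyyymmdd, NULL = still current
    PRIMARY KEY (kPerson, kDocument, kAddress, kDateFrom)
);
-- hypothetical sample data
INSERT INTO DimPerson VALUES (10, 1, 'Brown'), (11, 1, 'Green'), (20, 2, 'Black');
INSERT INTO FactHistory VALUES
    (10, 1, 1, 20050101, 20060504),
    (10, 2, 1, 20050101, 20060504),
    (11, 1, 1, 20060505, 20100201),
    (11, 2, 1, 20060505, 20100201),
    (20, 4, 1, 20100302, NULL);
""")

def residents_on(address, day):
    """Question 1: who, with which documents, lived at an address on a date?"""
    return conn.execute("""
        SELECT p.last_name, f.kDocument
        FROM FactHistory f
        JOIN DimPerson p ON p.kPerson = f.kPerson
        WHERE f.kAddress = ?
          AND f.kDateFrom <= ?
          AND (f.kDateTo IS NULL OR f.kDateTo >= ?)
        ORDER BY p.last_name, f.kDocument
    """, (address, day, day)).fetchall()

def history_between(address, start, end):
    """Question 2: every residency row overlapping the interval [start, end]."""
    return conn.execute("""
        SELECT p.last_name, f.kDocument, f.kDateFrom, f.kDateTo
        FROM FactHistory f
        JOIN DimPerson p ON p.kPerson = f.kPerson
        WHERE f.kAddress = ?
          AND f.kDateFrom <= ?
          AND (f.kDateTo IS NULL OR f.kDateTo >= ?)
        ORDER BY f.kDateFrom, p.last_name, f.kDocument
    """, (address, end, start)).fetchall()

on_2007 = residents_on(1, 20070101)
early = history_between(1, 20000101, 20051231)
full = history_between(1, 20000101, 20991231)
```

The overlap test (row starts before the interval ends, and row ends after the interval starts or is still open) is what makes the range design queryable without one row per day.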

Can a many-to-many join table have more than two columns?

I have some tables that benefit from many-to-many tables. For example the team table.
A team member can hold more than one 'position' in the team; all the positions are listed in the position db table. The previous positions held are also stored, in a separate table, so I have
member table (containing team details)
positions table (containing positions)
member_to_positions table (id of member and id of position)
member_to_previous_positions (id of member and id of position)
Simple; however, the crux is that a team member can belong to many teams, aghhh.
I already have a team_to_member look-up table.
Now the problem is: how do I tie a position to a team? A member may have been team leader on one team, and is currently team radio man and press officer on a different team. How do I pull the info per member to show his current position, but also his past history, including past teams?
Do I need to add a position_to_team table and somehow cross-reference that, or can I add the team to the member_to_positions table?
It's all very confusing, this normalization.
Yes, a many-to-many junction table can have additional attributes (columns).
For example, if there's a table called PassengerFlight that's keyed by PassengerID and FlightID, there could be a third column showing the status of the given passenger on the given flight. Two different statuses might be "confirmed" and "wait listed", each of them coded somehow.
In addition, there can be ternary relationships, relationships that involve three entities and not just two. These tables are going to have three foreign keys that taken together are the primary key for the relationship table.
It's perfectly legitimate to have a TeamPositionMember table, with the columns
Team_Id
Position_Code
Member_Id
Start_Date
End_Date NULLABLE
And a surrogate ID column for Primary Key if you want; otherwise it's a 3-field composite Primary Key. (You'll want a uniqueness constraint on this anyway.)
With this arrangement, you can have a team with any set of positions. A team can have zero or more persons per position. A person can fill zero or more positions for zero or more teams.
EDIT:
If you want dates, just revise as shown above, and add Start_Date to the PK to allow the same person to hold the same position at different times.
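As a minimal sketch of the TeamPositionMember table described above, here it is in an in-memory SQLite database driven from Python. The five columns and the four-column primary key follow the answer; the Team, Position and Member tables, their column names, the position codes and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Team     (Team_Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Position (Position_Code TEXT PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Member   (Member_Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE TeamPositionMember (
    Team_Id       INTEGER NOT NULL REFERENCES Team(Team_Id),
    Position_Code TEXT    NOT NULL REFERENCES Position(Position_Code),
    Member_Id     INTEGER NOT NULL REFERENCES Member(Member_Id),
    Start_Date    TEXT    NOT NULL,  -- assumption: ISO 'YYYY-MM-DD' strings
    End_Date      TEXT,              -- NULL = position still held
    -- Start_Date in the key lets one person hold the same position twice over time
    PRIMARY KEY (Team_Id, Position_Code, Member_Id, Start_Date)
);
-- hypothetical sample data
INSERT INTO Team     VALUES (1, 'Alpha'), (2, 'Bravo');
INSERT INTO Position VALUES ('LEAD', 'Team leader'),
                            ('RADIO', 'Radio man'),
                            ('PRESS', 'Press officer');
INSERT INTO Member   VALUES (1, 'Sam');
INSERT INTO TeamPositionMember VALUES
    (1, 'LEAD',  1, '2019-01-01', '2020-12-31'),
    (2, 'RADIO', 1, '2021-01-01', NULL),
    (2, 'PRESS', 1, '2021-01-01', NULL);
""")

def current_positions(member_id):
    """Positions the member holds right now (no end date yet)."""
    return conn.execute("""
        SELECT t.Name, p.Title
        FROM TeamPositionMember tpm
        JOIN Team t     ON t.Team_Id = tpm.Team_Id
        JOIN Position p ON p.Position_Code = tpm.Position_Code
        WHERE tpm.Member_Id = ? AND tpm.End_Date IS NULL
        ORDER BY t.Name, p.Title
    """, (member_id,)).fetchall()

def past_positions(member_id):
    """Positions the member held in the past, including past teams."""
    return conn.execute("""
        SELECT t.Name, p.Title, tpm.Start_Date, tpm.End_Date
        FROM TeamPositionMember tpm
        JOIN Team t     ON t.Team_Id = tpm.Team_Id
        JOIN Position p ON p.Position_Code = tpm.Position_Code
        WHERE tpm.Member_Id = ? AND tpm.End_Date IS NOT NULL
        ORDER BY tpm.Start_Date
    """, (member_id,)).fetchall()

current = current_positions(1)
past = past_positions(1)
```

With this one table, both the current positions and the full history per member come from the same place, so a separate previous-positions table is not needed.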
My first thought:
Give your many-to-many teams/members table an ID column. Every team-to-member relationship now has an ID.
Then create a many-to-many linking positions to team-member relationships.
This way, teams can have multiple members, members can have multiple teams, and members can have multiple positions on a per-team basis.
Now everything is nice and DRY, and all the linking up seems to work. Does that sound right to anyone else?
Sounds like you need a many-to-many positions to teams table now.
Your team_to_member table can indeed have an extra column position_id to describe (or in this case point to) the position the member has within that team.
Get rid of member_to_previous_position table. Just use member_to_positions and have these columns:
MemberToPositionID (autoincrement OK only)
MemberID
PositionID
StartDate
EndDate
Then to find current positions, you do:
select *
from member_to_positions
where EndDate is null