How to manage "groups" in the database? - sql

I've asked this question here, but I don't think I got my point across.
Let's say I have the following tables (all PK are IDENTITY fields):
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, etc.)
Borrowers (BorrowerId(PK), PersonId, LoanId)
Let's say Mr. Smith got 2 loans on his name, 3 joint loans with his wife, and 1 join loan with his mistress. For the purposes of application I want to GROUP people, so that I can easily single-out the loans that Mr. Smith took out jointly with his wife.
To accomplish that I added BorrowerGroup table, now I have the following (all PK are IDENTITY fields):
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, BorrowerGroupId, etc.)
BorrowerGroup(GroupId (PK))
Borrowers (BorrowerId(PK), GroupId, PersonId)
Now Mr. Smith is in 3 groups (himself, him and his wife, him and his mistress) and I can easily lookup his activity in any of those groups.
The problems with new design:
The only way to generate new BorrowerGroup is by inserting MAX(GourpId)+1 with IDENTITY_INSERT ON, this just doesn't feel right. Also, the notion of a table with 1 column is kind of weird.
I'm a firm believer in surrogate keys, and would like to stick to that design if possible.
This application does not care about individuals, the GROUP is treated as an individual
Is there a better way to group people for the purpose of this application?

You could just remove the table BorrowerGroups - it carries no information. This information is allready present via the Loans People share - I just assume you have a PeopleLoans table.
People Loans PeopleLoans
----------- ------------ -----------
1 Smith 6 S1 60 1 6
2 Wife 7 S2 60 1 7
3 Mistress 8 S+W1 74 1 8
9 S+W2 74 1 9
10 S+W3 74 1 10
11 S+M1 89 1 11
2 8
2 9
2 10
3 11
So your BorrowerGroups are actually almost the Loans - 6 and 7 with Smith only, 8 to 10 with Smith and Wife, and 11 with Smith and Mistress. So there is no need for BorrowerGroups in the first place, because they are identical to Loans grouped by the involved People.
But it might be quite hard to efficently retrieve this information, so you could think about adding a GroupId directly to Loans. Ignoring the second column of Loans (just for readability) the third column schould represent your groups. They are redundant, so you have to be carefull if you change them.
If you find a good way to derive a unique GroupId from the ids of involved people, you could make it a computed column. If a string would be okay as an group id, you could just order the ids of the people an concat them with a separator.
Group 60 with Smith only would get id '1', group 74 would become 1.2, and group 89 would become 1.3. Not that smart, but unique and easy to compute.

use the original schema:
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, etc.)
Borrowers (BorrowerId(PK), PersonId, LoanId)
just query for the data you need (your example to find husband and wife on same loans):
SELECT
l.*
FROM Borrowers b1
INNER JOIN Borrowers b2 ON b1.LoanId=b2.LoanId
INNER JOIN Loans l ON b1.LoanId=l.LoanId
WHERE b1.PersonId=#HusbandID
AND b2.PersonId=#WifeID

The design of the database seems OK. Why do you have to use MAX(GourpId)+1 when you create a new group? Can't you just create the row and then use SCOPE_IDENTITY() to return the new ID?
e.g.
INSERT INTO BorrowerGroup() DEFAULT VALUES
SELECT SCOPE_IDENTITY()
(See this other question)
(edit to SQL courtesy of this question)

I would do something more like this:
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, BorrowerGroupId, etc.)
BorrowerGroup(BorrowerGroupId (PK))
PersonBelongsToBorrowerGroup(BorrowerGroupId
(PK), PersonId(PK))
I got rid of the Borrowers table. Just store the info in the BorrowerGroup table. That's my preference.

The consensus seems to be to omit the BorrowerGroup table and I have to agree. Suggesting that you would use MAX(groupId+1) has all sorts of ACID/transaction issues and the main reason why IDENTITY fields exist.
That said; the SQL that KM provided looks good. There are any number of ways to get the same results. Joins, sub-selects and so on. The real issue there... is knowing the dataset. Given the explanation you provided the datasets are going to be very small. That also supports removing the BorrowerGroup table.

I would have a group table and then a groupmembers(borrowers) table to accomplish the many-to-many relationship between loans and people. This allows the tracking of data on the group other than just a list of members (I believe someone else made this suggestion?).
CREATE TABLE LoanGroup
(
ID int NOT NULL
, Group_Name char(50) NULL
, Date_Started datetime NULL
, Primary_ContactID int NULL
, Group_Type varchar(25)
)

Related

SQL column compare

I have 3 columns in the same table in SQL one for the number, a name, and another unrelated data. The numbers repeat for a certain amount of times and have a name next to them, there can't be a name twice on the same number, but the names can be present in multiple different numbers. I need to make an SQL query to find what names have been under the same number the most amount of times. Any help will be very appreciated.
Example: SQL query will find what names have been grouped together the most.
1 Bill
1 Bob
1 Dave
2 Bob
2 John
2 Bill
To confirm - you would like to find
The pairs of names that occur together within a 'number'
Of those, find the pair that occurs most often
The trick here is to get all the pairs, then count how many 'numbers' that pair appears in.
To get the pairs, join the table to itself (on the number) - and then to only have one pairing in each, also join on name with the first in the pair < second in the pair.
The answer to this question depends on your database (SQL Server, MySQL, etc). However, here is an example written in T-SQL but it is fairly generic that does most of the work: it shows the counts and orders them by the the relevant count.
Feel free to get the TOP or LIMIT 1 just to get a pair with the most matches (noting that if there is a tie, only one would be chosen this way)
Alternatively modify the query to work out what the maximum number is, then get the pairs with that number.
CREATE TABLE NameGrps (NameNum int, Name varchar(30));
INSERT INTO NameGrps (NameNum, Name)
VALUES
(1, 'Bill'),
(1, 'Bob'),
(1, 'Dave'),
(2, 'Bob'),
(2, 'John'),
(2, 'Bill');
SELECT NamePairs.FirstInPair, NamePairs.SecondInPair, COUNT(NameNum) AS Num_Paired
FROM
(SELECT A.Name AS FirstInPair, B.Name AS SecondInPair, A.NameNum
FROM NameGrps A
INNER JOIN NameGrps B ON A.NameNum = B.NameNum AND A.Name < B.Name
) AS NamePairs
GROUP BY NamePairs.FirstInPair, NamePairs.SecondInPair
ORDER BY COUNT(NameNum) DESC, NamePairs.FirstInPair, NamePairs.SecondInPair;
And here are the results of the above
FirstInPair SecondInPair Num_Paired
Bill Bob 2
Bill Dave 1
Bill John 1
Bob Dave 1
Bob John 1
If you take a TOP or LIMIT 1 of that, it will find the pair of Bill and Bob is the most frequent.
Here is a db<>fiddle with the query, as well as additional information (e.g., what the sub-query does, and adding a TOP 1 version).

How do I make a query for if value exists in row add a value to another field?

I have a database on access and I want to add a value to a column at the end of each row based on which hospital they are in. This is a separate value. For example - the hospital called "St. James Hospital" has the id of "3" in a separate field. How do I do this using a query rather than manually going through a whole database?
example here
Not the best solution, but you can do something like this:
create table new_table as
select id, case when hospital="St. James Hospital" then 3 else null
from old_table
Or, the better option would be to create a table with the columns hospital_name and hospital_id. You can then create a foreign key relationship that will create the mapping for you, and enforce data integrity. A join across the two tables will produce what you want.
Read about this here:
http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3-database-relationships/
The answer to your question is a JOIN+UPDATE. I am fairly sure if you looked up you would find the below link.
Access DB update one table with value from another
You could do this:
update yourTable
set yourFinalColumnWhateverItsNameIs = {your desired value}
where someColumn = 3
Every row in the table that has a 3 in the someColumn column will then have that final column set to your desired value.
If this isn't what you want, please make your question clearer. Are you trying to put the name of the hospital into this table? If so, that is not a good idea and there are better ways to accomplish that.
Furthermore, if every row with a certain value (3) gets this value, you could simply add it to the other (i.e. Hospitals) table. No need to repeat it everywhere in the table that points back to the Hospitals table.
P.S. Here's an example of what I meant:
Let's say you have two tables
HOSPITALS
id
name
city
state
BIRTHS
id
hospitalid
babysname
gender
mothersname
fathername
You could get a baby's city of birth without having to include the City column in the Births table, simply by joining the tables on hospitals.id = births.hospitalid.
After examining your ACCDB file, I suggest you consider setting up the tables differently.
Table Health_Professionals:
ID First Name Second Name Position hospital_id
1 John Doe PI 2
2 Joe Smith Co-PI 1
3 Sarah Johnson Nurse 3
Table Hospitals:
hospital_id Hospital
1 Beaumont
2 St James
3 Letterkenny Hosptial
A key point is to avoid storing both the hospital ID and name in the Health_Professionals table. Store only the ID. When you need to see the name, use the hospital ID to join with the Hospitals table and get the name from there.
A useful side effect of this design is that if anyone ever misspells a hospital name, eg "Hosptial", you need correct that error in only one place. Same holds true whenever a hospital is intentionally renamed.
Based on those tables, the query below returns this result set.
ID Second Name First Name Position hospital_id Hospital
1 Doe John PI 2 St James
3 Johnson Sarah Nurse 3 Letterkenny Hosptial
2 Smith Joe Co-PI 1 Beaumont
SELECT
hp.ID,
hp.[Second Name],
hp.[First Name],
hp.Position,
hp.hospital_id,
h.Hospital
FROM
Health_Professionals AS hp
INNER JOIN Hospitals AS h
ON hp.hospital_id = h.hospital_id
ORDER BY
hp.[Second Name],
hp.[First Name];

How should I handle updating data in a one-to-many relationship database schema?

Lets say I have a database similar to the following:
Table: People
id | name | age
---+------+-----
1 | dave | 78
Table: Likes
id | userid | like
---+--------+-------
1 | 1 | apples
---+--------+-------
2 | 1 | oranges
---+--------+-------
3 | 1 | women
What would be the best way to handle updating daves data? Currently, I do the following:
Update his age
UPDATE People SET age = 79 WHERE id = 1;
Update his likes
DELETE FROM Likes WHERE userid = 1;
INSERT INTO LIKES (userid, like) VALUES (1, 'new like');
INSERT INTO LIKES (userid, like) VALUES (1, 'another like');
I delete all the users data from the table and then readd their new stuff. This seems inefficient. Is there a better way?
It's not clear to me why you are suggesting a link between updating a record in the parent table and its dependents in the child table. The point of having separate tables is precisely that we can modify the non-key columns in People without touching Likes.
When it comes to updating Likes there are two different business transactions. The first is when Dave says, "I didn't mean 'oranges' I meant to say I like flower arranging". Correcting a mistake would use an update:
update likes
set like = 'flower arranging'
where userid = 1
and like = 'oranges'
/
The WHERE clause could use the LIKES.ID column instead.
The other case is where the preferences have actually changed. That is, when Dave says "Now I'm 79 I don't like women any more. I have new tastes.". This might look like this:
delete from likes
where userid = 1
and like = 'women'
/
insert into likes (userid, like)
values (1, 'dominoes')
/
insert into likes (userid, like)
values (1, 'Werthers Orignals')
/
The difference between these two statements is primarily one of clarity. We could have implemented the second set of statements as an update and a single insert but that would be misleading. Keeping the distinction between meaningful changes to the data and correcting mistakes is a useful discipline. It is especially helpful when we are keeping historical records and/or auditing changes.
What is definitely a bad idea is deleting all Dave's Likes records and then re-inserting them. Your application should be able to track which records have changed.
Depending on the DBMS you could use "MERGE" "REPLACE" "INSERT OR REPLACE",
most DBMSes support this functionality but the syntax varies wildly so you will need to RTFM to find out how do do this with the databse you are using.
You could add a new column to Likes storing the actual age. By doing this you get a history functionality and you can determine his likes by requesting those with the highest age like:
SELECT * FROM Likes WHERE userid = 1 AND age = (SELECT MAX(age) FROM Likes WHERE userid = 79 GROUP BY userid);
When you want be able to override some Likes you also could add another column "overriden_by" containing the id of the new Like. This would result in a simpler query:
SELECT * FROM Likes WHERE userid = 1 and overriden_by is null;

Checking the integrity of the data for an entity

I have three tables STUDENT, DEPARTMENT and COURSE in a University database...
STUDENT has a UID as a Primary key -> which is the UNIQUE ID of the student
DEPARTMENT has Dept_id as a Primary Key -> which is the Dept. number
COURSE has C_id as Primary Key -> which is the Course/subject Id
I need to store marks in a table by relating the primary key of STUDENT, DEPARTMENT and COURSE for each student in each course.
UID Dept_id C_id marks
1 CS CS01 98
1 CS CS02 96
1 ME ME01 88
1 ME ME02 90
The problem is if i create a table like this for marks then i feel the data operator might insert wrong combination of primary key of a student for example
UID Dept_id C_id marks
1 CS CS01 98
1 CS CS02 96
1 ME CS01 88 //wrong C_id (course id) inputted by the DBA
1 ME ME02 90
In which case how can i prevent him doing this?
Also is there any other way to store marks for each student ? I mean like :
UID Dept_id CS01 CS02
1 CS 98 96
3 CS 95 92
You should avoid duplicating data in your database if possible:
UID Dept_id C_id marks
1 CS CS01 98
^^ ^^
You could:
Change the course ID to a two column key (department, course number), eg ('CS', '01').
or:
Keep the course name as it is, but put the department ID field in the course table and omit it from your marks table. If you need to calculate the total marks for a specific department you can still do this easily by adding a JOIN to your query.
Your last suggestion seems to be a bad idea. You would need a column in your table for every course and most values would be NULL.
I'm not sure why you need the department in this table if the course indicates the department. Thus, why wouldn't your table be:
UID C_id marks
1 CS01 98
1 CS02 96
1 ME01 88
1 ME02 90
What is missing from this table is some indication of time. For example, a student could take the same course twice if they failed it the first time. Thus, you would need additional columns to indicate the semester and year.
Your suggestion would be a nightmare to maintain. You would have to add new columns every time a new course was added to the achedule. It also would be harder to query much of the time.
If you want to make sure that each course is appropriate for the department, you can do that in a trigger (make sure to handle multiple record inserts or updates) or in the application. This still won't prevent all data entry errors (it is possible to pick CS89 when you meant CS98), but it will reduce the amount of error. In this case it is unlikely the data would come from anywhere other than the application, so I'd probably choose to enforce the rules in the application. A pull down list where they chose the department and only the courses for that department showed would do the trick.
You could add foreign key constraints to your tables to ensure that a valid value is entered for student IDs, course IDs and department IDs. You could also add unique constraints to the table to ensure inadvertent duplicates were not created. But in the end you can't prevent incorrect data from being inserted; if you knew it was incorrect, you wouldn't need to ask for it.
Example: 29th February 1957 couldn't be my birthday; 15th July 2025 couldn't be my birthday; 27th September 1974 wasn't my birthday.

UPDATE query that fixes orphaned records

I have an Access database that has two tables that are related by PK/FK. Unfortunately, the database tables have allowed for duplicate/redundant records and has made the database a bit screwy. I am trying to figure out a SQL statement that will fix the problem.
To better explain the problem and goal, I have created example tables to use as reference:
alt text http://img38.imageshack.us/img38/9243/514201074110am.png
You'll notice there are two tables, a Student table and a TestScore table where StudentID is the PK/FK.
The Student table contains duplicate records for students John, Sally, Tommy, and Suzy. In other words the John's with StudentID's 1 and 5 are the same person, Sally 2 and 6 are the same person, and so on.
The TestScore table relates test scores with a student.
Ignoring how/why the Student table allowed duplicates, etc - The goal I'm trying to accomplish is to update the TestScore table so that it replaces the StudentID's that have been disabled with the corresponding enabled StudentID. So, all StudentID's = 1 (John) will be updated to 5; all StudentID's = 2 (Sally) will be updated to 6, and so on. Here's the resultant TestScore table that I'm shooting for (Notice there is no longer any reference to the disabled StudentID's 1-4):
alt text http://img163.imageshack.us/img163/1954/514201091121am.png
Can you think of a query (compatible with MS Access's JET Engine) that can accomplish this goal? Or, maybe, you can offer some tips/perspectives that will point me in the right direction.
Thanks.
The only way to do this is through a series of queries and temporary tables.
First, I would create the following Make Table query that you would use to create a mapping of the bad StudentID to correct StudentID.
Select S1.StudentId As NewStudentId, S2.StudentId As OldStudentId
Into zzStudentMap
From Student As S1
Inner Join Student As S2
On S2.Name = S1.Name
Where S1.Disabled = False
And S2.StudentId <> S1.StudentId
And S2.Disabled = True
Next, you would use that temporary table to update the TestScore table with the correct StudentID.
Update TestScore
Inner Join zzStudentMap
On zzStudentMap.OldStudentId = TestScore.StudentId
Set StudentId = zzStudentMap.NewStudentId
The most common technique to identify duplicates in a table is to group by the fields that represent duplicate records:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
25 Brian Smith
In this case we want to remove one of the Brian Smith Records, or in your case, update the ID field so they both have the value of 25 or 1 (completely arbitrary which one to use).
SELECT min(id)
FROM example
GROUP BY first_name, last_name
Using min on ID will return:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
If you use max you would get
ID FIRST_NAME LAST_NAME
25 Brian Smith
3 George Smith
I usually use this technique to delete the duplicates, not update them:
DELETE FROM example
WHERE ID NOT IN (SELECT MAX (ID)
FROM example
GROUP BY first_name, last_name)