UPDATE query that fixes orphaned records - sql

I have an Access database that has two tables that are related by PK/FK. Unfortunately, the database tables have allowed for duplicate/redundant records and has made the database a bit screwy. I am trying to figure out a SQL statement that will fix the problem.
To better explain the problem and goal, I have created example tables to use as reference:
alt text http://img38.imageshack.us/img38/9243/514201074110am.png
You'll notice there are two tables, a Student table and a TestScore table where StudentID is the PK/FK.
The Student table contains duplicate records for students John, Sally, Tommy, and Suzy. In other words the John's with StudentID's 1 and 5 are the same person, Sally 2 and 6 are the same person, and so on.
The TestScore table relates test scores with a student.
Ignoring how/why the Student table allowed duplicates, etc - The goal I'm trying to accomplish is to update the TestScore table so that it replaces the StudentID's that have been disabled with the corresponding enabled StudentID. So, all StudentID's = 1 (John) will be updated to 5; all StudentID's = 2 (Sally) will be updated to 6, and so on. Here's the resultant TestScore table that I'm shooting for (Notice there is no longer any reference to the disabled StudentID's 1-4):
alt text http://img163.imageshack.us/img163/1954/514201091121am.png
Can you think of a query (compatible with MS Access's JET Engine) that can accomplish this goal? Or, maybe, you can offer some tips/perspectives that will point me in the right direction.
Thanks.

The only way to do this is through a series of queries and temporary tables.
First, I would create the following Make Table query that you would use to create a mapping of the bad StudentID to correct StudentID.
Select S1.StudentId As NewStudentId, S2.StudentId As OldStudentId
Into zzStudentMap
From Student As S1
Inner Join Student As S2
On S2.Name = S1.Name
Where S1.Disabled = False
And S2.StudentId <> S1.StudentId
And S2.Disabled = True
Next, you would use that temporary table to update the TestScore table with the correct StudentID.
Update TestScore
Inner Join zzStudentMap
On zzStudentMap.OldStudentId = TestScore.StudentId
Set StudentId = zzStudentMap.NewStudentId

The most common technique to identify duplicates in a table is to group by the fields that represent duplicate records:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
25 Brian Smith
In this case we want to remove one of the Brian Smith Records, or in your case, update the ID field so they both have the value of 25 or 1 (completely arbitrary which one to use).
SELECT min(id)
FROM example
GROUP BY first_name, last_name
Using min on ID will return:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
If you use max you would get
ID FIRST_NAME LAST_NAME
25 Brian Smith
3 George Smith
I usually use this technique to delete the duplicates, not update them:
DELETE FROM example
WHERE ID NOT IN (SELECT MAX (ID)
FROM example
GROUP BY first_name, last_name)

Related

What happens if one query contains duplicate joins?

I have an application filter which can generate duplicate SQL queries to the result SQL like:
select * from articles
inner join users on articles.users_id = users.id
inner join users on articles.users_id = users.id
where users.name like %xxx%
The question is if a database is able to handle these duplicates or not. What happened in the database if this query comes inside? If I should remove it from the result SQL or if I can leave it as is.
This is a self join.
A self join is a regular join, but the table is joined with itself.
Example
SELECT
A.Id,
A.FullName,
A.ManagerId,
B.FullName as ManagerName
FROM Employees A
JOIN Employees B
ON A.ManagerId = B.Id
A and B are different table aliases for the same table.
The self join, as its name implies, joins a table to itself. To use a self join, the table must contain a column (call it X) that acts as the primary key and a different column (call it Y) that stores values that can be matched up with the values in Column X. The values of Columns X and Y do not have to be the same for any given row, and the value in Column Y may even be null.
Let’s take a look at an example. Consider the table Employees:
Id
FullName
Salary
ManagerId
1
John Smith
10000
3
2
Jane Anderson
12000
3
3
Tom Lanon
15000
4
4
Anne Connor
20000
5
Jeremy York
9000
1
Each employee has his/her own Id, which is our “Column X.” For a given employee (i.e., row), the column ManagerId contains the Id of his or her manager; this is our “Column Y.” If we trace the employee-manager pairs in this table using these columns:
The manager of the employee John Smith is the employee with Id 3,
i.e., Tom Lanon.
The manager of the employee Jane Anderson is the employee with Id 3,
i.e., Tom Lanon.
The manager of the employee Tom Lanon is the employee with Id 4,
i.e., Anne Connor.
The employee Anne Connor does not have a manager; her ManagerId is
null.
The manager of the employee Jeremy York is the employee with Id 1,
i.e., John Smith.
This type of table structure is very common in hierarchies. Now, to show the name of the manager for each employee in the same row, we can run the following query:
SELECT
employee.Id,
employee.FullName,
employee.ManagerId,
manager.FullName as ManagerName
FROM Employees employee
JOIN Employees manager
ON employee.ManagerId = manager.Id
which returns the following result:
Id
FullName
ManagerId
ManagerName
1
John Smith
3
Tom Lanon
2
Jane Anderson
3
Tom Lanon
3
Tom Lanon
4
Anne Connor
5
Jeremy York
1
John Smith
The query selects the columns Id, FullName, and ManagerId from the table aliased employee. It also selects the FullName column of the table aliased manager and designates this column as ManagerName. As a result, every employee who has a manager is output along with his/her manager’s ID and name.
In this query, the Employees table is joined with itself and has two different roles:
Role 1: It stores the employee data (alias employee).
Role 2: It stores the manager data (alias manager).
By doing so, we are essentially considering the two copies of the Employees table as if they are two distinct tables, one for the employees and another for the managers.
You can find more about the concept of the self join in our article "An illustrated guide to the SQL self join".

SQL Insert with value from different table

I have 2 tables storing information. For example:
Table 1 contains persons:
ID NAME CITY
1 BOB 1
2 JANE 1
3 FRED 2
The CITY is a id to a different table:
ID NAME
1 Amsterdam
2 London
The problem is that i want to insert data that i receive in the format:
ID NAME CITY
1 PETER Amsterdam
2 KEES London
3 FRED London
Given that the list of Cities is complete (i never receive a city that is not in my list) how can i insert the (new/received from outside)persons into the table with the right ID for the city?
Should i replace them before I try to insert them, or is there a performance friendly (i might have to insert thousands of lines at one) way to make the SQL do this for me?
The SQL server i'm using is Microsoft SQL Server 2012
First, load the data to be inserted into a table.
Then, you can just use a join:
insert into persons(id, name, city)
select st.id, st.name, c.d
from #StagingTable st left join
cities c
on st.city = c.name;
Note: The persons.id should probably be an identity column so it wouldn't be necessary to insert it.
insert into persons (ID,NAME,CITY) //you dont need to include ID if it is auto increment
values
(1,'BOB',(select Name from city where ID=1)) //another select query is getting Name from city table
if you want to add 1000 rows at a time that'd be great if you use stored procedure like this link

Excessive Case Statement Help - SQL Server

I'm supposed to answer this for class, and it's tricky (for me)
Write a SELECT query to output the name of all employees with the name of their supervisor. If the employee has no supervisor, the supervisor name column should contain the text 'No Supervisor'.
The primary key field in my db is the employeeid and they are provided with names, and each student also has a supervisorid
The table for this is shown below (sorry for the layout):
employeeid lastname firstname salary supervisorid
1 Stolz Ted 25000 NULL
2 Boswell Nancy 23000 1
3 Hargett Vincent 22000 1
4 Weekley Kevin 22000 3
5 Metts Geraldine 22000 2
6 McBride Jeffrey 21000 2
7 Xiong Jay 20000 3
I was wondering how I could go about this statement without using the case statement to apply each of the 7 students with:
when concat(firstname,' ',lastname)='Nancy Boswell' then 'Ted Stolz'
In larger tables this would simply be a HUGE statement, is there a better way to do it?
Thanks!
EDIT:
I've now tried this:
SELECT
EMP1.employeeid as 'employee',
EMP2.supervisorid as 'manager'
FROM
employee EMP1
LEFT OUTER JOIN
employee EMP2
ON
emp1.employeeid = emp2.supervisorid;
However, I am seeing duplicate fields, for some reason employee 2 and 3 are appearing twice, meaning there are 9 fields showing instead of 7.
Also, I need to display their names, not their id's does that mean I need to join the join that i've already done to the employee name ? How would I do this?
Thanks for the feedback guys!
You need to link the table with itself based on the supervisorId. This might be strange if you are new to SQL but it is very common to do. You tell with SQL to add the row of the supervisor to the row of the employee via its primary key.
SELECT
*
FROM
EMPLOYEES EMP1
LEFT OUTER JOIN
EMPLOYEES EMP2
ON
-- make link between tables here
Note that the above query is not 100% correct / complete, its an indication. The LEFT OUTER JOIN statement makes the employees without supervisor have null values for the supervisor, otherwise the whole record would be left out.

How do I make a query for if value exists in row add a value to another field?

I have a database on access and I want to add a value to a column at the end of each row based on which hospital they are in. This is a separate value. For example - the hospital called "St. James Hospital" has the id of "3" in a separate field. How do I do this using a query rather than manually going through a whole database?
example here
Not the best solution, but you can do something like this:
create table new_table as
select id, case when hospital="St. James Hospital" then 3 else null
from old_table
Or, the better option would be to create a table with the columns hospital_name and hospital_id. You can then create a foreign key relationship that will create the mapping for you, and enforce data integrity. A join across the two tables will produce what you want.
Read about this here:
http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3-database-relationships/
The answer to your question is a JOIN+UPDATE. I am fairly sure if you looked up you would find the below link.
Access DB update one table with value from another
You could do this:
update yourTable
set yourFinalColumnWhateverItsNameIs = {your desired value}
where someColumn = 3
Every row in the table that has a 3 in the someColumn column will then have that final column set to your desired value.
If this isn't what you want, please make your question clearer. Are you trying to put the name of the hospital into this table? If so, that is not a good idea and there are better ways to accomplish that.
Furthermore, if every row with a certain value (3) gets this value, you could simply add it to the other (i.e. Hospitals) table. No need to repeat it everywhere in the table that points back to the Hospitals table.
P.S. Here's an example of what I meant:
Let's say you have two tables
HOSPITALS
id
name
city
state
BIRTHS
id
hospitalid
babysname
gender
mothersname
fathername
You could get a baby's city of birth without having to include the City column in the Births table, simply by joining the tables on hospitals.id = births.hospitalid.
After examining your ACCDB file, I suggest you consider setting up the tables differently.
Table Health_Professionals:
ID First Name Second Name Position hospital_id
1 John Doe PI 2
2 Joe Smith Co-PI 1
3 Sarah Johnson Nurse 3
Table Hospitals:
hospital_id Hospital
1 Beaumont
2 St James
3 Letterkenny Hosptial
A key point is to avoid storing both the hospital ID and name in the Health_Professionals table. Store only the ID. When you need to see the name, use the hospital ID to join with the Hospitals table and get the name from there.
A useful side effect of this design is that if anyone ever misspells a hospital name, eg "Hosptial", you need correct that error in only one place. Same holds true whenever a hospital is intentionally renamed.
Based on those tables, the query below returns this result set.
ID Second Name First Name Position hospital_id Hospital
1 Doe John PI 2 St James
3 Johnson Sarah Nurse 3 Letterkenny Hosptial
2 Smith Joe Co-PI 1 Beaumont
SELECT
hp.ID,
hp.[Second Name],
hp.[First Name],
hp.Position,
hp.hospital_id,
h.Hospital
FROM
Health_Professionals AS hp
INNER JOIN Hospitals AS h
ON hp.hospital_id = h.hospital_id
ORDER BY
hp.[Second Name],
hp.[First Name];

How to manage "groups" in the database?

I've asked this question here, but I don't think I got my point across.
Let's say I have the following tables (all PK are IDENTITY fields):
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, etc.)
Borrowers (BorrowerId(PK), PersonId, LoanId)
Let's say Mr. Smith got 2 loans on his name, 3 joint loans with his wife, and 1 join loan with his mistress. For the purposes of application I want to GROUP people, so that I can easily single-out the loans that Mr. Smith took out jointly with his wife.
To accomplish that I added BorrowerGroup table, now I have the following (all PK are IDENTITY fields):
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, BorrowerGroupId, etc.)
BorrowerGroup(GroupId (PK))
Borrowers (BorrowerId(PK), GroupId, PersonId)
Now Mr. Smith is in 3 groups (himself, him and his wife, him and his mistress) and I can easily lookup his activity in any of those groups.
The problems with new design:
The only way to generate new BorrowerGroup is by inserting MAX(GourpId)+1 with IDENTITY_INSERT ON, this just doesn't feel right. Also, the notion of a table with 1 column is kind of weird.
I'm a firm believer in surrogate keys, and would like to stick to that design if possible.
This application does not care about individuals, the GROUP is treated as an individual
Is there a better way to group people for the purpose of this application?
You could just remove the table BorrowerGroups - it carries no information. This information is allready present via the Loans People share - I just assume you have a PeopleLoans table.
People Loans PeopleLoans
----------- ------------ -----------
1 Smith 6 S1 60 1 6
2 Wife 7 S2 60 1 7
3 Mistress 8 S+W1 74 1 8
9 S+W2 74 1 9
10 S+W3 74 1 10
11 S+M1 89 1 11
2 8
2 9
2 10
3 11
So your BorrowerGroups are actually almost the Loans - 6 and 7 with Smith only, 8 to 10 with Smith and Wife, and 11 with Smith and Mistress. So there is no need for BorrowerGroups in the first place, because they are identical to Loans grouped by the involved People.
But it might be quite hard to efficently retrieve this information, so you could think about adding a GroupId directly to Loans. Ignoring the second column of Loans (just for readability) the third column schould represent your groups. They are redundant, so you have to be carefull if you change them.
If you find a good way to derive a unique GroupId from the ids of involved people, you could make it a computed column. If a string would be okay as an group id, you could just order the ids of the people an concat them with a separator.
Group 60 with Smith only would get id '1', group 74 would become 1.2, and group 89 would become 1.3. Not that smart, but unique and easy to compute.
use the original schema:
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, etc.)
Borrowers (BorrowerId(PK), PersonId, LoanId)
just query for the data you need (your example to find husband and wife on same loans):
SELECT
l.*
FROM Borrowers b1
INNER JOIN Borrowers b2 ON b1.LoanId=b2.LoanId
INNER JOIN Loans l ON b1.LoanId=l.LoanId
WHERE b1.PersonId=#HusbandID
AND b2.PersonId=#WifeID
The design of the database seems OK. Why do you have to use MAX(GourpId)+1 when you create a new group? Can't you just create the row and then use SCOPE_IDENTITY() to return the new ID?
e.g.
INSERT INTO BorrowerGroup() DEFAULT VALUES
SELECT SCOPE_IDENTITY()
(See this other question)
(edit to SQL courtesy of this question)
I would do something more like this:
People (PersonId (PK), Name, SSN, etc.)
Loans (LoanId (PK), Amount, BorrowerGroupId, etc.)
BorrowerGroup(BorrowerGroupId (PK))
PersonBelongsToBorrowerGroup(BorrowerGroupId
(PK), PersonId(PK))
I got rid of the Borrowers table. Just store the info in the BorrowerGroup table. That's my preference.
The consensus seems to be to omit the BorrowerGroup table and I have to agree. Suggesting that you would use MAX(groupId+1) has all sorts of ACID/transaction issues and the main reason why IDENTITY fields exist.
That said; the SQL that KM provided looks good. There are any number of ways to get the same results. Joins, sub-selects and so on. The real issue there... is knowing the dataset. Given the explanation you provided the datasets are going to be very small. That also supports removing the BorrowerGroup table.
I would have a group table and then a groupmembers(borrowers) table to accomplish the many-to-many relationship between loans and people. This allows the tracking of data on the group other than just a list of members (I believe someone else made this suggestion?).
CREATE TABLE LoanGroup
(
ID int NOT NULL
, Group_Name char(50) NULL
, Date_Started datetime NULL
, Primary_ContactID int NULL
, Group_Type varchar(25)
)