Are Append Queries in a self joining table possible

I have tblEmployee that contains 3 fields:
ID: AutoNumber
Name: Text
Supervisor: Number [as a lookup in tblEmployee]
I wish to append new data to this table from tblNewEmployees that has the exact same structure as the previous table.
Can this be done if I have the ID field as an autonumber?
I have tried various queries (for example first appending only the Name field as step 1, and then trying with a second update query to get the supervisor) but all produced garbage, hence my question whether this is possible using AutoNumbers in the first place.

I suppose it can be done in your structure (adjacency list model) in an iterative way i.e. add the employee(s) at the top of the tree, query the database to get their auto-generated id(s), then add the employees in the next level down using the previously queried id(s), then repeat for each level down.
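A minimal sketch of that level-by-level approach, using Python's built-in sqlite3 module as a stand-in for Access (an assumption made only so the example is runnable; the table and column names follow the question, and the helper name append_level_by_level is hypothetical). It assumes top-level employees have a NULL Supervisor.

```python
import sqlite3

def append_level_by_level(conn):
    """Copy tblNewEmployees into tblEmployee one tree level at a time."""
    id_map = {}  # old ID in tblNewEmployees -> new auto-generated ID
    # Start with the rows that have no supervisor (the top of the tree).
    level = conn.execute(
        "SELECT ID, Name, Supervisor FROM tblNewEmployees WHERE Supervisor IS NULL"
    ).fetchall()
    while level:
        for old_id, name, old_sup in level:
            new_sup = id_map.get(old_sup)  # None for top-level rows
            cur = conn.execute(
                "INSERT INTO tblEmployee (Name, Supervisor) VALUES (?, ?)",
                (name, new_sup),
            )
            id_map[old_id] = cur.lastrowid  # the ID the database just generated
        # Next level: rows supervised by someone inserted in this pass.
        parents = [row[0] for row in level]
        marks = ",".join("?" * len(parents))
        level = conn.execute(
            "SELECT ID, Name, Supervisor FROM tblNewEmployees "
            f"WHERE Supervisor IN ({marks})",
            parents,
        ).fetchall()
    return id_map

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Supervisor INTEGER);
    CREATE TABLE tblNewEmployees (ID INTEGER PRIMARY KEY, Name TEXT, Supervisor INTEGER);
    INSERT INTO tblEmployee (Name, Supervisor)
        VALUES ('Director A', NULL), ('Manager A', 1), ('Worker A', 2);
    INSERT INTO tblNewEmployees
        VALUES (2, 'Director B', NULL), (5, 'Manager B', 2), (7, 'Worker B', 5);
""")
id_map = append_level_by_level(conn)
rows = conn.execute("SELECT ID, Name, Supervisor FROM tblEmployee ORDER BY ID").fetchall()
```

Each pass needs a round trip to read back the generated ids, which is the iterative cost described above.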
While possible, is it desirable? Presumably every employee already has a unique id e.g. payroll number, social security number, etc. If unsure, ask the payroll person.
Removing the dependency on the database to generate employee ids will probably free you from the aforementioned iterative process. It is preferable for inserts to be deterministic, predictable, scriptable as a one-off, etc.
Another thing to consider is that you may be modelling a tree structure when you may want a hierarchy. The examples Celko used to give: the army is a hierarchy because if you shoot your sergeant you still have to take orders from your captain; on the other hand, a river system is a tree because if you dam one tributary then all downstream tributaries run dry.
It seems to me that with your design, when a supervisor leaves (is deleted from the table) you are left with an unsupervised employee (missing data, therefore data integrity is corrupted), whereas you'd want the next senior employee to take their place (hierarchy). An update in your structure could be a lot of work, i.e. iterative again.
While the adjacency list model may be intuitive, it is not always the easiest to work with in SQL DML. Consider other models e.g. nested sets. That said, with Access, SQL DML is almost always painful because it doesn't support procedural SQL code in stored procs, triggers, etc; even a simple update can fail due to 'non-updatable query' (view) restrictions. So as usual, I must advise you to consider a more capable DBMS if at all possible.

Yes, it is possible to merge the two tables when the destination table has an AutoNumber ID. There are two possible scenarios:
Scenario 1: No overlap of ID values between the two tables
[tblEmployee]
ID Name Supervisor
-- ---------- ----------
1 Director A
2 Manager A 1
3 Worker A 2
[tblNewEmployees]
ID Name Supervisor
--- ---------- ----------
101 Director B
102 Manager B 101
103 Worker B 102
Since the Access Database Engine allows us to insert arbitrary values into an AutoNumber column, this case is trivial. Just ...
INSERT INTO tblEmployee (ID, [Name], Supervisor)
SELECT ID, [Name], Supervisor FROM tblNewEmployees
... and we're done:
[tblEmployee]
ID Name Supervisor
--- ---------- ----------
1 Director A
2 Manager A 1
3 Worker A 2
101 Director B
102 Manager B 101
103 Worker B 102
Scenario 2: Common ID values between the two tables
[tblEmployee]
ID Name Supervisor
-- ---------- ----------
1 Director A
2 Manager A 1
3 Worker A 2
[tblNewEmployees]
ID Name Supervisor
-- ---------- ----------
2 Director B
5 Manager B 2
7 Worker B 5
In this case we need to map the old ID values to the new ID values when the new rows are inserted. To do that, add a new column to [tblEmployee]
ALTER TABLE tblEmployee ADD oldID LONG
then insert the new rows, putting tblNewEmployees.ID into tblEmployee.oldID
INSERT INTO tblEmployee (oldID, [Name])
SELECT ID, [Name] FROM tblNewEmployees
giving us
[tblEmployee]
ID  Name        Supervisor  oldID
--  ----------  ----------  -----
 1  Director A
 2  Manager A            1
 3  Worker A             2
 4  Director B                  2
 5  Manager B                   5
 6  Worker B                    7
Then we can update the Supervisor column with the new ID values
UPDATE
(
tblEmployee emp
INNER JOIN
tblNewEmployees new
ON emp.oldID = new.ID
)
INNER JOIN
tblEmployee emp2
ON new.Supervisor = emp2.oldID
SET emp.Supervisor = emp2.ID
producing
[tblEmployee]
ID  Name        Supervisor  oldID
--  ----------  ----------  -----
 1  Director A
 2  Manager A            1
 3  Worker A             2
 4  Director B                  2
 5  Manager B            4      5
 6  Worker B             5      7
We can then drop the [oldID] column if desired.
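For readers who want to try Scenario 2 end to end, here is a rough sketch using Python's sqlite3 module instead of Access (an assumption made purely for runnability). SQLite does not accept the Access-style UPDATE with joins, so step 3 is rewritten as a correlated subquery; the mapping idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Supervisor INTEGER);
    CREATE TABLE tblNewEmployees (ID INTEGER PRIMARY KEY, Name TEXT, Supervisor INTEGER);
    INSERT INTO tblEmployee (Name, Supervisor)
        VALUES ('Director A', NULL), ('Manager A', 1), ('Worker A', 2);
    INSERT INTO tblNewEmployees
        VALUES (2, 'Director B', NULL), (5, 'Manager B', 2), (7, 'Worker B', 5);

    -- Step 1: add the mapping column.
    ALTER TABLE tblEmployee ADD COLUMN oldID INTEGER;

    -- Step 2: append the new rows, remembering their original IDs.
    INSERT INTO tblEmployee (oldID, Name)
    SELECT ID, Name FROM tblNewEmployees ORDER BY ID;

    -- Step 3: translate each appended row's supervisor from old ID to new ID.
    -- Rows whose old supervisor was NULL stay NULL.
    UPDATE tblEmployee
    SET Supervisor = (
        SELECT sup.ID
        FROM tblNewEmployees AS new
        JOIN tblEmployee AS sup ON sup.oldID = new.Supervisor
        WHERE new.ID = tblEmployee.oldID
    )
    WHERE oldID IS NOT NULL;
""")
merged = conn.execute(
    "SELECT ID, Name, Supervisor, oldID FROM tblEmployee ORDER BY ID"
).fetchall()
```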

Related

Insert into a table using multiple tables [duplicate]

I am new to using SQL Server. I have an assignment, and the lecturer is not showing us how to use the tools he wants the assignment to be completed with.
I am trying to come up with a query that will insert the primary keys of 3 dimension tables into a fact table, along with data from the source table.
The source data is a data set of 10000+ Apps on the Google Play Store.
See below for my table and what I need
DimContentRating - there are 6 content ratings
ContentRatingID(PK) Content Rating
--------------- --------------
1 Everyone
2 Teen
DimCategory - there are 34 categories
CategoryID(PK) Category
---------- --------
1 Education
2 Finance
DimInstalls - there are many ranges of installs
InstallID(PK) Installs
---------- --------
1 10000+
2 100000+
googleplaystore - the table with the 10000+ records and original data
App             Category   Rating  Reviews  Installs  Price  Content_Rating
--------------  ---------  ------  -------  --------  -----  --------------
GMAT Question   Education  4.2     240      10000+    Free   Everyone
Bank Ace Elite  Finance    4.1     2898     100000+   Free   Everyone
How I need it to look
AppFact - the fact table, which should hold the source rows above with the dimension values replaced by foreign keys to the dimension tables
AppFactID Category Rating Reviews Installs Price Content_Rating
--------- -------- ------ ------ -------- ----- --------------
1 1 4.2 240 1 Free 1
2 2 4.1 2898 2 Free 1
I do apologize for not having a query that I tried writing to get it to work but I have not been shown much at all about SQL Server and so the best I know is general queries. What I do know is I need to use the below as well as possible inner joins?
INSERT INTO AppFact(...)
SELECT ...
FROM ...
Am I on the right track?
Assuming you have already created the table AppFact with PK AppFactID, the query you're looking for is:
INSERT INTO AppFact (Category, Rating, Reviews, Installs, Price, Content_Rating)
SELECT c.CategoryID, a.Rating, a.Reviews, i.InstallID, a.Price, r.ContentRatingID
FROM googleplaystore a
INNER JOIN DimContentRating r ON a.Content_Rating = r.Content_Rating
INNER JOIN DimCategory c ON a.Category = c.Category
INNER JOIN DimInstalls i ON a.Installs = i.Installs
You should take a look at JOINs in SQL Server.
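The same pattern can be tried out quickly with Python's sqlite3 module standing in for SQL Server (an assumption for the sake of a runnable sketch). The column name Content_Rating in DimContentRating and the column types are also assumptions; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimContentRating (ContentRatingID INTEGER PRIMARY KEY, Content_Rating TEXT);
    CREATE TABLE DimCategory (CategoryID INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE DimInstalls (InstallID INTEGER PRIMARY KEY, Installs TEXT);
    CREATE TABLE googleplaystore (App TEXT, Category TEXT, Rating REAL, Reviews INTEGER,
                                  Installs TEXT, Price TEXT, Content_Rating TEXT);
    CREATE TABLE AppFact (AppFactID INTEGER PRIMARY KEY AUTOINCREMENT, Category INTEGER,
                          Rating REAL, Reviews INTEGER, Installs INTEGER, Price TEXT,
                          Content_Rating INTEGER);

    INSERT INTO DimContentRating VALUES (1, 'Everyone'), (2, 'Teen');
    INSERT INTO DimCategory VALUES (1, 'Education'), (2, 'Finance');
    INSERT INTO DimInstalls VALUES (1, '10000+'), (2, '100000+');
    INSERT INTO googleplaystore VALUES
        ('GMAT Question', 'Education', 4.2, 240, '10000+', 'Free', 'Everyone'),
        ('Bank Ace Elite', 'Finance', 4.1, 2898, '100000+', 'Free', 'Everyone');

    -- Look up each dimension's surrogate key by joining on the text value.
    INSERT INTO AppFact (Category, Rating, Reviews, Installs, Price, Content_Rating)
    SELECT c.CategoryID, a.Rating, a.Reviews, i.InstallID, a.Price, r.ContentRatingID
    FROM googleplaystore AS a
    JOIN DimContentRating AS r ON a.Content_Rating = r.Content_Rating
    JOIN DimCategory AS c ON a.Category = c.Category
    JOIN DimInstalls AS i ON a.Installs = i.Installs
    ORDER BY a.rowid;
""")
fact_rows = conn.execute("SELECT * FROM AppFact ORDER BY AppFactID").fetchall()
source_count = conn.execute("SELECT COUNT(*) FROM googleplaystore").fetchone()[0]
```

Because these are inner joins, a source row whose value is missing from any dimension table is silently skipped, so comparing len(fact_rows) with source_count is a cheap sanity check.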

Merge two versions of database tables with conflicting keys

I have been asked to merge 2 Access databases. They are conflicting versions of the same file.
A database was emailed to somebody. (I know.) Somebody added records to the 'main' copy while somebody else added records to their copy. I want to add the new records from the 'unauthorised' copy into the main version, before utterly destroying all other copies.
Unfortunately, the database has several related tables. As would naturally happen when records are added, records in different versions have conflicting primary keys. These conflicting keys are also used as foreign keys in the new records. A foreign key reference to ID x means different things in the 2 versions.
Is there any hope? I thought of maybe importing it all into excel and using formulas to update the primary and foreign keys.
Is there any way to fix this programmatically?
EDIT: Here is a picture showing the full relationships. Tables teachers, tests, and test_results have been changed; the others are the same in both.
In the main database, add a Long field named [oldID] to each table into which you need to append data. Then create Linked Tables pointing to the relevant tables in the "other" database. Since the table names are the same, the linked tables will have a '1' appended to them.
For this example, we have
[teachers]
ID teacher oldID
-- -------- -----
1 TeacherA
2 TeacherB
3 TeacherX
[teachers1]
ID teacher
-- --------
1 TeacherA
2 TeacherB
3 TeacherY
[tests]
ID test_name teacher oldID
-- -------------- ------- -----
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherX_Test1 3
[tests1]
ID test_name teacher
-- -------------- -------
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherY_Test1 3
5 TeacherY_Test2 3
Make a note of where the tables diverge. In this case the [teachers] tables diverge after ID=2. So, insert the new rows from [teachers1] into [teachers], putting [teachers1].[ID] into [teachers].[oldID] so we can map old IDs to new ones:
INSERT INTO [teachers] ([teacher], [oldID])
SELECT [teacher], [ID] FROM [teachers1] WHERE [ID]>2
So now we have
[teachers]
ID teacher oldID
-- -------- -----
1 TeacherA
2 TeacherB
3 TeacherX
4 TeacherY 3
Now when we append the new rows from [tests1] into [tests] we can use an INNER JOIN on [teachers].[oldID] to adjust the foreign key values that get inserted:
INSERT INTO [tests] ([test_name], [teacher], [oldID])
SELECT [tests1].[test_name], [teachers].[ID], [tests1].[ID]
FROM [tests1] INNER JOIN [teachers] ON [tests1].[teacher]=[teachers].[oldID]
giving us
[tests]
ID test_name teacher oldID
-- -------------- ------- -----
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherX_Test1 3
5 TeacherY_Test1 4 4
6 TeacherY_Test2 4 5
Notice how the [teacher] foreign key has been mapped from the value 3 in [tests1] to 4 in [tests], reflecting the new [teachers].[ID] value for 'TeacherY'.
You can then repeat the process for child tables of [tests].
(Once the cleanup is complete you can remove the table links and drop the [oldID] columns.)
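The two append steps can be replayed as a small script. This sketch uses Python's sqlite3 module rather than Access linked tables (an assumption made for runnability), with [teachers1] and [tests1] created as ordinary tables holding the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers  (ID INTEGER PRIMARY KEY AUTOINCREMENT, teacher TEXT, oldID INTEGER);
    CREATE TABLE teachers1 (ID INTEGER PRIMARY KEY, teacher TEXT);
    CREATE TABLE tests  (ID INTEGER PRIMARY KEY AUTOINCREMENT, test_name TEXT,
                         teacher INTEGER, oldID INTEGER);
    CREATE TABLE tests1 (ID INTEGER PRIMARY KEY, test_name TEXT, teacher INTEGER);

    INSERT INTO teachers (teacher) VALUES ('TeacherA'), ('TeacherB'), ('TeacherX');
    INSERT INTO teachers1 VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherY');
    INSERT INTO tests (test_name, teacher) VALUES
        ('TeacherA_Test1', 1), ('TeacherA_Test2', 1), ('TeacherB_Test1', 2), ('TeacherX_Test1', 3);
    INSERT INTO tests1 VALUES
        (1, 'TeacherA_Test1', 1), (2, 'TeacherA_Test2', 1), (3, 'TeacherB_Test1', 2),
        (4, 'TeacherY_Test1', 3), (5, 'TeacherY_Test2', 3);

    -- Parent table first: append rows after the divergence point, keeping the old ID.
    INSERT INTO teachers (teacher, oldID)
    SELECT teacher, ID FROM teachers1 WHERE ID > 2 ORDER BY ID;

    -- Child table next: the join swaps the old teacher ID for the new one.
    INSERT INTO tests (test_name, teacher, oldID)
    SELECT t1.test_name, t.ID, t1.ID
    FROM tests1 AS t1
    JOIN teachers AS t ON t1.teacher = t.oldID
    ORDER BY t1.ID;
""")
new_teachers = conn.execute(
    "SELECT ID, teacher, oldID FROM teachers WHERE oldID IS NOT NULL"
).fetchall()
new_tests = conn.execute(
    "SELECT ID, test_name, teacher, oldID FROM tests WHERE oldID IS NOT NULL ORDER BY ID"
).fetchall()
```

Note that the join only picks up tests belonging to newly appended teachers; tests added in the other copy for teachers that exist in both copies would need a separate append.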
Is there any way to fix this programmatically?
No. This must be done by a human capable of reading and understanding the data and taking decisions.
Create a query with an inner join between table one and table two, another query with an outer join between table one and table two, and another query with an outer join between table two and table one.
Now you can study the differences and decide which version of similar records to be kept and which records are completely new and should be kept - some with a new Primary Key.
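One way to read those three queries, sketched with Python's sqlite3 module (an assumption; the answer does not say which columns to join on, so joining on both ID and teacher is my guess at the intent). The table names follow the sample data from the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers  (ID INTEGER PRIMARY KEY, teacher TEXT);
    CREATE TABLE teachers1 (ID INTEGER PRIMARY KEY, teacher TEXT);
    INSERT INTO teachers  VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherX');
    INSERT INTO teachers1 VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherY');
""")

# Inner join: rows that are identical in both copies.
same = conn.execute("""
    SELECT m.ID, m.teacher
    FROM teachers AS m
    JOIN teachers1 AS o ON m.ID = o.ID AND m.teacher = o.teacher
    ORDER BY m.ID
""").fetchall()

# Outer join, main to other: rows in the main copy with no identical counterpart.
only_main = conn.execute("""
    SELECT m.ID, m.teacher
    FROM teachers AS m
    LEFT JOIN teachers1 AS o ON m.ID = o.ID AND m.teacher = o.teacher
    WHERE o.ID IS NULL
    ORDER BY m.ID
""").fetchall()

# Outer join, other to main: rows in the other copy with no identical counterpart.
only_other = conn.execute("""
    SELECT o.ID, o.teacher
    FROM teachers1 AS o
    LEFT JOIN teachers AS m ON m.ID = o.ID AND m.teacher = o.teacher
    WHERE m.ID IS NULL
    ORDER BY o.ID
""").fetchall()
```

The last two lists are the rows a person would need to review: here ID 3 means TeacherX in one copy and TeacherY in the other.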

MySQL duplicates -- how to specify when two records actually AREN'T duplicates?

I have an interesting problem, and my logic isn't up to the task.
We have a table that sometimes develops duplicate records (for process reasons, and this is unavoidable). Take the following example:
id FirstName LastName PhoneNumber email
-- --------- -------- ------------ --------------
1 John Doe 123-555-1234 jdoe#gmail.com
2 Jane Smith 123-555-1111 jsmith#foo.com
3 John Doe 123-555-4321 jdoe#yahoo.com
4 Bob Jones 123-555-5555 bob#bar.com
5 John Doe 123-555-0000 jdoe#hotmail.com
6 Mike Roberts 123-555-9999 roberts#baz.com
7 John Doe 123-555-1717 wally#domain.com
We find the duplicates this way:
SELECT c1.*
FROM `clients` c1
INNER JOIN (
SELECT `FirstName`, `LastName`, COUNT(*)
FROM `clients`
GROUP BY `FirstName`, `LastName`
HAVING COUNT(*) > 1
) AS c2
ON c1.`FirstName` = c2.`FirstName`
AND c1.`LastName` = c2.`LastName`
This generates the following list of duplicates:
id FirstName LastName PhoneNumber email
-- --------- -------- ------------ --------------
1 John Doe 123-555-1234 jdoe#gmail.com
3 John Doe 123-555-4321 jdoe#yahoo.com
5 John Doe 123-555-0000 jdoe#hotmail.com
7 John Doe 123-555-1717 wally#domain.com
As you can see, based on FirstName and LastName, all of the records are duplicates.
At this point, we actually make a phone call to the client to clear up potential duplicates.
After doing so, we learn (for example) that records 1 and 3 are real duplicates, but records 5 and 7 are actually two different people altogether.
So we merge any extraneously linked data from records 1 and 3 into record 1, remove record 3, and leave records 5 and 7 alone.
Now here's where the problem comes in:
The next time we re-run the "duplicates" query, it will contain the following rows:
id FirstName LastName PhoneNumber email
-- --------- -------- ------------ --------------
1 John Doe 123-555-4321 jdoe#gmail.com
5 John Doe 123-555-0000 jdoe#hotmail.com
7 John Doe 123-555-1717 wally#domain.com
They all appear to be duplicates, even though we've previously recognized that they aren't.
How would you go about identifying that these records aren't duplicates?
My first thought is to build a lookup table identifying which records aren't duplicates of each other (for example, {1,5},{1,7},{5,7}), but I have no idea how to build a query that would be able to use this data.
Further, if another duplicate record shows up, it may be a duplicate of 1, 5, or 7, so we would need them all to show back up in the duplicates list so the customer service person can call the person in the new record to find out which record he may be a duplicate of.
I'm stretched to the limit trying to understand this. Any brilliant geniuses out there that would care to take a crack at this?
Interesting problem. Here's my crack at it.
How about if we approach the problem from a slightly different perspective.
Consider that the system is clean for a start i.e all records currently in the system are either with Unique First + Last name combinations OR the same first + last name ones have already been manually confirmed to be different people.
At the point of entering a NEW user in the system, we have an additional check. Can be implemented as an INSERT Trigger or just another procedure called after the insert is successfully done.
This Trigger / Procedure matches the FIRST + LAST name combination of "Inserted" record with all existing records in the table.
For all the matching First + Last names, it will create an entry in a matching table (new table) with NewUserID, ExistingMatchingRecordsUserID
From an SQL perspective,
TABLE MatchingTable
COLUMNS 1. NewUserID 2. ExistingMatchingRecordsUserID
Constraint : Logical PK = NewUserID + ExistingMatchingRecordsUserID
-- 345 stands for the ID of the newly inserted user
INSERT INTO MatchingTable (NewUserID, ExistingMatchingRecordsUserID)
SELECT 345, u.userId FROM User u
WHERE u.firstName = 'John' AND u.LastName = 'Doe'
AND u.userId <> 345
All entries in MatchingTable need resolution.
When say an Admin logs into the system, the admin sees the list of all entries in MatchingTable
eg: New User John Doe - (ID 345) - 3 Potential matches John Doe - ID 123 ID 231 / ID 256
The admin will check up data for 345 against data in 123 / 231 and 256 and manually confirm if duplicate of ANY / None
If Duplicate, 345 is deleted from User Table (soft / hard delete - whatever suits you)
If NOT, the entries for ID 345 are just removed from MatchingTable (I would go with hard deletes here as this is like a transactional temp table, but again anything is fine).
Additionally, when entries for ID 345 are removed from MatchingTable, all other entries in MatchingTable where ExistingMatchingRecordsUserID = 345 are automatically removed, so that already verified data does not need unnecessary manual verification again.
Again, this could be a potential DELETE trigger / Just logic executed additionally on DELETE of MatchingTable. The implementation is subject to preference.
At the expense of adding a single byte per row to your table, you could add a manually_verified BOOL column, with a default of FALSE. Set it to TRUE if you have manually verified the data. Then you can simply query where manually_verified = FALSE.
It's simple, effective, and matches what is actually happening in the business processes: you manually verify the data.
If you want to go a step further, you might want to store when the row was verified and who verified it. Since this might be annoying to store in the main table, you could certainly store it in a separate table, and LEFT JOIN in the verification data. You could even create a view to recreate the appearance of a single master table.
To solve the problem of a new duplicate being added: you would check non-verified data against the entire data set. So that means your main table, c1, would have the condition manually_verified = FALSE, but your INNER JOINed table, c2, does not. This way, the unverified data will still find all potential duplicate matches:
SELECT * FROM table t1
INNER JOIN table t2 ON t1.name = t2.name AND t1.id <> t2.id
WHERE t1.manually_verified = FALSE
The possible matches for the duplicates will be in the joined table.
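A small sketch of how this behaves on the question's data, run through Python's sqlite3 module rather than MySQL (an assumption for runnability; SQLite stores the flag as 0/1). Records 1, 5 and 7 are marked as verified, record 3 has already been merged away, and record 8 is a hypothetical new arrival.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
                          manually_verified INTEGER NOT NULL DEFAULT 0);
    INSERT INTO clients (id, FirstName, LastName, manually_verified) VALUES
        (1, 'John', 'Doe', 1), (2, 'Jane', 'Smith', 0), (4, 'Bob', 'Jones', 0),
        (5, 'John', 'Doe', 1), (6, 'Mike', 'Roberts', 0), (7, 'John', 'Doe', 1);
""")

# Unverified rows (c1) are compared against the whole table (c2).
QUERY = """
    SELECT c1.id, c2.id
    FROM clients AS c1
    JOIN clients AS c2
      ON c1.FirstName = c2.FirstName
     AND c1.LastName = c2.LastName
     AND c1.id <> c2.id
    WHERE c1.manually_verified = 0
    ORDER BY c1.id, c2.id
"""

before = conn.execute(QUERY).fetchall()

# A new, unverified John Doe arrives.
conn.execute("INSERT INTO clients (id, FirstName, LastName) VALUES (8, 'John', 'Doe')")
after = conn.execute(QUERY).fetchall()
```

Before the insert nothing is flagged; afterwards the new record is paired with each of the three verified John Does, which is the list the customer service person needs.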

SQL query to return data from two separate rows in a table joined to a master table

I have two tables of data with the following fields
table1=(ITTAG,ITCODE,ITDESC,SUPcode)
table2=(ACCODE,ACNAME,ROUTE,SALMAN)
table2 is my customer master table; it contains my customer data such as customer code, customer name and so on.
Every route has a supervisor (table1.SUPCODE), and I need to show the supervisor's name in my result; the supervisor's name and code are both stored in the same table (table1).
table1 contains all names, distinguished by ITTAG. For example, a supervisor's name has ITTAG='K' and a salesman's name has ITTAG='S'.
ITTAG ITCODE ITDESC SUPCODE
------ ------ ------ -------
S JT JOHN TOMAS TF
K WK VIKI KOO NULL
Now this is the result which I want
ACCODE ACNAME ROUTE SALEMANNAME SUPERVISORNAME
------- ------ ------ ------------ ---------------
IMC1010 ABC HOTEL 01 JOHN TOMAS VIKI KOO
I hope this information is sufficient to write the query.
Your data structure is either not clear or incomplete. It would help if you showed the actual example data for Table1 too, but there would be trouble.
SELECT t2.ACCODE, t2.ACNAME, t2.ROUTE, a1.ITDESC AS Salesman, a2.ITDESC AS Supervisor
FROM table2 AS t2
JOIN table1 AS a1 ON t2.SALMAN = a1.ITCODE
JOIN table1 AS a2 ON t2.?????? = a2.SUPCODE
It is not clear whether I've managed the join between Table1 and Table2 for the salesman information correctly; it is plausible, but the join for the supervisor should be similar, and yet there isn't a way to make that work. Hence the '??????' in the query.
The basic technique for joining twice to a single table is to cite it twice with different aliases, as shown. I usually use one letter or a letter and a digit for the aliases, as shown.
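To make the two-alias technique concrete, here is a sketch run through Python's sqlite3 module. It rests on an assumption the question does not confirm: that a salesman row's SUPCODE holds the ITCODE of his supervisor's row. The sample data is adjusted to fit that assumption (the salesman's SUPCODE is set to 'WK'), so treat it as hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITTAG TEXT, ITCODE TEXT, ITDESC TEXT, SUPCODE TEXT);
    CREATE TABLE table2 (ACCODE TEXT, ACNAME TEXT, ROUTE TEXT, SALMAN TEXT);
    -- Hypothetical data: the salesman's SUPCODE is assumed to be the supervisor's ITCODE.
    INSERT INTO table1 VALUES ('S', 'JT', 'JOHN TOMAS', 'WK'), ('K', 'WK', 'VIKI KOO', NULL);
    INSERT INTO table2 VALUES ('IMC1010', 'ABC HOTEL', '01', 'JT');
""")

rows = conn.execute("""
    SELECT c.ACCODE, c.ACNAME, c.ROUTE,
           s.ITDESC AS SALEMANNAME,
           k.ITDESC AS SUPERVISORNAME
    FROM table2 AS c
    -- First use of table1: the salesman row.
    JOIN table1 AS s ON s.ITCODE = c.SALMAN AND s.ITTAG = 'S'
    -- Second use of table1: the supervisor row (LEFT JOIN keeps customers without one).
    LEFT JOIN table1 AS k ON k.ITCODE = s.SUPCODE AND k.ITTAG = 'K'
""").fetchall()
```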

Oracle SQL: update table conditionally based on values in another table

[Previous essay-title for question]
Oracle SQL: update parent table column if all child table rows have specific value in a column. Update RANK of only those students who have 100 marks in all the subjects. If student has less than 100 marks in any subject, his RANK should not be updated.
I have a scenario where I have a parent table and a child table. The child table has a foreign key to the parent table. I need to update the parent table's status column when a column in the child table rows has specific values. There is more than one child record for each parent, and in some cases none. Is it possible to achieve this with Oracle SQL, without using PL/SQL? If so, can someone explain how? In some cases I have to update the parent row's column based on two columns of the child table records.
My exact problem is this: I have two tables, STUDENTS and MARKS. MARKS has a FK to STUDENTS named STUDENT_ID. MARKS has a number of rows for each STUDENT record, one per subject (MARKS has a FK to SUBJECTS), and has a column named MARKS_OBTAINED. I have to check whether MARKS_OBTAINED is 100 for every subject of a student (i.e. all of his records in MARKS) and, if so, update the STUDENT table's column RANK to the value 'Merit'. This query:
update STUDENT
set RANK = 'Merit'
where exists ( select *
from MARKS
where MARKS.STUDENT_ID = STUDENT.ID
and MARKS.MARKS_OBTAINED = 100)
and not exists ( select *
from MARKS
where MARKS.STUDENT_ID = STUDENT.ID
and MARKS.MARKS_OBTAINED != 100)
updates all those students who have 100 marks in any subject. It does not exclude records which have non-100 marks. It accepts a STUDENT for whom one record in MARKS has 100 MARKS_OBTAINED but other records have less than 100 marks: since the STUDENT obtained 100 marks in one subject, its RANK also gets updated. The requirement is that if any STUDENT record has a MARKS record with a non-100 value in the MARKS_OBTAINED column, this STUDENT record should be excluded from the update.
Total rewrite
This is a complete rewrite to fit my example to the OQ's revised question. Unfortunately Manish has not actually run my original solution otherwise they would realise the following assertion is wrong:
Your solution returns all those student who have 100 marks in any subject. It does not exclude records which have non 100 marks.
Here are six students and their marks.
SQL> select * from student
2 /
ID RANK
---------- ----------
1 normal
2 normal
3 normal
4 normal
5 normal
6 normal
6 rows selected.
SQL> select * from marks
2 /
COURSE_ID STUDENT_ID MARK
---------- ---------- ----------
1 1 100
2 1 100
1 2 100
2 2 99
1 4 100
2 5 99
1 6 56
2 6 99
8 rows selected.
SQL>
Student #1 has two courses with marks of 100. Student #4 has just the one course but with a mark of 100. Student #2 has a mark of 100 in one course but only 99 in the other course they have taken. None of the other students scored 100 in any course. Which students will be awarded a 'merit'?
SQL> update student s
2 set s.rank = 'merit'
3 where exists ( select null
4 from marks m
5 where m.student_id = s.id
6 and m.mark = 100 )
7 and not exists ( select null
8 from marks m
9 where m.student_id = s.id
10 and m.mark != 100)
11 /
2 rows updated.
SQL>
SQL> select * from student
2 /
ID RANK
---------- ----------
1 merit
2 normal
3 normal
4 merit
5 normal
6 normal
6 rows selected.
SQL>
And lo! Only those students with 100 marks in all their courses have been updated. Never underestimate the power of an AND.
So the teaching is: an ounce of testing is worth sixteen tons of supposition.
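In that spirit, the same experiment can be repeated without an Oracle instance. This sketch uses Python's sqlite3 module as a stand-in (an assumption; the statement is standard enough to behave the same way here) with the six students and eight marks listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, rank TEXT);
    CREATE TABLE marks (course_id INTEGER, student_id INTEGER, mark INTEGER);
    INSERT INTO student VALUES
        (1, 'normal'), (2, 'normal'), (3, 'normal'),
        (4, 'normal'), (5, 'normal'), (6, 'normal');
    INSERT INTO marks VALUES
        (1, 1, 100), (2, 1, 100), (1, 2, 100), (2, 2, 99),
        (1, 4, 100), (2, 5, 99), (1, 6, 56), (2, 6, 99);
""")

cur = conn.execute("""
    UPDATE student
    SET rank = 'merit'
    -- at least one mark of 100 ...
    WHERE EXISTS (SELECT 1 FROM marks m
                  WHERE m.student_id = student.id AND m.mark = 100)
    -- ... and no mark that is anything other than 100
      AND NOT EXISTS (SELECT 1 FROM marks m
                      WHERE m.student_id = student.id AND m.mark <> 100)
""")
updated = cur.rowcount
merit_ids = [row[0] for row in
             conn.execute("SELECT id FROM student WHERE rank = 'merit' ORDER BY id")]
```

One caveat worth testing separately: a NULL mark satisfies neither `= 100` nor `<> 100`, so a student with one 100 and one NULL would still be updated.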
Your question is a little too vague at the moment to really answer fully. What happens to a parent row if it has no children? What happens if some of the child rows have specific values but not all of them? In the two-column case, what combinations of number of children/values are needed (is it the same set of values for each column or unique ones? Is it an AND relationship or an OR relationship)? Etc...
Anyway, making the assumption that there needs to be at least one child row with a value in a given domain, this should be fairly straightforward:
update PARENT set STATUS = 'whatever'
where ID in (
select parent_id from CHILD
where value_col in ('your', 'specific', 'values', 'here')
);
This general pattern expands to the multi-column case easily (just add an extra AND or ORed condition to the inner where clause), and to the negative case too (change where ID in to where ID not in).
If performance of this update is an issue you may want to look at triggers - at the price of slightly slower inserts on the child tables, you can keep your parent table up-to-date on an ongoing basis without having to run this update statement periodically. This works quite nicely because the logic of inspecting each child row is essentially distributed across each individual insert or update on the child table. Of course, if those child modifications are performance-critical, or if the child changes many times in between the points where you need to update the parent, then this wouldn't work very well.
What about:
UPDATE ParentTable
SET StatusColumn = 78
WHERE PK_Column IN
(SELECT DISTINCT FK_Column
FROM ChildTable AS C1
WHERE (SELECT COUNT(*) FROM ChildTable C2
WHERE C1.FK_Column = C2.FK_Column) =
(SELECT COUNT(*) FROM ChildTable C3
WHERE C1.FK_Column = C3.FK_Column
AND C3.OtherColumn = 23)
)
I strongly suspect there are neater ways to do it, but...the correlated sub-queries count the number of rows in the child table for a particular parent and the number of rows in the child table for the same parent where some filter condition matches a particular value. Those FK_Column values are returned to the main UPDATE statement, giving a list of primary key values for which the status should be updated.
This code enforces the stringent condition 'all matching rows in the child table satisfy the specific condition'. If your condition is simpler, your sub-query can be correspondingly simpler.
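As a rough check of the count-comparison idea, here is a sketch in Python's sqlite3 module (an assumption for runnability) using hypothetical marks data in place of ChildTable. It also shows one possible "neater" form, a GROUP BY with HAVING, which is my own variation rather than something from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marks (course_id INTEGER, student_id INTEGER, mark INTEGER);
    INSERT INTO marks VALUES
        (1, 1, 100), (2, 1, 100), (1, 2, 100), (2, 2, 99),
        (1, 4, 100), (2, 5, 99), (1, 6, 56), (2, 6, 99);
""")

# Count comparison: total child rows equals child rows that meet the condition.
ids_by_counts = [row[0] for row in conn.execute("""
    SELECT DISTINCT c1.student_id
    FROM marks AS c1
    WHERE (SELECT COUNT(*) FROM marks AS c2
           WHERE c2.student_id = c1.student_id) =
          (SELECT COUNT(*) FROM marks AS c3
           WHERE c3.student_id = c1.student_id AND c3.mark = 100)
    ORDER BY c1.student_id
""")]

# Variation: one pass over the child table, comparing the two counts per group.
ids_by_having = [row[0] for row in conn.execute("""
    SELECT student_id
    FROM marks
    GROUP BY student_id
    HAVING COUNT(*) = SUM(CASE WHEN mark = 100 THEN 1 ELSE 0 END)
    ORDER BY student_id
""")]
```

Either list can then feed the `IN (...)` clause of the UPDATE.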