Informix SQL merging/joining and updating a table - sql

I have an Informix database.
EMPLOYEE
LAST_NAME FIRST_NAME SSN
----------------------------------------------
SMITH JOHN 123456789
DOE JANE 987654321
SMITH JOHN 5555555
SCHEDULE
SSN START END
---------------------------
123456789 8:00AM 4:00PM
987654321 8:00AM 4:00PM
I need to take the profile for John Smith with SSN 5555555 and replace it with the other John Smith, the one with SSN 123456789. With the same query I also need to update the Schedule table to use the new SSN 5555555.
The most important thing is that the profile with ssn 123456789 is now attached to the schedule table with the ssn 5555555. Then I need to be able to delete the old employee with ssn 123456789.

An original version of the question seemed to discuss truncating a 9-digit SSN into a 7-digit not-quite-SSN. The revised mapping of 123456789 ⟶ 5555555 is not one that can be done algorithmically. The original question also discussed dealing with multiple mappings, though the example data only shows and describes one.
I'm assuming that the Employee table does not yet have the 7-digit SSN for John Smith, even though the sample data shows that.
There are multiple ways to address the requirements. This is one way to do it.
To address the generalized mapping, we can define an appropriate temporary table and populate it with the relevant data:
CREATE TEMP TABLE SSN_map
(
SSN_9 INTEGER NOT NULL UNIQUE,
SSN_7 INTEGER NOT NULL UNIQUE
);
INSERT INTO SSN_map(SSN_9, SSN_7) VALUES(123456789, 5555555);
-- …
This table might need to be a 'regular' table if it will take time to get its content correct. This would allow multiple sessions to get the mapping done correctly.
If there is an algorithmic mapping between the 9-digit and 7-digit SSNs, you could still create the SSN_map table applying the algorithm to the SSN_9 (SSN) value to create the mapping. You might also be able to apply the algorithm 'on the fly' and avoid the mapping table, but it makes the UPDATE statement harder to get right.
Assuming the database supports transactions (Informix databases can be logged, meaning 'with transactions', or unlogged, meaning 'without transactions'), then:
BEGIN WORK;
-- Create 7-digit SSN entry corresponding to 9-digit SSN
INSERT INTO Employee(SSN, Last_Name, First_Name)
SELECT m.SSN_7, e.Last_Name, e.First_Name -- , other employee columns
FROM Employee AS e
JOIN SSN_map AS m ON e.SSN = m.SSN_9;
Some (older) versions of Informix won't let you SELECT from the table you are modifying as shown. If that's a problem, then use:
SELECT m.SSN_7, e.Last_Name, e.First_Name -- , other employee columns
FROM Employee AS e
JOIN SSN_map AS m ON e.SSN = m.SSN_9
INTO TEMP mapped_emp WITH NO LOG;
INSERT INTO Employee SELECT * FROM mapped_emp;
DROP TABLE mapped_emp;
Continuing: the Employee table now contains two entries for each mapped employee, one with the old 9-digit SSN, one with the new 7-digit not-quite-SSN.
UPDATE Schedule
SET SSN = (SELECT SSN_7 FROM SSN_map WHERE SSN_9 = SSN)
WHERE EXISTS(SELECT * FROM SSN_map WHERE SSN_9 = SSN);
This updates the Schedule table with the new not-quite-SSN value. The WHERE EXISTS clause ensures that only rows with an entry in the SSN mapping table are changed. Without it, the subquery would return NULL for every unmatched row and the UPDATE would try to set that row's SSN to NULL, which would not be good.
DELETE FROM Employee
WHERE SSN IN (SELECT SSN_9 FROM SSN_map);
COMMIT WORK;
This deletes the old data with 9-digit SSNs from the Employee table. You could drop the SSN_map table at this point too.
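The effect of the WHERE EXISTS guard on the UPDATE can be checked on a small scale. The following is only a sketch that uses Python's built-in sqlite3 module rather than Informix, and it assumes a cut-down Schedule table whose SSN column is nullable (unlike the test script below) so that the unguarded result is visible; the function name remap_schedule and the column names start_time and end_time are made up for the illustration.

```python
import sqlite3

def remap_schedule(guarded):
    """Run the remapping UPDATE with or without the WHERE EXISTS guard."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SSN_map (SSN_9 INTEGER NOT NULL UNIQUE,
                              SSN_7 INTEGER NOT NULL UNIQUE);
        -- Assumption: SSN is nullable here so the unguarded effect shows up.
        CREATE TABLE Schedule (SSN INTEGER, start_time TEXT, end_time TEXT);
        INSERT INTO SSN_map VALUES (123456789, 5555555);
        INSERT INTO Schedule VALUES (123456789, '08:00', '16:00');
        INSERT INTO Schedule VALUES (987654321, '08:00', '16:00');
    """)
    sql = ("UPDATE Schedule SET SSN = "
           "(SELECT SSN_7 FROM SSN_map WHERE SSN_9 = Schedule.SSN)")
    if guarded:
        sql += " WHERE EXISTS (SELECT * FROM SSN_map WHERE SSN_9 = Schedule.SSN)"
    conn.execute(sql)
    # rowid preserves the insertion order of the two schedule rows
    result = [row[0] for row in conn.execute("SELECT SSN FROM Schedule ORDER BY rowid")]
    conn.close()
    return result

print(remap_schedule(guarded=True))   # [5555555, 987654321]
print(remap_schedule(guarded=False))  # [5555555, None]
```

With the guard, Jane Doe's row keeps its SSN; without it, her row ends up with no SSN at all.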
Complete test script
-- Outside the test, the Employee and Schedule tables would exist
-- and be fully loaded with data before running this script
BEGIN WORK;
CREATE TABLE EMPLOYEE
(
LAST_NAME CHAR(15) NOT NULL,
FIRST_NAME CHAR(15) NOT NULL,
SSN INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO Employee(Last_Name, First_Name, SSN) VALUES('SMITH', 'JOHN', 123456789);
INSERT INTO Employee(Last_name, First_Name, SSN) VALUES('DOE', 'JANE', 987654321);
CREATE TABLE SCHEDULE
(
SSN INTEGER NOT NULL REFERENCES Employee,
START DATETIME HOUR TO MINUTE NOT NULL,
END DATETIME HOUR TO MINUTE NOT NULL,
PRIMARY KEY(SSN, START)
);
INSERT INTO Schedule(SSN, Start, End) VALUES(123456789, '08:00', '16:00');
INSERT INTO Schedule(SSN, Start, End) VALUES(987654321, '08:00', '16:00');
SELECT * FROM Employee;
SELECT * FROM Schedule;
-- Start the work for mapping SSN to not-quite-SSN
CREATE TEMP TABLE SSN_map
(
SSN_9 INTEGER NOT NULL UNIQUE,
SSN_7 INTEGER NOT NULL UNIQUE
);
INSERT INTO SSN_map(SSN_9, SSN_7) VALUES(123456789, 5555555);
-- In the production environment, this is where you'd start the transaction
--BEGIN WORK;
-- Create 7-digit SSN entry corresponding to 9-digit SSN
INSERT INTO Employee(SSN, Last_Name, First_Name)
SELECT m.SSN_7, e.Last_Name, e.First_Name -- , other employee columns
FROM Employee AS e
JOIN SSN_map AS m ON e.SSN = m.SSN_9;
UPDATE Schedule
SET SSN = (SELECT SSN_7 FROM SSN_map WHERE SSN_9 = SSN)
WHERE EXISTS(SELECT * FROM SSN_map WHERE SSN_9 = SSN);
DELETE FROM Employee
WHERE SSN IN (SELECT SSN_9 FROM SSN_map);
SELECT * FROM Employee;
SELECT * FROM Schedule;
-- When satisfied, you'd use COMMIT WORK instead of ROLLBACK WORK
ROLLBACK WORK;
--COMMIT WORK;
Sample output
The first four lines are the 'before' data; the last four are the 'after' data.
SMITH JOHN 123456789
DOE JANE 987654321
123456789 08:00 16:00
987654321 08:00 16:00
DOE JANE 987654321
SMITH JOHN 5555555
5555555 08:00 16:00
987654321 08:00 16:00
As you can see, the material for John Smith was updated correctly, but the material for Jane Doe was unchanged. This is correct given that there was no mapping for Jane.
For people not used to Informix: yes, you really can include DDL statements like CREATE TABLE inside a transaction, and yes, the created table really is rolled back if you roll back the transaction. Not all DBMS are as generous as that.
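The order of the three statements (INSERT the new employee, UPDATE the schedule, DELETE the old employee) matters because Schedule.SSN references Employee in the test script. As a rough check, here is the same sequence sketched with Python's sqlite3 module instead of Informix; the column names start_time and end_time and the helper build() are assumptions of the sketch, and foreign key enforcement has to be switched on explicitly in SQLite.

```python
import sqlite3

INSERT_NEW = """INSERT INTO Employee (SSN, Last_Name, First_Name)
                SELECT m.SSN_7, e.Last_Name, e.First_Name
                FROM Employee AS e JOIN SSN_map AS m ON e.SSN = m.SSN_9"""
UPDATE_SCHEDULE = """UPDATE Schedule
                     SET SSN = (SELECT SSN_7 FROM SSN_map WHERE SSN_9 = Schedule.SSN)
                     WHERE EXISTS (SELECT * FROM SSN_map WHERE SSN_9 = Schedule.SSN)"""
DELETE_OLD = "DELETE FROM Employee WHERE SSN IN (SELECT SSN_9 FROM SSN_map)"

def build():
    """Create the three tables with the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs on request
    conn.executescript("""
        CREATE TABLE Employee (Last_Name TEXT NOT NULL, First_Name TEXT NOT NULL,
                               SSN INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE Schedule (SSN INTEGER NOT NULL REFERENCES Employee,
                               start_time TEXT NOT NULL, end_time TEXT NOT NULL,
                               PRIMARY KEY (SSN, start_time));
        CREATE TABLE SSN_map (SSN_9 INTEGER NOT NULL UNIQUE,
                              SSN_7 INTEGER NOT NULL UNIQUE);
        INSERT INTO Employee VALUES ('SMITH', 'JOHN', 123456789);
        INSERT INTO Employee VALUES ('DOE', 'JANE', 987654321);
        INSERT INTO Schedule VALUES (123456789, '08:00', '16:00');
        INSERT INTO Schedule VALUES (987654321, '08:00', '16:00');
        INSERT INTO SSN_map VALUES (123456789, 5555555);
    """)
    return conn

# Wrong order: deleting the old employee first violates the foreign key.
conn = build()
try:
    conn.execute(DELETE_OLD)
    delete_first_failed = False
except sqlite3.IntegrityError:
    delete_first_failed = True
conn.close()

# Order used in the answer: insert, update, then delete.
conn = build()
for statement in (INSERT_NEW, UPDATE_SCHEDULE, DELETE_OLD):
    conn.execute(statement)
employees = conn.execute(
    "SELECT Last_Name, First_Name, SSN FROM Employee ORDER BY Last_Name").fetchall()
schedule = conn.execute(
    "SELECT SSN, start_time, end_time FROM Schedule ORDER BY SSN").fetchall()
conn.close()

print(delete_first_failed)  # True
print(employees)            # [('DOE', 'JANE', 987654321), ('SMITH', 'JOHN', 5555555)]
print(schedule)             # [(5555555, '08:00', '16:00'), (987654321, '08:00', '16:00')]
```

The final contents match the 'after' half of the sample output above.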

Related

What can I use instead of "Set Identity Insert On"?

I have a stored procedure in SQL Server. In this procedure, I delete duplicate records and insert one unique record into the table. When I insert the new unique record, I use the script below.
SET IDENTITY_INSERT tbl_personnel_info ON
INSERT INTO tbl_personnel_info (pk_id, first_name, last_name, department, age, phone_number)
SELECT pk_id, first_name, last_name, department, age, phone_number
FROM #Unique
SET IDENTITY_INSERT tbl_personnel_info Off
Everything is fine with this script, but in production the SET IDENTITY_INSERT command needs ALTER permission. Granting this permission would be dangerous, so I can't grant it. I must also insert the old pk_id rather than a new one. How can I do this without the SET IDENTITY_INSERT command?
For example I have those records.
first_name  last_name  department  age  phone_number
----------------------------------------------------
John        Doe        IT          21   XXX
John        Doe        Finance     22   YYY
John        Doe        HR          23   ZZZ
And the record I want is:
first_name  last_name  department  age  phone_number
----------------------------------------------------
John        Doe        IT          23   YYY
I already have the wanted record in the #Unique table. I want to delete the 3 records and add the record that is in the #Unique table.
I still believe that you have a bit of an XY problem here, and that you would be better off preventing the duplicates at source rather than having a clean-up procedure that needs to be run regularly by people other than the sa. But to actually answer your question: one option would be to not delete the records you want to retain.
If you generate your #Unique table before you do the delete, then you can simply use something like:
SET XACT_ABORT ON;
BEGIN TRANSACTION;
UPDATE p WITH (UPDLOCK, SERIALIZABLE)
SET first_name = u.first_name,
last_name = u.last_name,
department = u.department,
age = u.age,
phone_number = u.phone_number
FROM tbl_personnel_info AS p
INNER JOIN #Unique AS u
ON u.pk_id = p.pk_id
WHERE NOT EXISTS
( SELECT u.first_name, u.last_name, u.department, u.age, u.phone_number
INTERSECT
SELECT p.first_name, p.last_name, p.department, p.age, p.phone_number
);
DELETE p
FROM tbl_personnel_info AS p
WHERE NOT EXISTS (SELECT 1 FROM #Unique AS u WHERE u.pk_id = p.pk_id);
COMMIT TRANSACTION;
This will update the records you want to retain and were originally planning to re-insert (but only if there is a value that needs to be updated), and then delete only the records that don't exist in your temp table.
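The NOT EXISTS (... INTERSECT ...) test in the UPDATE is a NULL-safe way of asking "do the two rows differ?": INTERSECT treats two NULLs as equal, whereas a plain <> comparison yields NULL whenever either side is NULL. The sketch below illustrates this with Python's sqlite3 module, on the assumption that SQLite's INTERSECT handles NULLs the same way for this purpose; the helper names differs_plain and differs_intersect are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def differs_plain(a, b):
    """Compare with <>; the result is None (SQL NULL) if either side is NULL."""
    return conn.execute("SELECT ? <> ?", (a, b)).fetchone()[0]

def differs_intersect(a, b):
    """Compare with NOT EXISTS (... INTERSECT ...); two NULLs count as equal."""
    sql = "SELECT NOT EXISTS (SELECT ? INTERSECT SELECT ?)"
    return bool(conn.execute(sql, (a, b)).fetchone()[0])

print(differs_plain("IT", "HR"))        # 1
print(differs_plain("IT", None))        # None: the change would be missed
print(differs_intersect("IT", None))    # True
print(differs_intersect(None, None))    # False: nothing to update
print(differs_intersect("IT", "IT"))    # False
```

This is why the WHERE clause can skip unchanged rows without a long chain of IS NULL checks per column.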
One big issue you may face here is foreign keys: you would presumably also need to tidy up any records related to the records you are deleting. This is another reason why you would be much better off preventing the duplicates at source and doing one single clean-up (so that no stored procedure is required).
For a bit of an analogy, you have a hole in your boat and your current approach is to grab a bucket and keep scooping water over board, which you'll be doing forever and the hole will only get bigger. The hole is as small as it will ever be right now - so now is the best time to plug it.

Optimize SELECT query for working with large database

This is a part of my database:
ID EmployeeID Status EffectiveDate
1 110545 Active 2011-08-01
2 110700 Active 2012-01-05
3 110060 Active 2012-01-05
4 110222 Active 2012-06-30
5 110545 Resigned 2012-07-01
6 110545 Active 2013-02-12
I want to generate records which select Active employees:
ID EmployeeID Status EffectiveDate
2 110700 Active 2012-01-05
3 110060 Active 2012-01-05
4 110222 Active 2012-06-30
So, I tried this query:
SELECT *
FROM Employee AS E
WHERE E.Status = 'Active'
  AND E.EffectiveDate BETWEEN '2011-08-01' AND '2012-07-02'
  AND NOT EXISTS (SELECT * FROM Employee AS E2
                  WHERE E2.EmployeeID = E.EmployeeID AND E2.Status = 'Resigned'
                    AND E2.EffectiveDate BETWEEN '2011-08-01' AND '2012-07-02'
                 );
It works with a small amount of data, but I get a timeout error with a large database.
Can you help me optimize this?
This is how I read your request: You want to show active employees. For this to happen, you look at their latest entry, which is either 'Active' or 'Resigned'.
You want to restrict this to a certain time range. That probably means you want to find all employees that became active without becoming immediately inactive again within that time frame.
So, get the latest date per employee first, then stay with those rows in case they are active.
select *
from employee
where (employeeid, effectivedate) in
(
select employeeid, max(effectivedate)
from employee
where effectivedate between date '2011-08-01' and date '2012-07-02'
group by employeeid
)
and status = 'Active'
order by employeeid;
The subquery tries to find a time range and then look at each employee to find their latest date within. I'd offer the DBMS this index:
create index idx on employee (effectivedate, employeeid);
The main query wants to find that row again by using employeeid and effectivedate and would then look up the status. The above index could be used again. We could even add the status in order to ease the lookup:
create index idx on employee (effectivedate, employeeid, status);
The DBMS may use this index or not. That's up to the DBMS to decide. I find it likely that it will, for it can be used for all steps in the execution of the query and even contains all columns the query works with, so the table itself wouldn't even have to be read.
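To see the "latest row per employee, then keep the active ones" idea at work on the question's sample rows, here is a small sketch using Python's sqlite3 module. It is an illustration only: SQLite has no DATE literal, so the dates are stored and compared as ISO-8601 strings, and whether a given DBMS actually picks the index is not something this sketch can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (id INTEGER PRIMARY KEY, employeeid INTEGER,
                                       status TEXT, effectivedate TEXT)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, 110545, "Active",   "2011-08-01"),
    (2, 110700, "Active",   "2012-01-05"),
    (3, 110060, "Active",   "2012-01-05"),
    (4, 110222, "Active",   "2012-06-30"),
    (5, 110545, "Resigned", "2012-07-01"),
    (6, 110545, "Active",   "2013-02-12"),
])
# The covering index suggested above
conn.execute("CREATE INDEX idx ON employee (effectivedate, employeeid, status)")

# Latest row per employee inside the date range, kept only if it is 'Active'
rows = conn.execute("""
    SELECT employeeid
    FROM employee
    WHERE (employeeid, effectivedate) IN
          (SELECT employeeid, MAX(effectivedate)
           FROM employee
           WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
           GROUP BY employeeid)
      AND status = 'Active'
    ORDER BY employeeid
""").fetchall()
active_ids = [r[0] for r in rows]
print(active_ids)  # [110060, 110222, 110700]
```

Employee 110545 drops out because the latest row in the range is the 'Resigned' one from 2012-07-01.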
I have tried to achieve the above result set using a CASE expression.
Hope this helps.
CREATE TABLE employee_test
(rec NUMBER,
employee_id NUMBER,
status VARCHAR2(100),
effectivedate DATE);
INSERT INTO employee_test VALUES(1,110545,'Active',TO_DATE('01-08-2011','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(2,110700,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(3,110060,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(4,110222,'Active',TO_DATE('30-06-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(5,110545,'Resigned',TO_DATE('01-07-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(6,110545,'Active',TO_DATE('12-02-2013','DD-MM-YYYY'));
COMMIT;
SELECT * FROM(
SELECT e.* ,
CASE WHEN (effectivedate BETWEEN TO_DATE('2011-08-01','YYYY-MM-DD') AND TO_DATE('2012-07-02','YYYY-MM-DD') AND status='Active')
THEN 'Y' ELSE 'N' END AS FLAG
FROM Employee_Test e)
WHERE Flag='Y'
;
I'm adding another answer with another interpretation of the request. Just in case :-)
The table shows statuses per employee. An employee can become active, then retired, then active again. But they can not become active and then active again, without becoming retired in between, of course.
We are looking at a time range and want to find all employees that became active but never retired within - no matter whether they became active again after retirement in that period.
This makes this easy. We are looking for employees, that have exactly one row in that time range and that row is active. One way to do this:
select employeeid, any_value(effectivedate), max(status)
from employee
where effectivedate between date '2011-08-01' and date '2012-07-02'
group by employeeid
having max(status) = 'Active'
order by employeeid;
As in my other answer, an appropriate index would be
create index idx on employee (effectivedate, employeeid, status);
as we want to look into the date range and look up the statuses per employee.
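The two interpretations only disagree for an employee who resigns and becomes active again inside the date range. The sketch below, again with Python's sqlite3 module, adds one such employee to the question's data to show the difference. Employee 999999 and the three rows for that employee are hypothetical, and MIN(effectivedate) is not needed here, so the sketch selects only the employee id rather than relying on any_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (id INTEGER PRIMARY KEY, employeeid INTEGER,
                                       status TEXT, effectivedate TEXT)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, 110545, "Active",   "2011-08-01"),
    (2, 110700, "Active",   "2012-01-05"),
    (3, 110060, "Active",   "2012-01-05"),
    (4, 110222, "Active",   "2012-06-30"),
    (5, 110545, "Resigned", "2012-07-01"),
    (6, 110545, "Active",   "2013-02-12"),
    # Hypothetical employee: active, resigned, active again, all inside the range
    (7, 999999, "Active",   "2011-09-01"),
    (8, 999999, "Resigned", "2011-12-01"),
    (9, 999999, "Active",   "2012-03-01"),
])

# Interpretation 1: the latest row in the range must be 'Active'
LATEST_ROW = """
    SELECT employeeid FROM employee
    WHERE (employeeid, effectivedate) IN
          (SELECT employeeid, MAX(effectivedate) FROM employee
           WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
           GROUP BY employeeid)
      AND status = 'Active'
    ORDER BY employeeid"""

# Interpretation 2: no 'Resigned' row at all in the range
NEVER_RESIGNED = """
    SELECT employeeid FROM employee
    WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
    GROUP BY employeeid
    HAVING MAX(status) = 'Active'
    ORDER BY employeeid"""

latest = [r[0] for r in conn.execute(LATEST_ROW)]
never = [r[0] for r in conn.execute(NEVER_RESIGNED)]
print(latest)  # [110060, 110222, 110700, 999999]
print(never)   # [110060, 110222, 110700]
```

MAX(status) works as a test here only because 'Resigned' sorts after 'Active'; with other status values the HAVING clause would need a different condition.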

SQLite subqueries trying find if id does not exist in another column

I have been battling with this for a bit. This is a test question from a testing site but I have no one to email and try find the answer from.
CREATE TABLE employees (
id INTEGER NOT NULL PRIMARY KEY,
managerId INTEGER REFERENCES employees(id),
name VARCHAR(30) NOT NULL
);
INSERT INTO employees(id, managerId, name) VALUES(1, NULL, 'John');
INSERT INTO employees(id, managerId, name) VALUES(2, 1, 'Mike');
-- Expected output (in any order):
-- name
-- ----
-- Mike
-- Explanation:
-- In this example.
-- John is Mike's manager. Mike does not manage anyone.
-- Mike is the only employee who does not manage anyone.
Write a query that selects the names of employees who are not managers.
This is what I have come up with but it does not work.
SELECT name
FROM employees
WHERE id NOT IN(SELECT managerId FROM employees)
I'm just trying to understand how I can iterate through the managerId column and check whether the Id matches it or not?
This is because one of the selected manager IDs is null. Null is the "unknown value", so NOT IN does not succeed: it cannot guarantee that your value is not in the data set, as the unknown value could be your value. So much for the reasoning.
So either:
SELECT name
FROM employees
WHERE id NOT IN (SELECT managerId FROM employees WHERE managerId IS NOT NULL);
or
SELECT name
FROM employees e
WHERE NOT EXISTS (SELECT * FROM employees m WHERE m.managerId = e.id);
This is really a nasty trap one must be aware of. Most often we look up values that cannot be null. Bad luck yours is a rare case where nulls exist in the lookup :-)
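Since the question is about SQLite, the trap and both fixes can be reproduced directly with Python's sqlite3 module. This is just a sketch on the two sample rows from the question; the helper names() is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER NOT NULL PRIMARY KEY,
        managerId INTEGER REFERENCES employees(id),
        name VARCHAR(30) NOT NULL
    );
    INSERT INTO employees(id, managerId, name) VALUES(1, NULL, 'John');
    INSERT INTO employees(id, managerId, name) VALUES(2, 1, 'Mike');
""")

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# The subquery returns (NULL, 1), so NOT IN is never true for any id.
original = names("""SELECT name FROM employees
                    WHERE id NOT IN (SELECT managerId FROM employees)""")

# Fix 1: keep the NULL out of the lookup set.
filtered = names("""SELECT name FROM employees
                    WHERE id NOT IN (SELECT managerId FROM employees
                                     WHERE managerId IS NOT NULL)""")

# Fix 2: NOT EXISTS only asks whether a matching row exists.
not_exists = names("""SELECT name FROM employees e
                      WHERE NOT EXISTS (SELECT * FROM employees m
                                        WHERE m.managerId = e.id)""")

print(original)    # []
print(filtered)    # ['Mike']
print(not_exists)  # ['Mike']
```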

Field Aliasing in queries, nzsql

I'm working in Netezza -- or, you know, pure data for Analytics -- nzsql, but I think this is an ANSI SQL question. The question is so basic, I don't even know how to search for it.
CREATE TEMPORARY TABLE DEMO1 AS SELECT 'SMORK' AS SMORK, 'PLONK' AS PLONK, 'SPROING' AS SPROING;
SELECT SMORK AS PLONK, PLONK, SPROING AS CLUNK, CLUNK
FROM DEMO1;
This returns 'SMORK, PLONK, SPROING, SPROING', which is to say, the query is fine reusing the CLUNK alias, but the PLONK alias is overwritten by the column from the source table. Now, if I really wanted the column from the source table, I could write SELECT SMORK AS PLONK, DEMO1.PLONK, etc., but I don't know how to specify that I would prefer the alias I've defined earlier in the same SELECT clause.
Does anybody know a way?
When resolving a column name in a SELECT list, Netezza searches the table columns first and then the aliases.
Example:
Suppose we have the following statements:
CREATE TEMPORARY TABLE EMPLOYEES AS
SELECT 1001 AS EMPLOYEE_ID
,'Alex' AS FIRST_NAME
,'Smith' AS LAST_NAME
,'Alex J. Smith' AS FULL_NAME;
SELECT
EMPLOYEE_ID
,FIRST_NAME
,LAST_NAME
,LAST_NAME||', '||FIRST_NAME AS FULL_NAME
,'My full name is :'||FULL_NAME AS DESCRIPTION
FROM EMPLOYEES;
It will return
EMPLOYEE_ID FIRST_NAME LAST_NAME FULL_NAME DESCRIPTION
1001 Alex Smith Smith, Alex My full name is :Alex J. Smith
Notice that in DESCRIPTION, the FULL_NAME value is taken from the table column, not from the alias.
If you want the DESCRIPTION column to use the value from the alias FULL_NAME, you can do it in two steps:
Step 1. Create a subquery that includes all the columns you want. Every alias you want to reuse must be given a name that does not exist as a column in any table in your FROM clause.
Step 2. SELECT only the columns you want from the subquery.
CREATE TEMPORARY TABLE EMPLOYEES AS SELECT 1001 AS EMPLOYEE_ID, 'Alex' AS FIRST_NAME, 'Smith' AS LAST_NAME, 'Alex J. Smith' AS FULL_NAME;
WITH EMPLOYEES_TMP AS (
SELECT
EMPLOYEE_ID
,FIRST_NAME
,LAST_NAME
,LAST_NAME||', '||FIRST_NAME AS FULL_NAME2
,FULL_NAME2 AS FULL_NAME
,'My full name is :'||FULL_NAME2 AS DESCRIPTION
FROM EMPLOYEES)
SELECT
EMPLOYEE_ID
,FIRST_NAME
,LAST_NAME
,FULL_NAME
,DESCRIPTION
FROM EMPLOYEES_TMP;
This will return what you want:
EMPLOYEE_ID FIRST_NAME LAST_NAME FULL_NAME DESCRIPTION
1001 Alex Smith Smith, Alex My full name is :Smith, Alex
Just change the order of your columns. Netezza tries to use your alias so you can either rename the column or change the order.
SELECT SMORK AS PLONK, PLONK, CLUNK, SPROING AS CLUNK
FROM DEMO1;

Displaying the same fields AS different names from the same table -Access 2010

I have EmployeeName; it is from the Employee table. The Employee table holds ALL Employees in the organization and the Employee table references the primary key of the Position table, which holds the different position names. So this is how I'm differentiating between employees in the Employee table; each record in the Employee table has a PosNo which references the Position table(worker = Pos1, manager = Pos2, etc...) So for simplicity's sake, a record in the employee table would be similar to: EmployeeName, EmployeeAddress, DeptNo, PosNo
Here's the problem: Certain positions are under other positions. There are workers in the Employee table and there are managers in the Employee table. I'm wanting to make a table that lists all workers and their managers. For example, the table would have two fields: EmployeeName, ManagerName.
The Employee table is broken down into a generalization hierarchy. The Salary and Hourly tables branch out from Employee table. Then, from the Salary table, another table branches out called Manager(which I call ProgramSupervisor; it has a unique field). Workers are part of the Hourly table though. Managers(ProgramSupervisor) and Workers(Hourly) are related to each other through the ISL table. The Manager is the head of the ISL and therefore ISL has a ManagerNo as one of its fields. Workers(Hourly), however, work in the ISL and therefore have ISLNo as a field in their table, the Hourly table.
So, I'm trying to find a way to relate all of these tables and make a table with two fields, workers and managers, in which the workers belong to managers through the ISL table. Would I use a nested query of some sort? I'll post my code so far, which is absolutely not correct (probably not even on the right track), and I'll post my ERD so you can get a better picture of how the tables relate.
SELECT EmpLastName + ', ' + EmpFirstName as ProgSupName,
EmpLastName + ', ' + EmpFirstName as EmpName
FROM Employee, Salary, ProgramSupervisor, ISL, Hourly
WHERE Employee.EmpNo = Salary.EmpNo
AND Salary.EmpNo = ProgramSupervisor.EmpNo
AND ProgramSupervisor.EmpNo = ISL.ProgramSupervisor_EmpNo
AND ISL.ISLNo = Hourly.ISLNo
AND Hourly.EmpNo = Employee.EmpNo
[ERD image not reproduced]
In its simplest form you can distill the Employee-Supervisor relationship down to three tables:
[Employee]
EmpNo EmpFirstName EmpLastName
----- ------------ -----------
1 Montgomery Burns
2 Homer Simpson
[Hourly]
EmpNo ISLNo
----- -----
2 1
[ISL]
ISLNo ProgramSupervisor_EmpNo ISLName
----- ----------------------- -------------------------
1 1 Springfield Nuclear Plant
If you put them together in a query (its SQL is shown further down), it produces results like this:
Employee_LastName Employee_FirstName ISLName Supervisor_LastName Supervisor_FirstName
----------------- ------------------ ------------------------- ------------------- --------------------
Simpson Homer Springfield Nuclear Plant Burns Montgomery
"But wait a minute!" I hear you say, "There are four tables in that query. Where did the [Supervisor] table come from?"
That is just another instance of the [Employee] table that uses [Supervisor] as its alias. A table can appear in a query more than once, provided that we use aliases to specify which instance we mean when we refer to [EmpLastName], [EmpFirstName], etc.
The SQL for the above query shows the second instance Employee AS Supervisor on the second-last line:
SELECT
Employee.EmpLastName AS Employee_LastName,
Employee.EmpFirstName AS Employee_FirstName,
ISL.ISLName,
Supervisor.EmpLastName AS Supervisor_LastName,
Supervisor.EmpFirstName AS Supervisor_FirstName
FROM
(
Employee
INNER JOIN
(
Hourly
INNER JOIN
ISL
ON Hourly.ISLNo = ISL.ISLNo
)
ON Employee.EmpNo = Hourly.EmpNo
)
INNER JOIN
Employee AS Supervisor
ON ISL.ProgramSupervisor_EmpNo = Supervisor.EmpNo
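The self-join with an alias is not specific to Access. As a quick sanity check, here is the same three-table sample joined with Python's sqlite3 module; this is a sketch rather than Access SQL, so the joins are written flat instead of with the nested parentheses Access requires, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, EmpFirstName TEXT, EmpLastName TEXT);
    CREATE TABLE Hourly (EmpNo INTEGER, ISLNo INTEGER);
    CREATE TABLE ISL (ISLNo INTEGER PRIMARY KEY, ProgramSupervisor_EmpNo INTEGER, ISLName TEXT);
    INSERT INTO Employee VALUES (1, 'Montgomery', 'Burns');
    INSERT INTO Employee VALUES (2, 'Homer', 'Simpson');
    INSERT INTO Hourly VALUES (2, 1);
    INSERT INTO ISL VALUES (1, 1, 'Springfield Nuclear Plant');
""")

# Employee appears twice: once as the worker, once under the alias Supervisor.
rows = conn.execute("""
    SELECT Employee.EmpLastName    AS Employee_LastName,
           Employee.EmpFirstName   AS Employee_FirstName,
           ISL.ISLName,
           Supervisor.EmpLastName  AS Supervisor_LastName,
           Supervisor.EmpFirstName AS Supervisor_FirstName
    FROM Employee
    INNER JOIN Hourly ON Employee.EmpNo = Hourly.EmpNo
    INNER JOIN ISL ON Hourly.ISLNo = ISL.ISLNo
    INNER JOIN Employee AS Supervisor
            ON ISL.ProgramSupervisor_EmpNo = Supervisor.EmpNo
""").fetchall()

print(rows)
# [('Simpson', 'Homer', 'Springfield Nuclear Plant', 'Burns', 'Montgomery')]
```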