SQL select value if no corresponding value exists in another table - sql

I have a database which tries to achieve point-in-time information by having a master table and a history table which records when fields in the other table will change or did change, e.g.
Table: Employee
Id | Name | Department
-----------------------------
0 | Alice | 1
1 | Bob | 1
Table: History
ChangeDate | Field | RowId | NewValue
---------------------------------------------
05/05/2009 | Department | 0 | 2
That records that employee 0 (Alice) will move to department 2 on 05/05/2009.
I want to write a query to determine the employee's department on a particular date. So it needs to:
Find the most recent history record for that field and employee before the given date.
If none exists, default to the value currently in the master employee table.
How can I do this? My intuition is to select the first row of a result set which has all suitable history records reverse ordered by date and with the value in the master table last (so it's only the first result if there are no suitable history records), but I don't have the required SQL-fu to achieve this.
Note: I am conscious that this may not be the best way to implement this system - I am not able to change this in the short term - though if you can suggest a better way to implement this I'd be glad to hear it.

SELECT COALESCE (
(
SELECT newValue
FROM history
WHERE field = 'Department'
AND rowID = ID
AND changeDate =
(
SELECT MAX(changedate)
FROM history
WHERE field = 'Department'
AND rowID = ID
AND changeDate <= '01/01/2009'
)
), department)
FROM employee
WHERE id = #id
In both Oracle and MS SQL, you can also use this:
SELECT COALESCE(newValue, department)
FROM (
SELECT e.*, h.*,
-- newest qualifying change first, so rn = 1 is the value in effect
ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY changeDate DESC) AS rn
FROM employee e
LEFT OUTER JOIN
history h
ON field = 'Department'
AND rowID = ID
AND changeDate <= '01/01/2009'
WHERE e.id = #id
) q
WHERE rn = 1
Note, though, that ROWID is a reserved word in Oracle, so you'll need to rename this column when porting.
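For anyone who wants to try the fallback logic on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. It is not the exact query above: it picks the latest change with ORDER BY ... DESC LIMIT 1 instead of a MAX() subquery. The column name EmployeeId (in place of RowId), the ISO-8601 text dates, the TEXT department values and the helper name department_on are all assumptions made for the example.

```python
import sqlite3

# Sample data from the question. Dates are stored as ISO-8601 text
# (an assumption) so that string comparison matches date order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Department TEXT);
    CREATE TABLE History (ChangeDate TEXT, Field TEXT,
                          EmployeeId INTEGER, NewValue TEXT);
    INSERT INTO Employee VALUES (0, 'Alice', '1'), (1, 'Bob', '1');
    INSERT INTO History VALUES ('2009-05-05', 'Department', 0, '2');
""")


def department_on(conn, employee_id, as_of):
    """Return the department as of a date, falling back to the master table."""
    row = conn.execute(
        """
        SELECT COALESCE(
            (SELECT h.NewValue
               FROM History h
              WHERE h.Field = 'Department'
                AND h.EmployeeId = e.Id
                AND h.ChangeDate <= :as_of
              ORDER BY h.ChangeDate DESC   -- latest qualifying change first
              LIMIT 1),
            e.Department)                  -- no history row: use master value
          FROM Employee e
         WHERE e.Id = :id
        """,
        {"id": employee_id, "as_of": as_of},
    ).fetchone()
    return row[0] if row else None


print(department_on(conn, 0, "2009-01-01"))  # before Alice's change
print(department_on(conn, 0, "2009-06-01"))  # after Alice's change
```

Storing the dates in a sortable format matters here: text such as '05/05/2009' would not compare in date order.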

This should work:
select iif(history.newvalue is null, employee.department, history.newvalue)
as Department
from employee left outer join history on history.RowId = employee.Id
and history.Field = 'Department'
and history.changedate = (select max(h1.changedate) from history h1
where h1.RowId = history.RowId and h1.Field = 'Department'
and h1.changedate < '2008-05-20') -- i.e. the given date

Related

Select distinct value and bring only the latest one

I have a table that stores different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, approved, etc).
I need to build a query that brings only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Here's the architecture for the 3 tables:
When running a simple SELECT statement on table dbo.Cuotas_Estado you'll get:
SELECT
*
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This will bring this result:
As you can see, it returns two rows each for entries 9 and 11. I need the query to return only the latest IdEstado (3 for entry 9 and 4 for entry 11).
Can you try this?
with cte as (
select IdEstado,IdCuota,
row_number() over(partition by IdCuota order by fecha desc) as RowNum
from [dbo].[Cuotas_Estado]
)
select IdEstado,IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
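With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method. Note that it assumes the latest status of a transaction is always the one with the highest IdEstado.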
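If it helps to see the ROW_NUMBER() approach run end to end, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table definition in the question has no date column, so this sketch assumes the row with the highest ID is the latest status for each IdCuota. The inserted rows are hypothetical; only the expected result comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cuotas_Estado (
        ID INTEGER PRIMARY KEY,   -- auto-assigned in insertion order
        IdCuota INTEGER,
        IdEstado INTEGER
    );
    -- Hypothetical history: entries 9 and 11 each changed status once.
    INSERT INTO Cuotas_Estado (IdCuota, IdEstado) VALUES
        (2, 1), (3, 2), (9, 1), (9, 3), (10, 3), (11, 2), (11, 4);
""")

# Number the rows of each IdCuota from newest to oldest, keep the newest.
latest = conn.execute("""
    SELECT IdCuota, IdEstado
      FROM (SELECT IdCuota, IdEstado,
                   ROW_NUMBER() OVER (PARTITION BY IdCuota
                                      ORDER BY ID DESC) AS rn
              FROM Cuotas_Estado)
     WHERE rn = 1
     ORDER BY IdCuota
""").fetchall()

for id_cuota, id_estado in latest:
    print(id_cuota, id_estado)
```

Ordering by ID rather than by MAX(IdEstado) avoids relying on status codes always increasing over time.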

Update Child table Whose Parent Table has duplicates

I have duplicate data in the Contact table.
I am identifying the duplicates using the RANK() function.
I also have to update the contact ID in the child table, Activity.
Wherever a contact has rank 2 or above, I want to replace its ID in the Activity table with the ID of the rank 1 contact.
I am using this query to find duplicate contacts:
(SELECT
ExternalContactID,
RANK() OVER (PARTITION BY ExternalAccountId, Name, Email, MailingCity,
MailingCountry, MailingState, MailingStreet, Phone ORDER BY
ExternalContactID) AS rank
FROM
contact)
Table: Contact
ExternalContactID | Rank
101 | 1
102 | 2
Child table: Activity
ActivityID | ContactID
1 | 101
2 | 102
Before deleting contact(s) which have rank > 1, I need to update the child table "Activity" with rank 1 contact id.
Result:
Contact with id = 102 is deleted and Activity Record with id = 2 now has contact id = 101
You've used RANK to identify groups of duplicates and the one in each group you wish to keep. This is your mechanism for doing a self-join to relate every row to the "keeper" (the member of the duplicate group you wish to keep).
WITH cte AS (
{Query with Rank column}
)
SELECT ...
FROM cte a
INNER JOIN cte b
ON {all the partitioning columns being equal}
AND a.Rank=1
AND b.Rank<>1
With the pseudocode above, you've got all the rows in the table joined to the row that is going to be the "keeper" after you delete the duplicates. JOIN to this structure in your UPDATE to set the FK to the PK of the "keeper" that is related to it.
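The self-join described above can be sketched roughly as follows, again with Python's sqlite3 module (SQLite 3.25+ for RANK()). The sample contacts, the reduced set of partitioning columns (just Name and Email) and the temporary table name Keeper are assumptions for illustration; a real run would partition on all the columns from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (ExternalContactID INTEGER PRIMARY KEY,
                          Name TEXT, Email TEXT);
    CREATE TABLE Activity (ActivityID INTEGER PRIMARY KEY, ContactID INTEGER);
    INSERT INTO Contact VALUES (101, 'Ann', 'ann@example.com'),
                               (102, 'Ann', 'ann@example.com'),
                               (103, 'Raj', 'raj@example.com');
    INSERT INTO Activity VALUES (1, 101), (2, 102), (3, 103);

    -- Pair every duplicate (rank > 1) with the keeper (rank 1) of its group.
    CREATE TEMP TABLE Keeper AS
    WITH ranked AS (
        SELECT ExternalContactID, Name, Email,
               RANK() OVER (PARTITION BY Name, Email
                            ORDER BY ExternalContactID) AS rnk
          FROM Contact
    )
    SELECT b.ExternalContactID AS DuplicateID,
           a.ExternalContactID AS KeeperID
      FROM ranked a
      JOIN ranked b ON a.Name = b.Name AND a.Email = b.Email
     WHERE a.rnk = 1 AND b.rnk <> 1;

    -- Step 1: repoint child rows from each duplicate to its keeper.
    UPDATE Activity
       SET ContactID = (SELECT k.KeeperID FROM Keeper k
                         WHERE k.DuplicateID = Activity.ContactID)
     WHERE ContactID IN (SELECT DuplicateID FROM Keeper);

    -- Step 2: the duplicates are no longer referenced and can be deleted.
    DELETE FROM Contact
     WHERE ExternalContactID IN (SELECT DuplicateID FROM Keeper);
""")

activities = conn.execute(
    "SELECT ActivityID, ContactID FROM Activity ORDER BY ActivityID").fetchall()
contacts = [row[0] for row in conn.execute(
    "SELECT ExternalContactID FROM Contact ORDER BY ExternalContactID")]
print(activities, contacts)
```

Materializing the duplicate-to-keeper pairs first means the UPDATE and the DELETE both work from the same mapping, even though the DELETE changes the ranks.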

Delete rows where date was least updated

How can I delete rows where dateupdated was least updated ?
My table is
Name Dateupdated ID status
john 1/02/17 JHN1 A
john 1/03/17 JHN2 A
sally 1/02/17 SLLY1 A
sally 1/03/17 SLLY2 A
Mike 1/03/17 MK1 A
Mike 1/04/17 MK2 A
I want to be left with the following after the data removal:
Name Date ID status
john 1/03/17 JHN2 A
sally 1/03/17 SLLY2 A
Mike 1/04/17 MK2 A
If you really want to "delete rows where dateupdated was least updated" then a simple single-row subquery should do the trick.
DELETE MyTable
WHERE Date = (SELECT MIN(Date) From MyTable)
If on the other hand you just want to delete the row with the earliest Date per person (as identified by their ID) you could use:
DELETE MyTable
FROM MyTable a
JOIN (SELECT ID, MIN(Date) MinDate FROM MyTable GROUP BY ID) b
ON a.ID = b.ID AND a.Date = b.MinDate
The idea here is you create an aggregate query that returns rows containing the columns that would match the rows you want deleted, then join to it. Because it's an inner join, rows that do not match the criteria will be excluded.
If people are uniquely identified by something else (e.g. Name) then you can just substitute that for the ID in my example above.
I am thinking though that you don't want either of these. I think you want to delete everything except for each person's latest row. If that is the case, try this:
DELETE MyTable
WHERE EXISTS (SELECT 0 FROM MyTable b WHERE b.ID = MyTable.ID AND b.Date > MyTable.Date)
The idea here is you check for existence of another data row with the same ID and a later date. If there is a later record, delete this one.
The nice thing about the last example is you can run it over and over and every person will still be left with exactly one row. The other two queries, if run over and over, will nibble away at the table until it is empty.
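To check the "run it over and over" claim, here is a quick sketch with Python's sqlite3 module. It assumes people are identified by Name (as suggested above for data like this), uses the real column name Dateupdated, and stores dates as ISO-8601 text so that comparisons follow date order. The names DELETE_OLDER and remaining_ids are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Name TEXT, Dateupdated TEXT, ID TEXT, status TEXT);
    INSERT INTO MyTable VALUES
        ('john',  '2017-01-02', 'JHN1',  'A'),
        ('john',  '2017-01-03', 'JHN2',  'A'),
        ('sally', '2017-01-02', 'SLLY1', 'A'),
        ('sally', '2017-01-03', 'SLLY2', 'A'),
        ('Mike',  '2017-01-03', 'MK1',   'A'),
        ('Mike',  '2017-01-04', 'MK2',   'A');
""")

# Delete a row only if the same person has a later row.
DELETE_OLDER = """
    DELETE FROM MyTable
     WHERE EXISTS (SELECT 1 FROM MyTable b
                    WHERE b.Name = MyTable.Name
                      AND b.Dateupdated > MyTable.Dateupdated)
"""


def remaining_ids(conn):
    """Return the IDs still in the table, sorted for stable comparison."""
    return [row[0] for row in conn.execute("SELECT ID FROM MyTable ORDER BY ID")]


conn.execute(DELETE_OLDER)
first_pass = remaining_ids(conn)
conn.execute(DELETE_OLDER)  # second run finds nothing left to delete
second_pass = remaining_ids(conn)
print(first_pass, second_pass)
```

Both passes leave the same three rows, one per person.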
P.S. As these are significantly different solutions, I suggest you spend some effort learning how to articulate unambiguous requirements. This is an extremely important skill for any developer.
This deletes rows where the name is a duplicate, and deletes all but the latest row for each name. This is different from your stated question.
Using a common table expression (cte) and row_number():
;with cte as (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
)
/* ------------------------------------------------
-- Remove duplicates by deleting rows
-- where the row number (rn) is greater than 1
-- leaving the first row for each partition
------------------------------------------------ */
delete
from cte
where cte.rn > 1
select * from t
rextester: http://rextester.com/HZBQ50469
returns:
+-------+-------------+-------+--------+
| Name | Dateupdated | ID | status |
+-------+-------------+-------+--------+
| john | 2017-01-03 | JHN2 | A |
| sally | 2017-01-03 | SLLY2 | A |
| Mike | 2017-01-04 | MK2 | A |
+-------+-------------+-------+--------+
Without using the cte it can be written as:
delete d
from (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
) as d
where d.rn > 1
This should do the trick:
delete a
from MyTable a
where not exists (
select top 1 1
from MyTable b
where b.name = a.name
and b.DateUpdated < a.DateUpdated
)
i.e. remove every row for which no other row with the same name has an earlier date.
Your Name column has Mike and Mik2, which are different from each other.
So, unless that is a mistake, the column to group by should be the ID column without its last digit.
If it is not a mistake, I think the following is more accurate.
delete a
from MyTable a
inner join
(select substring(ID, 1, len(ID) - 1) as ID, min(Dateupdated) as MinDate
from MyTable
group by substring(ID, 1, len(ID) - 1)
) b
on substring(a.ID, 1, len(a.ID) - 1) = b.ID and a.Dateupdated = b.MinDate
You can test it at SQLFiddle: http://sqlfiddle.com/#!6/9c440/1

How to select the first row for every sub group using a custom order?

Having a Table Person
and a Table PersonRecord
I need to select only one record for each person, the record with the max status.
The statuses are ordered C > B > A. A person can have multiple records with different or identical statuses, and I always need to select the record with the greatest status, or the first one if the person has several records with the same status.
I wrote the following query to get the rows in order:
select ep.personid, ep.persondesc, records.veryimportantcode, records.status
from extperson ep
left join
(
select rownum as rn, v.* from
(
select pr.personid, pr.veryimportantcode, pr.status
from personrecord pr
group by pr.personid, pr.veryimportantcode, pr.status
order by pr.personid,
decode(pr.status,
'C', 1,'B', 2,'A', 3,
4)
) v
) records
on ep.personid = records.personid
It gives me:
I need:
|PERSONID |PERSONDESC|VERYIMPORTANTCODE |STATUS |
|00325465 |Bjork |(null) |(null) |
|00527513 |Paul |ZP-2143540 |A |
|00542369 |Hazard |ZH-7531594 |C |
|0324567 |Jhon |ZJ-2346570 |B |
I tried to achieve this using an additional materialized subquery in which I count the number of repetitions, then a left join with a where clause of (subquerymat.nrorepeat > 1 and rownum = 1) or (subquerymat.nrorepeat = 1 or subquerymat.nrorepeat is null), but it does not work.
There is one very important constraint: this query will be appended as the right side of a union inside a view, so I can't use stored procedures.
Try:
select personid, persondesc, veryimportantcode, status
from (select pe.personid,
pe.persondesc,
pr.veryimportantcode,
pr.status,
row_number() over(partition by pe.personid order by pr.status desc,
pr.autoid) as rn
from person pe
left join personrecord pr
on pe.personid = pr.personid)
where rn = 1
Fiddle test: http://sqlfiddle.com/#!4/25074/2/0
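Ordering by status desc works here only because C > B > A happens to match alphabetical order. If the custom order ever stops matching, one option is to spell it out with a CASE expression, mirroring the decode() in the question. A rough sketch with Python's sqlite3 module (SQLite 3.25+) follows. The second personrecord row ('ZH-0000001') is hypothetical test data, and autoid is the tie-breaker column assumed by the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (personid TEXT PRIMARY KEY, persondesc TEXT);
    CREATE TABLE personrecord (autoid INTEGER PRIMARY KEY, personid TEXT,
                               veryimportantcode TEXT, status TEXT);
    INSERT INTO person VALUES ('00325465', 'Bjork'), ('00542369', 'Hazard');
    -- Hypothetical records: Hazard has an A and a C, Bjork has none.
    INSERT INTO personrecord (personid, veryimportantcode, status) VALUES
        ('00542369', 'ZH-0000001', 'A'),
        ('00542369', 'ZH-7531594', 'C');
""")

rows = conn.execute("""
    SELECT personid, persondesc, veryimportantcode, status
      FROM (SELECT pe.personid, pe.persondesc,
                   pr.veryimportantcode, pr.status,
                   ROW_NUMBER() OVER (
                       PARTITION BY pe.personid
                       -- explicit custom order: C first, then B, then A
                       ORDER BY CASE pr.status WHEN 'C' THEN 1
                                               WHEN 'B' THEN 2
                                               WHEN 'A' THEN 3
                                               ELSE 4 END,
                                pr.autoid) AS rn
              FROM person pe
              LEFT JOIN personrecord pr ON pr.personid = pe.personid)
     WHERE rn = 1
     ORDER BY personid
""").fetchall()

for row in rows:
    print(row)
```

Because of the left join, a person with no records still gets exactly one row, with NULL code and status.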

sql select tuples and group by id

I have the current database schema
EMPLOYEES
ID | NAME | JOB
JOBS
ID | JOBNAME | PRICE
I want to query so that it goes through each employee, and gets all their jobs, but I want each employee ID to be grouped so that it returns the employee ID followed by all the jobs they have. e.g if employee with ID 1 had jobs with ID, JOBNAME (1, Roofing), (1,Brick laying)
I want it to return something like
1 Roofing Bricklaying
I was trying
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID;
but get the error
not a GROUP BY expression
Hope this is clear enough, if not please say and I'll try to explain better
EDIT:
WITH ALL_JOBS AS
(
SELECT ID,LISTAGG(JOBNAME || ' ') WITHIN GROUP (ORDER BY ID) JOBNAMES FROM JOBS GROUP BY ID
)
SELECT A.ID,A.JOBNAMES FROM ALL_JOBS A,EMPLOYEES B
WHERE A.ID = B.ID
GROUP BY A.ID,A.JOBNAMES;
In the WITH clause, I group by ID and concatenate the JOBNAME values for each ID (appending ' ' to each so the values stay distinguishable).
For example, if we have
ID NAME
1 Roofing
1 Brick laying
2 Michael
2 Schumacher
we will get the result set as
ID NAME
1 Roofing Brick laying
2 Michael Schumacher
Then I join this result set with the EMPLOYEES table on ID.
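For experimenting without an Oracle instance, the same grouping idea can be sketched with Python's sqlite3 module, where GROUP_CONCAT plays the role of LISTAGG. This is SQLite's function, not Oracle syntax. The '|' separator and the dictionary name jobs_by_id are choices made for the example, and the rows are the sample values from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOBS (ID INTEGER, JOBNAME TEXT);
    INSERT INTO JOBS VALUES (1, 'Roofing'), (1, 'Brick laying'),
                            (2, 'Michael'), (2, 'Schumacher');
""")

# One output row per ID, with all of its job names joined by '|'.
# The order of names inside each group is not guaranteed by this query,
# so they are split and sorted in Python afterwards.
jobs_by_id = {
    emp_id: sorted(names.split("|"))
    for emp_id, names in conn.execute(
        "SELECT ID, GROUP_CONCAT(JOBNAME, '|') FROM JOBS GROUP BY ID")
}

for emp_id, names in sorted(jobs_by_id.items()):
    print(emp_id, " ".join(names))
```

A separator that cannot appear inside a job name keeps multi-word names such as 'Brick laying' intact when splitting.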
You need to add JOBNAME to the GROUP BY expression too.
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID,JOBNAME;