Query target conditions - sql

I need to query a table accounting for multiple change events. The table (seen below) is partitioned by Date, and a snapshot of all employees is taken every day. I would like to create a table that shows milestone changes.
Namely I want the final export to show:
First Date they appear (hire date)
Any record when the Type changes
Last Date they appear (termination date)
This ultimately shows the changes in Type along with the hire/termination date.
I'm wondering what a good way to build this is? I can see a query that takes the UNION of the 3 criteria listed above and then sorts by date then employee but am not sure if this is efficient.
Table
+-----------+------+----------+--------+
| Employee | Type | Date | Active |
+-----------+------+----------+--------+
| urdearboy | 1 | 1/1/2019 | 1 | '<---- Want
+-----------+------+----------+--------+
| urdearboy | 1 | 1/2/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/3/2019 | 1 | '<---- Want
+-----------+------+----------+--------+
| urdearboy | 4 | 1/4/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/5/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/6/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/7/2019 | 0 | '<---- Want
+-----------+------+----------+--------+
In the above it can be deduced I was:
Hired 1/1/19
Changed Type 1/3/19
Terminated 1/7/19

One method is to use lag():
select t.*
from (select t.*,
lag(date) over (partition by employee, type, active order by date) as prev_date_eta,
lag(date) over (partition by employee order by date) as prev_date
from t
) t
where prev_date_eta is null or
prev_date_eta <> prev_date;
This approach compares the previous date with the same attributes to the overall previous date for the employee. When these are the same, nothing has changed, so the row is filtered out.
The use of partition by is a big convenience when you want to compare multiple columns. The alternative is basically to compare each column individually.
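As a rough check of the lag() approach, here is a minimal sketch that runs the same idea against the sample rows using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later). The table name t and the columns emp_type and snap_date are renamed for illustration and are not from the original table.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (employee TEXT, emp_type INTEGER, snap_date TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("urdearboy", 1, "2019-01-01", 1),
        ("urdearboy", 1, "2019-01-02", 1),
        ("urdearboy", 4, "2019-01-03", 1),
        ("urdearboy", 4, "2019-01-04", 1),
        ("urdearboy", 4, "2019-01-05", 1),
        ("urdearboy", 4, "2019-01-06", 1),
        ("urdearboy", 4, "2019-01-07", 0),
    ],
)

# Keep a row when the previous snapshot with identical attributes is not
# the employee's immediately preceding snapshot (or does not exist).
milestones = conn.execute(
    """
    SELECT employee, emp_type, snap_date, active
    FROM (SELECT t.*,
                 LAG(snap_date) OVER (PARTITION BY employee, emp_type, active
                                      ORDER BY snap_date) AS prev_date_eta,
                 LAG(snap_date) OVER (PARTITION BY employee
                                      ORDER BY snap_date) AS prev_date
          FROM t) AS x
    WHERE prev_date_eta IS NULL OR prev_date_eta <> prev_date
    ORDER BY employee, snap_date
    """
).fetchall()

for row in milestones:
    print(row)
```

On the sample data this keeps the rows for 1/1, 1/3 and 1/7. Note that the last row is picked up only because Active flips to 0 on that day; if an employee simply stopped appearing in the snapshots, the final row would need a separate condition.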

Related

How to find two consecutive rows sorted by date, containing a specific value?

I have a table with the following structure and data in it:
| ID | Date | Result |
|---- |------------ |-------- |
| 1 | 30/04/2020 | + |
| 1 | 01/05/2020 | - |
| 1 | 05/05/2020 | - |
| 2 | 03/05/2020 | - |
| 2 | 04/05/2020 | + |
| 2 | 05/05/2020 | - |
| 2 | 06/05/2020 | - |
| 3 | 01/05/2020 | - |
| 3 | 02/05/2020 | - |
| 3 | 03/05/2020 | - |
| 3 | 04/05/2020 | - |
I'm trying to write an SQL query (I'm using SQL Server) which returns the date of the first two consecutive negative results for a given ID.
For example, for ID no. 1, the first two consecutive negative results are on 01/05 and 05/05.
The first two consecutive negative results for ID no. 2 are on 05/05 and 06/05.
The first two consecutive negative results for ID no. 3 are on 01/05 and 02/05.
So the query should produce the following result:
| ID | FirstNegativeDate |
|---- |------------------- |
| 1 | 01/05 |
| 2 | 05/05 |
| 3 | 01/05 |
Please note that the dates aren't necessarily one day apart. Sometimes, two consecutive negative tests may be several days apart. But they should still be considered as "consecutive negative tests". In other words, two negative tests are not 'consecutive' only if there is a positive test result in between them.
How can this be done in SQL? I've done some reading and it looks like maybe the PARTITION BY statement is required but I'm not sure how it works.
This is a gaps-and-islands problem, where you want the start of the first island of '-'s that contains at least two rows.
I would recommend lead() and aggregation:
select id, min(date) first_negative_date
from (
select t.*, lead(result) over(partition by id order by date) lead_result
from mytable t
) t
where result = '-' and lead_result = '-'
group by id
Use the LEAD or LAG function over a partition by ID, ordered by your Date column.
Then simply check where the LEAD/LAG column is equal to Result (and both are '-').
You'll also need to keep only the earliest matching row for each ID.
The image attached just shows what LEAD/LAG would return
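To see the lead() query at work, the following sketch loads the sample rows into an in-memory SQLite database via Python's sqlite3 module (window functions need SQLite 3.25 or later). The column name test_date and the ISO date format are assumptions made for this example; the question uses SQL Server, where the same query shape applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are illustrative; dates are stored as ISO strings so they sort correctly.
conn.execute("CREATE TABLE mytable (id INTEGER, test_date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2020-04-30", "+"), (1, "2020-05-01", "-"), (1, "2020-05-05", "-"),
        (2, "2020-05-03", "-"), (2, "2020-05-04", "+"),
        (2, "2020-05-05", "-"), (2, "2020-05-06", "-"),
        (3, "2020-05-01", "-"), (3, "2020-05-02", "-"),
        (3, "2020-05-03", "-"), (3, "2020-05-04", "-"),
    ],
)

# A row qualifies when it is negative and the next test for the same id is
# negative too; MIN() then picks the earliest such row per id.
first_negative = conn.execute(
    """
    SELECT id, MIN(test_date) AS first_negative_date
    FROM (SELECT t.*,
                 LEAD(result) OVER (PARTITION BY id ORDER BY test_date) AS lead_result
          FROM mytable t) AS x
    WHERE result = '-' AND lead_result = '-'
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(first_negative)
```

Because the comparison is against the next row in date order rather than the next calendar day, tests several days apart still count as consecutive.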

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY accordingly.
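The same "latest row on or before the date" lookup can be sketched outside SQL Server with a correlated subquery and LIMIT 1, which is what the following example does in SQLite through Python's sqlite3 module. This is a swapped-in technique, not OUTER APPLY itself. The table and column names are simplified assumptions, and the last two production dates are replaced with valid calendar dates (2020-02-29 and 2020-03-01) because the ones in the question do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schemas; dates are ISO strings so comparisons work.
conn.execute("CREATE TABLE production (record_id INTEGER, log_date TEXT, task_id INTEGER)")
conn.execute("CREATE TABLE standing (task_id INTEGER, active_from TEXT, score REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?, ?)",
    [(1, "2020-02-27", 1), (2, "2020-02-27", 1), (3, "2020-02-29", 1), (4, "2020-03-01", 1)],
)
conn.executemany(
    "INSERT INTO standing VALUES (?, ?, ?)",
    [(1, "2020-02-01", 1.5), (1, "2020-02-28", 2.0)],
)

# For each production row, take the score from the most recent standing-data
# row whose active_from is on or before the logged date.
scored = conn.execute(
    """
    SELECT p.record_id,
           (SELECT s.score
            FROM standing s
            WHERE s.task_id = p.task_id
              AND s.active_from <= p.log_date
            ORDER BY s.active_from DESC
            LIMIT 1) AS score
    FROM production p
    ORDER BY p.record_id
    """
).fetchall()

print(scored)
```

Each production row comes back exactly once, and rows with no applicable standing-data entry would get a NULL score, which mirrors the behaviour of OUTER APPLY with TOP (1).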

Comparing two tables that are the same and listing out the max date

I was wondering if it's possible to compare dates within the same table for the same ID, with the catch that there is an additional column that displays the status. For instance, here's a table A:
The result I would like to see is this:
I know I could use a group by and max aggregate with ID to find the max date; however, I would like the status (Running/Stopped) column associated to be there. It would help me a lot.
In most databases, the fastest method (assuming the right indexes) is a correlated subquery:
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.id = t.id);
Even if not the fastest, this should work in any database.
In case of Oracle, you can use the KEEP clause like this:
SELECT t.id,
MAX(t.status) KEEP (DENSE_RANK LAST ORDER BY t."DATE") AS corresponding_status,
MAX(t."DATE") AS last_date
FROM tab t
GROUP BY t.id
ORDER BY 1
For this sample data:
+----+---------+------------+
| ID | STATUS | DATE |
+----+---------+------------+
| 1 | Running | 2018-02-03 |
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 2 | Stopped | 2018-01-02 |
| 3 | Running | 2018-06-12 |
| 3 | Stopped | 2018-06-12 |
+----+---------+------------+
This would return this result:
+----+----------------------+------------+
| ID | CORRESPONDING_STATUS | LAST_DATE |
+----+----------------------+------------+
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 3 | Stopped | 2018-06-12 |
+----+----------------------+------------+
As can be seen in this SQL Fiddle.
When there are multiple entries for the same ID and DATE combination, it will choose a single STATUS value, in this case the last one in alphanumerical order, because MAX is applied to STATUS.
The part LAST ORDER BY t."DATE" determines which rows of the group are considered, i.e. only those with the last DATE in the group.
See this Oracle Docs entry for more details.
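For databases without KEEP, the two behaviours discussed above can be compared on the same sample data. The sketch below uses SQLite through Python's sqlite3 module (3.25 or later for ROW_NUMBER). The table name status_log and column name log_date are assumptions for this example. The first query is the correlated subquery, which returns every row tied on the latest date; the second uses ROW_NUMBER with status as a tie-breaker to imitate the MAX(status) choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table/column names holding the sample data shown above.
conn.execute("CREATE TABLE status_log (id INTEGER, status TEXT, log_date TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [
        (1, "Running", "2018-02-03"), (1, "Stopped", "2018-04-04"),
        (2, "Running", "2018-03-24"), (2, "Stopped", "2018-01-02"),
        (3, "Running", "2018-06-12"), (3, "Stopped", "2018-06-12"),
    ],
)

# Correlated subquery: keeps all rows that share the latest date for an id.
with_ties = conn.execute(
    """
    SELECT id, status, log_date
    FROM status_log t
    WHERE log_date = (SELECT MAX(t2.log_date) FROM status_log t2 WHERE t2.id = t.id)
    ORDER BY id, status
    """
).fetchall()

# ROW_NUMBER: exactly one row per id; ties on date are broken by status DESC.
one_per_id = conn.execute(
    """
    SELECT id, status, log_date
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY log_date DESC, status DESC) AS rn
          FROM status_log t) AS x
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(with_ties)
print(one_per_id)
```

For ID 3 the first query returns both the Running and the Stopped row, while the second returns only Stopped, matching the Oracle result table.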

Impala SQL Stockpiling Algorithm

I have prescription drug data that has a prescription date and the number of days supplied for that prescription. I am trying to estimate actual drug intake dates, which can be different from the prescription date if people (1) refill their prescription before their current prescription is done or (2) lose their current prescription and so need a refill.
Below is sample data for 1 patient:
| patient_id | rx_start_date | days_supply |
|------------|---------------|-------------|
| 1 | 1/10/2013 | 3 |
| 1 | 1/11/2013 | 3 |
| 1 | 1/14/2013 | 3 |
Without adjusting for stockpiling, the end dates are calculated as rx_start_date + days_supply - 1:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/11/2013 | 3 | 1/13/2013 |
| 1 | 1/14/2013 | 3 | 1/16/2013 |
As you can see the start date for the 2nd prescription is overlapped by the first prescription. If we assume that they filled their prescription early then the actual intake date for the 2nd prescription should start on 1/13/2013. But moving the end date of the 2nd prescription causes an overlap over the 3rd prescription and so that must be moved as well. See the expected resulting table below:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/13/2013 | 3 | 1/15/2013 |
| 1 | 1/16/2013 | 3 | 1/18/2013 |
In the other case, we might say that if the current prescription overlaps the next one by more than 50%, then we assume they lost their prescription and the 2nd prescription's start date is the actual intake date. This means, though, that we need to truncate the current prescription to end when the 2nd one starts.
The algorithm is relatively simple using a non-SQL iterative solution, but I'm having trouble with a generic SQL solution, since adjusting dates at time X could potentially cause a cascading effect that adjusts many other dates. I'm using Impala SQL, so recursive CTEs are not an option, and I'd like this to work on other databases, so database-specific functions are not ideal either.
The following should give you what you are looking for, so long as there are no gaps in the treatment regime:
with aggs as (
  select d1.patient_id,
         d1.rx_start_date,
         sum(ds.days_supply) as days_supply,
         min(ds.rx_start_date) + sum(ds.days_supply) - 1 as end_dt
  from drugs d1
  inner join drugs ds
    on ds.patient_id = d1.patient_id and ds.rx_start_date <= d1.rx_start_date
  group by d1.patient_id, d1.rx_start_date
)
select patient_id,
       coalesce(lag(end_dt + 1) over (partition by patient_id order by rx_start_date), rx_start_date) as start_dt,
       end_dt
from aggs;
Using the given sample data, this gives as output:
ID Start End
1 2013-01-10 2013-01-12
1 2013-01-13 2013-01-15
1 2013-01-16 2013-01-18
This was tested on Oracle, but all functions used appear to be available in Impala as well, so it should work there too.
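For comparison, the non-SQL iterative approach mentioned in the question could look roughly like the sketch below. It covers only the first rule (an early refill is pushed back to start after the previous supply runs out) and does not implement the 50% "lost prescription" truncation. The function name shift_overlaps is hypothetical.

```python
from datetime import date, timedelta


def shift_overlaps(prescriptions):
    """Adjust one patient's prescriptions so that supplies never overlap.

    prescriptions: iterable of (rx_start_date, days_supply) tuples.
    Returns a list of (adjusted_start, adjusted_end) tuples in date order.
    """
    adjusted = []
    next_free = None  # first day not yet covered by an earlier prescription
    for start, days in sorted(prescriptions):
        # An early refill starts only once the previous supply is used up.
        if next_free is not None and start < next_free:
            start = next_free
        end = start + timedelta(days=days - 1)
        adjusted.append((start, end))
        next_free = end + timedelta(days=1)
    return adjusted


sample = [
    (date(2013, 1, 10), 3),
    (date(2013, 1, 11), 3),
    (date(2013, 1, 14), 3),
]
result = shift_overlaps(sample)
for start, end in result:
    print(start.isoformat(), end.isoformat())
```

Unlike the SQL above, this loop keeps a prescription's own start date when there is a gap after the previous supply, because the shift is applied only when the dates actually overlap.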

SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)
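The following works with your test data:
SELECT ID, date, MIN(code) AS code, MAX(prog) AS prog FROM table
GROUP BY ID, date
Note that MIN(code) and MAX(prog) are computed independently of each other, so on other data the result can pair a code with a prog value that never appeared together in one row.
You can then use the results of this query to create or populate a new table, or to delete all records not returned by this query.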
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
You can use the min() function (see the details here):
select ID, DATE, min(CODE), max(PROG)
from table
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend that you use ID as the primary key. I hope this helps.
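An alternative that selects whole rows rather than aggregating each column separately is to rank the rows within each ID and date, and keep the top-ranked one. The sketch below is one way to express the stated rules (lowest code first, then prefer program B) with ROW_NUMBER; it runs on SQLite through Python's sqlite3 module, and the table name admissions and column name adm_date are assumptions for this example. The window function itself is also available in SQL Server 2012.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table/column names holding the question's sample rows.
conn.execute("CREATE TABLE admissions (id INTEGER, adm_date TEXT, code INTEGER, prog TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?, ?)",
    [
        (1, "1996-08-16", 24, "A"),
        (1, "1997-06-02", 123, "A"),
        (1, "1997-06-02", 123, "B"),
        (1, "1997-06-02", 211, "B"),
        (1, "1997-08-19", 67, "A"),
        (1, "1997-08-19", 23, "A"),
    ],
)

# Rank rows per (id, date): lowest code first, then program B before others.
# Keeping rn = 1 returns an actual row, so code and prog always belong together.
deduped = conn.execute(
    """
    SELECT id, adm_date, code, prog
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY id, adm_date
                     ORDER BY code, CASE WHEN prog = 'B' THEN 0 ELSE 1 END
                 ) AS rn
          FROM admissions a) AS x
    WHERE rn = 1
    ORDER BY id, adm_date
    """
).fetchall()

for row in deduped:
    print(row)
```

The rows with rn greater than 1 are the ones to delete if the table is to be cleaned in place.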