update last record of each cluster

I have a table and have already created the lead values for the next date in each product cluster. In addition, I created a delta_days value that displays the difference in days between Date and LeadDate.
+---------+------------+------------+------------+
| Product | Date | LeadDate | delta_days |
+---------+------------+------------+------------+
| A | 2018-01-15 | 2018-01-23 | 8 |
| A | 2018-01-23 | 2018-02-19 | 27 |
| A | 2018-02-19 | 2017-05-25 | -270 |
| B | 2017-05-25 | 2017-05-30 | 5 |
| B | 2017-05-30 | 2016-01-01 | -515 |
| C | 2016-01-01 | 2016-01-02 | 1 |
| C | 2016-01-02 | 2016-01-03 | 1 |
| C | 2016-01-03 | NULL | NULL |
+---------+------------+------------+------------+
I need to update the last record of each product cluster and set LeadDate and delta_days to NULL. How should I do this?
This is my goal:
+---------+------------+------------+------------+
| Product | Date | LeadDate | delta_days |
+---------+------------+------------+------------+
| A | 2018-01-15 | 2018-01-23 | 8 |
| A | 2018-01-23 | 2018-02-19 | 27 |
| A | 2018-02-19 | NULL | NULL |
| B | 2017-05-25 | 2017-05-30 | 5 |
| B | 2017-05-30 | NULL | NULL |
| C | 2016-01-01 | 2016-01-02 | 1 |
| C | 2016-01-02 | 2016-01-03 | 1 |
| C | 2016-01-03 | NULL | NULL |
+---------+------------+------------+------------+

LAG/LEAD take an optional default value that is returned when there is no next/previous row:
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
LEAD uses the same syntax. Specify NULL as the [default] (it is also what you get when the argument is omitted) and, crucially, partition by Product so that the last row of one product does not pick up the first date of the next product.
In your code (a guess, since we don't have it):
SELECT Product, [Date],
       LEAD([Date], 1, NULL) OVER (PARTITION BY Product ORDER BY [Date]) AS LeadDate
FROM your_table;  -- placeholder table name
IMO, this is better than running an actual UPDATE, because the values are computed at query time and stay correct if you later insert a record that changes the existing order of your records.
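To try the partitioned LEAD() idea locally, here is a minimal sketch that uses Python's standard-library sqlite3 module as a stand-in for SQL Server. Assumptions: SQLite supports LEAD() (3.25+), julianday() replaces DATEDIFF(day, ...), and the table name t and function name are hypothetical. Because of PARTITION BY Product, the last row of each product gets NULL without any UPDATE.

```python
import sqlite3


def lead_per_product(rows):
    """Return (Product, Date, LeadDate, delta_days) computed with LEAD() per Product.

    rows: list of (product, 'YYYY-MM-DD') tuples; table name `t` is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Product TEXT, Date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT Product,
               Date,
               LEAD(Date) OVER (PARTITION BY Product ORDER BY Date) AS LeadDate,
               -- julianday() difference stands in for SQL Server's DATEDIFF(day, ...)
               CAST(julianday(LEAD(Date) OVER (PARTITION BY Product ORDER BY Date))
                    - julianday(Date) AS INTEGER) AS delta_days
        FROM t
        ORDER BY Product, Date
        """
    ).fetchall()
    con.close()
    return result


rows = [
    ("A", "2018-01-15"), ("A", "2018-01-23"), ("A", "2018-02-19"),
    ("B", "2017-05-25"), ("B", "2017-05-30"),
    ("C", "2016-01-01"), ("C", "2016-01-02"), ("C", "2016-01-03"),
]
for row in lead_per_product(rows):
    print(row)
```

The rows for 2018-02-19, 2017-05-30 and 2016-01-03 come back with None in both computed columns, which is the goal table from the question.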

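You can use an updatable CTE with the last_value() function. Give last_value() an explicit frame: with the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) it returns the current row's date, so every row would match and be updated.
with updatable as (
    select *,
           last_value([Date]) over (partition by Product order by [Date]
               rows between unbounded preceding and unbounded following) as last_val
    from your_table  -- placeholder table name
)
update updatable
set LeadDate = null, delta_days = null
where [Date] = last_val;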
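SQLite cannot UPDATE through a CTE, so this minimal sketch (Python's sqlite3 as a stand-in for SQL Server) selects the target rows by rowid instead. It uses LAST_VALUE() with an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame; passing full_frame=False shows the default-frame pitfall, where every row gets updated. The table name t and the function name are hypothetical.

```python
import sqlite3

FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def null_last_per_product(rows, full_frame=True):
    """Set LeadDate/delta_days to NULL on the last row of each Product.

    rows: (Product, Date, LeadDate, delta_days) tuples; table name `t` is hypothetical.
    With full_frame=False, LAST_VALUE() uses the default frame, which ends at the
    current row, so last_val equals Date on every row and every row is updated.
    """
    frame = FULL_FRAME if full_frame else ""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Product TEXT, Date TEXT, LeadDate TEXT, delta_days INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # No updatable CTE in SQLite: pick the rows to update by rowid.
    con.execute(f"""
        UPDATE t
        SET LeadDate = NULL, delta_days = NULL
        WHERE rowid IN (
            SELECT rid
            FROM (SELECT rowid AS rid, Date,
                         LAST_VALUE(Date) OVER (PARTITION BY Product ORDER BY Date {frame}) AS last_val
                  FROM t)
            WHERE Date = last_val
        )
    """)
    result = con.execute("SELECT * FROM t ORDER BY Product, Date").fetchall()
    con.close()
    return result
```

Called on the question's original table, the default-frame version clears LeadDate and delta_days on all eight rows, while the full-frame version clears only the three last rows.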

Related

Query to reorganize dates

I need to do a transformation of a Postgres database table and I don't know where to start.
This is the table:
| Customer Code | Activity | Start Date |
|:---------------:|:--------:|:----------:|
| 100 | A | 01/05/2017 |
| 100 | A | 19/07/2017 |
| 100 | B | 18/09/2017 |
| 100 | C | 07/12/2017 |
| 101 | A | 11/02/2018 |
| 101 | B | 02/04/2018 |
| 101 | B | 14/06/2018 |
| 100 | A | 13/07/2018 |
| 100 | B | 14/08/2018 |
Customers can perform activities A, B and C, always in that order.
To carry out activity B, he/she has to carry out activity A first. To carry out C, he/she has to carry out activity A, then B.
An activity or cycle can be performed more than once by the same customer.
I need to reorganize the table in this way, placing the beginning and end of each step:
| Customer Code | Activity | Start Date | End Date |
|:---------------:|:--------:|:----------:|:----------:|
| 100 | A | 01/05/2017 | 18/09/2017 |
| 100 | B | 18/09/2017 | 07/12/2017 |
| 100 | C | 07/12/2017 | 13/07/2018 |
| 101 | A | 11/02/2018 | 02/04/2018 |
| 101 | B | 02/04/2018 | |
| 100 | A | 13/07/2018 | 14/08/2018 |
| 100 | B | 14/08/2018 | |

Here is an approach to this gaps-and-islands problem:
select
customer_code,
activity,
start_date,
case when (activity, lead(activity) over(partition by customer_code order by start_date))
in (('A', 'B'), ('B', 'C'), ('C', 'A'))
then lead(start_date) over(partition by customer_code order by start_date)
end end_date
from (
select
t.*,
lead(activity) over(partition by customer_code order by start_date) lead_activity
from mytable t
) t
where activity is distinct from lead_activity
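The query starts by removing consecutive rows that have the same customer_code and activity. Note that the filter keeps the last row of each run, which is why the demo below starts the first A step on 2017-07-19 rather than 2017-05-01. Then, we use conditional logic to bring in the start_date of the next row when the activity is in sequence. To keep the first row of each run instead, as in your expected output, compute lag(activity) in the subquery and filter with activity is distinct from lag_activity.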
Demo on DB Fiddle:
customer_code | activity | start_date | end_date
------------: | :------- | :--------- | :---------
100 | A | 2017-07-19 | 2017-09-18
100 | B | 2017-09-18 | 2017-12-07
100 | C | 2017-12-07 | 2018-07-13
100 | A | 2018-07-13 | 2018-08-14
100 | B | 2018-08-14 | null
101 | A | 2018-02-11 | 2018-06-14
101 | B | 2018-06-14 | null
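A minimal sketch of the lag() variant (keep the first row of each run) in SQLite via Python's sqlite3, standing in for Postgres. Assumptions: dates are stored as ISO strings so they sort chronologically (the question uses dd/mm/yyyy); SQLite's NULL-safe IS NOT replaces IS DISTINCT FROM, and explicit OR conditions replace the row-value IN list, which SQLite does not accept. The function name is hypothetical.

```python
import sqlite3


def activity_steps(rows):
    """Collapse runs of the same activity and attach the next in-sequence start date.

    rows: (customer_code, activity, 'YYYY-MM-DD') tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (customer_code INTEGER, activity TEXT, start_date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT customer_code, activity, start_date,
               CASE WHEN (activity = 'A' AND next_activity = 'B')
                      OR (activity = 'B' AND next_activity = 'C')
                      OR (activity = 'C' AND next_activity = 'A')
                    THEN next_start
               END AS end_date
        FROM (
            -- LEAD() is evaluated after the WHERE below, i.e. over the first row of each run
            SELECT customer_code, activity, start_date,
                   LEAD(activity) OVER (PARTITION BY customer_code ORDER BY start_date) AS next_activity,
                   LEAD(start_date) OVER (PARTITION BY customer_code ORDER BY start_date) AS next_start
            FROM (
                SELECT t.*,
                       LAG(activity) OVER (PARTITION BY customer_code ORDER BY start_date) AS lag_activity
                FROM mytable t
            )
            -- IS NOT is SQLite's NULL-safe "is distinct from"
            WHERE activity IS NOT lag_activity
        )
        ORDER BY customer_code, start_date
    """).fetchall()
    con.close()
    return result
```

With the question's nine rows (dates converted to ISO), this returns the seven rows of the expected output, including 2017-05-01 and 2018-04-02 as start dates.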

Finding MAX date aggregated by order - Oracle SQL

I have order data that looks like this:
| Order | Step | Step Complete Date |
|:-----:|:----:|:------------------:|
| A | 1 | 11/1/2019 |
| | 2 | 11/1/2019 |
| | 3 | 11/1/2019 |
| | 4 | 11/3/2019 |
| | 5 | 11/3/2019 |
| | 6 | 11/5/2019 |
| | 7 | 11/5/2019 |
| B | 1 | 12/1/2019 |
| | 2 | 12/2/2019 |
| | 3 | |
| C | 1 | 10/21/2019 |
| | 2 | 10/23/2019 |
| | 3 | 10/25/2019 |
| | 4 | 10/25/2019 |
| | 5 | 10/25/2019 |
| | 6 | |
| | 7 | 10/27/2019 |
| | 8 | 10/28/2019 |
| | 9 | 10/29/2019 |
| | 10 | 10/30/2019 |
| D | 1 | 10/30/2019 |
| | 2 | 11/1/2019 |
| | 3 | 11/1/2019 |
| | 4 | 11/2/2019 |
| | 5 | 11/2/2019 |
What I need to accomplish is the following:
For each order, assign the 'Order_Completion_Date' field as the most recent 'Step_Complete_Date'. If ANY 'Step_Complete_Date' is NULL, then the value for 'Order_Completion_Date' should be NULL.
I set up a SQL Fiddle with this data; my attempt is below:
SELECT
OrderNum,
MAX(Step_Complete_Date)
FROM
OrderNums
WHERE
Step_Complete_Date IS NOT NULL
GROUP BY
OrderNum
This is yielding:
ORDERNUM MAX(STEP_COMPLETE_DATE)
D 11/2/2019
A 11/5/2019
B 12/2/2019
C 10/30/2019
How can I achieve:
| OrderNum | Order_Completed_Date |
|:--------:|:--------------------:|
| A | 11/5/2019 |
| B | NULL |
| C | NULL |
| D | 11/2/2019 |

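An aggregate function with KEEP can handle this. Sorting with NULLS FIRST puts any NULL dates in the first rank, so MAX over that rank returns NULL; if there are no NULLs, the first rank in descending order holds the latest date: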
select ordernum,
max(step_complete_date)
keep (DENSE_RANK FIRST ORDER BY step_complete_date desc nulls first) res
FROM
OrderNums
GROUP BY
OrderNum

You can use a CASE expression to first check whether there are any NULL values and, if there are none, return the maximum value:
Query 1:
SELECT OrderNum,
CASE
WHEN COUNT( CASE WHEN Step_Complete_Date IS NULL THEN 1 END ) > 0
THEN NULL
ELSE MAX(Step_Complete_Date)
END AS Order_Completion_Date
FROM OrderNums
GROUP BY OrderNum
Results:
| ORDERNUM | ORDER_COMPLETION_DATE |
|----------|-----------------------|
| D | 11/2/2019 |
| A | 11/5/2019 |
| B | (null) |
| C | (null) |
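The CASE approach uses only standard SQL, so it can be tried in SQLite through Python's sqlite3 module. A minimal sketch, assuming dates are stored as ISO 'YYYY-MM-DD' strings so that MAX() compares them chronologically (the fiddle used mm/dd/yyyy text); the function name is hypothetical.

```python
import sqlite3


def order_completion_dates(rows):
    """Latest step date per order, or None if any step date is NULL.

    rows: (order_num, step, 'YYYY-MM-DD' or None) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE OrderNums (OrderNum TEXT, Step INTEGER, Step_Complete_Date TEXT)")
    con.executemany("INSERT INTO OrderNums VALUES (?, ?, ?)", rows)
    result = dict(con.execute("""
        SELECT OrderNum,
               CASE
                 -- COUNT ignores the NULLs the inner CASE yields for non-NULL dates
                 WHEN COUNT(CASE WHEN Step_Complete_Date IS NULL THEN 1 END) > 0
                 THEN NULL
                 ELSE MAX(Step_Complete_Date)
               END AS Order_Completion_Date
        FROM OrderNums
        GROUP BY OrderNum
    """).fetchall())
    con.close()
    return result
```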

First, you are storing dates as VARCHARs in mm/dd/yyyy format (at least in the fiddle). MAX then compares them as strings, which can produce incorrect results: for an order with dates '11/10/2019' and '11/2/2019', it returns '11/2/2019', because the character '2' sorts after '1'.
Second, the simplest solution, IMHO, is to substitute a fallback value for NULLs and turn it back into NULL when the fallback wins:
SELECT
OrderNum,
NULLIF(MAX(NVL(Step_Complete_Date,'~')),'~')
FROM
OrderNums
GROUP BY
OrderNum
(The example still assumes VARCHARs, since a tilde sorts after any digit. For DATE columns, you could use 9999-12-31 as the fallback, for instance.)
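Both points can be checked in SQLite via Python's sqlite3, used here as a stand-in for Oracle. SQLite has no NVL, so COALESCE (which Oracle also supports) takes its place. A minimal sketch; the function names are hypothetical, and the fallback uses the ISO string '9999-12-31' as the sentinel, assuming dates are stored as ISO text.

```python
import sqlite3


def text_max(values):
    """MAX() over text values: mm/dd/yyyy strings compare character by character."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE d (v TEXT)")
    con.executemany("INSERT INTO d VALUES (?)", [(v,) for v in values])
    (result,) = con.execute("SELECT MAX(v) FROM d").fetchone()
    con.close()
    return result


def completion_with_fallback(rows, sentinel="9999-12-31"):
    """Latest date per order, or None if any date is NULL (the sentinel then wins MAX)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE OrderNums (OrderNum TEXT, Step_Complete_Date TEXT)")
    con.executemany("INSERT INTO OrderNums VALUES (?, ?)", rows)
    result = dict(con.execute(
        """
        SELECT OrderNum,
               NULLIF(MAX(COALESCE(Step_Complete_Date, :s)), :s)
        FROM OrderNums
        GROUP BY OrderNum
        """,
        {"s": sentinel},
    ).fetchall())
    con.close()
    return result


print(text_max(["11/10/2019", "11/2/2019"]))  # 11/2/2019, not the later 11/10/2019
```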

Return Max Value Date for each group in Netezza SQL

+--------+---------+----------+------------+------------+
| CASEID | USER ID | TYPE | OPEN_DT | CLOSED_DT |
+--------+---------+----------+------------+------------+
| 1 | 1000 | MA | 2017-01-01 | 2017-01-07 |
| 2 | 1000 | MB | 2017-07-15 | 2017-07-22 |
| 3 | 1000 | MA | 2018-02-20 | NULL |
| 8 | 1001 | MB | 2017-05-18 | 2018-02-18 |
| 9 | 1001 | MA | 2018-03-05 | 2018-04-01 |
| 7 | 1002 | MA | 2018-06-01 | 2018-07-01 |
+--------+---------+----------+------------+------------+
This is a snippet of my data set. I need a query that returns just the max(OPEN_DT) row for each USER_ID in Netezza SQL.
So, given the above, the results would be:
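+--------+---------+----------+------------+------------+
| CASEID | USER ID | TYPE | OPEN_DT | CLOSED_DT |
+--------+---------+----------+------------+------------+
| 3 | 1000 | MA | 2018-02-20 | NULL |
| 9 | 1001 | MA | 2018-03-05 | 2018-04-01 |
| 7 | 1002 | MA | 2018-06-01 | 2018-07-01 |
+--------+---------+----------+------------+------------+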
Any help is very much appreciated!

You can use a correlated subquery:
select t.*
from your_table t  -- placeholder table name
where open_dt = (select max(t1.open_dt) from your_table t1 where t1.user_id = t.user_id);
You can also use row_number():
select t.*
from (select *, row_number() over (partition by user_id order by open_dt desc) as seq
      from your_table t
     ) t
where seq = 1;
However, if a user has ties on open_dt, the correlated subquery returns all of the tied rows, and you would need to use a LIMIT clause with the correlated subquery. I am not sure whether ties occur in your data, so I have left that out.
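Both queries can be compared in SQLite via Python's sqlite3 (a stand-in for Netezza; both dialects accept SELECT *, expr). A minimal sketch; the table name cases and the function name are hypothetical.

```python
import sqlite3


def latest_case_per_user(rows):
    """Return caseids of the max(open_dt) row per user: (correlated, row_number) results."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cases (caseid INTEGER, user_id INTEGER, type TEXT, "
                "open_dt TEXT, closed_dt TEXT)")
    con.executemany("INSERT INTO cases VALUES (?, ?, ?, ?, ?)", rows)
    correlated = con.execute("""
        SELECT t.caseid
        FROM cases t
        WHERE open_dt = (SELECT MAX(t1.open_dt) FROM cases t1 WHERE t1.user_id = t.user_id)
        ORDER BY t.user_id
    """).fetchall()
    ranked = con.execute("""
        SELECT caseid
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY open_dt DESC) AS seq
              FROM cases)
        WHERE seq = 1
        ORDER BY user_id
    """).fetchall()
    con.close()
    return [r[0] for r in correlated], [r[0] for r in ranked]
```

With the sample rows, both queries return case IDs 3, 9 and 7. If two cases for one user share the same open_dt, the correlated-subquery version returns both rows, while ROW_NUMBER() returns one arbitrary row.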

How to calculate date difference for all IDs in Microsoft SQL Server

How can I check, for every ID, whether the difference between one date and the next is more than 6 months? Let me illustrate with an example.
So I have a table like this one:
+----+---------+
| ID | Y-M |
+----+---------+
| 1 | 2017-01 |
| 1 | 2017-02 |
| 1 | 2017-10 |
| 2 | 2017-04 |
| 2 | 2017-06 |
| 3 | 2017-06 |
| 4 | 2017-07 |
+----+---------+
I want to add a third column that says Yes if the difference between a row and the next row for the same ID is more than 6 months. The Yes goes only on the earlier of the two rows. If there is no later date to compare with, it should also be Yes. This would be the final result:
+----+---------+------------+
| ID | Y-M | Difference |
+----+---------+------------+
| 1 | 2017-01 | No |
| 1 | 2017-02 | Yes |
| 1 | 2017-10 | Yes |
| 2 | 2017-04 | Yes |
| 2 | 2017-11 | No |
| 2 | 2017-12 | Yes |
| 3 | 2017-06 | Yes |
| 4 | 2017-07 | Yes |
+----+---------+------------+
Thank you!

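You can use lead() and some date arithmetic (this assumes ym is a date, e.g. the first day of each month; if it is stored as 'YYYY-MM' text, convert it first, for example with cast(ym + '-01' as date)):
select t.*,
       (case when lead(ym) over (partition by id order by ym) <= dateadd(month, 6, ym)
             then 'No' else 'Yes'  -- exactly 6 months apart is not "more than 6 months"
        end) as difference
from t;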
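In SQLite (via Python's sqlite3, standing in for SQL Server), the same idea can be sketched directly on 'YYYY-MM' strings by appending '-01' and using date(..., '+6 months') in place of DATEADD. A minimal sketch; the table name t and column ym follow the answer, the function name is hypothetical, and treating exactly 6 months as 'No' follows the question's "more than 6 months" wording.

```python
import sqlite3


def six_month_flags(rows):
    """Flag 'Yes' when the next ym for the same id is more than 6 months later, or absent."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, ym TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT id, ym,
               -- a NULL lead (no next row) makes the comparison NULL, so ELSE gives 'Yes'
               CASE WHEN LEAD(ym) OVER (PARTITION BY id ORDER BY ym) || '-01'
                         <= date(ym || '-01', '+6 months')
                    THEN 'No' ELSE 'Yes'
               END AS difference
        FROM t
        ORDER BY id, ym
    """).fetchall()
    con.close()
    return result
```

On the question's input table, ID 1 comes out No, Yes, Yes, and ID 2 comes out No (2017-04 to 2017-06 is 2 months), then Yes.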

SQL - Count of instances between two dates for multiple criteria

I have a table of account numbers with a date range. I must use another table of data to determine how many times we interacted with each account within its date range. I'm at a loss as to where to even start.
Table1
+------+------------+------------+
| Acct | EndDate | StartDate |
+------+------------+------------+
| 1 | 2017-02-14 | 2016-12-16 |
| 2 | 2017-02-14 | 2016-12-16 |
| 3 | 2017-02-13 | 2016-12-15 |
+------+------------+------------+
Table2
+------+--------------+
| acct | calllog_date |
+------+--------------+
| 1 | 2016-06-16 |
| 1 | 2016-08-15 |
| 1 | 2015-11-10 |
| 2 | 2015-11-10 |
| 2 | 2015-11-13 |
| 2 | 2015-11-16 |
| 2 | 2015-11-19 |
| 3 | 2015-11-19 |
| 3 | 2015-11-23 |
| 4 | 2015-11-30 |
+------+--------------+

Try using a JOIN based on a match between the date values:
SELECT t1.Acct, COUNT(*) AS cnt
FROM Table1 AS t1
JOIN Table2 AS t2
ON t1.Acct = t2.Acct AND t2.calllog_date BETWEEN t1.StartDate AND t1.EndDate
GROUP BY t1.Acct
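A minimal sketch of the join in SQLite via Python's sqlite3, as a stand-in for whatever database the question uses. The keep_zero flag is a hypothetical addition, not part of the answer: a LEFT JOIN with COUNT(t2.acct) also lists accounts with no calls in range (reporting 0), which the inner join omits. With an inner join, COUNT(t2.acct) equals the answer's COUNT(*).

```python
import sqlite3


def calls_in_range(accounts, calls, keep_zero=False):
    """Count calls per account that fall between StartDate and EndDate (inclusive)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (Acct INTEGER, EndDate TEXT, StartDate TEXT)")
    con.execute("CREATE TABLE Table2 (acct INTEGER, calllog_date TEXT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", accounts)
    con.executemany("INSERT INTO Table2 VALUES (?, ?)", calls)
    join = "LEFT JOIN" if keep_zero else "JOIN"
    # COUNT(t2.acct) skips the NULLs that a LEFT JOIN produces for unmatched accounts.
    result = con.execute(f"""
        SELECT t1.Acct, COUNT(t2.acct) AS cnt
        FROM Table1 AS t1
        {join} Table2 AS t2
          ON t1.Acct = t2.acct
         AND t2.calllog_date BETWEEN t1.StartDate AND t1.EndDate
        GROUP BY t1.Acct
        ORDER BY t1.Acct
    """).fetchall()
    con.close()
    return result
```

With the sample rows in the question, no call falls inside any account's range, so the inner join returns no rows and the LEFT JOIN variant returns a count of 0 for accounts 1, 2 and 3.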
