Last Changed Date - sql

ID DATE AMT
A 20180401 110
A 20180301 110
A 20180201 100
A 20171010 90
B 20181001 90
B 20180901 90
B 20180707 80
My output should be:
ID DATE AMT Result
A 20180401 110 20180201
A 20180301 110 20180201
A 20180201 100 20171010
A 20171010 90 null
B 20181001 90 20180707
B 20180901 90 20180707
B 20180707 80 null
So I need the Result column to hold the date of the last row, within the same ID, whose AMT differs from the current row's AMT.
For the first record, for example, the current AMT is 110, the next record also has 110, and the record after that has 100, which differs from the current value, so I need that record's date (20180201).
I have used
LAST_VALUE ( DATE) OVER ( PARTITION BY ID, AMT ORDER BY ID ) AS LASTVALUE
but this only gives me the date for the records with the same amount.
This is the output after
LAST_VALUE ( DATE) OVER ( PARTITION BY ID, AMT ORDER BY ID ) AS LASTVALUE2
ID;DAT;AMT;LASTVALUE2
A;Mar 1, 2018;130;Mar 1, 2018
A;Feb 1, 2018;110;Jan 1, 2018
A;Jan 1, 2018;110;Jan 1, 2018
A;Nov 1, 2017;140;Nov 1, 2017
B;Jun 1, 2018;110;Apr 1, 2018
B;May 1, 2018;110;Apr 1, 2018
B;Apr 1, 2018;110;Apr 1, 2018
B;Mar 1, 2018;130;Mar 1, 2018
And this is the output after LAG:
ID;DAT;AMT;PREV_DIFF_VALUE
A;Nov 1, 2017;140;?
A;Jan 1, 2018;110;Nov 1, 2017
A;Feb 1, 2018;110;Jan 1, 2018
A;Mar 1, 2018;130;Feb 1, 2018
B;Mar 1, 2018;130;?
B;Apr 1, 2018;110;Mar 1, 2018
B;May 1, 2018;110;Apr 1, 2018
B;Jun 1, 2018;110;May 1, 2018
In the LAG output, the third record should be Nov 1, 2017.
Thanks in advance

This is tricky. I think this does what you want:
select t.*,
max(case when amt <> next_amt then date end) over (partition by id order by date rows between unbounded preceding and 1 preceding) as result
from (select t.*,
lead(amt) over (partition by id order by date) as next_amt
from t
) t;
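As a quick sanity check, the same idea can be run against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module (assuming a SQLite build of 3.25 or later, which added window functions), and the column name dt is my own stand-in for DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, dt TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "20180401", 110), ("A", "20180301", 110),
        ("A", "20180201", 100), ("A", "20171010", 90),
        ("B", "20181001", 90), ("B", "20180901", 90),
        ("B", "20180707", 80),
    ],
)

# Inner query: flag the last row of each run of equal amounts (amt <> next amt).
# Outer query: take the latest flagged date among the strictly preceding rows.
query = """
SELECT id, dt, amt,
       MAX(CASE WHEN amt <> next_amt THEN dt END) OVER (
           PARTITION BY id ORDER BY dt
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS result
FROM (SELECT t.*,
             LEAD(amt) OVER (PARTITION BY id ORDER BY dt) AS next_amt
      FROM t)
ORDER BY id, dt DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data the result column comes out as 20180201, 20180201, 20171010, NULL for A and 20180707, 20180707, NULL for B, which matches the output requested in the question.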

Try:
SELECT s1.ID
, FORMAT(s1.theDate,'MM-dd-yyyy') AS theDate
, s1.Amt
--, s1.PrevAmt
, CASE
WHEN Amt <> prevAmt
THEN FORMAT(
LAG(theDate) OVER ( PARTITION BY ID ORDER BY theDate )
,'MM-dd-yyyy' )
END AS prevDate
FROM (
SELECT ID, theDate, Amt
, LAG(AMT) OVER ( PARTITION BY ID ORDER BY theDate) AS prevAmt
FROM t1
) s1
ORDER BY ID, theDate DESC
This should give:
ID | theDate | Amt | prevDate
:- | :--------- | --: | :---------
A | 10-10-2017 | 90 | null
A | 04-04-2018 | 110 | null
A | 03-03-2018 | 110 | 02-02-2018
A | 02-02-2018 | 100 | 10-10-2017
B | 10-10-2018 | 90 | null
B | 09-09-2018 | 90 | 07-07-2018
B | 07-07-2018 | 80 | null
For rows that don't have a previous row to pull the date from, it will return a NULL in the prevDate field.

Related

How to return same row multiple times with multiple conditions

My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have permission to run SELECT queries).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot get past the stage of adding the week number. I have tried using CASE and UNION ALL but keep getting errors.
declare #week01start datetime = '2018-01-01 00:00:00'
declare #week01end datetime = '2018-01-07 00:00:00'
declare #week02start datetime = '2018-01-08 00:00:00'
declare #week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week01end and End_Date >= #week01start)
or (Start_Date <= #week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week02end and End_Date >= #week02start)
or (Start_Date <= #week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
From a business perspective I cannot tell what you are trying to achieve, but you can use the following code as a starting point for calculating your overlapping days and so on. I did it the way you asked, but I would recommend a separate table, such as a time dimension, to produce a cleaner solution.
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) > 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want, except for the Overlap and unavailable_days columns. Instead of just counting weeks, I used the week number within the year, based on start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
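For the two columns left as NULL, one possible approach is the separate calendar table recommended above: join each row to the weeks it touches, clip the date range to the week, and count days. The sketch below is an assumption-laden illustration rather than a drop-in solution: it runs on SQLite through Python's sqlite3 module instead of SQL Server, the weeks table and the clip_start / clip_end names are hypothetical, and dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id TEXT, start_date TEXT, end_date TEXT);
INSERT INTO t1 VALUES ('000001', '2017-12-12', '2018-01-03'),
                      ('000002', '2018-01-13', NULL),
                      ('000003', '2018-01-02', '2018-01-11');
CREATE TABLE t2 (id TEXT, unavailable TEXT);
INSERT INTO t2 VALUES ('000002', '2018-01-14'), ('000003', '2018-01-03'),
                      ('000003', '2018-01-04'), ('000003', '2018-01-08');
-- Hypothetical calendar table: one row per week of interest.
CREATE TABLE weeks (week_no INTEGER, year INTEGER, week_start TEXT, week_end TEXT);
INSERT INTO weeks VALUES (1, 2018, '2018-01-01', '2018-01-07'),
                         (2, 2018, '2018-01-08', '2018-01-14');
""")

# Two-argument MAX/MIN are SQLite's scalar functions; they clip the range to the week.
# An open-ended range (NULL end_date) is treated as running to the end of the week.
query = """
SELECT id, week_no, year, start_date, end_date,
       CAST(julianday(clip_end) - julianday(clip_start) AS INTEGER) + 1 AS overlap,
       (SELECT COUNT(*) FROM t2
         WHERE t2.id = x.id
           AND t2.unavailable BETWEEN x.clip_start AND x.clip_end) AS unavail_days
FROM (SELECT t1.id, w.week_no, w.year, t1.start_date, t1.end_date,
             MAX(t1.start_date, w.week_start) AS clip_start,
             MIN(COALESCE(t1.end_date, w.week_end), w.week_end) AS clip_end
      FROM t1
      JOIN weeks w
        ON t1.start_date <= w.week_end
       AND (t1.end_date IS NULL OR t1.end_date >= w.week_start)) x
ORDER BY id, week_no
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this gives the overlaps 3, 2, 6 and 4 from the expected table in the question, with 0 instead of a blank where a customer has no unavailable days in that week.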

How to make a query of the max month/year of a group of columns

I don't really know how to write a MySQL query that returns all the data (columns) for the most recent month + year.
More specifically, I need to group the columns (description, data1, data2, data3) and get only the rows that have the highest month + year.
The table contains data like this one:
description data1 data2 data3 year month adddate
desc1 0 7 1 2019 5 2019-05-23
desc2 0 7 1 2019 5 2019-05-23
desc3 1 7 1 2019 5 2019-05-23
desc4 0 2 1 2018 12 2019-05-23
I've tried using max on the month, year and adddate.
select description, data1, data2, data3, max(year) as year, max(month) as month, max(adddate) as adddate
from tabledata
group by description, data1, data2, data3
But with this, the row returned as the maximum is desc2 with month 12, which is not correct since the month of desc2 is 6.
You need to get the max value of an expression like:
100 * year + month
or
12 * year + month
So do this:
select *
from tabledata
where 100 * year + month = (
select max(100 * year + month)
from tabledata
)
Results:
> description | data1 | data2 | data3 | year | month | adddate
> :---------- | ----: | ----: | ----: | ---: | ----: | :---------
> desc1 | 0 | 7 | 1 | 2019 | 5 | 23/05/2019
> desc2 | 0 | 7 | 1 | 2019 | 5 | 23/05/2019
> desc3 | 1 | 7 | 1 | 2019 | 5 | 23/05/2019
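To see why the combined expression matters, here is a small sketch using Python's sqlite3 module (SQLite rather than MySQL, which is my substitution for the sake of a self-contained example). Taking MAX(year) and MAX(month) separately mixes values from different rows, while 100 * year + month orders year/month pairs correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tabledata (description TEXT, data1 INT, data2 INT, data3 INT,"
    " year INT, month INT, adddate TEXT)"
)
conn.executemany(
    "INSERT INTO tabledata VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("desc1", 0, 7, 1, 2019, 5, "2019-05-23"),
        ("desc2", 0, 7, 1, 2019, 5, "2019-05-23"),
        ("desc3", 1, 7, 1, 2019, 5, "2019-05-23"),
        ("desc4", 0, 2, 1, 2018, 12, "2019-05-23"),
    ],
)

# Independent maxima: year comes from the 2019 rows, month from the 2018 row.
misleading = conn.execute("SELECT MAX(year), MAX(month) FROM tabledata").fetchone()

# Single sortable key: 201905 beats 201812, so only the May 2019 rows match.
latest = conn.execute(
    """
    SELECT description, year, month
    FROM tabledata
    WHERE 100 * year + month = (SELECT MAX(100 * year + month) FROM tabledata)
    ORDER BY description
    """
).fetchall()

print(misleading)
print(latest)
```

The first query returns the pair (2019, 12), a month/year combination that exists in no row, while the second returns only desc1, desc2 and desc3.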
You need to find the max combination of year and month (MaxYearMonth) and then join to your table again to find all the rows that match that MaxYearMonth. Here is the solution for SQL Server...
IF OBJECT_ID('tempdb.dbo.#tabledata', 'U') IS NOT NULL DROP TABLE #tabledata;
CREATE TABLE #tabledata
(
description VARCHAR(10)
, data1 INT
, data2 INT
, data3 INT
, year INT
, month INT
, adddate DATE
);
INSERT INTO #tabledata VALUES ('desc1', 0, 7, 1, 2019, 5, '2019-05-23')
INSERT INTO #tabledata VALUES ('desc2', 0, 7, 1, 2019, 5, '2019-05-23')
INSERT INTO #tabledata VALUES ('desc3', 1, 7, 1, 2019, 5, '2019-05-23')
INSERT INTO #tabledata VALUES ('desc4', 0, 2, 1, 2018, 12, '2019-05-23')
select a.description, a.data1, a.data2, a.data3
from #tabledata a
inner join
(
select max(convert(char(4), year) + right('0' + convert(varchar(2), month), 2)) as [MaxYearMonth]
from #tabledata
) b on convert(char(4), a.year) + right('0' + convert(varchar(2), a.month), 2) = b.MaxYearMonth
Yields this...
description data1 data2 data3
----------- ----------- ----------- -----------
desc1 0 7 1
desc2 0 7 1
desc3 1 7 1
You mention MySQL, so if you are really using MySQL then the syntax will likely need to change a bit.

Query for negative account balance period in bigquery

I am playing around with bigquery and hit an interesting use case. I have a collection of customers and account balances. The account balances collection records any account balance change.
Customers:
+---------+--------+
| ID | Name |
+---------+--------+
| 1 | Alice |
| 2 | Bob |
+---------+--------+
Accounts balances:
+---------+---------------+---------+------------+
| ID | customer_id | value | timestamp |
+---------+---------------+---------+------------+
| 1 | 1 | -500 | 2019-02-12 |
| 2 | 1 | -200 | 2019-02-10 |
| 3 | 2 | 200 | 2019-02-10 |
| 4 | 1 | 0 | 2019-02-09 |
+---------+---------------+---------+------------+
The goal is to find out how long a customer has had a negative account balance. The resulting collection would look like this:
+---------+--------+---------------------------------+
| ID | Name | Negative account balance since |
+---------+--------+---------------------------------+
| 1 | Alice | 2 days |
+---------+--------+---------------------------------+
Bob is not in the collection, because his last account record shows a positive value.
I think following steps are involved:
get last account balance per customer, see if it is negative
go through the account balance values until you hit a positive (or no more) value
compute datediff
Is something like this even possible in SQL? Do you have any ideas on how to create such a query? To get customers that currently have a negative account balance, I use this query:
SELECT customer_id FROM (
SELECT t.customer_id, t.account_balance, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) as seqnum FROM `account_balances` t
) t
WHERE seqnum = 1 AND account_balance<0
Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id, name,
SUM(IF(negative_positive < 0, days, 0)) negative_days,
SUM(IF(negative_positive = 0, days, 0)) zero_days,
SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
SELECT customer_id, negative_positive, grp,
1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
FROM (
SELECT customer_id, ts, SIGN(value) negative_positive,
COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
FROM (
SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
FROM `project.dataset.balances`
)
)
GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.balances` AS (
SELECT 1 customer_id, -500 value, DATE '2019-02-12' ts UNION ALL
SELECT 1, -200, '2019-02-10' UNION ALL
SELECT 2, 200, '2019-02-10' UNION ALL
SELECT 1, 0, '2019-02-09'
), `project.dataset.customers` AS (
SELECT 1 id, 'Alice' name UNION ALL
SELECT 2, 'Bob'
)
SELECT customer_id, name,
SUM(IF(negative_positive < 0, days, 0)) negative_days,
SUM(IF(negative_positive = 0, days, 0)) zero_days,
SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
SELECT customer_id, negative_positive, grp,
1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
FROM (
SELECT customer_id, ts, SIGN(value) negative_positive,
COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
FROM (
SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
FROM `project.dataset.balances`
)
)
GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
-- ORDER BY customer_id
with result
Row customer_id name negative_days zero_days positive_days
1 1 Alice 3 1 0
2 2 Bob 0 0 1
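The three steps listed in the question can also be followed more literally: keep only customers whose latest balance is negative, find the first negative record after their last non-negative one, and take the date difference. The following is a rough sketch of that reading, run on SQLite through Python's sqlite3 module rather than BigQuery; the table names balances and customers, the last_ok CTE and the column ts are assumptions made for the example. It measures up to the latest record, not up to today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
CREATE TABLE balances (customer_id INTEGER, value INTEGER, ts TEXT);
INSERT INTO balances VALUES (1, -500, '2019-02-12'), (1, -200, '2019-02-10'),
                            (2,  200, '2019-02-10'), (1,    0, '2019-02-09');
""")

query = """
WITH latest AS (      -- step 1: most recent record per customer
    SELECT customer_id, value, ts,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ts DESC) AS seqnum
    FROM balances
),
last_ok AS (          -- step 2: last record that was zero or positive
    SELECT customer_id, MAX(ts) AS ts
    FROM balances
    WHERE value >= 0
    GROUP BY customer_id
)
SELECT c.id, c.name,  -- step 3: days between first and latest negative record
       CAST(julianday(l.ts) - julianday(MIN(b.ts)) AS INTEGER) AS negative_days
FROM latest l
JOIN customers c ON c.id = l.customer_id
LEFT JOIN last_ok o ON o.customer_id = l.customer_id
JOIN balances b
  ON b.customer_id = l.customer_id
 AND (o.ts IS NULL OR b.ts > o.ts)
WHERE l.seqnum = 1 AND l.value < 0
GROUP BY c.id, c.name, l.ts
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For the sample data this returns a single row for Alice with 2 days, and Bob is left out because his latest balance is positive.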

Oracle first and last observation over multiple windows

I have a problem with a query in Oracle.
My table contains all of the loan applications from last year. Some of the customers have more than one application. I want to aggregate those applications as follows:
For each customer, I want to find his first application (let's call it A) in the last year and then I want to find out what was the last application in 30 days interval, counting from the first application (say B is the last one). Next, I need to find the application following B and again find for it the last one in 30 days interval, as in the previous step. What I want as the result is the table with the latest and earliest applications on each customer's interval. It is also possible that the first one is the same as the last one.
How could I do this in Oracle without plsql? Is this possible? Should I use cumulative sums of time intervals for it? (but then the starting point for each sum depends on the counted sum..)
Let's say the table has the following form:
application_id (unique) | customer_id (not unique) | create_date
1 1 2017-01-02 <- first
2 1 2017-01-10 <- middle
3 1 2017-01-30 <- last
4 1 2017-05-02 <- first and last
5 1 2017-06-02 <- first
6 1 2017-06-30 <- middle
7 1 2017-06-30 <- middle
8 1 2017-07-01 <- last
What I expect is:
application_id (unique) | customer_id (not unique) | create_date
1 1 2017-01-02 <- first
3 1 2017-01-30 <- last
4 1 2017-05-02 <- first and last
5 1 2017-06-02 <- first
8 1 2017-07-01 <- last
Thanks in advance for help.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( application_id, customer_id, create_date ) AS
SELECT 1, 1, DATE '2017-01-02' FROM DUAL UNION ALL -- <- first
SELECT 2, 1, DATE '2017-01-10' FROM DUAL UNION ALL -- <- middle
SELECT 3, 1, DATE '2017-01-30' FROM DUAL UNION ALL -- <- last
SELECT 4, 1, DATE '2017-05-02' FROM DUAL UNION ALL -- <- first and last
SELECT 5, 1, DATE '2017-06-02' FROM DUAL UNION ALL -- <- first
SELECT 6, 1, DATE '2017-06-30' FROM DUAL UNION ALL -- <- middle
SELECT 7, 1, DATE '2017-06-30' FROM DUAL UNION ALL -- <- middle
SELECT 8, 1, DATE '2017-07-01' FROM DUAL -- <- last
Query 1:
WITH data ( application_id, customer_id, create_date, first_date, grp ) AS (
SELECT t.application_id,
t.customer_id,
t.create_date,
t.create_date,
1
FROM table_name t
WHERE application_id = 1
UNION ALL
SELECT t.application_id,
t.customer_id,
t.create_date,
CASE WHEN t.create_date <= d.first_date + INTERVAL '30' DAY
THEN d.first_date
ELSE t.create_date
END,
CASE WHEN t.create_date <= d.first_date + INTERVAL '30' DAY
THEN grp
ELSE grp + 1
END
FROM data d
INNER JOIN table_name t
ON ( d.customer_id = t.customer_id
AND d.application_id + 1 = t.application_id )
)
SELECT application_id,
customer_id,
create_date,
grp
FROM (
SELECT d.*,
ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date ASC ) AS rn_a,
ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date DESC ) AS rn_d
FROM data d
)
WHERE rn_a = 1
OR rn_d = 1
Results:
| APPLICATION_ID | CUSTOMER_ID | CREATE_DATE | GRP |
|----------------|-------------|----------------------|-----|
| 1 | 1 | 2017-01-02T00:00:00Z | 1 |
| 3 | 1 | 2017-01-30T00:00:00Z | 1 |
| 4 | 1 | 2017-05-02T00:00:00Z | 2 |
| 5 | 1 | 2017-06-02T00:00:00Z | 3 |
| 8 | 1 | 2017-07-01T00:00:00Z | 3 |
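The grouping rule itself (a window opens at the first application, closes 30 days later, and the next application opens a new window) may be easier to follow outside SQL. Below is a minimal procedural sketch of that rule in Python; the function name first_last_per_window and the tuple layout are my own, and unlike the query above it does not rely on application_id values being consecutive.

```python
from datetime import date, timedelta

def first_last_per_window(apps, days=30):
    """Return the first and last application of each 30-day window per customer.

    apps is a list of (application_id, customer_id, create_date) tuples.
    """
    by_customer = {}
    for app in sorted(apps, key=lambda a: (a[1], a[2], a[0])):
        by_customer.setdefault(app[1], []).append(app)

    result = []
    for customer_apps in by_customer.values():
        window = [customer_apps[0]]
        for app in customer_apps[1:]:
            if app[2] <= window[0][2] + timedelta(days=days):
                window.append(app)          # still inside the current window
            else:
                result.extend(_endpoints(window))
                window = [app]              # this application opens a new window
        result.extend(_endpoints(window))
    return result

def _endpoints(window):
    # A window with a single application is both first and last.
    return [window[0]] if len(window) == 1 else [window[0], window[-1]]

apps = [
    (1, 1, date(2017, 1, 2)), (2, 1, date(2017, 1, 10)),
    (3, 1, date(2017, 1, 30)), (4, 1, date(2017, 5, 2)),
    (5, 1, date(2017, 6, 2)), (6, 1, date(2017, 6, 30)),
    (7, 1, date(2017, 6, 30)), (8, 1, date(2017, 7, 1)),
]
selected_ids = [app[0] for app in first_last_per_window(apps)]
print(selected_ids)
```

On the sample data this selects applications 1, 3, 4, 5 and 8, the same rows as the query result above.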

Count and pivot a table by date

I would like to identify the returning customers from an Oracle(11g) table like this:
CustID | Date
-------|----------
XC321 | 2016-04-28
AV626 | 2016-05-18
DX970 | 2016-06-23
XC321 | 2016-05-28
XC321 | 2016-06-02
So I can see which customers returned within various windows, for example within 10, 20, 30, 40 or 50 days. For example:
CustID | 10_day | 20_day | 30_day | 40_day | 50_day
-------|--------|--------|--------|--------|--------
XC321 | | | 1 | |
XC321 | | | | 1 |
I would even accept a result like this:
CustID | Date | days_from_last_visit
-------|------------|---------------------
XC321 | 2016-05-28 | 30
XC321 | 2016-06-02 | 5
I guess it would use a partition by windowing clause with unbounded following and preceding clauses... but I cannot find any suitable examples.
Any ideas...?
Thanks
No need for an explicit windowing clause here; you can simply do it with conditional aggregation using a CASE expression:
SELECT t.custID,
COUNT(CASE WHEN (t.date - t.last_visit) <= 10 THEN 1 END) as "10_day",
COUNT(CASE WHEN (t.date - t.last_visit) between 11 and 20 THEN 1 END) as "20_day",
COUNT(CASE WHEN (t.date - t.last_visit) between 21 and 30 THEN 1 END) as "30_day",
.....
FROM (SELECT s.custID,
s.date,
LEAD(s.date) OVER(PARTITION BY s.custID ORDER BY s.date DESC) as last_visit
FROM YourTable s) t
GROUP BY t.custID
Oracle Setup:
CREATE TABLE customers ( CustID, Activity_Date ) AS
SELECT 'XC321', DATE '2016-04-28' FROM DUAL UNION ALL
SELECT 'AV626', DATE '2016-05-18' FROM DUAL UNION ALL
SELECT 'DX970', DATE '2016-06-23' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-05-28' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-06-02' FROM DUAL;
Query:
SELECT *
FROM (
SELECT CustID,
Activity_Date AS First_Date,
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '10' DAY FOLLOWING )
- 1 AS "10_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '20' DAY FOLLOWING )
- 1 AS "20_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '30' DAY FOLLOWING )
- 1 AS "30_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '40' DAY FOLLOWING )
- 1 AS "40_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '50' DAY FOLLOWING )
- 1 AS "50_Day",
ROW_NUMBER() OVER ( PARTITION BY CustID ORDER BY Activity_Date ) AS rn
FROM Customers
)
WHERE rn = 1;
Output
CUSTID FIRST_DATE 10_Day 20_Day 30_Day 40_Day 50_Day RN
------ ------------------- ---------- ---------- ---------- ---------- ---------- ----------
AV626 2016-05-18 00:00:00 0 0 0 0 0 1
DX970 2016-06-23 00:00:00 0 0 0 0 0 1
XC321 2016-04-28 00:00:00 0 0 1 2 2 1
Here is an answer that works for me. I have based it on the answers above; thanks to MT0 and Sagi for their contributions:
SELECT CustID,
visit_date,
Prev_Visit ,
COUNT( CASE WHEN (Days_between_visits) <=10 THEN 1 END) AS "0-10_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 11 AND 20 THEN 1 END) AS "11-20_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 21 AND 30 THEN 1 END) AS "21-30_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 31 AND 40 THEN 1 END) AS "31-40_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 41 AND 50 THEN 1 END) AS "41-50_day" ,
COUNT( CASE WHEN (Days_between_visits) >50 THEN 1 END) AS "51+_day"
FROM
(SELECT CustID,
visit_date,
Lead(T1.visit_date) over (partition BY T1.CustID order by T1.visit_date DESC) AS Prev_visit,
visit_date - Lead(T1.visit_date) over (
partition BY T1.CustID order by T1.visit_date DESC) AS Days_between_visits
FROM T1
) T2
WHERE Days_between_visits >0
GROUP BY T2.CustID ,
T2.visit_date ,
T2.Prev_visit ,
T2.Days_between_visits;
This returns:
CUSTID | VISIT_DATE | PREV_VISIT | DAYS_BETWEEN_VISIT | 0-10_DAY | 11-20_DAY | 21-30_DAY | 31-40_DAY | 41-50_DAY | 51+DAY
XC321 | 2016-05-28 | 2016-04-28 | 30 | | | 1 | | |
XC321 | 2016-06-02 | 2016-05-28 | 5 | 1 | | | | |
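The simpler days_from_last_visit result mentioned in the question needs only the previous visit per customer. As a hedged sketch, here it is with LAG on SQLite via Python's sqlite3 module (not Oracle; the names visits, cust_id and visit_date are assumptions for the example, and in Oracle plain date subtraction would replace julianday).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (cust_id TEXT, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("XC321", "2016-04-28"), ("AV626", "2016-05-18"),
        ("DX970", "2016-06-23"), ("XC321", "2016-05-28"),
        ("XC321", "2016-06-02"),
    ],
)

# LAG fetches the previous visit of the same customer; first visits have none
# and are filtered out, so only returning visits remain.
query = """
SELECT cust_id, visit_date, days_from_last_visit
FROM (SELECT cust_id, visit_date,
             CAST(julianday(visit_date)
                  - julianday(LAG(visit_date) OVER (
                        PARTITION BY cust_id ORDER BY visit_date))
                  AS INTEGER) AS days_from_last_visit
      FROM visits)
WHERE days_from_last_visit IS NOT NULL
ORDER BY cust_id, visit_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This yields 30 days for the 2016-05-28 visit and 5 days for the 2016-06-02 visit, matching the simpler result the question said would be acceptable.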