History record which came earlier than recent - sql

Some ID_num values have a transaction history in which an older record has a later Start Datetime than a more recent record.
Below is one example:
ID_num  Create Datetime  Start Datetime  Rank_num
1       1/1/19 5:28      NULL            1
1       12/1/18 9:25     1/1/19 9:25     2
1       12/1/18 7:39     12/1/18 9:25    3
1       11/1/18 7:40     12/1/18 13:37   4
1       10/1/18 7:38     11/1/18 13:37   5
1       9/1/18 13:37     9/1/18 13:37    6
1       9/1/18 13:37     10/1/18 13:37   7
Here Rank#4 has a Start Datetime > Rank#3.
These incorrect records were created by a system error, and I would like to find out how many such rows exist.
I would also like to list all ID_num values that show the same behaviour.
Any suggestions would help.

You can use lag(). For instance:
select t.*
from (select t.*,
             lag(start_datetime) over (partition by id_num order by rank_num) as prev_start_datetime
      from t
     ) t
-- rank 1 is the most recent row, so start_datetime should never increase as rank_num grows
where start_datetime > prev_start_datetime;
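To try the lag() approach on the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption for illustration, since the question does not name a database. The table name t, the snake_case column names, and the ISO-formatted dates (reading the sample as month/day/year) are illustrative choices, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id_num int, create_datetime text, start_datetime text, rank_num int)"
)
# Sample rows from the question, rewritten as ISO strings so they sort correctly as text.
conn.executemany("insert into t values (?, ?, ?, ?)", [
    (1, "2019-01-01 05:28", None, 1),
    (1, "2018-12-01 09:25", "2019-01-01 09:25", 2),
    (1, "2018-12-01 07:39", "2018-12-01 09:25", 3),
    (1, "2018-11-01 07:40", "2018-12-01 13:37", 4),
    (1, "2018-10-01 07:38", "2018-11-01 13:37", 5),
    (1, "2018-09-01 13:37", "2018-09-01 13:37", 6),
    (1, "2018-09-01 13:37", "2018-10-01 13:37", 7),
])

# Flag rows whose start is later than the start of the next more recent row.
flagged_query = """
select id_num, rank_num
from (select t.*,
             lag(start_datetime) over (partition by id_num order by rank_num) as prev_start_datetime
      from t) as x
where start_datetime > prev_start_datetime
order by id_num, rank_num
"""
flagged = conn.execute(flagged_query).fetchall()

# Distinct ids that have at least one flagged row.
bad_ids = sorted({id_num for id_num, _ in flagged})
print(flagged, bad_ids)
```

Rows where either value is NULL are not flagged, because the comparison evaluates to NULL.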

Related

Calculate Churn by aggregating by date range in SQL

I am trying to calculate the churn rate from data that has customer_id, group, and date columns. The aggregation is by id, group and date. The churn formula is (customers in previous cohort - customers in last cohort) / customers in previous cohort.
"Customers in previous cohort" means customers in the 28-day period before the last one.
"Customers in last cohort" means customers in the last 28 days.
I am not sure how to aggregate them by date range to calculate the churn.
Here is sample data that I copied from SQL Group by Date Range:
Date Group Customer_id
2014-03-01 A 1
2014-04-02 A 2
2014-04-03 A 3
2014-05-04 A 3
2014-05-05 A 6
2015-08-06 A 1
2015-08-07 A 2
2014-08-29 XXXX 2
2014-08-09 XXXX 3
2014-08-10 BB 4
2014-08-11 CCC 3
2015-08-12 CCC 2
2015-03-13 CCC 3
2014-04-14 CCC 5
2014-04-19 CCC 4
2014-08-16 CCC 5
2014-08-17 CCC 3
2014-08-18 XXXX 2
2015-01-10 XXXX 3
2015-01-20 XXXX 4
2014-08-21 XXXX 5
2014-08-22 XXXX 2
2014-01-23 XXXX 3
2014-08-24 XXXX 2
2014-02-25 XXXX 3
2014-08-26 XXXX 2
2014-06-27 XXXX 4
2014-08-28 XXXX 1
2014-08-29 XXXX 1
2015-08-30 XXXX 2
2015-09-31 XXXX 3
The goal is to calculate the churn rate for every 28-day period between 2014 and 2015 using the formula given above. That means aggregating the data in rolling 28-day windows and applying the formula to each window.
Here is what I tried to aggregate the data by date range:
SELECT COUNT(distinct customer_id) AS count_ids, Group,
DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
FROM churn_agg
GROUP BY count_ids, Group, Date_min, Date_max
I hope someone can help with the aggregation and churn calculation. I simply want to subtract each aggregated count_ids from the next aggregated count_ids, which comes 28 days later, so this is a successive difference over the same column (count_ids). I am not sure whether I need a rolling window or a simple aggregation to find the churn.
As corrected by @jarlh, the date is 2015-09-30, not 2015-09-31.
You can use this to create a 28-day calendar:
create table daysby28 (i int, _Date date);
insert into daysby28 (i, _Date)
SELECT i, cast('01-01-2014'as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;
After creating the churn_agg table as in the fiddle @jarlh sent, this query gives you what you want:
with cte as
(
    select count(Customer) as TotalCustomer, Cohort, CohortDateStart
    from
    (
        select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
        from churn_agg a
        left join daysby28 b on a._Date >= b._Date and a._Date < b._Date + INTERVAL '28 day'
    ) a
    group by Cohort, CohortDateStart
)
select a.CohortDateStart,
       1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn
from cte a
left join cte b on a.cohort > b.cohort
    and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1
The fiddle of all together is here
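For readers without PostgreSQL at hand, the same idea (bucket rows into 28-day cohorts, count distinct customers, compare each cohort with the previous one) can be sketched with Python's built-in sqlite3. Everything here is an assumption for illustration: the rows are made up, the column is named txn_date, the group column is left out, cohorts are numbered from 2014-01-01, and lag() stands in for the self-join above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table churn_agg (txn_date text, customer_id int)")
# Hypothetical rows: 4 customers in the first 28 days, 2 in the next, 3 after that.
conn.executemany("insert into churn_agg values (?, ?)", [
    ("2014-01-05", 1), ("2014-01-10", 2), ("2014-01-20", 3), ("2014-01-25", 4),
    ("2014-02-01", 1), ("2014-02-10", 2), ("2014-02-15", 1),
    ("2014-03-01", 1), ("2014-03-05", 2), ("2014-03-10", 5),
])

query = """
with cohorts as (
    -- cohort index = whole number of 28-day periods since 2014-01-01
    select cast((julianday(txn_date) - julianday('2014-01-01')) / 28 as integer) as cohort,
           count(distinct customer_id) as total
    from churn_agg
    group by cohort
)
select cohort,
       -- (previous cohort - this cohort) / previous cohort
       1.0 * (lag(total) over (order by cohort) - total)
           / lag(total) over (order by cohort) as churn
from cohorts
order by cohort
"""
churn = conn.execute(query).fetchall()
print(churn)
```

The first cohort has no predecessor, so its churn is NULL. A negative value means the cohort grew.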

Creating a new calculated column in SQL

Is there a way to get the result I need? For example, June 24 appears twice with 2 different UDs, while each of the other dates appears only once.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that 4 users used the application on 1 day and 2 users used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;
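A quick way to check the two-level aggregation against the sample rows, sketched with Python's built-in sqlite3 (an assumption, since the question does not name a database). The table name t and the column names pk, ud and date are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (pk int, ud int, date text)")
conn.executemany("insert into t values (?, ?, ?)", [
    (1, 123, "2015-06-24"),
    (6, 456, "2015-06-24"),
    (2, 123, "2015-06-25"),
    (3, 658, "2015-06-26"),
    (4, 598, "2015-06-27"),
    (5, 156, "2015-06-28"),
])

# Inner query: rows per date. Outer query: how many dates share each row count.
query = """
select cnt, count(*) as num_dates
from (select date, count(*) as cnt
      from t
      group by date
     ) d
group by cnt
order by cnt desc
"""
result = conn.execute(query).fetchall()
print(result)
```

If the count should be per user rather than per date, the inner query could group by ud instead.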

Getting a count by date based on the number of observations with encompassing date ranges

I am working with a table in Microsoft Access whereby I have 2 columns with a start and end date.
I want to get the count by date of the number of rows with date ranges that encompass the date in the output table.
Input Data
Start Date End Date
01/02/2017 03/02/2017
07/02/2017 19/02/2017
09/02/2017 19/02/2017
11/02/2017 12/02/2017
12/02/2017 17/02/2017
Desired Output
Date Count
01/02/2017 1
02/02/2017 1
03/02/2017 1
04/02/2017 0
05/02/2017 0
06/02/2017 0
07/02/2017 1
08/02/2017 1
09/02/2017 2
10/02/2017 2
11/02/2017 3
12/02/2017 4
13/02/2017 3
14/02/2017 3
15/02/2017 3
16/02/2017 3
17/02/2017 3
18/02/2017 2
19/02/2017 2
20/02/2017 0
For this project, I have to use Microsoft Access 2010, so a solution in either SQL code or design view input would be great.
Any help on this would be appreciated. Thanks!
Use the below query to get the required result. You can also change the column with respect to your requirements
SELECT END_DATE AS DATE, COUNT(*) AS COUNT FROM TABLE_NAME
GROUP BY END_DATE ORDER BY END_DATE;
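Grouping by END_DATE alone counts rows per end date, so it cannot produce the zero-count days in the desired output. One possible alternative is to join against a calendar of dates and count the ranges that cover each day. The sketch below uses Python's built-in sqlite3 and a recursive CTE to build the calendar. That is for illustration only: Access 2010 has no recursive CTEs, so there the calendar would have to be a pre-filled table. The names ranges, start_date, end_date and cal are hypothetical, and the dates are rewritten from dd/mm/yyyy to ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ranges (start_date text, end_date text)")
conn.executemany("insert into ranges values (?, ?)", [
    ("2017-02-01", "2017-02-03"),
    ("2017-02-07", "2017-02-19"),
    ("2017-02-09", "2017-02-19"),
    ("2017-02-11", "2017-02-12"),
    ("2017-02-12", "2017-02-17"),
])

query = """
with recursive cal(d) as (
    -- one row per day from 2017-02-01 to 2017-02-20
    select '2017-02-01'
    union all
    select date(d, '+1 day') from cal where d < '2017-02-20'
)
select cal.d, count(r.start_date) as cnt
from cal
left join ranges r on cal.d between r.start_date and r.end_date
group by cal.d
order by cal.d
"""
counts = conn.execute(query).fetchall()
for day, cnt in counts:
    print(day, cnt)
```

The left join keeps days that no range covers, and count(r.start_date) returns 0 for them because it skips NULLs.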

Detect Intervals

id_person transaction internation_in internation_out
1 456465 2015-01-01 2015-02-01
2 564564 2015-02-03 2015-04-02
3 4564654 2015-01-01 2015-01-05
4 4564646 2015-01-01 2015-02-04
4 4564656 2015-03-01 2015-04-15
4 87899465 2015-05-16 2015-05-25
5 56456456 2015-01-01 2105-01-08
5 45456546 2015-02-04 2015-03-04
For each id_person, I want to calculate the gap (interval in hours) between the internation_out of one transaction and the internation_in of the next transaction.
I tried lag and lead, but I can't get them to work per id_person.
I want this result, using id_person 4 as an example:
id_person transaction Gap
4 4564646 Null
4 4564656 The result of (2015-02-04- 2015-03-01)
4 87899465 The result of (2015-04-15- 2015-05-16)
If your time periods are not overlapping (and yours are not), then there is a simple calculation for the gaps: it is the total number of days from the beginning to the end minus the total on each row. So, you don't need lead() or lag():
select id_person,
(case when count(*) > 1
then (max(internation_out) - min(internation_in) -
sum(internation_out - internation_in)
)
end) as gap_duration
from table t
group by id_person;
Note that this returns NULL if there is only one row for the person. If you want 0, then you don't need the case.
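If a gap per transaction is wanted, as in the expected output, rather than one total per person, lag() with partition by id_person is one option. Below is a sketch using Python's built-in sqlite3 (an assumption, since the question does not name a database). The table name stays and the column name txn are hypothetical, and the gap in hours is computed with SQLite's julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stays (id_person int, txn int, internation_in text, internation_out text)"
)
conn.executemany("insert into stays values (?, ?, ?, ?)", [
    (2, 564564, "2015-02-03", "2015-04-02"),
    (4, 4564646, "2015-01-01", "2015-02-04"),
    (4, 4564656, "2015-03-01", "2015-04-15"),
    (4, 87899465, "2015-05-16", "2015-05-25"),
])

# partition by id_person restarts the lag for every person,
# so the first transaction of each person gets a NULL gap.
query = """
select id_person,
       txn,
       round((julianday(internation_in)
              - julianday(lag(internation_out) over (partition by id_person
                                                     order by internation_in))) * 24, 1) as gap_hours
from stays
order by id_person, internation_in
"""
gaps = conn.execute(query).fetchall()
for row in gaps:
    print(row)
```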

Average the data in sql table

I need your help with the following case.
I have a table, ChartData,
which holds a large number of records like the ones below:
PK,ProjectID,MAchineId,Powervalue,PowerData
1 1 1 20.5 2011-07-05 12:00:00
2 1 1 21.5 2011-07-05 12:01:00
3 1 1 22.5 2011-07-05 12:02:00
4 1 1 23.5 2011-07-05 12:03:00
5 1 1 24.5 2011-07-05 12:04:00
6 1 1 25.5 2011-07-05 12:05:00
7 1 1 26.5 2011-07-05 12:06:00
8 1 1 27.5 2011-07-05 12:07:00
9 1 1 26.5 2011-07-05 12:08:00
10 1 1 28.5 2011-07-05 12:09:00
Output
PK,ProjectID,MAchineId,Powervalue(Avg value of power),PowerData
1 1 1 20.5 2011-07-05 12:00:00
6 1 1 25.5 2011-07-05 12:05:00
Any help would be appreciated. Thanks in advance.
Select * from ChartData where PK=1 or PK=6
You are not doing any averages, as far as I know.
I guess you want to aggregate every five tuples? If so, maybe something like this will help:
SELECT PK, ProjectID, MachineID, AVG(Powervalue), PowerData FROM table GROUP BY (PK-1)/5
although I don't really know if you can do that with GROUP BY
@gbn: I would like to reply to your comment directly, but I think I lack the privilege. Can you explain why this will fail on all but MySQL?
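The (PK-1)/5 grouping idea can be sketched with Python's built-in sqlite3 (an assumption, since the question does not name a database). Every selected column is either aggregated or part of the GROUP BY, which avoids the portability problem of selecting bare columns. The sketch assumes PK is a gap-free integer starting at 1 and relies on SQLite doing integer division on integer operands. The aliases first_pk, avg_power and first_powerdata are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ChartData (PK int, ProjectID int, MachineId int, Powervalue real, PowerData text)"
)
powers = [20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 26.5, 28.5]
conn.executemany("insert into ChartData values (?, ?, ?, ?, ?)", [
    (i + 1, 1, 1, p, f"2011-07-05 12:{i:02d}:00") for i, p in enumerate(powers)
])

# Rows 1-5 fall into bucket 0, rows 6-10 into bucket 1, and so on.
query = """
select min(PK) as first_pk,
       ProjectID,
       MachineId,
       avg(Powervalue) as avg_power,
       min(PowerData) as first_powerdata
from ChartData
group by ProjectID, MachineId, (PK - 1) / 5
order by first_pk
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

With this data the two buckets start at PK 1 and PK 6, matching the rows in the requested output, but the power column holds the bucket average rather than the first value.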
You just want the average powervalue?
SELECT PK,ProjectID,MachineID, AVG(PowerValue), PowerData FROM ChartData
If you'd for example want the average powervalue per machine, this would become the query:
SELECT PK,ProjectID,MachineID, AVG(PowerValue), PowerData FROM ChartData GROUP BY MachineID
Select every 5th record (T-SQL).
select
    ch.*
from
    (select
        row_number() over (order by pk asc) as r,
        pk
     from
        chartdata
    ) as r
    inner join chartdata as ch
        on r.pk = ch.pk
where
    (r.r - 1) % 5 = 0
;
Without further clarification on what you want this is the best answer I can give.