Average time between each transaction for customers - sql

How can I find the average time between transactions for each customer (in seconds)?
Time                  Customer ID   Transaction
-----------------------------------------------
11/08/2020 00:00:01   1             111
11/08/2020 00:02:00   2             0
11/08/2020 00:02:07   1             0
11/08/2020 00:03:09   3             412
11/08/2020 00:04:00   1             0
Before showing the expected table, I need to show the required steps:
Customer ID 1 has 3 transactions, with these differences between consecutive transactions:
the difference between the first and second transaction is 126 seconds;
the difference between the second and third transaction is 113 seconds.
The expected table:
Customer ID   Average time between each transaction for customer
-----------------------------------------------------------------
1             (126+113)/3
2
3

The average time is the total time divided by one less than the number of transactions. So:
select customerId,
       (case when count(*) > 1
             then datediff(second, min(time), max(time)) / (count(*) - 1)
        end) as avg_time
from t
group by customerId;
Note: SQL Server does integer division. If you want a non-integer result, cast one of the operands, or use count(*) - 1.0 in the expression.
This does assume that the times are only increasing (which seems like a reasonable assumption for this type of problem).
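For comparison, here is a minimal sketch of the per-gap approach, assuming SQL Server and the same table t with columns time and customerId (names taken from the answer above, not the original table). It computes each gap with lag() and averages the gaps; with strictly increasing times it matches the max-minus-min shortcut, here computed with decimals rather than integer division:
select customerId,
       avg(gap_seconds * 1.0) as avg_time
from (select customerId,
             -- seconds since this customer's previous transaction (NULL for the first one)
             datediff(second,
                      lag(time) over (partition by customerId order by time),
                      time) as gap_seconds
      from t
     ) x
group by customerId;
Customers with a single transaction get a NULL average, because avg() ignores the NULL gap of their only row.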

Database schema pattern for grouping transactions

I am working on an accounting system in which there is a way to revert transactions which are made by mistake.
There are processes which run on invoices which generate transactions.
One process can generate multiple transactions for an invoice, and multiple processes can be run on the same invoice.
The schema looks like this:
Transactions
========================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn
1  | 1         | 23                 | 10.00  | Today
2  | 1         | 23                 | 13.00  | Today
3  | 1         | 23                 | 17.00  | Yesterday
4  | 1         | 23                 | 32.00  | Yesterday
Now 1 and 2 happened together and 3 and 4 happened together. If I want to revert the latter pair (3, 4), what would be a possible solution to group them?
One possible solution is to add a column ProcessCount which is incremented on every process.
The new schema would look like this:
Transactions
==============================================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn | ProcessCount
1  | 1         | 23                 | 10.00  | Today     | 1
2  | 1         | 23                 | 13.00  | Today     | 1
3  | 1         | 23                 | 17.00  | Yesterday | 2
4  | 1         | 23                 | 32.00  | Yesterday | 2
Is there any other way I can implement this?
TIA
If you are basing the batching on an arbitrary time frame between the createdon date/time values, then you can use lag() and a cumulative sum. For instance, if two rows are in the same batch if they are within an hour, then:
select t.*,
       sum(case when prev_createdon > dateadd(hour, -1, createdon) then 0 else 1 end) over
           (partition by invoiceid order by createdon, id) as processcount
from (select t.*,
             lag(createdon) over (partition by invoiceid order by createdon, id) as prev_createdon
      from transactions t
     ) t;
That said, it would seem that your processing needs to be enhanced. Each time the code runs, a row should be inserted into some table (say processes). The id generated from that insertion should be used to insert into transactions. That way, you can keep the information about when -- and who and what and so on -- inserted particular transactions.
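A minimal sketch of that suggestion, assuming SQL Server; the Processes table, its columns, and the extra ProcessId column on Transactions are placeholders rather than part of the original schema:
-- One row per process run; Transactions references Processes.Id via a new ProcessId column.
create table Processes (
    Id        int identity(1,1) primary key,
    InvoiceId int          not null,
    RunOn     datetime     not null default getdate(),
    RunBy     varchar(100) null
);

declare @ProcessId int;

-- Record the run, then tag every transaction it generates with the new id.
insert into Processes (InvoiceId, RunBy)
values (1, 'invoice-process-23');

set @ProcessId = scope_identity();

insert into Transactions (InvoiceId, InvoiceProcessType, Amount, CreatedOn, ProcessId)
values (1, 23, 17.00, getdate(), @ProcessId),
       (1, 23, 32.00, getdate(), @ProcessId);
This keeps one row per run, so reverting a batch means deleting (or reversing) the transactions that share a ProcessId.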
You can use dense_rank() to identify it as follows:
select t.*,
       dense_rank() over (partition by InvoiceId
                          order by CreatedOn desc) as ProcessCount
from your_table t
You can then revert (/delete) as per your requirement. There is no need to explicitly maintain the ProcessCount column; it can be derived with the above query.
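A hedged usage sketch of reverting a batch with this approach, assuming SQL Server (where a single-table CTE can be the target of a DELETE); your_table is the placeholder name used above:
with ranked as (
    select t.*,
           dense_rank() over (partition by InvoiceId
                              order by CreatedOn desc) as ProcessCount
    from your_table t
)
-- ProcessCount = 1 is the most recent batch per invoice under the descending ordering;
-- change the value to target an older batch.
delete from ranked
where InvoiceId = 1
  and ProcessCount = 1;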

SQL: How to calculate average time between order purchases? (do SQL calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email             Order   Date         Value   Total
r2n1w#gmail.com   1       12/1/2016     85     5
r2n1w#gmail.com   2       2/6/2017     125     5
r2n1w#gmail.com   3       2/17/2017     75     5
r2n1w#gmail.com   4       3/2/2017      65     5
r2n1w#gmail.com   5       3/20/2017    130     5
ation#gmail.com   1       2/12/2018    150     1
ylove#gmail.com   1       6/15/2018     36     3
ylove#gmail.com   2       7/16/2018     41     3
ylove#gmail.com   3       1/21/2019    140     3
keria#gmail.com   1       8/10/2018     54     2
keria#gmail.com   2       11/16/2018    65     2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. The first purchase is on 6/15/18. The next one is on 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, which is 189 days later. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT F.CustomerEmail
      ,F.OrderCountBase
      ,F.Date_Created
      ,F.Total
      ,F.TotalOrdersBase
FROM #FullBase F
ORDER BY F.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 13% increase. Then their third order was $140, which is a 341% increase. So on average, this customer increased their purchase order value by 177%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try using the difference between the min and max date divided by total - 1:
select email,
       -- total is constant per email, so max(total) just satisfies the group by rules
       datediff(day, min(Order_Date), max(Order_Date)) / (max(total) - 1) as avg_days
from your_table
group by email
and to also handle customers with only one order:
select email,
       case when max(total) - 1 > 0
            then datediff(day, min(Order_Date), max(Order_Date)) / (max(total) - 1)
            else datediff(day, min(Order_Date), max(Order_Date))
       end as avg_days
from your_table
group by email
The simplest formulation is:
select email,
       datediff(day, min(Order_Date), max(Order_Date)) / nullif(max(total) - 1, 0) as avg_days
from t
group by email;
You can see why this works. Consider three orders with od1, od2, and od3 as the order dates. The average gap is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
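If you do want SQL to look at the previous row explicitly (and restart per customer), here is a hedged sketch using lag(), assuming SQL Server and the #FullBase columns from the question's query (CustomerEmail, Date_Created, and Total as the order value); it also averages the percentage change in order value:
select CustomerEmail,
       avg(datediff(day, prev_date, Date_Created) * 1.0)         as avg_days_between_orders,
       avg((Total - prev_value) * 100.0 / nullif(prev_value, 0)) as avg_pct_change
from (select CustomerEmail,
             Date_Created,
             Total,
             -- previous order's date and value for the same customer (NULL for the first order)
             lag(Date_Created) over (partition by CustomerEmail order by Date_Created) as prev_date,
             lag(Total)        over (partition by CustomerEmail order by Date_Created) as prev_value
      from #FullBase
     ) f
group by CustomerEmail;
Note this measures the increase as (new - old) / old; the 341% figure in the question corresponds to new / old instead, so pick whichever definition you actually want.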

SQL: How to group rows with the condition that sum of fields is limited to a certain value?

This is my table:
id user_id date balance
1 1 2016-05-10 10
2 1 2016-05-10 30
3 2 2017-04-24 5
4 2 2017-04-27 10
5 3 2017-11-10 40
I want to group the rows by user_id and sum the balance, but so that the sum is equal to or less than 30. Moreover, I need to display the minimum date in the group. It should look like this:
id    balance   date_start
1-1   10        2016-05-10
1-2   30        2016-05-10
2-1   15        2017-04-24
Excuse my language. Thanks.
You should be able to do so by using GROUP BY and HAVING. Here is an example of what may solve your case:
SELECT user_id, SUM(balance) AS balance, MIN(date) AS date_start
FROM your_table
GROUP BY user_id
HAVING SUM(balance) <= 30
This is a good way to do it with one query, but it is a complex query and you should be careful if using it on very large tables.
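As a quick check, here is that grouped query run against the sample rows inlined as a derived table (SQL Server VALUES syntax assumed; the question's database is not stated):
SELECT user_id, SUM(balance) AS balance, MIN(date_start) AS date_start
FROM (VALUES (1, 1, '2016-05-10', 10),
             (2, 1, '2016-05-10', 30),
             (3, 2, '2017-04-24',  5),
             (4, 2, '2017-04-27', 10),
             (5, 3, '2017-11-10', 40)
     ) AS t(id, user_id, date_start, balance)
GROUP BY user_id
HAVING SUM(balance) <= 30
-- Returns one row: user_id 2, balance 15, date_start 2017-04-24;
-- users 1 and 3 both sum to 40 and fail the HAVING condition.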

SQL: How to count the number of times each ID exists continuously from the previous period

My SQL data set is like this:
Date firm_id
======================
2010-01 1
2010-01 2
2010-01 3
----------------------
2010-02 1
2010-02 2
----------------------
2010-03 1
2010-03 2
2010-03 3
----------------------
2010-04 1
2010-04 3
How can I create a variable, named firm_age, to represent the age of firms existing continuously from the previous period? Like this:
Date firm_id firm_age
=================================
2010-01 1 0
2010-01 2 0
2010-01 3 0
-----------------------------------
2010-02 1 1
2010-02 2 1
-----------------------------------
2010-03 1 2
2010-03 2 2
2010-03 3 0
-----------------------------------
2010-04 1 3
2010-04 3 1
Thank you
This is a use case for the PACK operator from "Time & Relational Theory", which is not supported, at least not directly, in SQL.
You are trying to find, for each given row of the table, the smallest month such that there does not exist any intervening month between that smallest month and the month of the given row at which the company of the given row did not exist. Given two months, assessing the [non-]existence of such an intervening month is relatively trivial; finding the smallest month that makes the condition true for all intervening months is another matter (*). I wouldn't try to do this completely in plain SQL.
(*) Which set of months are you going to SELECT that "smallest month" from? You cannot rely on all months being mentioned in your table, as there is always the slight theoretical possibility that for one particular month no companies existed at all. (This possibility also breaks any attack on the problem based on window functions and row_number().)
This is a gaps-and-islands problem. You want "islands" where the values are sequential. Then you want to enumerate them. You can use row_number() for this:
select t.*,
       row_number() over (partition by firm_id, date - seqnum * interval '1 month'
                          order by date
                         ) - 1 as firm_age   -- minus 1 so the age starts at 0, as in the expected output
from (select t.*,
             row_number() over (partition by firm_id order by date) as seqnum
      from t
     ) t;
Note that date functions are not standard across databases. This makes some assumptions about the data representation, but the idea for the processing should work in almost any database.
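As one concrete, hedged variant, assuming PostgreSQL and that the period is stored as a real date column; the table and column names (firms, period_date, firm_id) are placeholders. Converting each period to a running month number avoids interval arithmetic entirely:
select f.*,
       row_number() over (partition by firm_id, month_num - seqnum
                          order by month_num) - 1 as firm_age
from (select f.*,
             -- month_num - seqnum is constant within a run of consecutive months
             extract(year from period_date) * 12 + extract(month from period_date) as month_num,
             row_number() over (partition by firm_id order by period_date) as seqnum
      from firms f
     ) f;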

Performing calculations based on dates in Oracle

I have the following tables.
Accounts(account_number*,balance)
Transactions(account_number*,transaction_number*,date,amount,type)
Date is the date that the transaction happened. Amount is the amount of the transaction
and it can have a positive or a negative value, depending on the type (Withdrawal -, Deposit +). I think the type is irrelevant here, as the amount is already given with the proper sign.
I need to write a query that returns the account_number of the accounts that have had a negative balance at least once.
Here's some sample data from the Transactions table, ordered by account_number.
account_number   transaction_number   date         amount   type
------------------------------------------------------------------
1                2                    02/03/2013   -20000   withdrawal
1                3                    03/15/2013      300   deposit
1                1                    01/01/2013      100   deposit
2                1                    04/15/2013   235236   deposit
3                1                    06/15/2013      500   deposit
4                1                    03/01/2013       10   deposit
4                2                    04/01/2013       80   deposit
5                1                    11/11/2013    10000   deposit
5                2                    12/11/2013    20000   deposit
5                3                    12/13/2013   -10002   withdrawal
6                1                    03/15/2013   102300   deposit
7                1                    03/15/2013      100   deposit
8                1                    08/08/2013   133990   deposit
9                1                    05/09/2013    10000   deposit
9                2                    06/01/2013      300   deposit
9                3                    10/11/2013       23   deposit
Something like this with an analytic to keep a running balance for an account:
SELECT DISTINCT account_number
FROM ( SELECT account_number
,SUM(amount)
OVER (PARTITION BY account_number ORDER BY date) AS running_balance
FROM transactions
) x
WHERE running_balance < 0
Explanation:
It is using an analytic function: the PARTITION BY breaks the table into groups identified by the account number. Within each group, the data is ordered by date. Then there is a walk through each element in the ordered group and the SUM function is applied (by default summing everything from the beginning of the group to the current row). This gives you a running balance. Just run the inner query on its own and take a look at the output, then read a bit about analytic queries. They are pretty cool.
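To see the intermediate step, here is the inner query on its own, with the running balance it produces for account 1 worked out by hand from the sample rows (Oracle syntax, matching the answer above):
SELECT account_number
      ,amount
      ,SUM(amount)
         OVER (PARTITION BY account_number ORDER BY date) AS running_balance
FROM transactions;

-- Account 1, ordered by date:
--   01/01/2013     +100   -> running_balance    100
--   02/03/2013   -20000   -> running_balance -19900
--   03/15/2013     +300   -> running_balance -19600
-- The balance dips below zero, so the outer query returns account 1;
-- none of the other sample accounts ever go negative.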