SQL count of 90 day gaps between records - sql

Say I have a Payment table. I need to know the number of times the gap between payments is greater than 90 days, grouped by personID. Payment frequency varies. There is no expected number of payments. There could be 0 or hundreds of payments in a 90 day period. If there was no payment for a year, that would count as 1. If there was a payment every month, the count would be 0. If there were 4 payments the first month, then a 90 day gap, then 2 more payments, then another 90 day gap, the count would be 2.
CREATE TABLE Payments
(
ID int PRIMARY KEY,
PersonID int FOREIGN KEY REFERENCES Persons(ID),
CreateDate datetime
)

If you have SQL Server 2012 or later, you can use the LAG or LEAD function to peek at other rows, making this easy:
Select PersonId, Sum(InfrequentPayment) InfrequentPayments
from
(
select PersonId
, case
when dateadd(day,@period,paymentdate) < coalesce(lead(PaymentDate) over (partition by personid order by PaymentDate),getutcdate())
then 1
else 0
end InfrequentPayment
from #Payment
) x
Group by PersonId
Demo: http://sqlfiddle.com/#!6/9eecb7d/491
Explanation:
The outer SQL is fairly trivial; we take the results of the inner SQL, group by PersonId, and sum the number of times that person had a payment judged as infrequent.
The inner SQL is also simple; we're selecting every record, making a note of the person and whether that payment (or rather the delay after that payment) was judged infrequent.
The case statement determines what constitutes an infrequent payment.
Here we say that if the record's paymentdate plus 90 days is still earlier than the next payment (or current date if it's the last payment, so there's no next payment) then it's infrequent (1); otherwise it's not (0).
The coalesce is simply there to handle the last record for a person; i.e. so that if there is no next payment the current date is used (thus capturing anyone whose last payment was over 90 days before today).
Now for the "clever" bit: lead(PaymentDate) over (partition by personid order by PaymentDate).
LEAD is a window function (new in SQL Server 2012) which lets you look at the record after the current one (LAG is to see the previous record).
If you're familiar with row_number() or rank() you may already understand what's going on here.
To determine the record after the current one we don't look at the current query though; rather we specify an order by clause just for this function; that's what's in the brackets after the over keyword.
We also want to only compare each person's payment dates with other payments made by them; not by any customer. To achieve that we use the partition by clause.
I hope that makes sense / meets your requirement. Please say if anything's unclear and I'll try to improve my explanation.
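To see the LEAD / partition by logic run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. Assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), dates are stored as 'YYYY-MM-DD' text, date(x, '+N days') replaces dateadd, and a `today` parameter replaces getutcdate() so the result is repeatable. The function name and the sample rows are made up for illustration.

```python
import sqlite3

def count_gaps(payments, period_days, today):
    """Count, per person, the gaps longer than period_days after a payment.

    payments: iterable of (person_id, 'YYYY-MM-DD') tuples (hypothetical data).
    today:    'YYYY-MM-DD' stand-in for getutcdate().
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Payment (PersonId INTEGER, PaymentDate TEXT)")
    con.executemany("INSERT INTO Payment VALUES (?, ?)", payments)
    rows = con.execute(
        """
        SELECT PersonId, SUM(InfrequentPayment) AS InfrequentPayments
        FROM (
            SELECT PersonId,
                   CASE WHEN date(PaymentDate, '+' || :period || ' days')
                             < COALESCE(LEAD(PaymentDate) OVER (
                                   PARTITION BY PersonId ORDER BY PaymentDate), :today)
                        THEN 1 ELSE 0 END AS InfrequentPayment
            FROM Payment
        ) x
        GROUP BY PersonId
        ORDER BY PersonId
        """,
        {"period": period_days, "today": today},
    ).fetchall()
    con.close()
    return dict(rows)

sample = [
    (1, "2020-01-01"), (1, "2020-01-10"), (1, "2020-06-01"), (1, "2020-06-15"),
    (2, "2019-01-01"),                                      # one old payment
    (3, "2020-04-01"), (3, "2020-05-01"), (3, "2020-06-01"),  # monthly payer
]
print(count_gaps(sample, 90, "2020-07-01"))  # {1: 1, 2: 1, 3: 0}
```

Person 2 counts as 1 only because of the coalesce fallback to "today"; without it, the last payment of each person would never be flagged.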
EDIT
For older versions of SQL Server, the same effect can be achieved by use of ROW_NUMBER and a LEFT OUTER JOIN; i.e.
;with cte (PersonId, PaymentDate, SequenceNo) as
(
select PersonId
, PaymentDate
, ROW_NUMBER() over (partition by PersonId order by PaymentDate)
from #Payment
)
select a.PersonId
, sum(case when dateadd(day,@period,a.paymentdate) < coalesce(b.paymentdate,getutcdate()) then 1 else 0 end) InfrequentPayments
from cte a
left outer join cte b
on b.PersonId = a.PersonId
and b.SequenceNo = a.SequenceNo + 1
Group by a.PersonId
Another method, which should work on most databases (though it is less efficient):
select PersonId
, sum(InfrequentPayment) InfrequentPayments
from
(
select PersonId
, case when dateadd(day,@period,paymentdate) < coalesce((
select min(PaymentDate)
from #Payment b
where b.personid = a.personid
and b.paymentdate > a.paymentdate
),getutcdate()) then 1 else 0 end InfrequentPayment
from #Payment a
) x
Group by PersonId

A generic query for this problem, given a timestamp field, would be something like this (DATEDIFF is written here in its two-argument MySQL form):
SELECT p1.personID, COUNT(*)
FROM payments p1
JOIN payments p2 ON
p1.timestamp < p2.timestamp
AND p1.personID = p2.personID
AND NOT EXISTS (-- exclude combinations of p1 and p2 where p exists between them
SELECT * FROM payments p
WHERE p.personID = p1.personID
AND p.timestamp > p1.timestamp
AND p.timestamp < p2.timestamp)
WHERE
DATEDIFF(p2.timestamp, p1.timestamp) >= 90
GROUP BY p1.personID

Related

Best approach to display all the users who have more than 1 purchases in a month in SQL

I have two tables in an Oracle Database, one of which is all the purchases done by all the customers over many years (purchase_logs). It has a unique purchase_id that is paired with a customer_id. The other table contains the user info of all the customers. Both have a common key of customer_id.
I want to display the user info of customers who have more than 1 unique item (NOT the item quantity) purchased in any month (i.e. if a customer bought 4 unique items in February 2020 they would be valid, as well as someone who bought 2 items in June). I was wondering what my correct approach should be and also how to correctly execute that approach.
The two approaches that I can see are
Approach 1
Count the overall number of purchases done by all customers, filter the ones that are greater than 1 and then check if any of them were done within a month.
Use this as a subquery in the where clause of the main query for retrieving the customer info for all the customer_id which match this condition.
This is what I've done so far; this retrieves the customer ids of all the customers who have more than 1 purchase in total. But I do not understand how to filter out all the purchases that did not occur in a single arbitrary month.
SELECT * FROM customer_details
WHERE customer_id IN (
SELECT cust_id from purchase_logs
group by cust_id
having count(*) >= 2);
Approach 2
Create a temporary table to count the number of monthly purchases of a specific user_id, then find the MAX() of the whole table and check whether that MAX value is bigger than 1. If it is, treat that as true in the main query's where clause for the customer_info.
Approach 2 feels like the more logical option but I cannot seem to understand how to write the proper subquery for it as the command MAX(COUNT(customer_id)) from purchase_logs does not seem to be a valid query.
The original post showed images here: the DDL diagram, sample data for Purchase_logs, Customer_info and Item_info, and the expected output for that sample data.
It is certainly possible that there is a simpler approach that I am not seeing right now.
Would appreciate any suggestions and tips on this.
You need this query:
SELECT DISTINCT cust_id
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1;
to get all the cust_ids of the customers who have more than 1 unique item purchased in any month, and you can use it with the IN operator:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT DISTINCT cust_id -- here DISTINCT may be removed as it does not make any difference when the result is used with IN
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1
);
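As a quick sanity check of the GROUP BY month / COUNT(DISTINCT item_id) idea, here is a small sketch run through Python's sqlite3 module rather than Oracle. Assumptions: dates are 'YYYY-MM-DD' text and strftime('%Y-%m', ...) stands in for TO_CHAR(purchase_date, 'YYYY-MON'); the function name and sample rows are hypothetical.

```python
import sqlite3

def customers_with_multi_item_month(purchases):
    """Return cust_ids that bought more than one distinct item in some calendar month.

    purchases: iterable of (cust_id, item_id, 'YYYY-MM-DD') tuples (made-up data).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE purchase_logs (cust_id INTEGER, item_id INTEGER, purchase_date TEXT)"
    )
    con.executemany("INSERT INTO purchase_logs VALUES (?, ?, ?)", purchases)
    rows = con.execute(
        """
        SELECT DISTINCT cust_id
        FROM purchase_logs
        GROUP BY cust_id, strftime('%Y-%m', purchase_date)  -- one group per customer and month
        HAVING COUNT(DISTINCT item_id) > 1
        ORDER BY cust_id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

sample = [
    (1, 10, "2020-02-03"), (1, 11, "2020-02-20"),  # two different items, same month
    (2, 10, "2020-02-01"), (2, 10, "2020-02-15"),  # same item twice
    (3, 10, "2020-01-31"), (3, 11, "2020-02-01"),  # two items, different months
]
print(customers_with_multi_item_month(sample))  # [1]
```

Customer 2 is excluded because the count is over distinct items, and customer 3 because the two purchases fall into different month groups.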
One approach might be to try
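with multiplepurchase as (
select customer_id, trunc(purchasedate, 'MM') as purchase_month, count(*) as order_count -- trunc to month keeps years apart; month() is not an Oracle function
from purchase_logs
group by customer_id, trunc(purchasedate, 'MM')
having count(*) >= 2)
select distinct a.customer_id, b.username, b.usercategory -- distinct: a customer may qualify in several months
from multiplepurchase a
left join userinfo b
on a.customer_id = b.customer_id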
Expanding on @MT0's answer:
SELECT *
FROM customer_details CD
WHERE exists (
SELECT cust_id
FROM purchase_logs PL
where CD.customer_id = PL.cust_id
GROUP BY cust_id, item_id, to_char(purchase_date,'YYYYMM')
HAVING count(*) >= 2
);
I want to display the user info of customers who have more than 1 purchases in a single arbitrary month.
Just add a WHERE filter to your sub-query.
So assuming that you wanted the month of July 2021 and you had a purchase_date column (with a DATE or TIMESTAMP data type) in your purchase_logs table then you can use:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT cust_id
FROM purchase_logs
WHERE DATE '2021-07-01' <= purchase_date
AND purchase_date < DATE '2021-08-01'
GROUP BY cust_id
HAVING count(*) >= 2
);
If you want the users where they have bought two-or-more items in any single calendar month then:
SELECT *
FROM customer_details c
WHERE EXISTS (
SELECT 1
FROM purchase_logs p
WHERE c.customer_id = p.cust_id
GROUP BY cust_id, TRUNC(purchase_date, 'MM')
HAVING count(*) >= 2
);

Optimize SQL Script: getting range value from another table

I believe my script runs, but it is not efficient: it takes so long that when I run it at work, the whole session is aborted before it finishes.
I have basically 2 tables
Table A - contains every transaction a person makes
Person's_ID Transaction TransactionDate
---------------------------------------
123 A 01/01/2017
345 B 04/06/2015
678 C 13/07/2015
123 F 28/10/2016
Table B - contains person's ID and GraduationDate
What I want to do is check if a person is active.
Active = if there is at least 1 transaction done by the person 1 month before his GraduationDate
The run time is too long: imagine I have millions of persons, each person makes multiple transactions, and these transactions are recorded line by line in Table A.
SELECT
PERSON_ID
FROM
(SELECT PERSON_ID, TRANSACTIONDATE FROM TABLE_A) A
LEFT JOIN
(SELECT CIN, GRAD_DATE FROM TABLE_B) B
ON A.PERSON_ID = B.PERSON_ID
AND TRANSACTIONDATE <= GRAD_DATE
WHERE TRANSACTIONDATE BETWEEN GRAD_DATE - INTERVAL '30' DAY AND GRAD_DATE;
*Table A and B are products of joined tables hence they are subqueried.
If you just want active customers, I would try exists:
SELECT PERSON_ID
FROM TABLE_A A
WHERE EXISTS (SELECT 1
FROM TABLE_B B
WHERE A.PERSON_ID = B.PERSON_ID AND
A.TRANSACTIONDATE BETWEEN B.GRAD_DATE - INTERVAL '30' DAY AND GRAD_DATE
);
The performance, though, is likely to be similar to your query. If the tables were really tables, I would suggest indexes. In reality, you will probably need to understand the views (so you can create better indexes) or perhaps use temporary tables.
A non-equi-join might be quite inefficient (no matter if it's coded as a join or an EXISTS), but the logic can be rewritten to:
SELECT
PERSON_ID
FROM
( -- combine both Selects
SELECT 0 AS flag, -- indicating source table
PERSON_ID, TRANSACTIONDATE AS dt
FROM TABLE_A
UNION ALL
SELECT 1 AS flag,
PERSON_ID, GRAD_DATE
FROM TABLE_B
) A
QUALIFY
flag = 1 -- only return a row from table B
AND Min(dt) -- if the previous row (from table A) is within 30 days
Over (PARTITION BY PERSON_ID
ORDER BY dt, flag
ROWS BETWEEN 1 Preceding AND 1 Preceding) >= dt - 30
This assumes that there's only one row from table B per person, otherwise the MIN has to be changed to:
AND MAX(CASE WHEN flag = 0 THEN dt END) -- if the latest preceding row from table A is within 30 days
Over (PARTITION BY PERSON_ID
ORDER BY dt, flag
ROWS UNBOUNDED Preceding) >= dt - 30
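The interleaving trick (stack both tables with UNION ALL, then look backwards from each graduation row) can be tried out with Python's sqlite3 module. This is only a sketch under assumptions: SQLite has no QUALIFY, so the window result is filtered in an outer WHERE instead; it needs SQLite 3.25+ for window functions; dates are 'YYYY-MM-DD' text; and it tracks the latest table A date seen so far (flag = 0), so it does not depend on how many rows each person has. Table layouts, the function name and the sample rows are hypothetical.

```python
import sqlite3

def active_persons(transactions, graduations, window_days=30):
    """Return person_ids with a transaction in the window_days before graduation.

    transactions: (person_id, 'YYYY-MM-DD') tuples; graduations: same shape (made-up data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_a (person_id INTEGER, transactiondate TEXT)")
    con.execute("CREATE TABLE table_b (person_id INTEGER, grad_date TEXT)")
    con.executemany("INSERT INTO table_a VALUES (?, ?)", transactions)
    con.executemany("INSERT INTO table_b VALUES (?, ?)", graduations)
    rows = con.execute(
        """
        SELECT person_id
        FROM (
            SELECT flag, person_id, dt,
                   -- latest transaction date at or before this row
                   MAX(CASE WHEN flag = 0 THEN dt END) OVER (
                       PARTITION BY person_id
                       ORDER BY dt, flag
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_txn
            FROM (
                SELECT 0 AS flag, person_id, transactiondate AS dt FROM table_a
                UNION ALL
                SELECT 1 AS flag, person_id, grad_date FROM table_b
            )
        )
        WHERE flag = 1                                    -- keep only graduation rows
          AND last_txn >= date(dt, '-' || ? || ' days')   -- NULL last_txn is filtered out
        ORDER BY person_id
        """,
        (window_days,),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

txns = [(1, "2020-05-20"), (2, "2020-01-01"), (3, "2020-06-10"),
        (4, "2020-01-01"), (4, "2020-05-31")]
grads = [(1, "2020-06-01"), (2, "2020-06-01"), (3, "2020-06-01"), (4, "2020-06-01")]
print(active_persons(txns, grads))  # [1, 4]
```

Each table is scanned once and the comparison happens within a per-person sorted stream, which is the point of avoiding the non-equi-join.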

SQL Query Optimization, looking at first x number of records

This query works, but is there a better way to write this query? The current one seems slow. The scenario is very straightforward.
I have two tables, Customers and Payments. The Customers table has what you would expect with Customer info. The Payments table keeps track of the monthly payments that a Customer makes. It has a few fields we need to look at - DueDate, PaymentDate, and CustomerID.
I want all Customers who were late by at least 3 months in their first 12 payments. The query I have is below, but it seems to be pretty slow. Is there a better way to write this than what I have below?
SELECT CustomerID
FROM Customers AS C
WHERE EXISTS ( SELECT DueDate, CustomerID, PaymentDate
FROM ( SELECT TOP 12 *
FROM Payments as P
WHERE P.CustomerID = C.CustomerID
ORDER BY PaymentDate
) AS First12Payments
WHERE DATEDIFF(MONTH, First12Payments.DueDate, First12Payments.PaymentDate) > 3 )
Thanks!
Well, the suggestions in the comments by Joe Enos and Brandon are great. However, if you can't add that column, there are 2 minor changes to your SQL statement that will probably make it a little bit faster. To improve it further, you will probably need to add indexes on the DueDate and PaymentDate columns of the Payments table.
SELECT CustomerID
FROM Customers AS C
WHERE EXISTS ( SELECT 1 -- no need for a columns list since you only check for existence
FROM (SELECT TOP 12 DueDate, PaymentDate -- no need for all the columns, only the ones you use
FROM Payments as P
WHERE P.CustomerID = C.CustomerID
ORDER BY PaymentDate
) AS First12Payments
WHERE DATEDIFF(MONTH, First12Payments.DueDate, First12Payments.PaymentDate) > 3 )
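A different formulation that may be worth trying is to number each customer's payments once with ROW_NUMBER and filter on that, instead of running a correlated TOP 12 per customer. The sketch below is not the query from this answer; it swaps in ROW_NUMBER and runs on Python's sqlite3 module (SQLite 3.25+ assumed), with the month difference spelled out by hand because SQLite has no DATEDIFF. The function name, parameters and sample rows are hypothetical.

```python
import sqlite3

def late_in_first_n(payments, first_n=12, months_late=3):
    """Return CustomerIDs with a payment more than months_late months late
    among their first first_n payments (ordered by PaymentDate).

    payments: (customer_id, due 'YYYY-MM-DD', paid 'YYYY-MM-DD') tuples (made-up data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Payments (CustomerID INTEGER, DueDate TEXT, PaymentDate TEXT)")
    con.executemany("INSERT INTO Payments VALUES (?, ?, ?)", payments)
    rows = con.execute(
        """
        SELECT DISTINCT CustomerID
        FROM (
            SELECT CustomerID, DueDate, PaymentDate,
                   ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY PaymentDate) AS rn
            FROM Payments
        )
        WHERE rn <= :n
          -- month boundaries crossed, mirroring DATEDIFF(MONTH, DueDate, PaymentDate)
          AND (CAST(strftime('%Y', PaymentDate) AS INTEGER)
               - CAST(strftime('%Y', DueDate) AS INTEGER)) * 12
              + CAST(strftime('%m', PaymentDate) AS INTEGER)
              - CAST(strftime('%m', DueDate) AS INTEGER) > :months
        ORDER BY CustomerID
        """,
        {"n": first_n, "months": months_late},
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

sample = [
    (1, "2020-01-01", "2020-01-05"), (1, "2020-02-01", "2020-06-10"),  # 2nd payment late
    (2, "2020-01-01", "2020-01-02"), (2, "2020-02-01", "2020-02-02"),
    (2, "2020-03-01", "2020-08-01"),                                   # 3rd payment late
]
print(late_in_first_n(sample, first_n=2))   # [1]
print(late_in_first_n(sample, first_n=12))  # [1, 2]
```

Whether this is faster than the EXISTS / TOP 12 version depends on the indexes and the optimizer, so it should be measured rather than assumed.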

SQL to Generate Periodic Snapshots from Transactions Table

I'm trying to create a periodic snapshot view from a database's transaction table after the fact. The transaction table has the following fields:
account_id (foreign key)
event_id
status_dt
status_cd
Every time an account changes status in the application, a new row is added to the transaction table with the new status. I'd like to produce a view that shows the count of accounts by status on every date; it should have the following fields:
snapshot_dt
status_cd
count_of_accounts
This will get the count for any given day, but not for all days:
SELECT status_cd, COUNT(account_id) AS count_of_accounts
FROM transactions
JOIN (
SELECT account_id, MAX(event_id) AS event_id
FROM transactions
WHERE status_dt <= DATE '2014-12-05'
GROUP BY account_id) latest
USING (account_id, event_id)
GROUP BY status_cd
Thank you!
Okay, this is going to be hard to explain.
On each date for each status, you should count up two values:
The number of customers who start with that status.
The number of customers who leave with that status.
The first value is easy. It is just the aggregation of the transactions by the date and the status.
The second value is almost as easy. You get the previous status code and count the number of times that that status code "leaves" on that date.
Then, the key is the cumulative sum of the first value minus the cumulative sum of the second value.
I freely admit that the following code is not tested (if you had a SQL Fiddle, I'd be happy to test it). But this is what the resulting query looks like:
select status_dt, status_cd,
(sum(inc_cnt) over (partition by status_cd order by status_dt) -
sum(dec_cnt) over (partition by status_cd order by status_dt)
) as dateamount
from ((select t.status_dt, t.status_cd, count(*) as inc_cnt, 0 as dec_cnt
from transactions t
group by t.status_dt, t.status_cd
) union all
(select t.status_dt, prev_status_cd, 0, count(*)
from (select t.*,
lag(t.status_cd) over (partition by t.account_id order by status_dt) as prev_status_cd
from transactions t
) t
where prev_status_cd is not null
group by t.status_dt, prev_status_cd
)
) t;
If you have dates where there is no change for one or more statuses and you want to include those in the output, then the above query would need to use cross join to first create the rows in the result set. It is unclear if this is a requirement, so I'm leaving out that complication.
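Since the query above is untested, here is a small worked sketch of the same "arrivals minus departures, then cumulative sum" idea, run through Python's sqlite3 module (SQLite 3.25+ assumed for LAG and windowed SUM). It is a variant, not the query above: departures are stored as negative deltas and the per-day deltas are aggregated before the running total, so each date and status yields one row. Dates with no change are not filled in. The function name and the four sample events are hypothetical.

```python
import sqlite3

def status_snapshots(events):
    """Return (status_dt, status_cd, count_of_accounts) for each date with a change.

    events: (account_id, event_id, 'YYYY-MM-DD', status_cd) tuples (made-up data).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transactions "
        "(account_id INTEGER, event_id INTEGER, status_dt TEXT, status_cd TEXT)"
    )
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", events)
    rows = con.execute(
        """
        WITH changes AS (
            -- accounts entering a status on a date
            SELECT status_dt, status_cd, COUNT(*) AS delta
            FROM transactions
            GROUP BY status_dt, status_cd
            UNION ALL
            -- accounts leaving their previous status on that date
            SELECT status_dt, prev_status_cd, -COUNT(*)
            FROM (SELECT t.*,
                         LAG(status_cd) OVER (PARTITION BY account_id
                                              ORDER BY status_dt, event_id) AS prev_status_cd
                  FROM transactions t)
            WHERE prev_status_cd IS NOT NULL
            GROUP BY status_dt, prev_status_cd
        ),
        daily AS (
            SELECT status_dt, status_cd, SUM(delta) AS delta
            FROM changes
            GROUP BY status_dt, status_cd
        )
        SELECT status_dt, status_cd,
               SUM(delta) OVER (PARTITION BY status_cd ORDER BY status_dt) AS count_of_accounts
        FROM daily
        ORDER BY status_dt, status_cd
        """
    ).fetchall()
    con.close()
    return rows

events = [
    (1, 1, "2014-12-01", "A"), (2, 2, "2014-12-01", "A"),
    (1, 3, "2014-12-03", "B"), (2, 4, "2014-12-05", "B"),
]
for row in status_snapshots(events):
    print(row)
# ('2014-12-01', 'A', 2)
# ('2014-12-03', 'A', 1)
# ('2014-12-03', 'B', 1)
# ('2014-12-05', 'A', 0)
# ('2014-12-05', 'B', 2)
```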

How to add 1 records data to previous?

I am stuck on a problem: I pass an AccountID, and on the basis of that the SP picks up the amount details of a person, e.g.
AccountID AccountTitle TransactionDate Amount
1 John01 2014/11/28 20
Now, if there is a 2nd or later record for the same AccountID, its amount should be added to the previous total, e.g. if the 2nd record for AccountID 1 is 40, then the amount should display 60 (it is added to the earlier 20 and the total is shown in the 2nd record).
AccountID AccountTitle TransactionDate Amount
1 John01 2014/12/30 60 (in real it was 40 but it should show result after being added to 1st record)
and same goes for further records
Select Payments.Accounts.AccountID, Payments.Accounts.AccountTitle,
Payments.Transactions.DateTime as TranasactionDateTime,
Payments.Transactions.Amount from Payments.Accounts
Inner Join Payments.Accounts
ON Payments.Accounts.AccountID = Payments.Transactions.Account_ID
Inner Join Payments.Transactions
where Payments.Transactions.Account_ID = 1
it has wasted my time and can't tackle it anymore, so please help me,
SQL Server 2012+ supports cumulative sums (which seems to be what you want):
Select a.AccountID, a.AccountTitle, t.DateTime as TranasactionDateTime,
t.Amount,
sum(t.Amount) over (partition by t.Account_Id order by t.DateTime) as RunningAmount
from Payments.Accounts a Inner Join
Payments.Transactions t
on a.AccountID = t.Account_ID
where t.Account_ID = 1;
In earlier versions of SQL Server you can most easily do this with a correlated subquery or using cross apply.
I also fixed your query. I don't know why you were joining to the Accounts table twice. Also, table aliases make queries much easier to write and to read.
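For a quick check of the cumulative sum, the same sum(...) over (partition by ... order by ...) pattern can be run on the two rows from the question using Python's sqlite3 module (SQLite 3.25+ assumed; the single-table layout, column names and function name here are simplified stand-ins, not the real Payments schema).

```python
import sqlite3

def running_totals(transactions):
    """Return (account_id, transaction_date, amount, running_amount) rows.

    transactions: (account_id, 'YYYY-MM-DD', amount) tuples (sample data from the question).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Transactions (Account_ID INTEGER, TransactionDate TEXT, Amount INTEGER)"
    )
    con.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", transactions)
    rows = con.execute(
        """
        SELECT Account_ID, TransactionDate, Amount,
               SUM(Amount) OVER (PARTITION BY Account_ID
                                 ORDER BY TransactionDate) AS RunningAmount
        FROM Transactions
        ORDER BY Account_ID, TransactionDate
        """
    ).fetchall()
    con.close()
    return rows

print(running_totals([(1, "2014-11-28", 20), (1, "2014-12-30", 40)]))
# [(1, '2014-11-28', 20, 20), (1, '2014-12-30', 40, 60)]
```

The second row shows 60 as the running amount while the original 40 stays available in its own column.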
Here is the answer if grouping by all columns is acceptable to you.
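Select a.AccountID, a.AccountTitle, t.DateTime as TransactionDate, SUM(t.Amount) as Amount
from Payments.Accounts a
inner join Payments.Transactions t on a.AccountID = t.Account_ID -- Amount lives in Transactions, so it must be joined
group by a.AccountID, a.AccountTitle, t.DateTime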
If you want to group only by AccountId, The query is this:
Select a.AccountID, SUM(t.Amount) as Amount
from Payments.Accounts a
inner join Payments.Transactions t on a.AccountID = t.Account_ID
group by a.AccountID
In the second query, the AccountTitle and TransactionDate are missing because they are not used in the group by clause. To include them in the results, you must think of a rule to decide which row of the multiple rows with the same AccountID is used to get the values AccountTitle and TransactionDate.
What version of SQL-Server are you using? This should do the trick:
Select AccountID, AccountTitle, TransactionDate,
SUM(Amount) OVER (partition by AccountID order by TransactionDate) as RunningAmount
from yourtable
This takes the rows for each AccountID, orders them by TransactionDate, and computes a running SUM of Amount up to each row's TransactionDate.