Compare two dates and compute date difference - sql

I have two tables to compare:
Schedule table:
Official receipt table:
I just want to know if the client is paying according to his schedule by comparing the dates between the Official receipt and Schedule tables. If not, he gets a penalty of $10 daily, counting the days from the scheduled date.
Example: the 1st scheduled payment is 2019-11-02, but the OR shows he paid on 2019-12-10, which is 38 days later than his 1st payment schedule, so the penalty will be imposed. Any idea? Thank you.
I want something like this:
Loanid | PaymentSched | Date OR | Past Due | Penalty
H1807.0008 | 2019-11-02 | 2019-12-10 | 38 Days | 380

Assuming that there is no missing payment and no partial payment, then one option is to enumerate the scheduled payments and receipts with row_number(), then join them together. The rest is just filtering on late payments and computing the days late and the penalty:
select
    s.loan_id,
    s.date_payment,
    r.date_or,
    datediff(day, s.date_payment, r.date_or) as past_due_days,
    10 * datediff(day, s.date_payment, r.date_or) as penalty
from (
    select s.*, row_number() over(partition by loan_id order by date_payment) rn
    from schedule s
    where total_payment > 0
) s
inner join (
    select r.*, row_number() over(partition by loan_id order by date_or) rn
    from official_receipt r
) r on s.loan_id = r.loan_id and s.rn = r.rn and s.total_payment = r.amount
where r.date_or > s.date_payment

DATEDIFF will help you:
select datediff(day, '2019-11-02', '2019-12-10') -- returns 38
select s.[Loanid], s.[PaymentSched], o.[Date OR],
    datediff(day, s.[PaymentSched], o.[Date OR]) as [Past Due],
    datediff(day, s.[PaymentSched], o.[Date OR]) * 10 as Penalty
from Schedule s
join [Official receipt] o on o.[Loanid] = s.[Loanid]
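Note that this joins every receipt to every schedule row of the same loan, so a loan with several scheduled payments would multiply rows. A minimal sketch of pairing them by sequence instead, reusing the row_number() idea from the first answer (table and column names assumed from the question):
select s.[Loanid], s.[PaymentSched], o.[Date OR],
    datediff(day, s.[PaymentSched], o.[Date OR]) as [Past Due],
    datediff(day, s.[PaymentSched], o.[Date OR]) * 10 as Penalty
from (
    -- number each loan's scheduled payments by date
    select *, row_number() over (partition by [Loanid] order by [PaymentSched]) as rn
    from Schedule
) s
join (
    -- number each loan's receipts by date, so receipt n pairs with schedule n
    select *, row_number() over (partition by [Loanid] order by [Date OR]) as rn
    from [Official receipt]
) o on o.[Loanid] = s.[Loanid] and o.rn = s.rn
where o.[Date OR] > s.[PaymentSched];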

Related

Keep the sum even on days without revenue in cumulative sum when using window function in Presto

So my problem is that I have sales data from, say for the sake of clarity, 3 different products. I will be selling 10 of these a week and I want to visualize them in a cumulative sum. I have been using the following little snippet to get the cumulative sum of the revenue.
SUM(revenue) OVER (PARTITION BY purchase_date, product ORDER BY day) AS cumulative_revenue
However, this is not sufficient since I have the product in the window function. It works only as long as I have sales data on every single day of the week for each product. If I sell four pieces of product_1 in a week, the result of the query will show cumulative revenue only for those days. So if all sales of that product happen Monday to Wednesday, the rest of the week won't have them listed in the output. This causes problems when I try to visualize the data by stacking the results, as the rest of the week will have lower cumulative revenue than the beginning of the week.
So what I want is to get it to show 0 on revenue for all products on all days of the week. I can of course do this with some cross join magic, but it is sloooow since I have quite a lot of rows, so is there a way to do it with a window function?
My data looks a bit like this
purchase_date|product|buyer|revenue
-----------------------------------
12/12/2020 | pr_1 | a | 100.0
12/12/2020 | pr_2 | b | 200.0
13/12/2020 | pr_1 | d | 100.0
14/12/2020 | pr_1 | t | 100.0
...
You can generate a row for all product/date combinations by using a cross join and then a left join. I suspect you want something like this:
select p.product, d.purchase_date,
    sum(t.revenue) as revenue_on_date,
    sum(sum(t.revenue)) over (partition by p.product order by d.purchase_date) as cumulative_revenue
from (select distinct product from t) p cross join
     (select distinct purchase_date from t) d left join
     t
     on p.product = t.product and
        d.purchase_date = t.purchase_date
group by p.product, d.purchase_date;
Note: This assumes that there is at least one purchase on each day. Otherwise, you might need another source for all dates in the range you care about.
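In Presto specifically, one way to build such a date source is sequence() plus UNNEST; a sketch, assuming a hard-coded range (adjust the dates to the range you care about):
-- one row per calendar day, whether or not anything was sold that day
SELECT t.purchase_date
FROM UNNEST(
    sequence(DATE '2020-12-01', DATE '2020-12-31', INTERVAL '1' DAY)
) AS t(purchase_date)
This could replace the (select distinct purchase_date from t) derived table in the query above.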

SQL monthly rolling sum

I am trying to calculate monthly balances of bank accounts from the following PostgreSQL table, containing transactions:
# \d transactions
          View "public.transactions"
 Column |       Type       | Collation | Nullable | Default
--------+------------------+-----------+----------+---------
 year   | double precision |           |          |
 month  | double precision |           |          |
 bank   | text             |           |          |
 amount | numeric          |           |          |
By "rolling sum" I mean that the sum should contain the sum of all transactions from the beginning of time until the end of the given month, not just the transactions in the given month.
I came up with the following query:
select
a.year, a.month, a.bank,
(select sum(b.amount) from transactions b
where b.year < a.year
or (b.year = a.year and b.month <= a.month))
from
transactions a
order by
bank, year, month;
The problem is that this returns as many rows per bank and month as there were transactions: if there were more, then more; if there were none, then none.
I would like a query which contains exactly one row for each bank and month for the whole time interval including the first and last transaction.
How to do that?
An example dataset and a query can be found at https://rextester.com/WJP53830 , courtesy of @a_horse_with_no_name
You need to generate a list of months first, then you can outer join your transactions table to that list.
with all_years as (
select y.year, m.month, b.bank
from generate_series(2010, 2019) as y(year) --<< adjust here for your desired range of years
cross join generate_series(1,12) as m(month)
cross join (select distinct bank from transactions) as b(bank)
)
select ay.*, sum(amount) over (partition by ay.bank order by ay.year, ay.month)
from all_years ay
left join transactions t on (ay.year, ay.month, ay.bank) = (t.year::int, t.month::int, t.bank)
order by bank, year, month;
The cross join with all banks is necessary so that the all_years CTE also contains a row for each bank in each month.
Online example: https://rextester.com/ZZBVM16426
Here is my suggestion in Oracle 10 SQL:
select a.year, a.month, a.bank,
    (select sum(b.amount)
     from (select c.year as year, c.month as month, c.bank as bank,
                  sum(c.amount) as amount
           from transactions c
           group by c.year, c.month, c.bank) b
     where b.year < a.year or (b.year = a.year and b.month <= a.month))
from transactions a
order by bank, year, month;
Consider aggregating all transactions first by bank and month, then run a window SUM() OVER() for rolling monthly sum since earliest amount.
WITH agg AS (
SELECT t.year, t.month, t.bank, SUM(t.amount) AS Sum_Amount
FROM transactions t
GROUP BY t.year, t.month, t.bank
)
SELECT agg.year, agg.month, agg.bank,
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank ORDER BY agg.year, agg.month) AS rolling_sum
FROM agg
ORDER BY agg.year, agg.month, agg.bank
Should you want YTD rolling sums, adjust the OVER() clause by adding year to partition:
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank, agg.year ORDER BY agg.month)

Calculate average between rows in SQL by using lag and ignore first row

I am trying to write a SQL query that calculates the average days from purchase to purchase for all customers who made two or more purchases:
Customer_ID | Average number of day
1033 | 175
11 | 334
1100 | 202.5
111 | 52.5
I succeeded in showing all the purchase dates for all customers and calculating the days between purchases.
SELECT Customer_ID, Order_Date Cur,
LAG(Order_Date, 1) OVER (ORDER BY Customer_ID) AS Previous,
DATEDIFF(day, LAG(Order_Date, 1) OVER (ORDER BY Customer_ID), Order_Date)
[Days Between Purchases]
FROM Orders
How can I ignore the first row per customer and calculate the average days between purchases?
(I have to use LAG in my answer.)
The simplest method is aggregation and some arithmetic:
SELECT Customer_ID,
    DATEDIFF(day, MIN(o.Order_Date), MAX(o.Order_Date)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
FROM Orders o
GROUP BY Customer_ID
HAVING COUNT(*) >= 2;
In a sense, the "average days between orders" is a trick question. You think that you have to calculate the difference between each order and the next.
In fact, you just need to divide the total time from the first order to the last order by one less than the number of orders. I'll let you work out why this works.
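Since LAG is required here, a sketch of the direct route (T-SQL, assuming the Orders table from the question); it yields the same numbers as the aggregation shortcut above:
SELECT Customer_ID,
    AVG(1.0 * DATEDIFF(day, Previous, Order_Date)) AS [Average number of day]
FROM (
    -- previous order date per customer; NULL on each customer's first order
    SELECT Customer_ID, Order_Date,
        LAG(Order_Date) OVER (PARTITION BY Customer_ID ORDER BY Order_Date) AS Previous
    FROM Orders
) d
WHERE Previous IS NOT NULL -- drops the first row per customer
GROUP BY Customer_ID;
Customers with a single order drop out entirely, which matches the HAVING COUNT(*) >= 2 filter above.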
Your script is almost OK. Just add the customer ID relation to it. Otherwise the first row for each customer will not be NULL.
SELECT
cur.Customer_ID,
cur.Order_Date Cur,
previous.Order_Date Previous,
DATEDIFF(day, previous.Order_Date, cur.Order_Date) [Days Between purchases]
FROM tblDifference cur
LEFT OUTER JOIN tblDifference previous
ON cur.RowNumber = previous.RowNumber+1
AND cur.customer_id = previous.customer_id;
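tblDifference is not defined in the answer; presumably it is something like this hypothetical CTE built with ROW_NUMBER() over the question's Orders table, extended here to produce the requested average:
WITH tblDifference AS (
    -- number each customer's orders by date so that RowNumber n+1 follows n
    SELECT Customer_ID, Order_Date,
        ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Order_Date) AS RowNumber
    FROM Orders
)
SELECT cur.Customer_ID,
    AVG(1.0 * DATEDIFF(day, previous.Order_Date, cur.Order_Date)) AS [Average number of day]
FROM tblDifference cur
JOIN tblDifference previous
    ON cur.RowNumber = previous.RowNumber + 1
    AND cur.Customer_ID = previous.Customer_ID
GROUP BY cur.Customer_ID;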

SQL Retention Cohort Analysis

I am trying to write a query for monthly retention, to calculate the percentage of users returning from their initial start month and moving forward.
TABLE: customer_order
fields
id
date
store_id
TABLE: customer
id
person_id
job_id
first_time (bool)
This gets me the initial monthly cohorts based on the first dates
SELECT first_job_month, COUNT(DISTINCT person_id) user_counts
FROM (
    SELECT DATE_TRUNC(MIN(CAST(date AS DATE)), month) first_job_month, person_id
    FROM customer_order cd
    INNER JOIN consumer co ON co.job_id = cd.id
    GROUP BY 2
    ORDER BY 1
) first_d
GROUP BY 1
ORDER BY 1
first_job_month user_counts
2018-04-01 36
2018-05-01 37
2018-06-01 39
2018-07-01 45
2018-08-01 38
I have tried a bunch of things, but I can't figure out how to keep track of the original cohorts/users from the first month onwards
1. Get the first order month for every customer
2. Join orders to the previous subquery to find out the difference in months between the given order and the first order
3. Use conditional aggregates to count customers that still order by month X
There are some alternative options, like using window functions to do (1) and (2) in the same subquery, but the easiest option is this one:
WITH
cohorts as (
SELECT person_id, DATE_TRUNC(MIN(CAST(date AS DATE)), month) as first_job_month
FROM customer_order cd
JOIN consumer co
ON co.job_id = cd.id
GROUP BY 1
)
,orders as (
SELECT
*
,round(1.0*(CAST(cd.date AS DATE) - c.first_job_month)/30) as months_since_first_order
FROM cohorts c
JOIN customer_order cd
USING (person_id)
)
SELECT
first_job_month as cohort
,count(distinct person_id) as size
,count(distinct case when months_since_first_order>=1 then person_id end) as m1
,count(distinct case when months_since_first_order>=2 then person_id end) as m2
,count(distinct case when months_since_first_order>=3 then person_id end) as m3
-- hardcode up to the number of months you want and the history you have
FROM orders
GROUP BY 1
ORDER BY 1
See, you can use CASE expressions inside aggregate functions like COUNT to identify different subsets of rows that you'd like to aggregate within the same group. This is one of the most important BI techniques in SQL.
Note that >=, not =, is used in the conditional aggregate so that, for example, a customer who buys in m3 after m1 but doesn't buy in m2 is still counted in m2. If you want your customers to buy every month, and/or want to see the actual retention for every month and are OK with subsequent months' values being higher than previous ones, you can use = instead.
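For example, a strict variant of the m1 column from the query above would be:
,count(distinct case when months_since_first_order = 1 then person_id end) as m1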
Also, if you don't want the "triangle" view like the one you get from this query, or you don't want to hardcode the "mX" part, you can just group by first_job_month and months_since_first_order and count distinct, as sketched below. Some visualization tools can consume this simple format and make a triangle view out of it.
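A sketch of that long format, reusing the cohorts and orders CTEs from the query above (one row per cohort and month instead of hardcoded mX columns; the retained_users column name is assumed):
SELECT
first_job_month as cohort
,months_since_first_order
,count(distinct person_id) as retained_users
FROM orders -- the "orders" CTE defined in the query above
GROUP BY 1, 2
ORDER BY 1, 2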

query to display additional column based on aggregate value

I've been mulling over this problem for a couple of hours now with no luck, so I thought people on SO might be able to help :)
I have a table with data regarding processing volumes at stores. The first three columns shown below can be queried from that table. What I'm trying to do is to add a 4th column that's basically a flag for whether a store has processed >= $150, and if so, displays the corresponding date. The first instance where the store surpasses $150 is the date that gets displayed; subsequent processing volumes don't count after that first activation date is hit. For example, for store 4, there's just one instance of the activated date.
store_id  sales_volume  date        activated_date
--------------------------------------------------
2         5             03/14/2012
2         125           05/21/2012
2         30            11/01/2012  11/01/2012
3         100           02/06/2012
3         140           12/22/2012  12/22/2012
4         300           10/15/2012  10/15/2012
4         450           11/25/2012
5         100           12/03/2012
Any insights as to how to build out this fourth column? Thanks in advance!
The solution starts by calculating the cumulative sales. Then, you want the activation date only when the cumulative sales first pass through the $150 level. This happens when adding the current sales amount pushes the cumulative amount over the threshold. The following case expression handles this.
select t.store_id, t.sales_volume, t.date,
(case when 150 > cumesales - t.sales_volume and 150 <= cumesales
then date
end) as ActivationDate
from (select t.*,
sum(sales_volume) over (partition by store_id order by date) as cumesales
from t
) t
If you have an older version of Postgres that does not support cumulative sum, you can get the cumulative sales with a subquery like:
(select sum(sales_volume) from t t2 where t2.store_id = t.store_id and t2.date <= t.date) as cumesales
Variant 1
You can LEFT JOIN to a table that calculates the first date surpassing the $150 limit per store:
SELECT t.*, b.activated_date
FROM tbl t
LEFT JOIN (
SELECT store_id, min(thedate) AS activated_date
FROM (
SELECT store_id, thedate
,sum(sales_volume) OVER (PARTITION BY store_id
ORDER BY thedate) AS running_sum
FROM tbl
) a
WHERE running_sum >= 150
GROUP BY 1
) b ON t.store_id = b.store_id AND t.thedate = b.activated_date
ORDER BY t.store_id, t.thedate;
The calculation of the first day has to be done in two steps, since the window function accumulating the running sum has to be applied in a separate SELECT.
Variant 2
Another window function instead of the LEFT JOIN. May or may not be faster. Test with EXPLAIN ANALYZE.
SELECT *
,CASE WHEN running_sum >= 150 AND thedate = first_value(thedate)
OVER (PARTITION BY store_id, running_sum >= 150 ORDER BY thedate)
THEN thedate END AS activated_date
FROM (
SELECT *
,sum(sales_volume)
OVER (PARTITION BY store_id ORDER BY thedate) AS running_sum
FROM tbl
) b
ORDER BY store_id, thedate;
sqlfiddle demonstrating both.