SQL: Difference between consecutive rows - sql

Table with 3 columns: order id, member id, order date
Need to pull the distribution of orders broken down by No. of days b/w 2 consecutive orders by member id
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:

You can use lag() to get the date of the previous order by the same customer:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
I just don't get the logic you want to compute num_orders.

May be something like this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
where not exists (
select intermediate_order
from orders as intermedite_order
where intermediate_order.order_date < a1.order_date and intermediate_order.order_date > a2.order_date) ;

Related

How to summarize information over the dynamic period in sql?

I have a table with orders and the following fields:
create table orders2 (
orderID int,
customerID int,
date DateTime,
amount int)
engine=Memory;
Each customer can make 0 or many orders each day. I need to create an SQL query that will show for each customer how many orders he/she made during the period of 3 days starting from the day when the customer has made his/her first order.
So, for each customer, the query should detect the date of the first order, then compute the date that is 3 days in the future from the first date, then filter rows to take only orders with dates in the given range, and then perform counting of orders (orderID) in that time period. At the moment, I was able to just detect the date of the first order for each customer.
SELECT
O.customerID,
O.date AS first_day,
COUNT(O.orderID) AS first_day_orders_num,
SUM(O.amount) AS first_day_amount
FROM orders2 AS O
INNER JOIN
(
SELECT
customerID,
MIN(date) AS first_date
FROM orders2
GROUP BY customerID
) AS I ON (O.customerID = I.customerID) AND (O.date = I.first_date)
GROUP BY
O.customerID,
O.date
I don't really understand what result do you need. Probably it can be solved using arrays.
Here is solution using vanilla sql
select customerID, min(first_date), sum(num_orders_per_day)
from (
select customerID, date, min(date) first_date, count() num_orders_per_day
from orders2
group by customerID, date
having date <= first_date + interval 3 days
)
group by customerID
You can use window functions to get the first order date:
select o.CustomerID, count(*) as num_orders_3_days
from (select o.*, min(date) over (partition by CustomerID) as min_date
from orders o
) o
where date < min_date + interval '3 day'
group by CustomerID;
Try this query:
SELECT customerID, orders_count
FROM (
SELECT customerID,
arraySort(x -> x.1, groupArray((date, orderID))) sorted_date_per_order_pairs,
sorted_date_per_order_pairs[1].1 + INTERVAL 3 day AS end_date,
arrayFilter(x -> x.1 < end_date, sorted_date_per_order_pairs) orders_in_period,
length(orders_in_period) orders_count
FROM orders2
GROUP BY customerID);

Difference between multiple dates

I am working in a database with multiple orders of multiple suppliers. Now I would like to know the difference in days between order 1 and order 2, order 2 and order 3, order 3 and order 4 and so on.. For each supplier on its own. I need this to generate the Standard Deviation for each supplier based on their days between orders.
Hopefully someone can help..
What you describe is lag() with aggregation:
select supplier,
stddev(orderdate - prev_orderdate) as std_orderdate
from (select t.*,
lag(orderdate) over (partition by supplier order by orderdate) as prev_orderdate
from t
) t
group by supplier;
You would typically use window function lag() and date arithmetics.
Assuming the following data structure for table orders:
order_id int primary key
supplier_id int
order_date date
You would go:
select
i.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
Which gives you, for each order, the difference in days from the previous order of the same supplier (or null if this is the first order of the supplier).
You can then compute the standard deviation with aggregation:
select supplier_id, stddev(date_diff)
from (
select
o.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
) x
group by supplier_id

Postgresql: How to use a WITH subquery with JOIN

I have 2 tables: orders and contragents. Each contragent might have many orders. Each order has an order_date. I want to get a first order date for each contragent, but with a caveat: if there was a gap between orders more than 180 days, I need to "forget" those before the gap (and thus the first order after the gap is considered "the first".
For this, I've implement a following statement:
with o1 as (
select order_date, lag(order_date) over(order by order_date ASC) as prev_order_date
from orders o
where o.contragent_code = :code
order by order_date desc)
select o1.date_debts from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
limit 1
this results in a single value being returned for a contragent with code code, which is what I need.
But I cannot figure out how to run a select that would return this date for every contragent in a table!
The only way I was able to do it was using a CREATE FUNCTION, but I will be unable to do it on production, so.. any advice is highly appreciated!
You want to add partition by, which is kinda like group by for over.
with o1 as (
select order_date, lag(order_date) over(partition by contragent_code order by order_date ASC) as prev_order_date
from orders o
order by order_date desc)
select o1.date_debts from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
Now lag looks for the previous order_date of rows with same contragent_code.
UPDATE: at the end, it appears that that was not exactly enough. This is the final statement:
with s as (
select o.contragent_code, o.order_date,
case
when
extract(day from order_date-lag(order_date) over(partition by contragent_code order by order_date asc))>=180
then o.order_date else null
end as date_with_gap
from orders o
) select contragent_code, coalesce(max(date_with_gap), min(order_date)) from s
group by contragent_code

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
using these two tables, I need to ensure business days and not calendar days (which is the current logic) is calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However this only works when I use a specific order number and cant' bring all order numbers in a sub query.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT you better assign a bussines day number to each date (in best case this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each busines day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

Sql Subquery Syntax

I want to execute a subquery using the current customer ID as I try to describe below
SELECT DISTINCT Customer_Id,
(SELECT SUM (total) FROM Orders where Customer_Id = Customer_Id AND CAST(Date) > DayIspecify )
FROM Orders where shop_id= '1-9THT'
What I want is to calculate the SUM each customer spent over a specified time period on the specific shop.
SELECT Customer_Id, SUM(total) SumTotal
FROM Orders
where shop_id= '1-9THT'
group by Customer_id
Not require subquery
Try this:
SELECT Customer_Id,SUM(total)FROM Orders WHERE shop_id='1-9THT' GROUP BY Customer_Id
(Updated) Try:
select Customer_Id,
sum(case when o.shop_id = '1-9THT' and Date > DayIspecify
then total else 0 end) total
from Orders
group by Customer_Id
- to return all customers recorded on the Orders table, together with the values of any of their orders placed through shop 1-9THT after the date specified. (Change > to >= to make it on or after the date specified.)
Use SQL GroupBy
SELECT DISTINCT Customer_Id, SUM (total) FROM Orders where shop_id= '1-9THT' group by customer_Id