Create new column based on some conditions in SQL

I have the following table. Let's call it orders. I would like to add a new column to this existing table that gives, for each customerid, the number of days between each order and that customer's first order date. For orders placed on the customer's first order date, the value should be 0.
From this
customerid orderdate
1 1/21/2018
1 1/21/2018
1 2/21/2018
1 5/22/2018
2 3/22/2018
3 4/5/2018
3 4/5/2018
to this
customerid orderdate daysapart
1 1/21/2018 0
1 1/21/2018 0
1 2/21/2018 31
1 5/22/2018 121
2 3/22/2018 0
3 4/5/2018 0
3 4/5/2018 0

Using a Windowed Aggregate:
select customerid, orderdate,
orderdate - min(orderdate) over (partition by customerid) as daysapart
from mytab

Here is one approach, using a correlated subquery:
SELECT
t1.customerid,
t1.orderdate,
t1.orderdate - (SELECT MIN(t2.orderdate)
FROM your_table t2
WHERE t1.customerid = t2.customerid) daysapart
FROM your_table t1;
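As a rough check of the windowed-aggregate idea, here is a minimal sketch that runs it against the question's sample rows using Python's built-in sqlite3 module. The in-memory database, the ISO date strings and the use of julianday() are assumptions of this sketch: SQLite has no native date type, so plain `orderdate - min(orderdate)` from the answers above is replaced by a julianday() difference. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# The question's sample rows, with dates rewritten in ISO format so that
# SQLite's julianday() can parse them (an assumption of this sketch).
rows = [
    (1, "2018-01-21"), (1, "2018-01-21"), (1, "2018-02-21"), (1, "2018-05-22"),
    (2, "2018-03-22"), (3, "2018-04-05"), (3, "2018-04-05"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customerid INTEGER, orderdate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Subtract each customer's earliest order date from every order date.
query = """
SELECT customerid,
       orderdate,
       CAST(julianday(orderdate)
            - MIN(julianday(orderdate)) OVER (PARTITION BY customerid)
            AS INTEGER) AS daysapart
FROM orders
ORDER BY customerid, orderdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In engines with a real date type (PostgreSQL, Oracle) the direct subtraction shown in the answers should work as written; SQL Server would need DATEDIFF instead.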

Related

Calculating average time between customer orders and average order value in Postgres

In PostgreSQL I have an orders table that represents orders made by customers of a store:
SELECT * FROM orders
order_id customer_id value created_at
1 1 188.01 2020-11-24
2 2 25.74 2022-10-13
3 1 159.64 2022-09-23
4 1 201.41 2022-04-01
5 3 357.80 2022-09-05
6 2 386.72 2022-02-16
7 1 200.00 2022-01-16
8 1 19.99 2020-02-20
For a specified time range (e.g. 2022-01-01 to 2022-12-31), I need to find the following:
Average 1st order value
Average 2nd order value
Average 3rd order value
Average 4th order value
E.g. the 1st purchases for each customer are:
for customer_id 1, order_id 8 is their first purchase
customer 2, order 6
customer 3, order 5
So, the 1st-purchase average order value is (19.99 + 386.72 + 357.80) / 3 = $254.84
This needs to be found for the 2nd, 3rd and 4th purchases also.
I also need to find the average time between purchases:
order 1 to order 2
order 2 to order 3
order 3 to order 4
The final result would ideally look something like this:
order_number AOV av_days_since_last_order
1 254.84 0
2 300.00 28
3 322.22 21
4 350.00 20
Note that average days since last order for order 1 would always be 0 as it's the 1st purchase.
Thanks.
select order_number
,round(avg(value),2) as AOV
,coalesce(round(avg(days_between_orders),0),0) as av_days_since_last_order
from
(
select *
,row_number() over(partition by customer_id order by created_at) as order_number
,created_at - lag(created_at) over(partition by customer_id order by created_at) as days_between_orders
from t
) t
where created_at between '2022-01-01' and '2022-12-31'
group by order_number
order by order_number
order_number aov av_days_since_last_order
1 372.26 0
2 25.74 239
3 200.00 418
4 201.41 75
5 159.64 175
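The ROW_NUMBER/LAG query above can be tried out with the sketch below, which uses Python's built-in sqlite3 module on the question's sample data. The in-memory table and the julianday() difference (standing in for PostgreSQL's date subtraction) are assumptions of this sketch. Note that the order number and the gap are computed over all of a customer's orders before the date filter is applied, which is why order numbers up to 5 can appear for 2022.

```python
import sqlite3

# Sample rows from the question: (order_id, customer_id, value, created_at).
orders = [
    (1, 1, 188.01, "2020-11-24"), (2, 2, 25.74, "2022-10-13"),
    (3, 1, 159.64, "2022-09-23"), (4, 1, 201.41, "2022-04-01"),
    (5, 3, 357.80, "2022-09-05"), (6, 2, 386.72, "2022-02-16"),
    (7, 1, 200.00, "2022-01-16"), (8, 1, 19.99, "2020-02-20"),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER, customer_id INTEGER, value REAL, created_at TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

# Number each customer's orders by date and measure the gap to the previous
# one; only afterwards keep the orders that fall inside the requested range.
query = """
SELECT order_number,
       ROUND(AVG(value), 2) AS aov,
       COALESCE(ROUND(AVG(days_between_orders)), 0) AS av_days_since_last_order
FROM (
    SELECT value,
           created_at,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at)
               AS order_number,
           julianday(created_at)
             - julianday(LAG(created_at) OVER (PARTITION BY customer_id
                                               ORDER BY created_at))
               AS days_between_orders
    FROM orders
) AS ranked
WHERE created_at BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY order_number
ORDER BY order_number
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If the order number should instead count only purchases made inside the range, the date filter would have to move into the inner query.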
I suppose it should be something like this:
WITH prep_data AS (
SELECT order_id,
customer_id,
-- number each customer's purchases in date order
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY created_at) AS purchase_num,
created_at,
value
FROM orders
WHERE created_at BETWEEN :date_from AND :date_to
), prep_data2 AS (
SELECT pd1.customer_id,
pd1.purchase_num,
-- days since the same customer's previous purchase (NULL for the first one)
pd1.created_at - pd2.created_at AS date_diff,
pd1.value
FROM prep_data pd1
LEFT JOIN prep_data pd2 ON (pd1.customer_id = pd2.customer_id AND pd1.purchase_num = pd2.purchase_num+1)
)
SELECT purchase_num,
avg(value) AS avg_val,
avg(date_diff) AS avg_date_diff
FROM prep_data2
GROUP BY purchase_num
ORDER BY purchase_num

SQL - Find if column dates include at least partially a date range

I need to create a report and I am struggling with the SQL script.
The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)
Table company_status_history
Columns:
| id | company_id | status_id | effective_date |
Data:
| 1 | 10 | 1 | 2016-12-30 00:00:00.000 |
| 2 | 10 | 5 | 2017-02-04 00:00:00.000 |
| 3 | 11 | 5 | 2017-06-05 00:00:00.000 |
| 4 | 11 | 1 | 2018-04-30 00:00:00.000 |
I want to answer the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"
Above are the cases that I don't know how to handle, since I need to add some logic of this type:
"If this row is status 1 and its date is before the date range check the next row if it has a date inside the date range."
"If this row is status 1 and its date is after the date range check the row before if it has a date inside the date range."
I think this can be handled as a gaps and islands problem. Consider the following input data: (same as sample data of OP plus two additional rows)
id company_id status_id effective_date
-------------------------------------------
1 10 1 2016-12-15
2 10 1 2016-12-30
3 10 5 2017-02-04
4 10 4 2017-02-08
5 11 5 2017-06-05
6 11 1 2018-04-30
You can use the following query:
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date
to get:
id company_id status_id effective_date cnt
-----------------------------------------------
1 10 1 2016-12-15 0
2 10 1 2016-12-30 1
3 10 5 2017-02-04 2
4 10 4 2017-02-08 2
5 11 5 2017-06-05 0
6 11 1 2018-04-30 0
Now you can identify status = 1 islands using:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
)
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
Output:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 1
2 10 1 2016-12-30 1
3 10 5 2017-02-04 1
4 10 4 2017-02-08 2
5 11 5 2017-06-05 1
6 11 1 2018-04-30 2
Calculated field grp will help us identify those islands:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
), CTE2 AS
(
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
)
SELECT company_id,
MIN(effective_date) AS start_date,
CASE
WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
ELSE MIN(effective_date)
END AS end_date
FROM CTE2
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
Output:
company_id start_date end_date
-----------------------------------
10 2016-12-15 2017-02-03
11 2018-04-30 2018-04-30
All you need now is to keep those records from above that overlap with the specified interval.
Demo here with somewhat more complicated use case.
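The final overlap check is left unstated in the answer above. A minimal sketch of it, written in Python for illustration: two closed intervals overlap when each one starts no later than the other ends. The function name `overlaps` and the hard-coded island rows (copied from the output above) are assumptions of this sketch, not part of the original answer.

```python
from datetime import date


def overlaps(start, end, range_start, range_end):
    """Return True when [start, end] and [range_start, range_end] share a day."""
    return start <= range_end and end >= range_start


# (company_id, start_date, end_date) rows taken from the islands output above.
islands = [
    (10, date(2016, 12, 15), date(2017, 2, 3)),
    (11, date(2018, 4, 30), date(2018, 4, 30)),
]
range_start, range_end = date(2017, 1, 1), date(2017, 12, 31)

# Companies with at least one status-1 island touching the requested period.
matching = sorted(
    {company_id
     for company_id, start, end in islands
     if overlaps(start, end, range_start, range_end)}
)
print(matching)
```

In SQL the same condition would be a filter on the final query, along the lines of `start_date <= @range_end AND end_date >= @range_start`, where the two variables are hypothetical names for the period boundaries.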
Maybe this is what you are looking for? For this kind of question, you need to join two instances of your table. Here I simply join each row to the next record by id, which is probably not entirely correct. To do it properly, you can generate a new sequential id with a window function such as ROW_NUMBER, ordering the table by whatever criteria you require.
"If this row is status 1 and its date is before the date range check
the next row if it has a date inside the date range"
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
else NULL
end
from company_status_history csh1
left join company_status_history csh2
on csh2.id=csh1.id+1 -- csh2 is the next record by id
Implementing the second criterion:
"If this row is status 1 and its date is after the date range check
the row before if it has a date inside the date range."
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
when csh1.status_id=1 and csh1.effective_date>@range_en
then
case
when csh3.effective_date between @range_st and @range_en then 1
else 0
end
else null -- ¿?
end
from company_status_history csh1
left join company_status_history csh2
on csh2.id=csh1.id+1 -- csh2 is the next record by id
left join company_status_history csh3
on csh3.id=csh1.id-1 -- csh3 is the previous record by id
I would suggest using a CTE together with the window function ROW_NUMBER. With this you can find the desired records. An example:
DECLARE @t TABLE(
id INT
,company_id INT
,status_id INT
,effective_date DATETIME
)
INSERT INTO @t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')
DECLARE @StartDate DATETIME = '2017-01-01';
DECLARE @EndDate DATETIME = '2017-12-31';
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
FROM @t
),
cteLeadLag AS(
-- previous (lag) and next (lead) effective_date per company; falls back to the row's own date
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) AS LagEffective, ISNULL(c3.effective_date, c.effective_date) AS LeadEffective
FROM cte c
LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date > @EndDate
AND LagEffective BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Trailing' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date < @StartDate -- row lies before the range, so it cannot also match 'Included'
AND LeadEffective BETWEEN @StartDate AND @EndDate
I first select all records with their leading and lagging Dates and then I perform your checks on the inclusion in the desired timespan.
Try this; it should be self-explanatory. It addresses this part of your question:
I want to answer the question "Get all companies that have been at
least for some point in status 1 inside the time period 01/01/2017 -
31/12/2017"
If you want to find the companies that have been in status 1 at any moment and that have records in the requested period:
SELECT *
FROM company_status_history
WHERE company_id IN
( SELECT company_id
FROM company_status_history
WHERE status_id=1 )
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
If you want to find the rows that are in status 1 and inside the period:
SELECT *
FROM company_status_history
WHERE status_id=1
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'

3 or more consecutive entries in the last 15 days

I have the following data:
ID EMP_ID SALE_DATE
---------------------------------
1 777 5/28/2016
2 777 5/29/2016
3 777 5/30/2016
4 777 5/31/2016
5 888 5/26/2016
6 888 5/28/2016
7 888 5/29/2016
8 999 5/29/2016
9 999 5/30/2016
10 999 5/31/2016
I need to fetch the emp_id values that have 3 or more consecutive days of sales in the last 15 days.
Output should be:
777
999
Following is the query:
SELECT TRUNC (sale_date), emp_id
FROM table1
WHERE sale_date >= SYSDATE - 14
GROUP BY TRUNC (sale_date), emp_id
HAVING COUNT (*) >= 3
But this returns consecutive transactions in the last three days only.
Note: This is oracle.
Assuming you have one row per day, you can use lead():
select distinct emp_id
from (select t1.*,
lead(sale_date, 1) over (partition by emp_id order by sale_date) as sd_1,
lead(sale_date, 2) over (partition by emp_id order by sale_date) as sd_2
from table1 t1
where sale_date >= trunc(sysdate) - 14
) t
where sd_1 = sale_date + 1 and
sd_2 = sale_date + 2;
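The lead() approach can be checked on the sample data with the sketch below, which uses Python's built-in sqlite3 module. Several things here are assumptions of the sketch rather than part of the Oracle answer: the ISO date strings, SQLite's date() modifiers in place of Oracle date arithmetic, and the fixed `today` value standing in for SYSDATE so the result is reproducible.

```python
import sqlite3

# Sample rows from the question: (id, emp_id, sale_date), dates in ISO format.
sales = [
    (1, 777, "2016-05-28"), (2, 777, "2016-05-29"), (3, 777, "2016-05-30"),
    (4, 777, "2016-05-31"), (5, 888, "2016-05-26"), (6, 888, "2016-05-28"),
    (7, 888, "2016-05-29"), (8, 999, "2016-05-29"), (9, 999, "2016-05-30"),
    (10, 999, "2016-05-31"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, emp_id INTEGER, sale_date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", sales)

# A row starts a 3-day run when the next two sale dates of the same employee
# are exactly one and two days later.
query = """
SELECT DISTINCT emp_id
FROM (
    SELECT emp_id,
           sale_date,
           LEAD(sale_date, 1) OVER (PARTITION BY emp_id ORDER BY sale_date) AS sd_1,
           LEAD(sale_date, 2) OVER (PARTITION BY emp_id ORDER BY sale_date) AS sd_2
    FROM table1
    WHERE sale_date >= date(:today, '-14 days')
) AS t
WHERE sd_1 = date(sale_date, '+1 day')
  AND sd_2 = date(sale_date, '+2 days')
ORDER BY emp_id
"""
today = "2016-06-01"  # hypothetical stand-in for SYSDATE
result = [row[0] for row in conn.execute(query, {"today": today})]
print(result)
```

As the answer says, this relies on at most one row per employee per day; with several sales on the same day the rows would need to be de-duplicated first.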

New column which indicate min date from group of value

I would like to add a new column that indicates the minimum value of a subgroup.
Id ShopId OrderDate
12232018 12229018 2011-01-01 00:00:00.000
12232018 12229018 2012-01-01 00:00:00.000
12232018 12394018 2012-02-02 00:00:00.000
12232018 11386005 2012-03-01 00:00:00.000
12232018 14347023 2012-04-02 00:00:00.000
12232018 14026026 2014-03-16 00:00:00.000
Here is the result I want to get:
NewCol Id ShopId OrderDate
1 12232018 12229018 2011-01-01 00:00:00.000
1 12232018 12229018 2012-01-01 00:00:00.000
0 12232018 12394018 2012-02-02 00:00:00.000
0 12232018 11386005 2012-03-01 00:00:00.000
0 12232018 14347023 2012-04-02 00:00:00.000
0 12232018 14026026 2014-03-16 00:00:00.000
Because ShopId 12229018 has the minimum OrderDate for this Id, I would like to assign 1 to every row of that ShopId.
You can use min with windowing function to get this as below:
select NewCol = Case when orderdate = min(orderdate) over() then 1 else 0 end,*
from yourtable
-- You may need to add PARTITION BY Id or ShopId inside over(), depending on your requirement
Try this:
SELECT Id, ShopId, OrderDate,
CASE
WHEN MIN(OrderDate) OVER (PARTITION BY Id, ShopId) =
MIN(OrderDate) OVER (PARTITION BY Id) THEN 1
ELSE 0
END AS NewCol
FROM mytable
The query uses the windowed version of MIN in order to compare the minimum-per-Id OrderDate to the minimum-per-(Id, ShopId) date. If these two values are the same, then we mark the corresponding (Id, ShopId) partition with 1.
Demo here
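To see the two-window comparison at work on the question's data, here is a small sketch using Python's built-in sqlite3 module. The in-memory table name `mytable` follows the answer above; storing OrderDate as ISO text (which compares correctly as a string) is an assumption of this sketch.

```python
import sqlite3

# Sample rows from the question: (Id, ShopId, OrderDate).
rows = [
    (12232018, 12229018, "2011-01-01"),
    (12232018, 12229018, "2012-01-01"),
    (12232018, 12394018, "2012-02-02"),
    (12232018, 11386005, "2012-03-01"),
    (12232018, 14347023, "2012-04-02"),
    (12232018, 14026026, "2014-03-16"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER, ShopId INTEGER, OrderDate TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# A shop gets 1 on all of its rows when its earliest order is also the
# earliest order for the whole Id.
query = """
SELECT CASE
           WHEN MIN(OrderDate) OVER (PARTITION BY Id, ShopId)
              = MIN(OrderDate) OVER (PARTITION BY Id) THEN 1
           ELSE 0
       END AS NewCol,
       Id, ShopId, OrderDate
FROM mytable
ORDER BY OrderDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Both rows of ShopId 12229018 are flagged, including the 2012-01-01 one, because the comparison is made per (Id, ShopId) partition rather than per row.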
Less elegant than the others, but it is ANSI SQL:
select MyTable.*, case when a1.mindate = orderdate then 1 else 0 end as NewCol
from MyTable
inner join
(
select id, min(orderdate) as mindate
from Mytable
group by id
) a1
on a1.id = MyTable.id
Use min with OrderDate per ShopId and use that in the CASE WHEN expression like this:
Select case when (a.OrderDate=b.min_order_dt) then 1 else 0 end as NewCol, a.*
from
your_table_name a
inner join
(
SELECT ShopId, min(OrderDate) as min_order_dt
from
your_table_name
group by ShopId
) b
on a.ShopId=b.ShopId;
Try this:
select case when t2.ShopId is null then 0 else 1 end as newcol,t1.id,
t1.ShopId,t1.OrderDate
from mytable as t1 left join
(
select ShopId,min(OrderDate) as OrderDate from mytable
group by ShopId
) as t2 on t1.ShopId=t2.ShopId and t1.OrderDate=t2.OrderDate

Select rows based on criteria from within group

I have the following table:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
2 1 xyz 2016-03-17
3 tlt 2016-03-18
4 4 ujf 2016-03-21
5 4 dks 2016-03-23
6 4 dqp 2016-03-26
I need to select one row per ass_pos_id which has the earliest entry_date. Rows which do not have a value for ass_pos_id are not included.
In other words, for each non null ass_pos_id group, select the row which has the earliest entry_date
The following is the desired result:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
4 4 ujf 2016-03-21
You could use the row_number window function:
SELECT pk_positions, ass_pos_id, underlying, entry_date
FROM (SELECT pk_positions, ass_pos_id, underlying, entry_date,
ROW_NUMBER() OVER (PARTITION BY ass_pos_id
ORDER BY entry_date ASC) rn
FROM mytable
WHERE ass_pos_id IS NOT NULL) t
WHERE rn = 1
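The ROW_NUMBER query above can be exercised on the question's rows with this sketch, which uses Python's built-in sqlite3 module. The in-memory table and the NULL stored for the blank ass_pos_id in row 3 are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question; the blank ass_pos_id is stored as NULL (None).
positions = [
    (1, 1, "abc", "2016-03-14"),
    (2, 1, "xyz", "2016-03-17"),
    (3, None, "tlt", "2016-03-18"),
    (4, 4, "ujf", "2016-03-21"),
    (5, 4, "dks", "2016-03-23"),
    (6, 4, "dqp", "2016-03-26"),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(pk_positions INTEGER, ass_pos_id INTEGER, underlying TEXT, entry_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", positions)

# Rank rows inside each non-NULL ass_pos_id group by entry_date and keep
# only the first row of each group.
query = """
SELECT pk_positions, ass_pos_id, underlying, entry_date
FROM (SELECT pk_positions, ass_pos_id, underlying, entry_date,
             ROW_NUMBER() OVER (PARTITION BY ass_pos_id
                                ORDER BY entry_date ASC) AS rn
      FROM mytable
      WHERE ass_pos_id IS NOT NULL) AS t
WHERE rn = 1
ORDER BY ass_pos_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If two rows in a group share the earliest entry_date, ROW_NUMBER picks one of them arbitrarily; adding pk_positions as a second ORDER BY key would make the choice deterministic.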