Limiting records by included/excluded/null date ranges - sql

First, a description of my task. I need to identify customers that have placed orders within the past 2 years. However, I need a subset of those records.
There needs to be 1 or more orders placed between 12-24 months ago.
A gap where NO orders are placed between 1-12 months ago.
1 or more new orders have been placed within the past month.
Sounds easy enough, but I've spent way too much time isolating the constraints without receiving the desired output.
Here's my current code attempt:
SELECT * FROM
(SELECT CUSTOMER_ID AS "CUSTOMER", NAME, DATE_ENTERED,
ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID
ORDER BY DATE_ENTERED desc) SEQ
FROM
A_ATEST
WHERE
DATE_ENTERED >= ADD_MONTHS(TRUNC(sysdate),-24) AND
(DATE_ENTERED >= ADD_MONTHS(TRUNC(sysdate),-1) AND
DATE_ENTERED < ADD_MONTHS(TRUNC(sysdate),-12)) AND
NOT EXISTS(SELECT null FROM A_ATEST WHERE
DATE_ENTERED < ADD_MONTHS(TRUNC(sysdate),-1) AND
DATE_ENTERED > ADD_MONTHS(TRUNC(sysdate),-12))
) a
WHERE
(SEQ = 1 AND
DATE_ENTERED >= ADD_MONTHS(TRUNC(sysdate),-1)) AND
(SEQ = 2 AND
DATE_ENTERED < ADD_MONTHS(TRUNC(sysdate),-12))
SAMPLE DATA: (I don't see a way to add a table, so here goes...)
CUSTOMER, NAME, DATE_ENTERED
100 A 08-APR-20
100 A 01-MAR-20
100 A 01-MAR-20
101 B 09-MAR-20
101 B 07-MAR-19
101 B 01-MAR-19
102 C 04-APR-20
102 C 03-JAN-19
102 C 05-JAN-18
Ideally, the result set from my current code should display:
CUSTOMER, NAME, DATE_ENTERED, SEQ
102 C 04-APR-20 1
102 C 03-JAN-19 2
I'm not married to my code as it is. I'm hoping someone can lead me to a better way to approach this task.
Thanks!
-dougbert

I think this will give you what you want. Your question says you want the list of customers, but your output data suggests you want a list of orders from those customers.
SELECT CUSTOMER_ID AS "CUSTOMER", NAME, DATE_ENTERED
FROM A_ATEST a1
WHERE a1.DATE_ENTERED >= ADD_MONTHS(TRUNC(sysdate),-24)
AND EXISTS ( SELECT 1 FROM A_ATEST a3
WHERE a3.customer_id = a1.customer_id
AND a3.DATE_ENTERED BETWEEN ADD_MONTHS(TRUNC(sysdate), -24)
AND ADD_MONTHS(TRUNC(sysdate), -12))
AND NOT EXISTS ( SELECT 1 FROM A_ATEST a2
WHERE a2.customer_id = a1.customer_id
AND a2.DATE_ENTERED < ADD_MONTHS(TRUNC(sysdate), -1)
AND a2.DATE_ENTERED > ADD_MONTHS(TRUNC(sysdate), -12))
AND EXISTS ( SELECT 1 FROM A_ATEST a4
WHERE a4.customer_id = a1.customer_id
AND a4.DATE_ENTERED > ADD_MONTHS(TRUNC(sysdate), -12))
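The key here is that your subqueries need to correlate customer_id back to the outermost A_ATEST table. The way you had it written, the NOT EXISTS basically meant "and no customer at all has an order between 1 and 12 months ago". Your inner WHERE clause also required DATE_ENTERED to be both within the last month and more than 12 months ago, and the outer filter required SEQ to be both 1 and 2, so no row could ever satisfy either.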
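To check the correlated-subquery logic without an Oracle instance, here is a minimal sketch that runs the same EXISTS / NOT EXISTS structure through Python's built-in sqlite3 module on the question's sample data. Assumptions: SQLite's date(:today, '-N months') stands in for ADD_MONTHS(TRUNC(sysdate), -N), dates are stored as ISO 'YYYY-MM-DD' text so string comparison orders them correctly, "today" is pinned to a hypothetical 2020-04-15 so the output is repeatable, and the helper names (matching_orders, SAMPLE) are illustrative. The last EXISTS tests the past-month bound directly (>= one month ago); combined with the gap check, that selects the same rows as the "> -12 months" version above.

```python
import sqlite3

# SQLite stand-in for Oracle: date(:today, '-N months') plays the role of
# ADD_MONTHS(TRUNC(sysdate), -N). :today is an assumed, fixed run date.
QUERY = """
SELECT a1.customer_id, a1.name, a1.date_entered
FROM a_atest a1
WHERE a1.date_entered >= date(:today, '-24 months')
  -- 1+ orders 12-24 months ago, for the same customer as a1
  AND EXISTS (SELECT 1 FROM a_atest a3
              WHERE a3.customer_id = a1.customer_id
                AND a3.date_entered BETWEEN date(:today, '-24 months')
                                        AND date(:today, '-12 months'))
  -- no orders between 1 and 12 months ago, for the same customer
  AND NOT EXISTS (SELECT 1 FROM a_atest a2
                  WHERE a2.customer_id = a1.customer_id
                    AND a2.date_entered < date(:today, '-1 months')
                    AND a2.date_entered > date(:today, '-12 months'))
  -- 1+ orders within the past month
  AND EXISTS (SELECT 1 FROM a_atest a4
              WHERE a4.customer_id = a1.customer_id
                AND a4.date_entered >= date(:today, '-1 months'))
ORDER BY a1.customer_id, a1.date_entered DESC
"""

# The question's sample data, with dates rewritten as ISO text.
SAMPLE = [
    (100, "A", "2020-04-08"), (100, "A", "2020-03-01"), (100, "A", "2020-03-01"),
    (101, "B", "2020-03-09"), (101, "B", "2019-03-07"), (101, "B", "2019-03-01"),
    (102, "C", "2020-04-04"), (102, "C", "2019-01-03"), (102, "C", "2018-01-05"),
]


def matching_orders(today):
    """Return (customer_id, name, date_entered) rows meeting all three rules."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a_atest (customer_id INTEGER, name TEXT, date_entered TEXT)")
    con.executemany("INSERT INTO a_atest VALUES (?, ?, ?)", SAMPLE)
    rows = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return rows


print(matching_orders("2020-04-15"))
# [(102, 'C', '2020-04-04'), (102, 'C', '2019-01-03')]
```

Customer 100 is dropped because it has no order 12-24 months ago, and customer 101 because of its 2020-03-09 order inside the gap; the 2018 row for 102 falls outside the 24-month window.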

You need orders in the last two years with a gap of one year. That suggests lag():
select a.*
from (select a.*,
max(case when prev_de < add_months(date_entered, -12) then 1 else 0 end) over (partition by customer_id) as has_12month_gap
from (select a.*,
lag(date_entered) over (partition by CUSTOMER_ID order by date_entered) as prev_de,
max(date_entered) over (partition by customer_id) as max_de
from A_ATEST a
where date_entered > add_months(sysdate, -24)
) a
) a
where max_de > add_months(sysdate, -1) and
has_12month_gap = 1;
EDIT:
The above brings back all the transactions. For the customers only, it is similar logic but a little simpler:
select customer_id
from (select a.*,
lag(date_entered) over (partition by CUSTOMER_ID order by date_entered) as prev_de
from A_ATEST a
where date_entered > add_months(sysdate, -24)
) a
group by customer_id
having max(date_entered) > add_months(sysdate, -1) and
max(case when prev_de < add_months(date_entered, -12) then 1 else 0 end) = 1;
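Here is a minimal sketch of the customer-level lag() query run through Python's sqlite3 module on the question's sample data, to show which customer it keeps. Assumptions: SQLite 3.25 or later (for window functions), date(x, '-N months') standing in for add_months(x, -N), ISO date text, a hypothetical fixed "today" of 2020-04-15, and an illustrative helper name customers_with_gap.

```python
import sqlite3

# SQLite stand-in: date(x, '-N months') plays the role of add_months(x, -N).
QUERY = """
SELECT customer_id
FROM (SELECT a.*,
             lag(date_entered) OVER (PARTITION BY customer_id
                                     ORDER BY date_entered) AS prev_de
      FROM a_atest a
      WHERE date_entered > date(:today, '-24 months')) a
GROUP BY customer_id
HAVING max(date_entered) > date(:today, '-1 months')
   -- some pair of consecutive orders is more than 12 months apart
   AND max(CASE WHEN prev_de < date(date_entered, '-12 months')
                THEN 1 ELSE 0 END) = 1
ORDER BY customer_id
"""

# The question's sample data, with dates rewritten as ISO text.
SAMPLE = [
    (100, "A", "2020-04-08"), (100, "A", "2020-03-01"), (100, "A", "2020-03-01"),
    (101, "B", "2020-03-09"), (101, "B", "2019-03-07"), (101, "B", "2019-03-01"),
    (102, "C", "2020-04-04"), (102, "C", "2019-01-03"), (102, "C", "2018-01-05"),
]


def customers_with_gap(today):
    """Return the ids of customers passing the gap-and-recent-order test."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a_atest (customer_id INTEGER, name TEXT, date_entered TEXT)")
    con.executemany("INSERT INTO a_atest VALUES (?, ?, ?)", SAMPLE)
    ids = [row[0] for row in con.execute(QUERY, {"today": today})]
    con.close()
    return ids


print(customers_with_gap("2020-04-15"))  # [102]
```

Note that this checks for a gap of more than 12 months between two consecutive orders, which is not quite the same rule as "no orders between 1 and 12 months ago": a customer ordering 23.5, 11 and 0.5 months ago passes the lag() test but fails the EXISTS version. On this sample both approaches keep only customer 102.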

In case anyone references my question in the future, I wanted to share my final production solution. There were a number of changes needed to get the output I required.
SELECT DISTINCT a1.CUSTOMER_NO AS "CUSTOMER", ci.NAME, MAX(a1.DATE_ENTERED) AS "ORDER DATE", a1.SALESMAN_CODE AS "SALESPERSON"
FROM CUSTOMER_INFO ci LEFT JOIN CUSTOMER_ORDER a1 ON ci.CUSTOMER_ID = a1.CUSTOMER_NO
WHERE a1.DATE_ENTERED >= ADD_MONTHS(TRUNC(sysdate), -1)
AND EXISTS ( SELECT 1 FROM CUSTOMER_ORDER a3
WHERE a3.customer_no = a1.customer_no
AND a3.DATE_ENTERED BETWEEN ADD_MONTHS(TRUNC(sysdate), -24)
AND ADD_MONTHS(TRUNC(sysdate), -12))
AND NOT EXISTS ( SELECT 1 FROM CUSTOMER_ORDER a2
WHERE a2.customer_no = a1.customer_no
AND DATE_ENTERED < ADD_MONTHS(TRUNC(sysdate), -1)
AND DATE_ENTERED > ADD_MONTHS(TRUNC(sysdate), -12))
AND EXISTS ( SELECT 1 FROM CUSTOMER_ORDER a4
WHERE a4.customer_no = a1.customer_no
AND a4.DATE_ENTERED > ADD_MONTHS(TRUNC(sysdate), -24))
GROUP BY a1.CUSTOMER_NO, ci.NAME, a1.SALESMAN_CODE
ORDER BY a1.CUSTOMER_NO, "ORDER DATE"
Thanks again to both eaolson and Gordon Linoff for your help in getting me to where I needed to go.

Related

Return value in Access query

I need help with a query in Access. I need to return the name of each customer who has three or more active orders within the past 14 days, as of today's date. The results should also display the order dates. This will populate a report grouped by "cusname" that shows each "orderdate". I tried using the query wizard and entering the SQL below, but it returns no results. Can someone please help?
Select customerid, count(*), cusname,orderdate,orderstatus
From tablename
Where orderstatus="active"
Group by customerid,cusname,orderdate,orderstatus
Having Count(*) >=3;
Table:
CusName:|orderdate:
Mary 4/4/2021
Mary 4/3/2021
Mary 4/8/2021
Mary 3/23/2021
Bob 4/9/2021
Bob 4/1/2021
What I expect the result to be :
Table:
Customerid:|CusName:|orderdate:
1 Mary 4/4/2021
1 Mary 4/3/2021
1 Mary 4/8/2021
You should put a filter on the date as well.
You also probably shouldn't group by order date if you're trying to count orders per customer rather than orders per customer per date:
SELECT customerid, cusname, COUNT(*) AS order_count
FROM tablename
WHERE ABS(DateDiff('d', orderdate, Now())) <= 14
AND orderstatus = 'active'
GROUP BY customerid, cusname
HAVING COUNT(*) >= 3
This is rather tricky in MS Access, but you can use:
select t.*
from tablename as t
where t.orderstatus = "active" and
t.orderdate in (select top 3 t2.orderdate
from tablename t2
where t2.customerid = t.customerid and
t2.orderstatus = "active" and
t2.orderdate > dateadd("d", -14, date())
order by t2.orderdate desc
) and
3 <= (select count(*)
from tablename as t3
where t3.customerid = t.customerid and
t3.orderstatus = "active" and
t3.orderdate > dateadd("d", -14, date())
);
The first subquery gets the most recent three rows for each customer. The second checks that there are at least three.
Try this
SELECT t.customerid,
t.cusname,
t.orderdate,
t.orderstatus
FROM tablename AS t
WHERE t.orderstatus = "active"
AND t.orderdate > Dateadd("d", -14, DATE())
AND (SELECT Count(t1.cusname)
FROM tablename AS t1
WHERE t.customerid = t1.customerid
AND t1.orderstatus = "active"
AND t1.orderdate > Dateadd("d", -14, DATE())) >= 3
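A minimal sketch of the correlated-count query above, run through Python's sqlite3 module on the question's rows, shows that it returns Mary's three recent orders and drops Bob. Assumptions: the customerid values (1 for Mary, 2 for Bob) and the 'active' status of every row are hypothetical, date(:today, '-14 days') stands in for Dateadd("d", -14, Date()), "today" is pinned to 2021-04-10, and string literals use single quotes because SQLite treats double-quoted text as an identifier first.

```python
import sqlite3

# SQLite stand-in for the Access query; :today is an assumed, fixed date.
QUERY = """
SELECT t.customerid, t.cusname, t.orderdate
FROM tablename AS t
WHERE t.orderstatus = 'active'
  AND t.orderdate > date(:today, '-14 days')
  AND (SELECT count(t1.cusname)
       FROM tablename AS t1
       WHERE t1.customerid = t.customerid
         AND t1.orderstatus = 'active'
         AND t1.orderdate > date(:today, '-14 days')) >= 3
ORDER BY t.customerid, t.orderdate
"""

# The question's rows as ISO dates; ids and statuses are hypothetical.
ORDERS = [
    (1, "Mary", "2021-04-04", "active"),
    (1, "Mary", "2021-04-03", "active"),
    (1, "Mary", "2021-04-08", "active"),
    (1, "Mary", "2021-03-23", "active"),
    (2, "Bob", "2021-04-09", "active"),
    (2, "Bob", "2021-04-01", "active"),
]


def frequent_customer_orders(today):
    """Return each recent active order of customers with 3+ such orders."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (customerid INTEGER, cusname TEXT, "
                "orderdate TEXT, orderstatus TEXT)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ORDERS)
    rows = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return rows


print(frequent_customer_orders("2021-04-10"))
# [(1, 'Mary', '2021-04-03'), (1, 'Mary', '2021-04-04'), (1, 'Mary', '2021-04-08')]
```

Mary's 3/23 order is older than 14 days, so it is neither counted nor returned; Bob has only two orders in the window.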

oracle sql get transactions between the period

I have three tables in Oracle SQL, namely investor, share and transaction.
I am trying to find new investors who invested in any share during a certain period. Because they are new investors, the transaction table should contain no transaction for that investor in that share before the search period.
For the transaction table with the following records:
Id TranDt InvCode ShareCode
1 2020-01-01 00:00:00.000 inv1 S1
2 2019-04-01 00:00:00.000 inv1 S1
3 2020-04-01 00:00:00.000 inv1 S1
4 2021-03-06 11:50:20.560 inv2 S2
5 2020-04-01 00:00:00.000 inv3 S1
For the search period between 2020-01-01 and 2020-05-01, I should get the output as
5 2020-04-01 00:00:00.000 inv3 S1
Although inv1 has transactions in the table for that period, it also has a transaction before the search period, so it shouldn't be included: it is not a new investor within the search period.
The query below works, but it takes ages to return the results when called from C# code, which leads to timeouts. Is there anything we can do to make it return results faster?
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
),
SHARES_IN_PERIOD AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT >= :startDate AND T.TRANDT <= :endDate
),
PREVIOUS_SHARES AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT < :startDate
)
SELECT
DISTINCT
SP.INVCODE AS InvestorCode,
SP.SHARECODE AS ShareCode,
SP.TYPE AS ShareType
FROM SHARES_IN_PERIOD SP
WHERE (SP.INVCODE, SP.SHARECODE, SP.TYPE) NOT IN
(
SELECT
PS.INVCODE,
PS.SHARECODE,
PS.TYPE
FROM PREVIOUS_SHARES PS
)
Following the suggestion from @Gordon Linoff, I tried the options below (for all the shares I need), but they take a long time too. The transaction table has over 32 million rows.
1.
WITH
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
2.
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join investors i on i.invcode = t.invcode
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
3.
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode IN (SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL) and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
If you want to know whether an investor's first transaction in a share falls within a period, you can use window functions:
select t.*
from (select t.*,
row_number() over (partition by invcode, sharecode order by trandt) as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode = :sharecode and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
For performance, you want an index on transactions(invcode, sharecode, trandt).
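As a sanity check, here is a minimal sketch of the row_number() "first transaction" query run through Python's sqlite3 module on the question's five transactions. Assumptions: SQLite 3.25 or later, timestamps stored as ISO text so string comparison orders them, the table name transactions taken from the answer, the :sharecode filter dropped so that all shares are checked (as in the asker's option 3), and an illustrative helper name new_investor_rows.

```python
import sqlite3

# Keep only each (investor, share) pair's first transaction, then test
# whether that first transaction lies inside [start_date, end_date).
QUERY = """
SELECT id, trandt, invcode, sharecode
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY invcode, sharecode
                                ORDER BY trandt) AS seqnum
      FROM transactions t) t
WHERE seqnum = 1
  AND trandt >= :start_date
  AND trandt < :end_date
ORDER BY id
"""

# The question's sample rows.
ROWS = [
    (1, "2020-01-01 00:00:00.000", "inv1", "S1"),
    (2, "2019-04-01 00:00:00.000", "inv1", "S1"),
    (3, "2020-04-01 00:00:00.000", "inv1", "S1"),
    (4, "2021-03-06 11:50:20.560", "inv2", "S2"),
    (5, "2020-04-01 00:00:00.000", "inv3", "S1"),
]


def new_investor_rows(start_date, end_date):
    """Return first-ever transactions that fall inside the period."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (id INTEGER, trandt TEXT, "
                "invcode TEXT, sharecode TEXT)")
    # The index the answer recommends; on five rows it makes no real difference.
    con.execute("CREATE INDEX ix_tran ON transactions (invcode, sharecode, trandt)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"start_date": start_date, "end_date": end_date}).fetchall()
    con.close()
    return rows


print(new_investor_rows("2020-01-01", "2020-05-01"))
# [(5, '2020-04-01 00:00:00.000', 'inv3', 'S1')]
```

inv1's first S1 transaction is from 2019, so it is excluded even though it also traded during the period, and inv2's only transaction is after the period.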

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
Example 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally, I would like to insert this into a table as a single row saying that this person has been a customer from 10/01/2019 to 11/01/2019 (because he opened his second account before he closed the first one).
Similarly, example 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case: one showing he was a customer from 07/01 to 09/15, and another from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps-and-islands problem. You want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
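To reproduce that result outside SQL Server, here is a minimal sketch that runs the same lag() plus cumulative sum() query through Python's sqlite3 module. Assumptions: SQLite 3.25 or later, the sample dates converted from MM/DD/YYYY to ISO text so that string comparison orders them, an ORDER BY added for a stable output, and an illustrative helper name merged_periods.

```python
import sqlite3

QUERY = """
SELECT cust_id, min(open_date) AS open_date, max(closed_date) AS closed_date
FROM (SELECT t.*,
             -- a new group starts whenever this account opens after the
             -- previous account (by open date) closed
             sum(CASE WHEN NOT (open_date <= lag_closed_date) THEN 1 ELSE 0 END)
               OVER (PARTITION BY cust_id ORDER BY open_date) AS grp
      FROM (SELECT t.*,
                   lag(closed_date) OVER (PARTITION BY cust_id
                                          ORDER BY open_date) AS lag_closed_date
            FROM cust t) t) t
GROUP BY cust_id, grp
ORDER BY cust_id, min(open_date)
"""

# The question's two examples, with dates rewritten as ISO text.
ACCOUNTS = [
    ("a123", "2019-10-01", "2019-10-15"),
    ("a123", "2019-10-12", "2019-11-01"),
    ("b245", "2019-07-01", "2019-09-15"),
    ("b245", "2019-10-12", "2019-12-01"),
]


def merged_periods():
    """Return one (cust_id, open_date, closed_date) row per customer island."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cust (cust_id TEXT, open_date TEXT, closed_date TEXT)")
    con.executemany("INSERT INTO cust VALUES (?, ?, ?)", ACCOUNTS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in merged_periods():
    print(row)
# ('a123', '2019-10-01', '2019-11-01')
# ('b245', '2019-07-01', '2019-09-15')
# ('b245', '2019-10-12', '2019-12-01')
```

Because lag() compares each account only with the one immediately before it, an account nested inside an earlier, longer one (say 1 to 100, then 10 to 20, then 50 to 60) would wrongly start a new group; comparing against a running MAX(closed_date) over the preceding rows is a common way to handle that case.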
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) AS open_date,
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) AS closed_date
from Cust as a
So, for every row, you select the minimum and maximum dates from all overlapping ranges; DISTINCT then filters out the duplicates.

how to filter data in sql based on percentile

I have two tables. The first contains customer information such as id, age and name. The second contains the customer id, the product purchased and the purchase_date (dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
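A minimal sketch of the NTILE(4) threshold query, run through Python's sqlite3 module, shows what it returns on a small made-up data set. Assumptions: SQLite 3.25 or later, purchase dates as ISO text (so the 2016 filter is a date range rather than purchase_date = 2016), and hypothetical rows: customers 1 to 8 aged 20, 25, ..., 55 who bought in 2016, plus a 70-year-old customer 9 who bought only in 2017 and therefore must not affect the threshold.

```python
import sqlite3

QUERY = """
SELECT min(customer_age) AS min_age
FROM (SELECT customer_id, customer_age,
             ntile(4) OVER (ORDER BY customer_age) AS q4
      FROM table1
      WHERE customer_id IN (SELECT customer_id FROM table2
                            WHERE purchase_date >= '2016-01-01'
                              AND purchase_date < '2017-01-01')) q
WHERE q4 = 4
"""


def top_quartile_min_age():
    """Youngest age in the oldest quartile of 2016 purchasers (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, "
                "customer_name TEXT)")
    con.execute("CREATE TABLE table2 (customer_id INTEGER, product TEXT, "
                "purchase_date TEXT)")
    # Customers 1-8 (ages 20..55) bought in 2016; customer 9 (age 70) only in 2017.
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                    [(i, 15 + 5 * i, f"cust{i}") for i in range(1, 9)] + [(9, 70, "cust9")])
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                    [(i, "p", "2016-05-01") for i in range(1, 9)] + [(9, "p", "2017-05-01")])
    (min_age,) = con.execute(QUERY).fetchone()
    con.close()
    return min_age


print(top_quartile_min_age())  # 50
```

With eight 2016 purchasers, NTILE(4) puts two customers in each bucket, so the fourth bucket holds ages 50 and 55 and the threshold is 50.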
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the age at the 75th percentile among 2016 purchasers, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;
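Here is a minimal sketch of the complete two-step query run through Python's sqlite3 module, to show how the 2016 threshold feeds the 2017 join. Assumptions: SQLite 3.25 or later, the table and column names from the answer (customers, customer_products, with the product column named product as in the question), ISO date text, hypothetical customers and purchases, and an illustrative helper name older_2017_buyers.

```python
import sqlite3

QUERY = """
WITH a2016 AS (
  SELECT min(customer_age) AS customer_age
  FROM (SELECT c.*,
               row_number() OVER (ORDER BY customer_age) AS seqnum,
               count(*) OVER () AS cnt
        FROM customers c
        WHERE EXISTS (SELECT 1 FROM customer_products cp
                      WHERE cp.customer_id = c.customer_id
                        AND cp.purchase_date >= '2016-01-01'
                        AND cp.purchase_date < '2017-01-01')) c
  WHERE seqnum >= 0.75 * cnt
)
SELECT c.customer_name, cp.product
FROM customers c
JOIN customer_products cp
  ON cp.customer_id = c.customer_id
 AND cp.purchase_date >= '2017-01-01'
 AND cp.purchase_date < '2018-01-01'
JOIN a2016 a
  ON c.customer_age >= a.customer_age
ORDER BY c.customer_id
"""

# Hypothetical data: customers 1-4 bought in 2016 (ages 20, 30, 40, 50).
CUSTOMERS = [(1, 20, "Ann"), (2, 30, "Ben"), (3, 40, "Cat"), (4, 50, "Dan"), (5, 60, "Eve")]
PURCHASES = [
    (1, "a", "2016-02-01"), (2, "b", "2016-03-01"),
    (3, "c", "2016-04-01"), (4, "d", "2016-05-01"),
    (2, "x", "2017-03-01"), (3, "z", "2017-06-01"), (5, "y", "2017-02-01"),
]


def older_2017_buyers():
    """Return (customer_name, product) for qualifying 2017 purchases."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (customer_id INTEGER, customer_age INTEGER, "
                "customer_name TEXT)")
    con.execute("CREATE TABLE customer_products (customer_id INTEGER, product TEXT, "
                "purchase_date TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
    con.executemany("INSERT INTO customer_products VALUES (?, ?, ?)", PURCHASES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(older_2017_buyers())  # [('Cat', 'z'), ('Eve', 'y')]
```

Here cnt is 4, so seqnum >= 3 keeps ages 40 and 50 and the threshold is 40; Ben's 2017 purchase is dropped because he is only 30.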

Teradata Correlated subquery

I have been stuck on this query for two days:
select distinct a.id,
a.amount as amount1,
(select max (a.date) from t1 a where a.id=t.id and a.cesitc='0' and a.date<t.date) as date1,
t.id, t.amount as amount2, t.date as date2
from t1 a
inner join t1 t on t.id = a.id and a.cevexp in ('0', '1' )
and exists (select t.id from t1 t
where t.id= a.id and t.amount <> a.amount and t.date > a.date)
and t.cesitc='1' and t.dafms='2015-07-31' and t.date >='2015-04-30' and '2015-07-31' >= t.daefga
and '2015-07-31' <= t.daecga and t.cevexp='1' and t.amount >'1'
Some details: the goal is to compare the change in valuation of assets (id). Column 2 (a.amount / amount1) is the one that needs to be corrected.
I would like a.amount (amount1) to be correlated with my subquery date1, which is currently not the case. The same criteria have to be applied to find the correct amount1.
The query currently returns:
Id Amount1 Date1 id amount2 date2
1 100 04/03/2014 1 150 30/06/2015
1 102 04/03/2014 1 150 30/06/2015
1 170 04/03/2014 1 150 30/06/2015
Amount1 matches every date1 < date2 instead of only max(date1) < date2, which is why I get several amount1 values.
Thanks in advance for a helping hand :)
Have a good day!
You can access the previous row's data using a windowed aggregate function. There's no LEAD/LAG in Teradata, but it's easy to rewrite.
This will return the correct data for your example:
SELECT t.*,
MIN(amount) -- previous amount
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(date_valuation) -- previous date
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 AS t
QUALIFY cesitc = '1' -- return only the current row
If it doesn't work as expected you need to add more details of the applied logic.
By the way, if a column is a DECIMAL you shouldn't add quotes: write 150 instead of '150'. And there's only one recommended way to write a date, a date literal, e.g. DATE '2015-07-31'.
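The trick works because the frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING contains exactly one row, the previous one, so MIN() over it simply returns that row's value (or NULL on the first row of a partition). Here is a minimal sketch that shows the equivalence side by side with lag() in Python's sqlite3 module; it is not Teradata, and the table name valuations and its rows are hypothetical (the dafms tiebreaker and the QUALIFY filter are omitted). Assumes SQLite 3.25 or later.

```python
import sqlite3

QUERY = """
SELECT id, date_valuation, amount,
       -- frame = exactly the previous row, so MIN() just returns its value
       min(amount) OVER (PARTITION BY id ORDER BY date_valuation
                         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
       lag(amount) OVER (PARTITION BY id ORDER BY date_valuation) AS lag_amount
FROM valuations
ORDER BY id, date_valuation
"""

# Hypothetical valuation history for two assets.
VALUATIONS = [
    (1, "2014-03-04", 100), (1, "2015-06-30", 150), (1, "2015-07-31", 170),
    (2, "2015-01-01", 200),
]


def previous_amounts():
    """Return (id, date, amount, prev_amount via MIN frame, prev_amount via lag)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE valuations (id INTEGER, date_valuation TEXT, amount INTEGER)")
    con.executemany("INSERT INTO valuations VALUES (?, ?, ?)", VALUATIONS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in previous_amounts():
    print(row)
# (1, '2014-03-04', 100, None, None)
# (1, '2015-06-30', 150, 100, 100)
# (1, '2015-07-31', 170, 150, 150)
# (2, '2015-01-01', 200, None, None)
```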
The final query:
SELECT a.id, a.mtvbie, a.date_valuation, t.id,
MIN(t.amount) -- previous amount
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(t.date_valuation) -- previous date
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 t
inner join test5 a on a.id=t.id
where t.amount <> a.amount and a.cesitc='1' and a.date_valuation > t.date_valuation and a.dafms ='2015-07-31' and another criteria....
QUALIFY row_number() over (partition by a.id order by a.cogarc)=1