Overlapping Data

I have a SQL query to check for overlapping product records in the PRODUCTS table.
In most cases the query works fine, except for the following:
select * from products where
product_reg_no = 'AL-NAPT'
and (to_date('14-Aug-2001') BETWEEN to_date('27-Aug-2001') AND to_date('30-Aug-2001')
or to_date('31-Aug-2001') BETWEEN to_date('27-Aug-2001') AND to_date('30-Aug-2001'))
How can I make this query catch all records that overlap, either partially or completely?
If required, I can provide the table structure with sample records.
Thanks
Update 1
I have added table structure and records here or as below:
create table products
(product_reg_no varchar2(32),
start_date date,
end_date date);
Insert into products
(product_reg_no, START_DATE, END_DATE)
Values
('AL-NAPT', TO_DATE('08/14/2012', 'MM/DD/YYYY'), TO_DATE('08/31/2012', 'MM/DD/YYYY'));
Insert into products
(product_reg_no, START_DATE, END_DATE)
Values
('AL-NAPT', TO_DATE('08/27/2012', 'MM/DD/YYYY'), TO_DATE('08/30/2012', 'MM/DD/YYYY'));
COMMIT;
The first record, which runs from August 14, 2012 to August 31, 2012, overlaps the
second record, which runs from August 27, 2012 to August 30, 2012. How can I modify my query to find the overlap?

See Determine whether two date ranges overlap.
You need to evaluate the following, or a minor variant on it using <= instead of <, perhaps:
Start1 < End2 AND Start2 < End1
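As a minimal sketch of that test outside the database, the Python below applies the same comparison to the sample ranges from the question. The function name `ranges_overlap` and the `inclusive` flag are illustrative choices, not part of the original answer.

```python
from datetime import date


def ranges_overlap(start1, end1, start2, end2, inclusive=False):
    """Return True when [start1, end1] and [start2, end2] overlap.

    With inclusive=True, ranges that merely touch at an endpoint
    also count as overlapping (the <= variant of the test).
    """
    if inclusive:
        return start1 <= end2 and start2 <= end1
    return start1 < end2 and start2 < end1


# The two sample rows from the question overlap.
sample_overlap = ranges_overlap(
    date(2012, 8, 14), date(2012, 8, 31),
    date(2012, 8, 27), date(2012, 8, 30),
)

# A range from a different year does not overlap the first row.
other_year_overlap = ranges_overlap(
    date(2012, 8, 14), date(2012, 8, 31),
    date(2011, 8, 27), date(2011, 8, 30),
)
```

Switching between the strict and inclusive forms only changes how ranges that share a single boundary date are treated.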
Since you're working with a single table, you need to have a self-join:
SELECT p1.*, p2.*
FROM products p1
JOIN products p2
ON p1.product_reg_no != p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date;
The not-equal condition ensures that you don't get a record paired with itself. The < conditions alone do not ensure that, because any row whose start date precedes its end date overlaps itself, and the same is true if you use <=.
This will generate two rows for each pair of products (one row with ProductA as p1 and ProductB as p2, the other with ProductB as p1 and ProductA as p2). To prevent that happening, change the != into either < or >.
And, looking more closely at the sample data, it might be that you're really interested in rows where the registration numbers match and the dates overlap. In which case, you can ignore my wittering about != and < or > and replace the condition with = after all.
SELECT p1.*, p2.*
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date;
SQL Fiddle (unsaved) shows that this works:
SELECT p1.product_reg_no p1_reg, p1.start_date p1_start, p1.end_date p1_end,
p2.product_reg_no p2_reg, p2.start_date p2_start, p2.end_date p2_end
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date
WHERE (p1.start_date != p2.start_date OR p1.end_date != p2.end_date);
The WHERE clause eliminates the rows that are joined to themselves. With the duplicate column names in the SELECT-list eliminated, you get to see all the data. I added a row:
INSERT INTO products (product_reg_no, start_date, end_date)
VALUES ('AL-NAPT', TO_DATE('08/27/2011', 'MM/DD/YYYY'), TO_DATE('08/30/2011', 'MM/DD/YYYY'));
This was not selected — demonstrating that it does reject non-overlapping entries.
If you want to eliminate the double rows, then you have to add another fancy criterion:
SELECT p1.product_reg_no p1_reg, p1.start_date p1_start, p1.end_date p1_end,
p2.product_reg_no p2_reg, p2.start_date p2_start, p2.end_date p2_end
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date
WHERE (p1.start_date != p2.start_date OR p1.end_date != p2.end_date)
AND (p1.start_date < p2.start_date OR
(p1.start_date = p2.start_date AND p1.end_date < p2.end_date));
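To try the de-duplicated self-join without an Oracle instance, here is a hedged sketch that runs the same query in an in-memory SQLite database from Python. It assumes the dates are stored as ISO-8601 text (so that string comparison matches date order), which is a SQLite convenience and not how the original table is defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_reg_no TEXT, start_date TEXT, end_date TEXT)"
)
# The two rows from the question plus the non-overlapping row added in the answer.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("AL-NAPT", "2012-08-14", "2012-08-31"),
        ("AL-NAPT", "2012-08-27", "2012-08-30"),
        ("AL-NAPT", "2011-08-27", "2011-08-30"),
    ],
)

overlap_pairs = conn.execute(
    """
    SELECT p1.start_date, p1.end_date, p2.start_date, p2.end_date
    FROM products p1
    JOIN products p2
      ON p1.product_reg_no = p2.product_reg_no
     AND p1.start_date < p2.end_date
     AND p2.start_date < p1.end_date
    WHERE (p1.start_date != p2.start_date OR p1.end_date != p2.end_date)
      AND (p1.start_date < p2.start_date OR
           (p1.start_date = p2.start_date AND p1.end_date < p2.end_date))
    """
).fetchall()
```

With this data, `overlap_pairs` holds a single row pairing the 14-31 August 2012 range with the 27-30 August 2012 range; the 2011 row is not reported.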

This is a strange query. You check whether 14-Aug-2001 is between 27-Aug-2001 and 30-Aug-2001, which is always false, or whether 31-Aug-2001 is between 27-Aug-2001 and 30-Aug-2001, which is also always false. So your WHERE clause will always be false.
Edit: Thanks for clarification
SQL Fiddle Demo
select p1.product_reg_no
, p1.start_date p1s
, p1.end_date p1e
, p2.start_date p2s
, p2.end_date p2e
from products p1, products p2
where p1.product_reg_no = p2.product_reg_no
and not ( p1.end_date < p2.start_date
or p1.start_date > p2.end_date );
What you want is the following scenarios (1 marks the start and end of the first row, 2 those of the second):
1     1
   2     2
   1     1
2     2
1        1
   2  2
   1  1
2        2
1     1
2     2
You could also turn that around and say you do not want this:
1  1
      2  2
      1  1
2  2
I assumed you also want this, where one range ends on the day the other starts:
1  1
   2  2
   1  1
2  2
The WHERE clause could also be written differently:
not ( p1.end_date < p2.start_date or p1.start_date > p2.end_date )
is the same as
p1.end_date >= p2.start_date and p1.start_date <= p2.end_date
I think it was called De Morgan's law when I had that in school eons ago.
You should probably also think about what happens if you have more than 2 rows.
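As a sanity check on the inclusive overlap test, the sketch below compares two formulations by brute force over small integer ranges: "not disjoint" (two ranges are disjoint only when one ends before the other starts) and its De Morgan rewrite. Both are also checked against a literal enumeration of the days in each range. The function names are illustrative, and integers stand in for dates.

```python
from itertools import product


def overlaps_not_disjoint(s1, e1, s2, e2):
    # Ranges are disjoint only if one ends before the other starts.
    return not (e1 < s2 or s1 > e2)


def overlaps_de_morgan(s1, e1, s2, e2):
    # De Morgan: not (A or B) == (not A) and (not B).
    return e1 >= s2 and s1 <= e2


def overlaps_by_enumeration(s1, e1, s2, e2):
    # Ground truth: do the two inclusive ranges share at least one day?
    return bool(set(range(s1, e1 + 1)) & set(range(s2, e2 + 1)))


mismatches = [
    (s1, e1, s2, e2)
    for s1, e1, s2, e2 in product(range(6), repeat=4)
    if s1 <= e1 and s2 <= e2
    and not (
        overlaps_not_disjoint(s1, e1, s2, e2)
        == overlaps_de_morgan(s1, e1, s2, e2)
        == overlaps_by_enumeration(s1, e1, s2, e2)
    )
]
```

An empty `mismatches` list means the three definitions agree on every valid pair of ranges in the tested grid.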

Related

SUM with left outer join gets inflated result

The following query gives me the MRR (monthly recurring revenue) for my customer:
with dims as (
select distinct subscription_id, country_name, product_name from revenue
where site_id = '18XLsHIVSJg' and subscription_id is not null
)
select to_date('2022-07-01') as occurred_date,
count(distinct srm.subscription_id) as subscriptions,
count(distinct srm.receiver_contact) as subscribers,
sum(srm.baseline_mrr) as mrr_srm
from subscription_revenue_mart srm
join dims d on d.subscription_id = srm.subscription_id
where srm.site_id = '18XLsHIVSJg'
-- MRR as of the day before ie June 30th
and to_date(srm.creation_date) < '2022-07-01'
-- Counting the subscriptions active after July 1st
and ((srm.subscription_status = 'SUBL.A') or
-- Counting the subscriptions canceled/deactivated after July 1st
(srm.subscription_status = 'SUBL.C' and (srm.deactivation_date >= '2022-07-01') or (srm.canceled_date >= '2022-07-01')) ) group by 1;
I get a total of $5922.15 but I need to add data from another table to capture upgrades/downgrades a customer makes on a product subscription. Using the same approach as above, I can query my "change" table thusly:
select subscription_id, sum(mrr_change_amount) mrr_change_amount,max(subscription_event_date) subscription_event_date from subscription_revenue_mart_change srmc
where site_id = '18XLsHIVSJg'
and to_date(srmc.creation_date) < '2022-07-01'
and ((srmc.subscription_status = 'SUBL.A')
or (srmc.subscription_status = 'SUBL.C' and (srmc.deactivation_date >= '2022-07-01') or (srmc.canceled_date >= '2022-07-01')))
group by 1;
I get a total of $3635.47
When I combine both queries into one, I get an inflated result:
with dims as (
select distinct subscription_id, country_name, product_name from revenue
where site_id = '18XLsHIVSJg' and subscription_id is not null
),
change as (
select subscription_id, sum(mrr_change_amount) mrr_change_amount,
-- there can be multiple changes per subscription
max(subscription_event_date) subscription_event_date from subscription_revenue_mart_change srmc
where site_id = '18XLsHIVSJg'
and to_date(srmc.creation_date) < '2022-07-01'
and ((srmc.subscription_status = 'SUBL.A')
or (srmc.subscription_status = 'SUBL.C' and (srmc.deactivation_date >= '2022-07-01') or (srmc.canceled_date >= '2022-07-01')))
group by 1
)
select to_date('2022-07-01') as occurred_date,
count(distinct srm.subscription_id) as subscriptions,
count(distinct srm.receiver_contact) as subscribers,
-- See comment RE: LEFT OUTER join
sum(coalesce(c.mrr_change_amount,srm.baseline_mrr)) as mrr
from subscription_revenue_mart srm
join dims d
on d.subscription_id = srm.subscription_id
-- LEFT OUTER join required for customers that never made a change
left outer join change c
on srm.subscription_id = c.subscription_id
where srm.site_id = '18XLsHIVSJg'
and to_date(srm.creation_date) < '2022-07-01'
and ((srm.subscription_status = 'SUBL.A')
or (srm.subscription_status = 'SUBL.C' and (srm.deactivation_date >= '2022-07-01') or (srm.canceled_date >= '2022-07-01'))) group by 1;
It should be $9557.62, i.e. ($5922.15 + $3635.47), but the query outputs $16116.91, which is wrong.
I think the explode-implode syndrome may cause this.
I had designed my "change" CTE to prevent this by aggregating all the relevant fields but it's not working.
Can someone provide pointers on the best way to work around this issue?

It would help if you gave us sample data too, but I see a problem here:
sum(coalesce(c.mrr_change_amount,srm.baseline_mrr)) as mrr
Why COALESCE? That gives you only one of the two numbers per row (the change amount when there is one, otherwise the baseline), but I guess what you want is their sum:
sum(ifnull(c.mrr_change_amount, 0) + srm.baseline_mrr) as mrr
That's the best I can offer with what you've given us.
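To make the difference concrete, here is a small sketch in Python with SQLite. The tables `base` and `change_totals` and their values are hypothetical stand-ins for the real marts. It only illustrates COALESCE versus addition, and does not attempt to reproduce the inflated total from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical data: s1 has a +20 change, s2 never changed.
    CREATE TABLE base (subscription_id TEXT, baseline_mrr REAL);
    CREATE TABLE change_totals (subscription_id TEXT, mrr_change_amount REAL);
    INSERT INTO base VALUES ('s1', 100.0), ('s2', 50.0);
    INSERT INTO change_totals VALUES ('s1', 20.0);
    """
)

coalesce_total, additive_total = conn.execute(
    """
    SELECT SUM(COALESCE(c.mrr_change_amount, b.baseline_mrr)),
           SUM(b.baseline_mrr + IFNULL(c.mrr_change_amount, 0))
    FROM base b
    LEFT JOIN change_totals c ON c.subscription_id = b.subscription_id
    """
).fetchone()
```

Here `coalesce_total` is 70.0, because the baseline of s1 is replaced by its change amount, while `additive_total` is 170.0, the baseline plus the change.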

List count for last 12 months broken down by month

I have a query that can get a total active count of products until a specified date #POINT
SELECT
COUNT(DISTINCT e.productId) CNT
FROM
pro p
OUTER APPLY (
SELECT
TOP 1 p2.*
FROM
pro p2
WHERE
p2.productId = p.productId
AND p2.date >= #POINT
AND p2.STATUS IN ('SOLD', 'ACTIVE')
ORDER BY
p2.date ASC
) NEXT
WHERE
p.date < #POINT
AND p.STATUS = 'SOLD'
AND NEXT.productId IS NOT NULL
Output for #POINT "01/01/2021" is
CNT
500
From a table like
productId date STATUS
1001 01/04/2021 ACTIVE
1002 01/06/2021 SOLD
1003 01/07/2021 OTHER
...
How would I remake this query so that I can have a list of points (last 12 months) like
POINT CNT
02/01/2021 550
01/01/2021 500
12/01/2020 450
...
03/01/2020 550
in one query? I don't want to create a separate table of dates. The database is MSSQL.
Since no one responded to the question, I'll assume there isn't a function to generate these dates efficiently. I wrote a subquery that casts a couple of dates to varchar and back to date, resulting in the first day of each of the past 12 months.

Just group it by the Point and COUNT(*) the result.
You say in your update "CAST couple dates to varchar to date", which I think means you want just the date part, in which case you can use CAST(NEXT.date AS date):
SELECT
NEXT.date POINT, -- or CAST(NEXT.date AS date)
COUNT(*) CNT
FROM
pro p
OUTER APPLY (
SELECT
TOP 1 p2.*
FROM
pro p2
WHERE
p2.productId = p.productId
AND p2.date >= #POINT
AND p2.STATUS IN ('SOLD', 'ACTIVE')
ORDER BY
p2.date ASC
) NEXT
WHERE
p.date < #POINT
AND p.STATUS = 'SOLD'
AND NEXT.productId IS NOT NULL
GROUP BY
NEXT.date; -- or CAST(NEXT.date AS date)
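The list of points itself (the first day of each of the last 12 months) can also be produced in application code and passed to the query one value at a time. The sketch below is one hedged way to do that in Python. The helper name `month_starts` is hypothetical and this is not the asker's subquery.

```python
from datetime import date


def month_starts(anchor, count=12):
    """Return the first day of anchor's month and of the preceding months,
    newest first, for `count` months in total."""
    year, month = anchor.year, anchor.month
    points = []
    for _ in range(count):
        points.append(date(year, month, 1))
        month -= 1
        if month == 0:
            # Step back from January into December of the previous year.
            year, month = year - 1, 12
    return points


points = month_starts(date(2021, 2, 15))
```

For an anchor date in February 2021, `points` runs from 2021-02-01 back to 2020-03-01, matching the range of points shown in the question.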

Get rows with difference of dates being one

I have the following table and rows defined in SQLFiddle
I need to select rows from the products table where the difference between one row's start_date and the previous row's
nvl(return_date,end_date) is 1, i.e. the start_date of the current row should be one day after the nvl(return_date,end_date) of the previous row.
For example
PRODUCT_NO TSH098 and PRODUCT_REG_NO FLDG, the END_DATE is August, 15 2012 and
PRODUCT_NO TSH128 and PRODUCT_REG_NO FLDG start_date is August, 16 2012, so the difference is only of a day.
How can I get the desired output using SQL?
Any help is greatly appreciated.
Thanks

You can use the LAG analytic function to access a row at a given physical offset before the current position. With your sort order it might look like this (not so elegant, though):
select *
from products p
join (select *
from(select p.Product_no
, p.Product_Reg_No
, case
when (lag(start_date, 1, start_date) over(order by product_reg_no)-
nvl(return_date, end_date)) = 1
then lag(start_date, 1, start_date)
over(order by product_reg_no)
end start_date
, End_Date
, Return_Date
from products p
order by 2,1 desc
)
where start_date is not null
) s
on (p.start_date = s.start_date or p.end_date = s.end_date)
order by 2, 1 desc
SQL FIddle DEMO

In Oracle, date + X adds X days to the date. So you can:
select *
from products
where start_date + 1 = nvl(end_date, return_date)
If the dates could contain a time part, use trunc to remove the time part:
select *
from products
where trunc(start_date) + 1 = trunc(nvl(end_date, return_date))
Live example at SQL Fiddle.

I am under the impression that you only want matching dates that differ by 1 day when the product reg no matches. So I simply joined the table to itself, and I think this is what you want:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1
join products p2 on (p1.product_reg_no = p2.product_reg_no)
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle
If I was wrong with the grouping then just leave the join condition away which with the given example products table brings the same result
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle 2
Now you say the difference is 1 day. I automatically assumed that start_date is 1 day higher than the nvl(return_date,end_date). Also I assumed that the date is always midnight. But to have all that also excluded you can work with trunc and go in both directions:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where trunc(p1.start_date)-1 = trunc(nvl(p2.return_date,p2.end_date))
or trunc(p1.start_date)+1 = trunc(nvl(p2.return_date,p2.end_date))
SQL Fiddle 3
And this all works because dates (not timestamp) can be calculated by adding and subtracting.
EDIT: Following your comment you want return_date or end_date to be compared and equal dates are also wanted:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
p2.return_date return_date_2,
p2.end_date end_date_2
from products p1, products p2
where trunc(p1.start_date) = trunc(p2.return_date)
or trunc(p1.start_date)-1 = trunc(p2.return_date)
or trunc(p1.start_date)+1 = trunc(p2.return_date)
or trunc(p1.start_date) = trunc(p2.end_date)
or trunc(p1.start_date)-1 = trunc(p2.end_date)
or trunc(p1.start_date)+1 = trunc(p2.end_date)
SQL Fiddle 4

The way to compare the current row with the previous row is to use the LAG() function. Something like this:
select * from
(
select p.*
, lag (end_date) over
(order by start_date )
as prev_end_date
, lag (return_date) over
(order by start_date )
as prev_return_date
from products p
)
where (trunc(start_date) - 1) = trunc(nvl(prev_return_date, prev_end_date))
order by 2,1 desc
However, this will not return the results you desire, because you have not defined a mechanism for defining a sort order. And without a sort order the concept of "previous row" is meaningless.
However, what you can do is this:
select p1.*
, p2.*
from products p1 cross join products p2
where (trunc(p2.start_date) - 1) = trunc(nvl(p1.return_date, p1.end_date))
order by 2, 1 desc
This SQL queries your table twice, filtering on the basis of dates. Each row in the result set contains a record from each copy of the table. If a given start_date matches more than one end_date, or vice versa, you will get a row for each match.

You mean like this?
SELECT T2.*
FROM PRODUCTS T1
JOIN PRODUCTS T2 ON (
nvl(T1.end_date, T1.return_date) + 1 = T2.start_date
);
In your SQL Fiddle example, it returns:
PRODUCT_NO  PRODUCT_REG_NO  START_DATE                        END_DATE                          RETURN_DATE
TSH128      FLDG            August, 16 2012 00:00:00-0400     September, 15 2012 00:00:00-0400  (null)
TSH125      SCRW            August, 08 2012 00:00:00-0400     September, 07 2012 00:00:00-0400  (null)
TSH137      SCRW            September, 08 2012 00:00:00-0400  October, 07 2012 00:00:00-0400    (null)
TSH128 is returned for the reasons you already explained.
TSH125 is returned because TSH116 end_date is August, 07 2012.
TSH137 is returned because TSH125 end_date is September, 07 2012.
If you want to compare only rows within the same product_reg_no, it's easy to add that to the JOIN condition. If you want both "directions" of the 1-day difference, it's easy to add that too.
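A hedged sketch of that variation, restricted to rows with the same product_reg_no, is shown below using SQLite from Python. The rows are made-up illustrations (only the TSH098/TSH128 end and start dates come from the question), dates are stored as ISO-8601 text, and SQLite's `date(..., '+1 day')` and `coalesce` stand in for Oracle's `+ 1` and `nvl`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE products (
           product_no TEXT, product_reg_no TEXT,
           start_date TEXT, end_date TEXT, return_date TEXT)"""
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        # Hypothetical rows; start dates of the earlier records are invented.
        ("TSH098", "FLDG", "2012-07-16", "2012-08-15", None),
        ("TSH128", "FLDG", "2012-08-16", "2012-09-15", None),
        ("TSH200", "FLDG", "2012-10-01", "2012-10-31", None),
        ("TSH300", "SCRW", "2012-08-01", "2012-08-31", "2012-08-10"),
        ("TSH301", "SCRW", "2012-08-11", "2012-09-10", None),
    ],
)

consecutive_pairs = conn.execute(
    """
    SELECT prev.product_no, cur.product_no
    FROM products prev
    JOIN products cur
      ON cur.product_reg_no = prev.product_reg_no
     -- return_date wins over end_date when present, as in nvl(return_date, end_date)
     AND date(coalesce(prev.return_date, prev.end_date), '+1 day') = cur.start_date
    ORDER BY prev.product_no
    """
).fetchall()
```

With this data the query pairs TSH098 with TSH128 (end date plus one day) and TSH300 with TSH301 (return date plus one day), and leaves TSH200 unmatched.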

Count records with a criteria like "within days"

I have a table as below on sql.
OrderID  Account  OrderMethod  OrderDate  DispatchDate  DispatchMethod
2145     qaz      14           20/3/2011  23/3/2011     2
4156     aby      12           15/6/2011  25/6/2011     1
I want to count all records that have reordered 'within 30 days' of dispatch date where Dispatch Method is '2' and OrderMethod is '12' and it has come from the same Account.
I want to ask whether this can all be achieved with one query, or whether I need to create different tables and do it in stages, as I think I will have to do now. Can someone please help with a query?
Many thanks
T

Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By linking the two tables (which are essentially the same table with itself using aliases) you make sure only orders under the same account are counted.
The results from the join are further filtered based on the criteria you mentioned requiring only orders that have been placed within 30 days of the dispatch date of a previous order.
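The same self-join idea can be tried in SQLite from Python, as sketched below. The table name `orders` and its rows are hypothetical, dates are stored as ISO-8601 text, and the difference of two `julianday()` values stands in for SQL Server's `DATEDIFF(day, ...)`. Counting distinct reorder IDs is an added assumption, so that a reorder matching several earlier orders is counted only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           OrderID INTEGER, Account TEXT, OrderMethod INTEGER,
           OrderDate TEXT, DispatchDate TEXT, DispatchMethod INTEGER)"""
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "qaz", 14, "2011-03-20", "2011-03-23", 2),
        (2, "qaz", 12, "2011-04-10", "2011-04-12", 2),  # 18 days after dispatch of order 1
        (3, "aby", 12, "2011-06-15", "2011-06-25", 1),
        (4, "aby", 12, "2011-09-01", "2011-09-03", 2),  # 68 days after dispatch of order 3
        (5, "zed", 12, "2011-05-01", "2011-05-02", 2),  # no earlier order on this account
    ],
)

(reorder_count,) = conn.execute(
    """
    SELECT COUNT(DISTINCT r.OrderID)
    FROM orders o
    JOIN orders r
      ON r.Account = o.Account
     AND o.OrderDate < r.OrderDate
     AND julianday(r.OrderDate) - julianday(o.DispatchDate) <= 30
    WHERE r.DispatchMethod = 2
      AND r.OrderMethod = 12
    """
).fetchone()
```

Only order 2 qualifies in this data, so `reorder_count` is 1.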

Totally possible with one query, though my SQL is a little stale..
select count(*) from table
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)

One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are within 30 days of the
-- original order's dispatch date
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) <= 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12

You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate

Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?

sql query to find customers who order too frequently?

My database isn't actually customers and orders, it's customers and prescriptions for their eye tests (just in case anyone was wondering why I'd want my customers to make orders less frequently!)
I have a database for a chain of opticians. The prescriptions table has the branch ID number, the patient ID number, and the date they had their eyes tested. Over time, patients will have more than one eye test listed in the database. How can I get a list of patients who have had a prescription entered on the system more than once in six months? In other words, where the date of one prescription is, for example, within three months of the date of the previous prescription for the same patient.
Sample data:
Branch Patient DateOfTest
1 1 2007-08-12
1 1 2008-08-30
1 1 2008-08-31
1 2 2006-04-15
1 2 2007-04-12
I don't need to know the actual dates in the result set, and it doesn't have to be exactly three months, just a list of patients who have a prescription too close to the previous prescription. In the sample data given, I want the query to return:
Branch Patient
1 1
This sort of query isn't going to be run very regularly, so I'm not overly bothered about efficiency. On our live database I have a quarter of a million records in the prescriptions table.

Something like this
select p1.branch, p1.patient
from prescription p1, prescription p2
where p1.patient=p2.patient
and p1.dateoftest > p2.dateoftest
and datediff(day, p2.dateoftest, p1.dateoftest) < 90;
should do... you might want to add
and p1.dateoftest > getdate()
to limit to future test prescriptions.

This one will efficiently use an index on (Branch, Patient, DateOfTest) which you of course should have:
SELECT Patient, DateOfTest, pDate
FROM (
SELECT p.Patient, p.DateOfTest, (
-- most recent earlier test within the preceding three months, if any
SELECT TOP 1 pp.DateOfTest
FROM Patients pp
WHERE pp.Branch = p.Branch
AND pp.Patient = p.Patient
AND pp.DateOfTest >= DATEADD(month, -3, p.DateOfTest)
AND pp.DateOfTest < p.DateOfTest
ORDER BY
pp.DateOfTest DESC
) pDate
FROM Patients p
) po
WHERE pDate IS NOT NULL

One way:
select d.branch, d.patient
from data d
where exists
( select null from data d1
where d1.branch = d.branch
and d1.patient = d.patient
and "difference (d1.dateoftest ,d.dateoftest) < 6 months"
);
This part needs changing - I'm not familiar with SQL Server's date operations:
"difference (d1.dateoftest ,d.dateoftest) < 6 months"

Self-join:
select a.branch, a.patient
from prescriptions a
join prescriptions b
on a.branch = b.branch
and a.patient = b.patient
and a.dateoftest > b.dateoftest
and a.dateoftest - b.dateoftest < 180
group by a.branch, a.patient
This assumes you want patients who visit the same branch twice. If you don't, take out the branch part.
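As a quick check of the self-join approach against the sample data from the question, the sketch below runs it in SQLite from Python. It assumes ISO-8601 text dates and uses a `julianday()` difference with a 90-day threshold as a stand-in for "about three months". The table name `prescriptions` follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prescriptions (branch INTEGER, patient INTEGER, dateoftest TEXT)"
)
# Sample data from the question.
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?, ?)",
    [
        (1, 1, "2007-08-12"),
        (1, 1, "2008-08-30"),
        (1, 1, "2008-08-31"),
        (1, 2, "2006-04-15"),
        (1, 2, "2007-04-12"),
    ],
)

frequent_patients = conn.execute(
    """
    SELECT a.branch, a.patient
    FROM prescriptions a
    JOIN prescriptions b
      ON a.branch = b.branch
     AND a.patient = b.patient
     AND a.dateoftest > b.dateoftest
     AND julianday(a.dateoftest) - julianday(b.dateoftest) < 90
    GROUP BY a.branch, a.patient
    """
).fetchall()
```

Only patient 1 at branch 1 has two tests within 90 days of each other (30 and 31 August 2008), so `frequent_patients` contains that single row.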

SELECT DISTINCT Branch
,Patient
FROM (SELECT P1.Branch
,P1.Patient
,P1.DateOfTest
,P2.DateOfTest AS DateOfOtherTest
FROM Prescriptions P1
JOIN Prescriptions P2
ON P2.Branch = P1.Branch
AND P2.Patient = P1.Patient
AND P2.DateOfTest > P1.DateOfTest
) AS SubQuery
WHERE DATEDIFF(day, SubQuery.DateOfTest, SubQuery.DateOfOtherTest) < 90
