Need to get the minimum start date and maximum end date, when there is no break in months - sql

i have 8 rows as shown below,
Column1 Start_date end_date Row_number
1 2014-02-01 2014-02-28 1
1 2014-03-01 2014-03-31 2
1 2014-04-01 2014-04-30 3
1 2014-05-01 2014-05-31 4
1 2014-07-01 2014-07-31 5
1 2015-02-01 2015-02-28 6
1 2015-03-01 2015-03-31 7
I need result like below,
Column1 Start_date end_date
1 2014-02-01 2014-05-31
1 2014-07-01 2014-07-31
1 2015-02-01 2015-03-31
so when the end_date of first row is one day less than the start_date in next row, I need to group all the continuous rows like that and get the result as I shown. I need to do this only via SQL. please let me know, if anyone have any idea to solve this.
In the input record, you can see, first 4 rows are continuous, and 5th row is not continuous and 6th and 7th row is a continuous one.
Thanks in advance.

The trick here is that you need to first filter out only entries that are the ends of an interval, and then merge them together, rather than trying to keep a running count in one go.
So I don't know what flavour of SQL you're running, and I have no idea what you're trying to signify with Column1, but this should do the trick (written in SQL server flavour, but the only functions you need to adjust are the dateadd and the isnull). The fiddle is here
SELECT DISTINCT
CASE WHEN Q1.IsStart = 1
THEN Q1.start_date
ELSE LAG(start_date) OVER(ORDER BY Q1.Row_number) END AS start_date,
CASE WHEN Q1.IsEnding = 1
THEN Q1.end_date
ELSE LEAD(end_date) OVER(ORDER BY Q1.Row_number) END AS end_date
FROM
(SELECT
start_date,
end_date,
Row_number,
CASE WHEN DATEADD(day,1,end_date) =
ISNULL(LEAD(start_date) OVER(ORDER BY Row_number),
end_date)
THEN 0
ELSE 1 END AS IsEnding,
CASE WHEN DATEADD(day,-1,start_date) =
ISNULL(LAG(end_date) OVER(ORDER BY Row_number),
start_date)
THEN 0
ELSE 1 END AS IsStart
FROM table1) Q1
WHERE Q1.IsEnding = 1 OR Q1.IsStart = 1
For ANSI SQL/For those of you without LAG or LEAD:
SELECT
StartDates.start_date,
MIN(EndDates.end_date)
FROM
(SELECT
MainEntry.start_date,
MainEntry.row_number
FROM
mytable MainEntry
LEFT OUTER JOIN mytable PrevEntry ON PrevEntry.row_number - 1 = MainEntry.row_number
WHERE
PrevEntry.end_date IS NULL OR
EXTRACT(day FROM (MainEntry.start_date - PrevEntry.end_date)) > 1) StartDates
INNER JOIN
(SELECT
MainEntry.end_date,
MainEntry.row_number
FROM
mytable MainEntry
LEFT OUTER JOIN mytable NextEntry ON NextEntry.row_number + 1 = MainEntry.row_number
WHERE
NextEntry.start_date IS NULL OR
EXTRACT(day FROM (NextEntry.start_date - MainEntry.end_date)) > 1) EndDates
ON StartDates.row_number <= EndDates.row_number
GROUP BY
StartDates.start_date
Note that the GROUP BY could contain StartDates.row_number if that takes advantage of an index. Also note that this ANSI solution initially missed the edge cases of rows without any pairs (had INNER JOINs inside the subqueries).

Related

how to aggregate one record multiple times based on condition

I have a bunch of records in the table below.
product_id produced_date expired_date
123 2010-02-01 2012-05-31
234 2013-03-01 2014-08-04
345 2012-05-01 2018-02-25
... ... ...
I want the output to display how many unexpired products currently we have at the monthly level. (Say, if a product expires on August 04, we still count it in August stock)
Month n_products
2010-02-01 10
2010-03-01 12
...
2022-07-01 25
2022-08-01 15
How should I do this in Presto or Hive? Thank you!
You can use below SQL.
Here we are using case when to check if a product is expired or not(produced_date >= expired_date ), if its expired, we are summing it to get count of product that has been expired. And then group that data over expiry month.
select
TRUNC(expired_date, 'MM') expired_month,
SUM( case when produced_date >= expired_date then 1 else 0 end) n_products
from mytable
group by 1
We can use unnest and sequence functions to create a derived table; Joining our table with this derived table, should give us the desired result.
Select m.month,count(product_id) as n_products
(Select
(select x
from unnest(sequence(Min(month(produced_date)), Max(month(expired_date)), Interval '1' month)) t(x)
) as month
from table) m
left join table t on m.month >= t.produced_date and m.month <= t.expired_date
group by 1
order by 1

SQL Troubleshooting Help on Table Structure

I'm attempting to calculate average number of days between a customer's 1st and 3rd purchase, but struggling to get the data ordered in a way that will allow me to calculate.
I currently have the below data table. (Note: Order sequence number refers to the number order for that customer.)
Order Date
Customer Number
Order Sequence Number
2020-09-20
1
1
2021-01-20
1
2
2021-01-21
1
3
2020-10-01
2
1
2020-08-06
3
1
2020-09-06
3
2
2020-09-09
3
3
I've been trying to get the data to look like the following table. [To then be able to calculate datediff on the last two columns.]
Customer Number
Order Count
First Order Date
Third Order Date
1
3
2020-09-20
2021-01-21
2
1
2020-10-01
Null
3
3
2020-08-06
2020-09-09
I've completely messed up the code, but here's what I've been trying.
CREATE TABLE X2 as
SELECT
customer_number,
max(order_sequence_number) as order_count,
CASE
WHEN order_sequence_number = 1 then order_date
ELSE null
END as first_order_date,
CASE
WHEN order_sequence_number = 3 then order_date
ELSE null
END as third_order_date
FROM X1
GROUP BY customer_number;
Can someone please tell me what I'm missing? Thanks in advance!
You are on the right track but you need aggregation functions:
SELECT customer_number,
max(order_sequence_number) as order_count,
MAX(CASE WHEN order_sequence_number = 1 THEN order_date END) as first_order_date,
MAX(CASE WHEN order_sequence_number = 3 THEN order_date END) as third_order_date
FROM X1
GROUP BY customer_number;
To get the difference in days, you would just subtract the two expressions using whatever date arithmetic is supported in your database.

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference does not exceed 8 hours compared to its prev or next record (depending if the date is less or greater than filter date).
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is how my raw data looks like:
Id
Desc
EntryDate
1
Event type 1
2021-03-12 21:55:00.000
2
Event type 1
2021-03-12 01:10:00.000
3
Event type 1
2021-03-11 20:17:00.000
4
Event type 1
2021-03-11 05:04:00.000
5
Event type 1
2021-03-10 23:58:00.000
6
Event type 1
2021-03-10 11:01:00.000
7
Event type 1
2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8 hours condition. Note how record with Id 7 is not be included because record with Id 6 does not comply:
Id
EntryDate
2
2021-03-12 01:10:00.000
3
2021-03-11 20:17:00.000
4
2021-03-11 05:04:00.000
5
2021-03-10 23:58:00.000
Need advice how to write this complex query
This is a variant of gaps-and-islands. Define the difference . . . and then groups based on the differences:
with e as (
select t.*
from (select t.*,
sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
from (select t.*,
lag(entrydate) over (order by entrydate) as prev_entrydate
from t
) t
)
select e.*
from e.*
where e.grp in (select e2.grp
from t e2
where date(e2.entrydate) = #filterdate
);
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
declare #DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , #DateTime) and DATEADD(hour , 32 , #DateTime)

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired Output
Acct Jan Feb Mar Apr
101 11
102 3
103 5 2
104 3
105 1
106 5
My SQL Script is Below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If there is a rev amt and lease rev amt then it will combine (sum) for that account. If there is not a lease rev amt (which is the majority of the time), it brings back NULLs for all other rev amts accounts in Table 1. Table one can have duplicate accounts with different Rev, while the Table two is one unique account only w Lease rev. The output above is how I would like to see the data.
What am I missing here? Thanks!
I would suggest union all and group by:
select acct,
sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
. . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
from NewSalesHistory
) union all
(select bl.acct, bl.billingdate, bl.rev
from AirgasPricing..BWLease bl
)
) x
group by acct;
Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned you can perform a union all on the two tables. Be sure to limit your column selections and name your columns appropriately:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result contains a pivoted month column. Generate a column with your desired granularity on the result of the union in step one. F.ex. months:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
3) If you have other fields/columns that you wish to present then you first need to determine whether they are measures (can be aggregated) or are dimensions. That may be best addressed in a follow up question after you've achieved what you set out for in this part.
Hope it helps
As an aside, it seems like you are preparing data for reporting. Performing these transformations can be facilitated using a GUI such as MS Power Query. As long as your end goal is not data manipulation in the DB itself, you do not need to resort to raw sql.

Get count of two different values but not same values

I have the following table format.
**ID Name Start Date End Date**
1 ABC 1/1/2015 12/31/2015
1 XYZ 4/1/2015 8/31//2015
1 DEF 1/1/2012 12/31/2012
2 ABC 1/23/2011 1/23/2012
2 ABC 1/31/2012 1/31/2013
3 DEF 2/12/2015 5/30/2015
3 XYZ 4/1/2015 6/01/2015
4 DEF 3/1/2015 12/31/2015
4 DEF 4/1/2015 6/30/2015
I need the count of ID's having Different Name which lies in date range of May 2015
Expected Results
ID COUNT
1 2
3 2
P.S - ID 4 also lies in the date range of MAY 2015, but the Name is same i.e DEF. So I need only ID's 1 and 3 but not 4.
Thank You in advance and appreciated for your efforts.
I imagine your sample data doesn't match your desired results, but I think this is what you're looking for using conditional aggregation:
select id, count(*)
from yourtable
group by id
having sum(case when '5/1/2015' between startdate and enddate then 1 else 0 end) > 1
and count(distinct name) = count(name)
SQL Fiddle Demo
The sum aggregation in the having clause is making sure there are multiple records in between that date. The count clause in the having clause is making sure there aren't any duplicates.
declare
#startdate datetime = '20150501',
#enddate datetime = '20150531'
select t.id, count(distinct t.name)
from mytable t
where t.startdate <= #enddate and t.enddate >= #startdate
group by t.id
having count(distinct t.name) > 1