Flag the date when they return - sql

Story:
For each id, there is a join date for a subscription, and each monthly rebilling adds a returning date. The first part of the exercise was to flag returning dates in consecutive months, starting from the join date. Here's an example:
+----+------------+----------------+------+
| id | join_date | returning_date | flag |
+----+------------+----------------+------+
| 1 | 2018-12-01 | 2019-01-01 | 1 |
| 1 | 2018-12-01 | 2019-02-01 | 1 |
| 1 | 2018-12-01 | 2019-03-01 | 1 |
+----+------------+----------------+------+
Objective:
What I would like to add is a flag for those who return after a canceled subscription. That flag can go in another column. For example, the following results show that the subscriber returned on May 1st, 2019. This date needs to be flagged:
+----+------------+----------------+------+
| id | join_date | returning_date | flag |
+----+------------+----------------+------+
| 1 | 2018-12-01 | 2019-01-01 | 1 |
| 1 | 2018-12-01 | 2019-02-01 | 1 |
| 1 | 2018-12-01 | 2019-03-01 | 1 |
| 1 | 2018-12-01 | 2019-05-01 | 0 |
| 1 | 2018-12-01 | 2019-06-01 | 0 |
+----+------------+----------------+------+
Fiddle Data:
DROP TABLE IF EXISTS #T1
create table #t1 (id int,join_date date, returning_date date)
insert into #t1 values
(1,'2018-12-01', '2019-01-01'),
(1,'2018-12-01', '2019-02-01'),
(1,'2018-12-01', '2019-03-01'),
(1,'2018-12-01', '2019-05-01'),
(1,'2018-12-01', '2019-06-01'),
(2,'2018-12-01', '2019-02-01'),
(2,'2018-12-01', '2019-03-01'),
(2,'2018-12-01', '2019-05-01'),
(2,'2018-12-01', '2019-06-01'),
(3,'2019-05-01', '2019-06-01'),
(3,'2019-05-01', '2019-08-01'),
(3,'2019-05-01', '2019-10-01')
Current query with flag for consecutive months:
select *
,CASE WHEN DATEDIFF(MONTH,join_date,returning_date) = ROW_NUMBER() OVER (PARTITION BY id ORDER BY returning_date ASC) THEN 1 ELSE 0 END AS flag
from #t1
ORDER BY ID,returning_date

You seem to be asking whether there are any gaps since an id first returned (for a given join_date).
If so, that is simply counting. How many months since the first returning_date? How many rows? Compare these to see if there are gaps:
select t1.*,
(case when datediff(month, min(returning_date) over (partition by id, join_date order by returning_date), returning_date) <>
row_number() over (partition by id, join_date order by returning_date) - 1
then 0 else 1
end) as flag
from t1;
Here is a db<>fiddle.
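To make the counting idea concrete, here is a rough Python sketch of the same comparison, run on the question's sample rows. The names months_between and flag_gaps are made up for this illustration, and months_between only approximates DATEDIFF(MONTH, ...) by counting month boundaries.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question's fiddle: (id, join_date, returning_date).
ROWS = [(i, date.fromisoformat(j), date.fromisoformat(r)) for i, j, r in [
    (1, "2018-12-01", "2019-01-01"), (1, "2018-12-01", "2019-02-01"),
    (1, "2018-12-01", "2019-03-01"), (1, "2018-12-01", "2019-05-01"),
    (1, "2018-12-01", "2019-06-01"), (2, "2018-12-01", "2019-02-01"),
    (2, "2018-12-01", "2019-03-01"), (2, "2018-12-01", "2019-05-01"),
    (2, "2018-12-01", "2019-06-01"), (3, "2019-05-01", "2019-06-01"),
    (3, "2019-05-01", "2019-08-01"), (3, "2019-05-01", "2019-10-01"),
]]

def months_between(start, end):
    """Month boundaries crossed, similar to DATEDIFF(MONTH, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def flag_gaps(rows):
    """Flag is 1 while months since the first return equals the row position."""
    out = []
    keyed = sorted(rows)  # orders by id, join_date, returning_date
    for _, group in groupby(keyed, key=lambda r: (r[0], r[1])):
        group = list(group)
        first_return = group[0][2]
        for position, (id_, join, ret) in enumerate(group):
            flag = 1 if months_between(first_return, ret) == position else 0
            out.append((id_, join, ret, flag))
    return out

for row in flag_gaps(ROWS):
    print(row)
```

Once a gap appears, the month count runs ahead of the row position for every later row, which is why all rows after the first gap get 0.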

Since you didn't specify which recurrence of returning to flag, my query flags any non-consecutive date as a return date, because a subscriber could leave and return many times after their join date (the subscriber with [id] 3 technically returned in August and then again in October, so that's returning twice, but October is marked as END instead because it is the last date in the data set). I also made it easier to read by adding start and end markers based on the data set in your fiddle.
You can use this query as a temp table, CTE, or whatever basis you need to continue querying against if you have to manipulate the data further.
select a.*
,case
when a.returning_date = (select min(c.returning_date) from subscription c where c.id = a.id and c.join_date = a.join_date) then 'START'
when a.returning_date = (select max(c.returning_date) from subscription c where c.id = a.id and c.join_date = a.join_date) then 'END'
when b.id is null then 'RETURN'
else 'CONSECUTIVE'
end as SubStatus
from subscription a
left join subscription b on a.id = b.id and a.join_date = b.join_date and DATEADD(month,-1,a.returning_date) = b.returning_date
here is the result set from my query:
id join_date returning_date SubStatus
----------- ---------- -------------- -----------
1 2018-12-01 2019-01-01 START
1 2018-12-01 2019-02-01 CONSECUTIVE
1 2018-12-01 2019-03-01 CONSECUTIVE
1 2018-12-01 2019-05-01 RETURN
1 2018-12-01 2019-06-01 END
2 2018-12-01 2019-02-01 START
2 2018-12-01 2019-03-01 CONSECUTIVE
2 2018-12-01 2019-05-01 RETURN
2 2018-12-01 2019-06-01 END
3 2019-05-01 2019-06-01 START
3 2019-05-01 2019-08-01 RETURN
3 2019-05-01 2019-10-01 END
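
For anyone who wants to check the classification outside the database, the same rules can be sketched in Python. This is only an illustration: sub_status and prev_month are hypothetical names, and prev_month assumes every date falls on a day that exists in the previous month (true for the first-of-month sample data).

```python
from datetime import date

# Sample rows from the question's fiddle: (id, join_date, returning_date).
ROWS = [(i, date.fromisoformat(j), date.fromisoformat(r)) for i, j, r in [
    (1, "2018-12-01", "2019-01-01"), (1, "2018-12-01", "2019-02-01"),
    (1, "2018-12-01", "2019-03-01"), (1, "2018-12-01", "2019-05-01"),
    (1, "2018-12-01", "2019-06-01"), (2, "2018-12-01", "2019-02-01"),
    (2, "2018-12-01", "2019-03-01"), (2, "2018-12-01", "2019-05-01"),
    (2, "2018-12-01", "2019-06-01"), (3, "2019-05-01", "2019-06-01"),
    (3, "2019-05-01", "2019-08-01"), (3, "2019-05-01", "2019-10-01"),
]]

def prev_month(d):
    """Same day one month earlier (assumes that day exists in that month)."""
    if d.month == 1:
        return date(d.year - 1, 12, d.day)
    return date(d.year, d.month - 1, d.day)

def sub_status(rows):
    """Label each row START, END, RETURN or CONSECUTIVE, in that priority."""
    dates_by_key = {}
    for id_, join, ret in rows:
        dates_by_key.setdefault((id_, join), set()).add(ret)
    out = []
    for id_, join, ret in sorted(rows):
        dates = dates_by_key[(id_, join)]
        if ret == min(dates):
            status = "START"
        elif ret == max(dates):
            status = "END"
        elif prev_month(ret) not in dates:  # no row one month earlier
            status = "RETURN"
        else:
            status = "CONSECUTIVE"
        out.append((id_, join, ret, status))
    return out

for row in sub_status(ROWS):
    print(row)
```

Because START and END are checked first, a last date that follows a gap (October for id 3) is labeled END rather than RETURN, just as in the result set above.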

"flag consecutive months" and "renders all future payments" are not phrases that are going to lead to a pretty query, which is why you had to resort to a while loop. Nevertheless, what you seek is possible, and with work it may prove more performant than your while loop for large data. I present my sample code below using CTEs, but you may want to use temp tables or update an originally null 'flag' column on the base table.
In flagNonConsecutive, the nonConsecutive column is set to 0 for any date that is consecutive with neither the previous date (as identified using the lag window function) nor the join_date, and to 1 otherwise.
This meets the first requirement. Then in minNonConsecutives, you identify the earliest of those 0-flagged dates for each id.
In the main query, that date and any dates after it get the 0 treatment:
with
flagNonConsecutive as (
select *,
nonConsecutive =
case
when datediff(month, join_date, returning_date) = 1 then 1
when datediff(
month,
lag(returning_date) over(
partition by id
order by returning_date
),
returning_date
) = 1 then 1
else 0
end
from #t1
),
minNonConsecutives as (
select id,
minNonConsec = min(returning_date)
from flagNonConsecutive
where nonConsecutive = 0
group by id
)
select fnc.id,
fnc.join_date,
fnc.returning_date,
flag = iif(fnc.returning_date >= mnc.minNonConsec, 0, 1)
from flagNonConsecutive fnc
left join minNonConsecutives mnc on fnc.id = mnc.id;
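
The two CTEs and the final join can be mirrored in a short Python sketch, which may help when checking the logic by hand. The function names are hypothetical, and months_between stands in for DATEDIFF(MONTH, ...).

```python
from datetime import date

# Sample rows from the question's fiddle: (id, join_date, returning_date).
ROWS = [(i, date.fromisoformat(j), date.fromisoformat(r)) for i, j, r in [
    (1, "2018-12-01", "2019-01-01"), (1, "2018-12-01", "2019-02-01"),
    (1, "2018-12-01", "2019-03-01"), (1, "2018-12-01", "2019-05-01"),
    (1, "2018-12-01", "2019-06-01"), (2, "2018-12-01", "2019-02-01"),
    (2, "2018-12-01", "2019-03-01"), (2, "2018-12-01", "2019-05-01"),
    (2, "2018-12-01", "2019-06-01"), (3, "2019-05-01", "2019-06-01"),
    (3, "2019-05-01", "2019-08-01"), (3, "2019-05-01", "2019-10-01"),
]]

def months_between(start, end):
    """Month boundaries crossed, similar to DATEDIFF(MONTH, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def flag_until_first_gap(rows):
    """Flag is 1 before the first non-consecutive date per id, 0 from it on."""
    by_id = {}
    for row in sorted(rows, key=lambda r: (r[0], r[2])):
        by_id.setdefault(row[0], []).append(row)
    out = []
    for id_, group in by_id.items():
        first_gap = None  # plays the role of minNonConsec
        previous = None   # plays the role of lag(returning_date)
        for _, join, ret in group:
            consecutive = months_between(join, ret) == 1 or (
                previous is not None and months_between(previous, ret) == 1)
            if not consecutive and first_gap is None:
                first_gap = ret
            previous = ret
        for _, join, ret in group:
            flag = 0 if first_gap is not None and ret >= first_gap else 1
            out.append((id_, join, ret, flag))
    return out

for row in flag_until_first_gap(ROWS):
    print(row)
```

Note that id 2 gets 0 on every row here, because its first returning date (February) is already two months after the join date.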

Related

How to write this BigQuery query on a retail dataset

I have a table with user retail transactions. It includes sales and cancels. If Qty is positive, it is a sale; if negative, a cancel. I want to attach each cancel to the most appropriate sale. So, I have a table like this:
| CustomerId | StockId | Qty | Date |
|--------------+-----------+-------+------------|
| 1 | 100 | 50 | 2020-01-01 |
| 1 | 100 | -10 | 2020-01-10 |
| 1 | 100 | 60 | 2020-02-10 |
| 1 | 100 | -20 | 2020-02-10 |
| 1 | 100 | 200 | 2020-03-01 |
| 1 | 100 | 10 | 2020-03-05 |
| 1 | 100 | -90 | 2020-03-10 |
User with ID 1 has the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (with negative Qty), I find the previous row (by Date) with a positive Qty that is greater than the cancel Qty.
So I need to write BigQuery queries to create a table like this:
| CustomerId | StockId | Qty | Date | CancelQty |
|--------------+-----------+-------+------------+-------------|
| 1 | 100 | 50 | 2020-01-01 | -10 |
| 1 | 100 | 60 | 2020-02-10 | -20 |
| 1 | 100 | 200 | 2020-03-01 | -90 |
| 1 | 100 | 10 | 2020-03-05 | 0 |
Can anybody help me with these queries? I have written one candidate query (split cancels and sales, join them, and do some stuff to remove the extra rows), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
select t1.*,t2.Qty as cQty,t2.Date as Date_t2 from
(select *,ROW_NUMBER() OVER (ORDER BY qty DESC) AS [ROW NUMBER] from Test) t1
join
(select *,ROW_NUMBER() OVER (ORDER BY qty) AS [ROW NUMBER] from Test) t2
on t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
select CustomerId,StockId,Qty,Date,ISNULL(cQty, 0) As CancelQty,Date_t2
from (select CustomerId,StockId,Qty,Date,case
when cQty < 0 then cQty
else NULL
end AS cQty,
case
when cQty < 0 then Date_t2
else NULL
end AS Date_t2 from result) t
where qty > 0
order by cQty desc
result: https://dbfiddle.uk
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
(case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
then cancelqty
else 0
end) as cancelqty
from (select t.*,
min(qty) over (partition by customerid, grp) as cancelqty -- the negative (cancel) qty of the group
from (select t.*,
countif(qty < 0) over (partition by customerid order by date desc) as grp
from transactions t
) t
) t
where qty > 0; -- keep only the sale rows
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
The query below seems to satisfy the conditions and produce the output mentioned. The solution maps each row of the base table (t) to the corresponding canceled-quantity row from the same table (t1).
First, a self join on CustomerId and StockId is done, since the rows need to correspond to the same customer and product.
Additionally, we bring in only the canceled transactions t1 that happened on or after the base row in table t (t.Dt<=t1.Dt), and to ensure the quantity is negative the t1.Qty<0 clause is added.
Further, we cannot attribute the canceled qty if it is greater than the original qty. Therefore I check that the positive qty is at least as large as the canceled qty. This is done by adding a '-' sign to the cancel qty so that they can be compared easily: -(t1.Qty)<=t.Qty
After the join, we are interested only in the positive quantities, so a WHERE clause (t.Qty>0) filters the canceled-quantity rows out of the base table t.
Now each base row is joined to every qualifying canceled qty row dated on or after its transaction date. For example, the Qty 50 can have several canceled quantities mapped to it, but we are interested only in the one that came immediately after. So we first group by the base quantity values and then choose the date of the canceled qty that came first, using the HAVING clause condition HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
Finally we get the rows we need, and we can exclude the last column if required using an outer select query.
SELECT t.CustomerId,t.StockId,t.Qty,t.Dt,IFNULL(t1.Qty, 0) CancelQty
,t1.dt dt_t1
FROM tbl t
LEFT JOIN tbl t1 ON t.CustomerId=t1.CustomerId AND
t.StockId=t1.StockId
AND t.Dt<=t1.Dt AND t1.Qty<0 AND -(t1.Qty)<=t.Qty
WHERE t.Qty>0
GROUP BY 1,2,3,4
HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
ORDER BY 1,2,4,3
fiddle
Consider below approach
with sales as (
select * from `project.dataset.table` where Qty > 0
), cancels as (
select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
ifnull(array_agg(c.Qty order by c.Date limit 1)[offset(0)], 0) as CancelQty
from sales s
left join cancels c
on s.CustomerId = c.CustomerId
and s.StockId = c.StockId
and s.Date <= c.Date
and s.Qty > abs(c.Qty)
group by format('%t', s)
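As a sanity check of the join conditions above, here is a small Python sketch that applies the same rule to the question's sample rows: for each sale, take the earliest cancel on or after the sale date whose absolute quantity is smaller than the sale quantity. The name attach_cancels is hypothetical and the tuples are just an assumed in-memory stand-in for the table.

```python
from datetime import date

# Sample rows from the question: (CustomerId, StockId, Qty, Date).
TRANSACTIONS = [
    (1, 100, 50, date(2020, 1, 1)),
    (1, 100, -10, date(2020, 1, 10)),
    (1, 100, 60, date(2020, 2, 10)),
    (1, 100, -20, date(2020, 2, 10)),
    (1, 100, 200, date(2020, 3, 1)),
    (1, 100, 10, date(2020, 3, 5)),
    (1, 100, -90, date(2020, 3, 10)),
]

def attach_cancels(transactions):
    """Return sale rows with the earliest qualifying cancel qty, or 0."""
    sales = [t for t in transactions if t[2] > 0]
    cancels = [t for t in transactions if t[2] < 0]
    result = []
    for customer, stock, qty, day in sales:
        candidates = [
            c for c in cancels
            if c[0] == customer and c[1] == stock  # same customer and product
            and c[3] >= day                        # cancel on or after the sale
            and qty > abs(c[2])                    # sale is larger than the cancel
        ]
        # Earliest candidate by date, mirroring ORDER BY c.Date LIMIT 1.
        cancel_qty = min(candidates, key=lambda c: c[3])[2] if candidates else 0
        result.append((customer, stock, qty, day, cancel_qty))
    return result

for row in attach_cancels(TRANSACTIONS):
    print(row)
```

One thing the sketch makes visible: nothing stops the same cancel from being attached to more than one sale on other data, so the rule fits the sample but may need tightening for real transactions.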

SQL logic to determine unsold inventory and corresponding available dates (Available to sell)

I am looking for advice on how to generate SQL to be used in SQL Server that will show available inventory to sell and the corresponding date that said inventory will be available. I am easily able to determine if we have inventory that is available immediately but can't wrap my head around what logic would be needed to determine future available quantities.
In the below table. The +/- column represents the weekly inbound vs outbound and the quantity available is a rolling SUM OVER PARTITION BY of the +/- column. I was able to get the immediate quantity available through this simple logic:
Case when Min(X.Qty_Available) > 0 Then Min(X.Qty_Available) else 0 END
AS Immediate_available_Qty
Table:
+-------------+---------------+---------------+------+---------------+
| Item Number | Item Name | week_end_date | +/- | Qty_Available |
+-------------+---------------+---------------+------+---------------+
| 123456 | Fidget Widget | 7/13/2019 | 117 | 117 |
| 123456 | Fidget Widget | 7/20/2019 | 49 | 166 |
| 123456 | Fidget Widget | 7/27/2019 | -7 | 159 |
| 123456 | Fidget Widget | 8/3/2019 | -12 | 147 |
| 123456 | Fidget Widget | 8/10/2019 | -1 | 146 |
| 123456 | Fidget Widget | 8/17/2019 | 45 | 191 |
| 123456 | Fidget Widget | 8/24/2019 | -1 | 190 |
| 123456 | Fidget Widget | 8/31/2019 | -1 | 189 |
| 123456 | Fidget Widget | 9/7/2019 | 50 | 239 |
+-------------+---------------+---------------+------+---------------+
My desired results of this query would be as follows:
+-----------+-----+
| Output | Qty |
+-----------+-----+
| 7/13/2019 | 117 |
| 7/20/2019 | 29 |
| 8/17/2019 | 43 |
+-----------+-----+
The second availability is determined by subtracting the first available quantity of 117 from each line in the Qty_Available column and finding the new minimum. If the new minimum is zero, find the next continuously positive run of data (one that runs all the way to the end of the data). Repeat for the third available quantity and then stop.
I was thinking of pursuing recursive CTE (RCTE) logic, but I don't want to dive into that rabbit hole if there is a better way to tackle this issue, and I'm not even sure an RCTE would work for this problem.
This should return your expected result:
SELECT Item_Number, Min(week_end_date), Sum("+/-")
FROM
(
SELECT *
-- put a positive value plus all following negative values in the same group
-- using a Cumulative Sum over 0/1
,Sum(CASE WHEN "+/-" > 0 THEN 1 ELSE 0 end)
Over (PARTITION BY Item_Number
ORDER BY week_end_date
ROWS UNBOUNDED PRECEDING) AS grp
FROM my_table
) AS dt
WHERE grp <= 3 -- only the 1st 3 groups
GROUP BY Item_Number, grp
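To see why the cumulative sum over 0/1 produces the expected numbers, the grouping can be traced in a few lines of Python on the question's table. This is a sketch for a single item, assuming the rows are already in week order and the first change is positive; first_groups is a hypothetical name.

```python
# Weekly changes for item 123456 from the question: (week_end_date, "+/-").
WEEKS = [
    ("7/13/2019", 117), ("7/20/2019", 49), ("7/27/2019", -7),
    ("8/3/2019", -12), ("8/10/2019", -1), ("8/17/2019", 45),
    ("8/24/2019", -1), ("8/31/2019", -1), ("9/7/2019", 50),
]

def first_groups(weeks, limit=3):
    """Start a group at each positive change and sum it with what follows."""
    groups = []
    for week_end, change in weeks:
        if change > 0 or not groups:
            groups.append([week_end, 0])  # a new group starts on this date
        groups[-1][1] += change           # negatives reduce the current group
    return [tuple(g) for g in groups[:limit]]

print(first_groups(WEEKS))
```

Each inbound quantity is netted against the outbound weeks that follow it until the next inbound week, which gives 117, then 49 - 7 - 12 - 1 = 29, then 45 - 1 - 1 = 43.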
So here's what I came up with. I know this is poor, I didn't want to leave this thread high and dry and maybe I can get more insight on a better path. Please know that I've never had any real training so I don't know what I don't know.
I ended up running this into a temp table and altering the commented out section in table "A". then re-running that into a temp table.
Select
F.Upc,
F.name,
F.Week_end_date as First_Available_Date,
E.Qty_Available_1
From
(
Select Distinct
D.Upc,
D.name,
Case When Min(D.Rolling_Qty_Available) Over ( PARTITION BY D.upc) < 1 then 0 else
Min(D.Rolling_Qty_Available) Over ( PARTITION BY D.upc) END as Qty_Available_1,
Case When Max(D.Look_up_Ref) Over ( PARTITION BY D.upc) = 0 then '-1000' else
Max(D.Look_up_Ref) Over ( PARTITION BY D.upc) END as Look_up_Ref_1
From
(
Select
A.Upc,
A.name,
A.Week_end_Date,
A.Rolling_Qty_Available,
CASE WHEN
C.Max_Row = A.Row_num and A.[Rolling_Qty_Available] >1 THEN 1
ELSE
CASE WHEN
Sum(A.Calc_Row_Thing) OVER (Partition by A.UPC Order by A.Row_Num DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND Current ROW
) = (C.Max_Row - A.Row_num + 1)
THEN
C.Max_Row - A.Row_num + 1
ELSE 0 END
END as Look_up_Ref
FROM (
Select
G.Upc,
G.Name,
G.Week_End_Date,
G.Row_num,
G.Calc_Row_Thing,
G.Rolling_Qty_Available
--CASE When (G.Rolling_Qty_Available -
--isnull(H.Qty_Available_1,0)) > 0 then 1 else - 0 END as
--Calc_Row_Thing,
From [dbo].[ATS_item_detail_USA_vw] as G
--Left Join [dbo].[tmp_ats_usa_qty_1] as H on G.upc = H.upc
) AS A --Need to subtract QTY 1 out of here and below
join (
SELECT
B.upc,
Max(Row_num) AS Max_Row
FROM [dbo].[ATS_item_detail_USA_vw] AS B
GROUP BY B.upc
) as C on A.upc = C.upc
) as D
GROUP BY
D.Upc,
D.name,
D.Rolling_Qty_Available,
D.Look_up_Ref
HAVING Max(D.Look_up_Ref) > 1
) as E
Left join
(
SELECT
A.Upc,
A.name,
A.Week_end_Date,
A.Rolling_Qty_Available,
CASE WHEN
C.Max_Row = A.Row_num and A.[Rolling_Qty_Available] >1 THEN 1
ELSE
CASE WHEN
Sum(A.Calc_Row_Thing) OVER (Partition by A.UPC Order by A.Row_Num DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND Current ROW
) = (C.Max_Row - A.Row_num + 1)
THEN
C.Max_Row - A.Row_num + 1
ELSE 0 END
END as Look_up_Ref
From (
Select
G.Upc,
G.Name,
G.Week_End_Date,
G.Row_num,
G.Calc_Row_Thing,
G.Rolling_Qty_Available
--CASE When (G.Rolling_Qty_Available -
--isnull(H.Qty_Available_1,0)) > 0 then 1 else - 0 END as
--Calc_Row_Thing,
From [dbo].[ATS_item_detail_USA_vw] as G
--Left Join [dbo].[tmp_ats_usa_qty_1] as H on G.upc = H.upc
) as A --subtract qty_1 out the start qty 2 calc
join (
SELECT
B.upc,
Max(Row_num) as Max_Row
FROM [dbo].[ATS_item_detail_USA_vw] as B
GROUP BY B.upc
) AS C ON A.upc = C.upc
) AS F ON E.upc = F.upc and E.Look_up_Ref_1 = F.Look_up_Ref

Teradata SQL query for grouping records using intervals

In Teradata SQL, how can I assign the same row number to a group of records created within an 8-second time interval?
Example:-
Customerid Customername Itembought dateandtime
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01
100 ALex Circketball 2017-02-10 10:10:06
100 ALex Baseball 2017-02-10 10:10:08
100 ALex volleyball 2017-02-10 10:11:01
100 ALex footbball 2017-02-10 10:11:05
100 ALex ringball 2017-02-10 10:11:08
100 Alex football 2017-02-10 10:12:10
My expected result should have an additional column, Row_number, which assigns the same number to all purchases by the customer within 8 seconds. Refer to the expected result below:
Customerid Customername Itembought dateandtime Row_number
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01 1
100 ALex Circketball 2017-02-10 10:10:06 1
100 ALex Baseball 2017-02-10 10:10:08 1
100 ALex volleyball 2017-02-10 10:11:01 2
100 ALex footbball 2017-02-10 10:11:05 2
100 ALex ringball 2017-02-10 10:11:08 2
100 Alex football 2017-02-10 10:12:10 3
This is one way to do it with a recursive cte. Reset the running total of difference from the previous row's timestamp when it gets > 8 to 0 and start a new group.
WITH ROWNUMS AS
(SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
/*Replace DATEDIFF with Teradata specific function*/
,DATEDIFF(SECOND,COALESCE(MIN(TM) OVER(PARTITION BY ID
ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), TM),TM) AS DIFF
FROM T --replace this with your tablename and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
TM,
DIFF,
DIFF,
RNUM,
CAST(1 AS int)
FROM ROWNUMS
WHERE RNUM=1
UNION ALL
SELECT T.ID,
T.TM,
T.DIFF,
CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
T.RNUM,
CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
FROM CTE C
JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
TM,
DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
sum(case when prev_dt >= dateandtime - interval '8' second
then 0 else 1
end) over (partition by customerid order by dateandtime
) as row_number
from (select t.*,
max(dateandtime) over (partition by customerid order by dateandtime rows between 1 preceding and 1 preceding) as prev_dt
from t
) t;
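The flag-and-cumulative-sum idea is easy to try out in Python on the question's timestamps. The sketch below handles a single customer, and assign_groups is a hypothetical name; a new group starts whenever the previous timestamp is more than 8 seconds back.

```python
from datetime import datetime, timedelta

# Purchase times for customer 100 from the question.
TIMES = [datetime.fromisoformat(s) for s in [
    "2017-02-10 10:10:01", "2017-02-10 10:10:06", "2017-02-10 10:10:08",
    "2017-02-10 10:11:01", "2017-02-10 10:11:05", "2017-02-10 10:11:08",
    "2017-02-10 10:12:10",
]]

def assign_groups(timestamps, gap=timedelta(seconds=8)):
    """Running count of rows whose predecessor is more than `gap` away."""
    groups = []
    current = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > gap:
            current += 1  # the flag is 1 here, so the cumulative sum steps up
        groups.append(current)
        previous = ts
    return groups

print(assign_groups(TIMES))
```

Because each row is only compared with its immediate predecessor, a long chain of closely spaced rows stays in one group even if the whole chain spans more than 8 seconds.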
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
SELECT *
FROM TABLE
(
td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
HASH BY f1
LOCAL ORDER BY f1, fper
) as output(f1, fper, recordcount)
),
periodCTE AS
(
SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper, DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the ending time (up to, but not including, which is why it's "+ 9 seconds"). The result is that we get an 8-second time "Period" where each record might "intersect" with another record.
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output with the output from when we determined the periods. We use the P_INTERSECT function to see which periods from the normalized CTE INTERSECT with the periods from the initial Period CTE. From the result of that P_INTERSECT join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.
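For readers without Teradata at hand, the effect of normalizing periods that overlap or meet can be approximated in plain Python. This sketch does not call any Teradata function; it simply merges half-open ranges [f3, f3 + 9 seconds) for one f1 value, and group_by_merged_periods is a made-up name.

```python
from datetime import datetime, timedelta

# f3 values from test32 for f1 = 1, in the table's original order.
F3 = [datetime.fromisoformat(s) for s in [
    "2017-05-11 15:59:00", "2017-05-11 15:59:01", "2017-05-11 15:58:58",
    "2017-05-11 15:59:26", "2017-05-11 15:59:28", "2017-05-11 15:59:46",
]]

def group_by_merged_periods(timestamps, width=timedelta(seconds=9)):
    """Merge [ts, ts + width) ranges that overlap or meet; number the results."""
    group_of = {}
    group = 0
    merged_end = None
    for ts in sorted(timestamps):
        if merged_end is None or ts > merged_end:
            group += 1                 # starts after the merged period ends
            merged_end = ts + width
        else:
            merged_end = max(merged_end, ts + width)  # extend the merged period
        group_of[ts] = group
    return [group_of[ts] for ts in timestamps]  # keep the input order

print(group_by_merged_periods(F3))
```

On the sample data this reproduces the fgroup column shown in the results table.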

Merging sql row groups

My application splits a single row of data into several row chunks, which are always sorted by startdate.
rowpart = 0 is the start and rowpart = 2 is always the end.
rowpart = 1 is the middle part, which can be repeated any number of times.
I need to return one row per chunk, with the startdate of the rowpart = 0 row and the enddate of the rowpart = 2 row (if present, or else the enddate of the last row part present).
rowpart = 0 is the start of a new row chunk.
rowpart = 2 is always the end of the chunk.
Chunks can be spread across different dates.
+-----+-------------------------+-------------------------+----------+
| Id | startdate | enddate | rowpart |
+-----+-------------------------+-------------------------+----------+
| 100 | 2016-11-30 00:00:00.000 | 2016-11-30 01:00:00.000 | 0 |
| 100 | 2016-11-30 02:00:00.000 | 2016-11-30 03:00:00.000 | 1 |
| 100 | 2016-11-30 10:00:00.000 | 2016-12-01 00:00:00.000 | 0 |
| 100 | 2016-12-01 02:00:00.000 | 2016-12-01 02:30:00.000 | 1 |
| 100 | 2016-12-01 10:00:00.000 | 2016-12-01 10:30:00.000 | 1 |
| 100 | 2016-12-01 16:00:00.000 | 2016-12-01 16:30:00.000 | 2 |
| 101 | 2016-12-11 10:00:00.000 | 2016-12-11 10:30:00.000 | 0 |
+-----+-------------------------+-------------------------+----------+
So the above table should return:
+-----+-------------------------+-------------------------+
| Id | startdate | enddate |
+-----+-------------------------+-------------------------+
| 100 | 2016-11-30 00:00:00.000 | 2016-11-30 03:00:00.000 |
| 100 | 2016-11-30 10:00:00.000 | 2016-12-01 16:30:00.000 |
| 101 | 2016-12-11 10:00:00.000 | 2016-12-11 10:30:00.000 |
+-----+-------------------------+-------------------------+
Any help would be appreciated
This should work:
;WITH temp
AS
(
SELECT Id, startdate,enddate,rowpart,
--Find out First Record
CASE WHEN rowpart=0
THEN 1
ELSE 0
END AS is_first,
--Find out Last Record, Check if next rowpart is 0 or NULL:
CASE WHEN COALESCE(LEAD(rowpart) OVER (ORDER BY Id, startdate),0) = 0 --Check if next rowpart is 0 or NULL
THEN 1
ELSE 0
END AS is_last
FROM #tab
)
SELECT DISTINCT
Id,
CASE WHEN is_first = 1
THEN startdate
ELSE LAG(startdate) OVER (ORDER BY Id, startdate)
END AS startdate,
CASE WHEN is_last = 1
THEN enddate
ELSE LEAD(enddate) OVER (ORDER BY Id, startdate)
END AS enddate
FROM temp
WHERE is_first = 1 OR is_last = 1
ORDER BY Id, startdate
What I try to do here: inside the CTE I mark the first and the last record of each sequence. If rowpart=0, it's the first record. If the next record is null or the rowpart of the next record is 0, then we have the last record.
So when querying the CTE we can eliminate the "records in between". What remains are 1 or 2 records per sequence (the first and the last; in some cases this is the same record).
Then we replace startdate with the startdate of the first record of the sequence and enddate with the enddate of the last record of the sequence.
Eliminate duplicate values with DISTINCT and you get the desired output.
This is a dirty piece of SQL, but at least it works ;-)
If you didn't know SQL Server's LEAD and LAG functions for accessing previous or following row values, check this out: http://blog.sqlauthority.com/2013/09/22/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement/
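If it helps to see the intended result independent of LEAD and LAG, the chunk rule from the question can be written as a simple loop in Python. This is a sketch and not a translation of the query above: merge_chunks is a hypothetical name, and the timestamps are kept as sortable strings for brevity.

```python
# Sample rows from the question: (Id, startdate, enddate, rowpart).
ROWS = [
    (100, "2016-11-30 00:00", "2016-11-30 01:00", 0),
    (100, "2016-11-30 02:00", "2016-11-30 03:00", 1),
    (100, "2016-11-30 10:00", "2016-12-01 00:00", 0),
    (100, "2016-12-01 02:00", "2016-12-01 02:30", 1),
    (100, "2016-12-01 10:00", "2016-12-01 10:30", 1),
    (100, "2016-12-01 16:00", "2016-12-01 16:30", 2),
    (101, "2016-12-11 10:00", "2016-12-11 10:30", 0),
]

def merge_chunks(rows):
    """One (Id, first startdate, last enddate) tuple per chunk."""
    merged = []
    for id_, start, end, part in sorted(rows):  # by Id, then startdate
        if part == 0 or not merged or merged[-1][0] != id_:
            merged.append([id_, start, end])    # rowpart 0 opens a new chunk
        else:
            merged[-1][2] = end                 # later parts extend the chunk
    return [tuple(m) for m in merged]

for chunk in merge_chunks(ROWS):
    print(chunk)
```

The output can be compared row by row with whatever the SQL returns for the same data.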
Looks like a simple Group by is all you need
Try this
select Id,min(startdate),max(enddate)
From yourtable
Group by Id,cast(startdate as date)
Select
Id,
startdate,
enddate
from (
select Id,
startdate,
enddate,ROW_NUMBER()OVER(PARTITION BY CONVERT(DATE,startdate) ORDER BY startdate DESC )RN from #Table1
GROUP BY Id, startdate, enddate)T
WHERE T.RN = 1
Check This. using CTE and Joins :
with CTE as
(
select distinct *,
CASE WHEN COALESCE(LEAD(rowpart) OVER (ORDER BY Id, startdate),0) = 0
THEN 1
ELSE 0
end as RN2
from #table
)
select distinct bb.id,bb.startdate,aa.enddate from
(
select C2.*,ROW_NUMBER()OVER( ORDER BY id, startdate ) RN3
from CTE C2 where RN2= 1
) aa
join
(
select distinct *,
ROW_NUMBER()OVER( ORDER BY id, startdate ) RN3
from CTE c1 where rowpart=0
) bb on aa.RN3=bb.RN3
WITH
your_table_lead AS
(
SELECT
your_table.*,
LAG(rowpart, 1, 2) OVER (PARTITION BY id
ORDER BY startdate) AS last_rowpart,
LEAD(rowpart, 1, 0) OVER (PARTITION BY id
ORDER BY startdate) AS next_rowpart
FROM
your_table
),
filtered_sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY startdate) AS id_seq_num
FROM
your_table_lead
WHERE
rowpart IN (0, 2)
OR next_rowpart = 0
OR last_rowpart = 2
)
SELECT
id,
MIN(startdate),
MAX(enddate)
FROM
filtered_sorted
GROUP BY
id,
id_seq_num - CASE rowpart WHEN 2 THEN 1 ELSE rowpart END
I'm on my phone so apologies for typos, etc.
The first steps just try to filter out everything except the first and last entry of each 'group'. If the rowpart is 0 or 2, the row is included. A row is also included if the next row's rowpart is 0 (if there isn't a next row, use 0) or if the previous row's rowpart is 2 (if there isn't a previous row, use 2).
Then the 'trick' is to find a way to group up the 'pairs'.
If we have a sequence of 0,2,0,1,0,2,2,0 then what we want is to group them like a,a,b,b,c,c,d,e.
That can be done by turning all the 2's into 1's, then deducting the value from the ROW_NUMBER().
0,2,0,1,0,2,2,0 => 0,1,0,1,0,1,1,0
1,2,3,4,5,6,7,8 - 0,1,0,1,0,1,1,0 => 1,1,3,3,5,5,6,8
So, now we have 5 distinct 'groups', on which we can apply MIN() and MAX().
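The arithmetic behind the grouping key is quick to verify in Python. The helper name group_keys is made up for this check; it mirrors id_seq_num - CASE rowpart WHEN 2 THEN 1 ELSE rowpart END for a single id.

```python
def group_keys(rowparts):
    """Row number (starting at 1) minus rowpart, with 2 treated as 1."""
    return [
        seq - (1 if part == 2 else part)
        for seq, part in enumerate(rowparts, start=1)
    ]

keys = group_keys([0, 2, 0, 1, 0, 2, 2, 0])
print(keys)            # the grouping key for each filtered row
print(len(set(keys)))  # number of distinct groups
```

Rows that share a key end up in the same GROUP BY bucket, so a start row and the end row right after it collapse into one output row.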

Remove duplicate rows from a query result except for one in Microsoft SQL Server?

How would I delete all duplicate months from a Microsoft SQL Server table?
For example, with the following query I just wrote:
SELECT * FROM Cash WHERE Id = '2' AND TransactionDate between '2014/07/01' AND '2015/02/28'
and the query result is:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2014-08-25 00:00:00.000 |
| 2 | 2014-08-29 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
| 2 | 2015-01-28 00:00:00.000 |
+----+-------------------------+
How would I remove the duplicate months so that only one value is returned for each month, like this result:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
+----+-------------------------+
You can do it with the help of ROW_NUMBER.
This will tell you which are the rows you are going to keep
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
You can delete the other rows with a CTE.
with myCTE (id,transactionDate, firstTrans) AS (
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
)
delete from myCTE where firstTrans <> 1
Will only keep one transaction for each month of each year.
EDIT:
Filter by the row number and the query will only return the rows you want (note that SQL Server requires an alias on the derived table):
select id, transactionDate from (SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28') t where firstTrans = 1
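The "row number 1 per year and month" rule is small enough to check in Python against the question's dates. This is only a sketch for a single Id, and first_per_month is a hypothetical name.

```python
from datetime import date

# TransactionDate values for Id 2 from the question.
DATES = [
    date(2014, 7, 22), date(2014, 8, 9), date(2014, 8, 25),
    date(2014, 8, 29), date(2015, 1, 27), date(2015, 1, 28),
]

def first_per_month(dates):
    """Keep the earliest date seen for each (year, month) pair."""
    first = {}
    for d in sorted(dates):
        first.setdefault((d.year, d.month), d)  # only the first one sticks
    return list(first.values())

print(first_per_month(DATES))
```

Sorting first plays the part of ORDER BY TransactionDate, and the dictionary key plays the part of PARTITION BY YEAR(...), MONTH(...).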
When you run this query you will get the highest Id for each month in each year.
SELECT MAX(<IdColumn>) AS Id, YEAR(<DateColumn>) AS YE, MONTH(<DateColumn>) AS MO FROM <YourTable>
GROUP BY YEAR(<DateColumn>), MONTH(<DateColumn>)
If needed, you can later delete, for example, the rows whose Id is not in this query's result.
Select only the first row per month
SELECT *
FROM Cash c
WHERE c.Id = '2'
AND c.TransactionDate between '2014/07/01' AND '2015/02/28'
AND NOT EXISTS ( SELECT 'a'
FROM Cash c2
WHERE c2.Id = c.Id
AND YEAR(c2.TransactionDate) * 100 + MONTH(c2.TransactionDate) = YEAR(c.TransactionDate) * 100 + MONTH(c.TransactionDate)
AND c2.TransactionDate < c.TransactionDate
)