I need to compute a cumulative sum based on a condition: if the day is a holiday, I don't want to accumulate.
If the first row in date order is a holiday, it should take the DaywisePlan value. For all subsequent rows, if IsHoliday equals zero, accumulate DaywisePlan into the total. If IsHoliday equals one, accumulate the value of DaywisePlan on the next row in date order where IsHoliday equals zero.
Date | DaywisePlan | IsHoliday | ExpectedOutput
:-- | --: | --: | --:
7/1/2022 | 34 | 1 | 34
7/2/2022 | 34 | 1 | 34
7/3/2022 | 34 | 0 | 34
7/4/2022 | 34 | 0 | 68
7/5/2022 | 34 | 0 | 102
7/6/2022 | 34 | 0 | 136
7/7/2022 | 34 | 1 | 136
7/8/2022 | 34 | 1 | 136
7/9/2022 | 34 | 0 | 170
7/10/2022 | 34 | 0 | 204
7/11/2022 | 34 | 1 | 204
7/12/2022 | 34 | 0 | 238
I can't think of a way to do it in one query, but with a CTE it is quite easy using the window functions SUM and FIRST_VALUE.
If you have more months and want a separate sum for each month, you need to PARTITION both window functions by month.
WITH CTE AS
(SELECT
    [Date], [DaywisePlan], [IsHoliday],
    FIRST_VALUE([DaywisePlan]) OVER (PARTITION BY [IsHoliday] ORDER BY [Date]) AS [First],
    SUM(CASE WHEN [IsHoliday] = 0 THEN [DaywisePlan] ELSE 0 END) OVER (ORDER BY [Date]) AS [Sum]
FROM tab1)
SELECT [Date], [DaywisePlan], [IsHoliday],
    CASE WHEN [Sum] = 0 AND [IsHoliday] = 1 THEN [Sum] + [First] ELSE [Sum] END AS [Sum]
FROM CTE
Date | DaywisePlan | IsHoliday | Sum
:---------------------- | ----------: | --------: | --:
2022-07-01 02:00:00.000 | 34 | 1 | 34
2022-07-02 02:00:00.000 | 34 | 1 | 34
2022-07-03 02:00:00.000 | 34 | 0 | 34
2022-07-04 02:00:00.000 | 34 | 0 | 68
2022-07-05 02:00:00.000 | 34 | 0 | 102
2022-07-06 02:00:00.000 | 34 | 0 | 136
2022-07-07 02:00:00.000 | 34 | 1 | 136
2022-07-08 02:00:00.000 | 34 | 1 | 136
2022-07-09 02:00:00.000 | 34 | 0 | 170
2022-07-10 02:00:00.000 | 34 | 0 | 204
2022-07-11 02:00:00.000 | 34 | 1 | 204
2022-07-12 02:00:00.000 | 34 | 0 | 238
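The trick the query relies on is that the cumulative non-holiday sum stays at zero for the leading holiday rows, which is exactly when the FIRST_VALUE fallback kicks in. A minimal Python sketch of that same logic, using the sample data (illustrative only, not part of the SQL):

```python
# Running total of non-holiday DaywisePlan values; leading holiday rows
# (where the running total is still 0) fall back to the first row's plan
# value, mirroring the FIRST_VALUE adjustment in the query above.
rows = [  # (date, DaywisePlan, IsHoliday) in date order
    ("2022-07-01", 34, 1), ("2022-07-02", 34, 1), ("2022-07-03", 34, 0),
    ("2022-07-04", 34, 0), ("2022-07-05", 34, 0), ("2022-07-06", 34, 0),
    ("2022-07-07", 34, 1), ("2022-07-08", 34, 1), ("2022-07-09", 34, 0),
    ("2022-07-10", 34, 0), ("2022-07-11", 34, 1), ("2022-07-12", 34, 0),
]
first_plan = rows[0][1]
running = 0
totals = []
for date, plan, holiday in rows:
    if holiday == 0:
        running += plan
    totals.append(running if running > 0 else first_plan)

print(totals)  # matches the Sum column of the output above
```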
In the comments I proposed logic based on the first version of your expected results.
The expected results in the question as currently posed do not match that logic. Instead they seem to match this logic:
Do not accumulate DaywisePlan until arriving at the first row, in order of date ascending, where IsHoliday equals zero. For that row and all subsequent rows, if IsHoliday equals zero, accumulate DaywisePlan into the total.
You have also used an ambiguous date format, which I infer (given the nature of your question) to be 'month/day/year', but which could also be valid 'day/month/year' values. Here it happens to be the case that the interpretation makes no difference to the ordering, but you should make a habit of using non-ambiguous date formats like 'yyyyMMdd'.
In any case, here is a query which will produce the original expected results, and another query which will produce the new expected results. I have used similar CTEs for both to make the logic (and the difference between them) a little easier to read.
create table #mytable
(
[Date] date primary key,
DaywisePlan int,
IsHoliday bit,
ExpectedOutput int
);
set dateformat mdy;
-- original dataset
insert #mytable values
('7/1/2022', 34, 1, 34 ),
('7/2/2022', 34, 1, 34 ),
('7/3/2022', 34, 0, 68 ),
('7/4/2022', 34, 0, 102 ),
('7/5/2022', 34, 0, 136 ),
('7/6/2022', 34, 0, 170 ),
('7/7/2022', 34, 1, 170 ),
('7/8/2022', 34, 1, 170 ),
('7/9/2022', 34, 0, 204 ),
('7/10/2022', 34, 0, 238 ),
('7/11/2022', 34, 1, 238 ),
('7/12/2022', 34, 0, 272 );
-- logic producing original dataset
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
FullAccum = sum(DayWisePlan)
over (order by [date] rows unbounded preceding),
HolidayAccum = sum
(
iif(IsHoliday = 1 and [date] != t.mindate, DaywisePlan, 0)
) over (order by [date] rows unbounded preceding)
from #mytable
cross join (select min([date]) from #mytable) t(mindate)
)
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
CalculatedOutput = FullAccum - HolidayAccum
from working;
-- edited dataset
delete from #mytable;
insert #mytable values
('7/1/2022', 34, 1, 34 ),
('7/2/2022', 34, 1, 34 ),
('7/3/2022', 34, 0, 34 ),
('7/4/2022', 34, 0, 68 ),
('7/5/2022', 34, 0, 102 ),
('7/6/2022', 34, 0, 136 ),
('7/7/2022', 34, 1, 136 ),
('7/8/2022', 34, 1, 136 ),
('7/9/2022', 34, 0, 170 ),
('7/10/2022', 34, 0, 204 ),
('7/11/2022', 34, 1, 204 ),
('7/12/2022', 34, 0, 238 );
-- logic to produce edited dataset
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
firstNonHoliday = (select min([date]) from #myTable where IsHoliday = 0),
FullAccum = sum(DayWisePlan)
over (order by [date] rows unbounded preceding),
HolidayAccum = sum
(
iif(IsHoliday = 1, DaywisePlan, 0)
) over (order by [date] rows unbounded preceding)
from #mytable
)
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
CalculatedOutput = iif([date] < firstNonHoliday, DaywisePlan, FullAccum - HolidayAccum)
from working;
If you just mean to say "ignore any holidays after the first non-holiday" then the logic can be significantly simplified (keeping the CTE for comparative purposes):
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
firstNonHoliday = (select min([date]) from #myTable where IsHoliday = 0),
FullAccum = sum(iif(isHoliday = 0, DayWisePlan, 0))
over(order by date rows unbounded preceding)
from #mytable
)
select [date],
daywiseplan,
isholiday,
expectedoutput,
CalculatedOutput = iif([date] <= firstNonHoliday, dayWisePlan, fullaccum)
from working;
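The FullAccum minus HolidayAccum subtraction in the first query amounts to the following, sketched here in Python against the original expected results (illustrative only; note the holiday on the very first date is deliberately still counted):

```python
# FullAccum: running sum of every row; HolidayAccum: running sum of
# holiday rows except the very first date. Output = FullAccum - HolidayAccum.
rows = [  # (date, DaywisePlan, IsHoliday) in date order
    ("2022-07-01", 34, 1), ("2022-07-02", 34, 1), ("2022-07-03", 34, 0),
    ("2022-07-04", 34, 0), ("2022-07-05", 34, 0), ("2022-07-06", 34, 0),
    ("2022-07-07", 34, 1), ("2022-07-08", 34, 1), ("2022-07-09", 34, 0),
    ("2022-07-10", 34, 0), ("2022-07-11", 34, 1), ("2022-07-12", 34, 0),
]
min_date = rows[0][0]
full = holiday_acc = 0
out = []
for date, plan, holiday in rows:
    full += plan
    if holiday == 1 and date != min_date:
        holiday_acc += plan
    out.append(full - holiday_acc)

print(out)  # matches the original ExpectedOutput column
```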
I have table load_ext which is an external table for the below file structure
customer | interval_type | data_count | Start_time | interval1 | interval2 | interval3 | interval4 | interval5 | interval6 | ...interval24
--: | --: | --: | :-- | --: | --: | --: | --: | --: | --: | :--
67891 | 60 | 5 | 06022022040000AM | 0.07 | 0.767 | 0.65 | 0.69 | 0 | 0 | 0...
12345 | 60 | 8 | 06022022120000PM | 0.07 | 0.767 | 0.65 | 0.69 | 0.767 | 0.69 | 0, 0
To explain the above columns: all columns are VARCHAR2. interval_type is in minutes; data_count gives the number of intervals to be posted starting from the Start_time column; interval1 is the value for 00:00:00 to 01:00:00 AM, and likewise for the rest. The target table will have the same structure, but the above intervals should be moved to their respective columns. For example, the value of the interval1 column in the first row should be moved to column interval4, and the same for all other columns to their respective interval periods.
My target table should have the data like below:
customer | interval_type | data_count | Start_time | interval1 | interval2 | interval3 | interval4 | interval5 | ..interval24
--: | --: | --: | :-- | --: | --: | --: | --: | --: | :--
67891 | 60 | 5 | 06022022040000AM | 0 | 0 | 0 | 0.07 | 0.767 | 0.65, 0.81, 0, 0
12345 | 60 | 8 | 06022022120000PM | 0 | 0 | 0 | 0 | 0 | 0, 0, 0, 0.07, 0.65, 0.07, 0.65, 0, 0...
I am providing the table data with a ',' delimiter as the table structure is too big to post in its original format. This has to be done in Oracle only; we are using Oracle 19.
Unpivot the columns to rows, add the hours from the start time to the interval and then pivot back to columns:
SELECT *
FROM (
SELECT customer,
       interval_type,
       data_count,
start_time,
MOD(
interval_name + TO_CHAR(TO_DATE(start_time, 'DDMMYYYYHH12MISSAM'), 'HH24'),
24
) AS interval_name,
value
FROM table_name
UNPIVOT (value FOR interval_name IN (
interval1 AS 00,
interval2 AS 01,
interval3 AS 02,
interval4 AS 03,
interval5 AS 04,
interval6 AS 05,
interval7 AS 06,
interval8 AS 07,
interval9 AS 08,
interval10 AS 09,
interval11 AS 10,
interval12 AS 11,
interval13 AS 12,
interval14 AS 13,
interval15 AS 14,
interval16 AS 15,
interval17 AS 16,
interval18 AS 17,
interval19 AS 18,
interval20 AS 19,
interval21 AS 20,
interval22 AS 21,
interval23 AS 22,
interval24 AS 23
))
)
PIVOT (
MAX(value) FOR interval_name IN (
00 AS interval24,
01 AS interval1,
02 AS interval2,
03 AS interval3,
04 AS interval4,
05 AS interval5,
06 AS interval6,
07 AS interval7,
08 AS interval8,
09 AS interval9,
10 AS interval10,
11 AS interval11,
12 AS interval12,
13 AS interval13,
14 AS interval14,
15 AS interval15,
16 AS interval16,
17 AS interval17,
18 AS interval18,
19 AS interval19,
20 AS interval20,
21 AS interval21,
22 AS interval22,
23 AS interval23
)
);
Which, for the sample data:
CREATE TABLE table_name (
customer,
interval_type,
data_count,
Start_time,
interval1,
interval2,
interval3,
interval4,
interval5,
interval6,
interval7,
interval8,
interval9,
interval10,
interval11,
interval12,
interval13,
interval14,
interval15,
interval16,
interval17,
interval18,
interval19,
interval20,
interval21,
interval22,
interval23,
interval24
) AS
SELECT 67891, 60, 5, '06022022040000AM', 0.07, 0.767, 0.65, 0.69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 FROM DUAL UNION ALL
SELECT 12345, 60, 8, '06022022120000PM', 0.07, 0.767, 0.65, 0.69, 0.767, 0.69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 FROM DUAL;
Outputs:
CUSTOMER | INTERVAL_TYPE | DATA_COUNT | START_TIME | INTERVAL24 | INTERVAL1 | INTERVAL2 | INTERVAL3 | INTERVAL4 | INTERVAL5 | INTERVAL6 | INTERVAL7 | INTERVAL8 | INTERVAL9 | INTERVAL10 | INTERVAL11 | INTERVAL12 | INTERVAL13 | INTERVAL14 | INTERVAL15 | INTERVAL16 | INTERVAL17 | INTERVAL18 | INTERVAL19 | INTERVAL20 | INTERVAL21 | INTERVAL22 | INTERVAL23
--: | --: | --: | :-- | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --: | --:
12345 | 60 | 8 | 06022022120000PM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .07 | .767 | .65 | .69 | .767 | .69 | 0 | 0 | 0 | 0 | 0 | 0
67891 | 60 | 5 | 06022022040000AM | 0 | 0 | 0 | 0 | .07 | .767 | .65 | .69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
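The rotation performed by the unpivot/pivot pair can be sketched in Python. `start_hour` and `shift_intervals` are hypothetical helper names; the format string mirrors the TO_DATE mask in the query, and the query maps hour h back to INTERVALh (with hour 0 going to INTERVAL24):

```python
from datetime import datetime

def start_hour(s):
    # parse strings like '06022022040000AM' (DDMMYYYYHH12MISSAM)
    return datetime.strptime(s, "%d%m%Y%I%M%S%p").hour

def shift_intervals(values, hour):
    # values[i] is INTERVAL(i+1), which the query labels as hour i;
    # after shifting, out[h] holds the value that belongs to hour h
    out = [0] * 24
    for i, v in enumerate(values):
        out[(i + hour) % 24] = v
    return out

row = [0.07, 0.767, 0.65, 0.69] + [0] * 20  # first sample row
shifted = shift_intervals(row, start_hour("06022022040000AM"))
print(shifted[4:8])  # the four values land on hours 4..7
```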
Here is one way to deal with it. First UNPIVOT the intervals, then index them (COLUMN_NO) and use the MODEL clause to shift the values according to DATA_COUNT, and finally PIVOT back again. Here is a sample with 12 intervals:
WITH
tbl AS
(
SELECT '67891' "CUSTOMER",
'60' "INTERVAL_TYPE",
'5' "DATA_COUNT",
'06022022040000AM' "START_TIME",
'0.07' "INTERVAL1",
'0.767' "INTERVAL2",
'0.65' "INTERVAL3",
'0.69' "INTERVAL4",
'0.62' "INTERVAL5",
'0.61' "INTERVAL6",
'0.70' "INTERVAL7",
'0.68' "INTERVAL8",
'0.62' "INTERVAL9",
'0.59' "INTERVAL10",
'0.69' "INTERVAL11",
'0.60' "INTERVAL12" FROM DUAL UNION ALL
--
SELECT '12345' "CUSTOMER",
'60' "INTERVAL_TYPE",
'8' "DATA_COUNT",
'06022022120000PM' "START_TIME",
'0.07' "INTERVAL1",
'0.767' "INTERVAL2",
'0.65' "INTERVAL3",
'0.69' "INTERVAL4",
'0.767' "INTERVAL5",
'0.69' "INTERVAL6",
'0.70' "INTERVAL7",
'0.68' "INTERVAL8",
'0.62' "INTERVAL9",
'0.59' "INTERVAL10",
'0.69' "INTERVAL11",
'0.60' "INTERVAL12" FROM DUAL
),
datarows AS
(
SELECT CUSTOMER, INTERVAL_TYPE, DATA_COUNT, START_TIME,
VALUE_NAME, VALUE_OF
FROM tbl
UNPIVOT (
VALUE_OF FOR VALUE_NAME
IN (INTERVAL1, INTERVAL2, INTERVAL3, INTERVAL4, INTERVAL5, INTERVAL6,
INTERVAL7, INTERVAL8, INTERVAL9, INTERVAL10, INTERVAL11, INTERVAL12)
)
),
dataset AS
(
SELECT
CUSTOMER, INTERVAL_TYPE, DATA_COUNT, START_TIME,
REPLACE(VALUE_NAME, 'INTERVAL', '') "COLUMN_NO",
VALUE_NAME,
VALUE_OF,
VALUE_OF "ORIG_VALUE"
FROM
datarows
),
combined AS
(
SELECT
CUSTOMER,
INTERVAL_TYPE,
DATA_COUNT,
START_TIME,
COLUMN_NO,
VALUE_NAME,
Nvl(VALUE_OF, '0') "VALUE_OF",
ORIG_VALUE
FROM
dataset
MODEL
PARTITION BY (CUSTOMER)
DIMENSION BY (COLUMN_NO, DATA_COUNT)
MEASURES (VALUE_OF, VALUE_NAME, INTERVAL_TYPE, START_TIME, ORIG_VALUE)
RULES ITERATE(12)
(
VALUE_OF[ITERATION_NUMBER + 1, ANY] = CASE
WHEN CV(COLUMN_NO) - To_Number(CV(DATA_COUNT)) + 1 < 0 - To_Number(CV(DATA_COUNT)) - 1 --CV(COLUMN_NO)
THEN '0'
ELSE
ORIG_VALUE[To_Char(To_Number(CV(COLUMN_NO)) - To_Number(CV(DATA_COUNT)) + 2), CV(DATA_COUNT)]
END
)
)
SELECT * FROM
(
SELECT
CUSTOMER,
INTERVAL_TYPE,
DATA_COUNT,
START_TIME,
VALUE_NAME,
VALUE_OF
FROM
combined
WHERE CUSTOMER = '67891'
)
PIVOT(
MAX(VALUE_OF)
FOR VALUE_NAME
IN ('INTERVAL1', 'INTERVAL2', 'INTERVAL3', 'INTERVAL4', 'INTERVAL5', 'INTERVAL6',
'INTERVAL7', 'INTERVAL8', 'INTERVAL9', 'INTERVAL10', 'INTERVAL11', 'INTERVAL12')
)
UNION ALL
SELECT * FROM
(
SELECT
CUSTOMER,
INTERVAL_TYPE,
DATA_COUNT,
START_TIME,
VALUE_NAME,
VALUE_OF
FROM
combined
WHERE CUSTOMER = '12345'
)
PIVOT(
MAX(VALUE_OF)
FOR VALUE_NAME
IN ('INTERVAL1', 'INTERVAL2', 'INTERVAL3', 'INTERVAL4', 'INTERVAL5', 'INTERVAL6',
'INTERVAL7', 'INTERVAL8', 'INTERVAL9', 'INTERVAL10', 'INTERVAL11', 'INTERVAL12')
)
--
--
-- CUSTOMER INTERVAL_TYPE DATA_COUNT START_TIME       'INTERVAL1' 'INTERVAL2' 'INTERVAL3' 'INTERVAL4' 'INTERVAL5' 'INTERVAL6' 'INTERVAL7' 'INTERVAL8' 'INTERVAL9' 'INTERVAL10' 'INTERVAL11' 'INTERVAL12'
-- -------- ------------- ---------- ---------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ------------ ------------ ------------
-- 67891    60            5          06022022040000AM 0           0           0           0.07        0.767       0.65        0.69        0.62        0.61        0.70         0.68         0.62
-- 12345    60            8          06022022120000PM 0           0           0           0           0           0           0.07        0.767       0.65        0.69         0.767        0.69
Hopefully, you could use it to solve the problem. Regards...
Thank you for trying to help me. I figured out an alternate and faster way of doing it. Instead of using load_ext, which is an external table, I preferred using UTL_FILE to read the file directly line by line and write it back out to a new file; in between, I used an existing table that already has rows created for each interval.
My existing table of interval periods has a start and end timestamp for every 15 minutes along with its respective period number. So I just looked up this period number based on the timestamp from the file for the respective line, generated as many spaces, and concatenated them to the last column of the file. The last column in the file contains all the intervals, but only as many as the number provided in the data_count field of the file.
My steps are like this:
1. Rename and open the source file in a loop.
2. Read one line at a time and retrieve the data field.
3. Based on the timestamp, get the period number from the existing table and add data_count to it.
4. Generate as many spaces as derived in step 3 and concatenate them to the data field.
5. Write this to a new file named as the actual source file.
This gets the job done faster than the SQL query.
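A minimal Python sketch of those steps, assuming a ','-delimited line layout; `period_of` is a hypothetical stand-in for the lookup against the existing interval-period table:

```python
def shift_data_field(line, period_of):
    # split the fixed leading fields from the trailing data field
    customer, interval_type, data_count, start_time, data = line.split(",", 4)
    # steps 3/4: one space per interval slot before the first posted value
    pad = " " * period_of(start_time)
    return ",".join([customer, interval_type, data_count, start_time, pad + data])

# hypothetical lookup: pretend this timestamp maps to period number 3
shifted = shift_data_field("67891,60,5,06022022040000AM,0.07 0.767", lambda ts: 3)
print(shifted)
```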
Thank You Again
I have a table that has the below data.
COUNTRY LEVEL NUM_OF_DUPLICATES
US 9 6
US 8 24
US 7 12
US 6 20
US 5 39
US 4 81
US 3 80
US 2 430
US 1 178
US 0 430
I wrote a query that will calculate the sum of cumulative rows and got the below output .
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
US 1 178 570
US 0 254 824
Now I want to filter the data and take only the rows where POOL <= 300. If the POOL field never holds the value 300 exactly, then I should take the first value after 300. In the above example we do not have the value 300 in POOL, so we take the next value after 300, which is 392. So I need a query that pulls the records with POOL <= 392 (as per the example above), which will yield the output:
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
Please let me know your thoughts. Thanks in advance.
declare @t table(Country varchar(5), [Level] int, Num_of_Duplicates int)
insert into @t(Country, [Level], Num_of_Duplicates)
values
('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130/*-92*/),
('US', 1, 178),
('US', 0, 430);
select *, sum(Num_of_Duplicates) over(partition by Country order by [Level] desc),
(sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) - Num_of_Duplicates) / 300 as flag, --any row which starts before 300 will have flag=0
--or
case when sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) - Num_of_Duplicates < 300 then 1 else 0 end as startsbefore300
from @t;
select *
from
(
select *, sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) as Pool
from @t
) as t
where Pool - Num_of_Duplicates < 300;
The logic here is quite simple:
Calculate the running sum POOL value up to the current row.
Filter rows so that the previous row's running total is < 300; you can either subtract the current row's value, or use a second sum.
If the total up to the current row is exactly 300, the previous row's total will be less, so this row will be included.
If the current row's total is more than 300 but the previous row's is less, then it will also be included.
All higher rows are excluded.
It's unclear what ordering you want. I've used the NUM_OF_DUPLICATES column ascending, but you may want something else.
SELECT
COUNTRY,
LEVEL,
NUM_OF_DUPLICATES,
POOL
FROM (
SELECT *,
POOL = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS UNBOUNDED PRECEDING)
-- alternative calculation
-- ,POOLPrev = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM YourTable
) t
WHERE POOL - NUM_OF_DUPLICATES < 300;
-- you could also use POOLPrev above
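The "previous running total below 300" filter the queries rely on can be sketched in Python (illustrative only, with the rows in LEVEL-descending order as in the question):

```python
# Keep a row whenever the running total *before* it is still under 300;
# this naturally includes the first row that pushes the total past 300.
rows = [  # (country, level, num_of_duplicates) in LEVEL descending order
    ("US", 9, 6), ("US", 8, 24), ("US", 7, 12), ("US", 6, 20),
    ("US", 5, 39), ("US", 4, 81), ("US", 3, 80), ("US", 2, 130),
    ("US", 1, 178), ("US", 0, 254),
]
pool = 0
kept = []
for country, level, dups in rows:
    prev = pool          # running total before this row (POOL - NUM_OF_DUPLICATES)
    pool += dups
    if prev < 300:       # same predicate as WHERE Pool - Num_of_Duplicates < 300
        kept.append((country, level, dups, pool))

print(kept[-1])  # last kept row carries the first POOL value past 300
```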
I used two temp tables to get the answer.
DECLARE @t TABLE(Country VARCHAR(5), [Level] INT, Num_of_Duplicates INT)
INSERT INTO @t(Country, [Level], Num_of_Duplicates)
VALUES ('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130),
('US', 1, 178),
('US', 0, 254);
SELECT
Country
, [Level]
, Num_of_Duplicates
, SUM(Num_of_Duplicates) OVER (ORDER BY id) AS [POOL]
INTO #temp_table
FROM
(
SELECT
Country,
[Level],
Num_of_Duplicates,
ROW_NUMBER() OVER (ORDER BY [Level] DESC) AS id
FROM @t
) AS A
SELECT
[POOL],
ROW_NUMBER() OVER (ORDER BY [POOL] ) AS [rank]
INTO #Temp_2
FROM #temp_table
WHERE [POOL] >= 300
SELECT *
FROM #temp_table WHERE
[POOL] <= (SELECT [POOL] FROM #Temp_2 WHERE [rank] = 1 )
DROP TABLE #temp_table
DROP TABLE #Temp_2
The requirement is to group records of a table based on a 10-second time interval. Given the table:
Id DateTime Rank
1 2011-09-27 18:36:15 1
2 2011-09-27 18:36:15 1
3 2011-09-27 18:36:19 1
4 2011-09-27 18:36:23 1
5 2011-09-27 18:36:26 1
6 2011-09-27 18:36:30 1
7 2011-09-27 18:36:32 1
8 2011-09-27 18:36:14 2
9 2011-09-27 18:36:16 2
10 2011-09-27 18:36:35 2
Groups should be like this:
Id DateTime Rank GroupRank
1 2011-09-27 18:36:15 1 1
2 2011-09-27 18:36:15 1 1
3 2011-09-27 18:36:19 1 1
4 2011-09-27 18:36:23 1 1
5 2011-09-27 18:36:26 1 2
6 2011-09-27 18:36:30 1 2
7 2011-09-27 18:36:32 1 2
8 2011-09-27 18:36:14 2 3
9 2011-09-27 18:36:16 2 3
10 2011-09-27 18:36:35 2 4
For Rank 1 the minimum time is 18:36:15, and based on that all records between 18:36:15 and 18:36:24 should be in one group, and so on.
I want GroupRank in the same table, so it would be something with a DENSE_RANK() OVER clause. Can anyone help me write the query in SQL?
You need to do this in two steps. The first is to separate each record into its 10-second group, by getting the number of seconds difference from the minimum time for each rank, dividing it by 10, then rounding down to the nearest integer.
SELECT *,
SecondGroup = FLOOR(DATEDIFF(SECOND,
MIN([DateTime]) OVER(PARTITION BY [Rank]),
[DateTime]) / 10.0)
FROM #T;
Which gives:
Id DateTime Rank SecondGroup
---------------------------------------------------
1 2011-09-27 18:36:15.000 1 0
2 2011-09-27 18:36:15.000 1 0
3 2011-09-27 18:36:19.000 1 0
4 2011-09-27 18:36:23.000 1 0
5 2011-09-27 18:36:26.000 1 1
6 2011-09-27 18:36:30.000 1 1
7 2011-09-27 18:36:32.000 1 1
8 2011-09-27 18:36:14.000 2 0
9 2011-09-27 18:36:16.000 2 0
10 2011-09-27 18:36:35.000 2 2
Then you can do your DENSE_RANK ordering by Rank and SecondGroup:
SELECT Id, [DateTime], [Rank],
GroupRank = DENSE_RANK() OVER(ORDER BY [Rank], SecondGroup)
FROM ( SELECT *,
SecondGroup = FLOOR(DATEDIFF(SECOND,
MIN([DateTime]) OVER(PARTITION BY [Rank]),
[DateTime]) / 10.0)
FROM #T
) AS t;
Which gives your desired output.
SAMPLE DATA
CREATE TABLE #T (Id INT, [DateTime] DATETIME, [Rank] INT);
INSERT #T (Id, [DateTime], [Rank])
VALUES
(1, '2011-09-27 18:36:15', 1),
(2, '2011-09-27 18:36:15', 1),
(3, '2011-09-27 18:36:19', 1),
(4, '2011-09-27 18:36:23', 1),
(5, '2011-09-27 18:36:26', 1),
(6, '2011-09-27 18:36:30', 1),
(7, '2011-09-27 18:36:32', 1),
(8, '2011-09-27 18:36:14', 2),
(9, '2011-09-27 18:36:16', 2),
(10, '2011-09-27 18:36:35', 2);
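The two steps above (a 10-second bucket within each Rank, then a dense rank over the (Rank, bucket) pairs) can be sketched in Python against the sample data:

```python
from datetime import datetime

rows = [  # (Id, DateTime, Rank)
    (1, "2011-09-27 18:36:15", 1), (2, "2011-09-27 18:36:15", 1),
    (3, "2011-09-27 18:36:19", 1), (4, "2011-09-27 18:36:23", 1),
    (5, "2011-09-27 18:36:26", 1), (6, "2011-09-27 18:36:30", 1),
    (7, "2011-09-27 18:36:32", 1), (8, "2011-09-27 18:36:14", 2),
    (9, "2011-09-27 18:36:16", 2), (10, "2011-09-27 18:36:35", 2),
]
times = [datetime.fromisoformat(ts) for _, ts, _ in rows]

# minimum time per Rank (the MIN(...) OVER (PARTITION BY [Rank]) part)
min_per_rank = {}
for (_, _, rank), t in zip(rows, times):
    min_per_rank[rank] = min(min_per_rank.get(rank, t), t)

# bucket = floor(seconds since the rank's minimum time / 10)
keys = [(rank, int((t - min_per_rank[rank]).total_seconds() // 10))
        for (_, _, rank), t in zip(rows, times)]

# DENSE_RANK over (Rank, bucket): consecutive ranks of the distinct keys
order = {k: i + 1 for i, k in enumerate(sorted(set(keys)))}
group_rank = [order[k] for k in keys]
print(group_rank)
```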
I'm trying to set a flag for the condition in my table below:
p_id mon_year e_id flag
---- --------- ----- -----
1 2011/11 20 0
1 2011/11 21 1
1 2012/01 22 1
1 2012/02 23 0
1 2012/02 24 0
1 2012/02 25 1
2 2011/11 28 0
2 2011/11 29 1
2 2012/01 30 1
Grouping by p_id and mon_year, the flag is set for the last e_id value in the month.
I'm confused about how I can achieve this.
I tried to achieve this by using ROW_NUMBER and PARTITION to separate out the values, but I'm still stuck. The output I got from the ROW_NUMBER query is below:
p_id mon_year e_id row
---- --------- ----- -----
1 2011/11 20 1
1 2011/11 21 2
1 2012/01 22 1
1 2012/02 23 1
1 2012/02 24 2
1 2012/02 25 3
2 2011/11 28 1
2 2011/11 29 2
2 2012/01 30 1
The max of this value would set the flag column, but I'm really stuck on how to achieve it. Any help would be useful.
Thanks!!
I think this is what you're going for... The output exactly matches your example:
declare #t table (p_id int, [year] int, [month] int, [day] int)
insert #t select 1, 2011, 11, 20
union select 1, 2011, 11, 21
union select 1, 2012, 01, 22
union select 1, 2012, 02, 23
union select 1, 2012, 02, 24
union select 1, 2012, 02, 25
union select 2, 2011, 11, 28
union select 2, 2011, 11, 29
union select 2, 2012, 01, 30
select p_id, [year], [month], [day]
, case when r=1 then 1 else 0 end flag
from
(
select p_id, [year], [month], [day]
, row_number() over (partition by p_id, [year], [month] order by [day] desc) r
from #t
) x
order by p_id, [year], [month], [day]
Output:
p_id year month day flag
1 2011 11 20 0
1 2011 11 21 1
1 2012 1 22 1
1 2012 2 23 0
1 2012 2 24 0
1 2012 2 25 1
2 2011 11 28 0
2 2011 11 29 1
2 2012 1 30 1
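The "flag the last day in each (p_id, year, month) group" logic above can be sketched in Python (illustrative only):

```python
# Find the last day per (p_id, year, month) group, then flag exactly
# those rows, mirroring the row_number ... desc / r=1 trick above.
rows = [  # (p_id, year, month, day)
    (1, 2011, 11, 20), (1, 2011, 11, 21), (1, 2012, 1, 22),
    (1, 2012, 2, 23), (1, 2012, 2, 24), (1, 2012, 2, 25),
    (2, 2011, 11, 28), (2, 2011, 11, 29), (2, 2012, 1, 30),
]
last_day = {}
for p, y, m, d in rows:
    key = (p, y, m)
    last_day[key] = max(last_day.get(key, d), d)

# flag = 1 only on the last day of each group
flags = [1 if d == last_day[(p, y, m)] else 0 for p, y, m, d in rows]
print(flags)
```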
Try ordering by descending. That way, you don't have to look for the maximum ROW_NUMBER, just for where ROW_NUMBER is 1 ;)
Something like this (I didn't completely understand what you want to achieve, so this is probably not 100% accurate):
WITH r_MyTable
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY p_id, mon_year ORDER BY e_id DESC) AS GroupRank
FROM MyTable
)
UPDATE r_MyTable
SET flag = CASE WHEN GroupRank = 1 THEN 1 ELSE 0 END;
You can use a MAX aggregate on e_id to get the last value for the month; the code is below:
IF OBJECT_ID('tempdb..#tmptest') IS NOT NULL
DROP TABLE #tmptest
SELECT
*
INTO
#tmptest
FROM
(
SELECT '1' p_id, '2011/11' mon_year, '20' e_id, '0' flag UNION ALL
SELECT '1', '2011/11', '21', '1' UNION ALL
SELECT '1', '2012/01', '22', '1' UNION ALL
SELECT '1', '2012/02', '23', '0' UNION ALL
SELECT '1', '2012/02', '24', '0' UNION ALL
SELECT '1', '2012/02', '25', '1' UNION ALL
SELECT '2', '2011/11', '28', '0' UNION ALL
SELECT '2', '2011/11', '29', '1' UNION ALL
SELECT '2', '2012/01', '30', '1'
) as tmp
SELECT
tmptest.*
FROM
(
SELECT
MAX(e_id) e_id
,p_id
,mon_year
FROM
#tmptest
GROUP BY
p_id,mon_year
) tblLastValueEID
INNER JOIN
#tmptest tmptest
ON
tmptest.p_id = tblLastValueEID.p_id
AND
tmptest.mon_year = tblLastValueEID.mon_year
AND
tmptest.e_id = tblLastValueEID.e_id