I have data like this:
ID IND
1 0
2 0
3 1
4 0
5 1
6 0
7 0
I want to count the zeros that precede each 1, so that the output looks like this:
ID IND OUT
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2
Is it possible without PL/SQL? I tried taking the differences between row numbers but couldn't get it to work.
The match_recognize clause, introduced in Oracle 12.1, can make quick work of such "row pattern recognition" problems. The solution is just a bit complex due to the special treatment of a "last row" with IND = 0, but it is straightforward otherwise.
As usual, the with clause is not part of the solution; I include it to test the query. Remove it and use your actual table and column names.
with
inputs (id, ind) as (
select 1, 0 from dual union all
select 2, 0 from dual union all
select 3, 1 from dual union all
select 4, 0 from dual union all
select 5, 1 from dual union all
select 6, 0 from dual union all
select 7, 0 from dual
)
select id, ind, out
from inputs
match_recognize(
order by id
measures case classifier() when 'Z' then 0
when 'O' then count(*) - 1
else count(*) end as out
all rows per match
pattern ( Z* ( O | X ) )
define Z as ind = 0, O as ind != 0
);
ID IND OUT
---------- ---------- ----------
1 0 0
2 0 0
3 1 2
4 0 0
5 1 1
6 0 0
7 0 2
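To check the expected OUT column independently of Oracle, the rule can be sketched in plain Python. This is only a reference implementation of the counting rule, not a translation of match_recognize; the function name zeros_before_ones is made up for illustration.

```python
def zeros_before_ones(ind):
    """Return OUT per row: the length of the preceding zero run on a 1-row,
    the length of the trailing zero run on a final 0-row, and 0 otherwise."""
    out = []
    run = 0  # zeros seen since the last 1 (or since the start)
    last_index = len(ind) - 1
    for i, value in enumerate(ind):
        if value == 0:
            run += 1
            # Only a trailing zero on the very last row reports the run.
            out.append(run if i == last_index else 0)
        else:
            out.append(run)
            run = 0
    return out


print(zeros_before_ones([0, 0, 1, 0, 1, 0, 0]))  # [0, 0, 2, 0, 1, 0, 2]
```

The last-row branch plays the role of the undefined X pattern variable in the query above.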
You can treat this as a gaps-and-islands problem. You can define the "islands" by the number of "1"s on or after each row. Then use a window function:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
then sum(1 - ind) over (partition by grp)
else 0
end) as num_zeros
from (select t.*,
sum(ind) over (order by id desc) as grp
from t
) t;
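A small Python sketch of the same idea may help to see why the grouping works. It assumes rows are (id, ind) pairs; the helper name zeros_by_island is hypothetical. The group key is the number of 1s on or after the row, which mirrors sum(ind) over (order by id desc).

```python
from collections import defaultdict


def zeros_by_island(rows):
    """rows: list of (id, ind) pairs. Returns (id, ind, num_zeros) tuples."""
    ordered = sorted(rows)

    # grp = number of 1s on or after each row (reverse running sum).
    grp = {}
    ones_seen = 0
    for id_, ind in reversed(ordered):
        ones_seen += ind
        grp[id_] = ones_seen

    # Count the zeros inside each island.
    zeros_in_grp = defaultdict(int)
    for id_, ind in ordered:
        zeros_in_grp[grp[id_]] += 1 - ind

    last_id = ordered[-1][0]
    return [
        (id_, ind, zeros_in_grp[grp[id_]] if ind == 1 or id_ == last_id else 0)
        for id_, ind in ordered
    ]


rows = list(zip(range(1, 8), [0, 0, 1, 0, 1, 0, 0]))
print([num_zeros for _, _, num_zeros in zeros_by_island(rows)])
# [0, 0, 2, 0, 1, 0, 2]
```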
If id is sequential with no gaps, you can do this without a subquery:
select t.*,
(case when ind = 1 or row_number() over (order by id desc) = 1
-- subtract ind so that a row with ind = 1 does not count itself
then id - coalesce(lag(case when ind = 1 then id end ignore nulls) over (order by id), min(id) over () - 1) - ind
else 0
end) as num_zeros
from t;
I would suggest removing the case conditions and just using the then clause for the expression, so the value is on all rows.
I have a BigQuery table that looks like this:
ID SessionNumber CountOfAction Category
1 1 1 B
1 2 3 A
1 3 1 A
1 4 4 B
1 5 5 B
I am trying to get the running total of CountOfAction over all previous rows where Category = A. The final output should be:
ID SessionNumber CountOfAction
1 1 0 --no previous rows have countofAction for category = A
1 2 0 --no previous rows have countofAction for category = A
1 3 3 --previous row (Row 2) has countofAction = 3 for category = A
1 4 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
1 5 4 --previous rows (Row 2 and 3) have countofAction = 3 and 1 for category = A
Below is the query I have written, but it doesn't give me the desired output:
select
ID,
SessionNumber ,
SUM(CountofAction) OVER(Partition by ID ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED
PRECEDING AND 1 PRECEDING) as CumulativeCountofAction
From TAble1 where category = 'A'
I would really appreciate any help on this! Thanks in advance.
Filtering on category in the where clause evicts (id, sessionNumber) tuples where category 'A' does not appear, which is not what you want.
Instead, you can use aggregation and a conditional sum():
select
id,
sessionNumber,
sum(sum(if(category = 'A', countOfAction, 0))) over(
partition by id
order by sessionNumber
rows between unbounded preceding and 1 preceding
) CumulativeCountofAction
from mytable t
group by id, sessionNumber
order by id, sessionNumber
Below is for BigQuery Standard SQL
#standardSQL
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 1 SessionNumber, 1 CountOfAction, 'B' Category UNION ALL
SELECT 1, 2, 3, 'A' UNION ALL
SELECT 1, 3, 1, 'A' UNION ALL
SELECT 1, 4, 4, 'B' UNION ALL
SELECT 1, 5, 5, 'B'
)
SELECT ID, SessionNumber,
IFNULL(SUM(IF(category = 'A', CountOfAction, 0)) OVER(win), 0) AS CountOfAction
FROM `project.dataset.table`
WINDOW win AS (ORDER BY SessionNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
result is
Row ID SessionNumber CountOfAction
1 1 1 0
2 1 2 0
3 1 3 3
4 1 4 4
5 1 5 4
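The window frame "ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING" combined with a conditional sum can be pictured as a running total that is read before the current row is added. The Python sketch below illustrates that for a single ID; the function name and the tuple layout are assumptions made for the example.

```python
def prior_total_for_category(rows, category="A"):
    """rows: list of (session_number, count_of_action, category) for one ID.
    Returns (session_number, total over strictly earlier rows in `category`)."""
    result = []
    running = 0  # sum over rows before the current one
    for session, count, cat in sorted(rows):
        result.append((session, running))  # read first: current row is excluded
        if cat == category:
            running += count
    return result


rows = [(1, 1, "B"), (2, 3, "A"), (3, 1, "A"), (4, 4, "B"), (5, 5, "B")]
print(prior_total_for_category(rows))
# [(1, 0), (2, 0), (3, 3), (4, 4), (5, 4)]
```

Starting the running total at 0 corresponds to the IFNULL(..., 0) wrapper, since the SQL frame is empty (and the sum NULL) on the first row.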
Can I somehow assign a new group to a row when a value in a column changes in T-SQL?
I would be grateful for a solution that works for an unlimited number of repeated values, without CTEs or functions. I have a solution that handles up to 100 consecutive identical numbers using
coalesce(lag() over(), lag() over(), lag() over()), which is too bulky,
but I cannot find one that handles an unlimited number of consecutive identical numbers.
Data
id somevalue
1 0
2 1
3 1
4 0
5 0
6 1
7 1
8 1
9 0
10 0
11 1
12 0
13 1
14 1
15 0
16 0
Expected
id somevalue group
1 0 1
2 1 2
3 1 2
4 0 3
5 0 3
6 1 4
7 1 4
8 1 4
9 0 5
10 0 5
11 1 6
12 0 7
13 1 8
14 1 8
15 0 9
16 0 9
If you just want a group identifier, you can use:
select t.*,
min(id) over (partition by somevalue, seqnum - seqnum_1) as grp
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by somevalue order by id) as seqnum_1
from t
) t;
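The reason the difference of two row numbers identifies a run can be seen in a short Python sketch. It assumes the values are already in id order; island_keys is a hypothetical helper name. Within one run of equal values both counters advance together, so their difference stays constant, and paired with the value it is unique per run.

```python
from collections import defaultdict


def island_keys(values):
    """Return (value, overall_row_number - per_value_row_number) per row.
    Rows that share a key belong to the same run of consecutive equal values."""
    per_value_count = defaultdict(int)
    keys = []
    for seqnum, value in enumerate(values, start=1):
        per_value_count[value] += 1  # row_number() partitioned by the value
        keys.append((value, seqnum - per_value_count[value]))
    return keys


print(island_keys([0, 1, 1, 0, 0]))
# [(0, 0), (1, 1), (1, 1), (0, 2), (0, 2)]
```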
If you want them enumerated . . . well, you can enumerate the id above using dense_rank(). Or you can use lag() and a cumulative sum:
select t.*,
sum(case when somevalue = prev_sv then 0 else 1 end) over (order by id) as grp
from (select t.*,
lag(somevalue) over (order by id) as prev_sv
from t
) t;
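For comparison, here is a minimal Python sketch of the lag-plus-cumulative-sum logic, useful for checking the expected group numbers by hand. The function name group_ids is made up; the input is assumed to be the somevalue column in id order.

```python
def group_ids(values):
    """Start a new group whenever the value differs from the previous row."""
    groups = []
    grp = 0
    for i, value in enumerate(values):
        # The first row has no predecessor (lag is NULL), so it opens group 1.
        if i == 0 or value != values[i - 1]:
            grp += 1
        groups.append(grp)
    return groups


somevalue = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(group_ids(somevalue))
# [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 8, 9, 9]
```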
Here's a different approach:
First I created a view to provide the group increment on each row:
create view increments as
select
n2.id,n2.somevalue,
case when n1.somevalue=n2.somevalue then 0 else 1 end as increment
from
(select 0 as id,1 as somevalue union all select * from mytable) n1
join mytable n2
on n2.id = n1.id+1
Then I used this view to produce the group values as cumulative sums of the increments:
select id, somevalue,
(select sum(increment) from increments i1 where i1.id <= i2.id)
from increments i2
I have a table that looks like below:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|0| 0
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 0
2 |3/1/16|1| 0
3 |3/1/16|2| 0
I'm trying to make it so that flag is populated if X=2 in the PREVIOUS month. As such, it should look like this:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|2| 1
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 1
2 |3/1/16|1| 0
3 |3/1/16|2| 1
I use this in SQL:
`select ID, date, X, flag into Work_Table from t
(
Select ID, date, X, flag,
Lag(X) Over (Partition By ID Order By date Asc) As Prev into Flag_table
From Work_Table
)
Update [dbo].[Flag_table]
Set flag = 1
where prev = '2'
UPDATE t
Set t.flag = [dbo].[Flag_table].flag FROM T
JOIN [dbo].[Flag_table]
ON t.ID= [dbo].[Flag_table].ID where T.date = [dbo].[Flag_table].date`
However I cannot do this in Bigquery. Any ideas?
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
You can test / play with it using dummy data from your question as
#standardSQL
WITH `project.dataset.work_table` AS (
SELECT 1 id, '1/1/16' dt, 2 x, 0 flag UNION ALL
SELECT 2, '1/1/16', 0, 0 UNION ALL
SELECT 3, '1/1/16', 0, 0 UNION ALL
SELECT 1, '2/1/16', 0, 0 UNION ALL
SELECT 2, '2/1/16', 1, 0 UNION ALL
SELECT 3, '2/1/16', 2, 0 UNION ALL
SELECT 1, '3/1/16', 2, 0 UNION ALL
SELECT 2, '3/1/16', 1, 0 UNION ALL
SELECT 3, '3/1/16', 2, 0
)
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
ORDER BY dt, id
with result as
Row id dt x flag
1 1 1/1/16 2 0
2 2 1/1/16 0 0
3 3 1/1/16 0 0
4 1 2/1/16 0 1
5 2 2/1/16 1 0
6 3 2/1/16 2 0
7 1 3/1/16 2 0
8 2 3/1/16 1 0
9 3 3/1/16 2 1
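The LAG logic can be cross-checked with a few lines of Python. This sketch assumes the dates are simplified to month numbers (1, 2, 3) so that they sort correctly; the function name flag_previous_x is hypothetical.

```python
def flag_previous_x(rows, target=2):
    """rows: list of (id, month, x). Flag is 1 when the same id had
    x == target in its previous row (ordered by month), else 0."""
    previous_x = {}  # last x seen per id
    flagged = []
    for id_, month, x in sorted(rows, key=lambda r: (r[1], r[0])):
        flag = 1 if previous_x.get(id_) == target else 0
        flagged.append((id_, month, x, flag))
        previous_x[id_] = x
    return flagged


rows = [
    (1, 1, 2), (2, 1, 0), (3, 1, 0),
    (1, 2, 0), (2, 2, 1), (3, 2, 2),
    (1, 3, 2), (2, 3, 1), (3, 3, 2),
]
print([flag for *_, flag in flag_previous_x(rows)])
# [0, 0, 0, 1, 0, 0, 0, 0, 1]
```

A missing previous row yields flag 0, which matches IF(NULL, 1, 0) in the query.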
I have data in the below format
g_name amt flag
g1 0 0
g1 0 0
g1 10 1
g1 0 0
g1 15 2
g1 0 0
and I need it in the format below.
n1 should hold amt starting from the row where flag hits 1 and retain it until the end; similarly, n2 should hold amt starting from the row where flag hits 2 and retain it until the end. Please help me do this with window functions, without needing joins.
g_name amt flag n1 n2
g1 0 0 0 0
g1 0 0 0 0
g1 10 1 10 0
g1 0 0 10 0
g1 15 2 10 15
g1 0 0 10 15
I added a column for ordering - change as needed. I also added a few more rows with a different g_name, presumably this must be done "by g_name".
This is a good test case for the first_value() analytic function. It has the ability to ignore nulls - so we make the amt NULL when flag is not 1 (or 2, etc.) and then apply first_value() with the proper PARTITION BY and ORDER BY clauses.
with
test_data ( id, g_name, amt, flag ) as (
select 1, 'g1', 0, 0 from dual union all
select 2, 'g1', 0, 0 from dual union all
select 3, 'g1', 10, 1 from dual union all
select 4, 'g1', 0, 0 from dual union all
select 5, 'g1', 15, 2 from dual union all
select 6, 'g1', 0, 0 from dual union all
select 1, 'g2', 0, 0 from dual union all
select 2, 'g2', 4, 1 from dual union all
select 3, 'g2', 3, 2 from dual union all
select 4, 'g2', 0, 0 from dual
)
-- end of test data; solution (SQL query) begins below this line
select id, g_name, amt, flag,
coalesce (first_value(case when flag = 1 then amt end ignore nulls)
over (partition by g_name order by id), 0) as n1,
coalesce (first_value(case when flag = 2 then amt end ignore nulls)
over (partition by g_name order by id), 0) as n2
from test_data
order by g_name, id
;
ID G_NAME AMT FLAG N1 N2
--- ------ ---------- ---------- ---------- ----------
1 g1 0 0 0 0
2 g1 0 0 0 0
3 g1 10 1 10 0
4 g1 0 0 10 0
5 g1 15 2 10 15
6 g1 0 0 10 15
1 g2 0 0 0 0
2 g2 4 1 4 0
3 g2 3 2 4 3
4 g2 0 0 4 3
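The effect of first_value(... ignore nulls) with the default running frame can be mimicked in Python as "remember the first matching amt per group and carry it forward". The sketch below is only an illustration; carry_first_amount and the tuple layout (g_name, id, amt, flag) are assumptions.

```python
def carry_first_amount(rows, flag_value):
    """rows: list of (g_name, id, amt, flag).
    Returns, in (g_name, id) order, the amt of the first row in the group
    whose flag equals flag_value, carried forward; 0 before that row."""
    first_seen = {}  # g_name -> amt of the first matching row
    column = []
    for g_name, _id, amt, flag in sorted(rows):
        if flag == flag_value and g_name not in first_seen:
            first_seen[g_name] = amt
        column.append(first_seen.get(g_name, 0))
    return column


g1 = [("g1", 1, 0, 0), ("g1", 2, 0, 0), ("g1", 3, 10, 1),
      ("g1", 4, 0, 0), ("g1", 5, 15, 2), ("g1", 6, 0, 0)]
print(carry_first_amount(g1, 1))  # [0, 0, 10, 10, 10, 10]
print(carry_first_amount(g1, 2))  # [0, 0, 0, 0, 15, 15]
```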
SQL tables represent unordered sets. There is no ordering, unless a column specifies that ordering. Let me assume that such a column exists.
If so, you can do this with analytic functions:
select t.*,
max(case when flag = 1 then amt else 0 end) over (order by ??) as n1,
max(case when flag = 2 then amt else 0 end) over (order by ??) as n2
from t;
The ?? specifies the ordering.
I have a table that has values and group ids (simplified example). I need to get the average for each group of the middle 3 values. So, if there are 1, 2, or 3 values it's just the average. But if there are 4 values, it would exclude the highest, 5 values the highest and lowest, etc. I was thinking some sort of window function, but I'm not sure if it's possible.
http://www.sqlfiddle.com/#!11/af5e0/1
For this data:
TEST_ID TEST_VALUE GROUP_ID
1 5 1
2 10 1
3 15 1
4 25 2
5 35 2
6 5 2
7 15 2
8 25 3
9 45 3
10 55 3
11 15 3
12 5 3
13 25 3
14 45 4
I'd like
GROUP_ID AVG
1 10
2 15
3 21.6
4 45
Another option using analytic functions;
SELECT group_id,
avg( test_value )
FROM (
select t.*,
row_number() over (partition by group_id order by test_value ) rn,
count(*) over (partition by group_id ) cnt
from test t
) alias
where
cnt <= 3
or
rn between ceil( cnt / 2.0 ) - 1 and ceil( cnt / 2.0 ) + 1
group by group_id
;
Demo --> http://www.sqlfiddle.com/#!11/af5e0/59
I'm not familiar with the Postgres syntax on windowed functions, but I was able to solve your problem in SQL Server with this SQL Fiddle. Maybe you'll be able to easily migrate this into Postgres-compatible code. Hope it helps!
A quick primer on how I worked it.
1. Order the test scores for each group
2. Get a count of items in each group
3. Use that as a subquery and select only the middle 3 items (that's the where clause in the outer query)
4. Get the average for each group
--
select
group_id,
avg(test_value)
from (
select
t.group_id,
convert(decimal,t.test_value) as test_value,
row_number() over (
partition by t.group_id
order by t.test_value
) as ord,
g.gc
from
test t
inner join (
select group_id, count(*) as gc
from test
group by group_id
) g
on t.group_id = g.group_id
) a
where
ord >= case when gc <= 3 then 1 when gc % 2 = 1 then gc / 2 else (gc - 1) / 2 end
and ord <= case when gc <= 3 then 3 when gc % 2 = 1 then (gc / 2) + 2 else ((gc - 1) / 2) + 2 end
group by
group_id
with cte as (
select
*,
row_number() over(partition by group_id order by test_value) as rn,
count(*) over(partition by group_id) as cnt
from test
)
select
group_id, avg(test_value)
from cte
where
cnt <= 3 or
(rn >= (cnt + 1) / 2 - 1 and rn <= (cnt + 1) / 2 + 1)
group by group_id
order by group_id
sql fiddle demo
In the CTE, we get the count of elements in each group_id with a window function and calculate row_number within each group_id. If the count is greater than 3, we find the middle of the group as (cnt + 1) / 2 (integer division, so the middle position is rounded up for odd counts) and keep that element together with the ones at +1 and -1. If the count is <= 3, we simply take all elements.
This works:
SELECT A.group_id, avg(A.test_value) AS avg_mid3 FROM
(SELECT group_id,
test_value,
row_number() OVER (PARTITION BY group_id ORDER BY test_value) AS position
FROM test) A
JOIN
(SELECT group_id,
CASE
WHEN count(*) < 4 THEN 1
WHEN count(*) % 2 = 0 THEN (count(*)/2 - 1)
ELSE (count(*) / 2)
END AS position_start,
CASE
WHEN count(*) < 4 THEN count(*)
WHEN count(*) % 2 = 0 THEN (count(*)/2 + 1)
ELSE (count(*) / 2 + 2)
END AS position_end
FROM test GROUP BY group_id) B
ON A.group_id=B.group_id
AND A.position >= B.position_start
AND A.position <= B.position_end
GROUP BY A.group_id
Fiddle link: http://www.sqlfiddle.com/#!11/af5e0/56
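To check any of the queries above against the expected averages, the middle-three rule from the question can be written as a short Python reference. This is a sketch under the assumption that "middle" means positions ceil(n/2) - 1 through ceil(n/2) + 1 in sorted order, which reproduces the expected results for 4, 5 and 6 values; the function name is made up.

```python
import math


def middle_three_average(values):
    """Average of the middle three values; all values when there are <= 3."""
    ordered = sorted(values)
    n = len(ordered)
    if n <= 3:
        picked = ordered
    else:
        mid = math.ceil(n / 2)  # 1-based middle position
        picked = ordered[mid - 2: mid + 1]  # 1-based positions mid-1 .. mid+1
    return sum(picked) / len(picked)


groups = {
    1: [5, 10, 15],
    2: [25, 35, 5, 15],
    3: [25, 45, 55, 15, 5, 25],
    4: [45],
}
for group_id, values in groups.items():
    print(group_id, middle_three_average(values))
```

For group 3 this gives 65 / 3, which the question shows truncated as 21.6.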
If you need to calculate the average values for groups then you can do this:
SELECT CASE WHEN NUMBER_FIRST_GROUP <> 0
THEN SUM_FIRST_GROUP / NUMBER_FIRST_GROUP
ELSE NULL
END AS AVG_FIRST_GROUP,
CASE WHEN NUMBER_SECOND_GROUP <> 0
THEN SUM_SECOND_GROUP / NUMBER_SECOND_GROUP
ELSE NULL
END AS AVG_SECOND_GROUP,
CASE WHEN NUMBER_THIRD_GROUP <> 0
THEN SUM_THIRD_GROUP / NUMBER_THIRD_GROUP
ELSE NULL
END AS AVG_THIRD_GROUP,
CASE WHEN NUMBER_FOURTH_GROUP <> 0
THEN SUM_FOURTH_GROUP / NUMBER_FOURTH_GROUP
ELSE NULL
END AS AVG_FOURTH_GROUP
FROM (
SELECT
SUM(CASE WHEN GROUP_ID = 1 THEN 1 ELSE 0 END) AS NUMBER_FIRST_GROUP,
SUM(CASE WHEN GROUP_ID = 1 THEN TEST_VALUE ELSE 0 END) AS SUM_FIRST_GROUP,
SUM(CASE WHEN GROUP_ID = 2 THEN 1 ELSE 0 END) AS NUMBER_SECOND_GROUP,
SUM(CASE WHEN GROUP_ID = 2 THEN TEST_VALUE ELSE 0 END) AS SUM_SECOND_GROUP,
SUM(CASE WHEN GROUP_ID = 3 THEN 1 ELSE 0 END) AS NUMBER_THIRD_GROUP,
SUM(CASE WHEN GROUP_ID = 3 THEN TEST_VALUE ELSE 0 END) AS SUM_THIRD_GROUP,
SUM(CASE WHEN GROUP_ID = 4 THEN 1 ELSE 0 END) AS NUMBER_FOURTH_GROUP,
SUM(CASE WHEN GROUP_ID = 4 THEN TEST_VALUE ELSE 0 END) AS SUM_FOURTH_GROUP
FROM TEST
) AS FOO