How can I reset the count to 0 in SQL when a condition is false? - sql

I have a SQL table with the data shown in the picture.
I need to write a SQL query that counts, for each ticker, the number of consecutive days per year on which close_value is greater than open_value. If close_value is less than open_value, the counter must be reset to zero, and I have to save the value the counter had at that moment.

This is an example of a gaps-and-islands problem. You can use the difference of row_numbers():
select ticker, min(date), max(date), min(open_value), max(close_value),
       count(*) as num_rows
from (select t.*,
             row_number() over (partition by ticker order by date) as seqnum,
             row_number() over (partition by ticker, (case when close_value > open_value then 1 else 2 end) order by date) as seqnum_2
      from t
     ) t
where close_value > open_value
group by ticker, (seqnum - seqnum_2);
This returns all such periods. You haven't specified what the result set should look like, but this should be pretty close.
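To see why the difference of the two row numbers identifies each streak, here is a minimal sketch that runs the same query on a few made-up rows. The sample data, the column types, and the use of SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) are assumptions for illustration only. Like the query above, it does not split streaks by year.

```python
import sqlite3

# Hypothetical sample data: (ticker, date, open_value, close_value).
rows = [
    ("AAA", "2020-01-01", 10, 11),  # close > open
    ("AAA", "2020-01-02", 11, 12),  # close > open
    ("AAA", "2020-01-03", 12, 11),  # close < open: the streak ends here
    ("AAA", "2020-01-04", 11, 13),  # close > open: a new streak starts
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ticker text, date text, open_value real, close_value real)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

query = """
select ticker, min(date) as first_day, max(date) as last_day, count(*) as num_rows
from (select t.*,
             row_number() over (partition by ticker order by date) as seqnum,
             row_number() over (partition by ticker,
                                    (case when close_value > open_value then 1 else 2 end)
                                order by date) as seqnum_2
      from t
     ) t
where close_value > open_value
group by ticker, (seqnum - seqnum_2)
order by first_day
"""
streaks = conn.execute(query).fetchall()
print(streaks)
```

Within a run of "up" days both row numbers advance together, so their difference stays constant. A "down" day advances only seqnum, which shifts the difference for the next run.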

Related

How do I add an autoincrement Counter based on Conditions and conditional reset in Google-Bigquery

I have a table in BigQuery and am having trouble deriving an incremental field based on a condition.
Every time the score drops below 95%, the field should return Stage 1 for the first week. If the score is below 95% for a second straight week, it should return Stage 2, and so on. If the score goes above 95%, the counter resets to "Good", and the next time it drops below 95% it returns Stage 1 again.
You can use row_number() -- but after assigning a group based on the count of > 95% values up to each row:
select t.*,
       (case when row_number() over (partition by grp order by month, week) = 1
             then 'Good'
             else concat('Stage ', cast(row_number() over (partition by grp order by month, week) - 1 as string))
        end) as level
from (select t.*,
             countif(score > 0.95) over (order by month, week) as grp
      from t
     ) t;
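The grouping step can be checked on a small made-up series. This is only a sketch: it runs on SQLite through Python's sqlite3 module, so countif(...) is replaced by sum(case ...) and concat by the || operator, and the table contents are hypothetical. It also assumes the series starts with a week above 95%. If it started below, the first row of the initial group would be labelled 'Good' as well.

```python
import sqlite3

# Hypothetical weekly scores: (month, week, score).
rows = [
    (1, 1, 0.97),  # above threshold: starts a new group
    (1, 2, 0.90),
    (1, 3, 0.92),
    (1, 4, 0.98),  # above threshold: counter resets
    (2, 1, 0.91),
    (2, 2, 0.99),  # above threshold: counter resets
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (month integer, week integer, score real)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select month, week, score,
       (case when row_number() over (partition by grp order by month, week) = 1
             then 'Good'
             else 'Stage ' || (row_number() over (partition by grp order by month, week) - 1)
        end) as level
from (select t.*,
             -- running count of "good" weeks; it increases exactly when a reset happens
             sum(case when score > 0.95 then 1 else 0 end) over (order by month, week) as grp
      from t
     ) t
order by month, week
"""
levels = [row[3] for row in conn.execute(query)]
print(levels)
```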
Consider below
select * except(grp),
       (case when Average_score >= 95 and 1 = row_number() over grps then 'Good'
             else format('Stage %i', row_number() over grps - sign(grp))
        end) as Level
from (
  select *, countif(Average_score >= 95) over (order by Month, Week) as grp
  from `project.dataset.table`
)
window grps as (partition by grp order by Month, Week)
If applied to sample data in your question - output is

Grouping Consecutive Timestamps (Redshift)

I've got something that I can't get my head around.
The raw data is recorded at 15-minute intervals, and I would like to group the rows based on whether they are consecutive 15-minute intervals (see screenshot below). I would like to do this multiple times for each user and for a lot of users. Any ideas on how to do this using SQL only, in a way that can scale to thousands of users?
Any help would be appreciated.
Thanks
This is a type of gaps-and-islands problem. Use lag() to get the difference, then a cumulative sum to identify the group:
select user_id, min(start_time), max(end_time)
from (select t.*,
             -- flag is 1 at the start of each new group: the first row for a user
             -- (prev_end_time is null) or a row that does not start where the previous one ended
             sum(case when prev_end_time = start_time then 0 else 1 end) over (partition by user_id order by start_time) as grp
      from (select t.*,
                   lag(end_time) over (partition by user_id order by start_time) as prev_end_time
            from t
           ) t
     ) t
group by user_id, grp;
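As a quick check of the lag-plus-running-sum idea, the sketch below runs it on a few hypothetical intervals using SQLite through Python's sqlite3 module (an assumption; the question targets Redshift). The flag is 1 whenever a row does not start exactly where the previous row for that user ended, so the running sum gives every island its own number.

```python
import sqlite3

# Hypothetical 15-minute intervals: (user_id, start_time, end_time).
rows = [
    (1, "2021-01-01 09:00", "2021-01-01 09:15"),
    (1, "2021-01-01 09:15", "2021-01-01 09:30"),  # continues the previous interval
    (1, "2021-01-01 10:00", "2021-01-01 10:15"),  # gap: starts a new group
    (2, "2021-01-01 09:00", "2021-01-01 09:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (user_id integer, start_time text, end_time text)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select user_id, min(start_time), max(end_time)
from (select t.*,
             sum(case when prev_end_time = start_time then 0 else 1 end)
                 over (partition by user_id order by start_time) as grp
      from (select t.*,
                   lag(end_time) over (partition by user_id order by start_time) as prev_end_time
            from t
           ) t
     ) t
group by user_id, grp
order by user_id, min(start_time)
"""
islands = conn.execute(query).fetchall()
print(islands)
```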

Determine MIN Date from Consecutive Occurrences

I have a table that contains the following columns: Date, Customer, Active Flag. I need to add a fourth column called Start. The Start column should return the first date the client was active, based on consecutive active flags.
The sample data shows the three columns I currently have and the results I wish to return for the Start column.
Your insight into what my SQL code should look like to achieve this would be appreciated. Thanks!!
You can do this without subqueries, if I assume one date per month per customer:
select t.*,
       (case when activeflag = 1
             then coalesce(max(case when activeflag = 0 then date end) over (partition by customer order by date) + interval '1 month',
                           min(case when activeflag = 1 then date end) over (partition by customer)
                          )
        end) as start
from t;
Subqueries, though, might make this easier. You can treat this as a gaps-and-islands problem:
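select t.*,
       (case when activeflag = 1
             -- activeflag is part of the partition so that active and inactive rows
             -- with the same row-number difference do not end up in the same group
             then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
        end) as start
from (select t.*,
             row_number() over (partition by customer order by date) as seqnum,
             row_number() over (partition by customer, activeflag order by date) as seqnum_a
      from t
     ) t;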
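The sketch below tries the gaps-and-islands version on four made-up monthly rows, using SQLite through Python's sqlite3 module (both the data and the engine are assumptions). It partitions by activeflag as well as by the row-number difference: in this sample the inactive February row and the active March row have the same difference (1), so without the flag February would be taken as March's start.

```python
import sqlite3

# Hypothetical sample: one row per month for a single customer.
rows = [
    ("2020-01-01", "A", 1),
    ("2020-02-01", "A", 0),  # inactive month: breaks the streak
    ("2020-03-01", "A", 1),
    ("2020-04-01", "A", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, customer text, activeflag integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select date, activeflag,
       (case when activeflag = 1
             then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
        end) as start_date
from (select t.*,
             row_number() over (partition by customer order by date) as seqnum,
             row_number() over (partition by customer, activeflag order by date) as seqnum_a
      from t
     ) t
order by date
"""
starts = conn.execute(query).fetchall()
print(starts)
```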

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
       sum(count(*)) over (order by date),
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
             row_number() over (partition by email order by date) as seqnum
      from t
     ) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
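A small run of the row_number() version, on hypothetical rows and using SQLite through Python's sqlite3 module (an assumption; the question does not name a database), shows an email being counted only on the first date it appears:

```python
import sqlite3

# Hypothetical subscriptions: (date, email). a@example.com signs up twice.
rows = [
    ("2020-01-01", "a@example.com"),
    ("2020-01-01", "b@example.com"),
    ("2020-01-02", "a@example.com"),  # repeat: must not raise the distinct count
    ("2020-01-03", "c@example.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, email text)")
conn.executemany("insert into t values (?, ?)", rows)

query = """
select date,
       count(*) as day_total,
       sum(count(*)) over (order by date) as running_total,
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date) as running_distinct
from (select t.*,
             -- seqnum = 1 marks the first row ever seen for each email
             row_number() over (partition by email order by date) as seqnum
      from t
     ) t
group by date
order by date
"""
running = conn.execute(query).fetchall()
print(running)
```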
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag to check whether the email had a row with a previous date, and if so set the flag to 0 so that the row is ignored by the running sum.
select distinct dt
      ,count(*) over(partition by dt) as day_total
      ,count(*) over(order by dt) as cumsum
      ,sum(flag) over(order by dt) as cumdist
from (select t.*
            ,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
      from tbl t
     ) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
It may therefore be simpler to read and maintain.
select
  t1.date_created,
  (select count(*) from my_table where date_created = t1.date_created) emails_created,
  (select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
  (select count(distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
  (select distinct date_created from my_table) t1
order by 1

Select first record with aggregate functions

I have a thermometer that starts logging data every morning whenever my machine turns on.
I would like to select the min, max, and average temperatures, as well as the temperatures when the machine turns on and off for every day.
My table structure is as follows:
Time Logged, Date Logged, Temperature
I group by Date Logged to get the aggregates for the day, but I can't seem to find a good way to select the temperature at the first and last time stamps recorded.
Any help?
You want to use a window function like this:
select t.DateLogged, min(t.Temperature), max(t.Temperature), avg(t.Temperature),
       max(case when t.seqnum_asc = 1 then t.Temperature end) as FirstTemperature,
       max(case when t.seqnum_desc = 1 then t.Temperature end) as LastTemperature
from (select t.*,
             row_number() over (partition by dateLogged order by timeLogged) as seqnum_asc,
             row_number() over (partition by dateLogged order by timeLogged desc) as seqnum_desc
      from t
     ) t
group by t.DateLogged
order by DateLogged;
This adds two new columns. One enumerates the readings within each day, starting at 1 for the first reading (seqnum_asc). The other has 1 for the last reading (seqnum_desc).
To get the values, conditional aggregation is used: max() over a case expression that is non-null only for the wanted row.
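Here is a minimal sketch of the row_number() variant on made-up readings, run on SQLite through Python's sqlite3 module. The table contents, column types, and choice of engine are assumptions for illustration.

```python
import sqlite3

# Hypothetical readings: (TimeLogged, DateLogged, Temperature).
rows = [
    ("08:00", "2021-03-01", 18.0),  # machine turns on
    ("12:00", "2021-03-01", 24.0),
    ("17:00", "2021-03-01", 21.0),  # machine turns off
    ("08:30", "2021-03-02", 15.0),
    ("16:00", "2021-03-02", 19.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (TimeLogged text, DateLogged text, Temperature real)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select t.DateLogged, min(t.Temperature), max(t.Temperature), avg(t.Temperature),
       max(case when t.seqnum_asc = 1 then t.Temperature end) as FirstTemperature,
       max(case when t.seqnum_desc = 1 then t.Temperature end) as LastTemperature
from (select t.*,
             row_number() over (partition by DateLogged order by TimeLogged) as seqnum_asc,
             row_number() over (partition by DateLogged order by TimeLogged desc) as seqnum_desc
      from t
     ) t
group by t.DateLogged
order by t.DateLogged
"""
daily = conn.execute(query).fetchall()
print(daily)
```

Each case expression is null for every row except one per day, so max() simply picks out that single value.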
If you like, you can do much the same thing using min() and max() as window functions, rather than row_number():
select t.DateLogged, min(t.Temperature), max(t.Temperature), avg(t.Temperature),
       max(case when timeLogged = minTime then t.Temperature end) as FirstTemperature,
       max(case when timeLogged = maxTime then t.Temperature end) as LastTemperature
from (select t.*,
             min(timeLogged) over (partition by dateLogged) as minTime,
             max(timeLogged) over (partition by dateLogged) as maxTime
      from t
     ) t
group by t.DateLogged
order by DateLogged;