How to group data not matching a condition in Postgres - sql

I am trying to group data using Postgres.
Input Data:
Expected result: consecutive rows where the output is 0 should be grouped together, and consecutive rows where the output is greater than 0 should be grouped together, so that we can see the time period of each zero-output run.

This is a gaps and islands problem. One approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Seq ORDER BY Datetime) rn1,
ROW_NUMBER() OVER (PARTITION BY Seq, Output > 0
ORDER BY Datetime) rn2
FROM yourTable
)
SELECT
Seq,
MIN(Datetime) AS MinDatetime,
MAX(Datetime) AS MaxDatetime,
SUM(Output) AS sum_output
FROM cte
GROUP BY
Seq,
Output > 0,
rn1 - rn2
ORDER BY
Seq,
MIN(Datetime);
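To see how the two row numbers line up, here is a minimal sketch that runs the same query against SQLite from Python. The sample rows are invented (the question's input data is not shown), and SQLite 3.25 or later is assumed for window function support; the same idea applies in Postgres.

```python
import sqlite3

# Hypothetical sample rows (Seq, Datetime, Output); not the asker's real data.
rows = [
    (1, "2021-01-01 08:00", 5),
    (1, "2021-01-01 09:00", 3),
    (1, "2021-01-01 10:00", 0),
    (1, "2021-01-01 11:00", 0),
    (1, "2021-01-01 12:00", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Seq INTEGER, Datetime TEXT, Output INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# rn1 numbers every row per Seq; rn2 numbers rows per Seq within the
# "Output > 0" and "Output = 0" classes. rn1 - rn2 stays constant inside
# each unbroken run of the same class, so it identifies an island.
query = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Seq ORDER BY Datetime) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY Seq, Output > 0 ORDER BY Datetime) AS rn2
    FROM yourTable
)
SELECT Seq, MIN(Datetime), MAX(Datetime), SUM(Output)
FROM cte
GROUP BY Seq, Output > 0, rn1 - rn2
ORDER BY Seq, MIN(Datetime)
"""
islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

With these rows the query returns three islands: 08:00 to 09:00 (sum 8), the zero run from 10:00 to 11:00 (sum 0), and 12:00 on its own (sum 7).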

Related

How to get increment number when there are any change in a column in Bigquery?

I have date, id, and flag columns in this table. How can I compute a value column that is an incrementing number and resets to 1 whenever the flag column changes?
Consider below approach
select * except(changed, grp),
row_number() over(partition by id, grp order by date) value
from (
select *, countif(changed) over(partition by id order by date) grp
from (
select *,
ifnull(flag != lag(flag) over(partition by id order by date), true) changed
from `project.dataset.table`
))
if applied to sample data in your question - output is
You seem to want to count the number of falses since the last true. You can use:
select t.* except (grp),
(case when flag
then 1
else row_number() over (partition by id, grp order by date) - 1
end)
from (select t.*,
countif(flag) over (partition by id order by date) as grp
from t
) t;
If you know that the dates have no gaps, you can actually do this without a subquery:
select t.*,
(case when flag then 1
else date_diff(date,
max(case when flag then date end) over (partition by id order by date),
day)
end)
from t;
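For anyone who wants to try the first approach (flag each change, take a running count of changes as the group, then number rows within the group) without a BigQuery project, here is a rough port to SQLite run from Python. The table name t and the sample rows are made up, flag is stored as 0/1, and SUM(changed) stands in for BigQuery's countif; treat it as a sketch of the idea rather than the exact BigQuery query.

```python
import sqlite3

# Hypothetical sample rows (id, date, flag); flag is stored as 0/1.
rows = [
    (1, "2021-01-01", 1),
    (1, "2021-01-02", 1),
    (1, "2021-01-03", 0),
    (1, "2021-01-04", 0),
    (1, "2021-01-05", 0),
    (1, "2021-01-06", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, flag INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT id, date, flag,
       ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY date) AS value
FROM (
    -- running count of changes = group number
    SELECT *, SUM(changed) OVER (PARTITION BY id ORDER BY date) AS grp
    FROM (
        -- changed is 1 on the first row and whenever flag differs from the previous row
        SELECT *,
               COALESCE(flag != LAG(flag) OVER (PARTITION BY id ORDER BY date), 1) AS changed
        FROM t
    ) AS marked
) AS grouped
ORDER BY id, date
"""
result = conn.execute(query).fetchall()
values = [row[3] for row in result]
print(values)
```

For the flag sequence 1, 1, 0, 0, 0, 1 the value column comes out as 1, 2, 1, 2, 3, 1.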

Difference between last and second last event in a table of events

I have the following table
which was created by
create table events (
event_type integer not null,
value integer not null,
time timestamp not null,
unique (event_type, time)
);
Given the data in the picture, I want to write a query that, for each event_type that has been
registered more than once, returns the difference between the latest and
the second latest value.
Given the above data, the output should be like
event_type value
2 -5
3 4
I solved it using the following :
CREATE VIEW [max_date] AS
SELECT event_type, max(time) as time, value
FROM events
group by event_type
having count(event_type) >1
order by time desc;
select event_type, value
from
(
select event_type, value, max(time)
from(
Select E1.event_type, ([max_date].value - E1.value) as value, E1.time
From events E1, [max_date]
Where [max_date].event_type = E1.event_type
and [max_date].time > E1.time
)
group by event_type
)
But this seems like a very complicated query, and I wonder if there is an easier way.
Use window functions:
select e.*,
(value - prev_value)
from (select e.*,
lag(value) over (partition by event_type order by time) as prev_value,
row_number() over (partition by event_type order by time desc) as seqnum
from events e
) e
where seqnum = 1 and prev_value is not null;
You could use lag() and row_number()
select event_type, val
from (
select
event_type,
value - lag(value) over(partition by event_type order by time) val,
row_number() over(partition by event_type order by time desc) rn
from events
) t
where rn = 1 and val is not null
The inner query ranks records having the same event_type by descending time, and computes the difference between each value and the previous one.
Then, the outer query just filters on the top record per group.
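A quick way to check this kind of query is to run it on a small in-memory table. The sketch below uses SQLite from Python with invented rows chosen so that event types 2 and 3 give the differences shown in the question (-5 and 4); the real data from the picture is not available. Note that lag() is ordered by ascending time here, so the latest row sees the value recorded just before it, while row_number() is ordered by descending time to pick that latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_type INTEGER NOT NULL,
        value INTEGER NOT NULL,
        time TIMESTAMP NOT NULL,
        UNIQUE (event_type, time)
    )
""")

# Hypothetical rows: event_type 1 occurs once, 2 three times, 3 twice.
rows = [
    (1, 9, "2020-01-01 10:00:00"),
    (2, 5, "2020-01-01 10:00:00"),
    (2, 7, "2020-01-01 11:00:00"),
    (2, 2, "2020-01-01 12:00:00"),
    (3, 16, "2020-01-01 10:00:00"),
    (3, 20, "2020-01-01 11:00:00"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

query = """
SELECT event_type, val
FROM (
    SELECT event_type,
           -- previous value in chronological order
           value - LAG(value) OVER (PARTITION BY event_type ORDER BY time) AS val,
           -- rn = 1 marks the latest row per event_type
           ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS rn
    FROM events
) AS t
WHERE rn = 1 AND val IS NOT NULL
ORDER BY event_type
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

Event type 1 is dropped because its only row has no previous value, which covers the "registered more than once" requirement without a separate count.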
Here is a way to do this using a combination of analytic functions and aggregation. This approach is friendly in the event that your database does not support LEAD and LAG.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS rn
FROM events
)
SELECT
event_type,
MAX(CASE WHEN rn = 1 THEN value END) - MAX(CASE WHEN rn = 2 THEN value END) AS value
FROM cte
GROUP BY
event_type
HAVING
COUNT(*) > 1;

How to select last record from table consider to Year and WorkingPeriod(Month)

I have a table like this:
I want the last [Status] for each [Guid], based on the latest [Year] and [WorkingPeriodTitle].
By the way, I know that [WorkingPeriodTitle] should be replaced by [WorkingPeriodId].
With ROW_NUMBER() window function:
select
t.[PaymentAllocationGuid], t.[Status]
from (
select *,
row_number() over (partition by [PaymentAllocationGuid] order by [Year] desc, [WorkingPeriodTitle] desc) rn
from tablename
) t
where t.rn = 1
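Another option is the LAST_VALUE() window function, which keeps every row and adds the last status as an extra column:
SELECT *,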
LAST_VALUE(Status) OVER (PARTITION BY PaymentAllocationGuid ORDER BY Year,
WorkingPeriodTitle RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS LastStatus
FROM tablexyz
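Here is a small runnable sketch of the ROW_NUMBER() version, using SQLite from Python. The rows are invented, and a numeric WorkingPeriodId column is assumed in place of WorkingPeriodTitle (as the question itself suggests), since month titles generally do not sort in calendar order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tablename (
        PaymentAllocationGuid TEXT,
        Year INTEGER,
        WorkingPeriodId INTEGER,  -- assumed numeric month, not in the original table
        Status TEXT
    )
""")

# Hypothetical rows for two GUIDs.
rows = [
    ("A", 2019, 12, "Pending"),
    ("A", 2020, 1, "Paid"),
    ("B", 2020, 2, "Rejected"),
    ("B", 2020, 3, "Pending"),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)

query = """
SELECT t.PaymentAllocationGuid, t.Status
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY PaymentAllocationGuid
               ORDER BY Year DESC, WorkingPeriodId DESC
           ) AS rn
    FROM tablename
) AS t
WHERE t.rn = 1
ORDER BY t.PaymentAllocationGuid
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Each GUID keeps only the row with the highest (Year, WorkingPeriodId) pair, so A ends up as Paid and B as Pending.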

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag to check whether the email already had an earlier row, and if so set the flag to 0 so it is ignored by the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
It may therefore be simpler to read and maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1
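To compare the approaches on the same data, here is a minimal sketch of the row_number() idea run against SQLite from Python. The table name subscribers, the column dt, and the rows are all invented for illustration, and the per-day aggregation is split into a CTE purely for readability instead of nesting sum(count(*)) over (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT, dt TEXT)")

# Hypothetical rows: address "a" signs up on two different days.
rows = [
    ("a@example.com", "2019-01-01"),
    ("b@example.com", "2019-01-01"),
    ("a@example.com", "2019-01-02"),
    ("c@example.com", "2019-01-02"),
    ("d@example.com", "2019-01-03"),
]
conn.executemany("INSERT INTO subscribers VALUES (?, ?)", rows)

query = """
WITH numbered AS (
    -- seqnum = 1 marks the first time each email appears
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY dt) AS seqnum
    FROM subscribers AS s
),
daily AS (
    SELECT dt,
           COUNT(*) AS created,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_emails
    FROM numbered
    GROUP BY dt
)
SELECT dt,
       created,
       SUM(created) OVER (ORDER BY dt) AS running_total,
       SUM(new_emails) OVER (ORDER BY dt) AS running_distinct
FROM daily
ORDER BY dt
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

On the second day the running total rises by 2 but the running distinct count only by 1, because the repeated address is counted once.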

RowNumber() and SUM() in one query

Is there a way to get the last record using row_number() together with the SUM of one field (money in this case)?
I've tried to come up with a query like:
SELECT
[date]
,...
FROM
(
SELECT
CAST(t.timestamp AS DATE) AS [date]
,.../some fields/
,row_number() over (partition by ca.logical_number order by t.timestamp DESC) as rownumber --last update(record) transaction
--,amount_transferred =
--(
-- SELECT
-- ,SUM(t.money_value) AS amount_transferred
-- FROM
-- TO_Transaction t
-- GROUP BY
-- CAST(t.timestamp AS Date)
--)
) AS t
WHERE rownumber=1
What the query is supposed to do is to find current purse balance and all money transferred during a day.
Any help would be appreciated.
Thanks.
You can also use sum(field) over (...) as a window function:
select
row_number() over (partition by ca.logical_number order by t.timestamp DESC) as rownumber,
sum(amount_transfered) over (partition by ca.logical_number ) as total_amount_transfered
from ...
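As a rough illustration of combining the two window functions, the sketch below runs against SQLite from Python. The transactions table, its columns (logical_number, ts, money_value, balance), and the rows are hypothetical stand-ins, since the question's real schema is elided; the total is computed per purse, following the partition used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        logical_number INTEGER,  -- purse identifier (assumed)
        ts TEXT,                 -- transaction timestamp (assumed)
        money_value INTEGER,     -- amount transferred (assumed)
        balance INTEGER          -- purse balance after the transaction (assumed)
    )
""")

rows = [
    (1, "2020-05-01 09:00", 10, 110),
    (1, "2020-05-01 15:00", 25, 135),
    (2, "2020-05-01 10:00", 5, 55),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

query = """
SELECT logical_number, ts, balance, total_transferred
FROM (
    SELECT logical_number, ts, balance,
           -- rownumber = 1 marks the most recent transaction per purse
           ROW_NUMBER() OVER (PARTITION BY logical_number ORDER BY ts DESC) AS rownumber,
           -- same total repeated on every row of the purse
           SUM(money_value) OVER (PARTITION BY logical_number) AS total_transferred
    FROM transactions
) AS t
WHERE rownumber = 1
ORDER BY logical_number
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Because the windowed sum is attached to every row before the filter on rownumber, the last record still carries the full total. To get a per-day total instead, the date could be added to the PARTITION BY of both window functions.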