SUMIF then restart count - sql

How can I write a SUMIF-style calculation that adds up values while the value in another column is "False", but restarts the count when it hits a "True" value, including the value of that first "True" row in the sum? I would also like it to add up the values in chronological order.
I did some research and I think I need to use OVER (PARTITION BY ...) and create a row-number column so I can pick out every row where the row number is 1, but I'm not sure how to do this.
Edit: the sum should also include the "distance" value of the first "True" row it encounters.
Edit 2: Ultimately, I am trying to calculate the average distance each vehicle travels before an alert is set to "True", which means the vehicle needs to be taken to the shop to be fixed. Perhaps there is a better way to do this than what I was originally thinking?
Sorry for the poor phrasing...

You want to define groups. It sounds like you want the definition to be the number of "trues" up to and including a given row. Then, you can do a cumulative sum within each group. So:
select t.*,
       sum(distance) over (partition by vehicleid, grp
                           order by date
                           rows between unbounded preceding and current row
                          )
from (select t.*,
             sum(case when alert = 'True' then 1 else 0 end) over
                 (partition by vehicleid
                  order by date
                  rows between unbounded preceding and current row
                 ) as grp
      from t
     ) t;
Here is a db<>fiddle that illustrates that this code works.
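A minimal sketch that runs the query above with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name t and the columns vehicleid, date, distance and alert come from the answer; the sample rows are hypothetical, and the final order by was added only to make the output order stable.

```python
import sqlite3

# Hypothetical sample rows for one vehicle: (vehicleid, date, distance, alert).
SAMPLE_ROWS = [
    (1, "2021-01-01", 10, "False"),
    (1, "2021-01-02", 20, "False"),
    (1, "2021-01-03", 5, "True"),
    (1, "2021-01-04", 7, "False"),
    (1, "2021-01-05", 3, "True"),
]

# The answer's query; only the trailing "order by" was added for a stable output order.
QUERY = """
select t.*,
       sum(distance) over (partition by vehicleid, grp
                           order by date
                           rows between unbounded preceding and current row
                          ) as running_distance
from (select t.*,
             sum(case when alert = 'True' then 1 else 0 end) over
                 (partition by vehicleid
                  order by date
                  rows between unbounded preceding and current row
                 ) as grp
      from t
     ) t
order by vehicleid, date
"""


def running_distances(rows):
    """Load rows into an in-memory table t and run the grouped running sum."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (vehicleid integer, date text, distance integer, alert text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in running_distances(SAMPLE_ROWS):
    print(row)  # (vehicleid, date, distance, alert, grp, running_distance)
```

With this data the True row on 2021-01-03 starts a new group, so its distance (5) is the first value of the next running total. If you would rather have each True row close the group it ends (the reading suggested by Edit 2), one option is to count only the Trues strictly before the current row, using a frame of rows between unbounded preceding and 1 preceding wrapped in coalesce(..., 0).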

You are right in thinking that you can use the SUM analytic function. Something like this will compute the cumulative sum for you.
To restart the SUM when the alert is True, include alert in the PARTITION BY clause, and ORDER BY date to accumulate the values in chronological order.
SELECT SUM(CASE WHEN alert = 'False'
                THEN distance
                ELSE 0
           END)
       OVER (PARTITION BY alert
             ORDER BY date) AS cumm_sum
     , date
     , alert
FROM t
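A minimal sketch that runs this query with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) on a few hypothetical rows, so you can see the running totals it produces. The table name t and the 'False' literal (matching the question's values) are assumptions, and an ORDER BY date was added to make the output order stable.

```python
import sqlite3

# Hypothetical rows: (date, distance, alert).
SAMPLE_ROWS = [
    ("2021-01-01", 10, "False"),
    ("2021-01-02", 20, "False"),
    ("2021-01-03", 5, "True"),
    ("2021-01-04", 7, "False"),
]

QUERY = """
SELECT SUM(CASE WHEN alert = 'False'
                THEN distance
                ELSE 0
           END)
       OVER (PARTITION BY alert
             ORDER BY date) AS cumm_sum
     , date
     , alert
FROM t
ORDER BY date
"""


def cumulative_sums(rows):
    """Load rows into an in-memory table t and run the partition-by-alert sum."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (date text, distance integer, alert text)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in cumulative_sums(SAMPLE_ROWS):
    print(row)  # (cumm_sum, date, alert)
```

With these rows the 2021-01-04 total is 37 (10 + 20 + 7): all False rows fall into the same partition, so the running total carries on past the True row, and the True row itself gets 0.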

Related

Return Only Most Recent Instance of Item From Query (Where Multiple Instances Exist)

I have written the following subquery, which is returning instances of item counts from my application's log table.
The idea is to use this subquery to pull item-count information for a specific date and compare it with the same information for a different date, for example the latest quantity of all items counted in a given location on the system.
select
    LOCATION,
    ITEM,
    SUM(CASE
            WHEN ACTION = 'COUNT-OK'
            THEN QUANTITY
            ELSE QUANTITY * CHANGE --If ACTION <> 'COUNT-OK', then we need to adjust the quantity
        END) AS QuantityCalc,
    DATE_TIME
from LOG_TABLE
where ACTION IN ('COUNT-ADJ','COUNT-OK')
    AND (CAST(DATE_TIME AS DATE) = #CountDate) --Declared elsewhere
group by LOCATION, ITEM, DATE_TIME
order by DATE_TIME desc
My issue is with the rows returned. Because these are application logs, there is a row for each count being done on the system, so only the most recent 'QuantityCalc' for a given item in a location would be accurate.
I need a way to return only the most recent instance of a count (where the LOCATION and ITEM values are the same). In the main query, I use a SUM over the QuantityCalc values from this subquery to get the total quantity by item and location for a specific count, so that I can compare counts side by side. Instances such as the ones below are currently throwing this off.
I've attached an example image of what this query returns. My issue is with Item2 in location B and Item3 in location C; I'd like the query to return ONLY rows 2, 3, 5 and 8 (counting the header as row 1).
Thank you
You can pre-filter the logs for the latest row per location/item tuple, then aggregate. We would typically use row_number() to enumerate the rows in a subquery:
select
    LOCATION,
    ITEM,
    sum(case when ACTION = 'COUNT-OK' then QUANTITY else QUANTITY * CHANGE end) AS QuantityCalc,
    DATE_TIME
from (
    select l.*,
        row_number() over(partition by LOCATION, ITEM order by DATE_TIME desc) AS RN
    from LOG_TABLE l
    where ACTION IN ('COUNT-ADJ','COUNT-OK') and CAST(DATE_TIME AS DATE) = #CountDate
) l
where RN = 1
group by LOCATION, ITEM, DATE_TIME
order by DATE_TIME desc
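A minimal sketch of the latest-row-per-group filter, run with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The log rows are hypothetical, and the date filter is left out because SQLite has no DATE type to cast to; everything else follows the query above.

```python
import sqlite3

# Hypothetical log rows: (LOCATION, ITEM, ACTION, QUANTITY, CHANGE, DATE_TIME).
LOG_ROWS = [
    ("A", "Item1", "COUNT-OK", 3, 1, "2021-03-01 08:00:00"),
    ("B", "Item2", "COUNT-OK", 5, 1, "2021-03-01 09:00:00"),
    ("B", "Item2", "COUNT-ADJ", 4, -1, "2021-03-01 10:00:00"),
    ("C", "Item3", "COUNT-OK", 8, 1, "2021-03-01 11:00:00"),
    ("C", "Item3", "COUNT-OK", 9, 1, "2021-03-01 12:00:00"),
]

# Keep only the newest row per (LOCATION, ITEM), then aggregate.
QUERY = """
select LOCATION,
       ITEM,
       sum(case when ACTION = 'COUNT-OK' then QUANTITY
                else QUANTITY * CHANGE end) as QuantityCalc,
       DATE_TIME
from (select l.*,
             row_number() over (partition by LOCATION, ITEM
                                order by DATE_TIME desc) as RN
      from LOG_TABLE l
      where ACTION in ('COUNT-ADJ', 'COUNT-OK')
     ) l
where RN = 1
group by LOCATION, ITEM, DATE_TIME
order by DATE_TIME desc
"""


def latest_counts(rows):
    """Load rows into an in-memory LOG_TABLE and return the latest count per item/location."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table LOG_TABLE (LOCATION text, ITEM text, ACTION text,"
            " QUANTITY integer, CHANGE integer, DATE_TIME text)"
        )
        con.executemany("insert into LOG_TABLE values (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_counts(LOG_ROWS):
    print(row)
```

The older counts for Item2 in B and Item3 in C are dropped before the SUM runs, so each location/item pair contributes exactly one row.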
Side note: the filtering on date_time can probably be optimized: rather than casting the column to a date, you can compare it directly against a range built from the date parameter. The syntax of date arithmetic varies widely across databases (and you did not tell which one you are using), but in standard SQL it would be:
DATE_TIME >= #CountDate and DATE_TIME < #CountDate + interval '1' day
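A minimal sketch of the half-open range filter with Python's sqlite3 module. To sidestep database-specific interval syntax, the two bounds are computed in Python and passed as parameters; count_date plays the role of #CountDate. It assumes timestamps are stored as ISO-8601 text ('YYYY-MM-DD HH:MM:SS'), which sorts chronologically as strings in SQLite. Because the column itself is left untouched, an index on DATE_TIME can typically be used for the range.

```python
import sqlite3
from datetime import date, datetime, time, timedelta


def rows_for_day(con, count_date):
    """Return DATE_TIME values on count_date using DATE_TIME >= day and < next day."""
    start = datetime.combine(count_date, time.min)  # count_date at 00:00:00
    end = start + timedelta(days=1)                 # following day at 00:00:00
    cursor = con.execute(
        "select DATE_TIME from LOG_TABLE"
        " where DATE_TIME >= ? and DATE_TIME < ?"
        " order by DATE_TIME",
        (start.isoformat(sep=" "), end.isoformat(sep=" ")),
    )
    return [r[0] for r in cursor]


con = sqlite3.connect(":memory:")
con.execute("create table LOG_TABLE (DATE_TIME text)")
con.execute("create index ix_log_date_time on LOG_TABLE (DATE_TIME)")
con.executemany(
    "insert into LOG_TABLE values (?)",
    [("2021-02-28 23:59:59",), ("2021-03-01 00:00:00",),
     ("2021-03-01 23:59:59",), ("2021-03-02 00:00:00",)],
)
print(rows_for_day(con, date(2021, 3, 1)))
```

The upper bound is exclusive, so a row stamped exactly at midnight of the next day is not counted twice.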

Is there a way to calculate a percentile using the percentile_cont() function over a rolling window in BigQuery?

I have a dataset with the following columns:
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Is there a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
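BigQuery can't be run locally, so here is a plain-Python emulation (not BigQuery code) of what this approach is meant to compute, assuming month is an integer and no earnings are NULL. For each row it gathers the earnings of the same city in the current and previous month (the range between 1 preceding and current row frame, which also includes rows tied on month) and takes their median, which is what percentile_cont(x, 0.5) returns. The sample rows are hypothetical and the user/week columns are omitted.

```python
from statistics import median


def rolling_median(rows):
    """For each (city, month, earnings) row, return the median earnings of the
    same city over the current and previous month (month is an integer)."""
    medians = []
    for city, month, _ in rows:
        # Same frame as RANGE BETWEEN 1 PRECEDING AND CURRENT ROW on an integer month.
        window = [e for c, m, e in rows if c == city and month - 1 <= m <= month]
        medians.append(median(window))
    return medians


rows = [("X", 1, 10), ("X", 1, 30), ("X", 2, 20), ("X", 3, 50), ("Y", 1, 7)]
print(rolling_median(rows))
```

statistics.median averages the two middle values for an even-sized window, matching the linear interpolation percentile_cont uses at 0.5.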

Apply condition between rows of the same column

I have a table where the first column is time, which increases by 1 second increments.
The second column holds, in its first row, the code the day started with, which can be any value from 1 to 5. A value of 0 (zero) means the code didn't change. When the code changes, that row shows the time it changed and the number it changed to (hence the name Event); for as long as the code stays the same, 0 (zero) is shown again.
My intent is to make a new column specifying the present code at any time. So far, I have been doing this in Excel, with the following conditions and results:
Is there a way for this Excel condition to be applied in a query to create this new column?
I've been testing CASE WHEN statements and tried to use the LAG or LEAD functions inside them, but so far none of them carries the previous row's value forward when the event is 0 (zero).
If event is always increasing, you can use a window max():
select time, event, max(event) over(order by time) code
from mytable
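A minimal sketch of the window max() version, run with Python's sqlite3 module (window functions need SQLite 3.25 or later). The table mytable and its columns come from the answer; the event values are hypothetical and, as the answer requires, the non-zero codes only ever increase.

```python
import sqlite3


def codes_with_max(events):
    """Number the events 1, 2, 3, ... as time and return the running max as the code."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table mytable (time integer, event integer)")
        con.executemany(
            "insert into mytable values (?, ?)", list(enumerate(events, start=1))
        )
        cursor = con.execute(
            "select time, event, max(event) over (order by time) as code"
            " from mytable order by time"
        )
        return [code for _, _, code in cursor]
    finally:
        con.close()


print(codes_with_max([2, 0, 0, 3, 0, 5]))
```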
Otherwise, it is a bit more complicated. One option is to build the groups with a window sum, then take the max within each group:
select time, event, max(event) over(partition by grp order by time) code
from (
    select
        t.*,
        sum(case when event > 0 then 1 else 0 end) over(order by time) grp
    from mytable t
) t
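A minimal sketch of the grouping version, run with Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes, as in the question, that the first row carries the starting code; the event values are hypothetical and include a code that goes down (3 then 1). An order by time was added for a stable output order.

```python
import sqlite3

# Each non-zero event starts a new group; the max inside the group is that event.
QUERY = """
select time, event, max(event) over (partition by grp order by time) as code
from (select t.*,
             sum(case when event > 0 then 1 else 0 end) over (order by time) as grp
      from mytable t
     ) t
order by time
"""


def codes_with_groups(events):
    """Number the events 1, 2, 3, ... as time and return the carried-forward code."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table mytable (time integer, event integer)")
        con.executemany(
            "insert into mytable values (?, ?)", list(enumerate(events, start=1))
        )
        return [code for _, _, code in con.execute(QUERY)]
    finally:
        con.close()


print(codes_with_groups([3, 0, 0, 1, 0, 4, 0]))
```

On the same data, a plain running max() would keep reporting 3 after the change to 1, which is why the groups are needed when codes can decrease.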

aggregate multiple rows based on time ranges

I have a customer who uses different devices over a specific period of time, tracked with valid_from and valid_to dates. However, every time something changes for a device, a new row is written with no visible change in the row data besides a new valid_from/valid_to.
What I'm trying to do is aggregate the first two rows into one, do the same for rows 3 and 4, and leave rows 5 and 6 as they are. All the solutions I have come up with so far only work for a usage history in which the user does not switch back to device A; otherwise they keep failing.
I'd really appreciate some help, thanks in advance!
If you know that the previous valid_to is the same as the current valid_from, then you can use lag() to identify where a new grouping starts. Then use a cumulative sum to calculate the grouping and finally aggregation:
select cust, act_dev, min(valid_from) as valid_from, max(valid_to) as valid_to
from (select t.*,
             sum(case when prev_valid_to = valid_from then 0 else 1 end) over (partition by cust order by valid_from) as grp
      from (select t.*,
                   lag(valid_to) over (partition by cust, act_dev order by valid_from) as prev_valid_to
            from t
           ) t
     ) t
group by cust, act_dev, grp;
Here is a db<>fiddle.
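A minimal sketch that runs the lag() plus cumulative-sum grouping with Python's sqlite3 module (window functions need SQLite 3.25 or later). The usage history is hypothetical and includes a switch back to device A; the output aliases first_from/last_to and the final order by were added for readability.

```python
import sqlite3

# Hypothetical history for one customer: (cust, act_dev, valid_from, valid_to).
HISTORY = [
    (1, "A", "2021-01-01", "2021-01-05"),
    (1, "A", "2021-01-05", "2021-01-10"),
    (1, "B", "2021-01-10", "2021-01-15"),
    (1, "B", "2021-01-15", "2021-01-20"),
    (1, "A", "2021-01-20", "2021-01-25"),
    (1, "B", "2021-01-25", "2021-01-30"),
]

QUERY = """
select cust, act_dev, min(valid_from) as first_from, max(valid_to) as last_to
from (select t.*,
             sum(case when prev_valid_to = valid_from then 0 else 1 end)
                 over (partition by cust order by valid_from) as grp
      from (select t.*,
                   lag(valid_to) over (partition by cust, act_dev
                                       order by valid_from) as prev_valid_to
            from t
           ) t
     ) t
group by cust, act_dev, grp
order by cust, first_from
"""


def merge_usage(rows):
    """Collapse contiguous rows of the same device into one period per group."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (cust integer, act_dev text, valid_from text, valid_to text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in merge_usage(HISTORY):
    print(row)
```

The return to device A on 2021-01-20 does not merge with the earlier A period, because its lag() value (2021-01-10) differs from its valid_from.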

sql row_number stay the same unless a criteria is met

This is my first post here. I've done a bunch of searches, but I don't know the right terminology to search for to begin with. I have a table in SQL Server 2012 containing timesheet data with these columns: Name, ID, ENTEREDONDTM, EVENTDATE, STARTDTM, ENDDTM, STARTREASON, ENDREASON
I'm trying to do a row_number where the value in row_number stays the same unless StartReason = 'newShift' in which case I would like for it to increase by 1.
My end goal is to find a total shift length per shift and I know how to do those calculations based on startdtm and enddtm, but there is no current column with a shiftID for me to group by.
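You can use the RANK() window function, partitioned by StartReason, and add 1 (to reserve the first value).
Before using this value, you can use a CASE expression to compare the value. SQL Server requires an ORDER BY inside OVER for RANK(), so the example orders by STARTDTM.
Example: case StartReason when 'newShift' then 1 else rank() over (partition by StartReason order by STARTDTM) + 1 end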
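A different technique from the RANK() suggestion, named plainly here: a running count of 'newShift' rows per employee, i.e. the cumulative-sum grouping used in other answers on this page. The number only increases when a new shift starts, so it can serve as a shift ID to group by. This minimal sketch runs it with Python's sqlite3 module (window functions need SQLite 3.25 or later); the table name timesheet, the 'endBreak' reason and the rows are hypothetical. The same OVER (... ORDER BY ... ROWS ...) syntax is supported by SQL Server 2012.

```python
import sqlite3

# Hypothetical timesheet rows: (ID, STARTDTM, STARTREASON).
TIMESHEET = [
    (7, "2021-05-01 08:00", "newShift"),
    (7, "2021-05-01 12:30", "endBreak"),
    (7, "2021-05-02 08:00", "newShift"),
    (7, "2021-05-02 12:30", "endBreak"),
]

# Running count of 'newShift' rows per ID, in start-time order.
QUERY = """
select ID, STARTDTM, STARTREASON,
       sum(case when STARTREASON = 'newShift' then 1 else 0 end)
           over (partition by ID order by STARTDTM
                 rows between unbounded preceding and current row) as shift_id
from timesheet
order by ID, STARTDTM
"""


def number_shifts(rows):
    """Return each row with a shift_id that increments on every 'newShift'."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table timesheet (ID integer, STARTDTM text, STARTREASON text)")
        con.executemany("insert into timesheet values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in number_shifts(TIMESHEET):
    print(row)  # (ID, STARTDTM, STARTREASON, shift_id)
```

Grouping by ID and shift_id then gives one group per shift for the length calculations.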