How to use current row data in a condition inside a window function on a partition with range between unbounded preceding and current row (Hive)

I need to refer to the current row's values in a condition inside a window function over a partition with a RANGE frame. The query below shows what I am trying to write:
select
count(case when orderdate >=
(**current_row.orderdate** -30) then 1 end)
over (partition by customerid order by orderdate range between unbounded preceding and current row)
from xyz
This is not valid syntax, and I cannot work out the correct form.
Please see the required example output below. The 30 days and the logic are just a sample.
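One possible reading of the requirement is "for each order, count that customer's orders in the 30 days up to and including the current order date". Under that reading, the CASE expression does not need to see the current row at all, because a RANGE frame with a numeric offset is already measured relative to the current row's ordering value. The sketch below runs the idea in SQLite through Python purely as a stand-in engine. The table xyz and its columns come from the question; the sample rows, the julianday() conversion, and the orders_last_30_days alias are assumptions. Whether Hive accepts the same frame depends on its version and on the type of the ordering column, so treat that part as something to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (customerid INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?)",
    [
        # hypothetical sample rows
        (1, "2020-01-01"),
        (1, "2020-01-20"),
        (1, "2020-02-10"),
        (1, "2020-03-25"),
        (2, "2020-01-05"),
    ],
)

# RANGE ... 30 PRECEDING is evaluated against the current row's ordering
# value, so the frame holds the rows whose date lies within 30 days before
# (and including) the current row's date.
query = """
SELECT customerid,
       orderdate,
       COUNT(*) OVER (
           PARTITION BY customerid
           ORDER BY julianday(orderdate)   -- numeric day number (SQLite-specific)
           RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
       ) AS orders_last_30_days
FROM xyz
ORDER BY customerid, orderdate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The design point is that the "current row minus 30" comparison moves out of the CASE and into the frame bounds, so the frame is no longer UNBOUNDED PRECEDING.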

Related

How fill NULLs with previous value in SQL

I have the following table.
There are some NULL values in the price column, which I want to replace with the value from the previous date (the date column is manual_date). Additionally, the price column is calculated on different dates (calculation_date), so the NULLs should be filled within each of these groups.
The final output should show values similar to output_price.
I found code here that does the same thing, but I could not figure out how to apply it to my data. One of the errors says I have no ts in (PARTITION BY symbol ORDER BY ts). That is true, but the website does not specify ts either, and I also tried replacing ts with manual_date.
I tried the following code on my data:
select manual_date,TS_FIRST_VALUE(price, 'const') output_price
from MYDATA
TIMESERIES manual_date AS '1 month'
OVER(PARTITION BY calculation_date ORDER BY ts) --tried also ORDER BY manual_date

Vertica supports IGNORE NULLS on last_value(). So you can use:
last_value(price ignore nulls) over (
order by manual_date
rows between unbounded preceding and current row
) as output_price
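For engines without IGNORE NULLS, the same forward fill can be approximated with a different technique: a running COUNT(price), which only counts non-NULL values, labels each run of rows that starts at a non-NULL price, and MAX(price) within that run picks up its single non-NULL value. The sketch below is a swapped-in emulation run in SQLite via Python, not the last_value() call itself. The sample rows, the "A"/"B" group labels, and the decision to partition by calculation_date (based on the question's per-group requirement) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mydata (calculation_date TEXT, manual_date TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO mydata VALUES (?, ?, ?)",
    [
        # hypothetical sample rows; "A" and "B" stand in for calculation dates
        ("A", "2020-01-31", 10.0),
        ("A", "2020-02-29", None),
        ("A", "2020-03-31", None),
        ("A", "2020-04-30", 12.0),
        ("B", "2020-01-31", None),
        ("B", "2020-02-29", 7.0),
        ("B", "2020-03-31", None),
    ],
)

# Inner query: the running count of non-NULL prices increases only on rows
# that have a price, so each non-NULL row and the NULL rows after it share grp.
# Outer query: every grp holds at most one non-NULL price, so MAX returns it.
query = """
SELECT calculation_date,
       manual_date,
       MAX(price) OVER (PARTITION BY calculation_date, grp) AS output_price
FROM (
    SELECT m.*,
           COUNT(price) OVER (
               PARTITION BY calculation_date
               ORDER BY manual_date
           ) AS grp
    FROM mydata m
) AS labelled
ORDER BY calculation_date, manual_date
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Rows that precede the first non-NULL price in a group stay NULL, since there is no earlier value to carry forward.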

Is there a way to calculate a percentile using the percentile_cont() function over a rolling window in BigQuery?

I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Can anyone suggest a workaround for this problem?

If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
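To make the intended result concrete, here is a small plain-Python sketch of what the windowed median computes, assuming (as the answer does) that month is an incrementing integer. It is only an illustration of the semantics, not BigQuery code: the sample rows and the rolling_median helper are hypothetical, and statistics.median stands in for percentile_cont at 0.5 because both interpolate linearly between the two middle values.

```python
from statistics import median

# (city, month, earnings): hypothetical sample data
rows = [
    ("A", 1, 10.0),
    ("A", 1, 30.0),
    ("A", 2, 20.0),
    ("A", 3, 50.0),
    ("B", 1, 5.0),
]


def rolling_median(rows, months_back=1):
    """For each row, take the median of earnings in the same city whose
    month lies in [month - months_back, month], mirroring
    RANGE BETWEEN 1 PRECEDING AND CURRENT ROW on an integer month."""
    result = []
    for city, month, earnings in rows:
        window = [
            e
            for c, m, e in rows
            if c == city and month - months_back <= m <= month
        ]
        result.append((city, month, earnings, median(window)))
    return result


medians = rolling_median(rows)
for row in medians:
    print(row)
```

Note that a RANGE frame includes all rows that share the current row's month, which is why both month-1 rows for city "A" see the same window.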

SUMIF then restart count

How can I write a SUMIF-style calculation that adds up values while the value in another column is "False", then restarts the sum when it hits a "True" value, while still including the value from that first "True" row in the sum? I would also like the values to be added up in chronological order.
I did some research and I think I need to use OVER with PARTITION BY and add a row-number column so I can pick out every row number = "1", but I'm not sure how to do this.
Edit: the sum should also include the "distance" value for the first "True" value it encounters.
Edit 2: Ultimately, I am trying to calculate the average distance each vehicle travels before an Alert is set to "True", which means it needs to be taken to the shop to be fixed. Perhaps there is a better way to do this than what I was originally thinking?
Sorry for the poor phrasing...

You want to define groups. It sounds like you want the definition to be the number of "trues" up to and including a given row. Then, you can do a cumulative sum within each group. So:
select t.*,
sum(distance) over (partition by vehicleid, grp
order by date
rows between unbounded preceding and current row
)
from (select t.*,
sum(case when alert = 'True' then 1 else 0 end) over
(partition by vehicleid
order by date
rows between unbounded preceding and current row
) as grp
from t
) t;
Here is a db<>fiddle that illustrates that this code works.
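As a rough check of how the grouping behaves, the same query can be run against a few made-up rows. The sketch below uses SQLite through Python as a stand-in engine. The table t and its columns follow the answer; the sample rows and the running_distance alias are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (vehicleid INTEGER, date TEXT, distance INTEGER, alert TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        # hypothetical sample rows for a single vehicle
        (1, "2021-01-01", 10, "False"),
        (1, "2021-01-02", 20, "False"),
        (1, "2021-01-03", 5, "True"),
        (1, "2021-01-04", 7, "False"),
        (1, "2021-01-05", 3, "True"),
    ],
)

# Inner query: grp is the number of 'True' rows seen so far, including the
# current row, so it increases by one on every 'True' row.
# Outer query: a cumulative sum of distance inside each (vehicleid, grp).
query = """
SELECT g.vehicleid, g.date, g.distance, g.alert, g.grp,
       SUM(g.distance) OVER (
           PARTITION BY g.vehicleid, g.grp
           ORDER BY g.date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_distance
FROM (
    SELECT t.*,
           SUM(CASE WHEN alert = 'True' THEN 1 ELSE 0 END) OVER (
               PARTITION BY vehicleid
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM t
) AS g
ORDER BY g.vehicleid, g.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this definition of grp, the running sum restarts on each "True" row, and that row's distance is the first value counted in the new group.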

You are right in thinking that you can use SUM analytical function. Something like this will do the cumulative sum for you.
For you to restart the SUM when the alert is True, you include the alert in the partition window and Order by date to achieve the order.
SELECT SUM(CASE WHEN alert = 'FALSE'
THEN distance
ELSE 0
END)
OVER(PARTITION BY alert
ORDER BY date) cumm_sum
, date
, alert
FROM Table

Need to Update based on ID and Date

I have the following SQL statement, which I think should update one field using some fairly simple standard deviation logic, based on ID and Date. I think the ID and Date have to be included to get everything aligned correctly. Here is the code that I'm testing.
UPDATE Price_Test2
SET Vol30Days = STDEV(PX_BID) OVER (ORDER BY ID_CUSIP, AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) FROM Price_Test2
WHERE ID_CUSIP in (SELECT DISTINCT ID_CUSIP FROM Price_Test2)
It seems like it should work fine, but something is off because I'm getting an error that says: Cannot use both a From clause and a subquery in the where clause or in the data values list in an Update statement.
I am using SQL Server 2019.

You are using a window function directly in the SET clause of an UPDATE, which SQL Server does not allow. What you want is an updatable subquery (or CTE). Partitioning by ID_CUSIP keeps each window within a single ID:
UPDATE p
    SET Vol30Days = new_Vol30Days,
        Vol60Days = new_Vol60Days,
        Vol90Days = new_Vol90Days
    FROM (SELECT p.*,
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as new_Vol30Days,
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 60 PRECEDING AND CURRENT ROW) as new_Vol60Days,
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 90 PRECEDING AND CURRENT ROW) as new_Vol90Days
          FROM Price_Test2 p
         ) p;
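To see what a ROWS BETWEEN n PRECEDING AND CURRENT ROW frame feeds into the standard deviation, here is a small plain-Python sketch for a single ID, with rows already sorted by date. The rolling_stdev helper and the sample prices are hypothetical. statistics.stdev is the sample standard deviation, which is what STDEV computes in SQL Server; returning None for a one-row window is an assumption meant to mirror the NULL that SQL Server is expected to give there.

```python
from statistics import stdev


def rolling_stdev(values, preceding):
    """Sample standard deviation over the current value and up to
    `preceding` earlier values, i.e. at most preceding + 1 values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        # A sample standard deviation needs at least two data points.
        out.append(stdev(window) if len(window) > 1 else None)
    return out


prices = [10.0, 12.0, 11.0, 15.0, 14.0]  # hypothetical PX_BID values
vol = rolling_stdev(prices, preceding=2)
print(vol)
```

One consequence worth noting: a frame of 30 PRECEDING AND CURRENT ROW spans up to 31 rows, not 30, so 29 PRECEDING is the bound to use if exactly 30 observations are wanted.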

SQL Server 2012 Windowing function to calculate a running total

I need some help with windowing functions.
I have been playing around with the SQL Server 2012 windowing functions recently. I know that you can calculate the sum within a window and the running total within a window. But I was wondering: is it possible to calculate the previous running total, i.e. the running total not including the current row? I assume you would need to use the ROWS or RANGE argument, and I know there is a CURRENT ROW option, but I would need CURRENT ROW - 1, which is invalid syntax. My knowledge of the ROWS and RANGE arguments is limited, so any help would be gratefully received.
I know that there are many solutions to this problem, but I am looking to understand the ROWS and RANGE arguments, and I assume the problem can be cracked with these. I have included one possible way to calculate the previous running total, but I wonder if there is a better way.
USE AdventureWorks2012
SELECT s.SalesOrderID
, s.SalesOrderDetailID
, s.OrderQty
, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID) AS RunningTotal
, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) - s.OrderQty AS PreviousRunningTotal
-- Pseudocode - I know this does not work
--, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
-- ORDER BY SalesOrderDetailID
-- ROWS BETWEEN UNBOUNDED PRECEDING
-- AND CURRENT ROW - 1)
-- AS SudoCodePreviousRunningTotal
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID
, s.SalesOrderDetailID
, s.OrderQty
Thanks in advance

You could subtract the current row's value:
SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) - s.OrderQty
Or according to the syntax at MSDN and ypercube's answer:
<window frame preceding> ::=
{
UNBOUNDED PRECEDING
| <unsigned_value_specification> PRECEDING
| CURRENT ROW
}
-->
SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
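The two approaches differ on the first row of each partition, which a small test makes visible. The sketch below runs both expressions in SQLite through Python as a stand-in for SQL Server. The column names follow the question; the unqualified table name, the sample rows, and the by_subtraction and by_frame aliases are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderDetail ("
    "SalesOrderID INTEGER, SalesOrderDetailID INTEGER, OrderQty INTEGER)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [
        # hypothetical quantities, not the real AdventureWorks data
        (43667, 1, 3),
        (43667, 2, 1),
        (43667, 3, 2),
        (43669, 4, 1),
    ],
)

query = """
SELECT SalesOrderID,
       SalesOrderDetailID,
       OrderQty,
       -- running total minus the current row's own quantity
       SUM(OrderQty) OVER (PARTITION BY SalesOrderID
                           ORDER BY SalesOrderDetailID) - OrderQty
           AS by_subtraction,
       -- frame that stops one row before the current row
       SUM(OrderQty) OVER (PARTITION BY SalesOrderID
                           ORDER BY SalesOrderDetailID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
           AS by_frame
FROM SalesOrderDetail
ORDER BY SalesOrderID, SalesOrderDetailID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the first row of each order the frame is empty, so the frame version yields NULL while the subtraction version yields 0. Wrapping the frame version in COALESCE(..., 0) is one way to make the two agree.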
