Manipulating hive variable - hive

I would like to subtract a number from the hive variable passed. For example:
SET hiveconf:window_size = 12;
SELECT id , max(marks) OVER(ORDER BY Date_time ROWS BETWEEN ${hiveconf:window_size}-1 PRECEDING AND CURRENT ROW) from Students;
But ${hiveconf:window_size}-1 in the window function gives an error.
Can anyone suggest how to do this?

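Hive does not accept an inline calculation in the ROWS BETWEEN boundary. Variable substitution is plain text replacement, so ${hiveconf:window_size}-1 reaches the parser as 12-1 PRECEDING, not as 11. Subtract 1 before executing the query.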
This will work:
SET hiveconf:window_size=11;
SELECT id , max(marks) OVER(ORDER BY Date_time ROWS BETWEEN ${hiveconf:window_size} PRECEDING AND CURRENT ROW) from Students;
Alternatively you can calculate it in the shell and pass to the Hive script as a variable. See here how to pass a variable from the shell: https://stackoverflow.com/a/37821218/2700344
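A minimal sketch of the shell approach, assuming the query lives in a script called students_window.hql (a hypothetical name) that uses ${hiveconf:preceding} as the frame boundary. The variable names window_size, preceding and hive_cmd are also illustrative. The subtraction happens in the shell, so Hive only ever sees the literal 11:

```shell
#!/bin/sh
# Desired number of rows in the window, including the current row.
window_size=12

# "ROWS BETWEEN n PRECEDING AND CURRENT ROW" spans n + 1 rows, so subtract 1 here.
preceding=$((window_size - 1))

# Hypothetical script name; inside it the query would reference ${hiveconf:preceding}.
hive_cmd="hive --hiveconf preceding=${preceding} -f students_window.hql"

# Print the command for inspection; run it directly instead of echoing to execute it.
echo "$hive_cmd"
```

Building the command as a string first makes it easy to check the substituted value before Hive runs.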

Related

how to use current row data in a condition inside a window function on a partition with range between unbounded preceding and current row

I need to use the current row values in a condition used in a window function on a partition with range. See my example query I am looking for.
select
count(case when orderdate >=
(**current_row.orderdate** -30) then 1 end)
over (partition by customerid order by orderdate range between unbounded preceding and current row)
from xyz
I cannot find the correct syntax for this.
Please see the required example output below. This is only an illustration; the 30 days and the logic are just a sample.

How fill NULLs with previous value in SQL

I have the following table
There are some NULL values in price column, which I want to replace with the previous date value (date is manual_date). Additionally, price column is calculated on different dates (calculation_table), so the nulls should be filled based on this group filter.
The final output should show values similar to output_price.
I found code here that does the same thing, but I could not figure out how to apply it to my data. One of the errors says that I have no ts in (PARTITION BY symbol ORDER BY ts). This is true, but the website does not specify ts either, and I also tried replacing ts with manual_date.
I tried the following code on my data:
select manual_date,TS_FIRST_VALUE(price, 'const') output_price
from MYDATA
TIMESERIES manual_date AS '1 month'
OVER(PARTITION BY calculation_date ORDER BY ts) --tried also ORDER BY manual_date
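Vertica supports IGNORE NULLS on last_value(), so you can use the following (add partition by calculation_date to the window if the fill should stay within each calculation date):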
last_value(price ignore nulls) over (
order by manual_date
rows between unbounded preceding and current row
) as output_price

Is there a way to calculate percentile using percentile_cont() function over a rolling window in Big Query?

I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Is there a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.

Need to Update based on ID and Date

I have the following SQL statement, which I think should update one field using some fairly simple standard deviation logic, based on ID and Date. I think the ID and Date have to be included to get everything aligned correctly. Here is the code that I'm testing.
UPDATE Price_Test2
SET Vol30Days = STDEV(PX_BID) OVER (ORDER BY ID_CUSIP, AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) FROM Price_Test2
WHERE ID_CUSIP in (SELECT DISTINCT ID_CUSIP FROM Price_Test2)
It seems like it should work fine, but something is off because I'm getting an error that says: Cannot use both a From clause and a subquery in the where clause or in the data values list in an Update statement.
I am using SQL Server 2019.
You are using window functions directly in an UPDATE, where SQL Server does not allow them. What you want is an updatable subquery (or CTE):
UPDATE p
SET Vol30Days = new_Vol30Days,
    Vol60Days = new_Vol60Days,
    Vol90Days = new_Vol90Days
FROM (SELECT p.*,
             -- one window per ID_CUSIP, ordered by date
             STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS new_Vol30Days,
             STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 60 PRECEDING AND CURRENT ROW) AS new_Vol60Days,
             STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 90 PRECEDING AND CURRENT ROW) AS new_Vol90Days
      FROM Price_Test2 p
     ) p;

How to use median as an analytic function (Oracle SQL)

Can you explain why the following works:
select recdate,avg(logtime)
over
(ORDER BY recdate rows between 10 preceding and 0 following) as logtime
from v_download_times;
and the following doesn’t
select recdate,median(logtime)
over
(ORDER BY recdate rows between 10 preceding and 0 following) as logtime
from v_download_times;
(median instead of avg)
I get an ORA-30487 error, and I would be grateful for a workaround.
The error message is ORA-30487: ORDER BY not allowed here. And sure enough, if we consult the documentation for the MEDIAN function it says:
"You can use MEDIAN as an analytic function. You can specify only the
query_partition_clause in its OVER clause."
But ORDER BY is not redundant if you only want the median over a certain number of rows preceding the current one.
One workaround is to limit the data set to the rows you need before taking the median, for example:
select
median(field) over (partition by field2)
from ( select * from dataset
where period_back between 0 and 2 )
MEDIAN doesn't allow an ORDER BY clause. As APC points out in his answer, the documentation tells us we can only specify the query_partition_clause.
ORDER BY is redundant as we're looking for the central value -- it's the same regardless of order.