RANGE BETWEEN function with Timestamp values in SQL Impala

I am trying to calculate a moving count of events (impressions) per minute. How can I use a RANGE BETWEEN clause with timestamp values to define the 1-minute interval?
I have something like this:
count(impression) over (partition by user
ORDER BY trunc(cast(entrytime as TIMESTAMP), "MI")
RANGE BETWEEN interval 1 minutes Preceding
and interval 1 minutes Following) as densityperminute
but this doesn't seem to work. Any ideas on how to fix this?

I believe that's not supported, unfortunately. From the documentation for 6.1:
Currently, Impala supports only some combinations of arguments to the
RANGE clause:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the default when
ORDER BY is specified and the window clause is omitted)
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Source
(Forgive me answering an old question, but I'm currently looking into this for a school project and this came up in my search!)
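Since offset-based RANGE frames are not available, one possible workaround is a self-join on a time range. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module (not Impala), the table `impressions` and column `user_id` are made-up names, and it counts impressions within 60 seconds on either side of each row rather than truncating to the minute. In Impala the epoch conversion would need a different function than SQLite's strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impressions (user_id TEXT, entrytime TEXT)")
conn.executemany(
    "INSERT INTO impressions VALUES (?, ?)",
    [
        ("u1", "2020-01-01 10:00:10"),
        ("u1", "2020-01-01 10:00:50"),
        ("u1", "2020-01-01 10:01:30"),
        ("u1", "2020-01-01 10:05:00"),
        ("u2", "2020-01-01 10:00:20"),
    ],
)

# Convert each timestamp to epoch seconds once, then join every impression
# to the same user's impressions that lie within +/- 60 seconds of it.
# The row always matches itself, so the count is at least 1.
query = """
WITH e AS (
    SELECT user_id,
           entrytime,
           CAST(strftime('%s', entrytime) AS INTEGER) AS ts
    FROM impressions
)
SELECT a.user_id, a.entrytime, COUNT(*) AS densityperminute
FROM e AS a
JOIN e AS b
  ON b.user_id = a.user_id
 AND b.ts BETWEEN a.ts - 60 AND a.ts + 60
GROUP BY a.user_id, a.entrytime
ORDER BY a.user_id, a.entrytime
"""
density = conn.execute(query).fetchall()
for row in density:
    print(row)
```

The join condition plays the role of the frame bounds, so the same idea carries over to any engine that accepts a range predicate next to the equality on the user column.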

Related

Window function - N preceding and unbounded question

Say I create a window function and specify:
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
How does the window function treat the first 9 rows? Does it only calculate up to however many rows above it are available?

I couldn't find this documented in SQL Server's documentation, but I could find it in the Postgres documentation, and I believe it is standardised [1]:
In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.
(My emphasis)
[1] I have also searched the MySQL documentation to no avail. This question is tagged only sql, so the answer should be based on the standard, but I can't find any downloadable drafts of it at the moment either.

The computation considers the 10 rows before the current row plus the current row itself, within the given partition. For example, to sum a number over the last 3 years and the current year, you can use sum(amount) over (order by year asc rows between 3 preceding and current row).
To answer your question "Does it only calculate up to however many rows above it are available?": yes, it considers only the rows that are available.
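To see the shrinking frame directly, here is a small sketch that runs on SQLite through Python's sqlite3 module. The table `t` and its columns are made up for the illustration, and the frame is 2 PRECEDING instead of 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, 10 * i) for i in range(1, 6)]
)

# The first two rows have fewer than 2 predecessors, so their frames
# contain only the rows that actually exist.
query = """
SELECT id,
       COUNT(*)    OVER w AS rows_in_frame,
       SUM(amount) OVER w AS frame_sum
FROM t
WINDOW w AS (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY id
"""
frame_rows = conn.execute(query).fetchall()
for row in frame_rows:
    print(row)
```

The rows_in_frame column grows 1, 2, 3 and then stays at 3, which is the "fewer rows near the partition ends" behaviour described in the Postgres quote.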

SQL average of previous range of columns into current column

I am trying to get the following calculations at row level. In the image below I calculated the average of the values for each day (a day can have any number of rows), then used the LAG function to insert the previous row's average into the next row's LAG_VAL column.
Now I am doing the calculations at row level. I have been able to get the average for that range of data using window (analytic) functions:
ROUND(AVG(SUMCOUNTSFT3) OVER (partition by to_date(to_char(DATETIMEOFREADING, 'DD/MM/RRRR'))),2) as AVG_SUMCOUNTSFT3
but I have not been able to calculate the average of the previous day and insert it into the rows of the next day, as illustrated in the image.
I'm not sure whether there is a way to implement this with the RANGE clause or whether I need to use PL/SQL.

Off the top of my head (without a matching schema to test with) this windowing clause should work:
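(order by trunc(DATETIMEOFREADING) range between interval '1' day preceding and interval '1' day preceding)
Note that the day goes in ORDER BY rather than PARTITION BY: a frame never reaches outside its partition, so partitioning by day would hide the previous day's rows. This is plain SQL, so it works inside as well as outside of PL/SQL.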
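The idea of a frame that covers exactly the previous day can be tried out without an Oracle schema. The sketch below is an approximation under assumptions: it uses SQLite through Python's sqlite3 module, a made-up table `readings`, and a numeric day key from julianday() in place of Oracle's interval syntax, which SQLite does not have. A frame cannot reach outside its partition, so the day is used in ORDER BY and there is no PARTITION BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (reading_time TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2020-01-01 08:00:00", 10),
        ("2020-01-01 20:00:00", 20),
        ("2020-01-02 09:00:00", 30),
        ("2020-01-02 21:00:00", 50),
        ("2020-01-03 10:00:00", 70),
    ],
)

# julianday(date(...)) yields one number per calendar day, and consecutive
# days differ by exactly 1. A frame from 1 PRECEDING to 1 PRECEDING
# therefore holds every row of the previous day and nothing else.
query = """
SELECT reading_time,
       val,
       AVG(val) OVER (
           ORDER BY julianday(date(reading_time))
           RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING
       ) AS prev_day_avg
FROM readings
ORDER BY reading_time
"""
prev_day = conn.execute(query).fetchall()
for row in prev_day:
    print(row)
```

Rows of the first day get NULL (None in Python) because there is no earlier day to average.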

where days between 28 PRECEDING and 1 PRECEDING -> what data range is taking

I am writing code to check the last 28 days of a table.
I came across this code:
WINDOW w AS (PARTITION BY abs_wd.abs_wd_employee_id ORDER BY abs_wd.date_diff RANGE BETWEEN 28 PRECEDING AND 1 PRECEDING)
My question is: does this take the range from yesterday back to 28 days before? Put another way, if I use only "PRECEDING", does it already count from yesterday, or do I have to add the "1" for that?
Thanks! :)

In a Hive window clause, PRECEDING refers to what comes before the current row and FOLLOWING to what comes after it. With a ROWS frame the offset counts rows; with a RANGE frame, as in your query, the offset is a difference in the ORDER BY value (here abs_wd.date_diff).
For example, in your query 28 PRECEDING reaches back to rows whose date_diff is up to 28 less than the current row's; with ROWS it would mean 28 rows behind the current row.
To answer your question: by default the frame includes the current row. If you want it to end just before the current row, the frame has to end at 1 PRECEDING, as you have correctly done in the question. The row-based version of the same window looks like this:
WINDOW w AS (PARTITION BY abs_wd.abs_wd_employee_id ORDER BY abs_wd.date_diff ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
Example:
Based on the sample Cloudera customers dataset, I ran a sum aggregate on the orders table with a window of 2 PRECEDING and 1 PRECEDING. If you look at row 4, the window function returns the sum of total orders over the two preceding rows, that is, yesterday's date and the day before yesterday's date.
Query used in the example:
sum(total_orders) over(order by order_date rows between 2 preceding and 1 preceding) as window_result
More on window functions can be found in the Hive documentation.
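The difference between counting rows and measuring distance in the ORDER BY value shows up as soon as the data has gaps. The following sketch is an illustration only: it runs on SQLite through Python's sqlite3 module rather than Hive, and the simplified column names `employee_id` and `date_diff` plus the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abs_wd (employee_id INTEGER, date_diff INTEGER)")
conn.executemany(
    "INSERT INTO abs_wd VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 10), (1, 30), (1, 31)],
)

# by_value: rows whose date_diff lies 1 to 28 below the current value.
# by_rows:  up to 28 physical rows before the current one, whatever
#           their date_diff values are.
query = """
SELECT date_diff,
       COUNT(*) OVER by_value AS range_count,
       COUNT(*) OVER by_rows  AS rows_count
FROM abs_wd
WINDOW by_value AS (PARTITION BY employee_id ORDER BY date_diff
                    RANGE BETWEEN 28 PRECEDING AND 1 PRECEDING),
       by_rows  AS (PARTITION BY employee_id ORDER BY date_diff
                    ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
ORDER BY date_diff
"""
comparison = conn.execute(query).fetchall()
for row in comparison:
    print(row)
```

For date_diff 30 and 31 the RANGE frame drops the oldest rows because they are more than 28 away, while the ROWS frame keeps every earlier row. Both frames exclude the current row because they end at 1 PRECEDING.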

Redshift SQL - Running Sum using Unbounded Preceding and Following

When we use a window function to calculate a running sum, like SUM(sales) over (partition by dept order by date), and we don't specify the range/window, is the default frame between unbounded preceding and current row, that is, from the first row up to the current row?
According to this doc it seems to be the case, but I wanted to double check.
Thanks!

The problem you are running into is "what does the database engine assume in ambiguous circumstances?" I've run into this exact case before when porting from SQL Server to Redshift. SQL Server assumes that if you order but don't specify a frame, you want unbounded preceding to current row. Other databases do not make the same assumption: for some, an unspecified frame means unbounded preceding to unbounded following, and yet others will throw an error if you specify an "order by" but no frame. Bottom line: don't let the database engine guess what you want; be specific.
Gordon is correct that this is based on rows, not ranges. If you want a running sum by date (not by row), you can group by date and run the window function over the grouped result; window functions execute after GROUP BY within a single query.
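A small experiment makes the "be specific" advice concrete. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than Redshift, and the table `sales` with columns `dept`, `sale_date` and `sales` is made up. It writes both frames out explicitly, then computes a per-date running sum by grouping first (here in a CTE) and windowing over the grouped result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dept TEXT, sale_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "2020-01-01", 10), ("A", "2020-01-01", 10), ("A", "2020-01-02", 5)],
)

# Explicit frames: ROWS stops at the current physical row, while RANGE
# also includes every other row with the same sale_date (its peers).
explicit = conn.execute("""
SELECT sale_date, sales,
       SUM(sales) OVER (PARTITION BY dept ORDER BY sale_date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_rows,
       SUM(sales) OVER (PARTITION BY dept ORDER BY sale_date
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_range
FROM sales
ORDER BY sale_date, by_rows
""").fetchall()

# Running sum per date: aggregate to one row per date first, then window.
per_date = conn.execute("""
WITH daily AS (
    SELECT sale_date, SUM(sales) AS day_total
    FROM sales
    GROUP BY sale_date
)
SELECT sale_date, day_total,
       SUM(day_total) OVER (ORDER BY sale_date
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily
ORDER BY sale_date
""").fetchall()

print(explicit)
print(per_date)
```

With two rows on the same date, the ROWS and RANGE running sums disagree on the first of them, which is exactly the kind of difference that stays hidden when the frame is left to the engine's default.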

LAST_VALUE() rows between unbounded preceding and unbounded following

What is the use of this statement?
Please elaborate with an example. I came across it while using the LAST_VALUE function.

From https://forums.oracle.com/forums/thread.jspa?threadID=1018352:
When you ORDER a set of records in analytic functions, you can specify a range of rows to consider, ignoring the others.
You can do this using the ROWS clause:
UNBOUNDED PRECEDING: the range starts at the first row of the partition.
UNBOUNDED FOLLOWING: the range ends at the last row of the partition.
CURRENT ROW: the range begins or ends at the current row.
n PRECEDING or n FOLLOWING: the range starts or ends n rows before or after the current row.

This is explained quite well in the manual:
http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
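Since the question asks for an example, here is a small sketch of why the frame matters for LAST_VALUE. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, and the table `emp` with its columns is made up. SQLite's default frame when ORDER BY is present ends at the current row, so LAST_VALUE without an explicit frame just returns the current row's own value here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("d1", "a", 100), ("d1", "b", 200), ("d1", "c", 300)],
)

# default_frame:   the frame ends at the current row, so the "last" value
#                  is the current row's salary.
# whole_partition: the frame spans the entire partition, so the "last"
#                  value is the highest salary in the department.
query = """
SELECT name,
       LAST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary) AS default_frame,
       LAST_VALUE(salary) OVER (
           PARTITION BY dept ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS whole_partition
FROM emp
ORDER BY salary
"""
last_values = conn.execute(query).fetchall()
for row in last_values:
    print(row)
```

Only the version with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING returns the same partition-wide last value on every row, which is usually what people expect from LAST_VALUE.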
