Postgres Window Function Syntax - sql

Why does the following query:
select ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY time DESC) as rownum FROM users where rownum < 20;
produce the following error?
ERROR: column "rownum" does not exist
LINE 1: ...d ORDER BY time DESC) as rownum FROM users where rownum < 2...
How can I structure this query so that I get the first 20 items, as defined by my window function?
user_id and time are both defined columns on users.

It would work like this:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY time DESC) AS rownum
FROM users
) x
WHERE rownum < 20;
The point here is the order of evaluation. Window functions are applied after the WHERE clause, so rownum is not visible yet when WHERE is evaluated. You have to compute it in a subquery or CTE and apply the WHERE condition on rownum at the next query level.
Per documentation:
Window functions are permitted only in the SELECT list and the ORDER BY
clause of the query. They are forbidden elsewhere, such as in GROUP BY,
HAVING and WHERE clauses. This is because they logically execute
after the processing of those clauses. Also, window functions execute
after regular aggregate functions. This means it is valid to include
an aggregate function call in the arguments of a window function, but
not vice versa.
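The subquery pattern can be tried end to end without a Postgres server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: the bundled SQLite is 3.25 or later, which supports window functions). The sample rows are made up for illustration, and the limit is lowered to 3 so the effect is visible on a tiny table.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (2, 5), (2, 15)],  # hypothetical sample data
)

# The window function is computed in the inner query;
# the outer query can then filter on its alias.
rows = conn.execute("""
    SELECT user_id, time, rownum
    FROM (
        SELECT user_id, time,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time DESC) AS rownum
        FROM users
    ) x
    WHERE rownum < 3
    ORDER BY user_id, rownum
""").fetchall()

for row in rows:
    print(row)
```

Each user keeps at most the two most recent rows, numbered per partition.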

The WHERE clause is evaluated before the SELECT list, so it does not know about that alias yet. Do it like this:
select *
from (
select ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY time DESC) as rownum
FROM users
) s
where rownum < 20;

Related

Rank Function inside case statement

I am trying to use the RANK function inside a CASE statement with the condition where rank_number = 1, but it throws an "unexpected WHERE condition" error. Can someone help me filter on the rank in a WHERE clause inside a CASE statement?
You can't use the RANK() analytic function (or any other one, for that matter) in the WHERE clause of a query. The results of the rank computation are not yet available. But they are available in the SELECT clause or the ORDER BY clause. One workaround would be to subquery:
SELECT *
FROM
(
SELECT t.*, RANK() OVER (ORDER BY blah) rnk
FROM yourTable t
) s
WHERE rnk = 1;
Some databases support a QUALIFY clause, where it is possible to use analytic functions. Assuming you are using something like Teradata or BigQuery, you could use:
SELECT *
FROM yourTable
WHERE 1 = 1
QUALIFY RANK() OVER (ORDER BY blah) = 1;

Select one random row by group (Oracle 10g)

This post is similar to this thread in that I have multiple observations per group. However, I want to randomly select only one of them. I am also working on Oracle 10g.
There are multiple rows per person_id in table df. I want to order each group of person_ids by dbms_random.value() and select the first observation from each group. To do so, I tried:
select
person_id, purchase_date
from
df
where
row_number() over (partition by person_id order by dbms_random.value()) = 1
The query returns:
ORA-30483: window functions are not allowed here
30483. 00000 - "window functions are not allowed here"
*Cause: Window functions are allowed only in the SELECT list of a query. And, window function cannot be an argument to another window or group function.
Use a subquery:
select person_id, purchase_date
from (select df.*,
row_number() over (partition by person_id order by dbms_random.value()) as seqnum
from df
) df
where seqnum = 1;
One option would be to use a WITH ... AS clause:
WITH t AS
(
SELECT df.*,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY dbms_random.value()) AS rn
FROM df
)
SELECT person_id, purchase_date
FROM t
WHERE rn = 1
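For readers without an Oracle instance, the same "number the rows in random order, keep row 1" idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite 3.25 or later, SQLite's random() in place of dbms_random.value(), and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (person_id INTEGER, purchase_date TEXT)")
sample = [  # hypothetical sample data
    (1, "2020-01-01"), (1, "2020-02-01"), (1, "2020-03-01"),
    (2, "2021-05-05"), (2, "2021-06-06"),
    (3, "2022-07-07"),
]
conn.executemany("INSERT INTO df VALUES (?, ?)", sample)

# Number the rows of each person in random order, then keep the first one.
picked = conn.execute("""
    WITH t AS (
        SELECT df.*,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY random()) AS rn
        FROM df
    )
    SELECT person_id, purchase_date
    FROM t
    WHERE rn = 1
    ORDER BY person_id
""").fetchall()

print(picked)  # one row per person_id; which date is chosen varies between runs
```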
Aggregate queries (using GROUP BY and aggregate functions) are often faster than equivalent analytic functions that do the same job. So, if you have a lot of data to process, or if the data is not excessively large but you must run this query often, you may want a more efficient query that uses aggregation instead of analytic functions.
Here is one possible approach:
select person_id,
max(purchase_date) keep (dense_rank first order by dbms_random.value())
as random_purchase_date
from df
group by person_id
;

can we get totalcount and last record from postgresql

I have a table with 23 records, and I am trying to get the total count of records and the last record in a single query, something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to get only the last record as well as the total count in PostgreSQL?
This can be done using window functions
select *
from (
select m.*,
row_number() over (order by createddate desc) as rn,
count(*) over () as total_count
from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
(select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration, and the table definition, this might be faster than the two window functions.
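Both approaches can be compared on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite 3.25 or later); the table layout and rows are made up, and only the column name createddate comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER, title TEXT, createddate TEXT)")
conn.executemany(
    "INSERT INTO music VALUES (?, ?, ?)",
    [(1, "a", "2021-01-01"), (2, "b", "2021-03-01"), (3, "c", "2021-02-01")],
)

# Variant 1: two window functions, then filter on the row number.
window_row = conn.execute("""
    SELECT id, title, createddate, total_count
    FROM (
        SELECT m.*,
               row_number() OVER (ORDER BY createddate DESC) AS rn,
               count(*) OVER () AS total_count
        FROM music m
    ) t
    WHERE rn = 1
""").fetchone()

# Variant 2: scalar subquery for the count, LIMIT 1 for the newest row.
scalar_row = conn.execute("""
    SELECT m.*, (SELECT count(*) FROM music) AS total_count
    FROM music m
    ORDER BY createddate DESC
    LIMIT 1
""").fetchone()

print(window_row)
print(scalar_row)
```

Both variants return the newest row together with the total row count; which one is faster on a real table depends on the factors mentioned above.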
No, it's not possible to do what is being asked; SQL does not work that way. The moment you ask for a count(), SQL changes the level of your data to an aggregation. The only way to do what you are asking is to do the count() and the ORDER BY in separate queries.
Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies on partitions defined by windows and not on groups defined by GROUP BY.
I did not run any tests, but I suspect my solution is the least efficient of the three already posted. It is also the closest to your example query so far.

Ambiguous column name using row_number() without alias

I'm trying to implement pagination in a query that is built using information from a view, and I need to use the row_number() function over a column when I don't know which table it is from.
SELECT * FROM (
SELECT class.ID as ID, user.ID as USERID, row_number() over (ORDER BY
ID desc) as row_number FROM class, user
) out_q WHERE row_number > @startrow ORDER BY row_number
The problem is that I only have the result column name (ID or USERID) that came from a previous query. If I execute this query, it will raise the error 'Ambiguous column name "ID"'. Is there a way to specify that I'm referencing the column ID that is being selected and not from a different table?
Is it possible to specify an alias to the query result itself?
I have already tried the following,
SELECT TOP 30 * FROM (
SELECT *, row_number() over (ORDER BY ID desc) as row_number FROM(
SELECT class.ID as ID, user.ID as USERID FROM class, user
) in_q
) out_q WHERE row_number > @startrow ORDER BY row_number
It works, but the DBMS gets confused about which query plan to use, because of the small row goal in the outer query and the large result set returned by the inner query. When @startrow is a small number, the query executes in less than one second; when it is a big number, the query takes minutes to execute.
Your problem is the id in the row_number itself. If you want a stable sort, then include both ids:
SELECT *
FROM (SELECT class.ID as ID, user.ID as USERID,
row_number() over (ORDER BY class.ID desc, user.id) as row_number
FROM class CROSS JOIN user
) out_q
WHERE row_number > @startrow
ORDER BY row_number;
I assume the cartesian product is intentional. Sometimes, this indicates an error in the query. In general, I would advise you to avoid using commas in the from clause. If you do want a cartesian product, then be explicit by using CROSS JOIN.
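The effect of the tie-breaking column can be checked on a small cross join. This is a rough sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption: SQLite 3.25 or later); the rows are made up, the alias is called row_num here, and a named bind parameter stands in for the start-row variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (ID INTEGER)")
conn.execute('CREATE TABLE "user" (ID INTEGER)')
conn.executemany("INSERT INTO class VALUES (?)", [(1,), (2,)])
conn.executemany('INSERT INTO "user" VALUES (?)', [(10,), (20,)])

# Ordering by both ids gives every row of the cartesian product a unique,
# repeatable position, so paging by row number is deterministic.
rows = conn.execute("""
    SELECT *
    FROM (SELECT class.ID AS ID, "user".ID AS USERID,
                 row_number() OVER (ORDER BY class.ID DESC, "user".ID) AS row_num
          FROM class CROSS JOIN "user"
         ) out_q
    WHERE row_num > :startrow
    ORDER BY row_num
""", {"startrow": 1}).fetchall()

for row in rows:
    print(row)
```

With class.ID alone in the ORDER BY, the two rows sharing class.ID = 2 could be numbered in either order; adding the user id removes that ambiguity.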
You could try using the option you already tried, then use the OPTIMIZE FOR hint.
OPTION ( OPTIMIZE FOR (@startrow = 100000) );
See a description of the hint in MSDN docs here: https://msdn.microsoft.com/en-us/library/ms181714.aspx.

SQL - Order after filtering

How can I order the data and then filter it in TSQL (SQL Server)?
I've tried something like this:
SELECT [Job].*,
ROW_NUMBER() OVER (ORDER BY [Job].[Date]) AS RowNum
FROM [Job]
ORDER BY Rank
WHERE RowNum >= @Start AND RowNum < @End
Doesn't work. I also tried to use a subquery, which throws:
The ORDER BY clause is invalid in
views, inline functions, derived
tables, subqueries, and common table
expressions, unless TOP or FOR XML is
also specified.
I don't want to use TOP or FOR XML.
How to solve this?
Use a CTE. Note, the "inner" ORDER BY in this case is implied by the ROW_NUMBER/OVER.
;WITH cBase AS
(
SELECT
[Job].*,
ROW_NUMBER() OVER (ORDER BY [Job].[Date]) AS RowNum
FROM
[Job]
)
SELECT
*
FROM
cBase
WHERE
RowNum >= @Start AND RowNum < @End
--ORDER BY
--output order
Edit:
Your search between @Start and @End is on the sequence generated by the ROW_NUMBER on date.
Rank has no relation to this sequence. Rank (assuming it's a column in the table) will be ignored because your sequence is on Date. You don't need to sort on it.
If "rank" is actually "RowNum" then you still don't need an "inner" sort because it's a set operation. You'll need it on the outermost sort though.
If rank is a secondary sort on Date then use this:
ROW_NUMBER() OVER (ORDER BY [Job].[Date], [Job].[Rank]) AS RowNum
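The CTE approach can be wrapped in a small paging helper. The following is a minimal sketch in Python with the built-in sqlite3 module instead of SQL Server (an assumption: SQLite 3.25 or later). The Job table contents, the fetch_page helper, and the bind parameter names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Job (Id INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO Job VALUES (?, ?)",
    [(i, f"2024-01-{i:02d}") for i in range(1, 11)],  # ten made-up jobs
)

def fetch_page(start_row, end_row):
    """Return rows whose RowNum falls in [start_row, end_row), ordered by RowNum."""
    return conn.execute("""
        WITH cBase AS (
            SELECT Job.*,
                   ROW_NUMBER() OVER (ORDER BY Job.Date) AS RowNum
            FROM Job
        )
        SELECT Id, Date, RowNum
        FROM cBase
        WHERE RowNum >= :start_row AND RowNum < :end_row
        ORDER BY RowNum  -- the outermost ORDER BY controls the output order
    """, {"start_row": start_row, "end_row": end_row}).fetchall()

page = fetch_page(4, 7)
print(page)
```

The numbering comes from the OVER (ORDER BY ...) clause inside the CTE, while the final ORDER BY only fixes the order in which the selected page is returned.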