Rank based on counts (HiveQL)

I want to rank conversation ids by their count of occurrences, so the first occurrence is ranked as 1, the second as 2, the third as 3, etc.
I am getting a compile error, so most likely something is off in my query:
select
conversationid,
rank() over (partition by conversationid order by count(*) desc) as rnk
from my_table
group by conversationid
Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: line 7:54 Not yet supported place for UDAF 'count'

If you want to rank conversations by their count, then you don't want a partition by clause in the window function:
select conversationid, rank() over(order by count(*) desc) rnk
from my_table
group by conversationid
This assigns rank 1 to the most frequent conversation(s).
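For anyone who wants to try the idea without a Hive cluster, here is a minimal sketch in Python using the standard sqlite3 module. SQLite (3.25 or later, for window functions) only stands in for Hive, the sample rows are made up, and the counts are computed in a derived table first to keep the sketch portable across engines.

```python
import sqlite3

# SQLite stands in for Hive here; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (conversationid TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",), ("d",)],
)

# Count rows per conversation in a derived table, then rank the counts.
# There is no PARTITION BY, so all conversations share a single ranking.
ranked = conn.execute(
    """
    SELECT conversationid, cnt, RANK() OVER (ORDER BY cnt DESC) AS rnk
    FROM (
        SELECT conversationid, COUNT(*) AS cnt
        FROM my_table
        GROUP BY conversationid
    ) AS counts
    ORDER BY rnk, conversationid
    """
).fetchall()

for row in ranked:
    print(row)
```

Conversations with the same count share a rank, and rank() leaves a gap after a tie; dense_rank() would not.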

Related

BigQuery/SQL: Select first row of each group

I am trying to select the first row of each group. E.g., from the table below I want to keep product for www/edf/ and cate for www/abc/.
I have tried multiple ways, e.g. ROW_NUMBER() OVER(PARTITION BY [...]), but I am not getting the expected output. The challenge for me is that the category column does not hold numeric values; otherwise I could filter it down using max or min.
SELECT landing_page, ROW_NUMBER OVER ( PARTITION BY landing_page ORDER BY Page_Type DESC) AS ROW_NUM
from `xxxx.TEST.draft`
However I got this error: OVER keyword must follow a function call
Appreciate any help!
Landing_page | Page_type
www/edf/     | product
www/edf/     | home
www/abc/     | cate
www/abc/     | home
I believe you are looking for the FIRST_VALUE function?
SELECT
landing_page,
FIRST_VALUE(URL)
OVER ( PARTITION BY landing_page ORDER BY Page_Type DESC) AS first_url
FROM `xxxx.TEST.draft`
I answered my own question; sharing the code in case anyone needs it:
SELECT session_id,landing_page, Page_Type, ROW_NUMBER () OVER (PARTITION BY session_id,landing_page ORDER BY Page_Type DESC) AS ROW_NUM
from `xxx._TEST.draft`
order by session_id desc
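To actually keep only the first row of each group, the row number still has to be filtered in an outer query. Below is a minimal sketch in Python with sqlite3 (SQLite 3.25+ standing in for BigQuery). The table name draft and the four rows mirror the sample in the question; session_id is left out because the sample table does not show it, which is an assumption of this sketch.

```python
import sqlite3

# SQLite stands in for BigQuery; the rows are the four shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE draft (landing_page TEXT, page_type TEXT)")
conn.executemany(
    "INSERT INTO draft VALUES (?, ?)",
    [
        ("www/edf/", "product"),
        ("www/edf/", "home"),
        ("www/abc/", "cate"),
        ("www/abc/", "home"),
    ],
)

# Number the rows inside each landing_page, then keep row number 1 only.
first_rows = conn.execute(
    """
    SELECT landing_page, page_type
    FROM (
        SELECT landing_page, page_type,
               ROW_NUMBER() OVER (
                   PARTITION BY landing_page ORDER BY page_type DESC
               ) AS row_num
        FROM draft
    ) AS numbered
    WHERE row_num = 1
    ORDER BY landing_page
    """
).fetchall()

for row in first_rows:
    print(row)
```

Note that ordering by page_type DESC keeps home for www/abc/, because 'home' sorts after 'cate'. Getting cate there would need an ORDER BY on a column that encodes the intended row order.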

Generate custom group ranking in sql

I am trying to generate a group ranking based on the Is_True_Mode column. Each 1 starts a new group, and the group continues until the next 1 appears. In the expected output, rows are grouped based on the Is_True_Mode column. The regular ranking is shown for reference (rows should be ordered by it).
You can identify the groups using a cumulative sum. Then you can use row_number() to enumerate the rows:
select t.*,
row_number() over (partition by grp order by regularranking) as expected_output
from (select t.*,
sum(is_true_mode) over (order by regularranking) as grp
from t
) t;
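Here is a minimal sketch of that query in Python with sqlite3 (SQLite 3.25+ assumed), to show how the running sum turns each 1 into a new group number. The eight sample rows are hypothetical, since the original data is not shown.

```python
import sqlite3

# Hypothetical sample data: a 1 in is_true_mode starts a new group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (regularranking INTEGER, is_true_mode INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0), (8, 0)],
)

# Inner query: the running sum of is_true_mode increases by one at every 1,
# so it works as a group id. Outer query: number the rows within each group.
rows = conn.execute(
    """
    SELECT regularranking, is_true_mode, grp,
           ROW_NUMBER() OVER (
               PARTITION BY grp ORDER BY regularranking
           ) AS expected_output
    FROM (
        SELECT t.*,
               SUM(is_true_mode) OVER (ORDER BY regularranking) AS grp
        FROM t
    ) AS g
    ORDER BY regularranking
    """
).fetchall()

for row in rows:
    print(row)
```

The running sum relies on regularranking being unique; with duplicates, rows that tie in the ORDER BY would receive the same cumulative value.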

Oracle top-N query: assign an increasing number to each row in the STUD table after sorting by dClass DESC

I am trying to write an Oracle top-N query that assigns an increasing number to each row in the STUD table after sorting by dClass DESC, and then returns only the first few rows.
select studName, dClass,row_number() over (order by dClass desc) rn
from STUD where row_number() over (order by dClass desc) <= 3 order by dClass desc;
Below is the error:
ERROR at line 6:
ORA-30483: window functions are not allowed here.
What do I have to change to get the records?
Use a subquery to restrict based on the row number you assign:
SELECT studName, dClass
FROM
(
SELECT studName, dClass, ROW_NUMBER() OVER (ORDER BY dClass desc) rn
FROM STUD
) t
WHERE rn <= 3
ORDER BY dClass DESC;
The reason for the error is that the WHERE clause is applied before the row number in the SELECT clause has been computed. In other words, the value is not yet available when the filter runs, and wrapping the query in a subquery gets around this problem.
If you are using Oracle 12c or later, this can also be done using FETCH FIRST n ROWS ONLY:
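The same behaviour can be reproduced outside Oracle. Below is a minimal sketch in Python with sqlite3 (SQLite 3.25+ standing in for Oracle; the student rows and the integer type of dClass are hypothetical). It first tries the window function directly in WHERE, which SQLite also rejects, and then runs the subquery form.

```python
import sqlite3

# SQLite stands in for Oracle; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUD (studName TEXT, dClass INTEGER)")
conn.executemany(
    "INSERT INTO STUD VALUES (?, ?)",
    [("Ann", 9), ("Bob", 12), ("Cid", 10), ("Dee", 11), ("Eve", 8)],
)

# A window function directly in WHERE is rejected: WHERE is evaluated
# before the window functions of the SELECT list exist.
try:
    conn.execute(
        "SELECT studName FROM STUD "
        "WHERE ROW_NUMBER() OVER (ORDER BY dClass DESC) <= 3"
    )
    rejected = False
except sqlite3.Error:
    rejected = True

# Computing the row number in a subquery makes it an ordinary column
# that the outer WHERE can filter on.
top3 = conn.execute(
    """
    SELECT studName, dClass
    FROM (
        SELECT studName, dClass,
               ROW_NUMBER() OVER (ORDER BY dClass DESC) AS rn
        FROM STUD
    ) AS t
    WHERE rn <= 3
    ORDER BY dClass DESC
    """
).fetchall()

print(rejected, top3)
```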
SELECT studName, dClass
FROM STUD
ORDER BY dClass DESC
FETCH FIRST 3 ROWS ONLY;

Selecting type(s) of account with 2nd maximum number of accounts

Suppose we have an accounts table that already contains some rows.
I want to find the type of account with the second highest number of accounts. In this case, the result should be 'FD'. If there is a tie for the second highest count, I need all of the tied types in the result.
I have no idea how to do it. I've found numerous posts about finding the second highest value (say, salary) in a table, but none about the second highest COUNT.
This can be done using CTEs. Get the counts for each type as the first step. Then use dense_rank (so that types with the same count share a rank in case of ties) to rank the types by their counts. Finally, select the rows ranked second.
with counts as (
select type, count(*) cnt
from yourtable
group by type)
, ranks as (
select type, dense_rank() over(order by cnt desc) rnk
from counts)
select type
from ranks
where rnk = 2;
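Below is a minimal sketch of the CTE approach in Python with sqlite3 (SQLite 3.25+ assumed). The acct_no column and the sample rows are hypothetical; they are chosen so that 'FD' and 'RD' tie for the second highest count, to show that dense_rank returns both.

```python
import sqlite3

# Hypothetical data: SB has 4 accounts, FD and RD have 3 each, CA has 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (acct_no INTEGER, type TEXT)")
types = ["SB"] * 4 + ["FD"] * 3 + ["RD"] * 3 + ["CA"] * 1
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    list(enumerate(types, start=1)),
)

# counts: one row per type; ranks: dense rank of each type by its count.
second = [
    row[0]
    for row in conn.execute(
        """
        WITH counts AS (
            SELECT type, COUNT(*) AS cnt
            FROM yourtable
            GROUP BY type
        ),
        ranks AS (
            SELECT type, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
            FROM counts
        )
        SELECT type
        FROM ranks
        WHERE rnk = 2
        ORDER BY type
        """
    )
]

print(second)
```

With row_number() instead of dense_rank(), only one of the tied types would come back, and which one is not deterministic unless the ORDER BY breaks the tie.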
One option is to use row_number() (or dense_rank(), depending on what "second" means when there are ties):
select a.*
from (select a.type, count(*) as cnt,
row_number() over (order by count(*) desc) as seqnum
from accounts a
group by a.type
) a
where seqnum = 2;
In Oracle 12c+, you can use offset/fetch:
select a.type, count(*) as cnt
from accounts a
group by a.type
order by count(*) desc
offset 1 row
fetch first 1 row only;

Limit result set in sql window function

Assume I would like to rewrite the following aggregate query
select id, max(hittime)
from status
group by id
using an aggregate windowing function like
select id, max(hittime) over(partition by id order by hittime desc) from status
How can I specify that I am only interested in the first result within the partition?
EDIT: I was thinking that there might be a solution with [ RANGE | ROWS ] BETWEEN frame_start AND frame_end. I want to get not only max(hittime) but also the second, third, and so on.
I think what you need is a ranking function, either ROW_NUMBER or DENSE_RANK depending on how you want to handle ties.
select id, hittime
from (
select id, hittime,
dense_rank() over(partition by id order by hittime desc) as ranking
from status
) as x
where ranking = 1; --to get max hittime
--where ranking <=2; --max and second largest
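Here is a minimal sketch of the ranking approach in Python with sqlite3 (SQLite 3.25+ assumed). The sample rows, the text type of hittime, and the helper name top_n are hypothetical; the cut-off is passed as a parameter so the same query returns either the max or the top two per id.

```python
import sqlite3

# Hypothetical sample data; hittime is stored as ISO date text so that
# string ordering matches chronological ordering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id INTEGER, hittime TEXT)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [
        (1, "2020-01-01"),
        (1, "2020-01-03"),
        (1, "2020-01-02"),
        (2, "2020-02-05"),
        (2, "2020-02-04"),
    ],
)


def top_n(n):
    """Return the n most recent hittime values per id (hypothetical helper)."""
    return conn.execute(
        """
        SELECT id, hittime
        FROM (
            SELECT id, hittime,
                   DENSE_RANK() OVER (
                       PARTITION BY id ORDER BY hittime DESC
                   ) AS ranking
            FROM status
        ) AS x
        WHERE ranking <= ?
        ORDER BY id, hittime DESC
        """,
        (n,),
    ).fetchall()


print(top_n(1))  # max hittime per id
print(top_n(2))  # max and second largest per id
```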
Use DISTINCT to collapse the repeated rows:
select DISTINCT id, max(hittime) over(partition by id order by hittime desc) from status