ROW_NUMBER() with QUALIFY clause in Vertica

select a.pm_id, a.pm_name
from loc_table a
qualify row_number() over(partition by pm_id order by pm_name asc) =1;
Can we write it this way in Vertica? I tried it, but Vertica does not accept the QUALIFY keyword and seems to expect the FROM clause at the end.
Can anybody explain what the above query does and how we can achieve the same result in Vertica?

Vertica does not have the QUALIFY clause.
What it does have is the analytic LIMIT clause (LIMIT n OVER (...)).
Rewrite your query as below; if you need this often, a simple global search-and-replace will do:
SELECT
a.pm_id
, a.pm_name
FROM loc_table a
LIMIT 1 OVER(PARTITION BY pm_id ORDER BY pm_name ASC);
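To make the semantics of LIMIT n OVER (PARTITION BY ... ORDER BY ...) concrete, here is a pure-Python sketch (not Vertica code) that keeps at most n rows per partition after ordering within it. The helper name `limit_over` and the sample rows are hypothetical. One assumed difference: among rows with equal ORDER BY values this sketch keeps input order, whereas Vertica's choice among ties should not be relied on.

```python
from itertools import groupby
from operator import itemgetter


def limit_over(rows, k, partition_key, order_key):
    """Keep the first k rows of each partition, mimicking
    LIMIT k OVER (PARTITION BY ... ORDER BY ...)."""
    # Sort by partition first, then by the ordering column inside each partition.
    # sorted() is stable, so ties on the order key keep their input order.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    result = []
    # groupby() walks consecutive rows that share a partition key.
    for _, group in groupby(ordered, key=partition_key):
        result.extend(list(group)[:k])
    return result


# Hypothetical (pm_id, pm_name) rows standing in for loc_table.
loc_table = [(1, "zeta"), (1, "alpha"), (2, "beta"), (2, "gamma"), (3, "delta")]
print(limit_over(loc_table, 1, itemgetter(0), itemgetter(1)))
# [(1, 'alpha'), (2, 'beta'), (3, 'delta')]
```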

I think you need a subquery in Vertica:
select pm_id, pm_name
from (select l.pm_id, l.pm_name,
row_number() over (partition by pm_id order by pm_name asc) as seqnum
from loc_table l
) l
where seqnum = 1;
This is essentially what QUALIFY does: just as HAVING filters on aggregated columns, QUALIFY filters on the results of window functions.
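The subquery rewrite can be tried locally. The sketch below assumes SQLite (via Python's built-in sqlite3 module, SQLite 3.25 or newer for window functions) as a stand-in for Vertica, with made-up rows; only the query shape carries over. An ORDER BY pm_id is added purely to make the printed output deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Vertica (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc_table (pm_id INTEGER, pm_name TEXT)")
conn.executemany(
    "INSERT INTO loc_table VALUES (?, ?)",
    [(1, "zeta"), (1, "alpha"), (2, "beta"), (2, "gamma"), (3, "delta")],
)

# Subquery version of QUALIFY ROW_NUMBER() ... = 1: number the rows inside
# each pm_id partition, then keep only the first row of each partition.
query = """
SELECT pm_id, pm_name
FROM (SELECT l.pm_id, l.pm_name,
             ROW_NUMBER() OVER (PARTITION BY pm_id ORDER BY pm_name ASC) AS seqnum
      FROM loc_table l) AS ranked
WHERE seqnum = 1
ORDER BY pm_id
"""
first_per_pm = conn.execute(query).fetchall()
print(first_per_pm)  # [(1, 'alpha'), (2, 'beta'), (3, 'delta')]
```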

Related

Get last two rows from a row_number() window function in Snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach can be extended to get two rows per partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can also use TOP with ORDER BY ... DESC (this returns the last two rows overall, not per partition):
SELECT TOP 2 t.*, ROW_NUMBER() OVER (ORDER BY ...) AS rn
FROM tab t
ORDER BY rn DESC;
I'd say Shmiel's answer is the formal and elegant way. Just in case, the following is equivalent:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Use ORDER BY [order_condition] DESC, then filter on RN (the row number) to select as many rows per partition as you want.
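The CTE approach can be checked with a small local run. This sketch assumes SQLite (Python's sqlite3, 3.25+) instead of Snowflake, which is why it uses the CTE rather than QUALIFY; the rows in Mytable are invented for illustration, and the final ORDER BY exists only to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (product TEXT, user_id INTEGER)")
# Hypothetical data: user 1 has three products, user 2 two, user 3 one.
conn.executemany("INSERT INTO Mytable VALUES (?, ?)", [
    ("a", 1), ("b", 1), ("c", 1),
    ("x", 2), ("y", 2),
    ("m", 3),
])

# Number rows per user in descending product order, then keep RN 1 and 2,
# i.e. the last two rows of each partition in ascending order.
query = """
WITH CTE AS (
    SELECT product, user_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY product DESC) AS RN
    FROM Mytable
)
SELECT product, user_id
FROM CTE
WHERE RN < 3
ORDER BY user_id, product DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('c', 1), ('b', 1), ('y', 2), ('x', 2), ('m', 3)]
```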

MAX TO_DATE OVER two IDs of an IF statement

I'm connecting to Amazon Redshift and running the following query:
Select b.*, c."releasedate",
DENSE_RANK() OVER(PARTITION BY b.originboardid ORDER BY TO_DATE(SUBSTRING(b.sprintenddate,0,9), 'DD/Mon/YY') DESC) AS "rank_sprint",
DENSE_RANK() OVER(PARTITION BY b.originboardid ORDER BY TO_DATE(c.releasedate, 'YYYY-MM-DD') DESC) AS "rank_release",
RANK() OVER (ORDER BY b.issueid, b.sprintid DESC) as "rank_issue",
MAX(IF (b.issueorigin='completed') AND (b.changeto='In Progress') and (b.changefield='status')
max(TO_DATE(SUBSTRING(b.changecreation,0,10),'YYYY-MM-DD')) OVER(b.issueid,b.sprintid)
) OVER (b.issueid,b.sprintid) as "lastinprogress"
from digitalplatforms.issues_braze b
Left join jira.releases c
On b.version_id=c.versionid
and it outputs the following error:
[Amazon](500310) Invalid operation: syntax error at or near "max"
Position: 459;
However, if I run just this:
Select b.*, c."releasedate",
DENSE_RANK() OVER(PARTITION BY b.originboardid ORDER BY TO_DATE(SUBSTRING(b.sprintenddate,0,9), 'DD/Mon/YY') DESC) AS "rank_sprint",
DENSE_RANK() OVER(PARTITION BY b.originboardid ORDER BY TO_DATE(c.releasedate, 'YYYY-MM-DD') DESC) AS "rank_release",
RANK() OVER (ORDER BY b.issueid, b.sprintid DESC) as "rank_issue"
from digitalplatforms.issues_braze b
Left join jira.releases c
On b.version_id=c.versionid
it works.
Can someone help?
Thank you
There is no "IF" statement in a SQL query; SQL is not procedural. You need to rewrite your query using CASE or DECODE expressions.
Also, you cannot nest window functions. If your logic requires this, they need to operate at different levels of the query (i.e., in separate SELECTs, such as a subquery and its outer query). However, are you sure both of these need to be window functions (MAX() OVER vs. MAX())? They use the same OVER clause, so I expect not.
Just guessing based on your query, but does this give you what you want?
MAX(DECODE((b.issueorigin = 'completed' AND b.changeto = 'In Progress' AND b.changefield = 'status'), true,
           TO_DATE(SUBSTRING(b.changecreation, 1, 10), 'YYYY-MM-DD')
   )) OVER (PARTITION BY b.issueid, b.sprintid) AS "lastinprogress"
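The idea behind the suggestion is a conditional aggregate used as a window function: the inner expression yields a date only for matching rows (NULL otherwise), and MAX() OVER (PARTITION BY ...) ignores the NULLs. The sketch below assumes SQLite (Python's sqlite3, 3.25+) rather than Redshift, so it uses CASE instead of DECODE and keeps ISO date strings (which sort correctly as text) instead of TO_DATE; the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE issues (
    issueid INTEGER, sprintid INTEGER, issueorigin TEXT,
    changeto TEXT, changefield TEXT, changecreation TEXT)""")
# Hypothetical change log: issue 10 moved to In Progress twice, issue 20 never qualifies.
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?)", [
    (10, 1, "completed", "In Progress", "status", "2021-03-02T09:00:00"),
    (10, 1, "completed", "In Progress", "status", "2021-03-05T14:30:00"),
    (10, 1, "completed", "Done",        "status", "2021-03-09T08:00:00"),
    (20, 1, "added",     "In Progress", "status", "2021-03-04T10:00:00"),
])

# CASE returns the date part only for matching rows; MAX() OVER the
# (issueid, sprintid) partition then picks the latest one for every row.
query = """
SELECT issueid, sprintid, changecreation,
       MAX(CASE WHEN issueorigin = 'completed'
                 AND changeto = 'In Progress'
                 AND changefield = 'status'
                THEN substr(changecreation, 1, 10)
           END) OVER (PARTITION BY issueid, sprintid) AS lastinprogress
FROM issues
ORDER BY issueid, changecreation
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

All three rows of issue 10 should carry '2021-03-05', while issue 20 gets None because no row in its partition matches.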

Windowed functions cannot be used in the context of another windowed function or aggregate (SQL Server error)

I am trying to get the row number for the rank. Below is the query:
SELECT *
FROM (
SELECT DISTINCT TOP 100 PERCENT rank() OVER (
PARTITION BY o.panel_id
,o.combo_type_code ORDER BY row_number() OVER (
ORDER BY o.panel_id
)
) AS rank
,panel_code
FROM tbk_offer_head o
,tbk_combo_type ct
,tbk_panel p
WHERE o.panel_id = p.panel_id
AND o.combo_type_code = ct.combo_type_code
AND o.panel_id IN (
SELECT p.panel_id
FROM tbk_panel p
WHERE p.campaign_id = 7392
)
) A
WHERE A.rank = 1
ORDER BY panel_code
I'm getting the error "Windowed functions cannot be used in the context of another windowed function or aggregate." How can I solve this problem?
I have no idea what you are really trying to do, but you should definitely learn to use proper, explicit JOIN syntax.
However, there is no need to nest the functions. Your logic should be equivalent to:
row_number() over (partition by o.panel_id, o.combo_type_code
order by o.panel_id
) as rank
Why does this use row_number() instead of rank()? Your original ORDER BY used row_number(), which never produces duplicates. Hence, if rank() could order by it, the ordering values would all be distinct, and rank() would be equivalent to row_number(), even when panel_id is duplicated.
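Both points can be checked with a quick experiment: computing the inner row_number() in a subquery is the legal way to "nest" window functions, and ranking over that unique value gives the same numbers as row_number(). This sketch assumes SQLite (Python's sqlite3, 3.25+) in place of SQL Server, and the offers table is a hypothetical, simplified stand-in for tbk_offer_head.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (panel_id INTEGER, combo_type_code TEXT)")
# Hypothetical rows with duplicated panel_id values.
conn.executemany("INSERT INTO offers VALUES (?, ?)", [
    (1, "A"), (1, "A"), (1, "B"), (2, "A"), (2, "A"), (2, "A"),
])

# Inner query: compute the row_number() the original tried to nest.
# Outer query: rank() over that value, next to a plain row_number().
query = """
SELECT panel_id, combo_type_code,
       RANK()       OVER (PARTITION BY panel_id, combo_type_code ORDER BY rn) AS rnk,
       ROW_NUMBER() OVER (PARTITION BY panel_id, combo_type_code ORDER BY rn) AS rownum
FROM (SELECT panel_id, combo_type_code,
             ROW_NUMBER() OVER (ORDER BY panel_id) AS rn
      FROM offers) AS numbered
"""
rows = conn.execute(query).fetchall()
# rn is unique, so rank() never sees ties and matches row_number().
print(all(rnk == rownum for _, _, rnk, rownum in rows))  # True
```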

Order groups by partition size in sql?

I'm trying to select the group_items belonging to the N largest groups (rows sharing the same grouping_attribute) from a table. I'm doing something like this:
SELECT grouping_attribute, group_item,
ROW_NUMBER() OVER (PARTITION BY grouping_attribute ORDER BY ???) AS rn
FROM a_table
WHERE rn < N;
But I don't know what to put in the ORDER BY clause to make it happen. I'm trying to order the rows by the size of their corresponding partitions. COUNT(*) doesn't run. I was hoping there was some way to refer to the size of the partition, but I can't find anything.
If I understand correctly, you want count(*), not row_number(). Use count(*) as a window function to get the size of each partition, and then order the resulting rows by it. For instance:
SELECT a.*
FROM (SELECT grouping_attribute, group_item,
COUNT(*) over (partition by grouping_attribute) as cnt
FROM a_table
) a
ORDER BY cnt DESC;
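To go from ordering by partition size to keeping only the top N groups, one option (an extension, not part of the answer above) is to rank the groups with DENSE_RANK() over cnt, adding grouping_attribute as a tie-breaker so that equal-sized groups get distinct ranks and exactly N groups survive. The sketch assumes SQLite (Python's sqlite3, 3.25+); the data and N = 2 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (grouping_attribute TEXT, group_item INTEGER)")
# Hypothetical groups of size 3, 2 and 1.
conn.executemany("INSERT INTO a_table VALUES (?, ?)", [
    ("red", 1), ("red", 2), ("red", 3),
    ("blue", 4), ("blue", 5),
    ("green", 6),
])

N = 2  # number of largest groups to keep (illustrative value)
# Innermost: partition size per row. Middle: one dense rank per group,
# ordered by size (ties broken by name). Outer: keep the first N groups.
query = """
SELECT grouping_attribute, group_item
FROM (SELECT a.*,
             DENSE_RANK() OVER (ORDER BY cnt DESC, grouping_attribute) AS grp_rank
      FROM (SELECT grouping_attribute, group_item,
                   COUNT(*) OVER (PARTITION BY grouping_attribute) AS cnt
            FROM a_table) a) ranked
WHERE grp_rank <= ?
ORDER BY grp_rank, group_item
"""
rows = conn.execute(query, (N,)).fetchall()
print(rows)  # [('red', 1), ('red', 2), ('red', 3), ('blue', 4), ('blue', 5)]
```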

Query that uses rank() needs optimization

select * from
(
Select DISTINCT
DocManREPORT_View.DOCINPUTDATE,
DocManREPORT_View.REACTIVATEDATE,
DocManREPORT_View.TRACENO,
DocManREPORT_View.CLIENTNAME,
DocManREPORT_View.DOCUMENTID,DocManREPORT_View.BARCODEID,
DocManREPORT_View.INPUTMODE,
DocManREPORT_View.INPUTSOURCE,PI.start_time,
RANK() OVER (PARTITION BY process_instance_id
ORDER BY last_modified_date desc) rank,
PI.STATUS AS PROCESSSTATUS
FROM DocManREPORT_View
INNER JOIN PROCESS_INSTANCE PI ON
(pi.instance_id = DocManREPORT_View.process_instance_id)
)
where rank = 1;
I suspect the DISTINCT clause could hurt performance. I would recommend getting rid of it by including the relevant columns in the PARTITION BY clause, and then checking what you get.
If you can, try to move the
RANK() OVER (PARTITION BY process_instance_id
ORDER BY last_modified_date desc) rank,
expression inside the view, since I think the view already has all the data needed to perform this step there.
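A minimal sketch of that suggestion, assuming SQLite (Python's sqlite3, 3.25+) in place of the original database and assuming last_modified_date is a column available inside the view: the ranking is computed in the view, and the outer query only joins and filters on rnk = 1, with no DISTINCT. The tables docman and process_instance and the view docman_report_view are hypothetical, heavily simplified stand-ins; whether this actually runs faster depends on the optimizer of the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process_instance (instance_id INTEGER, status TEXT)")
conn.execute("""CREATE TABLE docman (
    documentid INTEGER, process_instance_id INTEGER, last_modified_date TEXT)""")
conn.executemany("INSERT INTO process_instance VALUES (?, ?)",
                 [(100, "RUNNING"), (200, "DONE")])
conn.executemany("INSERT INTO docman VALUES (?, ?, ?)", [
    (1, 100, "2020-01-01"), (2, 100, "2020-02-01"), (3, 200, "2020-01-15"),
])

# The ranking lives inside the view, so callers only add a simple filter.
conn.execute("""
CREATE VIEW docman_report_view AS
SELECT d.documentid, d.process_instance_id, d.last_modified_date,
       RANK() OVER (PARTITION BY d.process_instance_id
                    ORDER BY d.last_modified_date DESC) AS rnk
FROM docman d
""")

# Latest document per process instance, joined to its status.
query = """
SELECT v.documentid, v.process_instance_id, pi.status
FROM docman_report_view v
JOIN process_instance pi ON pi.instance_id = v.process_instance_id
WHERE v.rnk = 1
ORDER BY v.process_instance_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 100, 'RUNNING'), (3, 200, 'DONE')]
```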