How to write a derived query in Netezza SQL?

I need to query the data per inviteid: for each inviteid I need the top 5 IDs and ID descriptions.
The query I wrote takes a very long time to return. I don't see an error or anything obviously wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A

I'm not sure I understood your requirement completely, but as far as I can tell the GROUP BY in the derived table is not necessary (nor is the ORDER BY, as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;
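If the goal really is "top 5 IDs for each inviteid", the window probably needs a PARTITION BY so that the numbering restarts per inviteid. The following is only a sketch under stated assumptions: it runs against an in-memory SQLite database (3.25 or later) rather than Netezza, the sample rows are made up, and ordering by ID is a guess at what "top" means here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact_s (ID INTEGER, IDdesc TEXT, inviteid INTEGER)")

# Hypothetical sample data: invite 1 has seven IDs, invite 2 has three.
sample = [(i, f"desc{i}", 1) for i in range(1, 8)]
sample += [(i, f"desc{i}", 2) for i in range(10, 13)]
conn.executemany("INSERT INTO Fact_s VALUES (?, ?, ?)", sample)

# PARTITION BY restarts the numbering for every inviteid;
# ORDER BY ID is an assumed definition of "top".
top5_per_invite = conn.execute(
    """
    SELECT inviteid, ID, IDdesc
    FROM (
        SELECT ID, IDdesc, inviteid,
               row_number() OVER (PARTITION BY inviteid ORDER BY ID) AS rn
        FROM Fact_s
    ) B
    WHERE rn <= 5
    ORDER BY inviteid, ID
    """
).fetchall()

for row in top5_per_invite:
    print(row)
```

Without the PARTITION BY, rn <= 5 keeps only five rows from the whole table, not five per inviteid.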

Related

Get last two rows from a row_number() window function in snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say @Shmiel's answer is the formal and elegant way. Just in case, it would be the same as:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Use ORDER BY [order_condition] with DESC inside the window, then filter on RN (the row number) to select as many rows as you want.
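For the literal case in the question (keep the rows numbered 5 and 6 out of 6), one more option is to number the rows in ascending order and compare each number against the total row count. This is a sketch, not Snowflake code: it uses SQLite (3.25 or later), which has no QUALIFY, so the filter sits in an outer query; the table tab and column val are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (val TEXT)")
conn.executemany("INSERT INTO tab VALUES (?)", [(c,) for c in "abcdef"])

# Number the rows ascending and carry the total count on every row,
# then keep the rows whose number falls in the last two positions.
last_two = conn.execute(
    """
    SELECT val, rn
    FROM (
        SELECT val,
               ROW_NUMBER() OVER (ORDER BY val) AS rn,
               COUNT(*) OVER () AS cnt
        FROM tab
    ) numbered
    WHERE rn > cnt - 2
    ORDER BY rn
    """
).fetchall()

print(last_two)
```

This keeps the original ascending row numbers in the output, which the descending-order variants do not.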

Rank() based on column entries while the data is ordered by date

I'm trying to use dense_rank() function over the pagename column after the data is ordered by time_id.
Expected output in rank column, rn, is: [1,2,2,3,4].
Currently I wrote it as:
with tbl2 as
(select UID, pagename, date_id, time_id, source--, dense_rank() over(partition by UID order by pagename) as rn
from tbl1
order by time_id)
select *, dense_rank() over(partition by UID order by time_id, pagename) as rn
from tbl2
Any help would be appreciated
Edit 1: What I am trying to achieve here is to give ranks, as per the user on-screen action flow, to the pages that are visited. Suppose if the same page 'A' is visited back after visiting a different page 'B' then the ranks for these page visits A, B, A will be 1,2,3 (note that the same page A has different ranks 1 & 3)
step-by-step demo:db<>fiddle
SELECT
*,
SUM(is_diff) OVER (ORDER BY date_id, time_id, page)
FROM (
SELECT
*,
CASE WHEN page = lag(page) over (order by date_id, time_id) THEN 0 ELSE 1 END as is_diff
FROM mytable
)s
This looks exactly like a problem I asked some years ago: Window functions: PARTITION BY one column after ORDER BY another
You want to execute a window function on columns (uuid, page) but want to keep the current order which is given by unrelated columns (date_id, time_id).
The problem is, that PARTITION BY orders the records before the ORDER BY clause. So, it defines the primary order and this is not expected.
I once found a solution for that and adapted it to your use case. Please read the explanation over there: https://stackoverflow.com/a/52439794/3984221
Interesting part: Your special rank() case is not explicitly required in the query, because my solution creates that out-of-the-box ("by accident" so-to-speak ;) ).
Hmmm . . . If you want the pages ordered by their earliest time, then use two levels of window functions:
select t.*,
dense_rank() over (partition by uid order by min_rn, pagename) as ranking
from (select t.*,
min(rn) over (partition by uid, pagename) as min_rn
from t
) t
Note: This uses rn as a convenient shortcut because the date/time is split into two columns. You can also combine them:
select t.*,
dense_rank() over (partition by uid order by min_dt, pagename) as ranking
from (select t.*,
min(date_id || time_id) over (partition by uid, pagename) as min_dt
from t
) t;
Note: This solution is different from S_man's. On your sample data, they do the same thing. However, if the user returns to a page, his solution gives the page a new ranking, whereas this one gives the page the same ranking as the first time it appears. It is not clear which you really want.
You can use DENSE_RANK() like this for your requirement:
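The difference between the two approaches shows up on the A, B, A sequence from the question's edit. The sketch below is an illustration under assumptions: it uses SQLite (3.25 or later), a hypothetical visits table with a single time_id column instead of the split date_id/time_id, and partitions both queries by uid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (uid INTEGER, pagename TEXT, time_id INTEGER)")
# One user visits A, then B, then A again.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "A", 1), (1, "B", 2), (1, "A", 3)],
)

# Approach 1: flag every change of page with LAG, then take a running sum.
island_ranks = conn.execute(
    """
    SELECT pagename,
           SUM(is_diff) OVER (PARTITION BY uid ORDER BY time_id) AS rn
    FROM (
        SELECT uid, pagename, time_id,
               CASE WHEN pagename = LAG(pagename) OVER (PARTITION BY uid ORDER BY time_id)
                    THEN 0 ELSE 1 END AS is_diff
        FROM visits
    ) s
    ORDER BY time_id
    """
).fetchall()

# Approach 2: rank pages by the time they were first seen.
first_seen_ranks = conn.execute(
    """
    SELECT pagename,
           DENSE_RANK() OVER (PARTITION BY uid ORDER BY min_t, pagename) AS ranking
    FROM (
        SELECT uid, pagename, time_id,
               MIN(time_id) OVER (PARTITION BY uid, pagename) AS min_t
        FROM visits
    ) t
    ORDER BY time_id
    """
).fetchall()

print(island_ranks)      # a revisit gets a new rank
print(first_seen_ranks)  # a revisit reuses the rank of the first visit
```

Given the edit in the question (A, B, A should rank 1, 2, 3), the first approach appears to match the stated requirement.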
SELECT
u_id,
page_name,
date_id,
time_id,
source,
DENSE_RANK()
OVER (
PARTITION BY page_name
ORDER BY u_id DESC
) rn
FROM ( SELECT * FROM tbl1 ORDER BY time_id ) AS result;

MS SQL add max()-1 to query

How can I add the max(o.Acct)-1 rows to the query? I need to show the last two o.Acct rows. My query currently shows only max(o.Acct):
SELECT Max(o.Acct) AS [MaxAcct],o.ObjectID,o.Opertype
FROM Operations o
GROUP By o.ObjectID,o.Opertype
If you want to see the last two rows (per group), you're better off using ROW_NUMBER() rather than GROUP BY.
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ObjectID,
Opertype
ORDER BY Acct DESC
)
AS sequence_id
FROM
Operations
)
sortedOperations
WHERE
sequence_id <= 2
ORDER BY
ObjectID,
Opertype,
Acct
If you want the last two of something, I'm thinking order by and top. Something like this:
select top (2) o.*
from Operations o
order by o.acct desc;
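The two answers return different things: ROW_NUMBER() with PARTITION BY gives the last two rows for every (ObjectID, Opertype) group, while TOP (2) gives the last two rows of the whole table. A small sketch of the difference, assuming SQLite (3.25 or later) in place of SQL Server, LIMIT 2 in place of TOP (2), and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Operations (Acct INTEGER, ObjectID INTEGER, Opertype TEXT)")
conn.executemany(
    "INSERT INTO Operations VALUES (?, ?, ?)",
    [(1, 10, "x"), (2, 10, "x"), (3, 10, "x"), (7, 20, "y")],
)

# Last two Acct values inside each (ObjectID, Opertype) group.
last_two_per_group = conn.execute(
    """
    SELECT ObjectID, Opertype, Acct
    FROM (
        SELECT ObjectID, Opertype, Acct,
               ROW_NUMBER() OVER (PARTITION BY ObjectID, Opertype
                                  ORDER BY Acct DESC) AS sequence_id
        FROM Operations
    ) sortedOperations
    WHERE sequence_id <= 2
    ORDER BY ObjectID, Opertype, Acct
    """
).fetchall()

# Last two Acct values overall (SQLite's LIMIT stands in for TOP).
last_two_overall = conn.execute(
    "SELECT ObjectID, Opertype, Acct FROM Operations ORDER BY Acct DESC LIMIT 2"
).fetchall()

print(last_two_per_group)
print(last_two_overall)
```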

Need help to find the middle row using Row_Number

SELECT median.spaid
,median.total
,ROW_NUMBER() OVER (
ORDER BY median.total
) AS row
FROM (
SELECT SpaID
,COUNT(1) AS Total
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID
) AS median
ORDER BY median.total
My issue here is that I need to find the middle row for column "Total" using Row_number. I need to find which "SpaID" is linked to the middle row of the "Total" column.
This is a shot in the dark based on very sparse details but I think you are looking for something like this.
with numberedResults as
(
select spaid
, ROW_NUMBER() over(order by count(*)) as RowNum
from [order]
where DateCreated between '20140401' AND '20140430'
group by SpaID
)
, Medians as
(
select MAX(RowNum) / 2 as Median
, MAX(RowNum) as TotalCount
from numberedResults
)
select *
from numberedResults r
join Medians m on m.Median = r.RowNum
I would suggest not relying on ROW_NUMBER in your query, because its results can be unpredictable when several rows tie on the ordering value. I understand the query below seems bulky; the challenge is that the "median" is the middle of the grouped rows. Here's the query I believe should work for you:
SELECT SpaID, d
FROM (SELECT SpaID,
             DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
      FROM dbo.[Order]
      WHERE DateCreated BETWEEN '04-01-2014'
                            AND '04-30-2014'
      GROUP BY SpaID) AS ranked
-- keep the rows whose rank is the middle rank
WHERE d =
      -- divide by 2.0 so ROUND sees a decimal, not an already truncated integer
      (SELECT ROUND(MAX(d) / 2.0, 0)
       FROM (SELECT DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
             FROM dbo.[Order]
             WHERE DateCreated BETWEEN '04-01-2014'
                                   AND '04-30-2014'
             GROUP BY SpaID) AS allRanks)
Here is one method of finding the median:
SELECT o.*
FROM (SELECT SpaID, COUNT(*) AS Total,
ROW_NUMBER() OVER (ORDER BY COUNT(*)) as seqnum,
COUNT(*) OVER () as cnt
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
GROUP BY SpaID
) o
WHERE 2*o.seqnum IN (cnt, cnt + 1);
This is approximate when you have an even number of rows. You are looking for the exact row id, so you have to choose either the one before or after the median (which is between two rows).
Note: You should express date constants using the ISO standard formats, either YYYYMMDD or YYYY-MM-DD. The first is the safest way in SQL Server (although I personally prefer the hyphens for readability).
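The row-number arithmetic for the middle row can be checked on small inputs. The sketch below is an illustration only: it uses SQLite (3.25 or later) instead of SQL Server, a hypothetical Orders table with just a SpaID column, no date filter, and the condition 2*seqnum IN (cnt, cnt + 1), which picks the exact middle row for an odd number of groups and the lower of the two middle rows for an even number.

```python
import sqlite3

MIDDLE_ROW_QUERY = """
WITH totals AS (
    SELECT SpaID, COUNT(*) AS Total
    FROM Orders
    GROUP BY SpaID
),
numbered AS (
    SELECT SpaID, Total,
           ROW_NUMBER() OVER (ORDER BY Total) AS seqnum,
           COUNT(*) OVER () AS cnt
    FROM totals
)
SELECT SpaID, Total
FROM numbered
WHERE 2 * seqnum IN (cnt, cnt + 1)
"""


def middle_rows(order_counts):
    """Return the middle (SpaID, Total) row for a {SpaID: number_of_orders} mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (SpaID INTEGER)")
    for spa_id, n in order_counts.items():
        conn.executemany("INSERT INTO Orders VALUES (?)", [(spa_id,)] * n)
    return conn.execute(MIDDLE_ROW_QUERY).fetchall()


# Five groups with totals 1..5: the middle row is the third one.
odd_case = middle_rows({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
# Four groups with totals 1..4: the lower of the two middle rows is kept.
even_case = middle_rows({1: 1, 2: 2, 3: 3, 4: 4})

print(odd_case, even_case)
```

The sample totals are all distinct; with ties in Total, which of the tied rows lands in the middle is not guaranteed.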

Windowed functions can only appear in the SELECT or ORDER BY clauses

Can anyone explain why we can't use windowed functions in the GROUP BY clause, and why they are allowed only in SELECT and ORDER BY?
I was trying to group the records based on row_number() and a column in SQL Server like this:
SELECT Invoice
from table1
group by row_number() over(order by Invoice),Invoice
I am getting an error
Windowed functions can only appear in the SELECT or ORDER BY
I can select this row_number() in the SELECT clause, but I want to know why we can't use it in GROUP BY.
Windowed functions are defined in the ANSI spec to logically execute after the processing of GROUP BY, HAVING, WHERE.
To be more specific, they are allowed at steps 5.1 and 6 in the Logical Query Processing flow chart here.
I suppose they could have defined it another way and allowed GROUP BY, WHERE and HAVING to use window functions, with the window being the logical result set at the start of that phase. But suppose they had, and we were allowed to construct queries such as
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForSelect
FROM YourTable
WHERE NTILE(2) OVER (PARTITION BY a ORDER BY b) > 1
GROUP BY a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b)
HAVING NTILE(2) OVER (PARTITION BY a ORDER BY b) = 1
With four different logical windows in play good luck working out what the result of this would be! Also what if in the HAVING you actually wanted to filter by the expression from the GROUP BY level above rather than with the window of rows being the result after the GROUP BY?
The CTE version is more verbose but also more explicit and easier to follow.
WITH T1 AS
(
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForWhere
FROM YourTable
), T2 AS
(
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForGroupBy
FROM T1
WHERE NtileForWhere > 1
), T3 AS
(
SELECT a,
b,
NtileForGroupBy,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForHaving
FROM T2
GROUP BY a,b, NtileForGroupBy
)
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForSelect
FROM T3
WHERE NtileForHaving = 1
As these are all defined in the SELECT statement and are aliased it is easily achievable to disambiguate results from different levels e.g. simply by switching WHERE NtileForHaving = 1 to NtileForGroupBy = 1
You can work around that by placing the window function in a subquery:
select invoice
, rn
from (
select Invoice
, row_number() over(order by Invoice) as rn
from Table1
) as SubQueryAlias
group by
invoice
, rn
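The same restriction and the same workaround can be tried outside SQL Server. The sketch below assumes SQLite (3.25 or later), which also rejects a window function inside GROUP BY, although with its own error message rather than the SQL Server one quoted above; the rows in Table1 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Invoice TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [("A",), ("B",), ("B",)])

# Direct use of a window function in GROUP BY is rejected.
group_by_error = None
try:
    conn.execute(
        "SELECT Invoice FROM Table1 "
        "GROUP BY row_number() OVER (ORDER BY Invoice), Invoice"
    ).fetchall()
except sqlite3.Error as exc:
    group_by_error = exc

# Workaround: compute the row number in a subquery, then group on its alias.
grouped = conn.execute(
    """
    SELECT Invoice, rn
    FROM (
        SELECT Invoice,
               row_number() OVER (ORDER BY Invoice) AS rn
        FROM Table1
    ) AS SubQueryAlias
    GROUP BY Invoice, rn
    ORDER BY rn
    """
).fetchall()

print(group_by_error)
print(grouped)
```

Because rn is unique per row, grouping by (Invoice, rn) leaves every row in its own group, so the two duplicate 'B' invoices stay separate.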