SQL Server ROW_NUMBER() & RANK() functions in detail: how do they work? - sql-server-2005

I have never used the SQL Server ROW_NUMBER() function, so I read some articles about ROW_NUMBER(), PARTITION, and RANK(), but they are still not clear to me.
I found that the syntax looks like this:
SELECT top 10 ROW_NUMBER() OVER(ORDER BY JID DESC) AS 'Row Number',
JID,Specialist, jobstate, jobtype FROM bbajobs
SELECT top 10 ROW_NUMBER() OVER(PARTITION BY JID ORDER BY JID DESC) AS 'Row Number',
JID,Specialist, jobstate, jobtype FROM bbajobs
I have a few questions:
1) What does the OVER() clause do? Why do we need to specify a column name in it, as in OVER(ORDER BY JID DESC)?
2) I have seen people use the PARTITION keyword. What is it? It is also used inside the OVER clause, as in OVER(PARTITION BY JID ORDER BY JID DESC).
3) In what kind of situation do we have to use the PARTITION keyword?
4) When we specify the PARTITION keyword inside OVER, why do we also need to specify ORDER BY? Can the PARTITION keyword not be used on its own in the OVER clause?
5) In what kind of situation should one use the RANK function?
6) What is a CTE, and what is the advantage of using one? Is it just like a temporary view? Does anyone get a performance boost from using a CTE, beyond reusability?
Please discuss my points in detail. It would be very helpful if someone could explain all of these keywords (ROW_NUMBER(), PARTITION, and RANK()) with small, easy examples. Thanks.

OVER Clause (Transact-SQL)
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
RANK (Transact-SQL)

1) You need ORDER BY because sets otherwise have no order. You need it for a standard SELECT too.
2) PARTITION BY resets the numbering per partition.
3) Many.
4) See point 1. You can use PARTITION BY by itself for SUM, COUNT, etc.
5) See MSDN.
6) Separate question.
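To make points 1), 2), and 5) concrete, here is a small runnable sketch. It uses SQLite's window functions rather than SQL Server, and an invented jobs table (the original bbajobs data is not available), but ROW_NUMBER(), RANK(), and PARTITION BY behave the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (jid INTEGER, specialist TEXT, score INTEGER);
INSERT INTO jobs VALUES
  (1, 'Ann', 90), (2, 'Ann', 90), (3, 'Ann', 70),
  (4, 'Bob', 80), (5, 'Bob', 60);
""")

rows = conn.execute("""
SELECT specialist, jid, score,
       -- numbering restarts for each specialist because of PARTITION BY
       ROW_NUMBER() OVER (PARTITION BY specialist ORDER BY score DESC) AS rn,
       -- RANK gives tied rows the same rank and then skips numbers
       RANK()       OVER (PARTITION BY specialist ORDER BY score DESC) AS rnk
FROM jobs
""").fetchall()
for r in rows:
    print(r)
```

Ann's two tied scores of 90 get distinct ROW_NUMBER values (1 and 2, in an arbitrary order) but the same RANK (1), and the next rank jumps to 3; Bob's partition starts numbering again from 1.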

Related

Is there any equivalent in SQL for the TOP and BOTTOM functions available in InfluxDB?

I am porting queries from InfluxDB to TimescaleDB (PostgreSQL). I am currently stuck on the TOP and BOTTOM functions. Is there any equivalent in PostgreSQL, or any suggestion on how to achieve this?
For the constant-one case, I did it like this:
TOP('field', 1) -> MAX('field')
BOTTOM('field', 1) -> MIN('field')
What about the others, like:
TOP('field', 5)
BOTTOM('field', 5)
Edit 1:
Does using LIMIT with ORDER BY also work with GROUP BY? The limit is executed after the GROUP BY, right? What if I want something like this?
Thank You
Using window functions is probably the most versatile way to do this:
select *
from (
  select t.*,
         dense_rank() over (partition by ??? order by ??? asc) as rnk
  from the_table t
) x
where x.rnk = 3; --<< adjust here
Rows in a relational database have no implied sort order, so "top" or "bottom" only makes sense if you also provide an order by. From your question it is completely unclear what that would be.
Using order by .. asc returns the "bottom rows"; using order by .. desc returns the "top rows".
If you want top/bottom for the entire table (instead of one row "per group"), then leave out the partition by.
dense_rank() will return multiple rows with the same "rank" when the rows share the same highest (or lowest) value in the column you are sorting by. If you don't want that (and would rather pick an arbitrary one of those "duplicates"), use row_number() instead.
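A runnable sketch of the pattern above, using SQLite and an invented readings table (the real table and ordering columns are the ??? placeholders in the query): keeping rnk <= 2 gives the analogue of TOP('value', 2) per group, and flipping desc to asc gives BOTTOM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (sensor TEXT, value REAL);
INSERT INTO readings VALUES
  ('a', 10), ('a', 30), ('a', 20),
  ('b', 5),  ('b', 7);
""")

# TOP('value', 2) per sensor: rank descending and keep rnk <= 2.
top2 = conn.execute("""
SELECT sensor, value
FROM (
    SELECT r.*,
           DENSE_RANK() OVER (PARTITION BY sensor ORDER BY value DESC) AS rnk
    FROM readings r
) x
WHERE x.rnk <= 2
ORDER BY sensor, value DESC
""").fetchall()
print(top2)  # [('a', 30.0), ('a', 20.0), ('b', 7.0), ('b', 5.0)]
```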

Is there a way to add ROW_NUMBER() without using OVER (ORDER BY ...) in SQL

Is there a way to add ROW_NUMBER() simply based on the default row order without using OVER (ORDER BY ...)?
There is no implicit ordering to rows in a table; a table is a logical, unordered set.
However, you can do row_number() over (order by (select null)), as suggested by Itzik Ben-Gan in his book on window functions.
For PostgreSQL and MySQL 8.0:
row_number() over ()
For SQL Server and Oracle it will be:
row_number() over (order by (select null))
But without a proper ORDER BY clause, a given row is not guaranteed to get the same row number every time.
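A quick runnable check of the empty-OVER form, shown here with SQLite (which behaves like PostgreSQL and MySQL 8.0 on this point) and a throwaway three-row table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (v TEXT);
INSERT INTO t VALUES ('x'), ('y'), ('z');
""")

# SQLite, like PostgreSQL and MySQL 8.0, accepts an empty OVER ():
rows = conn.execute("SELECT ROW_NUMBER() OVER () AS rn, v FROM t").fetchall()
print(rows)
# The numbers 1..3 are assigned, but WHICH row gets which number
# is an implementation detail, not a guarantee.
```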

ROW_NUMBER Without ORDER BY

I have to add a row number to my existing query so that I can track how much data has been added into Redis. If my query fails, I can restart from the row number saved in the other table.
Query to get data starting after row 1000 of the table:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn FROM my_table) AS X WHERE rn > 1000
The query is working fine. Is there any way I can get the row number without using ORDER BY?
What is SELECT 1 here?
Is the query optimized, or can I do it another way? Please suggest a better solution.
There is no need to worry about specifying a constant in the ORDER BY expression. The following is quoted from Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan (it was available for free download from Microsoft's free e-books site):
As mentioned, a window order clause is mandatory, and SQL Server doesn't allow the ordering to be based on a constant—for example, ORDER BY NULL. But surprisingly, when passing an expression based on a subquery that returns a constant—for example, ORDER BY (SELECT NULL)—SQL Server will accept it. At the same time, the optimizer un-nests, or expands, the expression and realizes that the ordering is the same for all rows. Therefore, it removes the ordering requirement from the input data. Here's a complete query demonstrating this technique:
SELECT actid, tranid, val,
       ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM dbo.Transactions;
Observe in the properties of the Index Scan iterator that the Ordered property is False, meaning that the iterator is not required to return the data in index key order.
The above means that when you use a constant, no ordering is performed. I strongly recommend reading the book, as Itzik Ben-Gan describes in depth how window functions work and how to optimize the various cases in which they are used.
Try just order by 1. Read the error message. Then reinstate the order by (select 1). Realise that whoever wrote this has, at some point, read the error message and then decided that the right thing to do is to trick the system into not raising an error rather than realising the fundamental truth that the error was trying to alert them to.
Tables have no inherent order. If you want some form of ordering that you can rely upon, it's up to you to provide enough deterministic expression(s) to any ORDER BY clause such that each row is uniquely identified and ordered.
Anything else, including tricking the system into not emitting errors, is hoping that the system will do something sensible without using the tools provided to you to ensure that it does something sensible - a well specified ORDER BY clause.
You can use any literal value, for example:
order by (select 0)
order by (select null)
order by (select 'test')
etc.
Refer to this for more information:
https://exploresql.com/2017/03/31/row_number-function-with-no-specific-order/
What is select 1 here?
In this scenario, the author of the query does not really have any particular sorting in mind.
ROW_NUMBER requires an ORDER BY clause, so providing one is a way to satisfy the parser.
Sorting by a "constant" creates a "nondeterministic" order (the query optimizer is free to choose whatever order it finds suitable).
Easiest way to think about it is as:
ROW_NUMBER() OVER(ORDER BY 1) -- error
ROW_NUMBER() OVER(ORDER BY NULL) -- error
There are a few possible ways to provide a constant expression to "trick" the query optimizer:
ROW_NUMBER() OVER(ORDER BY (SELECT 1)) -- already presented
Other options:
ROW_NUMBER() OVER(ORDER BY 1/0) -- should not be used
ROW_NUMBER() OVER(ORDER BY ##SPID)
ROW_NUMBER() OVER(ORDER BY DB_ID())
ROW_NUMBER() OVER(ORDER BY USER_ID())
db<>fiddle demo
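The paging pattern from the question can be exercised end to end. This sketch uses SQLite and a made-up ten-row table in place of the real one, keeping the (SELECT 1) constant ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)",
                 [(f"row{i}",) for i in range(1, 11)])

# The constant-subquery trick: row numbers are assigned, but the order
# behind them is whatever the engine finds cheapest.
page = conn.execute("""
SELECT * FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn
    FROM t
) x
WHERE rn > 5
""").fetchall()
print(len(page))  # 5
```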

selecting first result from output of a subquery

I want to select the first and the last outcome from a subquery in Oracle.
I can't use "rownum", since I am using "order by", which completely changes the sequence of "rownum".
Please suggest some solutions.
Thanks for the help.
Use keep if you have an aggregation query. That is what it is designed for. It looks something like this:
select x,
       max(outcome) keep (dense_rank first order by datetime asc) as first_outcome,
       max(outcome) keep (dense_rank first order by datetime desc) as last_outcome
from t
group by x;
Use first_value() and last_value() if there is no aggregation. Note that last_value() needs an explicit frame clause, because the default frame ends at the current row:
select t.*,
       first_value(outcome) over (partition by x order by datetime) as first_outcome,
       last_value(outcome) over (partition by x order by datetime
                                 rows between unbounded preceding and unbounded following) as last_outcome
from t;
You can't use "rownum" because you want both the first and the last values; otherwise you could put your code in a subquery, select from it, and filter by rownum in the outer query. As it is, you need the ROW_NUMBER() analytic function (both with order by ... and with order by ... desc), so you can get both the first and the last outcome in one single outer query.
If ties are possible you may prefer DENSE_RANK to get all rows tied for first (or for last); instead, ROW_NUMBER() will return "one of" the rows tied for first (or for last); which one, specifically, is random.
If you want to see an example, provide sample data for your problem.
I solved this by using the ROW_NUMBER() function with OVER (ORDER BY ...).
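The first_value()/last_value() approach can be checked end to end. This sketch uses SQLite and invented sample data (the question supplied none), with an explicit frame so that last_value() sees the whole partition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (x TEXT, dt TEXT, outcome TEXT);
INSERT INTO t VALUES
  ('a', '2020-01-01', 'start-a'),
  ('a', '2020-01-02', 'mid-a'),
  ('a', '2020-01-03', 'end-a'),
  ('b', '2020-01-01', 'start-b'),
  ('b', '2020-01-05', 'end-b');
""")

rows = conn.execute("""
SELECT DISTINCT x,
       FIRST_VALUE(outcome) OVER (PARTITION BY x ORDER BY dt) AS first_outcome,
       -- without the frame clause below, LAST_VALUE would return the
       -- current row's value, not the partition's last value
       LAST_VALUE(outcome)  OVER (PARTITION BY x ORDER BY dt
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS last_outcome
FROM t
ORDER BY x
""").fetchall()
print(rows)  # [('a', 'start-a', 'end-a'), ('b', 'start-b', 'end-b')]
```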

SQL random aggregate

Say I have a simple table with 3 fields: 'place', 'user' and 'bytes'. Let's say that, under some filter, I want to group by 'place' and, for each 'place', sum all the bytes and randomly select a user (uniformly from all the users that fit the 'where' filter and the relevant 'place'). If there were a "select at random" aggregate function, I would do:
SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;
...but I couldn't find such an aggregate function. Am I missing something? What could be a good way to achieve this?
If your RDBMS supports analytical functions.
WITH T
AS (SELECT place,
Sum(bytes) OVER (PARTITION BY place) AS Sum_bytes,
user,
Row_number() OVER (PARTITION BY place ORDER BY random_function()) AS RN
FROM YourTable
WHERE .... )
SELECT place,
Sum_bytes,
user
FROM T
WHERE RN = 1;
For SQL Server, Crypt_gen_random(4) or NEWID() would be examples of something that could be substituted for random_function().
I think your question is DBMS-specific. If your DBMS is MySQL, you can use a solution like this:
SELECT place_rand.place, SUM(place_rand.bytes), place_rand.user as random_user
FROM
(SELECT place, bytes, user
FROM place
WHERE ...
ORDER BY rand()) place_rand
GROUP BY
place_rand.place;
The subquery orders records randomly. The outer query groups by place, sums bytes, and returns the first random user, since user is neither in an aggregate function nor in the GROUP BY clause. (Note that this relies on MySQL's non-standard handling of such columns; with ONLY_FULL_GROUP_BY enabled, the query is rejected.)
With a custom aggregate function, you could write expressions as simple as:
SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;
SELECT_AT_RANDOM would be the custom aggregate function.
Here is precisely an implementation in PostgreSQL.
I would do a bit of a variation on Martin's solution:
select place, sum(bytes), max(case when seqnum = 1 then user end) as random_user
from (select place, bytes, user,
             row_number() over (partition by place order by newid()) as seqnum
      from t
     ) t
group by place
(Where newid() is just one way to get a random number, depending on the database.)
For some reason, I prefer this approach, because it still has the aggregation function in the outer query. If you are summarizing a bunch of fields, then this seems cleaner to me.
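This variation is easy to verify. Here is a minimal sketch in SQLite, where random() stands in for newid() and the traffic table is invented sample data (the column is named usr here rather than user, purely to sidestep reserved-word concerns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE traffic (place TEXT, usr TEXT, bytes INTEGER);
INSERT INTO traffic VALUES
  ('x', 'ann', 10), ('x', 'bob', 20), ('x', 'cat', 30),
  ('y', 'dan', 5);
""")

# random() plays the role of newid(): each place's rows get a random
# ordering, and MAX(CASE WHEN seqnum = 1 ...) pulls out the winner
# while the outer query keeps a normal aggregation over bytes.
rows = conn.execute("""
SELECT place, SUM(bytes) AS total,
       MAX(CASE WHEN seqnum = 1 THEN usr END) AS random_user
FROM (
    SELECT place, usr, bytes,
           ROW_NUMBER() OVER (PARTITION BY place ORDER BY random()) AS seqnum
    FROM traffic
) t
GROUP BY place
ORDER BY place
""").fetchall()
print(rows)
```

The totals are deterministic; which user comes back for place 'x' varies run to run, which is the point of the technique.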