BigQuery SQL Rank() Over() with an exception - sql

I'm trying to create a rank() over() that returns null when a row meets a certain criterion, then continues the rank on the next row.
An example of what I'm trying to accomplish is the column rank_over_except in this image.
It ranks over the identifier, ordered by the original_nr column. In this case it doesn't "rank" when the fruit is a pear.
I have tried using a CASE WHEN THEN statement, but that simply continues the count: the null just replaces the 2 on the 2nd row of rank_over_except in this example, and the third row gets a 3. So that's not working as expected.
I don't see any option in rank() over() itself to do this. Maybe there's another way that doesn't use rank()? I've gone through the BigQuery docs, but had no luck finding a solution.

Starting from the original numbering, we can use arithmetic and a conditional count to compute the new rank:
select t.*,
original_nr - countif(fruit = 'apple') over(partition by identifier order by original_nr) rank_over_except
from mytable t
A more canonical approach defines a partition which contains everything but apples with a CASE expression:
select t.*,
case when fruit != 'apple'
then rank() over(
partition by identifier, case when fruit = 'apple' then 1 else 0 end
order by original_nr
)
end rank_over_except
from mytable t
order by id
Both solutions are more efficient than the union approach, which requires scanning the table twice.
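As a rough check of the CASE-partition approach, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for BigQuery here (an assumption; window functions need SQLite 3.25 or newer), and the sample rows are hypothetical, not the asker's data.

```python
import sqlite3

# Hypothetical sample rows: (identifier, original_nr, fruit).
rows = [
    ("a", 1, "banana"),
    ("a", 2, "apple"),
    ("a", 3, "cherry"),
    ("a", 4, "kiwi"),
    ("b", 1, "apple"),
    ("b", 2, "banana"),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE mytable (identifier TEXT, original_nr INTEGER, fruit TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Apples go into their own partition, so they never consume a rank
# among the other fruits; the outer CASE then blanks their own rank.
query = """
SELECT identifier, original_nr, fruit,
       CASE WHEN fruit != 'apple'
            THEN RANK() OVER (
                     PARTITION BY identifier,
                                  CASE WHEN fruit = 'apple' THEN 1 ELSE 0 END
                     ORDER BY original_nr)
       END AS rank_over_except
FROM mytable
ORDER BY identifier, original_nr
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The excluded rows come back with NULL (None in Python) and the numbering of the remaining rows continues without a gap.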

The solution that worked fine for my case is based on the comment of @SR3142. Maybe @GMB's solution can also work with row_number, but the size of the data was negligible in my case.
select
t.*,
ROW_NUMBER() OVER (PARTITION BY identifier ORDER BY original_nr ASC) rank_over_except
from table as t
WHERE t.fruit NOT IN ("pear", "apple")
UNION ALL
SELECT
t.*,
NULL AS rank_over_except -- same column order as the first SELECT, as UNION ALL requires
FROM table AS t
WHERE
t.fruit IN ("pear", "apple")
The reason to use row_number() rather than rank() is that row_number() does not take ties in the ordering column into account, so it's an actual "dumb" next-in-line number.

Related

Find the second largest value with Groupings

In SQL Server, I am attempting to pull the second latest NOTE_ENTRY_DT_TIME (items highlighted in screenshot). With the query written below it still pulls the latest date (I believe it's because of the grouping but the grouping is required to join later). What is the best method to achieve this?
SELECT
hop.ACCOUNT_ID,
MAX(hop.NOTE_ENTRY_DT_TIME) AS latest_noteid
FROM
NOTES hop
WHERE
hop.GEN_YN IS NULL
AND hop.NOTE_ENTRY_DT_TIME < (SELECT MAX(hope.NOTE_ENTRY_DT_TIME)
FROM NOTES hope
WHERE hop.GEN_YN IS NULL)
GROUP BY
hop.ACCOUNT_ID
Data sample in the table:
One of the "easier" ways to get the Nth row in a group is to use a CTE and ROW_NUMBER:
WITH CTE AS(
SELECT Account_ID,
Note_Entry_Dt_Time,
ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Note_Entry_Dt_Time DESC) AS RN
FROM dbo.YourTable)
SELECT Account_ID,
Note_Entry_Dt_Time
FROM CTE
WHERE RN = 2;
Of course, if an ACCOUNT_ID only has 1 row, then it will not be returned in the result set.
The OP's statement "The row will not always be 2." from the comments conflicts with their statement "I am attempting to pull the second latest NOTE_ENTRY_DT_TIME" in the question. At a best guess, this means that the OP has rows with the same date that could be the "latest" date. If so, then they would simply need to replace ROW_NUMBER with DENSE_RANK. Their sample data, however, doesn't suggest this is the case.
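To illustrate the difference between the two functions when the latest date is duplicated, here is a small sketch using Python's sqlite3 module in place of SQL Server (an assumption; SQLite 3.25+ is needed for window functions). The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE notes (account_id INTEGER, note_entry_dt_time TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        (1, "2020-01-03"),  # the latest date appears twice for account 1
        (1, "2020-01-03"),
        (1, "2020-01-02"),
        (1, "2020-01-01"),
        (2, "2020-02-01"),  # account 2 has a single row
    ],
)

# Same CTE shape as above; only the numbering function changes.
template = """
WITH cte AS (
    SELECT account_id, note_entry_dt_time,
           {func}() OVER (PARTITION BY account_id
                          ORDER BY note_entry_dt_time DESC) AS rn
    FROM notes)
SELECT DISTINCT account_id, note_entry_dt_time
FROM cte
WHERE rn = 2
"""
by_row_number = conn.execute(template.format(func="ROW_NUMBER")).fetchall()
by_dense_rank = conn.execute(template.format(func="DENSE_RANK")).fetchall()
print(by_row_number)  # second row, which is the duplicate of the latest date
print(by_dense_rank)  # second distinct date
```

With ROW_NUMBER the "second" row for account 1 is just the other copy of the latest date, while DENSE_RANK skips to the next distinct date. Account 2 is absent from both results because it has only one row.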
You can use window functions:
select *
from (
select
n.*,
row_number() over(partition by account_id order by note_entry_dt_time desc) rn
from notes n
) t
where rn = 2

Sequence within a partition in SQL server

I have been looking around for 2 days and have not been able to figure out this one. Using dataset below and SQL server 2016 I would like to get the row number of each row by 'id' and 'cat' ordered by 'date' in asc order but would like to see a reset of the sequence if a different value in the 'cat' column for the same 'id' is found(see rows in green). Any help would be appreciated.
This is a gaps and islands problem. The simplest solution in this case is probably a difference of row numbers:
select t.*,
row_number() over (partition by id, cat, seqnum - seqnum_c order by date) as row_num
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
row_number() over (partition by id, cat order by date) as seqnum_c
from t
) t;
Why this works is a bit tricky to explain. But, if you look at the sequence numbers in the subquery, you'll see that the difference defines the groups you want to define.
Note: This assumes that the date column provides a stable sort. You seem to have duplicates in the column. If there really are duplicates and you have no secondary column for sorting, then try rank() or dense_rank() instead of row_number().
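Since the reason this works is easier to see than to describe, here is a small sketch that prints the two intermediate sequence numbers and their difference for a hypothetical six-row data set. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25+ is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE t (id INTEGER, cat TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-01"),
        (1, "A", "2021-01-02"),
        (1, "B", "2021-01-03"),
        (1, "A", "2021-01-04"),
        (1, "A", "2021-01-05"),
        (1, "B", "2021-01-06"),
    ],
)

query = """
SELECT date, cat, seqnum, seqnum_c,
       seqnum - seqnum_c AS diff,
       ROW_NUMBER() OVER (PARTITION BY id, cat, seqnum - seqnum_c
                          ORDER BY date) AS row_num
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY date) AS seqnum_c
      FROM t) AS s
ORDER BY id, date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)

# Within an unbroken run of one cat both counters advance together,
# so their difference is constant; it changes once another cat interrupts.
diffs = [r[4] for r in result]
row_nums = [r[5] for r in result]
```

Here the difference is 0 for the first run of A and 1 for the second run of A, which is what separates the two islands even though they share the same cat value.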

How to assign 1 unique number for rows containing the same value

I'd appreciate your help with a case that I'm working on in Oracle (PL/SQL).
Suppose I have 1 table named TableA.
The rule for sorting TableA is:
CASE_ID & CONTRACT with 'SD' TRIGGER have to be placed on top, regardless of the SCORE.
After all of the CONTRACT & CASE_ID with 'SD' TRIGGER are placed on top, the next CASE_ID & CONTRACT are sorted by SCORE descending.
I want to assign 1 unique number to each CASE_ID, ascending from 1, so that CONTRACTs with the same CASE_ID have the same number.
I have tried using DENSE_RANK with the following query:
select a.*
,dense_rank() over (partition by a.case_id order by rn)
from (
select a.*,rownum as rn from TableA a
)a
But the result is still not what I want: some CASE_IDs are assigned the same NUMBER.
I'd appreciate any input on this.
Thank you very much!
You can use dense_rank(). It looks like:
select a.*,
dense_rank() over (order by a.case_id)
from TableA a;
No partition by is needed.
Dense Rank with a partition would just give you consecutive ranks in a partition. To use dense_rank without partition refer to Gordon's answer.
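The contrast between the two forms can be sketched as follows, using Python's sqlite3 module rather than Oracle (an assumption; SQLite 3.25+ is needed for window functions). The table contents and the column aliases case_nr and within_case are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE TableA (case_id TEXT, contract TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [("C2", "K1"), ("C1", "K2"), ("C2", "K3"), ("C3", "K4"), ("C1", "K5")],
)

query = """
SELECT case_id, contract,
       -- no PARTITION BY: one number per distinct case_id
       DENSE_RANK() OVER (ORDER BY case_id) AS case_nr,
       -- PARTITION BY case_id: numbering restarts inside every case
       DENSE_RANK() OVER (PARTITION BY case_id ORDER BY contract) AS within_case
FROM TableA
ORDER BY case_id, contract
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The partitioned version restarts at 1 for every CASE_ID, which is why different cases end up sharing numbers in the query from the question.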
Another way is to compute a row_number over the distinct case_id values and join back to the original table:
-- Oracle: no AS before a table alias, NUMBER is a reserved word,
-- and ROW_NUMBER() needs an ORDER BY in its window
SELECT TableA.*, bar.case_number
FROM TableA
JOIN
(SELECT foo.CASE_ID AS case_id, ROW_NUMBER() OVER (ORDER BY foo.CASE_ID) AS case_number
FROM (SELECT DISTINCT CASE_ID FROM TableA) foo) bar
ON TableA.CASE_ID = bar.case_id;

Rank Over Partition By in Oracle SQL (Oracle 11g)

I have 4 columns in a table
Company Part Number
Manufacturer Part Number
Order Number
Part Receipt Date
Ex.
I just want to return one record based on the maximum Part Receipt Date which would be the first row in the table (The one with Part Receipt date 03/31/2015).
I tried
RANK() OVER (PARTITION BY Company Part Number,Manufacturer Part Number
ORDER BY Part Receipt Date DESC,Order Number DESC) = 1
at the end of the WHERE statement and this did not work.
This would seem to do what you want:
select t.*
from (select t.*
from t
order by partreceiptdate desc
) t
where rownum = 1;
Analytic functions like rank() are available in the SELECT clause, they can't be invoked directly in a WHERE clause. To use rank() the way you want it, you must declare it in a subquery and then use it in the WHERE clause in the outer query. Something like this:
select company_part_number, manufacturer_part_number, order_number, part_receipt_date
from ( select t.*, rank() over (partition by... order by...) as rnk
from your_table t
)
where rnk = 1
Note also that you can't have a column name like company part number (with spaces in it) - at least not unless they are enclosed in double-quotes, which is a very poor practice, best avoided.
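A minimal sketch of the subquery pattern, run through Python's sqlite3 module instead of Oracle (an assumption; SQLite 3.25+ is needed for window functions, and it happens to reject window functions in WHERE as well). The rows are hypothetical; only the 03/31/2015 receipt date comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("""CREATE TABLE your_table (
    company_part_number TEXT, manufacturer_part_number TEXT,
    order_number INTEGER, part_receipt_date TEXT)""")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("P1", "M1", 100, "2015-03-31"),
        ("P1", "M1", 90, "2015-02-15"),
        ("P1", "M1", 80, "2015-01-10"),
        ("P2", "M2", 70, "2014-12-01"),
    ],
)

# Calling the analytic function directly in WHERE is rejected.
try:
    conn.execute("""SELECT * FROM your_table
                    WHERE RANK() OVER (PARTITION BY company_part_number
                                       ORDER BY part_receipt_date DESC) = 1""")
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# Computing the rank in a subquery and filtering outside works.
latest = conn.execute("""
SELECT company_part_number, manufacturer_part_number, order_number, part_receipt_date
FROM (SELECT t.*,
             RANK() OVER (PARTITION BY company_part_number, manufacturer_part_number
                          ORDER BY part_receipt_date DESC, order_number DESC) AS rnk
      FROM your_table t)
WHERE rnk = 1
ORDER BY company_part_number
""").fetchall()
print(latest)
```

The outer query sees rnk as an ordinary column, so it can be filtered like any other value.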

How do I use ROW_NUMBER()?

I want to use ROW_NUMBER() to get...
1. To get the max(ROW_NUMBER()) --> Or I guess this would also be the count of all rows
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
2. To get ROW_NUMBER() using a given piece of information, i.e. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any Ideas?
For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (Order by Id) AS RowNumber, Field1, Field2, Field3
FROM [User] -- USER is a reserved keyword in SQL Server, so the name is bracketed
Then, when you want to go 5 rows back, you can take the current row number and use the following query to find the row at CurrentRow - 5:
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS Row, Id
FROM [User] ) us
WHERE Row = CurrentRow - 5
Though I agree with others that you could use count() to get the total number of rows, here is how you can use row_number():
To get the total number of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
If you need to return the table's total row count, there is an alternative to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) makes a full table scan to return the row count, it can take a very long time for a large table. In that case you can use the sysindexes system table instead. It has a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This will drastically reduce the time your query takes.
You can use this to get the first record that matches the WHERE clause:
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC
ROW_NUMBER() returns a unique number for each row, starting with 1. You can use it by simply writing:
ROW_NUMBER() OVER (ORDER BY Column_Name DESC) AS RowNum
May not be related to the question here. But I found it could be useful when using ROW_NUMBER -
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table
select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID
SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'
You can use Row_Number to limit the query result, for example for paging.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 0 and 10
With the above query, I get page 1 of the results from TABLENAME.
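The same idea generalizes to any page by computing the first and last row index from a page number. The sketch below uses Python's sqlite3 module instead of SQL Server (an assumption; SQLite 3.25+ is needed for window functions); the helper fetch_page, the id column, and the 25 sample rows are hypothetical.

```python
import sqlite3

def fetch_page(conn, page_no, page_size):
    """Return the ids on the given 1-based page, newest rows first."""
    first = (page_no - 1) * page_size + 1  # ROW_NUMBER() starts at 1
    last = page_no * page_size
    sql = """
    SELECT id FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY createtime DESC) AS rowindex, id
        FROM tablename) AS tb
    WHERE tb.rowindex BETWEEN ? AND ?
    ORDER BY tb.rowindex
    """
    return [r[0] for r in conn.execute(sql, (first, last))]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE tablename (id INTEGER, createtime INTEGER)")
# Hypothetical data: 25 rows whose createtime grows with the id.
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [(i, i) for i in range(1, 26)])

page1 = fetch_page(conn, 1, 10)
page3 = fetch_page(conn, 3, 10)
print(page1)
print(page3)
```

The last page is simply shorter when the row count is not a multiple of the page size.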
If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC
You need to create a virtual table (a common table expression) with WITH ... AS, as shown in the query below.
Using this virtual table, you can perform CRUD operations with respect to row_number.
QUERY:
WITH numbered AS
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
SELECT * FROM numbered WHERE UserName='Joe'
You can use INSERT, UPDATE or DELETE instead of SELECT in the last statement.
The SQL Row_Number() function sorts the rows of a record set and assigns an order number to each one. It is used to number rows, for example to identify the top 10 rows with the highest order amount, or to identify each customer's order with the highest amount.
If you want to sort the dataset and number each row after separating the rows into categories, use Row_Number() with the Partition By clause. For example, this sorts the orders of each customer among themselves where the dataset contains all orders:
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand it, you want to calculate the number of rows grouped by a column. To visualize the requirement: if you want to see the count of all orders of the related customer as a separate column beside the order info, you can use the COUNT() aggregate function with the Partition By clause.
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
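The two queries above can be combined to see both columns side by side. This is a small sketch using Python's sqlite3 module rather than SQL Server (an assumption; SQLite 3.25+ is needed for window functions); the table is created without the Sales schema prefix and the three rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("""CREATE TABLE SalesOrderHeader (
    SalesOrderNumber TEXT, CustomerId INTEGER, SubTotal REAL)""")
conn.executemany(
    "INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
    [("SO1", 10, 50.0), ("SO2", 10, 75.0), ("SO3", 20, 20.0)],
)

query = """
SELECT SalesOrderNumber,
       CustomerId,
       -- position of the order within its customer, largest SubTotal first
       ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) AS rn,
       -- total number of orders of that customer, repeated on every row
       COUNT(*) OVER (PARTITION BY CustomerId) AS CustomerOrderCount
FROM SalesOrderHeader
ORDER BY CustomerId, rn
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Unlike GROUP BY, the windowed COUNT(*) keeps every order row and only adds the per-customer total next to it.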
This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return one row for every row where the UserName is 'Joe', and nothing if there is no UserName='Joe'.
They will be listed in order of UserID, and the row_number field will start with 1 and increment for however many rows contain UserName='Joe'.
If it does not work for you, then your WHERE clause has an issue OR there is no UserID column in the table. Check the spelling of both fields, UserID and UserName.
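The numbering restarts at 1 because the WHERE clause filters the rows before ROW_NUMBER() is evaluated. The sketch below shows this next to the subquery form that numbers all rows first, using Python's sqlite3 module in place of SQL Server (an assumption; SQLite 3.25+ is needed for window functions). The four users are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE Users (UserId INTEGER, UserName TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Joe"), (4, "Zed")],
)

# Filter first, then number: Joe is the only remaining row, so he gets 1.
filtered_first = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY UserId)
FROM Users
WHERE UserName = 'Joe'
""").fetchall()

# Number every row in a subquery, then filter: Joe keeps his position.
numbered_first = conn.execute("""
SELECT num, UserName
FROM (SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserId) AS num
      FROM Users) AS numbered
WHERE UserName = 'Joe'
""").fetchall()

print(filtered_first)
print(numbered_first)
```

So the query does run, but it only answers "which row is Joe among the Joes", not "which row is Joe in the whole table".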