SQL Server ROW_NUMBER() Issue

SQL Server ROW_NUMBER() Issue - sql

I'm using row_number() expression but I don't get result as I expected. I have a sql table and some rows are duplicate. They have same 'BATCHID' and I want to get second row number for these, for others I use first row number. How can I do it?
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1
This code returns to me only first rows, but I want to get second rows for duplicated items.

ROW_NUMBER() gives every row a unique counter. You'd want to use RANK(), which is similar, but gives rows with identical values the same score:
SELECT *
FROM (SELECT * , RANK() OVER (PARTITION BY batchid ORDER BY scaqry) rk
FROM sayimdcpc) t
WHERE rk = 1

If some values are only shown once, but some twice (and perhaps more than twice), you don't want the "first" row, you want the "max" row. Try reversing your order condition:
SELECT *
FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn
FROM SAYIMDCPC ) t
WHERE Rn=1
As a side note, it's still better to explicitly list out all columns; for instance, you probably don't need Rn outside of this query...

Simply try this,
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1

To rephrase it another way, it sounds like you're saying, "When there's a single row for the Batch ID, return the single row. When there are 2 or more rows, return the second row." That's going to require inspecting your Rn value to see what its max is. I don't think you can do it in a single query.
So I'd try something like this:
WITH NumberedRows AS (
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn
FROM SAYIMDCPC
)
, MaxNumber AS (
SELECT Max(RN) as MaxRn,
BATCHID
FROM NumberedRows
GROUP BY BATCHID
)
, NonDupes AS (
SELECT *
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber = 1)
)
, SecondRows AS (
SELECT (
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber > 1)
AND Rn = 2
)
SELECT
FROM NonDupes
UNION ALL
SELECT *
FROM SecondRows

Please try this. It will select max row num, if no duplicate then it should be first one otherwise second
select * from (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) d,
(SELECT batchid,max(Rn) maxRn FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
group by batchid) q
where d.batchid = q.batchid and d.rn = q.maxrn

correct me if i am wrong
e.g sample data
BatchID, SCAQTY
1 , 10
2 , 10
2 , 20
2 , 30
is your expectation result like below?
**Expectation Result 1**
BatchID , SCAQty
1 , 10
2 , 30
or
**Expectation Result 2**
BatchID , SCAQty
1 , 10
2 , 20
2 , 30
based on my understanding what you want to perform is Expectation Result 1, so i guess Query below should able to help u, you just need to add desc for SCAQTY in your query
SELECT * FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn FROM SAYIMDCPC ) t
WHERE Rn=1

Total result set with duplicates and non duplicates. The first column "IsDuplicate" indicates if the column is a duplicate or not.
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
This will give you only the duplicates:
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq > 1
This will give you only the non duplicates (as in your first query)
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq = 1

Related

Creating column and filtering it in one select statement

Wondering if it is possible to creating a new column and filter on that column. The following is an example:
SELECT row_number() over (partition by ID order by date asc) row# FROM table1 where row# = 1
Thanks!

Some databases support a QUALIFY clause which you might be able to use:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) = 1;
On SQL Server, you may use a TOP 1 WITH TIES trick:
SELECT TOP 1 WITH TIES *
FROM table1
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date);
More generally, you would have to use a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM table1 t
)
SELECT *
FROM cte
WHERE rn = 1;

The WHERE clause is evaluated before the SELECT so your column has to exist before you can use a WHERE clause. You could achieve this by making a subquery of the original query.
SELECT *
FROM
(
SELECT row_number() over (partition by ID order by date asc) row#
FROM table1
) a
WHERE a.row# = 1

Selecting records that appear several times in a row

My problem is that I would like to select some records which appears in a row.
For example we have table like this:
x
x
x
y
y
x
x
y
Query should give answer like this:
x 3
y 2
x 2
y 1

SQL tables represent unordered sets. Your question only makes sense if there is a column that specifies the ordering. If so, you can use the difference-of-row-numbers to determine the groups and then aggregate:
select col1, count(*)
from (select t.*,
row_number() over (order by <ordering col>) as seqnum,
row_number() over (partition by col1 order by <ordering col>) as seqnum_2
from t
) t
group by col1, (seqnum - seqnum_2)

I made a SQL Fiddle
http://sqlfiddle.com/#!18/f8900/5
CREATE TABLE [dbo].[SomeTable](
[data] [nchar](1) NULL,
[id] [int] IDENTITY(1,1) NOT NULL
);
INSERT INTO SomeTable
([data])
VALUES
('x'),
('x'),
('x'),
('y'),
('y'),
('x'),
('x'),
('y')
;
select * from SomeTable;
WITH SomeTable_CTE (Data, total, BaseId, NextId)
AS
(
SELECT
Data,
1 as total,
Id as BaseId,
Id+1 as NextId
FROM SomeTable
where not exists(
Select * from SomeTable Previous
where Previous.Id+1 = SomeTable.Id
and Previous.Data = SomeTable.Data)
UNION ALL
select SomeTable_CTE.Data, SomeTable_CTE.total+1, SomeTable_CTE.BaseId as BaseId, SomeTable.Id+1 as NextId
from SomeTable_CTE inner join SomeTable on
SomeTable.Data = SomeTable_CTE.Data
and
SomeTable.Id = SomeTable_CTE.NextId
)
SELECT Data, max(total) as total
FROM SomeTable_CTE
group by Data, BaseId
order by BaseId

The elephant in the room is the missing column(s) to establish the order of rows.
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
ORDER BY min(order_column);
To exclude partitions with only a single row, add a HAVING clause:
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
HAVING count(*) > 1
ORDER BY min(order_column);
db<>fiddle here
Add a final ORDER BY to maintain original order (and a meaningful result). You may want to add a column like min(order_column) as well.
Related:
Find the longest streak of perfect scores per player
Select longest continuous sequence
Group by repeating attribute

Having issue susing the MAX command in Databricks

Just running a simple query
select COL1
from (
select *
, monotonically_increasing_id() as row_id
from db00sparkmigration_landingzone_template.tbl_Ingestion_note_load_errors_pipes
) aa
where row_id > 1 and row_id < max(row_id)
but getting the following error, any ideas?
Error in SQL statement: UnsupportedOperationException: Cannot evaluate expression: max(input[1, bigint, false])

I would recommend row_number() here - however you need a column that defines the ordering of the rows:
select col1
from (
select
t.*,
row_number() over(order by <ordering_col> asc) rn_asc,
row_number() over(order by <ordering_col> desc) rn_desc
from db00sparkmigration_landingzone_template.tbl_Ingestion_note_load_errors_pipes t
) aa
where 1 not in (rn_asc, rn_desc)

How to get MAX rownumber with rowNumber in same query

I have ROW_NUMBER() OVER (ORDER BY NULL) rnum in a sql statement to give me row numbers. is there a way to attach the max rnum to the same dataset going out?
what I want is the row_number() which I get, but I also want the MAXIMUM rownumber of the total return on each record.
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) rnum,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE (C.IS_CRNT = 1)
), MAX_NUM as (
SELECT DATA.ID, max(rnum) as maxrnum from DATA GROUP BY DATA.COMPONENT_ID
) select maxrnum, DATA.* from DATA JOIN MAX_NUM on DATA.COMPONENT_ID = MAX_NUM.COMPONENT_ID
DESIRED RESULT (ASSUMING 15 records):
1 15 DATA
2 15 DATA
3 15 DATA
etc...

I think you want count(*) as a window function:
SELECT ROW_NUMBER() OVER (ORDER BY NULL) as rnum,
COUNT(*) OVER () as cnt,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE C.IS_CRNT = 1

Based on my assumptions in your dataset, this is the approach I would take:
WITH CTE AS (
select C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM SCHEMA.TABLE WHERE (C.IS_CRNT = 1))
SELECT *, (select count(*) from cte) "count" from cte;

Random records in Oracle table based on conditions

I have a Oracle table with the following columns
Table Structure
In a query I need to return all the records with CPER>=40 which is trivial. However, apart from CPER>=40 I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.

Oracle solution:
with CTE as
(
select t1.*,
row_number() over(order by DBMS_RANDOM.VALUE) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(order by newid()) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40

How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM Table) tmp
WHERE CPER>=40 OR pids <= 5
However, this is not random.

Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Server ROW_NUMBER() Issue - sql

ROW_NUMBER() gives every row a unique counter. You'd want to use RANK(), which is similar, but gives rows with identical values the same score: SELECT * FROM (SELECT * , RANK() OVER (PARTITION BY batchid ORDER BY scaqry) rk FROM sayimdcpc) t WHERE rk = 1

Simply try this, SELECT * FROM (SELECT * , ROW_NUMBER() OVER (ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t WHERE Rn=1

Related

Creating column and filtering it in one select statement

Selecting records that appear several times in a row

Having issue susing the MAX command in Databricks

How to get MAX rownumber with rowNumber in same query

Random records in Oracle table based on conditions

Categories

Resources