SQL Server ROW_NUMBER() Issue - sql

I'm using row_number() expression but I don't get result as I expected. I have a sql table and some rows are duplicate. They have same 'BATCHID' and I want to get second row number for these, for others I use first row number. How can I do it?
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1
This code returns to me only first rows, but I want to get second rows for duplicated items.

ROW_NUMBER() gives every row a unique counter. You'd want to use RANK(), which is similar, but gives rows with identical values the same score:
SELECT *
FROM (SELECT * , RANK() OVER (PARTITION BY batchid ORDER BY scaqry) rk
FROM sayimdcpc) t
WHERE rk = 1

If some values are only shown once, but some twice (and perhaps more than twice), you don't want the "first" row, you want the "max" row. Try reversing your order condition:
SELECT *
FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn
FROM SAYIMDCPC ) t
WHERE Rn=1
As a side note, it's still better to explicitly list out all columns; for instance, you probably don't need Rn outside of this query...

Simply try this,
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1

To rephrase it another way, it sounds like you're saying, "When there's a single row for the Batch ID, return the single row. When there are 2 or more rows, return the second row." That's going to require inspecting your Rn value to see what its max is. I don't think you can do it in a single query.
So I'd try something like this:
WITH NumberedRows AS (
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn
FROM SAYIMDCPC
)
, MaxNumber AS (
SELECT Max(RN) as MaxRn,
BATCHID
FROM NumberedRows
GROUP BY BATCHID
)
, NonDupes AS (
SELECT *
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber = 1)
)
, SecondRows AS (
SELECT (
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber > 1)
AND Rn = 2
)
SELECT
FROM NonDupes
UNION ALL
SELECT *
FROM SecondRows

Please try this. It will select max row num, if no duplicate then it should be first one otherwise second
select * from (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) d,
(SELECT batchid,max(Rn) maxRn FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
group by batchid) q
where d.batchid = q.batchid and d.rn = q.maxrn

correct me if i am wrong
e.g sample data
BatchID, SCAQTY
1 , 10
2 , 10
2 , 20
2 , 30
is your expectation result like below?
**Expectation Result 1**
BatchID , SCAQty
1 , 10
2 , 30
or
**Expectation Result 2**
BatchID , SCAQty
1 , 10
2 , 20
2 , 30
based on my understanding what you want to perform is Expectation Result 1, so i guess Query below should able to help u, you just need to add desc for SCAQTY in your query
SELECT * FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn FROM SAYIMDCPC ) t
WHERE Rn=1

Total result set with duplicates and non duplicates. The first column "IsDuplicate" indicates if the column is a duplicate or not.
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
This will give you only the duplicates:
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq > 1
This will give you only the non duplicates (as in your first query)
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq = 1

Related

Creating column and filtering it in one select statement

Wondering if it is possible to creating a new column and filter on that column. The following is an example:
SELECT row_number() over (partition by ID order by date asc) row# FROM table1 where row# = 1
Thanks!
Some databases support a QUALIFY clause which you might be able to use:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) = 1;
On SQL Server, you may use a TOP 1 WITH TIES trick:
SELECT TOP 1 WITH TIES *
FROM table1
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date);
More generally, you would have to use a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM table1 t
)
SELECT *
FROM cte
WHERE rn = 1;
The WHERE clause is evaluated before the SELECT so your column has to exist before you can use a WHERE clause. You could achieve this by making a subquery of the original query.
SELECT *
FROM
(
SELECT row_number() over (partition by ID order by date asc) row#
FROM table1
) a
WHERE a.row# = 1

Selecting records that appear several times in a row

My problem is that I would like to select some records which appears in a row.
For example we have table like this:
x
x
x
y
y
x
x
y
Query should give answer like this:
x 3
y 2
x 2
y 1
SQL tables represent unordered sets. Your question only makes sense if there is a column that specifies the ordering. If so, you can use the difference-of-row-numbers to determine the groups and then aggregate:
select col1, count(*)
from (select t.*,
row_number() over (order by <ordering col>) as seqnum,
row_number() over (partition by col1 order by <ordering col>) as seqnum_2
from t
) t
group by col1, (seqnum - seqnum_2)
I made a SQL Fiddle
http://sqlfiddle.com/#!18/f8900/5
CREATE TABLE [dbo].[SomeTable](
[data] [nchar](1) NULL,
[id] [int] IDENTITY(1,1) NOT NULL
);
INSERT INTO SomeTable
([data])
VALUES
('x'),
('x'),
('x'),
('y'),
('y'),
('x'),
('x'),
('y')
;
select * from SomeTable;
WITH SomeTable_CTE (Data, total, BaseId, NextId)
AS
(
SELECT
Data,
1 as total,
Id as BaseId,
Id+1 as NextId
FROM SomeTable
where not exists(
Select * from SomeTable Previous
where Previous.Id+1 = SomeTable.Id
and Previous.Data = SomeTable.Data)
UNION ALL
select SomeTable_CTE.Data, SomeTable_CTE.total+1, SomeTable_CTE.BaseId as BaseId, SomeTable.Id+1 as NextId
from SomeTable_CTE inner join SomeTable on
SomeTable.Data = SomeTable_CTE.Data
and
SomeTable.Id = SomeTable_CTE.NextId
)
SELECT Data, max(total) as total
FROM SomeTable_CTE
group by Data, BaseId
order by BaseId
The elephant in the room is the missing column(s) to establish the order of rows.
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
ORDER BY min(order_column);
To exclude partitions with only a single row, add a HAVING clause:
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
HAVING count(*) > 1
ORDER BY min(order_column);
db<>fiddle here
Add a final ORDER BY to maintain original order (and a meaningful result). You may want to add a column like min(order_column) as well.
Related:
Find the longest streak of perfect scores per player
Select longest continuous sequence
Group by repeating attribute

Having issue susing the MAX command in Databricks

Just running a simple query
select COL1
from (
select *
, monotonically_increasing_id() as row_id
from db00sparkmigration_landingzone_template.tbl_Ingestion_note_load_errors_pipes
) aa
where row_id > 1 and row_id < max(row_id)
but getting the following error, any ideas?
Error in SQL statement: UnsupportedOperationException: Cannot evaluate expression: max(input[1, bigint, false])
I would recommend row_number() here - however you need a column that defines the ordering of the rows:
select col1
from (
select
t.*,
row_number() over(order by <ordering_col> asc) rn_asc,
row_number() over(order by <ordering_col> desc) rn_desc
from db00sparkmigration_landingzone_template.tbl_Ingestion_note_load_errors_pipes t
) aa
where 1 not in (rn_asc, rn_desc)

How to get MAX rownumber with rowNumber in same query

I have ROW_NUMBER() OVER (ORDER BY NULL) rnum in a sql statement to give me row numbers. is there a way to attach the max rnum to the same dataset going out?
what I want is the row_number() which I get, but I also want the MAXIMUM rownumber of the total return on each record.
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) rnum,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE (C.IS_CRNT = 1)
), MAX_NUM as (
SELECT DATA.ID, max(rnum) as maxrnum from DATA GROUP BY DATA.COMPONENT_ID
) select maxrnum, DATA.* from DATA JOIN MAX_NUM on DATA.COMPONENT_ID = MAX_NUM.COMPONENT_ID
DESIRED RESULT (ASSUMING 15 records):
1 15 DATA
2 15 DATA
3 15 DATA
etc...
I think you want count(*) as a window function:
SELECT ROW_NUMBER() OVER (ORDER BY NULL) as rnum,
COUNT(*) OVER () as cnt,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE C.IS_CRNT = 1
Based on my assumptions in your dataset, this is the approach I would take:
WITH CTE AS (
select C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM SCHEMA.TABLE WHERE (C.IS_CRNT = 1))
SELECT *, (select count(*) from cte) "count" from cte;

Random records in Oracle table based on conditions

I have a Oracle table with the following columns
Table Structure
In a query I need to return all the records with CPER>=40 which is trivial. However, apart from CPER>=40 I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.
Oracle solution:
with CTE as
(
select t1.*,
row_number() over(order by DBMS_RANDOM.VALUE) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(order by newid()) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM Table) tmp
WHERE CPER>=40 OR pids <= 5
However, this is not random.
Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.