ROW_NUMBER function does not start from 1 - sql

I would like to ask about strange behaviour in SQL Server when using the ROW_NUMBER() function. Typically it should start from 1 and number the rows by the column selected in the ORDER BY clause, which in most scenarios works for me just as it is supposed to. However, I have a particular case where I use a basic SELECT statement:
SELECT
ROW_NUMBER() OVER (ORDER BY VIN) AS RN,
*
FROM dbo.RawData
and I get such result:
RN VIN
6301 JTEBR3FJ00K096082
6302 JTEBR3FJ00K096132
6303 JTEBR3FJ00K096146
6304 JTEBR3FJ00K096163
6305 JTEBR3FJ00K096180
6306 JTEBR3FJ00K096275
1801 5TDDZRFHX0S820530
1802 5TDDZRFHX0S824111
1803 5TDDZRFHX0S824500
1804 5TDDZRFHX0S825971
1805 5TDDZRFHX0S826456
and those are the first rows of the returned table. The ROW_NUMBER values appear in seemingly random order: after the run from 6301 to 6306, a run from 1801 to 1940 starts, and so on.
The VIN column (the one I sort the data by) is of type nvarchar(17).
Could you please help me solve the issue that might be occurring in this case?
I would be grateful for any tips on what might be wrong.

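ROW_NUMBER() only assigns the numbers; without an ORDER BY on the query itself, the order in which the rows are returned is not guaranteed. You can use ORDER BY to order the rows in the desired way: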
SELECT ROW_NUMBER() OVER (ORDER BY VIN) AS RN
,*
FROM dbo.RawData
ORDER BY RN;
As the row number is calculated in the SELECT list, you can use its alias in the ORDER BY clause without the need for a nested query.
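A minimal sketch of the same pattern, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. The table contents are made up for illustration; only the shape of the query mirrors the answer above.

```python
import sqlite3

# Hypothetical stand-in for dbo.RawData, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RawData (VIN TEXT)")
conn.executemany(
    "INSERT INTO RawData (VIN) VALUES (?)",
    [
        ("JTEBR3FJ00K096082",),
        ("5TDDZRFHX0S820530",),
        ("JTEBR3FJ00K096132",),
        ("5TDDZRFHX0S824111",),
    ],
)

# The OVER (ORDER BY VIN) clause decides which number each row gets;
# the trailing ORDER BY RN decides the order in which rows come back.
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY VIN) AS RN, VIN
    FROM RawData
    ORDER BY RN
    """
).fetchall()

for rn, vin in rows:
    print(rn, vin)
```

With the outer ORDER BY in place, the first row returned is the one numbered 1.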

Related

How to select unique records from a result in oracle SQL?

I am running a SQL query on oracle database.
SELECT DISTINCT flow_id , COMPOSITE_NAME FROM CUBE_INSTANCE where flow_id IN(200148,
200162);
I am getting below results as follow.
200162 ABCWS1
200148 ABCWS3
200162 ABCWS2
200148 OutputLog
200162 OutputLog
In this result 200162 appears three times because the composite name is different in each row. But my requirement is to get only one row for 200162, namely the first one. If the result contains the same flow_id multiple times, it should display only the first row for that flow_id and ignore the 2nd and 3rd.
EXPECTED OUTPUT -
200162 ABCWS1
200148 ABCWS3
Could you please help me with modification of query?
Thank you in advance !!!
It appears that you want to take the lexicographically first composite name for each flow_id:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY flow_id ORDER BY COMPOSITE_NAME) rn
FROM CUBE_INSTANCE t
WHERE flow_id IN (200148, 200162)
)
SELECT flow_id, COMPOSITE_NAME
FROM cte
WHERE rn = 1;
There is no such thing as a "first" row, unless a column specifies that information.
But you can easily use aggregation for this purpose:
select ci.flow_id, min(ci.composite_name)
from cube_instance ci
where flow_id in (200148, 200162)
group by ci.flow_id;
If you do have a column that specifies the ordering, you can still use aggregation. The equivalent of the "first" function in Oracle is:
select ci.flow_id,
min(ci.composite_name) keep (dense_rank first order by <ordering col>)
from cube_instance ci
where flow_id in (200148, 200162)
group by ci.flow_id;
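As a rough check that the ROW_NUMBER() approach and the MIN() aggregation agree on the sample data from the question, here is a sketch that runs both through Python's sqlite3 module. SQLite is only an assumed stand-in for Oracle here, and the Oracle-specific KEEP (DENSE_RANK FIRST ...) form is not covered because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUBE_INSTANCE (flow_id INTEGER, COMPOSITE_NAME TEXT)")
conn.executemany(
    "INSERT INTO CUBE_INSTANCE VALUES (?, ?)",
    [
        (200162, "ABCWS1"),
        (200148, "ABCWS3"),
        (200162, "ABCWS2"),
        (200148, "OutputLog"),
        (200162, "OutputLog"),
    ],
)

# Approach 1: number the rows inside each flow_id and keep the first one.
by_row_number = conn.execute(
    """
    WITH cte AS (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY flow_id ORDER BY COMPOSITE_NAME) AS rn
        FROM CUBE_INSTANCE t
        WHERE flow_id IN (200148, 200162)
    )
    SELECT flow_id, COMPOSITE_NAME
    FROM cte
    WHERE rn = 1
    ORDER BY flow_id
    """
).fetchall()

# Approach 2: plain aggregation, taking the smallest name per flow_id.
by_min = conn.execute(
    """
    SELECT flow_id, MIN(COMPOSITE_NAME)
    FROM CUBE_INSTANCE
    WHERE flow_id IN (200148, 200162)
    GROUP BY flow_id
    ORDER BY flow_id
    """
).fetchall()

print(by_row_number)
print(by_min)
```

Both queries pick ABCWS3 for 200148 and ABCWS1 for 200162, which matches the expected output in the question.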

Sql Islands and Gaps Merge Contiguous records if relevant fields hold same values

I have created a test case here for my problem https://rextester.com/ZRXSQ14415
It's much easier to show the problem than to explain what I am trying to achieve.
I have a list of records across time and I wish to merge contiguous records into a single record.
Each record has a period date, risk levels and a couple of flags. When these risks and flags are the same, the records should be merged; when they are different, they should be separate rows.
On the Rextester example, I have almost achieved my goal; however, look at rows 3 + 4 of the result.
What I want to achieve is that rows 3 + 4 are combined, such that row 3 becomes:
StartDate End Date Name ... ...
17.03.2019 20.03.2019 CPWJ40-A ... ...
As all flags and risk levels are the same.
Change the SEQ expression to
..
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS SEQ
..
This way you'll get the proper grouping of islands of ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod.
This answer is purely to complement Serg's answer with the full query.
SELECT MIN(d.PeriodDate) AS StartDate,
MAX(d.PeriodDate) AS EndDate,
ImplicitRisk,
QcReadyRisk,
IsQualityControlReady,
ActivePeriod,
LocationEventName
FROM
(
SELECT c.*,
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY LocationEventId, ImplicitRisk, QCReadyRisk, IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS grp
FROM tab c
--order by PeriodDate
) d
group by ImplicitRisk, QcReadyRisk, IsQualityControlReady, ActivePeriod, LocationEventName, grp
order by 1
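To see why the difference of two row numbers identifies islands, here is a reduced sketch run through Python's sqlite3 module. The table is hypothetical and cut down to a single ImplicitRisk column with invented dates; the data behind the Rextester link is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (PeriodDate TEXT, ImplicitRisk INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        ("2019-03-15", 1),
        ("2019-03-16", 1),
        ("2019-03-17", 2),
        ("2019-03-18", 2),
        ("2019-03-19", 2),
        ("2019-03-20", 1),
    ],
)

# Within a contiguous run of equal values both row numbers advance together,
# so their difference (grp) stays constant; it changes when the run is broken.
islands = conn.execute(
    """
    SELECT MIN(PeriodDate) AS StartDate,
           MAX(PeriodDate) AS EndDate,
           ImplicitRisk
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (ORDER BY PeriodDate)
             - ROW_NUMBER() OVER (PARTITION BY ImplicitRisk ORDER BY PeriodDate) AS grp
        FROM tab c
    ) d
    GROUP BY ImplicitRisk, grp
    ORDER BY StartDate
    """
).fetchall()

for start, end, risk in islands:
    print(start, end, risk)
```

The six input rows collapse into three islands: 15 to 16 March at risk 1, 17 to 19 March at risk 2, and 20 March alone at risk 1.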

How to track iterations of a value in sql

I am trying to track usage of a blade in a manufacturing process using SSMS 2017. The blade is loaded and used on product until it is seen to dull and then taken out for sharpening while another blade replaces it. We have 30 blades that are cycled for use and sharpening.
Using a table that provides product lot number (sequential) and blade name I would like to separate each batch use of the blade name into groups.
My sql skills are pretty basic so I've been trying row_number, rank, and some attempts at utilizing the lead/lag functions. So far this has only enabled me to break down each product into order based on blade name and identify the product on which a blade change is made. I feel like that could be useful but I'm having trouble figuring out exactly how to do it.
I would like to be able to assign an identifying number to each group of product manufactured with one iteration of a blade. For example:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 1
418213 BH40 1
418214 ES11 2
418215 ES11 2
418216 BH40 3
I'm currently only able to produce incorrect results. Using:
SELECT b.LotNo,
b.BladeID,
ROW_NUMBER() OVER (PARTITION BY b.BladeID ORDER BY b.BladeID)
FROM blades AS b
ORDER BY b.LotNo ASC;
I get:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 2
418213 BH40 3
418214 ES11 1
418215 ES11 2
418216 BH40 4
Here's a possible solution to your problem. It first creates a series of groups to identify when the number must change. Then it gets an order to assign the correct value to each group. And finally, it assigns the value for the iteration. I'm including the sample data in a consumable way so anyone can use it for testing purposes.
CREATE TABLE #Sample(
LotNo int,
BladeID varchar(10),
Iteration int
);
INSERT INTO #Sample
VALUES
(418211, 'BH40', 1),
(418212, 'BH40', 1),
(418213, 'BH40', 1),
(418214, 'ES11', 2),
(418215, 'ES11', 2),
(418216, 'BH40', 3);
GO
WITH cteGroups AS(
SELECT *,
ROW_NUMBER() OVER(ORDER BY LotNo) - ROW_NUMBER() OVER(PARTITION BY BladeID ORDER BY LotNo) AS island
FROM #Sample
),
cteOrdering AS(
SELECT *, MIN( LotNo) OVER( PARTITION BY island, BladeID) AS OrderCol
FROM cteGroups
)
SELECT LotNo,
BladeID,
Iteration,
DENSE_RANK() OVER( ORDER BY OrderCol) AS IterationCalc
FROM cteOrdering;
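The intermediate island value can be inspected with a sketch like the following, which runs the same CTEs through Python's sqlite3 module. SQLite and the plain table name blades (instead of the temp table) are assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blades (LotNo INTEGER, BladeID TEXT)")
conn.executemany(
    "INSERT INTO blades VALUES (?, ?)",
    [
        (418211, "BH40"), (418212, "BH40"), (418213, "BH40"),
        (418214, "ES11"), (418215, "ES11"), (418216, "BH40"),
    ],
)

island_rows = conn.execute(
    """
    WITH cteGroups AS (
        SELECT LotNo, BladeID,
               ROW_NUMBER() OVER (ORDER BY LotNo)
             - ROW_NUMBER() OVER (PARTITION BY BladeID ORDER BY LotNo) AS island
        FROM blades
    ),
    cteOrdering AS (
        -- The first lot of each island gives the islands a usable order.
        SELECT *, MIN(LotNo) OVER (PARTITION BY island, BladeID) AS OrderCol
        FROM cteGroups
    )
    SELECT LotNo, BladeID, island,
           DENSE_RANK() OVER (ORDER BY OrderCol) AS IterationCalc
    FROM cteOrdering
    ORDER BY LotNo
    """
).fetchall()

for row in island_rows:
    print(row)
```

On this data the island values come out as 0, 3 and 2 in lot order, so they cannot be used as the iteration number directly. That is why the query ranks the islands by their first LotNo instead.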
You can do this with lag() and a cumulative sum:
select s.*,
sum(case when prev_BladeID = BladeId then 0 else 1 end) over (order by LotNo) as Iteration
from (select s.*,
lag(s.BladeID) over (order by s.LotNo) as prev_BladeID
from #sample s
) s;
In addition to being simpler code than the difference of row numbers, I think this is also simpler conceptually. This is simply counting the number of times that the BladeID changes from one lot to the next.
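The lag() and cumulative sum approach can be tried on the sample data with a short sketch, again assuming Python's sqlite3 module as a stand-in for SQL Server and a plain table named blades.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blades (LotNo INTEGER, BladeID TEXT)")
conn.executemany(
    "INSERT INTO blades VALUES (?, ?)",
    [
        (418211, "BH40"), (418212, "BH40"), (418213, "BH40"),
        (418214, "ES11"), (418215, "ES11"), (418216, "BH40"),
    ],
)

# Each row contributes 1 when its blade differs from the previous lot's blade.
# The first row has no previous blade (NULL), so it falls into the ELSE branch.
iterations = conn.execute(
    """
    SELECT s.LotNo, s.BladeID,
           SUM(CASE WHEN s.prev_BladeID = s.BladeID THEN 0 ELSE 1 END)
               OVER (ORDER BY s.LotNo) AS Iteration
    FROM (
        SELECT b.*,
               LAG(b.BladeID) OVER (ORDER BY b.LotNo) AS prev_BladeID
        FROM blades b
    ) s
    ORDER BY s.LotNo
    """
).fetchall()

for lot, blade, iteration in iterations:
    print(lot, blade, iteration)
```

The Iteration column comes out as 1, 1, 1, 2, 2, 3, matching the desired result in the question.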

Split the results of a query in half

I'm trying to export rows from one database to Excel and I'm limited to 65000 rows at a shot. That tells me I'm working with an Access database but I'm not sure since this is a 3rd party application (MRI Netsource) with limited query ability. I've tried the options posted at this solution (Is there a way to split the results of a select query into two equal halfs?) but neither of them work -- in fact, they end up duplicating results rather than cutting them in half.
One possibly related issue is that this table does not have a unique ID field. Each record's unique ID can be dynamically formed by the concatenation of several text fields.
This produces 91934 results:
SELECT * from note
This produces 122731 results:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY notedate) AS rn FROM note
) T1
WHERE rn % 2 = 1
EDIT: Likewise, this produces 91934 results, half of them with a tile_nr value of 1, the other half with a value of 2:
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note
However this produces 122778 results, all of which have a tile_nr value of 1:
SELECT bldgid, leasid, notedate, ref1, ref2, tile_nr
FROM (
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note) x
WHERE x.tile_nr = 1
I know that I could just use a COUNT to get the exact number of records, run one query using TOP 65000 ORDER BY notedate, and then another that says TOP 26934 ORDER BY notedate DESC, for example, but as this dataset changes a lot I'd prefer some way to automate this to save time.
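For reference, this is how the NTILE(2) split behaves on an engine with standard window-function support. The sketch uses Python's sqlite3 module and an invented note table, so it says nothing about why the third-party application returns extra rows. It only shows that the two halves should be disjoint and together cover the table when the ORDER BY includes enough columns to break ties on notedate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE note (bldgid TEXT, leasid TEXT, notedate TEXT, ref1 TEXT)"
)
# 11 hypothetical rows; several share a notedate, so ties must be broken.
sample = [
    ("B1", f"L{i:03d}", f"2020-01-{(i % 5) + 1:02d}", f"R{i}") for i in range(11)
]
conn.executemany("INSERT INTO note VALUES (?, ?, ?, ?)", sample)


def fetch_half(conn, tile):
    """Return the rows that NTILE(2) places in the given tile (1 or 2)."""
    return conn.execute(
        """
        SELECT bldgid, leasid, notedate, ref1
        FROM (
            SELECT n.*,
                   NTILE(2) OVER (ORDER BY notedate, bldgid, leasid, ref1) AS tile_nr
            FROM note n
        ) x
        WHERE x.tile_nr = ?
        """,
        (tile,),
    ).fetchall()


first_half = fetch_half(conn, 1)
second_half = fetch_half(conn, 2)
print(len(first_half), len(second_half))
```

The tie-breaking columns stand in for the text fields that together form the unique key mentioned above. Without them, rows sharing a notedate could land in different tiles from one run to the next.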

What's wrong with this Oracle query?

Below is a query generated by the PetaPoco ORM for .NET. I don't have an Oracle client right now to debug it and I can't see anything obviously wrong (but I'm a SQL Server guy). Can anyone tell me why it is producing this error:
Oracle.DataAccess.Client.OracleException ORA-00923: FROM keyword not found where expected
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS" ) peta_paged
WHERE peta_rn>0 AND peta_rn<=20
Edit: Just in case it helps, this is a paging query. Regular queries (select all, select by ID) are working fine.
The problem is that the SELECT NULL in the ORDER BY clause of your analytic function is syntactically incorrect.
over (ORDER BY (SELECT NULL))
could be rewritten
over (ORDER BY (SELECT NULL from dual))
or more simply
over (ORDER BY null)
Of course, it doesn't really make sense to get a row_number if you aren't ordering the results by anything. There is no reason to expect that the set of rows that are returned would be consistent-- you could get any set of 20 rows arbitrarily. And if you go to the second page of results, there is no reason to expect that the second page of results would be completely different than the first page or that any particular result would appear on any page if you page through the entire result set.
There should be an order defined within the ORDER BY clause. For example, let's say your elements are displayed in order of column "ON_CUST_MAS"."CU_NO"; then your query should look like:
SELECT *
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY ("ON_CUST_MAS"."CU_NO")) peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS") peta_paged
WHERE peta_rn > 0
AND peta_rn <= 20
If a different column sets the order, just switch it within the ORDER BY clause. Some order should always be defined; otherwise the row order is not guaranteed to stay the same, and you can't be sure what will be displayed on any page.
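A small sketch of paging over a defined order, assuming Python's sqlite3 module as a stand-in for Oracle and a hypothetical two-column version of ON_CUST_MAS with 45 invented rows. The fetch_page helper is illustrative and is not part of PetaPoco.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ON_CUST_MAS (CU_NO INTEGER, CU_NAME TEXT)")
conn.executemany(
    "INSERT INTO ON_CUST_MAS VALUES (?, ?)",
    [(n, f"Customer {n}") for n in range(45, 0, -1)],  # inserted in reverse
)


def fetch_page(conn, page, page_size=20):
    """Return one page of customers, numbered by CU_NO (pages start at 1)."""
    low = (page - 1) * page_size
    high = page * page_size
    return conn.execute(
        """
        SELECT CU_NO, CU_NAME
        FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY CU_NO) AS peta_rn,
                   CU_NO, CU_NAME
            FROM ON_CUST_MAS
        ) peta_paged
        WHERE peta_rn > ? AND peta_rn <= ?
        ORDER BY peta_rn
        """,
        (low, high),
    ).fetchall()


page1 = fetch_page(conn, 1)
page2 = fetch_page(conn, 2)
page3 = fetch_page(conn, 3)
print(len(page1), len(page2), len(page3))
```

Because the row numbers follow CU_NO, each page is a fixed, non-overlapping slice, and the three pages together contain every customer exactly once.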