ROW_NUMBER() doesn't count right on timestamp - sql

I always thought ROW_NUMBER() increments by 1 for every row, but with my timestamp data it doesn't.
ID TIME
1 2017-05-29 21:08:51.393401
1 2017-05-29 21:08:51.393401
1 2017-01-03 09:37:31.30511
1 2017-01-03 09:37:31.30511
...
WITH CTE AS (
select ID, TIME, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TIME) AS TEST from XY
)
select * from CTE
RESULT
ID TIME TEST
1 2017-05-29 21:08:51.393401 1
1 2017-05-29 21:08:51.393401 1
1 2017-01-03 09:37:31.30511 2
1 2017-01-03 09:37:31.30511 2
...
The desired result should be 1, 2, 3, 4 and so on...
Edit: to work around the problem, I used SELECT DISTINCT.
But perhaps someone can reproduce this on Netezza and confirm that it's not working as it should.

This looks like a bug in Netezza. The result you are getting looks like DENSE_RANK rather than ROW_NUMBER.
You should be able to work around the bug by extending the ORDER BY clause with a random number, so that on a tie in time the DBMS picks one row arbitrarily, as it is supposed to do.
WITH CTE AS
(
SELECT id, time, ROW_NUMBER() OVER (PARTITION BY id ORDER BY time, RANDOM()) AS TEST
FROM xy
)
SELECT * FROM cte
ORDER BY id, test;
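For comparison, here is a minimal sketch of the expected behaviour, run against SQLite through Python's sqlite3 module rather than Netezza (assumption: the linked SQLite is 3.25 or later, which added window functions). On tied timestamps, ROW_NUMBER() still yields 1, 2, 3, 4, while DENSE_RANK() yields the 1, 1, 2, 2 pattern shown in the question.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions); SQLite stands in for Netezza here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xy (id INTEGER, time TEXT)")
# Same shape as the question's data: two pairs of identical timestamps.
conn.executemany(
    "INSERT INTO xy VALUES (?, ?)",
    [
        (1, "2017-05-29 21:08:51.393401"),
        (1, "2017-05-29 21:08:51.393401"),
        (1, "2017-01-03 09:37:31.30511"),
        (1, "2017-01-03 09:37:31.30511"),
    ],
)

rows = conn.execute(
    """
    SELECT time,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY time) AS rn,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY time) AS dr
    FROM xy
    ORDER BY rn
    """
).fetchall()

row_numbers = [r[1] for r in rows]  # ROW_NUMBER(): unique even on ties
dense_ranks = [r[2] for r in rows]  # DENSE_RANK(): ties share a number
print(row_numbers, dense_ranks)
```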

Try it as below, removing the PARTITION BY ID:
WITH CTE AS (
select ID,TIME, ROW_NUMBER() OVER (ORDER BY TIME) AS TEST from XY
) select * from cte

I find it hard to believe that this is a bug in Netezza. That is possible, but I would first explore whether the ids are really the same.
For instance, if id is a string and one value ends in a space, the two rows fall into different partitions, so this returns 1 for both rows:
with t as (
select '1' as x union all
select '1 '
)
select *, row_number() over (partition by x order by x)
from t;
There are other reasons why values might look the same.
If id is an integer (or numeric), then what-you-see-is-what-you-get, so that would suggest a bug.
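A quick way to check for invisible differences like this is to look at LENGTH() (or the raw bytes) of the values. The sketch below reproduces the trailing-space case in SQLite via Python's sqlite3 module (assumptions: SQLite 3.25+ for window functions; SQLite's default BINARY collation treats '1' and '1 ' as different, while other DBMSs, such as SQL Server with its padded comparisons, may treat them as equal).

```python
import sqlite3

# Assumption: SQLite 3.25+; results may differ in DBMSs that pad trailing spaces.
conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH t AS (
        SELECT '1' AS x UNION ALL
        SELECT '1 '
    )
    SELECT x,
           LENGTH(x) AS len,  -- reveals the hidden trailing space
           ROW_NUMBER() OVER (PARTITION BY x ORDER BY x) AS rn
    FROM t
    ORDER BY len
    """
).fetchall()

print(rows)
```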

I found a solution for this issue: also partition on TIME by converting it to a varchar. Try the following:
WITH CTE AS( select ID,TIME, ROW_NUMBER() OVER (PARTITION BY ID, TO_VARCHAR(TIME) ORDER BY TIME) AS TEST from XY )
select * from CTE

Related

SQL Server LAG function in WHERE condition

In the database, I have values that increase by +1000, +2000, +3000, ... relative to the previous value. Sometimes they decrease instead of increasing, so I wrote a query to list them:
select NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME) AS 'DIFF'
from exampleTable with(nolock)
WHERE CONDITION1='abcdef' AND DATE_TIME >='20220801'
This works: I export the result to Excel, filter it, and find the rows less than 0. But can I add that condition directly to the WHERE clause in SQL?
I also tried HAVING, because DIFF is a computed column rather than a regular one, and that didn't work either:
AND (NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME))<0
ORDER BY DATE_TIME ASC
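Window functions can't be referenced in WHERE or HAVING, so compute DIFF in a CTE and filter on it in the outer query. So basically it is like this: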
;WITH CTE AS (
select NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME) AS 'DIFF' from exampleTable with(nolock)
WHERE CONDITION1='abcdef' AND DATE_TIME >='20220801'
)
SELECT * FROM CTE WHERE DIFF <0
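A minimal sketch of the CTE approach, using SQLite through Python's sqlite3 (assumptions: SQLite 3.25+ for LAG; example_table holds made-up readings with one drop). The first row's DIFF is NULL, and NULL < 0 is not true, so that row is filtered out as well.

```python
import sqlite3

# Assumption: SQLite 3.25+; example_table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (number INTEGER, date_time TEXT)")
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?)",
    [
        (1000, "2022-08-01 10:00"),
        (3000, "2022-08-01 10:01"),
        (6000, "2022-08-01 10:02"),
        (5000, "2022-08-01 10:03"),  # drop of 1000
        (7000, "2022-08-01 10:04"),
    ],
)

# LAG is evaluated inside the CTE; the outer WHERE filters on its alias.
decreases = conn.execute(
    """
    WITH cte AS (
        SELECT date_time,
               number - LAG(number) OVER (ORDER BY date_time) AS diff
        FROM example_table
        WHERE date_time >= '2022-08-01'
    )
    SELECT date_time, diff FROM cte WHERE diff < 0
    """
).fetchall()

print(decreases)
```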

Get a single value for the latest date

I need to get the latest price of an item (as part of a larger select statement) and I can't quite figure it out.
Table:
ITEMID DATE SALEPRICE
1 1/1/2014 10
1 2/2/2014 20
2 3/3/2014 15
2 4/4/2014 13
I need the output of the select to be '20' when looking for item 1 and '13' when looking for item 2 as per the above example.
I am using Oracle SQL
The most readable/understandable SQL (in my opinion) would be this:
select saleprice from your_table t
where t.date =
(
select max(t2.date) from your_table t2 where t2.itemid = t.itemid
)
and t.itemid = 1; -- change item id here
This assumes your table is named your_table and that you only have one price per item per day (otherwise the WHERE condition would match more than one row per item). Alternatively, the subselect could be written as a self-join (which should not make a difference in performance).
I'm not sure about the OVER/PARTITION approach used by the other answers. Depending on the DBMS, it may perform better.
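The correlated-subquery approach can be tried out in SQLite from Python (assumptions: a hypothetical table named prices, dates stored as ISO-8601 text so that MAX() picks the latest one, and SQLite standing in for Oracle).

```python
import sqlite3

# Assumption: hypothetical "prices" table; ISO-8601 dates sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (itemid INTEGER, date TEXT, saleprice INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [(1, "2014-01-01", 10), (1, "2014-02-02", 20),
     (2, "2014-03-03", 15), (2, "2014-04-04", 13)],
)


def latest_price(itemid):
    # Keep the row whose date equals the maximum date for the same item.
    row = conn.execute(
        """
        SELECT t.saleprice
        FROM prices t
        WHERE t.date = (SELECT MAX(t2.date) FROM prices t2 WHERE t2.itemid = t.itemid)
          AND t.itemid = ?
        """,
        (itemid,),
    ).fetchone()
    return row[0] if row else None


print(latest_price(1), latest_price(2))
```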
Maybe something like this:
Test data
DECLARE @tbl TABLE(ITEMID int,DATE DATETIME,SALEPRICE INT)
INSERT INTO @tbl
VALUES
(1,'1/1/2014',10),
(1,'2/2/2014',20),
(2,'3/3/2014',15),
(2,'4/4/2014',13)
Query
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY ITEMID ORDER BY [DATE] DESC) AS rowNbr,
tbl.*
FROM
@tbl AS tbl
)
SELECT
*
FROM
CTE
WHERE CTE.rowNbr=1
Try this!
This is written for SQL Server, but it may also work in Oracle SQL:
select * from
(
select t.*, row_number() over (partition by ITEMID order by DATE desc) as rn from table t
) x
where x.rn = 1
ROW_NUMBER() numbers the records within each ITEMID partition, so each group gets its own sequence of rn values; because the rows are ordered by DATE DESC, rn = 1 is the latest record.
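The same "number the rows, keep rn = 1" pattern as a runnable sketch, using SQLite through Python's sqlite3 (assumptions: SQLite 3.25+ for ROW_NUMBER(), a hypothetical prices table, and ISO-8601 date strings). If two rows tie on the latest date, ROW_NUMBER() still keeps exactly one row per item, chosen arbitrarily.

```python
import sqlite3

# Assumption: SQLite 3.25+; "prices" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (itemid INTEGER, date TEXT, saleprice INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [(1, "2014-01-01", 10), (1, "2014-02-02", 20),
     (2, "2014-03-03", 15), (2, "2014-04-04", 13)],
)

# rn = 1 marks the most recent row within each itemid partition.
latest = dict(
    conn.execute(
        """
        SELECT itemid, saleprice
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rn
            FROM prices t
        ) x
        WHERE x.rn = 1
        """
    ).fetchall()
)

print(latest)
```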

Calculating time between entries in sql

Guys, I have a table with a column named time that captures when each record is entered into the database. I want a query that returns another column showing the duration between each entry and the entry before it. For example, if I store a record for John today at 12:00 pm and then one for Ali at 1:10 pm, I want another column that shows 01:10:00 (i.e. HH:MM:SS).
I understand I can number each row as follows:
SELECT ROW_NUMBER() OVER (ORDER BY [followuptime]) from [dbo].[FollowUp]
I wanted to query the max row number as follows, but it fails and returns the error "windowed....":
SELECT MAX(ROW_NUMBER() OVER (ORDER BY [followuptime])) from [dbo].[FollowUp]
I wanted to use SQL's DATEDIFF(interval, start_time, end_time) function, but as it stands I am stuck. I would appreciate your help or any alternative.
Since SQL Server 2008 R2 does not support LAG/LEAD, you will need a self-join on ROW_NUMBER() to get the time from the previous row:
WITH OrderedResults AS
( SELECT [id],
[followuptime],
[remark],
RowNumber = ROW_NUMBER() OVER (ORDER BY [followuptime])
FROM [dbo].[FollowUp]
)
SELECT a.ID,
a.FollowUpTime,
a.Remark,
PreviousTime = b.FollowUpTime,
MinutesDifference = DATEDIFF(MINUTE, b.FollowUpTime, a.FollowUpTime)
FROM OrderedResults a
LEFT JOIN OrderedResults b
ON b.RowNumber = a.RowNumber - 1
ORDER BY a.FollowUpTime;
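Here is the self-join idea as a sketch in SQLite via Python's sqlite3, with the gap also formatted as HH:MM:SS as the question asked (assumptions: SQLite 3.25+ for ROW_NUMBER(), a hypothetical follow_up table with ISO-8601 timestamps; strftime('%s', ...) and time(..., 'unixepoch') are SQLite date functions, not T-SQL, and this HH:MM:SS formatting only works for gaps under 24 hours).

```python
import sqlite3

# Assumption: SQLite 3.25+; follow_up and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow_up (id INTEGER, followuptime TEXT, remark TEXT)")
conn.executemany(
    "INSERT INTO follow_up VALUES (?, ?, ?)",
    [
        (1, "2024-01-15 12:00:00", "John"),
        (2, "2024-01-15 13:10:00", "Ali"),
        (3, "2024-01-15 13:25:30", "Sara"),
    ],
)

rows = conn.execute(
    """
    WITH ordered AS (
        SELECT id, followuptime, remark,
               ROW_NUMBER() OVER (ORDER BY followuptime) AS rn
        FROM follow_up
    )
    SELECT a.remark,
           b.followuptime AS previous_time,
           -- seconds since the previous row (NULL for the first row)
           strftime('%s', a.followuptime) - strftime('%s', b.followuptime) AS secs,
           -- the same gap rendered as HH:MM:SS (only valid below 24 hours)
           time(strftime('%s', a.followuptime) - strftime('%s', b.followuptime),
                'unixepoch') AS hhmmss
    FROM ordered a
    LEFT JOIN ordered b ON b.rn = a.rn - 1
    ORDER BY a.followuptime
    """
).fetchall()

for r in rows:
    print(r)
```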
You cannot apply MAX directly to ROW_NUMBER(), because a windowed function cannot be used inside an aggregate. Compute it in a CTE and query that:
;WITH MyCTE AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [followuptime]) AS RowNum
FROM [dbo].[FollowUp]
)
SELECT MAX(RowNum)
FROM MyCTE
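A small sketch of the CTE workaround in SQLite via Python's sqlite3 (assumptions: SQLite 3.25+ and a hypothetical follow_up table). Over a whole table, the maximum ROW_NUMBER() is simply the row count, so COUNT(*) gives the same number more directly.

```python
import sqlite3

# Assumption: SQLite 3.25+; follow_up is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow_up (followuptime TEXT)")
conn.executemany(
    "INSERT INTO follow_up VALUES (?)",
    [("2024-01-15 12:00",), ("2024-01-15 13:10",), ("2024-01-15 13:25",)],
)

# ROW_NUMBER() is computed in the CTE; MAX() is then an ordinary aggregate.
max_rownum = conn.execute(
    """
    WITH my_cte AS (
        SELECT ROW_NUMBER() OVER (ORDER BY followuptime) AS rownum
        FROM follow_up
    )
    SELECT MAX(rownum) FROM my_cte
    """
).fetchone()[0]

row_count = conn.execute("SELECT COUNT(*) FROM follow_up").fetchone()[0]
print(max_rownum, row_count)
```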

Query to Return Top Items for Each Distinct Column Value

If I have a table with the following fields
ID, SomeFK, SomeTime
How would I write a query to return the latest/top 3 items (based on SomeTime) for each SomeFK?
So, the result might look like
SomeFK Sometime
0 2012-07-05
0 2012-07-04
0 2012-07-03
1 2012-07-03
1 2012-07-02
1 2012-07-01
2 2012-07-03
2 2012-07-02
2 2012-07-01
....etc....
Returning the latest items for a particular SomeFK is easy, but I just can't think how to do it for the above. I also feel it should be dead simple!
EDIT:
Apologies, I missed a key bit of information. This is for SQL Server 2000, so ROW_NUMBER() can't be used!
SELECT SomeFk, SomeTime
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SomeFK ORDER BY sometime desc) rn
FROM yourtable
) v
WHERE rn<=3
ORDER BY somefk, rn
For SQL Server 2000, I recommend upgrading to a supported platform.
But if you insist:
select *
from yourtable t1
where
(select COUNT(*)
from yourtable
where somefk = t1.somefk
and sometime>=t1.sometime
) <=3
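Both approaches can be compared in SQLite from Python (assumptions: SQLite 3.25+ for the ROW_NUMBER() variant, made-up rows, and distinct SomeTime values within each SomeFK). With distinct times they return the same rows; if several rows tie at the third position, the COUNT(*) version can return fewer than 3 rows for that group, while ROW_NUMBER() picks 3 arbitrarily.

```python
import sqlite3

# Assumption: SQLite 3.25+; yourtable and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, somefk INTEGER, sometime TEXT)")
conn.executemany(
    "INSERT INTO yourtable (somefk, sometime) VALUES (?, ?)",
    [
        (0, "2012-07-01"), (0, "2012-07-03"), (0, "2012-07-04"), (0, "2012-07-05"),
        (1, "2012-07-01"), (1, "2012-07-02"), (1, "2012-07-03"),
        (2, "2012-06-30"), (2, "2012-07-01"), (2, "2012-07-02"), (2, "2012-07-03"),
    ],
)

# Pre-window-function approach: keep a row if at most 3 rows in its group
# are at least as recent as it (the row itself included).
correlated = conn.execute(
    """
    SELECT somefk, sometime
    FROM yourtable t1
    WHERE (SELECT COUNT(*) FROM yourtable
           WHERE somefk = t1.somefk AND sometime >= t1.sometime) <= 3
    ORDER BY somefk, sometime DESC
    """
).fetchall()

# Window-function approach for comparison.
windowed = conn.execute(
    """
    SELECT somefk, sometime FROM (
        SELECT somefk, sometime,
               ROW_NUMBER() OVER (PARTITION BY somefk ORDER BY sometime DESC) AS rn
        FROM yourtable
    ) v
    WHERE rn <= 3
    ORDER BY somefk, sometime DESC
    """
).fetchall()

print(correlated == windowed)
```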
If I understood you correctly, can you not select the max SomeTime and group, like this:
select SomeFK, max(SomeTime)
from Table
group by SomeFK
I could be off the mark here, as I'm not entirely sure what you mean by latest.
Based on this link (supplied as a comment to the original question), one solution is:
SELECT DISTINCT ID, SomeFK, SomeTime
FROM SomeTable t1
WHERE ID IN (SELECT TOP 3 ID
FROM SomeTable t2
WHERE t2.SomeFK= t1.SomeFK
ORDER BY SomeTime DESC)
ORDER BY SomeFK, SomeTime DESC
Although I now prefer the accepted solution.

Group by every N records in T-SQL

I have some performance test results on the database, and what I want to do is to group every 1000 records (previously sorted in ascending order by date) and then aggregate results with AVG.
I'm actually looking for a standard SQL solution, however any T-SQL specific results are also appreciated.
The query looks like this:
SELECT TestId,Throughput FROM dbo.Results ORDER BY id
WITH T AS (
SELECT RANK() OVER (ORDER BY ID) Rank,
P.Field1, P.Field2, P.Value1, ...
FROM P
)
SELECT (Rank - 1) / 1000 GroupID, AVG(...)
FROM T
GROUP BY ((Rank - 1) / 1000)
;
Something like that should get you started. If you can provide your actual schema I can update as appropriate.
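A scaled-down sketch of the same grouping in SQLite via Python's sqlite3, with a group size of 3 instead of 1000 (assumptions: SQLite 3.25+ and a hypothetical results table). It uses ROW_NUMBER() instead of RANK(): the two agree when the ORDER BY column is unique, as id is here, but RANK() gives tied rows the same number. Integer division of (rn - 1) by the group size yields group numbers 0, 1, 2, ...

```python
import sqlite3

# Assumption: SQLite 3.25+. GROUP_SIZE = 3 stands in for 1000.
GROUP_SIZE = 3

conn = sqlite3.connect(":memory:")
# Hypothetical results table with seven throughput measurements.
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, throughput REAL)")
conn.executemany(
    "INSERT INTO results (throughput) VALUES (?)",
    [(v,) for v in (10, 20, 30, 40, 50, 60, 70)],
)

groups = conn.execute(
    f"""
    WITH t AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, throughput
        FROM results
    )
    -- Integer division: rows 1-3 -> group 0, rows 4-6 -> group 1, row 7 -> group 2.
    SELECT (rn - 1) / {GROUP_SIZE} AS group_id,
           COUNT(*) AS n,
           AVG(throughput) AS avg_throughput
    FROM t
    GROUP BY (rn - 1) / {GROUP_SIZE}
    ORDER BY group_id
    """
).fetchall()

print(groups)
```

The last group holds only the leftover row, which is the situation the NTILE() answer addresses.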
Give the accepted answer to Yuck. I'm only posting this as an answer so I can include a code block. I ran a count test to check whether it was grouping by 1000, and the first set had 999 rows. The query below produced sets of 1,000. Great query, Yuck.
WITH T AS (
SELECT RANK() OVER (ORDER BY sID) Rank, sID
FROM docSVsys
)
SELECT (Rank-1) / 1000 GroupID, count(sID)
FROM T
GROUP BY ((Rank-1) / 1000)
order by GroupID
I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().
The reason: if you have 10,010 records (for example), you'll get 11 groups, the first 10 with 1000 rows each and the last with just 10.
If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.
NTILE() would make all groups the same size; the only caveat is that you'd need to know how many groups you wanted.
So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.
You could get the number of groups simply with:
DECLARE @ntile int
SET @ntile = (SELECT count(1) from myTable) / 1000
And then modifying @Yuck's approach with the NTILE() substitution:
;WITH myCTE AS (
SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
col1, col2, ...
FROM dbo.myTable
)
SELECT myGroup, AVG(col1), AVG(col2), ...
FROM myCTE
GROUP BY myGroup
;
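To see the difference from fixed-size chunks, here is a sketch in SQLite via Python's sqlite3 (assumptions: SQLite 3.25+, a hypothetical my_table with 10 rows, and a target of 3 rows per bucket, so 10 // 3 = 3 buckets). NTILE(3) produces buckets of 4, 3 and 3 rows (sizes differ by at most one, with the larger buckets first), whereas integer division by 3 would give 3, 3, 3 and 1.

```python
import sqlite3

# Assumption: SQLite 3.25+; my_table is hypothetical, val = 1.0 .. 10.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, val REAL)")
conn.executemany("INSERT INTO my_table (val) VALUES (?)", [(float(i),) for i in range(1, 11)])

# Mirrors the @ntile calculation, with a target of 3 rows per bucket.
row_count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
n_buckets = row_count // 3

buckets = conn.execute(
    f"""
    WITH my_cte AS (
        SELECT NTILE({n_buckets}) OVER (ORDER BY id) AS my_group, val
        FROM my_table
    )
    SELECT my_group, COUNT(*) AS n, AVG(val) AS avg_val
    FROM my_cte
    GROUP BY my_group
    ORDER BY my_group
    """
).fetchall()

print(buckets)
```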
The answer above does not actually assign a unique group ID to each 1000 records in dialects where / performs floating-point division (such as BigQuery); there, adding Floor() is needed. The following returns all records from your table, with a unique GroupID for each 1000 rows:
WITH T AS (
SELECT RANK() OVER (ORDER BY your_field) Rank,
your_field
FROM your_table
WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
FROM T
And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:
TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1'))) GroupID
Without the seed value, you and I would get exactly the same output, because we'd just be computing SHA256 of the group numbers 0, 1, 2, etc. Adding the seed makes the output unique but still repeatable.
This is BigQuery syntax. T-SQL might be slightly different.
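The seeded-hash idea is not specific to BigQuery. Here is an illustrative Python sketch using hashlib (assumptions: group_id is a hypothetical helper, and the string hashed here may not match byte-for-byte what BigQuery's CAST(... AS STRING) produces, so the hex values will not necessarily equal BigQuery's). Rows in the same chunk share an ID, different seeds give different IDs, and the same seed always gives the same ID.

```python
import hashlib


def group_id(row_number: int, group_size: int, seed: str) -> str:
    """Hypothetical helper: hex SHA-256 of the zero-based group index plus a seed.

    Intended to mimic TO_HEX(SHA256(CONCAT(CAST(group_index AS STRING), seed))).
    """
    group_index = (row_number - 1) // group_size  # 0 for rows 1..N, 1 for N+1..2N, ...
    return hashlib.sha256((str(group_index) + seed).encode("utf-8")).hexdigest()


ids_seed1 = [group_id(rn, 1000, "seed1") for rn in (1, 1000, 1001)]
ids_seed2 = [group_id(rn, 1000, "seed2") for rn in (1, 1000, 1001)]
print(ids_seed1[0][:16], ids_seed2[0][:16])
```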
Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by doing:
WITH T AS (
SELECT RANK() OVER (ORDER BY your_field) Rank,
your_field
FROM your_table
WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
, COUNT(*) OVER(PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1')))) AS CountInGroup
FROM T
ORDER BY CountInGroup
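A sketch of dropping the incomplete last chunk in SQLite via Python's sqlite3 (assumptions: SQLite 3.25+, a hypothetical your_table with 8 rows, and a group size of 3 instead of 1000). It partitions the COUNT(*) window by the group number itself rather than by its hash, which groups the rows the same way, and filters in an outer query because window functions cannot appear in WHERE.

```python
import sqlite3

# Assumption: SQLite 3.25+. GROUP_SIZE = 3 stands in for 1000.
GROUP_SIZE = 3

conn = sqlite3.connect(":memory:")
# Hypothetical your_table: 8 rows -> chunks of 3, 3 and 2.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, your_field TEXT)")
conn.executemany(
    "INSERT INTO your_table (your_field) VALUES (?)",
    [(f"row{i}",) for i in range(1, 9)],
)

full_groups = conn.execute(
    f"""
    WITH t AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, your_field
        FROM your_table
    ),
    g AS (
        SELECT (rn - 1) / {GROUP_SIZE} AS group_id, your_field,
               COUNT(*) OVER (PARTITION BY (rn - 1) / {GROUP_SIZE}) AS count_in_group
        FROM t
    )
    -- keep only chunks that are completely filled
    SELECT group_id, your_field
    FROM g
    WHERE count_in_group = {GROUP_SIZE}
    ORDER BY group_id, your_field
    """
).fetchall()

print(full_groups)
```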
You can also use Row_Number() instead of RANK(). No Floor() is required, because T-SQL integer division already truncates.
declare @groupsize int = 50
;with ct1 as ( select YourColumn, RowID = Row_Number() over(order by YourColumn)
from YourTable
)
select YourColumn, RowID, GroupID = (RowID-1)/@groupsize + 1
from ct1
I read more about NTILE after reading @user15481328's answer
(resource: https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-ntile-function/ )
and this solution allowed me to find the max date within each of the 25 groups of my data set:
with cte as (
select date,
NTILE(25) OVER ( order by date ) bucket_num
from mybigdataset
)
select max(date), bucket_num
from cte
group by bucket_num
order by bucket_num