sql row difference without cursor - sql

I'm trying to get the date gap (in days) between consecutive rows.
For example, my data is ordered by saleDate and looks like the table on the left; the result I want is on the right:
ID | saleDate         ID | gapInDays
10 | 1/1/2014         10 | 4 -- (5/1/2014 - 1/1/2014).Days
20 | 5/1/2014         20 | 2
30 | 7/1/2014  ====>  30 | 3
40 | 10/1/2014        40 | 7
50 | 17/1/2014        50 | 1 -- last row will always be 1
Doing it in code is not a big deal, but because the number of rows is huge (a few million) I'm trying to do it at the stored-procedure level. I assume I could use a cursor, but I understand that is very slow.
Any solution will be highly appreciated.
Pini.

If you are using SQL SERVER 2012/Oracle/Postgres/DB2, then you have LEAD(), LAG() Functions.
select ID,saleDate,LEAD(saleDate) over (order by saleDate) DateOfNextRow
,Isnull(Datediff(dd,saleDate,LEAD(saleDate) over (order by saleDate)),1) as gapInDays
from [Order] -- Order is a reserved word, so it has to be bracketed
For SQL SERVER 2005/2008, you can use Window Functions like ROW_NUMBER().
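As a rough, runnable illustration of the LEAD() approach, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or later (for window functions), a hypothetical table named sales, and ISO-formatted dates; julianday() stands in for DATEDIFF and COALESCE for ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the sample rows from the question (ISO dates).
conn.execute("CREATE TABLE sales (id INTEGER, saleDate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(10, "2014-01-01"), (20, "2014-01-05"), (30, "2014-01-07"),
     (40, "2014-01-10"), (50, "2014-01-17")],
)

# LEAD() reads the next row's saleDate. The last row has no successor,
# so its NULL difference is replaced by 1, as the question asks.
gap_rows = conn.execute("""
    SELECT id,
           COALESCE(
               CAST(julianday(LEAD(saleDate) OVER (ORDER BY saleDate))
                    - julianday(saleDate) AS INTEGER),
               1) AS gapInDays
    FROM sales
    ORDER BY saleDate
""").fetchall()

print(gap_rows)
```

The whole result comes from a single ordered pass over the table, which is the reason to prefer this over a cursor.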

If you are using MS SQL Server 2012 (or another database that supports the same or similar functions), you can use the LAG() function to access previous rows (or LEAD() to access subsequent rows).
Apparently you want this to work on SQL Azure, which lacks the LAG and LEAD windowing functions.
One solution that should work is to use the ROW_NUMBER ranking function applied over the date column. Azure supports ROW_NUMBER, so this code should work:
select t1.id, isnull(datediff(day, t1.saledate, t2.saledate), 1) as gapInDays
from
(select id, saledate, rn = row_number() over (order by saledate, id) from gaps) t1
left join
(select id, saledate, rn = row_number() over (order by saledate, id) from gaps) t2
on t1.rn = t2.rn-1
If you want it slightly more compact (and if Azure supports ctes which I believe it does) you can do it as a common table expression:
;with c as (
select id, saledate, r = row_number() over (order by saledate, id) from gaps
)
select c.id, isnull(datediff(day, c.saledate, c2.saledate), 1) as gapInDays
from c left join c c2 on c.r = c2.r-1
In these queries I ordered the rows by saledate, with id as a tie-breaker. If that is incorrect, you might have to change it to order by id, saledate instead, if it is the id that determines the order.
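The ROW_NUMBER self-join can be tried out with the following sketch. It is only an approximation of the answer above: it runs on SQLite (3.25 or later assumed) through Python's sqlite3 module, uses ISO-formatted dates, and replaces DATEDIFF/ISNULL with julianday()/COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gaps (id INTEGER, saledate TEXT)")
conn.executemany(
    "INSERT INTO gaps VALUES (?, ?)",
    [(10, "2014-01-01"), (20, "2014-01-05"), (30, "2014-01-07"),
     (40, "2014-01-10"), (50, "2014-01-17")],
)

# Number the rows once in a CTE, then join each row to the row whose
# number is one higher. The LEFT JOIN keeps the last row, whose gap
# defaults to 1.
joined_rows = conn.execute("""
    WITH c AS (
        SELECT id, saledate,
               ROW_NUMBER() OVER (ORDER BY saledate, id) AS r
        FROM gaps
    )
    SELECT c.id,
           COALESCE(
               CAST(julianday(c2.saledate) - julianday(c.saledate) AS INTEGER),
               1) AS gapInDays
    FROM c
    LEFT JOIN c AS c2 ON c.r = c2.r - 1
    ORDER BY c.r
""").fetchall()

print(joined_rows)
```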

If your ids are strictly sequential you could do something like this
select
a.id, b.saleDate - a.saleDate
from
yourTable as a, yourTable as b
where
a.id = b.id-1

If the database is SQL Server then following query should work.
WITH Sales AS
(
SELECT
*, ROW_NUMBER() OVER (ORDER BY SaleDate) AS RowNumber
FROM
TableName
)
SELECT
DATEDIFF(DAY, T1.SaleDate, T2.SaleDate)
FROM
Sales AS T1 INNER JOIN Sales AS T2
ON T1.RowNumber = T2.RowNumber - 1;

Related

PARTITION BY to consider only two specific columns for aggregation?

My table has the following data:
REF_NO | PRD_GRP | ACC_NO
------------------------
ABC    | 12      | 1234
ABC    | 9C      | 1234
DEF    | AB      | 7890
DEF    | TY      | 9891
I'm trying to build a query that summarises the number of accounts per customer - the product group is irrelevant for this purpose so my expected result is:
REF_NO | PRD_GRP | ACC_NO | NO_OF_ACC
------------------------------------
ABC    | 12      | 1234   | 1
ABC    | 9C      | 1234   | 1
DEF    | AB      | 7890   | 2
DEF    | TY      | 9891   | 2
I tried doing this using a window function:
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
COUNT(T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM TABLE T
However, the NUM_OF_ACC value returned is 2 and not 1 in the above example for the first customer (ABC). It seems that the query is simply counting the number of unique rows for each customer, rather than identifying the number of accounts as desired.
How can I fix this error?
Link to Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=83344cbe95fb46d4a1640caf0bb6d0b2
You need COUNT(DISTINCT ...) as a window function, which is unfortunately not supported by SQL Server.
But you can simulate it with DENSE_RANK and MAX:
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
MAX(T.rn) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM (
SELECT *,
DENSE_RANK() OVER (PARTITION BY T.REF_NO ORDER BY T.ACC_NO) AS rn
FROM [TABLE] T
) T;
DENSE_RANK numbers the rows within each REF_NO in ACC_NO order, giving tied ACC_NO values the same rank and leaving no gaps, so the MAX of that rank is the number of distinct values.
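To see the DENSE_RANK plus MAX trick on the sample data, here is a small sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later and a hypothetical table named test_data; the SQL itself follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (ref_no TEXT, prd_grp TEXT, acc_no INTEGER)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?)",
    [("ABC", "12", 1234), ("ABC", "9C", 1234),
     ("DEF", "AB", 7890), ("DEF", "TY", 9891)],
)

# The inner query ranks accounts within each customer (ties share a rank);
# the outer query takes the highest rank per customer as the distinct count.
ranked_counts = conn.execute("""
    SELECT ref_no, prd_grp, acc_no,
           MAX(rn) OVER (PARTITION BY ref_no) AS num_of_acc
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY ref_no ORDER BY acc_no) AS rn
        FROM test_data AS t
    ) AS ranked
    ORDER BY ref_no, prd_grp
""").fetchall()

print(ranked_counts)
```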
What you need is COUNT(DISTINCT T.ACC_NO) which is unfortunately not supported in window functions. Therefore you have to write a sub-query to allow you to use COUNT(DISTINCT T.ACC_NO) without a window function.
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
-- Use of DISTINCT is not allowed with the OVER clause.
-- COUNT(DISTINCT T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC,
(
SELECT COUNT(DISTINCT T1.ACC_NO)
FROM TEST_DATA T1
WHERE T1.REF_NO = T.REF_NO
) AS NUM_OF_ACC
FROM TEST_DATA T
The simplest way to implement count(distinct) as a window function is to add two dense_rank() values, one ascending and one descending, and subtract 1:
SELECT T.REF_NO, T.PRD_GRP, T.ACC_NO,
(-1 +
DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO ASC) +
DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO DESC)
) as cnt_distinct
FROM TABLE T
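The arithmetic behind the two-rank version can be checked with a short sketch, again on SQLite (3.25 or later assumed) via Python's sqlite3 module and a hypothetical test_data table. For any row, the ascending rank counts the distinct values up to and including its own, the descending rank counts those from its own value onward, and the row's own value is counted twice, hence the -1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (ref_no TEXT, prd_grp TEXT, acc_no INTEGER)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?)",
    [("ABC", "12", 1234), ("ABC", "9C", 1234),
     ("DEF", "AB", 7890), ("DEF", "TY", 9891)],
)

# ascending rank + descending rank - 1 = number of distinct acc_no values
# in the partition, without needing a subquery.
two_rank_counts = conn.execute("""
    SELECT ref_no, prd_grp, acc_no,
           DENSE_RANK() OVER (PARTITION BY ref_no ORDER BY acc_no ASC)
         + DENSE_RANK() OVER (PARTITION BY ref_no ORDER BY acc_no DESC)
         - 1 AS cnt_distinct
    FROM test_data
    ORDER BY ref_no, prd_grp
""").fetchall()

print(two_rank_counts)
```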

How to get maximum sequence number in SQL

This is the data I have in my table. What I want is maximum sequence number for each order number.
Order No seq Sta
--------------------
32100 1 rd
32100 3 rd
23600 1 rd
23600 6 rd
I want to get the following result without using cursor.
Output:
Order No seq Sta
-----------------
32100 3 rd
23600 6 rd
If you want entire records you could use ROW_NUMBER:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY [Order No] ORDER BY seq DESC) AS rn
FROM tab) s
WHERE rn = 1;
Please do not use keywords like Order and spaces in column names.
The simplest solution is to use GROUP BY with MAX.
Give this a try:
select [Order No], max(seq) as seq, Sta
from myTable
group by [Order No], Sta
Just group by the order number and take the highest sequence number, and you will get your record.
If you are using Oracle Database then you can use ROW_NUMBER() analytical function to achieve this result
Try the below query:
select
*
from
(select
ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY seq DESC) as "ROW_NUM",
order_no, seq, sta
from
Order_Details) temp
where
temp.row_num = 1 ;
The following is probably the most efficient solution in most databases (with the right index):
select t.*
from t
where t.seq = (select max(t2.seq) from t t2 where t2.orderno = t.orderno);
You can also do this with group by:
select orderno, max(seq), sta
from t
group by orderno, sta;
Note that all columns referenced in the select are either group by keys or arguments to aggregation functions. This is proper SQL.
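The two variants above can be compared side by side with this sketch, which runs on SQLite through Python's sqlite3 module. The table name t and column name orderno follow the answer; the data is the sample from the question. Note that the GROUP BY variant only returns one row per order here because Sta is the same within each order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (orderno INTEGER, seq INTEGER, sta TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(32100, 1, "rd"), (32100, 3, "rd"), (23600, 1, "rd"), (23600, 6, "rd")],
)

# Variant 1: a correlated subquery keeps the whole row with the highest seq.
max_by_subquery = conn.execute("""
    SELECT t.orderno, t.seq, t.sta
    FROM t
    WHERE t.seq = (SELECT MAX(t2.seq) FROM t AS t2 WHERE t2.orderno = t.orderno)
    ORDER BY t.orderno
""").fetchall()

# Variant 2: plain aggregation; every selected column is either a
# grouping key or an aggregate.
max_by_group = conn.execute("""
    SELECT orderno, MAX(seq) AS seq, sta
    FROM t
    GROUP BY orderno, sta
    ORDER BY orderno
""").fetchall()

print(max_by_subquery)
print(max_by_group)
```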

Alternative to using ROW_NUMBER for better performance

I have a small query below where it outputs a row number under the RowNumber column based on partitioning the 'LegKey' column and ordering by UpdateID desc. This is so the latest updated row (UpdateID) per legkey is always number 1
SELECT *
, ROW_NUMBER() OVER(PARTITION BY LegKey ORDER BY UpdateID DESC) AS RowNumber
FROM Data.Crew
Data outputted:
UpdateID LegKey OriginalSourceTableID UpdateReceived RowNumber
7359 6641 11 2016-08-22 16:35:27.487 1
7121 6641 11 2016-08-15 00:00:47.220 2
8175 6642 11 2016-08-22 16:35:27.487 1
7122 6642 11 2016-08-15 00:00:47.220 2
8613 6643 11 2016-08-22 16:35:27.487 1
7123 6643 11 2016-08-15 00:00:47.220 2
The problem I have with this method is that performance is slow, I assume because of the ORDER BY.
My question is: is there an alternative way to produce a similar result that runs faster? I am thinking MAX() may work, but I didn't get the same output as before. Maybe I wrote the MAX() statement incorrectly, so I was wondering whether this is a good alternative, and whether somebody can provide an example of how they would write the MAX() statement for this case.
Thank you
Presumably this is the query you want to optimize:
SELECT c.*
FROM (SELECT c.*,
ROW_NUMBER() OVER (PARTITION BY LegKey ORDER BY UpdateID DESC) AS RowNumber
FROM Data.Crew c
) c
WHERE RowNumber = 1;
Try an index on Crew(LegKey, UpdateId).
This index will also be used if you do:
SELECT c.*
FROM Data.Crew c
WHERE c.UpdateId = (SELECT MAX(c2.UpdateId)
FROM Data.Crew c2
WHERE c2.LegKey = c.LegKey
);
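A minimal sketch of the index plus correlated MAX() idea, run on SQLite through Python's sqlite3 module rather than SQL Server. The index name ix_crew_legkey_updateid and the unqualified table name crew are assumptions made for the illustration; whether the index actually helps depends on the real table and the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crew (
        UpdateID INTEGER, LegKey INTEGER,
        OriginalSourceTableID INTEGER, UpdateReceived TEXT
    )
""")
conn.executemany(
    "INSERT INTO crew VALUES (?, ?, ?, ?)",
    [(7359, 6641, 11, "2016-08-22 16:35:27.487"),
     (7121, 6641, 11, "2016-08-15 00:00:47.220"),
     (8175, 6642, 11, "2016-08-22 16:35:27.487"),
     (7122, 6642, 11, "2016-08-15 00:00:47.220"),
     (8613, 6643, 11, "2016-08-22 16:35:27.487"),
     (7123, 6643, 11, "2016-08-15 00:00:47.220")],
)

# Composite index (hypothetical name): the latest UpdateID for a LegKey
# can then be located by a seek instead of sorting every partition.
conn.execute("CREATE INDEX ix_crew_legkey_updateid ON crew (LegKey, UpdateID)")

latest_rows = conn.execute("""
    SELECT c.UpdateID, c.LegKey
    FROM crew AS c
    WHERE c.UpdateID = (SELECT MAX(c2.UpdateID)
                        FROM crew AS c2
                        WHERE c2.LegKey = c.LegKey)
    ORDER BY c.LegKey
""").fetchall()

print(latest_rows)
```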
You can try one of the following:
declare @Table table(UpdateID int, LegKey int, OriginalSourceTableID int, UpdateReceived datetime)
Here using the MAX date in a subquery.
select * from @Table as a where a.UpdateReceived = (Select MAX(UpdateReceived) from @Table as b Where b.LegKey = a.LegKey)
Here you can use it in a CTE with GROUP BY.
with MaxDate as( Select LegKey, Max(UpdateReceived) as MaxDate from @Table group by LegKey )
select * from MaxDate as a
inner join @Table as b
on b.LegKey=a.LegKey
and b.UpdateReceived=a.MaxDate

SQL Server query distinct

I'm trying to do a query in SQL Server 2008. This is the structure of my table
Recno ClientNo ServiceNo
---------------------------------
1234 17 27
2345 19 34
3456 20 33
4567 17 34
I'm trying to select RecNo however, filtering by distinct ClientNo, so for some clients such as client no 17 - they have more than 1 entry, I'm trying to count that client only once. So basically, looking at this table, I'm only supposed to see 3 RecNo's, since there are only 3 distinct clients. Please help
Select RecNo, Count(ClientNo)
from TblA
where Count(clientNo)<2
Something like this?
EDIT:
The value of RecNo is not relevant, I only need to have an accurate number of records. In this case, I'd like to have 3 records.
Okay, you are getting some crazy answers, probably because your desired result is not clear, so if none of these is what you need, I suggest you clarify your desired result.
If you want the answer 3, I can only assume you want a count of DISTINCT ClientNo values; if so, it is simply an aggregation (without a GROUP BY, which would return one row per client, each with a count of 1).
SELECT COUNT(DISTINCT ClientNo) as ClientNoDistinctCount
FROM
TblA
Ok, this will give you the count that you want:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY ClientNo ORDER BY Recno)
FROM TblA
)
SELECT COUNT(DISTINCT Recno) N
FROM CTE
WHERE RN = 1;
Try this..
;with cte1
As(SELECT Recno,clientno
,row_number() over(partition by clientno order by Recno )RNO FROM TblA)
Select Recno,clientno
From cte1 where RNO=1
Choose only ClientNo having the max Recno (or replace < with > to choose the min one).
Select *
from TblA t1
where not exists(select 1
from TblA t2
where t1.ClientNo = t2.ClientNo and t1.Recno < t2.Recno )
BTW, the other solution already mentioned, utilizing row_number() needs no CTE in this case
SELECT TOP(1) WITH TIES *
FROM TblA
ORDER BY ROW_NUMBER() OVER(PARTITION BY ClientNo ORDER BY Recno)
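For reference, the two readings of the question (a count of distinct clients, or one record per client) can be checked against the sample data with this sketch. It uses Python's sqlite3 module and assumes SQLite 3.25 or later for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblA (Recno INTEGER, ClientNo INTEGER, ServiceNo INTEGER)")
conn.executemany(
    "INSERT INTO TblA VALUES (?, ?, ?)",
    [(1234, 17, 27), (2345, 19, 34), (3456, 20, 33), (4567, 17, 34)],
)

# Reading 1: how many distinct clients are there?
distinct_clients = conn.execute(
    "SELECT COUNT(DISTINCT ClientNo) FROM TblA"
).fetchone()[0]

# Reading 2: one record per client, here the lowest Recno of each client.
first_recnos = [row[0] for row in conn.execute("""
    SELECT Recno
    FROM (
        SELECT Recno,
               ROW_NUMBER() OVER (PARTITION BY ClientNo ORDER BY Recno) AS rn
        FROM TblA
    ) AS ranked
    WHERE rn = 1
    ORDER BY Recno
""")]

print(distinct_clients, first_recnos)
```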

Tricky SQL SELECT Statement

I have a performance issue when selecting data in my project.
There is a table with 3 columns: "id","time" and "group"
The ids are just unique ids as usual.
The time is the creation date of the entry.
The group is there to collect certain entries together.
So the table data may look like this:
ID | TIME | GROUP
------------------------
1 | 20090805 | A
2 | 20090804 | A
3 | 20090804 | B
4 | 20090805 | B
5 | 20090803 | A
6 | 20090802 | B
...and so on.
The task is now to select the "current" entries (their ids) in each group for a given date. That is, for each group find the most recent entry for a given date.
Following preconditions apply:
I do not know the different groups in advance - there may be many different ones changing over time
The selection date may lie "in between" the dates of the entries in the table. Then I have to find the closest one in each group. That is, TIME is less than the selection date but the maximum of those to which this rule applies in a group.
What I currently do is a multi-step process which I would like to change into a single SELECT statement:
1) SELECT DISTINCT group FROM table to find the available groups
2) For each group found in 1), SELECT * FROM table WHERE time<selectionDate AND group=loop ORDER BY time DESC
3) Take the first row of each result found in 2)
Obviously this is not optimal.
So I would be very happy if some more experienced SQL expert could help me to find a solution to put these steps in a single statement.
Thank you!
The following will work on SQL Server 2005+ and Oracle 9i+:
WITH groups AS (
SELECT t.group,
MAX(t.time) AS maxtime
FROM TABLE t
GROUP BY t.group)
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN groups g ON g.group = t.group AND g.maxtime = t.time
Any database should support:
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN (SELECT t.group,
MAX(t.time) AS maxtime
FROM TABLE t
GROUP BY t.group) g ON g.group = t.group AND g.maxtime = t.time
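The join against a per-group MAX can be extended with the selection date from the question. The sketch below is one possible way to do that, not the answerer's own code: it runs on SQLite through Python's sqlite3 module, and the table name entries, the column names ts and grp (chosen to avoid the reserved words), and the helper latest_per_group are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, ts INTEGER, grp TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(1, 20090805, "A"), (2, 20090804, "A"), (3, 20090804, "B"),
     (4, 20090805, "B"), (5, 20090803, "A"), (6, 20090802, "B")],
)

def latest_per_group(connection, selection_date):
    """Return, per group, the entry with the largest ts below selection_date."""
    return connection.execute("""
        SELECT e.id, e.ts, e.grp
        FROM entries AS e
        JOIN (SELECT grp, MAX(ts) AS maxtime
              FROM entries
              WHERE ts < ?          -- only entries before the selection date
              GROUP BY grp) AS g
          ON g.grp = e.grp AND g.maxtime = e.ts
        ORDER BY e.grp
    """, (selection_date,)).fetchall()

current_entries = latest_per_group(conn, 20090805)
print(current_entries)
```

If two entries in a group share the same latest time, this returns both of them; a ROW_NUMBER() based query with a tie-breaker would be needed to pick exactly one.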
Here's how I would do it in SQL Server:
SELECT * FROM [table] t WHERE id =
(SELECT top 1 id FROM [table] WHERE [time] < @selectionDate AND [group] = t.[group] ORDER BY [time] DESC)
The solution will vary by database server, since the syntax for TOP queries varies. Basically you are looking for a "top n per group" query, so you can Google that if you want.
Here is a solution in SQL Server. The following will return the top 10 players who hit the most home runs per year since 1990. The key is to calculate the "Home Run Rank" of each player for each year.
select
HRRanks.*
from
(
Select
b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from
Batting b
where
b.yearID > 1990
group by
b.yearID, b.playerID
)
HRRanks
where
HRRanks.HR_Rank <= 10
Here is a solution in Oracle (top 10 departments by average salary):
SELECT deptno, avg_sal
FROM(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
ORDER BY AVG(sal) DESC
)
WHERE ROWNUM <= 10;
Or using analytic functions:
SELECT deptno, avg_sal
FROM (
SELECT deptno, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) rank
FROM
(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
)
)
WHERE rank <= 10;
Or same again, but using DENSE_RANK() instead of RANK()
select * from things where (GROUP, TIME) in (
select GROUP, max(TIME) from things
where TIME < 20090804
group by GROUP
)
Tested with MySQL (but I had to change the table and column names because they are keywords).
SELECT *
FROM TABB T1
QUALIFY ROW_NUMBER() OVER ( PARTITION BY GROUPP order by TIMEE desc, id desc )=1