Math with previous row in SQL, avoiding nested queries?

I want to do some math on the previous rows in an SQL query so that I don't have to do it in my code.
I have a table representing the sales of two entities (the data shown here doesn't make much sense; it's just an excerpt):
YEAR  ID  SALES        PURCHASE    MARGIN
2009  1   10796820,57  2662369,19  8134451,38
2009  2   2472271,53   2066312,34  405959,19
2008  1   9641213,19   1223606,68  8417606,51
2008  2   3436363,86   2730035,19  706328,67
I want to know how the sales, purchase, margin... have evolved and compare one year to the previous one.
In short I want an SQL result with the evolutions pre-computed like this :
YEAR  ID  SALES        SALES_EVOLUTION  PURCHASE    PURCHASE_EVOLUTION  MARGIN      MARGIN_EVOLUTION
2009  1   10796820,57  11,99            2662369,19  117,58              8134451,38  -3,36
2009  2   2472271,53   -28,06           2066312,34  -24,31              405959,19   -42,53
2008  1   9641213,19                    1223606,68                      8417606,51
2008  2   3436363,86                    2730035,19                      706328,67
I could do some ugly stuff :
SELECT *, YEAR, ID, SALES , (SALES/(SELECT SALES FROM TABLE WHERE YEAR = OUTER_TABLE.YEAR-1 AND ID = OUTER_TABLE.ID) -1)*100 as SALES_EVOLUTION (...)
FROM TABLE as OUTER_TABLE
ORDER BY YEAR DESC, ID ASC
But I have around 20 fields, each of which would need its own nested query, so the query would become huge and ugly.
Is there a better way to do this, with less SQL?

Using SQL Server (but this should work in almost any SQL dialect), with the table provided you can use a LEFT JOIN:
DECLARE @Table TABLE(
[YEAR] INT,
ID INT,
SALES FLOAT,
PURCHASE FLOAT,
MARGIN FLOAT
)
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2009,1,10796820.57,2662369.19,8134451.38
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2009,2,2472271.53,2066312.34,405959.19
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2008,1,9641213.19,1223606.68,8417606.51
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2008,2,3436363.86,2730035.19,706328.67
-- Join each row to the same ID's row from the year before
SELECT cur.*,
((cur.SALES / prev.SALES) - 1) * 100 AS SALES_EVOLUTION
FROM @Table cur LEFT JOIN
@Table prev ON cur.ID = prev.ID AND cur.[YEAR] - 1 = prev.[YEAR]
The LEFT JOIN still returns the 2008 rows (with a NULL evolution, since they have no previous year), whereas an INNER JOIN would drop them.
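Here is a minimal sketch of that self-join extended to all three measures, run through Python's built-in sqlite3 module so it can be tried without a server. The table name `sales` and the rounding to two decimals are assumptions made for the illustration; the join condition is the one from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INT, id INT, sales REAL, purchase REAL, margin REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (2009, 1, 10796820.57, 2662369.19, 8134451.38),
        (2009, 2, 2472271.53, 2066312.34, 405959.19),
        (2008, 1, 9641213.19, 1223606.68, 8417606.51),
        (2008, 2, 3436363.86, 2730035.19, 706328.67),
    ],
)

# One self-join gives access to every previous-year column at once,
# so each extra measure costs one expression, not one nested query.
join_rows = conn.execute(
    """
    SELECT cur.year, cur.id,
           ROUND((cur.sales    / prev.sales    - 1) * 100, 2) AS sales_evolution,
           ROUND((cur.purchase / prev.purchase - 1) * 100, 2) AS purchase_evolution,
           ROUND((cur.margin   / prev.margin   - 1) * 100, 2) AS margin_evolution
    FROM sales AS cur
    LEFT JOIN sales AS prev
           ON prev.id = cur.id AND prev.year = cur.year - 1
    ORDER BY cur.year DESC, cur.id
    """
).fetchall()

for row in join_rows:
    print(row)
```

The 2008 rows come back with None in the evolution columns, because the LEFT JOIN finds no 2007 row to divide by.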

Old-school solution:
SELECT c.YEAR, c.ID, c.SALES, c.PURCHASE, c.MARGIN
, p.YEAR, p.ID, p.SALES, p.PURCHASE, p.MARGIN
FROM tab AS c -- current
INNER JOIN tab AS p -- previous
ON p.year = c.year - 1 -- p is the year before c
AND c.id = p.id
If your database has analytic (window) functions (SQL Server 2012 and later, Oracle), you can use the LEAD or LAG analytic functions; see http://www.oracle-base.com/articles/misc/LagLeadAnalyticFunctions.php
I think this would be the correct application:
-- PARTITION BY ID keeps each ID's history separate, so LAG never reads another ID's row
SELECT c.YEAR, c.ID, c.SALES, c.PURCHASE, c.MARGIN
, LAG(c.YEAR, 1, 0) OVER (PARTITION BY c.ID ORDER BY c.YEAR)
, LAG(c.ID, 1, 0) OVER (PARTITION BY c.ID ORDER BY c.YEAR)
, LAG(c.SALES, 1, 0) OVER (PARTITION BY c.ID ORDER BY c.YEAR)
, LAG(c.PURCHASE, 1, 0) OVER (PARTITION BY c.ID ORDER BY c.YEAR)
, LAG(c.MARGIN, 1, 0) OVER (PARTITION BY c.ID ORDER BY c.YEAR)
FROM tab AS c -- current
(not really sure, haven't played with this enough)
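To make the LAG idea concrete, here is a small sketch run with Python's sqlite3 (it assumes SQLite 3.25 or later, which is where window functions were added). Only the SALES column is shown; the named window `w` and the omission of LAG's default value are my choices, the latter so that the first year yields NULL rather than a division by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (year INT, id INT, sales REAL)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (2009, 1, 10796820.57),
        (2009, 2, 2472271.53),
        (2008, 1, 9641213.19),
        (2008, 2, 3436363.86),
    ],
)

# LAG reads the previous row inside each ID's partition, ordered by year.
# Without a default value it returns NULL for the first year of each ID.
lag_rows = conn.execute(
    """
    SELECT year, id, sales,
           LAG(sales) OVER w AS prev_sales,
           ROUND((sales / LAG(sales) OVER w - 1) * 100, 2) AS sales_evolution
    FROM tab
    WINDOW w AS (PARTITION BY id ORDER BY year)
    ORDER BY year DESC, id
    """
).fetchall()

for row in lag_rows:
    print(row)
```

Compared with the self-join, this needs no second copy of the table, and it looks at the previous available row per ID rather than strictly at year minus one; which of the two behaviours is wanted depends on whether years can be missing.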

You can do it like this:
SELECT t1.*, t1.YEAR, t1.ID, t1.SALES , ((t1.sales/t2.sales) -1) * 100 as SALES_EVOLUTION
(...)
FROM Table t1 JOIN Table t2 ON t1.Year = (t2.Year + 1) AND t1.Id = t2.Id
ORDER BY t1.YEAR DESC, t1.ID ASC
Now, if you want to compare more years, you'd have to do more joins, so it is a slightly ugly solution.

Related

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the hours SUM of all [Hours] including only a single result from each [DevelopmentID] where [Revision] is highest value
e.g SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE @CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
Use ROW_NUMBER() to identify the latest revision:
SELECT SUM([Hours])
FROM (
-- Select only the columns needed; SELECT * would return DevID twice
SELECT s.[Hours], R = ROW_NUMBER() OVER (PARTITION BY d.DevID
ORDER BY s.Revision DESC) -- highest revision gets R = 1
FROM [dbo].[tblDev] d
JOIN [dbo].[tblSpec] s
ON d.[DevID] = s.[DevID]
) d
WHERE R = 1
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2 WHERE s2.DevID = s.DevID)
GROUP BY s.DevId;
Also, don't use WITH (NOLOCK) unless you really know what you are doing -- and I'm guessing you do not. It is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square brackets. They just make the query harder to write and to read.
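A quick way to check the "latest revision per DevID, then sum" logic is to run it on a few rows. The sketch below uses Python's sqlite3 with made-up data (the question's own table is not reproduced here, so the rows, the table name `spec`, and the total are purely illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spec (dev_id INT, revision INT, hours REAL)")
conn.executemany(
    "INSERT INTO spec VALUES (?, ?, ?)",
    [
        (1, 1, 4.0), (1, 2, 6.0),               # dev 1: latest is revision 2
        (2, 1, 3.5),                            # dev 2: single revision
        (3, 1, 1.0), (3, 2, 2.0), (3, 3, 2.5),  # dev 3: latest is revision 3
    ],
)

# Rank revisions per dev_id from highest to lowest, keep rank 1, then sum.
total_hours = conn.execute(
    """
    SELECT SUM(hours)
    FROM (
        SELECT hours,
               ROW_NUMBER() OVER (PARTITION BY dev_id
                                  ORDER BY revision DESC) AS rn
        FROM spec
    )
    WHERE rn = 1
    """
).fetchone()[0]

print(total_hours)  # 6.0 + 3.5 + 2.5 = 12.0
```

The important detail is the DESC in the window ordering: with an ascending order, row number 1 would be the oldest revision instead of the newest.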

How to recursively calculate yearly rollover in SQL?

I need to calculate yearly rollover for a system that keeps track of when people have used days off.
The rollover calculation itself is simple: [TOTALDAYSALLOWED] - [USED]
Provided that number is not higher than [MAXROLLOVER] (and > 0)
Where this gets complicated is the [TOTALDAYSALLOWED] column, which is [NUMDAYSALLOWED] combined with the previous year's rollover to get the total number of days that can be used in a current year.
I've tried several different ways of getting this calculation, but all of them have failed to account for the previous year's rollover being a part of the current year's allowed days.
Creating columns for the LAG of days used, joining the data to itself but shifted back a year, etc. I'm not including examples of code I've tried because the approach was wrong in all of the attempts. That would just make this long post even longer.
Here's the data I'm working with:
Here's how it should look after the calculation:
This is a per-person calculation, so there's no need to consider any personal ID here. DAYTYPE only has one value currently, but I want to include it in the calculation in case another is added. The [HOW] column is only for clarity in this post.
Here's some code to generate the sample data (SQL Server or Azure SQL):
IF OBJECT_ID('tempdb..#COUNTS') IS NOT NULL DROP TABLE #COUNTS
CREATE TABLE #COUNTS (USED INT, DAYTYPE VARCHAR(20), THEYEAR INT)
INSERT INTO #COUNTS (USED, DAYTYPE, THEYEAR)
SELECT 1, 'X', 2019
UNION
SELECT 3, 'X', 2020
UNION
SELECT 0, 'X', 2021
IF OBJECT_ID('tempdb..#ALLOWANCES') IS NOT NULL DROP TABLE #ALLOWANCES
CREATE TABLE #ALLOWANCES (THEYEAR INT, DAYTYPE VARCHAR(20), NUMDAYSALLOWED INT, MAXROLLOVER INT)
INSERT INTO #ALLOWANCES (THEYEAR, DAYTYPE, NUMDAYSALLOWED, MAXROLLOVER)
SELECT 2019, 'X', 3, 3
UNION
SELECT 2020, 'X', 3, 3
UNION
SELECT 2021, 'X', 3, 3
SELECT C.*, A.NUMDAYSALLOWED, A.MAXROLLOVER
FROM #COUNTS C
JOIN #ALLOWANCES A ON C.DAYTYPE = A.DAYTYPE AND C.THEYEAR = A.THEYEAR
The tricky part is to limit the rollover amount. This is maybe possible with window functions, but I think this is easier to do with a recursive query:
with
data as (
select c.*, a.numdaysallowed, a.maxrollover,
row_number() over(partition by c.daytype order by c.theyear) rn
from #counts c
inner join #allowances a on a.theyear = c.theyear and a.daytype = c.daytype
),
cte as (
select d.*,
numdaysallowed as totaldaysallowed,
numdaysallowed - used as actualrollover
from data d
where rn = 1
union all
select d.*,
d.numdaysallowed + c.actualrollover,
case when d.numdaysallowed + c.actualrollover - d.used > d.maxrollover
then d.maxrollover -- cap at this year's limit rather than a hard-coded 3
else d.numdaysallowed + c.actualrollover - d.used
end
from cte c
inner join data d on d.rn = c.rn + 1 and d.daytype = c.daytype
)
select * from cte order by theyear
Demo on DB Fiddle
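For anyone who wants to run the recursion locally, here is a sketch of the same approach on the question's sample data using Python's sqlite3 (SQLite 3.25 or later assumed). Two details are my additions rather than part of the answer above: the rollover is also floored at 0, as the question asks, and the cap is applied in the first year too. The expected-output table from the question is not available here, so the figures in the comments follow only from the stated rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE counts (used INT, daytype TEXT, theyear INT);
    INSERT INTO counts VALUES (1, 'X', 2019), (3, 'X', 2020), (0, 'X', 2021);
    CREATE TABLE allowances (theyear INT, daytype TEXT,
                             numdaysallowed INT, maxrollover INT);
    INSERT INTO allowances VALUES (2019, 'X', 3, 3), (2020, 'X', 3, 3),
                                  (2021, 'X', 3, 3);
    """
)

rollover_rows = conn.execute(
    """
    WITH RECURSIVE
    data AS (
        SELECT c.theyear, c.daytype, c.used, a.numdaysallowed, a.maxrollover,
               ROW_NUMBER() OVER (PARTITION BY c.daytype
                                  ORDER BY c.theyear) AS rn
        FROM counts c
        JOIN allowances a ON a.theyear = c.theyear AND a.daytype = c.daytype
    ),
    cte AS (
        -- Anchor: the first year has no incoming rollover.
        SELECT theyear, daytype, rn, used,
               numdaysallowed AS totaldaysallowed,
               MAX(0, MIN(maxrollover, numdaysallowed - used)) AS rollover
        FROM data
        WHERE rn = 1
        UNION ALL
        -- Step: this year's total is its allowance plus last year's rollover.
        SELECT d.theyear, d.daytype, d.rn, d.used,
               d.numdaysallowed + c.rollover,
               MAX(0, MIN(d.maxrollover,
                          d.numdaysallowed + c.rollover - d.used))
        FROM cte c
        JOIN data d ON d.rn = c.rn + 1 AND d.daytype = c.daytype
    )
    SELECT theyear, used, totaldaysallowed, rollover
    FROM cte
    ORDER BY theyear
    """
).fetchall()

for row in rollover_rows:
    print(row)
# (2019, 1, 3, 2)
# (2020, 3, 5, 2)
# (2021, 0, 5, 3)
```

In SQLite the two-argument MAX and MIN are scalar functions; in SQL Server the same clamp would be written with a CASE expression, as in the answer.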

SQL - select last and previous different to last

The problem: a simplified membership table containing membership id, starting date for each membership and membership level description:
CREATE TABLE cover
(
[membership_id] int,
[cover_from_date] date,
[description] varchar(57)
);
INSERT INTO cover ([membership_id], [cover_from_date], [description])
VALUES (1, '1/1/2011', 'AA'),
(1, '1/2/2011', 'BB'),
(1, '1/3/2011', 'CC'),
(1, '1/4/2011', 'CC');
The task: to list the current membership and the immediate previous membership different to the current one. So from the above table I would like to see something like:
1, 1/4/2011, CC, 1/2/2011, BB
The attempted solution: I have managed to come up with a solution but it takes an enormous time to run on a large database and I'm sure there are better ways of resolving this problem. My no-doubt over complicated query is as follows:
with cte as
(
select
cover.membership_id, cover.cover_from_date,
cover.description,
row_number() over (partition by cover.membership_id order by cover.cover_from_date desc) AS version_no
from
cover
)
select
cte.membership_id,
cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
from
cte
left outer join
cte cover_now on cte.membership_id = cover_now.membership_id
and cover_now.version_no = 1
left outer join
cte cover_prev on cte.membership_id = cover_prev.membership_id
and cover_prev.version_no = (select min(x.version_no)
from cte x
where x.version_no >= 2
and x.membership_id = cover_now.membership_id
and x.description <> cover_now.description)
group by
cte.membership_id, cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
Any tips on how to optimise the query would be appreciated.
First create an index on membership_id and cover_from_date in descending order. It will be heavily used by this query.
create index cover_by_date on cover (membership_id asc, cover_from_date desc)
Then:
select
membership.membership_id,
membership.cover_from_date,
membership.description,
previous_membership.cover_from_date,
previous_membership.description
from
(
select membership_id, description, cover_from_date, row_number() over (partition by membership_id order by cover_from_date desc) as rank
from cover
) as membership
left join (
select previous.membership_id, previous.description, previous.cover_from_date, row_number() over (partition by previous.membership_id order by previous.cover_from_date desc) as rank
from cover
join cover as previous on
cover.membership_id = previous.membership_id and
cover.description <> previous.description and
cover.cover_from_date > previous.cover_from_date
) as previous_membership on
previous_membership.membership_id = membership.membership_id and
previous_membership.rank = 1
where
membership.rank = 1
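An alternative shape for the same result, as a sketch: find the latest row per membership first, then rank only the rows whose description differs from it. It is run here with Python's sqlite3 (SQLite 3.25 or later assumed). The dates are written in ISO form on the assumption that the question's values are month/day/year, and membership 2 is an extra made-up row added to show the no-previous-level case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cover (membership_id INT, cover_from_date TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO cover VALUES (?, ?, ?)",
    [
        (1, "2011-01-01", "AA"),
        (1, "2011-01-02", "BB"),
        (1, "2011-01-03", "CC"),
        (1, "2011-01-04", "CC"),
        (2, "2011-01-01", "AA"),  # hypothetical member with a single level
    ],
)

membership_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT membership_id, cover_from_date, description,
               ROW_NUMBER() OVER (PARTITION BY membership_id
                                  ORDER BY cover_from_date DESC) AS rn
        FROM cover
    ),
    latest AS (
        SELECT membership_id, cover_from_date, description
        FROM ranked
        WHERE rn = 1
    ),
    prev AS (
        -- Only rows whose description differs from the latest one, newest first.
        SELECT c.membership_id, c.cover_from_date, c.description,
               ROW_NUMBER() OVER (PARTITION BY c.membership_id
                                  ORDER BY c.cover_from_date DESC) AS rn
        FROM cover c
        JOIN latest l ON l.membership_id = c.membership_id
                     AND c.description <> l.description
    )
    SELECT l.membership_id, l.cover_from_date, l.description,
           p.cover_from_date AS prev_from_date,
           p.description     AS prev_description
    FROM latest l
    LEFT JOIN prev p ON p.membership_id = l.membership_id AND p.rn = 1
    ORDER BY l.membership_id
    """
).fetchall()

for row in membership_rows:
    print(row)
# (1, '2011-01-04', 'CC', '2011-01-02', 'BB')
# (2, '2011-01-01', 'AA', None, None)
```

Whether this beats the join-based version on a large table would need to be measured; both rely on the index on (membership_id, cover_from_date desc).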

How can I improve the native query for a table with 7 millions rows?

I have the view (table) below in my database (SQL Server).
I want to retrieve two things from this table.
The object that has the latest booking date for each Product number.
It will return the objects = {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If no Step number has a booking date for a Product number, it will return the object with Step number = 1. It will return the object = {0002, 1, NULL}.
The view has 7.000.000 rows. I must do it using a native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query that retrieves the product with booking date NULL and Step number = 1;
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use two queries to get this information, but I need to improve this in the future. Do you have an alternative? I also cannot use stored procedures or functions inside SQL Server. I must do it with a native query from Java.
Try this:
Declare @p table(pumber int,step int,bookdate datetime)
insert into @p values
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')
-- CTE: latest non-null booking date per product
;With CTE as
(
select pumber,max(bookdate)bookdate
from @p p1
where bookdate is not null
group by pumber
)
-- Products with a booking date: the row carrying the latest one
select p.* from @p p
where exists(select 1 from CTE c
where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
-- Products with no booking date at all: the step 1 row
select p1.* from @p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c
where p1.pumber=c.pumber)
If performance is the main concern, it does not matter whether you use one query or two; what matters is how fast the result comes back.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of the rows fall on one side (BookingDate is not null, or BookingDate is null), you can create a filtered index instead:
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. Null values are treated as the lowest possible values by sql-server ORDER BY.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server suggests in order to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
from t t2
where t2.product_number = t.product_number
order by t2.booking_date desc, t2.step_number
);
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
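The ordering trick behind the last two answers (latest booking date first, NULLs last, then lowest step) can be checked on the small data set from the first answer. This sketch uses Python's sqlite3 (SQLite 3.25 or later assumed) and a plain ROW_NUMBER filter in place of TOP (1) WITH TIES, which SQLite does not have; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_number INT, step_number INT, booking_date TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, 1, "2019-01-01"), (1, 2, "2019-01-02"), (1, 3, "2019-01-03"),
        (2, 1, None), (2, 2, None), (2, 3, None),
        (3, 1, None), (3, 2, None), (3, 3, "2019-01-03"),
    ],
)

# With DESC ordering NULL dates sort last (in SQLite as in SQL Server), so a
# product with any booking gets its latest one; a product with none falls
# back to the tie-breaker and returns its lowest step number.
booking_rows = conn.execute(
    """
    SELECT product_number, step_number, booking_date
    FROM (
        SELECT product_number, step_number, booking_date,
               ROW_NUMBER() OVER (PARTITION BY product_number
                                  ORDER BY booking_date DESC,
                                           step_number) AS rn
        FROM product
    )
    WHERE rn = 1
    ORDER BY product_number
    """
).fetchall()

for row in booking_rows:
    print(row)
# (1, 3, '2019-01-03')
# (2, 1, None)
# (3, 3, '2019-01-03')
```

This answers both requirements in a single pass over the table, which is the main attraction on 7 million rows, provided a suitable index exists.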

Find duplicates in MS SQL table

I know that this question has been asked several times but I still cannot figure out why my query is returning values which are not duplicates. I want my query to return only the records which have identical value in the column Credit. The query executes without any errors but values which are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > 01 / 11 / 2014 And
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
I'll assume your query already extracts the right data for your criteria, so let me take a different approach and leave your script as-is. First, store all the records in a temp table.
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then build a row index per Credit value and keep only the rows whose index is not 1; this drops every credit that occurs a single time.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change what counts as a duplicate through PARTITION BY <column name>.
If anything here does not match your case, please say so; this is how I have understood it so far.
EDIT: To include every row whose credit has duplicates:
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT DISTINCT -- one row per duplicated credit, so the join does not multiply rows
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit
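If the goal is simply "every row whose Credit value occurs more than once", a windowed count may be a shorter route than numbering rows and joining back. A sketch with Python's sqlite3 (SQLite 3.25 or later assumed); the table `tx` and its rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (reference TEXT, credit REAL)")
conn.executemany(
    "INSERT INTO tx VALUES (?, ?)",
    [
        ("500001", 100.0),
        ("500002", 250.0),
        ("500003", 100.0),
        ("500004", 75.5),   # the only credit that occurs once
        ("500005", 250.0),
        ("500006", 100.0),
    ],
)

# COUNT(*) OVER attaches the size of each credit group to every row,
# so filtering on it keeps all members of a duplicated group.
duplicate_rows = conn.execute(
    """
    SELECT reference, credit
    FROM (
        SELECT reference, credit,
               COUNT(*) OVER (PARTITION BY credit) AS occurrences
        FROM tx
    )
    WHERE occurrences > 1
    ORDER BY reference
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

Any filters such as the date, reference pattern, or account conditions would need to go inside the inner query, so that duplicates are counted only among the rows that pass them.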