Optimizing SUM OVER PARTITION BY for several hierarchical groups - sql

I have a table like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
I need to find SUM([Spend]) over different groups, as follows:
Total Spend over all the rows in the whole table
Total Spend for each Region
Total Spend for each Region and Country group
Total Spend for each Region, Country and Manufacturer group
So I wrote this query below:
SELECT
[Period]
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
But that query takes >15 minutes for a table of just 450K rows. I'd like to know if there is any way to improve its performance. Thank you in advance for your answers/suggestions!

Your description of the problem suggests grouping sets to me:
SELECT YEAR([Period]) AS [Period], [Region], [Country], [Manufacturer],
SUM([Spend])
FROM myTable
GROUP BY GROUPING SETS ( (YEAR([Period])),
(YEAR([Period]), [Region]),
(YEAR([Period]), [Region], [Country]),
(YEAR([Period]), [Region], [Country], [Manufacturer])
);
I don't know if this will be faster, but it certainly seems more aligned with your question.
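As a rough illustration of what the grouped result looks like, here is a minimal sketch run through Python's built-in sqlite3 module. SQLite has no GROUPING SETS, so each hierarchy level is a separate GROUP BY stacked with UNION ALL; that substitution, the table name spend, and the lvl labels are assumptions of this sketch. The rows are the sample data from the question, with Period treated as a plain year.

```python
import sqlite3

# Sample rows from the question: Region, Country, Manufacturer, Brand, Period, Spend
rows = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spend (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
conn.executemany("INSERT INTO spend VALUES (?, ?, ?, ?, ?, ?)", rows)

# One GROUP BY per hierarchy level; NULL marks a column rolled up at that level.
query = """
SELECT 'world' AS lvl, Period, NULL AS Region, NULL AS Country,
       NULL AS Manufacturer, SUM(Spend) AS Total
FROM spend GROUP BY Period
UNION ALL
SELECT 'region', Period, Region, NULL, NULL, SUM(Spend)
FROM spend GROUP BY Period, Region
UNION ALL
SELECT 'country', Period, Region, Country, NULL, SUM(Spend)
FROM spend GROUP BY Period, Region, Country
UNION ALL
SELECT 'manufacturer', Period, Region, Country, Manufacturer, SUM(Spend)
FROM spend GROUP BY Period, Region, Country, Manufacturer
"""

# Key: (level, period, region, country, manufacturer) -> total spend
totals = {row[:5]: row[5] for row in conn.execute(query)}

for key in sorted(totals, key=str):
    print(key, totals[key])
```

Each total is computed once per group instead of once per detail row, so the result is much smaller than the windowed version; joining it back to the detail rows would be a separate step if the per-row layout is required.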

Use cross apply here to speed the query up:
SELECT
periodyear
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY periodyear) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
cross apply (select YEAR([Period]) periodyear) a

Old school of SUM() OVER():
SELECT
[Period]
, [Region]
, [Country]
, [Manufacturer]
, [Brand]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] GROUP BY [Period]) AS [SumOfSpendWorld]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region GROUP BY [Period], [Region] ) AS [SumOfSpendRegion]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country GROUP BY [Period], [Region], [Country] ) AS [SumOfSpendCountry]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country AND e.Manufacturer = t.Manufacturer GROUP BY [Period], [Region], [Country], [Manufacturer] ) AS [SumOfSpendManufacturer]
FROM myTable e
This is not the most elegant way to do it, but it gets the job done. I would highly recommend analyzing the table to see which alternative approach would be best for your situation. If you feel you've hit a dead end, I would suggest using temp tables to make things faster.
For instance, you could select the rows for a given period and bulk-copy them directly into a temp table, then do your magic there. I've seen tables that forced me to use temp tables instead of a simple select query. Others forced me to split the table in two.
So it's not always going to be nice and clean!
I hope this gives you another insight that helps you on your journey.

Related

SQL Server stored procedure taking a LONG time to run after minor changes

I have a stored procedure that takes info from two tables I created to generate a summary table that is then used with several views.
Previously this took between 60-90 seconds to run. I had two calls to functions for different costs, and a third that makes another call for cost * qty. I removed all 3 and replaced them with a new function that is almost an exact copy of one of the other cost functions.
I wrote this as I was working through it, so it's evolved a bit. I improved on the speed, but it's still nowhere near as fast as it was before and I'm not sure why.
ALTER FUNCTION [dbo].[fn_getFactoryStdCost]
(@PartID int)
RETURNS decimal(20, 4)
AS
BEGIN
DECLARE @pureID int = 0
SET @pureID = (SELECT TOP(1) PURE_COST_ID
FROM visuser.PART_COST
WHERE EN_PART_ID = @partID
ORDER BY EN_REV_MASTER_ID DESC, IC_WAREHOUSE_ID DESC)
RETURN (SELECT TOP(1) (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N)
FROM visuser.PURE_COST
WHERE PURE_COST_ID = @pureID
ORDER BY (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N) DESC)
END
It was replaced with the version below. I added WITH INLINE = OFF after it first got stuck, to rule that out. The function by itself works just fine.
ALTER FUNCTION [dbo].[fn_getFactoryStdCost]
(@PartID int)
RETURNS decimal(20,4)
WITH INLINE = OFF
AS
BEGIN
DECLARE @pureID int = 0
SET @pureID = (SELECT TOP(1) PURE_COST_ID
FROM visuser.PART_COST
WHERE EN_PART_ID = @partID
ORDER BY EN_REV_MASTER_ID DESC, IC_WAREHOUSE_ID DESC)
RETURN (SELECT TOP(1) (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N + TOT_RUN_VALUE_N + TOT_FIXED_OVERHEAD_N)
FROM visuser.PURE_COST
WHERE PURE_COST_ID = @pureID
ORDER BY (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N) DESC)
END
The other changes that I made were adding [Qty] > 0 AND to the [Part Count] line,
and replacing the string-based entries for the Commodity ID with ints (which is more appropriate), as the COMMODITY_ID is a reference to the COMMODITY_CODE, which is what the strings were.
I expected it to run faster, not run indefinitely. The procedure is now taking forever to run. I'm now on 38min and counting. I also tried just copying the code in the procedure itself and running it and it is also taking forever, so it's something in the code itself.
The AllPartsList table has 1.04m lines, as does the bomBreakdown table. The bomBreakdown table is far more complex and takes 40-60s to generate. The bomSummary table will have 4,100 lines. The AllPartsList table has appropriate indexes, bomBreakdown doesn't.
ALTER PROCEDURE [dbo].[createBOMSummary]
AS
DECLARE @processTime int=0, @begin datetime, @end datetime
SET @begin = SYSDATETIME()
IF OBJECT_ID(N'dbo.bomSummary', N'U') IS NOT NULL
DROP TABLE bomSummary
SELECT
DISTINCT ap.[SourcePartID] AS [Assembly Part ID],
p.[PART_X] AS [Assembly Part #],
p.[DESCR_X] AS [Assembly Part Description],
(SELECT COUNT(DISTINCT [Component Part #]) FROM [bomBreakdown] WHERE [Qty] > 0 AND [Component Part ID] IS NOT NULL AND SourcePartID = ap.SourcePartID GROUP BY [SourcePartID]) AS [Part Count],
(SELECT SUM([Qty]) FROM [bomBreakdown] WHERE [Component Part ID] IS NOT NULL AND SourcePartID = ap.[SourcePartID] GROUP BY [SourcePartID]) AS [Total # of Parts],
([dbo].[fn_getFactoryStdCost](ap.[SourcePartID])) AS [Factory Std Cost],
COALESCE(
(SELECT COUNT(DISTINCT ComponentPartID)
FROM AllPartsList apl
LEFT JOIN visuser.EN_PART p1
ON p1.[EN_Part_ID] = apl.[ComponentPartID]
WHERE
apl.ComponentPartID IS NOT NULL AND
apl.SourcePartID = ap.SourcePartID AND
p1.Commodity_ID IN (15, 84, 85, 87, 81, 92) -- Commodity Codes: 009, 072, 073, 075, 079, 082
GROUP BY SourcePartID
), 0) AS [# of Docs], --0sec
COALESCE(
(SELECT COUNT(DISTINCT ComponentPartID)
FROM AllPartsList apl
LEFT JOIN visuser.EN_PART p1
ON p1.[EN_Part_ID] = apl.[ComponentPartID]
WHERE
apl.ComponentPartID IS NOT NULL AND
apl.SourcePartID = ap.SourcePartID AND
p1.Commodity_ID IN (28) -- Commodity Code 034
GROUP BY SourcePartID
), 0) AS [# of Software], --0sec
COALESCE(
(SELECT COUNT(*)
FROM visuser.[PART_COST]
WHERE [STD_PO_Cost_N] > 0 AND
EN_PART_ID IN
(SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
), 0) AS [# of Std Cost Items], --0sec
COALESCE(
(SELECT COUNT(DISTINCT ComponentPartID)
FROM AllPartsList apl
LEFT JOIN visuser.EN_PART p1
ON p1.[EN_Part_ID] = apl.[ComponentPartID]
WHERE
apl.ComponentPartID IS NOT NULL AND
apl.SourcePartID = ap.SourcePartID AND
p1.Commodity_ID IN (11) -- Commodity Code: 002
GROUP BY SourcePartID), 0
) AS [# of HR Devices] ,--0sec
COALESCE(
(SELECT COUNT(DISTINCT ComponentPartID)
FROM AllPartsList apl
LEFT JOIN visuser.EN_PART p1
ON p1.[EN_Part_ID] = apl.[ComponentPartID]
WHERE
apl.ComponentPartID IS NOT NULL AND
apl.SourcePartID = ap.SourcePartID AND
p1.Commodity_ID IN (5) -- Commodity Code: 007
GROUP BY SourcePartID), 0
) AS [# of 3rd Party Devices], --0sec
COALESCE(
(SELECT COUNT(DISTINCT ComponentPartID)
FROM AllPartsList apl
LEFT JOIN visuser.EN_PART p1
ON p1.[EN_Part_ID] = apl.[ComponentPartID]
WHERE
apl.ComponentPartID IS NOT NULL AND
apl.SourcePartID = ap.SourcePartID AND
p1.Commodity_ID IN (13) AND -- Commodity Code: 005
p1.MAKE_BUY_C = 'B'
GROUP BY SourcePartID
), 0) AS [# of Robots], --0sec
COALESCE(
(SELECT COUNT(*)
FROM visuser.[PART_COST] c
LEFT JOIN visuser.[EN_PART] p
ON p.[EN_PART_ID] = c.[EN_PART_ID]
WHERE
c.[STD_PO_Cost_N] > 0 AND
p.[MAKE_BUY_C] = 'B' AND
c.[EN_PART_ID] IN
(SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
), 0) AS [# of Buy Parts], --0sec
COALESCE(
(SELECT COUNT(*)
FROM visuser.[PART_COST] c
LEFT JOIN visuser.[EN_PART] p
ON p.[EN_PART_ID] = c.[EN_PART_ID]
WHERE
c.[STD_PO_Cost_N] > 0 AND
p.[MAKE_BUY_C] = 'M' AND
c.[EN_PART_ID] IN
(SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
), 0) AS [# of Make Parts]
INTO bomSummary
FROM AllPartsList ap
LEFT JOIN visuser.EN_PART p
ON p.[EN_Part_ID] = ap.[SourcePartID]
ORDER BY [PART_X]
SET @end = SYSDATETIME()
SET @processTime = DATEDIFF(s, @begin, @end)
PRINT @end
PRINT CHAR(10)+CHAR(13)
PRINT 'bomSummary Processing Time: ' + CONVERT(varchar, @processTime)
GO
Here's how the bomBreakdown table looks:
And the AllPartsList table:
If I comment out the function line, two records take 1m 20s to process; here's part of the execution plan. It looks like each COALESCE I have adds 4-6 seconds to the process time.
If I remove all the COALESCE then it takes 2min 50sec to process all 4981 records. Here's the execution list for it:
The execution plans suggested a couple additional indexes, so I added those and now 1 record takes 0 seconds, 2 took 5 secs, 10 took 1 sec, 100 took 2sec, 1000 took 28, and all 4981 took 4min 17sec.
The additional indexes certainly helped. I no longer see percentages over 1000%, but several are still over 100%, which makes me think some more optimization could be done; I'm just not sure where. The execution plan is huge, so here are just a few shots:
Not sure what was up with the 2 records. It's not the 90sec it was before, but it at least finishes now.
One odd thing I see is that it reports (1000 rows affected), then (1 row affected). I have no idea what that 1 row is or where it's coming from. And I'd still like to know why making those few changes made such a huge difference.
I'm using:
SQL Server 2019 (v15.0.2070.41)
SSMS v18.5
Here are the results of my modifications based on allmhuran's suggestions:
SELECT
DISTINCT ap.[SourcePartID] AS [Assembly Part ID],
p.[PART_X] AS [Assembly Part #],
p.[DESCR_X] AS [Assembly Part Description],
oa2.[Part Count],
oa2.[Total # of Parts],
([dbo].[fn_getFactoryStdCost](ap.[SourcePartID])) AS [Factory Std Cost],
oa2.[# of Docs],
oa2.[# of Software],
'Logic Pending' AS [# of Std Cost Items],
oa2.[# of HR Devices],
oa2.[# of 3rd Party Devices],
oa2.[# of Robots],
oa2.[# of Buy Parts],
oa2.[# of Make Parts]
FROM AllPartsList ap
LEFT JOIN visuser.EN_PART p
ON p.[EN_Part_ID] = ap.[SourcePartID]
OUTER APPLY (
SELECT
[Part Count] = COUNT( DISTINCT IIF( [Qty] = 0, null, [Component Part #]) ),
[Total # of Parts] = SUM([Qty]),
[# of Docs] = COUNT( DISTINCT IIF( [Commodity Code] IN ('009', '072', '073', '075', '079', '082'), [Component Part #], null) ), -- Commodity Codes: 009, 072, 073, 075, 079, 082 : Commodity ID: 15, 84, 85, 87, 81, 92
[# of Software] = COUNT( DISTINCT IIF( [Commodity Code] IN ('034'), [Component Part #], null) ), -- Commodity Code 034 : Commodity ID: 28
[# of HR Devices] = COUNT( DISTINCT IIF( [Commodity Code] IN ('002'), [Component Part #], null) ), -- Commodity Code 002 : Commodity ID: 11
[# of 3rd Party Devices] = COUNT( DISTINCT IIF( [Commodity Code] IN ('007'), [Component Part #], null) ), -- Commodity Code 007 : Commodity ID: 5
[# of Robots] = COUNT( DISTINCT IIF( ( [Commodity Code] IN ('005') AND [Make/Buy] = 'B' ), [Component Part #], null) ), -- Commodity Code 005 : Commodity ID: 13
[# of Buy Parts] = COUNT( DISTINCT IIF( [Make/Buy] = 'B', [Component Part #], null) ),
[# of Make Parts] = COUNT( DISTINCT IIF( [Make/Buy] = 'M', [Component Part #], null) )
FROM bomBreakdown
WHERE
[Component Part ID] IS NOT NULL AND
[SourcePartID] = ap.[SourcePartID] AND
--[SourcePartID] = ap.[AssemblyPartID] AND
ap.SourcePartID = 964
GROUP BY [SourcePartID]
) oa2
OK, snuck in a bit of time to go through this.
Scalar function refactor
As mentioned in my comment, scalar functions do bad things to set based operations. In general, if you have a pattern like
create function scalar_UDF(@i int) returns int as begin
return @i * 2;
end
select c = dbo.scalar_UDF(t.c)
from t;
Then this turns your select into a row-by-agonising-row (RBAR) operation under the covers.
You can improve the performance by sticking with set based operations. One way to do this is to mark the scalar UDF as inline, which basically tells SQL it can rewrite your query to this before generating a query plan:
select c = t.c * 2
from t;
But scalar function inlining is a difficult thing for Microsoft to solve, and is still a bit buggy. Another way is to handle it yourself, by using an inline table valued function and cross apply or outer apply:
create function inline_TVF(@i int) returns table as return
(
select result = @i * 2
)
select c = u.result
from t
outer apply inline_TVF(t.c) u;
Actual factorization refactor
Part of your existing procedure looks like this:
select [Part Count] =
(
select count(distinct [Component Part #])
from bomBreakdown
where Qty > 0
and [Component Part ID] is not null
and SourcePartID = ap.SourcePartID
group by SourcePartID
),
[Total # of Parts] =
(
select sum(Qty)
from bomBreakdown
where [Component Part ID] is not null
and SourcePartID = ap.SourcePartID
group by SourcePartID
)
-- , more ...
Those two subqueries look really similar. It's this sort of pattern:
select a = (
select x1 from y where z
),
b = (
select x2 from y where almost_z
)
What we'd really like to do is something like the following. If we could, then the query only needs to hit the y table once, instead of hitting it twice. But of course the syntax wouldn't be valid:
select a = t.x1,
b = t.x2
from (
select x1 where z,
x2 where almost_z
from y
) t
Aha, but perhaps we can be a bit clever. If we look back to your specific case, we might change it into something like this:
select oa1.[Part Count],
oa1.[Total # of Parts]
into bomSummary
from AllPartsList ap
left join visuser.EN_PART p on p.EN_Part_ID = ap.SourcePartID
outer apply (
select [Part Count] = count
(
distinct iif
(
Qty = 0, null, [Component Part #]
)
),
[Total # of Parts] = sum(qty)
from bomBreakdown
where [Component Part ID] is not null
and SourcePartID = ap.SourcePartID
group by SourcePartID
)
oa1
Here, the iif(Qty = 0, null, [Component Part #]) will make the column null if the quantity is zero. Count will ignore those nulls. And we get the distinct, just like before. So we have sneakily managed to get a where clause in here: "count the distinct component part # values where the quantity is not equal to zero". Now we can just sum the Qty column as well, and we're done refactoring this.
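To make the "count ignores nulls" trick concrete, here is a small sketch run through Python's sqlite3 module. The table bom and its simplified column names are hypothetical stand-ins for bomBreakdown, and CASE is used in place of IIF so the query runs on older SQLite builds; the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of bomBreakdown with simplified column names.
conn.execute(
    "CREATE TABLE bom (SourcePartID INTEGER, ComponentPart TEXT, Qty INTEGER)"
)
conn.executemany(
    "INSERT INTO bom VALUES (?, ?, ?)",
    [
        (1, "A", 2),
        (1, "A", 3),  # same part again: DISTINCT counts it once
        (1, "B", 0),  # zero quantity: becomes NULL, so COUNT skips it
        (1, "C", 1),
        (2, "A", 4),
    ],
)

# One pass over the table produces both aggregates for every assembly.
summary = conn.execute(
    """
    SELECT SourcePartID,
           COUNT(DISTINCT CASE WHEN Qty = 0 THEN NULL ELSE ComponentPart END) AS PartCount,
           SUM(Qty) AS TotalParts
    FROM bom
    GROUP BY SourcePartID
    ORDER BY SourcePartID
    """
).fetchall()

print(summary)
```

Assembly 1 ends up with two distinct parts (A and C) even though four rows contribute to its quantity total, which is exactly the filtered count the two original subqueries computed separately.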
The same kind of refactoring can be done in many places in this stored procedure. It would actually be a great learning exercise for refactoring SQL. I'm not going to do all of them, but just try to identify the patterns, and follow a factorisation process - the same kind you would do in algebra. Because, in many ways, this is algebra!
Please excuse any typos/syntax errors. I haven't been able to check this through an actual query window, my intent here is to demonstrate a few ideas, not to actually rewrite the original query.

SQL : How to select the most recent value by country

I have a table with 3 columns : day, country, value. There are many values by country with different dates. For example :
DAY COUNTRY VALUE
04-SEP-19 BELGIUM 2124
15-MAR-19 BELGIUM 2135
21-MAY-19 SPAIN 1825
18-JUL-19 SPAIN 1724
26-MAR-19 ITALY 4141
I want to select the most recent value by country. For example :
DAY COUNTRY VALUE
04-SEP-19 BELGIUM 2124
18-JUL-19 SPAIN 1724
26-MAR-19 ITALY 4141
What is the sql query I can use?
Thank you for your help.
You can use the row_number() window function (if your DBMS supports it).
SELECT x.day,
x.country,
x.value
FROM (SELECT t.day,
t.country,
t.value,
row_number() OVER (PARTITION BY t.country
ORDER BY t.day DESC) rn
FROM elbat t) x
WHERE x.rn = 1;
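The same query can be tried end to end with Python's sqlite3 module, as a sketch. It assumes a SQLite build with window-function support (3.25 or later) and stores the days as ISO-8601 text so that text ordering matches date ordering; the table name elbat follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elbat (day TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?)",
    [
        ("2019-09-04", "BELGIUM", 2124),
        ("2019-03-15", "BELGIUM", 2135),
        ("2019-05-21", "SPAIN", 1825),
        ("2019-07-18", "SPAIN", 1724),
        ("2019-03-26", "ITALY", 4141),
    ],
)

# Number the rows of each country from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT x.day, x.country, x.value
    FROM (SELECT t.day, t.country, t.value,
                 ROW_NUMBER() OVER (PARTITION BY t.country
                                    ORDER BY t.day DESC) AS rn
          FROM elbat t) x
    WHERE x.rn = 1
    ORDER BY x.country
    """
).fetchall()

for row in latest:
    print(row)
```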
Another way of doing this is using a window function (SQL Server, MySQL8 etc)
e.g.
ROW_NUMBER() OVER ( PARTITION BY COUNTRY ORDER BY CONVERT(DATE, [Day]) DESC )
Then just filter to where this function returns 1
full example:
WITH TestData
AS ( SELECT '04-SEP-19' AS [Day], 'BELGIUM' AS [COUNTRY], 2124 AS [VALUE]
UNION
SELECT '15-MAR-19' AS [Day], 'BELGIUM' AS [COUNTRY], 2135 AS [VALUE]
UNION
SELECT '21-MAY-19' AS [Day], 'SPAIN' AS [COUNTRY], 1825 AS [VALUE]
UNION
SELECT '18-JUL-19' AS [Day], 'SPAIN' AS [COUNTRY], 1724 AS [VALUE]
UNION
SELECT '26-MAR-19' AS [Day], 'ITALY' AS [COUNTRY], 4141 AS [VALUE] ),
TestDataRanked
AS ( SELECT *,
ROW_NUMBER() OVER ( PARTITION BY COUNTRY ORDER BY CONVERT(DATE, [Day]) DESC ) AS SelectionRank
FROM TestData )
SELECT [Day],
COUNTRY,
[VALUE]
FROM TestDataRanked
WHERE SelectionRank = 1;
As I understand the problem, you want the most recent value for each country, since a country can appear more than once in the table:
select distinct t1.DAY, t1.COUNTRY, t1.VALUE
FROM day_test t1
inner join day_test t2 on t1.day in
(select max(day) from day_test t3 where t1.country = t3.country )
and t1.country = t2.country
I tested it and it works.
Let's suppose that the type of the day column is date.
In the subquery, you can find the tuple of (country, max date). To add the value, you can either join as mentioned in the comments or use IN:
SELECT DISTINCT day, country, value
FROM yourTable
WHERE (country, day)
in (
SELECT country, MAX(day) as day
FROM yourTable
GROUP BY country
)
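You can use the following query to get the most recent day for each country.
Just replace TABLE_NAME with the name of your table.
SELECT
COUNTRY,
MAX(DAY) AS "MostRecent"
FROM TABLE_NAME
GROUP BY COUNTRY;
Selecting VALUE here as well is not valid in most databases, because it is neither aggregated nor part of the GROUP BY. To return it, join this result back to the table on COUNTRY and DAY.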
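A grouped MAX on its own only identifies the latest day per country; picking up the matching value needs a join back to the table. A minimal sketch of that step, run through Python's sqlite3 module, follows. The table name readings and the ISO-8601 text dates are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; days stored as ISO-8601 text so MAX() picks the latest.
conn.execute("CREATE TABLE readings (day TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2019-09-04", "BELGIUM", 2124),
        ("2019-03-15", "BELGIUM", 2135),
        ("2019-05-21", "SPAIN", 1825),
        ("2019-07-18", "SPAIN", 1724),
        ("2019-03-26", "ITALY", 4141),
    ],
)

# The derived table m holds one (country, latest day) pair per country;
# joining on both columns recovers the full row, including value.
latest = conn.execute(
    """
    SELECT r.day, r.country, r.value
    FROM readings r
    JOIN (SELECT country, MAX(day) AS day
          FROM readings
          GROUP BY country) m
      ON m.country = r.country AND m.day = r.day
    ORDER BY r.country
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows for a country share the same latest day, this form returns both of them, which is one way it differs from a row_number() filter.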

Multiple Sub-Queries In A SQL Query

I am creating a sample query that'll convert rows to column something as follows:
Person_Id Total Earned Leave Earned Leave Enjoyed Remaining Earned Leave Total Casual Leave Casual Leave Enjoyed Remaining Casual Leave
1001 20 10 10 20 4 16
So above is the output I get and used multiple sub-queries using the following query:
SELECT DISTINCT m.Person_Id, (SELECT k.Leave_Allocation FROM LeaveDetails k WHERE k.Leave_Name = 'Earn Leave'
AND k.Person_Id = 1001 AND k.[Year] = '2017') AS 'Total Earned Leave',
(SELECT o.Leave_Enjoy FROM LeaveDetails o WHERE o.Leave_Name = 'Earn Leave'
AND o.Person_Id = 1001 AND o.[Year] = '2017') AS 'Earned Leave Enjoyed',
(SELECT p.Leave_Remain FROM LeaveDetails p WHERE p.Leave_Name = 'Earn Leave'
AND p.Person_Id = 1001 AND p.[Year] = '2017') AS 'Remaining Earned Leave',
(SELECT k.Leave_Allocation FROM LeaveDetails k WHERE k.Leave_Name = 'Casual Leave'
AND k.Person_Id = 1001 AND k.[Year] = '2017') AS 'Total Casual Leave',
(SELECT o.Leave_Enjoy FROM LeaveDetails o WHERE o.Leave_Name = 'Casual Leave'
AND o.Person_Id = 1001 AND o.[Year] = '2017') AS 'Casual Leave Enjoyed',
(SELECT p.Leave_Remain FROM LeaveDetails p WHERE p.Leave_Name = 'Casual Leave'
AND p.Person_Id = 1001 AND p.[Year] = '2017') AS 'Remaining Casual Leave'
FROM LeaveDetails m WHERE m.Person_Id = 1001 AND m.[Year] = '2017'
I am not sure whether I will have performance issues here, as there will be lots of data, and I was wondering whether this would be better than PIVOT or creating a table at run time. I just want to make sure this is a good choice for what I am trying to accomplish. Feel free to share your ideas, as well as samples using SQL Server, MySQL or Oracle that perform better. Thanks.
Sample Table and Data:
CREATE TABLE [dbo].[LeaveDetails](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Person_Id] [nvarchar](20) NULL,
[Leave_Name] [nvarchar](40) NULL,
[Leave_Allocation] [float] NULL,
[Leave_Enjoy] [float] NULL,
[Leave_Remain] [float] NULL,
[Details] [nvarchar](100) NULL,
[Year] [nvarchar](10) NULL,
[Status] [bit] NULL
)
INSERT [dbo].[LeaveDetails] ([Id], [Person_Id], [Leave_Name], [Leave_Allocation], [Leave_Enjoy], [Leave_Remain], [Details], [Year], [Status]) VALUES (1, N'1001', N'Earn Leave', 20, 10, 10, NULL, N'2017', 1)
INSERT [dbo].[LeaveDetails] ([Id], [Person_Id], [Leave_Name], [Leave_Allocation], [Leave_Enjoy], [Leave_Remain], [Details], [Year], [Status]) VALUES (2, N'1001', N'Casual Leave', 20, 4, 16, NULL, N'2017', 1)
Use conditional aggregation:
SELECT m.Person_Id,
MAX(CASE WHEN m.Leave_Name = 'Earn Leave' THEN m.Leave_Allocation END) as [Total Earned Leave],
MAX(CASE WHEN m.Leave_Name = 'Earn Leave' THEN m.Leave_Enjoy END) as [Earned Leave Enjoyed],
MAX(CASE WHEN m.Leave_Name = 'Earn Leave' THEN m.Leave_Remain END) as [Remaining Earned Leave],
MAX(CASE WHEN m.Leave_Name = 'Casual Leave' THEN m.Leave_Allocation END) as [Total Casual Leave],
MAX(CASE WHEN m.Leave_Name = 'Casual Leave' THEN m.Leave_Enjoy END) as [Casual Leave Enjoyed],
MAX(CASE WHEN m.Leave_Name = 'Casual Leave' THEN m.Leave_Remain END) as [Remaining Casual Leave]
FROM LeaveDetails m
WHERE m.Person_Id = 1001 AND m.[Year] = '2017'
GROUP BY m.Person_ID;
Note: I do not advocate having special characters (such as spaces) in column aliases. If you do, use the proper escape character (square brackets). Only use single quotes for string and date constants.
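Here is a small runnable sketch of conditional aggregation against the sample rows from the question, using Python's sqlite3 module. The table is trimmed to the columns the query needs, and the output columns are left unnamed and read by position; both are simplifications for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down LeaveDetails: only the columns used by the pivot.
conn.execute(
    "CREATE TABLE LeaveDetails (Person_Id TEXT, Leave_Name TEXT,"
    " Leave_Allocation REAL, Leave_Enjoy REAL, Leave_Remain REAL, Year TEXT)"
)
conn.executemany(
    "INSERT INTO LeaveDetails VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1001", "Earn Leave", 20, 10, 10, "2017"),
        ("1001", "Casual Leave", 20, 4, 16, "2017"),
    ],
)

# Each CASE yields NULL for rows of the other leave type, and MAX skips NULLs,
# so every output column picks its value from the one matching row.
row = conn.execute(
    """
    SELECT Person_Id,
           MAX(CASE WHEN Leave_Name = 'Earn Leave'   THEN Leave_Allocation END),
           MAX(CASE WHEN Leave_Name = 'Earn Leave'   THEN Leave_Enjoy END),
           MAX(CASE WHEN Leave_Name = 'Earn Leave'   THEN Leave_Remain END),
           MAX(CASE WHEN Leave_Name = 'Casual Leave' THEN Leave_Allocation END),
           MAX(CASE WHEN Leave_Name = 'Casual Leave' THEN Leave_Enjoy END),
           MAX(CASE WHEN Leave_Name = 'Casual Leave' THEN Leave_Remain END)
    FROM LeaveDetails
    WHERE Person_Id = '1001' AND Year = '2017'
    GROUP BY Person_Id
    """
).fetchone()

print(row)
```

The table is read once for all six values, whereas the subquery version in the question reads it once per output column.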
PIVOT would work, but it looks like this is simply a single row that you want pivoted to a columnar output and the column names are known explicitly. If that's the case, you could just UNION the single column results together:
SELECT 'Person_ID' as col_name, Person_Id as col_value FROM LeaveDetails WHERE Person_Id = 1001 AND [Year] = '2017'
UNION
SELECT 'Leave_Enjoy' as col_name, Leave_Enjoy as col_value FROM LeaveDetails WHERE Person_Id = 1001 AND [Year] = '2017'
UNION
...
It's a lot simpler to write, cleaner to read, and should run a little faster - there is still one table scan for each column. Is the table indexed on Person_ID and Year?
If speed is an issue you could create a temp table of the one row:
SELECT * into #ld_temp FROM LeaveDetails WHERE Person_Id = 1001 AND [Year] = '2017'
then select from the temp table in the SELECT/UNION code:
SELECT 'Person_ID' as col_name, Person_Id as col_value FROM #ld_temp
UNION
SELECT 'Leave_Enjoy' as col_name, Leave_Enjoy as col_value FROM #ld_temp
UNION
...
Now you're down to just a single scan of the big table.
I hope this helps.

How to find the difference between dates within the same column using SQL?

I am trying to solve the following challenge:
1) If a patient visits the ER within 48 hours, I want to flag that as 1.
2) If the same patient visits the ER again after 48 hours, I want to flag that as 2.
3) Each subsequent visit must be flagged as 3, 4, 5 etcetera after the first 48 hours.
Here is what my table looks like:
PATIENT_ID ADMIT_DATE LOCATION
---------- ---------- --------
33 1/10/2014 ER
33 1/11/2014 ER
33 1/15/2014 ER
33 1/17/2014 ER
45 2/20/2014 OBS
45 2/21/2014 OBS
45 2/25/2014 OBS
45 2/30/2014 OBS
45 2/32/2014 OBS
And here is what the desired result should look like:
PATIENT_ID ADMIT_DATE LOCATION FLAG
---------- ---------- -------- ----
33 1/10/2014 ER 1
33 1/15/2014 ER 2
33 1/17/2014 ER 3
45 2/20/2014 OBS 1
45 2/25/2014 OBS 2
45 2/30/2014 OBS 3
45 2/32/2014 OBS 4
I have started something like this but could not complete it:
SELECT PATIENT_ID, ADMIT_DATE, LOCATION,
CASE WHEN MIN(ADMIT_DATE)-MAX(ADMIT_DATE)<48 THEN 1 ELSE 0 AS FLAG
FROM MYTABLE
GROUP BY PATIENT_ID, ADMIT_DATE, LOCATION
Can someone please help?
You can achieve this easily using the LAG, DATEDIFF and ROW_NUMBER functions. The LAG function gets the previous ADMIT_DATE value. Then you can calculate the difference in hours using the DATEDIFF function. Finally, using ROW_NUMBER you can simply rank your results.
This is a full working example:
SET NOCOUNT ON
GO
DECLARE @DataSource TABLE
(
[ATIENT_ID] TINYINT
,[ADMIT_DATE] DATE
,[LOCATION] VARCHAR(3)
)
INSERT INTO @DataSource ([ATIENT_ID], [ADMIT_DATE], [LOCATION])
VALUES (33, '1-10-2014', 'ER')
,(33, '1-11-2014', 'ER')
,(33, '1-15-2014', 'ER')
,(33, '1-17-2014', 'ER')
,(45, '2-15-2014', 'OBS')
,(45, '2-16-2014', 'OBS')
,(45, '2-20-2014', 'OBS')
,(45, '2-25-2014', 'OBS')
,(45, '2-27-2014', 'OBS')
;WITH DataSource ([ATIENT_ID], [ADMIT_DATE], [LOCATION], [DIFF_IN_HOURS]) AS
(
SELECT [ATIENT_ID]
,[ADMIT_DATE]
,[LOCATION]
,DATEDIFF(
HOUR
,LAG([ADMIT_DATE], 1, NULL) OVER (PARTITION BY [ATIENT_ID], [LOCATION] ORDER BY [ADMIT_DATE] ASC)
,[ADMIT_DATE]
)
FROM @DataSource
)
SELECT [ATIENT_ID]
,[ADMIT_DATE]
,[LOCATION]
,ROW_NUMBER() OVER (PARTITION BY [ATIENT_ID], [LOCATION] ORDER BY [ADMIT_DATE] ASC)
FROM DataSource
WHERE [DIFF_IN_HOURS] >= 48
OR [DIFF_IN_HOURS] IS NULL -- these are first records
SET NOCOUNT OFF
GO
Note, I have fixed your sample data as it was wrong.
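For readers without SQL Server at hand, the same LAG-then-filter-then-number approach can be sketched with Python's sqlite3 module. This assumes SQLite 3.25 or later for window functions; the table name visits is made up, and julianday() differences stand in for DATEDIFF because SQLite has no DATEDIFF function. The rows are the corrected sample data used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, admit_date TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (33, "2014-01-10", "ER"),
        (33, "2014-01-11", "ER"),
        (33, "2014-01-15", "ER"),
        (33, "2014-01-17", "ER"),
        (45, "2014-02-15", "OBS"),
        (45, "2014-02-16", "OBS"),
        (45, "2014-02-20", "OBS"),
        (45, "2014-02-25", "OBS"),
        (45, "2014-02-27", "OBS"),
    ],
)

# Step 1 (gaps): hours since the previous visit of the same patient/location.
# Step 2: drop visits that follow the previous one by less than 48 hours,
#         then number the surviving visits per patient/location.
flagged = conn.execute(
    """
    WITH gaps AS (
        SELECT patient_id, admit_date, location,
               (julianday(admit_date)
                - julianday(LAG(admit_date) OVER (PARTITION BY patient_id, location
                                                  ORDER BY admit_date))) * 24 AS diff_hours
        FROM visits
    )
    SELECT patient_id, admit_date, location,
           ROW_NUMBER() OVER (PARTITION BY patient_id, location
                              ORDER BY admit_date) AS flag
    FROM gaps
    WHERE diff_hours >= 48 OR diff_hours IS NULL  -- NULL marks a first visit
    ORDER BY patient_id, admit_date
    """
).fetchall()

for row in flagged:
    print(row)
```

Note that the gap is measured against the immediately preceding visit, not against the last visit that was kept, which matches the query above.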
This is an alternative solution without the LAG function:
;WITH TempDataSource ([ATIENT_ID], [ADMIT_DATE], [LOCATION], [Rank]) AS
(
SELECT [ATIENT_ID]
,[ADMIT_DATE]
,[LOCATION]
,ROW_NUMBER() OVER (PARTITION BY [ATIENT_ID], [LOCATION] ORDER BY [ADMIT_DATE] ASC)
FROM @DataSource
),
DataSource ([ATIENT_ID], [ADMIT_DATE], [LOCATION], [DIFF_IN_HOURS]) AS
(
SELECT DS1.[ATIENT_ID]
,DS1.[ADMIT_DATE]
,DS1.[LOCATION]
,DATEDIFF(HOUR, DS2.[ADMIT_DATE], DS1.[ADMIT_DATE])
FROM TempDataSource DS1
LEFT JOIN TempDataSource DS2
ON DS1.[Rank] - 1 = DS2.[Rank]
AND DS1.[ATIENT_ID] = DS2.[ATIENT_ID]
AND DS1.[LOCATION] = DS2.[LOCATION]
)
SELECT [ATIENT_ID]
,[ADMIT_DATE]
,[LOCATION]
,ROW_NUMBER() OVER (PARTITION BY [ATIENT_ID], [LOCATION] ORDER BY [ADMIT_DATE] ASC)
FROM DataSource
WHERE [DIFF_IN_HOURS] >= 48
OR [DIFF_IN_HOURS] IS NULL -- these are first records
SELECT Patient_id,Admit_date, Location,
CASE WHEN DATEDIFF (HH , min(admit_date) , max(admit_date)) < 48 THEN count(flag)+1 ELSE 0 End As Flag
FROM tbl_Patient
GROUP BY PATIENT_ID, ADMIT_DATE, LOCATION
You can use DATEDIFF(), which is available in SQL Server, like this:
SELECT DATEDIFF(hour,startDate,endDate) AS 'Duration'
You can visit http://msdn.microsoft.com/en-IN/library/ms189794.aspx

write query without using cursors

I wrote this query to display single client's account transactions:
select *
from (
select top 1
[id]
,[client_id]
,[transactionDate]
,N'revolving balance' [details]
,NULL [amount]
,NULL [debit]
,NULL [credit]
,[balance]
FROM [dbo].[bsitems]
where [client_id]=@client_id and
[transactionDate] < @transactionDateFrom
order by id desc) t1
union
SELECT [id]
,[client_id]
,[transactionDate]
,[details]
,[amount]
,[debit]
,[credit]
,[balance]
FROM [dbo].[bsitems]
where [client_id]=@client_id and
[transactionDate] between @transactionDateFrom and @transactionDateTo
How can I display the transactions for all clients that exist in the "client" table? Assume the client table structure is (id, name).
For the second part (which is really what the transactions are), just use join:
SELECT b.id, b.client_id, b.transactionDate, b.details,
b.amount, b.debit, b.credit, b.balance
FROM [dbo].[bsitems] b join
clients c
on b.client_id = c.client_id
WHERE transactionDate between @transactionDateFrom and @transactionDateTo;
This assumes that the client_id is stored in the clients table.
The first part of the query is returning the most recent id before the from date. It returns no rows if there are no previous transactions. You can approach this in a similar way.
select id, client_id, transactiondate, details, amount, debit, credit, balance
from (SELECT b.id, b.client_id, b.transactionDate,
N'revolving balance' as details,
NULL as amount, NULL as debit, NULL as credit,
b.balance,
row_number() over (partition by b.client_id order by b.transactionDate desc) as seqnum
FROM [dbo].[bsitems] b join
clients c
on b.client_id = c.client_id
WHERE b.transactionDate < @transactionDateFrom
) t
where seqnum = 1;
Instead of top 1 this is using row_number() to assign a sequential value to the transactions before the cutoff date. Then the most recent of these is chosen.
The final result is just the union all of these two queries:
select id, client_id, transactiondate, details, amount, debit, credit, balance
from (SELECT b.id, b.client_id, b.transactionDate,
N'revolving balance' as details,
NULL as amount, NULL as debit, NULL as credit,
b.balance,
row_number() over (partition by b.client_id order by b.transactionDate desc) as seqnum
FROM [dbo].[bsitems] b join
clients c
on b.client_id = c.client_id
WHERE b.transactionDate < @transactionDateFrom
) t
where seqnum = 1
union all
SELECT b.id, b.client_id, b.transactionDate, b.details,
b.amount, b.debit, b.credit, b.balance
FROM [dbo].[bsitems] b join
clients c
on b.client_id = c.client_id
WHERE b.transactionDate between @transactionDateFrom and @transactionDateTo;
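The combined shape (one carried-forward balance row per client, followed by that client's transactions in the date range) can be tried with Python's sqlite3 module as a sketch. It assumes SQLite 3.25 or later, joins on client.id as described in the question rather than clients.client_id, adds id as a tie-breaker in the ordering, and uses made-up rows and a reduced column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up data; bsitems is reduced to the columns needed for the illustration.
conn.executescript(
    """
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bsitems (id INTEGER PRIMARY KEY, client_id INTEGER,
                          transactionDate TEXT, details TEXT, balance INTEGER);
    INSERT INTO client VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO bsitems VALUES
        (1, 1, '2020-01-05', 'deposit', 100),
        (2, 1, '2020-01-20', 'withdrawal', 70),
        (3, 2, '2020-01-25', 'deposit', 50),
        (4, 1, '2020-02-03', 'deposit', 90),
        (5, 2, '2020-02-10', 'withdrawal', 20);
    """
)

params = {"date_from": "2020-02-01", "date_to": "2020-02-28"}

# First branch: the latest row before the range, per client (seqnum = 1).
# Second branch: every transaction inside the range.
statement = conn.execute(
    """
    SELECT id, client_id, transactionDate, details, balance
    FROM (SELECT b.id, b.client_id, b.transactionDate,
                 'revolving balance' AS details, b.balance,
                 ROW_NUMBER() OVER (PARTITION BY b.client_id
                                    ORDER BY b.transactionDate DESC, b.id DESC) AS seqnum
          FROM bsitems b
          JOIN client c ON c.id = b.client_id
          WHERE b.transactionDate < :date_from) t
    WHERE seqnum = 1
    UNION ALL
    SELECT b.id, b.client_id, b.transactionDate, b.details, b.balance
    FROM bsitems b
    JOIN client c ON c.id = b.client_id
    WHERE b.transactionDate BETWEEN :date_from AND :date_to
    ORDER BY client_id, transactionDate
    """,
    params,
).fetchall()

for row in statement:
    print(row)
```

Because the row numbering is partitioned by client, every client gets its own opening row, which a single TOP 1 over the whole table cannot provide.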
SELECT *
FROM (
SELECT TOP 1
[id]
,[client_id]
,[transactionDate]
,N'revolving balance' [details]
,NULL [amount]
,NULL [debit]
,NULL [credit]
,[balance]
FROM [dbo].[bsitems]
INNER JOIN [dbo].[client] ON [dbo].[bsitems].[client_id] = [dbo].[client].[id]
AND [transactionDate] < @transactionDateFrom
ORDER BY id DESC
) t1
UNION
SELECT
[id]
,[client_id]
,[transactionDate]
,[details]
,[amount]
,[debit]
,[credit]
,[balance]
FROM [dbo].[bsitems]
INNER JOIN [dbo].[client] ON [dbo].[bsitems].[client_id] = [dbo].[client].[id]
AND [transactionDate] between @transactionDateFrom and @transactionDateTo