Best way to calculate Max/Min of N columns in SQL Server - sql

Ok, firstly I've seen this thread. But none of the solutions are very satisfactory. The nominated answer looks like NULLs would break it, and the highest-rated answer looks nasty to maintain.
So I was wondering about something like the following :
CREATE FUNCTION GetMaxDates
(
@dte1 datetime,
@dte2 datetime,
@dte3 datetime,
@dte4 datetime,
@dte5 datetime
)
RETURNS datetime
AS
BEGIN
RETURN (SELECT Max(TheDate)
FROM
(
SELECT @dte1 AS TheDate
UNION ALL
SELECT @dte2 AS TheDate
UNION ALL
SELECT @dte3 AS TheDate
UNION ALL
SELECT @dte4 AS TheDate
UNION ALL
SELECT @dte5 AS TheDate) AS Dates
)
END
GO
Main problems I see are that if there are only 3 fields to compare, you'd still have to specify NULL for the other 2, and if you wanted to extend it to six comparisons it would break existing use. If it was a parameterized stored procedure you could specify a default for each parameter, and adding new parameters wouldn't break existing references. The same method could also obviously be extended to other datatypes or stuff like Min or Avg. Is there some major drawback to this that I'm not spotting? Note that this function works whether some, all or none of the values passed to it are nulls or duplicates.

You can solve the NULL issue with the ISNULL function:
SELECT ISNULL(@dte1,0) AS TheDate
UNION ALL
SELECT ISNULL(@dte2,0) AS TheDate
UNION ALL
SELECT ISNULL(@dte3,0) AS TheDate
UNION ALL
SELECT ISNULL(@dte4,0) AS TheDate
UNION ALL
SELECT ISNULL(@dte5,0) AS TheDate) AS Dates
But this only works for MAX, because the substituted value 0 (1900-01-01) would win any MIN comparison.
Here is another suggestion: http://www.sommarskog.se/arrays-in-sql-2005.html
It suggests passing the values as a comma-delimited string.
The function then accepts as many values as you wish and looks like this:
CREATE FUNCTION GetMaxDate
(
@p_dates VARCHAR(MAX)
)
RETURNS DATETIME
AS
BEGIN
DECLARE @pos INT, @nextpos INT, @date_tmp DATETIME, @max_date DATETIME, @valuelen INT
SELECT @pos = 0, @nextpos = 1
SELECT @max_date = CONVERT(DATETIME,0)
WHILE @nextpos > 0
BEGIN
SELECT @nextpos = charindex(',', @p_dates, @pos + 1)
SELECT @valuelen = CASE WHEN @nextpos > 0
THEN @nextpos
ELSE len(@p_dates) + 1
END - @pos - 1
SELECT @date_tmp = CONVERT(DATETIME, substring(@p_dates, @pos + 1, @valuelen))
IF @date_tmp > @max_date
SET @max_date = @date_tmp
SELECT @pos = @nextpos
END
RETURN @max_date
END
And calling:
DECLARE @dt1 DATETIME
DECLARE @dt2 DATETIME
DECLARE @dt3 DATETIME
DECLARE @dt_string VARCHAR(MAX)
SET @dt1 = DATEADD(HOUR,3,GETDATE())
SET @dt2 = DATEADD(HOUR,-3,GETDATE())
SET @dt3 = DATEADD(HOUR,5,GETDATE())
SET @dt_string = CONVERT(VARCHAR(50),@dt1,21)+','+CONVERT(VARCHAR(50),@dt2,21)+','+CONVERT(VARCHAR(50),@dt3,21)
SELECT dbo.GetMaxDate(@dt_string)

Why not just:
SELECT Max(TheDate)
FROM
(
SELECT @dte1 AS TheDate WHERE @dte1 IS NOT NULL
UNION ALL
SELECT @dte2 AS TheDate WHERE @dte2 IS NOT NULL
UNION ALL
SELECT @dte3 AS TheDate WHERE @dte3 IS NOT NULL
UNION ALL
SELECT @dte4 AS TheDate WHERE @dte4 IS NOT NULL
UNION ALL
SELECT @dte5 AS TheDate WHERE @dte5 IS NOT NULL) AS Dates
That should take care of the NULL problem without introducing any new values.

I would pass the dates in as XML (you could also pass varchar etc. and convert to the xml datatype):
DECLARE @output datetime
DECLARE @test xml
-- Sample values only; each VALUE element holds one ISO 8601 date
SET @test = '<VALUES><VALUE>2012-01-01T00:00:00</VALUE><VALUE>2012-06-15T00:00:00</VALUE></VALUES>'
-- The xml nodes() method is used here instead of OPENXML,
-- so no document handle has to be prepared or removed
SELECT @output = MAX(t.v.value('.', 'datetime'))
FROM @test.nodes('/VALUES/VALUE') AS t(v)
-- Inside the function body:
RETURN @output
That would handle any number of values, and I wouldn't bother putting NULLs in the XML.
I'd use a separate parameter to specify the datatype rather than customize the XML and supporting code every time, but you might need dynamic SQL for that to work.

A better option is to restructure the data to support column based min/max/avg as this is what SQL is best at.
In SQL Server 2005 you can use the UNPIVOT operator to perform the transformation.
Not always appropriate for every problem, but can make things easier if you can use it.
See:
http://msdn.microsoft.com/en-us/library/ms177410.aspx
http://blogs.msdn.com/craigfr/archive/2007/07/17/the-unpivot-operator.aspx
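As a rough sketch of the UNPIVOT approach: the table variable @Orders, its column names, and the sample rows below are all made up for illustration, and the query assumes SQL Server 2005 or later.

```sql
-- Hypothetical data: three date columns per row
DECLARE @Orders TABLE (OrderID int, Date1 datetime, Date2 datetime, Date3 datetime);
INSERT INTO @Orders VALUES (1, '20100101', '20100301', NULL);
INSERT INTO @Orders VALUES (2, NULL, '20090515', '20090101');

-- UNPIVOT turns the three columns into rows (one per non-NULL value),
-- after which the ordinary aggregates work per OrderID
SELECT u.OrderID,
       MAX(u.TheDate) AS MaxDate,
       MIN(u.TheDate) AS MinDate
FROM (SELECT OrderID, Date1, Date2, Date3 FROM @Orders) AS o
UNPIVOT (TheDate FOR DateColumn IN (Date1, Date2, Date3)) AS u
GROUP BY u.OrderID;
```

UNPIVOT drops NULL values, so they cannot affect the result, but a row whose date columns are all NULL disappears from the output entirely. The unpivoted columns must also share the same data type.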

If you have to do it over one row only, it doesn't matter how you will do it (everything would be fast enough).
For selecting Min/Max/Avg value of several columns PER ROW, solution with UNPIVOT should be much faster than UDF
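Another inline, per-row form that avoids a UDF is CROSS APPLY over a VALUES list. This is only a sketch: it assumes SQL Server 2008 or later (for the table value constructor), and @Orders with its columns is a hypothetical example table.

```sql
-- Hypothetical data, including a row where every date is NULL
DECLARE @Orders TABLE (OrderID int, Date1 datetime, Date2 datetime, Date3 datetime);
INSERT INTO @Orders VALUES (1, '20100101', '20100301', NULL),
                           (2, NULL, NULL, NULL);

-- For each row, build a small three-row set from its columns and aggregate it
SELECT o.OrderID, m.MaxDate, m.MinDate
FROM @Orders AS o
CROSS APPLY (
    SELECT MAX(v.TheDate) AS MaxDate,
           MIN(v.TheDate) AS MinDate
    FROM (VALUES (o.Date1), (o.Date2), (o.Date3)) AS v(TheDate)
) AS m;
```

Because the aggregates ignore NULLs, a row with all-NULL columns is kept and simply returns NULL for both values. Adding a column to the comparison means adding one entry to the VALUES list.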

Another possibility is to create a custom table type, like this:
CREATE TYPE [Maps].[TblListInt] AS TABLE( [ID] [INT] NOT NULL )
then,
CREATE FUNCTION dbo.GetMax(@ids Maps.TblListInt READONLY) RETURNS INT
BEGIN
RETURN (select max(id) from @ids)
END
Of course, you can swap "int" with your required type.
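Calling such a function might look like the sketch below. It assumes SQL Server 2008 or later (table-valued parameters), and that the Maps schema, the table type, and dbo.GetMax from this answer already exist; the sample values are arbitrary.

```sql
-- Declare a variable of the table type and fill it with any number of values
DECLARE @ids Maps.TblListInt;
INSERT INTO @ids (ID) VALUES (3), (7), (5);

-- Pass the whole set to the function as a read-only parameter
SELECT dbo.GetMax(@ids) AS MaxID;
```

The number of values is no longer fixed by the function signature, which addresses the concern about breaking existing callers when a sixth value is needed.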

Related

Assigning variables to use in query

I am moving from Oracle to SQL Server and I am noticing differences regarding assigning variables in a query. I wonder if someone could write me a simple example of how I can do this in SSMS please?
In the example below I am looking to assign the variable @date1 at the beginning of the select statement so that I can simply change the date at the top instead of changing it in each of the several places where @date1 is used in the query.
SELECT *
FROM table
where date = @date1
Thanks
Based on your example the syntax would be as follows:
DECLARE @date1 DATETIME
SET @date1 = '2017-01-01 00:00:00.000'
Then reference @date1 in your query as you have above.
More broadly, the syntax is:
DECLARE @<name of variable> <type>
SET @<name of variable> = <value>
-- Simple declares
DECLARE @Variable1 VARCHAR(100)
DECLARE @Variable2 DATE
DECLARE @VariableTable TABLE (
numberColumnName INT,
textColumnName VARCHAR(MAX))
-- Chained declares
DECLARE
@Variable3 VARCHAR(100),
@Variable4 INT
-- Declare with initialization
DECLARE @Variable5 INT = 150
DECLARE @Variable6 DATE = '2018-05-05' -- Implicit conversion (varchar to date)
DECLARE @Variable7 FLOAT = 1945.15 * 1648.12 / @Variable5 -- Expressions can be used
DECLARE @Variable8 INT = (SELECT COUNT(1) FROM sys.objects)
-- Chained declares with initialization
DECLARE
@Variable9 VARCHAR(100) = 'Afla',
@Variable10 INT = 9164 * @Variable5
-- Change variable values (without declaring)
SET @Variable1 = 'Some value'
SET @Variable2 = CONVERT(DATE, GETDATE())
For your example:
DECLARE @DateFilter DATE = '2018-05-16' -- Use ISO standard date format (yyyy-MM-dd) when you hard-code them as literals
SELECT
*
FROM
YourTable AS T
WHERE
T.DateToFilter >= @DateFilter
DECLARE @date1 DATE = '2018-04-11'
This code may be fine, but be aware of date formats (see "date (Transact-SQL)" in the documentation)
and of whether you need Date, Datetime, or Datetime2.

Split String Value in SQL for Flexible Filtering

I have a function where in user can assign multiple categories (food, non food etc) to a certain Tenant. See sample Data Table
Table: tblSales
date tenant sales category
1/1/2015 tenant1 1000 Food,Non-Food,Kiosk
1/1/2015 tenant2 2000 Food
1/1/2015 tenant3 1000 Non-Food,Kiosk
The system should be able to load record when the user selected any of the categories listed in Category Column.
For example, User selected categories: Non-Food,Kiosk. Expected result should be:
date tenant sales category
1/1/2015 tenant1 1000 Food,Non-Food,Kiosk
1/1/2015 tenant3 1000 Non-Food,Kiosk
Since, Non-Food and Kiosk is seen in Tenants 1 and 3.
So, what I think, the process should be a string manipulation first on the value of Category column, splitting each word delimited by comma. I have code which does not work correctly
@Category nvarchar(500) = 'Non-Food,Kiosk' --User selected
SELECT date,tenant,sales,category
FROM tblSales
WHERE (category in (SELECT val FROM dbo.split (@Category, @delimeter)))
That does not seem to work because the one it is splitting is the User Selected Categories and not the value of the data itself. I tried this
@Category nvarchar(500) = 'Non-Food,Kiosk' --User selected
SELECT date,tenant,sales,category
FROM tblSales
WHERE ((SELECT val FROM dbo.split (category, @delimeter)) in (SELECT val FROM dbo.split (@Category, @delimeter)))
But it resulted to this error
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In addition to Tim's answer (he is absolutely right about CSV fields in databases!) please note that SQL Server 2016 introduced STRING_SPLIT function. For a single category it's as simple as:
SELECT
date
,tenant
,sales
,category
FROM tblSales
WHERE @Category IN (SELECT value FROM STRING_SPLIT(category, ','))
For a comma delimited list of categories you have to use it twice together with EXISTS:
WHERE EXISTS
(
SELECT *
FROM STRING_SPLIT(category, ',')
WHERE value IN (SELECT value FROM STRING_SPLIT(@Category, ','))
)
If you're using an older SQL Server version you can write your own STRING_SPLIT function; take a look at T-SQL split string. You can use that function with the same syntax as above (please note I wrote the code here and it's untested, so it may need some fixes).
Note about performance: from QP you can check how sub-queries will be executed, from a naive point of view I'd say CTE, temp-tables and sub-queries have roughly same performance (in this simple case) but if this code is performance critical you'd better perform some benchmark (with real data and a real-world access scenario).
In general, it is bad practice to store CSV data in a database column because, as you are currently seeing, it makes many of a database's advantages unusable.
However, I think you might be able to get away with just using LIKE. Assuming the user selected the categories Non-Food and Kiosk, you could try the following query:
SELECT date,
tenant,
sales,
category
FROM tblSales
WHERE category LIKE '%Non-Food%' OR
category LIKE '%Kiosk%'
Note that a pattern such as '%Food%' would also match Non-Food, so this is only safe when no category name is contained in another.
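For comparison, the normalized layout recommended over a CSV column could look roughly like the sketch below. The table tblTenantCategory and the table variable @Selected are hypothetical names introduced here; only tblSales and its date, tenant, and sales columns come from the question.

```sql
-- Hypothetical table: one row per tenant/category pair instead of a CSV column
CREATE TABLE tblTenantCategory (
    tenant   varchar(50) NOT NULL,
    category varchar(50) NOT NULL,
    PRIMARY KEY (tenant, category)
);

-- The user's selection, held as rows rather than as a delimited string
DECLARE @Selected TABLE (category varchar(50) PRIMARY KEY);
INSERT INTO @Selected VALUES ('Non-Food');
INSERT INTO @Selected VALUES ('Kiosk');

-- Return sales rows for tenants that have at least one selected category
SELECT s.[date], s.tenant, s.sales
FROM tblSales AS s
WHERE EXISTS (
    SELECT 1
    FROM tblTenantCategory AS tc
    JOIN @Selected AS sel ON sel.category = tc.category
    WHERE tc.tenant = s.tenant
);
```

With this shape the filter is an exact, indexable comparison, so no string splitting or pattern matching is needed at query time.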
Try the code below.
Create a function to split delimited strings:
CREATE FUNCTION SplitWords
(
@Input NVARCHAR(MAX),
@Character CHAR(1)
)
RETURNS @Output TABLE (
Item NVARCHAR(1000)
)
AS
BEGIN
DECLARE @StartIndex INT, @EndIndex INT
SET @StartIndex = 1
-- Make sure the input ends with the delimiter so the last item is picked up
IF RIGHT(@Input, 1) <> @Character
BEGIN
SET @Input = @Input + @Character
END
WHILE CHARINDEX(@Character, @Input) > 0
BEGIN
SET @EndIndex = CHARINDEX(@Character, @Input)
INSERT INTO @Output(Item)
SELECT SUBSTRING(@Input, @StartIndex, @EndIndex - 1)
SET @Input = SUBSTRING(@Input, @EndIndex + 1, LEN(@Input))
END
RETURN
END
GO
Create an input table inside your procedure/script and keep the split data in it. Here your input is @Category:
DECLARE @input TABLE (item VARCHAR(50))
INSERT INTO @input
SELECT Item
FROM [dbo].SplitWords (@Category, ',')
Make a join using the LIKE operator with your actual table:
SELECT DISTINCT s.date,
s.tenant,
s.sales,
s.category
FROM tblSales s
JOIN @input a
ON s.category LIKE '%'+a.item+'%'
You can try the following SELECT statement, which uses my user-defined SQL function for the string-splitting task:
declare @Category nvarchar(500) = 'Non-Food,Kiosk'
declare @cnt int = (select COUNT(*) from dbo.SPLIT(@Category,','))
;with cte as (
select
t.*, COUNT(*) over (partition by tenant) cnt
from dbo.SPLIT(@Category,',') u
inner join (
select
tblSales.*, c.val
from tblSales
cross apply dbo.SPLIT(tblSales.category,',') c
) t on u.val = t.val
)
select distinct tenant from cte where cnt = @cnt

How to extract date fields from string/text field in sql server 2005

There is a text field in a table called description. I would like to extract two date fields from this string when the '~' character occurs in it, using a SQL Server 2005 stored procedure. Please help me out with this.
Example: string: '长期租金;10/1/2012 ~ 10/31/2012'. At occurrence of ~ operator I would like to have from-date: 20121001 and to-date:20121031.
Here is a method which will give the start and end dates. I left most of the testing selects in place but commented out.
DECLARE @string AS NVARCHAR(255)
DECLARE @Seperator as char(1) = '~'
declare @CharStartDate as varchar(10)
declare @CharStopDate as varchar(10)
declare @StartDate as date
declare @StopDate as date
declare @I int
--SET @string = 'xvvvvvvcc;1/09/2012 ~ 1/10/2012xx'
--SET @string = 'xvvvvvvcc;12/31/2012 ~ 1/1/2012xx'
--SET @string = 'xvvvvvvcc;12/1/2012 ~ 10/0/2012xx'
SET @string = 'xvvvvvvcc;1/2/2012 ~ 1/3/2012xx'
--longest date 12/31/2011 = 10
--shortest date 1/1/2012 = 8
-- width of seperator = 3
SELECT
@CharStartDate = substring (@string, CHARINDEX(@Seperator,@string)-11,10)
,@CharStopDate = substring (@string, CHARINDEX(@Seperator,@string)+2,10)
--SELECT @CharStartDate,@CharStopDate
select @I = ascii(substring(@CharStartDate,1,1))
While @I > 57
BEGIN
set @CharStartDate = substring(@CharStartDate,2,10)
--select @CharStartDate
select @I = ascii(substring(@CharStartDate,1,1))
END
select @I = ascii(substring(REVERSE(@CharStopDate),1,1))
While @I > 57
BEGIN
set @CharStopDate = REVERSE(substring(REVERSE(@CharStopDate),2,10))
--select @CharStopDate
select @I = ascii(substring(REVERSE(@CharStopDate),1,1))
END
--select ascii(';'),ascii('9'),ascii('8'),ascii('7'),ascii('6'),ascii('6'),ascii('4'),ascii('3'),ascii('2'),ascii('1'),ascii('0')
SELECT @StartDate = @CharStartDate,@StopDate = @CharStopDate
--SELECT @I,@string,@Seperator,@CharStartDate,@CharStopDate,@StartDate,@StopDate
select datediff(dd,@StartDate,@StopDate) AS 'DateDiff',@StartDate as 'Start Date',@StopDate as 'Stop Date'
I will leave it to you to check for the separator.
CREATE FUNCTION [dbo].[RemoveAlphaCharacters](@Temp nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
-- Strip every character that is not a digit, a tilde or a slash
WHILE PatIndex ('%[^0-9~/]%', @Temp) > 0
SET @Temp = Stuff(@Temp, PatIndex('%[^0-9~/]%', @Temp), 1, '')
RETURN @Temp
END
GO
DECLARE @string nvarchar(max) = N'长期租金;10/1/2012 ~ 10/31/2012'
SELECT CONVERT(date, SUBSTRING([dbo].[RemoveAlphaCharacters](@string), 0,
CHARINDEX('~', [dbo].[RemoveAlphaCharacters](@string))), 101) AS BDate,
CONVERT(date, SUBSTRING([dbo].[RemoveAlphaCharacters](@string),
CHARINDEX('~', [dbo].[RemoveAlphaCharacters](@string)) + 1,
CHARINDEX('~', REVERSE([dbo].[RemoveAlphaCharacters](@string)))), 101) AS EDate
In this instance you can use the following, but really you need an EXISTS clause or something similar to test the string for the tilde (~). As everyone else has stated, this only works if the string always has a semicolon (;) and a tilde (~). You can convert the strings to datetime values if you need to.
I have placed the string in a variable to make it easier to read...
DECLARE @string AS NVARCHAR(255)
SET @string = N'长期租金;10/1/2012 ~ 10/31/2012'
-- StartDate is the text between ';' and '~'; EndDate is everything after '~'
SELECT StartDate = RTRIM(SUBSTRING(@string,CHARINDEX(';',@string)+1,CHARINDEX('~',@string)-CHARINDEX(';',@string)-1))
,EndDate = LTRIM(RIGHT(@string,LEN(@string)-CHARINDEX('~',@string)))
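The question asks for the dates in yyyymmdd form (20121001 and 20121031). A minimal sketch of that last step, assuming the text always uses mm/dd/yyyy dates and sticking to features available in SQL Server 2005; the variable names @semi and @tilde are introduced here for illustration:

```sql
DECLARE @string nvarchar(255);
SET @string = N'长期租金;10/1/2012 ~ 10/31/2012';

-- Locate both delimiters once
DECLARE @semi int, @tilde int;
SET @semi  = CHARINDEX(';', @string);
SET @tilde = CHARINDEX('~', @string);

-- Only parse when both delimiters are present and in the expected order
IF @semi > 0 AND @tilde > @semi
    SELECT
        -- style 101 reads mm/dd/yyyy; style 112 writes yyyymmdd
        CONVERT(char(8),
                CONVERT(datetime,
                        LTRIM(RTRIM(SUBSTRING(@string, @semi + 1, @tilde - @semi - 1))),
                        101),
                112) AS FromDate,
        CONVERT(char(8),
                CONVERT(datetime,
                        LTRIM(RTRIM(SUBSTRING(@string, @tilde + 1, LEN(@string)))),
                        101),
                112) AS ToDate;
```

The IF guard covers the "test the string for the tilde" caveat mentioned above; rows without both delimiters simply produce no result instead of a conversion error.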
I have never used the older versions of SQL Server because I just graduated, but doesn't it have the EXTRACT() function? The syntax goes like this:
SELECT First_Name ,
EXTRACT ( CAST(Created_date AS DATE) FROM Created_date ) AS Date_only ;
You specify 'First_Name' to let SQL know you want it as a column, and 'Created_date' is the field from which you're trying to separate the date. The CAST function converts your field to a DATE value before extracting it.
I hope this helps. Thank you. If I'm wrong please let me know; I would like to improve.

SQL query with start and end dates - what is the best option?

I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering to figure out what the best answer might be for ways to set up schema, queries, stuff.
So... I need to know the start and end dates for something and a value that is associated with an ID during that time frame. So... we can set the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define better? I filled the first table with about 100,000 rows of data and it takes about 10-12 seconds to set up in the format of the second table depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
this is what i used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int
while (@rowcount<100000)
begin
set @i = 1
set @dt = getdate()
set @numrows = Cast(((5 + 1) - 1) *
Rand() + 1 As tinyint)
while @i<=@numrows
begin
insert into #x values (@id, dateadd(d,@i,@dt), @i)
set @i = @i + 1
end
set @rowcount = @rowcount + @numrows
set @id = @id + 1
print @rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where getdate() between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
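To make the trade-off concrete, here is one possible sketch of the index and the "close the previous row, then insert" step for the two-date layout. It uses the question's second xxx_test2 definition; the index name, the sample values, and the choice of '99991231' as a sentinel end date for the open row (so that the BETWEEN filter keeps working) are assumptions made here, not part of the original posts.

```sql
-- Hypothetical index supporting lookups by group and date range
CREATE INDEX IX_xxx_test2_groupid_start_dt
    ON xxx_test2 (groupid, start_dt) INCLUDE (end_dt, i);
GO

DECLARE @groupid int, @now datetime, @i int, @open_end datetime;
SELECT @groupid = 1, @now = GETDATE(), @i = 42, @open_end = '99991231';

BEGIN TRANSACTION;
    -- Close the currently open row for this group, if there is one
    UPDATE xxx_test2
    SET end_dt = @now
    WHERE groupid = @groupid AND end_dt = @open_end;

    -- Insert the new open-ended row
    INSERT INTO xxx_test2 (groupid, start_dt, end_dt, i)
    VALUES (@groupid, @now, @open_end, @i);
COMMIT TRANSACTION;

-- Currently effective rows (BETWEEN is inclusive at both ends)
SELECT groupid, start_dt, end_dt, i
FROM xxx_test2
WHERE GETDATE() BETWEEN start_dt AND end_dt;
```

Each insert costs one extra indexed update, in exchange for range queries that need no self-join or OUTER APPLY to derive the end date. Whether that is the better trade depends on the read/write mix, which is worth measuring on realistic data.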
for anyone who can use LEAD Analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the 1st table (that uses only 1 date column) would be much much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x

SQL Server FOR EACH Loop

I have the following SQL query:
DECLARE @MyVar datetime = '1/1/2010'
SELECT @MyVar
This naturally returns '1/1/2010'.
What I want to do is have a list of dates, say:
1/1/2010
2/1/2010
3/1/2010
4/1/2010
5/1/2010
Then i want to FOR EACH through the numbers and run the SQL Query.
Something like (pseudocode):
List = 1/1/2010,2/1/2010,3/1/2010,4/1/2010,5/1/2010
For each x in List
do
DECLARE @MyVar datetime = x
SELECT @MyVar
So this would return:-
1/1/2010
2/1/2010
3/1/2010
4/1/2010
5/1/2010
I want this to return the data as one resultset, not multiple resultsets, so I may need to use some kind of union at the end of the query, so each iteration of the loop unions onto the next.
edit
I have a large query that accepts a 'to date' parameter, I need to run it 24 times, each time with a specific to date which I need to be able to supply (these dates are going to be dynamic) I want to avoid repeating my query 24 times with union alls joining them as if I need to come back and add additional columns it would be very time consuming.
SQL is primarily a set-orientated language - it's generally a bad idea to use a loop in it.
In this case, a similar result could be achieved using a recursive CTE:
with cte as
(select 1 i union all
select i+1 i from cte where i < 5)
select dateadd(d, i-1, '2010-01-01') from cte
Here is an option with a table variable:
DECLARE @MyVar TABLE(Val DATETIME)
DECLARE @I INT, @StartDate DATETIME
SET @I = 1
SET @StartDate = '20100101'
WHILE @I <= 5
BEGIN
INSERT INTO @MyVar(Val)
VALUES(@StartDate)
SET @StartDate = DATEADD(DAY,1,@StartDate)
SET @I = @I + 1
END
SELECT *
FROM @MyVar
You can do the same with a temp table:
CREATE TABLE #MyVar(Val DATETIME)
DECLARE @I INT, @StartDate DATETIME
SET @I = 1
SET @StartDate = '20100101'
WHILE @I <= 5
BEGIN
INSERT INTO #MyVar(Val)
VALUES(@StartDate)
SET @StartDate = DATEADD(DAY,1,@StartDate)
SET @I = @I + 1
END
SELECT *
FROM #MyVar
You should tell us what your main goal is; as @JohnFx said, this could probably be done another (more efficient) way.
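For the requirement in the question's edit (one large query run for each of a list of "to dates", returned as a single result set), one set-based shape is to hold the dates in a table and CROSS APPLY the query to it. This is only a sketch: @ToDates and the column names are made up, and the inner query against sys.objects is a stand-in so that the example runs; the real query would take its place.

```sql
-- The list of "to dates"; in practice these rows could come from anywhere
DECLARE @ToDates TABLE (ToDate datetime PRIMARY KEY);
INSERT INTO @ToDates VALUES ('20100101');
INSERT INTO @ToDates VALUES ('20100201');
INSERT INTO @ToDates VALUES ('20100301');

-- The inner query is written once and evaluated for every date
SELECT d.ToDate, q.RowsUpToDate
FROM @ToDates AS d
CROSS APPLY (
    -- Stand-in for the large query; it only has to reference d.ToDate
    SELECT COUNT(*) AS RowsUpToDate
    FROM sys.objects AS o
    WHERE o.create_date <= d.ToDate
) AS q
ORDER BY d.ToDate;
```

Adding a column later means changing the inner query in one place, rather than in 24 copies joined by UNION ALL.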
You could use a variable table, like this:
declare @num int
set @num = 1
declare @results table ( val int )
while (@num < 6)
begin
insert into @results ( val ) values ( @num )
set @num = @num + 1
end
select val from @results
This kind of depends on what you want to do with the results. If you're just after the numbers, a set-based option would be a numbers table - which comes in handy for all sorts of things.
For MSSQL 2005+, you can use a recursive CTE to generate a numbers table inline:
;WITH Numbers (N) AS (
SELECT 1 UNION ALL
SELECT 1 + N FROM Numbers WHERE N < 500
)
SELECT N FROM Numbers
OPTION (MAXRECURSION 500)
declare @counter as int
set @counter = 0
declare @date as varchar(50)
set @date = cast(1+@counter as varchar)+'/01/2013'
while(@counter < 12)
begin
select cast(1+@counter as varchar)+'/01/2013' as date
set @counter = @counter + 1
end
Of course this is an old question, but I have a simple solution that needs no looping, CTEs, table variables, etc.
DECLARE @MyVar datetime = '1/1/2010'
SELECT @MyVar
SELECT DATEADD (DD,NUMBER,@MyVar)
FROM master.dbo.spt_values
WHERE TYPE='P' AND NUMBER BETWEEN 0 AND 4
ORDER BY NUMBER
Note: spt_values is an undocumented Microsoft table. It has numbers for every type. Relying on it is not advisable, since being undocumented it can be removed in any new version of SQL Server without notice. But it can serve as a quick workaround in scenarios like the one above.
[CREATE PROCEDURE [rat].[GetYear]
AS
BEGIN
-- Variable for storing the start year
Declare @StartYear as int
-- Variable for the end year
Declare @EndYear as int
-- Setting the value of the start year
select @StartYear = Value from rat.Configuration where Name = 'REPORT_START_YEAR';
-- Setting the end year
select @EndYear = Value from rat.Configuration where Name = 'REPORT_END_YEAR';
-- Creating a temporary result set (recursive CTE)
with [Years] as
(
--Selecting the Year
select @StartYear [Year]
--doing Union
union all
-- doing the loop in Years table
select Year+1 Year from [Years] where Year < @EndYear
)
--Selecting the Year table
selec]