How can an index be used when applying a function to a column? - sql

Say I have an index on a DateTime column (named [Timestamp]), and I use the column in the following query:
SELECT [Id], [Timestamp]
FROM [dbo].[MyTable]
WHERE FORMAT([Timestamp], 'yyyy/MM/dd HH') = '2022/12/14 01'
ORDER BY [Id]
Will the index on the [Timestamp] column be used when executing this query? If not, what would be a good strategy for improving its performance?

No, because you are calling a function on your column, so the index cannot know in advance which values meet your criteria. You need to avoid manipulating the column values in your WHERE clause at all costs. The best approach for comparing datetimes is usually a half-open range comparison, e.g.
DECLARE @StartDate datetime2(0) = '2022-12-14T01:00:00'
, @EndDate datetime2(0) = '2022-12-14T02:00:00';
SELECT [Id], [Timestamp]
FROM [dbo].[MyTable]
WHERE [Timestamp] >= @StartDate AND [Timestamp] < @EndDate
ORDER BY [Id];
Note: You can check whether the index is used by inspecting the execution plan.
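If the hour of interest arrives as a parameter rather than as two literals, the window can be derived without ever touching the column. A sketch, assuming a hypothetical @Point parameter (on SQL Server 2022+, DATETRUNC(hour, @Point) does the truncation directly):

```sql
-- Hypothetical parameter: any point in time inside the hour of interest
DECLARE @Point datetime2(0) = '2022-12-14T01:35:00';

-- Classic truncate-to-hour trick: hours since 1900-01-01, added back to 1900-01-01
DECLARE @StartDate datetime2(0) = DATEADD(hour, DATEDIFF(hour, 0, @Point), CAST(0 AS datetime));
DECLARE @EndDate   datetime2(0) = DATEADD(hour, 1, @StartDate);

SELECT [Id], [Timestamp]
FROM [dbo].[MyTable]
WHERE [Timestamp] >= @StartDate AND [Timestamp] < @EndDate  -- sargable; index seek possible
ORDER BY [Id];
```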

Related

SQLSVR - not able to create index with computed GETDATE column

What I want to achieve: To create an index with an existing date column data converted into UTC timing (Query is written below)
Issue: There's a column in table with local server date values, I need to convert them into UTC date value and then create an Index over it.
Why I need this: I have the same thing done in oracle, and I am trying to migrate stuff along with queries into Sql Server for a new client
Problem: An index doesn't accept variables or user-defined functions; it only accepts table columns as parameters.
Only workaround: make a computed column on the table and use it to create the index.
Steps I followed:
Ran the below queries at first
ALTER TABLE dbo.tableClient ADD tempComp1 AS DATEADD(minute, datediff(minute, GETUTCDATE(), getdate()), [svrDate])
GO
create index idx1 on dbo.tableClient([key] asc, [tempComp1] desc, [type])
GO
It gives the below error:
Column 'tempComp1' in table 'dbo.tableClient' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
So I tried making the column as PERSISTED
ALTER TABLE dbo.tableClient ADD tempComp1 AS DATEADD(minute, datediff(minute, GETUTCDATE(), getdate()), [svrDate]) PERSISTED
it now gives the error:
Computed column 'tempComp1' in table 'tableClient' cannot be persisted because the column is non-deterministic.
Now, the funny thing is, if I do
SELECT datediff(minute, GETUTCDATE(), getdate())
it gives result: 330
Now if I try the same commands with 330
ALTER TABLE dbo.tableClient ADD tempComp1 AS DATEADD(minute, 330, [svrDate]) PERSISTED
GO
create index idx1 on dbo.tableClient([key] asc, [tempComp1] desc, [type])
GO
it works absolutely fine.
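The hard-coded 330 works only while the server's offset never changes (no DST, no policy changes). If the instance is SQL Server 2016 or later, one alternative sketch is to skip the persisted column and convert at query time with AT TIME ZONE - which, being non-deterministic itself, likewise cannot be persisted or indexed, but at least avoids baking in the offset. The zone name below is an assumption; substitute the server's actual Windows time zone:

```sql
-- Assumes SQL Server 2016+. 'India Standard Time' matches the 330-minute offset
-- seen above, but is only an illustrative guess at the server's zone.
SELECT [svrDate],
       [svrDate] AT TIME ZONE 'India Standard Time'  -- attach local zone -> datetimeoffset
                 AT TIME ZONE 'UTC' AS utcDate       -- convert to UTC
FROM dbo.tableClient;
```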

Cast/Convert nvarchar to datetime(2)

Since I described it poorly last time, here it is again.
I have a table:
id time_stamp
1 12.01.20 15:34:34,000000000 EUROPE/PRAGUE
2 10.01.20 10:15:15,000000000 EUROPE/PRAGUE
3 09.01.20 05:55:42,000000000 EUROPE/PRAGUE
The table holds a huge amount of this data. The time_stamp column is of data type nvarchar, and I need to sort the data by date, or use it in WHERE clauses to constrain by date, so I need to convert time_stamp to datetime. But I can't. I tried CONVERT and CAST but both failed.
From @Larnu I got this advice, but it didn't work, because I described the problem inaccurately.
UPDATE dbo.YourTable
SET [timestamp] = CONVERT(nvarchar(20),TRY_CONVERT(datetime2(0), [timestamp], 104), 126);
Now you can ALTER the table and change the data type to a datetime2(0):
ALTER TABLE dbo.YourTable ALTER COLUMN [timestamp] datetime2(0) NULL;
Any advice on how to adapt this for the time_stamp column?
Thanks.
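Given the sample values, the string carries a tail (',000000000 EUROPE/PRAGUE') that no CONVERT style understands, which would explain the failure. Before running any ALTER, it may be worth finding the rows that would not convert. A sketch, assuming every value looks like the samples above ('dd.mm.yy hh:mm:ss' in the first 17 characters):

```sql
-- LEFT(..., 17) keeps 'dd.mm.yy hh:mm:ss' and drops the ',000000000 EUROPE/PRAGUE' tail;
-- style 4 is the two-digit-year German (dd.mm.yy) style
SELECT id, [time_stamp]
FROM dbo.YourTable
WHERE TRY_CONVERT(datetime2(0), LEFT([time_stamp], 17), 4) IS NULL;
```

If that returns no rows, the same LEFT(..., 17) trimming (with style 4 instead of 104) can be applied inside the UPDATE from the earlier advice before altering the column type.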

Using CASE Statement in a table

Is it possible to have a column in a table (not in a view) (SQL Server 2008) change according to the value in another column? I.e., if I have a column called "DUEDATE", can I have a column called "STATUS" that changes to "Now Due" when "DUEDATE" is > GetDate()? If so, how do you add that to a table?
You can alter the table and add a computed column:
ALTER TABLE dbo.TheTable
ADD Status AS CASE WHEN ...
You can't persist it, because it's non-deterministic, so don't add PERSISTED or try to put an index on it.
From an indexing perspective, don't try to query it using WHERE Status = 'whatever', because it will have to consider every row in the table. Instead, use an index on DueDate and WHERE DueDate <= GETDATE()
Yes, you can create a computed column:
CREATE TABLE [dbo].[SampleTable](
[DueDate] [date] NULL,
[ComputedValue] AS (CASE WHEN [Duedate] > GETDATE() THEN 'Now Due' ELSE '' END)
) ON [PRIMARY]
As this is a non-deterministic column (because of the value of GETDATE() that is different every time you use the table), adding it to the table doesn't give you much benefit over returning the same in a select query.
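To make the earlier indexing advice concrete - a sketch against the sample table, with an illustrative index name. The predicate goes on the plain DueDate column, never on the computed one, so the index can be seeked:

```sql
-- Index the underlying date column, not the non-deterministic computed column
CREATE INDEX IX_SampleTable_DueDate ON dbo.SampleTable (DueDate);

-- Sargable: an index seek on DueDate finds the 'Now Due' rows
SELECT DueDate, ComputedValue
FROM dbo.SampleTable
WHERE DueDate > GETDATE();
```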

Creating datetime sequence in 2008 R2

I need to create a datetime sequence with a variable minute increment in a temp table. The output should look something like this when 5 is used:
2012-12-13 04:20:00.000
2012-12-13 04:25:00.000
2012-12-13 04:30:00.000
2012-12-13 04:35:00.000
2012-12-13 04:40:00.000
2012-12-13 04:45:00.000
2012-12-13 04:50:00.000
Can this be done?
WITH DateTimeSequence
AS
(
SELECT CONVERT(datetime, '2012-12-13 04:20:00', 120) AS [datetime] -- Start Date
UNION ALL
SELECT DATEADD(mi, 5, [datetime])
FROM DateTimeSequence
WHERE DATEADD(mi, 5, [datetime]) <= CONVERT(datetime, '2012-12-13 04:50:00', 120) -- End Date
)
SELECT [datetime]
FROM DateTimeSequence
SQLFiddle Demo
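One caveat on the CTE approach: a recursive CTE stops after 100 recursions by default, so any sequence longer than 100 steps will error out unless the MAXRECURSION hint is added to the final SELECT (0 removes the cap entirely):

```sql
-- Appended to the final SELECT of the DateTimeSequence CTE above
SELECT [datetime]
FROM DateTimeSequence
OPTION (MAXRECURSION 0);  -- default limit is 100 recursions
```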
I would suggest using a sequence (numbers) table - every database should have one, because they make generating datetime sequences quick and easy.
CREATE TABLE Sequence
(Number int PRIMARY KEY)
Now fill this table with the integers from 0 to 1,000,000 - don't worry, you only need to do this once.
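One sketch for that one-time fill, using a stacked cross join of a ten-row constant (any row source works). This generates 0 through 999,999; insert the last value separately if you want the full 0-to-1,000,000 range:

```sql
-- Ten digits, cross-joined six times = 10^6 rows covering 0..999,999
;WITH Ten AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
INSERT INTO Sequence (Number)
SELECT a.n + b.n*10 + c.n*100 + d.n*1000 + e.n*10000 + f.n*100000
FROM Ten a CROSS JOIN Ten b CROSS JOIN Ten c
CROSS JOIN Ten d CROSS JOIN Ten e CROSS JOIN Ten f;
```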
You can then generate datetime sequences as long as you like (well up to 1,000,001) with a variation of
SELECT DATEADD(minute, Number * @StepSize, @StartDateTime)
FROM Sequence
WHERE Number < @NumberRequired
See this SQL Fiddle
This will generally be faster than using a recursive CTE and will be almost as fast as reading the data directly from a table. In fact, you might consider not using a temporary table at all, but building a stored procedure (or table-valued function) with this at its guts, as it will be roughly the same speed and a lot more flexible.
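A sketch of the table-valued-function variant mentioned above (function and column names are illustrative), assuming the Sequence table exists and is filled:

```sql
-- Inline TVF: parameterised datetime sequence backed by the numbers table
CREATE FUNCTION dbo.DateTimeSequence
    (@Start datetime, @StepMinutes int, @Count int)
RETURNS TABLE
AS RETURN
    SELECT DATEADD(minute, Number * @StepMinutes, @Start) AS [datetime]
    FROM dbo.Sequence
    WHERE Number < @Count;
GO

-- Usage: the seven 5-minute steps from 04:20 through 04:50
SELECT [datetime] FROM dbo.DateTimeSequence('2012-12-13T04:20:00', 5, 7);
```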

Recursive SQL query to speed up non-indexed query

This question is largely driven by curiosity, as I do have a working query (it just takes a little longer than I would like).
I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query is looking for distinct values in one of the columns, but only going back 1 day. Unfortunately, the ReportDate column that is evaluated is not of the DateTime type, or even a BigInt, but is char(8) in the format of YYYYMMDD. So the query is a bit slow.
SELECT Category
FROM Reports
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Note that the date conversion in the above statement is simply converting it to a YYYYMMDD format for comparison.
I was wondering if there was a way to optimize this query based on the fact that I know that the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT function which gradually grew a temporary table that could be used for the final query.
For example, in pseudo-SQL:
N = 128
TemporaryTable = SELECT TOP {N} *
FROM Reports
ORDER BY ID DESC
/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today)
N = N**2
Repeat Select
/* We now have a smallish table to do our query */
SELECT Category
FROM TemporaryTable
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Does that make sense? Is something like that possible?
This is on MS SQL Server 2008.
I might suggest you do not need to convert the date that is stored as char data in YYYYMMDD format; that format is inherently sortable all by itself. I would instead convert the date you are searching for into that format.
Also, the way you have the conversion written, the current DateTime is converted for every individual row, so even storing that value once for the whole query could speed things up... but I think just converting the date you are searching for into that char format would help.
I would also suggest getting the index(es) you need created, of course... but that's not the question you asked :P
Why not just create the index you need?
create index idx_Reports_ReportDate
on Reports(ReportDate, Category)
No, that doesn't make sense. The only way to optimize this query is to have a covering index for it:
CREATE INDEX ndxReportDateCategory ON Reports (ReportDate, Category);
Update
Considering your comment that you cannot modify the schema, then you should modify the schema. If you still can't, then the answer still applies: the solution is to have an index.
And finally, to answer more directly your question, if you have a strong correlation between ID and ReportData: the ID you seek is the biggest one that has a ReportDate smaller than the date you're after:
SELECT MAX(Id)
FROM Reports
WHERE ReportDate < 'YYYYMMDD';
This will do a reverse scan on the ID index and stop at the first ID prior to your desired date (i.e. it will not scan the entire table). You can then filter your reports based on this found max Id.
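Putting that idea together - a sketch, assuming IDs and ReportDate rise together (the variable names are illustrative):

```sql
DECLARE @Today char(8) = CONVERT(char(8), GETDATE(), 112);

-- Reverse scan on the clustered ID index; stops at the newest pre-today row.
-- NULL if the table holds no rows earlier than today.
DECLARE @MinId bigint = (SELECT MAX(Id) FROM Reports WHERE ReportDate < @Today);

SELECT Category
FROM Reports
WHERE Id > @MinId
  AND ReportDate = @Today   -- belt and braces, in case the correlation is imperfect
GROUP BY Category;
```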
I think you will find the discussion on SARGability, on Rob Farley's Blog to be very interesting reading in relation to your post topic.
http://blogs.lobsterpot.com.au/2010/01/22/sargable-functions-in-sql-server/
An interesting alternative approach that does not require you to modify the existing column data type would be to leverage computed columns.
alter table REPORTS
add castAsDate as CAST(ReportDate as date)
create index rf_so2 on REPORTS(castAsDate) include (ReportDate)
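The new index can then be hit with a sargable predicate on the computed column - a sketch (note the index above only INCLUDEs ReportDate, so selecting other columns would require lookups):

```sql
-- Seeks rf_so2 on the computed date column; no per-row conversion of ReportDate needed
SELECT ReportDate
FROM REPORTS
WHERE castAsDate = CAST(GETDATE() AS date);
```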
One of the query patterns I occasionally use to get into a log table with similar indexing to yours is to limit by subquery:
DECLARE @ReportDate varchar(8)
SET @ReportDate = CONVERT(varchar(8), GETDATE(), 112)
SELECT *
FROM
(
SELECT top 20000 *
FROM Reports
ORDER BY ID desc
) sub
WHERE sub.ReportDate = @ReportDate
20k/4M = 0.5% of the table is read.
Here's a loop solution. Note: you might want to make ID the primary key and index ReportDate in the temp table.
DECLARE @ReportDate varchar(8)
SET @ReportDate = CONVERT(varchar(8), GETDATE(), 112)
DECLARE @CurrentDate varchar(8), @MinKey bigint
SELECT TOP 2000 * INTO #MyTable
FROM Reports ORDER BY ID DESC
SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
FROM #MyTable
WHILE @ReportDate <= @CurrentDate
BEGIN
INSERT INTO #MyTable
SELECT TOP 2000 *
FROM Reports WHERE ID < @MinKey ORDER BY ID DESC
SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
FROM #MyTable
END
SELECT * FROM #MyTable
WHERE ReportDate = @ReportDate
DROP TABLE #MyTable