I'm supporting an existing application written by another developer, and I have a question as to whether the data type the developer chose to store dates is affecting the performance of certain queries.
Relevant information: The application makes heavy use of a "Business Date" field in one of our tables. The data type for this business date is nvarchar(10) rather than a datetime data type. The format of the dates is "MM/DD/YYYY", so Christmas 2007 is stored as "12/25/2007".
Long story short, we have some heavy duty queries that run once a week and are taking a very long time to execute.
I'm re-writing this application from the ground up, but since I'm looking at this, I want to know if there is a performance difference between using the datetime data type compared to storing dates as they are in the current database.
You will both save disk-space and increase performance if you use datetime instead of nvarchar(10).
If you use the date-fields to do date-calculation (DATEADD etc) you will see a massive increase in query-execution-speed, because the fields do not need to be converted to datetime at runtime.
Operations over DATETIMEs are faster than over VARCHARs converted to DATETIMEs.
If your dates appear anywhere but in SELECT clause (like, you add them, DATEDIFF them, search for them in WHERE clause etc), then you should keep them in internal format.
There are a lot of reasons you should actually use DateTime rather than a varchar to store a date. Performance is one... but I would be concerned about queries like this:
SELECT *
FROM Table
WHERE DateField > '12/25/2007'
giving you the wrong results.
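To make the problem concrete, here is a minimal sketch comparing the same two dates once as strings and once as datetime values. The literals are made up for illustration; style 101 is the mm/dd/yyyy style for CONVERT.

```sql
-- As strings, the comparison goes character by character:
-- '12/26/2006' and '12/25/2007' first differ at '6' vs '5',
-- so the 2006 date is treated as "greater" than the 2007 date.
SELECT
    CASE WHEN N'12/26/2006' > N'12/25/2007'
         THEN 'later' ELSE 'not later' END AS string_compare,   -- 'later' (wrong)
    CASE WHEN CONVERT(datetime, '12/26/2006', 101) > CONVERT(datetime, '12/25/2007', 101)
         THEN 'later' ELSE 'not later' END AS date_compare;     -- 'not later' (right)
```

With the column stored as nvarchar, the WHERE clause above behaves like the first expression.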
I cannot back this up with numbers, but the datetime-type should be a lot faster, since it can easily be compared, unlike the varchar. In my opinion, it is also worth a shot to look into UNIX timestamps as your data type.
I believe from an architectural perspective a Datetime would be a more efficient data type, as it is stored as two 4-byte integers, whereas your nvarchar(10) will be stored as up to 22 bytes (two times the number of characters entered, plus 2 bytes). Therefore potentially more than double the amount of storage space is required now in comparison to using a Datetime.
This of course has possible implications for indexing, as the smaller the data item, the more records you can fit on an index data page. This in turn produces a smaller index which is of course quicker to traverse and therefore will return results faster.
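If you want to check the sizes yourself, DATALENGTH reports the number of bytes used by a value. A quick sketch with an illustrative literal (note that DATALENGTH does not include the 2 bytes of variable-length overhead mentioned above):

```sql
SELECT
    DATALENGTH(N'12/25/2007')                 AS nvarchar_bytes,  -- 20 (10 characters x 2 bytes)
    DATALENGTH(CAST('20071225' AS datetime))  AS datetime_bytes;  -- 8
```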
In summary, Datetime is the way to go.
Date filtering on the nvarchar field is not easily possible, as the data in the index is sorted lexicographically, which doesn't match the sorting you would expect for dates. That's the problem with the date format "mm/dd/yyyy". It means "12/25/2007" will come after "12/01/2008" in an nvarchar index, but that's not what you want. "yyyy/mm/dd" would have been fine.
So, you should use a date field and convert the string values to date. You will surely get a big performance boost. That's if you can change the table schema.
Yes. datetime will be far more efficient for date calculations than varchar or nvarchar (why nvarchar - there's no way you've got real unicode in there, right?). Plus strings can be invalid and misinterpreted.
If you are only using the date part, your system may have a smaller date-only version of datetime.
In addition, if you are just doing joins and certain types of operations (>/</= comparisons but not datediff), a date "id" column which is actually an int of the form yyyymmdd is commonly used in data warehouses. This does allow "invalid" dates, unfortunately, but it also allows more obvious reserved, "special" dates, whereas in datetime, you might use NULL or 1/1/1900 or something. Integrity is usually enforced through a foreign key constraint to a date "dimension."
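As a rough sketch of that yyyymmdd key, CONVERT style 112 produces the yyyymmdd string, which can then be cast to int and back. The variable name @d is only for illustration.

```sql
DECLARE @d datetime;
SET @d = '20071225';

-- datetime -> int key of the form yyyymmdd
SELECT CONVERT(int, CONVERT(char(8), @d, 112)) AS DateKey;          -- 20071225

-- int key -> datetime (the yyyymmdd string is parsed with style 112)
SELECT CONVERT(datetime, CONVERT(char(8), 20071225), 112) AS DateValue;
```

Because the digits run from most to least significant, comparing two such keys as ints gives the same ordering as comparing the dates.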
Seeing that you tagged the question as "sql server", I'm assuming you are using some version of SQL Server, so I recommend that you look at either using datetime or smalldatetime. In addition, in SQL Server 2008, you have a date type as well as a datetime2 with a much larger range. Check out this link which gives some details
One other problem with using varchar (or any other string datatype) is that the data likely contains invalid dates, as they are not automatically validated on entry. If you try to change the field to a datetime field, you may have conversion problems where people have added dates such as ASAP, Unknown, 1/32/2009, etc. You will need to check for dates that won't convert using the handy isdate function and either fix them or null them out before you try to change the data type.
Likely you also have a lot of code that converts the varchar type to date datatype on the fly so that you can do date math as well. All that code will also need to be fixed.
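A minimal sketch of that pre-conversion check. The table name dbo.BusinessTable and column name BusinessDate are placeholders, not the real schema. ISDATE depends on the session's date format, so it is set explicitly here, and since ISDATE also accepts strings in other layouts, the LIKE pattern additionally flags anything that is not shaped like MM/DD/YYYY.

```sql
SET DATEFORMAT mdy;   -- make ISDATE read the strings as month/day/year

-- List the distinct values that would not convert cleanly, most frequent first.
-- NULLs are listed too, because ISDATE(NULL) returns 0.
SELECT BusinessDate, COUNT(*) AS row_count
FROM dbo.BusinessTable            -- hypothetical table name
WHERE ISDATE(BusinessDate) = 0
   OR BusinessDate NOT LIKE '[0-1][0-9]/[0-3][0-9]/[0-9][0-9][0-9][0-9]'
GROUP BY BusinessDate
ORDER BY row_count DESC;
```

Once this returns nothing you don't expect, the remaining values should be safe to convert.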
Chances are the datetime type is both more compact and faster, but more importantly, using datetime to store a date and time is a better architectural choice. You're less likely to run into weird problems looking for records in a certain date range, and most database libraries will map it to your language's Date type, so the code is much cleaner, which is really much more important in the long run.
Even if it were slower, you'd spend more time debugging the strings-as-dates than all your users will ever see in savings combined.
Thanks to some wonderful application design, I've come to find myself face-to-face with a real WTF - it seems that the application I support outputs the date and time into two separate columns in one particular table; the date goes into a 'Date' column as the datetime data type, whilst the time goes into a 'Time' column as the money data type in hours and minutes (so, for example, 10:35:00 would be £10.35).
I need to amalgamate these two columns during a query I'm making to the database so it returns as one complete datetime column but obviously just doing...
...snip...
CAST(au.[Date] as datetime) + CAST(au.[Time] AS datetime) as 'LastUpdateDate'
...snip...
... doesn't work as I hoped (naively) that it would.
My predecessor encountered this issue and came up with a... "creative" solution to this:
MIN(DATEADD(HOUR,CAST(LEFT(CONVERT(VARCHAR(10),[time],0),CHARINDEX('.',CONVERT(VARCHAR(10),[time],0),0)-1) AS INT),DATEADD(MINUTE,CAST(RIGHT(CONVERT(VARCHAR(10),[time],0),LEN(CONVERT(VARCHAR(10),[time],0)) - CHARINDEX('.',CONVERT(VARCHAR(10),[time],0),0)) AS INT),[date]))) AS CreatedDateTime
Unlike my predecessor, I would rather try to keep this solution as simple as possible. Do you think it would be possible to cast the values in this column to time by:
Casting the money value to string
Replacing the decimal point with a colon
Parsing this as a datetime object (to replace the CAST(au.[Time] as datetime) part of the first code block)
Is this feasible? And if not, can anyone assist?
EDIT
Just to be 100% clear, I cannot change the underlying data type for the column as the application relies on the data type being money. This is purely so my sanely-written application that does housekeeping reports can actually read the data in as a complete datetime value.
I'd prefer an arithmetic conversion without any string casting:
MIN(
DATEADD(
MINUTE,
FLOOR(au.[Time]) * 60 + (au.[Time]-FLOOR(au.[Time])) * 100,
au.[Date])
) AS CreatedDateTime
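To see the arithmetic on a single value, here is a small sketch with local variables standing in for the [Date] and [Time] columns (the sample values are made up). The whole part of the money value is treated as hours and the two decimal digits as minutes.

```sql
DECLARE @Date datetime, @Time money;
SET @Date = '20180110';
SET @Time = 10.35;     -- meaning 10:35

-- FLOOR(@Time) * 60            = 600 minutes (10 hours)
-- (@Time - FLOOR(@Time)) * 100 = 35 minutes
SELECT DATEADD(MINUTE,
               FLOOR(@Time) * 60 + (@Time - FLOOR(@Time)) * 100,
               @Date) AS CreatedDateTime;   -- 2018-01-10 10:35:00.000
```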
You can add a layer of sanity, if changing the column to time outright is not an option:
ALTER TABLE ... ADD SaneDate AS
DATEADD(MINUTE, FLOOR([Time]) * 60 + 100 * ([Time] - FLOOR([Time])), [Date])
One computed column and then you can stick to using that instead of repeating the calculations everywhere. If altering the tables in any way is out of the question, you could at least make a view or table-valued function to capture this logic. (Preferably not a scalar function, although that's more obvious -- those have horrendous performance in queries.)
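If the table really can't be touched, the view variant could look roughly like this. dbo.AuditUpdates and the view name are hypothetical stand-ins, since the real table name isn't given, and you would list whichever other columns the reports need.

```sql
-- Hypothetical names: dbo.AuditUpdates stands in for the real table.
CREATE VIEW dbo.AuditUpdatesWithDateTime
AS
SELECT
    au.[Date],
    au.[Time],
    -- hours from the whole part, minutes from the two decimal digits
    DATEADD(MINUTE,
            FLOOR(au.[Time]) * 60 + 100 * (au.[Time] - FLOOR(au.[Time])),
            au.[Date]) AS LastUpdateDate
FROM dbo.AuditUpdates AS au;
```

The reporting queries can then select LastUpdateDate from the view, and the conversion lives in one place.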
I tend to prefer DATEADD over string manipulation when possible, simply because the results tend to be more predictable. In this case there's no real issue, since converting money to char(5) is perfectly predictable regardless of language settings, but still.
Just had a look at how to use the REPLACE command and it works as expected:
CAST(au.[Date] as datetime) + CAST(REPLACE(CAST(au.[Time] AS varchar(5)),'.',':') AS datetime) as 'LastUpdateDate'
now outputs 2018-01-10 10:32:00.000 whereas before it was providing some incorrect date and time value.
I suppose you could mathematically convert it as @JeroenMostert has suggested. To be fair, I'm not 100% sure how the performance of this solution compares with calculating the minutes and converting it that way, so I'll give that a try as well just to be sure.
I was wondering if there was a way to store a date (example: 01/01/2013) as datetime without SQL Server CE adding the time (example: 12:00:00 AM).
I could always store it as the string "01/01/2013" but I really want to be able to compare the dates on querying the database.
I realize that as long as I only stored the date part, all of the times in the datetime field would have equal values (i.e. 12:00:00 AM), so comparing them wouldn't be a problem and I could just always ignore the time part, however, it seems ridiculous to have this unnecessary data appended to every entry in the table.
Is there a way to store only the date part of the datetime as datetime so that the dates can still be compared in the SQL query or do I just need to live with this overhead and move on?
Side Note:
I just spent the last 30 minutes searching Google and SO for an answer I was sure was already out there, but to my surprise, I couldn't find anything on this issue.
Update:
The conclusion I have come to is that I will just accept the time in the datetime format and let it always default to 12:00:00 AM by only adding the date part during the INSERT statement (e.g. 01/01/2013). As long as the time part always remains the same throughout, the dates will still be easily comparable and I can just trim it up when I convert it to string for screen display. I believe this will be the easiest way to handle this scenario. After all, I decided to use SQL for the power of its queries, otherwise, I might have just used XML instead of a database, in the first place.
No, you really can't get rid of the time component. It is part of the data type defined by SQL Server. I was very annoyed by it until I found that I could still display the dates without the time by using jQuery to reformat them with the date formatter plugin:
https://github.com/phstc/jquery-dateFormat
Good Luck!
select CONVERT(date, GETDATE())
Does SQL's built-in DateTime type have any merits over the nvarchar type?
If it were you, which one would you use?
I need to store dates in my SQLServer database and I'm curious to know which one is better and why it is better.
I also want to know what happens if I, for example, store dates as string literals (I mean nvarchar). Does it take longer to search them? Or are they the same in terms of performance?
And the last question: how can I send a date from my C# application to a SQL field of type DateTime? Is it any different from C#'s DateTime?
You're given a date datetype for a reason, why would you not use it?
What happens when you store "3/2/2012" in a text field? Is it March 2nd? Is it February 3rd?
Store the date in a date or datetime field, and do any formatting of the date after the fact.
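The ambiguity is easy to reproduce. In this sketch the same string yields a different month depending on the session's DATEFORMAT setting:

```sql
SET DATEFORMAT mdy;
SELECT MONTH(CAST('3/2/2012' AS datetime)) AS month_mdy;   -- 3 (March 2nd)

SET DATEFORMAT dmy;
SELECT MONTH(CAST('3/2/2012' AS datetime)) AS month_dmy;   -- 2 (February 3rd)
```

A value stored in a datetime column has no such ambiguity; only the text you parse or display does.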
EDIT
If you have to store dates like 1391/7/1, your choices are:
Assuming you're using SQL Server 2008 or greater, use the datetime2 data type; it allows dates earlier than 1753/01/01 (which is what datetime stops at).
Assuming you're using SQL Server 2005 or earlier, store the dates as Gregorian calendar dates, and then in your application, use date/time functions to convert the date and time to the Farsi calendar.
Use the correct datatype (date/datetime/datetime2, dependent on version and requirement for time component).
Advantages are more compact storage than storing as a string (especially nvarchar as this is double byte). Built in validation against invalid dates such as 30 February. Sorts correctly. Avoids the need to cast it back to the correct datatype anyway when using date functions on it.
If I'm storing a DateTime value, and I expect to perform date-based calculations based on it, I'll use a DateTime.
Storing Dates as strings (varchars) introduces a variety of logistical issues, not the least of which is rendering the date in a proper format. Again, that bows in favor of DateTime.
I would go with the DateTime since you can use various functions on it directly.
A string wouldn't be too much of a hassle, but you will have to cast the data each time you want to do something with it.
There is no real performance difference when searching on either type of field, so going with DateTime is better than strings when working with date values.
You must realise that the datetime datatype, like other datatypes, is provided for a reason, and you should use the datatype that represents your data clearly. Besides this, you gain all the functionality and operations that are specific to the datetime datatype.
One of the biggest gains is correct sorting of data, which will not be possible directly if you use nvarchar as your datatype. Even if you think you don't need sorting right now, there will be a time in the future when it will be useful.
Date validation is also something you will benefit from. There is no confusion about the date format stored, i.e. dd/mm or mm/dd etc.
There has been a lot of discussion about this subject. There is a good post on the SQLCentral forum about this particular subject, DateTime or nvarchar.
In short, nvarchar is more than twice as long as datetime, so it takes more space, and in the long run any action involving it will be slower. You will also have validation issues and many more.
I wanted to find out what is the "best practices" approach to a query against a record set of datetime with a date (no time).
I use several queries that return records based on a date range, from a recordset that uses a datetime data type, which means each record needs to be checked using a between range.
Example of a query would be:
Select *
FROM Usages
where CreationDateTime between '1/1/2012' AND '1/2/2012 11:59:59'
I know using BETWEEN is a resource hog, and that checking a datetime data type of a date is always going to be very resource intense, but I would like to hear what others use (or would use) in this situation.
Would I get any type of performance increase converting the datetime record to a Date like:
Select *
FROM Usages
where CONVERT(DATE,CreationDateTime) between '1/1/2012' AND '1/2/2012'
Or possibly doing a less than / greater than check?
Select *
FROM Usages
where (CreationDateTime > '1/1/2012')
AND (CreationDateTime < '1/2/2012 11:59:59')
What you think you know is not correct.
Neither BETWEEN nor the DATETIME data type is a resource hog.
Provided that you index the column, that the column really is a DATETIME and not a VARCHAR(), and that you don't wrap the field in a function, everything will be nice and quick.
That said, I would use >= and < instead. Not for performance, but logical correctness.
WHERE
myField >= '20120101'
AND myField < '20120102'
This will work no matter whether the field contains hours, minutes, or even (with a mythical data type) picoseconds.
With an index on the field it will also give a range scan.
You won't get any faster. No tricks or functions needed.
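Applied to the query from the question, a sketch of the same half-open range covering all of January 1st and 2nd might look like this. The variable names are illustrative; the upper bound is the first instant of the day after the last day wanted, and it is excluded.

```sql
DECLARE @From datetime, @To datetime;
SET @From = '20120101';                      -- inclusive lower bound
SET @To   = DATEADD(DAY, 1, '20120102');     -- exclusive upper bound: 2012-01-03 00:00

SELECT *
FROM Usages
WHERE CreationDateTime >= @From
  AND CreationDateTime <  @To;
```

The column itself is left untouched by any function, so an index on CreationDateTime can still be used for the range.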
There are several considerations regarding dates.
First, you want to be sure that relevant indexes get used. In general, this means avoiding functions on the column. This applies to data types other than dates, but functions are prevalent when working with dates. So, CONVERT() is a bad idea from a performance perspective, assuming that the column is indexed.
Second, you want to avoid unnecessary conversions between formats. A function call on the column must happen for every row, whereas converting a constant string to a date/time happens once at compile time. The former is less efficient, which is another reason to avoid CONVERT(). However, in many queries, other processing (such as joins) is far more time-consuming than conversions, so this may not be important.
As for the choice between BETWEEN and the comparison operators, the better practice is to use "<", ">", ">=" and "<=". It makes the logic clearer for dates and avoids issues such as datetime values only being accurate to about 3 ms.
As far as I know, between on dates works as efficiently using indexes as other types of fields. However, for accuracy and portability it is best to do the individual comparisons.
So, the third version would be preferred.
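The 3 ms point can be seen directly. In this sketch, an "end of day" literal with .999 milliseconds is rounded by the datetime type into the next day, which is why an inclusive BETWEEN upper bound is fragile:

```sql
SELECT
    CAST('2012-01-01T23:59:59.999' AS datetime) AS rounded_up,  -- 2012-01-02 00:00:00.000
    CAST('2012-01-01T23:59:59.997' AS datetime) AS last_tick;   -- 2012-01-01 23:59:59.997
```

A comparison of the form "< the next day's midnight" avoids having to know the last representable tick at all.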
I have a situation which I can resolve by adding a column which would be a datetime or a ntext type.
In terms of performance, which of these would be better to use? There are more than 60000 records. I was thinking datetime since I could index on it, but I'm not sure if ntext can be indexed. Any suggestions or discussion on which would handle memory and speed better, or any other performance issues?
Update: the columns I could add are independent: one is a datetime and the other is text, and I can resolve the issue with either one. NOTE: I am not trying to store a datetime as ntext here.
60000 records is nothing for SQL Server.
So there shouldn't be any noticeable difference. Maybe there would be a difference if it was a REALLY big table (hundreds of millions of records, and above...), but not with your amount of data.
However, as the others already said: your statement that you can either use datetime or ntext sounds very strange to me. If it's really date and/or time values, use datetime and not ntext!!!
EDIT:
Now that you clarified that you don't want to store date values in a text column:
I would suggest that you use the datetime column. It's better than ntext performance wise.
As a side note: if you prefer to use the text column, you should use nvarchar(max) instead of ntext. ntext is slower and deprecated.
If the data is a DATETIME, use a DATETIME.
You will have problems querying a text field when you need to do date and time operations.
Performance wise - DATETIME is 8 bytes and NVARCHAR for storing a date will be longer. Operations that require date/time work will require conversions with an NVARCHAR field, which will be more expensive than simply using a DATETIME column.
These are two completely different fields. If you need to store dates use datetime, if you need to store text use varchar or long text. Do not store dates as text!