I have a table which needs to store the quarter and year, and I need to know which is the best way to do this. I found this answer from 10 years ago on SO: Best way to store quarter and year in SQL Server. However, there are two suggestions given--one is storing quarter and year in separate columns and making them integers, the other being storing as a datetime and using the first day of the month for the day (i.e., 1/1/2021, 4/1/2021, etc.).
Considering this answer is 10 years old and there could be better ways now for storing this data, what is the best method?
FYI, this data will not be used for calculation purposes, but will probably be searched on.
Thanks!
I recommend always storing date related data as the datetime data type.
Storing them separately is the worst possible approach, because searching becomes very difficult. Try writing a query that returns all quarters between 3Q2019 and 1Q2021 when year and quarter are in separate columns.
Breaking it into separate parts puts the responsibility on the developer to handle the year boundary appropriately, which many do not.
The DateTime data type also includes validation (Q5 2020 would throw an error), which prevents data errors.
Use the right tool for the job. DateTime data should always be stored in a DateTime datatype.
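To illustrate the point about range searches, here is a minimal Python sketch that represents each quarter by its first day and lists the quarters between 3Q2019 and 1Q2021. The helper names quarter_start and quarters_between are made up for this example; in the database the same idea is just a BETWEEN on the date column.

```python
from datetime import date

def quarter_start(year, quarter):
    """Return the first day of a quarter; quarters outside 1..4 are rejected."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"invalid quarter: {quarter}")
    return date(year, 3 * (quarter - 1) + 1, 1)

def quarters_between(start, end):
    """List quarter-start dates from start to end, inclusive."""
    result = []
    current = start
    while current <= end:
        result.append(current)
        # Advance three months, rolling into the next year when needed.
        month = current.month + 3
        year = current.year + (1 if month > 12 else 0)
        current = date(year, (month - 1) % 12 + 1, 1)
    return result

span = quarters_between(quarter_start(2019, 3), quarter_start(2021, 1))
print([d.isoformat() for d in span])
```

Because each quarter is a single comparable value, the year boundary needs no special handling in the comparison itself.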
I'm making an app that uses both birthday and age to make some deductions,
but since the age can be obtained from the birthday and the current date, I'm wondering whether I should store both or just the date. On the one hand, I could use an age attribute to simplify some queries without converting dates. What would be the right thing to do, following conventions?
Calculations based on data should be calculated, not stored. Well, not always, but usually; it depends on the situation. Below are a couple of pros and cons:
Cons
The calculation logic might change, so stored values will no longer be valid.
Invalid data could be entered (and you will receive invalid data when querying).
The result changes with time, as age does; e.g., today you are 20, but in one year you will be 21.
Pros
However, as @RonenAriely mentioned, storing calculated data in order to gain performance is one of the pros of this approach.
So, to sum up, you should calculate the age at query time, because the result changes over time and the function doesn't affect performance much. Note that in MySQL, DATEDIFF(NOW(), DateOfBirth) returns a number of days; something like TIMESTAMPDIFF(YEAR, DateOfBirth, CURDATE()) returns the age in whole years.
I would say store just the DOB and calculate the age when using.
I mainly prefer this because age changes continuously, and you have to make sure to update it depending on how accurately you are measuring it. That defeats the purpose of computing once and using many times, because you'll be recomputing often. And since it is redundant, it also occupies space in your tables unnecessarily.
Hope it helped
Generally only birth date is stored.
You can create a common helper method to calculate age. Preferably static to avoid additional memory consumption.
Also saving age in database makes less sense as in such a case you would be required to run a daily cron to see which user's age is increasing by 1 that day and then update in the database.
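A helper like the one described could look roughly like this in Python. The name age_on is hypothetical, and the sketch assumes that someone born on 29 February turns a year older on 1 March in non-leap years (other conventions exist).

```python
from datetime import date

def age_on(date_of_birth, today):
    """Age in whole years on the given day, derived from the birth date alone."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2000, 6, 15), date(2021, 6, 14)))  # 20
print(age_on(date(2000, 6, 15), date(2021, 6, 15)))  # 21
```

In real use you would pass date.today() as the second argument; taking it as a parameter keeps the function easy to test.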
As said here,
you have to ensure that it is not possible for the derived value to
become out-of-date undetected.
Birthday never goes out-of-date so you would be OK!
Better to follow the normalised approach and only store date of birth. Age would be marginally quicker to retrieve but, for that to be correct, you'd have to refresh the table on a daily basis.
If you were running a DB search on age range, then you could convert min and max ages to an upper and lower date of birth based on the current date and then search accordingly.
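The age-range idea can be sketched as follows. The function names are made up for illustration, and the sketch assumes the usual "whole years completed" definition of age, with 29 February falling back to 28 February in non-leap years.

```python
from datetime import date, timedelta

def years_before(day, years):
    """Same month and day, `years` earlier; 29 Feb falls back to 28 Feb."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def dob_bounds(min_age, max_age, today):
    """Return (earliest, latest) birth dates for ages min_age..max_age inclusive."""
    # Latest birth date: the person turns min_age exactly today.
    latest = years_before(today, min_age)
    # Earliest birth date: one day after the date that would make them max_age + 1.
    earliest = years_before(today, max_age + 1) + timedelta(days=1)
    return earliest, latest

earliest, latest = dob_bounds(20, 29, date(2021, 6, 15))
print(earliest, latest)  # 1991-06-16 2001-06-15
```

The two bounds can then be used in a plain range condition on the date-of-birth column, which lets the database use an index on that column.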
I have a database with integer fields (columns) named fSystemDate, fOpenned, fStatusDate, etc... I think they represent dates, but I don't know their format. The values in those fields look like this: 76505, 76530, 76554, 76563.
I do not have examples with the real date associated with them.
Solved. See answers.
I found that this format comes from a programming language called Clarion, whose date numbering starts at 28 December 1800.
I can convert Clarion dates to SQL dates in two ways:
SELECT DATEADD(day, 76505, '18001228')
where the result would be 2010-06-15 00:00:00.
SELECT CONVERT(DateTime,76505 - 36163)
where the result is the same. The number 36163 is an offset that adjusts for SQL Server's epoch: it is the number of days between 1-Jan-1900 (where MSSQL datetime numbering starts) and 28-Dec-1800 (where Clarion date numbering starts).
The result in my case is correct, because I asked my customer for sample data from their application and compared the results.
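The same arithmetic can be checked outside the database. Here is a small Python sketch; the constant and function names are just for illustration.

```python
from datetime import date, timedelta

# Clarion day numbering starts at 28 December 1800, as described above.
CLARION_EPOCH = date(1800, 12, 28)

def clarion_to_date(serial):
    """Convert a Clarion day number to a calendar date."""
    return CLARION_EPOCH + timedelta(days=serial)

# Days between the Clarion epoch and 1 January 1900.
offset = (date(1900, 1, 1) - CLARION_EPOCH).days

print(clarion_to_date(76505))  # 2010-06-15
print(offset)                  # 36163
```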
It's rather hard to help you given just a number. It looks like your dates are some sort of serial number, but more data points are needed, such as:
epoch. An epoch is the zero point of a calendrical system.
increment. How big is a tick in the serial number? 1 day? 1 hour, 1 minute? A week? A month?
source hardware/operating system. From what computer system did the value originate? Different systems represent dates differently, using different calendrical systems with different epochs.
source software system. What software created the value? Was it custom software? What language was it written in? When? What is the backing store for the data? Databases, filesystems, etc., might all have their own internal date representation.
the represented value. If 76563 is indeed a representation of a date, what date does it represent? Or at least, does it represent a recent date? a date in the past? a date in the future?
It's impossible to answer your question. This page might help you:
http://www.itworld.com/article/2823335/data-center/128452-Just-dating-The-stories-behind-12-computer-system-reference-dates.html
It lists some common epochs for different computer systems.
Edited to note: here's one data point for you: Adding 76,563 days to 1 Jan 1800 yields the date 16 August 2009.
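When the epoch is unknown, one way to narrow it down is to decode a sample value against a few candidate epochs and see which result looks plausible. A rough Python sketch, assuming the serial counts days; the candidate list is only an illustration, not a complete catalogue:

```python
from datetime import date, timedelta

# Hypothetical shortlist of epochs to try; extend it as needed.
CANDIDATE_EPOCHS = {
    "1800-01-01": date(1800, 1, 1),
    "Clarion (1800-12-28)": date(1800, 12, 28),
    "Unix (1970-01-01)": date(1970, 1, 1),
}

def decode_as_days(serial, epochs=CANDIDATE_EPOCHS):
    """Interpret the serial as a day count from each candidate epoch."""
    return {name: epoch + timedelta(days=serial) for name, epoch in epochs.items()}

for name, decoded in decode_as_days(76563).items():
    print(f"{name}: {decoded}")
```

A value that lands in the recent past for one epoch and centuries away for the others is a strong hint, but it still needs confirming against a known real date.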
I've worked with various ORMs and database abstractions designed to make it easy to work with multiple databases, both relational and not. The more comprehensive solutions will usually give you access to some date functions that boil down to actual SQL (or whatever, in the case of non-SQL dbs). On the other hand, many of these abstractions don't provide direct access to SQL functions and you lose the ability to deal with dates directly. Instead, you're expected to use the upper-level language (PHP, Python, whatever) to do your date-wrangling, and finally only insert, select, what-have-you the formatted date.
So my question is this: if the SQL server never gets to do anything with the date itself, am I better off just using an int and putting epoch timestamps in it, or is there additional value to the database server "knowing" it's a date?
If you are using dates, store them as dates.
Not only does this make it easier to translate between the database and the application, it also helps when you need to do anything based on the dates (and you will, otherwise why store dates at all?).
That is, when you need to sort or query using the dates, you will not need to go through special effort to convert them back to dates.
Apart from what @Oded said, even if you never use any date-related functions, there are still some issues:
An epoch timestamp in milliseconds already does not fit in an INT field (it overflows).
A timestamp in seconds will overflow a signed 32-bit INT at Tue Jan 19 2038 03:14:08 GMT, when it exceeds 2147483647.
BUT, an INT takes 4 bytes and a DATETIME takes 8 bytes, so you save 4 bytes per value if you can live with the two limitations above.
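Both limits are easy to verify. A short Python sketch, assuming a signed 32-bit INT column (the helper name fits_in_int32 is made up):

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value

def fits_in_int32(value):
    """True if the value can be stored in a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX

# The last second that still fits; one second later the column overflows.
last_second = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_second.isoformat())  # 2038-01-19T03:14:07+00:00

seconds = 1_600_000_000          # a timestamp from September 2020
print(fits_in_int32(seconds))         # True
print(fits_in_int32(seconds * 1000))  # False: milliseconds overflow already
```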
I have this huge database of records that have been created over the past 5 or so years. I'm thinking it would be cool (and edifying) to try to create some time categories/segments for these records, the unit could be week or month or something like that, something to use for a graph.
Anyway, I need to develop a query that, given a datetime attr for each record in the table, would return all the records with a datetime falling in between X and Y (June 1, 2011 & June 7, 2011, for example).
I'm not good at using the time helpers yet and could not find any sufficiently similar questions on SO or elsewhere.
Solutions that use subjective increments like "week" or "month" that Rails can understand would be strongly appreciated. I know how tricky the calendar can get in programming. Or I could just use some lowest common denominator (day) and do an extremely fine graph.
Client.where(:created_at => X..Y)
Source: Ruby on Rails Guides
We had this programming discussion on Freenode and this question came up when I was trying to use a VARCHAR(255) to store a Date Variable in this format: D/MM/YYYY. So the question is why is it so bad to use a VARCHAR to store a date. Here are the advantages:
It's faster to code. Previously I used DATE, but date formatting was a real pain.
It's more power hungry to use a string than a DATE? Who cares, we live in the GHz era.
It's not ethically correct (lolwut?). This is what the other user told me...
So what would you prefer to use to store a date? SQL VARCHAR or SQL DATE?
Why not put screws in with a hammer?
Because it isn't the right tool for the job.
Some of the disadvantages of the VARCHAR version:
You can't easily add / subtract days to the VARCHAR version.
It is harder to extract just month / year.
There is nothing stopping you putting non-date data in the VARCHAR column in the database.
The VARCHAR version is culture specific.
You can't easily sort the dates.
It is difficult to change the format if you want to later.
It is unconventional, which will make it harder for other developers to understand.
In many environments, using VARCHAR will use more storage space. This may not matter for small amounts of data, but in commercial environments with millions of rows of data this might well make a big difference.
Of course, in your hobby projects you can do what you want. In a professional environment I'd insist on using the right tool for the job.
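A couple of these disadvantages are easy to see in application code too. The sketch below (Python, with a made-up helper name) parses the D/MM/YYYY strings from the question: the string itself offers no validation or arithmetic, so every operation starts with a conversion to a real date type.

```python
from datetime import date, datetime, timedelta

def parse_day_first(text):
    """Parse a D/MM/YYYY string; raises ValueError for impossible dates."""
    return datetime.strptime(text, "%d/%m/%Y").date()

# Adding a day only works after converting the string to a date.
print(parse_day_first("28/02/2021") + timedelta(days=1))  # 2021-03-01

# A VARCHAR column would happily store this; a date type would not.
try:
    parse_day_first("31/02/2021")
except ValueError as error:
    print("rejected:", error)
```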
Once you have a database with more than 2-3 million rows, you'll know why it's better to use DATETIME than VARCHAR :)
The simple answer is that with databases, processing power isn't a problem anymore. Database size is, because of HDD seek time.
Basically, with modern hard disks you can read about 100 records per second if they're read in random order (usually the case), so you must do everything you can to minimize DB size, because:
The HDD's heads won't have to "travel" this much
You'll fit more data in RAM
In the end it's always the HDD's seek times that will kill you. E.g., a simple GROUP BY query with many rows could take a couple of hours when done on disk, compared to a couple of seconds when done in RAM, because of seek times.
With VARCHARs you can't do any meaningful date searches (ranges, for example). If you hate the way SQL deals with dates so much, just use a Unix timestamp in a 32-bit integer field. You'll have (basically) all the advantages of using a SQL DATE field; you'll just have to manipulate and format dates using your chosen programming language, not SQL functions.
Two reasons:
Sorting results by the dates
Not sensitive to date formatting changes
So let's take for instance a set of records that looks like this:
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
If we were to store the data your way and sort on the dates in ascending order, SQL would respond with a result set that looks like this:
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
Whereas if we stored the dates as DATETIME, SQL would order them correctly, like this:
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La
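The same effect can be reproduced in a few lines of Python, using the three records above: sorting the raw strings compares characters, while sorting parsed dates compares points in time. The M/D/YYYY format string is an assumption based on the sample data.

```python
from datetime import datetime

rows = [
    ("5/12/1999", "Frank N. Stein"),
    ("1/22/2005", "Drake U. La"),
    ("10/4/1962", "Goul Friend"),
]

# Sorting the raw strings compares character by character.
by_string = sorted(rows, key=lambda row: row[0])

# Sorting parsed dates compares actual points in time.
by_date = sorted(rows, key=lambda row: datetime.strptime(row[0], "%m/%d/%Y"))

print([name for _, name in by_string])  # ['Drake U. La', 'Goul Friend', 'Frank N. Stein']
print([name for _, name in by_date])    # ['Goul Friend', 'Frank N. Stein', 'Drake U. La']
```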
Additionally, if somewhere down the road you needed to display dates in a different format, for example like YYYY-MM-DD, then you would need to transform all your data or deal with mixed content. When it's stored as a SQL DATE, you are forced to make the transform in code, and very likely have one spot to change the format to display all dates--for free.
Between DATE/DATETIME and VARCHAR for dates, I would go with DATE/DATETIME every time. But there is an overlooked third option: storing it as an unsigned INTEGER!
I decided to go with an unsigned INTEGER in my last project, and I am really satisfied with that choice instead of storing it as a DATE/DATETIME. Because I was passing dates between client and server, it was the ideal type for me to use. Instead of having to store it as a DATE and convert it back every time I select, I just select it and use it however I want. If you want to select the date as a "human-readable" date, you can use the FROM_UNIXTIME() function.
Also, an integer takes up 4 bytes while DATETIME takes up 8 bytes, saving 50% of the storage.
The sorting problem that Berin proposes is also solved using integer as storage for dates.
I'd vote for using the date/datetime types, just for the sake of simplicity/consistency.
If you do store it as a character string, store it in ISO 8601 format:
http://www.iso.org/iso/date_and_time_format
http://xml.coverpages.org/ISO-FDIS-8601.pdf
http://www.cl.cam.ac.uk/~mgk25/iso-time.html
Among other things, ISO 8601 date/time strings (A) collate properly, (B) are human readable, (C) are locale-independent, and (D) are readily convertible to other formats. To crib from the ISO blurb, ISO 8601 strings offer
representations for the following:
Date
Time of the day
Coordinated universal time (UTC)
Local time with offset to UTC
Date and time
Time intervals
Recurring time intervals
Representations can be in one of two formats: a basic format
that has a minimal number of characters and an extended format
that adds characters to enhance human readability. For example,
the third of January 2003 can be represented as either 20030103
or 2003-01-03.
[and]
offer the following advantages over many of the locally used
representations:
Easily readable and writeable by systems
Easily comparable and sortable
Language independent
Larger units are written in front of smaller units
For most representations the notation is short and of constant length
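The "collate properly" and "readily convertible" claims can be checked with a short Python sketch; the sample dates are arbitrary.

```python
from datetime import date, datetime

days = [date(2005, 1, 22), date(1962, 10, 4), date(1999, 5, 12)]

# Extended format (YYYY-MM-DD): plain string sorting matches date order.
extended = [d.isoformat() for d in days]
print(sorted(extended))  # ['1962-10-04', '1999-05-12', '2005-01-22']

# Basic format (YYYYMMDD) collates the same way.
basic = [d.strftime("%Y%m%d") for d in days]
print(sorted(basic))     # ['19621004', '19990512', '20050122']

# Both forms convert back to real dates without locale settings.
print(date.fromisoformat("2003-01-03"))
print(datetime.strptime("20030103", "%Y%m%d").date())
```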
One last thing: if all you need to do is store a date, then storing it in the ISO 8601 short form YYYYMMDD in a char(8) column takes no more storage than a datetime value (and you don't need to worry about the 3 millisecond gap between the last tick of one day and the first tick of the next, but that's a matter for another discussion). If you break it up into 3 columns, YYYY char(4), MM char(2), DD char(2), you'll use the same amount of storage and get more options for indexing. Even better, store the year as a short (smallint, 2 bytes) and each of MM and DD as a tinyint (1 byte apiece): now you're down to 4 bytes for the date. The drawback, of course, of decomposing the date into its constituent parts is that conversion to proper date/time data types is complicated.