I have a list of books I want to store in a database. One of the attributes is the date when the book was first published. For the older books (older than 100 years) I often just know the decade (like 185X), or in the case of the very old books just the century (like 15XX).
How would you save those dates in a datetime2 field? 15XX as 1500? I want to be able to query for books which are older than a hundred years, for example, so I somehow want to store those values as valid datetime2 values. Any recommendations? 15XX as '1500-01-01 00:00' seems reasonable to me. Any drawbacks to that approach?
The only drawback is when someone asks for all books published from 1550 to 1650. Your 15XX became 1500, so it won't be included in their results.
What you really have is a period of uncertainty about when a given book was published. I'd store two dates: one for when the period started and one for when it ended. Modern books will have both set to the same date, but the oldest ones can be stored as 1500-01-01 00:00 to 1599-12-31 23:59.
Of course it will complicate selects. You have to decide if it's worth it. You may declare that asking for "1550 to 1650" is plain stupid.
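For concreteness, a minimal sketch of that two-date approach (the table and column names are assumptions, not anything from the question):
CREATE TABLE Book (
    Id                 INT IDENTITY PRIMARY KEY,
    Title              NVARCHAR(200) NOT NULL,
    PublishedNotBefore DATETIME2 NOT NULL,  -- '1500-01-01 00:00' for 15XX
    PublishedNotAfter  DATETIME2 NOT NULL   -- '1599-12-31 23:59' for 15XX
);

-- "Published from 1550 to 1650" then means: the uncertainty period
-- overlaps that range.
SELECT *
FROM Book
WHERE PublishedNotBefore <= '1650-12-31'
  AND PublishedNotAfter  >= '1550-01-01';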
In extension to @dragon112's answer: is there the possibility that you would need 15XX to be BOTH of the first two options? (In the same way as NULL both is and isn't any value, at the same time.)
the oldest possible date for that book (for 15xx it would be 1500)
the youngest possible date for that book (for 15xx it would be 1599)
If so, you could store two dates and make a date range within which the book was published.
This does make your queries/system more complex. When writing the SQL, both of these are syntactically correct, but you'd need to pick which is appropriate in any given situation, as they can give different results...
WHERE
earliestPublishDate > '1550-01-01'
WHERE
latestPublishDate > '1550-01-01'
So, the most important question when determining how to store your data:
- How are you going to interrogate it?
You need to know your use-cases (or likely use cases) in order to determine your correct data representation.
In my opinion there are 3 ways of saving the date of such books:
the oldest possible date for that book (for 15xx it would be 1500)
the youngest possible date for that book (for 15xx it would be 1599)
halfway between the above (for 15xx it would be 1550)
The approach doesn't matter to the code itself, but it will influence your results when you query for a certain age, so whichever feels best to you should be fine in my opinion.
In other words: when you query for books that are 500 years old, would you want to get a book from 15xx or not? As it is the year 2012 right now (2012 - 500 = 1512), a book stored as 1550 or 1599 will not be returned by the database, while one stored as 1500 will.
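A hypothetical query to make that concrete (the PublishDate column name is assumed):
-- "Books at least 500 years old", run in 2012, so the cutoff is 1512:
SELECT *
FROM Book
WHERE PublishDate <= DATEADD(YEAR, -500, GETDATE());
-- Stored as 1500: returned.  Stored as 1550 or 1599: not returned.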
Interesting question, I would consider the following solution:
Save the values as two fields in the database.
The first is stored in the format you mentioned, '1500-01-01 00:00', for sorting purposes. The second field is used to record the original value, 15XX; its data type is alphanumeric.
With this approach you are not losing the fact that the data is uncertain, but you still meet your requirement of searching for books older than a certain date.
The datetime field is then strictly calculated from the alphanumeric field.
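A minimal sketch of that idea, assuming SQL Server and hypothetical names; the sort date is derived from the raw value by padding the unknown digits with zeros:
CREATE TABLE Book (
    Id            INT IDENTITY PRIMARY KEY,
    Title         NVARCHAR(200) NOT NULL,
    PublishedRaw  VARCHAR(10) NOT NULL,  -- the original value, e.g. '15XX' or '1853'
    -- style 112 (yyyymmdd) keeps the conversion deterministic,
    -- so the computed column can be persisted and indexed
    PublishedSort AS CONVERT(DATETIME2,
                             REPLACE(PublishedRaw, 'X', '0') + '0101', 112) PERSISTED
);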
If you have no need to store a time with the date, use the DATE data type; there is no need to go for datetime2 just to allow dates from 0001-01-01.
DATE also supports dates from 0001-01-01 through 9999-12-31; datetime2 merely has more time accuracy than datetime.
DECLARE @var VARCHAR(100)
SET @var = '15'  -- the known digits, e.g. '15' for 15XX or '185' for 185X
SET @var = CASE LEN(@var)
               WHEN 1 THEN @var + '000'
               WHEN 2 THEN @var + '00'
               WHEN 3 THEN @var + '0'
               ELSE @var
           END
SELECT CAST(@var AS DATE)  -- a bare 4-digit year converts to January 1 of that year
Long time listener, first time caller.
At work, all of the date columns in most tables are stored in simple "string" (varchar) formats, such as yyyymmdd (e.g. 20220625) or yyyymm (202206) etc.
For a lot of time-based queries we need to compare against the current date, or some fixed offset from it.
Two obvious ways I know of to get the current UTC date into either of those formats are the following (using yyyymm as an example):
SELECT LEFT(CONVERT(VARCHAR, GETUTCDATE(), 112), 6) ...
SELECT CONVERT(VARCHAR(6), GETUTCDATE(), 112) ...
I'm wondering if anyone knows of a better way, either idiomatically or performance-wise, to do those conversions, and/or whether there is anything to worry about in the second one versus the first in regards to security/reliability etc. The second one definitely satisfies my code golf sensibilities, but not if it's at the expense of something I'm unaware of.
Also, for some extra context: the majority of our code runs on SQL Server (T-SQL), BUT we also need to be as platform agnostic as possible, as there are customers on Oracle and/or MySQL.
Any insight/help would be highly appreciated.
There is no problem with either approach; both work just fine, and it is a matter of personal preference which to choose. The first looks more explicit; the second is shorter and thus perhaps easier to read. As to performance: you want to get the current day or month only once in a query, so the call doesn't really affect query runtime.
Making this platform agnostic is quite a different story. SQL dialects differ, especially when it comes to date/time handling. You have already noticed that SQL Server's date-formatting functions are quite restricted. In Oracle and MySQL you would simply state the format you want (TO_CHAR(SYSDATE, 'YYYYMM') in Oracle and DATE_FORMAT(CURRENT_DATE, '%Y%m') in MySQL), but you also see that the function calls differ.
Now, you could write a user defined function GET_CURRENT_MONTH_FORMATTED for this which would return the string for the current month, e.g. '202206'. Then you'd have the different codes hidden in that function and the SQL queries would all look the same. The problem, though, is how to tell the DBMS that the function result is deterministic for a particular timestamp? If you run the query on December 31, 2022 at 23:50 and it runs until January 1, 2023 at 0:20, you want the DBMS to call this function only once for the query resulting in '202212' and not being called again, suddenly resulting in another string '202301'. I don't even know whether this is possible. I guess it is not.
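A minimal T-SQL sketch of such a wrapper (the function name comes from the suggestion above; the body is an assumption):
CREATE FUNCTION dbo.GET_CURRENT_MONTH_FORMATTED()
RETURNS CHAR(6)
AS
BEGIN
    -- style 112 is yyyymmdd; a CHAR(6) target keeps only yyyymm
    RETURN CONVERT(CHAR(6), GETUTCDATE(), 112);
END
The Oracle and MySQL versions would need the dialect-specific bodies shown above, and, as noted, none of this addresses the mid-query determinism problem.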
I think you cannot write a query that does what you want and looks the same in all mentioned DBMS.
I have a bunch of data in my database and I want to filter out data that has been stored for longer than a week. I'm using SQL Server and I found that I could use the DATEDIFF function.
At the moment it works great and fast, but I don't have a lot of records yet, so everything runs quite smoothly.
After some research online I found that comparing integers in databases is faster than comparing strings; I assume at this point that comparing datetimes (using the given function) is even slower at a major scale.
My table has a Date column and a calculation Status column, among others. Currently I would filter out records that are older than a week like so:
SELECT * FROM onlineMainTable WHERE DATEDIFF(wk, Date, GETDATE()) > 1
I assume that this query would be quite slow if there were a thousand rows in the table.
The Status column represents a calculation status. I wondered if I could speed up the process by looking for a specific status instead of matching datetimes. To set that status to the one that represents 'old records', I need to update those rows before I select them; it would look something like this:
UPDATE table SET Status = -1 WHERE NOT Status = -1 AND DATEDIFF(wk, Date, GETDATE()) > 1;
SELECT * FROM table WHERE Status = -1;
I used the '-1' as an example.
So obviously I could be wrong, but I think updating in this case would be fast enough, since there won't be that many records to update each time (older ones will already have had their status updated). The selection would be faster as well, since I would be matching integers instead of datetimes.
The downside to my (possible) solution is that I would query twice every time I fetch data, even when it might not be needed (if every row is newer than 1 week).
It comes down to this: Should I compare datetimes or should I update an integer column based on that datetime and then select using the comparison of those ints?
If there is a different/better way of doing this, I'm all ears.
Context
I am making a webapp for quotation requests. Requests should expire after a week since they won't be valid at that point. I need to display both valid requests and expired requests (so customers have an overview). All these requests are stored in a database table.
Indexes are objects designed to improve SELECT query performance; the drawback is that they slow down INSERT, DELETE, and UPDATE operations, so they should be used only where necessary. DBMSs generally provide tools to explain a query's execution plan.
Maybe you just need to add an index on Date column:
create index "index_name" on onlineMainTable(Date)
and the query could be rewritten so that the date arithmetic is applied to GETDATE() rather than to the indexed column, keeping the predicate sargable so the index can actually be used:
SELECT * FROM onlineMainTable WHERE Date > DATEADD(week,-1,GETDATE());
PostgreSQL provides the date datatype to store dates. The problem with these dates, however, is that they can't - as far as I know - express any uncertainty.
Sometimes one does not know the full date of something, but knows it happened in January 1995 or in "1999 or 2000" (date2). There can be several reasons for that:
People don't remember the exact date;
The exact date is fundamentally unknown: for instance, a person was last seen on some day and found dead a few days later; or
We deal with future events so there is still some chance something goes wrong.
I was wondering if there is a datatype to store such "dates" and how they are handled. It would result in three-valued logic for some operations: for instance, date2 < 2001/01/01 should be true, date2 < 2000/01/01 should be possible, and date2 < 1998/01/01 should be false.
If no such datatype is available, what are good practices for constructing such a "table" oneself?
There are several different ways to approach fuzzy dates. In PostgreSQL, you can use
a pair of date columns (earliest_possible_date, latest_possible_date),
a date column and a precision column ('2012-01-01', 'year'), or
a range data type (daterange), or
a varchar ('2013-01-2?', '2013-??-05'), or
another table or tables with any of those data types.
The range data type is peculiar to recent versions of PostgreSQL. You can use the others in any SQL dbms.
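For the range option, a minimal sketch (the table, values, and cutoff date are hypothetical):
CREATE TABLE book (
    id        serial PRIMARY KEY,
    title     text NOT NULL,
    published daterange NOT NULL  -- the period the publication date falls in
);

INSERT INTO book (title, published) VALUES
    ('Exactly dated', daterange('1995-01-15', '1995-01-15', '[]')),
    ('Sometime 15XX', daterange('1500-01-01', '1599-12-31', '[]'));

-- "Definitely published before 1550": the whole range lies strictly before it
SELECT * FROM book WHERE published << daterange('1550-01-01', NULL);

-- "Could have been published before 1550": the range overlaps the period before it
SELECT * FROM book WHERE published && daterange(NULL, '1550-01-01');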
The kind of fuzziness you need is application-dependent. How you query fuzzy dates depends on which data type or structure you pick. You need a firm grasp on what kinds of fuzziness you need to store, and on the kind of questions your users need answered. And you need to test to make sure your database can answer their questions.
For example, in legal systems dates might be remembered poorly or defaced. Someone might say "It was some Thursday in January 2014. I know it was a Thursday, because it was trash pick-up day", or "It was the first week in either June or July last year". To record that kind of fuzziness, you need another table.
Or a postmark might be marred so that you can read only "14, 2014". You know it was postmarked on the 14th, but you don't know which month. Again, you need another table.
Some (all?) of these won't give you three-valued logic unless you jump through some hoops. ("Possible" isn't a valid Boolean value.)
To add to what Mike posted, I would use date comments such as:
date       Comment
-------------------------------------------------------------------
1/1/2010   Sometime in 2010
7/8/2014   Customer says they will pay the second week in July
1/1/2015   Package will arrive sometime next year in January
Also, you can use date parts: create a separate column each for the year, month, and day. Whatever is unknown, leave blank (NULL).
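A hypothetical sketch of that date-parts layout:
CREATE TABLE book_date_parts (
    book_id   int PRIMARY KEY,
    pub_year  int NULL,  -- NULL when even the year is unknown
    pub_month int NULL,  -- NULL when only the year is known
    pub_day   int NULL   -- NULL when only year and month are known
);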
I'm trying to filter out and report records in a database that fall between a specified date range. I know there are other threads here on how to do something similar, but my dates are stored as date timestamps (which is why I think the issue is arising).
My current query is as follows:
"SELECT * FROM JOURNAL WHERE Date_Time>'10/10/2013 00:00:00'"
(Note that journal is the name of the table I'm pulling the data from and date_time is the field in which the date is stored. I'm aware the query doesn't quite do what I want it to yet, but I was just testing out a simpler case at first.)
When I run this query (as part of an Excel macro), Excel reports that it can't find any records, even though I know there are records past this date. Does anyone know how to do this properly?
Edit: I've got it, it was an issue unrelated to the query (something else in the macro) Thanks so much for the help (changing the date format worked)
Have you tried another date format? Like this:
"SELECT * FROM JOURNAL WHERE Date_Time>'2013-10-10 00:00:00'"
A simple between statement is what you need:
SELECT * FROM JOURNAL WHERE Date_Time between '10/10/2013 00:00:00' and '[otherdate]'
You need to run this to check for one important thing: whether the server treats BETWEEN as inclusive or not. If it's inclusive, both endpoint dates are included; if not, rows that fall exactly on one or both endpoints will be left out.
I've seen SQL servers that are the same in every respect actually treat this condition differently. So it's a good idea to check that.
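One quick way to check, assuming a JOURNAL row exists exactly at a boundary timestamp:
-- if this returns the boundary row, BETWEEN is inclusive on your server
SELECT * FROM JOURNAL
WHERE Date_Time BETWEEN '2013-10-10 00:00:00' AND '2013-10-10 00:00:00'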
We have date columns in our database that are just a day - like birth date. However, SQL Server stores them as a date & time and the time in the records has various values (no idea how it ended up that way).
The problem is people will run a query for all birthdates <= {some date} and the ones that are equal aren't returned because a DateTime (using ADO.NET) set to a given date has a time of midnight.
I understand what's going on. The question is how best to handle this. We could force in a time of 23:59:59.9999999 on the date, but that feels like it would have problems.
What's the standard best practice for handling this?
Simply add 1 day to {some_date} and use a less than comparison. Just make sure it's the next day at 12am...
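For example (the birthdate column name is assumed):
-- instead of:  birthdate <= '2013-10-10'
WHERE birthdate < DATEADD(DAY, 1, '2013-10-10')  -- i.e. anything before 2013-10-11 00:00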
If you need to query this frequently, I would probably add a computed, persisted column that casts your DATETIME to just a DATE (assuming you're on SQL Server 2008 or newer):
ALTER TABLE dbo.YourTableName
ADD JustDay AS CAST(YourDateTimeColumn AS DATE) PERSISTED
That way, you can now query on JustDay and it's just a DATE - no time portion involved. Since it's computed, there's no need to update it constantly; SQL Server will do this automagically for you. And since it's persisted, it's part of the table's on-disk structure and just as fast as a query on any other column - and it can even be indexed, if needed.
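Querying then no longer involves the time portion at all; a hypothetical example:
SELECT * FROM dbo.YourTableName
WHERE JustDay <= '2013-10-10'  -- a row stamped 2013-10-10 23:45 is now included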
It's a classic space-vs-speed tradeoff: since you're now also storing the date-only portion of all your birthdays, your on-disk structure will be larger; on the other hand, since you have a nice date-only column that can be indexed, you have a great way to speed up searches.
You say: "The problem is people will run a query for all birthdates <= {some date}".
You could leave it as is and make sure people get rid of the time by using something like the following in their WHERE clauses:
CONVERT(DATETIME,CONVERT(CHAR(8),birthdates,112))<= {some date}
..or in later versions of SQL-Server:
CONVERT(DATE,birthdates)<= {some date}
But this is a workaround and best to take the other advice and get rid of the time in the actual target data.
One more option is:
DATEDIFF(d, birthdates, {some date}) >= 0
(DATEDIFF is positive when birthdates is the earlier date, so this matches birthdates <= {some date}. Like the CONVERT workarounds above, though, wrapping the column in a function will prevent index use.)