How to represent dates with uncertainty in PostgreSQL


PostgreSQL provides the date data type to store dates. The problem with these dates, however, is that they can't (as far as I know) represent uncertainty.
Sometimes one does not know the full date of something, but knows it happened in January 1995 or in "1999 or 2000" (date2). There can be several reasons for that:
People don't remember the exact date;
The exact date is fundamentally unknown: for instance, a person was last seen on some day and found dead a few days later; or
We deal with future events so there is still some chance something goes wrong.
I was wondering if there is a data type to store such "dates" and how they are handled. It would result in three-valued logic for some operations: for instance, date2 < 2001/01/01 should be true, date2 < 2000/01/01 should be possible, and date2 < 1998/01/01 should be false.
If no such data type is available, what are good practices for constructing such a "table" oneself?

There are several different ways to approach fuzzy dates. In PostgreSQL, you can use
a pair of date columns (earliest_possible_date, latest_possible_date),
a date column and a precision column ('2012-01-01', 'year'), or
a range data type (daterange), or
a varchar ('2013-01-2?', '2013-??-05'), or
another table or tables with any of those data types.
The range data type is specific to PostgreSQL (9.2 and later). You can use the others in any SQL DBMS.
The kind of fuzziness you need is application-dependent. How you query fuzzy dates depends on which data type or structure you pick. You need a firm grasp on what kinds of fuzziness you need to store, and on the kind of questions your users need answered. And you need to test to make sure your database can answer their questions.
For example, in legal systems dates might be remembered poorly or defaced. Someone might say "It was some Thursday in January 2014. I know it was a Thursday, because it was trash pick-up day", or "It was the first week in either June or July last year". To record that kind of fuzziness, you need another table.
Or a postmark might be marred so that you can read only "14, 2014". You know it was postmarked on the 14th, but you don't know which month. Again, you need another table.
Some (all?) of these won't give you three-valued logic unless you jump through some hoops. ("Possible" isn't a valid Boolean value.)
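As a minimal sketch of those hoops, assuming the fuzzy date is stored as an (earliest, latest) pair, the question's three-valued `<` can be computed by checking both bounds. Here `None` stands in for "possible", much as SQL uses NULL for unknown. The names `FuzzyDate` and `is_before` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FuzzyDate:
    """A date known only to lie between earliest and latest (inclusive)."""
    earliest: date
    latest: date


def is_before(fuzzy: FuzzyDate, other: date) -> Optional[bool]:
    """Three-valued 'fuzzy < other': True, False, or None for 'possible'."""
    if fuzzy.latest < other:
        return True   # every candidate date is before `other`
    if fuzzy.earliest >= other:
        return False  # no candidate date is before `other`
    return None       # some candidates are before `other`, some are not


# date2 from the question: "1999 or 2000"
date2 = FuzzyDate(date(1999, 1, 1), date(2000, 12, 31))
print(is_before(date2, date(2001, 1, 1)))  # True
print(is_before(date2, date(2000, 1, 1)))  # None (possible)
print(is_before(date2, date(1998, 1, 1)))  # False
```

The same two bound checks translate directly into SQL against a pair of date columns.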

To add to what Mike posted, I would use date comments such as:
date        Comment
----------  ---------------------------------------------------
1/1/2010    Sometime in 2010
7/8/2014    Customer says they will pay the second week in July
1/1/2015    Package will arrive sometime next year in January
You can also use date parts: create separate columns for the year, month, and day, and leave whatever is unknown blank.
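A sketch of how such year/month/day columns could be turned back into an earliest/latest pair for querying, assuming an unknown part is NULL (None) and that a missing month implies a missing day. A known day with an unknown month, like the postmark example above, doesn't fit a single range and is rejected here. The function name is hypothetical.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def bounds_from_parts(year: int, month: Optional[int] = None,
                      day: Optional[int] = None) -> Tuple[date, date]:
    """Earliest and latest dates consistent with the known parts."""
    if month is None:
        if day is not None:
            # e.g. "the 14th of some month": not one contiguous range
            raise ValueError("a day without a month is not a single range")
        # Only the year is known: the whole year is possible.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Year and month known: any day of that month is possible.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, day)
    return exact, exact
```

For example, `bounds_from_parts(1995, 1)` covers all of January 1995.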

Related

SQL Server : best practice query for date manipulation

Long time listener, first time caller.
At work, the date columns in most of our tables are stored as simple strings (varchar) in formats such as yyyymmdd (e.g. 20220625) or yyyymm (202206).
Now for a lot of queries that are time based we need to compare to current date, or some fixed offset from current date.
The two obvious ways I know of to get the current UTC date into either of those formats are the following (using yyyymm as an example):
SELECT LEFT(CONVERT(VARCHAR, GETUTCDATE(), 112), 6) ...
SELECT CONVERT(VARCHAR(6), GETUTCDATE(), 112) ...
I'm wondering whether anyone knows a better way to do this conversion, either idiomatically or performance-wise, and whether there is anything wrong with the second one compared with the first regarding security, reliability, etc. The second one definitely satisfies my code-golf sensibilities, but not if it comes at the expense of something I'm unaware of.
For some extra context, the majority of our code runs on SQL Server (T-SQL), BUT we also need to be as platform-agnostic as possible, as there are customers on Oracle and/or MySQL.
Any insight/help would be highly appreciated.

There is no problem with either approach; both work just fine, and which to choose is a matter of personal preference. The first looks more explicit; the second is shorter and thus maybe easier to read. As to performance: you only need to get the current day or month once per query, so the call doesn't really affect query runtime.
Making this platform-agnostic is quite a different story. SQL dialects differ, especially when it comes to date/time handling. You have already noticed that SQL Server's date formatting functions are quite restricted. In Oracle and MySQL you simply state the format you want (TO_CHAR(SYSDATE, 'YYYYMM') in Oracle and DATE_FORMAT(CURRENT_DATE, '%Y%m') in MySQL). But you also see that the function calls differ.
Now, you could write a user-defined function GET_CURRENT_MONTH_FORMATTED that returns the string for the current month, e.g. '202206'. Then the dialect-specific code would be hidden in that function and the SQL queries would all look the same. The problem, though, is how to tell the DBMS that the function result is fixed for the duration of a query. If you run the query on December 31, 2022 at 23:50 and it runs until January 1, 2023 at 0:20, you want the DBMS to call this function only once, resulting in '202212', rather than calling it again later and suddenly getting '202301'. I don't even know whether this is possible. I guess it is not.
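For reference, the value those SQL Server, Oracle and MySQL expressions produce, plus the "fixed offset from current date" from the question, can be sketched in Python. The function name `yyyymm` and its `months_back` parameter are illustrative only. Taking the timestamp as an argument fixes the result for one timestamp, which is the single-evaluation behaviour discussed above.

```python
from datetime import datetime, timezone
from typing import Optional


def yyyymm(now: Optional[datetime] = None, months_back: int = 0) -> str:
    """Format a timestamp as 'yyyymm', optionally shifted back by whole months."""
    if now is None:
        now = datetime.now(timezone.utc)  # current UTC time, captured once per call
    # Count months on one axis so the shift wraps across year boundaries.
    total = now.year * 12 + (now.month - 1) - months_back
    year, month_index = divmod(total, 12)
    return f"{year:04d}{month_index + 1:02d}"


print(yyyymm(datetime(2022, 12, 31, 23, 50)))            # 202212
print(yyyymm(datetime(2022, 6, 25), months_back=6))      # 202112
```

Because the strings are fixed-width and zero-padded, they compare in the same order as the dates they encode, which is what makes comparisons against the stored varchar columns work.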
I think you cannot write a query that does what you want and looks the same in all mentioned DBMS.

Microsoft Access 2010 Date Conversion

I don't have much experience so I apologize in advance for a potentially dumb question. I did not create these tables nor the queries that have been used in the past. With that said --
For the past several months I have been using a date conversion query that was given to me to update columns from an integer to a date. It used to work just fine and I swear everything is the same for my latest data extractions, but at some point the dates started getting wonky. For example, a typical date column might look like:
58222
58158
59076
58103
And the conversion query looks something like this:
IIf([D_posting]<>0,[D_posting]-18261,0)
And returns the above dates as:
05/27/2059
03/24/2059
09/27/2061
01/28/2059
Which obviously is wrong. The situation kind of reminds me of how I remember we generated random numbers in C++ (which was a long time ago), but for the life of me I can't figure out how to reverse engineer the correct subtraction factor without a reference point.
I also tried using the CDate() function instead, and it also resulted in a bunch of future dates, leading me to wonder if something else is wrong. I work for a small physicians' group, so it might be something in the Electronic Health Records software, but I'd like suggestions on what I should check to make sure it's nothing that I've done.

You could create a query that uses the 'cdate' function (see below) to return the date. You can modify the code so that it subtracts the offset (maybe 18261?)
In the immediate window of VBA you can tinker with the following:
The 'cdate' will take a number and convert it to a date:
?cdate(41925)
10/13/2014
The 'cdbl' will take a date and convert to a number.
?CDbl(Date())
41926
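To experiment with candidate offsets outside Access, the immediate-window behaviour above can be mirrored in Python, assuming the standard Access/VBA convention that serial 0 is 30 December 1899. The `offset` parameter stands for the subtraction factor (such as the 18261 in the query); the function names are hypothetical.

```python
from datetime import date, timedelta

# Access/VBA date serials count whole days from 30 December 1899 (serial 0).
ACCESS_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int, offset: int = 0) -> date:
    """Like CDate(serial - offset) for whole-day serials."""
    return ACCESS_EPOCH + timedelta(days=serial - offset)


def date_to_serial(d: date) -> int:
    """Like CDbl(d) for a date without a time part."""
    return (d - ACCESS_EPOCH).days


print(serial_to_date(41925))              # 2014-10-13
print(date_to_serial(date(2014, 10, 14))) # 41926
```

Comparing `serial_to_date(raw, offset)` against a record whose real date is known from another source is one way to check whether a candidate offset is right.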

Constructing an SQL Query to get records between two dates

I'm trying to filter and report records in a database that fall within a specified date range. I know there are other threads here on how to do something similar, but my dates are stored as date timestamps (which is why I think the issue is arising).
My current query is as follows:
"SELECT * FROM JOURNAL WHERE Date_Time>'10/10/2013 00:00:00'"
(Note that journal is the name of the table I'm pulling the data from and date_time is the field in which the date is stored. I'm aware the query doesn't quite do what I want it to yet, but I was just testing out a simpler case at first.)
When I run this query (as part of an Excel macro), Excel reports that it can't find any records, even though I know there are records after this date. Does anyone know how to do this properly?
Edit: I've got it, it was an issue unrelated to the query (something else in the macro) Thanks so much for the help (changing the date format worked)

Have you tried another date format? Like this:
"SELECT * FROM JOURNAL WHERE Date_Time>'2013-10-10 00:00:00'"

A simple between statement is what you need:
SELECT * FROM JOURNAL WHERE Date_Time between '10/10/2013 00:00:00' and '[otherdate]'
Run this and check one important thing: which rows come back at the ends of the range. In standard SQL, BETWEEN is inclusive, so rows equal to either date are included. However, with a timestamp column, an upper bound given as a date without a time means midnight at the start of that day, so rows later on that day are not returned.
I've seen SQL servers that are the same in every respect appear to treat this condition differently, so it's a good idea to check.
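A small sketch using Python's built-in sqlite3 module (not the asker's database) shows what to look for at the ends of the range. SQLite stores these timestamps as ISO-8601 text, which compares in date order. The table and sample rows are made up for illustration. BETWEEN keeps rows equal to either bound, but an upper bound written as midnight drops later times on that day; a half-open range avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (date_time TEXT)")  # ISO-8601 text timestamps
conn.executemany(
    "INSERT INTO journal VALUES (?)",
    [("2013-10-10 00:00:00",), ("2013-10-11 00:00:00",), ("2013-10-11 09:30:00",)],
)

# BETWEEN includes both bounds, but '2013-10-11 00:00:00' is only midnight,
# so the 09:30 row on the last day is left out.
between_rows = conn.execute(
    "SELECT date_time FROM journal "
    "WHERE date_time BETWEEN '2013-10-10 00:00:00' AND '2013-10-11 00:00:00' "
    "ORDER BY date_time"
).fetchall()

# Half-open range: start inclusive, start of the following day exclusive.
half_open_rows = conn.execute(
    "SELECT date_time FROM journal "
    "WHERE date_time >= '2013-10-10 00:00:00' AND date_time < '2013-10-12 00:00:00' "
    "ORDER BY date_time"
).fetchall()

print(between_rows)    # two rows: both midnight timestamps
print(half_open_rows)  # all three rows
```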

What's the best way to handle a SQL query on a Date (no time)?

We have date columns in our database that are just a day - like birth date. However, SQL Server stores them as a date & time and the time in the records has various values (no idea how it ended up that way).
The problem is people will run a query for all birthdates <= {some date} and the ones that are equal aren't returned because a DateTime (using ADO.NET) set to a given date has a time of midnight.
I understand what's going on. The question is how best to handle this. We could force a time of 23:59:59.999999999 onto the date, but that feels like it would cause problems.
What's the standard best practice for handling this?

Simply add 1 day to {some_date} and use a less-than comparison. Just make sure the bound is midnight at the start of the next day.
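The difference between comparing against midnight and using the next day as an exclusive bound can be sketched in plain Python. The sample birthdates and the `on_or_before` name are hypothetical; the midnight cutoff mimics a DateTime parameter set to a date, as described in the question.

```python
from datetime import date, datetime, time, timedelta
from typing import Iterable, List


def on_or_before(values: Iterable[datetime], some_date: date) -> List[datetime]:
    """Values whose calendar day is on or before some_date, whatever the time."""
    # Midnight at the start of the next day, used as an exclusive upper bound.
    cutoff = datetime.combine(some_date + timedelta(days=1), time.min)
    return [v for v in values if v < cutoff]


birthdates = [datetime(1990, 5, 1, 0, 0), datetime(1990, 5, 1, 14, 30),
              datetime(1990, 5, 2, 8, 0)]

naive_cutoff = datetime.combine(date(1990, 5, 1), time.min)  # midnight of the date
naive = [b for b in birthdates if b <= naive_cutoff]  # misses the 14:30 row
fixed = on_or_before(birthdates, date(1990, 5, 1))    # keeps both rows on May 1
```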

If you need to query this frequently, I would probably add a computed, persisted column that casts your DATETIME to just a DATE (assuming you're on SQL Server 2008 or newer):
ALTER TABLE dbo.YourTableName
ADD JustDay AS CAST(YourDateTimeColumn AS DATE) PERSISTED
That way, you can now query on JustDay and it's just a DATE - no time portion involved. Since it's computed, there's no need to update it constantly; SQL Server will do this automagically for you. And since it's persisted, it's part of the table's on-disk structure and just as fast as a query on any other column - and it can even be indexed, if needed.
It's a classic space-vs-speed tradeoff: since you're now also storing the date-only portion of all your birthdays, your on-disk structure will be larger; on the other hand, since you have a nice date-only column that can be indexed, you have a great way to speed up searches.

You say:
The problem is people will run a query for all birthdates <= {some date}
You could leave it as is and make sure people get rid of the time by using something like the following in their WHERE clauses:
CONVERT(DATETIME,CONVERT(CHAR(8),birthdates,112))<= {some date}
..or in later versions of SQL-Server:
CONVERT(DATE,birthdates)<= {some date}
But this is a workaround; it's best to take the other advice and get rid of the time in the actual target data.

One more option is:
DATEDIFF(d, birthdates, {some date}) >= 0
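SQL Server's DATEDIFF(d, start, end) counts the day boundaries crossed going from start to end, so the time of day doesn't matter. A rough Python stand-in (the name `datediff_days` is hypothetical) makes the sign easy to check: with birthdates as the start and {some date} as the end, a result of 0 or more means the birthdate falls on or before {some date}.

```python
from datetime import datetime


def datediff_days(start: datetime, end: datetime) -> int:
    """Approximates DATEDIFF(d, start, end): calendar days from start to end."""
    return (end.date() - start.date()).days


some_date = datetime(1990, 5, 1)
print(datediff_days(datetime(1990, 5, 1, 14, 30), some_date))  # 0  (same day)
print(datediff_days(datetime(1990, 4, 30, 23, 0), some_date))  # 1  (day before)
print(datediff_days(datetime(1990, 5, 2, 8, 0), some_date))    # -1 (day after)
```

Because the column is wrapped in a function, this form generally can't use an index on birthdates, whereas the add-one-day comparison can.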

sql store date when just the decade / century is known

I have a list of books I want to store in a database. One of the attributes is the date when the book was first published. For the older books (older than 100 years) I often just know the decade (like 185X), or in the case of the very old books just the century (like 15XX).
How would you save those dates in a datetime2 field? 15XX as 1500? I want to be able to query for books which are older than a hundred years, for example, so I somehow want to store those values as valid datetime2 values. Any recommendations? 15XX as '1500-01-01 00:00' seems reasonable to me. Any drawbacks to that approach?

The only drawback is when someone asks for all books published from 1550 to 1650. Your 15XX became 1500, so it won't be included in his results.
What you really have is a period of uncertainty about when a given book was published. I'd store 2 dates: one when the period started, and the other when it ended. Modern books will have both set to the same date, but the oldest ones can be stored as 1500-01-01 00:00 - 1599-12-31 23:59.
Of course it will complicate selects. You have to decide if it's worth it. You may declare that asking for "1550 to 1650" is plain stupid.
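Assuming the unknown digits are always written as trailing X characters (as in the question's 15XX and 185X), the two stored bounds fall out of replacing X with 0 and with 9, and an overlap test then answers the "1550 to 1650" query. The helper names are hypothetical, and the sketch works with years only.

```python
from typing import Tuple


def year_bounds(pattern: str) -> Tuple[int, int]:
    """'15XX' -> (1500, 1599), '185X' -> (1850, 1859), '1923' -> (1923, 1923)."""
    earliest = int(pattern.upper().replace("X", "0"))
    latest = int(pattern.upper().replace("X", "9"))
    return earliest, latest


def may_overlap(pattern: str, start_year: int, end_year: int) -> bool:
    """True if the publication period could fall within [start_year, end_year]."""
    earliest, latest = year_bounds(pattern)
    return earliest <= end_year and latest >= start_year


print(may_overlap("15XX", 1550, 1650))  # True: 1550-1599 lies inside the range
print(may_overlap("14XX", 1550, 1650))  # False
```

Storing only the earliest year (1500) would miss the 15XX book for this query, which is the drawback described above.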

Extending @dragon112's answer: is there the possibility that you would need 15XX as BOTH of the first two options? (In the same way as NULL is and isn't any value at the same time.)
the oldest possible date for that book (for 15xx it would be 1500)
the youngest possible date for that book (for 15xx it would be 1599)
If so, you could store two dates and make a date range within which the book was published.
This does make your queries/system more complex. When writing the SQL, both of these are syntactically correct, but you'd need to pick whichever is appropriate in any given situation, as they can give different results:
WHERE earliestPublishDate > '1550-01-01'
WHERE latestPublishDate > '1550-01-01'
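A quick sketch with Python's built-in sqlite3 module shows the two WHERE clauses returning different results on the same data. The table and sample rows are made up; dates are ISO text, which SQLite compares in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (title TEXT, earliestPublishDate TEXT, latestPublishDate TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Century-only book", "1500-01-01", "1599-12-31"),   # only 15XX is known
        ("Exactly dated book", "1602-03-15", "1602-03-15"),  # both columns equal
    ],
)


def titles(where: str) -> list:
    # `where` is a fixed string in this demo, never user input.
    sql = f"SELECT title FROM books WHERE {where} ORDER BY title"
    return [row[0] for row in conn.execute(sql)]


# Definitely published after the cutoff: only the exactly dated book.
definitely_after = titles("earliestPublishDate > '1550-01-01'")
# Possibly published after the cutoff: the century-only book qualifies too.
possibly_after = titles("latestPublishDate > '1550-01-01'")
```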
So, the most important question when determining how to store your data:
- How are you going to interrogate it?
You need to know your use-cases (or likely use cases) in order to determine your correct data representation.

In my opinion there are 3 ways of saving the date of such books:
the oldest possible date for that book (for 15xx it would be 1500)
the youngest possible date for that book (for 15xx it would be 1599)
halfway between the above (for 15xx it would be 1550)
These approaches make no difference to the code itself, but they will influence your results when you query for a certain age. So whatever feels best for you should be fine in my opinion.
In other words, when you query for books that are 500 years old, do you want to get a book from 15xx or not? As it is the year 2012 right now, the cutoff is 2012 - 500 = 1512: the book is returned if you stored it as 1500, but not if you stored it as 1550 or 1599.
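The effect of the three choices on the "500 years old" query can be checked with a few lines of Python. `returned_for_age` is a hypothetical helper that treats "at least 500 years old" as published in or before 2012 - 500 = 1512.

```python
def returned_for_age(stored_year: int, min_age: int, current_year: int) -> bool:
    """Would 'published at least min_age years ago' match this stored year?"""
    return stored_year <= current_year - min_age


# The three options for a 15XX book, queried in 2012 for books at least 500 years old.
options = {"oldest": 1500, "youngest": 1599, "halfway": 1550}
matches = {name: returned_for_age(year, 500, 2012) for name, year in options.items()}
print(matches)  # {'oldest': True, 'youngest': False, 'halfway': False}
```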

Interesting question, I would consider the following solution:
Save the values as two fields on the database.
The first stores the value in the format you mentioned, '1500-01-01 00:00', for sorting purposes. The second records the original value 15XX; its data type is alphanumeric.
With this approach you don't lose the fact that part of the date is unknown, but you still meet your requirement of searching for books older than a certain date.
The datetime field is then strictly calculated from the alphanumeric field.

If you have no need to store a time with the date, use the date data type; there's no need to go for datetime2 just to allow dates from 0001-01-01.
date also supports dates from 0001-01-01 through 9999-12-31. datetime2 only adds more time precision than datetime.

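-- Pad the known leading digits of a year ('15' for 15XX, '185' for 185X) to four digits
DECLARE @var VARCHAR(100)
SET @var = '15'
SET @var = CASE LEN(@var)
WHEN 1 THEN @var + '000'
WHEN 2 THEN @var + '00'
WHEN 3 THEN @var + '0'
ELSE @var
END
-- A four-digit string such as '1500' is read as January 1 of that year
SELECT CAST(@var AS DATE)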
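The two-field idea and the padding step above can be sketched together in Python, assuming unknown digits are always trailing X characters. `to_two_fields` is a hypothetical name. The original string is kept untouched, and the derived date is used only for sorting and range filters.

```python
from datetime import date
from typing import Tuple


def to_two_fields(original: str) -> Tuple[str, date]:
    """Keep the original text (e.g. '15XX') and derive a sortable date from it."""
    digits = original.upper().rstrip("X")  # known leading digits, e.g. '15'
    year = int(digits.ljust(4, "0"))       # pad like the CASE above: '15' -> 1500
    return original, date(year, 1, 1)


books = ["185X", "15XX", "1923"]
ordered = sorted((to_two_fields(b) for b in books), key=lambda pair: pair[1])
print(ordered)  # 15XX first, then 185X, then 1923
```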
