Rails - storing a date when the day is optional


I'm collecting a date to store in the database but have to account for the fact that the user may not know the exact date. I want to offer the option to only enter the month and year. What would be the best way to store these values on the database?
If I use a Date object without the day and store it as a date in the database, it defaults to the 1st of the month, which would be inaccurate. I had the idea of storing the day, month and year separately as integers and then having a method on the model that returns a Date object and whether or not the day is accurate (i.e. was entered by the user and not just defaulted by the system).
It seems a little messy, though. Is there a better way to do this?
Thanks.

There are multiple solutions available. Choose the one that best serves your use cases:
1. Individual date-part fields
You've already experimented with this idea. It looks something like this:
create table t(
year int,
month int,
day int
)
A month of 2017-03 could be represented with (2017, 3, NULL).
Note that all fields are NULL-able. You can make year NOT NULL if you want to require at least some information.
With this model, you must use client logic to construct some kind of date-like object for further use.
The big disadvantage of this model is that it is hard to index. You could index e.g. make_date(year, coalesce(month, 1), coalesce(day, 1)), but using that expression in queries is rather inconvenient. Also, to disallow value combinations that make no sense (e.g. a year and a day, but no month), you should add a (rather long) CHECK constraint too, e.g.:
CHECK (CASE
WHEN year IS NULL THEN month IS NULL AND day IS NULL
ELSE CASE WHEN month IS NULL THEN day IS NULL END
END)
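With this model the client has to assemble the date-like object itself. The following is a minimal Python sketch, assuming a row arrives as three nullable integers; the PartialDate class and its method names are hypothetical illustrations, not part of Rails or PostgreSQL. It enforces the same rule as the CHECK constraint above and reports which parts were actually supplied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """One (year, month, day) row as read from the table; any part may be None."""

    year: Optional[int]
    month: Optional[int]
    day: Optional[int]

    def __post_init__(self) -> None:
        # Same rule as the CHECK constraint: a finer part requires every coarser part.
        if self.year is None and (self.month is not None or self.day is not None):
            raise ValueError("month or day given without a year")
        if self.month is None and self.day is not None:
            raise ValueError("day given without a month")
        # Let datetime.date reject impossible parts such as month 13 or Feb 30.
        self.earliest()

    @property
    def precision(self) -> Optional[str]:
        """Finest part the user actually supplied."""
        if self.year is None:
            return None
        if self.month is None:
            return "year"
        return "month" if self.day is None else "day"

    def earliest(self) -> Optional[date]:
        """Python counterpart of make_date(year, coalesce(month, 1), coalesce(day, 1))."""
        if self.year is None:
            return None
        month = 1 if self.month is None else self.month
        day = 1 if self.day is None else self.day
        return date(self.year, month, day)


march = PartialDate(2017, 3, None)
print(march.precision, march.earliest())  # month 2017-03-01
```

Returning the precision alongside the earliest date is one way to answer the question's "was the day actually entered?" without a separate boolean column.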
2. Sample date and precision
create table t(
sample_date date,
sample_precision date_precision -- enum of e.g. 'year', 'month', 'day'
)
A month of 2017-03 could be represented with ('2017-03-28', 'month').
This doesn't require long CHECK constraints, but it is fairly hard to select by dates if sample_date is truly just a sample (e.g. when a row should represent the whole month of 2017-03, the sample date could even be 2017-03-28). If you always use the first possible date as sample_date (given sample_precision), things get slightly easier. But then the following CHECK constraint is needed for integrity:
CHECK (date_trunc(sample_precision::text, sample_date)::date = sample_date)
(More on improving this further, later.)
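To make the normalization rule concrete, here is a small Python sketch, assuming only the three precisions 'year', 'month' and 'day'. The trunc function is a hypothetical stand-in for PostgreSQL's date_trunc, and possible_range shows which half-open range of dates a normalized (sample_date, precision) row stands for.

```python
from datetime import date


def trunc(d: date, precision: str) -> date:
    """Python stand-in for date_trunc(precision, d)::date, limited to year/month/day."""
    if precision == "year":
        return d.replace(month=1, day=1)
    if precision == "month":
        return d.replace(day=1)
    if precision == "day":
        return d
    raise ValueError(f"unknown precision: {precision!r}")


def is_normalized(sample_date: date, precision: str) -> bool:
    """Same test as CHECK (date_trunc(sample_precision::text, sample_date)::date = sample_date)."""
    return trunc(sample_date, precision) == sample_date


def possible_range(sample_date: date, precision: str) -> tuple[date, date]:
    """Half-open [start, end) range of dates a (sample_date, precision) row can stand for."""
    start = trunc(sample_date, precision)
    if precision == "year":
        end = date(start.year + 1, 1, 1)
    elif precision == "month":
        # December rolls over into January of the next year.
        end = date(start.year + start.month // 12, start.month % 12 + 1, 1)
    else:
        end = date.fromordinal(start.toordinal() + 1)
    return start, end


print(is_normalized(date(2017, 3, 28), "month"))   # False: would violate the CHECK
print(possible_range(date(2017, 3, 1), "month"))   # 2017-03-01 up to (not incl.) 2017-04-01
```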
3. Possible range
You can store the range of possible dates, either as a possible_start and possible_end pair or with PostgreSQL's daterange type.
create table t(
possible_start date,
possible_end date,
-- or
possible_range daterange
)
A month of 2017-03 could be represented with ('2017-03-01', '2017-03-31').
In this model, when possible_start = possible_end the date value is exact. You can now query two different things:
Which rows happened within the given date(s) for sure (the row's range is contained by the query range, <@)
Which rows possibly happened within the given date(s) (the two ranges intersect, &&)
Both of these types of queries can use indexes with daterange.
The beauty of this is that you are not limited to month ranges: you can use ranges of literally any length. The only drawback is that the range must be contiguous.
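The difference between the two query types can be sketched in Python with half-open [lower, upper) ranges, which is what daterange uses by default. The DateRange class below is a hypothetical illustration; its two methods mirror the semantics of PostgreSQL's <@ and && operators, not their implementation.

```python
from datetime import date
from typing import NamedTuple


class DateRange(NamedTuple):
    """Half-open [lower, upper) range, like PostgreSQL's default '[)' daterange."""

    lower: date
    upper: date

    def contained_by(self, other: "DateRange") -> bool:
        # Like <@: every possible date lies inside `other`,
        # so the event happened within `other` for sure.
        return other.lower <= self.lower and self.upper <= other.upper

    def overlaps(self, other: "DateRange") -> bool:
        # Like &&: at least one possible date lies inside `other`.
        return self.lower < other.upper and other.lower < self.upper


row = DateRange(date(2017, 3, 1), date(2017, 4, 1))          # "some time in 2017-03"
whole_2017 = DateRange(date(2017, 1, 1), date(2018, 1, 1))
march_15 = DateRange(date(2017, 3, 15), date(2017, 3, 16))

print(row.contained_by(whole_2017))  # True: certainly in 2017
print(row.contained_by(march_15))    # False: not certainly on 2017-03-15
print(row.overlaps(march_15))        # True: possibly on 2017-03-15
```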
2. + 3. ?
There is a variant that has all of 3.'s advantages but looks like 2., using the interval type:
create table t(
possible_start date,
possible_length interval day
)
A month of 2017-03 could be represented with ('2017-03-01', '1 month').
(The day qualifier limits the interval's precision to whole days. It is not required for timestamp- or timestamptz-based solutions.)
The last possible date can be computed as (possible_start + possible_length - interval '1 day')::date. Alternatively, the whole range (e.g. for a daterange index) is daterange(possible_start, (possible_start + possible_length)::date); a range built this way excludes its upper bound by default.
http://rextester.com/AWIO2403
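Python's timedelta has no month unit, so a sketch of the start-plus-length arithmetic needs a small month helper. Below, a hypothetical (months, days) pair stands in for the interval column; months are applied before days and month-end days are clamped, which matches how PostgreSQL adds such an interval to a date (e.g. 2017-01-31 plus 1 month gives 2017-02-28).

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def bounds(possible_start: date, months: int = 0, days: int = 0) -> tuple[date, date]:
    """Return (last_possible_date, exclusive_end) for possible_start + possible_length.

    The (months, days) pair is a hypothetical stand-in for an `interval day` value.
    """
    exclusive_end = add_months(possible_start, months) + timedelta(days=days)
    return exclusive_end - timedelta(days=1), exclusive_end


print(bounds(date(2017, 3, 1), months=1))  # last day 2017-03-31, exclusive end 2017-04-01
```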

You can store a date in one column and its precision in another. To compare values, use date_trunc(precision_column, date_column), e.g.:
t=# create table so36(d date,p text);
CREATE TABLE
t=# insert into so36 select now(),'day';
INSERT 0 1
t=# insert into so36 select '2017-03-01','month';
INSERT 0 1
t=# select *,date_trunc(p,d),date_trunc(p,now()) from so36;
d | p | date_trunc | date_trunc
------------+-------+------------------------+------------------------
2017-03-28 | day | 2017-03-28 00:00:00+00 | 2017-03-28 00:00:00+00
2017-03-01 | month | 2017-03-01 00:00:00+00 | 2017-03-01 00:00:00+00
(2 rows)
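The comparison in the psql session above can be sketched in Python as well. This assumes the precision column only holds 'year', 'month' or 'day'; the _TRUNC table and matches function are hypothetical names. A stored (d, p) row matches another date when both truncate to the same value at precision p.

```python
from datetime import date

# Python stand-ins for date_trunc(p, d) with p limited to 'year', 'month', 'day'.
_TRUNC = {
    "year": lambda d: d.replace(month=1, day=1),
    "month": lambda d: d.replace(day=1),
    "day": lambda d: d,
}


def matches(stored: date, precision: str, other: date) -> bool:
    """True if `other` falls in the period the stored (d, p) row stands for,
    i.e. date_trunc(p, d) = date_trunc(p, other)."""
    trunc = _TRUNC[precision]
    return trunc(stored) == trunc(other)


so36 = [(date(2017, 3, 28), "day"), (date(2017, 3, 1), "month")]
print([matches(d, p, date(2017, 3, 28)) for d, p in so36])  # [True, True]
print([matches(d, p, date(2017, 3, 5)) for d, p in so36])   # [False, True]
```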

Related

What is the fastest way to populate a table with dates after a certain day?

Let's assume that we have the following input parameters:
date [Date]
period [Integer]
The task is the following: build a table with two columns, date and dayname.
So, if we have date = 2018-07-12 and period = 3 the table should look like this:
date |dayname
-------------------
2018-07-12|THURSDAY
2018-07-13|FRIDAY
2018-07-14|SATURDAY
My solution is the following:
select add_days(:date, -1) into previousDay from "DUMMY";
for i in 1..:period do
select add_days(:previousDay, :i) into nextDay from "DUMMY";
:result.insert((:nextDay, dayname(:nextDay)));
end for;
But I don't like the loop. I suspect it could become a performance problem if I want to put more complicated values into the result table.
What would be a better way to achieve this?

Running through a loop and inserting values one by one is most certainly the slowest possible option to accomplish the task.
Instead, you could use SAP HANA's time series feature.
With a statement like
SELECT to_date(GENERATED_PERIOD_START)
FROM SERIES_GENERATE_TIMESTAMP('INTERVAL 1 DAY', '01.01.0001', '31.12.9999')
you could generate a bounded range of valid dates with a given interval length.
In my tests, this approach brought the time to insert a set of dates down from about 9 minutes to 7 seconds.
I wrote about this some time ago here and here, if you want more examples.
In those examples, I even included the use of series tables that allow for efficient compression of timestamp column values.

SAP HANA's series data functions include SERIES_GENERATE_DATE, which returns its values with the DATE data type, so you don't have to convert the returned data into a date format yourself.
Here is some sample code:
DO BEGIN
declare d int := 5;
declare dstart date := '01.01.2018';
SELECT generated_period_start FROM SERIES_GENERATE_DATE('INTERVAL 1 DAY', :dstart, add_days(:dstart, :d));
END;
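For comparison outside HANA, the same set-based idea (one statement producing all rows, instead of one SELECT and one INSERT per day) can be sketched with SQLite through Python's built-in sqlite3 module. This swaps in a different technique: a recursive CTE stands in for SERIES_GENERATE_DATE, and the date_series function and DAY_NAMES list are hypothetical names.

```python
import sqlite3

# strftime('%w') numbers the days 0 = Sunday .. 6 = Saturday.
DAY_NAMES = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY"]


def date_series(start: str, period: int) -> list[tuple[str, str]]:
    """Return `period` consecutive (date, dayname) rows starting at `start` (YYYY-MM-DD)."""
    if period <= 0:
        return []
    con = sqlite3.connect(":memory:")
    try:
        rows = con.execute(
            """
            WITH RECURSIVE days(d, n) AS (
                SELECT date(?), 1
                UNION ALL
                SELECT date(d, '+1 day'), n + 1 FROM days WHERE n < ?
            )
            SELECT d, CAST(strftime('%w', d) AS INTEGER) FROM days ORDER BY d
            """,
            (start, period),
        ).fetchall()
    finally:
        con.close()
    return [(d, DAY_NAMES[w]) for d, w in rows]


for row in date_series("2018-07-12", 3):
    print(row)  # ('2018-07-12', 'THURSDAY'), then FRIDAY, SATURDAY
```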

YYYY-MM column type in PostgreSQL

I need to store a value associated with a month and a user in a table, and I want to run queries on it. I don't know whether there is a column data type for this kind of need. If not, should I:
Create a string field and build a year-month concatenation (2017-01)
Create an int field and build a year-month concatenation (201701)
Create two columns (one year and one month)
Create a date column at the beginning of the month (2017-01-01 00:00:00)
Something else?
The objective is to run queries like (pseudo-SQL):
SELECT val FROM t WHERE year_month = THIS_YEAR_MONTH and user_id='adc1-23...';

I would suggest not thinking too hard about the problem and just using the first date/time of the month. Postgres has plenty of date-specific functions (from date_trunc() to age() to + interval) to support dates.
You can readily convert them to the format you want, get the difference between two values, and so on.
If you phrase your query as:
where year_month = date_trunc('month', now()) and user_id = 'adc1-23...'
Then it can readily take advantage of an index on (user_id, year_month) or (year_month, user_id).
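The first-of-month approach can be tried quickly in SQLite via Python's sqlite3 module. This is a sketch under assumptions: SQLite's date(x, 'start of month') stands in for date_trunc('month', ...), the table mirrors the question's t, and the user id 'user-1' is hypothetical. The CHECK rejects dates that are not the first of a month, and the composite primary key provides the (user_id, year_month) index.

```python
import sqlite3
from typing import Optional

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE t (
        user_id    TEXT    NOT NULL,
        -- first day of the month, stored as 'YYYY-MM-01'
        year_month TEXT    NOT NULL
            CHECK (year_month = date(year_month, 'start of month')),
        val        INTEGER,
        PRIMARY KEY (user_id, year_month)  -- doubles as the (user_id, year_month) index
    );
    """
)
con.execute("INSERT INTO t VALUES ('user-1', '2017-01-01', 42)")


def value_for(user_id: str, any_day: str) -> Optional[int]:
    """Look up the value for the month containing any_day ('YYYY-MM-DD')."""
    row = con.execute(
        "SELECT val FROM t WHERE user_id = ? AND year_month = date(?, 'start of month')",
        (user_id, any_day),
    ).fetchone()
    return None if row is None else row[0]


print(value_for("user-1", "2017-01-17"))  # 42
```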

If you are interested in displaying values in YYYY-MM format, you can use to_char(your_datetime_column, 'YYYY-MM').
Example:
SELECT to_char(now(),'YYYY-MM') as year_month

Store date range in a single column in Oracle SQL

Here, trip 1 involves two activity codes on a single day and also concludes within that day. Most other activities also last a single day, but I have one trip that spans more than one day.
What is the best way to store a date range in that column for trips that span more than one day?
Splitting the column into separate begin date and end date columns doesn't seem to make sense, as there would be many blank columns.
trip_id (pk, fk) | Activity_code (pk, fk) | date
1                | a1                     | 1st October 2015
1                | a2                     | 1st October 2015
2                | a3                     | 2nd-5th October 2015
Keep in mind that I need to search activity codes by month, e.g. list all the activity codes that occur in October.
Is it possible to insert a range of dates into a single column, or is there another design solution?
Is there any datatype that can represent a date range as a single value?
PS: oracle 11g e

Store the date ranges as FirstDate/LastDate or FirstDate/Duration.
This allows you to store the values in the native format for dates. Storing dates as strings is a bad, bad idea, because strings don't have all the built-in functionality provided for native date types.
Don't worry about the additional storage for a second date or duration. In fact, the two columns together are probably smaller than storing the value as a string.

Splitting the date into a start date and an end date would be ideal. Storing dates as strings is not recommended: a VARCHAR2 column will accept any value, so malformed data can end up in it, and you would have to build strong validation into your insert scripts, which is otherwise unnecessary.
Secondly, you cannot easily perform simple operations such as calculating the duration of a trip if the start and end dates are stored in the same column. If they are stored in separate start_date and end_date columns, it is as simple as:
SELECT trip_id, activity_code, end_date - start_date FROM trips;
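The start/end design also answers the "all activity codes in October" requirement with a plain overlap test. Below is a sketch in SQLite via Python's sqlite3 rather than Oracle, assuming ISO 'YYYY-MM-DD' text dates and an inclusive end_date; julianday subtraction stands in for Oracle's date subtraction, and codes_in_month and durations are hypothetical helper names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE trips (
        trip_id       INTEGER NOT NULL,
        activity_code TEXT    NOT NULL,
        start_date    TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD'
        end_date      TEXT    NOT NULL,  -- inclusive last day
        PRIMARY KEY (trip_id, activity_code),
        CHECK (end_date >= start_date)
    );
    INSERT INTO trips VALUES
        (1, 'a1', '2015-10-01', '2015-10-01'),
        (1, 'a2', '2015-10-01', '2015-10-01'),
        (2, 'a3', '2015-10-02', '2015-10-05');
    """
)


def codes_in_month(month_start: str) -> list[str]:
    """Activity codes whose [start_date, end_date] overlaps the month starting at month_start."""
    rows = con.execute(
        """
        SELECT activity_code FROM trips
        WHERE start_date < date(?, '+1 month') AND end_date >= ?
        ORDER BY activity_code
        """,
        (month_start, month_start),
    ).fetchall()
    return [code for (code,) in rows]


def durations() -> list[tuple[int, str, float]]:
    # Number of days between start and end, like end_date - start_date in Oracle.
    return con.execute(
        "SELECT trip_id, activity_code, julianday(end_date) - julianday(start_date) "
        "FROM trips ORDER BY trip_id, activity_code"
    ).fetchall()


print(codes_in_month("2015-10-01"))  # ['a1', 'a2', 'a3']
```

The overlap condition (starts before the month ends, ends on or after the month starts) also catches trips that begin in September and finish in October.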

Date/Time data types and declaration in SQL Server

I would like to have a date and time column in my table. The main purpose of having these 2 columns is to be able to return query results like:
Number of treatments done in the period November 2011.
Number of people working in shifts between 00:01 and 08:00 hours.
I have two tables, which have the following attributes (among others):
Shift(day, month, year)
Treatment(start_time, date)
For the first table, Shift, query results need to return values like December 30, 2012.
For the second table, start_time needs to hold values like 0001 and 0800 (as mentioned above), while date can return values like 'November 2011'.
Initially I thought using the date datatype for each of the day/month/year/date variables would do the job, but this doesn't seem to work out. Should I use int, varchar and int for day, month and year respectively? Also, since the date variable does not have component parts, will the date datatype work here? Lastly, if I use the timestamp data type for the start_time attribute, what value should I enter in the insert: should it be 08:00:00?
I'm using SQL Server 2014.
Thank You for your help.

AFAIK it is better to use one column of type DATETIME instead of two columns that hold the date and time separately.
You can still query this column by date or by time alone by casting it to the corresponding type:
DECLARE @ChangeDateTime AS DATETIME = '2012-12-09 16:07:43.937'
SELECT CAST(@ChangeDateTime AS DATE) AS [ChangeDate],
CAST(@ChangeDateTime AS TIME) AS [ChangeTime]
results in:
ChangeDate ChangeTime
---------- ----------------
2012-12-09 16:07:43.9370000
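Both of the question's example queries work against a single datetime value. The Python sketch below uses hypothetical sample data: the month count uses a half-open range (start of November up to, but excluding, start of December), which in SQL Server terms corresponds to >= '2011-11-01' AND < '2011-12-01', and the shift count compares only the time part, like CAST(x AS TIME).

```python
from datetime import datetime, time

# Hypothetical sample data: one datetime per treatment start.
treatment_starts = [
    datetime(2011, 11, 3, 9, 30),
    datetime(2011, 11, 30, 23, 59),
    datetime(2011, 12, 1, 0, 15),
]


def count_in_range(values, start: datetime, end: datetime) -> int:
    """Count values in the half-open range [start, end)."""
    return sum(1 for v in values if start <= v < end)


def count_by_time_of_day(values, earliest: time, latest: time) -> int:
    """Count values whose time part falls in [earliest, latest]."""
    return sum(1 for v in values if earliest <= v.time() <= latest)


print(count_in_range(treatment_starts, datetime(2011, 11, 1), datetime(2011, 12, 1)))  # 2
print(count_by_time_of_day(treatment_starts, time(0, 1), time(8, 0)))                 # 1
```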

How to design SQL tables when column data arrives in multiple types/margins of error?

I've been given a stack of data where a particular value has been collected sometimes as a date (YYYY-MM-DD) and sometimes as just a year.
Depending on how you look at it, this is either a variance in type or margin of error.
This is a subprime situation, but I can't afford to recover or discard any data.
What's the optimal (eg. least worst :) ) SQL table design that will accept either form while avoiding monstrous queries and allowing maximum use of database features like constraints and keys*?
*i.e. Entity-Attribute-Value is out.

You could store the year, month and day components in separate columns. That way, you only need to populate the columns for which you have data.

If it comes in as just a year, default the month and day to 01, i.e. YYYY-01-01.
This way you can still use a date/datetime datatype and don't have to worry about invalid dates.

Either bring it in as a string unmolested, and modify it so it's consistent in another step, or modify the year-only values during the import like SQLMenace recommends.

I'd store the value in a DATETIME type and another value (just an integer will do, or some kind of enumerated type) that signifies its precision.
It would be easier to give more information if you mentioned what kind of queries you will be doing on the data.

Either fix it, then store it (OK, not an option),
or store it broken alongside a fixed computed column.
Something like this:
CREATE TABLE ...
...
Broken varchar(20),
Fixed AS CAST(CASE WHEN Broken LIKE '[12][0-9][0-9][0-9]' THEN Broken + '0101' ELSE Broken END AS datetime)
This also lets you tell good source data from bad.

If you don't always have a full date, what sort of keys and constraints would you need? Perhaps store two columns of data: a full date and a year. For data that has only a year, the year is stored and the date is null. For items with full info, both are populated.

I'd put three columns in the table:
The provided value (YYYY-MM-DD or YYYY)
A date column, Date or DateTime data type, which is nullable
A year column, as an integer or char(4) depending upon your needs.
I'd always populate the year column, and populate the date column only when the provided value is a full date.
And, because you've kept the provided value, you can always re-process down the road if needs change.
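The three-column split can be sketched as a small import-time parser in Python. This assumes the raw values are either four-digit years or ISO YYYY-MM-DD strings; ParsedValue and parse_value are hypothetical names. Anything else raises, which is a natural point to flag bad source data.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

_YEAR_ONLY = re.compile(r"[0-9]{4}")


class ParsedValue(NamedTuple):
    provided: str               # raw value, kept so it can be re-processed later
    full_date: Optional[date]   # only set when a complete YYYY-MM-DD was supplied
    year: int                   # always populated


def parse_value(provided: str) -> ParsedValue:
    text = provided.strip()
    if _YEAR_ONLY.fullmatch(text):
        return ParsedValue(provided, None, int(text))
    d = date.fromisoformat(text)  # raises ValueError for malformed input
    return ParsedValue(provided, d, d.year)


print(parse_value("2009"))        # full_date=None, year=2009
print(parse_value("2009-12-05"))  # full_date=2009-12-05, year=2009
```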

An alternative solution would be a date mask (as with IP netmasks). Store the date in a regular datetime field and add an additional field of type smallint (or similar) indicating which parts are present (you could even go binary here):
If you have YYYY-MM-DD, you have 3 bits of data, each set to 1 if that part is present and 0 if not.
Example:
Date       | Mask
2009-12-05 | 7 (111)
2009-12-01 | 6 (110: only the year and month are known, and the day is set to the default of 1)
2009-01-20 | 5 (101: for some strange reason, only the year and day are known. Since January has 31 days, defaulting the month to 1 never produces an invalid date)
Which solution is better depends on what you will do with it.
This is better when you want to select rows with full dates within a certain period (less to write). It also makes it easier to compare dates with masks like 7, 6 or 4. It may also take up less storage: date + smallint may be smaller than int + int + int, and only if datetime uses 64 bits and smallint uses as much space as int will the two be the same size.
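A minimal Python sketch of the mask idea, assuming the bit layout from the table above (year = 4, month = 2, day = 1); encode, known_parts and full_dates_between are hypothetical names. It reproduces the three example rows and the "full dates within a period" selection.

```python
from datetime import date
from typing import Optional

YEAR, MONTH, DAY = 0b100, 0b010, 0b001  # one bit per date part that is actually known


def encode(year: int, month: Optional[int] = None, day: Optional[int] = None) -> tuple[date, int]:
    """Return (stored_date, mask); unknown month/day are stored as 1."""
    mask = YEAR
    if month is not None:
        mask |= MONTH
    if day is not None:
        mask |= DAY
    stored = date(year, 1 if month is None else month, 1 if day is None else day)
    return stored, mask


def known_parts(mask: int) -> list[str]:
    return [name for name, bit in (("year", YEAR), ("month", MONTH), ("day", DAY)) if mask & bit]


def full_dates_between(rows: list[tuple[date, int]], start: date, end: date) -> list[date]:
    """Rows with a complete date (mask 7) in [start, end)."""
    return [d for d, mask in rows if mask == YEAR | MONTH | DAY and start <= d < end]


rows = [encode(2009, 12, 5), encode(2009, 12), encode(2009, day=20)]
print(rows)  # masks 7, 6 and 5, as in the table above
```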

I was going to suggest the same solution as ninesided did above. Additionally, you could have a date field and a field that quantitatively represents your uncertainty. This offers the advantage of being able to represent things like "on or about Sept 23, 2010". The problem is that to represent the case where you only know the year, you'd have to set your date to the middle of the year, with 182.5 days' uncertainty (assuming a non-leap year), which seems ugly.
You could use a similar but distinct approach with a mask that represents what date parts you're confident about - that's what SQLMenace offered in his answer above.
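The date-plus-uncertainty idea is easy to sketch in Python, which also shows why the year-only case is awkward: the center lands at midday on July 2 for a non-leap year. possible_span and year_only are hypothetical names; the sketch uses the actual length of the year, so a leap year gets 183 days rather than 182.5.

```python
from datetime import datetime, timedelta


def possible_span(center: datetime, uncertainty: timedelta) -> tuple[datetime, datetime]:
    """Earliest and latest moment covered by 'center +/- uncertainty' (closed interval)."""
    return center - uncertainty, center + uncertainty


def year_only(year: int) -> tuple[datetime, timedelta]:
    """Encode 'some time in <year>' as mid-year +/- half the year's length."""
    start, end = datetime(year, 1, 1), datetime(year + 1, 1, 1)
    half = (end - start) / 2
    return start + half, half


print(possible_span(datetime(2010, 9, 23), timedelta(days=3)))  # "on or about Sept 23, 2010"
print(year_only(2010))  # center 2010-07-02 12:00, uncertainty 182 days 12:00:00
```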

+1 each to recommendations from ninesided, Nikki9696 and Jeff Siver - I support all those answers though none was exactly what I decided upon.
My solution:
a date column used only for complete dates
an int column used for years
a constraint to ensure integrity between the two
a trigger to populate the year if only date is supplied
Advantages:
1. can run simple (one-column) queries on the date column with missing data ignored (by using NULL for what it was designed for)
2. can run simple (one-column) queries on the year column for any row with a date (because year is automatically populated)
3. insert either year or date or both (provided they agree)
4. no fear of disagreement between columns
5. self-explanatory, intuitive
I would argue that methods using YYYY-01-01 to signify missing data (when flagged as such with a second explanatory column) fail seriously on points 1 and 5.
Example code for SQLite 3 (string literals in single quotes, as standard SQL expects):
create table events
(
rowid integer primary key,
event_year integer,
event_date date,
check (event_year = cast(strftime('%Y', event_date) as integer))
);
create trigger year_trigger after insert on events
begin
update events set event_year = cast(strftime('%Y', event_date) as integer)
where rowid = new.rowid and event_date is not null;
end;
-- various methods to insert
insert into events (event_year, event_date) values (2008, '2008-02-23');
insert into events (event_year) values (2009);
insert into events (event_date) values ('2010-01-19');
-- select events in January without expressions on supplementary columns
select rowid, event_date from events where strftime('%m', event_date) = '01';
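A quick way to try the SQLite schema above is Python's built-in sqlite3 module. This sketch reuses the same table and trigger (with single-quoted string literals) and adds one hypothetical mismatched insert to show that the CHECK constraint rejects a year that disagrees with the date, while the trigger fills in the year when only a date is given.

```python
import sqlite3

SCHEMA = """
create table events
(
    rowid integer primary key,
    event_year integer,
    event_date date,
    check (event_year = cast(strftime('%Y', event_date) as integer))
);
create trigger year_trigger after insert on events
begin
    update events set event_year = cast(strftime('%Y', event_date) as integer)
    where rowid = new.rowid and event_date is not null;
end;
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.execute("insert into events (event_year, event_date) values (2008, '2008-02-23')")
con.execute("insert into events (event_year) values (2009)")
con.execute("insert into events (event_date) values ('2010-01-19')")  # trigger sets 2010

try:
    # Year and date disagree, so the CHECK constraint rejects the row.
    con.execute("insert into events (event_year, event_date) values (2011, '2010-05-05')")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

years = [y for (y,) in con.execute("select event_year from events order by rowid")]
january = con.execute(
    "select rowid, event_date from events where strftime('%m', event_date) = '01'"
).fetchall()
print(years, january, mismatch_rejected)  # [2008, 2009, 2010] [(3, '2010-01-19')] True
```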
