Precision of aggregate function on 'INTERVAL HOUR TO MINUTE' datatype in SQL

I'm running a very small database that contains a table with a column of type INTERVAL HOUR TO MINUTE. Although this means the table will only store time intervals with minute precision, the database system I am using (PostgreSQL) returns an interval with microsecond precision from an aggregate function such as AVG(). Can I rely on this behavior, or is it possible that in the future the database system will return values with only minute precision? How do other DBMSs behave in this respect?
I'm asking because values in the table do not require finer than minute precision, but I expect higher precision when I use an aggregate function.

An aggregate function such as avg() has to return the general form of an interval, because the average of several minute-precision values can fall between whole minutes. This will definitely not change in future releases. Also, the datatypes are identical internally; with INTERVAL HOUR TO MINUTE, the least significant parts simply get truncated.
The behavior is similar with other datatypes. If you compute an average over an integer column, you get a result of type numeric that can hold exact results.
If you want the results to be truncated (not your request), you can always cast to interval hour to minute explicitly to be sure:
SELECT avg(i)::interval hour to minute FROM mytbl;
I can't say much about other RDBMSes. Maybe additional answers can fill in here?
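To illustrate why an average needs finer precision than its inputs, here is a minimal sketch in Python rather than SQL. The three interval values are made up for the example, and the truncation step only mirrors the idea of the explicit cast; it is not a statement about how PostgreSQL implements it.

```python
from datetime import timedelta

# Hypothetical minute-precision intervals, standing in for the column values.
intervals = [
    timedelta(hours=1, minutes=0),
    timedelta(hours=1, minutes=1),
    timedelta(hours=1, minutes=1),
]

# The mean of minute-precision values generally falls between whole minutes.
average = sum(intervals, timedelta()) / len(intervals)

# Truncating back to whole minutes, analogous in spirit to the explicit cast.
truncated = timedelta(minutes=average // timedelta(minutes=1))

print(average)    # 1:00:40
print(truncated)  # 1:00:00
```

The untruncated result keeps the 40 seconds that a minute-only type would have to discard.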

Related

Create Excel DateTime Serial/Decimal Fraction values in SQL

I am trying to recreate a staff member's Excel work in SQL to save time and also drive reporting.
In their spreadsheet, they take two time values, subtract the smaller from the larger to arrive at a difference, and convert that difference to a serialised time value.
They then sum those serial values to define performance calculations.
Is there a conversion or similar process in SQL that can return the same or a similar serial time value so I can perform equivalent calculations (or does anyone have experience with a function that achieves this)?
I have tried the following line in the code (based on the Excel DateTime explanation here), but the value isn't the same as the one Excel produces...
datediff(MINUTE,cf_pick_pack.date_start, cf_pick_pack.date_end) * (convert(float,1.00000000/1440)) as 'duration_serial'
SQL returns 0.00902777^, which is short of the 0.00923611 that Excel returns.
Ok, my bad. SQL was calculating the difference in whole minutes because the datediff was set to MINUTE, so the leftover seconds were lost.
The following works...
datediff(SECOND,cf_pick_pack.date_start, cf_pick_pack.date_end) * (convert(float,1.00000000/86400)) as 'duration_serial'
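The arithmetic behind the two results can be checked with a small Python sketch. The start and end times are assumptions chosen to reproduce the figures quoted above (a difference of 13 minutes 18 seconds), and the function names are hypothetical. The whole-minutes variant only approximates a minute-granularity difference by dropping leftover seconds.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400
MINUTES_PER_DAY = 1440

def duration_serial(start, end):
    """Elapsed time as a fraction of a day, computed from seconds."""
    return (end - start).total_seconds() / SECONDS_PER_DAY

def duration_serial_whole_minutes(start, end):
    """Same idea, but counting whole minutes only, so seconds are lost."""
    whole_minutes = int((end - start).total_seconds() // 60)
    return whole_minutes / MINUTES_PER_DAY

# Assumed sample values: 13 minutes and 18 seconds apart.
start = datetime(2020, 1, 1, 9, 0, 0)
end = datetime(2020, 1, 1, 9, 13, 18)

print(round(duration_serial(start, end), 8))                # 0.00923611
print(round(duration_serial_whole_minutes(start, end), 8))  # 0.00902778
```

The second-based value matches what Excel shows, while the minute-based value falls short by the 18 seconds that were discarded.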

Date vs Timestamp vs Interval (Second to Day, etc.) on the basis of performance, efficiency and superiority in Oracle

DATE and TIMESTAMP both include a time component, and INTERVAL is used when manipulating dates by adding years, days, and so on.
I am still unsure about the exact difference, though, especially when it comes to dates in Oracle.
Is there any major difference in efficiency, or any other difference, between using DATE, TIMESTAMP and INTERVAL?
Your question is not clear but this information may help you.
TIMESTAMP supports fractional seconds, unlike DATE, which supports only whole seconds.
TIMESTAMP exists in three flavors:
TIMESTAMP does not contain any time zone information.
TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE contain time zone information
Regarding calculation and manipulation there is actually no difference between TIMESTAMP and DATE. There are only a very few functions which support only either of these two types.
DATE is an old data type. TIMESTAMP was introduced later (well "later" means in 9i, i.e. 20 years ago)
INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND are interval data types, they do not contain any absolute date information.
Hope this gave some hints. Otherwise please elaborate your question.
DATE does not store fractional seconds, so comparisons against a DATE cannot resolve differences of less than one second.

Hive Timestamp Differences in Milliseconds

A previous solution for obtaining the difference between two timestamps in milliseconds does not work in Hive 1.0 on Amazon EMR. Hive returns a blank column when casting a timestamp as double in my testing today. No errors are thrown when doing the CAST. Being able to calculate a time difference in fractions of a second between two columns of type "timestamp" is critical to our analysis. Any ideas?
You should try to convert into unix_timestamp using unix_timestamp(timestamp) but I think you will still be losing milliseconds.
select (unix_timestamp(DATE1)-unix_timestamp(DATE2)) TIMEDIFF from TABLE;
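The concern about losing milliseconds can be illustrated outside Hive. This Python sketch assumes, as the answer suspects, that the conversion yields whole seconds since the epoch; `epoch_seconds` is a hypothetical stand-in for that behavior, not Hive's implementation, and the two timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds(ts):
    """Whole seconds since the Unix epoch; the fractional part is dropped."""
    return (ts - EPOCH) // timedelta(seconds=1)

# Assumed sample timestamps, 1.5 seconds apart.
t1 = datetime(2015, 6, 22, 18, 59, 59, 250000, tzinfo=timezone.utc)
t2 = datetime(2015, 6, 22, 19, 0, 0, 750000, tzinfo=timezone.utc)

# Difference of whole-second values: the 500 ms is gone.
whole_second_diff = epoch_seconds(t2) - epoch_seconds(t1)

# Difference taken on the full-precision values, expressed in milliseconds.
millisecond_diff = (t2 - t1) // timedelta(milliseconds=1)

print(whole_second_diff)  # 1
print(millisecond_diff)   # 1500
```

Any approach that truncates each timestamp before subtracting can be off by up to a second, which is why the fractional part has to survive until after the subtraction.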

SQL equals does not work for timestamps?

My table has a column 'timestamp' where the timestamps are formatted 2015-06-22 18:59:59
However, using DBVisualizer Free 9.2.8 and Vertica, when I try to pull up rows by timestamp with a
SELECT * FROM table WHERE timestamp = '2015-06-22 18:59:59';
(directly copy-pasting the stamp), nothing comes up. Why is this happening and is there a way around it?
FYI, saying "the timestamps are formatted 2015-06-22 18:59:59" is incorrect if you are indeed using a TIMESTAMP type. Such types have their own internal representation of a date-time value, almost always a count since epoch. In your case with Vertica, 8 bytes are used for such storage. The formatting of the date-time value happens when a string representation is generated. Never confuse the string representation with the date-time value. Conflating the two may well be related to your problem/confusion.
A few different thoughts about possible problems…
String Literals
Are you sure Vertica takes strings as timestamp literals? That format you used is common SQL format. But given that Vertica seems to be a specialized database, I would double-check that.
If strings are not allowed, you may need to call some kind of function to transform the string into a date-time value.
Fractional Second
As the comment by Martin Smith points out, the doc for Timestamp-related data types in Vertica 7.1 says those types can have a fractional second with a resolution of microseconds. That means up to 6 decimal places of a fraction.
So if you are searching for "2015-06-22 18:59:59" but the stored value is "2015-06-22 18:59:59.012345", the query finds no match.
Half-Open
The fractional seconds issue described above is often the cause of problems people have when handling a span of time. If you naïvely try to pinpoint the ending time, you are likely to have problems. Seeing the "59:59" in your example string makes me think this applies to you.
The better approach to spans of time is "Half-Open" (or Half-Closed, whatever) where the beginning is inclusive while the ending is exclusive. Common notation for this is [). In comparison logic this means: value >= start AND value < stop. Notice the lack of EQUALS SIGN in the stop comparison. In English we would say "look for an hour's worth of invoices starting at 2:00 PM and going up to, but not including, 3:00 PM".
Half-Open for a week means Monday-Monday, for a month the first of one month to the first of the next month, and for a year the January 1 of one year to January 1 of the following year.
Half-Open means not using BETWEEN in SQL. SQL's BETWEEN has often been criticized. Instead do something like the following to look for an hour's worth of invoices. Notice the Z on the end of the string literal, which means "UTC time zone" ("Z" for "Zulu"). (But verify, as my SQL syntax may need fixing.)
SELECT *
FROM some_table_
WHERE invoice_received_ >= '2015-06-22 18:00:00Z'
AND invoice_received_ < '2015-06-22 19:00:00Z'
;
This query will catch any values such as '2015-06-22 18:59:59.654321', which seem to be eluding you.
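The same comparison logic can be sketched in Python to show which values each approach keeps. The sample timestamps are assumptions built around the value quoted above; the list stands in for the table's rows.

```python
from datetime import datetime

# Assumed sample rows around the hour of interest.
rows = [
    datetime(2015, 6, 22, 17, 59, 59, 999999),
    datetime(2015, 6, 22, 18, 0, 0),
    datetime(2015, 6, 22, 18, 59, 59, 654321),
    datetime(2015, 6, 22, 19, 0, 0),
]

start = datetime(2015, 6, 22, 18, 0, 0)
stop = datetime(2015, 6, 22, 19, 0, 0)

# Half-open: beginning inclusive, ending exclusive.
half_open = [ts for ts in rows if start <= ts < stop]

# Closed range that tries to pinpoint the last second of the hour.
last_second = datetime(2015, 6, 22, 18, 59, 59)
closed = [ts for ts in rows if start <= ts <= last_second]

print(len(half_open))  # 2
print(len(closed))     # 1
```

The closed range silently drops the row with fractional seconds, while the half-open range keeps it and still excludes the 19:00:00 row that belongs to the next hour.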
Reserved Word
I hope you have not really named your table 'table' and your column 'timestamp'. Such use of keywords and reserved words can cause explicit errors or more subtle weird problems.
Tip: The easy way to avoid any of the over a thousand reserved words in various databases is to append a trailing underscore. The SQL standard explicitly promises never to use a trailing underscore in its reserved words. So use "timestamp_" rather than "timestamp". Another example: "invoice_" table and "received_" column. I recommend doing that as a habit on everything you name in SQL: columns, tables, constraints, indexes, and so on.
Time Zone
You are using the TIMESTAMP type, which is short for TIMESTAMP WITHOUT TIME ZONE. Or so I presume; the Vertica doc is vague but that is the common usage as seen in the Postgres doc, and may even be standard SQL.
Anyways, TIMESTAMP WITHOUT TIME ZONE is usually the wrong type for most business purposes. The WITH time zone is misnamed and often misunderstood as a consequence: It means "with respect for time zone" where data inputs that include an offset or other time zone information from UTC are adjusted to UTC during the INSERT/UPDATE operations. The WITHOUT type simply ignores any such offset or time zone information.
The WITHOUT type should only be used for the concept of a date-time generally without being tied to any one locality. For example, saying "Christmas this year starts at beginning of December 25, 2015". That means in any time zone rather than a specific time zone. Obviously Christmas starts earlier in Paris, for example, than in Montréal.
If you are timestamping legal documents such as invoices, or booking appointments with people across time zones, or scheduling shipments in various localities, you should be using WITH time zone type.
So back to your possible problem: Test how Vertica or your client app or your database driver is handling your input string. It may be adjusting time zones as part of the parsing of the string using your client machine’s current default time zone. When sent to the database, that value will not match the stored value if during storage no adjustment to UTC was made.
Tip: Generally best practice is to do all your storage and business logic in UTC, adjusting to local time zones only where expected by user.

Can this sql-statement be shortened?

I have this SQL query to a PostgreSQL database. Can it be shortened? I am thinking about the where part.
SELECT *
FROM reservations
WHERE (starts_at BETWEEN ? AND ?) OR (ends_at BETWEEN ? AND ?)
The values for the question marks are:
1. The beginning of the current date in datetime format
2. The end of the current date in datetime format
3. Same as point one
4. Same as point two
The code is meant to return all the reservations that begin or end on a certain date, and it works as it is supposed to. But I have to supply the same information multiple times into the query.
I do not actually use this exact SQL, so there might be an obvious error somewhere, but please focus on the where part.
I'm not a huge fan of BETWEEN in this context, because timestamp or datetime can be fractional. In particular, specifying the last possible value on a given date is much more complicated than specifying the first possible value (midnight) because you have to specify the time as 23:59:59.999... out to whatever precision your RDBMS uses. PostgreSQL's timestamp is supposed to be accurate to the microsecond (1e-6 seconds), for example, so it's easy to specify a range that either includes times you don't want, or misses times that you do.
On the other hand, if you use BETWEEN with midnight of the following day so you don't have to know the precision of the time, you're including a time that doesn't exist in the date you're interested in. If your application is only precise to the second, or the minute, or to 5 minutes, then you may mis-categorize data or, worse, count it twice since it suddenly counts as being in two dates.
I would prefer:
WHERE (starts_at >= ? AND starts_at < ?)
OR (ends_at >= ? AND ends_at < ?)
Where the ? placeholders map to:
1. Midnight of the target date.
2. Midnight of the date after the target date.
3. Midnight of the target date.
4. Midnight of the date after the target date.
It's not as short, but it's decidedly safer unless you really want to specify your intervals that precisely.
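One way to avoid supplying the same information by hand is to compute both bounds once in the application and reuse them for every placeholder. This is a minimal Python sketch of that idea; `day_bounds` and `starts_or_ends_on` are hypothetical helper names, and the second function only mirrors the WHERE clause for illustration.

```python
from datetime import date, datetime, time, timedelta

def day_bounds(target):
    """Return (midnight of target date, midnight of the following date)."""
    lower = datetime.combine(target, time.min)
    return lower, lower + timedelta(days=1)

def starts_or_ends_on(starts_at, ends_at, target):
    """Mirror of the half-open WHERE clause, for illustration only."""
    lower, upper = day_bounds(target)
    return lower <= starts_at < upper or lower <= ends_at < upper

lower, upper = day_bounds(date(2015, 6, 22))
# The four placeholders, in order, built from just two computed values.
params = (lower, upper, lower, upper)

# A reservation starting exactly at the next midnight belongs to the next date only.
next_day_start = datetime(2015, 6, 23, 0, 0)
next_day_end = datetime(2015, 6, 23, 1, 0)
print(starts_or_ends_on(next_day_start, next_day_end, date(2015, 6, 22)))  # False
print(starts_or_ends_on(next_day_start, next_day_end, date(2015, 6, 23)))  # True
```

Because the upper bound is exclusive, a value at exactly midnight is counted on one date rather than two, which is the double-counting problem an inclusive BETWEEN would introduce.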
However, you should not do the following, even though it's shorter:
WHERE DATE(starts_at) = ?
OR DATE(ends_at) = ?
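You don't want to do that because it's not SARGable: wrapping the column in a function generally prevents the database from using an ordinary index on that column.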
This is also an example of why shortness or brevity is a poor measure of code quality. Generally, I'd order my preference like so:
1. Accuracy.
2. Performance.
3. Readability/maintainability.
4. Brevity.
No, you can't improve upon this.
WHERE (starts_at BETWEEN ? AND ?) OR (ends_at BETWEEN ? AND ?)