Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause] - sql

I'm coming to you guys with with two small oddities I can't seem to understand with postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request [once every 8 hours], but if you look at entry 16 I get "1 day -06:20:47" instead of "18:00:00" ish, unlike every other line. [The table LASTREQUESTED is a simple timestamp, nothing different here from the other entries for line 16], why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist although using it to order by is ok. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";

Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there's no daylight savings time issues with the timestamps that you got these intervals from, you can use justify_hours to convert days into 24-hour periods and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.

I don't now exactly why it's calculating as-is (maybe because you subtract an Interval from another Interval) but when you change the calculation to Timestamp minus Timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - (LASTREQUESTED + INTERVAL '8 hours'))
See Fiddle
Regarding #2: Based on Standard SQL the columns in the Select-list are calculated after FROM/WHERE/GROUP BY/HAVING, but before ORDER, that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).

Related

Get the count of distinct userids for last couple of days

Let's say the last 7 days for this table:
Userid Download time
Rab01 2020-04-29 03:28
Klm01 2020-04-29 04:01
Klm01 2020-04-30 05:10
Rab01 2020-04-29 12:14
Osa_3 2020-04-25 09:01
Following is the required output:
Count Download_time
1 2020-04-25
2 2020-04-29
1 2020-04-30
Tested with PostgreSQL. You also tagged Redshift, which forked at Postgres 8.2, a long time ago. There may be discrepancies ..
Since you seem to be happy with standard ISO format, a simple cast to date would be most efficient:
SELECT count(DISTINCT userid) AS "Count"
, download_time::date AS "Download_Day"
FROM tbl
WHERE download_time >= CURRENT_DATE - 7
AND download_time < CURRENT_DATE
GROUP BY 2;
db<>fiddle here
CURRENT_DATE is standard SQL and works for both Postgres and Redshift. Related:
How do I determine the last day of the previous month using PostgreSQL?
About the "last 7 days": I took the last 7 whole days (excluding today - necessarily incomplete), with syntax that can use a plain index on (download_time). Related:
Get dates of a day of week in a date range
Slow LEFT JOIN on CTE with time intervals
Interval (days) in PostgreSQL with two parameters
Ideally, you have a composite index on (download_time, userid) (and fulfill some preconditions) to get very fast index-only scans. See:
Is a composite index also good for queries on the first field?
count(DISTINCT ...) is typically slow. For big tables with many duplicates, there are faster techniques. Disclose your exact setup and cardinalities if you need to optimize performance.
If the actual data type is timestamptz, not just timestamp, you also need to define the time zone defining day boundaries. See:
Ignoring time zones altogether in Rails and PostgreSQL
About the optional short syntax GROUP BY 2:
Select first row in each GROUP BY group?
About capitalization of identifiers:
Are PostgreSQL column names case-sensitive?
You can use date_trunc function for get day only part from datetime and use it for grouping.
The query may be next:
SELECT
count(distinct Userid) as Count, -- get unuque users count
to_char(date_trunc('day', Download_time), 'YYYY-MM-DD') AS Download_Day -- convert time do day
FROM table
WHERE DATE_PART('day', NOW() - Download_time) < 7 -- last 7 days
GROUP BY Download_Day; -- group by day
Fiddle

examine if one time series column of table has two adjacent time points which have interval larger than certain length

I am dealing with data preprocessing on a table containing time series column
toy example Table A
timestamp value
12:30:24 1
12:32:21 3
12:33:21 4
timestamp is ordered and always go incrementally
Is that possible to define an function or something else to return "True expression" when table has two adjacent time points which have interval larger than certain length and return "False" otherwise?
I am using postgresql, thank you
SQL Fiddle
select bool_or(bigger_than) as bigger_than
from (
select
time - lag(time) over (order by time)
>
interval '1 minute' as bigger_than
from table_a
) s;
bigger_than
-------------
t
bool_or will stop searching as soon as it finds the first true value.
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Your sample data shows a time value. But it works the same for a timestamp
Something like this:
select count(*) > 0
from (
select timestamp,
lag(timestamp) over (order by value) as prev_ts
from table_a
) t
where timestamp - prev_ts < interval '1' minute;
It calculates the difference between a timestamp and it's "previous" timestamp. The order of the timestamps is defined by the value column. The outer query then counts the number of rows where the difference is smaller than 1 minute.
lag() is called a window functions. More details on those can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html

Oracle timestamp difference greater than X hours/days/months

I am trying to write a query to run on Oracle database. The table ActionTable contains actionStartTime and actionEndTime columns. I need to find out which action took longer than 1 hour to complete.
actionStartTime and actionEndTime are of timestamp type
I have a query which gives me the time taken for each action:
select (actionEndTime - actionStartTime) actionDuration from ActionTable
What would be my where clause that would return only actions that took longer than 1 hour to finish?
Subtracting two timestamps returns an interval. So you'd want something like
SELECT (actionEndTime - actionStartTime) actionDuration
FROM ActionTable
WHERE actionEndTime - actionStartTime > interval '1' hour

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5
as bucket group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.

Restrict SQL results by time, MySQL

I have a table containing events which happen in my application like people logging in and people changing settings.
They have a date/time against the event in the following format:
2010-01-29 10:27:29
Is it possible to use SQL to select the events that have only happened in the last 5 mins?
So if the date/time was 2010-01-29 10:27:29 it would only select the events that happened between 10:27:29 and 10:22:29?
Cheers
Eef
SELECT foo FROM bar WHERE event_time > DATE_SUB(NOW(), INTERVAL 5 MINUTES)
(Not sure if it's minutes or minute)
WHERE my_timestamp < DATE_SUB(now(), INTERVAL 5 MINUTE)
You should provide table and column names to make it easy for us to answer your question.
You can write SQL as
SELECT *
FROM Table
WHERE DateTimeColumnName <= '2010/01/29 10:27:29'
AND DateTimeColumnName >= '2010/01/29 10:22:29'
or you can use BETWEEN
SELECT *
FROM Table
WHERE DateTimeColumnName BETWEEN '2010/01/29 10:22:29' AND '2010/01/29 10:27:29'
Now see if there are datetime functions in MySQL to do a Date Math so just pass a single date stamp, and do the math to subtract 5 min from it and use it as the second parameter in the between clause.