Postgres: Select number of rows where date in tstzrange - sql

Let's say I have a table cases with a column period of type tstzrange (see below):
period
------------------------------------
["2018-06-14 21:19:55.802427+02",)
(1 row)
What I want to do is a query of the type:
select count(*) from cases where
current_date is in range (period); -- this last part is just pseudocode
I want to count the number of rows where current_date (just the day, month, and year) equals the upper bound of the range. How can I do this?
I looked into upper_inc() but couldn't quite understand how to use it with current_date.

You can use upper() to get the upper bound timestamp, cast it to a date, and compare:
SELECT '2018-07-17'::date = upper('[2018-07-16 01:00:00,2018-07-17 06:00:00)'::tstzrange)::date
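Applied to the table from the question, this might look like the sketch below. It assumes the table is named cases and the column period, as in the question. The second query is only a guess at what the pseudocode was aiming for ("the current moment lies inside the range") and uses the range containment operator @>. Note that upper() returns NULL for a range with no upper bound, such as the sample row, so such rows are not counted by the first query, and that the cast to date depends on the session's TimeZone setting.

```sql
-- Count rows whose range ends (upper bound) on today's date.
SELECT count(*)
FROM cases
WHERE upper(period)::date = current_date;

-- Alternative reading of the pseudocode (an assumption):
-- count rows whose range contains the current moment.
SELECT count(*)
FROM cases
WHERE period @> now();
```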

Related

Timestamp to date in SQL

Here is what I did:
Select count(check_id)
From Checks
Where timestamp::date > '2012-07-31'
Group by 1
Is it right to do it like I did or is there a better way? Should/could I have used the DateDIFF function in my WHERE clause? Something like: DATEDIFF(day, timestamp, '2012/07/31') > 0
Also, I need to figure out how to calculate the total rate of acceptance for this time period. Can anyone provide their expertise with this?
Is it right to do it like I did or is there a better way?
Using a cast like that is a perfectly valid way to convert a timestamp to a date. (I don't understand the reference to the non-existent datediff, though: why would adding anything to a timestamp change it?)
However, the cast has one drawback: if there is an index on the column "timestamp" it won't be used.
But as you just want a range after a certain date, there is no reason to cast the column to begin with.
The following will achieve the same thing as your query, but can make use of an index on the column "timestamp" in case there is one and using it is considered beneficial by the optimizer.
Select count(distinct check_id)
From Checks
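Where "timestamp" >= date '2012-07-31' + 1
Note the + 1, which selects the day after; otherwise the query would include rows that fall on 2012-07-31 itself but after midnight. The comparison uses >= so that a row at exactly midnight of the following day is still counted, as it is in your original query.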
I removed the unnecessary group by from your query.
If you want to get a count per day, then you will need to include the day in the SELECT list. In that case casting is a good way to do it:
Select "timestamp"::date, count(distinct check_id)
From Checks
Where "timestamp" >= date '2012-07-31' + 1
group by "timestamp"::date

How to select rows until the sum of a column reaches N, where the column is of type TIME

I would like to select enough audio calls to have 00:10:00 minutes of audio. I have tried to achieve this by writing the following SQL (postgres) statement
SELECT file_name, audio_duration
FROM (
SELECT distinct file_name, audio_duration, SUM(audio_duration)
OVER (ORDER BY audio_duration) AS total_duration
FROM data
) AS t
WHERE
t.total_duration <='00:10:00'
GROUP BY file_name, audio_duration
My problem is that it doesn't seem to be calculating the total duration correctly.
I suspect this is due to the audio_duration column being of type TIME.
If anyone has any hints or suggestions on how to write this query, it would be greatly appreciated.
You should really define that column to be an interval. A time column stores a moment in time, e.g. "3 in the afternoon".
However you can cast a single time value to an interval. You also don't need the window function to first calculate the "running total" if you want the total duration per file:
SELECT file_name, sum(audio_duration::interval) as total_duration
FROM data
GROUP BY file_name
HAVING sum(audio_duration::interval) <= interval '10 minute';
To permanently change the column type to an interval you can use:
alter table data
alter column audio_duration type interval using audio_duration::interval;
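If the goal is the one in the question title (pick calls until the running total reaches ten minutes) rather than a total per file, a window function is still needed. The sketch below is one way to do it, assuming file_name is unique per row and that the shortest calls should be picked first; neither is stated in the question. The explicit ROWS frame matters: with only ORDER BY, the default frame treats rows with equal sort values as peers and adds them all at once, which makes the running total jump.

```sql
SELECT file_name, audio_duration, total_duration
FROM (
    SELECT file_name,
           audio_duration,
           -- Cast to interval so the running sum is a duration.
           sum(audio_duration::interval) OVER (
               ORDER BY audio_duration, file_name
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS total_duration
    FROM data
) AS t
WHERE total_duration <= interval '10 minutes';
```

The cast is harmless if the column has already been converted to interval, so the query works either way.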
I fully agree with @a_horse_with_no_name that interval is the better datatype, but I must admit that the time datatype is not incorrect. While you cannot add (+) time values, you can SUM them. Summing time values results in an interval and produces the same result as summing the corresponding intervals. Besides being a moment, a time is also the interval from the beginning of the day to that moment. Demo (fiddle)
with as_time (dur) as ( values ('10:34:45 AM'::time), ('03:14:50 PM'::time), ('11:15:25 PM'::time))
, as_intv (dur) as ( values ('10:34:45'::interval), ('15:14:50'::interval),('23:15:25'::interval))
select *
from (select sum(dur) sum_time from as_time) st
, (select sum(dur) sum_intv from as_intv) si;
BTW: the answer to the rhetorical question "what is the sum of 8 in the morning and 3 in the afternoon?" is 23:00:00.

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2
You didn't show us your table definition, but something like this should work:
select *
from (
select t.*,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp, not date (otherwise the result of the subtraction wouldn't be an interval).

examine if one time series column of table has two adjacent time points which have interval larger than certain length

I am dealing with data preprocessing on a table containing time series column
toy example Table A
timestamp value
12:30:24 1
12:32:21 3
12:33:21 4
The timestamp column is ordered and always increases.
Is it possible to define a function (or something else) that returns true when the table has two adjacent time points whose interval is larger than a certain length, and false otherwise?
I am using PostgreSQL. Thank you.
SQL Fiddle
select bool_or(bigger_than) as bigger_than
from (
select
time - lag(time) over (order by time)
>
interval '1 minute' as bigger_than
from table_a
) s;
bigger_than
-------------
t
bool_or returns true if at least one of the input values is true.
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Your sample data shows a time value, but it works the same for a timestamp.
Something like this:
select count(*) > 0
from (
select timestamp,
lag(timestamp) over (order by value) as prev_ts
from table_a
) t
where timestamp - prev_ts > interval '1' minute;
It calculates the difference between each timestamp and its "previous" timestamp. The order of the timestamps is defined by the value column. The outer query then counts the rows where the difference is greater than 1 minute, as the question asks, and returns true if there is at least one.
lag() is a window function. More details on those can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html
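Since the question asks for a function, either query can be wrapped in one. The following is a minimal sketch, not a definitive implementation: the function name has_gap_longer_than and its parameter max_gap are made up for illustration, and it assumes the table is table_a with a column named "timestamp" of type time or timestamp, ordered by that column.

```sql
-- Hypothetical helper: true if any two adjacent rows of table_a
-- are further apart than max_gap, false otherwise.
CREATE FUNCTION has_gap_longer_than(max_gap interval)
RETURNS boolean
LANGUAGE sql
STABLE
AS $$
    -- lag() is NULL for the first row; bool_or ignores NULLs.
    -- coalesce turns the NULL result for tables with fewer
    -- than two rows into false.
    SELECT coalesce(bool_or(gap > max_gap), false)
    FROM (
        SELECT "timestamp" - lag("timestamp") OVER (ORDER BY "timestamp") AS gap
        FROM table_a
    ) AS s;
$$;

-- Usage:
SELECT has_gap_longer_than(interval '1 minute');
```

With the sample data from the question, the first gap is 00:01:57, so the call above returns true.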

Efficient way of counting a large number of entries in a column or two in a database using a selected time period

I need to list the number of column1 values that have been added to the database over a selected time period, counted back from the day the list is requested: daily, weekly (last 7 days), monthly (last 30 days), and quarterly (last 3 months). For example, below is the table I created to perform this task.
Column  | Type                        | Modifiers
--------+-----------------------------+--------------------------
column1 | character varying(256)      | not null default nextval
date    | timestamp without time zone | not null default now()
column2 | character varying(256)      | ..........
Now I need the total count of entries in column1 for the selected time period. For example:
column1         | date                       | column2
----------------+----------------------------+--------------
abcdef          | 2013-05-12 23:03:22.995562 | 122345rehr566
njhkepr         | 2013-04-10 21:03:22.337654 | 45hgjtron
ffb3a36dce315a7 | 2013-06-14 07:34:59.477735 | jkkionmlopp
abcdefgggg      | 2013-05-12 23:03:22.788888 | 22345rehr566
From the above data, for the daily time period the count should be 2.
I have tried doing this query
select count(column1) from table1 where date='2012-05-12 23:03:22';
and got exactly one record matching the timestamp. But I believe this is not the proper or efficient way of retrieving the count. I would appreciate help with the right and efficient way of writing such a query. I am new to the database world and am trying to write efficient queries.
Thanks!
[EDIT]
Each query currently takes 175854 ms to process. What would be an efficient way to reduce that time? Any help would be great. I am using PostgreSQL.
To be efficient, conditions should compare values of the same type as the columns being compared. In this case, the column being compared (Date) has type timestamp, so we need to use a range of timestamp values.
In keeping with this, you should use current_timestamp for the "now" value, and as confirmed by the documentation, subtracting an interval from a timestamp yields a timestamp, so...
For the last 1 day:
select count(*) from table1
where "Date" > current_timestamp - interval '1 day'
For the last 7 days:
select count(*) from table1
where "Date" > current_timestamp - interval '7 days'
For the last 30 days:
select count(*) from table1
where "Date" > current_timestamp - interval '30 days'
For the last 3 months:
select count(*) from table1
where "Date" > current_timestamp - interval '3 months'
Make sure you have an index on the Date column.
If you find that the index is not being used, try converting the condition to a between, eg:
where "Date" between current_timestamp - interval '3 months' and current_timestamp
Logically the same, but may help the optimizer to choose the index.
Note that column1 is irrelevant to the question; being unique there is no possibility of the row count being different from the number of different values of column1 found by any given criteria.
Also, the choice of "Date" for the column name is poor, because a) it is a reserved word, and b) it is not in fact a date.
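If all four counts are wanted at once, they can be computed in a single pass over the table instead of four separate queries. This is a sketch under two assumptions: PostgreSQL 9.4 or later (for the aggregate FILTER clause), and the quoted "Date" column name used in the queries above. The WHERE clause restricts the scan to the widest window (three months) so an index on the column can still be used.

```sql
SELECT
    count(*) FILTER (WHERE "Date" > current_timestamp - interval '1 day')   AS last_day,
    count(*) FILTER (WHERE "Date" > current_timestamp - interval '7 days')  AS last_7_days,
    count(*) FILTER (WHERE "Date" > current_timestamp - interval '30 days') AS last_30_days,
    count(*)                                                                AS last_3_months
FROM table1
-- The widest window bounds the scan; the FILTER clauses narrow each count.
WHERE "Date" > current_timestamp - interval '3 months';
```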
If you want to count number of records between two dates:
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-05-13'
-- count for one day, upper bound not included
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-06-12'
-- count for one month, upper bound not included
select count(*)
from Table1
where
"Date" >= current_date and
"Date" < current_date + interval '1 day'
-- current date
What I understand from your wording is
select date_trunc('day', "date"), count(*)
from t
where "date" >= '2013-01-01'
group by 1
order by 1
Replace 'day' with 'week', 'month', or 'quarter' as needed.
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
Create an index on the "date" column.
select count(distinct column1) from table1 where date > '2012-05-12 23:03:22';
I assume "number of column1" means "number of distinct values in column1".
Edit:
Regarding your second question (speed of the query): I would assume that an index on the date column should speed up the runtime. Depending on the data content, this could even be declared unique.
To throw another option into the mix...
Add a column of type "date" and index that -- named "datecol" for this example:
create index tbl_datecol_idx on tbl (datecol);
analyze tbl;
Then your query can use an equality operator:
select count(*) from tbl where datecol = current_date - 1; --yesterday
Or if you can't add the date datatype column, you could create a functional index on the existing column:
create index tbl_date_fbi on tbl ( ("date"::DATE) );
analyze tbl;
select count(*) from tbl where "date"::DATE = current_date - 1;
Note1: you do not need to query "column1" directly as every row has that attribute filled due to the NOT NULL.
Note2: Creating a column named "date" is poor form, and even worse that it is of type TIMESTAMP.
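For the extra-column option, the new column also has to be filled and kept in sync with the timestamp. One possible way, assuming PostgreSQL 12 or later and that "date" is a timestamp without time zone as in the question, is a stored generated column; on older versions the column would have to be maintained by a trigger or by the application instead.

```sql
-- Requires PostgreSQL 12+ (generated columns).
-- datecol is derived from the existing "date" timestamp column
-- and is kept up to date automatically on insert and update.
ALTER TABLE tbl
    ADD COLUMN datecol date
    GENERATED ALWAYS AS ("date"::date) STORED;

CREATE INDEX tbl_datecol_idx ON tbl (datecol);
ANALYZE tbl;
```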