PostgreSQL: Query for tstzrange that contains last instant of a quarter - sql

Given a PostgreSQL table that is supposed to contain rows with continuous, non-overlapping valid_range ranges such as:
CREATE TABLE tracking (
id INT PRIMARY KEY,
valid_range TSTZRANGE NOT NULL,
EXCLUDE USING gist (valid_range WITH &&)
);
INSERT INTO tracking (id, valid_range) VALUES
(1, '["2017-03-01 13:00", "2017-03-31 14:00")'),
(2, '["2017-03-31 14:00", "2017-04-01 00:00")'),
(3, '["2017-04-01 00:00",)');
That creates a table that contains:
id | valid_range
----+-----------------------------------------------------
1 | ["2017-03-01 13:00:00-07","2017-03-31 14:00:00-06")
2 | ["2017-03-31 14:00:00-06","2017-04-01 00:00:00-06")
3 | ["2017-04-01 00:00:00-06",)
I need to query for the row that was the valid row at the end of a given quarter, where I'm defining "at the end of a quarter" as "the instant in time right before the date changed to be the first day of the new quarter." In the above example, querying for the end of Q1 2017 (Q1 ends at the end of 2017-03-31, and Q2 begins 2017-04-01), I want my query to return only the row with ID 2.
What is the best way to express this condition in PostgreSQL?
SELECT * FROM tracking WHERE valid_range @> TIMESTAMPTZ '2017-03-31' is wrong because it returns the row that contains midnight on 2017-03-31, which is ID 1.
valid_range @> TIMESTAMPTZ '2017-04-01' is also wrong because it skips over the row that was actually valid right at the end of the quarter (ID 2) and instead returns the row with ID 3, which is the row that starts the new quarter.
I'm trying to avoid using something like ...ORDER BY valid_range DESC LIMIT 1 in the query.
Note that the end of the ranges must always be exclusive, I cannot change that.

The best answer I've come up with so far is
SELECT
*
FROM
tracking
WHERE
lower(valid_range) < '2017-04-01'
AND upper(valid_range) >= '2017-04-01'
This seems like the moral equivalent of saying "I want to reverse the inclusivity/exclusivity of the bounds on this TSTZRANGE column for this query" which makes me think I'm missing a better way of doing this. I wouldn't be surprised if it also negates the benefits of typical indexes on a range column.
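The predicate above can be checked outside the database. What follows is a minimal Python sketch, not PostgreSQL code. It models each half-open [lower, upper) range as a tuple, with None standing in for an unbounded upper end. The helper name valid_just_before is made up for illustration, and timestamps are naive for brevity, on the assumption that they all share one zone.

```python
from datetime import datetime

# Sample rows mirroring the tracking table: (id, lower, upper).
# upper=None models an unbounded upper end; every range is [lower, upper).
ROWS = [
    (1, datetime(2017, 3, 1, 13), datetime(2017, 3, 31, 14)),
    (2, datetime(2017, 3, 31, 14), datetime(2017, 4, 1)),
    (3, datetime(2017, 4, 1), None),
]


def valid_just_before(rows, instant):
    """Return ids of rows whose range covers the moment right before `instant`.

    Same test as: lower(valid_range) < instant AND upper(valid_range) >= instant,
    except that an unbounded upper end is treated as passing the second test.
    """
    return [
        row_id
        for row_id, lower, upper in rows
        if lower < instant and (upper is None or upper >= instant)
    ]


print(valid_just_before(ROWS, datetime(2017, 4, 1)))  # [2]
```

One difference from the SQL worth noting: in PostgreSQL, upper() of an unbounded range is NULL, so the comparison upper(valid_range) >= '...' is not true for the open-ended row. If that row should match for later quarters, the SQL would probably need something like upper_inf(valid_range) OR upper(valid_range) >= '...'.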

You can use the <@ operator to check whether a value is within a range:
SELECT *
FROM tracking
WHERE to_timestamp('2017-04-01','YYYY-MM-DD')::TIMESTAMP WITH TIME ZONE <@ valid_range;

Related

Generate a random value in a row based on a value from another table

I want to create a large amount of mock data in a table (in Postgresql). The schema of the table looks like this
price float,
id id,
period timestamptz
For price, this will be a random float number between 1-5
For id, this will be a value from another table that contain all value in id column (which may have a lot of id)
For period, this will generate a random datetime value in a specific range of time.
Here, I want to create a single query that can generate all these rows equal to amount of id I have to a specific range of time that I select.
E.g.
Let say I have 3 ids (a,b,c) in another table and I want to generate time series between 2019-08-01 00:00:00+00 and 2019-08-05 00:00:00+00
The result from this query will generate value like this:
price id period
3.4 b 2019-08-03 10:01:22+00
2.5 a 2019-08-04 05:44:31+00
4.8 c 2019-08-04 14:51:10+00
The price and id are random. Also period, but with specific range. Key thing is that, all ids need to be generated.
Generating random number and datetime is not hard but how can I create a query that can generate rows based on all id gathered from another table.
P.S. I have edited the example, which may have been misleading.
This answers a reasonable interpretation of the original question.
Getting a random value from a second table can be a little tricky. If the second table is not too big, then this works:
select distinct on (gs.ts) gs.ts, ids.id, cast(random() * 4 + 1 as numeric(2, 1)) as price
from generate_series('2019-08-01 00:00:00+00'::timestamptz, '2019-08-05 00:00:00+00'::timestamptz, interval '30 minute') gs(ts) cross join
ids
order by gs.ts, random()
Use the function make_timestamptz generating a random integer for each part, except year and month. This will create random timestamps. As for getting the id from another table just select from that table.
/*
function to generate random integers. (Lots of them needed.)
*/
create or replace function utl_gen_random_integer(
int1_in integer,
int2_in integer)
returns integer
language sql volatile strict
as
$$
/* return a random integer between, inclusively, two integers; the relative values of the integers do not matter. */
with ord as ( select greatest(int1_in, int2_in) as hi
, least(int1_in, int2_in) as low
)
select floor(random()*(hi-low+1)+low)::integer from ord;
$$;
-- create the id source table and populate
create table id_source( id text) ;
insert into id_source( id)
with id_range as ( select 'abcdefgh'::text idl)
select substring(idl,utl_gen_random_integer(1,length(idl)), 1)
from id_range, generate_series(1,20) ;
And the generation query:
select trunc((utl_gen_random_integer(1,4) + (utl_gen_random_integer(0,100))/100.0),2) Price
, id
, make_timestamptz ( 2019 -- year
, 08 -- month
, utl_gen_random_integer(1,5) -- day
, utl_gen_random_integer(1,24)-1 -- hours
, utl_gen_random_integer(1,60)-1 -- min
, (utl_gen_random_integer(1,60)-1)::float -- sec
, '+00'
)
from id_source;
The result is generated at UTC (+00). However, Postgres displays timestamptz values converted to the session's time zone, with its offset. To view the result in UTC, append at time zone 'UTC' to the timestamp expression.
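As a cross-check on the shape of the result (exactly one row per id, a price between 1 and 5, a period inside the chosen window), here is a minimal Python sketch of the same idea. It is not a replacement for the SQL above. The function name mock_rows and the hard-coded id list are assumptions for illustration; in practice the ids would come from the other table.

```python
import random
from datetime import datetime, timedelta, timezone


def mock_rows(ids, start, end, rng=random):
    """Yield one (price, id, period) tuple per id.

    price is a float in [1, 5] rounded to one decimal place; period is a
    random instant in [start, end) at whole-second resolution.
    """
    span_seconds = int((end - start).total_seconds())
    for id_ in ids:
        price = round(rng.uniform(1, 5), 1)
        period = start + timedelta(seconds=rng.randrange(span_seconds))
        yield price, id_, period


start = datetime(2019, 8, 1, tzinfo=timezone.utc)
end = datetime(2019, 8, 5, tzinfo=timezone.utc)
rows = list(mock_rows(["a", "b", "c"], start, end))  # ids are placeholders
for row in rows:
    print(row)
```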

Get a count of records created each week in SQL

I have a table Questions. How can I get a count of all questions asked in a week?
More generically, how can I bucket records by the week they were created in?
Questions
id created_at title
----------------------------------------------------
1 2014-12-31 09:43:42 "Add things"
2 2013-11-23 02:48:55 "How do I ruby?"
3 2015-01-15 15:11:19 "How do I python?"
...
I'm using SQLite, but PG answers are fine too.
Or if you have the answer using Rails ActiveRecord, that is amazing, but not required.
I've been trying to use DATEPART() but haven't come up with anything successful yet: http://msdn.microsoft.com/en-us/library/ms174420.aspx
In postgreSQL it's as easy as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
FROM Questions
If you wanted to get the # of questions per week, simply do the following:
SELECT date_trunc('week', created_at) created_week, COUNT(*) weekly_cnt
FROM Questions
GROUP BY date_trunc('week', created_at)
Hope this helps. Note that date_trunc() returns a timestamp (the start of the week, a Monday) and not a number (i.e., it won't return the ordinal number of the week in the year).
Update:
Also, if you wanted to accomplish both in a single query you could do so as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
, COUNT(*) OVER ( PARTITION BY date_trunc('week', created_at) ) weekly_cnt
FROM Questions
In the above query I'm using COUNT(*) as a window function and partitioning by the week in which the question was created.
If the created_at field is already indexed, I would simply look for all rows with a created_at value between X and Y. That way the index can be used.
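For instance, to get rows with a created_at value in the 3rd week of 2015 (ISO week 3 runs from Monday 2015-01-12 through Sunday 2015-01-18), you would run:
select *
from questions
where created_at >= '2015-01-12' and created_at < '2015-01-19'
A half-open range is used rather than BETWEEN because created_at is a timestamp: an upper bound of '2015-01-18' would stop at midnight and miss the rest of that day. This allows the index to be used.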
If you want to be able to specify a week in the where clause, you could use the date_part or extract functions to add a column to this table storing the year and week #, and then index that column so that queries can take advantage of it.
If you don't want to add the column, you could of course use either function in the where clause and query against the table, but you won't be able to take advantage of any indexes.
Because you mentioned not wanting to add a column to the table, I would recommend adding a function based index.
For example, if your ddl were:
create table questions
(
id int,
created_at timestamp,
title varchar(20)
);
insert into questions values
(1, '2014-12-31 09:43:42','"Add things"'),
(2, '2013-11-23 02:48:55','"How do I ruby?"'),
(3, '2015-01-15 15:11:19','"How do I python?"');
create or replace function to_week(ts timestamp)
returns text
as 'select concat(extract(year from ts),extract(week from ts))'
language sql
immutable
returns null on null input;
create index week_idx on questions (to_week(created_at));
You could run:
select q.*, to_week(created_at) as week
from questions q
where to_week(created_at) = '20153';
And get:
| ID | CREATED_AT | TITLE | WEEK |
|----|--------------------------------|--------------------|-------|
| 3 | January, 15 2015 15:11:19+0000 | "How do I python?" | 20153 |
(reflecting the third week of 2015, i.e. '20153')
Fiddle: http://sqlfiddle.com/#!15/c77cd/3/0
You could similarly run:
select q.*,
concat(extract(year from created_at), extract(week from created_at)) as week
from questions q
where concat(extract(year from created_at), extract(week from created_at)) =
'20153';
Fiddle: http://sqlfiddle.com/#!15/18c1e/3/0
But it would not take advantage of the function based index, because there is none. In addition, it would not use any index you might have on the created_at field because, while that field might be indexed, you really aren't searching on that field. You are searching on the result of a function applied against that field. So the index on the column itself cannot be used.
If the table is large you will either want a function based index or a column holding that week that is itself indexed.
SQLite has no native datetime type like MS SQL Server does, so the answer may depend on how you are storing dates. Not all T-SQL will work in SQLite.
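You can store datetime as an integer that counts seconds since 1/1/1970 12:00 AM. There are 604,800 seconds in a week, and 1/1/1970 was a Thursday, so integer division gives buckets that run Thursday through Wednesday (UTC). You could query on an expression like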
rawdatetime / 604800 -- iff rawdatetime is integer
More on handling datetimes in SQLite here: https://www.sqlite.org/datatype3.html
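Since the question mentions SQLite, here is a minimal sketch of weekly bucketing there, run through Python's built-in sqlite3 module. It assumes created_at is stored as ISO-8601 text, which SQLite's strftime() can parse. The row with id 4 is made up so that one week has more than one question. Note that %W numbers weeks with Monday as the first day, which is not the same as ISO week numbering near the start of a year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (id INTEGER PRIMARY KEY, created_at TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [
        (1, "2014-12-31 09:43:42", "Add things"),
        (2, "2013-11-23 02:48:55", "How do I ruby?"),
        (3, "2015-01-15 15:11:19", "How do I python?"),
        (4, "2015-01-16 08:00:00", "Made-up row in the same week as id 3"),
    ],
)

# %Y-%W yields a sortable "year-week" label; %W is week of year (00-53),
# counting Monday as the first day of the week.
weekly = conn.execute(
    """
    SELECT strftime('%Y-%W', created_at) AS week, COUNT(*) AS weekly_cnt
    FROM questions
    GROUP BY strftime('%Y-%W', created_at)
    ORDER BY week
    """
).fetchall()

for week, count in weekly:
    print(week, count)
```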
Get the week number using strftime('%V')
Store it in DB, and use it to identify in which week a question was asked
http://apidock.com/ruby/DateTime/strftime
MySQL can do it too with DATE_FORMAT(datetime,'%u')
So use:
SELECT DATE_FORMAT(column,'%u') FROM Table

Can SQL view have infinite number of rows? (Repeating schedule, each row a day?)

Can I have a view with an infinite number of rows? I don't want to
select all the rows at once, but is it possible to have a view that
represents a repeating weekly schedule, with rows for any date?
I have a database with information about businesses, their hours on
different days of the week. Their names:
# SELECT company_name FROM company;
company_name
--------------------
Acme, Inc.
Amalgamated
...
(47 rows)
Their weekly schedules:
# SELECT days, open_time, close_time
FROM hours JOIN company USING(company_id)
WHERE company_name='Acme, Inc.';
days | open_time | close_time
---------+-----------+-----------
1111100 | 08:30:00 | 17:00:00
0000010 | 09:00:00 | 12:30:00
Another table, not shown, has holidays they're closed.
So I can trivially create a user-defined function in the form of a
stored procedure that takes a particular date as an argument and
returns the business hours of each company:
SELECT company_name,open_time,close_time FROM schedule_for(current_date);
But I want to do it as a table query, in order that any
SQL-compatible host-language library will have no problem
interfacing with it, like this:
SELECT company_name, open_time, close_time
FROM schedule_view
WHERE business_date=current_date;
Relational database theory tells me that tables (relations) are
functions in the sense of being a unique mapping from each
primary key to a row (tuple). Obviously if the WHERE clause on
the above query were omitted it would result in a table (view)
having an infinite number of rows, which would be a practical issue. But
I'm willing to agree never to query such a view without a WHERE
clause that restricts the number of rows.
How can such a view be created (in PostgreSQL)? Or is a view even the way to do what I want?
Update
Here are some more details about my tables. The days of the week are saved as bits, and I select the appropriate row using a bit mask that has a single bit shifted once for each day of the requested week. To wit:
The company table:
# \d company
Table "company"
Column | Type | Modifiers
----------------+------------------------+-----------
company_id | smallint | not null
company_name | character varying(128) | not null
timezone | timezone | not null
The hours table:
# \d hours
Table "hours"
Column | Type | Modifiers
------------+------------------------+-----------
company_id | smallint | not null
days | bit(7) | not null
open_time | time without time zone | not null
close_time | time without time zone | not null
The holiday table:
# \d holiday
Table "holiday"
Column | Type | Modifiers
---------------+----------+-----------
company_id | smallint | not null
month_of_year | smallint | not null
day_of_month | smallint | not null
The function I currently have that does what I want (besides invocation) is defined as:
CREATE FUNCTION schedule_for(requested_date date)
RETURNS table(company_name text, open_time timestamptz, close_time timestamptz)
AS $$
WITH field AS (
/* shift the mask as many bits as the requested day of the week */
SELECT B'1000000' >> (to_char(requested_date,'ID')::int -1) AS day_of_week,
to_char(requested_date, 'MM')::int AS month_of_year,
to_char(requested_date, 'DD')::int AS day_of_month
)
SELECT company_name,
(requested_date+open_time) AT TIME ZONE timezone AS open_time,
(requested_date+close_time) AT TIME ZONE timezone AS close_time
FROM hours INNER JOIN company USING (company_id)
CROSS JOIN field
CROSS JOIN holiday
/* if the bit-mask anded with the DOW is the DOW */
WHERE (hours.days & field.day_of_week) = field.day_of_week
AND NOT EXISTS (SELECT 1
FROM holiday h
WHERE h.company_id = hours.company_id
AND field.month_of_year = h.month_of_year
AND field.day_of_month = h.day_of_month);
$$
LANGUAGE SQL;
So again, my goal is to be able to get today's schedule by doing this:
SELECT open_time, close_time FROM schedule_view
WHERE company='Acme, Inc.' AND requested_date=CURRENT_DATE;
and also be able to get the schedule for any arbitrary date by doing this:
SELECT open_time, close_time FROM schedule_view
WHERE company='Acme, Inc.' AND requested_date=CAST ('2013-11-01' AS date);
I'm assuming this would require creating the view here referred to as schedule_view but maybe I'm mistaken about that. In any event I want to keep any messy SQL code hidden from usage at the command-line-interface and client-language database libraries, as it currently is in the user-defined function I have.
In other words, I just want to invoke the function I already have by passing the argument in a WHERE clause instead of inside parentheses.
You could create a view with infinite rows by using a recursive CTE. But even that needs a starting point and a terminating condition or it will error out.
A more practical approach with set returning functions (SRF):
WITH x AS (SELECT '2013-10-09'::date AS day) -- supply your date
SELECT company_id, x.day + open_time AS open_ts
, x.day + close_time AS close_ts
FROM (
SELECT *, unnest(arr)::bool AS open, generate_subscripts(arr, 1) AS dow
FROM (SELECT *, string_to_array(days::text, NULL) AS arr FROM hours) sub
) sub2
CROSS JOIN x
WHERE open
AND dow = EXTRACT(ISODOW FROM x.day);
-- AND NOT EXISTS (SELECT 1 FROM holiday WHERE holiday = x.day)
Expanding SRFs side by side is generally frowned upon, and for good reason: it is not in the SQL standard, and it shows surprising behavior if the number of elements is not the same. The new feature WITH ORDINALITY in the upcoming Postgres 9.4 will allow cleaner syntax. Consider this related answer on dba.SE or similarly:
PostgreSQL unnest() with element number
I am assuming bit(7) as most effective data type for days. To work with it, I am converting it to an array in the first subquery sub.
Note the difference between ISODOW and DOW as field pattern for EXTRACT().
Updated question
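Your function looks good, except for this line, which multiplies every result row by the number of rows in holiday (the NOT EXISTS subquery already takes care of holidays):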
CROSS JOIN holiday
Otherwise, if I take the bit-shifting route, I end up with a similar query:
WITH x AS (SELECT '2013-10-09'::date AS day) -- supply your date
,y AS (SELECT day, B'1000000' >> (EXTRACT(ISODOW FROM day)::int - 1) AS dow
FROM x)
SELECT c.company_name, (y.day + open_time) AT TIME ZONE c.timezone AS open_ts
, (y.day + close_time) AT TIME ZONE c.timezone AS close_ts
FROM hours h
JOIN company c USING (company_id)
CROSS JOIN y
WHERE h.days & y.dow = y.dow
AND NOT EXISTS ...
EXTRACT(ISODOW FROM requested_date)::int is just a faster equivalent of to_char(requested_date,'ID')::int
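The bit-mask test used in these queries can be sketched in Python to make the shifting explicit. This is only an illustration of the logic, assuming the 7-character days string has Monday as its leftmost bit, as in the question. The function name is_open_on is hypothetical.

```python
from datetime import date


def is_open_on(days: str, day: date) -> bool:
    """Return True if the 7-bit string `days` (Monday first) has the bit
    for the weekday of `day` set.

    Mirrors: (days & (B'1000000' >> (isodow - 1))) = (B'1000000' >> (isodow - 1))
    """
    mask = 0b1000000 >> (day.isoweekday() - 1)  # Monday=1 ... Sunday=7
    return (int(days, 2) & mask) == mask


print(is_open_on("1111100", date(2013, 10, 9)))   # Wednesday, weekday row: True
print(is_open_on("0000010", date(2013, 10, 9)))   # Wednesday, Saturday row: False
print(is_open_on("0000010", date(2013, 10, 12)))  # Saturday, Saturday row: True
```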
"Pass" day in WHERE clause?
To make that work you would have to generate a huge temporary table covering all possible days before selecting rows for the day in the WHERE clause. Possible (I would employ generate_series()), but very expensive.
My answer to your first draft is a smaller version of this: I expand all rows only for a pattern week before selecting the day matching the date in the WHERE clause. The tricky part is to display timestamps built from the input in the WHERE clause. Not possible. You are back to the huge table covering all days. Unless you have only few companies and a decently small date range, I would not go there.
This is built off the previous answers.
The sample data:
CREATE temp TABLE company (company_id int, company text);
INSERT INTO company VALUES
(1, 'Acme, Inc.')
,(2, 'Amalgamated');
CREATE temp TABLE hours(company_id int, days bit(7), open_time time, close_time time);
INSERT INTO hours VALUES
(1, '1111100', '08:30:00', '17:00:00')
,(2, '0000010', '09:00:00', '12:30:00');
create temp table holidays(company_id int, month_of_year int, day_of_month int);
insert into holidays values
(1, 1, 1),
(2, 1, 1),
(2, 1, 12) -- this was a saturday in 2013
;
First, just matching a date's day of week against the hours table's day of week, using the logic you provided:
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where (b.days & (B'1000000' >> (to_char(current_date,'ID')::int -1)))
= (B'1000000' >> (to_char(current_date,'ID')::int -1))
;
Postgres lets you create custom operators to simplify expressions like in that where clause, so you might want an operator that matches the day of week between a bit string and a date. First the function that performs the test:
CREATE FUNCTION match_day_of_week(bit, date)
RETURNS boolean
AS $$
select ($1 & (B'1000000' >> (to_char($2,'ID')::int -1))) = (B'1000000' >> (to_char($2,'ID')::int -1))
$$
LANGUAGE sql IMMUTABLE STRICT;
You could stop there and make your where clause look something like "where match_day_of_week(days, some-date)". The custom operator makes this look a little prettier:
CREATE OPERATOR == (
leftarg = bit,
rightarg = date,
procedure = match_day_of_week
);
Now you've got syntax sugar to simplify that predicate. Here I've also added in the next test (that the month_of_year and day_of_month of a holiday don't correspond with the supplied date):
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where b.days == current_date
and extract(month from current_date) != month_of_year
and extract(day from current_date) != day_of_month
;
For simplicity I start by adding an extra type (another awesome postgres feature) to encapsulate the month and day of the holiday.
create type month_day as (month_of_year int, day_of_month int);
Now repeat the process above to make another custom operator.
CREATE FUNCTION match_day_of_month(month_day, date)
RETURNS boolean
AS $$
select extract(month from $2) = $1.month_of_year
and extract(day from $2) = $1.day_of_month
$$
LANGUAGE sql IMMUTABLE STRICT;
CREATE OPERATOR == (
leftarg = month_day,
rightarg = date,
procedure = match_day_of_month
);
Finally, the original query is reduced to this:
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where b.days == current_date
and not ((c.month_of_year, c.day_of_month)::month_day == current_date)
;
Reducing that down to a view looks like this:
create view x
as
select b.days,
(c.month_of_year, c.day_of_month)::month_day as holiday,
a.company_id,
b.open_time,
b.close_time
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
;
And you could use that like this:
select company_id, open_time, close_time
from x
where days == current_date
and not (holiday == current_date)
;
Edit: You'll need to work on this logic a bit, by the way - this was more about showing the idea of how to do it with custom operators. For starters, if a company has multiple holidays defined you'll likely get multiple results back for that company.
I posted a similar response on PostgreSQL mailing list. Basically, avoiding the use of a function-invocation API in this situation is likely a foolish decision. The function call is the best API for this use-case. If you have a concrete scenario that you need to support where a function will not work then please provide that and maybe that scenario can be solved without having to compromise the PostgreSQL API. All your comments so far are about planning for an unknown future that very well may never come to be.

Query start and stop dates between two date fields

I have a start order date field and a stop order date field. I need to check the database to see whether the start and stop orders fall outside all of the pay period start and end fields. Say 01-aug-10 to 14-aug-10, 15-aug-10 to 28-aug-10, and 29-aug-10 to 11-sep-10 are all of the pay periods in one month. The start order is 01-aug-10 and the end order is 14-aug-10. Yet when I run SQL that says (where end order not between pay period start and pay period end), the 01-aug-10 to 14-aug-10 order still shows up. What I need is this: if it finds any dates that match, stop looking and go to the next record's start order and stop order, then start the next search, until we reach the end of the records being searched.
I am looking to search by month and by year to keep the query responsive. The database is quite large. My query seems to do the between check only once, and then it shows all of the data that does fit between the pay period start and stop, and that is the data I do not want to see!
What dbms?
So you have a table like this?
CREATE TABLE something
(
pay_period_start date NOT NULL,
pay_period_end date NOT NULL,
CONSTRAINT something_pkey PRIMARY KEY (pay_period_start),
CONSTRAINT something_pay_period_end_key UNIQUE (pay_period_end),
CONSTRAINT something_check CHECK (pay_period_end > pay_period_start)
);
insert into something values ('2010-08-01', '2010-08-14');
insert into something values ('2010-08-15', '2010-08-28');
insert into something values ('2010-08-29', '2010-09-11');
Then I can run this query. ('2010-08-14' is the value of your stop order column or end order column or something like that.)
select * from something
where '2010-08-14' not between pay_period_start and pay_period_end
order by pay_period_start;
and I get
2010-08-15;2010-08-28
2010-08-29;2010-09-11
For pairs of dates, use the OVERLAPS operator. This query
select * from something
where
(date '2010-08-01', date '2010-08-14') overlaps
(pay_period_start, pay_period_end)
order by pay_period_start;
returns
2010-08-01;2010-08-14
To exclude rows where start order and end order exactly match a pay period, use something like this:
select * from something
where (
(date '2010-08-01', date '2010-08-14') overlaps
(pay_period_start, pay_period_end) and
not (date '2010-08-01' = pay_period_start and
date '2010-08-14' = pay_period_end)
)
order by pay_period_start;
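To see why the OVERLAPS query earlier in this answer returns only the first pay period, here is a minimal Python sketch of the half-open comparison that OVERLAPS applies (each period is treated as start <= t < end). The function name overlaps is made up for illustration, and the special case of a zero-length period is not modelled.

```python
from datetime import date

PAY_PERIODS = [
    (date(2010, 8, 1), date(2010, 8, 14)),
    (date(2010, 8, 15), date(2010, 8, 28)),
    (date(2010, 8, 29), date(2010, 9, 11)),
]


def overlaps(start1, end1, start2, end2):
    """Half-open overlap test: each period covers start <= t < end.

    Assumes start < end for both periods.
    """
    return start1 < end2 and start2 < end1


order = (date(2010, 8, 1), date(2010, 8, 14))
matching = [period for period in PAY_PERIODS if overlaps(*order, *period)]
print(matching)  # only the 2010-08-01 .. 2010-08-14 period
```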

making sure "expiration_date - X" falls on a valid "date_of_price" (if not, use the next valid date_of_price)

I have two tables. The first table has two columns: ID and date_of_price. The date_of_price field skips weekend days and bank holidays when markets are closed.
table: trading_dates
ID date_of_price
1 8/7/2008
2 8/8/2008
3 8/11/2008
4 8/12/2008
The second table also has two columns: ID and expiration_date. The expiration_date field is the one day in each month when options expire.
table: expiration_dates
ID expiration_date
1 9/20/2008
2 10/18/2008
3 11/22/2008
I would like to do a query that subtracts a certain number of days from the expiration dates, with the caveat that the resulting date must be a valid date_of_price. If not, then the result should be the next valid date_of_price.
For instance, say we are subtracting 41 days from the expiration_date. 41 days prior to 9/20/2008 is 8/10/2008. Since 8/10/2008 is not a valid date_of_price, we have to skip 8/10/2008. The query should return 8/11/2008, which is the next valid date_of_price.
Any advice would be appreciated! :-)
This will subtract 41 days from the date with ID = 1 in expiration_dates and find the nearest date_of_price on or after the result.
Query
Modify the ID and the 41 for different dates.
SELECT date_of_price
FROM trading_dates
WHERE date_of_price >= (
SELECT DATE_SUB(expiration_date, INTERVAL 41 DAY)
FROM expiration_dates
WHERE ID=1
)
ORDER BY date_of_price ASC
LIMIT 1;
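The same "first valid date on or after the target" lookup can be sketched in Python with a binary search, which is roughly what an index on date_of_price lets the database do. This is an illustration under assumptions: the list holds only the four sample dates from the question, and the function name first_trading_date_on_or_after is hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Must be sorted ascending, like an index on date_of_price.
TRADING_DATES = [
    date(2008, 8, 7),
    date(2008, 8, 8),
    date(2008, 8, 11),
    date(2008, 8, 12),
]


def first_trading_date_on_or_after(trading_dates, expiration, days_back):
    """Return the earliest trading date >= expiration - days_back, or None."""
    target = expiration - timedelta(days=days_back)
    i = bisect_left(trading_dates, target)
    return trading_dates[i] if i < len(trading_dates) else None


# 41 days before 2008-09-20 is 2008-08-10, which is not a trading date.
print(first_trading_date_on_or_after(TRADING_DATES, date(2008, 9, 20), 41))  # 2008-08-11
```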
Performance
To get the best performance from this query, your trading_dates table should have a clustered index on date_of_price (this will make the ORDER BY a no-op) and of course a primary key index on expiration_dates.ID (to look up the expiration date quickly).
Don't put in the clustered index blindly though. If you update or insert values more often than you look up expirations like this, then don't put it in since all your inserts/updates will have an added overhead to keep the clustered index.