Postgres greater than or null - sql

I am trying to efficiently use an index for a "greater than or null" query. An example of such a query would be:
select * from table where date > '2020-01-01' or date is null
What index, if any, do I need so that Postgres can run this efficiently? I tried creating an index with
create index date on table (date asc nulls last)
but it does not seem to work, and the best I can get is two bitmap scans (one for the greater-than condition and one for the null check).

If you are able to rewrite your condition, you could replace the null value with a date that is guaranteed to be greater than the comparison value:
where coalesce(date, 'infinity') > date '2020-01-01'
Then create an index on that expression:
create index on the_table ( (coalesce(date, 'infinity')) )
See also the PostgreSQL docs:
Date/Time Types, 8.5.1.4. Special Values (for the infinity value)
Conditional Expressions, 9.18.2. COALESCE (for the coalesce function)
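For completeness, a minimal sketch of how the pieces fit together (the_table and the column name date are placeholders taken from the thread; on a small, freshly created table the planner may still prefer a sequential scan until there are enough rows and statistics):
-- hypothetical table mirroring the question: a nullable date column
create table the_table (id int, date date);
-- expression index: nulls are folded into 'infinity', so a single btree
-- range scan covers both the greater-than rows and the former null rows
create index on the_table ( (coalesce(date, 'infinity')) );
-- the query must repeat the exact same expression for the index to match
explain select * from the_table
where coalesce(date, 'infinity') > date '2020-01-01';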

Does Postgres use the index correctly when you use union all?
select *
from table
where date > '2020-01-01'
union all
select *
from table
where date is null;
The issue might be the inequality combined with the NULL comparison. If this is some sort of "end date", then you might consider replacing the nulls with a far-future sentinel value such as 9999-01-01 or infinity.
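If you can change the data model, the sentinel approach could look like this (a sketch; the_table stands in for the question's table, and it assumes nothing else depends on the nulls):
-- replace the null markers with a value that sorts after every real date
update the_table set date = 'infinity' where date is null;
alter table the_table alter column date set not null;
-- the filter then collapses to a single indexable range condition
select * from the_table where date > '2020-01-01';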

current_timestamp redshift

When I select current_timestamp from redshift, I am getting a list of values instead of one value.
Here is my SQL query
select current_timestamp
from stg.table;
Does anybody know why I am getting a list of values instead of a single value?
This is your query:
select current_timestamp from stg.table
This produces as many rows as there are in table stg.table (since that's the from clause), with a single column that always contains the current date/time on each row. Conversely, if the table contains no rows, the query returns no rows.
If you want just one row, run the query without a from clause:
select current_timestamp as my_timestamp
That is, as long as you keep the from clause, you will receive a row for each row in stg.table. Also, according to the Redshift docs you should be using GETDATE() or SYSDATE instead (note that SYSDATE takes no parentheses in Redshift). Perhaps you want, e.g.:
select GETDATE() as my_timestamp
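And if the intent was actually to stamp every row with the query time, keeping the from clause is fine (a sketch reusing the question's placeholder name; a table really called "table" would need quoting):
select t.*, getdate() as my_timestamp
from stg.table t;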

Why is a Date/Time query in Access returning no values?

A table has an End_Date column (data type: Date/Time) holding the last date of every month.
When I run the query, I expect all records with an End_Date earlier than, for example, 31-Dec-2019:
Select * from Table where End_Date < 31/12/2019
But it returns no result
When dealing with dates in Access, you need to ensure that they are wrapped in octothorpes ("#"):
SELECT * FROM Table WHERE End_Date < #31/12/2019#
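One caveat: Access reads #...# literals in US order (mm/dd/yyyy) first and only falls back to day-first when the value would be invalid as a month (as with 31 here), so an unambiguous literal is safer:
SELECT * FROM Table WHERE End_Date < #12/31/2019#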

Using a UDF to query a table in Hive

I have the following UDF available in Hive to convert a bigint time value to a date:
to_date(from_utc_timestamp(from_unixtime(cast(listed_time/1000 AS bigint)),'PST'))
I want to use this UDF to query a table on a specific date. Something like,
SELECT * FROM <table_name>
WHERE date = '2020-03-01'
ORDER BY <something>
LIMIT 10
I would suggest changing the logic: avoid applying the function to the column being filtered, because that is an inefficient approach. The function has to be invoked for every row, which prevents the query from benefiting from an index.
Instead, you can convert the input date to the corresponding unix timestamps (possibly with a UDF) and filter the raw bigint column directly. Since the original expression truncates to a whole day, the equality test on the date becomes a half-open range on listed_time. This could look like:
SELECT * FROM <table_name>
WHERE listed_time >= unix_timestamp('2020-03-01', 'yyyy-MM-dd') * 1000
  AND listed_time <  unix_timestamp('2020-03-02', 'yyyy-MM-dd') * 1000
ORDER BY <something>
LIMIT 10
(Note that unix_timestamp() interprets the string in the session time zone, while the original expression shifts to PST, so the day boundaries may be offset by a few hours; adjust the bounds if that matters for your data.)

What indexes should be created for date subtraction in view?

I have a view (my_view) with a calculated column (days_since_my_date) that gets the difference (in days) between today and a date column (my_date from my_table):
CREATE VIEW my_view AS
SELECT 'now'::text::date - my_date AS days_since_my_date,
...
FROM my_table;
What indexes (if any) do I need to optimize greater than/less than date queries on the view's calculated column (days_since_my_date)? I'm assuming they would need to be applied to the my_date column in my_table. The queries would be fairly simple, similar to the following:
SELECT *
FROM my_view
WHERE days_since_my_date >= 10;
A standard index created against my_date, like the one below, doesn't get hit during the above query:
CREATE INDEX my_date_idx on my_table(my_date);
You can't index that expression because it is not immutable: it depends on the current date, which changes between executions.
Instead of filtering on the computed column, compare the indexed column against a value that is constant for the duration of the query:
SELECT *
FROM my_view
WHERE my_date <= NOW() - '10 days'::INTERVAL
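With the predicate rewritten this way it becomes sargable, so the plain index from the question can be used. A quick way to verify (assuming the table and index names from the question; with enough rows, the plan should show an index or bitmap scan on my_date_idx):
EXPLAIN
SELECT *
FROM my_table
WHERE my_date <= NOW() - '10 days'::INTERVAL;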

What's the proper index for querying structures in arrays in Postgres jsonb?

I'm experimenting with keeping values like the following in a jsonb column in Postgres 9.4:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
I'm executing queries like:
SELECT * FROM locations
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(events) AS e
WHERE (
e->>'event_slug' = 'test_1' AND
(
e->>'start_time' >= '2014-10-30 14:04:06 -0400' OR
e->>'end_time' >= '2014-10-30 14:04:06 -0400'
)
)
)
How would I create an index on that data so that queries like the above can utilize it? Does this sound like a reasonable design for a few million rows that each contain ~10 events in that column?
Worth noting that it seems I'm still getting sequential scans with:
CREATE INDEX events_gin_idx ON locations USING GIN (events);
which I'm guessing is because the first thing I'm doing in the query is converting data to json array elements.
First of all, you cannot access JSON array values like that. For a given json value:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
A valid test against the first array element would be:
WHERE e->0->>'event_slug' = 'test_1'
But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.
The built-in operator classes for GIN indexes do not support the "greater than" or "less than" operators > >= < <=. This is true for jsonb as well, where you can choose between two operator classes. Per the manual:
Name            Indexed Data Type   Indexable Operators
...
jsonb_ops       jsonb               ? ?& ?| @>
jsonb_path_ops  jsonb               @>
(jsonb_ops being the default.) You can cover the equality test with the containment operator @>, but neither of those operator classes covers your requirement for >= comparisons. You would need a btree index.
Basic solution
To support the equality check with an index:
CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);
SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';
This might be good enough if the filter is selective enough.
Assuming end_time >= start_time, so we don't need two checks. Checking only end_time is cheaper and equivalent:
SELECT l.*
FROM locations l
, jsonb_array_elements(l.events) e
WHERE l.events @> '[{"event_slug":"test_1"}]'
AND (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;
Utilizing an implicit JOIN LATERAL. Details (last chapter):
PostgreSQL unnest() with element number
Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly, or else the time zone part is ignored. The above query should work as desired. Detailed explanation:
Ignoring time zones altogether in Rails and PostgreSQL
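A quick way to see that pitfall in isolation:
-- cast to timestamp: the '-0400' offset is silently discarded
SELECT '2014-10-30 14:04:06 -0400'::timestamp;   -- 2014-10-30 14:04:06
-- cast to timestamptz: the offset is honored and the value is shifted
-- into the session time zone
SELECT '2014-10-30 14:04:06 -0400'::timestamptz;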
More explanation for jsonb_array_elements():
PostgreSQL joining using JSONB
Advanced solution
If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.
The code assumes that your JSON values have a consistent format as displayed in the question.
Setup:
CREATE TYPE event_type AS (
  event_slug text
, start_time timestamp
, end_time timestamp
);
CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time -- start_time not needed
FROM locations l, jsonb_populate_recordset(null::event_type, l.events) e;
Related answer for jsonb_populate_recordset():
How to convert PostgreSQL 9.4's jsonb type to float
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);
Also including location_id to allow index-only scans. (See manual page and Postgres Wiki.)
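Keep in mind that a materialized view is a snapshot: it does not update itself, so refresh it whenever the underlying locations table changes:
REFRESH MATERIALIZED VIEW loc_event;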
Query:
SELECT *
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz;
Or, if you need full rows from the underlying locations table:
SELECT l.*
FROM (
SELECT DISTINCT location_id
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz
) le
JOIN locations l USING (location_id);
Note that you cannot create an index on the output of jsonb_array_elements(); an index has to be defined on the table itself. The closest working equivalent of that idea is a GIN index on the column:
CREATE INDEX json_array_elements_index ON locations
USING gin (events jsonb_path_ops);
Should get you started in the right direction.