What indexes should be created for date subtraction in view?


I have a view (my_view) with a calculated column (days_since_my_date) that gets the difference (in days) between today and a date column (my_date from my_table):
CREATE VIEW my_view AS
SELECT 'now'::text::date - my_date AS days_since_my_date,
...
FROM my_table;
What indexes (if any) do I need to optimize greater than/less than date queries on the view's calculated column (days_since_my_date)? I'm assuming they would need to be applied to the my_date column in my_table. The queries would be fairly simple, similar to the following:
SELECT *
FROM my_view
WHERE days_since_my_date >= 10;
A standard index created against my_date, like the one below, doesn't get hit during the above query:
CREATE INDEX my_date_idx on my_table(my_date);
Any help is much appreciated.

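You can't index your expression because it depends on the current date, and PostgreSQL only allows immutable functions in index expressions.
Instead of comparing the computed column, you should compare the indexed column against a value that is constant for the duration of the query (this assumes the view also exposes my_date):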
SELECT *
FROM my_view
WHERE my_date <= NOW() - '10 days'::INTERVAL
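A minimal sketch of the same idea using plain date arithmetic, added here for illustration. It assumes that my_view exposes my_date among the columns elided in the question, and the IF NOT EXISTS clause assumes PostgreSQL 9.5 or later. Whether the planner actually picks the index still depends on table size and selectivity, so check the plan rather than taking it for granted.

```sql
-- Plain b-tree index on the base column (same definition as in the question).
CREATE INDEX IF NOT EXISTS my_date_idx ON my_table (my_date);

-- days_since_my_date >= 10  is equivalent to  my_date <= CURRENT_DATE - 10.
-- The right-hand side is evaluated once per query, so my_date_idx is usable.
-- Assumption: my_view exposes my_date as one of its columns.
SELECT *
FROM my_view
WHERE my_date <= CURRENT_DATE - 10;

-- Inspect the plan to see whether the index is chosen.
EXPLAIN
SELECT *
FROM my_view
WHERE my_date <= CURRENT_DATE - 10;
```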

Related

Postgres greater than or null

I am trying to efficiently use an index for the greater than or null query.
Example of a query would be:
select * from table where date > '2020-01-01' or date is null
Which index, if any, do I need so that Postgres can do this efficiently? I tried creating an index with
create index date on table (date asc nulls last)
but it does not seem to help: the best I am able to get is two bitmap scans (one for the greater-than condition and one for the null check).

If you are able to rewrite your condition, you could replace the null value with a date that is guaranteed to be greater than the comparison value:
where coalesce(date, 'infinity') > date '2020-01-01'
Then create an index on that expression:
create index on the_table ( (coalesce(date, 'infinity')) )
See also PostgreSQL docs:
Date/Time Types, 8.5.1.4. Special Values for infinity value
Conditional Expressions, 9.18.2. COALESCE for coalesce function

Does Postgres use the index correctly when you use union all?
select *
from table
where date > '2020-01-01'
union all
select *
from table
where date is null;
The issue might be the inequality combined with the NULL comparison. If this is some sort of "end date", then you might consider using some far out future value such as 9999-01-01 or infinity.

current_timestamp redshift

When I select current_timestamp from redshift, I am getting a list of values instead of one value.
Here is my SQL query
select current_timestamp
from stg.table;
Does anybody know why I am getting a list of values instead of a single value?

This is your query:
select current_timestamp from stg.table
This produces as many rows as there are in table stg.table (since that's the from clause), with a single column that always contains the current date/time on each row. On the other hand, if the table contains no row, the query returns no rows.
If you want just one row, use a select without a from clause:
select current_timestamp as my_timestamp

You will receive one row for each row in stg.table. According to the Redshift docs, you should be using GETDATE() or SYSDATE instead. Perhaps you want, e.g.:
select GETDATE() as my_timestamp

SQL query to retrieve last twelve months data for Oracle database

I am using an Oracle database. I have a table called "TEST" where the dates/timestamps are stored as "Char" values in the following format. I want to retrieve the records from the last twelve months, based on today's date. What would be the correct way to do that?
TESTCOLUMN
------------
2019-06-28-02.01.07.327240
2020-06-28-04.49.12.480240
2020-06-28-05.05.10.681240

I think you need to use the ADD_MONTHS function and BETWEEN clause as follows:
SELECT * FROM YOUR_TABLE
WHERE TO_TIMESTAMP(YOUR_COLUMN,'YYYY-MM-DD-HH24.MI.SS.FF')
BETWEEN ADD_MONTHS(SYSTIMESTAMP,-12) AND SYSTIMESTAMP;

Although storing date/time values as strings is not recommended, your format sorts chronologically. So you can do the comparison using strings rather than dates/timestamps. Assuming your values are only in the past:
where testcolumn >= to_char(add_months(SYSTIMESTAMP, -12), 'YYYY-MM-DD-HH24.MI.SS')
ADD_MONTHS returns a DATE, so the format stops at seconds; the shorter string still compares correctly as a prefix of the stored values.
This has an advantage over the TO_TIMESTAMP solution, because it can make use of an index (or partitions) on testcolumn. Applying the date manipulation only to the "constants" (i.e. the system timestamp) helps the Oracle optimizer.
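A rough sketch of the index-friendly string comparison, for illustration only. The index name test_testcolumn_idx is hypothetical; TEST and TESTCOLUMN come from the question. It assumes every stored value follows the YYYY-MM-DD-HH24.MI.SS.FF layout shown there, since a string comparison gives wrong results for any value in a different layout.

```sql
-- Hypothetical index name; the column is left untouched in the predicate
-- below, so the optimizer is free to use this index.
CREATE INDEX test_testcolumn_idx ON test (testcolumn);

-- Only the "constant" side is manipulated. ADD_MONTHS returns a DATE,
-- so the format mask stops at seconds.
SELECT *
FROM test
WHERE testcolumn >= TO_CHAR(ADD_MONTHS(SYSDATE, -12), 'YYYY-MM-DD-HH24.MI.SS');
```

By contrast, a predicate that wraps testcolumn in TO_TIMESTAMP cannot use this plain index, because the indexed value is no longer what is being compared.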

Using an UDF to query a table in Hive

I have the following UDF available on Hive to convert a time bigint to date,
to_date(from_utc_timestamp(from_unixtime(cast(listed_time/1000 AS bigint)),'PST'))
I want to use this UDF to query a table on a specific date. Something like,
SELECT * FROM <table_name>
WHERE date = '2020-03-01'
ORDER BY <something>
LIMIT 10

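I would suggest changing the logic: avoid applying the function to the column being filtered, because it is an inefficient approach. The function needs to be invoked for every row, which prevents the query from benefiting from an index.
Instead, you can convert the input date to a unix timestamp in milliseconds and compare the raw column (listed_time) against it. Since a date covers a whole day, filter on a range rather than on a single value. This should look something like the following (assuming the session time zone is UTC, as the original expression implies):
SELECT * FROM <table_name>
WHERE listed_time >= unix_timestamp(to_utc_timestamp('2020-03-01 00:00:00', 'PST')) * 1000
AND listed_time < unix_timestamp(to_utc_timestamp('2020-03-02 00:00:00', 'PST')) * 1000
ORDER BY <something>
LIMIT 10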

Cannot query over table without a filter that can be used for partition elimination

I have a partitioned table and would love to use a MERGE statement, but for some reason it doesn't work.
MERGE `wr_live.p_email_event` t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
I get
Cannot query over table 'wr_live.p_email_event' without a filter that
can be used for partition elimination.
What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?

What's the proper syntax?
As you can see from the error message, your partitioned wr_live.p_email_event table was created with require partition filter set to true. This means that any query over this table must have a filter on the respective partitioning field.
Assuming that timestamp IS that partitioning field, you can do something like below:
MERGE `wr_live.p_email_event` t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
So you need to tune the line below so that, in reality, it does not filter out anything you need to be involved:
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
For example, I found that setting it to a timestamp in the future addresses the issue in many cases, like
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
Of course, if your wr_live.email_event table is also partitioned with require partition filter set to true, you need to add the same filter for s.timestamp.
Also is there a way I can express shorter the insert stuff? without naming all columns?
BigQuery DML's INSERT requires column names to be specified; there is no way (at least that I am aware of) to avoid it using an INSERT statement.
Meanwhile, you can avoid this by using DDL's CREATE TABLE from the result of a query. This does not require listing the columns.
For example, something like below:
CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event`
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
SELECT 1 FROM `wr_live.p_email_event` t
WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)
You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(
