Insert the generated sequence into the Redshift table

I ran into a problem with Redshift. I'm generating a sequence of dates and want to store it in a table so I can work with the range. But Redshift only supports generate_series() on the leader node; the generated data cannot be inserted into a table on the compute nodes. Nowhere in the documentation have I found how to insert generated sequences into tables. Has anyone encountered this problem and can share their experience solving it?
My sequence:
SELECT date '2019-12-31' + i AS date_range
FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) i;
My query:
CREATE TABLE public.date_sequence AS (
SELECT date '2019-12-31' + i AS date_range
FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) i
);
I also tried inserting the data from a CTE, and inserting into a temporary table. The result is the same:
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.

In Amazon Redshift you can't populate a table with data using CREATE TABLE; you have to use an INSERT, as far as I know. My suggestion would be something like this:
INSERT INTO public.date_sequence
SELECT date '2019-12-31' + i AS date_range
FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) i;

generate_series() is an unsupported function on Redshift. While it will still run on the leader node, it is not a good idea to base your solution on such functions.
The better and supported approach is to use a recursive CTE - see https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html
I wrote up an example for making a series of dates in this answer - trying to create a date table in Redshift
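For reference, a minimal sketch of that recursive-CTE approach, reusing the table name and date bounds from the question (a starting point, not a tested solution):

```sql
-- Build the date series with a recursive CTE (supported by Redshift)
-- instead of generate_series(). Redshift requires the column list
-- after the CTE name.
CREATE TABLE public.date_sequence AS
WITH RECURSIVE dates (date_range) AS (
    SELECT date '2020-01-01' AS date_range
    UNION ALL
    SELECT (date_range + 1)::date
    FROM dates
    WHERE date_range < date '2040-12-31'
)
SELECT date_range FROM dates;
```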

Related

Add a day to a date in Postgres and SQL Server

I'm looking for a way to add a day to a date in both Postgres and SQL Server so I don't have to add an IF condition checking which database the server is running.
DATEADD(day, 1, STOP_DATE)
doesn't work in PostgreSQL, and
STOP_DATE + 1
doesn't work in SQL Server.
Overall, it is not a good idea to try to write SQL using syntax that is common to both SQL Server and Postgres. You are severely limiting yourself and will sooner or later come across a query that runs too slowly because it doesn't use syntax specific to one of the DBMSs.
For example, with your approach you are artificially refusing to use lateral joins, because their syntax differs between Postgres (LATERAL) and SQL Server (CROSS/OUTER APPLY).
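To illustrate the syntax difference, here is the same "most expensive item per order" query sketched in both dialects (table and column names are made up):

```sql
-- PostgreSQL: LATERAL lets the subquery reference the outer row.
SELECT o.id, t.top_item
FROM orders o
CROSS JOIN LATERAL (
    SELECT i.item AS top_item
    FROM order_items i
    WHERE i.order_id = o.id
    ORDER BY i.price DESC
    LIMIT 1
) t;

-- SQL Server: CROSS APPLY plays the same role.
SELECT o.id, t.top_item
FROM orders o
CROSS APPLY (
    SELECT TOP 1 i.item AS top_item
    FROM order_items i
    WHERE i.order_id = o.id
    ORDER BY i.price DESC
) t;
```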
Back to your question.
You can add an integer value to a date value in Postgres and to a datetime value in SQL Server.
SQL Server
CREATE TABLE T(d datetime);
INSERT INTO T VALUES ('2020-01-01');
SELECT
d, d+1 AS NextDay
FROM T
http://sqlfiddle.com/#!18/d519d9/1
This will not work with date or datetime2 data types in SQL Server, only datetime.
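For those types, DATEADD is the way to go; a quick illustration:

```sql
-- DATEADD works for date, datetime and datetime2 alike in SQL Server:
SELECT DATEADD(day, 1, CAST('2020-01-01' AS date)) AS NextDay;
-- returns 2020-01-02
```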
Postgres
CREATE TABLE T(d date);
INSERT INTO T VALUES ('2020-01-01');
SELECT
d, d+1 AS NextDay
FROM T
http://sqlfiddle.com/#!17/b9670/2
I don't know if it will work with other data types.
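For what it's worth, in Postgres integer addition is defined for date but not for timestamp; for a timestamp column you would add an interval instead:

```sql
-- timestamp + integer raises an error; use an interval:
SELECT timestamp '2020-01-01 12:00' + interval '1 day' AS next_day;
-- returns 2020-01-02 12:00:00
```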
Define a function in PostgreSQL that works like the SQL Server function.
Edit: since you can't pass day as a parameter the same way on both systems, create a function with the same name on each database system that adds a day accordingly.
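A sketch of that idea (the function name add_one_day is made up; define one per system and call it identically from shared code):

```sql
-- PostgreSQL version:
CREATE FUNCTION add_one_day(d date) RETURNS date
AS $$ SELECT d + 1 $$
LANGUAGE SQL;

-- SQL Server version (same name, same behavior):
CREATE FUNCTION dbo.add_one_day(@d datetime) RETURNS datetime
AS
BEGIN
    RETURN DATEADD(day, 1, @d);
END;
```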

Google BigQuery: Add date to table name when creating a table

I am writing a query that I am planning to schedule using the BigQuery UI.
I would like to add a _TABLE_SUFFIX to this table which is equal to CURRENT_DATE.
How could I achieve that?
This is the query I am working on:
IF
today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
CREATE TABLE `project.dataset.tablename_<insert_current_date_here>`
AS
SELECT CURRENT_DATE() as today;
END IF;
Update (2023-01-09): I think Samuel's approach using an official templating solution here is ideal.
The best bet would be to generate the query dynamically, and then execute it statically.
This could be done using something like python.
from datetime import datetime

def get_query():
    # Use a compact date stamp; str(datetime.now()) would produce
    # spaces and colons, which are invalid in a BigQuery table name.
    return '''IF
  today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
  CREATE TABLE `project.dataset.tablename_%s`
  AS
  SELECT CURRENT_DATE() as today;
END IF;''' % datetime.now().strftime('%Y%m%d')
BigQuery supports a template system for destination table names in scheduled queries. To add the current date to the table name, use the provided template syntax. For example, tablename_{run_time|"%Y%m%d"} would output tablename_YYYYMMDD.
You could (whether you should is another debate) create dynamic table names via BQ's SQL procedural language capability, specifically the EXECUTE IMMEDIATE statement.
e.g.
DECLARE today STRING DEFAULT STRING(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY));
EXECUTE IMMEDIATE format("""
CREATE TABLE `project.dataset.tablename_%s` AS
SELECT CURRENT_DATE() as today
""", today);
For more also see towardsdatascience.com/how-to-use-dynamic-sql-in-bigquery.
Note that you might also get error-location issues with EXECUTE IMMEDIATE; if so, try changing/checking your processing location in Query Settings, see here.

Redshift - Issue displaying a time difference stored in a table

I am trying to find the time difference between two timestamps and store it in a column. When I check the output in the table, I see a huge number rather than the difference in days/hours. I am using Amazon Redshift as the database.
Data type of the time_duration column: varchar
Given below is the sample:
order_no,order_date,complain_date,time_duration
1001,2018-03-10 04:00:00,2018-03-11 07:00:00,97200000000
But I am expecting the time_duration column to show 1 day, 3 hours.
This happens when I store time_duration in a table and then query to view the output.
Could anyone assist?
Thanks.
The huge number you are seeing is the interval expressed in microseconds (97200000000 µs = 27 hours, i.e. 1 day, 3 hours). Do it the following way; it will give the difference in hours:
select datediff(hour, order_date, complain_date) as diff_in_hour from your_table;
If you want it in days, do it the following way:
select datediff(day, order_date, complain_date) as diff_in_day from your_table;
You could use the datediff function to update your table's time_duration column.
Refer to the Redshift documentation for more details.
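If you want the "1 day, 3 hours" presentation from the question, one sketch (column names from the question; your_table is a placeholder) is to split the hour difference:

```sql
SELECT
    order_no,
    DATEDIFF(hour, order_date, complain_date) / 24 AS diff_days,
    DATEDIFF(hour, order_date, complain_date) % 24 AS diff_hours
FROM your_table;
-- For 2018-03-10 04:00:00 vs 2018-03-11 07:00:00 this gives 1 and 3.
```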

Cannot query over table without a filter that can be used for partition elimination

I have a partitioned table and would love to use a MERGE statement, but for some reason it doesn't work.
MERGE `wr_live.p_email_event` t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
I get
Cannot query over table 'wr_live.p_email_event' without a filter that
can be used for partition elimination.
What's the proper syntax? Also, is there a way to express the insert part more concisely, without naming all the columns?
What's the proper syntax?
As you can see from the error message, your partitioned wr_live.p_email_event table was created with require_partition_filter set to true. This means that any query over this table must have a filter on the partitioning field.
Assuming that timestamp IS that partitioning field, you can do something like below:
MERGE `wr_live.p_email_event` t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
So you need to make the line below such that in reality it does not filter out whatever you need to be involved:
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
For example, I found that setting it to a date in the future in many cases addresses the issue, like
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
Of course, if your wr_live.email_event table is also partitioned with require_partition_filter set to true, you need to add the same kind of filter on s.timestamp.
Also, is there a way to express the insert part more concisely, without naming all the columns?
BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it with an INSERT statement.
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of a query - this does not require listing the columns.
For example, something like below:
CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event`
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
SELECT 1 FROM `wr_live.p_email_event` t
WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)
You might also want to include a table options list via OPTIONS() - but it looks like the require_partition_filter attribute is not supported there yet - so if you do have/need it, the above will "erase" this attribute :o(

How to find the difference between 2 dates in DB2

In my database I have a TIMESTAMP field. Now I want to find the difference between the value stored in this field and the current timestamp. I want the result in days.
How can I get it?
Note: I am using a DB2 database.
days(TIMESTAMP) - days(your_col)
You don't say which version of DB2 you're using.
In older versions, you have to SELECT from a table, even though you're not retrieving any of its columns.
SELECT days(TIMESTAMP) - days(your_col)
FROM SYSIBM.SYSTABLES
Substitute a creator and table that you have SELECT authority for.
I shall reiterate what Gilbert answered. He was correct, but I don't think the original author understood the idea. As an example:
create table test ( mycolumn timestamp )
insert into test values (current timestamp)
select * from test
select days(current timestamp)-days(mycolumn) from test
The current timestamp is a special register that DB2 makes available. DAYS is a built-in function that returns the number of days from January 1 of year 1 up to the date of the given timestamp.
CURRENT DATE returns the current date, so this can also be used:
DAYS(CURRENT_DATE) - DAYS(your_col)