I am writing a query that I plan to schedule using the BigQuery UI.
I would like to add a _TABLE_SUFFIX to this table equal to CURRENT_DATE.
How can I achieve that?
This is the query I am working on:
IF
today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
CREATE TABLE `project.dataset.tablename_<insert_current_date_here>`
AS
SELECT CURRENT_DATE() as today;
END IF;
Update (2023-01-09): I think Samuel's approach using an official templating solution here is ideal.
The best bet would be to generate the query dynamically, and then execute it statically.
This could be done using something like Python.
from datetime import datetime

def get_query():
    # Format the date as YYYYMMDD so it is a valid table-name suffix;
    # str(datetime.now()) would contain spaces and colons, which are
    # not allowed in a table name.
    suffix = datetime.now().strftime('%Y%m%d')
    return '''IF
  today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
  CREATE TABLE `project.dataset.tablename_%s`
  AS
  SELECT CURRENT_DATE() as today;
END IF;''' % suffix
BigQuery supports a template system for destination table names in scheduled queries. To add the current date to the table name, use the provided template syntax. For example, tablename_{run_time|"%Y%m%d"} would output tablename_YYYYMMDD.
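The format string in that parameter follows strftime-style codes, so you can preview the resulting suffix locally. A minimal Python sketch (the base table name and date here are made up for illustration):

```python
from datetime import datetime

def table_name(base: str, run_time: datetime) -> str:
    # Apply the same strftime-style format that a destination table
    # template like tablename_{run_time|"%Y%m%d"} would use.
    return f"{base}_{run_time.strftime('%Y%m%d')}"

print(table_name("tablename", datetime(2023, 1, 9)))  # tablename_20230109
```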
You could (whether you should is another debate) create dynamic table names via BQ's SQL procedural language capability, specifically the EXECUTE IMMEDIATE statement.
e.g.
DECLARE today STRING DEFAULT STRING(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY));
EXECUTE IMMEDIATE format("""
CREATE TABLE `project.dataset.tablename_%s` AS
SELECT CURRENT_DATE() as today
""", today);
For more, see also towardsdatascience.com/how-to-use-dynamic-sql-in-bigquery.
Note: you might also get error location issues with EXECUTE IMMEDIATE; if so, try checking/changing your processing location in Query Settings.
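For what it's worth, you can preview the DDL string that EXECUTE IMMEDIATE would run by reproducing the assembly outside BigQuery. A rough Python sketch of the script above (project and dataset names are placeholders from the example):

```python
from datetime import date, timedelta

def build_ddl(today: date) -> str:
    # Mirror DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY):
    # the last day of the previous month, as an ISO date string.
    suffix = today.replace(day=1) - timedelta(days=1)
    template = (
        "CREATE TABLE `project.dataset.tablename_%s` AS\n"
        "SELECT CURRENT_DATE() as today"
    )
    return template % suffix.isoformat()

print(build_ddl(date(2023, 1, 9)))
```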
Related
I ran into a problem with Redshift. I'm generating a sequence of dates and want to embed it in a table to work with the range. But Redshift only supports generation on the leader node. It is not possible to insert the data on the nodes. Nowhere in the documentation have I found information on how to insert the generated sequences into tables. Maybe someone has encountered such a problem and can share their experience in solving it?
My sequence:
SELECT date '2019-12-31' + n AS date_range
FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) AS n;
My query:
CREATE TABLE public.date_sequence AS (
  SELECT date '2019-12-31' + n AS date_range
  FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) AS n
);
I also tried inserting the data from a CTE, and inserting into a temporary table. The result is the same:
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
In Amazon Redshift you can't populate a table with data using CREATE TABLE; you have to use an INSERT, as far as I know. My suggestion would be something like this:
INSERT INTO public.date_sequence
SELECT date '2019-12-31' + n AS date_range
FROM generate_series(1, (date '2041-01-01' - date '2020-01-01')) AS n;
generate_series() is an unsupported function on Redshift. While it will still work on the leader node, it is not a good idea to base your solution on such functions.
The better and supported approach is to use a recursive CTE - see https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html
I wrote up an example for making a series of dates in this answer - trying to create a date table in Redshift
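If loading from a client is an option, another workaround (my own suggestion, not from the answers above) is to generate the dates in application code and batch-insert them, sidestepping leader-node-only functions entirely. A Python sketch:

```python
from datetime import date, timedelta

def date_range(start: date, end: date):
    """Yield dates after `start` and before `end`, one day apart,
    mimicking date '2019-12-31' + n for n = 1, 2, ..."""
    d = start + timedelta(days=1)
    while d < end:
        yield d
        d += timedelta(days=1)

dates = list(date_range(date(2019, 12, 31), date(2020, 1, 4)))
print(dates[0], dates[-1])  # 2020-01-01 2020-01-03
```

The resulting list can then be written with a multi-row INSERT or a COPY from a staged file, both of which run on the compute nodes.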
I am new to BigQuery. In my code I want to be able to declare variable names that I can reference in my table names and where clauses. Is this possible in BigQuery? Just so when I rerun the code I don't have to find the parts to change.
So I would like to have this at the start of my code, and change it every time I run the code:
Start_date = '2020-05-01'
End_date = '2020-06-01'
Selection_Date = '20200602'
Create table test_Selection_Date as Select * from sales Where date >= Start_date and date <= End_date;
Does anybody know how this is possible and how I can code it in BigQuery? Thanks
Below is for BigQuery Standard SQL and should give you a good start
DECLARE Start_date, End_date, Selection_Date STRING;
SET (Start_date, End_date, Selection_Date) = ('2020-05-01', '2020-06-01', '20200602');
EXECUTE IMMEDIATE FORMAT(
"Create table test_%s as Select * from sales Where date >= '%s' and date <= '%s'",
Selection_Date, Start_date, End_date
);
See more about scripting to tune the above to your real use case.
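To see what that FORMAT() call actually hands to EXECUTE IMMEDIATE, you can reproduce the substitution in Python (same placeholder values as the script above):

```python
# Rebuild the statement exactly as FORMAT() would, substituting
# Selection_Date, Start_date and End_date in order.
template = (
    "Create table test_%s as Select * from sales "
    "Where date >= '%s' and date <= '%s'"
)
sql = template % ("20200602", "2020-05-01", "2020-06-01")
print(sql)
```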
I have a partitioned table and would love to use a MERGE statement, but for some reason it doesn't work out.
MERGE `wr_live.p_email_event` t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
I get
Cannot query over table 'wr_live.p_email_event' without a filter that
can be used for partition elimination.
What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?
What's the proper syntax?
As you can see from the error message, your partitioned wr_live.p_email_event table was created with require partition filter set to true. This means that any query over this table must have a filter on the respective partitioning field.
Assuming that timestamp IS that partitioning field, you can do something like below:
MERGE `wr_live.p_email_event` t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
So you need to tune the line below so that in reality it does not filter out whatever needs to be involved:
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
For example, I found that setting it to a timestamp in the future addresses the issue in many cases, like
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
Of course, if your wr_live.email_event table is also partitioned with require partition filter set to true, you need to add the same filter for s.timestamp.
Also is there a way I can express shorter the insert stuff? without naming all columns?
BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid this with the INSERT statement.
Meanwhile, you can avoid it by using DDL's CREATE TABLE from the result of a query, which does not require listing the columns.
For example, something like below
CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event`
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
SELECT 1 FROM `wr_live.p_email_event` t
WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)
You might also want to include a table options list via OPTIONS() - but it looks like the partition filter attribute is not supported yet - so if you do have/need it, the above will "erase" this attribute :o(
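The UNION ALL / NOT EXISTS rebuild above boils down to: keep every existing row, and append only source rows whose key is new. A toy Python sketch of that logic, with key columns taken from the MERGE example:

```python
def merge_no_duplicates(target, source):
    # Keys mirror the MERGE ON clause: (user_id, event, timestamp).
    existing = {(r["user_id"], r["event"], r["timestamp"]) for r in target}
    return target + [
        r for r in source
        if (r["user_id"], r["event"], r["timestamp"]) not in existing
    ]

target = [{"user_id": 1, "event": "open", "timestamp": "t1"}]
source = [
    {"user_id": 1, "event": "open", "timestamp": "t1"},   # duplicate: skipped
    {"user_id": 2, "event": "click", "timestamp": "t2"},  # new: appended
]
merged = merge_no_duplicates(target, source)
print(len(merged))  # 2
```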
I run a daily report that has to query another table which is updated separately. Due to the high volume of records in the source table (8M+ per day), each day is stored in its own partition. The partition name has a standard format: P + 4-digit year + 2-digit month + 2-digit day, so yesterday's partition is P20140907.
At the moment I use this expression, but have to manually change the name of the partition each day:
select * from <source_table> partition (P20140907) where ....
By using SYSDATE, TO_CHAR and CONCAT I have created another table called P_NAME2 that will automatically generate and update a string value as the name of the partition that I need to read. Now I need to update my main query so it does this:
select * from <source_table> partition (<string from P_NAME2>) where ....
You are working too hard. Oracle already does all these things for you. If you query the table using the correct date range, Oracle will perform the operation only on the relevant partitions - this is called partition pruning.
I suggest reading the docs on that.
If you're still skeptical, query all_tab_partitions.HIGH_VALUE to get each partition's high value (the table you created ...).
I thought I'd pop back to share how I solved this in the end. The source database has a habit of leaking dates across partitions which is why queries for one day were going outside a single partition. I can't affect this, just work around it ...
begin
  execute immediate
    'create table LL_TEST as
     select *
     from SCHEMA.TABLE Partition(P'||TO_CHAR(sysdate,'YYYYMMDD')||')
     where COLUMN_A=''Something''
     and COLUMN_B=''Something Else''';
end;
Using the PL/SQL script I create the partition name with TO_CHAR(sysdate,'YYYYMMDD') and concatenate the rest of the query around it.
Note that the values you are searching for in the where clause require doubled apostrophes, so to send 'Something' to the query you need ''Something'' in the script.
It may not be pretty, but it works on the database that I have to use.
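The same assembly can be prototyped in Python to check the partition name before running the PL/SQL block. Note the doubled apostrophes are only needed inside the quoted PL/SQL string literal; here the values are plain (table and column names copied from the script above):

```python
from datetime import date

def build_stmt(run_date: date, col_a: str, col_b: str) -> str:
    # Partition names follow the P + YYYYMMDD convention described above.
    part = "P" + run_date.strftime("%Y%m%d")
    return (
        "create table LL_TEST as "
        "select * from SCHEMA.TABLE partition({p}) "
        "where COLUMN_A='{a}' and COLUMN_B='{b}'"
    ).format(p=part, a=col_a, b=col_b)

stmt = build_stmt(date(2014, 9, 7), "Something", "Something Else")
print(stmt)
```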
This is regarding Oracle SQLPLUS query language.
SELECT * FROM mytable WHERE record_date > time_4_hours_ago;
I have tried several methods described on the net, and none of them worked for me.
Tried UNIX_SYSTEM_TIME as well.
The problem seems a pretty common one, but I am struggling to get a solution that works with Oracle SQL*Plus.
Assuming record_date is a DATE field:
SELECT * FROM mytable WHERE record_date > sysdate - (4/24)
Try this:
select *
from mytable
where record_date > sysdate - interval '240' minute
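The arithmetic here is the same as sysdate - 4/24: subtract four hours from the current time. If you prefer to compute the cutoff client-side and bind it as a parameter, the Python equivalent would be:

```python
from datetime import datetime, timedelta

def four_hours_ago(now: datetime) -> datetime:
    # sysdate - 4/24 subtracts 4/24 of a day, i.e. four hours.
    return now - timedelta(hours=4)

cutoff = four_hours_ago(datetime(2024, 1, 1, 12, 0))
print(cutoff)  # 2024-01-01 08:00:00
```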
To avoid invalid datetime values due to time zone differences, you should override the record_date value on the server side, using a database trigger for example, so that the correct datetime value is inserted.