I am new to BigQuery. In my code I want to be able to declare variables that I can reference in my table names and WHERE clauses. Is this possible in BigQuery? That way, when I rerun the code I don't have to hunt for the parts to change.
So I would like to have something like this at the start of my code that I can change every time I run it:
Start_date = '2020-05-01'
End_date = '2020-06-01'
Selection_Date = '20200602'

CREATE TABLE test_selection_date AS
SELECT * FROM sales
WHERE date >= Start_date AND date <= End_date;
Does anybody know whether this is possible and how I can do it in BigQuery? Thanks.
Below is for BigQuery Standard SQL and should give you a good start:
DECLARE Start_date, End_date, Selection_Date STRING;
SET (Start_date, End_date, Selection_Date) = ('2020-05-01', '2020-06-01', '20200602');
EXECUTE IMMEDIATE FORMAT(
  "CREATE TABLE test_%s AS SELECT * FROM sales WHERE date >= '%s' AND date <= '%s'",
  Selection_Date, Start_date, End_date
);
See more about Scripting to tune the above to your real use case.
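If you prefer to assemble the statement in a client script instead of BigQuery scripting, the same substitution can be sketched in Python (a minimal sketch; the table and column names are taken from the question):

```python
def build_create_table(selection_date, start_date, end_date):
    # Mirrors the EXECUTE IMMEDIATE FORMAT() call above: substitutes
    # the three variables into the CREATE TABLE statement text.
    return (
        "CREATE TABLE test_{sel} AS "
        "SELECT * FROM sales "
        "WHERE date >= '{start}' AND date <= '{end}'"
    ).format(sel=selection_date, start=start_date, end=end_date)

sql = build_create_table("20200602", "2020-05-01", "2020-06-01")
print(sql)
```

Changing the three arguments at the top of your script is then the only edit needed between runs.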
Related
I have some MariaDB SQL queries that are called by a bash script; the script sets the start and end dates for the queries. But now the project will use Oracle DB.
So, I have something like this in MariaDB SQL:
SET @date_start := '2000-01-01';
SET @date_end := '2001-01-01';
SELECT * FROM user WHERE birth BETWEEN @date_start AND @date_end;
And I couldn't find anything like that in Oracle SQL; I already tried DECLARE, DEFINE, and WITH, but nothing worked.
In order to use parameters in Oracle, you have to use PL/SQL. However, if you just want to parameterize a single query, you can use a CTE:
WITH params AS (
SELECT DATE '2000-01-01' as date_start,
DATE '2001-01-01' as date_end
FROM dual
)
SELECT u.*
FROM params CROSS JOIN
user u
WHERE u.birth BETWEEN params.date_start AND params.date_end;
Similar logic works in just about any database (although dual may not be needed and the date constants might have different formats).
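As a quick local illustration of the params-CTE pattern (a sketch using SQLite through Python, since it needs no server; the users table and sample rows are invented):

```python
import sqlite3

# Invented sample data; SQLite stands in for Oracle here (no DUAL needed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, birth TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Ana", "2000-06-15"), ("Bob", "1995-03-01")])

# The params CTE supplies the "variables" that the main query references.
rows = conn.execute("""
    WITH params AS (
        SELECT '2000-01-01' AS date_start,
               '2001-01-01' AS date_end
    )
    SELECT u.name
    FROM params CROSS JOIN users u
    WHERE u.birth BETWEEN params.date_start AND params.date_end
""").fetchall()
print(rows)  # [('Ana',)]
```

Only the two literals inside the params CTE need to change between runs.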
My query just keeps running and never finishes. What is wrong? I work in Oracle SQL Developer on a company server.
CREATE TABLE voice2020 AS
SELECT
to_char(SDATE , 'YYYYMM') as month,
MSISDN,
SUM(CH_MONEY_SUBS_DED)/100 AS AIRTIME_VOICE,
SUM(CALLDURATION/60) AS MIN_USAGE,
sum(DUR_ONNET_OOB/60) as DUR_ONNET_OOB,
sum(DUR_ONNET_IB/60) as DUR_ONNET_IB,
sum(DUR_ONNET_FREE/60) as DUR_ONNET_FREE,
sum(DUR_OFFNET_OOB/60) as DUR_OFFNET_OOB,
sum(DUR_OFFNET_IB/60) as DUR_OFFNET_IB,
sum(DUR_OFFNET_FREE/60) as DUR_OFFNET_FREE,
SUM(case when sdate < to_date('20190301','YYYYMMDD')
then CH_MONEY_PAID_DED-nvl(CH_MONEY_SUBS_DED,0)-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming
else (CH_MONEY_OOB-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming) end)/100 AS VOICE_OOB_SPEND
FROM CCN.CCN_VOICE_MSISDN_MM#xdr1
where MSISDN IN ( SELECT MSISDN FROM saayma_a.BASE30112020) --change date
GROUP BY
MSISDN,
to_char(SDATE , 'YYYYMM')
;
This is a performance issue. Clearly the query driving your CREATE TABLE statement is taking too long to return a result set.
You are querying a table in a remote database (CCN.CCN_VOICE_MSISDN_MM#xdr1) and then filtering against a local table (saayma_a.BASE30112020). This means you are going to copy all of that remote table across the network, then discard the records which don't match the WHERE clause.
You know your data (or at least you should know it): does that sound efficient? If you're actually discarding most of the records, you should try to filter CCN_VOICE_MSISDN_MM in the remote database.
If you need more advice you need to provide more information. Please read this post about asking Oracle tuning questions on this site, then edit your question to include some details.
You are executing a CTAS (CREATE TABLE AS SELECT) statement; its purpose is to create the table populated with the data generated by the query.
If you just want to execute the query and see the data, remove the first line of the query:
-- CREATE TABLE voice2020 AS
SELECT
.....
Also, if you have already executed the statement once, the data from the query will already be present in the voice2020 table:
SELECT * FROM voice2020;
It looks like you are trying to copy the data from one table to another. Create the target table first, if it's not already created, and then try this statement:
insert into target_table select * from source_table;
I have some useful queries, and I'd like to build a few more complex ones that need them as subqueries. Can I call them by name?
I've seen the 'save view' option and was able to build new queries that used saved views.
Does this method refresh the saved view each time a top-level query uses it, by re-executing the relevant queries? Or is it just a named query result that I have to rerun each time to refresh?
Any other suggestions for building queries in a modular fashion? For example, when I change the date range I select from, I want all subqueries to use that range.
In programming this would be done with parameters or globals; how do I do this in BigQuery?
Whilst it is difficult to address your questions due to their broadness, I will answer them with general guidelines and an example for each point.
Regarding your first question, about subqueries and calling queries by an alias, I have two considerations:
1) You can use subqueries with WITH: you perform your transformations on the data, name the intermediate result, and reference it in the following (sub)queries. Note that every time you run the code, all the queries are executed. Below is an example,
WITH data as (
SELECT "Alice" AS name, 39 AS age, "San Francisco" AS city UNION ALL
SELECT "Marry" AS name, 35 AS age, "San Francisco" AS city UNION ALL
SELECT "Phill" AS name, 18 AS age, "Boston" AS city UNION ALL
SELECT "Robert" AS name, 10 AS age, "Tampa" AS city
),
greater_30 AS (
SELECT * FROM data
WHERE age > 30
),
SF_30 AS (
SELECT * FROM greater_30
WHERE city = "San Francisco"
)
SELECT * FROM SF_30
and the output,
Row  name   age  city
1    Alice  39   San Francisco
2    Marry  35   San Francisco
2) Create a stored procedure: procedures are blocks of statements which can be called from other queries and also executed recursively (one procedure calling another). In order to create and store a procedure, you have to specify the project and dataset where it will be saved, as well as its name. Below is an example (using a BigQuery public dataset),
#creating the procedure
CREATE OR REPLACE PROCEDURE project_id.dataset.chicago_taxi(IN trip_sec INT64, IN price INT64)
BEGIN
CREATE TEMP TABLE taxi_rides AS
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_seconds > trip_sec and fare >price
LIMIT 10000
;
END;
Now, you can call the procedure using CALL. As follows:
DECLARE trip_sec INT64 DEFAULT 30;
DECLARE price INT64 DEFAULT 30;
CALL `project_id.dataset.chicago_taxi`(trip_sec, price);
SELECT max(fare) AS max_fare,payment_type FROM taxi_rides
GROUP BY payment_type
And the output,
Row  max_fare  payment_type
1    463.45    Cash
2    200.65    Credit Card
Notice that the procedure is saved within the dataset. We then use CALL to invoke it and use its output (a temporary table) in the next SELECT statement. I must point out that every time the procedure is invoked, it executes the query.
Regarding your question about saved views: the view is updated every time you run it. Please refer to the documentation.
Finally, about the last question on using parameters and globals in queries: you can use scripting in BigQuery to DECLARE and SET a variable, which is handy when changing filter parameters, for example. Below is a usage example using a public dataset,
DECLARE time_s timestamp;
SET time_s= timestamp(DATETIME "2016-01-01 15:30:00");
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_start_timestamp > time_s
LIMIT 10000
Note that whenever the filter needs to change, it can be done from the SET statement alone.
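The same "change one variable, rerun" pattern can be simulated locally; this sketch uses SQLite through Python, with a bound query parameter standing in for BigQuery's DECLARE/SET variable (the trips table and values are invented):

```python
import sqlite3

# Invented local stand-in for the public taxi table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_start_timestamp TEXT, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("2016-01-01 10:00:00", 5.0),
                  ("2016-01-02 08:00:00", 12.5)])

# Change only this variable to re-filter, as with SET in BigQuery scripting.
time_s = "2016-01-01 15:30:00"
rows = conn.execute(
    "SELECT fare FROM trips WHERE trip_start_timestamp > ? LIMIT 10000",
    (time_s,),
).fetchall()
print(rows)  # [(12.5,)]
```

The query text never changes between runs; only the bound value does.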
Note: If you have any specific question, please open another thread or you can ask me in the comment section.
I am writing a query I am planning to schedule using Big Query UI.
I would like to add a _TABLE_SUFFIX to this table which is equal to CURRENT_DATE.
How could I achieve that?
This is the query I am working on:
IF
today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
CREATE TABLE `project.dataset.tablename_<insert_current_date_here>`
AS
SELECT CURRENT_DATE() as today;
END IF;
Update (2023-01-09): I think Samuel's approach using an official templating solution here is ideal.
The best bet would be to generate the query dynamically, and then execute it statically.
This could be done using something like python.
from datetime import datetime

def get_query():
    # Build a date-only suffix: str(datetime.now()) contains spaces and
    # colons, which are not valid in a BigQuery table name.
    suffix = datetime.now().strftime('%Y%m%d')
    return '''IF
  today != DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY)
THEN
  CREATE TABLE `project.dataset.tablename_%s`
  AS
  SELECT CURRENT_DATE() as today;
END IF;''' % suffix
BigQuery supports a template system for destination table names in scheduled queries. To add the current date to the table name, use the provided template syntax. For example, tablename_{run_time|"%Y%m%d"} would output tablename_YYYYMMDD.
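The format elements in that template behave like strftime; a quick local check in Python (a fixed datetime stands in for the scheduler-supplied run_time):

```python
from datetime import datetime

# run_time is normally supplied by the scheduler; a fixed value stands in here.
run_time = datetime(2023, 1, 9)
suffix = run_time.strftime("%Y%m%d")
print("tablename_" + suffix)  # tablename_20230109
```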
You could (whether you should is another debate) create dynamic table names via BQ's SQL procedural language capability, specifically the EXECUTE IMMEDIATE statement.
e.g.
DECLARE today STRING DEFAULT CAST(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 DAY) AS STRING);
EXECUTE IMMEDIATE format("""
CREATE TABLE `project.dataset.tablename_%s` AS
SELECT CURRENT_DATE() as today
""", today);
For more also see towardsdatascience.com/how-to-use-dynamic-sql-in-bigquery.
Note you might also get error-location issues with EXECUTE IMMEDIATE; if so, try changing/checking your processing location in Query Settings, see here.
I would like to execute a dynamic SQL query stored in a string field on Amazon Redshift.
My background is mostly T-SQL relational databases. I used to build SQL statements dynamically, store them in variables, and then execute them. I know Redshift can prepare and execute statements, but I wonder if it is possible to execute a query stored in a string field.
I have a piece of code that dynamically builds the code below with stats on several tables using pg_* system tables. Every column/table name is dynamically calculated. Here's an example of the query output:
SELECT h_article_id AS key, 'transport_parameters_weight_in_grams' AS col_name, COUNT(DISTINCT transport_parameters_weight_in_grams) AS count_value FROM dv.s_products GROUP BY h_article_id UNION ALL
SELECT h_article_id AS key, 'transport_parameters_width_in_mm' AS col_name, COUNT(DISTINCT transport_parameters_width_in_mm) AS count_value FROM dv.s_products GROUP BY h_article_id UNION ALL
SELECT h_article_id AS key, 'label_owner_info_communication_address' AS col_name, COUNT(DISTINCT label_owner_info_communication_address) AS count_value FROM dv.s_products GROUP BY h_article_id
I would like to input this dynamic piece of code within another query, so I can make some statistics, like so:
SELECT col_name, AVG(count_value*1.00) AS avg_count
FROM (
'QUERY ABOVE'
) A
GROUP BY col_name;
This would output something like:
col_name                                  avg_count
transport_parameters_weight_in_grams     1.00
transport_parameters_width_in_mm         1.00
label_owner_info_communication_address   0.60
The natural way for me to do this would be to store everything as a string in a variable and execute it. But I'm afraid Redshift does not support this.
Is there an alternative way to really build dynamic SQL code?
This is possible now that we have added support for Stored Procedures. "Overview of Stored Procedures in Amazon Redshift"
For example, this stored procedure counts the rows in a table and inserts the table name and row count into another table. Both table names are provided as input.
CREATE PROCEDURE get_tbl_count(IN source_tbl VARCHAR, IN count_tbl VARCHAR) AS $$
BEGIN
  EXECUTE 'INSERT INTO ' || quote_ident(count_tbl)
       || ' SELECT ''' || source_tbl || ''', COUNT(*) FROM '
       || quote_ident(source_tbl) || ';';
  RETURN;
END;
$$ LANGUAGE plpgsql;
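To see exactly what string the procedure's EXECUTE builds, the concatenation can be replayed in Python (a sketch; this quote_ident is a simplified stand-in that only double-quotes the name, and the table names are invented):

```python
def quote_ident(name):
    # Simplified stand-in for Redshift's quote_ident(): wrap the name in
    # double quotes, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'

def build_insert(source_tbl, count_tbl):
    # Replays the string concatenation inside the procedure body.
    return ("INSERT INTO " + quote_ident(count_tbl)
            + " SELECT '" + source_tbl + "', COUNT(*) FROM "
            + quote_ident(source_tbl) + ";")

print(build_insert("events", "tbl_counts"))
# INSERT INTO "tbl_counts" SELECT 'events', COUNT(*) FROM "events";
```

Quoting identifiers this way is what keeps a dynamically built statement safe when table names come in as parameters.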
In your example, the query to be executed could be passed in as a string.
No. There is not a straightforward way to run dynamically built SQL code in Redshift.
You can't define SQL variables or create stored procedures as you would in MS SQL Server.
You can create Python functions in Redshift, but you would be coding in Python rather than SQL.
You can use the PREPARE and EXECUTE statements to run "pre-defined" SQL queries, but you would have to create the statements outside of the database before passing them to the EXECUTE command. Creating the statement outside of the database in a way defeats the purpose: you can create any statement in your "favorite" programming language.
As I said, SQL-based, in-database dynamic SQL does not exist.
Basically, you need to run this logic in your application or use something such as AWS Data Pipeline.
I am using Postgres on Redshift, and I ran into this issue and found a solution.
I was trying to create a dynamic query, putting in my own date.
import datetime as dt
my_date = dt.date(2018, 10, 30)
query = ''' select * from table where date >= ''' + str(my_date) + ''' order by date '''
But the query entirely ignores the condition when it is written this way.
However, if you use the percent sign (%), you can insert the date correctly.
The correct way to write the above statement is:
query = ''' select * from table where date >= ''' + ''' '%s' ''' % my_date + ''' order by date '''
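To see why the unquoted version drops the condition, compare the two generated strings (a sketch; on most engines the unquoted form is parsed as integer arithmetic, 2018 - 10 - 30 = 1978, rather than as a date literal):

```python
import datetime as dt

my_date = dt.date(2018, 10, 30)

# Unquoted: the value 2018-10-30 may be read as arithmetic, not a date.
broken = ''' select * from table where date >= ''' + str(my_date) + ''' order by date '''

# Quoted via %s: the value survives as a date literal.
fixed = ''' select * from table where date >= ''' + ''' '%s' ''' % my_date + ''' order by date '''
print(fixed)
```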
So, maybe this is helpful, or maybe it is not. I hope it helps at least one person in my situation!
Best wishes.