How to build modular queries in BigQuery? - google-bigquery

I have some useful queries, and I'd like to build a few more complex ones that use them as subqueries. Can I call them by name?
I've seen the 'Save view' option and was able to build new queries that use saved views.
Does this method refresh the saved view each time a top-level query uses it, by re-executing the relevant queries? Or is it just a named query result that I have to rerun each time to refresh it?
Any other suggestions for building queries in a modular fashion? For example, when I change the date range I select from, I want all subqueries to use the new range.
In programming this is done with parameters or globals; how do I do this in BigQuery?

Your questions are broad and therefore difficult to address fully, so I will answer each of them with general guidelines and an example.
Regarding your first question, about subqueries and calling queries by an alias, there are two options:
1) You can define named subqueries with WITH (common table expressions, CTEs). Each CTE transforms the data and can be referenced by name in the following (sub)queries. CTEs are not saved anywhere: every time you run the query, all of them are executed again. Below is an example,
WITH data as (
SELECT "Alice" AS name, 39 AS age, "San Francisco" AS city UNION ALL
SELECT "Marry" AS name, 35 AS age, "San Francisco" AS city UNION ALL
SELECT "Phill" AS name, 18 AS age, "Boston" AS city UNION ALL
SELECT "Robert" AS name, 10 AS age, "Tampa" AS city
),
greater_30 AS (
SELECT * FROM data
WHERE age > 30
),
SF_30 AS (
SELECT * FROM greater_30
WHERE city = "San Francisco"
)
SELECT * FROM SF_30
And the output,
Row name age city
1 Alice 39 San Francisco
2 Marry 35 San Francisco
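To try the chaining of WITH clauses locally, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for BigQuery here (it accepts this query with single-quoted strings, which BigQuery also accepts); the point is that each CTE is a named subquery that the next one references by name. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Same chained WITH query as above, run against an in-memory SQLite database.
QUERY = """
WITH data AS (
  SELECT 'Alice' AS name, 39 AS age, 'San Francisco' AS city UNION ALL
  SELECT 'Marry', 35, 'San Francisco' UNION ALL
  SELECT 'Phill', 18, 'Boston' UNION ALL
  SELECT 'Robert', 10, 'Tampa'
),
greater_30 AS (
  -- first named step: filter on age
  SELECT * FROM data WHERE age > 30
),
SF_30 AS (
  -- second named step builds on the first one by name
  SELECT * FROM greater_30 WHERE city = 'San Francisco'
)
SELECT * FROM SF_30 ORDER BY name
"""

conn = sqlite3.connect(":memory:")
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```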
2) Create a stored procedure: procedures are blocks of statements that can be called from other queries and can also be nested (one procedure calling another). To create and store a procedure, you have to specify the project and dataset where it will be saved, as well as its name. Below is an example (using a BigQuery public dataset),
# Create the procedure
CREATE OR REPLACE PROCEDURE `project_id.dataset.chicago_taxi`(IN trip_sec INT64, IN price INT64)
BEGIN
CREATE TEMP TABLE taxi_rides AS
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_seconds > trip_sec AND fare > price
LIMIT 10000;
END;
You can now call the procedure with CALL, as follows:
DECLARE trip_sec INT64 DEFAULT 30;
DECLARE price INT64 DEFAULT 30;
CALL `project_id.dataset.chicago_taxi`(trip_sec, price);
SELECT MAX(fare) AS max_fare, payment_type FROM taxi_rides
GROUP BY payment_type;
And the output,
Row max_fare payment_type
1 463.45 Cash
2 200.65 Credit Card
Notice that the procedure is saved in the dataset. We then use CALL to invoke it and use its output (a temporary table) in the next SELECT statement. Note that every time the procedure is invoked, it executes its query again.
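SQLite has no stored procedures, so the sketch below swaps in a plain Python function to mimic the same pattern: a parameterized routine that (re)builds a temporary table, which the caller then queries. The taxi_trips rows and the function name are hypothetical stand-ins for the public dataset, not part of the original answer.

```python
import sqlite3

def chicago_taxi(conn: sqlite3.Connection, trip_sec: int, price: float) -> None:
    """Hypothetical analogue of the procedure: rebuild temp table taxi_rides."""
    conn.execute("DROP TABLE IF EXISTS temp.taxi_rides")
    conn.execute(
        "CREATE TEMP TABLE taxi_rides "
        "(trip_seconds INTEGER, fare REAL, payment_type TEXT)"
    )
    # Parameters are bound, then the filter runs every time the function is called.
    conn.execute(
        "INSERT INTO taxi_rides "
        "SELECT trip_seconds, fare, payment_type FROM taxi_trips "
        "WHERE trip_seconds > ? AND fare > ? LIMIT 10000",
        (trip_sec, price),
    )

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows standing in for the public taxi_trips table.
conn.execute("CREATE TABLE taxi_trips (trip_seconds INTEGER, fare REAL, payment_type TEXT)")
conn.executemany(
    "INSERT INTO taxi_trips VALUES (?, ?, ?)",
    [(120, 50.0, "Cash"), (20, 90.0, "Cash"), (300, 40.0, "Credit Card"),
     (400, 25.0, "Credit Card"), (600, 70.0, "Cash")],
)

# "CALL" the routine, then use the temp table it produced.
chicago_taxi(conn, trip_sec=30, price=30)
result = conn.execute(
    "SELECT MAX(fare) AS max_fare, payment_type FROM taxi_rides "
    "GROUP BY payment_type ORDER BY payment_type"
).fetchall()
print(result)
```

Calling the function again with different arguments rebuilds taxi_rides from scratch, which mirrors the point that each invocation re-executes the query.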
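Regarding your question about saved views: a view stores only its query, not its results, so that query is re-executed every time a query that references the view runs, and it always reflects the current data. Please refer to the documentation on views for details.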
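A quick way to see that a view stores a query rather than a result is the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for BigQuery, with a hypothetical people table: rows inserted after the view is created show up the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("INSERT INTO people VALUES ('Alice', 39), ('Phill', 18)")

# The view stores only its SELECT; nothing is computed at creation time.
conn.execute("CREATE VIEW over_30 AS SELECT name FROM people WHERE age > 30")

before = conn.execute("SELECT name FROM over_30 ORDER BY name").fetchall()
conn.execute("INSERT INTO people VALUES ('Marry', 35)")
# Querying the view again re-runs its SELECT against the current table contents.
after = conn.execute("SELECT name FROM over_30 ORDER BY name").fetchall()
print(before, after)
```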
Finally, about your last question on parameters and globals: you can use scripting in BigQuery to DECLARE and SET a variable, which is handy when changing filter parameters, for example. Below is a usage example with a public dataset,
DECLARE time_s TIMESTAMP;
SET time_s = TIMESTAMP(DATETIME "2016-01-01 15:30:00");
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_start_timestamp > time_s
LIMIT 10000;
Whenever the filter needs to change, you only have to edit the SET statement.
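The same "change it in one place" idea, applied to the question's goal of having several subqueries share one date range, can be sketched with a named query parameter referenced by more than one CTE. This uses Python's sqlite3 module as a stand-in for BigQuery, with hypothetical trips and errors tables; timestamps are ISO-8601 strings, which compare correctly as text.

```python
import sqlite3

# One named parameter (:start_ts), referenced by two subqueries.
QUERY = """
WITH recent_trips AS (
  SELECT * FROM trips WHERE trip_start > :start_ts
),
recent_errors AS (
  SELECT * FROM errors WHERE logged_at > :start_ts
)
SELECT (SELECT COUNT(*) FROM recent_trips)  AS trip_count,
       (SELECT COUNT(*) FROM recent_errors) AS error_count
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_start TEXT);
CREATE TABLE errors (logged_at TEXT);
INSERT INTO trips VALUES
  ('2015-12-31 10:00:00'), ('2016-01-02 09:00:00'), ('2016-01-05 12:00:00');
INSERT INTO errors VALUES ('2015-06-01 00:00:00'), ('2016-01-01 16:00:00');
""")

params = {"start_ts": "2016-01-01 15:30:00"}  # the single place to edit the range
counts = conn.execute(QUERY, params).fetchone()
print(counts)
```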
Note: If you have any specific question, please open another thread or you can ask me in the comment section.

Related

How do I update a value in a table in PostgreSQL on a schedule?

Let's say I have a vet application with two databases (doctors perform operations on pets):
User database with fields: id, email, name, password and regStamp.
PetOperations database with fields: id, id (reference to user), doctorName, operationStamp and operationStatus.
I want to update the operationStatus field whenever someone adds a new petOperation to the PetOperations database: the initial status is PERFORMING, and after 20 minutes it should become PERFORMED, but only for that particular operation id, i.e. when currentTime - operationStamp >= 20 minutes. How can I do that? Is there a better way than subtracting times?
I think you need to rethink your data model a little bit. As I understand your question you have operations which are scheduled, and you want to treat them as performed after the scheduled time slice is complete.
PostgreSQL has no native facility to update a value in 20 minutes. You could use a cron job, but I think the more elegant solution is to change your data model instead.
Add a "table method", i.e. a function that takes a row of the table as its argument, and the status is calculated on the fly.
Suppose your table now contains:
id, id (reference to user), doctorName, operationStamp, operationTsrange
Then you create a function like this:
create or replace function status(operation) returns text language sql
as $$
select case
  when $1.operationTsrange is null then 'Not Scheduled'
  when localtimestamp < lower($1.operationTsrange) then 'Scheduled'
  when localtimestamp <@ $1.operationTsrange then 'Performing'
  else 'Performed'
end;
$$;
You can then query and filter on the computed status, and if you need intervals of different lengths, you can set them per row when you insert or update it.
I am not sure what you want to do.
If you just want to set the initial operationstatus when you INSERT a row, use a BEFORE INSERT trigger.
If you want the value to change depending on when it is selected, use a “computed column”, that is, don't define the column in the table, but rather define a view on the table that contains the column, calculated as appropriate.
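The computed-column approach from the answer above can be sketched as follows, using SQLite through Python's sqlite3 module instead of PostgreSQL. The table, view and column names (pet_operations, pet_operations_with_status, operation_stamp) are hypothetical, and the 20-minute rule is taken from the question. Nothing is ever updated; the status is derived from the current time whenever the view is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pet_operations "
    "(id INTEGER PRIMARY KEY, doctor_name TEXT, operation_stamp TEXT)"
)
# One operation started 30 minutes ago, one started 5 minutes ago.
conn.executemany(
    "INSERT INTO pet_operations (doctor_name, operation_stamp) "
    "VALUES (?, datetime('now', ?))",
    [("Dr. A", "-30 minutes"), ("Dr. B", "-5 minutes")],
)
# The status column exists only in the view and is computed at read time.
conn.execute("""
CREATE VIEW pet_operations_with_status AS
SELECT id, doctor_name, operation_stamp,
       CASE WHEN datetime('now') >= datetime(operation_stamp, '+20 minutes')
            THEN 'PERFORMED' ELSE 'PERFORMING' END AS operation_status
FROM pet_operations
""")
rows = conn.execute(
    "SELECT doctor_name, operation_status FROM pet_operations_with_status ORDER BY id"
).fetchall()
print(rows)
```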

BigQuery query creation without variables?

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use or Set/Declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?
It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.
It's not elegant, and it's a pain, but...
The way we handle it is with a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
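A minimal sketch of such a placeholder script, assuming Python's standard string.Template is used for the substitution (the answer does not say which mechanism they use). The placeholder names start_ts and row_limit are hypothetical. Sending the rendered string to BigQuery would be done with whatever client you already use, so that part is omitted.

```python
from string import Template

# Query text with $-placeholders; everything else is passed through unchanged.
QUERY_TEMPLATE = Template("""
SELECT *
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_start_timestamp > TIMESTAMP("$start_ts")
LIMIT $row_limit
""")

def render_query(start_ts: str, row_limit: int) -> str:
    # substitute() raises KeyError if any placeholder is left without a value.
    return QUERY_TEMPLATE.substitute(start_ts=start_ts, row_limit=int(row_limit))

query = render_query("2016-01-01 15:30:00", 10000)
print(query)
```

Plain text substitution is fine for trusted values you control; for values that come from users, BigQuery's parameterized queries are the safer route.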
I have opened a feature request asking for "Dynamic SQL" capabilities.
If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.
In the example below:
the events table contains some timestamped events
the reports table contains occasional aggregate values of the events
the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by
introducing a state CTE that looks at the target table for aggregate results
to determine parameters (params) for the actual query
the params are CROSS JOINed with the actual query
allowing the param row's columns to be used to constrain the query
this query will repeatably return the same results
until the results themselves are appended to the reports table
WITH state AS (
SELECT
-- what was the newest report's ending time?
COALESCE(
(SELECT MAX(report_end_ts) FROM `x.y.reports`),
TIMESTAMP("2019-01-01")
) AS latest_report_ts,
...
),
params AS (
SELECT
-- look for events since end of last report
latest_report_ts AS event_after_ts,
-- and go until now
CURRENT_TIMESTAMP() AS event_before_ts
FROM state
)
SELECT
MIN(event_ts) AS report_begin_ts,
MAX(event_ts) AS report_end_ts,
COUNT(1) AS event_count,
SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
AND event_ts < event_before_ts
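The state/params CROSS JOIN idiom can be exercised locally with the sketch below. SQLite through Python's sqlite3 module stands in for BigQuery, the events and reports tables are small hypothetical copies of `x.y.events` and `x.y.reports`, and the "now" bound is passed as a parameter instead of CURRENT_TIMESTAMP() so the run is repeatable. It shows that the query returns the same rows until its result is appended to reports, after which only newer events are aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_ts TEXT, errors INTEGER);
CREATE TABLE reports (report_begin_ts TEXT, report_end_ts TEXT,
                      event_count INTEGER, error_total INTEGER);
INSERT INTO events VALUES
  ('2019-01-02 00:00:00', 1),
  ('2019-01-03 00:00:00', 0),
  ('2019-01-04 00:00:00', 2);
""")

INCREMENTAL_REPORT = """
WITH state AS (
  -- newest report end, or a fixed start date when no report exists yet
  SELECT COALESCE((SELECT MAX(report_end_ts) FROM reports),
                  '2019-01-01 00:00:00') AS latest_report_ts
),
params AS (
  SELECT latest_report_ts AS event_after_ts,
         :now AS event_before_ts   -- bound instead of CURRENT_TIMESTAMP
  FROM state
)
SELECT MIN(event_ts) AS report_begin_ts,
       MAX(event_ts) AS report_end_ts,
       COUNT(1) AS event_count,
       SUM(errors) AS error_total
FROM events CROSS JOIN params
WHERE event_ts > event_after_ts AND event_ts < event_before_ts
"""

def run(now: str):
    return conn.execute(INCREMENTAL_REPORT, {"now": now}).fetchone()

NOW = "2019-01-05 00:00:00"
first = run(NOW)
repeat = run(NOW)  # same result: nothing has been appended to reports yet
conn.execute("INSERT INTO reports VALUES (?, ?, ?, ?)", first)
empty = run(NOW)   # no events after the last report's end
conn.execute("INSERT INTO events VALUES ('2019-01-04 12:00:00', 5)")
second = run(NOW)  # only the new event is aggregated
print(first, second)
```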
This approach is useful for BigQuery scheduled queries.

Convert select into stored procedure best approach

I use this SQL to get the count for each type.
select
mytype, count(mytype)
from types1
group by 1
The result is 5 records, with a count for each type. I need to convert this to a stored procedure; should I write the above SQL using FOR ... SELECT, or should I return a single value using SELECT ... WHERE ... INTO, once for each of the 5 types?
I will use the returned counts to update a master table, and the number of types may increase in the future.
That depends on what you want out of the procedure:
If you want the same output as your select with five rows, use a FOR SELECT. You will get one row for each type and an associated count. This is probably the "standard" approach.
If however you want five output variables, one for each count of each type, you can use five queries of the form SELECT COUNT(1) FROM types1 WHERE mytype = 'type1' INTO :type1. Realize though that this will be five queries and you may be better off doing a single FOR SELECT query and looping through the returned rows in the procedure. Also note that if you at some point add a sixth type you will have to change this procedure to add the additional type.
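The FOR SELECT option boils down to running the grouped query once and looping over its rows, which also copes with new types automatically. Firebird is not assumed to be available here, so this sketch swaps in Python's sqlite3 module; the master_counts table is a hypothetical stand-in for the asker's master table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE types1 (mytype TEXT);
CREATE TABLE master_counts (mytype TEXT PRIMARY KEY, type_count INTEGER);
INSERT INTO types1 VALUES ('a'), ('a'), ('b'), ('c'), ('c'), ('c');
""")

def refresh_master_counts(conn: sqlite3.Connection) -> None:
    # One grouped query yields a row per type, so a new type needs no code change.
    rows = conn.execute(
        "SELECT mytype, COUNT(mytype) FROM types1 GROUP BY mytype"
    ).fetchall()
    # Loop over the rows (here via executemany) and upsert each count.
    conn.executemany(
        "INSERT OR REPLACE INTO master_counts (mytype, type_count) VALUES (?, ?)",
        rows,
    )
    conn.commit()

refresh_master_counts(conn)
counts = dict(conn.execute("SELECT mytype, type_count FROM master_counts"))
conn.execute("INSERT INTO types1 VALUES ('d')")  # a sixth... well, a fourth type
refresh_master_counts(conn)
counts_after = dict(conn.execute("SELECT mytype, type_count FROM master_counts"))
print(counts, counts_after)
```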
If you want to query a single type, you can also do something like the following, which will return a single row with a single count for the type in the input parameter:
CREATE PROCEDURE GetTypeCount(
TypeName VARCHAR(256)
)
RETURNS (
TypeCount INTEGER
)
AS
BEGIN
SELECT COUNT(1)
FROM types1
WHERE mytype = :TypeName
INTO :TypeCount;
SUSPEND;
END

How to run a query in WordPress against multiple tables

This query does what I need: it returns a list of data from a widget across several tables in a WordPress Multisite database.
There must be an easier way to do this. I have 30 tables I need to include; how can I use some kind of loop to return option_value from all the wp_n_options tables?
SELECT option_value
FROM `wp_options`
WHERE option_name = 'widget_thin_search'
UNION
SELECT option_value
FROM `wp_3_options`
WHERE option_name = 'widget_thin_search'
UNION
SELECT option_value
FROM `wp_4_options`
WHERE option_name = 'widget_thin_search'
INTO OUTFILE '/tmp/result.csv'
Edit: As Brandon pointed out, if it were a static set of 30 tables, I could build the query by hand. However, the number of tables will increase over time.
You could create a table with one column containing the table names. Then create a stored procedure that loops through those table names and constructs a query string resembling the one in your example. Then run that query string as dynamic SQL (with EXEC in T-SQL, or PREPARE and EXECUTE in MySQL).
Just note that UNION removes duplicates whereas UNION ALL does not. That may not be an issue for you but I just wanted to point it out.
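A sketch of the "build the query string from the list of table names" idea, done from Python instead of inside a stored procedure. SQLite via sqlite3 stands in for MySQL here, so the table list comes from sqlite_master; in MySQL you would read it from information_schema.tables instead. The sample tables and values are hypothetical. Table names cannot be bound as parameters, so only names that pass a strict pattern are interpolated, while the option name is bound. UNION ALL is used so duplicates are kept, as the answer above points out.

```python
import re
import sqlite3

# Matches wp_options and wp_<n>_options only.
OPTIONS_TABLE = re.compile(r"wp_(\d+_)?options")

def build_union_query(conn: sqlite3.Connection) -> tuple[str, int]:
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    tables = [n for n in names if OPTIONS_TABLE.fullmatch(n)]
    if not tables:
        raise ValueError("no options tables found")
    # Only regex-checked identifiers are interpolated; the value is a bound parameter.
    selects = [f'SELECT option_value FROM "{t}" WHERE option_name = ?' for t in tables]
    return " UNION ALL ".join(selects), len(selects)

conn = sqlite3.connect(":memory:")
for table in ("wp_options", "wp_3_options", "wp_4_options"):
    conn.execute(f'CREATE TABLE "{table}" (option_name TEXT, option_value TEXT)')
conn.execute("CREATE TABLE wp_users (id INTEGER)")  # not matched by the pattern
conn.executemany("INSERT INTO wp_options VALUES (?, ?)",
                 [("widget_thin_search", "v1"), ("siteurl", "x")])
conn.execute("INSERT INTO wp_3_options VALUES ('widget_thin_search', 'v3')")
conn.execute("INSERT INTO wp_4_options VALUES ('widget_thin_search', 'v1')")

query, n = build_union_query(conn)
values = [r[0] for r in conn.execute(query, ["widget_thin_search"] * n)]
print(values)
```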

How do I pass the same input to all the sql files only once at the start?

I am calling 4-5 SQL scripts from one file at once.
I want to give an input only once, to the first SQL file I call.
That input should then be used by all the other SQL files called after the first one.
Is there any way to do that?
Please help.
I think you can achieve what you want by using the sqlcmd utility and scripting variables. The last link states that you can also use environment variables.
Do you mean:
query2 is based on the result of query1,
query3 is based on the result of query2 etc...
If so, you can use views for that:
create view view1 as select * from table1;
create view view2 as select * from view1;
create view view3 as select * from view2;
create view view4 as select * from view3;
select * from view4
Of course you have to add the where clause yourself.
See for more on views http://dev.mysql.com/doc/refman/5.0/en/create-view.html
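A small runnable version of the view chain, with the where clauses filled in, is sketched below. It uses SQLite through Python's sqlite3 module rather than MySQL, and the table1 columns (bill_id, amount) and filters are hypothetical; each view reads from the previous one, mirroring query1 feeding query2 feeding query3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (bill_id TEXT, amount INTEGER);
INSERT INTO table1 VALUES ('ABC', 10), ('ABC', 250), ('XYZ', 40);
-- Each view builds on the previous one.
CREATE VIEW view1 AS SELECT * FROM table1 WHERE bill_id = 'ABC';
CREATE VIEW view2 AS SELECT * FROM view1 WHERE amount > 100;
CREATE VIEW view3 AS SELECT bill_id, SUM(amount) AS total
                     FROM view2 GROUP BY bill_id;
""")
rows = conn.execute("SELECT * FROM view3").fetchall()
print(rows)
```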
No. This is the code:
START fbm.sql
START fba.sql
START fei.sql
START fbe.sql
START fae.sql
START tfat.sql
START ins_FBH.sql
In fbm.sql I have an input like bill id = '&1'.
The other SQL files also take the same bill id input.
But when I run the master SQL file, it runs fbm.sql and asks me for the bill id.
Suppose I enter 'ABC'; after fbm.sql completes, it asks me for the bill id again for fba.sql, and I don't want to enter it again and again.
What I want is for fba.sql and the other SQL files to take the bill id 'ABC' without me entering it.
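One way to get "ask once, reuse everywhere" is to drive the scripts from a small program that collects the bill id a single time and binds it into every script. The sketch below swaps in Python with sqlite3 for the SQL*Plus master script; it is not SQL*Plus syntax. The script contents and the run_log table are hypothetical stand-ins for fbm.sql, fba.sql and so on, each written to use a named parameter :bill_id instead of '&1'.

```python
import sqlite3

# Hypothetical stand-ins for fbm.sql, fba.sql, fei.sql: each one refers to
# the bill id through the same named parameter :bill_id.
SCRIPTS = {
    "fbm.sql": "INSERT INTO run_log (script, bill_id) VALUES ('fbm.sql', :bill_id)",
    "fba.sql": "INSERT INTO run_log (script, bill_id) VALUES ('fba.sql', :bill_id)",
    "fei.sql": "INSERT INTO run_log (script, bill_id) VALUES ('fei.sql', :bill_id)",
}

def run_all(conn: sqlite3.Connection, bill_id: str) -> None:
    params = {"bill_id": bill_id}  # collected once, reused by every script
    for sql in SCRIPTS.values():
        conn.execute(sql, params)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_log (script TEXT, bill_id TEXT)")
# In an interactive run this would be: bill_id = input("bill id: ")
run_all(conn, "ABC")
logged = conn.execute("SELECT script, bill_id FROM run_log").fetchall()
print(logged)
```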
Have you thought about using a stored procedure for this? It does depend on having MySQL version 5.0 or later, of course. But it allows you to define variables and use them within the procedure; it's very flexible and great fun to use! Caveat: I have not tested this myself; my experience has been with Oracle PL/SQL, but the concepts are similar.
Then you can do something like this (from the MySQL newsletter at http://www.mysql.com/news-and-events/newsletter/2004-01/a0000000297.html):
DELIMITER //
CREATE PROCEDURE payment
(payment_amount DECIMAL(6,2),
payment_seller_id INT)
BEGIN
DECLARE n DECIMAL(6,2);
SET n = payment_amount - 1.00;
INSERT INTO Moneys VALUES (n, CURRENT_DATE);
IF payment_amount > 1.00 THEN
UPDATE Sellers
SET commission = commission + 1.00
WHERE seller_id = payment_seller_id;
END IF;
END
//
DELIMITER ;
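For comparison, here is the same sequence of steps as the payment procedure written as a Python function over sqlite3, which has no stored procedures. This is a swapped-in technique, not the newsletter's code: the moneys and sellers tables are hypothetical lowercase stand-ins, and amounts are stored as integer cents (an assumption) to avoid floating-point money. The whole body runs in one transaction.

```python
import sqlite3

def payment(conn: sqlite3.Connection, payment_amount_cents: int,
            payment_seller_id: int) -> None:
    """Mirror of the payment procedure, amounts in integer cents."""
    n = payment_amount_cents - 100  # SET n = payment_amount - 1.00
    with conn:  # one transaction: commit on success, roll back on an exception
        conn.execute("INSERT INTO moneys VALUES (?, date('now'))", (n,))
        if payment_amount_cents > 100:
            conn.execute(
                "UPDATE sellers SET commission_cents = commission_cents + 100 "
                "WHERE seller_id = ?",
                (payment_seller_id,),
            )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moneys (amount_cents INTEGER, paid_on TEXT);
CREATE TABLE sellers (seller_id INTEGER PRIMARY KEY, commission_cents INTEGER);
INSERT INTO sellers VALUES (1, 0);
""")
payment(conn, 550, 1)  # above 1.00: commission is increased
payment(conn, 80, 1)   # not above 1.00: only the insert happens
amounts = conn.execute("SELECT amount_cents FROM moneys ORDER BY rowid").fetchall()
commission = conn.execute("SELECT commission_cents FROM sellers").fetchone()
print(amounts, commission)
```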