How to submit multiple queries in Google BigQuery Composer and Cloud Shell - google-bigquery

Just a simple question; please don't tell me that submitting multiple queries is not supported in the Query Composer and Google Cloud Shell.
When I submit two statements (for example, two drop table statements delimited by ";"), it tells me that the drop keyword on the next line is unexpected.

It turns out that there is no way to execute multiple queries in either the BigQuery Composer or the Google Cloud Shell. However, one workaround I have found is to create a local text file in Cloud Shell that stores the queries, delimited by ";", and then set the IFS (Internal Field Separator) to ";" so that a for loop can iterate through the file and execute the queries one by one.
Example:
queries.txt
select 1+2;
select 2+3;
select 3+4;
Cloud Shell command
IFS=";"
alias bqn="bq query --nouse_legacy_sql"
for q in $(<"queries.txt"); do bqn "$q"; done
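If some of your queries span multiple lines, a while read loop that splits the file on ";" directly may be more robust. A minimal sketch, assuming the same queries.txt as above:
# Read one ";"-delimited statement at a time and run each with bq.
while IFS= read -r -d ';' q; do
  bq query --nouse_legacy_sql "$q"
done < queries.txt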

BigQuery now has support for multi-statement execution. Check out the scripting documentation. Copying the example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data`.usa_names.usa_1910_current
  WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
  name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
  SELECT word
  FROM `bigquery-public-data`.samples.shakespeare
);
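A script like this can be pasted into the query editor as a single submission, or, assuming bq accepts the query text on stdin when no query argument is given, run from Cloud Shell; script.sql is a hypothetical file holding the statements above:
# Run the whole multi-statement script in a single bq invocation.
bq query --nouse_legacy_sql < script.sql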

Google BigQuery uses an SQL-like language, and not every construct from mainstream SQL implementations will be directly compatible with BigQuery.
That being said, there are many ways to work around this. If you are creating a table to materialize data in order to get better query performance and limit the cost of storing data in BigQuery, you can set an expiration date on the temporary table.
This is the command with the expiration date flag:
bq --location=[LOCATION] mk --dataset --default_table_expiration [INTEGER] --description [DESCRIPTION] [PROJECT_ID]:[DATASET]
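For example, to create a dataset whose tables expire after one hour (3600 seconds); the project and dataset names here are placeholders:
bq --location=US mk --dataset --default_table_expiration 3600 --description "Scratch space for materialized query results" my-project:scratch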

Related

What is the equivalent of Select into in Google bigquery

I am trying to write a SQL command to insert some data from one table into a new table without any insert statement in BigQuery, but I cannot find a way to do it (something similar to select into).
Here is the table:
create table database1.table1
(
  pdesc string,
  num int64
);
And here is the insert statement. I also tried select into, but it is not supported in BigQuery.
insert into database1.table1
select column1, count(column2) as num
from database1.table2
group by column1;
The above is a possible way to insert, but I am looking for a way that does not need any select statement, something similar to a 'select into' statement.
I am thinking of declaring variables and then somehow feeding the data into the tables, but I do not know how.
I am not a Google employee. However, I understand the reasoning for not supporting the creation of a copy of a table (or query) from the console.
The challenge is that each table that is created must have a number of features defined, such as the associated project and expiry time.
Looking through the documentation (briefly), it is worth exploring the bq utility, specifically the cp command.
Explore the following operations (see the sketch after this list):
cache the query results to a temporary table
get the name of said temporary table
pass it to a copy-table command, perhaps?
Other methods are described in the Google Cloud documentation: https://cloud.google.com/bigquery/docs/managing-tables#copy-table
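A rough sketch of that flow using the bq utility; the staging dataset and table names are invented, and the query is the one from the question:
# Materialize the query result into a named table instead of the anonymous cache table.
bq query --nouse_legacy_sql --destination_table mydataset.staging_table \
  'select column1, count(column2) as num from database1.table2 group by column1'
# Copy the materialized table to its final destination.
bq cp mydataset.staging_table database1.table1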

Adding today's date to the table name when using the Create Table function in standard SQL (GBQ)

I am quite new to GBQ and any help is appreciated.
I have a query below:
#Standard SQL
create or replace table `xxx.xxx.applications`
as select * from `yyy.yyy.applications`
What I need to do is add today's date at the end of the table name, so it is something like xxx.xxx.applications_<todays date>;
basically, create the table with the name applications but with the date appended at the end.
I am writing a procedure that creates a table every time it runs, but I need to add the date for audit purposes every time I create the table (as a backup).
I searched everywhere and can't find the exact answer. Is this possible in the Query Editor, as I need to store this as a proc?
Thanks in advance
BigQuery doesn't support dynamic SQL at the moment, which means that this kind of construction is not possible.
BigQuery currently supports parameterized queries, but it's not possible to use parameters to dynamically change the source table's name, as you can see in the provided link.
BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax. Query parameters can be used as substitutes for arbitrary expressions. Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
If you need to build a query based on some variable's value, I suggest that you use a script in shell, Python, or any other programming language to create the SQL statement and then execute it using the bq command.
Another approach could be using the BigQuery client library in some of the supported languages instead of the bq command.
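For example, a small shell wrapper along those lines; a sketch only, reusing the dataset names from the question:
# Build the dated table name, then hand the generated SQL to bq.
table="xxx.xxx.applications_$(date +%Y%m%d)"
bq query --nouse_legacy_sql "create or replace table \`${table}\` as select * from \`yyy.yyy.applications\`"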

How does Tableau run queries on Redshift? (And/or why can't Redshift display Tableau queries?)

I'm kicking the tires on BI tools, including, of course, Tableau. Part of my evaluation includes correlating the SQL generated by the BI tool with my actions in the tool.
Tableau has me mystified. My database has 2 billion things; however, no matter what I do in Tableau, the query Redshift reports as having been run is "Fetch 10000 in SQL_CURxyz", i.e. a cursor operation. You can watch the cursor ids change, indicating new queries are being run, but you never see the original queries.
Is this a Redshift or Tableau quirk? Any idea how to see what's actually running under the hood? And why is Tableau always operating on 10000 records at a time?
I just ran into the same problem and wrote this simple query to get all queries for currently active cursors:
SELECT
  usr.usename AS username
  , min(cur.starttime) AS start_time
  , DATEDIFF(second, min(cur.starttime), getdate()) AS run_time
  , min(cur.row_count) AS row_count
  , min(cur.fetched_rows) AS fetched_rows
  , listagg(util_text.text)
      WITHIN GROUP (ORDER BY sequence) AS query
FROM STV_ACTIVE_CURSORS cur
JOIN stl_utilitytext util_text
  ON cur.pid = util_text.pid AND cur.xid = util_text.xid
JOIN pg_user usr
  ON usr.usesysid = cur.userid
GROUP BY usr.usename, util_text.xid;
Ah, this has already been asked on the AWS forums.
https://forums.aws.amazon.com/thread.jspa?threadID=152473
Redshift's console apparently doesn't display the query behind cursors. To get that, you can query STV_ACTIVE_CURSORS: http://docs.aws.amazon.com/redshift/latest/dg/r_STV_ACTIVE_CURSORS.html
Also, you can alter your .TWB file (which is really just an XML file) and add the following parameters to the odbc-connect-string-extras property.
UseDeclareFetch=0;
FETCH=0;
You would end up with something like:
<connection class='redshift' dbname='yourdb' odbc-connect-string-extras='UseDeclareFetch=0;FETCH=0' port='0000' schema='schm' server='any.redshift.amazonaws.com' [...] >
Unfortunately there's no way of changing this behavior through the application; you must edit the file directly.
You should be aware of the performance implications of doing so. While this greatly enhances debugging, there is presumably a reason why Tableau chose not to allow modification of these parameters through the application.

Select from a SQL table starting with a certain index?

I'm new to SQL (using PostgreSQL) and I've written a Java program that selects from a large table and performs a few functions. The problem is that when I run the program I get a Java OutOfMemoryError because the table is simply too big. I know that I can select from the beginning of the table using the LIMIT operator, but is there a way I can start the selection from a certain index where I left off with the LIMIT command? Thanks!
There is an OFFSET option in Postgres, as in (table_name is a placeholder):
select * from table_name
offset 50
limit 50;
(You'll want an ORDER BY clause for the paging to be deterministic.)
For MySQL you can use the following approaches:
SELECT * FROM table_name LIMIT {offset}, row_count
SELECT * FROM table_name WHERE id > {max_id_from_the_previous_selection} LIMIT row_count, starting with max_id_from_the_previous_selection = 0 for the first page (see the example below).
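To make the second approach concrete (the items table and its id column are hypothetical), two consecutive pages would look like this; the same pattern works in PostgreSQL:
-- first page; remember the largest id returned (say it was 50)
SELECT * FROM items ORDER BY id LIMIT 50;
-- next page: continue from where the previous one left off
SELECT * FROM items WHERE id > 50 ORDER BY id LIMIT 50;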
This is actually something that the JDBC driver will handle for you transparently: you can stream the result set instead of loading it all into memory at once. To do this in MySQL, you need to follow the instructions here: http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
Basically, when you call connection.prepareStatement you need to pass ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY as the second and third parameters, and then call setFetchSize(Integer.MIN_VALUE) on your PreparedStatement object.
There are similar instructions for doing this with other databases, which I could enumerate if needed.
EDIT: now we know you need instructions for PostgreSQL. Follow the instructions here: How to read all rows from huge table?

Generating sql insert into for Oracle

The only thing I don't have an automated tool for when working with Oracle is a program that can create INSERT INTO scripts.
I don't desperately need it so I'm not going to spend money on it. I'm just wondering if there is anything out there that can be used to generate INSERT INTO scripts given an existing database without spending lots of money.
I've searched through Oracle with no luck in finding such a feature.
It exists in PL/SQL Developer, but it errors on BLOB fields.
Oracle's free SQL Developer will do this:
http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html
You just find your table, right-click on it and choose Export Data->Insert
This will give you a file with your insert statements. You can also export the data in SQL Loader format as well.
You can do that in PL/SQL Developer v10.
1. Click on the table that you want to generate a script for.
2. Click Export data.
3. Check that the table you want to export data for is selected.
4. Click on the SQL inserts tab.
5. Add a where clause if you don't need the whole table.
6. Choose the file where your SQL script will be written.
7. Click export.
Use a SQL function (I'm the author):
https://github.com/teopost/oracle-scripts/blob/master/fn_gen_inserts.sql
Usage:
select fn_gen_inserts('select * from tablename', 'p_new_owner_name', 'p_new_table_name')
from dual;
where:
p_sql – dynamic query which will be used to export metadata rows
p_new_owner_name – owner name which will be used for generated INSERT
p_new_table_name – table name which will be used for generated INSERT
p_sql in this sample is 'select * from tablename'
You can find original source code here:
http://dbaora.com/oracle-generate-rows-as-insert-statements-from-table-view-using-plsql/
Ashish Kumar's script generates individually usable insert statements instead of a SQL block, but supports fewer datatypes.
I have been searching for a solution for this and found it today. Here is how you can do it.
Open Oracle SQL Developer Query Builder
Run the query
Right-click on the result set and export
http://i.stack.imgur.com/lJp9P.png
You might execute something like this in the database (string literals in Oracle use single quotes, not double quotes):
select 'insert into targettable(field1, field2, ...) values (' || field1 || ', ' || field2 || ... || ');'
from targettable;
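For string columns the embedded quotes need doubling. A concrete sketch for a table with one string and one numeric column (names borrowed from the earlier create table example):
select 'insert into targettable (pdesc, num) values (''' || pdesc || ''', ' || num || ');'
from targettable;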
Something more sophisticated is here.
If you have an empty table, the Export method won't work. As a workaround, I used the Table View of Oracle SQL Developer, clicked on Columns, sorted by Nullable so NO was on top, and then selected those non-nullable columns using shift+select for the range.
This allowed me to do one base insert, so that Export could then prepare a proper all-columns insert.
If you have to load a lot of data into tables on a regular basis, check out SQL*Loader or external tables. They should be much faster than individual INSERTs.
You can also use MyGeneration (a free tool) to write your own SQL generation scripts. There is an "insert into" script for SQL Server included in MyGeneration, which can easily be changed to run under Oracle.