Call BigQuery `queries` API with two statements and access all results - google-bigquery

I am currently calling the BigQuery REST API detailed here https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query, in two separate calls with different SQL statements, imagine:
SELECT * FROM 'foo' WHERE x = 1;
and
SELECT * FROM 'foo' WHERE y = 2;
I currently run these HTTP requests in parallel, but would prefer to make one call to BigQuery. However, I cannot access all results if I combine these statements into a single call, a la:
SELECT * FROM 'foo' WHERE x = 1;
SELECT * FROM 'foo' WHERE y = 2;
In this case, the HTTP API only returns the last statement's results.
Any way around this?

You don't mention which language you're using, but this Python example may help: Can 2 result sets be viewed from bigquery.client.query()?
The TL;DR: when you run a query with multiple statements, each statement runs as its own (sub)job. You need to traverse the set of jobs via jobs.list and fetch the results from each.
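Building on that, here is a sketch of the whole round trip in Python with the google-cloud-bigquery client library (which wraps the REST API). The table name foo is from the question; everything else, including the helper names, is an assumption:

```python
# Sketch: run a two-statement script as ONE BigQuery call, then collect each
# statement's results from its child job. Assumes google-cloud-bigquery is
# installed and credentials are configured.
def build_script(statements):
    """Join individual statements into one multi-statement script."""
    return ";\n".join(s.strip().rstrip(";") for s in statements) + ";"

def fetch_all_results(statements, project=None):
    # Deferred import so build_script works without the library installed.
    from google.cloud import bigquery
    client = bigquery.Client(project=project)
    parent = client.query(build_script(statements))  # one HTTP call, one parent job
    parent.result()                                  # wait for the whole script
    results = []
    # Each statement ran as its own child job; walk them via jobs.list.
    # Note: list_jobs(parent_job=...) may yield the children newest-first,
    # so re-order by creation time if statement order matters.
    for child in client.list_jobs(parent_job=parent.job_id):
        if child.job_type == "query":
            results.append([dict(row) for row in child.result()])
    return results

# Usage (hypothetical):
# rows_per_statement = fetch_all_results([
#     "SELECT * FROM `foo` WHERE x = 1",
#     "SELECT * FROM `foo` WHERE y = 2",
# ])
```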

Related

how to iterate over projects, datasets in BigQuery using a SQL query

Assume I have a list of projects in BigQuery, and each project has several datasets. I'd like to extract data from all these tables into a single table using only SQL.
The query below works on one project (yay!), but how can I iterate it over several projects?
DECLARE schema_list ARRAY<STRING>;
DECLARE iter INT64 DEFAULT 0;
SET schema_list = (
  SELECT ARRAY_AGG(schema_name)
  FROM $project.INFORMATION_SCHEMA.SCHEMATA
);
WHILE iter < ARRAY_LENGTH(schema_list) DO
  EXECUTE IMMEDIATE FORMAT("""
    INSERT `$other_project.$data_set.$table` (col1, col2, something)
    SELECT
      col1,
      col2,
      (really clever calc) AS something
    FROM `$project.%s.198401*`
    GROUP BY
      col1,
      col2
  """, schema_list[OFFSET(iter)]);
  SET iter = iter + 1;
END WHILE;
I don't mind supplying the projects via an array, but if the query could fetch the list of projects itself, that would be a blast!
Thanks a million, even just for trying :)
An approach I can think of requires writing code (Python, Node.js, Java, etc.) against the BigQuery API. The idea is to loop through a list of projects and execute your query once per iteration:
Use the BQ endpoint projects.list to get the projects on which the user has been granted any project role, or use the Resource Manager API if necessary.
Once you have the list of projects, loop through it and pass each project_id to your query.
Use query parameters to pass data values safely and prevent SQL injection; note that identifiers such as the project ID cannot be bound as parameters, so validate them before formatting them into the SQL text.
Execute the query you posted in your question via the BQ API. See querying using a programming language.
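The steps above can be sketched in Python (an assumption; any of the listed languages works the same way). BigQuery query parameters only cover data values, so the project ID is validated and then formatted into the SQL text:

```python
# Sketch, assuming the google-cloud-bigquery library and a query template
# with a {project} placeholder; the template in the usage note is illustrative.
import re

# Project IDs are 6-30 chars: a lowercase letter, then letters/digits/hyphens.
_PROJECT_ID = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def safe_project_id(project_id):
    """Identifiers cannot be query parameters, so validate them before
    formatting them into the SQL text (this is the injection guard)."""
    if not _PROJECT_ID.match(project_id):
        raise ValueError("not a valid project id: %s" % project_id)
    return project_id

def run_per_project(project_ids, query_template):
    from google.cloud import bigquery  # deferred so the helper works alone
    client = bigquery.Client()
    for pid in project_ids:
        sql = query_template.format(project=safe_project_id(pid))
        client.query(sql).result()  # wait for each script run to finish

# Usage (hypothetical):
# run_per_project(
#     ["my-project-123"],
#     "INSERT `other_project.ds.t` SELECT ... FROM `{project}.ds.t`",
# )
```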

Executing multiple Select * in QueryDatabaseTable in NiFi

I want to execute select * from table1, select * from table2, ... select * from table80 (basically, extract data from 80 different tables and send it to 80 different indexes in Elasticsearch/Kibana).
Is it possible to give multiple SELECT * statements in one QueryDatabaseTable and then route the results to different indexes? If yes, what would the flow look like?
There are a couple of approaches you can take to solve this.
If your tables are literally named table1, table2, etc., you can simply generate 80 flowfiles, each with a unique integer in an attribute (e.g. table_count), and use GenerateTableFetch and ExecuteSQL to build the queries from that attribute via Expression Language.
If the table names are non-sequential (e.g. users, addresses, etc.), you can read them from a file with one name per line, or use ListDatabaseTables to query the database for the names. You can then do some simple text processing to split the created flowfile(s) into one per table and continue as above.
QueryDatabaseTable doesn't allow incoming connections, so this is not possible with that processor alone.
But you can achieve the same use case with the following flow:
Flow:
1. ListDatabaseTables
2. RouteOnAttribute //*optional* filter only required tables
3. GenerateTableFetch //to generate pages of sql queries and store state
4. RemoteProcessGroup (or) Load balance connection
5. ExecuteSql //run more than one concurrent task if needed
6. further processing
7. PutElasticSearch.
In addition, if you don't want to run the flow incrementally, remove the GenerateTableFetch processor.
Configure the ExecuteSql processor's select query as:
select * from ${db.table.schema}.${db.table.name}
Some useful references:
GenerateTableFetch link1 link2
Incrementally run ExecuteSQL processor without using GenerateTableFetch link

When did sqlalchemy execute the query?

As I've only just started learning sqlalchemy, the result of the following code left me confused about when sqlalchemy executes the query:
query = db.session.query(MyTable)
query = query.filter(...)
query = query.limit(...)
query = query.offset(...)
records = query #records=query.all()
for r in records:
    pass  # do something
note the line
records = query #records=query.all()
It seems to produce the same correct result (stored in the variable "records") whether I use "query" or "query.all()", so I wonder: when was the query executed?
If it executes at the first line, db.session.query(MyTable), the result set might be large at that point; if at the fifth line, records = query, how could that happen when there's no function call at all?
In your example, the query gets executed upon for r in records. Accessing the query object via its iterator triggers the execution. (Normally, only then is it compiled into a SELECT statement.)
Up until this time, the query will be built (via filter, limit etc).
Please also read the ORM tutorial on querying.
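The lazy behavior described above can be seen end-to-end with a minimal, self-contained sketch (plain SQLAlchemy with an in-memory SQLite database; MyTable and the filter/limit/offset chain mirror the question, while the columns and data are made up):

```python
# Demonstration of lazy query execution, assuming SQLAlchemy 1.4+.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")   # in-memory database
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([MyTable(name="row%d" % i) for i in range(10)])
session.commit()

query = session.query(MyTable)          # nothing sent to the database yet
query = query.filter(MyTable.id > 2)    # still only building the statement
query = query.order_by(MyTable.id)      # added so limit/offset are deterministic
query = query.limit(3)
query = query.offset(1)

records = list(query)   # iterating compiles and executes the SELECT
same = query.all()      # .all() also just iterates, hence the identical result

print([r.id for r in records])  # ids 3..10, skip 1, take 3 -> [4, 5, 6]
```

Each iteration over the Query object issues a fresh SELECT, so records and same come from two separate executions that happen to return the same rows.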

Teradata SQL parameter syntax

Is there syntax which allows a parameter marker in the middle of a table name? For example, consider the queries
sel * from t?x
and
sel * from t?x_blah.
Both execute as
sel * from t1
if the user inputs 1. In the first query x = 1, and in the second x_blah = 1. I would like to modify the second query so that x = 1 and it executes as
sel * from t1_blah.
Is there a way to do this?
Thanks!
A question mark parameter marker can only be used for data values, not object names. So, the short answer is no.
Here's some info from Teradata: Clicky!
What's your client tool?
SQL Assistant supports parameters for object names, but there's no way to get the expected behavior (as you probably noticed).
The only solution might be using Dynamic SQL within a Stored Procedure.
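If client-side code is an option, the same dynamic-SQL idea can be sketched outside a stored procedure. The t<x>_blah pattern is from the question; the code assumes any Python DB-API cursor (e.g. the teradatasql driver), which is not something the question specifies:

```python
# Since ? markers cannot stand in for object names, build the table name in
# client code and keep ? (or %s-style markers) for data values only.
def table_for(x):
    """Turn user input into a table name like t1_blah; the int() call is the
    guard that keeps arbitrary text out of the SQL string."""
    return "t{}_blah".format(int(x))

def query_table(cursor, x):
    # The table name is formatted in after validation; data values, if any,
    # would still be passed as ? parameters in the normal way.
    cursor.execute("SELECT * FROM " + table_for(x))
    return cursor.fetchall()
```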
A ? works in Teradata SQL Assistant. It will ask you to enter the parameter:
select ?x from ?y;
When you run it, SQL Assistant will ask you for the parameters x and y. If you put 10 instances of ?x in your code, SQL Assistant will only ask for x once.

SQL queries in batch don't execute

My project is in Visual FoxPro and I use MS SQL Server 2008. When I fire SQL queries in a batch, some of the queries don't execute, yet no error is thrown. I haven't used BEGIN TRAN and ROLLBACK yet. What should be done?
That all depends... You haven't posted any samples of your queries to give us an indication of possible failure. However, one thing that has worked well for me going from VFP to SQL is to build the whole batch into a string (I prefer TEXT/ENDTEXT for readability), then send that entire value to SQL. If there are any "parameter" values that come from VFP locally, you can use "?" to indicate the value will come from a variable. That way you can batch everything in a single call instead of multiple individual queries...
vfpField = 28
vfpString = 'Smith'

text to lcSqlCmd noshow
    select
        YT.blah,
        YT.blah2
    into
        #tempSqlResult
    from
        yourTable YT
    where
        YT.SomeKey = ?vfpField

    select
        ost.Xblah,
        t.blah,
        t.blah2
    from
        OtherSQLTable ost
        join #tempSqlResult t
            on ost.Xblah = t.blahKey;

    drop table #tempSqlResult;
endtext

nHandle = sqlconnect( "your connection string" )
nAns = sqlexec( nHandle, lcSqlCmd, "LocalVFPCursorName" )
No, I don't have error trapping here; this is just to show the principle and keep it readable. I know the sample query could easily have been done via a join, but if you are working with pre-aggregations and want to put them into temp work areas (like localized VFP cursors) for your next step, this works via #tempSqlResult; the "#" indicates a temporary table on SQL Server for the current connection handle.
If you want to return MULTIPLE RESULT SETs from a single SQL call, you can do that too: just add another query that doesn't have an "into #tmpSQLblah" clause. All of those result cursors will then be brought back down to VFP, named with the "LocalVFPCursorName" prefix. If you return 3 result sets, VFP will have 3 cursors open called
LocalVFPCursorName
LocalVFPCursorName1
LocalVFPCursorName2
and they will be named based on the sequence of the queries in the SqlExec() call. But if you can provide more detail on what you ARE trying to do, along with samples, we can offer more specific help too.
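The naming rule above can be pinned down with a small sketch. A Python analog of the multi-result-set batch uses pyodbc's nextset(); pyodbc and the helper names are assumptions here, not part of VFP:

```python
# Mirror SqlExec()'s base/base1/base2 cursor-naming rule, and show the
# equivalent "execute once, walk all result sets" pattern via pyodbc.
def cursor_names(base, n):
    """Names VFP gives n result cursors: base, base1, base2, ..."""
    return [base if i == 0 else "{}{}".format(base, i) for i in range(n)]

def fetch_result_sets(connection_string, batch_sql):
    import pyodbc  # deferred so cursor_names works without the driver
    cursor = pyodbc.connect(connection_string).cursor()
    cursor.execute(batch_sql)       # the whole batch goes over in one call
    sets = []
    while True:
        if cursor.description is not None:  # this statement produced rows
            sets.append(cursor.fetchall())
        if not cursor.nextset():    # advance to the next result set, if any
            break
    return sets
```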