How can I update data all at once with the BigQuery API? - google-bigquery

I saw in this article (https://cloud.google.com/bigquery/docs/samples/bigquery-update-with-dml) how to update data via DML.
(For reference, I am developing in Java with Spring Boot.)
An additional question is whether it is possible to send JSON data via the BigQuery API and use it for a batch update.
Example JSON data:
[{"id":1, "name":"a"}, {"id":1, "name":"b"}, {"id":2, "name":"a"}]
And the update query I want to run is shown below. (The PK for this table is the pair "id" and "name".)
update table A
set A.value = 'test'
where (A.id, A.name) in (
...In this part, I want to use the above json data as a condition...
)
Is it possible to process such a query with the "BigQuery.query" method? Or, I wonder if there is another way.
Or is the only way to create an update query for each item of the above JSON data and execute them one by one with conditions (like the queries below)?
update table A
set A.value = 'test'
where A.id = 1
and A.name = 'a'
...
...
update table A
set A.value = 'test'
where A.id = 1
and A.name = 'b'
...
...
update table A
set A.value = 'test'
where A.id = 2
and A.name = 'a'

You can use JSON functions to retrieve and transform JSON data. GCP has introduced features for working with the JSON data type that, despite some limitations, can be used to create JSON values and to insert and query JSON data. Using these you can ingest semi-structured JSON into BigQuery; this lets you avoid adhering to fixed schemas and data types, which makes it more flexible. For more information on the JSON data type, you can check this documentation.
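To run this as a single batch UPDATE, one option is to bind the JSON array as a string query parameter and unnest it inside the WHERE clause. A minimal sketch, assuming the parameter is named @json_data and the table lives in a dataset called your_dataset (both names are illustrative); in Java this can be executed with BigQuery.query using a parameterized QueryJobConfiguration (addNamedParameter with QueryParameterValue.string):

-- Sketch: one UPDATE driven by the whole JSON payload bound as @json_data, e.g.
-- '[{"id":1,"name":"a"},{"id":1,"name":"b"},{"id":2,"name":"a"}]'
UPDATE your_dataset.A AS A
SET A.value = 'test'
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(JSON_EXTRACT_ARRAY(@json_data)) AS item
  WHERE A.id = CAST(JSON_VALUE(item, '$.id') AS INT64)
    AND A.name = JSON_VALUE(item, '$.name')
);

This way the whole payload is applied in one DML statement instead of one UPDATE per row.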

Related

BigQuery: add new key:val in JSON

How can I add a new key/value pair to an already existing JSON column in BigQuery using SQL (BigQuery flavor)?
BigQuery provides Data Manipulation Language (DML) statements, including the SQL UPDATE statement. See:
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_statement
What you will have to do is retrieve the original value of your structured column and then perform a SQL UPDATE statement that sets the column to the complete new value that you want.
Take care to realize that BigQuery is an OLAP database and is optimized for queries rather than updates or deletes. Make sure you read the information on using DML statements in BigQuery found here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language
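For example, a minimal sketch of that read-then-rewrite approach (dataset, table, and column names are hypothetical):

-- 1. Retrieve the current value of the structured column.
SELECT json_col FROM your_dataset.your_table WHERE id = 1;
-- 2. Write back the complete new value (built client-side from the old one).
UPDATE your_dataset.your_table
SET json_col = '{"key1":"value1","key3":"value3"}'
WHERE id = 1;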
I feel like this question is less about how to update the table and more about how to adjust existing JSON with an extra/new key:value pair (and then either update the table or simply select it out).
So, I assume you have a table (your_table) with an id column and a json_col column holding JSON strings,
and you might have another table with those new key:value pairs to use.
In case you don't really have a second table, you can just use a CTE like the one below:
with new_key_val as (
select 1 id, '{"key3":"value3"}' add_json union all
select 2 id, '{"key14":"value14"}'
)
So, having the above, you can use the approach below:
select *,
( select '{' || string_agg(trim(kv)) || ',' || trim(add_json, '{}') || '}'
from unnest(split(trim(json_col, '{}'), ',')) kv
) adjusted_json
from your_table
left join new_key_val
using(id)
with output being each original row plus an adjusted_json column that holds the merged JSON.
BigQuery supports JSON as a native data type but only offers a limited set of JSON functions. Unless your JSON data has a pre-defined, simple schema with known keys, you probably want to go the string-manipulation way.
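For reference, a minimal sketch of the native JSON type (the table name is hypothetical, and the available function set is limited as noted above):

-- Create a table with a native JSON column, insert a JSON literal,
-- and extract a known key.
CREATE TABLE your_dataset.json_demo (id INT64, data JSON);
INSERT INTO your_dataset.json_demo VALUES (1, JSON '{"key1":"value1"}');
SELECT JSON_VALUE(data, '$.key1') AS key1 FROM your_dataset.json_demo;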

Azure Data Factory Data Flow source query support for FOR JSON AUTO

I am trying to use the query below as the source for my data flow, but I keep getting errors. Is this functionality not supported in Data Flow?
SELECT customer.customerid AS 'customerid',
customer.customer_fname AS 'fname',
customer.customer_lname AS 'lname',
customer.customer_phone AS 'Phone',
address.customer_addressid as 'addressid',
address.Address_type as 'addresstype',
address.street1 as 'street1'
FROM customer customer
INNER JOIN customer_address address
ON customer.customerid = address.customerid
order by customer.customerid
FOR JSON AUTO, ROOT('customer')
I get the following error:
Column name needs to be specified in the query, set an alias if using a SQL function
ADF V2, Data Flows, Source
The error is caused by the fact that a Data Flow source query doesn't support an ORDER BY statement, not by the 'FOR JSON AUTO' clause.
Please reference the Data Flow Source transformation documentation:
Query: If you select Query in the input field, enter a SQL query for your source. This setting overrides any table that you've chosen in the dataset. Order By clauses aren't supported here, but you can set a full SELECT FROM statement. You can also use user-defined table functions. select * from udfGetData() is a UDF in SQL that returns a table. This query will produce a source table that you can use in your data flow. Using queries is also a great way to reduce rows for testing or for lookups.
SQL Example: Select * from MyTable where customerId > 1000 and customerId < 2000
The query works well in the Copy activity but fails in Data Flow. You need to change the query and remove the ORDER BY clause.
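That is, the source query becomes the original statement with only the ORDER BY line removed:

SELECT customer.customerid AS 'customerid',
customer.customer_fname AS 'fname',
customer.customer_lname AS 'lname',
customer.customer_phone AS 'Phone',
address.customer_addressid AS 'addressid',
address.Address_type AS 'addresstype',
address.street1 AS 'street1'
FROM customer customer
INNER JOIN customer_address address
ON customer.customerid = address.customerid
FOR JSON AUTO, ROOT('customer')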

SnowflakeSQL: How to check stream's schema and log stream data in a separate table if schema check fails?

Hi, I am pushing data into my Snowflake database's "raw" table using Fivetran. I have placed a stream on this "raw" table, and when new data comes in, I perform an operation on it and merge that data into a "target" table. The code is displayed below.
CREATE OR REPLACE TASK "TEST_TASK"
WAREHOUSE = "STREAM_TASK_WH"
SCHEDULE = '1 minute'
WHEN SYSTEM$STREAM_HAS_DATA('TEST_STREAM')
AS
MERGE INTO "TEST_PUBLIC"."TARGET" AS A
USING (
SELECT "ID",
"COL1" || '_' || "COL2" AS "COL1_2",
"COL2"
FROM "TEST_STREAM"
) AS B
ON "A"."ID" = "B"."ID"
WHEN NOT MATCHED THEN
INSERT("ID", "COL1_2", "COL2")
VALUES("B"."ID", "B"."COL1_2", "B"."COL2")
WHEN MATCHED THEN
UPDATE SET "A"."COL1_2" = "B"."COL1_2", "A"."COL2" = "B"."COL2";
The problem is, if the upstream data's schema changes, then this merge will fail. My questions are:
Is there a way to check schema before performing the merge operation?
Or is it possible to do a try catch, where if the merge fails, the stream data is logged somewhere else?
Basically, I want to be able to alert myself when a schema breaks, and also store this stream data somewhere else so that it can be cleaned up later.
Thanks for any help or advice.
Cheers
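For the try/catch part, here is a minimal sketch using a Snowflake Scripting EXCEPTION block, assuming a hypothetical TEST_PUBLIC.MERGE_ERRORS log table with ERROR_TIME and ERROR_MESSAGE columns:

CREATE OR REPLACE PROCEDURE "TEST_PUBLIC"."SAFE_MERGE"()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
msg STRING;
BEGIN
MERGE INTO "TEST_PUBLIC"."TARGET" AS A
USING (
SELECT "ID",
"COL1" || '_' || "COL2" AS "COL1_2",
"COL2"
FROM "TEST_STREAM"
) AS B
ON "A"."ID" = "B"."ID"
WHEN NOT MATCHED THEN
INSERT("ID", "COL1_2", "COL2")
VALUES("B"."ID", "B"."COL1_2", "B"."COL2")
WHEN MATCHED THEN
UPDATE SET "A"."COL1_2" = "B"."COL1_2", "A"."COL2" = "B"."COL2";
RETURN 'merged';
EXCEPTION
WHEN OTHER THEN
-- SQLERRM is Snowflake Scripting's built-in error message variable.
msg := SQLERRM;
INSERT INTO "TEST_PUBLIC"."MERGE_ERRORS"(ERROR_TIME, ERROR_MESSAGE)
SELECT CURRENT_TIMESTAMP(), :msg;
RETURN 'failed: ' || msg;
END;
$$;

The task body then becomes CALL "TEST_PUBLIC"."SAFE_MERGE"() instead of the inline MERGE. Because a failed MERGE rolls back, the stream offset is not advanced, so the pending rows stay available for inspection and cleanup.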

Optimize view that dynamically choose a table or another

So the problem is that I have three huge tables with the same structure, and I need to show the results of one of them depending on the result of another query.
So my order table looks like this:
code order
A 0
B 2
C 1
And I need to retrieve the data from the corresponding t_results_* table.
My approach (which is working) looks like this:
select *
from t_results_a
where 'A' in (
select code
from t_order
where "order" = 0
)
UNION ALL
select *
from t_results_b
where 'B' in (
select code
from t_order
where "order" = 0
)
UNION ALL
select *
from t_results_c
where 'C' in (
select code
from t_order
where "order" = 0
)
Is there any way to avoid scanning all three tables? I am working with Athena, so I can't program.
I presume that changing your database schema is not an option.
If it were, you could use one database table and add a CODE column whose value would be either A, B or C.
Basically the result of the SQL query on your ORDER table determines which other database table you need to query. For example, if CODE in table ORDER is A, then you have to query table T_RESULTS_A.
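A minimal sketch of that two-step flow as two separate queries, with the branching done by the caller (note that order is a reserved word in Athena, hence the double quotes):

-- Step 1: find which code is at order 0 (returns e.g. 'A').
select code from t_order where "order" = 0;
-- Step 2: the caller then queries only the matching table.
select * from t_results_a;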
You wrote in your question
I am working with Athena so I can't program
I see that there is both an ODBC driver and a JDBC driver for Athena, so you can program with either .NET or Java. So you could write code that queries the ORDER table and uses the result of that query to build another query string that targets just the relevant table.
Another thought I had was dynamic SQL. Oracle Database supports it: I can create a string containing variables, where one variable is the database table name, and have Oracle interpret the string as SQL and execute it. I briefly searched the Internet to see whether Athena supports this (as I have no experience with Athena) but found nothing - which is not to say that it does not exist.

Execute raw SQL using ServiceStack.OrmLite

I am working with ServiceStack.OrmLite and MS SQL Server. I would like to execute raw SQL against the database, but the original documentation only describes how to do it with a SELECT statement. That is not enough for me.
I can't find a way to run anything as simple as this:
UPDATE table1
SET column1 = 'value1'
WHERE column2 = value2
Using, for example:
var two = db.Update(@"UPDATE table1
SET column1 = 'value1'
WHERE column2 = value2");
Running these expressions with db.Update() or db.Update<> produces incomprehensible errors like
Incorrect syntax near the keyword 'UPDATE'.
I would like to use raw SQL because my real UPDATE expression uses a JOIN.
db.Update is for updating a model or partial model, as shown in OrmLite's documentation on Update. You can choose to use the loose-typed API to build your update statement, e.g.:
db.Update(table: "table1",
set: "column1 = {0}".Params("value1"),
where: "column2 = {0}".Params("value2"));
The Params extension method escapes your values for you.
Otherwise the way to execute any arbitrary raw sql is to use db.ExecuteSql().
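For example, a minimal sketch using db.ExecuteSql with the UPDATE from the question (value2 left as the question's placeholder):

db.ExecuteSql(@"UPDATE table1
SET column1 = 'value1'
WHERE column2 = value2");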
If it is a SELECT statement and you want to execute it using raw SQL, you can use:
List<Person> results = db.SqlList<Person>("SELECT * FROM Person WHERE Age < @age", new { age = 50 });
Reference: https://github.com/ServiceStack/ServiceStack.OrmLite#typed-sqlexpressions-with-custom-sql-apis