Update scheduled query configuration: remove destination dataset - google-bigquery

I had a scheduled query with a SELECT statement and an APPEND write configuration; the destination was a specific table within the dataset "genera_analytics". Recently I modified the query, and it now runs a sequence of DML steps: INSERT, DELETE, INSERT. Now I get this error when the query is executed:
"Dataset specified in the query ('') is not consistent with Destination dataset 'genera_analytics'"
I have tried to update the scheduled query configuration to remove the destination dataset through the UI, but it seems impossible. I have also tried some bq commands:
bq update --transfer_config --target_dataset='' resource_name
but the destination dataset is still 'genera_analytics'.
How can I update this scheduled query, removing the destination dataset from the configuration?

It looks like the scheduled query was originally defined with a destination dataset and an APPEND write disposition. When the query is updated to a DML query, the UI does not expose the destination dataset and table fields so that they can be cleared, so the error is raised against the dataset and table name previously set on the scheduled query.
The fix is therefore to delete the scheduled query and recreate it from scratch as a DML query, without a destination dataset.
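If it helps, a rough sketch of that delete-and-recreate step with the bq CLI could look like the following. resource_name is the transfer config path from the question, while the display name, schedule, and query text are placeholders; because no --target_dataset is passed, the DML statements themselves decide where the data is written:
# remove the old scheduled query (transfer config)
bq rm --transfer_config resource_name
# recreate it as a pure DML scheduled query, with no destination dataset
bq mk --transfer_config \
  --data_source=scheduled_query \
  --display_name='genera_analytics DML load' \
  --schedule='every 24 hours' \
  --params='{"query": "INSERT ...; DELETE ...; INSERT ...;"}'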

Related

WRITE_TRUNCATE removes RLS rules from table

I have a job that inserts data in BigQuery using WRITE_TRUNCATE disposition. This will truncate all data already in the table and replace it with the results of the job.
However, in some cases the table I want to insert data into has Row Level Security (RLS) rules. In this case, TRUNCATE removes the RLS as well (which I don't want).
As said here about write dispositions:
Each action is atomic and only occurs if BigQuery is able to complete the job successfully. Creation, truncation and append actions occur as one atomic update upon job completion.
I am looking for a way to remove rows from my table without removing RLS, occurring as one atomic update with the append action upon job completion.
I have seen that DELETE FROM [TABLE] WHERE TRUE does not remove RLS, but I can't find a way to use it instead of TRUNCATE through the BigQuery framework. Is there a way to do it?
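For what it's worth, the DELETE-based pattern mentioned above can at least be scripted with the bq CLI, though the two steps run as separate jobs and are therefore not one atomic update the way WRITE_TRUNCATE is; the dataset, table, and GCS path here are placeholders:
# empty the table with DML, which (per the question) leaves RLS rules in place
bq query --use_legacy_sql=false 'DELETE FROM mydataset.mytable WHERE TRUE'
# then load the new data with the default append disposition
bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable gs://mybucket/new_data.json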

BigQuery Atomicity

I am trying to do a full load of a table in BigQuery daily, as part of an ETL. The target table has a dummy partition column of type integer and is clustered. I want the statement to be atomic, i.e. it either completely overwrites the old data with the new data, or rolls back to the old data if anything fails in between, and it keeps serving user queries with the old data until the new data is completely written.
One way of doing this is delete-and-insert, but BigQuery does not support multi-statement transactions.
I am thinking of using the statement below. Please let me know whether it is atomic.
create or replace table table_1
partition by dummy_int
cluster by dummy_column
as select col1, col2, col3 from stage_table1

Unable to delete record from table in BigQuery

I am trying to delete records from one of the tables in BigQuery, but I am getting
DML over table dataset.tablename is not supported
I am using this command to delete
bq query --use_legacy_sql=false 'delete from dataset.table_name where Day = 20161215'
But it works when I try to run it from the console like this:
delete from dataset.table_name where Day = 20161215
Please help.
Usually, you would see this error if you try to run DML against a partitioned table, which is not yet supported by DML.
If this table is not partitioned, check whether it is receiving streaming inserts (i.e., whether it still has a streaming buffer).
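For example, one rough way to check from the command line (dataset and table name are placeholders) is to dump the table metadata and look for a streamingBuffer section; if it is present, recently streamed rows are still buffered:
# table metadata includes a streamingBuffer section while streamed rows are still buffered
bq show --format=prettyjson dataset.table_name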
See more about DML's Known Issues

Partition swapping in Hive

What is the impact on running queries in Hive if I swap the partition using
ALTER TABLE user_data
PARTITION (name = 'ABC')
SET LOCATION 'db/partitions/new';
Does this command wait until running queries have finished executing?
Hive translates your query into a temporary Map/Reduce job, and that job is executed on behalf of your Hive query: when you submit a query, Hive creates a Map/Reduce job from it, the job runs, and you get the result from that job. If you ALTER the table and change a partition (or anything else) while a query is executing, the ALTER command does not wait for the running job to finish; it alters the table immediately, and you still get the result of the previous query unless you kill the previous job.
The best way to understand this is to try it: submit your Hive query and redirect its result to a file, then change the partition location, submit the query again, and redirect its result to another file, as sketched below. Compare the two outputs.
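A rough sketch of that experiment from the shell, reusing the table and path from the question (the count query itself is just an illustrative placeholder):
# result with the original partition location
hive -e "SELECT COUNT(*) FROM user_data WHERE name = 'ABC'" > before.txt
# swap the partition location while other queries may still be running
hive -e "ALTER TABLE user_data PARTITION (name = 'ABC') SET LOCATION 'db/partitions/new'"
# result with the new partition location
hive -e "SELECT COUNT(*) FROM user_data WHERE name = 'ABC'" > after.txt
diff before.txt after.txt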

BigQuery Schema error despite updating schema

I'm trying to run multiple simultaneous jobs in order to load around 700K records into a single BigQuery table. My code (Java) creates the schema from the records of its job, and updates the BigQuery schema if needed.
The workflow is as follows:
A single job creates the table and sets the (initial) schema.
For each load job we create the schema from the records of the job. Then we pull the existing table schema from BigQuery, and if it's not a superset of the schema associated with the job, we update the table with the new merged schema. The last part (starting from pulling the existing schema) is synchronized using a lock: only one job performs it at a time. The schema update uses the UPDATE method, and the lock is released only after the client's update method returns.
I was expecting this workflow to avoid schema update errors. I'm assuming that once the client returns from the update, the table is updated, and that jobs already in progress can't be hurt by the schema update.
Nevertheless, I still get schema update errors from time to time. Is the update method atomic? How do I know when a schema was actually updated?
Updates in BigQuery are atomic, but they are applied at the end of the job. When a job completes, it makes sure that the schemas are equivalent. If there was a schema update while the job was running, this check will fail.
We should probably make sure that the schemas are compatible instead of equivalent. If you do an append with a compatible schema (i.e. you have a subset of the table schema) that should succeed, but currently BigQuery doesn't allow this. I'll file a bug.
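If it is useful, one rough way to see which schema the table actually carries after an update (dataset and table name are placeholders) is to pull it with the bq CLI and compare it against the merged schema your job computed:
# prints the table's current schema as JSON
bq show --schema --format=prettyjson mydataset.mytable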