I have a scheduled query that pulls data from several tables and aggregates them into a single table, which I use as a source for a Data Studio dashboard. One of these tables (table1) is updated daily, sometimes more than once a day. Is there a way to automatically run the scheduled query every time this table (table1) is updated?
Unfortunately, BigQuery scheduled queries run on a time schedule and only expose the run_time and run_date parameters, so if you want to do this with BigQuery alone, the closest you can get is to schedule the query multiple times at different hours.
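For illustration, the only dynamic input a scheduled query receives is the run timestamp, e.g. via the @run_date parameter; it has no way of knowing whether table1 changed. A minimal sketch, with placeholder project, dataset, table and column names:

    -- Minimal sketch of a scheduled aggregation query; it only knows *when* it runs
    -- (@run_date), not whether table1 was updated. All names below are placeholders.
    INSERT INTO `my_project.my_dataset.aggregated` (load_date, grouping_col, total)
    SELECT
      @run_date,            -- supplied by the scheduled-query service at run time
      t.grouping_col,
      SUM(t.value)
    FROM `my_project.my_dataset.table1` AS t
    GROUP BY t.grouping_col;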
Additionally, triggers are not possible in BigQuery because they are simply unsupported.
Instead, I recommend using a Cloud Run action that receives an event after every insert; with an event trigger like that in place you can get what you are looking for, without any scheduled queries.
Related
I have 3 tables: A, B and C. I have scheduled a daily query on A with the results appended to B, and I want to schedule another daily query on B that runs only after the first one has completed. How can I do that? BigQuery can only schedule queries at fixed times.
When the first query has finished, the second query should be triggered and run.
BigQuery is a data warehouse and, unfortunately, does not support triggers.
I have not used this myself, as I just leave 30 minutes between scheduled queries, but Google Cloud appears to offer a solution:
https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events
I have a couple of scripts scheduled as scheduled queries in BigQuery. I need to update some of the scripts and sometimes I run into an error where BigQuery doesn't seem to recognize my queries as scripts. According to the documentation, a query that includes DML/DDL statements should not have a destination table configured. However, BigQuery is forcing me to input a destination table, which I don't think I am supposed to do, as per above-mentioned documentation and since my script already includes an INSERT statement.
Here is a screenshot of what I see: [screenshot of the BigQuery UI while trying to update a DML scheduled script]
On the left I have highlighted my script, which included DML, and on the right, the options that BigQuery is forcing me to fill.
What should I do in order to correctly update a script with DML statements, without having to input a destination table in the options?
I have 2 tables: the source table is on a linked server and the destination table is on another server.
I want my data load to happen in the following manner:
Every night a scheduled job does a full dump, i.e. it truncates the destination table and loads all the data from the source into it.
Every 15 minutes an incremental load runs, because data is ingested into the source every second and needs to be replicated to the destination as well.
For the incremental load I currently use scripts stored in a stored procedure, but going forward we would like to implement this in SSIS.
The scripts work in the following manner:
I have an Inserted_Date column. I take MAX(Inserted_Date), delete all rows whose Inserted_Date is greater than or equal to that maximum, and insert the matching rows from the source into the destination. This job runs every 15 minutes.
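For reference, the stored-procedure logic described above might look roughly like this in T-SQL (the linked server, database, table and column names are placeholders, not the actual objects):

    -- Sketch of the delete/insert incremental load; all object names are placeholders.
    DECLARE @MaxInsertedDate DATETIME;

    -- Latest watermark already present in the destination
    SELECT @MaxInsertedDate = MAX(Inserted_Date)
    FROM dbo.DestinationTable;

    -- Remove the (possibly partial) latest batch
    DELETE FROM dbo.DestinationTable
    WHERE Inserted_Date >= @MaxInsertedDate;

    -- Re-insert everything from that point onward from the linked-server source
    INSERT INTO dbo.DestinationTable (Id, SomeColumn, Inserted_Date)
    SELECT Id, SomeColumn, Inserted_Date
    FROM [LinkedServer].[SourceDb].dbo.SourceTable
    WHERE Inserted_Date >= @MaxInsertedDate;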
How can I implement a similar scenario in SSIS?
I have worked with SSIS using Lookup and Conditional Split on ID columns, but the tables I am working with have a lot of rows, so the lookup takes a long time and is not the right solution for my scenario.
Is there any way I can get the Max(Inserted_Date) logic into an SSIS solution too? My end goal is to drop the script-based approach and replicate the same logic in SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL task, and so on.
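For example, the Execute SQL Task at the top of the Control Flow could return the watermark as a single-row result set mapped to an SSIS variable such as User::MaxInsertedDate (the table name below is a placeholder), and the downstream Execute SQL / Data Flow tasks would then reference that variable in their WHERE clauses:

    -- Single-row result set for the Execute SQL Task; map the column to an SSIS
    -- variable (e.g. User::MaxInsertedDate). Table name is a placeholder.
    SELECT ISNULL(MAX(Inserted_Date), '1900-01-01') AS MaxInsertedDate
    FROM dbo.DestinationTable;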
Every time I attempt to save a view that contains "create or replace table" I get the error: Only SELECT statements are allowed in view queries.
I am using standard SQL in Google Cloud BigQuery. I do not want the view as a scheduled query, and the view is too large to run directly; it errors out asking me to create a table that allows large results. I want the query to update its associated table every time it is run. I came across the WRITE_TRUNCATE disposition, but I do not know how to use it.
create or replace table.test1 as( all my other queries)
I just expect BigQuery to let me run the query and rebuild the table.
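For what it's worth, a statement like that only fails when it is saved as a view, because a view body may contain nothing but a SELECT. Running (or scheduling) it as an ordinary query is one way around it; a minimal sketch, with placeholder project, dataset and source names standing in for the real query body:

    -- Sketch: run this as a plain (or scheduled) query, not as a view.
    -- CREATE OR REPLACE TABLE rebuilds the table on every run, which is the
    -- WRITE_TRUNCATE-style behaviour being asked about. Names are placeholders.
    CREATE OR REPLACE TABLE `my_project.my_dataset.test1` AS
    SELECT *
    FROM `my_project.my_dataset.some_source`;  -- the original query body goes here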
We can insert data into a specific partition of a partitioned table by specifying the partition value. But my requirement is to overwrite all partitions of a table in one query using the UI. Can we perform this operation?
I consulted a BigQuery team member: you can NOT write to all partitions in one query.
You can only write to one partition at a time.
As YY has pointed out, you would not be able to do this directly in BigQuery/SQL with one query (you could script something to run N queries). However, if you spun up a Cloud Dataflow pipeline and configured it with multiple BigQueryIO sinks, each sink overwriting one partition, that would be one way I can think of to do it in one shot. It would be a straightforward pipeline to spin up and run.
At this time, BigQuery allows updating up to 2,000 partitions in a single statement. If you just need to insert data into a partitioned table, you can use the INSERT DML statement to write to up to 2,000 partitions in one statement. If you are updating or deleting existing partitions, you can use the UPDATE or DELETE statements respectively. If you are both updating existing data and inserting new data, you can use the MERGE DML statement to achieve this.
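For instance, a single MERGE that updates existing rows and inserts new ones across many date partitions might look roughly like this (the table, key and partitioning column names are placeholders):

    -- Sketch of one MERGE statement touching many partitions at once,
    -- subject to the partition limit mentioned above. All names are placeholders.
    MERGE `my_project.my_dataset.partitioned_target` AS t
    USING `my_project.my_dataset.staging_source` AS s
    ON t.id = s.id AND t.partition_date = s.partition_date
    WHEN MATCHED THEN
      UPDATE SET value = s.value
    WHEN NOT MATCHED THEN
      INSERT (id, partition_date, value)
      VALUES (s.id, s.partition_date, s.value);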