How to update a scheduled script query including DML statements in BigQuery? - google-bigquery

I have a couple of scripts set up as scheduled queries in BigQuery. I need to update some of them, and sometimes I run into an error where BigQuery doesn't seem to recognize my queries as scripts. According to the documentation, a query that includes DML/DDL statements should not have a destination table configured. However, BigQuery is forcing me to input a destination table. I don't think I am supposed to do that, both because of that documentation and because my script already includes an INSERT statement.
Here is a screenshot of what I see when trying to update a DML scheduled script:
On the left I have highlighted my script, which includes DML, and on the right are the options that BigQuery is forcing me to fill in.
What should I do in order to correctly update a script with DML statements, without having to input a destination table in the options?
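The documentation rule can be made concrete by looking at the configuration a scheduled query carries. Below is a minimal sketch. It assumes the `params` keys that the BigQuery Data Transfer Service uses for scheduled queries (`query`, `destination_table_name_template`, `write_disposition`). The helper `build_scheduled_query_params` and the dataset and table names are hypothetical. For a script containing DML, only `query` is set, and the destination keys are left out entirely.

```python
# Hypothetical helper, not part of any Google library: builds the "params" mapping
# of a BigQuery scheduled query (Data Transfer Service config).
def build_scheduled_query_params(query, destination_table=None,
                                 write_disposition="WRITE_APPEND"):
    params = {"query": query}
    if destination_table is not None:
        # Only a plain SELECT gets a destination table and write disposition;
        # a script with DML/DDL must leave these keys out entirely.
        params["destination_table_name_template"] = destination_table
        params["write_disposition"] = write_disposition
    return params


# Example script (dataset/table names are made up for illustration).
DML_SCRIPT = """
DECLARE run_day DATE DEFAULT CURRENT_DATE();
INSERT INTO mydataset.daily_summary (day, row_count)
SELECT run_day, COUNT(*) FROM mydataset.events WHERE DATE(event_ts) = run_day;
"""

script_params = build_scheduled_query_params(DML_SCRIPT)
select_params = build_scheduled_query_params("SELECT 1 AS x", "results_{run_date}")
print(sorted(script_params))  # ['query']
```

With the `google-cloud-bigquery-datatransfer` client, a mapping like `script_params` would be passed as `params` to `TransferConfig`. An existing config could then be updated programmatically with `update_transfer_config` and an update mask on `params`, which may be one way around the console form. Check the client documentation for the exact call before relying on this.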

Related

Run a BigQuery schedule every time a table is updated

I have a scheduled query that gets data from some tables and aggregates it into a single table, which I use as a source for a Data Studio dashboard. One of these tables (table1) is updated daily, sometimes more than once. Is there a way to run the scheduled query automatically every time table1 is updated?
Unfortunately, BigQuery scheduled queries are time-based: they run on a schedule and expose the run_time and run_date parameters, so they cannot react to a table update. If you want to do this with BigQuery alone, you can schedule multiple queries at different times.
Additionally, triggers are not supported in BigQuery, so a table trigger is not an option either.
I recommend using Cloud Run with an event trigger that fires after any insert into the table. This does what you are looking for, but without any scheduled queries.

SSIS incremental load every 15 minutes

I have two tables: the source table is on a linked server, and the destination table is on another server.
I want my data load to happen in the following manner:
Every night, a scheduled job does a full dump, i.e. it truncates the destination table and loads all the data from the source.
Every 15 minutes, an incremental load runs. Data is ingested into the source every second, and I need to replicate it to the destination too.
For the incremental load I have currently written scripts that are stored in a stored procedure, but going forward we would like to implement this with SSIS.
The scripts work as follows:
Using the Inserted_Date column, I take Max(Inserted_Date) from the destination. I then delete all rows whose Inserted_Date is greater than or equal to that value, and insert all rows from the source with Inserted_Date greater than or equal to it. This job runs every 15 minutes.
How can I implement a similar scenario in SSIS?
I have worked with SSIS using Lookup and Conditional Split on ID columns. However, the tables I am working with have a lot of rows, so the lookup takes a long time, and it is not the right solution for my scenario.
Is there any way to bring the Max(Inserted_Date) logic into an SSIS solution too? My end goal is to replace the script-based approach with the same logic in SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL Task, and so on.
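The Max(Inserted_Date) logic from the question can be sketched end to end. This is a minimal stand-in that assumes both tables live in one in-memory SQLite database with only `id` and `Inserted_Date` columns (the real source is on a linked server). In SSIS the same steps would map roughly to three pieces:

- an Execute SQL Task that stores the watermark in a variable;
- an Execute SQL Task for the DELETE;
- a Data Flow whose source query filters `Inserted_Date >= ?`, with that variable as the parameter.

```python
import sqlite3


def full_load(conn):
    """Nightly job: empty the destination, then copy every source row."""
    conn.execute("DELETE FROM dest")  # SQLite has no TRUNCATE TABLE
    conn.execute("INSERT INTO dest SELECT * FROM source")
    conn.commit()


def incremental_load(conn):
    """15-minute job using the Max(Inserted_Date) watermark from the question."""
    (watermark,) = conn.execute("SELECT MAX(Inserted_Date) FROM dest").fetchone()
    if watermark is None:  # empty destination: fall back to a full load
        full_load(conn)
        return
    # Rows at the watermark may have been only partly copied last time,
    # so they are deleted and re-copied together with everything newer.
    conn.execute("DELETE FROM dest WHERE Inserted_Date >= ?", (watermark,))
    conn.execute(
        "INSERT INTO dest SELECT * FROM source WHERE Inserted_Date >= ?", (watermark,)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, Inserted_Date TEXT)")
conn.execute("CREATE TABLE dest (id INTEGER, Inserted_Date TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, "2024-01-01 00:00:00"), (2, "2024-01-01 00:15:00")])
full_load(conn)
# Row 3 arrives late with the same timestamp as row 2; row 4 is newer.
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(3, "2024-01-01 00:15:00"), (4, "2024-01-01 00:30:00")])
incremental_load(conn)
ids = [row[0] for row in conn.execute("SELECT id FROM dest ORDER BY id")]
print(ids)  # [1, 2, 3, 4]
```

The delete and reload use `>=` rather than `>`. That is why the late row 3 is still picked up without creating a duplicate of row 2.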

How to execute a sql file (with both DDL and DML statements separated by ;) stored in GCS against Bigquery

I have a file on Google Cloud Storage that contains a number of queries (CREATE TABLE, TRUNCATE/DELETE, INSERT, MERGE, SELECT, etc.). I need to execute all the statements against BigQuery in sequence, in the order they appear in the file. How do I do that?
At the moment there is no way to achieve this directly. You might follow this procedure:
1. Split your file so that each DDL statement follows the correct syntax, and run the statements one by one.
2. Create a CSV file and import the data into BigQuery, following the procedure.
If your database is huge, you may want to do the import using the API.
Also, here is the documentation for the DML syntax.
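Step 1 (splitting the file into individual statements) can be sketched in Python. This is a simplified splitter. It assumes statements end with `;` and that a semicolon can otherwise appear only inside quoted literals and `--` comments. It does not handle block comments, backslash escapes, `#` comments, or multi-statement blocks such as `BEGIN ... END`. `split_sql_statements` is a hypothetical helper name.

```python
def split_sql_statements(sql):
    """Split a SQL script on semicolons outside quotes and -- line comments."""
    statements, current = [], []
    quote = None        # quote character we are currently inside, if any
    in_comment = False  # inside a -- comment until the end of the line
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_comment:
            current.append(ch)
            if ch == "\n":
                in_comment = False
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif sql.startswith("--", i):
            in_comment = True
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:  # last statement without a trailing semicolon
        statements.append(tail)
    return statements


script = """
CREATE TABLE ds.t (id INT64, note STRING);
-- load one row
INSERT INTO ds.t VALUES (1, 'a;b');
DELETE FROM ds.t WHERE id = 2;
"""
stmts = split_sql_statements(script)
for stmt in stmts:
    print(repr(stmt))
```

Reading the file from Cloud Storage (for example with the `google-cloud-storage` client) is left out so that the sketch runs offline. So is sending each statement in order with `google.cloud.bigquery.Client().query(stmt).result()`.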

How to delete/truncate the data in the table using command line in big query?

What is the command to execute DML statements like INSERT, UPDATE and DELETE in Google BigQuery?
I tried using bq query "select query", but it only works for SELECT statements.
Note that BigQuery really excels at being a secondary database used for performing fast analytical queries on big data that is static, such as recorded data analysis, logs, and audit history.
If you instead require regular data updates, it is highly recommended to use a separate master database such as the Datastore to perform fast entity operations and updates. You would then persist your data from your master database to your secondary BigQuery database for further analysis.
To use the Data Manipulation Language (DML) at all, you must tell the bq command-line tool to use standard SQL with --use_legacy_sql=false, instead of the original default, legacy SQL.
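The flag can be shown as the argument list you would pass to `subprocess.run`. This is a minimal sketch; `bq_dml_command` and `mydataset.mytable` are hypothetical. A BigQuery `DELETE` needs a `WHERE` clause, so deleting every row is written `WHERE TRUE`. Standard SQL also offers `TRUNCATE TABLE`.

```python
import shlex


def bq_dml_command(sql):
    # --use_legacy_sql=false makes bq run standard SQL, which is required for DML.
    return ["bq", "query", "--use_legacy_sql=false", sql]


delete_cmd = bq_dml_command("DELETE FROM mydataset.mytable WHERE TRUE")
truncate_cmd = bq_dml_command("TRUNCATE TABLE mydataset.mytable")
command_line = shlex.join(delete_cmd)
print(command_line)
# bq query --use_legacy_sql=false 'DELETE FROM mydataset.mytable WHERE TRUE'
```

Running it is then `subprocess.run(delete_cmd, check=True)`, assuming the Cloud SDK's `bq` is on the PATH and authenticated.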

How do I handle large SQL SERVER batch inserts?

I'm looking to execute a series of queries as part of a migration project. The scripts are generated by a tool that analyses the legacy database and then produces a script mapping each of the old entities to an appropriate new record. The scripts run well for small entities, but some entities have hundreds of thousands of records, which produce script files of around 80 MB.
What is the best way to run these scripts?
Is there some SQLCMD option at the prompt that deals with larger scripts?
I could also break the scripts down into further smaller scripts but I don't want to have to execute hundreds of scripts to perform the migration.
If possible, have the export tool modified to export a BULK INSERT-compatible file.
Barring that, you can write a program that will parse the insert statements into something that BULK INSERT will accept.
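A program of the kind described above might look like the following minimal sketch. It assumes each generated statement sits on one line, has an explicit column list, and uses only numbers, `NULL`, and single-quoted strings (with `''` for an embedded quote). The helper names are hypothetical. Both NULL and the empty string end up as an empty field, so they become indistinguishable after the conversion.

```python
import csv
import io
import re

# Assumed input shape: one statement per line, with an explicit column list, e.g.
#   INSERT INTO dbo.Customer (Id, Name, City) VALUES (1, 'Smith', NULL);
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+\S+\s*\(([^)]*)\)\s*VALUES\s*\((.*)\)\s*;?\s*$", re.IGNORECASE
)


def _finish(chars, quoted):
    value = "".join(chars)
    if quoted:
        return value  # string literal: keep exactly as written
    value = value.strip()
    return "" if value.upper() == "NULL" else value  # NULL -> empty field


def parse_values(text):
    """Split a VALUES list on commas that are outside single-quoted literals."""
    fields, chars, quoted, in_str = [], [], False, False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_str:
            if ch == "'" and text[i + 1:i + 2] == "'":
                chars.append("'")  # '' is an escaped single quote
                i += 1
            elif ch == "'":
                in_str = False
            else:
                chars.append(ch)
        elif ch == "'":
            # Opening quote: drop anything before it (spaces, an N prefix).
            in_str, quoted, chars = True, True, []
        elif ch == ",":
            fields.append(_finish(chars, quoted))
            chars, quoted = [], False
        elif not (quoted and ch.isspace()):
            chars.append(ch)  # whitespace after a closing quote is skipped
        i += 1
    fields.append(_finish(chars, quoted))
    return fields


def inserts_to_csv(lines):
    """Turn matching INSERT lines into CSV text; other lines are skipped."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = INSERT_RE.match(line.strip())
        if match:
            writer.writerow(parse_values(match.group(2)))
    return out.getvalue()


sample = [
    "INSERT INTO dbo.Customer (Id, Name, City) VALUES (1, 'O''Brien, Pat', NULL);",
    "INSERT INTO dbo.Customer (Id, Name, City) VALUES (2, N'Smith', 'Leeds');",
]
csv_text = inserts_to_csv(sample)
print(csv_text)
```

The resulting file could then be loaded with something like `BULK INSERT dbo.Customer FROM 'C:\migration\customer.csv' WITH (FORMAT = 'CSV', KEEPNULLS);`. The path is hypothetical, and `FORMAT = 'CSV'` requires SQL Server 2017 or later. A data file should keep one consistent column layout, so one file per target table is the natural split.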
BULK INSERT can use BCP format files, which come in traditional (non-XML) or XML form.
Does each row have to get a new identity that is then used in a child table, and you can't get away with using SET IDENTITY_INSERT ... ON because the database design has changed so much? If so, I think you might be better off using SSIS or similar and doing a Merge Join once the identities are assigned. You could also load the data into staging tables in SQL Server using SSIS or BCP, and then use regular SQL (potentially within SSIS in an Execute SQL Task) with the OUTPUT ... INTO clause to capture the identities and use them in the children.
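The identity-mapping idea can be sketched without SQL Server: load the parents, capture the identity each one receives, and use it for the children. This minimal example uses Python's built-in SQLite and `cursor.lastrowid` as a stand-in for `OUTPUT ... INTO`. It works row by row, so it only illustrates the mapping, not set-based performance. The tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE child (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id INTEGER REFERENCES parent(id),
    label TEXT
);
-- A row that already exists, so new identities differ from the legacy ids.
INSERT INTO parent (name) VALUES ('pre-existing');
""")

# Hypothetical legacy rows: (old_id, name) and (old_id, old_parent_id, label).
legacy_parents = [(10, "Acme"), (20, "Globex")]
legacy_children = [(100, 10, "order A"), (101, 20, "order B"), (102, 10, "order C")]


def migrate(conn, parents, children):
    """Insert parents, record old id -> new identity, then remap children."""
    id_map = {}
    for old_id, name in parents:
        cur = conn.execute("INSERT INTO parent (name) VALUES (?)", (name,))
        id_map[old_id] = cur.lastrowid  # the identity the database just assigned
    conn.executemany(
        "INSERT INTO child (parent_id, label) VALUES (?, ?)",
        [(id_map[old_parent], label) for _old_id, old_parent, label in children],
    )
    conn.commit()
    return id_map


id_map = migrate(conn, legacy_parents, legacy_children)
print(id_map)  # {10: 2, 20: 3}
```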
Just execute the script. We regularly run backup/restore scripts that are hundreds of MB in size, and they only take 30 seconds or so.
If it is critical not to block your server for that amount of time, you'll have to split the script up.
Also look into the --tab option of mysqldump, which outputs the data using SELECT ... INTO OUTFILE. That format is more efficient and faster to load.
It sounds like the tool generates a single INSERT for each row, which is going to be pretty slow. If the inserts are all wrapped in one transaction, that can also be slow. The number of rows doesn't sound so large that a single transaction would be nearly impossible, though, as it might be if you were holding a multi-million-row insert in one transaction.
You might be better off using an ETL tool (DTS, SSIS, BCP, BULK INSERT ... FROM, or something else) to migrate the data instead of scripting each insert.
You could break up the script and execute it in parts (especially if it currently runs as one big transaction), and automate the execution of the individual scripts using PowerShell or similar.
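Splitting and automating the run could look like this minimal sketch, written in Python rather than PowerShell. It assumes the generated script separates batches with lines containing only `GO`; a `GO` with a repeat count or a trailing comment is not recognised. The script is cut only at those boundaries, so no statement is split, and one `sqlcmd` command is built per part. In that command, `-E` uses Windows authentication and `-b` makes `sqlcmd` exit with an error code when a batch fails. The server and database names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

GO_LINE = re.compile(r"^\s*GO\s*$", re.IGNORECASE)


def split_batches(script_text):
    """Split a T-SQL script into batches at lines that contain only GO."""
    batches, current = [], []
    for line in script_text.splitlines():
        if GO_LINE.match(line):
            if any(part.strip() for part in current):
                batches.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        batches.append("\n".join(current))
    return batches


def write_chunks(batches, out_dir, batches_per_file=500):
    """Write the batches into numbered files, batches_per_file batches each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, start in enumerate(range(0, len(batches), batches_per_file), start=1):
        path = out_dir / f"part_{n:04d}.sql"
        chunk = batches[start:start + batches_per_file]
        path.write_text("\nGO\n".join(chunk) + "\nGO\n", encoding="utf-8")
        paths.append(path)
    return paths


def sqlcmd_command(path, server, database):
    # -E: Windows authentication; -b: exit with an error code if a batch fails.
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-b", "-i", str(path)]


script = "INSERT INTO t VALUES (1)\nGO\nINSERT INTO t VALUES (2)\nGO\nINSERT INTO t VALUES (3)\n"
batches = split_batches(script)
# Demo only: the temporary directory and its part files are removed on exit.
with tempfile.TemporaryDirectory() as tmp:
    parts = write_chunks(batches, tmp, batches_per_file=2)
    first_part = parts[0].read_text(encoding="utf-8")
    commands = [sqlcmd_command(p, "myserver", "MigrationDb") for p in parts]
    part_names = [p.name for p in parts]
print(part_names)  # ['part_0001.sql', 'part_0002.sql']
```

In a real run, each command could then be executed in order with `subprocess.run(cmd, check=True)`, so that the migration stops at the first failing part.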
I've been looking into the BULK INSERT from file option but cannot find any examples of the file format. Can the file mix row formats, or does it always have to be consistent, CSV-style? I ask because identities are involved across various parent/child tables, which is why per-row inserts are currently being used.