How do you set up a trigger with SQL queries in BigQuery?

I am trying to see if I can set up a trigger system, so that whenever a new row of data is added to tables A, B, or C, it populates new rows into a new table I created (table D, for example).
I'm using BigQuery. Does this platform allow this capability?
I'm not sure what kind of coding should be used for this (INSERT INTO, etc.).

Maybe late to the party, but this is possible now. https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events

Triggers are not supported on BigQuery, basically because they are not aligned to the intended use patterns of the product. You may also refer to this existing question.
There is a Feature Request in place for an interesting approach to trigger an action when rows are loaded to BigQuery, but currently there is no ETA for it.
You may want to consider Cloud Composer as a different alternative: instead of using triggers, you can orchestrate your data ingestion tasks.
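For example, a scheduled or orchestrated task could run an incremental INSERT ... SELECT into table D instead of a trigger. A rough sketch in BigQuery standard SQL, assuming each source table carries an ingestion timestamp column (the project, dataset and column names below are made up):

    -- Copy rows added to table A since the last run into table D.
    -- `my_project`, `my_dataset` and `ingest_time` are placeholder names.
    INSERT INTO `my_project.my_dataset.D` (id, payload, source_table, ingest_time)
    SELECT id, payload, 'A' AS source_table, ingest_time
    FROM `my_project.my_dataset.A`
    WHERE ingest_time > (
      SELECT IFNULL(MAX(ingest_time), TIMESTAMP '1970-01-01')
      FROM `my_project.my_dataset.D`
      WHERE source_table = 'A'
    );
    -- Repeat the same pattern (or UNION ALL the sources) for tables B and C.

Using D itself as the watermark avoids re-copying rows on each run; the task only needs to be scheduled at whatever latency is acceptable.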

BigQuery is a data warehouse, and there is no trigger support.
Have a look at https://cloud.google.com/sql/; maybe it will help you.

Related

Is there a built-in function in SQL that can tell if an existing row has been modified in the last 24 hours?

In my application (a C# application using Entity Framework and a SQL database), I need to create a daily task to update/insert data from a third-party application (both applications use a SQL Server database). For efficiency's sake, I am looking for a way to determine which records have been modified since the previous day and import only those records.
I know I can add a modified_on column to the source table and create a trigger to update that column when something is changed on that record, but that would require changes to the third-party application's database schema, which I want to avoid.
There's the change tracking feature but it's of limited use to you as you're using EF and that makes the way the data is queried awkward. You may be able to use it somehow, but I doubt it's elegant.
Way easier is to indeed change the schema, but add only a single column of type rowversion. That binary datatype (loaded as byte[] in EF) is special and gets a larger value every time something (such as the third-party application) updates the row. No need for any triggers. You can keep track of the largest value you have already processed and then query for all rows with a larger value.
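A minimal sketch of what that could look like (the table, column and watermark names are made up, not from your schema):

    -- One-line schema change on the third-party table, no triggers needed.
    ALTER TABLE dbo.ThirdPartyTable ADD RowVer rowversion;

    -- Daily task: fetch only the rows changed since the last import.
    DECLARE @LastProcessed binary(8);
    SELECT @LastProcessed = MaxRowVer FROM dbo.ImportWatermark;  -- hypothetical watermark table

    SELECT *
    FROM dbo.ThirdPartyTable
    WHERE RowVer > @LastProcessed
    ORDER BY RowVer;

    -- After importing, store the highest RowVer you saw back into dbo.ImportWatermark.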
In addition to the change tracking suggested by John in another answer, you can think about setting up temporal tables.
You can run queries against the temporal tables to identify the changed records and pull them accordingly from the main table.
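A rough sketch of what that setup could look like (the names are placeholders; system-versioned temporal tables need SQL Server 2016+ or Azure SQL Database):

    -- Add the period columns and switch on system versioning.
    ALTER TABLE dbo.ThirdPartyTable
    ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN
            CONSTRAINT DF_ThirdPartyTable_ValidFrom DEFAULT SYSUTCDATETIME(),
        ValidTo   datetime2 GENERATED ALWAYS AS ROW END HIDDEN
            CONSTRAINT DF_ThirdPartyTable_ValidTo DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

    ALTER TABLE dbo.ThirdPartyTable
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ThirdPartyTable_History));

    -- Rows inserted or modified in the last 24 hours.
    SELECT *
    FROM dbo.ThirdPartyTable
    WHERE ValidFrom >= DATEADD(HOUR, -24, SYSUTCDATETIME());

Note that, like the modified_on approach, this does alter the source schema, so it only helps if that restriction can be relaxed.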

Data streaming in Apache Superset from BQ?

I am new to Superset and wanted to know if there's any way to perform data streaming from BigQuery using Apache Superset. Currently, I have set up the database in Apache Superset with BigQuery, but when I update the table data using SQL commands in BigQuery, it doesn't reflect in Superset. Is there any way to get the streaming of data to Superset?
I've looked around the Apache Superset documentation and couldn't find anything related to "streaming data" from a source. What I think is happening in this scenario is that you have a dashboard which uses the data from the table you have in BigQuery, and after adding some new information to the table you expect that change to be reflected in the dashboard automatically.
Based on this, my theory is that Superset saves the result of your query in memory, or it could be using BigQuery cached results, which may not allow the dashboard to automatically update the data and show the changes made. My suggestion is to re-run the query for your table to try to get the latest data. If Superset is using cached results, you'll have to take a look at the Superset configuration for BigQuery and look for a way to disable them.

SQL Server DDL changes (column names, types)

I need to audit DDL changes made to a database. Those changes need to be replicated in many other databases at a later time. I found here that one can enable DDL triggers to keep track of DDL activities, and that works great for create table and drop table operations, because the trigger gets the T-SQL that was executed, and I can happily store it somewhere and simply execute it on the other servers later.
The problem I'm having is with alter operations: when a column name is changed from Management Studio, the event that is produced doesn't contain any information about columns! It just says the table was locked... What's more, if many columns are changed at once (say, column foo => oof, and also, column bar => rab) the event is fired only once!
My poor man's solution would be to have a table to store the structure of the table that's going to be altered, before and after the alter operation. That way, I could compare both structures and figure out what happened to which column(s).
But before I do that, I was wondering if it is possible to do it using some other feature from SQL Server that I have overlooked, or maybe there's a better way. How would you go about this?
There is a product meant for doing just that (I wrote it).
It monitors scripts that contain DDL changes, who wrote them and when, together with their effect on performance, and it gives you the ability to easily copy them as one deployment script. For what you asked, the free version is sufficient.
http://www.seracode.com/
There is no special feature in SQL Server for exactly this need. You can use triggers, but they require a lot of T-SQL coding to work properly. A fast solution would be some third-party tools, but they're not free. Please take a look at this answer regarding the third-party tools: https://stackoverflow.com/a/18850705/2808398
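To illustrate the trigger route, here is a hypothetical minimal DDL trigger that stores the executed statement from EVENTDATA() (the audit table and trigger names are placeholders; as the question notes, some alter operations issued from SSMS still won't expose column-level detail, so this is only a starting point):

    CREATE TABLE dbo.DdlAudit
    (
        AuditId     int IDENTITY(1,1) PRIMARY KEY,
        EventType   nvarchar(128),
        ObjectName  nvarchar(256),
        CommandText nvarchar(max),
        EventDate   datetime2 NOT NULL DEFAULT SYSDATETIME(),
        LoginName   sysname   NOT NULL DEFAULT ORIGINAL_LOGIN()
    );
    GO
    CREATE TRIGGER trg_DdlAudit ON DATABASE
    FOR DDL_DATABASE_LEVEL_EVENTS
    AS
    BEGIN
        -- EVENTDATA() returns XML describing the DDL event; keep the raw command text.
        DECLARE @e xml;
        SET @e = EVENTDATA();
        INSERT INTO dbo.DdlAudit (EventType, ObjectName, CommandText)
        VALUES (@e.value('(/EVENT_INSTANCE/EventType)[1]',               'nvarchar(128)'),
                @e.value('(/EVENT_INSTANCE/ObjectName)[1]',              'nvarchar(256)'),
                @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)'));
    END;
    GO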

How do I keep a table synchronized with a query in SQL Server - ETL?

I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another server running SQL Server 2008, to which I want to "publish" some of the data in the third-party database. This database I shall then use as the back-end for a portal and reporting services; it shall be the data warehouse.
On the destination server I want to store the data in different table structures from those in the third-party db. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables, which I'll need to update based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to populate other columns with. All of this should cleanse the data and make it easier to report on.
I can write the queries to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the changes to the data and addition of new rows etc. aren't huge.
I've had a look around but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent. I've also been having a look at SSIS, and that looks to be geared at this ETL kind of thing.
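To give an idea of the MERGE route I mean, something along these lines is what I had in mind (all server, table and column names below are made up, and the source would be reached over a linked server):

    -- Scheduled job step running on the 2008 warehouse server.
    MERGE dbo.DestCustomer AS dest
    USING (
        SELECT c.CustomerId,
               c.Name,
               LTRIM(RTRIM(c.Notes)) AS CleanedNotes      -- cleansing/denormalising happens here
        FROM   SourceServer.SourceDb.dbo.Customer AS c    -- linked server to the 2005 box
    ) AS src
        ON dest.CustomerId = src.CustomerId
    WHEN MATCHED AND (dest.Name <> src.Name OR dest.CleanedNotes <> src.CleanedNotes) THEN
        UPDATE SET dest.Name = src.Name,
                   dest.CleanedNotes = src.CleanedNotes
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, CleanedNotes)
        VALUES (src.CustomerId, src.Name, src.CleanedNotes)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;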
I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.
For those tables whose schemas/relations are not changing, I would still strongly recommend replication.
For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle that. The high-level approach with Service Broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
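A hypothetical, stripped-down sketch of that chain (all object names are placeholders; error handling and message-type checks are omitted):

    -- One-time setup: message type, contract, queue and service.
    CREATE MESSAGE TYPE RowChangeMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT RowChangeContract (RowChangeMsg SENT BY INITIATOR);
    CREATE QUEUE dbo.RowChangeQueue;
    CREATE SERVICE RowChangeService ON QUEUE dbo.RowChangeQueue (RowChangeContract);
    GO
    -- The trigger packages the changed keys as XML and sends them to the queue.
    CREATE TRIGGER dbo.trg_SourceTable_Change ON dbo.SourceTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @handle uniqueidentifier;
        DECLARE @payload xml;
        SET @payload = (SELECT Id FROM inserted FOR XML PATH('row'), ROOT('rows'), TYPE);
        IF @payload IS NULL RETURN;   -- nothing to send

        BEGIN DIALOG CONVERSATION @handle
            FROM SERVICE RowChangeService
            TO SERVICE 'RowChangeService'
            ON CONTRACT RowChangeContract
            WITH ENCRYPTION = OFF;
        SEND ON CONVERSATION @handle MESSAGE TYPE RowChangeMsg (@payload);
    END;
    GO
    -- The activated procedure drains the queue and applies each change to the destination table(s).
    CREATE PROCEDURE dbo.usp_ProcessRowChanges
    AS
    BEGIN
        DECLARE @handle uniqueidentifier;
        DECLARE @body varbinary(max);
        WHILE 1 = 1
        BEGIN
            WAITFOR (RECEIVE TOP (1)
                        @handle = conversation_handle,
                        @body   = message_body
                     FROM dbo.RowChangeQueue), TIMEOUT 1000;
            IF @@ROWCOUNT = 0 BREAK;
            -- CAST(@body AS xml) holds the changed keys; MERGE/UPDATE the destination table(s) here.
            END CONVERSATION @handle;
        END
    END;
    GO
    ALTER QUEUE dbo.RowChangeQueue
        WITH ACTIVATION (STATUS = ON,
                         PROCEDURE_NAME = dbo.usp_ProcessRowChanges,
                         MAX_QUEUE_READERS = 1,
                         EXECUTE AS OWNER);

The advantage over a plain trigger is that the trigger only enqueues a small message, so the write path on the source table stays fast while the heavy lifting happens asynchronously in the activated procedure.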
I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgey and cumbersome for either continuous or short-period incremental data distribution.
Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes that are based in SSIS and do the exact thing you are trying to do. We have a huge relational database and then we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that so I built a cube over it with SSAS and that cube is updated and processed every 15 minutes.
Yes, SSIS does give the aura of being mainly for straight ETL jobs, but I have found that it can be used for simple, quick jobs like this as well.
I think staging and partitioning will be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of 1 hour, as I need to give some time for support activities. I am sure that using SSIS is a good way of doing it.
During the design, I had thought of another way to achieve custom replication, by customizing the Change Data Capture (CDC) process. This way you can get near-real-time replication, but it is a tricky thing.
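For reference, a rough sketch of what enabling CDC on a source table looks like (database, schema and table names are placeholders; note that CDC requires SQL Server 2008 or later, so it would not be available on the 2005 source described in the question):

    -- Enable CDC at the database level, then for the table you want to track.
    USE SourceDb;
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'SourceTable',
        @role_name     = NULL;

    -- Pull every change recorded for the capture instance between two LSNs.
    DECLARE @from_lsn binary(10), @to_lsn binary(10);
    SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_SourceTable');
    SET @to_lsn   = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_SourceTable(@from_lsn, @to_lsn, N'all');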

Bulk Translation Of Table Contents

I'm currently performing a migration operation from a legacy database. I need to perform migration of millions of originating rows, breaking the original content apart into multiple destination parent / child rows.
As it's not a simple 1-to-1 migration and the resulting rows are parent/child rows based on identity-generated keys, what's the best mechanism for performing the migration?
I'm assuming that I can't use bulk insert as the identity values for the child rows cannot be determined at the point of generating the script content? The only solution I can currently think of is to set the identity explicitly and then have a predetermined starting point for the import.
If anyone else has any input I'd appreciate the feedback.
This is my standard approach:
create your new data model
pull the data into the new DB unchanged
write (and run) a SQL script to perform the migration
test
(optional) drop the tables with the legacy data
You can get a long way towards migrating the data with plain SQL. For the case you described, you might not need a single cursor to get it across.
Running the process in Query Analyzer (or an analog in your dbms), you'll have the advantage that you can wrap everything in a Transaction so that you can roll back if anything goes wacky along the way. Write it in little bits and test it in chunks, on your dev database. Once everything is working correctly, set the script loose on your production database.
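For the parent/child identity problem specifically, one plain-SQL trick (a hypothetical sketch, assuming the target is SQL Server 2008 or later; all names are invented) is to let MERGE's OUTPUT clause build a legacy-key to new-identity map, since unlike INSERT it can reference source columns in OUTPUT:

    BEGIN TRANSACTION;

    DECLARE @ParentMap TABLE (LegacyId int, NewParentId int);

    -- Insert every legacy row as a parent and capture the identity it was given.
    MERGE dbo.Parent AS p
    USING (SELECT LegacyId, ParentName FROM dbo.LegacyTable) AS src
        ON 1 = 0                                   -- never matches, so every row is inserted
    WHEN NOT MATCHED THEN
        INSERT (Name) VALUES (src.ParentName)
    OUTPUT src.LegacyId, inserted.ParentId INTO @ParentMap (LegacyId, NewParentId);

    -- Children join back through the map, so no identities need to be pre-seeded.
    INSERT INTO dbo.Child (ParentId, ChildValue)
    SELECT m.NewParentId, l.ChildValue
    FROM dbo.LegacyTable AS l
    JOIN @ParentMap AS m ON m.LegacyId = l.LegacyId;

    COMMIT TRANSACTION;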
Sorted.
Thanks for the suggestion, but I'd prefer to produce a programmatic solution. I'm currently using NAnt / CruiseControl to automate the tests and need something I can recreate on the fly based on the current live legacy content.