How to save a view using federated queries across two projects? - google-bigquery

I'm looking to save a view which uses federated queries (from a MySQL Cloud SQL connection) between two projects. I'm receiving two different errors (depending on which project I try to save in).
If I try to save in the project containing the dataset I get error:
Not found: Connection my-connection-name
If I try to save in the project that contains the connection I get error:
Not found: Dataset my-project:my_dataset
My example query that crosses projects looks like:
SELECT
bq.uuid,
sql.item_id,
sql.title
FROM
`project_1.my_dataset.psa_v2_202005` AS bq
LEFT OUTER JOIN
EXTERNAL_QUERY( 'project_2.us-east1.my-connection-name',
'''SELECT item_id, title
FROM items''') AS sql
ON
bq.looks_info.query_item.item_id = sql.item_id
The documentation at https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#known_issues_and_limitations doesn't mention any limitations here.
Is there a way around this so I can save a view using an external connection from one project and dataset from another?

Your BigQuery table is located in US and your MySQL data source is located in us-east1. BigQuery automatically chooses to run the query in the location of your BigQuery table (i.e. in US), however, your Cloud MySQL is in us-east1 and that's why your query fails. Therefore the BigQuery table and Cloud SQL instance, must be in the same location in order for this query to succeed.
The solution for this kind of cases is moving your BigQuery dataset to the same location as your Cloud SQL instance manually by following the steps explained in detail in this documentation. However, the us-east1 is not currently supported for copying datasets. Thus, I will recommend you to create a new connection in one of the locations mentioned in the documentation.
I hope you find the above pieces of information useful.

Related

Need to simulate resourceName with full table path in Log Explorer

I need to understand under what circumstance does the protoPayload.resourceName with full table path i.e., projects/<project_id>/datasets/<dataset_id>/tables/<table_id> appear in the Log Explorer as shown in the example below.
The below entries were generated by a composer dag running a kubernetespodoperator executing some dbt commands on some models. On the basis of this, I have a sink linked to pub/sub for further processing.
As seen in the image the resourceName value is appearing as-
projects/gcp-project-name/datasets/dataset-name/tables/table-name
I have shaded the actual values of projectid, datasetid, and tablename.
I can't run the similar dag job with kuberenetesoperator on test tables owing to environment restrictions. So I tried running some update queries and insert queries using BigQuery Editor. Here is how value of protoPayload.resourceName comes as -
projects/gcp-project-name/jobs/bxuxjob_
I tried same queries using Composer DAG using BigQueryInsertJobOpertor. Here is how the value of protoPayload.resourceName comes as -
projects/gcp-project-name/jobs/airflow_<>_
Here is my question. What operation/operations in BigQuery will give me protoPayload.resourceName as the one that I am expecting i.e. -
projects/<project_id>/datasets/<dataset_id>/tables/<table_id>

Extracting information about View from Bigquery auditlog

I have created a Sink using Log explorer that pushes data to Bigquery. I can get information about tables by using the following query.
SELECT
SPLIT(REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/[^/]+/tables/(.*)$'), '$')[OFFSET(0)] AS TABLE
FROM `project.dataset` WHERE
JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL
OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL
However, I am unable to find information about Views. I have tried
Audit logs https://cloud.google.com/bigquery/docs/reference/auditlogs
And biguqery asset information https://cloud.google.com/asset-inventory/docs/resource-name-format
however, I am unable to find how to get the information about "View". What do I need to include? Is that something in my sink or there is an alternative resource name I should use?
It seems like auditLogs treat tables and views the same way.
I made this query to track view/table changes. InsertJob will tell you about view creations. UpdateTable/PatchTable will tell you about updates
SELECT
resource.labels.dataset_id,
resource.labels.project_id,
--protopayload_auditlog.methodName,
REGEXP_EXTRACT(protopayload_auditlog.methodName,r'.*\.([^/$]*)') as method,
--protopayload_auditlog.resourceName,
REGEXP_EXTRACT(protopayload_auditlog.resourceName,r'.*tables\/([^/$]*)') as tableName,
protopayload_auditlog.authenticationInfo.principalEmail,
protopayload_auditlog.metadataJson,
case when protopayload_auditlog.methodName = 'google.cloud.bigquery.v2.JobService.InsertJob' then JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableCreation"),"$.table"),"$.view"),"$.query")
else JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableChange"),"$.table"),"$.view"),"$.query") end
as query,
receiveTimestamp
FROM `<project-id>.<bq_auditlog>.cloudaudit_googleapis_com_activity_*`
WHERE DATE(timestamp) >= "2022-07-10"
and protopayload_auditlog.methodName in
('google.cloud.bigquery.v2.TableService.PatchTable',
'google.cloud.bigquery.v2.TableService.UpdateTable',
'google.cloud.bigquery.v2.TableService.InsertTable',
'google.cloud.bigquery.v2.JobService.InsertJob',
'google.cloud.bigquery.v2.TableService.DeleteTable' )
Views are virtual table which are created and queried in the same way as queried from tables. Since you are looking for Views in BigQuery which is setup as a logging sink, you need to create Views in BigQuery by using the steps given in this documentation.
Currently there are two versions supported, v1 and v2. V1 reports API invocation and V2 reports resource interactions. After creating the views, you can do further analysis in BigQuery by saving or querying the Views.

Create an Azure Data Factory pipeline to copy new records from DocumentDB to Azure SQL

I am trying to find the best way to copy yesterday's data from DocumentDB to Azure SQL.
I have a working DocumentDB database that is recording data gathered via a web service. I would like to routinely (daily) copy all new records from the DocumentDB to an Azure SQL DB table. In order to do so I have created and successfully executed an Azure Data Factory Pipeline that copies records with a datetime > '2018-01-01', but I've only ever been able to get it to work with an arbitrary date - never getting the date from a variable.
My research on DocumentDB SQL querying shows that it has Mathematical, Type checking, String, Array, and Geospatial functions but no date-time functions equivalent to SQL Server's getdate() function.
I understand that Data Factory Pipelines have some system variables that are accessible, including utcnow(). I cannot figure out, though, how to actually use those by editing the JSON successfully. If I try just including utcnow() within the query I get an error from DocumentDB that "'utcnow' is not a recognized built-in function name".
"query": "SELECT * FROM c where c.StartTimestamp > utcnow()",
If I try instead to build the string within the JSON using utcnow() I can't even save it because of a syntax error:
"query": "SELECT * FROM c where c.StartTimestamp > " + utcnow(),
I am willing to try a different technology than a Data Factory Pipeline, but I have a lot of data in our DocumentDB so I'm not interested in abandoning that, and I have much greater familiarity with SQL programming and need to move the data there for joining and other analysis.
What is the easiest and best way to copy those new entries over every day into the staging table in Azure SQL?
Are you using ADF V2 or V1?
For ADF V2.
I think that you can follow the incremental approach that they recommend, for example you could have a watermark table (it could be in your target Azure SQL database) and two lookups activities, one of the lookups will obtain the previous run watermark value (it could be date, integer, whatever your audit value is) and another lookup activity to obtain the MAX (watermark_value, i.e. date) of your source document and have a CopyActivity that gets all the values where the c.StartTimeStamp<=MaxWatermarkValueFromSource AND c.StartTimeStamp>LastWaterMarkValue.
I followed this example using the Python SDK and worked for me.
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-powershell

Get the Last Modified date for all BigQuery tables in a BigQuery Project

I have several databases within a BigQuery project which are populated by various jobs engines and applications. I would like to maintain a dashboard of all of the Last Modified dates for every table within our project to monitor job failures.
Are there any command line or SQL commands which could provide this list of Last Modified dates?
For a SQL command you could try this one:
#standardSQL
SELECT *, TIMESTAMP_MILLIS(last_modified_time)
FROM `dataset.__TABLES__` where table_id = 'table_id'
I recommend you though to see if you can log these errors at the application level. By doing so you can also understand why something didn't work as expected.
If you are already using GCP you can make use of Stackdriver (it works on AWS as well), we started using it in our projects and I recommend giving it a try (we tested for python applications though, not sure how the tool performs on other clients but it might be quite similar).
I've just queried stacked GA4 data using the following code:
FROM analytics_#########.__TABLES__
where table_id LIKE 'events_2%'
I have kept the 2 on the events to ensure my intraday tables do not pull through also.

Bteq Scripts to copy data between two Teradata servers

How do I copy data from multiple tables within one database to another database residing on a different server?
Is this possible through a BTEQ Script in Teradata?
If so, provide a sample.
If not, are there other options to do this other than using a flat-file?
This is not possible using BTEQ since you have mentioned both the databases are residing in different servers.
There are two solutions for this.
Arcmain - You need to use Arcmain Backup first, which creates files containing data from your tables. Then you need to use Arcmain restore which restores the data from the files
TPT - Teradata Parallel Transporter. This is a very advanced tool. This does not create any files like Arcmain. It directly moves the data between two teradata servers.(Wikipedia)
If I am understanding your question, you want to move a set of tables from one DB to another.
You can use the following syntax in a BTEQ Script to copy the tables and data:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH DATA AND STATS;
Or just the table structures:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH NO DATA AND NO STATS;
If you get real savvy you can create a BTEQ script that dynamically builds the above statement in a SELECT statement, exports the results, then in turn runs the newly exported file all within a single BTEQ script.
There are a bunch of other options that you can do with CREATE TABLE <...> AS <...>;. You would be best served reviewing the Teradata Manuals for more details.
There are a few more options which will allow you to copy from one table to another.
Possibly the simplest way would be to write a smallish program which uses one of their communication layers (ODBC, .NET Data Provider, JDBC, cli, etc.) and use that to take a select statement and an insert statement. This would require some work, but it would have less overhead than trying to learn how to write TPT scripts. You would not need any 'DBA' permissions to write your own.
Teradata also sells other applications which hides the complexity of some of the tools. Teradata Data Mover handles provides an abstraction layer between tools like arcmain and tpt. Access to this tool is most likely restricted to DBA types.
If you want to move data from one server to another server then
We can do this with the flat file.
First we have fetch data from source table to flat file through any utility such as bteq or fastexport.
then we can load this data into target table with the help of mload,fastload or bteq scripts.