Getting error while querying to get tables partition metadata in bigquery - google-bigquery

My BigQuery project has one dataset and around 1,005 tables. I am running a query to get the partition metadata for the tables.
The query is:
SELECT count(*) FROM bq-tf-test-500-298.unravelFr8ks4.INFORMATION_SCHEMA.PARTITIONS;
It fails with the following error: INFORMATION_SCHEMA.PARTITIONS query attempted to read too many tables. Please add more restrictive filters
Query screenshot attached.

The PARTITIONS view is currently in Preview, so the number of tables it can read in a single query may be limited. As the message suggests, adding more restrictive filters should allow your query to run.
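For example, a more restrictive version of the query can filter on table_name so that only a subset of the tables is scanned. A minimal sketch (the table-name prefix is a placeholder; adjust it to your own naming scheme):
SELECT table_name, partition_id, total_rows
FROM `bq-tf-test-500-298.unravelFr8ks4.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name LIKE 'events_%';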

Related

Extracting information about a View from BigQuery audit logs

I have created a sink using the Logs Explorer that pushes data to BigQuery. I can get information about tables by using the following query:
SELECT
  SPLIT(REGEXP_EXTRACT(protopayload_auditlog.resourceName, '^projects/[^/]+/datasets/[^/]+/tables/(.*)$'), '$')[OFFSET(0)] AS TABLE
FROM `project.dataset`
WHERE
  JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataRead") IS NOT NULL
  OR JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableDataChange") IS NOT NULL
However, I am unable to find information about views. I have tried the
audit logs https://cloud.google.com/bigquery/docs/reference/auditlogs
and the BigQuery asset information https://cloud.google.com/asset-inventory/docs/resource-name-format
but I cannot work out how to get the information about a "View". What do I need to include? Is it something in my sink, or is there an alternative resource name I should use?
It seems that audit logs treat tables and views the same way.
I made the query below to track view/table changes. InsertJob will tell you about view creations; UpdateTable/PatchTable will tell you about updates.
SELECT
  resource.labels.dataset_id,
  resource.labels.project_id,
  --protopayload_auditlog.methodName,
  REGEXP_EXTRACT(protopayload_auditlog.methodName, r'.*\.([^/$]*)') AS method,
  --protopayload_auditlog.resourceName,
  REGEXP_EXTRACT(protopayload_auditlog.resourceName, r'.*tables\/([^/$]*)') AS tableName,
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.metadataJson,
  CASE
    WHEN protopayload_auditlog.methodName = 'google.cloud.bigquery.v2.JobService.InsertJob'
      THEN JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableCreation"), "$.table"), "$.view"), "$.query")
    ELSE JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(protopayload_auditlog.metadataJson, "$.tableChange"), "$.table"), "$.view"), "$.query")
  END AS query,
  receiveTimestamp
FROM `<project-id>.<bq_auditlog>.cloudaudit_googleapis_com_activity_*`
WHERE DATE(timestamp) >= "2022-07-10"
  AND protopayload_auditlog.methodName IN (
    'google.cloud.bigquery.v2.TableService.PatchTable',
    'google.cloud.bigquery.v2.TableService.UpdateTable',
    'google.cloud.bigquery.v2.TableService.InsertTable',
    'google.cloud.bigquery.v2.JobService.InsertJob',
    'google.cloud.bigquery.v2.TableService.DeleteTable')
Views are virtual tables which are created and queried in the same way as tables. Since you are looking for views in BigQuery that is set up as a logging sink, you need to create the views in BigQuery by following the steps given in this documentation.
Currently two log versions are supported, v1 and v2: v1 reports API invocations and v2 reports resource interactions. After creating the views, you can do further analysis in BigQuery by saving or querying them.

Why am I getting an error when scheduling a query on Google BigQuery?

When trying to schedule a query in BQ, I am getting the following error:
Error code 3 : Query error: Not found: Dataset was not found in location EU at [2:1]
Is this a permissions issue?
This sounds like a case of the scheduled query being configured to run in a different region than either the referenced tables, or the destination table of the query.
Put another way, BigQuery requires a consistent location for reading and writing, and does not allow a query in location A to write results in location B.
https://cloud.google.com/bigquery/docs/scheduling-queries has some additional information about this.
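One way to confirm where your datasets actually live is to query the region-qualified SCHEMATA view. A sketch, substituting the region you expect:
SELECT schema_name, location
FROM `region-eu`.INFORMATION_SCHEMA.SCHEMATA;
The scheduled query's configured location, the referenced datasets, and the destination dataset all need to match.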

Google BigQuery large table (105M records) with 'Order Each by' clause produces "Resources Exceeds Query Execution" error

I am running into a serious "Resources Exceeds Query Execution" issue when querying a large Google BigQuery table (105M records) with an 'Order Each By' clause.
Here is a sample query (using the public dataset Wikipedia):
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc
How can I solve this without adding a LIMIT keyword?
Using ORDER BY on big-data databases is not an ordinary operation, and at some point it exceeds the resources available to the query. You should consider sharding your query or running the ORDER BY on your exported data.
As I explained to you today in your other question, adding allowLargeResults will allow you to return a large response, but you can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
One option here that you may try is sharding your query with a predicate such as
where ABS(HASH(Id) % 4) = 0
You can play with the parameters above to get smaller result sets and then combine them.
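A rough sketch of that sharding approach applied to the sample query above (legacy SQL; the modulus of 4 and the shard number are just example values):
SELECT Id, Title, COUNT(*)
FROM [publicdata:samples.wikipedia]
WHERE ABS(HASH(Id) % 4) = 0   -- shard 0 of 4; repeat with = 1, = 2, = 3
GROUP EACH BY Id, Title
ORDER BY Id, Title DESC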
Also read Chapter 9 - Understanding Query Execution; it explains how sharding works internally.
You should also read the Launch Checklist for BigQuery.
I've run into the same problem and fixed it by following these steps:
1. Run the query without ORDER BY and save the result in a dataset table.
2. Export the content of that table to a bucket in GCS using a wildcard (BUCKETNAME/FILENAME*.csv).
3. Download the files to a folder on your machine.
4. Install XAMPP (you may get a UAC warning) and change some settings afterwards.
5. Start Apache and MySQL in your XAMPP control panel.
6. Install HeidiSQL and establish the connection to your MySQL server (installed with XAMPP).
7. Create a database and a table with its fields.
8. Go to Tools > Import CSV file, configure accordingly and import.
9. Once all the data is imported, do the ORDER BY and export the table.

Google BigQuery create table with no code? Syntax

I was hoping to use basic SQL CREATE TABLE syntax within Google BigQuery to create a table based on columns in two existing tables already in BQ. The Google SQL dialect reference does not show a CREATE statement. All of the documentation seems to imply that I need to know how to code.
Is there any syntax or way to do a
CREATE TABLE XYZ AS
SELECT ABC.123, DFG.234
from ABC, DFG
?
You cannot do it entirely through a SQL statement.
However, the UI does allow you to save query results to a table (the maximum result size is 64 MB compressed). The API and command-line clients have the same capability.
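For example, with the bq command-line client something along these lines should work. This is only a sketch; the dataset, table, column, and join-key names are placeholders:
bq query --use_legacy_sql=false --destination_table=mydataset.XYZ \
  'SELECT a.col1, d.col2 FROM mydataset.ABC AS a JOIN mydataset.DFG AS d ON a.key = d.key'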

Google BigQuery - Error while downloading data to a table

I am trying to work with the GitHub data that has been uploaded to Google BigQuery. I ran a few queries which generated a lot of rows, e.g. the query:
SELECT actor_attributes_login, repository_watchers, repository_forks FROM [githubarchive:github.timeline]
WHERE repository_watchers > 2 AND REGEXP_MATCH(repository_created_at, '2012-')
ORDER BY actor_attributes_login;
The result had more than 220,000 rows. When I attempted to download it as CSV, it said:
Download Unavailable
This result set contains too many rows for direct download. Please use "Save as Table" and then export the resulting table.
When I tried Save as Table, I got the following error:
Access Denied: Job publicdata:job_c2338ba91e494b21970854e13cdc4b2a: RUN_JOB
Also, I ran queries where I limited the number of rows to 200 or so; even in those cases I got the error mentioned above, although I was able to download the result as CSV.
Is there any solution to this problem?
@Anerudh You don't have access to modify the publicdata samples dataset. Create a brand new dataset and try to save your query results to a new table in that dataset.
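A minimal sketch of that workflow with the bq command-line tool (the dataset, table, and bucket names are placeholders):
# Create a dataset you own, save the query results into it, then export to GCS.
bq mk mydataset
bq query --destination_table=mydataset.github_results \
  'SELECT actor_attributes_login, repository_watchers, repository_forks
   FROM [githubarchive:github.timeline]
   WHERE repository_watchers > 2 AND REGEXP_MATCH(repository_created_at, "2012-")
   ORDER BY actor_attributes_login'
bq extract mydataset.github_results 'gs://my-bucket/github_results_*.csv'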