BigQuery: Exceeded quota for Number of partition modifications to a column partitioned table - google-bigquery

I get this error when trying to run a lot of CSV import jobs on a BigQuery table that is date-partitioned on a custom TIMESTAMP column.
Your table exceeded quota for Number of partition modifications to a column partitioned table
Full error below:
{Location: "partition_modifications_per_column_partitioned_table.long"; Message: "Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"; Reason: "quotaExceeded"}
It is not clear to me: what is the quota for "Number of partition modifications", and how is it being exceeded?
Thanks!

What is the quota for Number of partition modifications?
See Quotas for Partitioned tables
In particular:
Maximum number of partitions modified by a single job — 2,000
Each job operation (query or load) can affect a maximum of 2,000 partitions. Any query or load job that affects more than 2,000 partitions is rejected by Google BigQuery.
Maximum number of partition modifications per day per table — 5,000
You are limited to a total of 5,000 partition modifications per day for a partitioned table. A partition can be modified by using an operation that appends to or overwrites data in the partition. Operations that modify partitions include: a load job, a query that writes results to a partition, or a DML statement (INSERT, DELETE, UPDATE, or MERGE) that modifies data in a partition.
You can see more details at the link above.
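A common way to stay under these limits is to batch many CSV files into a single load job, since partition modifications are counted per partition per job, not per file. Below is a minimal sketch with the Python client library; the bucket path and the TIMESTAMP column name event_ts are hypothetical, not taken from the question.

from google.cloud import bigquery

client = bigquery.Client()

table_id = "my-project.my_dataset.events"      # hypothetical destination table
uris = ["gs://my-bucket/exports/part-*.csv"]   # wildcard groups many files into one job

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Partition on the custom TIMESTAMP column, as in the question.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_ts",
    ),
)

# One job: each affected partition is counted once, however many files are read,
# which keeps you further from the 5,000 modifications/day limit than running
# one load job per CSV file.
load_job = client.load_table_from_uri(uris, table_id, job_config=job_config)
load_job.result()
print(f"Loaded {load_job.output_rows} rows")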

If you're going to change the data often, I strongly suggest you delete the table and simply upload it again with the new values. Every time you upload a new table, you get the limit refreshed.
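A rough sketch of that delete-and-reload approach with the Python client; the table name, bucket path and partition column are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"       # hypothetical table name

# Drop the existing table (deleting a whole table is free; see the pricing
# answers further down), then re-load the full dataset from GCS.
client.delete_table(table_id, not_found_ok=True)

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    time_partitioning=bigquery.TimePartitioning(field="event_ts"),  # hypothetical column
)
client.load_table_from_uri(
    "gs://my-bucket/exports/full-export-*.csv", table_id, job_config=job_config
).result()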

Related

Google Dataflow store to specific Partition using BigQuery Storage Write API

I want to store data to BigQuery by using specific partitions. The partitions are ingestion-time based. I want to use a range of partitions spanning over two years. I use the partition alias destination project-id:data-set.table-id$partition-date.
I get failures since it does not recognise the destination as an alias but treats it as an actual table.
Is it supported?
When you ingest data into BigQuery, it will land automatically in the corresponding partition. If you choose a daily ingestion time as partition column, that means that every new day will be a new partition. To be able to "backfill" partitions, you need to choose some other column for the partition (e.g. a column in the table with the ingestion date). When you write data from Dataflow (from anywhere actually), the data will be stored in the partition corresponding to the value of that column for each record.
Direct writes to partitions by ingestion time are not supported using the Write API.
Also, using the streaming API is not supported for a partition once it is more than 31 days in the past.
From the documentation:
When streaming using a partition decorator, you can stream to partitions within the last 31 days in the past and 16 days in the future relative to the current date, based on current UTC time.
The solution that works is to use BigQuery load jobs to insert the data; load jobs can handle this scenario.
Because this operation involves a lot of I/O (files being created on GCS), it can be lengthy, costly and resource intensive depending on the data.
One approach is to create table shards and split the big table into smaller ones so that the Storage Read and Write APIs can still be used. Load jobs from the sharded tables into the partitioned table then require fewer resources, and the problem is already divided.
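To illustrate the load-job route, here is a hedged sketch with the Python client that backfills rows whose timestamps are well over 31 days old; BigQuery routes each row to the partition matching its column value. The table, column names and sample rows are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"   # hypothetical column-partitioned table

rows = [
    {"event_ts": "2019-01-15 08:00:00", "user_id": "a"},  # far beyond the 31-day streaming window
    {"event_ts": "2019-06-01 12:30:00", "user_id": "b"},
]

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("user_id", "STRING"),
    ],
    # Each row lands in the partition matching its event_ts value.
    time_partitioning=bigquery.TimePartitioning(field="event_ts"),
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
client.load_table_from_json(rows, table_id, job_config=job_config).result()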

Does CREATE OR REPLACE in BigQuery contribute towards a previous Partition Table's quota?

We have a BigQuery query like:
create or replace table `{project}`.`{dataset}`.`{table}`
partition by date
as select {...}
If we run this query a few times in a day, we get an error:
Quota exceeded: Your table exceeded quota for Number of partition
modifications to a column partitioned table. For more information, see
https://cloud.google.com/bigquery/troubleshooting-errors
I've previously loaded partitioned tables with bq load --replace and don't remember having similar errors — suggesting the quota resets for the new table.
How does this work? Does create or replace use a cumulative quota for that table name, but bq load --replace resets the quota on each run?
Is it expected behaviour for ‘CREATE OR REPLACE’ to contribute towards a previous/overwritten Partition Table's ‘maximum number of partition modifications’ quota?
Per-table quotas are bound to the table name. 'CREATE OR REPLACE' does not change the table name, so it will not reset the quota.
Why then does a 'bq load --replace' not affect the partitioned table quota?
'bq load --replace' does consume the quota as well. Both 'load' and 'create or replace' can append data to multiple partitions, and the number of partitions touched is counted as that many partition modifications. For example, if the SELECT writes 500 daily partitions, each run consumes 500 of the 5,000 modifications allowed per table per day.
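If it helps, a rough way to estimate how much of the daily budget each run consumes is to count the distinct partition values the SELECT produces. A hedged sketch with the Python client; the source table name is hypothetical, and date is the partitioning column from the query above.

from google.cloud import bigquery

client = bigquery.Client()

# Count how many daily partitions one CREATE OR REPLACE run would write.
query = """
    SELECT COUNT(DISTINCT date) AS partitions_written
    FROM `my-project`.`my_dataset`.`source_table`   -- hypothetical source of the SELECT
"""
partitions_written = list(client.query(query).result())[0].partitions_written

# Each run consumes one modification per partition written, against the
# 5,000 modifications/day limit for the destination table.
print("runs per day before hitting the limit:", 5000 // max(partitions_written, 1))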

Old rows left unpartitioned in partitioned table

I'm working with a BigQuery partitioned table. The partition is based on a Timestamp column in the data (rather than ingestion-based). We're streaming data into this table at a rate of several million rows per day.
We noticed that our queries based on specific days were scanning much more data than they should in a partitioned table.
Here is the current state of the UNPARTITIONED partition:
I'm assuming that little blip at the bottom-right is normal (streaming buffer for the rows inserted this morning), but there is this massive block of data between mid-November and early-December that lives in the UNPARTITIONED partition, instead of being sent to the proper daily partitions (the partitions for that period don't appear to exist at all in __PARTITIONS_SUMMARY__).
My two questions are:
Is there a particular reason why these rows would not have been partitioned correctly, while data before and after that period is fine?
Is there a way to 'flush' the UNPARTITIONED partition, i.e. force BigQuery to dispatch the rows to their correct daily partition?
I faced a similar kind of issue, where a lot of rows stayed unpartitioned in a column-partitioned table. What I observed is that some records were not partitioned because of the source of the streaming insert: they arrived with a NULL value in the partitioning column. As a solution, I updated the table with an UPDATE statement and set a partition date wherever the partitioning column was NULL. To be on the safer side, make sure the partitioning date column is not nullable.
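A hedged sketch of that repair step with the Python client; the table and column names are hypothetical, and the fallback value (here another timestamp column) is only one way to "set a partition date where it is null".

from google.cloud import bigquery

client = bigquery.Client()

dml = """
    UPDATE `my-project`.`my_dataset`.`events`
    SET event_ts = insert_ts              -- hypothetical fallback timestamp column
    WHERE event_ts IS NULL
"""
job = client.query(dml)
job.result()
print(f"{job.num_dml_affected_rows} rows were given a partition date")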

Do Bigquery charge if table gets deleted via retention period?

I have around 150 GB of data and I want to store it in BigQuery using DML statements.
Here is the pricing model for that.
https://cloud.google.com/bigquery/pricing#dml
According to them they will charge for deleting the table via DML.
If I create a table with a retention period, will I be charged for that, considering I will always insert data? I am not bothered about the cost of inserting data.
Based on the DML Specifications, Google will charge for the deletion of rows if done using DML statement (or using DELETE command in their SQL). The reason being: BigQuery will have to scan rows to delete them (like DELETE FROM mydataset.mytable WHERE id=xxx;, etc.), so you will have to pay for the number of bytes scanned before deleting the resulting rows.
You can always delete your entire table from your dataset for free by either using BigQuery UI or bq command line utility.
Also, you will be charged for the storage costs in BigQuery (irrespective of usage). Meaning: you will pay for the number of bytes your data is occupying on Google disks.
BigQuery charges for deleting from a table, not deleting a table. Executing a DROP TABLE statement is free.
Creating tables is free, unless you create a table from the result of a table, in which case see query pricing to calculate your costs.
The cost of storage is based on the number of bytes stored and how long you keep the data. See storage pricing for more details.
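For illustration, a hedged sketch contrasting the two deletion paths described in these answers; the table name and filter are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.mytable"   # hypothetical

# 1) DML delete: billed for the bytes scanned to find the matching rows.
job = client.query(f"DELETE FROM `{table_id}` WHERE id = 123")
job.result()
print(f"DML delete scanned {job.total_bytes_processed} bytes")

# 2) Dropping the whole table: no scan, no query charge.
client.delete_table(table_id, not_found_ok=True)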

Pricing of data importation into Bigquery

I'm looking for the price of importing data from Cloud Storage into BigQuery (through "bq import").
There is no "update" statement in BigQuery, so I want to drop my table and recreate it from scratch.
Thanks,
Romain.
As stated in the documentation, importing data is free. Only storing or querying it is charged.
https://cloud.google.com/bigquery/docs/updating-data
There is an UPDATE statement in BigQuery now.
But the quota is low, so yes, we would sometimes drop and recreate the table instead of using UPDATE (see the sketch after the limits below).
https://cloud.google.com/bigquery/quotas
Data Manipulation Language statements
The following limits apply to Data Manipulation Language (DML).
Maximum UPDATE/DELETE statements per day per table: 96
Maximum UPDATE/DELETE statements per day per project: 10,000
Maximum INSERT statements per day per table: 1,000
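As a hedged sketch of that drop-and-reimport pattern with the Python client (loading from Cloud Storage is free; the bucket and table names are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.mytable"   # hypothetical
uri = "gs://my-bucket/export-*.csv"          # hypothetical export in Cloud Storage

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Overwrite the table contents instead of appending, i.e. "recreate from scratch".
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_uri(uri, table_id, job_config=job_config).result()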