I cannot delete the range of rows defined by my WHERE clause.
My query:
DELETE FROM `dataset.events1` AS t WHERE t.group = 'error';
Result:
Error: UPDATE or DELETE statement over table dataset.events1 would affect rows in the streaming buffer, which is not supported.
According to the BQ docs:
Rows that were written to a table recently via streaming (using the tabledata.insertall method) cannot be modified using UPDATE, DELETE, or MERGE statements. Recent writes are typically those that occur within the last 30 minutes. Note that all other rows in the table remain modifiable by using UPDATE, DELETE, or MERGE statements.
This looks like the error you're facing.
You can check if your table has a streaming buffer attached through the BigQuery API.
This error message is expected behavior when you run DML against rows that were recently streamed into the table; the restriction exists to maintain data consistency. Because of this, you have to wait until the buffer is flushed, which can take up to 90 minutes, before the rows become available for copy/export and other operations; until then you will keep getting the same error.
To check whether the table has an active streaming buffer, look at the tables.get response and see if it contains a section named streamingBuffer.
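For reference, a minimal sketch of that check, assuming the google-cloud-bigquery Python client (the table ID below is a placeholder):

# Sketch: call tables.get and look for a streamingBuffer section.
# Assumes the google-cloud-bigquery client library; the table ID is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.dataset.events1")  # wraps tables.get

if table.streaming_buffer is None:
    print("No streaming buffer attached; UPDATE/DELETE should be accepted.")
else:
    buf = table.streaming_buffer
    print("Streaming buffer present:",
          buf.estimated_rows, "rows, oldest entry at", buf.oldest_entry_time)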
Related
I just started exploring BigQuery in GCP for learning purposes, so I created two tables and tried insert, delete, and update queries using the Python API.
I'm able to update a table called table_1 at any time using the query below:
UPDATE *****.*****.table_1 SET col_1 = 'value_1', col_2 = 'value_2' WHERE col3 = 'value_3'
and it returns "This statement modified 2 rows in ****:****.Projects."
But when I try to update the table called table_2 with a query written the same way, it returns:
UPDATE or DELETE statement over table ***.***.table_2 would affect rows in the streaming buffer, which is not supported
I created both tables and perform operations in the same way, so my question is why I'm getting this error only for table_2.
Thank you
Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table, but it can take up to 90 minutes to become available for copy/export and other operations like UPDATE or DELETE. You probably have to wait up to 90 minutes so that all buffers are persisted on the cluster. You can check the tables.get response for a section named streamingBuffer to see whether the table has a streaming buffer or not.
If you had used a load job to create the table, it would not have a streaming buffer, but you probably streamed some values into it.
You can also refer to this documentation [1] for more information
[1] https://cloud.google.com/bigquery/streaming-data-into-bigquery
The issue I am facing in my Node.js application is identical to this user's question: Cannot insert new value to BigQuery table after updating with new column using streaming API.
As I understand it, changes such as widening a table's schema may require some time before streamed inserts can reference the new columns; otherwise a 'no such field' error is returned. For me this error is not consistent, as sometimes the insert succeeds.
However, I specifically wanted to know whether you could use a load job instead of streaming. If so, what drawbacks does it have? I am not sure of the difference even after reading the documentation.
Alternatively, if I do use streaming but with the ignoreUnknownValues option, does that mean all of the data is eventually inserted, including data referencing new columns, and just that the new columns are not queryable until the table schema has finished updating?
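Not an authoritative answer, but a rough sketch of the two ingestion paths being compared, using the google-cloud-bigquery Python client (table ID, rows, and column names are hypothetical): a load job appends through the batch path and leaves no streaming buffer, while insert_rows_json goes through tabledata.insertAll.

# Sketch comparing a streaming insert with a batch load job; the client calls
# are real, but the table ID, rows, and columns are made up for illustration.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"
rows = [{"id": "a-1", "new_column": "value"}]

# Streaming path (tabledata.insertAll): with ignore_unknown_values, fields the
# table schema does not know yet are dropped instead of failing the insert.
errors = client.insert_rows_json(table_id, rows, ignore_unknown_values=True)
print("streaming insert errors:", errors)

# Batch path: a load job appends the same rows without touching the streaming
# buffer, and schema_update_options lets the job add new columns itself.
job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)
client.load_table_from_json(rows, table_id, job_config=job_config).result()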
I know this question was asked in a different form a while ago, but now that BQ allows DML on partitioned tables, it's more important to understand when the streaming buffer is flushed so that we can perform DML on tables for maintenance.
This is very important now because:
- I have 1,500 partitioned tables, each with at least 200 partitions.
- I have to update all of those tables, since we are applying some hashing for GDPR.
- If I can't run the DML, I have to restate the 200 * 1500 partitions by joining with a reference table; if I can run the DML, I only have to run 1,500 UPDATE statements.
I have stopped streaming and have been waiting for more than 90 minutes, yet I still get the same error that I can't run DML because the table has a streaming buffer. Any response based on your own experience would be highly appreciated.
The answer is "it depends": it mostly depends on the amount of data you stream into the buffer, but it is also affected by algorithmic tuning on the BQ side. As of now there is no definite time you can calculate for when the data will flush, and there is no mechanism to invoke a flush of the buffer manually.
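Given that, about the only option is to poll until tables.get stops reporting a streamingBuffer before issuing the DML. A rough sketch with the google-cloud-bigquery Python client (the table ID, polling interval, and UPDATE statement are placeholders, not a recipe):

# Sketch: wait for the streaming buffer to detach, then run the UPDATE.
import time
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.partitioned_table"

# tables.get reports a streamingBuffer section while any buffer is attached.
while client.get_table(table_id).streaming_buffer is not None:
    print("streaming buffer still attached, waiting...")
    time.sleep(300)  # re-check every 5 minutes

client.query(
    "UPDATE `{}` SET col = TO_HEX(SHA256(col)) WHERE TRUE".format(table_id)
).result()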
So apparently BigQuery now allows UPDATE on older partitions of partitioned tables that have a streaming buffer, just not on the streaming buffer itself.
For example:
UPDATE `dataset.table_name`
SET column = 'value'
WHERE _PARTITIONTIME = '2018-05-01'
works beautifully.
But
UPDATE `dataset.table_name`
SET column = 'value'
WHERE _PARTITIONTIME IS NULL
doesn't work and fails with the following error:
UPDATE or DELETE statement over table dataset.table_name would affect rows in the streaming buffer, which is not supported
I'm getting the following error when trying to delete records from a table that was created through the GCP Console and then written to with the GCP BigQuery Node.js table insert function.
UPDATE or DELETE DML statements are not supported over table stackdriver-360-150317:my_dataset.users with streaming buffer
The table was created without streaming features, but from what I'm reading in the documentation: "Tables that have been written to recently via BigQuery Streaming (tabledata.insertAll) cannot be modified using UPDATE or DELETE statements."
Does this mean that once a record has been inserted into a table with this function, there's no way to delete records at all? If that's the case, does the table need to be deleted and recreated from scratch? If that's not the case, can you please suggest a workaround to avoid this issue?
Thanks!
Including new error message for SEO: "UPDATE or DELETE statement over table ... would affect rows in the streaming buffer, which is not supported" -- Fh
To check whether the table has a streaming buffer, look at the tables.get response for a section named streamingBuffer. Alternatively, when streaming to a partitioned table, data in the streaming buffer has a NULL value for the _PARTITIONTIME pseudo column, so the buffer can be checked even with a simple WHERE query.
Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table, but it can take up to 90 minutes to become available for copy/export and other operations. You probably have to wait up to 90 minutes so that the whole buffer is persisted on the cluster. You can use queries to see whether the streaming buffer is empty or not, as you mentioned.
If you used a load job to create the table, you won't have a streaming buffer, but you probably streamed some values into it.
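As an illustration of the second check: for an ingestion-time partitioned table, the buffered rows are the ones whose _PARTITIONTIME is still NULL, so a simple count tells you whether the buffer is empty. A sketch with the google-cloud-bigquery Python client (the table name is a placeholder):

# Sketch: count rows still sitting in the streaming buffer of a partitioned table.
# The table name is hypothetical; buffered rows have a NULL _PARTITIONTIME.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT COUNT(*) AS buffered_rows
    FROM `my-project.my_dataset.users`
    WHERE _PARTITIONTIME IS NULL
"""
row = list(client.query(query).result())[0]
print("rows still in the streaming buffer:", row.buffered_rows)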
Note the answer below for working with tables that have ongoing streaming buffers: just use a WHERE clause to filter out the latest minutes of data and your queries will work. -- Fh
Make sure to change your filters so they don't include data that could be in the current streaming buffer.
For example, this query fails while I'm streaming to this table:
DELETE FROM `project.dataset.table`
WHERE id LIKE '%-%'
Error: UPDATE or DELETE statement over table project.dataset.table would affect rows in the streaming buffer, which is not supported
You can fix it by only deleting older records:
DELETE FROM `project.dataset.table`
WHERE id LIKE '%-%'
AND ts < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 40 MINUTE)
4282 rows affected.
I have a question about inserting records into a database using the file endpoint.
I want to insert JSON records into the database. I created a JSON file and inserted all of that file's data into the database. I can insert all the data successfully, but the data keeps being inserted over and over, and the error Duplicate entry '1' for key 'PRIMARY' occurs.
How can I solve this error? I don't want to insert the data repeatedly; how can I make it happen only once?
I used the following flow:
**File->Json to Object->Splitter->Database**
Please help me.
You can use an Idempotent Message Filter (after the Splitter) to ensure that duplicate entries are discarded. If your JSON representation has a unique identifier, use the Idempotent Message Filter:
<idempotent-message-filter idExpression="#[entry.id]">
<simple-text-file-store directory="./idempotent"/>
</idempotent-message-filter>
Otherwise, use the Idempotent Secure Hash Message Filter (which will filter messages based on their hash value):
<idempotent-secure-hash-message-filter messageDigestAlgorithm="SHA-256">
<simple-text-file-store directory="./idempotent"/>
</idempotent-secure-hash-message-filter>
Please check the following reference for more info.
Personally I would try to avoid an idempotent filter with a simple message store, as it will prevent potential later updates of the data in the DB.
If your DBMS supports it, I would try using an UPSERT mechanism, which will effectively render your query idempotent. This could be done with this in PostgreSQL and with this in MySQL.
You can check for duplicates easily using .ack queries in Mule.
An .ack query is a query that runs automatically, immediately after the normal query.
You need to create an .ack query that will run immediately after your insert query, check the rows that were already inserted, and set a flag.
See here for how to do it with an .ack query:
http://training.middlewareschool.com/mule/database-transport/
and here:
http://www.mulesoft.org/documentation/display/current/JDBC+Transport+Reference#JDBCTransportReference-Acknowledgment