Unable to delete record from table in BigQuery - google-bigquery

I am trying to delete records from one of the tables in BigQuery, but I am getting the error
DML over table dataset.tablename is not supported
I am using this command to delete
bq query --use_legacy_sql=false 'delete from dataset.table_name where Day = 20161215'
But it works when I run it from the console, like this:
delete from dataset.table_name where Day = 20161215
Please Help

Usually, you would see this error if you try to run DML against a partitioned table, which is not yet supported.
If the table is not partitioned, check whether it is receiving streaming inserts (i.e. whether it has a streaming buffer or not).
See more about DML's Known Issues
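For example, one way to check for a streaming buffer is to inspect the table metadata with the bq CLI (a minimal sketch; dataset.table_name is a placeholder):
# A "streamingBuffer" section in the output means recent streaming inserts
# are still buffered and DML against the table may be rejected.
bq show --format=prettyjson dataset.table_name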

Related

Query to delete single row in Impala

I would like to delete one record from an Impala table. Below is what I used to delete the record from the table.
This is my query:
DELETE FROM sample.employee_details WHERE sno=5 AND name='XYZ' AND age=26;
Please suggest the best way to remove a record from the table.
This is fine assuming your where conditions uniquely identify the row. See the documentation:
https://www.cloudera.com/documentation/enterprise/5-10-x/topics/impala_delete.html
The Impala DELETE command works only for the Kudu storage type. Storage formats other than Kudu are not designed for online transactions and do not offer real-time queries or row-level updates and deletes.
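As a minimal sketch (the schema, primary key, and hash partitioning below are assumptions, not taken from the question), a Kudu-backed table supports row-level deletes directly:
-- Kudu table: requires a primary key; hash partitioning is one common choice
CREATE TABLE sample.employee_details (
  sno INT,
  name STRING,
  age INT,
  PRIMARY KEY (sno)
)
PARTITION BY HASH (sno) PARTITIONS 4
STORED AS KUDU;

-- Row-level delete works because the data is stored in Kudu
DELETE FROM sample.employee_details WHERE sno = 5;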

Hive truncate table only when it's not empty

I have a hive job that is scheduled to be executed. At the beginning of the job, I want to truncate a table by doing:
TRUNCATE TABLE SOMETABLE
The problem is that the table may be empty. In that case, I don't want to perform the truncate operation, which would raise an exception. I know in MySQL you can do something like:
IF EXISTS(SELECT * FROM SOMETABLE)
BEGIN
TRUNCATE SOMETABLE
END
Is there a way I can achieve something similar in hive? Thanks a lot for your help!
Hive won't raise any exception even if the table is empty.
You can also make use of temporary tables in Hive; these tables are only accessible within the session that created them, which makes them very useful for managing intermediate data.
Once the session is closed, Hive deletes all temporary tables.
See the Hive documentation on temporary tables for more details.
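For example, a rough sketch of staging intermediate data in a temporary table (the table names are placeholders):
-- Exists only in the current session and is dropped automatically when the session ends
CREATE TEMPORARY TABLE sometable_staging AS
SELECT * FROM sometable;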

BigQuery UPDATE or DELETE DML

Tables that have been written to recently via BigQuery Streaming
(tabledata.insertall) cannot be modified using UPDATE or DELETE
statements. To check if the table has a streaming buffer, check the
tables.get response for a section named streamingBuffer. If it is
absent, the table can be modified using UPDATE or DELETE statements.
When I try to modify my table (rows were recently inserted; the table was created a few days ago):
delete table_dataset.table1 where true
I get the following error: Error: UPDATE or DELETE DML statements are not supported over table with streaming buffer. However, I was eventually able to delete all these records somehow, maybe after some delay.
What is the streaming buffer? When exactly can I modify my table? If I use a job that creates the table or loads data from another source, can I run UPDATE/DELETE DML?
Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table, but it can take up to 90 minutes to become available for copy/export and other operations. You probably have to wait up to 90 minutes so that the whole buffer is persisted on the cluster. You can check whether the table has a streaming buffer via the tables.get response, as you mentioned.
If you use a load job to create the table, you won't have a streaming buffer.
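For instance, loading the data with a load job instead of streaming avoids the buffer entirely, so DML can run right away. A sketch with the bq CLI (the Cloud Storage path and schema file are assumptions):
# Batch load from Cloud Storage; no streaming buffer is created
bq load --source_format=CSV table_dataset.table1 gs://your-bucket/data.csv ./schema.json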

Impala query failed for compute incremental stats databasename.tablename

We are Sqooping data from Netezza into non-partitioned Hadoop tables and then from the non-partitioned tables into partitioned tables with the insert overwrite method. After this we run compute incremental stats for databasename.tablename on the partitioned tables, but this query fails for some of the partitions with the error
Could not execute command: compute incremental stats and No such file or directory for some file in the partitioned directory.
You can run a refresh statement before computing stats to refresh the metadata right away. It may be necessary to wait a few seconds before computing stats even if the refresh statement's return code is 0, as past experience has shown that metadata may still be refreshing even after the return code is given. You typically won't see this issue unless a script is executing these commands sequentially.
refresh yourTableName
compute stats yourTableName
As of Impala 2.3 you can also use alter table ... recover partitions instead of refreshing the metadata or repairing the table.
alter table yourTableName recover partitions
compute stats yourTableName

Truncate a table in GBQ

I am trying to truncate an existing table in GBQ, but the below command fails when I run it. Is there any specific command or syntax to do that? I looked into the GBQ documentation but had no luck.
TRUNCATE TABLE [dw_test.test];
While BigQuery used to support nothing other than SELECT, it now supports DML as long as you uncheck "Use Legacy SQL" in the query options. There is no TRUNCATE, but you can delete:
DELETE from my_table WHERE 1=1
Note that BigQuery requires the use of WHERE in a DELETE, so if you want to delete everything you need to use a condition that is always true.
Good news: TRUNCATE TABLE is now supported: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#truncate_table_statement
TRUNCATE TABLE [[project_name.]dataset_name.]table_name
However, please note that this will not work / is not supported if a partition filter is required by your table definition.
As an alternative, you can empty a table by recreating it with the same columns and no rows:
CREATE OR REPLACE TABLE <dataset>.<table>
AS SELECT * FROM <dataset>.<table> LIMIT 0;
For partitioned tables, assuming you have a daily partition on the field "created_on", execute the following:
CREATE OR REPLACE TABLE <dataset>.<table> PARTITION BY created_on
AS SELECT * FROM <dataset>.<table> WHERE created_on = CURRENT_DATE() LIMIT 0;
EDIT (Nov 2020): BigQuery now supports other verbs; check the other answers for newer solutions.
BigQuery doesn't support TRUNCATE as part of a query string. The only DDL/DML verb that BQ supports is SELECT.
One option is to run a job with the WRITE_TRUNCATE write disposition (documented for the query job configuration, but supported on all job types that take a destination table). This will truncate all data already in the table and replace it with the results of the job.
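For example, with the bq CLI a query job can overwrite its own destination table; the --replace flag corresponds to WRITE_TRUNCATE (a sketch; the table name dw_test.test comes from the question, the rest is an assumption):
# Rewrites dw_test.test with an empty result set, effectively truncating it
bq query --use_legacy_sql=false --replace --destination_table=dw_test.test 'SELECT * FROM dw_test.test LIMIT 0'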
If you don't want to replace the contents with other data or start a job, your best option is probably to delete and recreate the table with the same schema.