Can't copy BigQuery table with DDL modifications? - google-bigquery

What's the problem
I have a table in BigQuery whose schema had to change over time.
I used DDL to change the schema, doing the following:
Switching some columns from INT to FLOAT
Deleting a FLOAT column and recreating it as a STRING column.
I attempted to copy the table into a new blank table and got the following error:
Operation could not be completed. Error message: Table project_id:dataset.table_id with column level ddl operation does not support table copy.
Can I no longer copy or snapshot this table? I can't find any documentation on this error at all.
How can I copy this table and what is going on?
My best guess
I assumed DDL changed the table and that was it.
I'm guessing it works more like Django migrations: the DDL statement itself can't be copied, so now I can never copy that table again?
I wouldn't have altered this table, had I known that was the case. Does that mean we're back to exporting all our data to GCS and reloading?
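For context, the sequence described above would look roughly like this in BigQuery Standard SQL (a minimal sketch; dataset.table_id follows the error message, and the column names amount and legacy_score are placeholders, not the real schema):

-- Widen an integer column to FLOAT64 (one of the coercions ALTER COLUMN allows)
ALTER TABLE dataset.table_id ALTER COLUMN amount SET DATA TYPE FLOAT64;

-- Drop a FLOAT column and recreate it as a STRING column
ALTER TABLE dataset.table_id DROP COLUMN legacy_score;
ALTER TABLE dataset.table_id ADD COLUMN legacy_score STRING;

-- The copy that then fails with the "column level ddl operation" error
CREATE TABLE dataset.table_id_copy
COPY dataset.table_id;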

Related

Rename a table in Amazon Redshift

I've been trying to rename a table from "fund performance" to fund_performance in SQLWorkbench for a Redshift database. Commands I have tried are:
alter table schemaname."fund performance"
rename to fund_performance;
I received a message that the command executed successfully, and yet the table name did not change.
I then tried copying the table to rename it that way. I used
#CREATE TABLE fund_performance LIKE "schema_name.fund performance";
CREATE TABLE fund_performance AS SELECT * FROM schema_name."fund performance";
In both these cases I also received a message that the statements executed successfully, but nothing changed. Does anyone have any ideas?
Use the following; it may work out for you:
SELECT * into schema_name.fund_performance FROM schema_name.[fund performance]
It will copy the data by creating a new table named fund_performance, but it won't create any constraints or identity columns.
To rename the specific table without disturbing existing constraints:
EXEC sp_rename 'schema_name.[fund performance]', 'schema_name.fund_performance';
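Note that the square brackets and sp_rename above are SQL Server syntax; in Redshift itself the same two approaches would be written roughly as follows (a sketch, assuming the schema really is schema_name):

-- Copy the data into a new table under the desired name (constraints are not carried over)
CREATE TABLE schema_name.fund_performance AS
SELECT * FROM schema_name."fund performance";

-- Or rename in place, double-quoting the identifier that contains a space
ALTER TABLE schema_name."fund performance" RENAME TO fund_performance;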

Databricks - is not empty but it's not a Delta table

I run a query on Databricks:
DROP TABLE IF EXISTS dublicates_hotels;
CREATE TABLE IF NOT EXISTS dublicates_hotels
...
I'm trying to understand why I receive the following error:
Error in SQL statement: AnalysisException: Cannot create table ('default.dublicates_hotels'). The associated location ('dbfs:/user/hive/warehouse/dublicates_hotels') is not empty but it's not a Delta table
I already found a way to solve it (by removing the directory manually):
dbutils.fs.rm('.../dublicates_hotels',recurse=True)
But I can't understand why the table is still there, even though I created a new cluster (terminated the previous one) and I'm running this query with the new cluster attached.
Can anyone help me understand that?
I also faced a similar problem, then tried the CREATE OR REPLACE TABLE command and it solved my problem.
DROP TABLE and CREATE TABLE work with entries in the metastore, which is a database that keeps the metadata about databases and tables. There can be a situation where the entries in the metastore don't exist, so DROP TABLE IF EXISTS doesn't do anything. But when CREATE TABLE is executed, it additionally checks the location on DBFS and fails if the directory exists (possibly with data). That directory could be left over from previous experiments, when data was written without going through the metastore.
If the table was created with LOCATION specified, it is an EXTERNAL table, so when you drop it you only drop the Hive metadata for that table; the directory contents remain as they are. You can restore the table with CREATE TABLE if you specify the same LOCATION (Delta keeps the table structure along with its data in the directory).
If LOCATION wasn't specified at table creation, it's a MANAGED table, and DROP will destroy both the metadata and the directory contents.
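Combining the two answers, a rough sketch of the recovery path (the table name and path come from the error message above; the SELECT body is only a placeholder, since the original query is elided in the question):

-- Option 1 (as in the question): remove the leftover directory first, e.g. with
-- dbutils.fs.rm('dbfs:/user/hive/warehouse/dublicates_hotels', recurse=True),
-- and then re-run the original CREATE TABLE IF NOT EXISTS statement.

-- Option 2 (as in the first answer): replace the table and its contents in one step
CREATE OR REPLACE TABLE dublicates_hotels
USING DELTA
AS SELECT *              -- placeholder body; the original query is not shown above
   FROM source_hotels;   -- hypothetical source table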

External Table data not getting Purged in Hive

I created 2 external tables in Hive. For the first table, I specified the data location in the CREATE statement. Into the second table, I loaded data after creating it.
I can see the data file created for the second table in the /hive/warehouse/ directory. I then set "external.table.purge"="true" for both tables and dropped both of them, but the data files of both tables remain as they are.
What is the behaviour of 'external.table.purge'='true'? Shouldn't it delete the data files as well when the DROP command is issued?
If Hive does not take any ownership of an external table's data files, why is there even an option like 'external.table.purge'='true'?
I read in one of the threads that someone mentioned it is possible to delete the data for external tables as well with ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but I am unable to find that post again.
You cannot drop the data of an external table, but you can do it for internal (managed) tables. So convert the table to internal and then drop it.
First change the external property to false.
hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
and then you can easily drop it.
hive> drop table nyse_external;
TBLPROPERTIES ("external.table.purge"="true") should work for Hive version 4.x+.
Answer to point 1:
The table property "external.table.purge", if true (and if the table is an external table), lets Hive know to delete the table data when the table is dropped. This feature was introduced in this Apache JIRA:
https://issues.apache.org/jira/browse/HIVE-19981
For reference on how to set the property, take a look at this example:
https://docs.cloudera.com/runtime/7.2.7/using-hiveql/topics/hive_drop_external_table_data.html
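For reference, the sequence the asker was looking for, and which the linked Cloudera page describes, would be along these lines (a sketch reusing the nyse_external example table from the answer above):

-- Mark the external table so Hive will also delete its data on DROP (Hive 4.x+)
ALTER TABLE nyse_external SET TBLPROPERTIES ('external.table.purge'='true');

-- Dropping the table now removes the underlying data files as well
DROP TABLE nyse_external;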

Is it possible to change the metadata of a column that is on a partitioned table in Hive?

This is an extension of a previous question I asked: Is it possible to change partition metadata in HIVE?
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to a column that is on a partitioned table (the partitions are daily)? Note: the column itself is not the partitioning column. It is a simple ID field that is being changed from STRING to BIGINT.
Otherwise, we might be stuck with current and future data being of type BIGINT while the historical data is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
Note: I am asking this as a separate question as the original answer appears to be for a column on a partitioned table that is also the partitioning column. So, I do not want to muddy the waters.
Update:
I ran the ALTER TABLE .. CHANGE COLUMN ... CASCADE command, but I get the following error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Not allowed to alter schema of Avro stored table having external schema. Consider removing avro.schema.literal or avro.schema.url from table properties.
The metadata is stored in a separate avro file. I can confirm that the updated metadata is in the avro file, but not in the individual partition file.
Note: The table is stored as EXTERNAL.
You can easily change column type:
Use alter table in Hive, change type to STRING, etc:
alter table table_name change column col_name col_name string cascade; --change to string
See documentation.
The ALTER TABLE CHANGE COLUMN command with CASCADE changes the column in the table's metadata and cascades the same change to all the partition metadata.
Alternatively, you can recreate the table as in this answer: https://stackoverflow.com/a/58299056/2700344
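Regarding the Avro error in the update: the message itself points at the workaround, i.e. removing avro.schema.literal or avro.schema.url from the table properties before running the CASCADE. A sketch of that sequence (table_name and the id column are placeholders, and avro.schema.url is assumed to be the property in use):

-- Drop the external Avro schema reference so the table schema can be altered
ALTER TABLE table_name UNSET TBLPROPERTIES ('avro.schema.url');

-- Then change the column type and cascade it to the partition metadata
ALTER TABLE table_name CHANGE COLUMN id id BIGINT CASCADE;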

Apache hive create table

I have a problem understanding the real meaning behind this Apache Hive code. Can someone please explain to me whether this code is really doing anything?
ALTER TABLE a RENAME TO a_tmp;
DROP TABLE a;
CREATE TABLE a AS SELECT * FROM a_tmp;
ALTER TABLE a RENAME TO a_tmp;
This simply allows you to rename your table a to a_tmp.
Let's say your table a initially points to /user/hive/warehouse/a; after executing this command your data will be moved to /user/hive/warehouse/a_tmp, and the contents of /user/hive/warehouse/a will no longer exist. Note that this behavior of moving HDFS directories only exists in more recent versions of Hive. Before that, the RENAME command only updated the metastore and did not move directories in HDFS.
Similarly, if you do a show tables after, you will see that a doesn't exist anymore, but a_tmp exists. You can no longer query a at that point because it is no longer registered in the metastore.
DROP TABLE a;
This does basically nothing, because you already renamed a to a_tmp. So a doesn't exist anymore in the metastore. This will still print "OK" because there's nothing to do.
CREATE TABLE a AS SELECT * FROM a_tmp;
You are asking to create a brand new table called a and register it in the metastore. You are also asking to populate it with the same data that is in a_tmp (which you already copied from a before).
So in short, you're moving your initial table to a new one and then copying the new one back to the original, so the only thing these queries do is duplicate your initial data into both a and a_tmp.
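A quick way to confirm the net effect described above (a sketch):

-- After the three statements, both tables exist and hold the same rows
SHOW TABLES;                  -- now lists both a and a_tmp
SELECT COUNT(*) FROM a;       -- same count as a_tmp, since a was rebuilt from it
SELECT COUNT(*) FROM a_tmp;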