Is it possible to change the metadata of a column that is on a partitioned table in Hive? - sql

This is an extension of a previous question I asked: Is it possible to change partition metadata in HIVE?
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to a column that is on a partitioned table (the partitions are daily)? Note: the column itself is not the partitioning column. It is a simple ID field that is being changed from STRING to BIGINT.
Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
Note: I am asking this as a separate question as the original answer appears to be for a column on a partitioned table that is also the partitioning column. So, I do not want to muddy the waters.
Update:
I ran the ALTER TABLE .. CHANGE COLUMN ... CASCADE command, but I get the following error:
Error while processing statement: FAILED: Execution Error, return code
1 from org.apache.hadoop.hive.ql.exec.DDLTask. Not allowed to alter
schema of Avro stored table having external schema. Consider removing
avro.schema.literal or avro.schema.url from table properties.
The metadata (Avro schema) is stored in a separate Avro file. I can confirm that the updated metadata is in that Avro file, but not in the individual partition files.
Note: The table is stored as EXTERNAL.

You can easily change the column type:
Use ALTER TABLE in Hive to change the type to STRING, etc.:
alter table table_name change column col_name col_name string cascade; --change to string
See documentation.
The ALTER TABLE ... CHANGE COLUMN ... CASCADE command changes the column in the table's metadata and cascades the same change to the metadata of all existing partitions.
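If the CASCADE fails because the table carries an external Avro schema (the error in the update above), one possible workaround, following the error message's own suggestion, is to temporarily detach the schema property, run the change, and then re-attach the updated schema. This is a rough sketch only, assuming the schema is supplied via avro.schema.url and using a hypothetical table name and path; verify it on a copy of the table first:
-- back up the current avro.schema.url value, then detach it
alter table table_name unset tblproperties ('avro.schema.url');
-- change the column type in the table and all partition metadata
alter table table_name change column id id bigint cascade;
-- re-attach the (already updated) Avro schema file
alter table table_name set tblproperties ('avro.schema.url'='hdfs:///path/to/updated_schema.avsc');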
Alternatively, you can recreate the table as in this answer: https://stackoverflow.com/a/58299056/2700344
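Recreating is not as drastic as it sounds for an EXTERNAL table, because dropping the table leaves the data files in place. A minimal sketch, with hypothetical column names, partition column, and location (here the Avro schema is derived from the column list rather than avro.schema.url):
drop table table_name; -- external table: data files are untouched
create external table table_name (
  id bigint,          -- was string
  payload string      -- placeholder for the remaining columns
)
partitioned by (ds string)  -- hypothetical daily partition column
stored as avro
location '/path/to/existing/data';
msck repair table table_name; -- re-register the existing daily partitions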

Related

Can't copy BigQuery table with DDL modifications?

What's the problem
I have a table in BigQuery where some of the schema had to change over time.
I used DDL to change the schema doing the following:
Switching some columns from INT to FLOAT
Deleting a FLOAT column and recreating it as a STRING column (roughly the statements sketched below)
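For context, the statements I ran were roughly of this form (project, dataset, table, and column names are placeholders):
-- widen an integer column to float
ALTER TABLE `my_project.my_dataset.my_table`
  ALTER COLUMN metric_value SET DATA TYPE FLOAT64;
-- drop a float column and recreate it as a string
ALTER TABLE `my_project.my_dataset.my_table` DROP COLUMN label_value;
ALTER TABLE `my_project.my_dataset.my_table` ADD COLUMN label_value STRING;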
I attempted to copy the table into a new blank table and I get the following error:
Operation could not be completed. Error message: Table project_id:dataset.table_id with column level ddl operation does not support table copy.
So I can no longer copy or snapshot this table? I can't find any documentation on this error at all.
How can I copy this table and what is going on?
My best guess
I assumed DDL changed the table and that was it.
I guess perhaps it works more like Django migrations, and the DDL statements themselves can't be copied, so now I can never copy that table again?
I wouldn't have altered this table, had I known that was the case. Does that mean we're back to exporting all our data to GCS and reloading?
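If it comes to that, the export-and-reload fallback could stay in SQL; a rough sketch, assuming the EXPORT DATA and LOAD DATA statements are available and using placeholder bucket and table names:
EXPORT DATA OPTIONS (
  uri='gs://my_bucket/my_table/export-*.avro',
  format='AVRO'
) AS SELECT * FROM `my_project.my_dataset.my_table`;
LOAD DATA INTO `my_project.my_dataset.my_table_copy`
FROM FILES (format='AVRO', uris=['gs://my_bucket/my_table/export-*.avro']);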

Does Redshift Spectrum allow you to add columns on an external table

I can't find anything in the Redshift documentation on altering an external table, just notes on adding a partition.
I would need to do something like this:
Alter table spectrum.some_table
Add column notes character varying;
Does anyone have experience with this before I potentially embarrass myself with a PR?
Many thanks
No. 'Alter table XXX add column ...' is only valid for an internal table. Since Spectrum is based on files stored in S3, those files would also need to change their contents to support the new table definition. The external table definition and the construction of the files are linked, so it is not clear why you would want to add a column to an external table other than to save on statements (ALTER vs DROP & CREATE). There may be a way to achieve your greater objective (that I don't see) if you state what that is.
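If you do go the drop & create route mentioned above, a minimal sketch (the column list, file format, and S3 location are made up for illustration) would be:
drop table spectrum.some_table;
create external table spectrum.some_table (
  id bigint,
  name varchar(256),
  notes varchar(256)  -- the new column
)
row format delimited fields terminated by ','
stored as textfile
location 's3://my-bucket/some_table/';
Note that, as described above, the existing files in S3 won't contain the new column, so it will simply come back empty until the files themselves are regenerated.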

External Table data not getting Purged in Hive

I created 2 external tables in Hive. For the first table I specified the data location in the CREATE statement. For the second table I loaded data after creating it.
I can see the data file created for the second table in the /hive/warehouse/ directory. Then I set "external.table.purge"="true" for both tables and dropped both tables. But the data files of both tables remain as they were.
What is the behaviour of 'external.table.purge'='true'? Shouldn't it delete the data files as well when the DROP command is issued?
If Hive does not take any ownership over the data files of an external table, why is there even an option such as 'external.table.purge'='true'?
I read in one of the threads that it is possible to delete the data for external tables as well via ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but I am unable to find that post again.
You cannot drop the data in an external table, but you can do it for internal (managed) tables. So convert the table to internal and then drop it.
First change the external property to false:
hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
and then you can easily drop it.
hive> drop table nyse_external;
TBLPROPERTIES ("external.table.purge"="true") should work for Hive version 4.x+.
Answer to point 1:
The table property "external.table.purge", if true (and if the table is an external table), lets Hive know to delete the table data when the table is dropped. This feature was introduced in this Apache JIRA:
https://issues.apache.org/jira/browse/HIVE-19981
For reference on how to set the property, take a look at this example:
https://docs.cloudera.com/runtime/7.2.7/using-hiveql/topics/hive_drop_external_table_data.html
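Putting it together, setting the property and then dropping the table looks like this (the table name is just an example):
hive> ALTER TABLE my_external_table SET TBLPROPERTIES ('external.table.purge'='true');
hive> DROP TABLE my_external_table; -- with purge set to true, the underlying data files are removed as well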

DB2: How to add new column between existing columns?

I have an existing DB2 database and a table named employee with columns id, e_name, e_mobile_no, e_dob, e_address.
How can I add a new column e_father_name before e_mobile_no?
You should try the ADMIN_MOVE_TABLE procedure, which allows you to change the table structure.
ALTER TABLE only allows adding columns at the end of the table. The reason is that adding a column anywhere else would change the physical structure of the table, i.e., each row would need to be adapted to the new format, which would be quite expensive.
Using the mentioned ADMIN_MOVE_TABLE procedure, you copy the entire table and change the table structure during that process. It requires a significant amount of space and time.
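A rough sketch of such a call for the employee table, assuming Db2 LUW and guessing the existing column types; the exact parameter list of SYSPROC.ADMIN_MOVE_TABLE varies by version, so check the documentation for yours:
CALL SYSPROC.ADMIN_MOVE_TABLE(
  'MYSCHEMA', 'EMPLOYEE',
  '', '', '',        -- keep the existing table spaces
  '', '', '',        -- no MDC / partitioning changes
  'ID INTEGER, E_NAME VARCHAR(100), E_FATHER_NAME VARCHAR(100), E_MOBILE_NO VARCHAR(20), E_DOB DATE, E_ADDRESS VARCHAR(200)',
  '', 'MOVE');       -- new column definition places e_father_name before e_mobile_no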
In DB2 for IBM i 7.1 you can do it; try it on your DB2 version:
alter table yourtable
add column e_father_name varchar(10) before e_mobile_no
I always do the following:
Take a backup/dump of the table data and the DDL (db2look).
(If you dump to a CSV file, as I do, I suggest dumping in the new format, for example putting NULL for the new column in the right place.)
Drop the table and indexes.
Create the table with the new column (see the sketch below).
Load the data with the old values.
Recreate all indexes and run RUNSTATS.
Once you have done it a few times it becomes old hat.
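For the employee table in the question, the create and reload steps might look roughly like this (the column types are guesses):
CREATE TABLE employee (
  id            INTEGER NOT NULL,
  e_name        VARCHAR(100),
  e_father_name VARCHAR(100),  -- new column in the desired position
  e_mobile_no   VARCHAR(20),
  e_dob         DATE,
  e_address     VARCHAR(200)
);
-- reload the dumped CSV (with a NULL placeholder for the new column), e.g. from the CLP:
-- IMPORT FROM employee.csv OF DEL INSERT INTO employee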

Change an attribute's data type in a database table when it is already filled with records

Can we change an attribute's data type in SQL when the database table already has records?
I am using Microsoft SQL Server Management Studio 2008. The error that I am getting is:
Error converting data type nvarchar to float.
In short: it is possible with the ALTER COLUMN command ONLY if the existing data is compatible with the new data type. In addition, it is recommended to do this inside a transaction.
For example, you may change a column from varchar(50) to nvarchar(200) with the script below:
alter table TableName
alter column ColumnName nvarchar(200)
Edit: regarding your posted error while altering the column type:
Error converting data type nvarchar to float.
One way would be to create a new column and convert all the good (convertible and compatible) records to the new column. After that you may want to clean up the bad records that do not convert, delete the old column, and rename your newly added and populated column back to the original name. Important: use a testing environment for all these manipulations first; playing around with production tables is usually a good way to screw things up.
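A sketch of that approach for the nvarchar-to-float case (the table and column names are made up; TRY_CONVERT needs SQL Server 2012+, on 2008 use a CASE expression with ISNUMERIC instead):
-- 1. add the new column with the target type
ALTER TABLE dbo.Measurements ADD value_new FLOAT NULL;
GO
-- 2. copy over the convertible values (rows that do not convert stay NULL)
UPDATE dbo.Measurements SET value_new = TRY_CONVERT(FLOAT, value_old);
-- 3. inspect and fix the rows that did not convert
SELECT * FROM dbo.Measurements WHERE value_new IS NULL AND value_old IS NOT NULL;
-- 4. drop the old column and rename the new one back to the original name
ALTER TABLE dbo.Measurements DROP COLUMN value_old;
EXEC sp_rename 'dbo.Measurements.value_new', 'value_old', 'COLUMN';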
References for more discussion on similar SE posts:
Change column types in a huge table
How to change column datatype in SQL Server database without losing data
Obviously, there is no default conversion to your new data type. One solution could be to create a second column with the requested type and write your own conversion function. Once this is done, delete the first column and rename the second one to the original name.
Things to consider: how big your table is. You then use the ALTER TABLE syntax. We do not know what data type you want to change to, so just as an example:
alter column:
Alter Table [yourTable] Alter column [yourColumn] varchar(15)
You could also try to add a new column, then update that column using your old column, drop the old column, and rename the new column. This is a safer way, because at times the data that you hold might not react well to the new data type...
Posts to look into for ideas: Change column types in a huge table, How to change column datatype in SQL database without losing data
Alter the data type of that column. In general, SQL Server won't allow the change; it will prompt you to drop the column. There is a setting to achieve this:
Go to Tools -> Options -> Designers -> Table and Database Designers and uncheck the "Prevent saving changes that require table re-creation" option. I am talking about SQL Server 2008 R2. Now you can easily alter the data type.