Does Redshift Spectrum allow you to add columns on an external table


I can't find anything in the Redshift documentation on altering an external table, only notes on adding a partition.
I would need to do something like this:
Alter table spectrum.some_table
Add column notes character varying;
Does anyone have experience with this before I potentially embarrass myself with a PR?
Many thanks

No. 'ALTER TABLE XXX ADD COLUMN ...' is only valid for an internal table. Since Spectrum is based on files stored in S3, those files would also need to change their contents to support the new table definition. The external table definition and the layout of the files are linked, so it is not clear why you would want to add a column to an external table other than to save on statements (ALTER vs. DROP and CREATE). There may be a way to achieve your greater objective (which I don't see) if you state what it is.
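If you do go the drop-and-create route mentioned above, a small helper can keep the two statements in sync with a single column list. The Python sketch below only renders the SQL text; it does not connect to Redshift. The function name, the id column, the varchar length, the Parquet format and the S3 path are all hypothetical placeholders, not details from the question.

```python
def render_recreate_external_table(schema, table, columns, s3_location,
                                   file_format="PARQUET"):
    """Return [DROP, CREATE EXTERNAL TABLE] statements as strings.

    columns is a list of (name, sql_type) pairs in the desired order.
    """
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    drop_stmt = f"DROP TABLE {schema}.{table};"
    create_stmt = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}';"
    )
    return [drop_stmt, create_stmt]


# Hypothetical column list: the existing column plus the new "notes" column.
statements = render_recreate_external_table(
    schema="spectrum",
    table="some_table",
    columns=[("id", "integer"), ("notes", "varchar(256)")],
    s3_location="s3://example-bucket/some_table/",  # placeholder path
)
for statement in statements:
    print(statement)
```

Dropping an external table only removes its definition, so the files in S3 stay where they are and the new definition simply has to match them.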

Related

External Table data not getting Purged in Hive

I created two external tables in Hive. For the first table I specified the data location in the CREATE statement. For the second table I loaded data after creating it.
I can see the data file created for the second table in the /hive/warehouse/ directory. Then I set "external.table.purge"="true" for both tables and dropped both tables. But the data files of both tables remain as they were.
What is the behaviour of 'external.table.purge'='true'? Shouldn't it delete the data files as well when a DROP command is issued?
If Hive does not take any ownership of the data files of an external table, why is there even an option such as 'external.table.purge'='true'?
I read in one of the threads that it is possible to delete the data of external tables as well with ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but I am unable to find that post again.

You cannot drop the data of an external table, but you can for internal (managed) tables. So convert the table to internal and then drop it.
First change the EXTERNAL property to false.
hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
and then you can easily drop it.
hive> drop table nyse_external;
TBLPROPERTIES ("external.table.purge"="true") should work for hive version 4.x+.

Answer to point 1:
The table property "external.table.purge", if true (and if the table is an external table), tells Hive to delete the table data when the table is dropped. This feature was introduced in this Apache JIRA:
https://issues.apache.org/jira/browse/HIVE-19981
For reference on how to set the property, take a look at this example:
https://docs.cloudera.com/runtime/7.2.7/using-hiveql/topics/hive_drop_external_table_data.html

Is it possible to change the metadata of a column that is on a partitioned table in Hive?

This is an extension of a previous question I asked: Is it possible to change partition metadata in HIVE?
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to a column on a partitioned table (the partitions are daily)? Note: the column itself is not the partitioning column. It is a simple ID field that is being changed from STRING to BIGINT.
Otherwise, we might be stuck with current and future data being of type BIGINT while the historical data is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
Note: I am asking this as a separate question as the original answer appears to be for a column on a partitioned table that is also the partitioning column. So, I do not want to muddy the waters.
Update:
I ran the ALTER TABLE .. CHANGE COLUMN ... CASCADE command, but I get the following error:
Error while processing statement: FAILED: Execution Error, return code
1 from org.apache.hadoop.hive.ql.exec.DDLTask. Not allowed to alter
schema of Avro stored table having external schema. Consider removing
avro.schema.literal or avro.schema.url from table properties.
The metadata is stored in a separate avro file. I can confirm that the updated metadata is in the avro file, but not in the individual partition file.
Note: The table is stored as EXTERNAL.

You can easily change the column type.
Use ALTER TABLE in Hive to change the type, for example to STRING:
alter table table_name change column col_name col_name string cascade; --change to string
See documentation.
ALTER TABLE CHANGE COLUMN with the CASCADE option changes the column in the table's metadata and cascades the same change to all the partition metadata.
Alternatively you can recreate table like in this answer: https://stackoverflow.com/a/58299056/2700344

DB2: How to add new column between existing columns?

I have an existing DB2 database and a table named employee with columns id, e_name, e_mobile_no, e_dob, e_address.
How can I add a new column e_father_name before e_mobile_no?

You should try the ADMIN_MOVE_TABLE procedure, which lets you change the table structure.
ALTER TABLE only allows adding columns to the end of the table. The reason is that inserting a column elsewhere would change the physical structure of the table, i.e., each row would need to be adapted to the new format, which would be quite expensive.
With ADMIN_MOVE_TABLE you copy the entire table and change the table structure during that process. It requires a significant amount of space and time.

On DB2 for IBM i V7R1 you can do it directly; try it on your DB2 version:
alter table yourtable
add column e_father_name varchar(10) before e_mobile_no

I always do the following:
Take a backup/dump of the table data and a db2look of its definition. (If you dump to a CSV file, as I do, I suggest dumping in the new format, for example by putting null for the new column in the right place.)
Drop the table and indexes.
Create the table with the new column.
Load the data with the old values.
Recreate all indexes and run runstats.
Once you have done it a few times it becomes old hat.
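The steps above can be sketched end to end as follows. This uses Python's built-in sqlite3 module as a stand-in for DB2, so it shows the shape of the procedure only: the column types, sample row and index name are hypothetical, and on DB2 you would use export/load, db2look and runstats instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting point: the original table, one sample row, one index.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, e_name TEXT, e_mobile_no TEXT, "
    "e_dob TEXT, e_address TEXT)"
)
conn.execute(
    "INSERT INTO employee VALUES (1, 'Ann', '555-0100', '1990-01-01', '1 Main St')"
)
conn.execute("CREATE INDEX idx_employee_name ON employee (e_name)")
conn.commit()

# 1. Back up the data (here: into memory instead of a CSV file).
backup_rows = conn.execute(
    "SELECT id, e_name, e_mobile_no, e_dob, e_address FROM employee"
).fetchall()

# 2. Drop the table (in SQLite this also drops its indexes).
conn.execute("DROP TABLE employee")

# 3. Create the table with the new column in the desired position.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, e_name TEXT, e_father_name TEXT, "
    "e_mobile_no TEXT, e_dob TEXT, e_address TEXT)"
)

# 4. Load the old values, with NULL for the new column.
conn.executemany(
    "INSERT INTO employee "
    "(id, e_name, e_father_name, e_mobile_no, e_dob, e_address) "
    "VALUES (?, ?, NULL, ?, ?, ?)",
    backup_rows,
)

# 5. Recreate the index and refresh statistics (ANALYZE stands in for runstats).
conn.execute("CREATE INDEX idx_employee_name ON employee (e_name)")
conn.commit()
conn.execute("ANALYZE")

column_order = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(column_order)
```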

Teradata Drop Column returns with "no more room"

I am trying to drop a varchar(100) column from a 150 GB table (4.6 billion records). All the data in this column is null. I have 30 GB of free space in the database.
When I attempt to drop the column, it says "no more room in database XY". Why does such an action need so much space?

The ALTER TABLE statement needs temporary storage for the altered version before overwriting the original table. I guess the table that you are trying to alter occupies at least 1/3 of your total storage size.

This could happen for a variety of reasons. It's possible that one of the AMPs in your database is full, which would cause that error even with a minor table alteration.
Try running the following SQL to check space:
select VProc, CurrentPerm, MaxPerm
from dbc.DiskSpace
where DatabaseName='XY';
Also, you should check which column the primary index is on in this very large table. If the table's rows are distributed unevenly (skewed) across the AMPs, you could also run into space issues when altering the table or running a query against it.
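To read the output of the query above, it can help to reduce the per-AMP rows to the fullest AMP and a skew figure. The Python sketch below assumes the rows have already been fetched as (VProc, CurrentPerm, MaxPerm) tuples; the function name, the sample numbers, and the skew formula (1 minus average over maximum CurrentPerm, a commonly used convention) are assumptions for illustration rather than anything Teradata returns directly.

```python
def summarize_amp_space(rows):
    """Summarize per-AMP space usage.

    rows: iterable of (vproc, current_perm, max_perm) tuples, one per AMP.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no AMP rows supplied")

    def pct_used(row):
        _, current_perm, max_perm = row
        return 100.0 * current_perm / max_perm if max_perm else 0.0

    # The AMP closest to its own limit is the one that triggers the error.
    fullest = max(rows, key=pct_used)

    current_values = [current for _, current, _ in rows]
    average_current = sum(current_values) / len(current_values)
    largest_current = max(current_values)
    # 0% means perfectly even distribution; higher means more skew.
    skew_pct = (
        100.0 * (1 - average_current / largest_current) if largest_current else 0.0
    )

    return {
        "fullest_vproc": fullest[0],
        "fullest_pct_used": pct_used(fullest),
        "free_bytes_on_fullest": fullest[2] - fullest[1],
        "skew_pct": skew_pct,
    }


# Made-up numbers: three AMPs with the same MaxPerm but uneven usage.
sample_rows = [(0, 90, 100), (1, 50, 100), (2, 40, 100)]
summary = summarize_amp_space(sample_rows)
print(summary)
```

A high skew figure together with one AMP near 100% used would be consistent with the "one AMP is full" explanation, even when the database as a whole still shows free space.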
For additional suggestions I found a decent article on the kind of things you may want to investigate when the "no more room in database" error occurs - Teradata SQL Tutorial. Some of the suggestions include:
dropping any intermediary work or "sandbox" tables
implementing single-value or multi-value compression
dropping unwanted/unnecessary secondary indexes
removing data in dbc tables like accesslog or dbql tables
removing and archiving old tables that are no longer used

Inconsistent Generate Change Script

I added a column of type tinyint, set to not allow nulls, to a table and generated the change script. The table has data in it at this time. The script contains code that creates a temp table and inserts the data from the current table into it. It then drops the old table and renames the temp table to the name of the original table. All fine and good. My question is: why, if I do the same thing to another table (same field, but different table), does the generated change script not include this table-rebuild code?
Any tips would be greatly appreciated!

If the table does not contain data, there is no need to rebuild the table. Essentially, Management Studio "plays it safe" behind the scenes by generating the script this way if it thinks it can't make the change simply by modifying the table. In my experience it often does this when it doesn't really need to; however, there are exceptions, for example if you add your column anywhere other than at the "end" of the table. Rather than make changes in the UI and script them, I recommend becoming familiar with the ALTER TABLE command. Rebuilding the table in that manner can be catastrophic on a production system and can usually be avoided.
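As a rough illustration of the ALTER TABLE route, the sketch below adds a NOT NULL column with a default to a table that already has rows, without rebuilding it. It uses Python's built-in sqlite3 module as a stand-in, so the syntax is SQLite's rather than SQL Server's; the orders table and priority column are hypothetical. In T-SQL the comparable statement would be along the lines of ALTER TABLE dbo.orders ADD priority tinyint NOT NULL DEFAULT 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table that already contains data.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [("alice",), ("bob",)],
)
conn.commit()

# Add the new NOT NULL column in place. The DEFAULT supplies a value for
# the existing rows, so no temp table, copy, drop or rename is needed.
conn.execute("ALTER TABLE orders ADD COLUMN priority INTEGER NOT NULL DEFAULT 0")
conn.commit()

rows = conn.execute(
    "SELECT id, customer, priority FROM orders ORDER BY id"
).fetchall()
print(rows)
```

The new column lands at the end of the table, which is the case where a rebuild can usually be avoided.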
