Renaming Multiple Columns (Hive)

Is there a way to rename many (50+) columns in Hive in one step?
I'm currently using ALTER TABLE mytable CHANGE COLUMN mycolumn mynewcolumn bigint, but running it once per column is time consuming, and the table needs to be altered frequently.
Is it possible to include all the old and new column names in a single step?
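For what it's worth, one possible shortcut, assuming the table uses a native SerDe (REPLACE COLUMNS is documented to work only for native SerDes): ALTER TABLE ... REPLACE COLUMNS removes and re-adds the whole column list, matching existing data to the new columns by position, so it can rename every column in one statement. The names and types below are placeholders.

ALTER TABLE mytable REPLACE COLUMNS (
  mynewcolumn BIGINT,       -- new name for the first column
  anothernewcolumn STRING   -- ...and so on for all 50+ columns
);

Since columns are matched by position, every name (and type) can change at once, but the full column list must be spelled out.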

Related

Is it possible to change the metadata of a column that is on a partitioned table in Hive?

This is an extension of a previous question I asked: Is it possible to change partition metadata in HIVE?
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to a column of a partitioned table (the partitions are daily)? Note: the column itself is not the partitioning column; it is a simple ID field that is being changed from STRING to BIGINT.
Otherwise, we might be stuck with current and future data being of type BIGINT while the historical data is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
Note: I am asking this as a separate question as the original answer appears to be for a column on a partitioned table that is also the partitioning column. So, I do not want to muddy the waters.
Update:
I ran the ALTER TABLE .. CHANGE COLUMN ... CASCADE command, but I get the following error:
Error while processing statement: FAILED: Execution Error, return code
1 from org.apache.hadoop.hive.ql.exec.DDLTask. Not allowed to alter
schema of Avro stored table having external schema. Consider removing
avro.schema.literal or avro.schema.url from table properties.
The metadata is stored in a separate Avro schema file. I can confirm that the updated metadata is in the Avro file, but not in the metadata of the individual partitions.
Note: The table is stored as EXTERNAL.
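For reference, the error message itself suggests removing the external schema reference before altering. A hedged sketch of what that might look like (assuming the schema is referenced via avro.schema.url; the table and column names here are placeholders):

ALTER TABLE mytable UNSET TBLPROPERTIES ('avro.schema.url');  -- or 'avro.schema.literal'
-- then retry the cascading change:
ALTER TABLE mytable CHANGE COLUMN my_id my_id BIGINT CASCADE;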
You can easily change the column type. Use ALTER TABLE in Hive to change the type to STRING, etc.:
alter table table_name change column col_name col_name string cascade; --change to string
See documentation.
The ALTER TABLE ... CHANGE COLUMN command with CASCADE changes the column in the table's metadata and cascades the same change to the metadata of all partitions.
Alternatively, you can recreate the table as in this answer: https://stackoverflow.com/a/58299056/2700344

Sybase ASE: Add NOT NULL column without a DEFAULT fails. Why?

Consider the following empty (as in without rows) table:
CREATE TABLE my_table(
my_column CHAR(10) NOT NULL
);
Trying to add a NOT NULL column without a DEFAULT will fail:
ALTER TABLE my_table ADD my_new_column CHAR(10) NOT NULL;
Error:
[Code: 4997, SQL State: S1000]
ALTER TABLE my_table failed.
Default clause is required in order to add non-NULL column 'my_new_column'.
But adding the column as NULL and then changing it to NOT NULL will work:
ALTER TABLE my_table ADD my_new_column CHAR(10) NULL;
ALTER TABLE my_table MODIFY my_new_column CHAR(10) NOT NULL;
Setting a default and then removing the default will work too:
ALTER TABLE my_table ADD my_new_column CHAR(10) DEFAULT '' NOT NULL;
ALTER TABLE my_table REPLACE my_new_column DEFAULT NULL;
What's the justification for this behavior? What is the database trying to do internally such that adding the column directly fails? I have a feeling it might have something to do with internal versioning, but I can't find anything in this regard.
This is speculation. I am guessing that Sybase is being overly conservative. In general, you cannot add a new NOT NULL column with no default value to a table that has rows. This is true in all databases, because there is no way to populate the existing rows for the new column.
I am guessing that Sybase simply doesn't check whether the table has rows, only whether it exists. Clearly it is not doing that check for the ALTER.
This is only speculation, but I suspect it has to do with the combination of needing both to acquire a lock on the whole table to guarantee continued compliance with the schema, and to re-allocate space for the records.
Allowing a direct add of a NOT NULL column would compromise any existing records if there's no default value. Yes, we know the table is empty. And the database can (eventually) know the table is empty at execution time... but it can't really know the table is empty at execution plan compile time, because a row could be added while the execution plan is determined.
This means the database would need to generate the worst-possible execution plan, involving a lock on the entire table, for the query to run in a transactionally-safe way. Additionally, adding (or removing) a column causes extra work for the database because it needs to re-allocate any pages and rebuild indexes in order to account for the changed size of individual records.
Put the two together, and it becomes difficult to simply roll back a failed query, because you may have actual pages in different states. For whatever reason, the developers chose not to allow this.
The other options allow you to simply fail the query if a bad row gets in the way and would violate the schema, because you're not re-sizing records within pages. It might even allow you to get away with some page and row locks, rather than full table locks.

DB2: How to add new column between existing columns?

I have an existing DB2 database with a table named employee with columns id, e_name, e_mobile_no, e_dob, e_address.
How can I add a new column e_father_name before e_mobile_no?
You should try the ADMIN_MOVE_TABLE procedure, which allows you to change the table structure.
ALTER TABLE only allows adding columns to the end of the table. The reason is that inserting a column in the middle would change the physical structure of the table, i.e., each row would need to be rewritten in the new format, which would be quite expensive.
Using the mentioned procedure ADMIN_MOVE_TABLE, you copy the entire table and change the table structure during that process. It requires a significant amount of space and time.
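A hedged sketch of what such a call could look like, using the signature that accepts a new column definition; the schema name and the column types are assumptions based on the question:

CALL SYSPROC.ADMIN_MOVE_TABLE(
  'MYSCHEMA',             -- table schema (assumed)
  'EMPLOYEE',             -- table name
  '', '', '',             -- keep current data/index/LOB table spaces
  '', '', '',             -- no MDC, partitioning, or range changes
  'ID INTEGER, E_NAME VARCHAR(50), E_FATHER_NAME VARCHAR(50), E_MOBILE_NO VARCHAR(15), E_DOB DATE, E_ADDRESS VARCHAR(200)',  -- new column order, assumed types
  '',                     -- no extra options
  'MOVE');                -- perform the whole move in one call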
On DB2 for IBM i 7.1 you can do it directly; try it on your DB2 version:
alter table yourtable
add column e_father_name varchar(10) before e_mobile_no
I always do the following:
Take a backup/dump of the table data and the db2look output.
(If you dump to a CSV file, as I do, I suggest dumping in the new format, e.g., put a null for the new column in the right place.)
Drop the table and its indexes.
Create the table with the new column.
Load the data with the old values.
Recreate all indexes and run RUNSTATS.
Once you have done it a few times it becomes old hat. A sketch of the command sequence follows.
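A hedged sketch of that sequence from the DB2 command line; the file name, column types, and SELECT column order are assumptions:

-- 1. dump the existing data, inserting NULL where the new column will go
EXPORT TO employee.del OF DEL
  SELECT id, e_name, CAST(NULL AS VARCHAR(50)), e_mobile_no, e_dob, e_address
  FROM employee;
-- 2. drop and recreate the table with e_father_name in the desired position
DROP TABLE employee;
CREATE TABLE employee (
  id INTEGER,
  e_name VARCHAR(50),
  e_father_name VARCHAR(50),
  e_mobile_no VARCHAR(15),
  e_dob DATE,
  e_address VARCHAR(200)
);
-- 3. reload the dumped rows, then recreate indexes and refresh statistics
LOAD FROM employee.del OF DEL INSERT INTO employee;
RUNSTATS ON TABLE employee;  -- qualify with your schema if needed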

Add new column without table lock?

My project has a table with 23 million records, and around 6 fields of that table are indexed.
Earlier I tried to add a delta column for Thinking Sphinx search, but it ended up holding a lock on the whole database for an hour. Afterwards, once the column was added and I tried to rebuild the indexes, this is the query that held the database lock for around 4 hours:
"update user_messages set delta = false where delta = true"
Well, to bring the server back up, I created a new database from a dump and promoted it as the live database.
What I am looking for now: is it possible to add the delta column to my table without a table lock? And once the delta column is added, why is the above query executed when I run the index rebuild command, and why does it block the server for so long?
P.S.: I am on Heroku and using Postgres with the Ika DB plan.
Postgres 11 or later
Since Postgres 11, only volatile default values still require a table rewrite. The manual:
Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.
Bold emphasis mine. false is immutable. So just add the column with DEFAULT false. Super fast, job done:
ALTER TABLE tbl ADD COLUMN delta boolean DEFAULT false;
Postgres 10 or older, or for volatile DEFAULT
Adding a new column without DEFAULT or DEFAULT NULL will not normally force a table rewrite and is very cheap. Only writing actual values to it creates new rows. But, quoting the manual:
Adding a column with a DEFAULT clause or changing the type of an
existing column will require the entire table and its indexes to be rewritten.
UPDATE in PostgreSQL writes a new version of the row. Your question does not provide all the information, but that probably means writing millions of new row versions.
While doing the UPDATE in place, if a major portion of the table is affected and you are free to lock the table exclusively, remove all indexes before doing the mass UPDATE and recreate them afterwards. It's faster this way. Related advice in the manual.
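With the query from the question, that pattern might look like this; the index name and definition are hypothetical:

DROP INDEX user_messages_delta_idx;  -- hypothetical name; drop each real index
UPDATE user_messages SET delta = false WHERE delta = true;  -- the mass UPDATE
CREATE INDEX user_messages_delta_idx ON user_messages (delta);  -- recreate afterwards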
If your data model and available disk space allow for it, CREATE a new table in the background and then, in one transaction: DROP the old table, and RENAME the new one. Related:
Best way to populate a new column in a large table?
While creating the new table in the background: Apply all changes to the same row at once. Repeated updates create new row versions and leave dead tuples behind.
If you cannot remove the original table because of constraints, another fast way is to build a temporary table, TRUNCATE the original one and mass INSERT the new rows - sorted, if that helps performance. All in one transaction. Something like this:
BEGIN;
SET temp_buffers = '1000MB'; -- or whatever you can spare temporarily
-- write-lock table here to prevent concurrent writes - if needed
LOCK TABLE tbl IN SHARE MODE;
CREATE TEMP TABLE tmp AS
SELECT *, false AS delta
FROM tbl; -- copy existing rows plus new value
-- ORDER BY ??? -- opportune moment to cluster rows
-- DROP all indexes here
TRUNCATE tbl; -- empty table - truncate is super fast
ALTER TABLE tbl ADD COLUMN delta boolean DEFAULT FALSE; -- NOT NULL?
INSERT INTO tbl
TABLE tmp; -- insert back surviving rows
-- recreate all indexes here
COMMIT;
You could add another table with just that one column; then there won't be any such long locks. Of course it needs another column as well, a foreign key to the first table.
For the indexes you could use CREATE INDEX CONCURRENTLY, which doesn't take heavy locks on the table: http://www.postgresql.org/docs/9.1/static/sql-createindex.html
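A minimal sketch of that side-table idea, with hypothetical names and assuming user_messages has a primary key id:

CREATE TABLE user_message_deltas (
  user_message_id integer PRIMARY KEY REFERENCES user_messages (id),
  delta boolean NOT NULL DEFAULT false
);
-- build the index without blocking writes to the table
CREATE INDEX CONCURRENTLY user_message_deltas_delta_idx
  ON user_message_deltas (delta);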

A special case when modifying the database

Sometimes I face the following case in my database design, and I want to know the best practice for handling it:
For example, I have a specific table, and after a while, when the database is in operation and some real data has already been entered, I need to add some required fields (that are not supposed to accept null).
What is the best practice in this situation?
Make the field accept null (since some data has already been entered in the table, sacrificing the important constraint) and try to force the user to enter this field through some validation in the code?
Truncate all the entered data and re-enter it again (tedious work)?
Any other suggestions about this issue?
It depends on requirements. If the data to populate existing rows for the new column isn't available immediately then I would generally prefer to create a new table and just populate new rows when the data exists. If and when you have all the data for every row then put the new column into the original table.
If possible I would set a default value for the new column, e.g. for VARCHAR:
alter table table_name
add column_name varchar(10) not null
constraint column_name_default default ('Test')
After you have updated the data, you could then drop the default:
alter table table_name
drop constraint column_name_default
A lot will come down to your requirements.
It depends on your application, your database schema, your entities.
The best way to go about it is to truncate the data and re-enter it again, but it need not be too tedious a task. Temporary tables and table variables can assist a great deal here. A simple procedure comes to mind:
In SQL Server Management Studio, right-click the table you wish to modify and select Script Table As > CREATE To > New Query Editor Window.
Add a # in front of the table name in the CREATE statement and run the script to create the temporary table.
Move all records into the temporary table with something to the effect of:
INSERT INTO #temp SELECT * FROM original
Truncate your original table, and make any changes necessary.
Right-click the table and select Script Table As > INSERT To > Clipboard, paste it into your query editor window, and modify it to read records from the temporary table using INSERT .. SELECT.
That's it. Admittedly not quite straightforward, but a well-kept database is almost always worth a slight hassle. A condensed sketch follows.
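Condensed into plain T-SQL, the whole shuffle might look something like this; the table and column names are hypothetical:

SELECT * INTO #temp FROM original;       -- copy everything into a temp table
TRUNCATE TABLE original;                 -- empty the original
ALTER TABLE original                     -- add the new required column
  ADD new_field varchar(10) NOT NULL DEFAULT 'Test';
INSERT INTO original (col1, col2)        -- reload; the default fills new_field
SELECT col1, col2 FROM #temp;
DROP TABLE #temp;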