SQL - drop a column in Netezza

I have a table, table1, like the one below, from which I'm trying to drop a column.
table1
id name time value
---------------------
1 john 11:00 324
2 NULL 12:00 645
3 NULL 13:00 324
4 jane 11:00 132
5 NULL 12:00 30
A temp table is created because the original table cannot be altered due to permissions. This case could easily be handled by selecting everything except id, but what I really need is a way to get rid of one column when there are a large number of columns.
create temp table table2 as(
select * from table1
) distribute on random;
alter table table2 drop column id;
This gives the error: Drop behaviour (RESTRICT | CASCADE) needs to be specified when dropping a column or constraint
What should the ALTER TABLE statement look like?

As the error message and documentation say, you need to specify either RESTRICT or CASCADE. However, note that you can't drop a column from a true TEMPORARY table, so this only applies to normal tables.
ALTER TABLE <table> <action> [ORGANIZE ON {(<columns>) | NONE}]
Where <action> can be one of:
ADD COLUMN <col> <type> [<col_constraint>][,…] |
ADD <table_constraint> |
ALTER [COLUMN] <col> { SET DEFAULT <value> | DROP DEFAULT } |
DROP [COLUMN] column_name[,column_name…] {CASCADE | RESTRICT } |
DROP CONSTRAINT <constraint_name> {CASCADE | RESTRICT} |
MODIFY COLUMN (<col> VARCHAR(<maxsize>)) |
OWNER TO <user_name> |
RENAME [COLUMN] <col> TO <new_col_name> |
RENAME TO <new_table> |
SET PRIVILEGES TO <table>
Like this:
SYSTEM.ADMIN(ADMIN)=> create table t1 (col1 bigint, col2 varchar(5));
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> insert into t1 values (1,'One');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> insert into t1 values (2,'Two');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> insert into t1 values (3,'Three');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> select * from t1;
COL1 | COL2
------+-------
3 | Three
1 | One
2 | Two
(3 rows)
SYSTEM.ADMIN(ADMIN)=> alter table t1 drop column col2 restrict;
ALTER TABLE
SYSTEM.ADMIN(ADMIN)=> select * from t1;
COL1
------
1
2
3
(3 rows)
As always, if you alter a table to drop or add a column, you should follow it up with a GROOM to clean up the versioned table:
SYSTEM.ADMIN(ADMIN)=> groom table t1 versions;
NOTICE: Groom will not purge records deleted by transactions that started after 2016-11-07 17:00:11.
NOTICE: If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "T1"'
NOTICE: Groom processed 2 pages; purged 0 records; scan size unchanged; table size unchanged.
GROOM VERSIONS
SYSTEM.ADMIN(ADMIN)=>
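For the question's underlying need (removing one column from a wide table without typing out every other column), another route is to generate the SELECT list from the catalog and use CREATE TABLE ... AS SELECT, which needs no ALTER at all and so also works where a column can't be dropped from a true temporary table. The sketch below uses Python's sqlite3 module as a stand-in database: PRAGMA table_info is SQLite-specific, and on Netezza the column names would have to come from its own system catalog instead (not shown). The helper name copy_without_column is hypothetical.

```python
import sqlite3
from typing import List


def copy_without_column(conn: sqlite3.Connection, src: str, dst: str, drop: str) -> List[str]:
    """Create table `dst` with every column of `src` except `drop`; return the kept names.

    Identifiers are double-quoted but otherwise trusted (no escaping is done).
    """
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{src}")')]
    kept = [c for c in cols if c.lower() != drop.lower()]
    if len(kept) == len(cols):
        raise ValueError(f"column {drop!r} not found in table {src!r}")
    select_list = ", ".join(f'"{c}"' for c in kept)
    conn.execute(f'CREATE TABLE "{dst}" AS SELECT {select_list} FROM "{src}"')
    return kept


# Usage with the question's sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, time TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "john", "11:00", 324), (2, None, "12:00", 645), (3, None, "13:00", 324),
     (4, "jane", "11:00", 132), (5, None, "12:00", 30)],
)
kept = copy_without_column(conn, "table1", "table2", "id")
print(kept)  # ['name', 'time', 'value']
```

The column list is built at run time, so the same call works unchanged for a table with hundreds of columns.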

This is the syntax for dropping a column in Netezza:
Alter table tablename drop columnname RESTRICT

According to this: http://datawarehouse.ittoolbox.com/groups/technical-functional/netezza-l/netezza-issue-2467523
it seems that you can't DROP a column via ALTER TABLE, only a constraint.


How to ignore duplicates without unique constraint in Postgres 9.4?

I am currently facing an issue with a table in our old database (PostgreSQL 9.4) that contains some duplicate rows. I want to ensure that no more duplicate rows are generated.
But I also want to keep the duplicate rows that have already been generated, so I cannot apply a unique constraint on those columns (multiple columns).
I have created a trigger that checks whether the row already exists and raises an exception accordingly, but it also fails when concurrent transactions are being processed.
Example :
TAB1
col1 | col2 | col3 |
------------------------------------
1 | A | B | --
2 | A | B | -- already present duplicates for column col2 and col3(allowed)
3 | C | D |
INSERT INTO TAB1 VALUES(4 , 'A' , 'B') ; -- This insert statement will not be allowed.
Note: I cannot use ON CONFLICT because of the older database version.
Presumably, you don't want new rows to duplicate historical rows. If so, you can do this but it requires modifying the table and adding a new column.
alter table t add duplicate_seq int default 1;
Then update this column to identify existing duplicates:
update t
set duplicate_seq = seqnum
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) tt
where t.<primary key> = tt.<primary key>;
Now, create a unique index or constraint:
alter table t add constraint unq_t_col_seq unique (col, duplicate_seq);
When you insert rows, do not provide a value for duplicate_seq. The default is 1, so a new duplicate will conflict with the existing row that has duplicate_seq = 1, or with a duplicate entered more recently. Historical duplicates will still be allowed.
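A runnable sketch of the duplicate_seq idea, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. Table and column names follow the question's TAB1 example; the index name unq_tab1_seq and the helper try_insert are hypothetical. It numbers the duplicates with a correlated COUNT(*) instead of row_number() so it doesn't depend on the SQLite build's window-function support, and enforces the rule with a unique index over (col2, col3, duplicate_seq). Unlike the trigger from the question, a unique index is also enforced correctly when inserts run concurrently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)",
                 [(1, "A", "B"), (2, "A", "B"), (3, "C", "D")])

# 1. New column; rows inserted later get 1 unless a value is supplied.
conn.execute("ALTER TABLE tab1 ADD COLUMN duplicate_seq INTEGER NOT NULL DEFAULT 1")

# 2. Number existing duplicates 1, 2, 3, ... within each (col2, col3) group,
#    ordered by the primary key. `IS` is SQLite's NULL-safe equality.
conn.execute("""
    UPDATE tab1
    SET duplicate_seq = (
        SELECT COUNT(*) FROM tab1 AS other
        WHERE other.col2 IS tab1.col2
          AND other.col3 IS tab1.col3
          AND other.col1 <= tab1.col1
    )
""")

# 3. Enforce uniqueness on the group plus its sequence number.
conn.execute("CREATE UNIQUE INDEX unq_tab1_seq ON tab1 (col2, col3, duplicate_seq)")


def try_insert(row):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO tab1 (col1, col2, col3) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


dup_ok = try_insert((4, "A", "B"))  # rejected: (A, B, 1) already exists
new_ok = try_insert((5, "E", "F"))  # accepted: no existing (E, F) row
print(dup_ok, new_ok)  # False True
```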
You can try to create a partial index to have the unique constraint only for a subset of the table rows:
For example:
create unique index on t(x) where (d > '2020-01-01');
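A quick check of what the partial index does and does not enforce, again with sqlite3 as a stand-in (SQLite supports partial indexes but, unlike PostgreSQL, requires an index name; t_x_recent, the helper inserted and the sample dates are made up). Rows matching the predicate must be unique among themselves, but a new row may still duplicate a historical row outside the predicate, which is weaker than what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, d TEXT)")
# Historical rows (on or before the cutoff) already contain a duplicate.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "2019-05-01"), ("A", "2019-06-01")])
# Uniqueness is only checked for rows where d is after the cutoff.
conn.execute("CREATE UNIQUE INDEX t_x_recent ON t (x) WHERE d > '2020-01-01'")


def inserted(x, d):
    """Return True if the row was accepted, False if the partial index rejected it."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", (x, d))
        return True
    except sqlite3.IntegrityError:
        return False


first_new = inserted("B", "2021-01-01")   # accepted
second_new = inserted("B", "2021-02-01")  # rejected: duplicates a row inside the predicate
old_dup = inserted("A", "2021-03-01")     # accepted: the 2019 'A' rows are outside the index
print(first_new, second_new, old_dup)     # True False True
```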

CHECK (table1.integer >= table2.integer)

I need to create a CHECK constraint to verify that the entered integer in a column is greater than or equal to the integer in another column in a different table.
For example, the following tables would be valid:
=# SELECT * FROM table1;
current_project_number
------------------------
12
=# SELECT * FROM table2;
project_name | project_number
--------------+----------------
Schaf | 1
Hase | 8
Hai | 12
And the following tables would NOT be valid:
=# SELECT * FROM table1;
current_project_number
------------------------
12
=# SELECT * FROM table2;
project_name | project_number
--------------+----------------
Schaf | 1
Hase | 8
Hai | 12
Erdmännchen | 71 <-error:table1.current_project_number is NOT >= 71
Please note that this CHECK constraint is meant to ensure that data like the above cannot be inserted. I'm not looking to SELECT values where current_project_number >= project_number; this is about INSERTing.
What would I need in order for such a CHECK to work? Thanks
Defining a CHECK constraint that references another table is possible, but a seriously bad idea that will lead to problems in the future.
CHECK constraints are only validated when the table with the constraint on it is modified, not when the other table referenced in the constraint is modified. So it is possible to render the condition invalid with modifications on that second table.
In other words, PostgreSQL will not guarantee that the constraint is always valid. This can and will lead to unpleasant surprises, like a backup taken with pg_dump that can no longer be restored.
Don't go down that road.
If you need functionality like that, define a BEFORE INSERT trigger on table2 that verifies the condition and throws an exception otherwise:
CREATE FUNCTION ins_trig() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
IF EXISTS (SELECT 1 FROM table1
WHERE NEW.project_number > current_project_number)
THEN
RAISE EXCEPTION 'project number must be less than or equal to values in table1';
END IF;
RETURN NEW;
END;$$;
CREATE TRIGGER ins_trig BEFORE INSERT ON table2
FOR EACH ROW EXECUTE PROCEDURE ins_trig();
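The same BEFORE INSERT check can be tried out in SQLite through Python's sqlite3 module. SQLite has no PL/pgSQL, so in this sketch the condition is written inline in the trigger body with RAISE(ABORT, ...); the trigger name table2_check_number is hypothetical, while the tables, columns and sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (current_project_number INTEGER);
    CREATE TABLE table2 (project_name TEXT, project_number INTEGER);
    INSERT INTO table1 VALUES (12);

    -- Abort the INSERT when the new project_number exceeds any value in table1.
    CREATE TRIGGER table2_check_number
    BEFORE INSERT ON table2
    BEGIN
        SELECT RAISE(ABORT, 'project number must be less than or equal to values in table1')
        WHERE EXISTS (SELECT 1 FROM table1
                      WHERE NEW.project_number > current_project_number);
    END;
""")

for name, number in [("Schaf", 1), ("Hase", 8), ("Hai", 12)]:
    conn.execute("INSERT INTO table2 VALUES (?, ?)", (name, number))

error = None
try:
    conn.execute("INSERT INTO table2 VALUES (?, ?)", ("Erdmännchen", 71))
except sqlite3.DatabaseError as exc:  # the RAISE surfaces as a DatabaseError subclass
    error = str(exc)

print(error)
```

As with the PL/pgSQL version, only INSERTs into table2 are checked; UPDATEs on table2 or changes to table1 would need triggers of their own.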

Why does SQL Server populate new fields in existing rows in some environments and not others?

I am using MS SQL Server 2012. I have this bit of SQL:
alter table SomeTable
add Active bit not null default 1
In some environments the default value is applied to existing rows, and in other environments we have to add an update script to set the new field to 1. Naturally, I suspect the difference is due to a SQL Server setting, but my searches so far haven't suggested which one. Any suggestions?
Let me know if the values of particular settings are desired.
Edit: In the environments that don't apply the default the existing rows are set to 0, which at least conforms to the NOT NULL.
If you add the column as not null it will be set to the default value for existing rows.
If you add the column as null it will be null despite having a default constraint when added to the table.
For example:
create table SomeTable (id int);
insert into SomeTable values (1);
alter table SomeTable add Active_NotNull bit not null default 1;
alter table SomeTable add Active_Null bit null default 1;
select * from SomeTable;
returns:
+----+----------------+-------------+
| id | Active_NotNull | Active_Null |
+----+----------------+-------------+
| 1 | 1 | NULL |
+----+----------------+-------------+
dbfiddle.uk demo: http://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=c4aeea808684de48097ff44d391c9954
The default value is applied to existing rows to avoid violating the NOT NULL constraint.

How to add columns to an existing Hive partitioned table?

alter table abc add columns (stats1 map<string,string>, stats2 map<string,string>)
I have altered my table with the above query, but when checking the data afterwards I got NULLs for both extra columns. I'm not getting any data.
CASCADE is the solution.
Query:
ALTER TABLE dbname.table_name ADD columns (column1 string,column2 string) CASCADE;
This changes the columns of a table's metadata and cascades the same change to all the partition metadata.
RESTRICT is the default, limiting column change only to table metadata.
As others have noted, CASCADE will change the metadata for all partitions. Without CASCADE, if you want old partitions to include the new columns, you'll need to DROP the old partitions first and then refill them. INSERT OVERWRITE without the DROP won't work, because the partition metadata won't be updated to the new table metadata.
Let's say you have already run alter table abc add columns (stats1 map<string,string>, stats2 map<string,string>) without CASCADE by accident, and then you INSERT OVERWRITE an old partition without dropping it first. The data will be stored in the underlying files, but if you query that partition from Hive, the new columns won't show up because the metadata wasn't updated. This can be fixed without rerunning the INSERT OVERWRITE by using the following steps:
1. Run SHOW CREATE TABLE dbname.tblname and copy all the column definitions that existed before adding the new columns.
2. Run ALTER TABLE dbname.tblname REPLACE COLUMNS ({paste in col defs besides columns to add here}) CASCADE
3. Run ALTER TABLE dbname.tblname ADD COLUMNS (newcol1 int COMMENT "new col") CASCADE
4. Be happy that the metadata has been changed for all partitions =)
As an example of steps 2-3:
DROP TABLE IF EXISTS junk.testcascade ;
CREATE TABLE junk.testcascade (
startcol INT
)
partitioned by (d int)
stored as parquet
;
INSERT INTO TABLE junk.testcascade PARTITION(d=1)
VALUES
(1),
(2)
;
INSERT INTO TABLE junk.testcascade PARTITION(d=2)
VALUES
(1),
(2)
;
SELECT * FROM junk.testcascade ;
+-----------------------+----------------+--+
| testcascade.startcol | testcascade.d |
+-----------------------+----------------+--+
| 1 | 1 |
| 2 | 1 |
| 1 | 2 |
| 2 | 2 |
+-----------------------+----------------+--+
--no cascade! oops
ALTER TABLE junk.testcascade ADD COLUMNS( testcol1 int, testcol2 int) ;
INSERT OVERWRITE TABLE junk.testcascade PARTITION(d=3)
VALUES
(1,1,1),
(2,1,1)
;
INSERT OVERWRITE TABLE junk.testcascade PARTITION(d=2)
VALUES
(1,1,1),
(2,1,1)
;
--okay! because we created this partition after altering the metadata
select * FROM junk.testcascade where d=3;
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol | testcascade.testcol1 | testcascade.testcol2 | testcascade.d |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1 | 1 | 1 | 3 |
| 2 | 1 | 1 | 3 |
+-----------------------+-----------------------+-----------------------+----------------+--+
--not okay even though we inserted =( because the partition metadata isn't changed
select * FROM junk.testcascade where d=2;
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol | testcascade.testcol1 | testcascade.testcol2 | testcascade.d |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1 | NULL | NULL | 2 |
| 2 | NULL | NULL | 2 |
+-----------------------+-----------------------+-----------------------+----------------+--+
--cut back to original columns
ALTER TABLE junk.testcascade REPLACE COLUMNS( startcol int) CASCADE;
--add
ALTER table junk.testcascade ADD COLUMNS( testcol1 int, testcol2 int) CASCADE;
--it works!
select * FROM junk.testcascade where d=2;
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol | testcascade.testcol1 | testcascade.testcol2 | testcascade.d |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1 | 1 | 1 | 2 |
| 2 | 1 | 1 | 2 |
+-----------------------+-----------------------+-----------------------+----------------+--+
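The statements in steps 2-3 above are easy to get wrong by hand when a table has many columns. As a hedged sketch, a small Python helper (cascade_fix_statements is a hypothetical name) can assemble them from the column definitions copied out of SHOW CREATE TABLE. Partition columns such as d are left out, matching the REPLACE COLUMNS call in the example; the helper only builds the strings and does not run anything against Hive.

```python
from typing import List


def cascade_fix_statements(table: str, original_cols: List[str], new_cols: List[str]) -> List[str]:
    """Build the REPLACE COLUMNS and ADD COLUMNS statements from steps 2 and 3.

    original_cols: non-partition column definitions that existed before the
        accidental non-CASCADE ADD COLUMNS, e.g. "startcol INT".
    new_cols: definitions of the columns to add back with CASCADE.
    """
    if not original_cols or not new_cols:
        raise ValueError("both column lists must be non-empty")
    replace_stmt = f"ALTER TABLE {table} REPLACE COLUMNS ({', '.join(original_cols)}) CASCADE"
    add_stmt = f"ALTER TABLE {table} ADD COLUMNS ({', '.join(new_cols)}) CASCADE"
    return [replace_stmt, add_stmt]


stmts = cascade_fix_statements("junk.testcascade", ["startcol INT"], ["testcol1 INT", "testcol2 INT"])
for stmt in stmts:
    print(stmt + ";")
```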
To add columns into partitioned table you need to recreate partitions.
Suppose the table is external and the datafiles already contain new columns, do the following:
1. Alter table add columns...
2. Recreate partitions. For each partitions do Drop then create. Newly created partition schema will inherit the table schema.
Alternatively, you can drop and recreate the table and then create all the partitions, or restore them simply by running the MSCK REPAIR TABLE abc command. The equivalent command in Amazon Elastic MapReduce (EMR)'s version of Hive is: ALTER TABLE table_name RECOVER PARTITIONS.
See manual here: RECOVER PARTITIONS
Also in Hive 1.1.0 and later you can use CASCADE option of ALTER TABLE ADD|REPLACE COLUMNS. See manual here: ADD COLUMN
These suggestions work for external tables.
This solution only works if your data is partitioned and you know the location of the latest partition. In that case, instead of running RECOVER PARTITIONS or a repair, which are costly operations, you can do something like this:
1. Read the latest partition and get its schema details.
2. Read the table you want to update.
3. Find which columns are missing from the table and run an ALTER TABLE for each.
Posting Scala code for reference:
import org.apache.spark.sql.SparkSession

def updateMetastoreColumns(spark: SparkSession, partitionedTablePath: String, toUpdateTableName: String): Unit = {
  // fetch all column names with their data types from the latest partition's ORC files
  val partitionedTable = spark.read.orc(partitionedTablePath)
  val partitionedTableColumns = partitionedTable.schema.fields.map(f => (f.name, f.dataType.catalogString))
  // fetch the column names currently registered for the table (Hive stores them lower-case)
  val toUpdateTable = spark.read.table(toUpdateTableName)
  val existingColumns = toUpdateTable.columns.map(_.toLowerCase).toSet
  // keep only the columns present in the newer partition but missing from the table;
  // comparing by name avoids trying to re-add a column whose type differs
  val newColumns = partitionedTableColumns.filterNot { case (name, _) => existingColumns.contains(name.toLowerCase) }
  // update the metastore with the new column info
  newColumns.foreach { case (name, dataType) =>
    spark.sql(s"ALTER TABLE $toUpdateTableName ADD COLUMNS (`$name` $dataType)")
  }
}
This dynamically finds the columns that were added in the newer partition and adds them to your metastore on the fly.

Swap unique indexed column values in database

I have a database table in which one of the fields (not the primary key) has a unique index on it. Now I want to swap the values in this column between two rows. How could this be done? Two hacks I know are:
1. Delete both rows and re-insert them.
2. Update the rows with some other value, swap, and then update them to the actual values.
But I don't want to go for these, as they do not seem to be the appropriate solution to the problem.
Could anyone help me out?
The magic word is DEFERRABLE here:
DROP TABLE ztable CASCADE;
CREATE TABLE ztable
( id integer NOT NULL PRIMARY KEY
, payload varchar
);
INSERT INTO ztable(id,payload) VALUES (1,'one' ), (2,'two' ), (3,'three' );
SELECT * FROM ztable;
-- This works, because there is no constraint
UPDATE ztable t1
SET payload=t2.payload
FROM ztable t2
WHERE t1.id IN (2,3)
AND t2.id IN (2,3)
AND t1.id <> t2.id
;
SELECT * FROM ztable;
ALTER TABLE ztable ADD CONSTRAINT OMG_WTF UNIQUE (payload)
DEFERRABLE INITIALLY DEFERRED
;
-- This should also work, because the constraint
-- is deferred until "commit time"
UPDATE ztable t1
SET payload=t2.payload
FROM ztable t2
WHERE t1.id IN (2,3)
AND t2.id IN (2,3)
AND t1.id <> t2.id
;
SELECT * FROM ztable;
RESULT:
DROP TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "ztable_pkey" for table "ztable"
CREATE TABLE
INSERT 0 3
id | payload
----+---------
1 | one
2 | two
3 | three
(3 rows)
UPDATE 2
id | payload
----+---------
1 | one
2 | three
3 | two
(3 rows)
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "omg_wtf" for table "ztable"
ALTER TABLE
UPDATE 2
id | payload
----+---------
1 | one
2 | two
3 | three
(3 rows)
I think you should go for solution 2. There is no 'swap' function in any SQL variant I know of.
If you need to do this regularly, I suggest solution 1, depending on how other parts of the software are using this data. You can have locking issues if you're not careful.
But in short: there is no other solution than the ones you provided.
Further to Andy Irving's answer, this worked for me (on SQL Server 2005) in a similar situation where I have a composite key and need to swap a field that is part of the unique constraint.
key: pID, LNUM
rec1: 10, 0
rec2: 10, 1
rec3: 10, 2
and I need to swap LNUM so that the result is
key: pID, LNUM
rec1: 10, 1
rec2: 10, 2
rec3: 10, 0
the SQL needed:
UPDATE DOCDATA
SET LNUM = CASE LNUM
WHEN 0 THEN 1
WHEN 1 THEN 2
WHEN 2 THEN 0
END
WHERE (pID = 10)
AND (LNUM IN (0, 1, 2))
There is another approach that works with SQL Server: use a temp table and join to it in your UPDATE statement.
The problem is caused by having two rows with the same value at the same time, but if you update both rows at once (to their new, unique values), there is no constraint violation.
Pseudo-code:
-- setup initial data values:
insert into data_table(id, name) values(1, 'A')
insert into data_table(id, name) values(2, 'B')
-- create temp table that matches live table
select top 0 * into #tmp_data_table from data_table
-- insert records to be swapped
insert into #tmp_data_table(id, name) values(1, 'B')
insert into #tmp_data_table(id, name) values(2, 'A')
-- update both rows at once! No index violations!
update data_table set name = #tmp_data_table.name
from data_table join #tmp_data_table on (data_table.id = #tmp_data_table.id)
Thanks to Rich H for this technique.
- Mark
Assuming you know the PKs of the two rows you want to update, this works in SQL Server; I can't speak for other products. SQL is (supposed to be) atomic at the statement level:
CREATE TABLE testing
(
cola int NOT NULL,
colb CHAR(1) NOT NULL
);
CREATE UNIQUE INDEX UIX_testing_a ON testing(colb);
INSERT INTO testing VALUES (1, 'b');
INSERT INTO testing VALUES (2, 'a');
SELECT * FROM testing;
UPDATE testing
SET colb = CASE cola WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
WHERE cola IN (1,2);
SELECT * FROM testing;
so you will go from:
cola colb
------------
1 b
2 a
to:
cola colb
------------
1 a
2 b
I also think that #2 is the best bet, though I would be sure to wrap it in a transaction in case something goes wrong mid-update.
An alternative (since you asked) to updating the Unique Index values with different values would be to update all of the other values in the rows to that of the other row. Doing this means that you could leave the Unique Index values alone, and in the end, you end up with the data that you want. Be careful though, in case some other table references this table in a Foreign Key relationship, that all of the relationships in the DB remain intact.
I have the same problem. Here's my proposed approach in PostgreSQL. In my case, my unique index is a sequence value, defining an explicit user-order on my rows. The user will shuffle rows around in a web-app, then submit the changes.
I'm planning to add a "before" trigger. In that trigger, whenever my unique index value is updated, I will look to see if any other row already holds my new value. If so, I will give them my old value, and effectively steal the value off them.
I'm hoping that PostgreSQL will allow me to do this shuffle in the before trigger.
I'll post back and let you know my mileage.
In SQL Server, the MERGE statement can update rows that would normally break a UNIQUE KEY/INDEX. (Just tested this because I was curious.)
However, you'd have to use a temp table/variable to supply MERGE with the necessary rows.
For Oracle there is an option, DEFERRED, but the constraint must have been created as DEFERRABLE for it to apply.
SET CONSTRAINT emp_no_fk_par DEFERRED;
To defer ALL constraints that are deferrable during the entire session, you can use the ALTER SESSION SET constraints=DEFERRED statement.
Source
I usually think of a value that no row in my table could ever have in the indexed column. For unique column values this is usually easy. For example, for values of a column 'position' (information about the order of several elements) it's 0.
Then you can copy value A to a variable, set that row to the placeholder, set the other row to value A from your variable, and finally set the first row to value B. It takes a few queries, but I know of no better solution.
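A minimal sketch of the placeholder approach (the question's second hack), using Python's sqlite3 module. The items table, its position column and the placeholder value 0 are assumptions for illustration, and swap_positions is a hypothetical helper. All three updates run inside one transaction, as another answer suggests, so a failure part-way through rolls everything back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, position INTEGER NOT NULL UNIQUE)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])

PLACEHOLDER = 0  # assumed never to be a real position value


def swap_positions(conn, id_a, id_b):
    """Swap the unique `position` values of two rows via a placeholder value."""
    with conn:  # commits on success, rolls back if any statement fails
        (pos_a,) = conn.execute("SELECT position FROM items WHERE id = ?", (id_a,)).fetchone()
        (pos_b,) = conn.execute("SELECT position FROM items WHERE id = ?", (id_b,)).fetchone()
        conn.execute("UPDATE items SET position = ? WHERE id = ?", (PLACEHOLDER, id_a))  # frees pos_a
        conn.execute("UPDATE items SET position = ? WHERE id = ?", (pos_a, id_b))       # frees pos_b
        conn.execute("UPDATE items SET position = ? WHERE id = ?", (pos_b, id_a))


swap_positions(conn, 1, 3)
print(conn.execute("SELECT id, position FROM items ORDER BY id").fetchall())
# [(1, 3), (2, 2), (3, 1)]
```

At no point do two rows hold the same value, so this works even in engines that check unique indexes row by row.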
Oracle has deferred integrity checking which solves exactly this, but it is not available in either SQL Server or MySQL.
1) Switch the ids for the names:
id student
1 Abbot
2 Doris
3 Emerson
4 Green
5 Jeames
For the sample input, the output is:
id student
1 Doris
2 Abbot
3 Green
4 Emerson
5 Jeames
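Assuming this fragment is the common "swap adjacent rows" exercise (ids run 1..n with no gaps, each odd id trades places with the following even id, and the last row stays put when n is odd), a single CASE expression handles any number of rows, which also covers the "n rows" question posed next. Sketched with Python's sqlite3 module; the table name seat is an assumption. The query only reorders the result set and does not modify the stored rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student TEXT NOT NULL)")
conn.executemany("INSERT INTO seat VALUES (?, ?)",
                 [(1, "Abbot"), (2, "Doris"), (3, "Emerson"), (4, "Green"), (5, "Jeames")])

# Each odd id moves to the next id and each even id to the previous one;
# the highest id keeps its place if it is odd (no partner to swap with).
SWAP_QUERY = """
    SELECT CASE
             WHEN id % 2 = 1 AND id = (SELECT MAX(id) FROM seat) THEN id
             WHEN id % 2 = 1 THEN id + 1
             ELSE id - 1
           END AS new_id,
           student
    FROM seat
    ORDER BY new_id
"""
rows = conn.execute(SWAP_QUERY).fetchall()
for new_id, student in rows:
    print(new_id, student)
```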
"How would this be handled for n rows?"