What happens to indexes when using ALTER SCHEMA to transfer a table?

I have a massive job that runs nightly, and to have a smaller impact on the DB it runs on a table in a different schema (EmptySchema) that isn't in general use, and is then swapped out to the usual location (UsualSchema) using
ALTER SCHEMA TempSchema TRANSFER UsualSchema.BigTable
ALTER SCHEMA UsualSchema TRANSFER EmptySchema.BigTable
ALTER SCHEMA EmptySchema TRANSFER TempSchema.BigTable
Which effectively swaps the two tables.
However, I then need to set up indexes on the UsualSchema table. Can I do this by disabling them on the UsualSchema table and then re-enabling them once the swap has taken place? Or do I have to create them each time on the swapped out table? Or have duplicate indexes in both places and disable/enable them as necessary (leading to duplicates in source control, so not ideal)? Is there a better way of doing it?
There's one clustered index and five non-clustered indexes.
Thanks.

Indexes, including those that support constraints, are transferred along with the table by ALTER SCHEMA, so you can keep them defined on the tables in both the source and target schemas.
Constraint names are scoped by the table's schema, while other index names are scoped by the table or view itself. It is therefore possible to have identical index names in the same schema as long as they are on different tables; constraint names, however, must be unique within the schema.
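A minimal, self-contained sketch (T-SQL, run in an empty database, with names that mirror the question) of verifying that an index follows its table through a transfer:

-- Build the staging table and its index in EmptySchema, then move it.
CREATE TABLE EmptySchema.BigTable (Id int NOT NULL);
CREATE CLUSTERED INDEX IX_BigTable_Id ON EmptySchema.BigTable (Id);

ALTER SCHEMA UsualSchema TRANSFER EmptySchema.BigTable;

-- The index is still attached to the table in its new schema.
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'UsualSchema.BigTable');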

Related

HANA: How to add a unique constraint on a column that is already indexed?

Having the following schema:
CREATE TABLE test_table(
cryptid varchar(255)
);
CREATE INDEX cryptid_index ON test_table (cryptid);
I am trying to add a unique constraint to the column.
ALTER TABLE test_table ADD constraint crypid_unique_contraint UNIQUE(cryptid);
But this runs into an error:
Could not execute 'ALTER TABLE test_table ADD constraint crypid_unique_contraint ...'
Error: (dberror) [261]: invalid index name: column list already indexed
I can understand that the column is already indexed because I have created the index by myself. But I want the column to be unique. Is there a way to do this?
This is indeed an undocumented limitation in the current HANA versions.
The only way to create a unique constraint on this column is to first drop the single-column index present on this column.
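A minimal sketch of that workaround, using the names from the question (the commented unique-index form is equivalent to the constraint at the metadata level):

DROP INDEX cryptid_index;

ALTER TABLE test_table
    ADD CONSTRAINT crypid_unique_contraint UNIQUE (cryptid);

-- or, alternatively, recreate the index itself as unique:
-- CREATE UNIQUE INDEX cryptid_unique_index ON test_table (cryptid);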
I would consider the fact that this is not documented a (docu-)bug. The fact that existing indexes cannot generally be reused for uniqueness checks, however, is not a bug.
Single-column indexes on HANA's column store tables (which is what you use by default) are not B-tree indexes. Instead, they are inverted indexes into the column store structure of the main store part of a column store table column.
These inverted structures cannot be checked for duplicates in the current transactional context as easily as B-tree indexes could.
This, I believe, is the reason for:
a) implementing the uniqueness check only on a specific index implementation in the column store, and
b) making the system behavior (not allowing the "conversion" of an existing index into a unique index) consistent across all table types.
As a general comment: for column store tables, the benefit of single-column indexes for lookup/point-read scenarios is very often not worth the additional storage and compute resource consumption. This type of index practically doubles the memory requirement for the indexed column, so the speed-up in looking up a specific value needs to be worth this additional permanent resource consumption.
You may check the documentation on the INDEXES system view. It lists these index types:
Type of row store indexes: BTREE, BTREE_UNIQUE, CPBTREE, and CPBTREE_UNIQUE.
A "simple" index and a unique index are different index types at build time, so there's no way to change one into the other after the index has been declared.
In other databases, adding a unique constraint either creates a new unique index (as in T-SQL, MySQL, or Postgres) or reuses an existing index on the column (as in Oracle). But HANA lets you neither create an additional index on the same column (for a reason I couldn't find documented) nor enforce the constraint through the existing index (because uniqueness is baked into the index type).
The only way to go is to drop the existing index and recreate it from scratch as a unique index (which is equivalent to a unique constraint from the metadata point of view), which you cannot do due to authorizations. Sad but true.
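For reference, the INDEXES system view mentioned above can be queried directly to see how an existing index was created (a small sketch; HANA stores unquoted identifiers in upper case):

SELECT index_name, index_type
FROM   indexes
WHERE  table_name = 'TEST_TABLE';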

Error while dropping column from a table with secondary index (Scylladb)

While dropping a column from a table that contains secondary index I get the following error. I am using ScyllaDB version 3.0.4.
[Invalid query] message="Cannot drop column name on base table warehouse.myuser with materialized views"
Below are the example commands
create table myuser (id int primary key, name text, email text);
create index on myuser(email);
alter table myuser drop name;
I can successfully run the above statements in Apache Cassandra.
Default secondary indexes in Scylla are global and implemented on top of materialized views (as opposed to Apache Cassandra's local indexing implementation), which gives them new possibilities, but also adds certain restrictions. Dropping a column from a table with materialized views is a complex operation, especially if the target column is selected by one of the views or its liveness can affect view row liveness. In order to avoid these problems, dropping a column is unconditionally not possible when there are materialized views attached to a table. The error you see is a combination of that and the fact that Scylla's index uses a materialized view underneath to store corresponding base keys for each row.
The obvious workaround is to drop the index first, then drop the column and recreate the index, but that of course takes time and resources.
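In CQL that sequence would look roughly like this (the index name is assumed to be the auto-generated <table>_<column>_idx; check the actual name with DESCRIBE TABLE first):

DROP INDEX myuser_email_idx;
ALTER TABLE myuser DROP name;
CREATE INDEX ON myuser (email);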
However, in some cases columns can be allowed to be dropped from the base table even if it has materialized views, especially if the column is not selected in the view and its liveness does not have any impact on view rows. For reference, I created an issue that requests implementing it in our bug tracker: https://github.com/scylladb/scylla/issues/4448

Checking foreign key constraint "online"

If we have a giant fact table and want to add a new dimension, we can do it like this:
BEGIN TRANSACTION
ALTER TABLE [GiantFactTable]
ADD NewDimValueId INT NOT NULL
CONSTRAINT [temp_DF_NewDimValueId] DEFAULT (-1)
WITH VALUES -- table is not actually rebuilt!
ALTER TABLE [GiantFactTable]
WITH NOCHECK
ADD CONSTRAINT [FK_GiantFactTable_NewDimValue]
FOREIGN KEY ([NewDimValueId])
REFERENCES [NewDimValue] ([Id])
-- drop the default constraint, new INSERTs will specify a value for NewDimValueId column
ALTER TABLE [GiantFactTable]
DROP CONSTRAINT [temp_DF_NewDimValueId]
COMMIT TRANSACTION
NB: all of the above only manipulate table metadata and should be fast regardless of table size.
Then we can run a job to backfill GiantFactTable.NewDimValueId in small transactions, such that the FK is not violated. (At this point any INSERTs/UPDATEs - e.g. backfill operation - are verified by the FK since it's enabled, but not "trusted")
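A hypothetical sketch of such a backfill loop; the join that derives the dimension value is invented here, but the pattern keeps each transaction small while the enabled FK checks every batch:

DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- Update the placeholder (-1) rows in small chunks.
    UPDATE TOP (10000) f
    SET    f.NewDimValueId = d.Id
    FROM   [GiantFactTable] AS f
    JOIN   [NewDimValue]    AS d
           ON d.SourceKey = f.SourceKey   -- hypothetical mapping column
    WHERE  f.NewDimValueId = -1;

    SET @rows = @@ROWCOUNT;
END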
After the backfill we know the data is consistent; my question is how the SQL engine can become enlightened too, without taking the table offline.
This command will make the FK trusted, but it requires a schema modification (Sch-M) lock and would likely take hours (days?), taking the table offline:
ALTER TABLE [GiantFactTable]
WITH CHECK CHECK CONSTRAINT [FK_GiantFactTable_NewDimValue]
About the workload: Table has a few hundred partitions (fixed number), data is appended to one partition at a time (in a round-robin fashion), never deleted. There is also a constant read workload that uses the clustering key to get a (relatively small) range of rows from one partition at a time.
Checking one partition at a time, taking it offline, would be acceptable. But I can't find any syntax to do this. Any other ideas?
A few ideas come to mind but they aren't pretty:
Redirect workloads and run check constraint offline
1. Create a new table with the same structure.
2. Change the "insert" workload to insert into the new table.
3. Copy the data from the partition used by the "read" workload to the new table (or a third table with the same structure).
4. Change the "read" workload to use the new table.
5. Run ALTER TABLE to check the constraint and let it take as long as it needs.
6. Change both workloads back to the main table.
7. Insert the new rows back into the main table.
8. Drop the new table(s).
A variation on the above is to switch the relevant partition to the new table in step 3. That should be faster than copying the data but I think you will have to copy (and not just switch) the data back after the constraint has been checked.
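A hedged sketch of that variation, assuming partition 42 is the one used by the "read" workload and that a staging table with identical structure, indexes, and partition scheme already exists:

-- Metadata-only move of one partition out of the main table.
ALTER TABLE [GiantFactTable]
    SWITCH PARTITION 42 TO [GiantFactTable_Staging] PARTITION 42;

-- After the constraint has been checked on the main table, the rows have to
-- be copied (not just switched) back, for example:
-- INSERT INTO [GiantFactTable] SELECT * FROM [GiantFactTable_Staging];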
Insert all the data into a new table
1. Create a new table with the same structure and the constraint enabled.
2. Change the "insert" workload to the new table.
3. Copy all the data from the old table to the new table in batches and wait as long as it takes to complete.
4. Change the "read" workload to the new table. If step 3 takes too long and the "read" workload needs rows that have only been inserted into the new table, you will have to manage this changeover manually.
5. Drop the old table.
Use index to speed up constraint check?
I have no idea if this works but you can try to create a non-clustered index on the foreign key column. Also make sure there's an index on the relevant unique key on the table referenced by the foreign key. The alter table command might be able to use them to speed up the check (at least by minimizing IO compared to doing a full table scan). The indexes, of course, can be created online to avoid any disruption.
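For example (index names are invented; ONLINE = ON requires an edition that supports online index operations):

-- Non-clustered index on the FK column of the fact table.
CREATE NONCLUSTERED INDEX IX_GiantFactTable_NewDimValueId
    ON [GiantFactTable] ([NewDimValueId])
    WITH (ONLINE = ON);

-- The referenced key on the dimension is usually covered by its PK; if it is
-- not the clustered index, an explicit supporting index could look like this:
-- CREATE UNIQUE NONCLUSTERED INDEX IX_NewDimValue_Id
--     ON [NewDimValue] ([Id]) WITH (ONLINE = ON);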

Using "Create Table" for an SQL Server 2008 R2 database

If I'm going to write a whole SQL script to create a database with tables (that has foreign keys) should I write the dependent tables first?
You have some options:
You can create all the tables first, and then use ALTER TABLE to add the Foreign Keys.
You can create the one to many relationships as the tables are created. In that case, the order of table creation will matter.
When you create such DBs you (more often than not) seed the tables with data as well.
Depending on how much data you insert, you may want to make a decision to either INSERT data first, or to enforce RI first. If you have small tables, the RI checks don't consume too many resources. If you have large tables, then you may want to first insert the data and then implement the RI - that way the check is not done one row at a time, but at one time for all rows. Since you're seeding the tables, you know your data - presumably you'll do clean inserts so as to not fail the downstream RI check.
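A minimal sketch combining the first option with the seeding advice (hypothetical tables): create the tables, load the seed data, then add the foreign key so the RI check runs once over all rows:

CREATE TABLE dbo.Customer (
    CustomerId int           NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrder (
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL
);

INSERT INTO dbo.Customer (CustomerId, Name) VALUES (1, N'Acme');
INSERT INTO dbo.CustomerOrder (OrderId, CustomerId) VALUES (10, 1);

-- RI enforced after the data is in place, checked once for all rows.
ALTER TABLE dbo.CustomerOrder
    ADD CONSTRAINT FK_CustomerOrder_Customer
    FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId);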

SQL: Where should the Primary Key be defined

I'm creating a database with several sql files
1 file creates the tables.
1 file adds constraints.
1 file drops constraints.
The primary key is a constraint; however, I've been told by someone to define the primary key in the table definition, but without being given a reason why.
Is it better to define the primary key as a constraint that can be added and dropped, or is it better to do it in the table definition?
My current thinking is to do it in the table definition because doing it as a removable constraint could potentially lead to some horrible issues with duplicate keys.
But dropping constraints could lead to serious issues anyway, so it is expected that if someone did drop the primary key, they would have taken appropriate steps to avoid problems, as they should for any other data change.
A primary key is a constraint, but a constraint is not necessarily a primary key. Short of doing some major database surgery, there should never be a need to drop a primary key, ever.
Defining the primary key along with the table is good practice - if you separate the table and the key definition, that opens the window to the key definition getting lost or forgotten. Given that any decent database design utterly depends on consistent keys, you don't ever want to have even the slightest chance that your primary keys aren't functioning properly.
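For illustration (generic SQL, hypothetical table), the two styles being discussed; both end up as a PRIMARY KEY constraint, the difference is only where it is declared:

-- Key declared with the table definition:
CREATE TABLE widget (
    widget_id integer      NOT NULL,
    name      varchar(100) NOT NULL,
    CONSTRAINT pk_widget PRIMARY KEY (widget_id)
);

-- The separate-constraints-file alternative:
-- CREATE TABLE widget (widget_id integer NOT NULL, name varchar(100) NOT NULL);
-- ALTER TABLE widget ADD CONSTRAINT pk_widget PRIMARY KEY (widget_id);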
From a maintainability perspective I would say that it is better to have the Primary Key in the table definition as it is a very good indicator of what the table will most likely be used for.
The other constraints are important as well though and your argument holds.
All of this is somewhat platform specific, but a primary key is a logical concept, whereas a constraint (or unique index, or whatever) is a physical thing that implements the logical concept of "primary key".
That's another reason to argue for putting it with the table itself - its logical home - rather than in the constraints file.
For effective source control it usually makes sense to have a separate script for each object (constraints included). That way you can track changes to each object individually.
There's a certain logical sense in keeping everything related to a table in one file--column definitions, keys, indexes, triggers, etc. If you never have to rebuild a very large database from SQL, that will work fine almost all the time. The few times it doesn't work well probably aren't worth changing the process of keeping all the related things together in one file.
But if you have to rebuild a very large database, or if you need to move a database onto a different server for testing, or if you just want to fiddle around with things, it makes sense to split things up. In PostgreSQL, we break things up like this. All these files are under version control.
All CREATE DOMAIN statements in one file.
Each CREATE TABLE statement in a separate file. That file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. (More about this in a bit.)
Each table's FOREIGN KEY constraints in a separate file.
Each table's indexes for non-key columns in a separate file.
Each table's triggers in a separate file. (If a table has three triggers, all three go in one file.)
Each table's data in a separate file. (Only for tables loaded before bringing the database online.)
Each table's rules in a separate file.
Each function in a separate file. (Functions are PostgreSQL's equivalent to stored procedures.)
Without foreign key constraints, we can load tables in any order. After the tables are loaded, we can run a single script to rebuild all the foreign keys. The makefile takes care of bundling the right individual files together. (Since they're separate files, we can run them individually if we want to.)
Tables load faster if they don't have constraints. I said we put each CREATE TABLE statement in a separate file; that file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. You can use the stream editor sed to split those files into two pieces: one piece has the column definitions, the other has all the ALTER TABLE ADD CONSTRAINT statements. The makefile takes care of splitting the source files and bundling them together: all the table definitions in one SQL file, and all the ALTER TABLE statements in another. Then we can run a single script to create all the tables, load the tables, and finally run a single script to rebuild all the constraints.
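As an illustration, a hypothetical per-table file (account.sql) under this layout; the FOREIGN KEY lives in its own file and is applied only after all tables are loaded:

-- account.sql: column definitions plus non-FK constraints.
CREATE TABLE account (
    account_id integer NOT NULL,
    owner_id   integer NOT NULL,
    email      text    NOT NULL
);

ALTER TABLE account ADD CONSTRAINT account_pkey PRIMARY KEY (account_id);
ALTER TABLE account ADD CONSTRAINT account_email_key UNIQUE (email);

-- account_fk.sql (separate file, run after the data load):
-- ALTER TABLE account ADD CONSTRAINT account_owner_fkey
--     FOREIGN KEY (owner_id) REFERENCES person (person_id);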
make is your friend.