How to create an Index in Amazon Redshift

I'm trying to create indexes in Amazon Redshift, but I received an error:
create index on session_log(UserId);
UserId is an integer field.

If you try to create an index (with a name) on a Redshift table:
create index IX1 on "SomeTable"("UserId");
You'll receive the error
An error occurred when executing the SQL command:
create index IX1 on "SomeTable"("UserId")
ERROR: SQL command "create index IX1 on "SomeTable"("UserId")" not supported on Redshift tables.
This is because, like many other data warehouses, Redshift uses columnar storage, and as a result many of the indexing techniques used in other RDBMSs (such as adding non-clustered indexes) don't apply.
You do, however, have the option of defining one sort key per table (which may span several columns). You can also influence performance by choosing a distribution key, which controls how rows are spread across the cluster's nodes, and by selecting appropriate compression encodings for each column to minimize storage and I/O overhead.
For example, in your case, you may elect to use UserId as a sort key:
create table if not exists "SomeTable"
(
"UserId" int,
"Name" text
)
sortkey("UserId");
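To show all three levers from the paragraph above in one place, here is a sketch of the same table with a distribution key, a compound sort key and explicit column encodings. The "SessionDate" column is a hypothetical addition so that the sort key has a second column; the encodings are reasonable starting points, not measured choices.

```sql
-- Sketch only: "SessionDate" is a hypothetical extra column, not from the question.
create table if not exists "SomeTable"
(
    "UserId"      int          encode raw,   -- sort key columns left uncompressed
    "SessionDate" date         encode raw,
    "Name"        varchar(256) encode zstd   -- Redshift stores TEXT as VARCHAR(256)
)
diststyle key
distkey ("UserId")                           -- rows with the same UserId land on the same slice
compound sortkey ("UserId", "SessionDate");  -- lets range filters on UserId skip blocks
```

Using the same column as both distribution and sort key tends to help joins and filters on "UserId"; if your queries mostly filter by date instead, putting the date column first in the sort key is the usual alternative.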
You might want to read a few primers like these

You can define constraints, but they are informational only. As Amazon puts it, they are not enforced by Amazon Redshift. Nonetheless, primary keys and foreign keys are used as planning hints, and they should be declared if your ETL process or some other process in your application enforces their integrity.
Some services, such as pipelines using the REPLACE_EXISTING insert mode, need a primary key defined on your table.
For other performance concerns, Stuart's answer is correct.

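Redshift allows you to create a primary key:
create table users (
id int not null,
phone_number varchar(20),
primary key(id));
(USER is a reserved word, so the table cannot simply be called user, and phone numbers can overflow an int, so they are stored as text here.)
However, since Redshift does not enforce this constraint, the primary key column accepts duplicate values.
Here is an article on that issue: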
http://www.sqlhaven.com/amazon-redshift-what-you-need-to-think-before-defining-primary-key/
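Because the constraint is not enforced, integrity has to be checked by your own process. A minimal sketch, using a hypothetical users table: the second insert succeeds on Redshift despite the declared primary key, and the final query is the kind of duplicate check an ETL job could run after each load.

```sql
-- Hypothetical table; the PRIMARY KEY is accepted but not enforced by Redshift.
create table users (
    id           int not null,
    phone_number varchar(20),
    primary key (id)
);

insert into users values (1, '555-0100');
insert into users values (1, '555-0199');  -- no error: duplicate id is stored

-- Duplicate check to run after each load: any row returned is a violation.
select id, count(*) as copies
from users
group by id
having count(*) > 1;
```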

Related

Alter table in impala : make a column a primary key

Using Hue, how can I alter a table to make a preexisting column a primary key?
I checked, and statements like:
ALTER TABLE table_name ADD CONSTRAINT colname PRIMARY KEY (cs_id);
is not syntactically correct.
NB: the data is stored in Kudu.
First, Impala does not support ALTER CONSTRAINT as an option in ALTER TABLE.
Second, primary keys are very limited:
The primary key columns must be the first ones specified in the CREATE TABLE statement.
I don't think you can change the primary key after it has been defined. In Impala, the data is clustered (i.e. sorted) by the primary key, so any change would be quite expensive.
You probably need to recreate the table and reload it with data.
When you are storing data in Kudu, keep in mind that all primary key columns must be defined when the table is created.
Impala does not support altering primary keys.
I'm afraid you need to delete and create the table again.
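The recreate-and-reload approach described in these answers could look roughly like this in Impala. Only cs_id and table_name come from the question; col_a, its type, the hash partition count and the _new/_old table names are hypothetical placeholders.

```sql
-- Hypothetical schema: replace col_a with the table's real columns.
CREATE TABLE table_name_new (
  cs_id BIGINT,              -- primary key columns must be listed first
  col_a STRING,
  PRIMARY KEY (cs_id)
)
PARTITION BY HASH (cs_id) PARTITIONS 4
STORED AS KUDU;

-- cs_id must be non-NULL and unique in the source for every row to load.
INSERT INTO table_name_new
SELECT cs_id, col_a FROM table_name;

-- Swap names, keeping the old table until the copy has been verified.
ALTER TABLE table_name RENAME TO table_name_old;
ALTER TABLE table_name_new RENAME TO table_name;
```

Keeping table_name_old around until row counts match is a cheap safety net; drop it once you are satisfied.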

Create table as in Redshift defining primary key

I am trying to replicate a table using a CTAS statement in Redshift while also specifying a primary key for the table.
I tried the syntax below without success, although I was able to specify DISTKEY/SORTKEY using the same syntax:
create table date_dim
PRIMARY KEY(date_key)
--DISTKEY ( date_key )
as
select date_key,
calendar_date,.....;
I want to use the primary key as part of the merge logic I am designing in my flow.
TIA!
Many people consider primary and foreign keys in Redshift to be an anti-pattern (because they're unenforced), but my team built a small tool (a Python script) that supports this scenario.
You write your SELECT statement in a normal SQL file, define the primary key, foreign keys, distkey, etc. in a YAML configuration file, and then use the script to generate (and optionally execute) SQL to create and populate the table.
We also include an Airflow operator to make it simple to schedule and automate this.
The repo is here, and we wrote a bit more about it on our team blog.
CTAS accepts table attributes such as DISTKEY and SORTKEY, but not constraints such as PRIMARY KEY. The following link describes all the options you can specify:
Redshift CTAS
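Since CTAS cannot declare the key, one workaround is to declare the table explicitly and then populate it with the same SELECT the CTAS would have used. A sketch under assumptions: the column types and the source table name date_dim_src are hypothetical, and the question's remaining columns are left out.

```sql
-- Declare the table with the (informational) primary key up front.
create table date_dim (
    date_key      int  not null,
    calendar_date date,            -- type assumed; other columns omitted
    primary key (date_key)
)
distkey (date_key)
sortkey (date_key);

-- Populate it from the query that would otherwise have been the CTAS body.
insert into date_dim (date_key, calendar_date)
select date_key, calendar_date
from date_dim_src;                 -- hypothetical source table
```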
If the column you want to make the primary key is already non-nullable, you can add the constraint after the CTAS:
ALTER TABLE <table_name> ADD CONSTRAINT <a_name_for_this_constraint> PRIMARY KEY (<attribute_name>)
e.g.: ALTER TABLE member ADD CONSTRAINT pk_1 PRIMARY KEY (member_id);
Redshift accepts primary and foreign key constraints but does not enforce them: http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
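For the merge logic mentioned in the question, the key column is what matters, not an enforced constraint. A common Redshift pattern is delete-then-insert from a staging table inside one transaction; date_dim_stage is a hypothetical staging table with the same column layout as date_dim.

```sql
begin;

-- Remove target rows that are about to be replaced (matched on the key).
delete from date_dim
using date_dim_stage
where date_dim.date_key = date_dim_stage.date_key;

-- Insert the new versions plus any brand-new keys.
-- Assumes date_dim_stage has the same columns in the same order.
insert into date_dim
select * from date_dim_stage;

commit;
```

Because the primary key is not enforced, duplicates inside the staging table would be copied straight into date_dim, so it is worth de-duplicating the staging data first.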

Are temp table indexes unique across sessions or are they shared?

I have a large query (a web dashboard query) with many temporary tables, and I have created indexes on those temp tables. The application that uses the query has a user management module with different levels of permissions. My question is: are indexes created per session, like the temp tables in tempdb?
I don't want the indexes to be shared across sessions.
I have been doing something like:
EXEC('CREATE INDEX idx_test' + @sessionId + ' ON #TempTable (id1, id2)');
Is this necessary? I have seen it done by some developers.
Indexes on temporary tables (#t, not ##t) are not shared across the sessions, and there is no need to invent a unique index name to an index on a temporary table.
What is different (and maybe this is what you have seen in code from other developers) is the CONSTRAINT NAME. An index name can be repeated many times for different tables, but a constraint name must be unique within the database.
So the stored procedures you have seen that build a constraint name from a session reference are probably trying to give the constraint a unique name. If you run a stored procedure that creates a temp table #t in two sessions, each session creates its own table with its own name (not just #t; the system appends extra characters to the table name to make it unique).
But if the same procedure tries to create a CONSTRAINT PK_t, the first session will succeed, while the second will get an error saying that the constraint PK_t already exists in the database (tempdb).
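The difference can be sketched in T-SQL as follows. The table and column names mirror the question; the commented-out statement is the pattern that collides when two sessions run it at the same time.

```sql
-- Safe in concurrent sessions: each session gets its own #TempTable,
-- and an index name only has to be unique within its table.
CREATE TABLE #TempTable (id1 int NOT NULL, id2 int NOT NULL);
CREATE INDEX idx_test ON #TempTable (id1, id2);

-- Risky: a named constraint must be unique within tempdb, so a second
-- session running this while the first session's #t exists gets an error.
-- CREATE TABLE #t (id int NOT NULL, CONSTRAINT PK_t PRIMARY KEY (id));

-- Safe: leave the constraint unnamed so SQL Server generates a unique name.
CREATE TABLE #t (id int NOT NULL PRIMARY KEY);
```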

PostgreSQL FOREIGN KEY with second database

I'm running the following queries on PostgreSQL 9.3:
CREATE TABLE "app_item"
(
"id" SERIAL NOT NULL PRIMARY KEY,
"location_id" UUID NOT NULL
);
CREATE INDEX app_item_e274a5da
ON "app_item" ("location_id");
ALTER TABLE "app_item"
ADD CONSTRAINT app_item_location_id_5cecc1c0b46e12e2_fk_fias_addrobj_aoguid
FOREIGN KEY ("location_id") REFERENCES "fias_addrobj" ("aoguid") deferrable
initially deferred;
Third query returns:
ERROR: relation "fias_addrobj" does not exist
app_item - table in first database
fias_addrobj - table in second database
How do I write a correct query across these two databases?
A foreign key must reference a local table.
However, as described in the link below, you could use a trigger that performs a cross-database lookup (via dblink) to simulate the built-in constraint.
For instance, a trigger could check on INSERT that the referenced key exists, to help enforce referential integrity, or cascade on DELETE.
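A minimal sketch of the INSERT-side check, assuming the dblink extension is installed in the first database. The connection string 'dbname=fias_db' and the function and trigger names are hypothetical; EXECUTE PROCEDURE is the syntax PostgreSQL 9.3 understands.

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION app_item_check_location() RETURNS trigger AS $$
BEGIN
    -- Ask the second database whether the referenced aoguid exists.
    PERFORM 1
    FROM dblink('dbname=fias_db',   -- hypothetical connection string
                format('SELECT 1 FROM fias_addrobj WHERE aoguid = %L',
                       NEW.location_id))
         AS t(found int);

    -- PERFORM sets FOUND when the remote query returned at least one row.
    IF NOT FOUND THEN
        RAISE EXCEPTION 'location_id % not found in fias_addrobj', NEW.location_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER app_item_location_check
BEFORE INSERT OR UPDATE OF location_id ON app_item
FOR EACH ROW EXECUTE PROCEDURE app_item_check_location();
```

Unlike a real foreign key, nothing stops rows from being deleted on the remote side afterwards, which is one reason to treat this as a last resort.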
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=101322
P.S. I would avoid this at all costs.
I've not had occasion to use this myself, but you might want to look into Foreign Data Wrappers, which are essentially the successor to dblink. In particular, postgres_fdw.
Once the general setup of the FDW is in place (steps 1-3 in the link above), you could create a foreign table via CREATE FOREIGN TABLE, defined like the table in your remote DB, and then try to use that table in the FOREIGN KEY constraint. Be aware, though, that PostgreSQL only allows a foreign key to reference a regular table, so expect the constraint to be rejected; the foreign table is still useful for lookups, for example from a trigger.
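The postgres_fdw setup could look like this when run in the first database. The server name, host, port, database name and credentials are all hypothetical placeholders. Because 9.3 has no IMPORT FOREIGN SCHEMA (added in 9.5), the remote table is declared by hand, here with only the column that is needed.

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER fias_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'fias_db');  -- hypothetical

CREATE USER MAPPING FOR CURRENT_USER
    SERVER fias_server
    OPTIONS (user 'app_user', password 'secret');               -- hypothetical

-- Local definition of the remote table; columns are matched by name.
CREATE FOREIGN TABLE fias_addrobj (
    aoguid UUID NOT NULL
)
SERVER fias_server
OPTIONS (schema_name 'public', table_name 'fias_addrobj');
```

After this, fias_addrobj can be queried from the first database like a local table, for example to check whether a given aoguid exists.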
If that doesn't work, another option would be to have a process which ETL's the data (say, via a Python script) from the remote server over to the local server (say, on an hourly or daily basis, depending on the size), and then you would have a true local table to use in the foreign key CONSTRAINT. It wouldn't be real-time, but depending on your needs, may suffice.

Partitioned Table Integrity in SQL Server 2012

One of the issues I dealt with in previous versions of SQL Server was maintaining data integrity on a partitioned table where the partition key was not part of the primary key of the records stored in the table. Specifically, creating a unique constraint on the value that uniquely identified each record would generate a non-aligned index, which would prevent using some of the more useful partitioning features, such as switching data in and out.
Is there any way in SQL Server 2012 to be able to enforce unique values on a partitioned table to enforce data integrity and allow for foreign key relationships without disabling features?