Bucket policy in Hive

Can we create bucket on character field in Hive ?
example:
create table EmpTab(
emp_id string,
emp_name string,
emp_city string,
emp_grade char
);
Can I create bucket on emp_grade column ? If no then why ?

Yes, you can. You can use the clause CLUSTERED BY (emp_grade) INTO 5 BUCKETS for that.
Example of a bucketed table creation:
CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
CLUSTERED BY(user_id) INTO 256 BUCKETS;

Hive does not explicitly limit the data type of a column on which data can be bucketed, and char is a legitimate type for bucketing.
You need to specify the bucketing column at table creation with CLUSTERED BY:
create table EmpTab(
emp_id string,
emp_name string,
emp_city string,
emp_grade char(10)
) clustered by (emp_grade) into 32 buckets
;
Note that CLUSTERED BY on table creation does not by itself restrict how data is inserted into the table.
To make sure a bucketed table's data is organized in accordance with its DDL, the number of reducers must equal the number of buckets.
Before Hive 2.x, this can be done by simply setting hive.enforce.bucketing to true, or by manually setting the number of reducers and adding CLUSTER BY to the SELECT clause, as in the sketch below.
From Hive 2.x onward, the enforcement is on by default and the hive.enforce.bucketing property has been removed.
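A minimal sketch of loading the bucketed table on Hive 1.x; the staging table staging_emp is a hypothetical source:
-- option 1: let Hive derive the reducer count from the bucket count
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE EmpTab
SELECT emp_id, emp_name, emp_city, emp_grade FROM staging_emp;
-- option 2: set the reducer count manually and add CLUSTER BY on the bucketing column
SET mapreduce.job.reduces = 32;
INSERT OVERWRITE TABLE EmpTab
SELECT emp_id, emp_name, emp_city, emp_grade FROM staging_emp
CLUSTER BY emp_grade;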

Related

How to add missing values of file on COPY command

I have a table with one column whose values come from a file, but I'm certain that for the other column the value is missing from the file.
Here's the table:
create table if not exists user(
id varchar(36) primary key,
relevance varchar(3) not null,
constraint relevance_check check (relevance in ('ONE', 'TWO'))
);
The command I want to populate the table with:
copy user(id) from '/home/users_ids.txt';
The problem is that my relevance column is NOT NULL, and I'd like to set a default value on it when copying, but I'm not sure if that's possible.
I can't set a default value on the table itself because I need to import data from many files, and each one would have a different value for the relevance field.
Can I achieve what I want using the COPY command, or is there another approach for this?
You can set the column's default value just for a while, e.g.:
alter table "user" alter relevance set default 'ONE';
copy "user"(id) from '/home/users_one.txt';
alter table "user" alter relevance set default 'TWO';
copy "user"(id) from '/home/users_two.txt';
alter table "user" alter relevance drop default;
This solution is simple and efficient when you are sure that only one session runs the import at a time, but it is not safe if you intend to run imports simultaneously in more than one session. A safer alternative in that case is to use a temporary table, e.g.:
create temp table ids(id text);
copy ids(id) from '/home/users_one.txt';
insert into "user"
select id, 'ONE'
from ids;

Alter column set default unsupported feature

I want to alter the table and set a sequence as the default of a column that is an identity column. When I try to run
ALTER TABLE report.test_table MODIFY id set default test_table_seq.NEXTVAL;
it shows following error:
[0A000][2] Unsupported feature 'Alter Column Set Default'.
Here's create table sql:
create table report.test_table(
id int identity,
txt text
);
According to the Snowflake documentation, a column must already have a sequence as its default in order to use ALTER COLUMN ... SET DEFAULT, and the same docs state that IDENTITY and AUTOINCREMENT are synonyms and that Snowflake uses a sequence to auto-increment such a column.
https://docs.snowflake.net/manuals/sql-reference/sql/create-table.html
Sadly, there's no way around it. Snowflake uses a sequence behind the scenes for an IDENTITY column but doesn't allow applying another sequence on top of it. You can only alter the column to point at a new sequence if an explicit sequence was set as its default when the table was created.
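A rough sketch of the workaround, assuming the table can be recreated with an explicit sequence default instead of IDENTITY (all names below are illustrative):
CREATE SEQUENCE report.test_table_seq;
CREATE TABLE report.test_table (
id int DEFAULT report.test_table_seq.NEXTVAL,
txt text
);
-- switching the default is now permitted because the current default is already a sequence
CREATE SEQUENCE report.test_table_seq2;
ALTER TABLE report.test_table MODIFY id SET DEFAULT report.test_table_seq2.NEXTVAL;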

Adding a NOT NULL column to a Redshift table

I'd like to add a NOT NULL column to a Redshift table that has records, an IDENTITY field, and that other tables have foreign keys to.
In PostgreSQL, you can add the column as NULL, fill it in, then ALTER it to be NOT NULL.
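For reference, that PostgreSQL pattern is roughly this (a sketch reusing the names from the Redshift attempt below):
ALTER TABLE my_table ADD COLUMN new_column INTEGER;
UPDATE my_table SET new_column = 0; -- backfill with whatever is appropriate
ALTER TABLE my_table ALTER COLUMN new_column SET NOT NULL;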
In Redshift, the best I've found so far is:
ALTER TABLE my_table ADD COLUMN new_column INTEGER;
-- Fill that column
CREATE TABLE my_table2 (
id INTEGER IDENTITY NOT NULL SORTKEY,
(... all the fields ... )
new_column INTEGER NOT NULL,
PRIMARY KEY(id)
) DISTSTYLE all;
UNLOAD ('select * from my_table')
to 's3://blah' credentials '<aws-auth-args>' ;
COPY my_table2
from 's3://blah' credentials '<aws-auth-args>'
EXPLICIT_IDS;
DROP table my_table;
ALTER TABLE my_table2 RENAME TO my_table;
-- For each table that had a foreign key to my_table:
ALTER TABLE another_table ADD FOREIGN KEY(my_table_id) REFERENCES my_table(id)
Is this the best way of achieving this?
You can achieve this w/o having to load to S3.
modify the existing table to create the desired column w/ a default value
update that column in some way (in my case it was copying from another column)
create a new table with the column w/o a default value
insert into the new table (you must list out the columns rather than using *, since the column order may not be the same, say if you want the new column in position 2)
drop the old table
rename the table
alter table to give correct owner (if appropriate)
ex:
-- first add the column w/ a default value
alter table my_table_xyz
add visit_id bigint NOT NULL default 0; -- not null but default value
-- now populate the new column with whatever is appropriate (the key in my case)
update my_table_xyz
set visit_id = key;
-- now create the new table with the proper constraints
create table my_table_xyz_new
(
key bigint not null,
visit_id bigint NOT NULL, -- here it is not null and no default value
adt_id bigint not null
);
-- select all from old into new
insert into my_table_xyz_new
select key, visit_id, adt_id
from my_table_xyz;
-- remove the orig table
DROP table my_table_xyz;
-- rename the newly created table to the desired table
alter table my_table_xyz_new rename to my_table_xyz;
-- adjust any views, foreign keys or permissions as required
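If the owner needs fixing as well (the last step in the list above), it is a one-liner; the role name etl_owner here is just a placeholder:
alter table my_table_xyz owner to etl_owner;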

Restoring a Truncated Table from a Backup

I am restoring the data of a truncated table in an Oracle database from an exported csv file. However, I find that the primary key column auto-increments instead of taking the actual values of that column from the backed-up file.
I intend to do the following:
1. drop the primary key
2. import the table data
3. add primary key constraints on the required column
Is this a good approach? If not, what is recommended? Thanks.
EDIT: After more investigation, I observed there's a trigger that generates nextval from a sequence and inserts it into the primary key column. This is the source of the predicament, so following the procedure above would not have solved the problem; it lies in the trigger (and/or sequence) on the table. This is solved!
It's easier to load your .csv as an external table and then:
create table your_table_temp as select * from the external table
examine the data in the new temp table to ensure you know what range of primary keys is present
do a merge into the target table
Sample statements for the external table and the merge:
CREATE TABLE countries_ext (
country_code VARCHAR2(5),
country_name VARCHAR2(50),
country_language VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY ext_tab_data
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
(
country_code CHAR(5),
country_name CHAR(50),
country_language CHAR(50)
)
)
LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;
and the merge
MERGE INTO employees e
USING hr_records h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
Edit: after you have merged the data you can drop the temp table; the result is your previous table with the old data and the new data together.
Edit: you mention "During imports, the primary key column does not insert from the file, but auto-increments". This can only happen when there is a trigger on the table, most likely a BEFORE INSERT ... FOR EACH ROW trigger. Disable the trigger and then do your import, then re-enable the trigger after committing your inserts, for example:
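For example, assuming the trigger is named your_table_trg (a placeholder name):
-- disable before the import
ALTER TRIGGER your_table_trg DISABLE;
-- run the import / inserts, then COMMIT
ALTER TRIGGER your_table_trg ENABLE;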
I used the following procedure to solve it:
drop trigger trigger_name
import the table data into the target table
drop sequence sequence_name
CREATE SEQUENCE SEQ_NAME INCREMENT BY 1 START WITH start_index_for_next_val MAXVALUE max_val MINVALUE 1 NOCYCLE CACHE 20 NOORDER
CREATE OR REPLACE TRIGGER "schema_name"."trigger_name"
before insert on target_table
for each row
begin
select seq_name.nextval
into :new.unique_column_name
from dual;
end;

sqlite3 explain table_name

In mysql you can view a table's structure via explain tablename; What is the equivalent for sqlite3?
I believe ".schema tablename" is what you're looking for.
You can use .schema in the Command Line Shell:
With no arguments, the ".schema" command shows the original CREATE TABLE and CREATE INDEX statements that were used to build the current database. If you give the name of a table to ".schema", it shows the original CREATE statement used to make that table and all of its indices.
This was already answered in a more generic way here.
Edit:
Note that .schema will also give you the indexes defined on the table.
Example:
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
);
CREATE TABLE job_name (
id INTEGER PRIMARY KEY,
name VARCHAR
);
CREATE INDEX job_idx on job(data);
Note the differences between:
sqlite> SELECT sql FROM SQLITE_MASTER WHERE type = 'table' AND name = 'job';
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
)
sqlite> SELECT sql FROM SQLITE_MASTER WHERE name = 'job_idx';
CREATE INDEX job_idx on job(data)
and
sqlite> .schema job
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
);
CREATE INDEX job_idx on job(data);
Note also that .schema includes the terminating semicolons, while the sql column from sqlite_master does not.