Advice on creating/inserting data into Hive's bucketed tables.
I did some reading (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables) and tested a few options, but with no success.
Currently I get the following error while running the insert:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Create code:
CREATE test_in (
id VARCHAR(250),
field_1 VARCHAR(250),
field_2 VARCHAR(250),
field_3 VARCHAR(250),
field_4 VARCHAR(250),
field_5 VARCHAR(250)
)
PARTITIONED BY(ds STRING)
CLUSTERED BY(id) into 10 buckets
STORED AS orc
tblproperties("orc.compress"="NONE","transactional"="true");
Insert code:
INSERT INTO TABLE test_in
VALUES (
'9gD0xQxOYS',
'ZhQbTjUGLhz8KuQ',
'SmszyJHEqIVAeK8gAFVx',
'RvbRdU7ia1AMHhaXd9tOgLEzi',
'a010E000004uJt8QAE',
'yh6phK4ZG7W4JaOdoOhDJXNJgmcoZU'
)
I need help creating the proper syntax for the create/insert statements, and some explanation of bucketing in Hive.
CREATE STATEMENT - The keyword TABLE is missing (maybe just a typo).
INSERT STATEMENT - The partition specification is missing. A partition value is required during an INSERT operation, since it is a partitioned table.
The corrected, working queries are below.
CREATE STATEMENT:
CREATE TABLE test_in (
id VARCHAR(250),
field_1 VARCHAR(250),
field_2 VARCHAR(250),
field_3 VARCHAR(250),
field_4 VARCHAR(250),
field_5 VARCHAR(250)
)
PARTITIONED BY(ds STRING)
CLUSTERED BY(id) INTO 10 BUCKETS
STORED AS orc;
INSERT STATEMENT:
INSERT INTO test_in
PARTITION (ds='123')
VALUES (
'9gD0xQxOYS',
'ZhQbTjUGLhz8KuQ',
'SmszyJHEqIVAeK8gAFVx',
'RvbRdU7ia1AMHhaXd9tOgLEzi',
'a010E000004uJt8QAE',
'yh6phK4ZG7W4JaOdoOhDJXNJgmcoZU'
);
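As for bucketing itself: Hive routes each row to one of the declared buckets by hashing the CLUSTERED BY column, i.e. hash(id) % 10 here, so all rows with the same id land in the same bucket file under the partition directory. A minimal sketch of a bulk load (some_staging_table is a hypothetical source table, and the SET is only needed on Hive versions before 2.0, where bucketing was not enforced by default):
-- On Hive < 2.0, make inserts actually produce the declared bucket files.
SET hive.enforce.bucketing = true;
-- Rows are distributed to the 10 bucket files by hash(id) % 10.
INSERT INTO test_in PARTITION (ds='123')
SELECT id, field_1, field_2, field_3, field_4, field_5
FROM some_staging_table;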
Hope this helps!
Related
I am trying to create a Hive table with partitions but getting the above error. What am I doing wrong?
CREATE TABLE IF NOT EXISTS schema.table_name
(
ID varchar(20),
name varchar(50)
)
PARTITIONED BY (part_dt varchar(8), system varchar(5));
The code works without the partitioning clause; something goes wrong once partitioning is added.
The statement is working in Hive as written (screenshot omitted).
It's possible that some of the column names are reserved keywords and that is what is throwing the error. If so, you can use the SQL below, which quotes the identifiers with backticks.
CREATE TABLE IF NOT EXISTS schema.table_name
(
`ID` varchar(20),
`name` varchar(50)
)
PARTITIONED BY (`part_dt` varchar(8), `system` varchar(5));
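If your Hive version supports it, another option is to relax reserved-keyword checking for the session instead of quoting. This is a sketch; the hive.support.sql11.reserved.keywords property was added in Hive 1.2 and has been removed in some later releases, so check your distribution:
-- Allow SQL:2011 reserved words to be used as plain identifiers.
SET hive.support.sql11.reserved.keywords = false;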
I just created a database and then added a couple of hundred tables with a script like this:
CREATE TABLE CapBond
(
[timestamp] varchar(50),
[Reward] varchar(50),
[Award] varchar(50),
[Fact] varchar(50)
)
CREATE TABLE Values
(
[timestamp] varchar(50),
[Name] varchar(50),
[Test] varchar(50),
[Read] varchar(50),
[Parameters] varchar(50)
)
I realize I forgot to add two columns to each table. One for the PK and one for an FK that points back to a 'master' table.
Is there an easy way to insert columns without having to drop the DB and recreate it? Preferably with the columns inserted as the first two columns in the table?
Yes. In SQL Server you have the ALTER TABLE ... ADD command for this purpose. Check out this page for a more detailed explanation:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-alter-table-add-column/
As for the ordering, SQL Server always appends new columns at the end of the table; it has no FIRST/AFTER clause for positioning them, unlike MySQL, whose approach is described here:
https://www.mysqltutorial.org/mysql-add-column/
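A minimal sketch of the two additions for one of the tables, assuming a hypothetical master table named Master with key column MasterID:
-- Add an identity primary key plus a foreign key column pointing back to
-- the (assumed) Master table; existing rows are numbered automatically.
ALTER TABLE CapBond
ADD CapBondID int IDENTITY(1, 1) CONSTRAINT PK_CapBond PRIMARY KEY,
    MasterID int CONSTRAINT FK_CapBond_Master REFERENCES Master (MasterID);
The same pair of ADDs would be repeated for each table in the script.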
Very new to SQL, but I thought that I had at least mastered how to make tables. I am trying to create the following table and get the error 'ORA-00903: invalid table name'. I'm not sure what is wrong.
Create table order (
order_id int,
item_type varchar(50),
item_name varchar(50),
item_price decimal(10,2),
primary key(order_id)
);
I tested this on Oracle Live SQL and it is OK, as well as on my Oracle 12c Database EE; all you need to add are double quotes, since ORDER is a reserved word. But even so, I would not recommend using reserved words for naming tables.
Create table "order" (
order_id int,
item_type varchar(50),
item_name varchar(50),
item_price decimal(10,2),
primary key(order_id)
);
insert into "order" values (1, 'Item', 'Name', '20.2');
select * from "order";
I have a weird issue.
DROP table IF EXISTS ipl;
CREATE TABLE ipl(
match_id VARCHAR(50),
batting VARCHAR(50),
bowling VARCHAR(50),
overn VARCHAR(50),
batsman VARCHAR(50),
bowler VARCHAR(50),
super_over VARCHAR(50),
bat_runs VARCHAR(50),
extra_runs VARCHAR(50),
total_runs VARCHAR(50),
player_out VARCHAR(50),
how VARCHAR(50),
fielder VARCHAR(50));
BULK INSERT ipl
FROM 'F:\Study\Semesters\4th Sem\COL362\HomeWork\1\Dataset\deliveries.csv'
WITH (FIELDTERMINATOR = ',');
SELECT * FROM ipl;
This is the code I'm using to make the table in SSMS. match_id goes from 1 to about 290, in increasing order, in the csv file. When I executed this query once, everything was OK. But when I ran it again, some rows from the middle were moved to the end.
(In the screenshot of the results, match_id jumps from 4 to 49.)
I don't know what's wrong. Please help me resolve this issue. Thanks!
SQL tables represent unordered sets. If you want rows in a particular order, you need an order by. How can you do this with a bulk insert? Well, you need an identity column. The idea is to create the table with an identity and use a view for the bulk insert:
create table ipl (
ipl_id int identity(1, 1) primary key,
. . .
);

create view vw_ipl as
select match_id, batting, bowling, . . .
from ipl;

bulk insert vw_ipl
from 'F:\Study\Semesters\4th Sem\COL362\HomeWork\1\Dataset\deliveries.csv'
with (fieldterminator = ',');

select *
from ipl
order by ipl_id;
As a relational database, SQL Server does not guarantee a particular order of returned data. If you need ordered data, specify an ORDER BY clause.
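One more detail: match_id was loaded as VARCHAR(50), so a plain ORDER BY match_id sorts lexicographically ('10' before '2'). A sketch of a numeric ordering on the original table:
-- Cast the text column so that 2 sorts before 10.
SELECT *
FROM ipl
ORDER BY CAST(match_id AS int);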
I have the below two tables created.
CREATE TABLE DELETE_CUSTOM
(
column1 varchar2(30),
column2 varchar2(30)
);
CREATE TABLE DELETE_CUSTOM_HIST
(
column1 varchar2(30),
column2 varchar2(30),
ARCHIVAL_DATE DATE
);
I used to insert data into DELETE_CUSTOM_HIST like below.
INSERT INTO DELETE_CUSTOM_HIST
(SELECT DE.*, to_date('12/31/2017','mm/dd/yyyy')
FROM DELETE_CUSTOM DE);
Now, I had to add a new column to both the tables.
ALTER TABLE DELETE_CUSTOM ADD column3 VARCHAR2(30);
ALTER TABLE DELETE_CUSTOM_HIST ADD column3 VARCHAR2(30);
If I try to insert data into the DELETE_CUSTOM_HIST table with the same INSERT statement below,
I get an ORA-01858 error.
INSERT INTO DELETE_CUSTOM_HIST
(SELECT DE.*, to_date('12/31/2017','mm/dd/yyyy')
FROM DELETE_CUSTOM DE);
--ORA-01858: a non-numeric character was found when a numeric was expected.
What change do I need to make to my INSERT statement to overcome this issue?
Note: the above INSERT statement is in a PL/SQL package, and the two tables actually have
many columns; I have just mentioned a few of them to explain my problem in a simple way.
INSERT INTO DELETE_CUSTOM_HIST (column1, column2, column3, ARCHIVAL_DATE)
SELECT de.column1, de.column2, de.column3,
to_date('12/31/2017','mm/dd/yyyy')
FROM DELETE_CUSTOM de;
Explicitly write the column list and try again.
You have:
CREATE TABLE DELETE_CUSTOM_HIST
(
column1 varchar2(30),
column2 varchar2(30),
ARCHIVAL_DATE DATE,
column3 varchar2(30)
);
so SELECT DE.* is now trying to insert a VARCHAR2 column (COLUMN3 of the DELETE_CUSTOM table) into
a DATE column (ARCHIVAL_DATE of the DELETE_CUSTOM_HIST table).
The ORA-01858 error is raised due to this mismatch between VARCHAR2 and DATE.
Two points, both visible in the rewritten statement below. The first is to always list the columns explicitly for such statements; don't use * in production code unless you really know what you are doing.
INSERT INTO DELETE_CUSTOM_HIST (column1, column2, column3, archival_date)
SELECT DE.column1, DE.column2, DE.column3, DATE '2017-12-31'
FROM DELETE_CUSTOM DE;
Notes (in addition to listing the columns):
Parentheses are not appropriate around the SELECT.
Use date literals. Oracle supports the ANSI standard here, use it.
Are you sure you just want the date portion for the archival date?
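On that last note: the literal DATE '2017-12-31' carries midnight as its time portion. If ARCHIVAL_DATE is meant to record when the load actually ran, a sketch using SYSDATE instead would be:
INSERT INTO DELETE_CUSTOM_HIST (column1, column2, column3, archival_date)
SELECT DE.column1, DE.column2, DE.column3, SYSDATE -- load time instead of a fixed date
FROM DELETE_CUSTOM DE;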