Hive index error on partitioned table

I was trying to test an example of Hive indexes. I am not able to create an index on a partition column, but it works on all other columns. Most of the sites give examples on a partition column, but for some reason I am not able to get it to work. I am using Hive 0.14 and the example was taken from Programming Hive. Can someone let me know if something is wrong in the code below?
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);
CREATE INDEX employees_index
ON TABLE employees (country)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country, name)
COMMENT 'Employees indexed by country and name.';
Error :
FAILED: ParseException line 7:0 missing EOF at 'PARTITIONED' near 'employees_index_table'
org.apache.hadoop.hive.ql.parse.ParseException: line 7:0 missing EOF at 'PARTITIONED' near 'employees_index_table'
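If I read the Hive 0.14 grammar right, CREATE INDEX accepts no PARTITIONED BY clause at all (a partitioned base table's index table picks up the same partitions on its own), and the quoting in the book's IDXPROPERTIES line is unbalanced. A sketch with those two things changed:
CREATE INDEX employees_index
ON TABLE employees (country)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
COMMENT 'Employees indexed by country and name.';
If indexing the partition column country still errors after that, try a regular column such as name to separate the grammar problem from any restriction on partition columns.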

Related

Hive - Can't Use ORDER BY with INSERT

I'm using Hive 3.1.3 on AWS EMR. When I try to INSERT records with an ORDER BY clause, the statement fails with the error message SemanticException [Error 10004]: Line 5:9 Invalid table alias or column reference 'ColumnName': (possible column names are: _col0...n). When I remove the ORDER BY, the INSERT works fine. Here's a simple example that reproduces the error:
CREATE TABLE People (PersonName VARCHAR(50), Age INT);
INSERT INTO People (PersonName, Age)
SELECT 'Mary' PersonName, 32 Age
UNION
SELECT 'John' PersonName, 41 Age
ORDER BY Age DESC;
FAILED: SemanticException [Error 10004]: Line 5:9 Invalid table alias or column reference 'Age': (possible column names are: _col0, _col1)
I know I can simply remove the ORDER BY, but the codebase is an existing application built to run on a traditional RDBMS. There are lots of ORDER BYs on INSERT statements. Is there any way I can make the INSERTs with ORDER BYs work so I don't have to comb through thousands of lines of SQL and remove them?
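One workaround worth trying (a sketch, not verified on 3.1.3): push the UNION into a derived table so the outer query exposes real column names for the ORDER BY to resolve, rather than the internal _col0/_col1:
INSERT INTO People (PersonName, Age)
SELECT PersonName, Age
FROM (
SELECT 'Mary' PersonName, 32 Age
UNION
SELECT 'John' PersonName, 41 Age
) u
ORDER BY Age DESC;
Bear in mind that rows in a Hive table have no guaranteed order anyway, so the ORDER BY only adds a sort step at load time; removing it remains the cheaper option where that is feasible.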

Table can't be queried after changing column position

When querying the table using "select * from t2p", the response is as below. I think I have missed some concepts; please help me out.
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
Step 1: create the table
create table t2p(id int, name string, score map<string,double>)
partitioned by (class int)
row format delimited
fields terminated by ','
collection items terminated by '\\;'
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
Step 2: insert data like
1,zs,math:90.0;english:92.0
2,ls,chinese:89.0;math:80.0
3,xm,geo:87.0;math:80.0
4,lh,chinese:89.0;english:81.0
5,xw,physics:91v;english:81.0
Step 3: add another column
alter table t2p add columns (school string);
Step 4: change the column's order
alter table t2p change school school string after name;
Step 5: run the query and get the error mentioned above
select * from t2p;
This error is to be expected.
Your command alter table t2p change school school string after name; changes metadata only: if you are moving columns, the data must already match the new schema, or you must change it to match by some other means.
That means the map column's data now has to match the column mapped onto it. In other words, if you want to move a column around, make sure the new column's type and the existing data's type are the same.
I did a simple experiment with the int data type. It worked because the data types are not hugely different, but you can see that the metadata changed while the data stayed the same.
create table t2p(id int, name string, score int)
partitioned by (class int)
stored as textfile;
insert into t2p partition(class=1) select 100,'dum', 199;
alter table t2p add columns (school string);
alter table t2p change school school string after name;
MSCK REPAIR TABLE t2p;
select * from t2p;
You can see that the new column school is mapped to position 3 (where the INT data sits).
Solution: you can do this, but make sure the new structure and data types are compatible with the old structure.
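For the original t2p table, one recovery sketch (assuming the data files still hold the original id, name, score layout) is simply to move school back to the end so the schema lines up with the data again:
-- put school back after score; the rows have no fourth field, so it reads as NULL
alter table t2p change school school string after score;
-- on a partitioned table you may need CASCADE so existing partitions see the change:
-- alter table t2p change school school string after score cascade;
select * from t2p;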

SemanticException Partition spec {col=null} contains non-partition columns

I am trying to create dynamic partitions in Hive using the following code.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
create external table if not exists report_ipsummary_hourwise(
ip_address string,imp_date string,imp_hour bigint,geo_country string)
PARTITIONED BY (imp_date_P string,imp_hour_P string,geo_coutry_P string)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://abc';
insert overwrite table report_ipsummary_hourwise PARTITION (imp_date_P,imp_hour_P,geo_country_P)
SELECT ip_address,imp_date,imp_hour,geo_country,
imp_date as imp_date_P,
imp_hour as imp_hour_P,
geo_country as geo_country_P
FROM report_ipsummary_hourwise_Temp;
The report_ipsummary_hourwise_Temp table contains the following columns: ip_address, imp_date, imp_hour, geo_country.
I am getting this error:
SemanticException Partition spec {imp_hour_p=null, imp_date_p=null,
geo_country_p=null} contains non-partition columns.
Can anybody suggest why this error occurs?
Your INSERT SQL has the geo_country_P column, but the target table's partition column is named geo_coutry_P; it is missing an n in country.
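With the spelling made consistent, a sketch of the corrected definition (everything else as in the question):
create external table if not exists report_ipsummary_hourwise(
ip_address string, imp_date string, imp_hour bigint, geo_country string)
PARTITIONED BY (imp_date_P string, imp_hour_P string, geo_country_P string)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://abc';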
I was facing the same error. In my case it was because of extra characters present in the file.
The best solution is to remove all the blank characters and reinsert the data.
It could also be https://issues.apache.org/jira/browse/HIVE-14032 (INSERT OVERWRITE command failed with case sensitive partition key names). There is a bug in Hive which makes partition column names case-sensitive. For me the fix was that the column name had to be lower-case both in the table and in the PARTITIONED BY clause of the table definition. (They can both be upper-case too; due to this Hive bug, HIVE-14032, the case just has to match.)
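A minimal sketch of what "the case just has to match" means; the table names here are made up:
-- partition column spelled identically (lower-case) in the DDL and the SELECT alias
create table ipsummary (ip_address string) partitioned by (imp_date_p string);
insert overwrite table ipsummary partition (imp_date_p)
select ip_address, imp_date as imp_date_p from ipsummary_temp;  -- with the dynamic-partition settings from the question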
It says that while copying the file from the result to HDFS, the job could not recognize the partition location. What I suspect is that you have a table with partitions (imp_date_P, imp_hour_P, geo_country_P) whereas the job is trying to copy to imp_hour_p=null, imp_date_p=null, geo_country_p=null, which doesn't match; try checking the HDFS location. The other point I can suggest is not to duplicate the column name and the partition name:
insert overwrite table report_ipsummary_hourwise PARTITION (imp_date_P,imp_hour_P,geo_country_P)
SELECT ip_address,imp_date,imp_hour,geo_country,
imp_date as imp_date_P,
imp_hour as imp_hour_P,
geo_country as geo_country_P
FROM report_ipsummary_hourwise_Temp;
The highlighted fields should carry the same names as the partition columns defined on the report_ipsummary_hourwise table.

Oracle external table load

I have 100 records in a players.dat file, like
PIT INDIANPOLISH COLTS
and then a new line in the same format again. How can I load this data into an external table?
Here is my table definition:
CREATE TABLE TEAMS1(
TEAM_ID VARCHAR2(20)
, TEAM_NAME VARCHAR2(35)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY DATA_WAREHOUSE
ACCESS PARAMETERS (
RECORDS DELIMITED BY newline
fields terminated by whitespace
missing fields values are null (
TEAM_ID VARCHAR(20),
TEAM_NAME VARCHAR2(35)
TERMINATED BY '/N')
)
LOCATION ('NFL_Teams.dat')
) ;
Here is the error:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "fields": expecting one of: "field"
KUP-01007: at line 3 column 9
29913. 00000 - "error in executing %s callout"
The error message points to a syntax problem; it even gives us a clue.
KUP-01005: syntax error: found "fields": expecting one of: "field"
Sure enough, your table definition has this ...
missing fields values are null
... when it should be this:
missing field values are null
You have a major problem with your data file. Your table definition specifies fields terminated by whitespace, but your sample data shows a team name consisting of two words, INDIANPOLISH COLTS. You won't be able to load that.
The best solution is to get the providing system to do the right thing and supply a data file which uses a sensible field delimiter and/or field enclosure. (If this is a school assignment you can do this yourself.)
A less desirable solution would be to pre-process the data file, using regex to delimit or enclose the fields.
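For example, a sketch that assumes the file can be re-supplied comma-separated with the name enclosed in double quotes, e.g. PIT,"INDIANAPOLIS COLTS":
CREATE TABLE TEAMS1 (
  TEAM_ID VARCHAR2(20),
  TEAM_NAME VARCHAR2(35)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DATA_WAREHOUSE
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY newline
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
    (
      TEAM_ID CHAR(20),
      TEAM_NAME CHAR(35)
    )
  )
  LOCATION ('NFL_Teams.dat')
);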
There are several issues with your code; you can try to correct it along these lines:
CREATE TABLE TEAMS1 (
  TEAM_CODE VARCHAR2(10),
  TEAM_ID VARCHAR2(20),
  TEAM_NAME VARCHAR2(35)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DATA_WAREHOUSE
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY newline
    fields (
      TEAM_CODE varchar(10),
      TEAM_ID VARCHAR(20),
      TEAM_NAME VARCHAR(35)
    )
  )
  LOCATION ('players.dat')
);
Then you can create a standard table, say TEAMS(team_id varchar2(20), team_name varchar2(35)), and load it from the external table:
insert into teams select team_id, team_name from teams1;

How can I create a partitioned table 'like' an unpartitioned table with Hive HQL?

I've got a table with two weeks' worth of entries, and I would like to copy those entries into a table partitioned by date (creating it if it does not exist).
I'm writing a luigi task to do this, and I would love for it to be independent of the table schema--i.e. I wouldn't have to specify column names and types, and it would CREATE TABLE IF NOT EXISTS when necessary.
I was hoping I could use:
CREATE TABLE IF NOT EXISTS test_part
COMMENT 'This is a test table to see if partitioning works in this case'
PARTITIONED BY (event_date string)
AS select *, '2014-12-15' from source_db.source_table
where event_at <'2014-12-16' and event_at >='2014-12-15';
But this of course fails with: FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
I tried again with "like", with basically the same results. Is there a way to do this that I am missing? It doesn't have to be atomic; multiple sequential commands are fine.
You do not do a CREATE TABLE AS.
You create the table first, using DESCRIBE source_table to get the schema, and then you do an INSERT INTO TABLE ... PARTITION (event_date=...).
In two steps it works better; a sketch follows.
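A sketch of those two steps; the test_part columns (event_at, payload) are hypothetical stand-ins for whatever DESCRIBE source_db.source_table reports:
-- step 1: create the target with the source's columns, minus the partition column
CREATE TABLE IF NOT EXISTS test_part (
event_at string,
payload string
)
COMMENT 'This is a test table to see if partitioning works in this case'
PARTITIONED BY (event_date string);
-- step 2: load one day into a static partition
INSERT OVERWRITE TABLE test_part PARTITION (event_date = '2014-12-15')
SELECT * FROM source_db.source_table
WHERE event_at >= '2014-12-15' AND event_at < '2014-12-16';
With a static partition spec there is no need for the dynamic-partition settings, and OVERWRITE keeps the Luigi task idempotent if it reruns.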