Hive table name starting with underscore: SELECT statement issue - sql

In the process of executing my HQL script, I have to store data into a temporary table before inserting it into the main table.
In that scenario, I tried to create a temporary table whose name starts with an underscore.
Note: the table name with a leading underscore only works when it is enclosed in backquotes.
Working Create Statement:
create table
dbo.`_temp_table` (
emp_id int,
emp_name string)
stored as ORC
tblproperties ('ORC.compress' = 'ZLIB');
Working Insert Statement:
insert into table dbo.`_temp_table` values (123, 'ABC');
But the select statement on the temp table is not working: it shows no records even though the record was inserted by the statement above.
select * from dbo.`_temp_table`;
Everything else works fine, but the select statement to view the rows does not.
I am also still not sure whether we can create a temp table this way at all.

Hadoop treats file names starting with an underscore as hidden files and ignores them when reading. For example, the "_$folder$" marker file that is created when you run mkdir to create an empty folder in an S3 bucket.
See HIVE-6431 - Hive table name start with underscore
By default, FileInputFormat (which is the superclass of various
formats) in Hadoop ignores file names starting with "_" or ".", and it is hard
to work around this in the Hive codebase.
You can work around this by creating an external table and specifying a table location that does not start with an underscore, while still keeping the underscore in the table name. Also consider using TEMPORARY tables.
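A minimal sketch of both options (the HDFS location path is an assumption; adjust to your environment):
-- external table keeps the underscore in its name, but its location does not start with "_"
create external table dbo.`_temp_table` (
  emp_id int,
  emp_name string)
stored as ORC
location '/user/hive/warehouse/dbo.db/temp_table_data'
tblproperties ('orc.compress' = 'ZLIB');

-- alternatively, a session-scoped TEMPORARY table avoids the hidden-file issue entirely
create temporary table temp_table (
  emp_id int,
  emp_name string)
stored as ORC;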

Related

How to recalculate a table created by CTAS?

I have created a table using this statement:
CREATE TABLE tablename STORED AS PARQUET AS (SELECT ...)
How can I recalculate it without the DROP TABLE / CREATE TABLE flow?
In Impala, the INSERT INTO syntax appends data to a table. The existing data files are left as-is, and the inserted data is put into one or more new data files.
The INSERT OVERWRITE syntax replaces the data in a table. Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash mechanism.
So if you want to replace the data in the table tablename without going through the drop-table/create-table flow, you can run a query like this:
INSERT OVERWRITE TABLE tablename SELECT * from <source_tablename>;
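For illustration, a minimal sketch contrasting the two behaviors (table names are placeholders):
-- INSERT INTO appends: existing data files are kept, new ones are added
INSERT INTO tablename SELECT * FROM source_tablename;

-- INSERT OVERWRITE replaces: the table's data files are rewritten from scratch
INSERT OVERWRITE TABLE tablename SELECT * FROM source_tablename;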

Creation of a partitioned external table with Hive: no data available

I have the following file on HDFS:
I create the structure of the external table in Hive:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06') LOCATION '/flumania/google_analytics';
After that, the table structure is created in Hive, but I cannot see any data.
Since it's an external table, shouldn't the data be picked up automatically?
Your file's columns should be in this sequence:
int,string
Here, your file contents are in the sequence below:
string,int
Change your file to the following:
86,"2016-08-20"
78,"2016-08-21"
It should work.
Also, it is not recommended to use keywords (such as date) as column names.
I think the problem was with the ALTER TABLE command. The code below solved my problem:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics/';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06');
After these two steps, if you have a date_string=2016-09-06 subfolder containing a CSV file that matches the structure of the table, the data is loaded automatically and you can already use SELECT queries to see it.
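For reference, the working layout looks roughly like this (the CSV file name is an assumption):
/flumania/google_analytics/date_string=2016-09-06/sessions.csv

-- after ADD PARTITION, the data in that folder is queryable
select * from google_analytics where date_string = '2016-09-06';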
Solved!

Delete records from Hive table using filename

I have a use case where I build a Hive table from a bunch of CSV files. While writing the CSV information into the Hive table, I assign INPUT__FILE__NAME (part of the name) to one of the columns. When I want to update the records for the same file name, I need to delete the records of that CSV file before writing it again.
I used the query below, but it failed:
CREATE EXTERNAL TABLE T_TEMP_CSV(
F_FRAME_RANK BIGINT,
F_FRAME_RATE BIGINT,
F_SOURCE STRING,
F_PARAMETER STRING,
F_RECORDEDVALUE STRING,
F_VALIDITY INT,
F_VALIDITY_INTERPRETATION STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
location '/user/baamarna5617/HUMS/csv'
TBLPROPERTIES ("skip.header.line.count"="2");
DELETE FROM T_RECORD
WHERE T_RECORD.F_SESSION = split(reverse(split(reverse(T_TEMP_CSV.INPUT__FILE__NAME),"/")[0]), "[.]")[0]
from T_TEMP_CSV;
The T_RECORD table has a column called F_SESSION, which was assigned part of the INPUT__FILE__NAME using the split expression shown above. I want to use the same expression while removing those records. Can someone please point out where I am going wrong in this query?
I could successfully delete the records using the syntax below:
DELETE FROM T_RECORD
WHERE F_SESSION = 68;
I need to derive that 68 from the INPUT__FILE__NAME.
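A possible way to derive that value from the file name and use it in the DELETE is sketched below, assuming T_RECORD is a transactional (ACID) table and that your Hive version accepts an IN subquery in the WHERE clause of a DELETE:
DELETE FROM T_RECORD
WHERE F_SESSION IN (
    -- strip the directory path and the file extension from the virtual column;
    -- the cast is an assumption, matching the numeric literal 68 used above
    SELECT cast(split(reverse(split(reverse(INPUT__FILE__NAME), '/')[0]), '[.]')[0] AS bigint)
    FROM T_TEMP_CSV
);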

SemanticException Partition spec {col=null} contains non-partition columns

I am trying to create dynamic partitions in Hive using the following code.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
create external table if not exists report_ipsummary_hourwise(
ip_address string,imp_date string,imp_hour bigint,geo_country string)
PARTITIONED BY (imp_date_P string,imp_hour_P string,geo_coutry_P string)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://abc';
insert overwrite table report_ipsummary_hourwise PARTITION (imp_date_P,imp_hour_P,geo_country_P)
SELECT ip_address,imp_date,imp_hour,geo_country,
imp_date as imp_date_P,
imp_hour as imp_hour_P,
geo_country as geo_country_P
FROM report_ipsummary_hourwise_Temp;
where the report_ipsummary_hourwise_Temp table contains the following columns:
ip_address, imp_date, imp_hour, geo_country.
I am getting this error:
SemanticException Partition spec {imp_hour_p=null, imp_date_p=null,
geo_country_p=null} contains non-partition columns.
Can anybody suggest why this error is occurring?
Your insert SQL has the geo_country_P column, but the target table's column name is geo_coutry_P; it is missing an 'n' in 'country'.
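If that is the cause, the fix is simply to spell the partition column consistently in the table definition (everything else unchanged):
create external table if not exists report_ipsummary_hourwise(
  ip_address string, imp_date string, imp_hour bigint, geo_country string)
PARTITIONED BY (imp_date_P string, imp_hour_P string, geo_country_P string)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://abc';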
I was facing the same error. It was caused by extra characters present in the file.
The best solution is to remove all the blank characters and reinsert the data.
It could also be https://issues.apache.org/jira/browse/HIVE-14032
INSERT OVERWRITE command failed with case sensitive partition key names
There is a bug in Hive that makes partition column names case-sensitive.
For me, the fix was that the column name in the PARTITION clause and in the PARTITIONED BY clause of the table definition both had to be lower-case. (They can both be upper-case too; due to this Hive bug HIVE-14032, the case just has to match.)
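Under that bug, a sketch of a case-consistent definition and insert (essentially the question's code with the partition columns lower-cased everywhere) would look like this:
create external table if not exists report_ipsummary_hourwise(
  ip_address string, imp_date string, imp_hour bigint, geo_country string)
PARTITIONED BY (imp_date_p string, imp_hour_p string, geo_country_p string)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://abc';

insert overwrite table report_ipsummary_hourwise PARTITION (imp_date_p, imp_hour_p, geo_country_p)
SELECT ip_address, imp_date, imp_hour, geo_country,
       imp_date    as imp_date_p,
       imp_hour    as imp_hour_p,
       geo_country as geo_country_p
FROM report_ipsummary_hourwise_Temp;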
The error says that, while copying the files from the result to HDFS, the job could not recognize the partition location. What I suspect is that your table is partitioned by (imp_date_P, imp_hour_P, geo_country_P), whereas the job is trying to copy to imp_hour_p=null, imp_date_p=null, geo_country_p=null, which does not match. Try checking the HDFS location. The other point I can suggest is not to duplicate the same column both as a data column and as a partition column:
insert overwrite table report_ipsummary_hourwise PARTITION (imp_date_P,imp_hour_P,geo_country_P)
SELECT ip_address,imp_date,imp_hour,geo_country,
imp_date as imp_date_P,
imp_hour as imp_hour_P,
geo_country as geo_country_P
FROM report_ipsummary_hourwise_Temp;
The highlighted fields should have the same names as the partition columns of the report_ipsummary_hourwise table.

Bucket is not creating on hadoop-hive

I'm trying to create a bucketed table in Hive using the following command:
hive> create table emp( id int, name string, country string)
clustered by( country)
row format delimited
fields terminated by ','
stored as textfile ;
The command executes successfully: when I load data into this table, the load also succeeds, and all the data is shown by select * from emp.
However, on HDFS only one table directory is created, and it contains a single file with all the data. That is, there is no folder for the records of a specific country.
First of all, in the DDL statement you have to explicitly mention how many buckets you want.
create table emp( id int, name string, country string)
clustered by( country)
INTO 2 BUCKETS
row format delimited
fields terminated by ','
stored as textfile ;
In the above statement I have mentioned 2 buckets; similarly, you can specify any number you want.
But you are still not done!
After that, while loading data into the table, you also have to give Hive the setting below.
set hive.enforce.bucketing = true;
That should do it.
After this, you should see that the number of files created under the table directory is the same as the number of buckets mentioned in the DDL statement.
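As a hedged sketch (the staging table emp_staging is an assumption), bucketing only takes effect when rows are written through an INSERT ... SELECT; a plain LOAD DATA just moves files and does not redistribute them into buckets:
set hive.enforce.bucketing = true;

-- the insert runs as a job that hashes rows on country and writes one file per bucket
insert overwrite table emp
select id, name, country from emp_staging;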
Bucketing doesn't create HDFS folders; rather, if you want a separate folder to be created for each country, you should use PARTITIONING.
Please go through Hive partitioning and bucketing in detail.
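If a separate HDFS folder per country is what you actually need, a minimal sketch of a partitioned alternative (the table name emp_partitioned is an assumption) would be:
create table emp_partitioned (id int, name string)
partitioned by (country string)
row format delimited
fields terminated by ','
stored as textfile;

set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;

-- each distinct country value gets its own country=<value> folder under the table directory
insert overwrite table emp_partitioned partition (country)
select id, name, country from emp;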