Hive table data load gives NULL values

SELECT * FROM movierating returns NULL values as a result.
I have tried the CREATE TABLE queries below:
CREATE TABLE movierating(id INT, movieid INT, rating INT, time string);
CREATE TABLE movierating(id INT, movieid INT, rating INT, time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' stored as textfile;
I tried the load queries below:
load data local inpath '/tmp/Movie-rating.txt' into table movierating;
load data local inpath '/tmp/Movie-rating.txt' OVERWRITE into table movierating;
Data in the 'Movie-rating.txt' file (the delimiter is a tab):
1 123 3 881250949
2 125 4 881250123

For tab-delimited data, use '\t' as the field delimiter:
CREATE TABLE movierating(id int,movieid int,rating int,time string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
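Note that changing the table definition alone won't fix rows that were already loaded; drop the table, recreate it with the '\t' delimiter as above, and reload the file (same load statement as in the question), e.g.:
DROP TABLE IF EXISTS movierating;
-- recreate with the definition above, then:
load data local inpath '/tmp/Movie-rating.txt' OVERWRITE into table movierating;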

Related

Hive is not handling integer values properly when loading text data into table

I was loading some text data containing int columns into Apache Hive. It was storing NULL values in unexpected places, so I ran some tests:
create table testdata (c1 INT, c2 FLOAT) row format delimited fields terminated by ',' stored as textfile;
load data local inpath "testdata.csv" overwrite into table testdata;
select * from testdata;
testdata.csv contains this data:
1,1.0
1, 1.0
1 ,1.0
1 , 1.0
As you can see, the dataset contains some extra whitespace around the numbers. This causes Hive to store NULL values in the integer column, while the float values are parsed correctly.
SELECT query output:
1 1.0
NULL 1.0
NULL 1.0
NULL 1.0
Why is this happening, and how do I handle these cases correctly?
You cannot do it in one step.
First load the data as strings into a staging table, then load the final table from the staging table, removing the whitespace.
Create and load the staging table like below (testdata keeps its original int/float schema):
create table stgtestdata (c1 string, c2 string) row format delimited fields terminated by ',' stored as textfile;
load data local inpath "testdata.csv" overwrite into table stgtestdata;
Use INSERT to load the final table, trimming the whitespace and casting the values like below:
insert overwrite table testdata
select
cast(trim(c1) as int) as c1,
cast(trim(c2) as float) as c2
from stgtestdata;
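For what it's worth, the likely cause (my understanding, not stated in the thread): Hive's lazy integer parser rejects surrounding whitespace and yields NULL, while floats go through Java's Float.parseFloat, which ignores leading and trailing whitespace. After the INSERT above, a quick check should show clean values:
select * from testdata;
-- expected: four rows of 1 and 1.0, with no NULLs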

Inserting data into partition table: Error while processing statement: org.apache.hadoop.hive.ql.exec.mr.MapRedTask

So I am new to this.
I created a partitioned table and was trying to insert data into it.
This is my main table:
CREATE TABLE test1(
FIPS INT, Admin2 STRING, Province_State STRING, Country_Region STRING, Last_Update TIMESTAMP, Lat FLOAT, Long_ FLOAT, Confirmed INT, Deaths INT, Recovered INT, Active INT, Combined_Key STRING, Incident_Rate FLOAT, Case_Fatality_Ratio FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
This is the partitioned table:
create table province_state_part(country_region string,
confirmed int, deaths int)
PARTITIONED BY(province_state string)
row format delimited fields terminated by ',' lines terminated by '\n';
Inserting into the partitioned table from the main table:
INSERT into TABLE province_state_part PARTITION(province_state)
SELECT country_region, confirmed, deaths, province_state
FROM test1;
But I get this error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
What is this, and how do I solve it?
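One common first thing to check (an assumption here, since return code 2 only reports that the underlying MapReduce job failed; the job logs would show the real error): this is a dynamic partition insert, because the province_state value comes from the SELECT, and that requires dynamic partitioning to be enabled in nonstrict mode:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- then retry the insert:
INSERT into TABLE province_state_part PARTITION(province_state)
SELECT country_region, confirmed, deaths, province_state
FROM test1;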

Error loading CSV into Hive table

I'm trying to load a tab-delimited file into a table in Hive, and I want to skip the first row because it contains column names. I'm trying to run the code below, but I'm getting the error shown underneath. Does anyone see what the issue is?
Code:
set hive.exec.compress.output=false;
set hive.mapred.mode=nonstrict;
-- region to state mapping
DROP TABLE IF EXISTS StateRegion;
CREATE TEMPORARY TABLE StateRegion (Zip_Code int,
Place_Name string,
State string,
State_Abbreviate string,
County string,
Latitude float,
Longitude float,
ZIP_CD int,
District_NM string,
Region_NM string)
row format delimited fields terminated by '\t'
tblproperties("skip.header.line.count"="1");
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'StateRegion'
OVERWRITE INTO TABLE StateRegion;
--test Export
INSERT OVERWRITE LOCAL DIRECTORY './StateRegionTest/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from StateRegion;
Error:
FAILED: ParseException line 2:0 cannot recognize input near 'STORED' 'AS' 'TEXTFILE'
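The ParseException points at the likely culprit: the semicolon after tblproperties(...) terminates the CREATE statement early, leaving STORED AS TEXTFILE; to be parsed as a statement of its own. TBLPROPERTIES also belongs after STORED AS in Hive DDL, so a corrected CREATE (same columns, reordered clauses) would look like:
CREATE TEMPORARY TABLE StateRegion (Zip_Code int,
Place_Name string,
State string,
State_Abbreviate string,
County string,
Latitude float,
Longitude float,
ZIP_CD int,
District_NM string,
Region_NM string)
row format delimited fields terminated by '\t'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");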

Hive: Partitioning by part of integer column

I want to create an external Hive table, partitioned by record type and date (year, month, day). One complication is that the date in my data files is a single integer value, yyyymmddhhmmss, instead of the required date format yyyy-mm-dd hh:mm:ss.
Can I specify three new partition columns based on just a single data value? Something like the example below (which doesn't work):
create external table cdrs (
record_id int,
record_detail tinyint,
datetime_start int
)
partitioned by (record_type int, createyear=datetime_start(0,3) int, createmonth=datetime_start(4,5) int, createday=datetime_start(6,7) int)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
If you want to be able to use MSCK REPAIR TABLE to add the partitions for you based on the directory structure, you should use the following convention:
The nesting of the directories should match the order of the partition columns.
Each directory name should be {partition column name}={value}.
If you intend to add the partitions manually, then the structure has no meaning; any set of values can be coupled with any directory, e.g.:
alter table cdrs
add if not exists partition (record_type='TYP123', createdate=date '2017-03-22')
location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';
Assuming directory structure -
.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start int
)
partitioned by (record_type int,createyear int, createmonth tinyint, createday tinyint)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
Assuming directory structure -
.../sbx_unleashed.db/record_type=.../createdate=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start int
)
partitioned by (record_type int,createdate date)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
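The above covers the directory layout, but not how to populate the derived partition columns from the yyyymmddhhmmss integer. As a sketch (cdrs_staging is a hypothetical non-partitioned table holding the raw rows, including record_type), a dynamic partition insert can split the value with substr:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table cdrs partition (record_type, createyear, createmonth, createday)
select
record_id,
record_detail,
datetime_start,
record_type,
cast(substr(cast(datetime_start as string), 1, 4) as int) as createyear,      -- yyyy
cast(substr(cast(datetime_start as string), 5, 2) as tinyint) as createmonth, -- mm
cast(substr(cast(datetime_start as string), 7, 2) as tinyint) as createday    -- dd
from cdrs_staging;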

Update column in HIVE

I have a table, created from PHP, which is in this format:
CREATE EXTERNAL TABLE IF NOT EXISTS {$tableName} (fileContent VARCHAR(250), description VARCHAR(250), dimension DOUBLE, fileName VARCHAR(250)) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/var/www/ASOIS_Proiect/metadata/'
In one situation, I want to update only the description field if a row with fileName='a' and size='12' already exists in the database.
Any ideas, please? I tried updating the file created for the insert, using the LOAD command with the OVERWRITE flag, but it is not working.
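Hive can only UPDATE transactional (ACID) tables, and an external TEXTFILE table like this one cannot be updated in place; the usual workaround is to rewrite the table with INSERT OVERWRITE plus a CASE expression. A minimal sketch, assuming the table is named mytable, 'size' refers to the dimension column, and 'new description' stands in for the real value (all three are placeholders):
INSERT OVERWRITE TABLE mytable
SELECT fileContent,
CASE WHEN fileName = 'a' AND dimension = 12
THEN 'new description' -- placeholder
ELSE description END AS description,
dimension,
fileName
FROM mytable;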