Hive is not handling integer values properly when loading text data into a table

I was loading text data containing int columns into Apache Hive, and it was storing NULL values in unexpected places. So, I ran some tests:
create table testdata (c1 INT, c2 FLOAT) row format delimited fields terminated by ',' stored as textfile;
load data local inpath "testdata.csv" overwrite into table testdata;
select * from testdata;
testdata.csv contains this data:
1,1.0
1, 1.0
1 ,1.0
1 , 1.0
As you can see, the dataset contains extra whitespace around the numbers. This causes Hive to store NULL in the integer column, while the float column is parsed correctly.
Select query output:
1 1.0
NULL 1.0
NULL 1.0
NULL 1.0
Why is this happening, and how do I handle these cases correctly?

You cannot do it in one step.
First load the data as strings into a staging table, then load the final table from it, trimming the whitespace.
Create and load the staging table like below (testdata keeps the INT/FLOAT schema from the question):
create table stgtestdata (c1 string, c2 string) row format delimited fields terminated by ',' stored as textfile;
load data local inpath "testdata.csv" overwrite into table stgtestdata;
Use INSERT to load the final table, trimming the whitespace and casting the types, like below:
insert overwrite table testdata
select
  cast(trim(c1) as int) as c1,
  cast(trim(c2) as float) as c2
from stgtestdata;
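If you would rather avoid keeping a second typed copy of the data, a view over the staging table gives the same result at query time. A minimal sketch, assuming the staging table above and a hypothetical view name testdata_v:
-- the view trims and casts on the fly, so only the raw strings are stored
create view testdata_v as
select cast(trim(c1) as int) as c1,
       cast(trim(c2) as float) as c2
from stgtestdata;

select * from testdata_v;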

Related

What is the right way to handle string null values in SQL Server's BULK INSERT?

For example, I have a column with type int.
The raw data source has integer values, but the null values, instead of being empty (''), are 'NIL'.
How would I handle those values when trying to Bulk Insert into MSSQL?
My code is
create table test (nid INT);
bulk insert test from #FILEPATH with (format = 'CSV', firstrow = 2);
the first 5 rows of my .csv file look like:
1
2
3
NIL
7
You can replace the NIL with '' (an empty string) directly in your data source file, or insert the data into a staging table and transform it:
BULK INSERT staging_sample_data
FROM '\\data\sample_data.dat';
INSERT INTO [sample_data]
SELECT NULLIF(ColA, 'nil'), NULLIF(ColB, 'nil'),...
Of course, if your field is numeric, the staging table should have a string field. Then you can do as Larnu suggests: TRY_CONVERT(INT, ColA).
Note: if there are default constraints, you may need to check how to keep NULLs.
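Putting the staging table and TRY_CONVERT together, a minimal end-to-end sketch (the staging table name is hypothetical, #FILEPATH is the placeholder from the question, and FORMAT = 'CSV' requires SQL Server 2017+):
-- staging table holds the raw text so nothing fails during the bulk load
create table staging_test (nid varchar(20));

bulk insert staging_test from #FILEPATH with (format = 'CSV', firstrow = 2);

-- NULLIF maps the 'NIL' marker to NULL; TRY_CONVERT returns NULL for any other non-numeric leftovers
insert into test (nid)
select try_convert(int, nullif(nid, 'NIL'))
from staging_test;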

Hive table data load gives NULL values

Select * from movierating gives NULL values as a result.
I have tried the create table queries below:
CREATE TABLE movierating(id INT, movieid INT, rating INT, time string);
CREATE TABLE movierating(id INT, movieid INT, rating INT, time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' stored as textfile;
I tried the load queries below:
load data local inpath '/tmp/Movie-rating.txt' into table movierating;
load data local inpath '/tmp/Movie-rating.txt' OVERWRITE into table movierating;
Data in the 'Movie-rating.txt' file (the delimiter is a tab):
1 123 3 881250949
2 125 4 881250123
For tab-delimited data, use '\t' as the field delimiter:
CREATE TABLE movierating(id int,movieid int,rating int,time string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
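If the table already exists, you may be able to fix the delimiter in place rather than recreating it. A sketch, assuming the table uses the LazySimpleSerDe that ROW FORMAT DELIMITED creates by default:
-- point the existing serde at the tab character; the files are re-read with the new delimiter
alter table movierating set serdeproperties ('field.delim' = '\t', 'serialization.format' = '\t');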

How to handle the embedded commas in hive?

For example, I have a CSV file with three columns:
sno,name,salary
1,latha, 2000
2,Bhavish, Chaturvedi, 3000
How do I load this type of file into Hive? I tried a few posts from Stack Overflow, but it didn't work.
I have created an external table:
create external table test(
id int,
name string,
salary int
)
row format delimited
fields terminated by '\;'
stored as textfile;
and loaded the data into it.
But when I do a select * from the table, I get all NULLs.
I think the CSV file has a header row with column names, so you have to skip the header to avoid the error. Follow these steps:
Step 1: Create the table, e.g.
CREATE TABLE salary (sno INT, name STRING, salary INT)
row format delimited fields terminated BY ',' stored as textfile
tblproperties("skip.header.line.count"="1");
Step 2: Load the CSV file into the table, e.g.
load data local inpath 'file path' into table salary;
Step 3: Test the records
select * from salary;
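Note that skipping the header does not address the embedded comma in a row like 2,Bhavish, Chaturvedi, 3000; that line still splits into four fields. If the producer can quote such fields (2,"Bhavish, Chaturvedi",3000), Hive's OpenCSVSerde handles them. A sketch, assuming quoted input and a hypothetical table name; note this SerDe exposes every column as a string, so cast at query time:
create external table salary_csv (sno string, name string, salary string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ("separatorChar" = ",", "quoteChar" = "\"")
stored as textfile
tblproperties ("skip.header.line.count" = "1");

-- OpenCSVSerde yields strings only, so convert in the query
select cast(sno as int), name, cast(salary as int) from salary_csv;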

Getting Error 10293 while inserting a row into a hive table having an array as one of the fields

I have a hive table created using the following query:
create table arraytbl (id string, model string, cost int, colors array <string>,size array <float>)
row format delimited fields terminated by ',' collection items terminated by '#';
Now, while trying to insert a row:
insert into arraytbl values
("AA","AAA",5600,colors("red","blue","green"),size(5.6,4.3));
I get the following error:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
How can I resolve this issue?
The syntax for entering values into a complex datatype is a bit weird, though that is my personal opinion.
You need a dummy table to insert values into a Hive table with complex datatypes.
insert into arraytbl
select "AA","AAA",5600,
       array("red","blue","green"),
       array(cast(5.6 as float), cast(4.3 as float))
from (select 'a') x;
And this is how it looks after insert.
hive> select * from arraytbl;
OK
AA AAA 5600 ["red","blue","green"] [5.6,4.3]
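The same dummy-table trick extends to several rows at once with UNION ALL. A sketch, assuming the arraytbl schema above (the values are made up):
-- each branch selects one row's worth of literals from the dummy subquery
insert into arraytbl
select "BB","BBB",4200, array("black","white"), array(cast(6.1 as float)) from (select 'a') x
union all
select "CC","CCC",3100, array("gold"), array(cast(4.0 as float), cast(4.7 as float)) from (select 'a') y;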

Queries using temp tables

SELECT 1
FROM geo_locationInfoMajor_tbl
WHERE geo_locationInfoM_taluka IN (SELECT * FROM #temp)
I have created a temp table which gets its values from the front end; using a function, I insert values into the temp table.
The data in the temp table is mixed: it can be integer or varchar.
When I pass only ints or only varchars into the temp table, it is fine,
but if the contents are mixed the query throws an error. How do I deal with this?
Conversion failed when converting the varchar value 'English' to data type int.
This is fine:
#temp
1
This is not:
1
English
How many values do you have in your temp table?
If you want to use the IN clause, the subquery should return only one column, and its values must be comparable with geo_locationInfoM_taluka in geo_locationInfoMajor_tbl.
Try this:
SELECT * FROM geo_locationInfoMajor_tbl
WHERE geo_locationInfoM_taluka IN (SELECT geo_locationInfoM_taluka from #temp)
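If the temp table has to hold mixed values, you can keep the comparison safe by discarding anything that does not convert. A sketch, assuming geo_locationInfoM_taluka is an int column and the temp table's column is named val (a hypothetical name); TRY_CONVERT requires SQL Server 2012+:
SELECT *
FROM geo_locationInfoMajor_tbl
WHERE geo_locationInfoM_taluka IN (
    -- TRY_CONVERT returns NULL for values like 'English', and NULLs never match an IN list
    SELECT TRY_CONVERT(INT, val)
    FROM #temp
    WHERE TRY_CONVERT(INT, val) IS NOT NULL
);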