I am new to hadoop hive. I am using open source hadoop 2.7.1 hive 1.2.2. It is installed on ubuntu a single node cluster. I have 106 rows and 30 columns data in csv file. I import it into hive table using following code:
CREATE TABLE clinicaldatabc (comp_tcga_id String, gender String, age_inti_diag int, ER_status String, PR_status String, HER2_final_status String, Tumor String, Tumor_T1_code String, Node String, Node_coded String, Metastasis String, Metastasis_coded String, AJCC_Stage String, Converted_stage String, Survival_dt_from String, Vital_Status String, d_to_date_of_last_contact int, d_to_Day_of_Death int, OS_event int,OS_time int, PAM50_mRNA String, SigClust_unsupervised_mRNA int, SigClust_intrinsic_mRNA int, miRNA_clusters int, methylation_clusters int,RPPA_clusters int, CN_clusters int, integrated_clusters_with_PAM50 int, integrated_cluster_no_exp int, integrated_clusters_unsup_exp int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Then I got null column name:
first half of returns
second half of returns
Please help me how to solve it. Thank you in advance!
Possible duplicate of NULL column names in Hive query result
The first thing to note here is that the NULL values occur in columns that are not of type string
Have a refer
CREATE EXTERNAL TABLE IF NOT EXISTS ejREGandTEST(
DBN STRING,
School_name STRING,
Year_of_SHST INT,
Grade_level INT,
Enrollment INT,
Number_of_registered INT,
Number_students_SHSAT INT)
row format delimited fields terminated by ','
location "/user/ebin/kaggleData/csv"
TBLPROPERTIES("skip.header.line.count"="1");
Related
So i am new to this.
I created a partition table and was trying to insert data into it
this is my main table >>>>
CREATE TABLE test1(
FIPS INT, Admin2 STRING, Province_State STRING, Country_Region STRING, Last_Update TIMESTAMP, Lat FLOAT, Long_ FLOAT, Confirmed INT, Deaths INT, Recovered INT, Active INT, Combined_Key STRING, Incident_Rate FLOAT, Case_Fatality_Ratio FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
partition table >>>
create table province_state_part(country_region string,
confirmed int, deaths int)
PARTITIONED BY(province_state string)
row format delimited fields terminated by ',' lines terminated by '\n'
inserting into partition table from main >>>>
INSERT into TABLE province_state_part PARTITION(province_state)
SELECT country_region, confirmed, deaths, province_state
FROM test1;
but i get this error
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
what is this and how do i solve it ?
I am using virtual box to run the cloudera v5.4.2-0.
I followed the instructions of an online course to create an empty table using the following HIVEQL:
create table sales_all_years (RowID smallint, OrderID int, OrderDate date, OrderMonthYear date, Quantity int, Quote float, DiscountPct float, Rate float, SaleAmount float, CustomerName string, CompanyName string, Sector string, Industry string, City string, ZipCode string, State string, Region string, ProjectCompleteDate date, DaystoComplete int, ProductKey string, ProductCategory string, ProductSubCategory string, Consultant string, Manager string, HourlyWage float, RowCount int, WageMargin float)
partitioned by (yr int) -- partitioning
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
stored as textfile;
I experimented a little and realized that partitioning the table is causing the error but I could not figure out the reason.
P.S. I am a little new to this. Sorry if the format is of my question is a little off
The problem might be with com.bizo.hive.serde.csv.CSVSerde. Try to create table with different SerDe, for instance
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE
or just textfile with the corresponding separator
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
I have an IBM cloud where I have Hive/Hbase, I just create a "table" on Hive and I also load some data from a csv file.
My csv file contains information from google play store apps.
My commands for creating and upload data to my table are the following ones:
hive> create table if not exists app_desc (name string,
category string, rating int,
reviews int, installs string,
type string, price int,
content string, genres string,
last_update string, current_ver string,
android_ver string)
row format delimited fields terminated by ',';
hive > load data local inpath '/home/uamibm130/googleplaystore.csv' into table app_desc;
Ok, It works correctly and using a Select I obtain the data correctly.
Now what I want to do is to create a HBASE table, my problem is that I don't know how to do it correctly.
First of all I create a Hbase Db -> create google_db_ , google_data, info_data
Now I try to create an external table using this hive command, but what I am getting is an error that my table is not found.
This is the command I am using for the creation of the external hive table.
create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_");
I don't know the correct way for the creation of Hbase table based on an Hive schema, for uploading correctly my .csv data.
Any idea ? I am new on it.
Thanks!
Try with below create table statement in HBase,
Create Hbasetable:
hbase(main):001:0>create 'google_db_','google_data','info_data'
Create Hive External table on Hbase:
hive> create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_",
"hbase.mapred.output.outputtable" = "google_db_");
Then insert data into Hive-Hbase table(uamibm130_hbase_google) from Hive table(app_desc).
Insert data into Hive-Hbase table:
Hive> insert into table uamibm130_hbase_google select * from app_desc;
I was creating table in hive with
create table stores(
storeid integer,
area string,
city string,
state string,
country string)
Try using INT in place of Integer.
I have copied and pasted your ddl statement above into my hive cli and it worked perfectly.
You need to check your hive installation
INTEGER can be used from Hive version 2.2.0 onwards. For earlier versions, INT should be used.
Try using this DDL:
create table stores(
storeid int,
area string,
city string,
state string,
country string);
References: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-NumericTypes
https://issues.apache.org/jira/browse/HIVE-14950
I have a very basic question which is: How can I add a very simple table to Hive. My table is saved in a text file (.txt) which is saved in HDFS. I have tried to create an external table in Hive which points out this file but when I run an SQL query (select * from table_name) I don't get any output.
Here is an example code:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
LOCATION 'hdfs:///KibTEst/Data.txt';
KibTEst/Data.txt is the path of the text file in HDFS.
The rows in the table are seperated by carriage return, and the columns are seperated by commas.
Thanks for your help!
You just need to create an external table pointing to your file
location in hdfs and with delimiter properties as below:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs:///KibTEst/Data.txt';
You need to run select query(because file is already in HDFS and external table directly fetches data from it when location is specified in create statement). So you test using below select statement:
SELECT * FROM Data;
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ‘,’
stored as textfile
LOCATION 'Your hdfs location for external table';
If data in HDFS then use :
LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE tablename
The use select * from table_name
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION '/Data';
Then load file into table
LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
Then
select * from Data;
I hope, below inputs will try to answer the question asked by #mshabeen.
There are different ways that you can use to load data in Hive table that is created as external table.
While creating the Hive external table you can either use the LOCATION option and specify the HDFS, S3 (in case of AWS) or File location, from where you want to load data OR you can use LOAD DATA INPATH option to load data from HDFS, S3 or File after creating the Hive table.
Alternatively you can also use ALTER TABLE command to load data in the Hive partitions.
Below are some details
Using LOCATION - Used while creating the Hive table. In this case data is already loaded and available in Hive table.
**LOAD DATA INPATH** option - This Hive command can be used to load data from specified location. Point to remember here is, the data will get MOVED from input path to Hive warehouse path.
Example -
LOAD DATA INPATH 'hdfs://cluster-ip/path/to/data/location/'
Using ALTER TABLE command - Mostly this is used to add data from other locations into the Hive partitions. In this case it is required that all partitions are already defined and the values for the partitions are already known. In case of dynamic partitions this command is not required.
Example -
ALTER TABLE table_name ADD PARTITION (date_col='2018-02-21') LOCATION 'hdfs/path/to/location/'
The above code will map the partition to the specified data location (in this case HDFS). However, the data will NOT MOVED to Hive internal warehouse location.
Additional details are available here