How to create external table on parquet file - hive

I have a Parquet file on GCP storage. The file was converted from simple JSON like {"id":1,"name":"John"}.
Could you help me write the correct script? Is it possible to do that without a schema?
create external table test (
id string,
name string
)
row format delimited
fields terminated by '\;'
stored as ?????
location '??????'
tblproperties ('skip.header.line.count'='1');

Hive, like SQL databases, requires a schema when you create a table, so you cannot create one in HQL without a schema (unlike NoSQL systems such as HBase). I advise you to use a Hive version >= 0.14, which makes this easier:
CREATE TABLE table_name (
string1 string,
string2 string,
int1 int,
boolean1 boolean,
long1 bigint,
float1 float,
double1 double,
inner_record1 struct<int_in_inner_record1:int, string_in_inner_record1:string>,
enum1 string,
array1 array<string>,
map1 map<string,string>,
union1 uniontype<float, boolean, string>,
fixed1 binary,
null1 void,
unionnullint int,
bytes1 binary)
PARTITIONED BY (ds string);
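For the Parquet file in the question specifically, a minimal sketch could look like the following. Hive supports STORED AS PARQUET natively since 0.13, so no ROW FORMAT clause or skip.header.line.count property is needed; the gs:// path is a placeholder for your actual bucket (reading gs:// paths assumes your cluster has the GCS connector, e.g. on Dataproc), and the id type should be adjusted to match the Parquet schema:
CREATE EXTERNAL TABLE test (
id bigint,
name string
)
STORED AS PARQUET
LOCATION 'gs://your-bucket/path/to/parquet/';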


impala CREATE EXTERNAL TABLE and remove double quotes

I have data in a CSV, for example:
"Female","44","0","0","Yes","Govt_job","Urban","103.59","32.7","formerly smoked"
I put it on HDFS with hdfs dfs -put,
and now I want to create an external table from it in Impala (not in Hive).
Is there an option to get rid of the double quotes?
This is what I run in impala-shell:
CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext
( `gender` STRING,`age` STRING,`hypertension` STRING,`heart_disease` STRING,`ever_married` STRING,`work_type` STRING,`Residence_type` STRING,`avg_glucose_level` STRING,`bmi` STRING,`smoking_status` STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION "/user/test/tmp/test1"
Update 28.11
I managed to do it by creating the external table and then creating a VIEW as a SELECT that applies CASE WHEN / concat() to each column.
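A sketch of that kind of cleanup view, using regexp_replace to strip the leading and trailing quotes instead of CASE WHEN / concat() (the view name is illustrative; repeat the expression for each column of the table above):
CREATE VIEW test_test.test1_clean AS
SELECT
regexp_replace(gender, '^"|"$', '') AS gender,
regexp_replace(age, '^"|"$', '') AS age,
-- ... and so on for the remaining columns ...
regexp_replace(smoking_status, '^"|"$', '') AS smoking_status
FROM test_test.test1_ext;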
Impala uses the Hive metastore, so anything created in Hive is available from Impala after issuing INVALIDATE METADATA dbname.tablename. However, to remove the quotes you need to use the Hive SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde', and this is not accessible from Impala. My suggestion would be to do the following:
Create the external table in Hive
CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext
( gender STRING, age STRING, hypertension STRING, heart_disease STRING, ever_married STRING, work_type STRING, Residence_type STRING, avg_glucose_level STRING, bmi STRING, smoking_status STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
"separatorChar" = ",",
"quoteChar" = """
)
STORED AS TEXTFILE
LOCATION "/user/test/tmp/test1"
Create a managed table in Hive using CTAS
CREATE TABLE mytable AS SELECT * FROM test_test.test1_ext;
Make it available in Impala
INVALIDATE METADATA db.mytable;
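After that, a quick check from impala-shell should show the values without the surrounding quotes (table name as in the CTAS above):
SELECT * FROM mytable LIMIT 5;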

Hive and HBase table for hypothesis

I have an IBM cloud instance where I have Hive/HBase. I just created a table in Hive, and I also loaded some data into it from a CSV file.
My CSV file contains information about Google Play Store apps.
My commands for creating the table and uploading data to it are the following:
hive> create table if not exists app_desc (name string,
category string, rating int,
reviews int, installs string,
type string, price int,
content string, genres string,
last_update string, current_ver string,
android_ver string)
row format delimited fields terminated by ',';
hive> load data local inpath '/home/uamibm130/googleplaystore.csv' into table app_desc;
OK, it works correctly, and using a SELECT I obtain the data correctly.
Now what I want to do is create an HBase table; my problem is that I don't know how to do it correctly.
First of all I create an HBase table -> create google_db_, google_data, info_data
Now I try to create an external table using this Hive command, but what I get is an error that my table is not found.
This is the command I am using for the creation of the external Hive table.
create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_");
I don't know the correct way to create an HBase table based on a Hive schema so that my .csv data is loaded correctly.
Any idea? I am new to this.
Thanks!
Try with the below create table statement in HBase.
Create HBase table:
hbase(main):001:0>create 'google_db_','google_data','info_data'
Create Hive external table on HBase (note that the hbase.columns.mapping string must not contain whitespace or newlines, since these are interpreted as part of the column names):
hive> create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,google_data:category,google_data:rating,info_data:reviews,info_data:installs,info_data:type,info_data:price,info_data:content,info_data:genres,info_data:last_update,info_data:current_ver,info_data:android_ver")
tblproperties ("hbase.table.name" = "google_db_", "hbase.mapred.output.outputtable" = "google_db_");
Then insert data into the Hive-HBase table (uamibm130_hbase_google) from the Hive table (app_desc).
Insert data into the Hive-HBase table:
hive> insert into table uamibm130_hbase_google select * from app_desc;
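To verify the mapping end to end, a quick query of the HBase-backed table from Hive should return the inserted rows (the LIMIT is arbitrary):
hive> select * from uamibm130_hbase_google limit 5;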

Can't query Hive external ORC format table?

My external table, which is stored as ORC, was created properly, but I am not able to query it; it returns:
"Failed with exception java.io.IOException:java.lang.RuntimeException:
serious problem"
My create command is:
CREATE EXTERNAL TABLE `30534_events_orc` ( eventId STRING, event STRING, entityType STRING, entityId STRING, targetEntityType STRING, targetEntityId STRING, properties STRING, eventTime STRING, creationTime STRING )
STORED AS ORC
LOCATION '/30534_data'
TBLPROPERTIES ("orc.compress"="SNAPPY");

Hive Json SerDE for ORC or RC Format

Is it possible to use a JSON SerDe with the RC or ORC file formats? I am trying to insert into a Hive table with file format ORC and store it on an Azure blob as serialized JSON.
Apparently not:
insert overwrite local directory '/home/cloudera/local/mytable'
stored as orc
select '{"mycol":123,"mystring":"Hello"}'
;
create external table verify_data (rec string)
stored as orc
location 'file:///home/cloudera/local/mytable'
;
select * from verify_data
;
rec
{"mycol":123,"mystring":"Hello"}
create external table mytable (myint int,mystring string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as orc
location 'file:///home/cloudera/local/mytable'
;
myint mystring
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text
JsonSerDe.java:
...
import org.apache.hadoop.io.Text;
...
@Override
public Object deserialize(Writable blob) throws SerDeException {
Text t = (Text) blob;
...
You can do this with some sort of conversion step, for example a bucketing step that produces ORC files in a target directory and then mounts a Hive table with the same schema after bucketing, like below.
CREATE EXTERNAL TABLE my_fact_orc
(
some_id STRING,
mycol INT,
mystring STRING
)
PARTITIONED BY (dt string)
CLUSTERED BY (some_id) INTO 64 BUCKETS
STORED AS ORC
LOCATION 's3://dev/my_fact_orc'
TBLPROPERTIES ('orc.compress'='SNAPPY');
ALTER TABLE my_fact_orc ADD IF NOT EXISTS PARTITION (dt='2017-09-07') LOCATION 's3://dev/my_fact_orc/dt=2017-09-07';
ALTER TABLE my_fact_orc PARTITION (dt='2017-09-07') SET FILEFORMAT ORC;
SELECT * FROM my_fact_orc WHERE dt='2017-09-07' LIMIT 5;
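The conversion step itself can be an ordinary INSERT from a JSON-backed staging table into the ORC table. A sketch under the schema above; my_fact_stage is a hypothetical source table with matching columns:
set hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE my_fact_orc PARTITION (dt='2017-09-07')
SELECT some_id, mycol, mystring FROM my_fact_stage; -- my_fact_stage is hypothetical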

How to load data in partitioned table automatically

I created an external, partitioned table as below:
CREATE EXTERNAL TABLE IF NOT EXISTS dividends ( ymd STRING, dividend FLOAT )
PARTITIONED BY (exchange STRING, symbol STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
I want to load data in such a way that for each unique partition value a new partition is created automatically and the data goes into it. Is there any way to do this?
Sample data below
NASDAQ,AMTD,2006-01-25,6.0
NASDAQ,AHGP,2009-11-09,0.44
NASDAQ,AHGP,2009-08-10,0.428
NASDAQ,AHGP,2009-05-11,0.415
NASDAQ,AHGP,2009-02-10,0.403
NASDAQ,AHGP,2008-11-07,0.39
NASDAQ,AHGP,2008-08-08,0.353
NASDAQ,AHGP,2008-05-09,0.288
NASDAQ,AHGP,2008-02-08,0.288
NASDAQ,AHGP,2007-11-07,0.265
NASDAQ,AHGP,2007-08-08,0.265
NASDAQ,AHGP,2007-05-09,0.25
NASDAQ,AHGP,2007-02-07,0.25
NASDAQ,AHGP,2006-11-07,0.215
NASDAQ,AHGP,2006-08-09,0.215
NASDAQ,ALEX,2009-11-03,0.315
NASDAQ,ALEX,2009-08-04,0.315
NASDAQ,ALEX,2009-05-12,0.315
NASDAQ,ALEX,2009-02-11,0.315
NASDAQ,ALEX,2008-11-04,0.315
NASDAQ,AFCE,2005-06-06,12.0
NASDAQ,ASRVP,2009-12-28,0.528
NASDAQ,ASRVP,2009-09-25,0.528
NASDAQ,ASRVP,2009-06-25,0.528
NASDAQ,ASRVP,2009-03-26,0.528
NASDAQ,ASRVP,2008-12-26,0.528
NASDAQ,ASRVP,2008-09-25,0.528
NASDAQ,ASRVP,2008-06-25,0.528
I was searching for this myself. These were my steps: I created a staging table and loaded the CSV file into it, and then created and loaded the final table using dynamic partitioning.
CREATE EXTERNAL TABLE stocks ( exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
LOCATION '/user/hduser/stocks';
CREATE EXTERNAL TABLE IF NOT EXISTS dividends_stage (
exchange STRING,
symbol STRING,
ymd STRING,
dividend FLOAT )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser/div_stage';
hadoop fs -mv /user/hduser/dividends.csv /user/hduser/div_stage
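Equivalently, you could let Hive move the file into the staging table's location with LOAD DATA (INPATH without LOCAL moves the HDFS file rather than copying it):
hive> LOAD DATA INPATH '/user/hduser/dividends.csv' INTO TABLE dividends_stage;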
CREATE EXTERNAL TABLE IF NOT EXISTS dividends (
ymd STRING,
dividend FLOAT )
PARTITIONED BY (exchange STRING, symbol STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
Dynamic partitioning has to be enabled first, and nonstrict mode is required because both partition columns are dynamic:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE dividends PARTITION (exchange, symbol)
SELECT ymd, dividend, exchange, symbol FROM dividends_stage;
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM dividends;
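To confirm that a partition was created for each unique (exchange, symbol) pair, you can also list them:
SHOW PARTITIONS dividends;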
Hope this helps and it's not too late.