Problem statement: load Parquet data from AWS S3 into a Snowflake table.
The command I am using:
COPY INTO schema.test_table from
(select $1:ID::INTEGER, $1:DATE::TIMESTAMP, $1:TYPE::VARCHAR FROM
@s3_external_stage/folder/part-00000-c000.snappy.parquet)
file_format = (type=parquet);
As a result, I am getting NULL values.
I queried the Parquet data directly in S3, and it does have values.
Not sure what I am missing.
Also, is there any way to load data from multiple Parquet files into the table recursively?
For example:
s3_folder/
|
----fileabc.parquet
----file_xyz.parquet
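For the recursive part: COPY INTO accepts a PATTERN option, so you can point the SELECT at the folder and let Snowflake match every Parquet file under it. A minimal sketch, assuming @s3_external_stage points at the bucket containing the folder above:
COPY INTO schema.test_table FROM
(SELECT $1:ID::INTEGER, $1:DATE::TIMESTAMP, $1:TYPE::VARCHAR
FROM @s3_external_stage/s3_folder/)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*\.parquet';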
I do not understand what paths Trino needs in order to create a table from existing files. I use S3 + a Hive metastore.
My JSON file:
{"a":1,"b":2,"snapshot":"partitionA"}
Create table command:
create table trino.partitioned_jsons (a INTEGER, b INTEGER, snapshot varchar)
with (external_location = 's3a://bucket/test/partitioned_jsons/*', format = 'JSON', partitioned_by = ARRAY['snapshot'])
What I have tried:
Store JSON file in s3://bucket/test/partitioned_jsons/partitionA/file.json
Store JSON file in s3://bucket/test/partitioned_jsons/snapshot=partitionA/file.json
Store JSON file in s3://bucket/test/partitioned_jsons/snapshot/partitionA.json
But all of them return just an empty table.
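For reference, the layout that usually works with the Hive connector: external_location points at the table root (no wildcard), the files live under snapshot=<value>/ subdirectories (your second attempt), and the partitions are then registered in the metastore with the connector's sync_partition_metadata procedure. A sketch, assuming the catalog is named hive and the schema test:
create table hive.test.partitioned_jsons (a INTEGER, b INTEGER, snapshot varchar)
with (external_location = 's3a://bucket/test/partitioned_jsons',
format = 'JSON',
partitioned_by = ARRAY['snapshot']);
-- register the snapshot=partitionA/ directories that already exist on S3
CALL hive.system.sync_partition_metadata('test', 'partitioned_jsons', 'ADD');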
I needed to create huge test data in a Hive table. I tried the following commands, but they only insert data for one partition at a time.
Connect to beeline:
beeline --force=true -u 'jdbc:hive2://<host>:<port>/<hive database name>;ssl=true;user=<username>;password=<pw>'
Create the partitioned table:
CREATE TABLE p101(
Name string,
Age string)
PARTITIONED BY(fi string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
I created an ins.csv file with data and copied it to an HDFS location; its contents are as follows:
Name,Age
aaa,33
bbb,22
ccc,55
Then I tried to load the same file for multiple partition IDs with the following command:
LOAD DATA INPATH 'hdfs_path/ins.csv' INTO TABLE p101 PARTITION(fi=1,fi=2,fi=3,fi=4,fi=5);
but it loads records only for partition fi=5.
You can only specify one partition in each LOAD DATA or INSERT statement.
What you can do in order to populate different partitions is add the partition column to your CSV file, like this:
Name,Age,fi
aaa,33,1
bbb,22,2
ccc,55,3
Hive will automatically recognize the last column as the partition:
LOAD DATA INPATH 'hdfs_path/ins.csv' INTO TABLE tmp.p101;
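If LOAD DATA does not handle the partition column automatically on your Hive version, a common alternative is to load the CSV into an unpartitioned staging table and fan the rows out with a dynamic-partition insert. A sketch, assuming a staging table named p101_staging:
CREATE TABLE p101_staging(Name string, Age string, fi string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

LOAD DATA INPATH 'hdfs_path/ins.csv' INTO TABLE p101_staging;

-- let Hive create the partitions from the data itself
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE p101 PARTITION(fi)
SELECT Name, Age, fi FROM p101_staging;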
I created a Hive managed table in ORC and Parquet format. While getting the values from the table with "SELECT * FROM table_name", I am getting the error below:
java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)
Check the DDL of the table. It seems to be a bucketed table, but the underlying folders/files do not match the bucket layout in the table definition.
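A quick way to check (my_table is a placeholder): compare the declared bucket count with the files actually present in the table directory:
DESCRIBE FORMATTED my_table;
-- look at "Num Buckets" in the output and compare it with the
-- number of bucket files per partition under the table's HDFS/S3 path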
Overwrite queries are failing on external tables whose data is located on S3. I am using Hive 1.2.
Steps to reproduce:
1) Create a file with the 3 rows below and place it at some location in S3:
a,b,c
x,y,z
c,d,e
2) Create an external table:
create external table test(col1 string,col2 string,col3 string)
row format delimited fields terminated by ',' location '<S3LocationOfAboveFile>';
3) Do an insert overwrite on this table:
insert overwrite table test select * from test order by col1;
I get the error below, and I see that the S3 file has been deleted.
Job Submission failed with exception 'java.io.FileNotFoundException
(No such file or directory:<S3 location> )
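For what it's worth, a common workaround on older Hive versions is to stage the result in a separate table first, so the overwrite does not delete the source files before they are read. A sketch; tmp_test is a placeholder name:
-- stage the sorted result first
CREATE TABLE tmp_test AS
SELECT * FROM test ORDER BY col1;

-- then overwrite the external table from the staged copy
INSERT OVERWRITE TABLE test SELECT * FROM tmp_test;

DROP TABLE tmp_test;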
I'm trying to generate some Parquet files with Hive. To accomplish this, I loaded a regular Hive table from some .tbl files, through this command in Hive:
CREATE TABLE REGION (
R_REGIONKEY BIGINT,
R_NAME STRING,
R_COMMENT STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
location '/tmp/tpch-generate';
After this I just execute these 2 lines:
create table parquet_region LIKE region STORED AS PARQUET;
insert into parquet_region select * from region;
But when I check the output generated in HDFS, I don't find any .parquet files; instead I find file names like 0000_0 to 0000_21, and the sum of their sizes is much bigger than the original .tbl file.
What am I doing wrong?
The insert statement doesn't create files with a .parquet extension, but these are the Parquet files.
You can use DESCRIBE FORMATTED <table> to show table information.
hive> DESCRIBE FORMATTED <table_name>;
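In that output, the storage description tells you whether the table is really Parquet: the SerDe and input/output format classes should be the Parquet ones, roughly like this:
SerDe Library:  org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:    org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat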
Additional note: you can also create a new table from the source table using the query below:
CREATE TABLE new_test STORED AS PARQUET AS SELECT * FROM source_table;
It will create the new table in Parquet format and copy the structure as well as the data.