How to create a Hive table using the Ozone object store.
Before creating a Hive table, we first need to create a volume and a bucket in Ozone.
Step 1: Create a volume named vol1 in Apache Ozone.
# ozone sh volume create /vol1
Step 2: Create a bucket named bucket1 under vol1.
# ozone sh bucket create /vol1/bucket1
Step 3: Log in to the Beeline shell.
Step 4: Create the Hive database.
CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;
Step 5: Create the Hive table.
CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(
`id` bigint,
`name` string,
`age` smallint)
STORED AS parquet
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
Note: Replace om.host.example.com with your Ozone Manager hostname.
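As a quick smoke test that the table reads and writes through the Ozone bucket, you can insert a sample row and read it back from Beeline (the values below are illustrative only):

```sql
-- Hypothetical sample data; confirms the o3fs location is writable and readable.
INSERT INTO employee VALUES (1, 'alice', 30);
SELECT * FROM employee;
```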
Reference:
https://community.cloudera.com/t5/Community-Articles/Spark-Hive-Ozone-Integration-in-CDP/ta-p/323346
Related
I have a Lambda function that loads data into an AWS Glue table a few times a day and then, at the end, creates an external table in Snowflake. The function ran fine for some time, but it has now started returning this error intermittently:
Number of auto-ingest pipes on location <bucket name> cannot be greater than allowed limit: 50000
The CREATE TABLE SQL looks like the following:
create or replace external table table_name(
row_id STRING as (value:row_id::string),
created TIMESTAMP as (value:created::timestamp)
...
) with location = @conformedstage/table_name/ file_format = (type = parquet);
I have googled this issue and almost all the answers refer to Snowpipe; however, my setup doesn't use Snowpipe at all.
Any ideas are appreciated.
When an external table is created with AUTO_REFRESH enabled, Snowflake creates an internal pipe to process the bucket events, and you probably have many external tables using the same bucket.
Can you try setting AUTO_REFRESH = FALSE?
create or replace external table table_name(
row_id STRING as (value:row_id::string),
created TIMESTAMP as (value:created::timestamp)
...
) with location = @conformedstage/table_name/ file_format = (type = parquet) auto_refresh = false;
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html
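For external tables that already exist, the setting can also be changed in place rather than recreating the table. A minimal sketch, assuming a table named table_name:

```sql
-- Disable the internal auto-ingest pipe for an existing external table.
ALTER EXTERNAL TABLE table_name SET AUTO_REFRESH = FALSE;

-- With auto-refresh off, metadata can still be refreshed manually when needed.
ALTER EXTERNAL TABLE table_name REFRESH;
```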
We have a scenario where the source CSV files are isolated by customer, i.e., each customer has its own container in Azure Storage.
When creating an external table in Azure Synapse, is it possible to pass the container name as a parameter, so that we don't need multiple external tables in the Synapse database?
CREATE EXTERNAL DATA SOURCE AzureBlobStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://<**container100**>@<accountname>.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
);
CREATE EXTERNAL TABLE [dbo].[res1_Data] (
[ID] INT,
[UniqueId] VARCHAR(50),
[Status] VARCHAR(50) NULL,
[JoinedDate] DATE
)
WITH (LOCATION='<**container2**>/<folder>/<file>.csv',
DATA_SOURCE = AzureBlobStorage,
FILE_FORMAT = CEFormat
);
Unfortunately, you can't use variables within DDL commands. However, you can build the statement dynamically and then execute it with sp_executesql.
More information is available in the sp_executesql documentation.
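A hedged sketch of the dynamic approach, reusing the table definition from the question (the @container variable and the <folder>/<file> placeholders are hypothetical and must be filled in):

```sql
-- Build the CREATE EXTERNAL TABLE statement as a string so the container
-- portion of LOCATION can vary per customer, then execute it dynamically.
DECLARE @container NVARCHAR(100) = N'container2';  -- hypothetical value
DECLARE @sql NVARCHAR(MAX) =
N'CREATE EXTERNAL TABLE [dbo].[res1_Data] (
    [ID] INT,
    [UniqueId] VARCHAR(50),
    [Status] VARCHAR(50) NULL,
    [JoinedDate] DATE
)
WITH (
    LOCATION = ''' + @container + N'/<folder>/<file>.csv'',
    DATA_SOURCE = AzureBlobStorage,
    FILE_FORMAT = CEFormat
);';

EXEC sp_executesql @sql;
```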
I need to rename a partition and set a new location for an external table, all in one statement.
alter table A partition(part='A') rename to partition (part='B');
alter table A partition(part='B') set location 's3://...';
I am trying to see if this can be combined into a single statement:
alter table A partition(part='A') rename to partition(part='B') set location 's3://...';
I have a Hive table named ABC in the XYZ database.
When I run describe formatted XYZ.ABC; from Hue, the output includes:
Table Type: MANAGED_TABLE
Table Parameters: EXTERNAL True
So is this actually an external table or a managed/internal Hive table?
This is treated as an EXTERNAL table: dropping it will keep the underlying HDFS data. The Table Type is shown as MANAGED_TABLE because the EXTERNAL parameter is set to True instead of TRUE, and the comparison in the metastore is case-sensitive.
To fix this metadata, you can run this query:
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Some details:
The table XYZ.ABC must have been created via this kind of query:
hive> CREATE TABLE XYZ.ABC
<additional table definition details>
TBLPROPERTIES (
'EXTERNAL'='True');
Describing this table will give:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: MANAGED_TABLE
:
Table Parameters:
EXTERNAL True
Dropping this table will keep the data referenced in Location in describe output.
hive> drop table XYZ.ABC;
# does not drop table data in HDFS
The Table Type still shows as MANAGED_TABLE, which is confusing.
Setting the value of EXTERNAL to TRUE (uppercase) fixes this.
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Now, doing a describe will show it as expected:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: EXTERNAL_TABLE
:
Table Parameters:
EXTERNAL TRUE
Example -
Let's create a sample MANAGED table:
CREATE TABLE TEST_TBL(abc int, xyz string);
INSERT INTO TABLE test_tbl values(1, 'abc'),(2, 'xyz');
DESCRIBE FORMATTED test_tbl;
Changing the type to EXTERNAL (the wrong way, using True instead of TRUE):
ALTER TABLE test_tbl SET TBLPROPERTIES('EXTERNAL'='True');
The describe output again shows Table Type: MANAGED_TABLE with EXTERNAL True under Table Parameters.
Now let's DROP the table:
DROP TABLE test_tbl;
The result: the table is dropped, but the data on HDFS isn't. This is correct external-table behavior!
If we re-create the table, we can see the data still exists:
CREATE TABLE test_tbl(abc int, xyz string);
SELECT * FROM test_tbl;
Result: the previously inserted rows are returned.
The describe output still wrongly shows MANAGED_TABLE along with EXTERNAL True because of a case-sensitive .equals check in the metastore.
Hive issue JIRA: HIVE-20057
Proposed fix: use a case-insensitive equality check.
I am attempting to create a table in a Hive environment and point it to an external location in S3.
When I try :
create table x (key int, value string) location 's3://...';
it works well.
But when I attempt:
create external table as select x, y, z from alphabet location 's3://...'
it doesn't run. Is there a way to create a table from a select statement and store it at an external location?
You can create a managed table using the select statement and then update the table property to EXTERNAL:
ALTER TABLE <table name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
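Putting that first option together, a minimal sketch (the table name and S3 path are hypothetical, and CTAS-with-LOCATION support may vary by Hive version):

```sql
-- CTAS creates a managed table whose data lands at the given location;
-- flipping the property then makes Hive treat it as external.
CREATE TABLE x_ext
LOCATION 's3://my-bucket/x_ext/'   -- hypothetical path
AS SELECT x, y, z FROM alphabet;

ALTER TABLE x_ext SET TBLPROPERTIES('EXTERNAL'='TRUE');
```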
or
Write the output of the select query to a directory, then create an external table over it (the external table needs explicit column definitions):
INSERT OVERWRITE DIRECTORY '/myDirectory'
SELECT * FROM PARAGRAPH;
CREATE EXTERNAL TABLE <table name> (<column definitions>) LOCATION '/myDirectory';