I am trying to create an external table with TBLPROPERTIES in Hive. The table gets created, but it does not display any rows. Any ideas? Please find the scripts I am using below; thanks in advance for your time and suggestions.
The data is in a nested subfolder: /user/test/test1/test2/samplefile.csv
use dw_raw;
drop table if exists temp_external_tab1;
create external table if not exists temp_external_tab1 (
col1 int,
col2 string,
col3 string,
col4 string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/user/test/test1/'
tblproperties ("hive.input.dir.recursive" = "TRUE",
"hive.mapred.supports.subdirectories" = "TRUE",
"hive.supports.subdirectories" = "TRUE",
"mapred.input.dir.recursive" = "TRUE");
These are not table properties, but global settings.
You should set these using 'set', i.e.:
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
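With those set in the session, queries on the table should pick up files in subdirectories. A minimal sketch using the names from the question:
use dw_raw;
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
-- should now return rows from test2/samplefile.csv
select * from temp_external_tab1 limit 10;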
You've created a table but haven't put any data into it. Since samplefile.csv is already in HDFS, load it without the LOCAL keyword:
hive> LOAD DATA INPATH '/user/test/test1/test2/samplefile.csv'
INTO TABLE temp_external_tab1;
If you are using Ambari, set the following properties in the Hive advanced configuration (custom hive-site.xml):
hive.input.dir.recursive=true
hive.mapred.supports.subdirectories=true
hive.supports.subdirectories=true
mapred.input.dir.recursive=true
Then restart the affected services. Hive will then read all the data recursively.
Related
I have a Lambda function that loads data into an AWS Glue table a few times a day and then, at the end, creates an external table in Snowflake. The function ran fine for some time, but now it occasionally returns this error:
Number of auto-ingest pipes on location <bucket name> cannot be greater than allowed limit: 50000
The CREATE TABLE SQL looks like the following:
create or replace external table table_name(
row_id STRING as (value:row_id::string),
created TIMESTAMP as (value:created::timestamp)
...
) with location = @conformedstage/table_name/ file_format = (type = parquet);
I have googled this issue and almost all the answers refer to Snowpipe; however, my function doesn't use Snowpipe at all.
Any ideas are appreciated.
When an external table is created with AUTO_REFRESH, Snowflake creates an internal pipe to process the events, and you probably have lots of external tables using the same bucket.
Can you try setting AUTO_REFRESH=false?
create or replace external table table_name(
row_id STRING as (value:row_id::string),
created TIMESTAMP as (value:created::timestamp)
...
) with location = @conformedstage/table_name/ file_format = (type = parquet) auto_refresh=false;
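Note that with auto-refresh disabled, the external table's metadata has to be refreshed manually (for example, as a final step in the Lambda function) so that newly landed files become visible:
alter external table table_name refresh;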
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html
I have a Hive table that was originally created as transactional, but I want to disable transactions on the table because they are not actually needed.
I tried to disable them using ALTER TABLE, but I got an error:
hive> ALTER TABLE foo SET TBLPROPERTIES('transactional'='false');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. TBLPROPERTIES with 'transactional'='true' cannot be unset
I am using Hive 2.3.2
According to the documentation, changing TBLPROPERTIES ('transactional'='false') is not allowed.
You can re-create the table.
Back up the table first:
create table bkp_table as
select * from your_table;
Then drop the table, create it again without the transactional property, and reload the data from the backup.
Alternatively, make a new table, load the data from the old one, drop the old table, and rename the new one.
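A hedged sketch of the whole sequence (the column list and storage format are assumptions; use your table's actual schema):
create table bkp_table as select * from your_table;
drop table your_table;
-- recreate with the real schema; transactional tables are ORC, keep or change the format as needed
create table your_table (col1 int, col2 string)
stored as orc;
insert into your_table select * from bkp_table;
drop table bkp_table;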
You have to re-create the table.
First back up the table if you want, then DROP TABLE.
Create the table with TBLPROPERTIES ('transactional'='false'):
CREATE TABLE your_table(
`col` string,
`col2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'transactional'='false'
);
You can choose the input and output formats to suit your data.
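Then reload the data from the backup you took:
insert into your_table select * from bkp_table;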
Every day, a PPE.txt file with client data, semicolon-separated and always with the same layout, is stored in a specific directory.
Every day someone has to update a specific table in our database based on this PPE.txt.
I want to automate this process via a SQL script.
What I thought would be a solution is to import the data from this .txt file into a staging table via a script, then execute the update.
What I have so far is:
IF EXISTS (SELECT 1 FROM Sysobjects WHERE name LIKE 'CX_PPEList_TMP%')
DROP TABLE CX_PPEList_TMP
GO
CREATE TABLE CX_PPEList_TMP
(
Type_Registy CHAR(1),
Number_Person INTEGER,
CPF_CNPJ VARCHAR(14),
Type_Person CHAR(1),
Name_Person VARCHAR(80),
Name_Agency VARCHAR(40),
Name_Office VARCHAR(40),
Number_Title_Related INTEGER,
Name_Title_Related VARCHAR(80)
)
UPDATE Table1
SET SN_Policaly_Exposed = 'Y'
FROM Table1
INNER JOIN CX_PPEList_TMP
    ON Table1.CD_Personal_Number = CX_PPEList_TMP.CPF_CNPJ
WHERE Table1.SN_Policaly_Exposed = 'N'
UPDATE Table1
SET SN_Policaly_Exposed = 'N'
WHERE Table1.CD_Personal_Number NOT IN (SELECT CX_PPEList_TMP.CPF_CNPJ
FROM CX_PPEList_TMP)
AND Table1.SN_Policaly_Exposed = 'Y'
I know I haven't given much, but that's because I don't have much yet.
I want to populate the CX_PPEList_TMP temp table with the data from the PPE.txt file via a script, so I can just execute this script to update my database. But I don't know of any command for this, nor have I found one in my research.
Thanks in advance!
Using OPENROWSET
You can read text files using the OPENROWSET option (first you have to enable ad hoc queries).
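For reference, ad hoc distributed queries can be enabled like this (requires the appropriate server permissions):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;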
Using Microsoft Text Driver
SELECT * FROM OPENROWSET('MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=C:\Docs\csv\;',
'SELECT * FROM PPE.txt')
Using OLEDB provider
SELECT *
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.12.0',
    'Text;Database=C:\Docs\csv\;IMEX=1;',
    'SELECT * FROM PPE.txt') t
Using BULK INSERT
You can import text file data to a staging table and update data from it:
BULK INSERT dbo.StagingTable
FROM 'C:\PPE.txt'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
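Applied to your case, you could wrap the load and the updates in a stored procedure and schedule it (for example with SQL Server Agent). A rough sketch, with the file path assumed:
CREATE PROCEDURE dbo.usp_Load_PPEList
AS
BEGIN
    TRUNCATE TABLE CX_PPEList_TMP;

    -- assumed path; point this at the daily drop directory
    BULK INSERT CX_PPEList_TMP
    FROM 'C:\Data\PPE.txt'
    WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');

    -- then run the two UPDATE statements from the question
END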
In your case, I recommend using an ETL tool like SSIS; it's much easier to work with, and you can also schedule the package to run at a specific time.
I'm planning to truncate a Hive external table which has one partition. So I used the following command to truncate the table:
hive> truncate table abc;
But it throws an error: Cannot truncate non-managed table abc.
Can anyone please suggest a way around this?
Make your table MANAGED first:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');
Then truncate:
truncate table abc;
And finally you can make it external again:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:
Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables
Action Required
Change applications. Do not attempt to run TRUNCATE TABLE on an external table.
Alternatively, change applications to set the table property external.table.purge to true, which allows truncation of an external table:
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
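After setting the property, the truncate should succeed:
TRUNCATE TABLE mytable;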
There is an even better solution, which is basically a one-liner:
insert overwrite table table_xyz select * from table_xyz where 1=2;
Since the WHERE clause matches no rows, this deletes all the data files and leaves an empty file at the external folder location, with zero records.
See https://issues.apache.org/jira/browse/HIVE-4367: use
truncate table my_ext_table force;
I am attempting to create a table in Hive and point it to an external location in S3.
When I try :
create table x (key int, value string) location 's3/...'
it works well.
But when I attempt:
create external table as select x,y,z from alphabet location 's3/...'
it doesn't run. Is there a way to create a table as a select statement and store it at an external location?
You can create a managed table using the select statement and then update the table property to external:
ALTER TABLE <table name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
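For example, a hedged sketch (the table name and S3 path are placeholders):
create table x location 's3://mybucket/mypath/' as
select x, y, z from alphabet;
alter table x set tblproperties('EXTERNAL'='TRUE');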
or
Write the output of the select query to a location:
INSERT OVERWRITE DIRECTORY '/myDirectory'
SELECT * FROM PARAGRAPH;
Then create an external table over that location (the column definitions must be supplied):
CREATE EXTERNAL TABLE <table name> (...) LOCATION '/myDirectory';
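A fuller sketch of this second option (column names, types, and the delimiter are assumptions; the external table's schema and row format must match what was written to the directory):
INSERT OVERWRITE DIRECTORY '/myDirectory'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM PARAGRAPH;

CREATE EXTERNAL TABLE my_table (x INT, y STRING, z STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/myDirectory';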