SAS - Hive data step delete issue

My understanding is that when we drop a managed table, Hive deletes the table's data as well. By default, the SAS option DBCREATE_EXTERNAL is set to NO, which means a SAS data step writing through a Hive library (as in the steps below) creates a “managed table”.
When the table is then dropped with PROC SQL DROP TABLE, PROC DELETE, or PROC DATASETS DELETE, only the Hive metadata is deleted, i.e. the table structure is removed from the schema, but the underlying HDFS files are not. When the same data step is run again after the deletion, i.e. the same table is re-created in the schema, the number of records ingested is incorrect.
Steps to reproduce:
1. Create a Hive table with a SAS data step and note the number of rows.
2. Drop the table with PROC SQL DROP TABLE or PROC DATASETS with the DELETE statement.
3. Run the table-creation data step again.
4. Count the number of rows (a Hive-side check is sketched below the steps).
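A Hive-side check (the table name class_hive is a placeholder) can confirm whether leftover HDFS files are the cause; run it before the drop and again after re-creating the table:
DESCRIBE FORMATTED class_hive;    -- note "Table Type" (MANAGED_TABLE) and "Location"
SELECT COUNT(*) FROM class_hive;  -- a higher count after re-creation points to leftover files under the same location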
Source : http://support.sas.com/documentation/cdl/en/acreldb/69580/HTML/default/viewer.htm#n12r2tbfrrrsgdn1fa4ufw8vb79f.htm
Thanks.

The issue occurs because an alternate syntax is required for the DROP TABLE statement when the table data resides in an HDFS encryption zone.
Hotfix : http://support.sas.com/kb/58/727.html
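Until the hotfix is applied, one possible workaround (an assumption based on general Hive behaviour in encryption zones, not something taken from the SAS note) is to drop the table directly in Hive with the PURGE clause, which deletes the files outright instead of moving them to the HDFS trash:
DROP TABLE IF EXISTS class_hive PURGE;  -- same placeholder table name as in the sketch above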

Related

Rename a table in Amazon Redshift

I've been trying to rename a table from "fund performance" to fund_performance in SQLWorkbench for a Redshift database. Commands I have tried are:
alter table schemaname."fund performance"
rename to fund_performance;
I received a message that the command executed successfully, and yet the table name did not change.
I then tried copying the table to rename it that way. I used
#CREATE TABLE fund_performance LIKE "schema_name.fund performance";
CREATE TABLE fund_performance AS SELECT * FROM schema_name."fund performance";
In both these cases I also received a message that the statements executed successfully, but nothing changed. Does anyone have any ideas?
Use the following; it may work out for you:
SELECT * INTO schema_name.fund_performance FROM schema_name."fund performance";
It will copy the data into a new table named fund_performance, but it won't carry over any constraints or identity columns.
To rename the table in place without disturbing existing constraints (the identifier must be double-quoted because it contains a space):
ALTER TABLE schema_name."fund performance" RENAME TO fund_performance;
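If the rename did succeed, it will show up in the catalog; a quick check like the following (the schema name is the same placeholder as above, and it assumes the pg_tables view that Redshift inherits from PostgreSQL) can rule out a stale object tree in SQLWorkbench or a table sitting in a different schema:
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname = 'schema_name'
  AND tablename IN ('fund performance', 'fund_performance');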

What is the difference between a table variable and a local temporary table in OpenGauss?

A local temporary table is automatically dropped at the end of the current session, so you can create and use temporary tables in the current session as long as the database node the session is connected to is working normally. Temporary tables are visible only in the session that creates them. If a DDL statement involves operations on temporary tables, a DDL error will be generated, so you are advised not to reference temporary tables in DDL statements.
You can refer to this link for more detailed information.
https://opengauss.org/en/docs/latest/docs/Developerguide/create-table.html
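For instance, a minimal sketch of a local temporary table (table and column names are made up); it is visible only in the creating session and is dropped automatically when that session ends:
CREATE LOCAL TEMPORARY TABLE tmp_orders (
    order_id INT,
    amount   NUMERIC(10,2)
);
INSERT INTO tmp_orders VALUES (1, 99.50);
SELECT * FROM tmp_orders;  -- visible here; a different session will not see tmp_orders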

Databricks - is not empty but it's not a Delta table

I run a query on Databricks:
DROP TABLE IF EXISTS dublicates_hotels;
CREATE TABLE IF NOT EXISTS dublicates_hotels
...
I'm trying to understand why I receive the following error:
Error in SQL statement: AnalysisException: Cannot create table ('default.dublicates_hotels'). The associated location ('dbfs:/user/hive/warehouse/dublicates_hotels') is not empty but it's not a Delta table
I already found a way to solve it (by removing the directory manually):
dbutils.fs.rm('.../dublicates_hotels',recurse=True)
But I can't understand why the table data is still being kept.
This happens even though I created a new cluster (and terminated the previous one) and am running this query with the new cluster attached.
Can anyone help me understand this?
I also faced a similar problem, then tried the command CREATE OR REPLACE TABLE and it solved my problem.
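For example, the failing statement from the question could be rewritten roughly like this (the query body is elided in the question, so source_hotels and the columns below are made-up placeholders); CREATE OR REPLACE TABLE overwrites both the metastore entry and the table location in one step:
CREATE OR REPLACE TABLE dublicates_hotels AS
SELECT hotel_id, hotel_name, COUNT(*) AS cnt
FROM source_hotels
GROUP BY hotel_id, hotel_name;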
DROP TABLE and CREATE TABLE work with entries in the metastore, which is a database that keeps the metadata about databases and tables. There can be a situation where the entry in the metastore doesn't exist, so DROP TABLE IF EXISTS doesn't do anything. But when CREATE TABLE is executed, it additionally checks the location on DBFS and fails if the directory exists (possibly with data). Such a directory can be left over from previous experiments in which data was written without going through the metastore.
If the table was created with a LOCATION specified, it is an EXTERNAL table, so when you drop it you drop only the Hive metadata for that table; the directory contents remain as they are. You can restore the table with CREATE TABLE if you specify the same LOCATION (Delta keeps the table structure along with its data in the directory).
If LOCATION wasn't specified at table creation, it's a MANAGED table, and DROP will destroy both the metadata and the directory contents.
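To make the distinction concrete, a minimal sketch (the table names and the path are made up):
-- External: LOCATION is given, so DROP TABLE removes only the metastore entry
-- and the files under /mnt/raw/events stay in place.
CREATE TABLE events_ext (id BIGINT, ts TIMESTAMP)
USING DELTA
LOCATION '/mnt/raw/events';
-- Managed: no LOCATION, so the data lives under the warehouse directory and
-- DROP TABLE removes both the metastore entry and the files.
CREATE TABLE events_managed (id BIGINT, ts TIMESTAMP)
USING DELTA;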

Setting transactional-table properties results in external table

I am creating a managed table via Impala as follows:
CREATE TABLE IF NOT EXISTS table_name
STORED AS parquet
TBLPROPERTIES ('transactional'='false', 'insert_only'='false')
AS ...
This should result in a managed table which does not support HIVE-ACID.
However, when I run the command I still end up with an external table.
Why is this?
I found out in the Cloudera documentation that omitting the EXTERNAL keyword when creating the table does not mean that the table will definitely be managed:
When you use EXTERNAL keyword in the CREATE TABLE statement, HMS stores the table as an external table. When you omit the EXTERNAL keyword and create a managed table, or ingest a managed table, HMS might translate the table into an external table or the table creation can fail, depending on the table properties.
Thus, setting transactional=false and insert_only=false leads to an external table in the interpretation of the Hive Metastore.
Interestingly, setting only TBLPROPERTIES ('transactional'='false') is completely ignored and still results in a managed table with transactional=true.
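One way to see what HMS actually did is to inspect the table after creation (using the placeholder table name from the question):
DESCRIBE FORMATTED table_name;
-- "Table Type" shows MANAGED_TABLE or EXTERNAL_TABLE, and the Table Parameters
-- section shows the transactional-related properties HMS actually applied.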

External Table data not getting Purged in Hive

I created 2 external tables in Hive. For the first table I specified the data location in the CREATE statement. Into the second table I loaded data after creating it.
I can see the data file created for the second table in the /hive/warehouse/ directory. Then I set "external.table.purge"="true" for both tables and dropped both tables, but the data files of both tables remain as they are.
What is the behaviour of 'external.table.purge'='true'? Shouldn't it delete the data files as well when the DROP command is issued?
If Hive does not take any ownership of the data files of an external table, why is there even an option such as 'external.table.purge'='true'?
I read in one of the threads that someone mentioned it is possible to delete the data of external tables as well via ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but I am unable to find that post again.
You cannot drop the data of an external table, but you can for internal (managed) tables. So convert the table to internal and then drop it.
First change the EXTERNAL property to false:
hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
and then you can easily drop it.
hive> drop table nyse_external;
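Before dropping, you can confirm that the switch to a managed table took effect (same table name as above):
hive> DESCRIBE FORMATTED nyse_external;
In the output, "Table Type" should now report MANAGED_TABLE instead of EXTERNAL_TABLE.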
TBLPROPERTIES ("external.table.purge"="true") should work for hive version 4.x+.
Answer to point 1:
The table property "external.table.purge", if set to true (and if the table is an external table), lets Hive know to delete the table data when the table is dropped. This feature was introduced in this Apache JIRA:
https://issues.apache.org/jira/browse/HIVE-19981
For reference on how to set the property, take a look at this example:
https://docs.cloudera.com/runtime/7.2.7/using-hiveql/topics/hive_drop_external_table_data.html
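Putting the two references together, the sequence that should remove the HDFS data files on drop looks like this (placeholder table name; assumes a Hive version in which external.table.purge is honoured):
ALTER TABLE nyse_external SET TBLPROPERTIES ('external.table.purge'='true');
DROP TABLE nyse_external;  -- with the property set, Hive deletes the table's data files too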