How to create a table LIKE another, stored as Avro? - hive

I created a table (test_load) based on the schema of another one (test). Then I inserted the contents of test_load into another table (warehouse):
drop table if exists test_load;
create external table test_load
like test
location '/test_load_folder';
insert into warehouse
select * from test_load;
This works fine when everything is in Parquet.
I then changed my test schema to Avro and recreated my test_load table, but when I try to insert into warehouse I receive an error:
Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I'm looking for the correct syntax to re-create the load table and specify that it's Avro. My hypothesis is that Hive still treats its files as Parquet.
I tried
drop table if exists test_load;
create external table test_load
like test
location '/test_load_folder'
stored as avro;
but I get a syntax error.
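The syntax error is expected: CREATE TABLE ... LIKE in Hive only accepts a LOCATION clause, and the storage format is always copied from the source table, so STORED AS cannot be combined with LIKE. A workaround (a sketch, assuming Hive 0.14+ where STORED AS AVRO is available; the column list below is hypothetical and should be copied from the output of SHOW CREATE TABLE test) is to spell the table out explicitly:
drop table if exists test_load;
-- Explicit definition instead of LIKE, so the format can be stated.
-- Replace the hypothetical columns with the real ones from
-- SHOW CREATE TABLE test:
create external table test_load (
  id int,
  name string
)
stored as avro
location '/test_load_folder';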

Related

Rename a table in Amazon Redshift

I've been trying to rename a table from "fund performance" to fund_performance in SQLWorkbench for a Redshift database. Commands I have tried are:
alter table schemaname."fund performance"
rename to fund_performance;
I received a message that the command executed successfully, and yet the table name did not change.
I then tried copying the table to rename it that way. I used
#CREATE TABLE fund_performance LIKE "schema_name.fund performance";
CREATE TABLE fund_performance AS SELECT * FROM schema_name."fund performance";
In both these cases I also received a message that the statements executed successfully, but nothing changed. Does anyone have any ideas?
Try the following; it may work for you:
SELECT * into schema_name.fund_performance FROM schema_name.[fund performance]
It will copy the data by creating a new table named fund_performance, but it won't carry over any constraints or identity columns.
To rename the table in place without disturbing existing constraints:
EXEC sp_rename 'schema_name.[fund performance]', 'schema_name.fund_performance';
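Note that the two snippets above use SQL Server syntax (square-bracket identifiers and sp_rename), which Redshift does not accept. A Redshift-flavored equivalent of the copy approach might look like this (a sketch; constraints, defaults, and identity columns still won't be carried over by CTAS):
-- Quoted identifier because the original name contains a space:
CREATE TABLE schema_name.fund_performance AS
SELECT * FROM schema_name."fund performance";
-- Once the copy is verified, drop the old table:
DROP TABLE schema_name."fund performance";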

Databricks - is not empty but it's not a Delta table

I run a query on Databricks:
DROP TABLE IF EXISTS dublicates_hotels;
CREATE TABLE IF NOT EXISTS dublicates_hotels
...
I'm trying to understand why I receive the following error:
Error in SQL statement: AnalysisException: Cannot create table ('default.dublicates_hotels'). The associated location ('dbfs:/user/hive/warehouse/dublicates_hotels') is not empty but it's not a Delta table
I already found a way to solve it (by removing the directory manually):
dbutils.fs.rm('.../dublicates_hotels',recurse=True)
But I can't understand why the table data is still there, even though I created a new cluster (and terminated the previous one) and am running this query with the new cluster attached.
Can anyone help me understand this?
I faced a similar problem; using CREATE OR REPLACE TABLE instead solved it for me.
DROP TABLE and CREATE TABLE work with entries in the metastore, which is a database that keeps the metadata about databases and tables. The metastore entry may not exist, in which case DROP TABLE IF EXISTS does nothing. But when CREATE TABLE is executed, it additionally checks the location on DBFS and fails if the directory exists (possibly with data). That directory could be left over from previous experiments in which data was written without going through the metastore.
If the table was created with a LOCATION specified, it is an EXTERNAL table, so dropping it removes only the Hive metadata; the directory contents remain as they are. You can restore the table with CREATE TABLE if you specify the same LOCATION (Delta keeps the table structure alongside its data in the directory).
If no LOCATION was specified at creation time, it is a MANAGED table, and DROP destroys both the metadata and the directory contents.
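A minimal illustration of the external-table behavior (a sketch; the path and table name are hypothetical):
-- DROP on an external table removes only the metastore entry;
-- the files under the location survive.
CREATE TABLE events USING DELTA LOCATION '/mnt/data/events';
DROP TABLE events;
-- Re-registering the same location restores the table, because Delta
-- stores the schema alongside the data:
CREATE TABLE events USING DELTA LOCATION '/mnt/data/events';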

Setting transactional-table properties results in external table

I am creating a managed table via Impala as follows:
CREATE TABLE IF NOT EXISTS table_name
STORED AS parquet
TBLPROPERTIES ('transactional'='false', 'insert_only'='false')
AS ...
This should result in a managed table which does not support HIVE-ACID.
However, when I run the command I still end up with an external table.
Why is this?
I found in the Cloudera documentation that omitting the EXTERNAL keyword when creating the table does not mean the table will definitely be managed:
When you use EXTERNAL keyword in the CREATE TABLE statement, HMS stores the table as an external table. When you omit the EXTERNAL keyword and create a managed table, or ingest a managed table, HMS might translate the table into an external table or the table creation can fail, depending on the table properties.
Thus, setting transactional=false and insert_only=false leads to an external table in the Hive Metastore's interpretation.
Interestingly, setting only TBLPROPERTIES ('transactional'='false') is completely ignored and still results in a managed table with transactional=true.
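One way to check what the metastore actually did is DESCRIBE FORMATTED, which reports the table type alongside the effective table properties:
-- "Table Type:" shows MANAGED_TABLE or EXTERNAL_TABLE, and the
-- table-parameters section shows which transactional settings stuck:
DESCRIBE FORMATTED table_name;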

How do I make CREATE DATABASE command re-runnable in SQL?

I'd like to create a script that simply drops and creates a database over and over in PostgreSQL.
For a table this is not problem with the following:
DROP TABLE IF EXISTS test CASCADE;
CREATE TABLE IF NOT EXISTS test ( Sample varchar );
The above code works, no problem.
However, when I try to do the same for a database, ie:
DROP DATABASE IF EXISTS sample;
CREATE DATABASE sample;
I get the following error:
ERROR: DROP DATABASE cannot run inside a transaction block
SQL state: 25001
Any idea how I can get the database to be created and dropped repetitively without doing it manually?
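The error comes from the client wrapping the whole script in a transaction; DROP DATABASE and CREATE DATABASE must run outside one. One approach (a sketch, assuming psql is available) is to run the script through psql, which by default executes each statement in its own transaction:
-- save as recreate_sample.sql and run with:  psql -f recreate_sample.sql
-- psql autocommits each statement unless you pass --single-transaction
-- or open a transaction explicitly, so DROP DATABASE is allowed here.
DROP DATABASE IF EXISTS sample;
CREATE DATABASE sample;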

Insert If Table Exists in Databricks or Spark SQL

I am writing in a Databricks notebook where I need to say something like:
%sql
IF EXISTS myTable INSERT OVERWRITE TABLE myTable SELECT * FROM somethingElse
I know that the INSERT OVERWRITE clause works, but I'm not sure how to get the IF EXISTS to work without breaking out of pure SQL code and using python (which would make the script messier).
Unfortunately, there is no standalone IF EXISTS statement in Databricks SQL.
You have to use the DROP TABLE command, which does support an IF EXISTS clause:
Drop a table and delete the directory associated with the table from the file system if this is not an EXTERNAL table. If the table to drop does not exist, an exception is thrown.
IF EXISTS
If the table does not exist, nothing happens.
DROP TABLE [IF EXISTS] [db_name.]table_name
Example:
DROP TABLE IF EXISTS diamonds;
CREATE TABLE diamonds USING CSV OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
Reference: the SQL Language Manual, a complete list of the Data Definition Language (DDL) and Data Manipulation Language (DML) constructs supported in Databricks.
Hope this helps.
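If the goal is simply a re-runnable pure-SQL script, another option (a sketch, assuming it is acceptable to derive the table's schema from the source query) is to guarantee the table exists before overwriting it:
-- Create an empty shell with the right schema only if the table is missing...
CREATE TABLE IF NOT EXISTS myTable AS
SELECT * FROM somethingElse WHERE 1 = 0;
-- ...then overwrite unconditionally; the script is now idempotent.
INSERT OVERWRITE TABLE myTable SELECT * FROM somethingElse;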