How to rename a database in Azure Databricks? - apache-spark-sql

I am trying to rename a database in Azure Databricks, but I am getting the following error:
no viable alternative at input 'ALTER DATABASE inventory
Below is my code:
%sql
use inventory;
ALTER DATABASE inventory MODIFY NAME = new_inventory;
Please explain what is meant by the error "no viable alternative at input 'ALTER DATABASE inventory'" and how I can solve it.

It's not possible to rename a database on Databricks. The "no viable alternative at input" part is the SQL parser reporting that it cannot recognize the statement: MODIFY NAME is simply not valid syntax here. If you go to the documentation, you will see that you can only set DBPROPERTIES.
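For reference, the only ALTER DATABASE form the parser will accept is setting properties, along these lines (the property key and value are made up):
ALTER DATABASE inventory SET DBPROPERTIES ('edited-by' = 'someone');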
If you really need to rename a database, you have two choices:
if you have unmanaged tables (not created via saveAsTable, etc.), then you can generate the DDL using SHOW CREATE TABLE, drop your database (be careful anyway), and recreate all tables from the saved SQL
if you have managed tables, then the solution is to create a new database and either use CLONE (only for Delta tables) or CREATE TABLE ... AS SELECT for other file types, and after that drop your old database (a sketch follows below)
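A minimal sketch of the managed-table route, assuming a database inventory with a Delta table items and a non-Delta table other_items (both names are made up; repeat per table):
CREATE DATABASE IF NOT EXISTS new_inventory;
-- Delta tables: DEEP CLONE copies data and metadata (but not the source table's history)
CREATE TABLE new_inventory.items DEEP CLONE inventory.items;
-- other file types: plain CREATE TABLE ... AS SELECT
CREATE TABLE new_inventory.other_items AS SELECT * FROM inventory.other_items;
-- once every table has been copied:
DROP DATABASE inventory CASCADE;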

Alex Ott's answer, to use CLONE, is fine if you do not need to maintain the versioning history of your database when you rename it.
However, if you wish to time travel on the Delta tables after the renaming, this solution works:
Create your new database, specifying its location
Move the file system from the old location to the new location
For each table in the old database, create a table in the new database based on its location (my code relies on the standard file structure of {database name}/{table name} being observed); there is no need to specify a schema, as it is taken from the Delta files in place
Drop the old database
You will then be left with a database under your new name that has all of the data and all of the history of your old database, i.e. a renamed database of Delta tables.
PySpark method (on Databricks, where spark and dbutils are already defined by default):
def rename_db(original_db_name, original_db_location, new_db_name, new_db_location):
    # Create the target database at the new location
    spark.sql(f"create database if not exists {new_db_name} location '{new_db_location}'")
    # Move the files (including the Delta transaction logs) to the new location
    dbutils.fs.mv(original_db_location, new_db_location, True)
    # Re-register each table against its new location; schema and history
    # are picked up from the Delta files already in place
    tables = [row.tableName for row in spark.sql(f"SHOW TABLES FROM {original_db_name}").collect()]
    for table in tables:
        spark.sql(f"create table {new_db_name}.{table} location '{new_db_location}/{table}'")
    # The old database's files have already been moved away, so drop its metadata
    spark.sql(f"drop database {original_db_name} cascade")
    return spark.sql(f"SHOW TABLES FROM {new_db_name}")

Related

Getting a Databricks drop schema error for delta table

I have a Delta table whose schema needs new columns/changed data types (usually I do this on non-Delta tables and those work fine).
I have already dropped the existing Delta table, and trying to drop the schema gives me a 'v1 session catalog' error.
I am currently using SQL on a 10.4 LTS cluster (Spark 3.2.1, Scala 2.12; I can't change these computes); the driver and workers are Standard E_v4.
What I already did, and it worked as usual:
drop table if exists dbname.tablename;
What I wanted to do next:
drop schema if exists dbname.tablename;
The error I got instead:
Error in SQL statement: AnalysisException: Nested databases are not supported by v1 session catalog: dbname.tablename
When I try recreating the schema in the same location, I get the error:
AnalysisException: The specified schema does not match the existing schema at dbfs:locationOfMy/table
... Differences
-Specified schema has additional fields newColNameIAdded, anotherNewColIAdded
-Specified type for myOldCol is different from existing schema ...
If your intention is to keep the existing schema, you can omit the
schema from the create table command. Otherwise please ensure that
the schema matches.
How can I drop the schema and re-register it in the same location, with the same name and the new definitions?
Answering a month later since I didn't get replies, and I found the right solution:
Delta files leave behind partitions and logs that cannot be updated using the DROP commands. I had to manually delete the leftover logs at my table's location.
Try this:
dbutils.fs.rm(path, True)
Use the path of your schema.
Then create your table again.
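Putting it together, a sketch of the full sequence; the path is the placeholder from the error message above, and the column types are made up for illustration:
drop table if exists dbname.tablename;
-- first remove the leftover Delta log and data files, e.g. from Python:
--   dbutils.fs.rm("dbfs:locationOfMy/table", True)
create table dbname.tablename (
  myOldCol string, -- with its changed data type
  newColNameIAdded int,
  anotherNewColIAdded int
) using delta
location 'dbfs:locationOfMy/table';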

SQL Server : creating a table

When creating a table in SQL Server, it is created in the dbo.<tableName> format.
I want to change dbo.tableName to source.tableName, as I want to import data into a source table and then cook that data.
Thanks
You are talking about schemas. If the schema source doesn't exist yet, you need to run create schema source. Once the schema exists, it's as easy as create table source.tableName (...).
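A minimal sketch (the column definitions are made up for illustration; CREATE SCHEMA must be the only statement in its batch, hence the GO separator):
CREATE SCHEMA source;
GO
CREATE TABLE source.tableName (
    id INT PRIMARY KEY,
    payload NVARCHAR(100)
);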

Copying data from External table to database

I have data in an external table. Now I'm copying the data from the external table to a newly created table in a database. What kind of table will the table in the database be? Is it a managed table or an external table? I need your help to understand the concept behind this question.
Thanks,
Madan Mohan S
Hive tables get their type, "managed" or "external", at creation time, not when data is inserted.
So the table employees is external because it was created with CREATE EXTERNAL in its DDL and the location of the data file was provided.
The table emp is managed because EXTERNAL was NOT used in its DDL and no data location was needed.
The difference now is: if the table employees is dropped, the data it was reading at its LOCATION is not deleted. So an external table is useful when the data is read by multiple tools, e.g. Pig. If a Pig script reads the same location, it will still function even after the employees table is dropped.
But emp is managed (in other words, both metadata and data are managed by Hive), so when emp is dropped the data is deleted as well. After dropping it, if you check the Hive warehouse directory you will no longer find an "emp" HDFS directory.
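A sketch of the two cases using the same employees/emp names (the columns and path are made up):
-- external: Hive tracks only metadata, so DROP TABLE leaves the files at LOCATION intact
CREATE EXTERNAL TABLE employees (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/employees';
-- managed: Hive owns both metadata and data, so DROP TABLE deletes the warehouse files
CREATE TABLE emp (id INT, name STRING);
-- copying data into emp does not change its type; emp stays managed
INSERT INTO TABLE emp SELECT id, name FROM employees;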

How do I dump an entire impala database

Is there a way to dump all the schema/data of an Impala database so I can recreate it in a new database instance?
Something akin to what mysqldump does?
Yes:
You can take all the data from the Impala warehouse (usually /user/hive/warehouse).
Use distcp to copy it from one cluster to the other, into the same location.
Run SHOW CREATE TABLE to get the schema of each table, and just change the LOCATION to the destination location.
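A minimal sketch of that workflow, with placeholder database/table names and cluster addresses:
-- 1. on the source, capture each table's DDL:
SHOW CREATE TABLE my_db.my_table;
-- 2. copy the warehouse files between clusters from the shell, e.g.:
--    hadoop distcp hdfs://src-nn/user/hive/warehouse/my_db.db hdfs://dst-nn/user/hive/warehouse/my_db.db
-- 3. on the destination, run the captured DDL (with LOCATION adjusted), then tell Impala to pick it up:
INVALIDATE METADATA my_db.my_table;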
Since there is no DUMP command (or something similar):
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_shell_commands.html
I think the best solution is to use only external tables in one database.
That way, you know where your data is saved and can potentially copy it to another place.
CREATE EXTERNAL TABLE table_name(
  one_field INT,
  another_field BIGINT,
  another_field1 STRING)
COMMENT 'This is an external table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '<my_hdfs_location>';

Copy (Import) Data from Oracle Database To Another

I want to copy data from one Oracle database to another.
I have checked the Import/Export utility, but the problem is that the import utility doesn't support conflict-resolution techniques between rows.
For example, suppose a table in the source database has the same row key as a row in the destination database. If I use the 'Ignore' parameter with value = y, the destination table will end up with duplicate rows.
Is there another way to import data from one Oracle database to another, with some mechanism for detecting conflicts and resolving them?
You might want to consider using a database link from database A to database B. You can query the data from database B to insert into your database A tables. You are free to query whatever you want using SQL or PL/SQL.
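For example, a sketch using MERGE over the link to resolve key conflicts explicitly; the link name, credentials, table, and columns here are all made up:
CREATE DATABASE LINK remote_b
  CONNECT TO some_user IDENTIFIED BY some_password
  USING 'tns_alias_for_b';
-- update rows whose key already exists locally, insert the rest
MERGE INTO customers dst
USING (SELECT id, name FROM customers@remote_b) src
ON (dst.id = src.id)
WHEN MATCHED THEN UPDATE SET dst.name = src.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (src.id, src.name);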
More on database links:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5005.htm