Can we create an external table on top of ADLS Gen2 from Databricks using an interactive cluster? - azure-data-lake

I'm trying to create an external table on top of ADLS Gen2 from Azure Databricks, and in the location I gave "abfss://......". This is not working and throws the error below:
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.xxxxx.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property xxxxxx.dfs.core.windows.net not found.);;
If I give a mount point path in the location, it works as expected. Is there any other way to create the table without the mount point?

After some research I figured out that we can create an external table by giving the full path in the OPTIONS clause.
Example of creating a Parquet table:
CREATE TABLE TABLE_NAME (
  column1 int,
  column2 int
)
USING PARQUET
OPTIONS (PATH "abfss://container@storageAccount.dfs.core.windows.net/.....")
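For context, the ConfigurationPropertyNotFoundException in the question usually means the cluster or session has no credentials configured for that storage account. Below is a minimal sketch (not part of the accepted answer) of setting the account key at session level before running the DDL, assuming access-key authentication; the secret scope, path, and table names are hypothetical placeholders.
# In a Databricks notebook, `spark` and `dbutils` are predefined.
# Scope/key/path/table names below are hypothetical placeholders.
spark.conf.set(
    "fs.azure.account.key.storageAccount.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

# With the key available, the external table DDL can point straight at abfss://.
spark.sql("""
    CREATE TABLE IF NOT EXISTS TABLE_NAME (column1 INT, column2 INT)
    USING PARQUET
    OPTIONS (PATH 'abfss://container@storageAccount.dfs.core.windows.net/path/to/data')
""")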

Related

Native table creation on top of GCS failed in BigQuery

I tried to create a BigQuery native table on top of a GCS bucket. The native table creation works when the table is created from the UI, but when I tried to run DDL to create a native table on top of GCS, it failed.
Below is the query used and error produced:
create table sample_ds_001.native
options
(
format='csv'
,uris=["gs://test-bucket-003/File_Folder/sample.txt"]
)
Error:
Unknown option: format at [4:3]
You get this error because format and uris are options for external tables, not native tables. You can create an external table for your requirement using the CSV file stored in the GCS bucket; the query below can be considered:
CREATE EXTERNAL TABLE `project-id.dataset.table`
OPTIONS (format = 'CSV', uris = ['gs://bucket-name/file-location.csv'])
CSV Data:
A,B,C,D,E,F,G
47803,000000000020030263,629,785,2021-01-12 23:26:37,986,2022-01-12 23:26:37
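If a native (managed) table is specifically required rather than an external one, one hedged alternative is BigQuery's LOAD DATA statement, which copies the GCS files into BigQuery storage. Below is a sketch using the google-cloud-bigquery Python client with the dataset and bucket names from the question; depending on the file you may also need to list columns or set skip_leading_rows.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# LOAD DATA creates the destination table if it does not exist and loads the
# CSV rows into native BigQuery storage, unlike an external table which reads
# from GCS at query time. Dataset and bucket names are taken from the question.
client.query("""
    LOAD DATA INTO sample_ds_001.native
    FROM FILES (
      format = 'CSV',
      uris = ['gs://test-bucket-003/File_Folder/sample.txt']
    )
""").result()  # block until the load job finishes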

Unable to delete a Databricks table: Container xxx in account yyy.blob.core.windows.net not found

I have a series of parquet files in different folders on an Azure Storage Account Container.
I can expose them all as SQL tables with a command like:
create table table_name
using parquet
location 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net/folder_or_parquet_files'
And all is fine. However, I want to drop them all, and they all drop except one, which gives me:
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Got exception: shaded.databricks.org.apache.hadoop.fs.azure.AzureException
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account
mystorageaccount.blob.core.windows.net not found,and we can't create it using
anoynomous credentials, and no credentials found for them in the configuration.)
Obviously mystorageaccount and mycontainer stand in for my real values, and creating/dropping other folders of parquet files in that container/storage account works fine.
It's just that one table seems a little messed up.
How can I get rid of this broken table, please?
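No answer is recorded here, but the error text itself points at missing credentials for that one account ("no credentials found for them in the configuration"). A hedged sketch, assuming the failure is only missing wasb credentials at drop time (secret scope and key names are hypothetical), is to supply the account key for the session and retry the DROP:
# In a Databricks notebook, `spark` and `dbutils` are predefined.
# Give the wasb driver credentials for the affected storage account
# (scope/key names are hypothetical placeholders).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

# DROP TABLE on an external table only removes the metastore entry;
# any files still present under the location are left untouched.
spark.sql("DROP TABLE IF EXISTS table_name")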

Error while creating a table from another one in Apache Spark

I'm creating the table the following way:
spark.sql("CREATE TABLE IF NOT EXISTS table USING DELTA AS SELECT * FROM origin")
But I get this error:
Exception in thread "main" org.apache.spark.SparkException: Table implementation does not support writes: table
You get this SparkException because this way of creating a Delta Lake table with a SQL query as the input data is not implemented.
If you want to create a Delta Lake table and insert data at the same time, you should use the DataFrameWriter API, as explained in the Databricks documentation:
spark.sql("SELECT * FROM origin")
.write
.format("delta")
.save("/path/to/where/you/want/to/save/your/data")

U-SQL External table error: 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.'

I'm failing to create external tables for two specific tables from Azure SQL DB;
I have already created a few external tables with no issues.
The only difference I can see between the failed and the successful external tables is that the tables that failed contain geography-type columns, so I think this is the issue, but I'm not sure.
CREATE EXTERNAL TABLE IF NOT EXISTS [Data].[Devices]
(
[Id] int
)
FROM SqlDbSource LOCATION "[Data].[Devices]";
Failed to connect to data source: 'SqlDbSource', with error(s): 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.'
I solved it with a workaround instead of the external table:
I created a view that selects from an external rowset using EXECUTE.
CREATE VIEW IF NOT EXISTS [Data].[Devices]
AS
SELECT Id FROM EXTERNAL SqlDbSource
EXECUTE "SELECT Id FROM [Data].[Devices]";
This makes the script completely ignore the geography-type column, which is currently not supported as a REMOTEABLE_TYPE for data sources in U-SQL.
Please have a look at my answer on the other thread you opened. To add to that, I would also recommend looking at how to create a table using a query; in the query, you should be able to use "extractors" to create the tables. To read more about extractors, please have a look at this doc.
Hope this helps.

Getting exception while updating table in Hive

I have created a table in Hive from an existing S3 file as follows:
create table reconTable (
entryid string,
run_date string
)
LOCATION 's3://abhishek_data/dump1';
Now I would like to update one entry as follows:
update reconTable set entryid='7.24E-13' where entryid='7.24E-14';
But I am getting following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I have gone through a few posts here, but I still have no idea how to fix this.
I think you should create an external table when reading data from a source like S3.
Also, Hive UPDATE and DELETE only work on ACID (transactional) tables, so you should declare the table in ORC format and set the table property 'transactional'='true'.
Please refer to this for more info: attempt-to-do-update-or-delete-using-transaction-manager
You can refer to this Cloudera Community Thread:
https://community.cloudera.com/t5/Support-Questions/Hive-update-delete-and-insert-ERROR-in-cdh-5-4-2/td-p/29485