Hive external table mapped to Azure Storage: authorization issue

I am creating an external table in Hive that is mapped to Azure Blob Storage:
CREATE EXTERNAL TABLE test(id bigint, name string, dob timestamp,
salary decimal(14,4), line_number bigint) STORED AS PARQUET LOCATION
'wasb://(container)@(storage_account).blob.core.windows.net/test'
and I am getting the exception below:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got
exception: org.apache.hadoop.fs.azure.AzureException
com.microsoft.azure.storage.StorageException: Server failed to
authenticate the request. Make sure the value of Authorization header
is formed correctly including the signature.)
The storage account that I am using here is not the primary storage account attached to the HDInsight cluster.
Could someone help me solve this issue?

I was able to resolve this issue by adding the configuration below through the Ambari server:
HDFS >> Custom core-site
fs.azure.account.key.(storage_account).blob.core.windows.net=(Access Key)
fs.azure.account.keyprovider.(storage_account).blob.core.windows.net=org.apache.hadoop.fs.azure.SimpleKeyProvider
Hive >> Custom hive-env
AZURE_STORAGE_ACCOUNT=(Storage Account name)
AZURE_STORAGE_KEY=(Access Key)
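If the extra account is only needed for the current session rather than cluster-wide, a minimal sketch (assuming the cluster allows setting fs.azure.* properties from within a Hive session; all names below are the same placeholders as above) would be:
-- Hedged sketch: session-level alternative to the Ambari change above.
SET fs.azure.account.key.(storage_account).blob.core.windows.net=(Access Key);
-- The CREATE EXTERNAL TABLE from the question should then be able to reach the location:
CREATE EXTERNAL TABLE test(id bigint, name string, dob timestamp,
salary decimal(14,4), line_number bigint) STORED AS PARQUET LOCATION
'wasb://(container)@(storage_account).blob.core.windows.net/test';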

Related

Add a column to a delta table in Azure Synapse

I have a delta table that I created in Azure Synapse using a mapping data flow. The data flow reads append-only changes from Dataverse, finds the latest values, and upserts them to the table.
Now I'd like to add a column to the delta table. When you select Upsert in a mapping data flow, the Merge Schema option is disabled, so it doesn't appear I can use that.
I tried creating a notebook and executing the following SQL, but I get an error.
ALTER TABLE delta.`https://xxxx.dfs.core.windows.net/path/to/table` ADD COLUMNS (mytest STRING)
Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
The path provided is not in the default Synapse container.
How can I alter the table and add the column?
The issue I was running into was that the default Synapse container wasn't present in the storage account. After creating the container, the command executed successfully.
To find the expected container name:
Open the Azure Portal
Navigate to the Synapse workspace
Click Properties
Check the value of Primary ADLS Gen2 file system
This helped me track down the issue. https://learn.microsoft.com/en-us/answers/questions/706694/unable-to-run-sql-queries-in-azure-synapse-error-o.html
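For reference, a minimal sketch of re-running the command from a Synapse notebook once the default container exists (the path is the question's placeholder, not a real value):
-- Hedged sketch: the same ALTER, retried after the primary ADLS Gen2 container was created.
ALTER TABLE delta.`https://xxxx.dfs.core.windows.net/path/to/table` ADD COLUMNS (mytest STRING);
-- Confirm the new column is visible:
DESCRIBE TABLE delta.`https://xxxx.dfs.core.windows.net/path/to/table`;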

Unable to delete a DataBricks table: Container xxx in account yyy.blob.core.windows.net not found

I have a series of parquet files in different folders in an Azure Storage account container.
I can expose them all as SQL tables with a command like:
create table table_name
using parquet
location 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net/folder_or_parquet_files'
And all is fine. However, when I want to drop them all, they all drop except one, which gives me:
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Got exception: shaded.databricks.org.apache.hadoop.fs.azure.AzureException
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account
mystorageaccount.blob.core.windows.net not found,and we can't create it using
anoynomous credentials, and no credentials found for them in the configuration.)
Obviously mystorageaccount and mycontainer are placeholders for my real values, and creating/dropping other folders of parquet files in that container/storage account works fine.
It's just this one table that seems a little messed up.
How can I get rid of this broken table, please?
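No fix is quoted in this thread; one heavily hedged sketch, borrowing the credential idea from the first question, would be to hand the session a key for the account before retrying the drop (placeholder names throughout, and it assumes you still hold a valid access key for the account):
-- Hedged sketch only: give the session credentials for the account so the
-- location lookup during DROP can resolve, then drop the broken table.
SET fs.azure.account.key.mystorageaccount.blob.core.windows.net=<access-key>;
DROP TABLE IF EXISTS table_name;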

BigQuery returns Unknown error after creating a table with the _ads suffix

I tried both the API and the GUI to create this empty table, and they both failed.
I have created many tables via the API just fine, but only the name organizes_ads has a problem.
The same create process and schema can create organizes_ads_0 but not organizes_ads.
If I try to get this table via the API, it returns:
{"error":{"code":-1,"message":"A network error occurred, and the request could not be completed."}}
I want to use this name because it's a table name replicated from another source, so it would be awkward to hard-code a different name as a workaround.
[UPDATE] I also found that any table name with the _ads suffix is broken (so nothing is wrong with the schema).
This error can be caused by an ad blocker.
I created a table with the _ads suffix, and with the ad blocker enabled I got the same error: Unknown error response from the server.

Not able to drop Hive table pointing to Azure Storage account that no longer exists

I am using Hive on an HDInsight Hadoop cluster -- Hadoop 2.7 (HDI 3.6).
We have some old Hive tables that point to storage accounts that no longer exist, but the tables still point to those storage locations; basically, the Hive metastore still contains references to the deleted storage accounts. If I try to drop such a Hive table, I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException org.apache.hadoop.fs.azure.AzureException: No credentials found for account <deletedstorage>.blob.core.windows.net in the configuration, and its container data is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.)
Manipulating the Hive metastore directly is risky, as it could leave the metastore in an invalid state.
Is there any way to get rid of these orphan tables?
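The thread above does not include an answer. A heavily hedged sketch of one possible approach, assuming ALTER TABLE ... SET LOCATION only rewrites metastore metadata and accepts any location the cluster can actually reach, would be to repoint the orphan table at the primary storage account and then drop it (placeholder names throughout):
-- Hedged sketch only: repoint the orphan table at a reachable location, then drop it.
ALTER TABLE orphan_table SET LOCATION 'wasb://(container)@(primary_storage_account).blob.core.windows.net/tmp/orphan_table';
DROP TABLE orphan_table;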

Using Azure HDInsight and Hive

I have created an HDInsight cluster but want to upload a database through the portal and use Hive on it. What are the steps I need to take?
I know how to use Hive but don't know how to connect the data uploaded to the blob container with Hive. By the way, I am using PowerShell.
You need to link the storage account of the container with the HDInsight cluster.
To do that, add the following property in core-site.xml:
<property>
<name>fs.azure.account.key.[STORAGE ACCOUNT NAME].blob.core.windows.net</name>
<value>[STORAGE ACCOUNT KEY]</value>
</property>
Once it's linked, you will be able to access that storage account.
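To sanity-check the link from a Hive session (a hedged sketch; the names are placeholders in the same style as above), you can list the container straight from the Hive CLI:
-- Hedged sketch: list the linked container from the Hive CLI (placeholder names).
dfs -ls wasb://[CONTAINER NAME]@[STORAGE ACCOUNT NAME].blob.core.windows.net/;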
To create a Hive table on data residing in blob storage, use an external Hive table with the location pointing to the blob directory of your data.
Example:
CREATE EXTERNAL TABLE [TABLE NAME] (col1 datatype, ....)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'wasb://[CONTAINER NAME]@[STORAGE ACCOUNT NAME].blob.core.windows.net/PATH/OF/DATA/'
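A filled-in sketch of the whole pattern, using obviously fake placeholder names and assuming a comma-delimited text file has already been uploaded under /data/employees/ in the linked container:
-- Hedged, filled-in example of the pattern above; all names are placeholders.
CREATE EXTERNAL TABLE employees (id INT, name STRING, salary DECIMAL(10,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/employees/';
-- Quick check that Hive can read the files in the linked storage account:
SELECT * FROM employees LIMIT 10;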