Unable to delete a Databricks table: Container xxx in account yyy.blob.core.windows.net not found - hive

I have a series of parquet files in different folders on an Azure Storage Account Container.
I can expose them all as SQL tables with a command like:
create table table_name
using parquet
location 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net/folder_of_parquet_files'
And all is fine. However, when I want to drop them all, they all drop except one, which gives me:
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Got exception: shaded.databricks.org.apache.hadoop.fs.azure.AzureException
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account
mystorageaccount.blob.core.windows.net not found, and we can't create it using
anoynomous credentials, and no credentials found for them in the configuration.)
Obviously mystorageaccount and mycontainer are placeholders for my real values, and creating/dropping tables over other folders of parquet files in that container/storage account works fine.
It's just this one table that seems a little messed up.
How can I get rid of this broken table, please?
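One workaround that is often suggested for this class of error (a sketch only, not a confirmed fix for this exact table) is to give the session credentials for the container before issuing the drop, since the message indicates the location is being checked with anonymous credentials. A minimal sketch for a Databricks Python notebook cell, assuming the storage account still exists and you have its access key; the placeholder key value is hypothetical:
# Sketch only: register the access key for the storage account referenced by
# the broken table, so the wasbs location can be resolved when the table is
# dropped. "<access-key>" is a placeholder for the real key.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
    "<access-key>",
)

# Then drop the table as usual.
spark.sql("DROP TABLE IF EXISTS table_name")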

Related

Getting a Databricks drop schema error for delta table

I have a delta table schema that needs new columns/changed data types (usually I do this on non-delta tables and those work fine).
I have already dropped the existing delta table, and when I try dropping the schema I get a 'v1 session catalog' error.
I am currently using SQL on a 10.4 LTS cluster (Spark 3.2.1, Scala 2.12; I can't change these computes); driver and workers are Standard E_v4.
What I already did, and it worked as usual:
drop table if exists dbname.tablename;
What I wanted to do next:
drop schema if exists dbname.tablename;
The error I got instead:
Error in SQL statement: AnalysisException: Nested databases are not supported by v1 session catalog: dbname.tablename
When I try recreating the schema in the same location I get the error:
AnalysisException: The specified schema does not match the existing schema at dbfs:locationOfMy/table
... Differences
-Specified schema has additional fields newColNameIAdded, anotherNewColIAdded
-Specified type for myOldCol is different from existing schema ...
If your intention is to keep the existing schema, you can omit the
schema from the create table command. Otherwise please ensure that
the schema matches.
How can I drop the schema and re-register it in the same location, with the same name, but with the new definitions?
Answering a month later since I didn't get replies and found the right solution:
Delta tables leave behind partition files and transaction logs that the drop commands do not clean up. I had to manually delete those files at my table's location.
Try this:
dbutils.fs.rm(path, True)
Use the path of your schema.
Then create your table again.
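Putting the whole sequence together, here is a minimal sketch of how it might look in a Python notebook cell; the path, table name, and column names/types are placeholders taken from the question, not real values:
# Sketch of the full sequence described above (all names are placeholders).
table_path = "dbfs:/locationOfMy/table"

# 1. Drop the table from the metastore.
spark.sql("DROP TABLE IF EXISTS dbname.tablename")

# 2. Remove the leftover Delta files and transaction log at the old location.
dbutils.fs.rm(table_path, True)

# 3. Re-create the table at the same location with the new definition.
spark.sql(f"""
    CREATE TABLE dbname.tablename (
        myOldCol STRING,
        newColNameIAdded INT,
        anotherNewColIAdded INT
    )
    USING DELTA
    LOCATION '{table_path}'
""")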

Add a column to a delta table in Azure Synapse

I have a delta table that I created in Azure Synapse using a mapping data flow. The data flow reads append-only changes from Dataverse, finds the latest value, and upserts them to the table.
Now, I'd like to add a column to the delta table. When you select Upsert in a mapping data flow, the Merge schema option is disabled, so it doesn't appear I can use that.
I tried creating a notebook and executing the following SQL, but I get an error.
ALTER TABLE delta.`https://xxxx.dfs.core.windows.net/path/to/table` ADD COLUMNS (mytest STRING)
Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
The path provided is not in the default Synapse container.
How can I alter the table and add the column?
The issue I was running into was that the default Synapse container wasn't present in the storage account. After creating the container, the command executed successfully.
Open the Azure Portal
Navigate to your Synapse workspace
Click Properties
Check the value of Primary ADLS Gen2 file system; that is the container that must exist in the storage account
This helped me track down the issue. https://learn.microsoft.com/en-us/answers/questions/706694/unable-to-run-sql-queries-in-azure-synapse-error-o.html
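For reference, a minimal sketch of the same ALTER issued from a Synapse Spark notebook once the workspace's primary container exists; the abfss:// URI form and the container/path names are assumptions, not the exact values from the question:
# Sketch only: add a column to a Delta table by path from a Synapse notebook.
# Container, account, and path below are placeholders.
spark.sql("""
    ALTER TABLE delta.`abfss://mycontainer@xxxx.dfs.core.windows.net/path/to/table`
    ADD COLUMNS (mytest STRING)
""")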

AWS Athena giving error when trying to query files in S3 that have already been catalogued in Glue data catalog

I'm trying to build a data lake using S3 for files that are in .csv.gz format, and then further cleansing/processing the data in the AWS environment itself.
First I used AWS Glue to create a data catalog (the crawler was able to identify all tables).
The tables from the catalog are also available in AWS Athena, but when I try to run a SELECT * from a table it gives me the following error:
Error opening Hive split s3://BUCKET_NAME/HEADER FOLDER/FILENAME.csv.gz (offset=0, length=44354) using org.apache.hadoop.mapred.TextInputFormat: Permission denied on S3 path: s3://BUCKET_NAME/HEADER FOLDER/FILENAME.csv.gz.
Could it be that the file is in .csv.gz format and that is why it cannot be accessed as is, or do I need to give the user or role specific access to these files?
You need to fix your permissions. The error says the principal (user/role) that ran the query does not have permission to read an object on S3.
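As a quick way to confirm that (a sketch only, not the Athena-side fix itself), try reading the same object with the same credentials outside Athena, for example with boto3; if this also fails with an access error, the S3 permissions on the bucket or object are the problem. The bucket and key below are the placeholders from the error message:
import boto3

# Placeholders: substitute the bucket and key from the Athena error message.
bucket = "BUCKET_NAME"
key = "HEADER FOLDER/FILENAME.csv.gz"

s3 = boto3.client("s3")
try:
    # HeadObject is authorized by s3:GetObject, so this only succeeds if the
    # caller can actually read the object.
    s3.head_object(Bucket=bucket, Key=key)
    print("Read permission looks OK")
except Exception as exc:
    print(f"Access problem: {exc}")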

Not able to drop Hive table pointing to Azure Storage account that no longer exists

I am using Hive on an HDInsight Hadoop cluster -- Hadoop 2.7 (HDI 3.6).
We have some old Hive tables that point to storage accounts that don't exist any more. These tables still point to those storage locations; basically, the Hive metastore still contains references to the deleted storage accounts. If I try to drop such a Hive table, I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException org.apache.hadoop.fs.azure.AzureException: No credentials found for account <deletedstorage>.blob.core.windows.net in the configuration, and its container data is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.)
Manipulating the Hive metastore directly is risky, as it could leave the metastore in an invalid state.
Is there any way to get rid of these orphan tables?

Error loading table on BigQuery dashboard but queries work fine

I clicked a table on the BigQuery dashboard and got an error.
However, I can get data when I do a SELECT on this table. (That means the table does exist.)
I already have the highest admin privilege so it shouldn't be a permission issue.
I created this table with a Python script, which collects data, writes it into a CSV file, and uploads the CSV file to BigQuery every day. After I created the table, I changed the schema once, both in the script and on the dashboard. Not sure if that's the cause, but the table loading error occurred several days after I changed the schema.
If you have ad-block extensions, they might be the root cause of this issue. Try disabling them, then run your query again.
Hope it helps.