U-SQL External table error: 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.' - azure-sql-database

I'm failing to create external tables for two specific tables from Azure SQL DB;
I have already created a few external tables with no issues.
The only difference I can see between the failed and the successful external tables is that the failing tables contain geography type columns, so I think this is the issue, but I'm not sure.
CREATE EXTERNAL TABLE IF NOT EXISTS [Data].[Devices]
(
[Id] int
)
FROM SqlDbSource LOCATION "[Data].[Devices]";
Failed to connect to data source: 'SqlDbSource', with error(s): 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.'

I solved it with a workaround for the external table:
I created a view that selects from an external rowset using EXECUTE:
CREATE VIEW IF NOT EXISTS [Data].[Devices]
AS
SELECT Id FROM EXTERNAL SqlDbSource
EXECUTE "SELECT Id FROM [Data].[Devices]";
This made the script completely ignore the geography type column, which is currently not supported as a REMOTEABLE_TYPE for data sources by U-SQL.
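Once the view exists, downstream U-SQL scripts can query it like any other object. A minimal sketch (the output path is just an example):
// Query the view that wraps the remote query and write the rows out.
@devices =
    SELECT Id
    FROM [Data].[Devices];

OUTPUT @devices
TO "/output/devices.tsv"
USING Outputters.Tsv();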

Please have a look at my answer on the other thread you opened. To add to that, I would also recommend looking at how to create a table using a query; in the query you should be able to use "extractors" to create the tables. To read more about extractors, please have a look at this doc.
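For illustration, a rough sketch of that approach, assuming the data has been exported to a CSV file at a hypothetical path (the column list and the index/distribution spec are placeholders):
// Read the file with a built-in extractor, then create a managed table from the query.
@devices =
    EXTRACT Id int
    FROM "/input/devices.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

CREATE TABLE IF NOT EXISTS [Data].[DevicesLocal]
(
    INDEX idx_id CLUSTERED(Id ASC) DISTRIBUTED BY HASH(Id)
) AS SELECT Id FROM @devices;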
Hope this helps.

Related

Getting a Databricks drop schema error for delta table

I have a Delta table schema that needs new columns/changed data types (usually I do this on non-Delta tables and those work fine).
I have already dropped the existing Delta table and tried dropping the schema, but I am getting a 'v1 session catalog' error.
I am currently using SQL on a 10.4 LTS cluster (Spark 3.2.1, Scala 2.12; I can't change these computes); driver and workers are Standard E_v4.
What I already did, and it worked as usual:
drop table if exists dbname.tablename;
What I wanted to do next:
drop schema if exists dbname.tablename;
The error I got instead:
Error in SQL statement: AnalysisException: Nested databases are not supported by v1 session catalog: dbname.tablename
When I try recreating the schema in the same location I get the error:
AnalysisException: The specified schema does not match the existing schema at dbfs:locationOfMy/table
... Differences
-Specified schema has additional fields newColNameIAdded, anotherNewColIAdded
-Specified type for myOldCol is different from existing schema ...
If your intention is to keep the existing schema, you can omit the schema from the create table command. Otherwise please ensure that the schema matches.
How can I drop the schema and re-register it at the same location, with the same name, using the new definitions?
Answering a month later since I didn't get replies and found the right solution:
Delta tables leave behind partition files and transaction logs that the drop commands do not clean up. I had to delete them manually at my table's location.
Try this:
dbutils.fs.rm(path, True)
Use the path of your schema.
Then create your table again.
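Putting it together, a minimal sketch of the whole sequence (the path, database, table and column names are placeholders taken from the question; the column types are assumptions to adjust):
# Placeholder names and path -- adjust to your environment.
spark.sql("DROP TABLE IF EXISTS dbname.tablename")

# Remove the leftover Delta data files and transaction log at the old location.
dbutils.fs.rm("dbfs:/locationOfMy/table", True)

# Re-register the table at the same location and name with the new definition.
spark.sql("""
    CREATE TABLE dbname.tablename (
        myOldCol STRING,            -- the column whose type changed
        newColNameIAdded STRING,
        anotherNewColIAdded STRING
    )
    USING DELTA
    LOCATION 'dbfs:/locationOfMy/table'
""")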

How to create a database, table and Insert data into it and use it as a source in another data flow in SSIS?

I need to create a SQL database and a table and insert data into the table from another SQL database. I also need to use this newly created database as an OLE DB source in another data flow in the same SSIS package. The table and database names are fixed.
I tried using a Script Task to create the database and tables, but when I have to insert data, I cannot give the database name in the connection manager because the database is only created at runtime.
I have tried setting ValidateExternalMetadata to false, but that doesn't seem to help either.
Any ideas or suggestions on how to accomplish this would be of great help. Thanks.
I think you just need two things to make this work:
While developing the package, the database and table will need to exist.
Set DelayValidation to true on the connection manager and the data flow tasks to avoid validation failures before the database and table exist.
Use a variable to hold the new table name, create and populate the table using that variable, and then use the variable in the source object.
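For the creation step itself, the Execute SQL Task (or Script Task) can run plain T-SQL against the target server's master database. A sketch with placeholder names:
-- Placeholder database and table names; adjust to the fixed names you need.
IF DB_ID(N'StagingDb') IS NULL
    CREATE DATABASE StagingDb;

-- Create the table via dynamic SQL so the three-part name resolves at run time.
IF OBJECT_ID(N'StagingDb.dbo.LoadTable', N'U') IS NULL
    EXEC (N'CREATE TABLE StagingDb.dbo.LoadTable (Id INT NOT NULL, Payload NVARCHAR(MAX) NULL);');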

Is it possible to apply functions to an external table DDL in BigQuery?

I have JSONL data being loaded into a GCS bucket, and some "record" field types are blank (i.e., {}), so auto-detecting the schema ends up creating a RECORD type that has no subfields (which is not allowed in BigQuery).
As a workaround, I have created the external table as such:
CREATE EXTERNAL TABLE `my_table`
OPTIONS
(
uris=['gs://path/to/files/*'],
format = 'JSON',
ignore_unknown_values=true
)
(This table cannot be queried fully, as I receive the error: Unsupported empty struct type for field '<field name here>'.) So I create a view on top of this external table as such:
CREATE VIEW `my_table_view` AS
select TO_JSON_STRING(value) as column from `my_table`
and then I can successfully query this view (built on top of the external table) for any column.
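For example, individual fields can then be pulled back out of the JSON string (the field name here is hypothetical):
SELECT JSON_EXTRACT_SCALAR(column, '$.device_id') AS device_id
FROM `my_table_view`;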
My question is, can I skip creating the view and add this TO_JSON_STRING function in the external table definition itself? I have not been able to figure out how as I keep getting syntax errors.

Load data from Drill table into Hive Table

I have created a table using Drill and it is located at
/user/abc/drill/Drilltable.
Now I would like to load the data from DrillTable into HiveTable which is located at path
/user/hive/warehouse/userxyz.db
I am using the statement below to load the data:
INSERT INTO TABLE HiveTable select * from DrillTable;
I get the error
Table not found
and I am a bit confused about how to let Hive know the path of the Drill table.
What would be the right way to handle this?
Hive might be confused about the schema of the Drill data as well as its location. If you're willing to experiment, try something like this:
Store the data in a Drill format you can model in Hive, CSV for example, as described in this post.
In Hive, create an external table that defines the schema and location of the text data. You can then optionally convert the external table to a managed table.
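For example, a rough sketch of the Hive side (the column names are placeholders; point the location at the directory holding the CSV output):
-- Placeholder columns; match them to the columns Drill wrote out.
CREATE EXTERNAL TABLE drill_staging (
    col1 STRING,
    col2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/abc/drill/Drilltable';

INSERT INTO TABLE HiveTable SELECT * FROM drill_staging;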

Getting exception while updating table in Hive

I have created a table in Hive from an existing S3 file as follows:
create table reconTable (
entryid string,
run_date string
)
LOCATION 's3://abhishek_data/dump1';
Now I would like to update one entry as follows:
update reconTable set entryid='7.24E-13' where entryid='7.24E-14';
But I am getting the following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I have gone through a few posts here, but I have no idea how to fix this.
I think you should create an external table when reading data from a source like S3.
Also, to support UPDATE, you should declare the table in ORC format and set the table property 'transactional'='true'.
Please refer to this for more info: attempt-to-do-update-or-delete-using-transaction-manager
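A rough sketch of an ACID-enabled copy of the table that accepts UPDATE (this also assumes the session's transaction manager, hive.txn.manager, is set to DbTxnManager; bucketing is required on older Hive versions):
-- Managed ORC table with ACID enabled; column names reuse the question's table.
CREATE TABLE recon_acid (
    entryid string,
    run_date string
)
CLUSTERED BY (entryid) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Copy the existing rows, then the update works as expected.
INSERT INTO TABLE recon_acid SELECT entryid, run_date FROM reconTable;

UPDATE recon_acid SET entryid='7.24E-13' WHERE entryid='7.24E-14';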
You can refer to this Cloudera Community Thread:
https://community.cloudera.com/t5/Support-Questions/Hive-update-delete-and-insert-ERROR-in-cdh-5-4-2/td-p/29485