Truncating tables in Azure Data Factory Pre-Copy script?

I am building a pipeline that copies files into my destination tables in Azure SQL DB, but before each copy I need to truncate the destination table. I can't figure out the pre-copy script:
[Screenshot: ADF sink settings]
Instead, I put the code below, but it is wrong: it runs before every table copy (5 times), so each run truncates all the tables and only the last one ends up with data. So I guess I need to parameterize it:
truncate table [dbo].[Global_data.csv]
truncate table [dbo].[Option_data.csv]
truncate table [dbo].[State_data.csv]
truncate table [dbo].[Status_data.csv]
truncate table [dbo].[Target_data.csv]
Also, please see my source parameters:
ADLSv2 container: @pipeline().parameters.SourceContainer
ADLSv2 Directory: @pipeline().parameters.SourceDirectory
ADLSv2 filename: @item().name
Sink TableName: @item().name
So I guessed that my pre-copy script must be something like:
truncate table @item().name
but this resulted in an error:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Incorrect syntax near '@item'.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near '@item'.,Source=.Net SqlClient Data Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect syntax near '@item'.,},],'
When I use TRUNCATE TABLE [@{item()}], I get the error below 5 times (once for each table):
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.,Source=.Net SqlClient Data Provider,SqlErrorNumber=4701,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=4701,State=1,Message=Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.,},],'

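Please use TRUNCATE TABLE [@{item().name}]. @{item()} interpolates the whole ForEach item, which is a JSON object such as {"name":"StateMetadata.csv","type":"File"} (exactly what the second error shows), while @{item().name} interpolates only its name, which your sink also uses as the table name.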
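The pre-copy script runs once per ForEach iteration, so it should truncate only that iteration's table. The difference between the two expressions can be mimicked in plain Python; this is only an illustration of the string interpolation, not ADF code. The item list is hypothetical, shaped like the object in the second error message, and the function names are made up for the example.

```python
import json

# Hypothetical ForEach items, shaped like the object shown in the second error message.
items = [
    {"name": "Global_data.csv", "type": "File"},
    {"name": "StateMetadata.csv", "type": "File"},
]


def pre_copy_script(item: dict) -> str:
    """Mimics TRUNCATE TABLE [@{item().name}]: only the name is interpolated."""
    return f"TRUNCATE TABLE [{item['name']}]"


def wrong_pre_copy_script(item: dict) -> str:
    """Mimics TRUNCATE TABLE [@{item()}]: the whole item is interpolated as JSON."""
    return f"TRUNCATE TABLE [{json.dumps(item, separators=(',', ':'))}]"


# One pre-copy script per iteration, each touching only its own table.
for item in items:
    print(pre_copy_script(item))

# Reproduces the object name from the "Cannot find the object" error.
print(wrong_pre_copy_script(items[1]))
```

Because the script is evaluated per iteration, each copy empties only the table it is about to load instead of all five.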

Related

Getting a Databricks drop schema error for delta table

I have a Delta table whose schema needs new columns and changed data types. (I usually do this on non-Delta tables, and that works fine.)
I have already dropped the existing Delta table, but when I try dropping the schema I get a 'v1 session catalog' error.
I am using SQL on a 10.4 LTS cluster (Spark 3.2.1, Scala 2.12; I can't change this compute); the driver and workers are Standard E_v4.
What I already did, which worked as usual:
drop table if exists dbname.tablename;
What I wanted to do next:
drop schema if exists dbname.tablename;
The error I got instead:
Error in SQL statement: AnalysisException: Nested databases are not supported by v1 session catalog: dbname.tablename
When I try recreating the schema in the same location I get the error:
AnalysisException: The specified schema does not match the existing schema at dbfs:locationOfMy/table
... Differences
-Specified schema has additional fields newColNameIAdded, anotherNewColIAdded
-Specified type for myOldCol is different from existing schema ...
If your intention is to keep the existing schema, you can omit the
schema from the create table command. Otherwise please ensure that
the schema matches.
How can I drop the schema and re-register the table in the same location, with the same name, using the new definitions?
Answering my own question a month later, since I didn't get replies and have since found the right solution:
A Delta table leaves partition files and transaction logs (the _delta_log folder) in its storage location, and the DROP commands do not remove them. I had to delete them manually from wherever my location was.
Try this:
dbutils.fs.rm(path, True)  # True = recursive: removes the folder and everything in it
Use the path of your schema.
Then create your table again.
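dbutils only exists inside Databricks, so as a rough local analog of what the recursive rm does (an illustration, not Databricks code), the sketch below uses Python's shutil.rmtree on a local directory laid out like a Delta table location and reports whether a _delta_log folder was there. The function name remove_table_location and the demo directory are hypothetical; like the real command, it deletes every data file under the path.

```python
import shutil
import tempfile
from pathlib import Path


def remove_table_location(location: str) -> bool:
    """Recursively delete a table directory (local stand-in for dbutils.fs.rm(path, True)).

    Returns True if the directory contained a Delta transaction log (_delta_log),
    False if it had none or did not exist.
    """
    root = Path(location)
    if not root.exists():
        return False
    had_delta_log = (root / "_delta_log").is_dir()
    shutil.rmtree(root)  # deletes data files, partition folders and _delta_log
    return had_delta_log


# Demo on a throwaway directory laid out like a Delta table location.
demo = Path(tempfile.mkdtemp()) / "tablename"
(demo / "_delta_log").mkdir(parents=True)
(demo / "part-00000.parquet").touch()
print(remove_table_location(str(demo)), demo.exists())  # True False
```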

U-SQL External table error: 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.'

I'm failing to create external tables for two specific tables in Azure SQL DB.
I have already created a few external tables with no issues.
The only difference I can see between the failed and the successful external tables is that the failed tables contain geography-type columns, so I think this is the issue, but I'm not sure.
CREATE EXTERNAL TABLE IF NOT EXISTS [Data].[Devices]
(
[Id] int
)
FROM SqlDbSource LOCATION "[Data].[Devices]";
Failed to connect to data source: 'SqlDbSource', with error(s): 'Unable to cast object of type 'System.DBNull' to type 'System.Type'.'
I solved it with a workaround instead of the external table:
I created a view that selects from the external rowset using EXECUTE:
CREATE VIEW IF NOT EXISTS [Data].[Devices]
AS
SELECT Id FROM EXTERNAL SqlDbSource
EXECUTE "SELECT Id FROM [Data].[Devices]";
This makes the script ignore the geography-type column completely; U-SQL currently does not support geography as a REMOTEABLE_TYPE for data sources.
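If several tables need the same workaround, the view text can be generated from each table's column list, leaving out the unsupported types. The sketch below is a hypothetical Python helper that only builds the U-SQL string; the column metadata (including a geography column called Location) is made up, and geography is the only type it treats as unsupported, per the answer above.

```python
from typing import Sequence, Tuple

# Hypothetical column metadata for [Data].[Devices]: (column name, SQL Server type).
COLUMNS = [("Id", "int"), ("Location", "geography")]


def build_passthrough_view(view: str, source: str, table: str,
                           columns: Sequence[Tuple[str, str]],
                           unsupported=frozenset({"geography"})) -> str:
    """Return U-SQL for a view that reads only supported columns via EXECUTE.

    Columns whose type is in `unsupported` are left out of both the U-SQL
    SELECT and the pass-through T-SQL query, so U-SQL never sees them.
    """
    keep = [name for name, sql_type in columns if sql_type.lower() not in unsupported]
    col_list = ", ".join(keep)
    return (
        f"CREATE VIEW IF NOT EXISTS {view}\n"
        "AS\n"
        f"SELECT {col_list} FROM EXTERNAL {source}\n"
        f'EXECUTE "SELECT {col_list} FROM {table}";'
    )


print(build_passthrough_view("[Data].[Devices]", "SqlDbSource", "[Data].[Devices]", COLUMNS))
```

For the example columns this prints the same view as the one above.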
Please have a look at my answer on the other thread you opened. In addition, I would recommend looking at how to create a table using a query: in that query, you should be able to use extractors to create the tables. To read more about extractors, please have a look at this doc.
Hope this helps.

Getting exception while updating table in Hive

I have created a table in Hive from an existing S3 file as follows:
create table reconTable (
entryid string,
run_date string
)
LOCATION 's3://abhishek_data/dump1';
Now I would like to update one entry as follows:
update reconTable set entryid='7.24E-13' where entryid='7.24E-14';
But I am getting following error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I have gone through a few posts here, but I still have no idea how to fix this.
I think you should create an external table when reading data from a source like S3.
Also, you should declare the table in ORC format and set the table property 'transactional'='true'.
Please refer to this for more info: attempt-to-do-update-or-delete-using-transaction-manager
You can refer to this Cloudera Community Thread:
https://community.cloudera.com/t5/Support-Questions/Hive-update-delete-and-insert-ERROR-in-cdh-5-4-2/td-p/29485
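The error itself says the session's transaction manager does not support UPDATE, so the session also has to switch to DbTxnManager. Note that the Hive Transactions documentation states external tables cannot be made ACID, so one common pattern (sketched here as an assumption, not the only option) is to keep reconTable as the S3-backed table and copy its rows into a managed, transactional ORC table. The Python helper below only assembles the HiveQL; the name reconTable_acid, the bucket count and the choice of run_date as bucketing column are hypothetical. Hive versions before 3.0 require transactional tables to be bucketed, and Hive refuses to UPDATE a bucketing column, which is why entryid is not used for bucketing.

```python
from typing import List


def acid_update_statements(source: str, target: str, columns: List[str],
                           bucket_column: str, buckets: int = 4) -> List[str]:
    """HiveQL to copy `source` into a transactional ORC table `target`.

    All columns are declared as string, mirroring reconTable.
    """
    col_defs = ", ".join(f"{c} string" for c in columns)
    col_list = ", ".join(columns)
    return [
        # Session settings: UPDATE/DELETE need concurrency plus DbTxnManager.
        "SET hive.support.concurrency=true",
        "SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
        # Managed (not external) ORC table marked transactional.
        f"CREATE TABLE {target} ({col_defs}) "
        f"CLUSTERED BY ({bucket_column}) INTO {buckets} BUCKETS "
        "STORED AS ORC TBLPROPERTIES ('transactional'='true')",
        f"INSERT INTO TABLE {target} SELECT {col_list} FROM {source}",
    ]


stmts = acid_update_statements("reconTable", "reconTable_acid",
                               ["entryid", "run_date"], bucket_column="run_date")
# entryid is not the bucketing column, so it can be updated.
stmts.append("UPDATE reconTable_acid SET entryid='7.24E-13' WHERE entryid='7.24E-14'")
print(";\n".join(stmts) + ";")
```

Depending on the Hive version and distribution, the metastore may also need the compactor settings described in the Hive Transactions documentation.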

Error while reviewing file after inserting data in redshift table

I have a table in Redshift in which I am inserting data from S3.
I viewed the table before inserting the data and it returned a blank table.
However, after inserting data into the Redshift table, I get the error below when running select * from table.
The command that copies the data from S3 into the table runs successfully without any error.
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation error in redshift
What could be the possible cause of this, and the solution?
I have faced this error (java.lang.NoClassDefFoundError) when the JDBC connection properties were set incorrectly.
If you are using the PostgreSQL driver, make sure the URL uses the jdbc:postgresql:// prefix,
e.g. jdbc:postgresql://HostName:5439/
Let me know if this works.
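The URL prefix has to match the driver actually on the classpath: jdbc:redshift:// for the Amazon Redshift JDBC driver, jdbc:postgresql:// for the PostgreSQL driver. A minimal Python sketch that builds either form and rejects whitespace in the host; the helper name, the host name and the database name dev are placeholders, and 5439 is Redshift's default port.

```python
# URL prefixes for the Amazon Redshift JDBC driver and the PostgreSQL JDBC driver.
PREFIXES = {
    "redshift": "jdbc:redshift://",
    "postgresql": "jdbc:postgresql://",
}


def build_jdbc_url(driver: str, host: str, database: str, port: int = 5439) -> str:
    """Build a JDBC URL for a Redshift cluster using the chosen driver's prefix."""
    if driver not in PREFIXES:
        raise ValueError(f"unknown driver: {driver!r}")
    # A space anywhere in the host (e.g. right after '//') makes the URL invalid.
    if not host or any(ch.isspace() for ch in host):
        raise ValueError(f"invalid host: {host!r}")
    return f"{PREFIXES[driver]}{host}:{port}/{database}"


print(build_jdbc_url("postgresql", "examplecluster.abc123.us-west-2.redshift.amazonaws.com", "dev"))
```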

Bigquery: invalid: Illegal Schema update

I tried to append data from a query to a BigQuery table.
Job ID job_i9DOuqwZw4ZR2d509kOMaEUVm1Y
Error: Job failed while writing to Bigquery. invalid: Illegal Schema update. Cannot add fields (field: debug_data) at null
I copied the query executed in the above job, ran it in the web console and chose the same destination table to append to, and it works.
The job you listed is trying to append query results to a table. That query has a field named 'debug_data'. The table you're appending to does not have that field. This behavior is by design, in order to prevent people from accidentally modifying the schema of their tables.
You can run a tables.update() or tables.patch() operation to modify the table schema to add this column (see an example using bq here: Bigquery add columns to table schema), and then you'll be able to run this query successfully.
Alternatively, you could use WRITE_TRUNCATE instead of WRITE_APPEND as the write disposition of your query job; this overwrites the table and, in doing so, allows schema changes.
See this post for how to have BigQuery automatically add new fields to a schema while doing an append.
The code in python is:
job_config.schema_update_options = ['ALLOW_FIELD_ADDITION']
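The same setting exists at the REST level: a query job sent through jobs.insert carries writeDisposition and schemaUpdateOptions in configuration.query (in the Python client these map to the write_disposition and schema_update_options arguments of bigquery.QueryJobConfig). The sketch below only builds and prints such a request body with the standard library; the project, dataset, table and query are placeholders.

```python
import json

# Request body for a BigQuery jobs.insert query job (placeholder names throughout).
job_body = {
    "configuration": {
        "query": {
            "query": "SELECT 1 AS id, 'x' AS debug_data",
            "useLegacySql": False,
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            # Append to the existing table...
            "writeDisposition": "WRITE_APPEND",
            # ...and allow the job to add columns the table lacks (e.g. debug_data).
            "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
        }
    }
}

print(json.dumps(job_body, indent=2))
```

Schema update options only take effect with WRITE_APPEND, or with WRITE_TRUNCATE when the destination is a table partition.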