For the past three or four days I've been experiencing trouble writing decimal values in the Parquet file format with Azure Data Factory V2.
The repro steps are quite simple: from a SQL source containing a numeric value, I map it to a Parquet file using the Copy activity.
At runtime the following exception is thrown:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=UserErrorParquetTypeNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Decimal Precision or Scale information is not found in schema for column: ADDRESSLONGITUDE,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=System.InvalidCastException,Message=Object cannot be cast from DBNull to other types.,Source=mscorlib,'",
"failureType": "UserError",
"target": "Copy Data"
}
In the source, the offending column is defined as the numeric(32,6) type.
I think the problem is confined to the Parquet sink, because changing the destination format to CSV results in a succeeded pipeline.
Any suggestions?
Based on Jay's answer, here is the whole dataset:
SELECT
[ADDRESSLATITUDE]
FROM
[dbo].[MyTable]
Based on the SQL Types to Parquet Logical Types and the data type mapping for Parquet files in the Data Factory Copy activity, the Decimal data type is supported: decimal data is converted into a binary data type.
Back to your error message:
Failure happened on 'Source' side.
ErrorCode=UserErrorParquetTypeNotSupported,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Decimal Precision or Scale information is not found in schema
for column:
ADDRESSLONGITUDE,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''
Type=System.InvalidCastException,Message=Object cannot be cast from
DBNull to other types.,Source=mscorlib,'
If your numeric data contains null values, it is converted into an Int data type without any decimal precision or scale information.
The CSV format does not go through this conversion process, so you could set a default value for your numeric data.
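For example, here is a minimal sketch of a source query for the Copy activity, reusing the table and column names from the question (the default value 0 is an assumption; use whatever placeholder makes sense for your data):
-- Replace NULLs with a default and keep an explicit numeric(32,6) cast,
-- so the Parquet sink always receives precision and scale information.
SELECT
    CAST(ISNULL([ADDRESSLONGITUDE], 0) AS numeric(32,6)) AS [ADDRESSLONGITUDE],
    CAST(ISNULL([ADDRESSLATITUDE], 0) AS numeric(32,6)) AS [ADDRESSLATITUDE]
FROM
    [dbo].[MyTable]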
I have a script that downloads data from an Oracle database and uploads it to Google BigQuery. This is done by writing to an Avro file, which is then uploaded directly using BQ's Python framework. The BigQuery tables I'm uploading the data to have predefined schemas, some of which contain DATETIME fields.
As BigQuery now has support for Avro logical types, importing timestamp data is no longer a problem. However, I'm still not able to import DATETIME fields. I tried using string, but then I got the following error:
Field CHANGED has incompatible types. Configured schema: datetime; Avro file: string.
I also tried to convert the field data to timestamps on export, but that produced an internal error in BigQuery:
An internal error occurred and the request could not be completed. Error: 3144498
Is it even possible to import datetime fields using Avro?
In Avro, logical data types must include the logicalType attribute; it is possible that this attribute is not included in your schema definition.
Here are a couple of examples, like the following one. As far as I know, for a date the underlying type is int (long is used for the timestamp logical types), and the logicalType should be date:
{
  "name": "DateField",
  "type": {
    "type": "int",
    "logicalType": "date"
  }
}
Once the logical data type is set, try again. The documentation does indicate it should work:
Avro logical type --> date
Converted BigQuery data type --> DATE
In case you get an error, it is helpful to check the schema of your Avro file; you can use this command to obtain its details:
java -jar avro-tools-1.9.2.jar getschema my-avro-file.avro
UPDATE
For cases where DATE alone doesn't work, consider that a TIMESTAMP can store the date and time with a number of micro/nanoseconds from the Unix epoch, 1 January 1970 00:00:00.000000 UTC (UTC seems to be the default for Avro). Additionally, the values stored in an Avro file (of type DATE or TIMESTAMP) are independent of a particular time zone; in this sense it is very similar to the BigQuery TIMESTAMP data type.
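If you do end up loading the column as a TIMESTAMP rather than a DATETIME, one option is to convert it on the BigQuery side at query time. This is only a sketch: the project/dataset/table path below is hypothetical, and CHANGED is assumed to have been loaded as TIMESTAMP.
-- Convert the loaded TIMESTAMP to a DATETIME in a chosen time zone (UTC here).
SELECT
  DATETIME(CHANGED, "UTC") AS changed_datetime
FROM
  `my-project.my_dataset.my_table`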
I am running my mapping on the Blaze engine and my target Hive table is transactional.
I have a field coming from the source with the data type varchar(32000), but when I run the mapping it fails with the following error:
"The Integration Service failed to execute grid mapping with following error [An internal exception occurred with message: The length of the data is larger than the precision of the column.]."
Any insights will be very helpful
Note: 1. My target is a Hive table and its transactional properties are set to true.
2. I am running this mapping on the Informatica Blaze engine, which requires an Update Strategy to be used.
3. The target column field length is also varchar(32000).
Check that the length of the source data you are ingesting is not bigger than 32000 characters.
If you are using a flat file as the source, check the delimiters and make sure the file is being read correctly.
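As a quick pre-check, assuming the source is queryable (the table and column names below are hypothetical), you could look for rows that exceed the target precision before running the mapping:
-- Hypothetical source table/column; counts rows longer than the varchar(32000) target.
SELECT COUNT(*) AS too_long_rows,
       MAX(LENGTH(my_text_field)) AS max_length
FROM   source_table
WHERE  LENGTH(my_text_field) > 32000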
So I'm trying to create and load a Vora table from an ORC file created by the SAP BW archiving process on HDFS.
The Hive table automatically generated on top of that file by BW has, among other things, this column:
archreqtsn decimal(23,0)
An attempt to create a Vora table using that data type fails with the error "Unsupported type (DecimalType(23,0)}) on column archreqtsn".
So, the biggest decimal supported seems to be decimal(18,0)?
The next thing I tried was to use either decimal(18,0) or string as the type for that column. But when attempting to load data from the file:
APPEND TABLE F002_5_F
OPTIONS (
files "/sap/bw/hb3/nldata/o_1ebic_1ef002__5/act/archpartid=p20170611052758000009000/000000_0",
format "orc" )
I'm getting another error:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5_F: [Vora [<REDACTED>.com.au:30932.1639407]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (decimal 128 unsupported (c++ exception)).
An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
What could be the workarounds for this issue of unsupported decimal types? In fact, I might not need that column in the Vora table at all, but I can't get rid of it in the ORC file.
I'm converting a database from one structure into a new structure. The old database is FoxPro and the new one is SQL Server. The problem is that some of the data is saved as char data in FoxPro but is actually foreign keys to other tables, which means the columns need to be int types in SQL. The problem is that when I try to do a data conversion in SSIS from any of the character-related types to an integer, I get something along the lines of the following error message:
There was an error with the output column "columnName"(24) on output "OLE DB Source Output" (22). The column status returned was : "The value could not be converted because of potential loss of data".
How do I convert from a string or character type to an int without getting the potential-loss-of-data error? I hand-checked the values and it looks like all of them are small enough to fit into an int data type.
Data source -> Data Conversion Task.
In Data Conversion Task, click Configure Error Output
For Error and Truncation, change it from Fail Component to Redirect Row.
Now you have two paths. Good data will flow out of the DCT with the proper types. The bad data will go down the Red path. Do something with it. Dump to a file, add a data view and inspect, etc.
Values like 34563927342 exceed the maximum size for an integer. You should use Int64 / bigint.
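If you want to confirm which values really are out of range (or not numeric at all), one option is a check like the sketch below, assuming the character column has been staged into a SQL Server table first (dbo.StagingTable is a hypothetical name, and TRY_CAST requires SQL Server 2012 or later):
-- Hypothetical staging table: lists values that cannot be converted to a 32-bit int,
-- either because they are non-numeric or because they overflow the int range.
SELECT columnName
FROM   dbo.StagingTable
WHERE  columnName IS NOT NULL
       AND TRY_CAST(columnName AS int) IS NULL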
I'm working on an SSIS package and I'm taking values from an .xml file to an ADO.NET destination,
but when I insert values into the table I get the following error:
potential data loss may occur due to inserting data from input column "Copy of swaps_Id" with data type "DT_I8" to external column "swaps_id" with data type "DT_I4". If this is intended, an alternative way to do conversion is using a Data Conversion component before ADO NET destination component
I have used the Data Conversion transformation editor, but I am still getting the above error.
What should be corrected?
This warning means that the data in Copy of swaps_Id is a 64-bit integer and you are trying to insert it into a 32-bit integer column in the destination table. What you should do depends on your data.
If you are sure that the data in your column is within the 32-bit signed integer range (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)), you can leave it as is (data truncation will never occur, but the warning will stay), do a data conversion, or change the Copy of swaps_Id column data type.
If not, you should change the column data type in your destination table to a 64-bit integer (bigint in SQL Server).
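A minimal T-SQL sketch of that change (dbo.Swaps is a hypothetical table name; only the swaps_id column name comes from the warning):
-- Widen the destination column from int to bigint so DT_I8 values fit without truncation.
ALTER TABLE dbo.Swaps
    ALTER COLUMN swaps_id bigint NOT NULL  -- keep or adjust NULLability to match the existing column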
This simply means that your source data type is larger than what your destination can handle. On the Data Conversion transformation, you may want to convert the column to the DT_I4 data type (assuming the values fit in 32 bits) in order for that warning to disappear.