Azure Data Factory: ErrorCode=TypeConversionFailure, Exception occurred when converting value (ErrorCode 2200)

Can someone let me know why Azure Data Factory is trying to convert a value from String to type Double?
I am getting the error:
{
"errorCode": "2200",
"message": "ErrorCode=TypeConversionFailure,Exception occurred when converting value '+44 07878 44444' for column name 'telephone2' from type 'String' (precision:255, scale:255) to type 'Double' (precision:15, scale:255). Additional info: Input string was not in a correct format.",
"failureType": "UserError",
"target": "Copy Table to EnrDB",
"details": [
{
"errorCode": 0,
"message": "'Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'",
"details": []
}
]
}
My Sink looks like the following (screenshot omitted):
I don't have any mapping set.
The column setting for the field 'telephone2' is as follows (screenshot omitted):
I changed the 'table option' to None, however I then got the following error:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Internal system error occurred.\r\nStatement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Internal system error occurred.\r\nStatement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.,Source=.Net SqlClient Data Provider,SqlErrorNumber=75000,Class=17,ErrorCode=-2146232060,State=1,Errors=[{Class=17,Number=75000,State=1,Message=Internal system error occurred.,},{Class=0,Number=15885,State=1,Message=Statement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.,},],'",
"failureType": "UserError",
"target": "Copy Table to EnrDB",
"details": []
}
Any more thoughts?

The issue was resolved by changing the column DataType on the database to match the DataType recorded in Azure Data Factory, i.e. StringType.
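For anyone hitting the same error, here is a minimal sketch of that kind of fix on an Azure SQL sink. It assumes a hypothetical dbo.Contacts table where telephone2 had been created as FLOAT; the table name, column size, and nullability are illustrative only, not taken from the original setup.

-- Hypothetical sink table: telephone2 was FLOAT, which forced the copy activity
-- to convert '+44 07878 44444' to Double; a string column avoids the conversion.
ALTER TABLE dbo.Contacts
ALTER COLUMN telephone2 NVARCHAR(255) NULL;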


Kusto | calculate percentage grouped by 2 columns

I have a result set that looks similar to the table below, and I extended it with Percentage like so:
datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
| extend Percentage = round(100.0 * Requests / toscalar(T | summarize sum(Requests)), 2)
The problem is that I really want the percentage to be calculated from the total Requests per App rather than from the grand total.
For example, for App "api" where Code is "200", instead of 76.74% of the grand total, I want to express it as a percentage of just the rows for App "api", which would be 98.29% of the total Requests for that App.
I haven't really tried anything that would be considered valid syntax. Any help much appreciated.
You can use the join or lookup operators:
datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
| lookup ( T | summarize sum(Requests) by App ) on App
| extend Percentage = round(100.0 * Requests / sum_Requests, 2)
| project Code, App, Requests, Percentage
Code | App | Requests | Percentage
200  | api | 1036     | 98.29
404  | api | 18       | 1.71
200  | tra | 63       | 100
302  | web | 12       | 5.15
200  | web | 219      | 93.99
500  | web | 2        | 0.86
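For completeness, here is a sketch of the same calculation using join instead of lookup (same datatable as above). The per-App total comes out of summarize as sum_Requests, just as in the lookup version; the only extra wrinkle is that join also brings in a duplicate App1 key column, which project drops.

datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
| join kind=inner ( T | summarize sum(Requests) by App ) on App
| extend Percentage = round(100.0 * Requests / sum_Requests, 2)
| project Code, App, Requests, Percentage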

Comparing numbers in JSON Schema

I have number property in JSON schema
"years": {"type": "number", "pattern": "^([0-9]|10)$"}
I want to match this number in a condition where I need to check whether the number is less than 3. Is there a way to do it? I tried:
"if": {"properties": {"years": {"anyOf": [0,1,2]}}
You want exclusiveMaximum; see https://json-schema.org/understanding-json-schema/reference/numeric.html#range
Note that you may want minimum: 0 to exclude negative numbers.
You may also want type: integer instead of type: number if you don't want to allow fractional numbers.
pattern is incorrect as it applies to strings, not numbers.
anyOf takes a schema, not values, but you could use enum: [0, 1, 2] if those are the only allowed values.
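Putting those pieces together, a sketch of the "less than 3" condition could look like the fragment below. It assumes draft 6+ numeric semantics (exclusiveMaximum as a number rather than a boolean) and draft 7+ for if/then/else; the added required keyword is there so the if branch doesn't trivially match objects that omit years.

"if": {
  "properties": {
    "years": { "type": "integer", "minimum": 0, "exclusiveMaximum": 3 }
  },
  "required": ["years"]
}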

BigQuery - Extract all array elements inside the array Dynamically

I have already posted this question, but now I'm trying to achieve this with BigQuery.
I have JSON data like below.
{
"userid": null,
"appnumber": "9",
"trailid": "1547383536",
"visit": [{
"visitNumber": "1",
"time": "0",
"hour": "18",
"minute": "15"
},
{
"visitNumber": "2",
"time": "2942",
"hour": "18",
"minute": "15"
}
]
}
I want to extract the visit array values dynamically.
Like below (pipe-delimited columns):
userid,appnumber| trailid |
visit.visitnumber | visit.time | visit.hour | visit.minute |
visit.visitnumber | visit.time | visit.hour | visit.minute
As you can see, I have 2 JSON elements inside the visit array, so I want to extract visitNumber, time, hour, and minute dynamically. Sometimes I may have 3 or 5 values inside the array, and it should extract all 3 or 5 JSON objects automatically (I mean dynamically).
There is a common way to extract this, something like JsonExtractScalar(JsonExtract(visit,'$.[0].visitnumber')) (the syntax may be wrong, but that is the kind of syntax we would use). So here I am manually using [0] to extract the first element in the array.
If it has 10+ elements then I would have to use [0]...[1]...[2]...[10]. This is what I want to solve: without listing every element, it should dynamically pick up all 10 elements and extract them.
Could someone help me with the extract queries?
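Not part of the original thread, but one way to do this in BigQuery standard SQL is to parse the array with JSON_EXTRACT_ARRAY and flatten it with UNNEST, so the number of elements in visit no longer matters. A sketch, assuming a hypothetical table mydataset.visits with the JSON stored in a STRING column called payload:

-- One output row per element of the visit array, regardless of how many there are.
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.userid')    AS userid,
  JSON_EXTRACT_SCALAR(payload, '$.appnumber') AS appnumber,
  JSON_EXTRACT_SCALAR(payload, '$.trailid')   AS trailid,
  JSON_EXTRACT_SCALAR(v, '$.visitNumber')     AS visit_number,
  JSON_EXTRACT_SCALAR(v, '$.time')            AS visit_time,
  JSON_EXTRACT_SCALAR(v, '$.hour')            AS visit_hour,
  JSON_EXTRACT_SCALAR(v, '$.minute')          AS visit_minute
FROM `mydataset.visits`,
  UNNEST(JSON_EXTRACT_ARRAY(payload, '$.visit')) AS v;

Because the array is flattened into rows, 2, 3, or 10 visits are all handled without hard-coding [0], [1], ... in the JSON path.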

Column names when exporting ORC files from hive server 2 using beeline

I am facing a problem where exporting results from Hive Server 2 to ORC files shows default column names (e.g. _col0, _col1, _col2) instead of the original ones created in Hive. We are using pretty much default components from HDP-2.6.3.0.
I am also wondering if the below issue is related:
https://issues.apache.org/jira/browse/HIVE-4243
Below are the steps we are taking:
Connecting:
export SPARK_HOME=/usr/hdp/current/spark2-client
beeline
!connect jdbc:hive2://HOST1:2181,HOST2:2181,HOST2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Creating test table and inserting sample values:
create table test(str string);
insert into test values ('1');
insert into test values ('2');
insert into test values ('3');
Running test query:
select * from test;
+-----------+--+
| test.str  |
+-----------+--+
| 1         |
| 2         |
| 3         |
+-----------+--+
Exporting as ORC:
insert overwrite directory 'hdfs://HOST1:8020/tmp/test' stored as orc select * from test;
Getting the results:
hdfs dfs -get /tmp/test/000000_0 test.orc
Checking the results:
java -jar orc-tools-1.4.1-uber.jar data test.orc
Processing data file test.orc [length: 228]
{"_col0":"1"}
{"_col0":"2"}
{"_col0":"3"}
java -jar orc-tools-1.4.1-uber.jar meta test.orc
Processing data file test.orc [length: 228]
Structure for test.orc
File Version: 0.12 with HIVE_13083
Rows: 2
Compression: SNAPPY
Compression size: 262144
Type: struct<_col0:string>
Stripe Statistics:
  Stripe 1:
    Column 0: count: 2 hasNull: false
    Column 1: count: 2 hasNull: false min: 1 max: 3 sum: 2
File Statistics:
  Column 0: count: 2 hasNull: false
  Column 1: count: 2 hasNull: false min: 1 max: 3 sum: 2
Stripes:
  Stripe: offset: 3 data: 11 rows: 2 tail: 60 index: 39
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 28
    Stream: column 1 section DATA start: 42 length 5
    Stream: column 1 section LENGTH start: 47 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
File length: 228 bytes
Padding length: 0 bytes
Padding ratio: 0%
Looking at the results, I can see _col0 as the column name instead of the expected original name str.
Any ideas on what I am missing?
Update
I noticed that the beeline connection was going to Hive 1.x, not Hive 2.x as intended. I changed the connection to the Hive Server 2 Interactive URL:
Connected to: Apache Hive (version 2.1.0.2.6.3.0-235)
Driver: Hive JDBC (version 1.21.2.2.6.3.0-235)
Transaction isolation: TRANSACTION_REPEATABLE_READ
And tried again with the same sample. It even prints out the schema correctly:
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test.str, type:string, comment:null)], properties:null)
But still no luck in getting the column name into the ORC file.
Solution
You need to enable Hive LLAP (Interactive SQL) in Ambari, then change the connection string you are using. For example, my connection became jdbc:hive2://.../;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Note the additional "-hive2" at the end of the URL. There is a tutorial video on this from Hortonworks.
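To make the change concrete, the beeline connect command from the start of this question would then look something like this (HOST1/HOST2 are the same placeholders as above, not real host names):

!connect jdbc:hive2://HOST1:2181,HOST2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2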
"Proof"
After connecting to the updated Hive endpoint, I ran
create table t_orc(customer string, age int) stored as orc;
insert into t_orc values('bob', 12),('kate', 15);
Then
~$ hdfs dfs -copyToLocal /apps/hive/warehouse/t_orc/000000_0 ~/tmp/orc/hive2.orc
~$ orc-metadata tmp/orc/hive2.orc
{ "name": "tmp/orc/hive2.orc",
"type": "struct<customer:string,age:int>",
"rows": 2,
"stripe count": 1,
"format": "0.12", "writer version": "HIVE-13083",
"compression": "zlib", "compression block": 262144,
"file length": 305,
"content": 139, "stripe stats": 46, "footer": 96, "postscript": 23,
"row index stride": 10000,
"user metadata": {
},
"stripes": [
{ "stripe": 0, "rows": 2,
"offset": 3, "length": 136,
"index": 67, "data": 23, "footer": 46
}
]
}
Where orc-metadata is a tool distributed via the ORC repo on GitHub.
You have to set this in the Hive script or the Hive shell; otherwise, put it in a .hiverc file in your home directory or in one of the other Hive user properties files:
set hive.cli.print.header=true;

Storing a 30KB BLOB in SQL Server 2005

My data is 30KB on disk (a serialized object). What size should the binary field in T-SQL be?
Is the number in the brackets bytes?
... so is binary(30000) 30KB?
Thanks
You need to use the varbinary(max) data type; the maximum allowed size for binary is 8,000 bytes. Per the MSDN page on binary and varbinary:
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length.
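So for a roughly 30KB serialized object, the column would look something like this (table and column names are illustrative only, not from the original question):

-- Hypothetical table for the serialized objects; varbinary(max) holds up to
-- 2^31-1 bytes, so a ~30KB payload fits comfortably and is not padded.
CREATE TABLE dbo.SerializedObjects
(
    Id      int IDENTITY(1, 1) PRIMARY KEY,
    Payload varbinary(max) NOT NULL
);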
The number after binary() is the number of bytes; see MSDN:
binary [ ( n ) ]
Fixed-length binary data of n bytes. n must be a value from 1 through 8,000. Storage size is n+4 bytes.
Whether 30KB means 30,000 or 30,720 bytes depends on which binary prefix convention your file system uses.