What is the alternative for the double datatype from Spark SQL (Databricks) in SQL Server data warehouse? - sql-server-2016

I have to load data from Azure Data Lake into a SQL Server data warehouse, and I have set up external tables for this. One column is of double datatype in Databricks; I used decimal for it when creating the external table, and the file format is Parquet. With CSV it works, but with Parquet I get the following error:
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered
filling record reader buffer: ClassCastException: class
java.lang.Double cannot be cast to class parquet.io.api.Binary
(java.lang.Double is in module java.base of loader 'bootstrap';
parquet.io.api.Binary is in unnamed module of loader 'app'.
Can someone help me with this issue?
Thanks in advance.
CREATE EXTERNAL TABLE [dbo].[EXT_TEST1]
    (A VARCHAR(10), B DECIMAL(36,19))
WITH (DATA_SOURCE = [Azure_Datalake], LOCATION = N'/A/B/PARQUET/*.parquet/',
      FILE_FORMAT = parquetfileformat, REJECT_TYPE = VALUE, REJECT_VALUE = 1)
Column datatypes in Databricks: A string, B double
Data:
A   | B
'a' | 100.0050

Use float(53): it is SQL Server's double-precision floating-point type (8 bytes, 53-bit mantissa, roughly 15 decimal digits), which matches the Spark/Parquet double.
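For illustration, a minimal sketch of the external table DDL with column B mapped to float instead of decimal, assuming the same data source, location, and file format as in the question (the table name here is hypothetical to avoid clashing with the existing one):

CREATE EXTERNAL TABLE [dbo].[EXT_TEST1_FLOAT]   -- hypothetical name
(
    A VARCHAR(10),
    B FLOAT(53)   -- double precision, matches the Parquet/Spark double column
)
WITH (
    DATA_SOURCE = [Azure_Datalake],
    LOCATION = N'/A/B/PARQUET/*.parquet/',
    FILE_FORMAT = parquetfileformat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1
);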

Related

Synapse polybase data ingestion is not working

I have a task to convert jobs from the Synapse bulk insert pattern to the Synapse PolyBase pattern. It doesn't work straight away; it complains about data types as shown below, even though there are sometimes no double datatypes in the source query. Please help me understand whether there is a basic pattern or casting we need to apply before using PolyBase.
Here is the source SQL I used:
SELECT TOP (1000)
     CAST([SiteCode_SourceId] AS varchar(1000)) [SiteCode_SourceId]
    ,CAST([EquipmentCode_SourceId] AS varchar(1000)) [EquipmentCode_SourceId]
    ,FORMAT([RecordedAt], 'yyyy-MM-dd HH:mm:ss.fffffff') AS [RecordedAt]
    ,CAST([DataLineage_SK] AS varchar(1000)) [DataLineage_SK]
    ,CAST([DataQuality_SK] AS varchar(1000)) [DataQuality_SK]
    ,CAST([FixedPlantAsset_SK] AS varchar(1000)) [FixedPlantAsset_SK]
    ,CAST([ProductionTimeOfDay_SK] AS varchar(1000)) [ProductionTimeOfDay_SK]
    ,CAST([ProductionType_SK] AS varchar(1000)) [ProductionType_SK]
    ,CAST([Shift_SK] AS varchar(1000)) [Shift_SK]
    ,CAST([Site_SK] AS varchar(1000)) [Site_SK]
    ,CAST([tBelt] AS varchar(1000)) [tBelt]
    ,FORMAT([ModifiedAt], 'yyyy-MM-dd HH:mm:ss.fffffff') [ModifiedAt]
    ,FORMAT([SourceUpdatedAt], 'yyyy-MM-dd HH:mm:ss.fffffff') [SourceUpdatedAt]
FROM [ORXX].[public_XX].[fact_FixedXXXX]
Operation on target cp_data_movement failed: ErrorCode=PolybaseOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse. Operation: 'Polybase operation'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to class java.lang.Double (parquet.io.api.Binary$ByteArraySliceBackedBinary is in unnamed module of loader 'app'; java.lang.Double is in module java.base of loader 'bootstrap'),Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to class java.lang.Double (parquet.io.api.Binary$ByteArraySliceBackedBinary is in unnamed module of loader 'app'; java.lang.Double is in module java.base of loader 'bootstrap'),},],'
Possible reasons for this error:
The order of the columns in the target table does not match the order in the source, so there is a data type mismatch.
The data types in the Parquet file are incompatible with the target table's data types.
Solution:
Make sure the order of the columns is the same as in the Parquet staging file.
Keep the same data types in the source columns and the target columns.
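As a sketch of that second point, the table PolyBase loads into would need its columns in the same order and with types compatible with the casts in the source query above; the staging table name below is hypothetical:

CREATE TABLE [dbo].[stg_fact_FixedPlant]      -- hypothetical staging table name
(
    [SiteCode_SourceId]      varchar(1000),
    [EquipmentCode_SourceId] varchar(1000),
    [RecordedAt]             nvarchar(1000),  -- FORMAT() returns a string, not a datetime
    [DataLineage_SK]         varchar(1000),
    [DataQuality_SK]         varchar(1000),
    [FixedPlantAsset_SK]     varchar(1000),
    [ProductionTimeOfDay_SK] varchar(1000),
    [ProductionType_SK]      varchar(1000),
    [Shift_SK]               varchar(1000),
    [Site_SK]                varchar(1000),
    [tBelt]                  varchar(1000),
    [ModifiedAt]             nvarchar(1000),  -- FORMAT() returns a string
    [SourceUpdatedAt]        nvarchar(1000)
);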

Loading CSV data containing string and numeric format to Ignite is failing

I am evaluating Ignite and trying to load CSV data to Apache Ignite. I have created a table in Ignite:
jdbc:ignite:thin://127.0.0.1/> create table if not exists SAMPLE_DATA_PK(SID varchar(30),id_status varchar(50), active varchar, count_opening int,count_updated int,ID_caller varchar(50),opened_time varchar(50),created_at varchar(50),type_contact varchar, location varchar,support_incharge varchar,pk varchar(10) primary key);
I tried to load data to this table with command:
copy from '/home/kkn/data/sample_data_pk.csv' into SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk) format csv;
But the data load is failing with this error:
Error: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer] (state=50000,code=1)
java.sql.SQLException: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer]
at org.apache.ignite.internal.jdbc.thin.JdbcThinConnection.sendRequest(JdbcThinConnection.java:1009)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.sendFile(JdbcThinStatement.java:336)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute0(JdbcThinStatement.java:243)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute(JdbcThinStatement.java:560)
at sqlline.Commands.executeSingleQuery(Commands.java:1054)
at sqlline.Commands.execute(Commands.java:1003)
at sqlline.Commands.sql(Commands.java:967)
at sqlline.SqlLine.dispatch(SqlLine.java:734)
at sqlline.SqlLine.begin(SqlLine.java:541)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)
Below is the sample data I am trying to load:
SID|ID_status|active|count_opening|count_updated|ID_caller|opened_time|created_at|type_contact|location|support_incharge|pk
INC0000045|New|true|1000|0|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||1
INC0000045|Resolved|true|0|3|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||2
INC0000045|Closed|false|0|1|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||3
INC0000047|Active|true|0|1|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||4
INC0000047|Active|true|0|2|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||5
INC0000047|Active|true|0|489|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||6
INC0000047|Active|true|0|5|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||7
INC0000047|AwaitingUserInfo|true|0|6|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||8
INC0000047|Closed|false|0|8|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||9
INC0000057|New|true|0|0|Caller4416|29-02-2016 06:10||Phone|Location204||10
I need help understanding how to figure out what the issue is and how to resolve it.
You have to load the CSV without the header line, which contains the column names. The error is thrown when Ignite tries to convert the header value "count_opening" to an Integer for the COUNT_OPENING column.

Cannot save NAs to SQL DB using R and RJDBC

I noticed that I cannot save NAs to my DB, where Price is a float column. I read that changing the column type to double would help, however it did not. The same issue occurs for varchar columns.
dbSendUpdate(CONNECTION,paste("INSERT INTO DB (Price) values (",
paste("'",NA,"'", sep=""),
");"))
Error in .local(conn, statement, ...) : execute JDBC update query
failed in dbSendUpdate ([Vertica]VJDBC ERROR: Invalid syntax
for float: "NA")
How can I export data showing NAs as empty cells in the DB?
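For reference, the database stores an empty cell as SQL NULL rather than the quoted string 'NA' (which Vertica cannot parse as a float), so the statement sent by dbSendUpdate would need to look like this minimal sketch, reusing the DB table and Price column from the question:

INSERT INTO DB (Price) VALUES (NULL);  -- unquoted NULL stores an empty value; 'NA' fails for float columns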

Datetime parsing in Apache Pig

I'm trying to parse a date in a Pig script and I get the following error: "Hadoop does not return any error message".
Here is the Date format example : 3/9/16 2:50 PM
And here is how I parse it :
data = LOAD 'cleaned.txt'
AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
You can see the data file here
Do you have any idea?
Thanks
EDIT:
It looks like the error is caused by the STORE command on "times".
If I do a DUMP instead, I get:
ERROR 1066: Unable to open iterator for alias times
It happens only when I use the ToDate function; I have other scripts that work perfectly.
First of all, you need to specify the loader in the LOAD statement:
USING PigStorage('\t')
I assumed that you're using a tab separator.
Then, if you have no schema, specify the schema with types!
So your LOAD statement will be something like this:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
For now I just used the chararray type for everything, but you should specify whichever type is the right representation for each field.
After this the date conversion works fine, as you wrote:
(2016-03-09T23:55:00.000Z)
(2016-03-09T23:55:00.000Z)
(2016-03-09T23:55:00.000Z)
My test script:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
DUMP times;
UPDATE:
Some explanation:
By the way, the default loader is PigStorage ("PigStorage is the default load function for the LOAD operator"), but it's nicer to specify it explicitly.
The original issue was caused by the lack of data types ("If you don't assign types, fields default to type bytearray"), so ToDate failed on the input type.

Unable to specify schema during storage with pig scripts

I have the following Pig script:
A = LOAD 'textinput' using PigStorage() as (a0:chararray, a1:chararray, a2:chararray, a3:chararray, a4:chararray, a5:chararray, a6:chararray, a7:chararray, a8:chararray,a9:chararray);
describe A;
store A into 'output2' using PigStorage();
This works fine.
However, when I modify the store statement to
store A into 'output3' using PigStorage() as (a0:chararray, a1:chararray, a2:chararray, a3:chararray, a4:chararray, a5:chararray, a6:chararray, a7:chararray, a8:chararray,a9:chararray);
It fails with the below error:
2013-05-04 11:49:56,296 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'as' expecting SEMI_COLON
You don't specify a schema when storing output with Pig. The schema of the alias you're storing is whatever it was when you created it. If you wish to change the way it's stored, you could do something like:
B = FOREACH A GENERATE (insert transformation here);
STORE B INTO 'output3';
If you wish to change the way PigStorage writes your alias to disk, you could create your own StoreFunc.