Vora support for ORC files - char() type?

Is there any reason why Vora's ORC reader doesn't support the char() type? Here's what I got when trying to read a file of a Hive ORC-based table containing char(n) fields:
com.sap.spark.vora.client.VoraClientException: Could not load table FLIGHTS_2006_ORC: [Vora[eba165.extendtec.com.au:2202]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (ORC Reader: Unsupported type char(2)
(c++ exception))
[Vora[eba169.extendtec.com.au:2202]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (ORC Reader: Unsupported type char(2)
(c++ exception))
[Vora[eba156.extendtec.com.au:2202]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (ORC Reader: Unsupported type char(2)
(c++ exception)) with error code 0, status TStatusCode_ERROR_STATUS

This is a known limitation in Vora 1.2 and earlier. Support for the char() type is planned for the next Vora version (Vora 1.3).
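Until 1.3 is available, one possible workaround, sketched here only under the assumption that you can rewrite the data in Hive (the table and column names below are illustrative, not the real schema), is to materialize a copy of the table whose char(n) columns are cast to string, so the resulting ORC files contain only types the Vora reader accepts:

-- Hive: rewrite the ORC data with string columns instead of char(n).
-- Table and column names are examples only.
CREATE TABLE flights_2006_orc_str
STORED AS ORC
AS
SELECT
    flight_id,                          -- columns with supported types are copied as-is
    CAST(carrier AS string) AS carrier, -- was char(2)
    CAST(origin  AS string) AS origin   -- was char(3)
FROM flights_2006_orc;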

Related

Synapse polybase data ingestion is not working

I have a task to convert jobs from the Synapse bulk insert pattern to the Synapse PolyBase pattern. As part of that I see that it doesn't work straight away; it complains about data types as shown below, even though there are no double data types in the source query. Please help me understand whether there is a basic pattern or casting we need to apply before using PolyBase.
Here is the source SQL I used:
SELECT TOP (1000)
       cast([SiteCode_SourceId] as varchar(1000))              [SiteCode_SourceId]
      ,cast([EquipmentCode_SourceId] as varchar(1000))         [EquipmentCode_SourceId]
      ,FORMAT([RecordedAt],'yyyy-MM-dd HH:mm:ss.fffffff') AS   [RecordedAt]
      ,cast([DataLineage_SK] as varchar(1000))                 [DataLineage_SK]
      ,cast([DataQuality_SK] AS varchar(1000))                 [DataQuality_SK]
      ,cast([FixedPlantAsset_SK] as varchar(1000))             [FixedPlantAsset_SK]
      ,cast([ProductionTimeOfDay_SK] as varchar(1000))         [ProductionTimeOfDay_SK]
      ,cast([ProductionType_SK] as varchar(1000))              [ProductionType_SK]
      ,cast([Shift_SK] as varchar(1000))                       [Shift_SK]
      ,cast([Site_SK] as varchar(1000))                        [Site_SK]
      ,cast([tBelt] as varchar(1000))                          [tBelt]
      ,FORMAT([ModifiedAt],'yyyy-MM-dd HH:mm:ss.fffffff')      [ModifiedAt]
      ,FORMAT([SourceUpdatedAt],'yyyy-MM-dd HH:mm:ss.fffffff') [SourceUpdatedAt]
FROM [ORXX].[public_XX].[fact_FixedXXXX]
Operation on target cp_data_movement failed: ErrorCode=PolybaseOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse. Operation: 'Polybase operation'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to class java.lang.Double (parquet.io.api.Binary$ByteArraySliceBackedBinary is in unnamed module of loader 'app'; java.lang.Double is in module java.base of loader 'bootstrap'),Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to class java.lang.Double (parquet.io.api.Binary$ByteArraySliceBackedBinary is in unnamed module of loader 'app'; java.lang.Double is in module java.base of loader 'bootstrap'),},],'
Reasons for this error can be:
The order of the columns in the target table does not match the source table, so there is a data type mismatch.
The data types in the Parquet file are incompatible with the target table's data types.
Solution:
Make sure the order of the columns is the same as in the Parquet staging file.
Keep the same data types in the source columns and the target columns; a sketch of such a target table follows.
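For illustration only (the staging schema and table name below are hypothetical), a target table that lines up with the source query would declare every cast column and every FORMAT(...) output as varchar, so PolyBase never has to cast a Parquet string value to a double:

-- Hypothetical Synapse staging table: column order and types mirror the source
-- query, so every value read from the Parquet file is already a string.
CREATE TABLE [stg].[fact_FixedXXXX_staging]
(
     [SiteCode_SourceId]      varchar(1000)
    ,[EquipmentCode_SourceId] varchar(1000)
    ,[RecordedAt]             varchar(27)   -- FORMAT(...) produces a string, not a datetime
    ,[DataLineage_SK]         varchar(1000)
    ,[DataQuality_SK]         varchar(1000)
    ,[FixedPlantAsset_SK]     varchar(1000)
    ,[ProductionTimeOfDay_SK] varchar(1000)
    ,[ProductionType_SK]      varchar(1000)
    ,[Shift_SK]               varchar(1000)
    ,[Site_SK]                varchar(1000)
    ,[tBelt]                  varchar(1000)
    ,[ModifiedAt]             varchar(27)
    ,[SourceUpdatedAt]        varchar(27)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);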

DBT RUN - Getting Database Error using VS Code BUT Not Getting Database Error using DBT Cloud

I'm using DBT connected to Snowflake. I use DBT Cloud, but we are moving to using VS Code for our DBT project work.
I have an incremental DBT model that compiles and works without error when I issue the DBT RUN command in DBT Cloud. Yet when I attempt to run the exact same model from the same git branch using the DBT RUN command from the terminal in VS Code, I get the following error:
Database Error in model dim_cifs (models\core_data_warehouse\dim_cifs.sql)
16:45:31 040050 (22000): SQL compilation error: cannot change column LOAN_MGMT_SYS from type VARCHAR(7) to VARCHAR(3) because reducing the byte-length of a varchar is not supported.
The table in Snowflake defines this column as VARCHAR(50). I have no idea why DBT is attempting to change the data length or why it only happens when the command is run from VS Code Terminal. There is no need to make this DDL change to the table.
When I view the compiled SQL in the Target folder there is nothing that indicates a DDL change.
When I look in the logs I find the following, but don't understand what is triggering the DDL change:
describe table "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS"
16:45:31.354314 [debug] [Thread-9 (]: SQL status: SUCCESS 36 in 0.09 seconds
16:45:31.378864 [debug] [Thread-9 (]:
In "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS":
Schema changed: True
Source columns not in target: []
Target columns not in source: []
New column types: [{'column_name': 'LOAN_MGMT_SYS', 'new_type': 'character varying(3)'}]
16:45:31.391828 [debug] [Thread-9 (]: Using snowflake connection "model.xxxxxxxxxx.dim_cifs"
16:45:31.391828 [debug] [Thread-9 (]: On model.xxxxxxxxxx.dim_cifs: /* {"app": "dbt", "dbt_version": "1.1.1", "profile_name": "xxxxxxxxxx", "target_name": "dev", "node_id": "model.xxxxxxxxxx.dim_cifs"} */
alter table "DEVELOPMENT_DW"."DBT_XXXXXXXX"."DIM_CIFS" alter "LOAN_MGMT_SYS" set data type character varying(3);
16:45:31.546962 [debug] [Thread-9 (]: Snowflake adapter: Snowflake query id: 01a5bc8d-0404-c9c1-0000-91b5178ac72a
16:45:31.548895 [debug] [Thread-9 (]: Snowflake adapter: Snowflake error: 040050 (22000): SQL compilation error: cannot change column LOAN_MGMT_SYS from type VARCHAR(7) to VARCHAR(3) because reducing the byte-length of a varchar is not supported.
Any help is greatly appreciated.
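For context, and as a general note rather than a confirmed diagnosis of this project: dbt's incremental materialization only emits ALTER ... SET DATA TYPE statements like the one in the log when it detects a schema change and the model's on_schema_change config tells it to sync those changes, so that config (and the dbt version) is worth comparing between the dbt Cloud and local VS Code environments. A minimal sketch of where that config lives; the model body, unique key, and upstream ref below are hypothetical:

-- models/core_data_warehouse/dim_cifs.sql -- illustrative sketch only.
-- on_schema_change='ignore' (the default) skips the schema-sync step;
-- 'sync_all_columns' is the setting under which dbt will try to alter
-- column types to match the new query output.
{{
    config(
        materialized='incremental',
        unique_key='cif_id',
        on_schema_change='ignore'
    )
}}

select
    -- an explicit cast keeps Snowflake from inferring a narrower VARCHAR
    -- length (e.g. VARCHAR(3)) from the current batch of data;
    -- stg_cifs and cif_id are made-up names
    cast(loan_mgmt_sys as varchar(50)) as loan_mgmt_sys
from {{ ref('stg_cifs') }}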

Loading CSV data containing string and numeric format to Ignite is failing

I am evaluating Ignite and trying to load CSV data to Apache Ignite. I have created a table in Ignite:
jdbc:ignite:thin://127.0.0.1/> create table if not exists SAMPLE_DATA_PK(SID varchar(30),id_status varchar(50), active varchar, count_opening int,count_updated int,ID_caller varchar(50),opened_time varchar(50),created_at varchar(50),type_contact varchar, location varchar,support_incharge varchar,pk varchar(10) primary key);
I tried to load data to this table with command:
copy from '/home/kkn/data/sample_data_pk.csv' into SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk) format csv;
But the data load is failing with this error:
Error: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer] (state=50000,code=1)
java.sql.SQLException: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer]
at org.apache.ignite.internal.jdbc.thin.JdbcThinConnection.sendRequest(JdbcThinConnection.java:1009)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.sendFile(JdbcThinStatement.java:336)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute0(JdbcThinStatement.java:243)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute(JdbcThinStatement.java:560)
at sqlline.Commands.executeSingleQuery(Commands.java:1054)
at sqlline.Commands.execute(Commands.java:1003)
at sqlline.Commands.sql(Commands.java:967)
at sqlline.SqlLine.dispatch(SqlLine.java:734)
at sqlline.SqlLine.begin(SqlLine.java:541)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)
Below is the sample data I am trying to load:
SID|ID_status|active|count_opening|count_updated|ID_caller|opened_time|created_at|type_contact|location|support_incharge|pk
INC0000045|New|true|1000|0|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||1
INC0000045|Resolved|true|0|3|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||2
INC0000045|Closed|false|0|1|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||3
INC0000047|Active|true|0|1|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||4
INC0000047|Active|true|0|2|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||5
INC0000047|Active|true|0|489|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||6
INC0000047|Active|true|0|5|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||7
INC0000047|AwaitingUserInfo|true|0|6|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||8
INC0000047|Closed|false|0|8|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||9
INC0000057|New|true|0|0|Caller4416|29-02-2016 06:10||Phone|Location204||10
I need help understanding how to figure out what the issue is and how to resolve it.
You have to upload the CSV without the header line, which contains the column names. The error is thrown when Ignite tries to convert the string value "count_opening" to an Integer.
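A minimal sketch of that fix, assuming the file path from the question and that a shell tool such as tail is available to strip the header before loading (the _noheader file name is made up):

-- Strip the header row first, e.g. from a shell:
--   tail -n +2 /home/kkn/data/sample_data_pk.csv > /home/kkn/data/sample_data_pk_noheader.csv
-- If the raw file is actually '|'-delimited, as the sample above suggests, it may also
-- need to be converted to plain comma-separated values before using FORMAT CSV.
copy from '/home/kkn/data/sample_data_pk_noheader.csv'
into SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,
                    opened_time,created_at,type_contact,location,support_incharge,pk)
format csv;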

What is the alternative for the double datatype from Spark SQL (Databricks) to SQL Server Data Warehouse?

I have to load data from Azure Data Lake into the data warehouse. I have created the setup for creating external tables. There is one column of double data type; I used the decimal type in SQL Server Data Warehouse when creating the external table, and the file format is Parquet. Using CSV it works.
I'm getting the following error:
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered
filling record reader buffer: ClassCastException: class
java.lang.Double cannot be cast to class parquet.io.api.Binary
(java.lang.Double is in module java.base of loader 'bootstrap';
parquet.io.api.Binary is in unnamed module of loader 'app'.
Can someone help me with this issue?
Thanks in advance.
CREATE EXTERNAL TABLE [dbo].[EXT_TEST1]
( A VARCHAR(10), B decimal(36,19) )
WITH (DATA_SOURCE = [Azure_Datalake], LOCATION = N'/A/B/PARQUET/*.parquet/', FILE_FORMAT = parquetfileformat, REJECT_TYPE = VALUE, REJECT_VALUE = 1)
Column data types in Databricks: A string, B double
Sample data:
A   | B
'a' | 100.0050
Use float(53), which is a double-precision type (53-bit mantissa, 8 bytes of storage) and therefore matches the Parquet/Databricks double type.
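A sketch of the external table from the question with that change applied (same object names as above; the WITH options are copied from the question):

-- B is declared as float(53) so it matches the double values written by Databricks.
CREATE EXTERNAL TABLE [dbo].[EXT_TEST1]
(
    A VARCHAR(10),
    B FLOAT(53)
)
WITH (
    DATA_SOURCE  = [Azure_Datalake],
    LOCATION     = N'/A/B/PARQUET/*.parquet/',
    FILE_FORMAT  = parquetfileformat,
    REJECT_TYPE  = VALUE,
    REJECT_VALUE = 1
);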

Dbeaver Connecting to Hive - SQLException: Method not supported

I'm getting this error when trying to run a select after connecting to Hive.
Is this a bad jar file?
org.jkiss.dbeaver.model.impl.jdbc.JDBCException: SQL Error: Method not supported
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:170)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:1)
at org.jkiss.dbeaver.model.DBUtils.createStatement(DBUtils.java:985)
at org.jkiss.dbeaver.model.DBUtils.prepareStatement(DBUtils.java:963)
at org.jkiss.dbeaver.runtime.sql.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:313)
at org.jkiss.dbeaver.runtime.sql.SQLQueryJob.extractData(SQLQueryJob.java:633)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsProvider.readData(SQLEditor.java:1169)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetDataPumpJob.run(ResultSetDataPumpJob.java:132)
at org.jkiss.dbeaver.runtime.AbstractJob.run(AbstractJob.java:91)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.sql.SQLException: Method not supported
at org.apache.hadoop.hive.jdbc.HiveConnection.createStatement(HiveConnection.java:229)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.createStatement(JDBCConnectionImpl.java:350)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:138)
... 9 more
There is a class in the Hive JDBC jar called org.apache.hive.jdbc.HiveResultSetMetaData. This class contains a method, isWritable, which is not supported by Hive yet. This is the reason why you get the error "Method not supported".
Take the source code of this class and update that method, then recompile the class and replace it in the jar. This worked for me.