SAP Vora dealing with decimal type

So I'm trying to create and load a Vora table from an ORC file created by the SAP BW archiving process on HDFS.
The Hive table automatically generated on top of that file by BW has, among other things, this column:
archreqtsn decimal(23,0)
An attempt to create a Vora table using that data type fails with the error "Unsupported type (DecimalType(23,0)) on column archreqtsn".
So, the biggest decimal supported seems to be decimal(18,0)?
The next thing I tried was to use either decimal(18,0) or string as the type for that column. But when attempting to load data from the file:
APPEND TABLE F002_5_F
OPTIONS (
files "/sap/bw/hb3/nldata/o_1ebic_1ef002__5/act/archpartid=p20170611052758000009000/000000_0",
format "orc" )
I'm getting another error:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5_F: [Vora [<REDACTED>.com.au:30932.1639407]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (decimal 128 unsupported (c++ exception)).
An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
What could be the workarounds for this issue of unsupported decimal types? In fact, I might not need that column in the Vora table at all, but I can't get rid of it in the ORC file.
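One possible workaround, sketched here as untested Hive SQL: rewrite the data into a new ORC table with the offending column cast to a type Vora supports (string, in this case), then create and load the Vora table from the new files instead. The staging table name, and the assumption that the BW-generated Hive table is named like the Vora table, are both hypothetical.
CREATE TABLE f002_5_f_stage STORED AS ORC AS
SELECT
  CAST(archreqtsn AS STRING) AS archreqtsn
  -- , <remaining columns unchanged>
FROM f002_5_f;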

Related

What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0

I'm attempting to upload a CSV file (the output of a BCP command) to BigQuery using the bq load CLI command. I have already uploaded a custom schema file (I was having major issues with autodetect).
One resource suggested this could be a data type mismatch. However, the table in the SQL DB lists the column as a decimal, so in my schema file I listed it as FLOAT, since decimal is not a supported data type.
I couldn't find any documentation on what the error means or how to resolve it.
What does this error mean? It means, in this context, that a value is REQUIRED for a given column index and none was found. (By the way, columns are usually 0-indexed, so a fault at column index 8 most likely refers to column number 9.)
This can be caused by a myriad of different issues, of which I experienced two.
Incorrectly categorizing NULL columns as NOT NULL. After exporting the schema as JSON from SSMS, I needed to clean it up for BQ, and in doing so I mapped IS_NULLABLE: NO to MODE: NULLABLE and IS_NULLABLE: YES to MODE: REQUIRED. These values should have been reversed. This caused the error because there were NULL values in columns where BQ expected a REQUIRED value (see the DDL sketch after this list).
Using the wrong delimiter. The file I was outputting was not only comma-delimited but also tab-delimited. I was only able to spot this by using the Get Data tool in Excel and importing the data that way, after which I saw the tabs inside the cells.
After outputting with a pipe ( | ) delimiter, I was finally able to load the file into BigQuery without any errors.
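For illustration, here is the corrected mapping expressed as BigQuery DDL (a minimal sketch; the dataset, table, and column names are hypothetical). In DDL, NOT NULL corresponds to MODE: REQUIRED, while a plain column is MODE: NULLABLE:
-- IS_NULLABLE: NO  -> REQUIRED (NOT NULL in DDL)
-- IS_NULLABLE: YES -> NULLABLE (the default)
CREATE TABLE mydataset.mytable (
  id    INT64 NOT NULL,  -- source column had IS_NULLABLE: NO
  notes STRING           -- source column had IS_NULLABLE: YES
);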

BigQuery Scheduled Data Transfer throws "Incompatible table partitioning specification" error, but the error message is truncated

I'm using the new BQ Data Transfer UI and upon scheduling a Data Transfer, the transfer fails.
The error message in Run History isn't terribly helpful as the error message seems truncated.
Incompatible table partitioning specification. Expects partitioning specification interval(type:hour), but input partitioning specification is ; JobID: xxxxxxxxxxxx
Note the part of the error that says "...but input partitioning specification is..." with nothing before the semicolon. The error seems to be truncated.
Some details about the run:
The run imports data from a CSV file located in a GCS bucket on a nightly basis. Once the file is successfully ingested, the process deletes it. The target table in BQ is a partitioned table using the default partition pseudo-column (_PARTITIONTIME).
What I have done so far:
Reran the scheduled Data Transfer, which failed and threw the same error.
Deleted the target table in BQ and recreated it with different partition specifications (day, hour, month), then reran the scheduled transfer; it failed and threw the same error.
Imported the data manually (I downloaded the file from GCS and uploaded it locally from my machine) using the BQ UI (Create Table, append to the specific table). This worked perfectly.
Checked to see if this was a known issue here on Stack Overflow and only found this question (now closed), which is close but not exactly the same issue: BigQuery Data Transfer Service with BigQuery partitioned table.
What I'm holding off on doing, since it would take a bit more work:
Change the schema of the target BQ table to include a column designated for partitioning (see the DDL sketch after this list).
Include a system-generated timestamp in the original file inside GCS and ensure the process recognizes this as the partitioning field.
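For reference, the first of these options would look roughly like the following sketch (hypothetical names), partitioning on an explicit timestamp column instead of the _PARTITIONTIME pseudo-column:
-- Hypothetical sketch: a column-partitioned target table. The event_ts
-- column would also need to be present in the ingested CSV.
CREATE TABLE mydataset.mytable_colpart (
  event_ts TIMESTAMP,
  payload  STRING
)
PARTITION BY DATE(event_ts);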
Am I doing something wrong? Or is this a known issue?
Alright, I believe I have solved this. It looks like you need to include runtime parameters in your destination table name if the destination table is partitioned.
https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters
Specifically this section called "Runtime Parameter Examples" here: https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters#loading_a_snapshot_of_all_data_into_an_ingestion-time_partitioned_table
They also advise that minutes cannot be specified in these parameters.
You will need to append the parameters to your destination table details as shown below:
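For example, as a hedged sketch based on the linked documentation (the dataset and table names are hypothetical): create the target as an ingestion-time partitioned table, then set the transfer's destination table field to mytable${run_date} so each nightly run lands in that day's partition.
-- Hypothetical ingestion-time (daily) partitioned target table. In the
-- transfer configuration, the destination table is set to mytable${run_date}.
CREATE TABLE mydataset.mytable (
  col1 STRING,
  col2 INT64
)
PARTITION BY _PARTITIONDATE;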

Migrating data from Hive PARQUET table to BigQuery: Hive STRING data type is getting converted to BYTES datatype in BQ

I am trying to migrate data from Hive to BigQuery. The data in the Hive table is stored in the PARQUET file format. The data type of one column is STRING. I am uploading the files behind the Hive table to Google Cloud Storage and creating a BigQuery internal table from them with the GUI. The data type of the column in the imported table gets converted to BYTES.
But when I imported CHAR or VARCHAR data types, the resultant data type was STRING.
Could someone please help me understand why this is happening?
This does not answer the original question, as I do not know exactly what happened, but I have had experience with similar odd behavior.
I was facing a similar issue when trying to move a table between Cloudera and BigQuery.
First, I created the table as external in Impala like this:
CREATE EXTERNAL TABLE test1
STORED AS PARQUET
LOCATION 's3a://table_migration/test1'
AS SELECT * FROM original_table;
original_table has columns with the STRING data type.
Then I transferred that to GCS and imported it into BigQuery from the console GUI. There are not many options: just select the Parquet format and point to GCS.
And to my surprise, the columns were now of type BYTES. The column names were preserved fine, but the content was scrambled.
Trying different codecs, and pre-creating the table and inserting into it, still in Impala, led to no change.
Finally, I tried to do the same in Hive, and that helped.
So I ended up creating an external table in Hive like this:
CREATE EXTERNAL TABLE test2 (col1 STRING, col2 STRING)
STORED AS PARQUET
LOCATION 's3a://table_migration/test2';
insert into table test2 select * from original_table;
And I repeated the same dance of copying from S3 to GCS and importing into BQ, this time without any issue. The columns are now recognized in BQ as STRING and the data is as it should be.
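If your Impala version supports it, another avenue worth trying (an assumption on my part, not something tested here) is the PARQUET_ANNOTATE_STRINGS_UTF8 query option, which makes Impala tag STRING columns as UTF-8 in the Parquet metadata, so downstream readers should treat them as strings rather than raw byte arrays:
-- Assumes Impala 2.6+; test3 is a hypothetical table/location.
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;
CREATE EXTERNAL TABLE test3
STORED AS PARQUET
LOCATION 's3a://table_migration/test3'
AS SELECT * FROM original_table;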

Informatica BDM string data type

I am running my mapping on the Blaze engine and my target Hive table is transactional.
I have a field coming from the source with data type varchar(32000), but when I run the mapping, it fails with the following error:
"The Integration Service failed to execute grid mapping with following error [An internal exception occurred with message: The length of the data is larger than the precision of the column.]."
Any insights will be very helpful.
Note: 1. My target is a Hive table and its transactional properties are set to true.
2. I am running this mapping on the Informatica Blaze engine, which requires an Update Strategy to be used.
3. The target column field length is also varchar(32000).
Check that the length of the source data you are ingesting is not bigger than 32,000 characters.
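A quick way to verify this on the source side (a sketch; source_table and col_name are placeholders for your actual source objects):
-- Find the longest value feeding the varchar(32000) port.
-- (LENGTH may be LEN on SQL Server sources.)
SELECT MAX(LENGTH(col_name)) AS max_len
FROM source_table;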
If you are using a flat file as the source, check the delimiters and make sure the file is being read correctly.

SSIS package ADO.NET destination Data Conversion warning

I'm working on an SSIS package and I'm taking values from an .xml file to an ADO.NET destination,
but when I insert values into the table I get the following error:
potential data loss may occur due to inserting data from input column "Copy of swaps_Id" with data type "DT_I8" to external column "swaps_id" with data type "DT_I4". If this is intended, an alternative way to do conversion is using a Data Conversion component before ADO NET destination component
I have used the Data Conversion transformation editor, but I am still getting the above error.
What should be corrected?
This warning means that the data in Copy of swaps_Id is a 64-bit integer and you are trying to insert it into a 32-bit integer column in the destination table. What you should do depends on your data.
If you are sure that the data in your column is within the 32-bit signed integer range (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)), you can leave it as is (data truncation will never occur, but the warning will stay), do a data conversion, or change the Copy of swaps_Id column's data type.
If not, you should change the column data type in your destination table to a 64-bit integer (bigint in SQL Server).
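As a sketch of that last option (the table and schema names here are hypothetical, inferred from the column named in the warning):
-- Widen the destination column so DT_I8 (64-bit) values fit (SQL Server).
ALTER TABLE dbo.swaps ALTER COLUMN swaps_id bigint;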
This simply means that your source data type is larger than what your destination can handle. On the Data Conversion transformation, you may want to convert the column down to DT_I4 (the destination's type) in order for that warning to disappear, provided the values fit in that range.