Data is converted to binary format while loading data into MonetDB using Apache Pig - apache-pig

I am using the MonetDB-Pig layer to load CSV data into MonetDB. Internally it uses binary bulk-load commands to load the data, but after the data is loaded into the table, the CSV file values do not match the MonetDB table values (int, double). The data seems to have been converted into a binary format.
How can we get back the actual values in MonetDB?
Table structure that I am using:
CREATE TABLE "test" (
"s_suppkey" INT,
"s_name" CLOB,
"s_address" CLOB,
"s_nationkey" INT,
"s_phone" CLOB,
"s_acctbal" DOUBLE,
"s_comment" CLOB
);
Load command that I am using:
COPY BINARY INTO "test" FROM (
'$PATH/part-1/col-0.bulkload',
'$PATH/part-1/col-1.bulkload',
'$PATH/part-1/col-2.bulkload',
'$PATH/part-1/col-3.bulkload',
'$PATH/part-1/col-4.bulkload',
'$PATH/part-1/col-5.bulkload',
'$PATH/part-1/col-6.bulkload'
);

Please convert the byte buffer from big-endian to little-endian, and check.
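A minimal sketch of that conversion, assuming the column files contain 8-byte doubles written big-endian by Java code (Java writes big-endian by default) and that, as this answer suggests, MonetDB expects little-endian values on this machine. The file name is the s_acctbal column from the COPY command above; the path is illustrative only.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SwapDoubleColumn {
    public static void main(String[] args) throws IOException {
        // Re-write the s_acctbal column (8-byte doubles) from big-endian to little-endian.
        byte[] raw = Files.readAllBytes(Paths.get("part-1/col-5.bulkload"));
        ByteBuffer in = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(raw.length).order(ByteOrder.LITTLE_ENDIAN);
        while (in.hasRemaining()) {
            out.putDouble(in.getDouble());   // read big-endian, write little-endian
        }
        Files.write(Paths.get("part-1/col-5.le.bulkload"), out.array());
    }
}

The same swap would apply to the INT columns, using getInt/putInt instead of getDouble/putDouble.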

The information provided is insufficient to isolate the issue. The most probable cause is a misalignment of the number of values in the binary column files.
Check the size of the elements in the 's_acctbal' input file, to see whether it produced 4-byte floats instead of 8-byte double binary values (see the check below).
By the way, the MonetDB-Pig project is not actively maintained, but we welcome patches.
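A quick way to run that check, assuming you know how many rows the CSV has; the path is the one used in the COPY command above:

import java.io.File;

public class CheckColumnWidth {
    public static void main(String[] args) {
        // A DOUBLE column file must be exactly rowCount * 8 bytes long;
        // rowCount * 4 bytes would mean 4-byte floats were written instead.
        long bytes = new File("part-1/col-5.bulkload").length();
        System.out.println("rows if 8-byte doubles: " + bytes / 8);
        System.out.println("rows if 4-byte floats : " + bytes / 4);
    }
}

Whichever division matches the actual CSV row count tells you which width was written; every column file should also imply the same row count.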

Related

Migrating data from a Hive PARQUET table to BigQuery, Hive STRING data type is getting converted to the BYTES data type in BQ

I am trying to migrate data from Hive to BigQuery. The data in the Hive table is stored in the PARQUET file format. The data type of one column is STRING. I am uploading the files behind the Hive table to Google Cloud Storage and from them creating a BigQuery internal table with the GUI. The data type of that column in the imported table is converted to BYTES.
However, when I imported CHAR or VARCHAR data types, the resulting data type was STRING.
Could someone please help me understand why this is happening?
This does not answer the original question, as I do not know exactly what happened, but I have had experience with similar odd behavior.
I was facing a similar issue when trying to move a table between Cloudera and BigQuery.
I first created the table as an external table in Impala, like this:
CREATE EXTERNAL TABLE test1
STORED AS PARQUET
LOCATION 's3a://table_migration/test1'
AS select * from original_table
original_table has columns with the STRING data type.
Then I transferred that to GS and imported it into BigQuery from the console GUI; there are not many options, you just select the Parquet format and point to GS.
To my surprise, the columns were now of type BYTES; the column names were preserved fine, but the content was scrambled.
Trying different codecs, and pre-creating the table and inserting into it, still in Impala, led to no change.
Finally I tried to do the same in Hive, and that helped.
So I ended up creating an external table in Hive, like this:
CREATE EXTERNAL TABLE test2 (col1 STRING, col2 STRING)
STORED AS PARQUET
LOCATION 's3a://table_migration/test2';
insert into table test2 select * from original_table;
Then I repeated the same dance of copying from S3 to GS and importing into BQ - this time without any issue. The columns are now recognized in BQ as STRING and the data is as it should be.

How to solve "Data type BLOB can not be converted to VARCHAR2"

I have created a report with a form in Oracle APEX 5.1, in which I have a BLOB column called 'LIEN'. When I insert data into the table and run the application I get this error:
Data type BLOB can not be converted to VARCHAR2!
How can this be solved?
BLOB is used for binary data, like images or other binary files.
For long textual fields, a CLOB or NCLOB should be used.
To show a BLOB as a string, a textual representation of the binary data such as hex or Base64 should be used.
Oracle has several stored procedures and functions for this purpose, such as rawtohex(COLUMN), utl_raw.cast_to_varchar2(utl_encode.base64_encode(COLUMN)), and some others.
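As an illustration of the same Base64 idea done on the application side rather than in the APEX report, here is a minimal JDBC sketch that reads the BLOB and encodes it as text; the connection details, table name, and ID column are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Base64;

public class BlobToText {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                "select LIEN from MY_TABLE where ID = ?")) {
            ps.setInt(1, 1);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    byte[] bytes = rs.getBytes(1);   // BLOB content as raw bytes
                    // Base64 turns the binary content into a displayable string.
                    System.out.println(Base64.getEncoder().encodeToString(bytes));
                }
            }
        }
    }
}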

How to Insert BLOB Values

I have the following table, FILES:
create table files(
id number,
file_name varchar2(25),
file_data blob);
I would like to be able to store the data of binary files located on my computer in this table. However, when I convert a file on my computer to hex, the string is too long to be inserted, as Oracle will not work with string literals longer than 4,000 characters. How can I insert a record into this table?
Usually what you do is:
You create an empty "Blob" object in your application.
You insert the empty Blob into the database as one of the columns of the row.
Then, in the same transaction, you retrieve an "output stream" from the Blob object you just inserted.
You send data to the output stream until all bytes are sent.
You close the output stream.
You commit the transaction.
It's a really bad practice to load entire files into memory and then insert them into the database. Use streaming instead (see the sketch below).
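A minimal JDBC sketch of those steps, assuming an open Connection conn with auto-commit disabled and an example file path; depending on the driver, the write into the Blob can also happen before the INSERT.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class InsertFileAsBlob {
    public static void insertFile(Connection conn, int id, String path) throws Exception {
        Blob blob = conn.createBlob();                       // empty Blob object
        try (OutputStream out = blob.setBinaryStream(1);     // output stream into the Blob
             InputStream in = new FileInputStream(path)) {
            in.transferTo(out);                              // stream the file, never fully in memory
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "insert into files (id, file_name, file_data) values (?, ?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, new File(path).getName());
            ps.setBlob(3, blob);
            ps.executeUpdate();
        }
        conn.commit();                                       // commit the transaction
        blob.free();
    }
}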

SAP Vora dealing with decimal type

So I'm trying to create and load a Vora table from an ORC file created by the SAP BW archiving process on HDFS.
The Hive table automatically generated on top of that file by BW has, among other things, this column:
archreqtsn decimal(23,0)
An attempt to create a Vora table using that data type fails with the error "Unsupported type (DecimalType(23,0)}) on column archreqtsn".
So the biggest decimal supported seems to be decimal(18,0)?
The next thing I tried was to use either decimal(18,0) or string as the type for that column. But when attempting to load data from the file:
APPEND TABLE F002_5_F
OPTIONS (
files "/sap/bw/hb3/nldata/o_1ebic_1ef002__5/act/archpartid=p20170611052758000009000/000000_0",
format "orc" )
I'm getting another error:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5_F: [Vora [<REDACTED>.com.au:30932.1639407]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (decimal 128 unsupported (c++ exception)).
An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
What could be the workarounds for this issue of unsupported decimal types? In fact, I might not need that column in the Vora table at all, but I can't get rid of it in the ORC file.

Convert from string to int in SSIS

I'm converting a database from one structure to a new structure. The old database is FoxPro and the new one is SQL Server. The problem is that some of the data is saved as char data in FoxPro but actually holds foreign keys to other tables, which means it needs to be of type int in SQL. When I try to do a data conversion in SSIS from any of the character-related types to an integer, I get something along the lines of the following error message:
There was an error with the output column "columnName" (24) on output "OLE DB Source Output" (22). The column status returned was: "The value could not be converted because of potential loss of data".
How do I convert from a string or character to an int without getting the potential-loss-of-data error? I hand-checked the values and it looks like all of them are small enough to fit into an int data type.
Data source -> Data Conversion Task.
In the Data Conversion Task, click Configure Error Output.
For Error and Truncation, change it from Fail Component to Redirect Row.
Now you have two paths. Good data will flow out of the DCT with the proper types. The bad data will go down the red path. Do something with it: dump it to a file, add a data viewer and inspect it, etc.
Values like 34563927342 exceed the maximum size for a 32-bit integer. You should use Int64 / bigint (see the note below).
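A quick illustration of why that example value overflows a 32-bit int but fits in a 64-bit type:

public class IntRangeCheck {
    public static void main(String[] args) {
        long value = 34563927342L;                        // the example value from above
        System.out.println(Integer.MAX_VALUE);            // 2147483647, the 32-bit int ceiling
        System.out.println(value > Integer.MAX_VALUE);    // true, so it cannot be a 32-bit int
        System.out.println(Long.MAX_VALUE);               // 9223372036854775807, plenty of room for bigint
    }
}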