Load avro decimal data into BigQuery? - hive

I have an Avro Hive table in which some columns are decimal. I know how Avro stores decimal data: as bytes with a decimal logical type. But when I load this data into BigQuery, BigQuery is not able to parse the decimal columns and treats them as garbage values. I don't know how to load this decimal data into BigQuery. Any help would be appreciated.

With the AVRO_DECIMAL type being a relatively recent addition, BigQuery doesn't support it yet. Here's the issue tracker link:
https://issuetracker.google.com/issues/65641870
As a temporary workaround you can convert your decimal data to a floating point representation so it can be loaded into BigQuery.
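For illustration, here is a minimal sketch of that workaround using fastavro, assuming a source file data.avro with a bytes/decimal field named price (the file name, field name, and the reduced output schema are placeholders for your actual data):

    from fastavro import reader, writer

    # fastavro decodes the Avro decimal logical type into Python Decimal objects.
    with open("data.avro", "rb") as src:
        records = list(reader(src))

    # Write the value back out as a plain double, which BigQuery can load.
    out_schema = {
        "type": "record",
        "name": "converted",
        "fields": [
            {"name": "price", "type": "double"},
            # ... add your remaining fields here ...
        ],
    }
    converted = [{"price": float(r["price"])} for r in records]

    with open("data_float.avro", "wb") as dst:
        writer(dst, out_schema, converted)

Note that float/double is lossy for values that need exact decimal semantics; if exactness matters, loading the values as strings is another option.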

Related

What BigQuery data type do I use to support many decimal digits

We are importing data into BigQuery from a mainframe system, and some of the monetary values look like this: "54091.4369372923497267759562841530054645". If I cast it to NUMERIC, which is recommended for financial data, I only retain 9 decimal digits. If I cast it to FLOAT64, I get 12 decimal digits.
Is there any way for me to retain all the original information without losing precision?
I get that FLOAT64 is not recommended for financial calculations, but I would still expect to retain more decimals.
Moving the comment to an answer for completeness:
The NUMERIC type in BigQuery won't be able to hold a number with this many decimal digits (it keeps at most 9 digits after the decimal point).
In the meantime (to prevent data loss), just store these numbers as a STRING.
If this is an important feature for you, use the BigQuery public issue tracker to file and follow a feature request.
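To make the precision loss concrete, here is a small, self-contained illustration in Python (no BigQuery involved) of what rounding to NUMERIC's 9 fractional digits does to the example value, versus keeping it as a string:

    from decimal import Decimal, ROUND_HALF_EVEN

    raw = "54091.4369372923497267759562841530054645"

    # BigQuery NUMERIC keeps at most 9 digits after the decimal point,
    # so the stored value would be rounded to:
    as_numeric = Decimal(raw).quantize(Decimal("1.000000000"), rounding=ROUND_HALF_EVEN)
    print(as_numeric)                    # 54091.436937292

    # Stored as a STRING, the value round-trips without loss:
    assert Decimal(raw) == Decimal(str(raw))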

How many bytes does a BigQuery Date require

I've got a timestamp column in BigQuery, and now I realize I could have used a date data type to represent this column instead (I don't need fine time granularity). My table is large and expensive to query so I'm wondering whether I'll save money by converting it to a new column of type DATE instead.
However, the official BigQuery documentation on data types doesn't seem to indicate how many bytes a date object requires. Does anyone here know?
DATE and TIMESTAMP both require 8 bytes.
You can see more details in the Data size calculation section of the BigQuery documentation.
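As a quick back-of-the-envelope check (the row count below is only an illustration), the conversion would not change the bytes scanned for that column:

    # Both DATE and TIMESTAMP are counted as 8 bytes per value, so a column of
    # either type over, say, 1 billion rows contributes the same scanned bytes.
    rows = 1_000_000_000
    bytes_per_value = 8                      # DATE and TIMESTAMP alike

    scanned_gib = rows * bytes_per_value / 1024**3
    print(f"{scanned_gib:.1f} GiB scanned either way")   # ~7.5 GiB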

Editing Parquet Files as Binary

Assuming Parquet files on AWS S3 (used for querying by AWS Athena).
I need to anonymize a record with a specific numeric field by changing the numeric value (changing one digit is enough).
Can I scan a Parquet file as binary and find a numeric value, or will the compression make it impossible to find such a value?
Assuming I can do #1, can I anonymize the record by changing a digit of this number at the binary level without corrupting the Parquet file?
10X
No, this will not be possible. Parquet has two layers in its format that make this impossible: encoding and compression. Both reorder the data to fit into less space; the difference between them is CPU usage and how general-purpose they are. Sometimes data can be compressed so that less than a byte per value is needed, for example when all values are the same or very similar. Changing a single value would then require more space, which in turn makes an in-place binary edit impossible.
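What is possible is to decode, modify, and re-encode the file. Here is a minimal sketch with pyarrow, assuming a local copy of the file (or one synced from S3) with a numeric column named account_id; the path, column name, and the "zero the last digit" rule are all placeholders:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Read the whole file into memory; for very large files, process row groups instead.
    table = pq.read_table("records.parquet")

    # Anonymize the numeric column, e.g. by zeroing the last digit.
    values = table.column("account_id").to_pylist()
    anonymized = [(v // 10) * 10 if v is not None else None for v in values]

    # Replace the column and write a new, valid Parquet file.
    idx = table.schema.get_field_index("account_id")
    table = table.set_column(idx, "account_id", pa.array(anonymized))
    pq.write_table(table, "records_anonymized.parquet")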

what is the corresponding sql data type for abap DF16_DEC data type?

I am trying to map ABAP data types to SQL data types, but I don't know which SQL data types to map them to. I am trying to map the following data types:
DF16_DEC: 8 byte Decimal floating point number stored in BCD format.
DF16_RAW: 8 byte Decimal floating point number stored in binary format.
DF34_DEC: 16 byte Decimal floating point number stored in BCD format.
DF34_RAW: 16 byte Decimal floating point number stored in binary format.
Can anyone tell me to which SQL data type should I map these types?
That is answered rather extensively in the online documentation. Be aware that your question does not make much sense within the ABAP environment, because DFnn_[DEC|RAW] are already dictionary types that can be mapped to the runtime types decfloatnn. Generally speaking, DFnn_DEC is mapped to a DEC type and DFnn_RAW is mapped to a RAW type. Mapping these types to the underlying database types might depend on the DBMS product used, though.
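For context on the precision involved (independent of the eventual SQL mapping): DF16 and DF34 carry 16 and 34 significant decimal digits respectively, like IEEE 754 decimal64 and decimal128. You can emulate that in Python's decimal contexts; the sample value below is purely illustrative:

    from decimal import Decimal, Context

    df16 = Context(prec=16)   # DF16_*: 16 significant decimal digits
    df34 = Context(prec=34)   # DF34_*: 34 significant decimal digits

    value = Decimal("1234567890.123456789012345678901234567890")
    print(df16.plus(value))   # 1234567890.123457
    print(df34.plus(value))   # 1234567890.123456789012345678901235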

fast parse property of flat file source

I am using the flat file source for a large data migration, and the source data is in text stream form rather than integer, datetime, or string form. The component does not support fast parse for text streams.
Could I get any ideas to improve performance in this scenario?
thanks
prav
As you've seen, fast parse does not support strings. It only supports integers, date, and time, and even then with caveats.
The first thing I would do is ensure that you're using the smallest data types you can in your flow definition (WSTR rather than NTEXT, for example, if your strings are under 4000 characters).
This problem was solved by using DT_STR instead of DT_TEXT and changing my DB design, which fixed the performance issue. I now get 1 million rows transferred in 13 seconds, which is what my business logic requires.
thanks
prav