Converting Protobuf Datatypes map<string, google.protobuf.Any> to BigQuery Datatypes - google-bigquery

I'm wondering whether we can convert the Protobuf map datatype to a BigQuery schema datatype. I've tried searching for a relevant issue and found this one [1]; however, it seems I still haven't found the answer.
[1] Bigquery importing map's json data into a table
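For context, a sketch of one common modelling rather than a confirmed answer: BigQuery has no native MAP type, and a Protobuf map field is encoded as a repeated key/value entry message, so it is often represented as a repeated record; a google.protobuf.Any value is itself a message carrying a type_url and serialized payload bytes. A minimal BigQuery DDL sketch, with hypothetical table and field names:
-- attributes holds the map entries as repeated key/value records;
-- the Any value keeps its type_url and raw serialized bytes
CREATE TABLE mydataset.proto_data (
  attributes ARRAY<STRUCT<key STRING, value STRUCT<type_url STRING, value BYTES>>>
);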

Related

Unknown PostGIS GPS data format

A third-party program stores tracking data in the database, but I do not understand the format. I know that PostGIS is used there and that this column should contain GPS location(s) and maybe additional data.
Example (db dump as csv):
"Location","DateTime"
"010100000023E37C4023E33C40417F41EF407F4740","2020-05-24 15:33:53+00"
How can I decode the Location column data?
This is the Well-Known Binary (WKB) format.
See PostGIS methods for WKB: ST_AsBinary, ST_GeomFromWKB.
WKT methods: ST_AsText, ST_GeomFromText.
The example in WKT format: POINT(28.887256651392033 46.99416914651966).
For .NET you can use Geo or NetTopologySuite.IO.TinyWKB.
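A minimal PostGIS sketch using the Location value from the dump (the hex text is decoded to bytea before being passed to ST_GeomFromWKB):
-- Decode the hex-encoded WKB and render it as WKT
SELECT ST_AsText(
         ST_GeomFromWKB(decode('010100000023E37C4023E33C40417F41EF407F4740', 'hex'))
       ) AS location_wkt;
-- location_wkt: POINT(28.887256651392033 46.99416914651966)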

Issue while writing into Date datatype in BigQuery using Spark Java

I am trying to store a date datatype column in BigQuery via Spark:
cast(from_unixtime(eventtime*60) as date) as createdDate
I have tried to_date as well, like below, but no luck:
to_date(from_unixtime(eventtime*60)) as createdDate
Now, when I try to save this dataset using the Spark-BigQuery connector, it gives me an error saying that field createdDate has changed type from DATE to INTEGER. But when I print the schema in Spark, it correctly says the column's data type is date:
|-- createdDate: date (nullable = false)
Not sure why it's failing while loading the data into BigQuery.
The same thing works if I change the data type from Date to Timestamp. Please advise.
The resolution is to set intermediateFormat to orc. With the intermediate format set to Avro it does not work, and we can't use the default Parquet format because we have an array data type in our table, which BigQuery handles with an intermediate representation, as explained here:
Save Array<T> in BigQuery using Java

Schema Evolution Comparison Apache Avro Vs Apache Parquet

I would like to cross-check my understanding of the differences between file formats like Apache Avro and Apache Parquet in terms of schema evolution. Reading various blogs and SO answers gave me the following understanding. I need to verify whether it is correct, and I would also like to know if I am missing any other differences with respect to schema evolution. The explanation is given in terms of using these file formats in Apache Hive.
Adding a column: Adding a column (with a default value) at the end of the column list is supported in both file formats. I think adding a column (with a default value) in the middle of the column list can be supported in Parquet if the Hive table property "hive.parquet.use-column-names=true" is set. Is this not the case?
Deleting a column: As far as deleting a column at the end of the column list is concerned, I think it is supported in both file formats: even if a Parquet/Avro file still contains the deleted column, the reader schema (the Hive schema) no longer has it, so the extra column in the writer's schema (the actual Avro or Parquet file schema) will simply be ignored in both formats. Deleting a column in the middle of the column list can also be supported if "hive.parquet.use-column-names=true" is set. Is my understanding correct?
Renaming a column: Since Avro has a "column alias" option, renaming a column is supported in Avro, but it is not possible in Parquet because there is no such column aliasing option there. Am I right?
Data type change: This is supported in Avro because we can define multiple datatypes for a single column using a union type, but it is not possible in Parquet because there is no union type in Parquet.
Am I missing any other possibility? I appreciate the help.
hive.parquet.use-column-names=true needs to be set to access Parquet columns by name; it is not only for column addition/deletion. Manipulating columns by index would be cumbersome to the point of being infeasible (see the sketch after this answer).
There is a workaround for column renaming as well. Refer to https://stackoverflow.com/a/57176892/14084789
Union is a challenge with Parquet.
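A minimal Hive sketch of the name-based setting mentioned above, assuming the property can be applied at the session level as the answer implies (the table name is hypothetical):
-- Resolve Parquet columns by name instead of by position
SET hive.parquet.use-column-names=true;
SELECT * FROM my_parquet_table;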

How to upload a dataset into Google Cloud Platform? How to deal with datatypes?

While uploading my dataset from Google Drive to Google Cloud Platform, I failed to edit the schema. Every time I uploaded the dataset, I was asked to edit the schema. For example, the column yearinjob was of type float. But while executing the query SELECT * FROM ...
it always says Error while reading the table: XXX.demo1.wkfc3, error message: Could not convert the value to float. Row 1888; Col 19.
I changed the data type in the schema to integer and to numeric, but nothing works except string.
Can anyone help me with it?
Make sure that the data in the dataset you are uploading is aligned with your specified schema.
When you use Schema auto-detection, BigQuery starts the inference process by selecting a random file in the data source and scanning up to 100 rows of data to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in the sample.
Check that the value at Row 1888, Col 19 matches the assigned type for that field; that might be the cause of the error “Could not convert the value to float” (see the query sketch after this answer).
This documentation may be helpful.
This is the documentation for Data Types and allowed values.
And this is about managing datasets.
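A query sketch for that check, assuming the column is first loaded as STRING (the table and column names are taken from the question):
-- Find the values BigQuery cannot convert to FLOAT64
SELECT yearinjob
FROM `XXX.demo1.wkfc3`
WHERE yearinjob IS NOT NULL
  AND SAFE_CAST(yearinjob AS FLOAT64) IS NULL;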

BigQuery can't import the data from Dataprep

I have a table created in BigQuery, partitioned by date, and the column has the Date type. Dataprep also has the same column with the same data type. When I try to load the data from Dataprep into the BigQuery table, I get an error like "The column datatypes in the dataset must match the destination column datatypes". A screenshot is attached; please go through it and give me a solution.
As the screenshot shows, one column is a TIME, DATETIME, or TIMESTAMP, while the other is a STRING, as indicated by the icons in front of your columns.
You need to make sure that you've chosen the right data type in the dataset. Dataprep may sometimes infer your data type incorrectly.
In this thread it is mentioned that you need to convert both types to TIMESTAMP in order to make this work. In my case this did the trick, but it is kind of cumbersome. Hopefully they will support this for plain DATE columns soon.