Google Dataflow: how to insert a non-repeated RECORD field into BigQuery?

I'm new to Dataflow. I've got a predefined schema containing a non-repeated RECORD field called "device":
device.configId: STRING
device.version: STRING
Using a ParDo transform, I tried inserting a TableRow with this kind of field, as follows:
TableRow row = new TableRow();
row.put("field1", "val1");
TableRow device = new TableRow();
device.put("configId", "conf1");
device.put("version", "1.2.3");
row.put("device", device);
out.output(row);
I logged the table row; it looks like this:
{field1=val1, device={configId=conf1, version=1.2.3}}
I output it to the standard transform BigQueryIO.write(), but the latter raises an error:
java.lang.RuntimeException: java.io.IOException:
Insert failed: [{"errors":[{
"debugInfo":"",
"location":"device.configid",
"message":"This field is not a record.",
"reason":"invalid"
}],"index":0}]
Not sure why, but note that the location spells "configid" in lowercase, not in camel case as in the original log.
Any ideas on how to insert such an object to BigQuery?

Found out the problem. Apparently, this error message appeared only when the "configId" field was set to null rather than "conf1". To be exact, it was implicitly set to JSONObject.NULL, coming from some input object.
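In case it helps anyone else, a minimal guard along these lines avoids the problem (a sketch; input is assumed to be an org.json.JSONObject coming from upstream):
import org.json.JSONObject;
import com.google.api.services.bigquery.model.TableRow;

// Only set the nested field when the source value is real,
// skipping both Java null and org.json's JSONObject.NULL sentinel.
Object configId = input.opt("configId"); // may be JSONObject.NULL
TableRow device = new TableRow();
if (configId != null && !JSONObject.NULL.equals(configId)) {
    device.put("configId", configId.toString());
}
if (!device.isEmpty()) {
    row.put("device", device);
}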

Related

How to append data in BigQuery when a column's mode is REQUIRED, in Python? The error I get is: ABC_Col changed mode from REQUIRED to NULLABLE

value = self.bq_client.import_data_from_dataframe(
    table=table_id,
    dataframe=dataframe,
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=operation_type,
    allow_quoted_newlines=True)
The code I am using right now is shown above, but it throws the exception I mentioned when I try to append data into my BQ table. There are no nulls in the data to be appended.

Google Cloud Dataflow: getting the below error at runtime

I am writing data into a nested-array BQ table (the array field inside the table is merchant_array) using my Dataflow template.
Sometimes it runs fine and loads the data, but sometimes it gives me this error at runtime:
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.fasterxml.jackson.databind.JsonMappingException: Null key for a Map not allowed in JSON (use a converting NullKeySerializer?) (through reference chain: com.google.api.services.bigquery.model.TableRow["null"])
"message" : "Error while reading data, error message: JSON parsing error in row starting at position 223615: Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Does anyone have any idea why I am getting this error?
Thanks in advance.
I found the issue that was causing the error, so I am posting the answer to my own question; it might be helpful for someone else.
The error was:
Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Here, merchant_array is defined as an array that contains repeated RECORD data.
As per the Google docs:
ARRAYs cannot be NULL.
NULL ARRAY elements cannot persist to a table.
At the same time, I was using an ArrayList in my code, which allows null values. So before building the RECORD-type data or setting the data in the ArrayList, just remove any null TableRows if they exist, as sketched below.
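A minimal sketch of that guard (names assumed to match the question; the key line is the removeIf):
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import com.google.api.services.bigquery.model.TableRow;

List<TableRow> merchants = new ArrayList<>();
// ... populated from upstream data, possibly with null entries ...

// BigQuery rejects NULL array elements, so drop them before setting
// the repeated RECORD field on the output row.
merchants.removeIf(Objects::isNull);
row.set("merchant_array", merchants);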
Hope this helps.

Write to a dynamic BigQuery table through Apache Beam

I am getting the BigQuery table name at runtime and I pass that name to the BigQueryIO.write operation at the end of my pipeline to write to that table.
The code that I've written for it is:
rows.apply("write to BigQuery", BigQueryIO
        .writeTableRows()
        .withSchema(schema)
        .to("projectID:DatasetID." + tablename)
        .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));
With this syntax I always get an error:
Exception in thread "main" java.lang.IllegalArgumentException: Table reference is not in [project_id]:[dataset_id].[table_id] format
How do I pass the table name in the correct format when I don't know beforehand which table it should put the data in? Any suggestions?
Thank You
Very late to the party on this, however: I suspect the issue is that you were passing in a string, not a table reference.
If you create a TableReference, I suspect you'd have no issues with the above code:
com.google.api.services.bigquery.model.TableReference table = new TableReference()
        .setProjectId(projectID)
        .setDatasetId(DatasetID)
        .setTableId(tablename);

rows.apply("write to BigQuery", BigQueryIO
        .writeTableRows()
        .withSchema(schema)
        .to(table)
        .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));
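For what it's worth, the string form in the original snippet also works, but only if the assembled value matches the format from the error message exactly, i.e. project_id:dataset_id.table_id. A sketch using the same variables:
// Equivalent string form; the assembled spec must be "project:dataset.table".
.to(String.format("%s:%s.%s", projectID, DatasetID, tablename))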

BigQuery Java API - Cannot Send Null For TableRow Value

I am trying to set values on a BigQuery table using the Java BigQuery API, but it throws a NullPointerException (java.lang.NullPointerException: null value in entry: popup=null) every time I send a null value.
A null value should be perfectly acceptable, since the mode is NULLABLE in the schema itself. I have a bunch of other fields that have null values in them.
Any suggestion on this issue would be really helpful; I am completely stuck because of it.
Note: I could simply skip setting the fields that have null values, but that is not the solution I am looking for. My code is below:
TableRow row = new TableRow();
row.set("ip", "test");
row.set("popup", null);
Don't explicitly set the value to null; simply omit it. If the field is not present in the payload to BigQuery, it will be set to null. You will not be able to set it to null anyway, because the API checks for null parameters, and you can't change that behaviour.
So, do this instead:
TableRow row = new TableRow();
row.set("ip", "test");

"String was not recognized as a valid Boolean" error on varchar column

I am getting this error:
String was not recognized as a valid Boolean.Couldn't store <No> in meetsstd Column. Expected type is Boolean
When I am running this query:
SELECT * FROM work_nylustis_2013_q3.nylustis_details WHERE siteid = 'NYLUSTIS-155718' LIMIT 50
From this code:
Adapter.SelectCommand = New NpgsqlCommand(SQL, MLConnect)
Adapter.Fill(subDT) ' This line throws error
The meetsstd field is a varchar(3), and it stores either a 'Yes' or a 'No' value. How is this getting confused with a boolean? A varchar should not care whether it holds 'Yes', 'Si', or 'Oui'. And it only happens on 27 records out of the 28,000 in the table.
I usually blame Npgsql for this kind of strangeness, but the last entry in the stack trace is: System.Data.DataColumn.set_Item(Int32 record, Object value)
Any clues?
Thanks!
Brad
To check whether the problem is with the database or with the driver, you can reduce the problem to a single row and column in your current environment:
SELECT meetsstd FROM work_nylustis_2013_q3.nylustis_details WHERE sitenum=1
(of course, replace sitenum with your primary key)
Then try that query using psql, pgAdmin, or some JDBC/ODBC-based general editor.
If psql shows the record that raises the error in your Npgsql-based application, then the problem is with the Npgsql driver or with how the query results are displayed.
If other tools show the same strange error, then the problem is with your data.
Have you changed the type of the meetsstd field? Maybe you are displaying it in some grid, and the grid uses a Boolean field that is converted to Yes/No for display?