Error in bq load "Could not convert value to string" - google-bigquery

I tried to load logs from Google Cloud Storage into BigQuery with the bq command and got this error: "Could not convert value to string".
My example data:
{"ids":"1234,5678"}
{"ids":1234}
My example schema:
[
{ "name":"ids", "type":"string" }
]
It seems the value can't be converted to a string when a single ID appears without quotes.
The data is produced by fluent-plugin-s3; when several IDs are joined by a comma the value is wrapped in quotes, but a single ID is written as a bare, unquoted number.
How can I load this data into BigQuery?
Thanks in advance

Well, check out different fluentd plugins that may help you, for example:
https://github.com/lob/fluent-plugin-json-transform
https://github.com/tarom/fluent-plugin-typecast
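If changing the fluentd configuration isn't convenient, another option is to normalize the files before loading so that ids is always a string. A minimal sketch in Python (the file names are placeholders, not from the original setup):

import json

# Rewrite newline-delimited JSON so that the "ids" field is always a string.
with open("input.json") as src, open("normalized.json", "w") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "ids" in record and not isinstance(record["ids"], str):
            record["ids"] = str(record["ids"])  # e.g. 1234 -> "1234"
        dst.write(json.dumps(record) + "\n")

After this, both example rows should load against the string schema with bq load.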

Related

Data Factory Copy Activity: Error found when processing 'Csv/Tsv Format Text' source 'xxx.csv' with row number 6696: found more columns than expected

I am trying to perform a simple copy activity in Azure Data Factory from CSV to a SQL table, but I'm getting the following error:
{
"errorCode": "2200",
"message": "ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'organizations.csv' with row number 6696: found more columns than expected column count 41.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "Copy data1",
"details": []
}
The copy activity source and sink are configured as shown in the screenshots, and a preview of the data in the source also looks as expected.
This seems like a very straightforward copy activity. Any thoughts on what might be causing the error?
My row 6696 looks like the following:
3b1a2e5f-d08b-166b-4b91-eb53009b2377 Compassites Software Solutions organization compassites-software https://www.crunchbase.com/organization/compassites-software 318375 17/07/2008 10:46 05/12/2022 12:17 company compassitesinc.com http://www.compassitesinc.com IND Karnataka Bangalore "Pradeep Court", #163/B, 6th Main 3rd Cross, JP Nagar 3rd phase 560078 operating Custom software solution experts Big Data,Cloud Computing,Information Technology,Mobile,Software Data and Analytics,Information Technology,Internet Services,Mobile,Software 01/11/2005 51-100 info#compassitesinc.com 080-42032572 http://www.facebook.com/compassites http://www.linkedin.com/company/compassites-software-solutions http://twitter.com/compassites https://res.cloudinary.com/crunchbase-production/image/upload/v1397190270/c3e5acbde40f36eaf4f8c6f6eda3f803.png company
No commas
As the error message indicates, there is a record at row number 6696 with a value that contains the delimiter character , inside it.
Look at the following demonstration, where I have taken a similar case with 3 columns in my source.
When I use similar dataset settings and read these values, the same error is thrown.
So, a value like T1,OG is treated as if it belongs to 2 different columns, since it contains the dataset delimiter.
Such values throw an error because they are ambiguous to parse. One way to avoid this is to enclose them in a quote character (a double quote in this case).
Now when I run the copy activity, it gives the desired output and the table data looks as expected.
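If you control how the CSV is produced, you can make sure any value containing the delimiter gets quoted when the file is written. A minimal sketch in Python (the column names and values are only illustrative):

import csv

rows = [
    ["id", "code", "status"],
    ["1", "T1,OG", "active"],  # value that contains the delimiter
]

# QUOTE_MINIMAL wraps only the values that contain the delimiter
# (or the quote character) in double quotes, producing "T1,OG".
with open("demo.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

With the value quoted, the copy activity reads it as a single column, as long as the dataset's quote character is set to the double quote.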

Google Cloud Dataflow: Getting the below error at runtime

I am writing data into a nested-array BQ table (the array field inside the table is merchant_array) using my Dataflow template.
Sometimes it runs fine and loads the data, but sometimes it gives me this error at runtime.
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.fasterxml.jackson.databind.JsonMappingException: Null key for a Map not allowed in JSON (use a converting NullKeySerializer?) (through reference chain: com.google.api.services.bigquery.model.TableRow["null"])
"message" : "Error while reading data, error message: JSON parsing error in row starting at position 223615: Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Does anyone have any idea why I am getting this error?
Thanks in advance.
I found the issue that was causing the error, so I am posting the answer to my own question; it might be helpful for someone else.
The error was:
Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Here merchant_array is defined as an array that contains record (repeated) data.
As per the Google documentation:
ARRAYs cannot be NULL.
NULL ARRAY elements cannot persist to a table.
At the same time I was using an ArrayList in my code, which allows null values. So before building the record-type data or adding it to the ArrayList, just remove any NULL TableRows if they exist.
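The pipeline in the question is Java, but the idea can be sketched quickly in Python (the field and row names are only illustrative): drop the null entries before the repeated field is set on the output row.

# merchant_rows plays the role of the ArrayList of TableRow objects;
# plain dicts are used here for illustration.
merchant_rows = [
    {"merchant_id": "m-1", "amount": 10},
    None,  # a null entry like the one that broke the load
    {"merchant_id": "m-2", "amount": 25},
]

# Remove the nulls before assigning the repeated field, so BigQuery
# never sees a NULL element inside merchant_array.
clean_rows = [row for row in merchant_rows if row is not None]
output_row = {"merchant_array": clean_rows}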
Hope this will be helpful.

PropertyType search problems with RESO API

I am using the connect-mls RESO API and I am having a problem forming the query to search by PropertyType.
http://odata.reso.org/RESO/OData/Property?$filter=/PropertyType/Name eq "Residential"
The above query keeps coming back with a malformed URI error.
I also run into a problem if I try to filter on the PropertyType field directly via $filter=(PropertyType eq 'Residental') or $filter=(PropertyType eq 'DE').
I get the following error message:
"message": "StatusCodeError: 400 - {\"error\":{\"code\":null,\"message\":\"The types 'ODataService.PropertyType' and 'Edm.String' are not compatible.\"}}"
I also looked at the values in the data dictionary, because it seems PropertyType is an enum, but I have not had any success with any of the formats.
http://ddwiki.reso.org/display/DDW16/Property+Type+Summary
Appreciate any guidance on this.
I was able to find the answer from another source. The enums use the format ODataService.PropertyType'DE'. A proper API call example is listed below.
https://connectmls-api.mredllc.com/reso/odata/Property?$filter=PropertyType eq ODataService.PropertyType'DE'
For more detailed information on how to properly construct these types of queries, you can look at http://www.odata.org/documentation/
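As a minimal sketch, the same request can be issued from Python with the requests library (the bearer token is a placeholder; authentication details depend on your MLS credentials):

import requests

BASE_URL = "https://connectmls-api.mredllc.com/reso/odata/Property"

# The enum literal has to be written as ODataService.PropertyType'DE',
# not as a plain string such as 'DE'.
params = {"$filter": "PropertyType eq ODataService.PropertyType'DE'"}
headers = {"Authorization": "Bearer <your-access-token>"}  # placeholder

response = requests.get(BASE_URL, params=params, headers=headers)
response.raise_for_status()
print(response.json())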

AWS Data Pipeline datetime variable

I am using AWS Data Pipeline to save a text file to my S3 bucket from RDS. I would like the file name to have the date and the hour in it, like:
myfile-YYYYMMDD-HH.txt
myfile-20140813-12.txt
I have specified my S3DataNode FilePath as:
s3://mybucketname/out/myfile-#{format(myDateTime,'YYYY-MM-dd-HH')}.txt
When I try to save my pipeline I get the following error:
ERROR: Unable to resolve myDateTime for object:DataNodeId_xOQxz
According to the AWS Data Pipeline documentation for date and time functions, this is the proper syntax for using the format function.
When I save the pipeline using a "hard-coded" date and time, I don't get this error and my file ends up in my S3 bucket and folder as expected.
My thinking is that I need to define "myDateTime" somewhere or use a NOW() equivalent.
Can somebody tell me how to set "myDateTime" to the current time (e.g. NOW) or give a workaround so I can format the current time to be used in my FilePath?
I am not aware of an exact equivalent of NOW() in Data Pipeline. I tried using makeDate with no arguments (just for fun) to see if that worked... it did not.
The closest are the runtime variables scheduledStartTime, actualStartTime, and reportProgressTime.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-s3datanode.html
The following, for example, should work:
s3://mybucketname/out/myfile-#{format(#scheduledStartTime,'YYYY-MM-dd-HH')}.txt
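For a run scheduled at 2014-08-13 12:00, that expression should resolve to something like:
s3://mybucketname/out/myfile-2014-08-13-12.txt
If you specifically want the myfile-20140813-12.txt form from the question, the format string would presumably be 'yyyyMMdd-HH' rather than 'YYYY-MM-dd-HH'.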
Just for fun, here is some more info on Parameters.
At the end of your Pipeline Json (click List Pipelines, select into one, click Edit Pipeline, then click Export), you need to add a Parameters and/or Values object.
I use a myStartDate parameter for backfill processes, which you can manipulate for ad hoc runs once it is passed in. You can give it a static default, but you can't set it to a dynamic value, so it is limited for regularly scheduled tasks. For realtime/scheduled dates, you need to use #scheduledStartTime, etc., as suggested. Here is a sample of setting up some Parameters and/or Values. Both show up under Parameters in the UI. These values can be used throughout your pipeline activities (shell, hive, etc.) with the #{myVariableToUse} notation.
"parameters": [
{
"helpText": "Put help text here",
"watermark": "This shows if no default or value set",
"description": "Label/Desc",
"id": "myVariableToUse",
"type": "string"
}
]
And for Values:
"values": {
"myS3OutLocation": "s3://some-bucket/path",
"myThreshold": "30000",
}
You cannot add these directly in the UI (yet) but once they are there you can change and save the values.

Loading data into Google BigQuery

My question is the following:
Let's say I have a JSON file that I want to load into BigQuery.
It contains these two lines of data.
{"value":"123"}
{"value": 123 }
I have defined the following schema for my data.
[
{ "name":"value", "type":"String"}
]
When I try to load the JSON file into BigQuery, it fails with the following error:
Field:value: Could not convert value to string
Is there a way to get around this issue other than transforming the data in the JSON file?
Thanks!
You can set the maxBadRecords property on the load job to skip a number of errors but still load the data.
Following your example, you could still load the data if you set it as:
"configuration": {
"load": {
"maxBadRecords": 1,
}
}
This is a way to get around the issue while still loading your JSON data into the table; the erroneous rows will simply be skipped. If you are loading a list of files, you could set it as a function of the number of files you are loading (e.g. maxBadRecords = 20 * fileCount).
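If you drive the load with the Python client library instead of the raw job configuration, the equivalent setting is max_bad_records. A minimal sketch (the bucket, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    schema=[bigquery.SchemaField("value", "STRING")],
    max_bad_records=1,  # skip up to one row that fails to convert
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/data.json",        # placeholder source
    "my-project.my_dataset.my_table",  # placeholder destination
    job_config=job_config,
)
load_job.result()  # waits for completion; raises if bad records exceed the limit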