BigQuery Could not parse 'null' as int for field - google-bigquery

I tried to load CSV files into a BigQuery table. Some columns have type INTEGER, but their missing values appear as the literal text null. When I run bq load, I get the following error:
Could not parse 'null' as int for field
What is the best way to deal with this? Do I have to reprocess the data before bq can load it?

You'll need to transform the data to end up with the expected schema and values. Instead of INTEGER, specify the column as having type STRING, and load the CSV file into a table you don't plan to keep long-term, e.g. YourTempTable. In the BigQuery UI, click "Show Options" and select a destination table with the name you actually want. Then run this query:
#standardSQL
SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
FROM YourTempTable;
SAFE_CAST converts the string values to integers, returning NULL for values such as 'null' that cannot be parsed.
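If you prefer to run that second step from code, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and table names and the column x are placeholders matching the query above, not names from your environment:
from google.cloud import bigquery

client = bigquery.Client()

# Write the converted rows into the table you actually want to keep.
# Table and column names are placeholders; adjust them to your schema.
job_config = bigquery.QueryJobConfig(
    destination="your-project.your_dataset.YourFinalTable",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = """
SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
FROM `your-project.your_dataset.YourTempTable`
"""

client.query(sql, job_config=job_config).result()  # wait for the query job to finish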

Please try setting the null marker in the load job configuration, using the exact text that appears in your file (here the lowercase 'null'):
job_config.null_marker = 'null'
configuration.load.nullMarker (string)
[Optional] Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
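For reference, a complete load call with the Python client might look like the following minimal sketch; the file name and table reference are placeholders, and the table is assumed to already exist with its schema:
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes the file has a header row
    null_marker="null",    # the literal text used for missing values in the file
)

# The table reference is a placeholder; its schema is assumed to exist already.
with open("data.csv", "rb") as f:
    load_job = client.load_table_from_file(
        f, "your-project.your_dataset.your_table", job_config=job_config
    )
load_job.result()  # wait for the load to finish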

The BigQuery console has its limitations and doesn't let you specify a null marker while loading data from a CSV. However, this is easy to do with the BigQuery command-line tool's bq load command: use the --null_marker flag to specify the marker, which is simply null in this case.
bq load --source_format=CSV \
--null_marker=null \
--skip_leading_rows=1 \
dataset.table_name \
./data.csv \
./schema.json
Setting null_marker to null does the trick here. You can omit the schema.json argument if the table already exists with a valid schema. --skip_leading_rows=1 is used because my first row was a header.
You can learn more about the bq load command in the BigQuery documentation.
The load command also lets you create and load a table in a single step. In that case the schema needs to be specified in a JSON file in the format below (a filled-in example follows the template):
[
  {
    "description": "[DESCRIPTION]",
    "name": "[NAME]",
    "type": "[TYPE]",
    "mode": "[MODE]"
  },
  {
    "description": "[DESCRIPTION]",
    "name": "[NAME]",
    "type": "[TYPE]",
    "mode": "[MODE]"
  }
]
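For example, a hypothetical schema.json for a table with two NULLABLE INTEGER columns might look like this; the names and descriptions are placeholders, not taken from the question:
[
  {
    "description": "first integer column (placeholder name)",
    "name": "column1",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "second integer column (placeholder name)",
    "name": "column2",
    "type": "INTEGER",
    "mode": "NULLABLE"
  }
]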

Related

Problem changing a column's data type in BigQuery

I am trying to change a column's data type from STRING to DATETIME (for example '04/12/2016 02:47:30') with the format 'YY/MM/DD HH24:MI:SS', but it shows an error like:
Failed to parse input timestamp string at 8 with format element ' '
The initial file was a CSV which I uploaded from my Drive. I tried to convert the column's data type in Google Sheets and then re-upload it, but the column type still remains STRING.
I think that when you load your CSV file into the BigQuery table, you use autodetect mode.
Unfortunately, with this mode BigQuery will treat your date as a STRING even if you changed it in Google Sheets.
Instead of using autodetect, I propose using a JSON schema for your BigQuery table.
In the schema, indicate that the column type for your date field is TIMESTAMP.
The format you indicated, 04/12/2016 02:47:30, is compatible with a timestamp, and BigQuery will convert it for you.
To load the file into BigQuery, you can use the console directly or the bq command-line tool:
bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
In the BigQuery JSON schema, the timestamp field looks like this:
[
  {
    "name": "yourDate",
    "type": "TIMESTAMP",
    "mode": "NULLABLE",
    "description": "Your date"
  }
]
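If you load with the Python client instead of the console or bq, the same idea applies: pass an explicit schema and leave autodetect off. A minimal sketch, where the bucket path, table name, and field definition reuse the placeholders from above:
from google.cloud import bigquery

client = bigquery.Client()

# Explicit schema instead of autodetect; the field name is a placeholder.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=False,
    schema=[bigquery.SchemaField("yourDate", "TIMESTAMP", mode="NULLABLE",
                                 description="Your date")],
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv", "mydataset.mytable", job_config=job_config
)
load_job.result()  # wait for completion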

Add file name and timestamp into each record in BigQuery using Dataflow

I have a few .txt files with JSON data to be loaded into a Google BigQuery table. Along with the columns in the text files, I need to insert the filename and the current timestamp for each row. This runs on GCP Dataflow with Python 3.7.
I accessed the FileMetadata containing the file path and size using GCSFileSystem.match and metadata_list.
I believe I need to get the pipeline code to run in a loop, pass the file path to ReadFromText, and call a FileNameReadFunction ParDo.
(p
 | "read from file" >> ReadFromText(known_args.input)
 | "parse" >> beam.Map(json.loads)
 | "Add FileName" >> beam.ParDo(AddFilenamesFn(), GCSFilePath)
 | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
       known_args.output,
       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
)
I followed the steps in Dataflow/apache beam - how to access current filename when passing in pattern? but I can't make it quite work.
Any help is appreciated.
You can use textio.ReadFromTextWithFilename instead of ReadFromText. That will produce a PCollection of (filename, line) tuples.
To include the file name and timestamp in your output JSON record, you could change your "parse" step to:
| "parse" >> beam.map(lambda (file, line): {
**json.loads(line),
"filename": file,
"timestamp": datetime.now()})

Convert Empty string ("") to Double data type while importing data from JSON file using command line BQ command

What steps will reproduce the problem?
1. I am running the command:
./bq load --source_format=NEWLINE_DELIMITED_JSON --schema=lifeSchema.json dataset_test1.table_test_3 lifeData.json
2. I have attached the data source file and schema file.
3. It throws an error:
JSON parsing error in row starting at position 0 at file: file-00000000. Could not convert value to double. Field: computed_results_A; Value:
What is the expected output? What do you see instead?
I want the empty string converted to NULL or 0.
What version of the product are you using? On what operating system?
I am using Mac OS X Yosemite.
Source JSON lifeData.json
{"schema":{"vendor":"com.bd.snowplow","name":"in_life","format":"jsonschema","version":"1-0-2"},"data":{"step":0,"info_userId":"53493764","info_campaignCity":"","info_self_currentAge":45,"info_self_gender":"male","info_self_retirementAge":60,"info_self_married":false,"info_self_lifeExpectancy":0,"info_dependantChildren":0,"info_dependantAdults":0,"info_spouse_working":true,"info_spouse_currentAge":33,"info_spouse_retirementAge":60,"info_spouse_monthlyIncome":0,"info_spouse_incomeInflation":5,"info_spouse_lifeExpectancy":0,"info_finances_sumInsured":0,"info_finances_expectedReturns":6,"info_finances_loanAmount":0,"info_finances_liquidateSavings":true,"info_finances_savingsAmount":0,"info_finances_monthlyExpense":0,"info_finances_expenseInflation":6,"info_finances_expenseReduction":10,"info_finances_monthlyIncome":0,"info_finances_incomeInflation":5,"computed_results_A":"","computed_results_B":null,"computed_results_C":null,"computed_results_D":null,"uid_epoch":"53493764_1466504541604","state":"init","campaign_id":"","campaign_link":"","tool_version":"20150701-lfi-v1"},"hierarchy":{"rootId":"94583157-af34-4ecb-8024-b9af7c9e54fa","rootTstamp":"2016-06-21 10:22:24.000","refRoot":"events","refTree":["events","in_life"],"refParent":"events"}}
Schema JSON lifeSchema.json
{
  "name": "computed_results_A",
  "type": "float",
  "mode": "nullable"
}
Try loading the JSON file as a one column CSV file.
bq load --field_delimiter='|' proj:set.table file.json json:string
Once the file is loaded into BigQuery, you can use JSON_EXTRACT_SCALAR or a JavaScript UDF to parse the JSON with total freedom.
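As a follow-up, once the raw JSON sits in a single STRING column, a query along these lines could pull out the numeric field, with SAFE_CAST turning the empty string into NULL. A minimal sketch with the Python client; the table name and JSON path mirror the question and are otherwise placeholders:
from google.cloud import bigquery

client = bigquery.Client()

# Table name and JSON path mirror the question; adjust to your real names.
# SAFE_CAST returns NULL when the extracted value is an empty string.
sql = """
SELECT
  SAFE_CAST(
    JSON_EXTRACT_SCALAR(json, '$.data.computed_results_A') AS FLOAT64
  ) AS computed_results_A
FROM `proj.set.table`
"""

for row in client.query(sql).result():
    print(row.computed_results_A)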

monetdb: export query result

How do I export a MonetDB query result (e.g. to a CSV file)?
The manual says:
Copy into File
The COPY INTO command with a file name argument allows for fast dumping of a result set into an ASCII file. The file must be accessible by the server and a full path name may be required. The file STDOUT can be used to direct the result to the primary output channel.
The delimiters and NULL AS arguments provide control over the layout required.
COPY subquery INTO file_name [ [USING] DELIMITERS field_separator [',' record_separator [',' string_quote]] ] [ NULL AS null_string ]
https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto
I've tried various forms of the syntax, but with no result.
example query:
select * from test;
example failures:
copy select * from test into test.csv;
copy "select * from test" into test.csv;
OK, solved: the file name needs apostrophes (single quotes) and a full path. Specifying delimiters is also useful:
copy select * from test into '/home/user/test.csv' using delimiters ',';

Bigquery Loading CSV File with 'null' text in the columns

I have tried to upload a CSV file into BigQuery using the Google Cloud client library. One of the CSV files has the literal text 'null' in its columns, and while uploading the file BigQuery returns an error message saying "Too Few Columns".
Sample File Data:
column1, column2, column3, column4
1, null, 3, null,
2, null, null, null
I have verified the configuration JSON that is sent; it has four table fields for the 4 columns, and the error message says 'Expected 4 column(s) but got 2 column(s)'.
Is there any specific configuration required to handle this scenario?
If the columns are numeric, then you specify a null with an empty value.
For example, this works.
$ echo 2,,, > rows.csv
$ bq load lotsOdata.lfdhjv2 rows.csv c1:integer,c2:integer,c3:integer,c4:float
Waiting on bqjob_r4f71e9aebbf9cb57_00000144acfa7622_1 ... (23s) Current status: DONE
Note that in your example above, the line 1, null, 3, null, would have an extra value because of the trailing comma. Also note that if your .csv file has a header row, you should use the --skip_leading_rows=1 parameter so that the header doesn't get interpreted as data.
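If you cannot regenerate the file with empty fields, one option is to preprocess it before loading. Here is a minimal sketch using Python's csv module that rewrites the literal token null into empty values; the file names are placeholders:
import csv

# File names are placeholders; adjust to your own paths.
with open("data.csv", newline="") as src, \
     open("data_clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Replace the literal text "null" with an empty field so that
        # BigQuery reads it as NULL for numeric columns.
        writer.writerow("" if cell.strip() == "null" else cell for cell in row)
        # Note: a trailing comma still produces an extra empty column; fix the
        # source data if BigQuery keeps complaining about the column count.
Alternatively, on newer versions of the bq tool, the --null_marker=null flag shown in the first question above avoids the preprocessing step entirely.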