Google BigQuery Create/append to table from Avro internalError

I am fairly new to BigQuery, but until 1-2 days ago I was able to create and append to existing BigQuery tables from Avro files (both in the EU region). So far I have only been using the web UI.
I just attempted to create a new table from a newly generated Avro file and got the error below:
Job ID bquijob_670fd977_15655fb3da1
Start Time Aug 4, 2016, 3:35:45 PM
End Time Aug 4, 2016, 3:35:53 PM
Write Preference Write if empty
Errors:
An internal error occurred and the request could not be completed.
(error code: internalError)
I am unable to debug because there is not really anything to go by.
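(For reference, the same load can be reproduced outside the web UI with the bq tool; the dataset, table and bucket names below are placeholders:)
$ bq load --source_format=AVRO mydataset.mytable gs://mybucket/file.avro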

We've just released a new feature that no longer creates the root field: https://cloud.google.com/bigquery/release-notes.
Since you had imported Avro before, we excluded your project from this new feature. Unfortunately there was a bug in that exclusion, which causes Avro reads to fail. I think you most likely ran into this problem.
The fix will be released next week. If you don't need the root field and want your project enabled for the new feature, please send the project ID to me, huazhang at google.com. Sorry for the trouble this has caused.

Related

error loading table on bigquery dashboard but queries work fine

I clicked a table on the BigQuery dashboard and got an error loading it.
However, I can get data when I do a select on this table. (That means the table does exist.)
I already have the highest admin privilege, so it shouldn't be a permission issue.
I created this table with a Python script, which collects data, writes it into a CSV file, and uploads the CSV file to BigQuery every day. After I created the table I changed the schema once, both in the script and on the dashboard. I'm not sure if that's the cause, but the table loading error occurred several days after I changed the schema.
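(The daily upload step described above typically boils down to a load job along these lines; the table, CSV file and schema file names are placeholders, and the header-skip flag assumes the CSV has a header row:)
$ bq load --source_format=CSV --skip_leading_rows=1 mydataset.mytable ./daily.csv ./schema.json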
If you have Adblock extensions, this might be the root cause of this issue. Thus, try disabling them, then try running your query again.
Hope it helps.

Is there a size limit on appending ORC data files to Vora tables?

I created a Vora table in Vora 1.3 and tried to append data to that table from ORC files that I got from an SAP BW archiving process (NLS on Hadoop). I had 20 files, containing approximately 50 million records in total.
When I tried to use the "files" setting in the APPEND statement as "/path/*", after approximately 1 hour Vora returned this error message:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5F: [Vora [eba156.extendtec.com.au:42681.1640438]] java.lang.RuntimeException: Wrong magic number in response, expected: 0x56320170, actual: 0x00000000. An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
The next thing I tried was appending the data from each file using separate APPEND statements. On the 15th append (of 20) I got the same error message.
The error indicates that the Vora engine on node eba156.extendtec.com.au is not available. I suspect it either crashed or ran into an out-of-memory situation.
You can check the log directory for a crash dump. If you find one, please open a customer message for further investigation.
If you do not find a crash dump, it is likely an out-of-memory situation. You should find confirmation in either the engine log file or in /var/log/messages (if the OOM killer ended the process). In that case, the available memory is not sufficient to load the data.
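(A quick way to check for OOM killer activity, assuming syslog writes to /var/log/messages on this system:)
$ grep -iE 'killed process|out of memory' /var/log/messages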

How can I undelete a BigQuery table after I already created a new one with the same name?

I accidentally deleted a table in BigQuery and I'd very much like to undelete it. I've read the answer in "How can I undelete a BigQuery table?" but in my case I already created a table with the same name, and now when I try to copy a snapshot with the timestamp being the time I created the new table with the same name, I get an error saying:
"error ... invalid snapshot time for table ... cannot read before
"
BTW, the timestamp is from a few hours ago. The whole thing happened in the last 24 hours.
So it seems to me that the problem is that BQ is trying to get a snapshot of the new table, while I need a snapshot of the old table with the same name.
Is there a way for me to access a snapshot of the older incarnation of the table to restore it?
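(The operation being attempted is, roughly, a copy from a snapshot decorator, where the decorator value is milliseconds since the epoch; the table names and timestamp below are placeholders:)
$ bq cp mydataset.mytable@1470312000000 mydataset.mytable_restored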
If you're getting an error in the operation described in How can I undelete a BigQuery table?, I'm afraid you're out of luck. Undelete / time travel is a best-effort operation. Sorry about that!

Loading Avro-file to BigQuery fails with an internal error

On March 23, 2016, Google BigQuery announced "Added support for Avro source format for load operations and as a federated data source in the BigQuery API or command-line tool". It says here "This is a Beta release of Avro format support. This feature is not covered by any SLA or deprecation policy and may be subject to backward-incompatible changes.". However, I'd expect the feature to work.
I couldn't find code examples anywhere on how to use the Avro format for loading, nor did I find examples of how to use the bq tool for loading Avro.
Here's my practical issue: I haven't been able to load data into BigQuery in Avro format.
The following happens using the bq tool. The dataset, table name and bucket name have been obfuscated:
$ bq extract --destination_format=AVRO dataset.events_avro_test gs://BUCKET/events_bq_tool.avro
Waiting on bqjob_r62088699049ce969_0000015432b7627a_1 ... (36s) Current status: DONE
$ bq load --source_format=AVRO dataset.events_avro_test gs://BUCKET/events_bq_tool.avro
Waiting on bqjob_r6cefe75ece6073a1_0000015432b83516_1 ... (2s) Current status: DONE
BigQuery error in load operation: Error processing job 'dataset:bqjob_r6cefe75ece6073a1_0000015432b83516_1': An internal error occurred and the request could not be completed.
Basically, I am extracting from a table and inserting into the same table, which causes an internal error.
Additionally, I have a Java program that does the same (extracts from table X and loads into table X) with the same result (internal error). But I think the above illustrates the problem as clearly as possible, so I'm not sharing that code here. In Java, if I extract from an empty table and insert that, the insert job doesn't fail.
My questions are:
I think the BigQuery API should never fail with an internal error. Why is it happening with my test?
Is the extracted Avro file compatible with an insert job?
There seems to be no specification of what the Avro schema in an insert job should look like; at least I couldn't find any. Could the documentation be created?
UPDATED 2016-04-25:
So far I've managed to get an Avro load job not to give an internal error by following the hint of not using REQUIRED fields. However, I haven't managed to load non-null values.
Consider this Avro schema:
{
  "type": "record",
  "name": "root",
  "fields": [
    {
      "name": "x",
      "type": "string"
    }
  ]
}
The BigQuery table has one column, x, which is NULLABLE.
If I insert N rows (I've tried with one and two, with x being e.g. 1), I get N rows in BigQuery, but x always has the value null.
If I change the table so that x is REQUIRED, I get an internal error.
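(As an aside, one way to build a small test Avro file for that schema is Avro's avro-tools jar; the file names and jar version below are placeholders:)
$ echo '{"x": "1"}' > row.json
$ java -jar avro-tools-1.8.1.jar fromjson --schema-file schema.avsc row.json > row.avro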
There is no exact match from a BQ schema to an Avro schema, and vice versa, so when you export a BQ table to an Avro file and then import it back, the schema will be different. I see the destination table of your load already exists; in this case we throw an error when the schema of the destination table doesn't match the schema we converted from the Avro schema. This should be an external error though; we're investigating why it's an internal error.
We're in the middle of upgrading the export pipeline, and the new import pipeline has a bug that doesn't work with the Avro files exported by the current pipeline. The fix should be deployed in a couple of weeks. After that, if you import the exported file into a non-existent destination table, or a destination table with a compatible schema, it should work. Meanwhile, importing your own Avro files should work. You can also query them directly on GCS without importing them.
There's a problem with the error mapping for the AVRO reader here. The error should have been along the lines of: "The reference schema differs from the existing data: The required field 'api_key' is missing"
Looking at your load job configuration, it includes REQUIRED fields. It sounds like some of the data you are trying to load doesn't specify these required fields, so the operation fails.
I suggest avoiding required fields.
So, there's a bug in BigQuery: an insert job using the Avro format does not work if the destination table exists; instead it gives an internal error. The workaround is to use createDisposition CREATE_IF_NEEDED and not have the pre-existing table there. I verified that this works.
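(In bq terms the workaround amounts to making sure the destination table does not exist before the load; load jobs default to createDisposition CREATE_IF_NEEDED, so the table is then created from the Avro schema. For example:)
$ bq rm -f -t dataset.events_avro_test
$ bq load --source_format=AVRO dataset.events_avro_test gs://BUCKET/events_bq_tool.avro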
Hua Zhang's comment says that "the fix should be deployed in a couple weeks". Needless to say, existing major bugs in the live system should be documented somewhere.
While updating the system, I really recommend improving the Avro documentation. Currently there's no mention of what the Avro schema should look like (type record, name root, and a fields array containing the columns(?)), nor even of the fact that each record in the Avro file maps to a row in the destination table (obvious, but should be mentioned). What happens on a schema mismatch is also not documented.
Thanks for the help, I'll now be switching to the Avro format. It's so much better than CSV.

Unexpected error while copying query results to a table using the Google Java BigQuery API for GAE

I'm trying to copy a query result to a new table, but I'm getting an error:
Copy query results to 49077933619:TelcoNG.table (11:13am)
Errors:
Unexpected. Please try again.
Job ID: job_090d08f69c8e4199afeca131b5279393
Start Time: 11:13am, 12 Aug 2013
End Time: 11:13am, 12 Aug 2013
Copy Source: 49077933619:_8dc46c0daeb9142a91aa374aa59d615c3703e024.anon17d88e0e_0960_4510_9740_b753109050f4
Destination Table: 49077933619:TelcoNG.table
I have been getting this error since last Thursday (8 Aug 2013).
This functionality had worked perfectly for over a year.
Are there any changes in the API?
It looks like there is a bug in detecting which datacenters a table created as the result of a query has been replicated to. You're doing the copy operation very soon after the query finishes, before the results have finished replicating. As I mentioned, this is a bug and we should fix it very soon.
As a short-term workaround, you can either wait a few minutes between the query and the copy operation, or you can set a destination table on your query, so you don't need to do the copy operation at all.
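(With the bq tool, for example, the second workaround looks roughly like this; the query and destination table name are placeholders, and the same setting is available as configuration.query.destinationTable in the API:)
$ bq query --destination_table=TelcoNG.results 'SELECT ...'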