Failed to load FileDescriptorProto for '_CLOUD_QUERY_METADATA_SCHEMA_' - google-bigquery

My Firebase project is integrated with BigQuery, so all raw Google Analytics events are exported daily & streamed to a dedicated collection.
Since today even simple queries on those events are failing with an error:
Error running query
Failed to load FileDescriptorProto for
'CLOUD_QUERY_METADATA_SCHEMA': ;Field number 23 has already been
used in "Msg_0_CLOUD_QUERY_TABLE" by field "items".
An example query which is failing:
SELECT * FROM `project.analytics_184030700.events_*` WHERE event_name IN ("share")
As I mentioned, those (and more advanced) queries used to run until yesterday. I did not change the schema nor any other configuration in the meantime. I've also noticed that BigQuery was updated yesterday.
Looking at the error description, looks like my table schema indeed contains a field called items (a very last one, 23rd) but it was automatically added by Google Analytics.
My suspicions:
Something went wrong with the recent BigQuery release
Something went wrong during daily sync Google Analytics -> BigQuery
Some old job or cache is getting in the way of new queries
At this point I have no idea what to try next. Does anyone have any insight into what could be causing this error?
EDIT:
I noticed that this problem was also just reported in the Google Issue Tracker here: https://issuetracker.google.com/issues/192325507.

I have same issue and I didn't solve it yet but as you said it's cause is Firebase I guess. There's an extra field problem which are limited only for three days (26th,27th and 28th June).
I checked all data older than 26th June but there was no privacy_info field. As you see there is no privacy_info field again for 29th June. I think firebase put this new field but they changed their mind for some reason. But this causes a big problem for us.
Update:
I changed this part:
SELECT * FROM `project.analytics_184030700.events_*`
Like this:
SELECT * FROM `project.analytics_184030700.events_2*`
Interestingly this worked for me.

You can do a workaround for that issue; It seems there are problems with the field
privacy_info
If you select multiple table partitions, just make sure you only select the fields you need, and omit the field privacy_info.
Not using "SELECT *" did resolve this error for me.

Related

How can I fetch (via GET) all JIRA issues? Do I go to the Search node?

It looks like /api/2/project easily returns all projects in a JIRA instance in JSON format.
I'd like to do the same for issues, but this does not appear to exist.
Is /api/2/search the standard way to do a mass-dump like this? And what is the best way to regularly update this to a database? Would I do something like search (update date > [last entry in database]) and then go through the pagination? Surely I can't be the first person attempting this, though I see no similar guide anywhere online to this (I checked Jira's own docs, no mass-issue-export guide really).
EDIT: Okay it looks like search really is the "issue dump" and not the issue node which, contrary to their documentation, does not default to a collection but really for creating issues or listing one at at time. I'll probably go the route of updated > [whatever last date is in the DB]
Unless you have very few issues, you can't fetch all of them at once.
What you can do is to execute the search step by step.
For example, lets say you have 1324 JIRA issues. In order to retrive all of them you have to execute a search similar to this several times:
/rest/api/2/search?&maxResults=100&startAt=0
This will retrive the first 100 JIRA issues starting from 0.
How to get the others?
When you execute the search, a field named total is returned. That field is the number of the total JIRA issues in your system (1324 issues).
The next query will be:
/rest/api/2/search?&maxResults=100&startAt=100
Repeat this operation, incrementing the value of startAt by 100 every time, until all the issues are returned.

Resources Exceeded on simple query when using DAY() or DATE() functions

This query runs fails with resources exceeded:
SELECT
*,
DAY(event_timestamp) as whywontitwork,
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
But this one works fine:
SELECT
*
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
The source table is 14m rows but I've run similar queries on much larger datasets before. We have large results enabled and have tried both flattened results and not (though there are no nested fields anyway). The error also occurs if you use the DATE() function instead of DAY(), or a REGEXP_EXTRACT() function
The job id is realself-main:bquijob_69e3a888_152f1fdc205.
You've hit an internal error in BigQuery. We tweaked our query engine's configuration at around 3pm (US Pacific Time) in an effort to prevent the error.
Update: After observing the error rate, it looks like this change has fixed the problem. If you see any other issues, please let us know. Note that StackOverflow is best for usage questions, but if you suspect a bug, you can file an issue at our public issue tracker.

How to use BigQuery Slots

Hi,there.
Recently,I want to run a query in bigquery web UI by using "group by" over some tables(tables' name suits xxx_mst_yyyymmdd).The rows will be over 10 million. Unhappily,the query failed with this error:
Query Failed
Error: Resources exceeded during query execution.
I did some improvements with my query language,the error may not happen for this time.But with the increasement of my data, the Error will also appear in the future.So I checked the latest release of Bigquery,maybe there two ways to solve this:
1.After 2016/01/01,Bigquery will change the Query pricing tiers to satisfy the "High Compute Tiers" so that the "resourcesExceeded error" will not happen again.
2.BigQuery Slots.
I checked some documents in Google and didn't find a way on how to use BigQuery Slots.Is there any sample or usecase of BigQuery Slots?Or I have to contact with BigQuery Team to open the function?
Hope someone can help me to answer this question,thanks very much!
A couple of points:
I'm surprised that a GROUP BY with a cardinality of 10M failed with resources exceeded. Can you provide a job id of the failed query so we can investigate? You mention that you're concerned about hitting these errors more often as your data size increases; you should likely be able to increase your data size by a few more orders of magnitude without seeing this; likely you've encountered either a bug or something was strange with either your query or your data.
"High Compute Tiers" won't necessarily get rid of resourcesExceeded. For the most part, resourcesExceeded means that BigQuery ran into memory limitations; high compute tiers only address CPU usage. (and note, they haven't been enabled yet).
BigQuery slots enable you to process data faster and with more reliable performance. For the most part, they also wouldn't help prevent resourcesExceeded errors.
There is currently (as of Nov 5) a bug where you may need to provide an EACH keyword with a GROUP BY. Recent changes should enable BigQuery to automatically select the execution strategy, so EACH shouldn't be needed, but there are a couple of cases where it doesn't pick the right one. When in doubt, add an EACH to your JOIN and GROUP BY operations.
To get your project eligible for using slots you need to contact support.

Changes in query behaviour

I have some queries that run every day for several month with no problem. I didn't change anything in the queries for a long while.
In the past few days some of them fail. Error message says something regarding some fields: "Field 'myfield' not found.". these queries usually involve some sub-queries and window functions.
Example for the BQ guys:
On 2015-08-03 Job ID: job_EUWyK5DIFSxJxGAEC4En2Q_hNO8 run successfully
on the following days, same query, failed. Job IDs: (job_A9KYJLbQJQvHjh1g7Fc0Abd2qsc , job__15ff66aYseR-YjYnPqWmSJ30N8)
In addition, for some other queries running times extended from minutes to hours and sometime return "timeout".
My questions:
Was something changed in the BQ engine?
What should I do to make my queries run again?
Thanks
So the problem could be two folded:
An update to query execution engine was rolled out during the week of August 3, 2015 as documented in the announcement
If this is the case, you need to update your queries.
Some performance issues were detected lately, but in order to actually identify if there is something wrong with your project or not, you need to create a issue request I did the same in past and it was fixed.

Big query throwing invalid errors on load jobs

I have a regularly scheduled load job that runs and imports data into bigQuery via the json data format every hour. This process has been working fine for months,now all of a sudden bigQuery has started to throw me back errors about missing required fields.
Naturally the first thing I did was review my schema and compare to one of the JSON files and all required fields are indeed there. Bigquery doesn't throw much information back beyond that, and I have checked and re-checked my data 20 times because I'm usually missing something.
Is this a back-end issue? or perhaps formatting requirements have changed? A perfect example would be JOB # job_2ee5a4be176c421985d7c3eaa84abf4b.It tells me "missing required field(s)", of which there are only 4 in my schema - I check my JSON for this particular job and they are all there.
Any light shed on this would be tremendously helpful, thanks in advance!!
A sample of the json, only the first 4 fields are required in my schema, and they are all there! I have also double-checked to make sure no extra fields are in the json, and each json is on a new line etc.:
{"date":"2013-05-31 20:56:41","sdate":1370033801,"type":"0","act":"1","cid":"139","chain":"5156","hotel":"21441","template":"default","arrival":"2013-08-04 00:00:00","depart":"2013-08-05 00:00:00","window":"64","nights":"1","total":"0.0000","dailyrate":"0.0000","session":"1530894334","source":"google","keyword":"the carolina hotel chapel hill nc","campaign":"organic","medium":"organic","visits":"2","device":"pc","language":"en-us","ip":"gc.synxis.com","cookies":"2","base_total":"0.0000","base_rate":"0.0000","batch":"batch_1370045767"}
I am a Google engineer who works on BigQuery. Sorry for the trouble; it appears that you're missing a required RECORD field called currencies.
It appears that the old code was accepting this due to a bug. It was creating empty RECORD fields even if one was not specified in the JSON. As a result, a RECORD field that was REQUIRED could be omitted without triggering an error. However, the correct behavior is to trigger an error, which is what the current code does.
It is unfortunate that the error message does not tell you which required field was missing. This is a TODO in the current version of the code.