How to avoid Grafana handling NULL values from Google BigQuery as 0


I have the doitintl BigQuery plugin (v2.0.2) installed for my Grafana instance (v8.2.7).
My SQL query is based on multiple tables and is joined with a full outer join on the timestamp column. Therefore, I have some null values returned and they are shown as such in the GCP BigQuery Console.
However, when I run the query in my time series panel, those null values are treated as zeros, leading to a graph that does not make sense.
Is there a way to avoid that behaviour in Grafana?

So it turns out this is a bug in the plugin, not something caused by Grafana itself or by its settings.
However, Grafana has forked the plugin, and the issue is fixed in the fork (tested with v0.1.10). To avoid the issue, use the Grafana BigQuery data source plugin instead (note: at the time of writing, that plugin is still in beta).
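To make the symptom concrete, here is a minimal Python sketch (not the plugin's code) of a full outer join on a timestamp column. The series names and values are made up for illustration. It shows why the gaps must stay null rather than become 0:

```python
def full_outer_join(left, right):
    """Join two {timestamp: value} series; a missing side stays None."""
    timestamps = sorted(set(left) | set(right))
    return [(ts, left.get(ts), right.get(ts)) for ts in timestamps]


# Hypothetical series that only partly overlap in time.
series_a = {"10:00": 0.5, "10:01": 0.7}
series_b = {"10:01": 0.2, "10:02": 0.3}

rows = full_outer_join(series_a, series_b)

# What the faulty behaviour amounts to: every gap becomes a real 0,
# so the graph drops to zero where there was simply no data.
zero_filled = [
    (ts, 0 if a is None else a, 0 if b is None else b) for ts, a, b in rows
]
```

A panel that keeps the `None` values can draw a gap or connect across it, whereas the zero-filled version plots points that were never measured.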

Related

How can I use data from more than one measurement in a single Grafana panel?

I am attempting to create a gauge panel in Grafana (version 6.6.2; assume that upgrading is a last resort, but possible if necessary) that shows the percentage of total available memory used by the Java Virtual Machine running a process of mine. The problem that I am running into is the following:
I have used Spring Boot Actuator's metrics and imported them into an Influx database with Micrometer, but in the process the two values that I would like to use in my calculation were stored in two different measurements: jvm_memory_used and jvm_memory_max.
My initial idea was to simply call a SELECT on both measurements to get the values that I want, then divide "used" by "max" and multiply the result by 100 to get the percentage to display. Unfortunately, I run into syntax errors when I try to do this manually, and I am unsure whether I can do this using Grafana's query builder.
I know that the syntax is incorrect, but I am not familiar enough with InfluxQL to know how to properly structure this query. Here is what I had tried:
(SELECT last("value")
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
) /
(SELECT last("value")
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
)
(The AND and GROUP BY are present as a result of the default values from Grafana's query builder, I am not sure whether they are necessary or not)
I'm assuming that my parentheses and division are not legal syntax, but I am not sure how to resolve it.
How can I divide these two values from separate tables?
EDIT: I have gotten slightly further but it seems that there is a new issue. I now have the following query that I am sending in:
SELECT 100 * (last("used") / sum("max")) AS "percentUsed"
FROM(
SELECT last("value") AS "used"
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
),(
SELECT last("value") AS "max"
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
)
GROUP BY time($__interval) fill(null)
and the result I get is this:
How can I now get this query to return only one gauge with data, instead of two with nulls?
I've accepted an answer that works for versions of Grafana after 7. If there are any other answers that arise that do not involve updating the version of Grafana, please provide them as well!

I am not particularly experienced with Influx, but since your question is how to combine two measurements (query results) in one Grafana panel, I can tell you about one approach:
You can use a transformation, which lets you keep two separate queries. With the "Add field from calculation" transformation in Binary operation mode, you can simply divide one of your values by the other.
In your specific case, to display the result as a percentage, you can then use Percent (0.0-1.0) as the unit, and you should have accomplished your goal.
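As a rough illustration of what the binary operation computes, here is a minimal Python sketch. It is not Grafana's implementation, and the function name and sample values are assumptions made for the example:

```python
def binary_divide(used_series, max_series):
    """Divide two series point by point; keep None where no result exists."""
    result = []
    for used, maximum in zip(used_series, max_series):
        if used is None or maximum is None or maximum == 0:
            result.append(None)
        else:
            result.append(used / maximum)
    return result


# Hypothetical heap values in MiB from the two separate queries.
fraction = binary_divide([256.0, 384.0, None], [512.0, 512.0, 512.0])
```

The result is a fraction between 0 and 1, which is why the Percent (0.0-1.0) unit fits: a value of 0.5 is displayed as 50%, with no extra multiplication by 100.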

Is the column of type JSON deprecated?

In the BigQuery console, when creating a table, there used to be a JSON option among the column types, but strangely it was never mentioned in the docs. We used this column type in our production tables and discovered later that you can't select it in queries (BigQuery throws an error) and that the JSON functions don't work with it either. So we simply stopped using this column in our queries, but it still exists in our tables.
However, in the past couple of days, all queries against this table have been failing with the error 400 Json is not enabled for current project. and this column type is no longer offered in the BigQuery console. It seems it was removed or deprecated? I checked the release notes, but the latest release was well before the error occurred. This broke our production environment, and we couldn't even export the data because exporting gave the same error. Instead we had to use a new table without this column, which meant we lost all our history.
Has anyone faced the same problem with any other column type before? Is it normal for a type to be deprecated without users being notified beforehand? This is making me question the reliability of BigQuery.

Please reach out to Google Cloud support and we will help you fix your issue with that problematic table. You may also want to try fixing it yourself using the ALTER TABLE DROP COLUMN statement that is currently in public preview [1]. This will drop the erroneous column (the data in that column only will be lost). The rest of the data will remain usable.
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_drop_column_statement

I ran into the same error message a few days ago and was surprised to read about this policy change, which is not backed by any mitigation process. My attempt to follow Vlad Grachev's suggestion and drop the column did not succeed, because the console does not allow any query against this table (same "Json is not enabled for current project." error).
My only remediation at this point is:
build a new table where the json column is switched to type string
create a pipeline that transforms the objects to strings
migrate the data through the pipeline to the new table
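The transformation step in that pipeline could look roughly like the Python sketch below. It only covers the conversion of one row and assumes the rows are already available as dictionaries. The function name and the "payload" column are hypothetical, and reading from or writing to BigQuery is left out:

```python
import json


def stringify_json_column(row, column):
    """Return a copy of row with the given column serialized to a JSON string."""
    converted = dict(row)
    value = converted.get(column)
    if value is not None:
        # sort_keys makes the output stable across runs.
        converted[column] = json.dumps(value, sort_keys=True)
    return converted


# Hypothetical row whose "payload" column used to be of type JSON.
old_row = {"id": 1, "payload": {"b": 2, "a": 1}}
new_row = stringify_json_column(old_row, "payload")
```

Null values are passed through unchanged, so a NULLABLE STRING column in the new table can hold them as they are.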

In BigQuery, JSON data can be stored in a column of type RECORD. Is that what you mean by the JSON column type?
BigQuery uses the RECORD (or STRUCT) type to represent nested structure. A column of RECORD type is in fact a large column containing multiple child columns. For more information, refer to the link below:
Json Data in BigQuery
If you are not referring to the RECORD data type, the JSON column type might have been a test feature that is not covered by the deprecation policy.

Difference between "Preview" and Query in BigQuery

I have the following table schema:
+------+----------+----------+
| chn  | INTEGER  | NULLABLE |
+------+----------+----------+
| size | STRING   | NULLABLE |
+------+----------+----------+
| char | REPEATED | NULLABLE |
+------+----------+----------+
| ped  | INTEGER  | NULLABLE |
+------+----------+----------+
When I click on 'preview' in the Google BigQuery Web UI, I get the following result:
But when I query my table, I get this result:
It seems like "preview" is interpreting my repeated field as an array, I would want to get the same result in a query to limit the number of rows.
I did try unchecking "Use Legacy SQL", which gave me the result I want, but the problem is that with my table the same query takes ~1.0 seconds to execute with "Use Legacy SQL" checked and ~12 seconds with it unchecked.
I am looking for speed here so unfortunately, not using Legacy SQL is not an option...
Is there another way to render my repeated field like it does in the "preview" ?
Thanks for the help :)

In legacy SQL, BigQuery flattens the result of queries by default. This means two things:
1. All child fields of RECORD fields are propagated to the top level, with their names changed from record.subrecord.leaf to record_subrecord_leaf. Parent records are removed from the schema.
2. All repeated fields are converted to fields of optional mode, with each repeated value expanded into its own row. (As a side note, this step is very similar to the FLATTEN function exposed in legacy SQL.)
What you see here is a product of #2. Each repeated value is becoming its own row (as you can see by the row count on the left-hand side in your two images) and the values from the other columns are, well, repeated for each new row.
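The row expansion can be sketched in a few lines of Python. This is an illustration of the effect, not BigQuery's implementation. The column names come from the schema in the question, while the values are made up:

```python
def flatten_repeated(row, repeated_field):
    """Expand one row into one row per value of a repeated field."""
    # An empty or missing repeated field still yields one row, with a null.
    values = row.get(repeated_field) or [None]
    return [{**row, repeated_field: value} for value in values]


# One stored row whose "char" field holds three repeated values.
row = {"chn": 1, "size": "L", "char": ["a", "b", "c"], "ped": 7}
flat = flatten_repeated(row, "char")
```

Here `flat` holds three rows that differ only in `char`, while `chn`, `size`, and `ped` are copied into each of them, which matches the higher row count seen in the query output.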
You can prevent this behavior and receive "unflattened results" in a couple ways.
1. Using standard SQL, as you note in your original question. All standard SQL queries return unflattened results.
2. While using legacy SQL, setting the flattenResults parameter to false. This also requires specifying a destination table and setting allowLargeResults to true. These can be found in the Show Options panel beneath the query editor if you want to set them within the UI. Mikhail has some good suggestions for managing the temporary nature of destination tables if you aren't interested in keeping them around.
I should note that there are a number of corner cases with legacy SQL with flattenResults set to false which might trip you up if you start writing more complex queries. A prominent example is that you can't output more than one independently repeated field in query results using legacy SQL, but you can output multiple with standard SQL. These issues are unlikely to be resolved in legacy SQL, and going forward we're suggesting people use standard SQL when they run into them.
If you could provide more details about your much slower query using standard SQL (e.g. job ID in legacy SQL, job ID in standard SQL, for comparison), I, and the rest of the BigQuery team, would be very interested in investigating further.

Is there another way to render my repeated field like it does in the "preview"?
To see the original, unflattened output in the Web UI for legacy SQL, I used to set the respective options (click Show Options) so that the output is written to a table, with Allow Large Results checked and Flatten Results unchecked.
This not only saves the result into a table but also shows it the same way preview does (because it is in fact a preview of that table). To make sure the table gets removed afterwards, I have a "dedicated" dataset (temp) with the default expiration set to 1 day (or 1 hour, depending on how aggressive you want to be with your junk), so you don't need to worry about those tables: they get deleted automatically. One more note: this was quite a common pattern for us, and setting the extra options every time was tedious, so we ended up building our own custom UI that does all this for the user automatically.

What you see is called flattening.
By default the UI flattens the query output, and there is currently no option to show query results the way you want. To produce unflattened results you must write to a table, but that is a different thing.

Error: Schema changed for Timestamp field (additional)

I am getting an error message when I query a specific table in my dataset that has a nullable timestamp field. In the BigQuery web tool, I run a simple query, e.g.:
SELECT * FROM [reztrack.201401] LIMIT 100
The result I get is: Error: Schema changed for Timestamp field date
Example Job ID: esiteisthebomb:job_6WKi7ZhSi8D_Ewr8b5rKV-a5Eac
This is the exact same issue that was noted here: Error: Schema changed for Timestamp field.
I also logged this under https://code.google.com/p/google-bigquery/issues/detail?id=307, but I was unsure whether that was right, since it said we should be logging everything on Stack Overflow.
Any information on how to fix this for this or other tables would be greatly appreciated.
Note: The original answer states to contact google support, but Google support for BigQuery was moved to StackOverflow. Therefore I assume that means to open it as a new question in hopes the engineers will respond.

BigQuery recently improved the representation of its internal timestamp format (there had previously been a lot of cases where timestamps broke in strange ways and this change should fix that). Your table still was using the old timestamp format, and you tickled a bug in the old format when schemas changed (in this case, the field went from REQUIRED to OPTIONAL).
We have an automated process that coalesces tables to make their storage more efficient. I scheduled this to run over your table, and have verified that it has rewritten your table using the new timestamp format.
You should now be able to query this field of your table without further problems.

IGNORE CASE query problems saving to a table and using Allow large results

I need case insensitivity in my queries so I found IGNORE CASE which works superbly when used in queries that target the browser (I am talking about BQ web UI). If I choose a destination table (an absolute must for me) and select Allow Large Results (with unchecked Flatten Results) then I get a cryptic error like this:
Error: unexpected LIMIT clause at: 2.200 - 2.206
Even though this Official Google BigQuery issue and feature request tracker post seems to speak of the same issue and even though the problem seems to have been acknowledged back in Jan 2015 the solution isn't apparent.
I could potentially use a bunch of temp tables with lowercased search columns as a workaround but that sounds awfully difficult with the number of tables and columns that I have and the complex queries that I intend to run.
Any other possible workarounds? Why isn't this working yet on BQ?

Yes, it is a known problem, and it has not been neglected. The code changes to fix it are (surprisingly) not trivial, but they are mostly done. Now the team is carefully looking at how to enable and deploy them. I cannot give you a timeline, but the fix for this problem is coming.
The only workaround in the meantime is to wrap the operands of all string comparisons, string GROUP BYs, and string ORDER BYs in a conversion with LOWER() (or UPPER()).
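The effect of that workaround on a GROUP BY can be shown with a small Python analogy. This is not BigQuery code, and the sample values are made up:

```python
from collections import Counter


def group_by_lower(values):
    """Count values case-insensitively, like GROUP BY LOWER(col)."""
    return Counter(value.lower() for value in values)


# Hypothetical column values that differ only in case.
counts = group_by_lower(["Foo", "foo", "FOO", "bar"])
```

All three spellings of "foo" fall into one group because the key is normalized before grouping. The same normalization has to be applied to both sides of every comparison for the results to stay consistent.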
