Truncated BigQuery error message on "No matching signature" - error-handling

While integrating a UDF for use in a query, it's common to get a No matching signature error. These errors are very informative when the function signature is short. However, when the function signature is long, such as when a JavaScript UDF receives and returns complex structs, the error message is truncated and becomes useless.
For example:
No matching signature for function f for argument types: STRUCT<docGroup FLOAT64, docNum FLOAT64, docDate DATE, ...>.
Supported signature: f(STRUCT<docGroup FLOAT64, docNum FLOAT64, docDate DATE, ...>) at [202:8]
The actual mismatch is hiding somewhere in the truncated arguments!
Running bq show -j <jobid> also shows the message with a truncated list of arguments.
Is there any way of getting the full error message, or otherwise figuring out what's wrong, without having to take my SQL apart into pieces?

Try running bq show --format=prettyjson -j <jobid>
The documentation says that this is the syntax for seeing the job's complete information:
To see full job details, enter:
bq show --format=prettyjson -j myproject:US.bquijob_123x456_789y123z456c
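
If you prefer to pull the job's error payload programmatically rather than through the CLI, the client libraries expose the same information. Below is a minimal sketch using the Python google-cloud-bigquery client; the project ID and job ID are placeholders, and depending on the backend the message text may still be subject to the same length limits:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Fetch the finished job by its ID (pass location if it did not run in the US multi-region).
job = client.get_job("bquijob_123x456_789y123z456c", location="US")

# error_result holds the primary error; errors lists every error recorded for the job.
print(job.error_result)
for err in job.errors or []:
    print(err)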

Related

Kmodes function Error in x[[jj]][iseq] <- vjj : replacement has length zero

I got this error when using a big data set. I cleaned the data and used Data<-na.omit(Data) to delete all rows with nulls; that worked, and in RStudio I don't get any errors.
When I run the script from SQL as an external script, I get the same error as before:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
even though I'm using the same R script and the dataset is the same.
Has anyone had the same issue, and how did you solve it?
Thanks.

When trying to get the source of a function in Postgres using psql, what does the error "column p.proisagg does not exist" mean?

Background:
Using Postgres 11 on RDS; the interface is psql on a CentOS 7 box. The objective is to show the source of certain stored procs / functions so that I can work with them.
Problem description: When I attempt to list / show the source of a given stored function using the \df+ command, which I understand to be correct for this use based on the [official docs here](https://www.postgresql.org/docs/current/app-psql.html), an error is given as shown:
psql=> \df+ schema_foo.proc_bar;
ERROR: column p.proisagg does not exist
LINE 6: WHEN p.proisagg THEN 'agg'
I have no clue how to interpret this; the function in question does not contain the snippet of logic shown in the error, nor the column referenced there, p.proisagg. I have verified this by opening the function in vim with \ef.
My guess, based on several unrelated GitHub issues that mention this same error, is that it is a reference to some schema code internal to Postgres.
Summary: in short, I can view the source of the functions using \ef, so my work is not impaired from a practical standpoint; however, I wish to understand this error and why I'm encountering it with \df+.
I had the same issue and ran these two commands to fix it:
sed -i "s/NOT pp.proisagg/pp.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php
sed -i "s/NOT p.proisagg/p.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php

InputFunctionException with KeyError

Suppose I have code in Python that generates a dictionary as its result. I need to write each element of the dictionary to a separate folder, which will later be used by another set of rules in Snakemake.
I have written the code as follows, but it does not work!
simulation_index_dict={1:'test1',2:'test2'}

def indexer(wildcards):
    return(simulation_index_dict[wildcards.simulation_index])

rule SimulateAll:
    input:
        expand("{simulation_index}/ProteinCodingGene/alfsim.drw",simulation_index=simulation_index_dict.keys())

rule simulate_phylogeny:
    output:
        ProteinCodingGeneParams=expand("{{simulation_index}}/ProteinCodingGene/alfsim.drw"),
        IntergenicRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_IR.dawg"),
        RNAGeneParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RG.dawg"),
        RepeatRegionParams=expand("{{simulation_index}}/IntergenicRegions/dawg_RR.dawg"),
    params:
        value=indexer,
    shell:
        """
        echo {params.value} > {output.ProteinCodingGeneParams}
        echo {params.value} > {output.IntergenicRegionParams}
        echo {params.value} > {output.RNAGeneParams}
        echo {params.value} > {output.RepeatRegionParams}
        """
The error it returns is:
InputFunctionException in line 14 of /$/test.snake:
KeyError: '1'
Wildcards:
simulation_index=1
It seems that the problem is with the params section of the rule, because deleting it eliminates the error, but I cannot figure out what is wrong with the params!
The solution: using strings as dictionary keys
One can guess from the error message (KeyError: '1') that some lookup in a dictionary went wrong on the key '1', which happens to be a string.
However, the dictionary used in the indexer "params" function has integers as keys.
Apparently, using strings instead of ints as keys to this simulation_index_dict dictionary solves the problem (see comments below the question).
The cause: loss of type information during workflow inference
The cause of the problem is likely that the integer nature (inherited from simulation_index_dict.keys()) of the value assigned to the simulation_index parameter of the expand in SimulateAll is "forgotten" in subsequent steps of the workflow inference.
Indeed, the expand results in a list of strings, which are then matched against the output of the other rules (which also consist of strings) to infer the values of the wildcards attributes (which are also strings). Therefore, when the indexer function is executed, wildcards.simulation_index is a string, and this causes a KeyError when it is looked up in simulation_index_dict.
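A minimal sketch of the fix described above (the int() cast shown as a comment is an alternative not mentioned in the comments, but follows from the same reasoning):

# Key the dictionary by strings, since wildcard values arrive as strings.
simulation_index_dict = {'1': 'test1', '2': 'test2'}

def indexer(wildcards):
    # wildcards.simulation_index is a string such as '1', so this lookup now succeeds.
    return simulation_index_dict[wildcards.simulation_index]
    # Alternatively, keep integer keys and cast the wildcard before the lookup:
    # return simulation_index_dict[int(wildcards.simulation_index)]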

Uploading job fails on the same file that was uploaded successfully before

I'm running a regular upload job to load a CSV into BigQuery. The job runs every hour. According to the recent failure log, it says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original CSV has a header line, so I assume 218638 should be the actual failed line; let me know if I'm wrong), but it seems all right. I checked the corresponding table in BigQuery, and it has that line too, which means I actually uploaded this line successfully before.
Then why does it cause a failure now?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that BigQuery processes large files in parallel across multiple workers. Each file worker starts at an offset from the beginning of the file. The offset that we include is the offset that the worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This will print out three lines... the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the next line as well.
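If shell pipes are not convenient, here is a rough Python equivalent of the same inspection (the file name is a placeholder; note that the first line read after seeking may be partial, since the offset need not fall on a line boundary):

offset = 268436098      # byte offset from the error message (1-indexed, like tail -c +N)
line_in_chunk = 218637  # line number within that chunk, from the error message

with open("yourfile.csv", "rb") as f:
    f.seek(offset - 1)  # seek() is 0-indexed
    for i, line in enumerate(f, start=1):
        if line_in_chunk - 1 <= i <= line_in_chunk + 1:
            # Print the line before the error, the error line, and the line after.
            print(line.decode("utf-8", errors="replace").rstrip())
        elif i > line_in_chunk + 1:
            break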
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.
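For reference, maxBadRecords corresponds to max_bad_records in the Python client library. A minimal sketch of such a load, assuming google-cloud-bigquery is installed and using placeholder bucket, dataset, and table names:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # the file has a header line
    max_bad_records=1,     # tolerate one bad row instead of failing the whole job
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/myfile.csv",   # placeholder URI
    "my_dataset.my_table",         # placeholder destination table
    job_config=job_config,
)
load_job.result()  # waits for completion and raises if the job still fails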

Error in BQ shell Loading Datastore with write_disposition as Write append

1. I tried to load into an existing table [using a Datastore file].
2. The bq shell asked me to add a write_disposition of write append to load into the existing table.
3. If I do the above, it throws an error as follows:
load --source_format=DATASTORE_BACKUP --write_disposition=WRITE_append --allow_jagged_rows=None sample_red.t1estchallenge_1 gs://test.appspot.com/bucket/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entity.backup_info
Error parsing command: flag --allow_jagged_rows=None: ('Non-boolean argument to boolean flag',None)
I tried allow_jagged_rows=0 and allow_jagged_rows=None; nothing works, just the same error.
Please advise on this.
UPDATE: As Mosha suggested, --allow_jagged_rows=false has worked. It should come before --write_disposition=WRITE_TRUNCATE. But this has led to another issue with encoding. Can anyone say what the encoding type should be for DATASTORE_BACKUP? I tried both --encoding=UTF-8 and --encoding=ISO-8859.
load --source_format=DATASTORE_BACKUP --allow_jagged_rows=false --write_disposition=WRITE_TRUNCATE sample_red.t1estchallenge_1 gs://test.appspot.com/STAGING/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entityname.backup_info
Please advise.
You should use "false" (or "true") with boolean arguments, i.e.
--allow_jagged_rows=false
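
For comparison, the same load expressed with the Python client library avoids the string-vs-boolean flag question entirely, since booleans are plain Python values. A minimal sketch (assuming google-cloud-bigquery; the GCS URI is a placeholder, and allow_jagged_rows is omitted because it only applies to CSV sources):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/path/to/export.backup_info",  # placeholder backup_info URI
    "sample_red.t1estchallenge_1",
    job_config=job_config,
)
load_job.result()  # waits for completion and raises on error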