Parse error code 154 - error-handling

I am getting the parse error "The operation could not be completed. Parse error 154."
Please let me know why this is happening and what I can do to resolve it.
Thanks

Parse Error 154 indicates that you have too many search terms in the query. For example, you could be running a containsAll query on an array of strings:
query.containsAll('field', ['val1','val2','val3','val4','val5','val6','val7','val8'])
In my experience, eight values is usually enough to hit Parse's limit, so the query above would fail with error code 154.
The answer is to reduce the complexity of the query, or to restructure the data and/or the query; one option is sketched below.
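As an illustration, here is a rough Python sketch against the Parse REST API that splits the value list into smaller containsAll ($all) queries and intersects the results client-side. The server URL, class name, credentials, field name, and chunk size are all placeholders, and the exact number of terms that triggers error 154 may differ in your setup.

import json
import requests

# Placeholder Parse Server endpoint and credentials -- replace with your own.
PARSE_URL = "https://example.parse-server.com/parse/classes/MyClass"
HEADERS = {
    "X-Parse-Application-Id": "APP_ID",
    "X-Parse-REST-API-Key": "REST_KEY",
}

def contains_all_ids(field, values):
    # One containsAll ($all) query, returning only the matching objectIds.
    where = {field: {"$all": values}}
    resp = requests.get(PARSE_URL, headers=HEADERS,
                        params={"where": json.dumps(where), "keys": "objectId"})
    resp.raise_for_status()
    return {obj["objectId"] for obj in resp.json()["results"]}

values = ["val1", "val2", "val3", "val4", "val5", "val6", "val7", "val8"]
chunk_size = 4  # stay well below the term count that triggers error 154

# An object matches containsAll on the full list exactly when it matches
# containsAll on every chunk, so intersect the chunked results client-side.
matching = None
for i in range(0, len(values), chunk_size):
    ids = contains_all_ids("field", values[i:i + chunk_size])
    matching = ids if matching is None else matching & ids

print(matching)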

Related

dataframe failing to be converted to parquet

I have been trying to wrap my head around this for two days without success, so I'm reaching out to the community.
I have a dataframe with almost a million rows, and when I try to convert it to Parquet I get the following error:
Check failed: (item_builder_->length()) == (key_builder_->length()) keys and items builders don't have the same size in MapBuilder
Looking at the error, I thought that some column whose values are maps was causing it.
So I looked at the possible culprit column; in our case, let's call it geos.
In the schema,
geos: map<string, string>
I get the error
Check failed: (item_builder_->length()) == (key_builder_->length()) keys and items builders don't have the same size in MapBuilder
when converting the whole dataframe.
I thought that maybe one of the values of geos is not valid, so to find that row I tried to convert each row to Parquet individually. Surprisingly, every single row converted successfully.
Now I think the error is caused by some column size limit in fastparquet/pyarrow.
I am using the versions
pyarrow==9.0.0
pandas==1.2.5
Has anyone experienced this before who can validate my reasoning?
Any help with reasoning about the error would be appreciated; the chunked test I'm running to narrow it down is sketched below.
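For reference, a minimal sketch of that chunked test (assuming the dataframe is named df, as above; the chunk size and output paths are arbitrary):

import pyarrow as pa
import pyarrow.parquet as pq

# Chunked version of the per-row test described above.
# The range is printed *before* converting, so even if the process dies with
# the "Check failed" message instead of raising a Python exception, the last
# printed range identifies the offending slice.
chunk_size = 50_000  # arbitrary; shrink it to narrow things down further
for start in range(0, len(df), chunk_size):
    stop = min(start + chunk_size, len(df))
    print(f"converting rows {start}:{stop}", flush=True)
    table = pa.Table.from_pandas(df.iloc[start:stop], preserve_index=False)
    pq.write_table(table, f"/tmp/chunk_{start}_{stop}.parquet")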

Debugging what row is causing "Resources exceeded" error

How does one debug "Resources exceeded" error messages from BigQuery that are nondescript and do not point to any specific rows as the cause?
e.g.
bigquery error: Resources exceeded during query execution: ST_GeogFromGeoJSON failed: resources exhausted. Try simplifying the geometry.
I would love to simplify/fix/remove the input to this function that is causing the issue; however, BigQuery doesn't give me ANY idea how to debug this other than continuously re-querying my dataset with different filters until I've eliminated the problematic row.
Any insights into how to debug these in the future would be much appreciated!
Or, if any BQ support engineers are reading: it would help if the SAFE version of ST_GeogFromGeoJSON actually handled resource-exceeded conditions safely rather than throwing this error message.
I reported the issue through here as well.
Note: I also have this issue when trying to debug which rows are causing resources to be exceeded when using JS UDFs.
Edit:
An example query:
SELECT * EXCEPT(decoded_line),
ST_SIMPLIFY(SAFE.ST_GEOGFROMGEOJSON(decoded_line), 30) AS simple_line
FROM polylines.decoded_lines
I'll try to find an example of a polyline that generates such an error, though, as I said above, it's been difficult to identify the exact one. I do know this query works when down-sampling my dataset to 95% but excluding key timeframes.
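For what it's worth, the filter-and-requery loop I've been doing by hand can be automated as a bisection from a client script. This is only a sketch: line_id stands in for whatever orderable key the table actually has, and the key range below is a placeholder.

from google.cloud import bigquery

client = bigquery.Client()

# Run the failing expression over a key range and report whether it errors.
# "line_id" is a stand-in for whatever orderable key column the table has.
QUERY = """
SELECT ST_SIMPLIFY(SAFE.ST_GEOGFROMGEOJSON(decoded_line), 30)
FROM polylines.decoded_lines
WHERE line_id BETWEEN @lo AND @hi
"""

def range_fails(lo, hi):
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("lo", "INT64", lo),
        bigquery.ScalarQueryParameter("hi", "INT64", hi),
    ])
    try:
        client.query(QUERY, job_config=job_config).result()
        return False
    except Exception:  # the "Resources exceeded" error surfaces here
        return True

# Bisect a failing key range down to a single line_id.
# (If several rows are problematic, this finds one of them.)
lo, hi = 0, 10_000_000  # placeholder: a key range known to fail
while lo < hi:
    mid = (lo + hi) // 2
    if range_fails(lo, mid):
        hi = mid
    else:
        lo = mid + 1
print("first failing line_id is around", lo)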

Pentaho "Return value id can't be found in the input row"

I have a Pentaho transformation which reads a text file and checks some conditions (which can produce errors, for example that a number should be positive). From these errors I create an Excel file, and in my job I need the number of lines in this error file, plus a log of which lines had problems.
The problem is that sometimes I get the error "The return value id can't be found in the input row".
This error does not happen every time. The job runs every night, and sometimes it works without any problems for a month, then one sunny day I just get this error.
I don't think it is caused by the file, because if I execute the job again with the same file it works. I can't understand why it fails, because it complains about the value "id", but I don't have such a value/column. Why is it searching for a value that doesn't exist?
Another strange thing is that the step which fails should normally not be executed at all (as far as I know), because no errors were found, so no rows reach that step.
Maybe the problem is connected with the "Prioritize Stream" step? That is where I gather all the errors (which use exactly the same columns). I tried putting a sort before the grouping steps, but it didn't help. Now I'm thinking of trying a "Blocking step".
The problem is that I don't know why this happens or how to fix it. Any suggestions?
Check if all your aggregates in the Group By step have a name.
However, sometimes the error comes from a previous step: the Group By (count...) requests data from the Prioritize Stream, and if that step has an error, the error gets reported mistakenly as coming from the Group By rather than from the Prioritize Stream.
Also, you mention a step which should not be executed because there is no data: I do not see any Filter which would prevent rows with a missing id from flowing from the Prioritize Stream to the count.
This is a bug. It happens randomly in one of my transformations that often ends up with an empty stream (no rows). It mostly works, but once in a while it gives this error. It seems to only fail when the stream is empty, though.

ARRAY_AGG leading to OOM

I'm trying to run a pretty simple query, but it's failing with a "Resources exceeded" error.
I read in another post that the heuristic used to allocate the number of mixers can fail from time to time.
SELECT
response.auctionId,
response.scenarioId,
ARRAY_AGG(response) AS responses
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
Is there a way to fix my query knowing that:
a response is composed of 38 fields (most of them being short strings)
the max(count()) of responses per group is quite low (165)
Query Failed
Error: Resources exceeded during query execution.
Job ID: teads-1307:bquijob_257ce97b_1566a6a3f27
It's a current limitation that arrays (produced by ARRAY_AGG or other means) must fit in the memory of a single machine. We've made a couple of recent improvements that should help to reduce the resources required for queries such as this, however. To confirm whether this is the issue, you could try a query such as:
SELECT
SUM(LENGTH(FORMAT("%t", response))) AS total_response_size
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
ORDER BY total_response_size DESC LIMIT 1;
This formats the structs as strings as a rough heuristic of how much memory they would take to represent. If the result is very large, then perhaps we can restructure the query to use less memory. If the result is not very large, then some other issue is at play, and we'll look into getting it fixed :) Thanks!
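As an illustration of the kind of restructuring meant above, one option is to aggregate only the fields that are actually needed instead of the whole response struct, which shrinks each per-group array. A sketch from the Python client, where field_a and field_b are hypothetical stand-ins for whichever of the 38 fields are really needed downstream:

from google.cloud import bigquery

client = bigquery.Client()

# Aggregating a smaller struct keeps each per-group array smaller, which is
# what has to fit in a single machine's memory.
# field_a / field_b are placeholders for the fields actually needed.
sql = """
SELECT
  response.auctionId,
  response.scenarioId,
  ARRAY_AGG(STRUCT(response.field_a AS field_a,
                   response.field_b AS field_b)) AS responses
FROM `rtb_response_logs.2016080515`
GROUP BY response.auctionId, response.scenarioId
"""
rows = client.query(sql).result()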

SQL Server Analysis Services - Datamining Error 5

I'm getting this error when I try to process a data mining structure with a nested table in it.
Error 5 Errors in the metadata manager. The 'XYZZZZZ' dimension in the 'Dim XYZ' measure group has either zero or multiple granularity attributes.
Must have exactly one attribute. 0 0
Any idea why this is happening?
Can you post your mining structure's code?
I think you have to create it with the MISSING_VALUE_SUBSTITUTION parameter to get rid of zero granularities. It always solves my problem when I have a time series with a gap in it.