Invalid json for Java type io.trino.server.TaskUpdateRequest - amazon-emr

I'm getting the error "Unrecognized token 'io': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')" when trying to run a simple CTAS in my Trino cluster, with no underlying complexity.
I've tried restarting the cluster and increasing the cluster size, but the error persists. Some days the error happens and other days it doesn't. It's debilitating for our ETL work, and I need to figure out what is causing it.
I'm running Trino 359 on EMR 6.4.0, with a Hive catalog backed by Glue.
The error message:
Unexpected response from http://10.193.20.153:8889/v1/task/20211027_220021_01468_gtbqw.3.2?summarize com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'io': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: (byte[])"io.airlift.jaxrs.JsonMapperParsingException: Invalid json for Java type io.trino.server.TaskUpdateRequest
The query:
create table ide.stage_5
with (format = 'ORC')
as (
select distinct i.*
from ide.stage_4 i
);

On further investigation, this turned out to be caused by the distinct: the query succeeds if run without it.
The unsuccessful source table is a Glue (Hive) partitioned ORC table with SNAPPY compression, 500 columns, and 770,087 rows.
The query can be run successfully with distinct on a similar table with fewer columns: a Glue (Hive) partitioned ORC table with SNAPPY compression, 133 columns, and 756,287 rows.
Somewhere between 133 and 500 columns a bug seems to be triggered, and I have submitted a ticket here: https://github.com/trinodb/trino/issues/9808
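In the meantime, a possible interim workaround: the same CTAS runs cleanly once the distinct is removed, so one option is to deduplicate on a key column instead of over all 500 columns. A rough sketch, assuming some column (hypothetically called row_key here) identifies duplicates; note that the helper rn column ends up in the new table:
create table ide.stage_5
with (format = 'ORC')
as (
    select *
    from (
        select i.*,
               row_number() over (partition by i.row_key) as rn
        from ide.stage_4 i
    ) t
    where t.rn = 1
);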

Related

What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0

I'm attempting to upload a CSV file (the output of a BCP command) to BigQuery using the gcloud CLI's bq load command. I have already created a custom schema file (I was having major issues with autodetect).
One resource suggested this could be a data type mismatch. However, the table in the source SQL database lists the column as a decimal, so in my schema file I have listed it as FLOAT, since decimal is not a supported data type.
I couldn't find any documentation for what the error means and what I can do to resolve it.
What does this error mean? In this context, it means a value is REQUIRED for a given column index and one was not found. (By the way, columns are usually 0-indexed, so a fault at column index 8 most likely refers to column number 9.)
This can be caused by a myriad of different issues, of which I experienced two:
1. Incorrectly categorizing NULL columns as NOT NULL. After exporting the schema from SSMS as JSON, I needed to clean it up for BQ, and in doing so I mapped IS_NULLABLE:NO to MODE:NULLABLE and IS_NULLABLE:YES to MODE:REQUIRED. These values should have been reversed. This caused the error because there were NULL values in columns where BQ expected a REQUIRED value.
2. Using the wrong delimiter. The file I was outputting was not only comma-delimited but also tab-delimited. I was only able to confirm this by importing the data with the Get Data tool in Excel, where I could see the stray tabs inside the cells. After outputting with a pipe (|) delimiter instead, I was finally able to load the file into BigQuery without any errors.
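For reference, a sketch of the kind of load command this ends up as (the dataset, table, bucket, and file names here are placeholders, not from the original post), with the corrected schema file mapping IS_NULLABLE:YES columns to "mode": "NULLABLE":
bq load \
  --source_format=CSV \
  --field_delimiter='|' \
  --schema=./schema.json \
  mydataset.mytable \
  gs://mybucket/bcp_export.csv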

Error when trying to process HyperLogLog created on Snowflake, in Trino

In Trino, I'm getting the error message Cannot deserialize HyperLogLog:
I have a query in Snowflake that does the following:
select
    __TENANT_ID,
    hll_accumulate(VISITOR_ID) as visitor_hll
from
    [table]
where
    [stuff]
group by
    1;
The visitor_hll is being written to a column of type BINARY(8388608).
I then have a process that copies this data to S3 as Parquet, where I query it via Trino.
When I try to perform hyperloglog operations on the field, such as
select
    merge(cast(visitor_hll as hyperloglog)) as bsi_hll
from
    [table]
I get the aforementioned error.
What can I do to consume the HLL data created in Snowflake?
I searched for the error message I got, and the only results on Google are the source code for the HLL functions in Airlift.
I also saw that Snowflake says "For integration with external tools, Snowflake supports converting states from the BINARY format to an OBJECT (which can be printed and exported as JSON), and vice versa." (see HLL_EXPORT). This returns a JSON object, but on the S3 side of things, I don't see any way of importing this back into an HLL.
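For what it's worth, the export path mentioned in that doc would look roughly like this on the Snowflake side (a sketch based on the HLL_EXPORT description above; whether Trino can consume the resulting JSON is exactly the open question):
select
    __TENANT_ID,
    hll_export(hll_accumulate(VISITOR_ID)) as visitor_hll_json
from
    [table]
group by
    1;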

Presto failed: com.facebook.presto.spi.type.VarcharType

I created a table with three columns (id, name, position), then stored the data in S3 in ORC format using Spark.
When I query select * from person, it returns everything.
But when I query the table from Presto, I get this error:
Query 20180919_151814_00019_33f5d failed: com.facebook.presto.spi.type.VarcharType
I have found the answer to the problem: when I stored the data in S3, the file contained one more column than was defined for the table in the Hive metastore.
So when Presto tried to query the data, it found a varchar where it expected an integer.
This can also happen if a single record has a type different from what is defined in the metastore.
I had to delete my data and import it again without that extra, unneeded column.
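A quick way to spot this kind of mismatch (a sketch; the hive.default catalog/schema names and the S3 path are assumptions, not from the original post) is to compare what the metastore declares with what Spark actually wrote:
-- What the metastore (and therefore Presto) expects:
describe hive.default.person;
-- What was actually written; run in spark-shell or pyspark:
-- spark.read.orc("s3://my-bucket/path/person/").printSchema()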

Error while reviewing file after inserting data in redshift table

I have a table in Redshift into which I am inserting data from S3.
I viewed the table before inserting the data and it returned an empty table.
However, after inserting the data, I get the error below when running select * from the table.
The command that copies the data into the table from S3 runs successfully without any error.
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation
What could be the possible cause of, and solution for, this error in Redshift?
I have faced this java.lang.NoClassDefFoundError when the JDBC connection properties are set incorrectly.
If you are using the PostgreSQL driver, make sure the connection URL uses the jdbc:postgresql:// prefix,
e.g. jdbc:postgresql://<hostname>:5439/<database>
Let me know if this works.

SAP Vora dealing with decimal type

So I'm trying to create and load a Vora table from an ORC file created by the SAP BW archiving process on HDFS.
The Hive table automatically generated on top of that file by BW has, among other things, this column:
archreqtsn decimal(23,0)
An attempt to create a Vora table using that data type fails with the error "Unsupported type (DecimalType(23,0)}) on column archreqtsn".
So, the biggest decimal supported seems to be decimal(18,0)?
The next thing I tried was to use either decimal(18,0) or string as the type for that column. But when attempting to load data from the file:
APPEND TABLE F002_5_F
OPTIONS (
    files "/sap/bw/hb3/nldata/o_1ebic_1ef002__5/act/archpartid=p20170611052758000009000/000000_0",
    format "orc"
)
I'm getting another error:
com.sap.spark.vora.client.VoraClientException: Could not load table F002_5_F: [Vora [<REDACTED>.com.au:30932.1639407]] sap.hanavora.jdbc.VoraException: HL(9): Runtime error. (decimal 128 unsupported (c++ exception)).
An unsuccessful attempt to load a table might lead to an inconsistent table state. Please drop the table and re-create it if necessary. with error code 0, status ERROR_STATUS
What could be the workarounds for this issue of unsupported decimal types? In fact, I might not need that column in the Vora table at all, but I can't get rid of it in the ORC file.
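One possible workaround (just a sketch, assuming an intermediate copy of the data can be materialized with Hive or Spark SQL; all table and column names apart from archreqtsn are placeholders): rewrite the ORC data with the offending column cast to string, or dropped entirely, and point the Vora APPEND at the rewritten files instead of the original BW output.
create table f002_5_f_stage stored as orc as
select
    cast(archreqtsn as string) as archreqtsn,
    -- list the remaining columns of the source table here
    other_col_1,
    other_col_2
from f002_5_f_source;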