HIVE_BAD_DATA: Error parsing field value '' for field 12: Cannot convert value of type String to a REAL value

Hello all!
The query in the Athena console returns the following error:
HIVE_BAD_DATA: Error parsing field value '' for field 12: Cannot convert value of type String to a REAL value
I'm trying to create a table queryable in Athena with a Glue crawler, where I specify each column's data type. My input is a CSV file in which some fields are empty.
Crawler description
The crawler finds each column and assigns the correct column name, but when reading the values I get a parsing error.
I'm wondering if the problem could come from the crawler trying to read empty values as the wrong type.
The error comes from column data that looks like this:
Corresponding table Schema
I tried, unsuccessfully, to change the serialization as suggested in:
AWS Athena: "HIVE_BAD_DATA: Error parsing column 'X' : empty String"
Specify a SerDe serialization lib with AWS Glue Crawler
Is there a parameter or a workaround to solve this issue?
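Purely as an illustration, not something from the original post: one common workaround is to let the Glue table keep the affected columns as string and do the numeric conversion at query time, so that empty CSV fields become NULL instead of failing the REAL conversion. The table and column names below are hypothetical, and this assumes an Athena engine version that supports TRY_CAST.
-- Athena SQL sketch: NULLIF turns empty strings into NULL, and TRY_CAST
-- returns NULL instead of failing when the text is not a valid number.
SELECT
  TRY_CAST(NULLIF(measurement, '') AS REAL) AS measurement
FROM my_database.my_csv_table;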

Related

What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0

I'm attempting to upload a CSV file (the output of a BCP command) to BigQuery using the gcloud CLI's bq load command. I have already uploaded a custom schema file (I was having major issues with autodetect).
One resource suggested this could be a data type mismatch. However, the table in the SQL DB lists the column as a decimal, so in my schema file I listed it as FLOAT, since decimal is not a supported data type.
I couldn't find any documentation for what the error means and what I can do to resolve it.
What does this error mean? It means, in this context, that a value is REQUIRED for a given column index and one was not found. (By the way, columns are usually 0-indexed, so a fault at column index 8 most likely refers to column number 9.)
This can be caused by a myriad of different issues, of which I experienced two.
Incorrectly categorizing NULL columns as NOT NULL. After exporting the schema, in JSON, from SSMS, I needed to clean it up for BQ, and in doing so I assigned IS_NULLABLE:NO to MODE:NULLABLE and IS_NULLABLE:YES to MODE:REQUIRED. These values should have been reversed. This caused the error because there were NULL columns where BQ expected a REQUIRED value.
Using the wrong delimiter. The file I was outputting was not only comma-delimited but also tab-delimited. I was only able to spot this by using the Get Data tool in Excel and importing the data that way, after which I could see the stray tabs inside the cells.
After outputting with a pipe ( | ) delimiter, I was finally able to successfully load the file into BigQuery without any errors.

AWS Athena - Changing field to type "MAP"

I have an AWS Athena table created by a Glue crawler, where one of the fields is a string representation of a dictionary whose keys and values are strings, e.g.: '{"key1": "value_1", "key2":"value2"}'
I tried to manually change the field type to MAP<string,string> but got an exception, HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas, meaning that the partitions were storing the field as String while in the schema the field had been changed to map.
So I tried to run MSCK REPAIR TABLE, but it just restored the type of the column to String.
Is there a way to change the field type without losing the current partitions?
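As an illustration only (my own assumption, not an answer from the thread): one way to avoid touching the partition schemas at all is to leave the column as string and parse it into a map at query time with the Presto JSON functions available in Athena. The table and column names below are hypothetical.
-- Athena/Presto SQL sketch: parse the JSON string per query
-- instead of changing the stored column type.
SELECT
  CAST(json_parse(my_dict_col) AS MAP(VARCHAR, VARCHAR)) AS my_dict_map
FROM my_database.my_table;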

Presto fails to import PARQUET files from S3

I have a presto table that imports PARQUET files based on partitions from s3 as follows:
create table hive.data.datadump
(
tUnixEpoch varchar,
tDateTime varchar,
temperature varchar,
series varchar,
sno varchar,
date date
)
WITH (
format = 'PARQUET',
partitioned_by = ARRAY['series','sno','date'],
external_location = 's3a://dev/files');
The S3 folder structure where the parquet files are stored looks like:
s3a://dev/files/series=S5/sno=242=/date=2020-1-23
and the partition starts from series.
The original PySpark code that produces the Parquet files declares the whole schema as string type, and I am trying to import it as strings. When I run my create script in Presto, it successfully creates the table but fails to read the data.
On running
select * from hive.data.datadump;
I get the following error:
[Code: 16777224, SQL State: ] Query failed (#20200123_191741_00077_tpmd5): The column tunixepoch is declared as type string, but the Parquet file declares the column as type DOUBLE
Can you guys help to resolve this issue?
Thank You in advance!
I ran into the same issue, and I found out that it was caused by one of the records in my source not having a matching data type for the column it was complaining about. I am sure this is just data. You need to track down the exact record that doesn't have the right type.
This might have been solved already; just for info, this could be due to a column declaration mismatch between Hive and the Parquet file. To match by column names instead of by order, use the property:
hive.parquet.use-column-names=true
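As a separate sketch (my own assumption, not part of either answer): if the Parquet files really do store tUnixEpoch as DOUBLE, another option is to declare the table with the type the file reports and cast back to varchar at query time.
-- Presto sketch: declare the column with the type the Parquet file reports...
create table hive.data.datadump
(
tUnixEpoch double,
tDateTime varchar,
temperature varchar,
series varchar,
sno varchar,
date date
)
WITH (
format = 'PARQUET',
partitioned_by = ARRAY['series','sno','date'],
external_location = 's3a://dev/files');
-- ...and cast at query time if a string is still wanted.
select cast(tUnixEpoch as varchar) as tUnixEpochStr from hive.data.datadump;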

'NULL' value of VARCHAR(30) NOT NULL is treated as null when bq load

I tried to load CSV files into a BigQuery table. There are columns whose type is VARCHAR(30) NOT NULL, and some of the values are 'NULL'. So when I use the bq load command, I get the following error:
Error while reading data, error message: 'NULL' is null for required
I want to treat the 'NULL' value as NOT NULL.
I am wondering what the best solutions are to deal with this.
In these kinds of cases it's better to treat the data before importing it into BigQuery. You can use Cloud Dataprep to handle the NULL values and transform them into empty strings or whatever you see fit. You can follow these steps:
In the Cloud console go to Dataprep
Create a Flow
Add a Dataset (import CSV file)
Create a new Recipe
Under transformation select "Replace"
Select the origin data column
Under Match pattern add the following regex /^$/ (this will match empty strings)
Select the new string value you see fit
After the job is finished you can export the results as CSV and import them into BigQuery.
Note that Dataprep treats NULL values as MISSING, as stated in the documentation.
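A different sketch, purely as an assumption on my part (table and column names are hypothetical): if the goal is simply to keep the literal text in the REQUIRED column, you can load the file into a staging table where the column is NULLABLE and then backfill the placeholder in SQL before inserting into the strict table.
-- BigQuery Standard SQL sketch: replace missing values with the literal
-- string 'NULL' while copying into the table with the REQUIRED column.
INSERT INTO my_dataset.final_table (my_col)
SELECT IFNULL(my_col, 'NULL')
FROM my_dataset.staging_table;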

SQL SERVER, importing from CSV file. Data conversion error

I am trying to import data from a CSV file into a table.
I am presented with this error:
"Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data
conversion for column "dstSupport" returned status value 2 and status
text "The value could not be converted because of a potential loss of
data."."
However, I do not even need to convert this column, as you see in the image below:
The column is of type bit, and I used DST_BOOL.
This ended up being a parsing error. I had a string with commas inside it, and the pieces of that string between the commas were being placed in my bit column. I fixed it by changing the delimiter from a comma to a pipe.
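For illustration only (the original poster fixed this in their import setup rather than in T-SQL): the same pipe-delimiter idea expressed as a BULK INSERT, with a hypothetical table name and file path.
-- T-SQL sketch: load the pipe-delimited export instead of the comma-delimited one.
BULK INSERT dbo.MyTable
FROM 'C:\exports\data_pipe.csv'
WITH (
    FIELDTERMINATOR = '|',  -- pipe instead of comma
    ROWTERMINATOR = '\n',
    FIRSTROW = 2            -- skip the header row
);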