I created an Amazon Athena table backed by a CSV file in S3. I want to insert a row in which one column contains commas, but the string is truncated at the first comma every time.
For example:
insert into table_names (id, key_string) values (1, '{key1=1,key2=3}')
Each time, the key_string column stores only {key1=1.
I tried using double quotes, "{key1=1,key2=3}", and escaped quotes, \"{key1=1,key2=3}\".
Neither works.
Any suggestions?
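Here is a minimal Python sketch of what is probably happening. It assumes the table uses the default delimited text format (FIELDS TERMINATED BY ','), which splits every line on every comma and ignores fields beyond the declared columns. The helper naive_split is hypothetical and only imitates that behaviour. Python's csv module stands in for a quote-aware reader.

```python
import csv
import io

# The text line assumed to be written to S3 for the inserted row.
line = "1,{key1=1,key2=3}"

def naive_split(line, ncols=2, delim=","):
    """Hypothetical stand-in for a plain delimited reader: split on every
    delimiter and keep only as many fields as the table declares."""
    return line.split(delim)[:ncols]

print(naive_split(line))  # key_string ends up as '{key1=1'

# A quote-aware writer quotes the field because it contains the delimiter,
# and a quote-aware reader keeps the embedded commas intact.
buf = io.StringIO()
csv.writer(buf).writerow(["1", "{key1=1,key2=3}"])
quoted_line = buf.getvalue().strip()
print(next(csv.reader([quoted_line])))

# Quoting alone does not help if the reader is not quote-aware.
print(naive_split(quoted_line))
```

In Athena/Hive terms, the quote-aware behaviour corresponds to a CSV SerDe such as OpenCSVSerde (org.apache.hadoop.hive.serde2.OpenCSVSerde). Another option is a field delimiter that never occurs in the data.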
Related
I am trying to load a table into a database, and one column with string values is loaded with a quote in front of some of the values.
Example:
ac_name
"PepsiCo
"Coke
"DietCoke
whereas it should be loaded exactly as it appears in the raw CSV file:
ac_name
PepsiCo
Coke
DietCoke
Why is Hive inserting quotes in front of these values? How do I remove them while loading the table?
I am facing a strange issue. I tried a tab delimiter, in both the file and the table definition, and a comma as well.
In both cases the decimal values are read as NULL, but when I define these fields as INT it works fine.
Sample data with comma-delimited values:
1,22.334
2,445.322
3,999.233
I defined the table as:
create table x(ID INT, SAL DECIMAL(3,3)) row format delimited fields terminated by '\t' location '/tmp/data/'
and similarly, for the comma-delimited file:
create table x(ID INT, SAL DECIMAL(3,3)) row format delimited fields terminated by ',' location '/tmp/data/'
In both cases the decimal values are read as NULL. What is the issue?
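First, the DECIMAL data type does not accept commas in the data.
Second, for the sample data provided you have to increase DECIMAL(3,3) to at least DECIMAL(6,3).
DECIMAL(3,3) uses all 3 digits of precision for the fraction and leaves none before the decimal point, so it cannot hold any of the 3 sample values, each of which has 2 or 3 integer digits.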
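Here is a small Python sketch of the precision arithmetic. DECIMAL(p, s) leaves p - s digits before the decimal point. The helper names are hypothetical. The sketch checks only the integer part, which is the part that makes a value overflow and is consistent with the NULLs in the question.

```python
from decimal import Decimal

def integer_digits(value: str) -> int:
    """Digits to the left of the decimal point, ignoring the sign."""
    d = Decimal(value)
    if d.is_zero():
        return 0
    # adjusted() is the exponent of the most significant digit.
    return max(0, d.adjusted() + 1)

def fits(value: str, precision: int, scale: int) -> bool:
    """True if the integer part fits in DECIMAL(precision, scale)."""
    return integer_digits(value) <= precision - scale

def min_precision(values, scale: int) -> int:
    """Smallest precision whose integer part holds every value."""
    return max(integer_digits(v) for v in values) + scale

samples = ["22.334", "445.322", "999.233"]
print([fits(v, 3, 3) for v in samples])
print(min_precision(samples, 3))
print([fits(v, 6, 3) for v in samples])
```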
If your raw data contains commas inside the values (for example, thousands separators), you have to load it into a table with all columns as STRING first.
Then use a regular expression (for example, regexp_replace) to remove the commas and load the result into a second-level Hive table with DECIMAL columns.
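Here is a minimal Python sketch of the two-stage approach. It assumes a tab-delimited file, so the commas inside the numbers are not mistaken for field separators. The lists staging and final are hypothetical stand-ins for the two Hive tables, and re.sub plays the role of Hive's regexp_replace.

```python
import re
from decimal import Decimal

# Hypothetical tab-delimited raw lines; the salary values use ',' as a
# thousands separator, which a DECIMAL column cannot parse directly.
raw_lines = ["1\t1,022.334", "2\t445.322", "3\t2,999.233"]

# Stage 1 ("staging table"): keep every column as a plain string.
staging = [line.split("\t") for line in raw_lines]

# Stage 2 ("second-level table"): strip the commas with a regex, as
# regexp_replace(sal, ',', '') would in Hive, then convert to Decimal.
final = [(int(id_), Decimal(re.sub(",", "", sal))) for id_, sal in staging]

for row in final:
    print(row)
```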
One field of my table is made up of many values separated by commas;
for example, one record of this field is:
598423,4803510,599121,98181856,1666529,106317962,4061964,7828860,598752,728067,599809,8799578,1666528,3253720,601990,601235
I want to spread the values in every record of this field in Hive.
Which function or method can I use to do this?
Thanks.
I'm not entirely sure what you mean by "spread".
If you want an output table with one value per row, like:
598423
4803510
599121
Then you could use explode(split(data, ',')).
Otherwise, if each input row has exactly 16 numbers and you want each of the numbers to reside in a different column, you have two options:
Define the comma as the field delimiter for the input table: ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Split a single column into 16 columns using the split UDF: SELECT split(data, ',')[0] as col1, split(data, ',')[1] as col2, ...
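Here is a Python sketch that mirrors the two output shapes on the sample record. It only illustrates the layout, and the names exploded_rows and wide_row are hypothetical. In Hive, explode is commonly combined with LATERAL VIEW when other columns must be kept alongside the exploded values.

```python
# The sample record from the question (16 comma-separated values).
record = ("598423,4803510,599121,98181856,1666529,106317962,4061964,7828860,"
          "598752,728067,599809,8799578,1666528,3253720,601990,601235")

values = record.split(",")  # like split(data, ',')

# explode(split(data, ',')): one output row per value.
exploded_rows = [(v,) for v in values]

# split(data, ',')[i] AS col{i+1}: one output column per value.
wide_row = {f"col{i + 1}": v for i, v in enumerate(values)}

print(len(exploded_rows), exploded_rows[:3])
print(wide_row["col1"], wide_row["col16"])
```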
I am importing data into Redshift using the SQL COPY statement. The numeric fields contain commas as thousands separators, which the COPY statement rejects.
The COPY statement has a number of options to specify field separators, date and time formats and NULL values. However I do not see anything to specify number formatting.
Do I need to preprocess the data before loading, or is there a way to get Redshift to parse the numbers correctly?
Import the columns as TEXT into a temporary table.
Insert from the temporary table into your target table. In the SELECT statement for the INSERT, replace commas with empty strings and cast the values to the correct numeric type.
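The two steps can be sketched with Python's built-in sqlite3 module as a stand-in for Redshift. This assumes SQLite's REPLACE and CAST behave like their Redshift counterparts for this simple case. The COPY step itself is not modelled, and the table names staging and target are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: temporary table with every column as TEXT, so "1,234" loads as-is.
conn.execute("CREATE TEMP TABLE staging (id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("1", "1,234"), ("2", "12,345,678"), ("3", "99")],
)

# Step 2: strip the thousands separators and cast into the typed table.
conn.execute("CREATE TABLE target (id INTEGER, amount INTEGER)")
conn.execute(
    "INSERT INTO target (id, amount) "
    "SELECT CAST(id AS INTEGER), CAST(REPLACE(amount, ',', '') AS INTEGER) "
    "FROM staging"
)

rows = conn.execute("SELECT id, amount FROM target ORDER BY id").fetchall()
print(rows)
conn.close()
```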
I am importing a CSV file into a Hive table.
About the CSV file: column values are enclosed in double quotes and separated by commas.
Sample records from the CSV:
"4","good"
"3","not bad"
"1","very worst"
I created a Hive table with the following statement:
create external table currys(review_rating string, review_comment string) row format delimited fields terminated by ',';
The table was created.
Then I loaded the data using LOAD DATA LOCAL INPATH, and it was successful.
When I query the table:
select * from currys;
The result is:
"4" "good"
"3" "not bad"
"1" "very worst"
instead of
4 good
3 not bad
1 very worst
The records are stored with double quotes, which they shouldn't be.
Please let me know how to get rid of these double quotes. Any help or guidance is highly appreciated.
Thanks in advance!
Are you using a SerDe? If so, you can write a regular expression in the SERDEPROPERTIES to strip the quotes.
Or you can use the csv-serde from here and define the quote character.
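Here is a Python sketch of the difference, using the three sample lines. A plain comma split keeps the quotes. Two approaches yield the bare values instead:

- A regular expression whose capture groups exclude the quotes. This is the idea behind Hive's RegexSerDe and its input.regex property.
- A quote-aware CSV parser with '"' as the quote character. This is the idea behind a CSV SerDe.

The regex is only an illustration and assumes every line has exactly two quoted fields with no embedded quotes.

```python
import csv
import re

lines = ['"4","good"', '"3","not bad"', '"1","very worst"']

# Plain comma-delimited table: split on ',' and the quotes stay in the values.
plain = [line.split(",") for line in lines]

# RegexSerDe-style: one capture group per column, quotes outside the groups.
# Assumes each line matches; match() would return None otherwise.
pattern = re.compile(r'^"([^"]*)","([^"]*)"$')
via_regex = [pattern.match(line).groups() for line in lines]

# CSV-SerDe-style: a quote-aware parser with '"' as the quote character.
via_csv = list(csv.reader(lines, delimiter=",", quotechar='"'))

print(plain[0], via_regex[0], via_csv[0])
```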