Hive: using quote character as delimiter in data files

Can we use a quote character (" or ') as a delimiter in Hive data files? If not, why not?
A list of the characters that can be used as delimiters for Hive data would also be helpful.

With decimal notation, you can use any character in the basic ASCII range (decimal 0-127) as a delimiter (tested).
Avoid using \n or \r.
Using " or ' as the delimiter is straightforward:
create table mytable (i int,j int) row format delimited fields terminated by '"';
create table mytable (i int,j int) row format delimited fields terminated by "'";
or
create table mytable (i int,j int) row format delimited fields terminated by '\'';
create table mytable (i int,j int) row format delimited fields terminated by "\"";
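To see what data files for these tables would look like, here is a minimal Python sketch (not Hive code) that splits one row per table on its quote delimiter. The helper split_row and the sample rows are hypothetical, and the sketch assumes no escaping is configured. For reference, the decimal codes of the two characters are 34 for " and 39 for '.

```python
def split_row(line, delimiter):
    """Split one text row on a single-character delimiter (no escaping)."""
    return line.rstrip("\n").split(delimiter)


# Hypothetical row for the table declared with: fields terminated by '"'
double_quote_row = '1"2\n'
# Hypothetical row for the table declared with: fields terminated by "'"
single_quote_row = "1'2\n"

# chr(34) is the double quote, chr(39) is the single quote.
double_quote_fields = split_row(double_quote_row, chr(34))
single_quote_fields = split_row(single_quote_row, chr(39))

print(double_quote_fields)
print(single_quote_fields)
```

Each row yields two fields, matching the columns i and j in the table definitions.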

Related

Hive - Load delimited data with special character cause off position

Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data.
CREATE table TEST_1 (
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
;
Raw Data:
123|456|Dasani Bottled \| Water|789
I expect the COL3 value to be "Dasani Bottled \| Water". The value contains the sequence "\|", which includes a pipe character. Because the table uses "|" as the delimiter, Hive splits the value at that pipe and the columns are shifted from COL3 onward.
Is there any way to resolve the issue so Hive can load data correctly?
Thanks for any help.
You can add the ESCAPED BY clause to the table definition to enable character escaping:
CREATE table TEST_1 (
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' ESCAPED BY '\\'
;
From the Hive documentation
Enable escaping for the delimiter characters by using the 'ESCAPED BY'
clause (such as ESCAPED BY '\') Escaping is needed if you want to
work with data that can contain these delimiter characters.
A custom NULL format can also be specified using the 'NULL DEFINED AS'
clause (default is '\N').
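The effect of an escape character can be illustrated with a minimal Python sketch that compares a plain split with an escape-aware split on the raw row from the question. The function split_escaped is hypothetical and only approximates the idea. In particular, it assumes the escape character is dropped from the returned value, which may differ from what Hive actually returns.

```python
def split_escaped(line, delimiter="|", escape="\\"):
    """Split on delimiter, treating escape + any character as a literal character."""
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Keep the escaped character and skip the escape itself.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


raw = r"123|456|Dasani Bottled \| Water|789"

naive_fields = raw.split("|")        # splits inside the third value
escaped_fields = split_escaped(raw)  # keeps the third value in one piece

print(len(naive_fields), naive_fields)
print(len(escaped_fields), escaped_fields)
```

The plain split produces five fields, which is the off-position effect described in the question, while the escape-aware split produces the expected four.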

How to create an external Hive table if the field value has comma separated values

I used the sqoop-import command to import data from Teradata into Hive. The command creates a text file with a comma (,) as the delimiter.
After the import, I created an external table as shown below:
CREATE EXTERNAL TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, description String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
But the description column has values like "abc,xyz,mnl", so the data is not loaded into the Hive table correctly. How can I create the text file with a delimiter other than a comma when running Sqoop?
And how should I then delimit the fields when creating the external Hive table?
Use --fields-terminated-by in your Sqoop job if you want to avoid the default delimiter.
--fields-terminated-by sets the field separator character in the output.
Example: --fields-terminated-by '|'
Then change the field separator in the CREATE TABLE statement to FIELDS TERMINATED BY '|'.
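A minimal Python sketch shows why the delimiter choice matters for this data. The two sample rows are hypothetical (only the description value "abc,xyz,mnl" comes from the question), and count_fields is an illustrative helper, not part of Sqoop or Hive.

```python
def count_fields(line, delimiter):
    """Return how many fields a plain split on delimiter produces."""
    return len(line.split(delimiter))


# The same hypothetical employee row written with two different delimiters.
comma_row = "101,Alice,5000,abc,xyz,mnl"
pipe_row = "101|Alice|5000|abc,xyz,mnl"

comma_count = count_fields(comma_row, ",")  # description is broken apart
pipe_fields = pipe_row.split("|")           # description stays intact

print(comma_count)
print(pipe_fields)
```

With a comma delimiter the row splits into six fields for a four-column table, while the pipe-delimited row splits into four fields and keeps the description whole. This only works if the chosen delimiter never appears in the data itself.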

HIVE SQL create statement

CREATE TABLE IF NOT EXISTS user.name_visits(
date1 TIMESTAMP,
MV String,
visits_by_MV int
)
COMMENT ‘visits_at_MV’
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
;
It reports an error near BY.
The query below worked for me:
CREATE TABLE IF NOT EXISTS user.name_visits(
date1 TIMESTAMP,
MV STRING,
visits_by_MV INT
)
COMMENT 'visits_at_MV'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
;
The error you are seeing could be caused by the editor you are using.
Look at your quotation marks: they are LEFT SINGLE QUOTATION MARK (‘) and RIGHT SINGLE QUOTATION MARK (’).
The only change I made was to use an APOSTROPHE (') instead.
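If a statement has already been pasted from an editor that inserts typographic quotes, one way to repair it is to translate those characters back to ASCII before running it. The following is a minimal Python sketch; straighten_quotes and QUOTE_FIXES are hypothetical names, and the sketch blindly replaces every typographic quote, including any that were intended inside comments or data.

```python
# Map typographic quotes to their ASCII equivalents.
QUOTE_FIXES = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(sql):
    """Return sql with typographic quotes replaced by ASCII quotes."""
    return sql.translate(QUOTE_FIXES)


# One line of the failing statement, written with typographic quotes.
broken = "FIELDS TERMINATED BY \u2018\\t\u2019"
fixed = straighten_quotes(broken)

print(fixed)
```

The printed line uses plain apostrophes around \t, as in the working query.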
Try it this way; it should work.
Replace the single quotes with double quotes, as below:
CREATE TABLE IF NOT EXISTS user.name_visits(
date1 TIMESTAMP,
MV String,
visits_by_MV int
)
COMMENT "visits_at_MV"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
;

Postgres Copy - Importing an integer with a comma

I am importing 50 CSV data files into postgres. I have an integer field where sometimes the value is a regular number (comma-delimited) and sometimes it is in quotations and uses a comma for the thousands.
For instance, I need to import both 4 and "4,000".
I'm trying:
COPY race_blocks FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
And get the error:
ERROR: invalid input syntax for integer: "1,133"
How can I do this?
Let's assume you have only one column in your data.
First create a temporary table with a varchar column:
CREATE TEMP TABLE race_blocks_temp (your_integer_field VARCHAR);
Copy your data from the file:
COPY race_blocks_temp FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
Remove every ',' from the varchar field, convert the data to numeric, and insert it into your table:
INSERT INTO race_blocks SELECT regexp_replace(your_integer_field, ',', '', 'g')::numeric AS some_column FROM race_blocks_temp;
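The same cleanup can also be done before the data reaches the database. This is a minimal Python sketch, assuming a hypothetical two-column file (block_id, population) that mixes plain integers with quoted values containing thousands separators. It is an alternative to the staging-table approach, not a replacement for it.

```python
import csv
import io


def parse_int_with_commas(text):
    """Convert '4' or '4,000' to an int by dropping thousands separators."""
    return int(text.replace(",", ""))


# Hypothetical sample in the shape described in the question.
sample = 'block_id,population\n1,4\n2,"4,000"\n3,"1,133"\n'

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # skip the header row, like COPY ... CSV HEADER
rows = [(int(block_id), parse_int_with_commas(population))
        for block_id, population in reader]

print(rows)
```

The csv module handles the quoted fields, so the comma inside "4,000" is not treated as a column separator.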

SQL Loader - strip LF when loading

I have a flat file loaded using Sql Loader.
I need to add a control when loading to strip all LF inside the values of the column MYFIELD2 for instance.
The columns are separated using '|' and I have the following control file:
LOAD DATA TRUNCATE into table MYTABLE fields terminated by '|'
trailing nullcols
(COD,DAT DATE "YYYYMMDDHH24MISS",
DATMOD DATE "YYYYMMDDHH24MISS",MYFIELD1, MYFIELD2)
Is there a way to do that?
This would work (in Oracle the character function is CHR):
SELECT REPLACE(MyColumn, CHR(10), ' ') FROM MyTable;
You may also want to replace CHR(13).
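For comparison, the same replacement can be sketched in Python, for example to clean values before they are written to the flat file. The function strip_line_breaks is hypothetical. Unlike the SQL above, it treats a CR+LF pair as a single break, a choice made here to avoid double spaces.

```python
def strip_line_breaks(value, replacement=" "):
    """Replace CR+LF pairs, lone LF, and lone CR with replacement."""
    return (value.replace("\r\n", replacement)
                 .replace("\n", replacement)
                 .replace("\r", replacement))


# Hypothetical field value containing LF (code 10) and CR (code 13).
dirty = "first line\nsecond line\r\nthird line"
clean = strip_line_breaks(dirty)

print(clean)
```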