How to create a Hive table with a line separator other than \n

My data fields contain embedded newline characters, and these newlines need to be preserved in the data. Is it possible to create a Hive table with embedded newlines?

Add LINES TERMINATED BY '\n' to your CREATE TABLE command and replace \n with a character of your choice.
But really, this was the first result on Google...
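As a sketch of the "replace the newline" idea in Python (the placeholder character '\x01' is an arbitrary choice for illustration, not something Hive requires), embedded newlines can be rewritten before loading so each record stays on one line:

```python
# Replace embedded newlines inside fields with a placeholder character
# so that each record occupies exactly one physical line in the file.
# '\x01' (Ctrl-A) is an assumed placeholder; pick anything absent from the data.
def escape_newlines(rows, placeholder="\x01"):
    return [[field.replace("\n", placeholder) for field in row] for row in rows]

rows = [["line one\nline two", "42"]]
cleaned = escape_newlines(rows)
# cleaned[0][0] == "line one\x01line two"
```

A matching step at read time (or in the Hive query) would translate the placeholder back to a newline.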

Related

How to Copy data from s3 to Redshift with "," in the field values

I am faced with an "Extra column(s) found" error while reading the data from S3 into Redshift.
Since my data has 863830 rows and 21 columns, I'll give you a small example of how the data looks.
create table test_table(
name varchar(500),
age varchar(500)
)
and my data would be
(ABC,12)
(First,Last,25)
where First,Last should go into a single column.
Unfortunately, I am unable to do that with this COPY command:
COPY test_table from 'path'
iam_role 'credentials'
region 'us-east-1'
IGNOREHEADER 1
delimiter as ','
Is there any way to accommodate commas in a field?
Is it a CSV file that you're trying to load? If so, try loading with the CSV format parameter specified in the command, rather than the delimiter ',' parameter. Here's an example -
COPY test_table from 'path'
iam_role 'credentials'
region 'us-east-1'
IGNOREHEADER 1
CSV;
If that doesn't help, you may have to use the ESCAPE parameter. This would need modifications in your file too. Here's an example - https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#r_COPY_command_examples-copy-data-with-the-escape-option
Your data doesn't conform to the CSV specification. See RFC 4180.
To store your example data, the field with the comma in it needs to be enclosed in double quotes:
ABC,12
"First,Last",25
The parentheses in the data file will also need to be removed as these will be interpreted as part of the data fields.
Alternatively you could change the delimiter of your data from "," to something else like "%". However if this character is in your data then you are right back where you started. Ad hoc delimited files only work if you use a character that will never be in your data. This is why I recommend that you use the more robust CSV specification and use the "CSV" option to COPY.
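To illustrate the RFC 4180 quoting the answer describes, Python's csv module with minimal quoting produces exactly this form (the lineterminator setting is just for display; csv.writer defaults to \r\n):

```python
import csv
import io

# Write the example rows with RFC 4180-style minimal quoting:
# only fields that contain the delimiter get wrapped in double quotes.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["ABC", "12"])
writer.writerow(["First,Last", "25"])
print(buf.getvalue())
# ABC,12
# "First,Last",25
```

A file written this way loads cleanly with Redshift's COPY ... CSV option.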

Use multi-character delimiter in Amazon Redshift COPY command

I am trying to load a data file which has a multi-character delimiter ('|~|') into an Amazon Redshift DB using the COPY command. The Redshift COPY command does not allow multi-character delimiters.
My data looks like this -
John|~|23|~|Los Angeles|~|USA
Jade|~|27|~|New York|~|USA
When I try to use multi-characters in the COPY command I get "COPY delimiter must be a single character;" error.
My COPY command looks like this -
copy test_data from 's3://abcd/testFile'
credentials 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
delimiter '|~|'
null as '\0'
acceptinvchars
ignoreheader as 1
MAXERROR 1;
I cannot replace or edit the source files since they are very large (>100 GB), so I need a solution within the AWS Redshift paradigm.
If you can't edit the source files, and you can't use a multi-character delimiter, then use | as the delimiter and add additional (fake) columns that will be loaded with ~.
You can then either ignore these columns, or use CREATE TABLE AS to copy the data to a new table but without those columns.
Or, use CREATE VIEW to make a version of that table without the fake columns.
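A quick Python sketch of what Redshift would see with this trick: splitting on the single character '|' turns every '~' into its own (fake) column, with the real data in the even-indexed positions.

```python
# One record from the question, split on '|' as Redshift's COPY would:
line = "John|~|23|~|Los Angeles|~|USA"
cols = line.split("|")
# cols == ['John', '~', '23', '~', 'Los Angeles', '~', 'USA']

# The even-indexed columns carry the actual data; the odd-indexed
# columns are the fake '~' columns to drop afterwards.
real = cols[::2]
# real == ['John', '23', 'Los Angeles', 'USA']
```

In Redshift terms, `real` corresponds to the columns you would keep in the CREATE TABLE AS or CREATE VIEW step.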

Load data to hive table from file with different delimiter

I want to load data into a Hive table created with field delimiter ','. But my load-ready file is '|' delimited. How can I specify the delimiter used in the file in the LOAD DATA syntax?
There are two options to manage multiple delimiters:
MultiDelimitSerDe
RegexSerDe
With MultiDelimitSerDe you can define your delimiter as
WITH SERDEPROPERTIES ("field.delim"="[,\\|]")
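As an illustration of what a character-class pattern like `[,\\|]` matches (a Python sketch of the splitting behavior, not Hive itself), a regex split accepts either delimiter in the same line:

```python
import re

# The character class [,|] matches a comma OR a pipe, so one pattern
# handles both delimiters; this mirrors the "[,\\|]" pattern above.
line = "a,b|c"
fields = re.split(r"[,|]", line)
# fields == ['a', 'b', 'c']
```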

Configuring delimiter for Hive MR Jobs

Is there any way to configure the delimiter for Hive MR jobs?
The default delimiter used by Hive internally is the "hive delimiter" ('\001'). My use case is to configure the delimiter so that I can use any delimiter as required. In Hadoop there is a property "mapred.textoutputformat.separator" which sets the key-value separator to the value specified for that property. Is there any such way to configure the delimiter in Hive? I searched a lot but didn't find any useful links. Please help me.
As of hive-0.11.0, you can write
INSERT OVERWRITE LOCAL DIRECTORY '...'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT ...
See HIVE-3682 for the complete syntax.
You can try this:
INSERT OVERWRITE LOCAL DIRECTORY '...'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'YourChar' (for example: FIELDS TERMINATED BY '\t')
SELECT (rest of your query)
You can also use this:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'='-','serialization.format'='-')
This will separate columns using the '-' delimiter, but it is specific to LazySimpleSerDe.
I guess you are using the INSERT OVERWRITE DIRECTORY option to write to an HDFS file.
If you create a Hive table on top of the HDFS file with no delimiter specified, it will take '\001' as the delimiter, so you can read the file through a Hive table without any issues.
If your source table does not specify the delimiter in its CREATE schema statement, then you won't be able to change it; your output will always contain the default. And yes, the delimiter is controlled by the CREATE schema of the source table, so that isn't configurable either.
I had a similar issue and ended up replacing '\001' as a second step after the Hive MR job finished.
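As a sketch of that second, post-processing step (Python; the replacement delimiter ',' is an assumption), Hive's default '\001' (Ctrl-A) separator can simply be rewritten line by line:

```python
# Hive's default field delimiter is '\001' (Ctrl-A, i.e. '\x01' in Python).
# A post-processing pass can rewrite it to whatever delimiter you need.
def rewrite_delimiter(line, new_delim=","):
    return line.replace("\x01", new_delim)

rewrite_delimiter("John\x0123\x01USA")
# -> 'John,23,USA'
```

This assumes the chosen replacement delimiter does not already occur inside any field.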

bcp and backspace (^H) delimiter

I need to parse a flat file containing a backspace (^H) character as the delimiter between fields, and insert the data into SQL Server 2005 tables. I tried to use the bcp utility along with a format file, but I wasn't able to specify the delimiter as backspace.
The default delimiter is tab (\t). There are several other delimiters as well, but none for backspace. If anyone has any ideas, please do help me.
Also, I need to export data from a SQL Server table to a fixed-length flat file. I tried to use a non-XML format file, but it always asks for a delimiter. How can I create a flat file using bcp without any delimiter between the fields?
All of the above are character files.
This is an ugly workaround, but you could always find a character that's not in the flat file, replace every backspace delimiter in the file with it, and then use that character as the column terminator (via bcp -t).
Sorry that I'm almost 11 years late on this; hopefully you've already solved your problem. You can use the hexadecimal representation of the backspace character, 0x08, to parse your input file and properly delimit the fields that are separated by a backspace character.
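To show that 0x08 really is an ordinary, splittable character despite having no printable escape in most delimiter lists, here is a small Python sketch of parsing a backspace-delimited record:

```python
# Backspace (^H) is code point 0x08; as a delimiter it can be split on
# directly once you address it by its hex value rather than a visible escape.
record = "field1\x08field2\x08field3"
fields = record.split("\x08")
# fields == ['field1', 'field2', 'field3']
```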