Inserting Hive table content with special characters (tab and newline)

I am trying to select rows containing special characters, specifically tab and newline, from Hive tables by filtering in the where clause.
I have tried like '%\\n%', like '%\\t%', like '%hex(9)%', etc., but none of them seem to work.
I also tried creating a dummy table and inserting such data, but that does not work either. Please help.

Use rlike '\\t' for tabs and rlike '\\n' for newlines (use double backslash):
hive> select 'a\tb' rlike '\\t'; --tabs
OK
true
Time taken: 0.075 seconds, Fetched: 1 row(s)
And for newlines:
hive> select 'a\nb' rlike '\\n'; --newline
OK
true
Time taken: 0.454 seconds, Fetched: 1 row(s)
Example of inserting values with newline and tab:
create table test_special_chars as
select 'a\nb' as a union all select 'a\tb';
Newlines are tricky. A Hive table is stored as a text file by default, and a newline character in a text file is interpreted as the end of a record. This is why the select returns one extra row:
select * from test_special_chars;
OK
a
b
a b
In fact, inserting \n created an extra line in the text file. That is what happened.
But if you create ORC table:
create table test_special_chars stored as ORC as select 'a\nb' as a union all select 'a\tb';
It works fine, because ORC is not text format and can store newlines:
select count(*) from test_special_chars where a rlike '\\n';
Returns:
OK
1
Time taken: 40.564 seconds, Fetched: 1 row(s)
When you run select a from test_special_chars where a rlike '\\n', the value is still displayed on screen as two lines, because the newline is interpreted on output. The difference is that ORC can store a newline inside a value without creating an additional row in the file. This is why rlike '\\n' works with ORC but returns no rows with a text file: after the insert, \n splits the value into two separate lines in the text file, whereas in ORC it does not.
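The text-file behaviour described above can be imitated outside Hive. The following is only a rough Python sketch: the in-memory string stands in for a newline-delimited text table, and none of it is Hive code.

```python
# Hypothetical illustration: emulate a newline-delimited text table in memory.
rows = ["a\nb", "a\tb"]  # the two values inserted in the example above

# A text-format table writes one record per line, so the embedded "\n"
# becomes indistinguishable from a record terminator.
text_file = "".join(value + "\n" for value in rows)

# Reading the file back splits on every newline, including the embedded one.
rows_read_back = text_file.splitlines()
print(rows_read_back)       # three records instead of two
print(len(rows_read_back))

# No record read back still contains a newline, so a newline filter
# (the analogue of rlike '\\n') has nothing left to match.
matches = [r for r in rows_read_back if "\n" in r]
print(matches)
```

The tab survives the round trip because it is not a record terminator; only the newline is lost.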
And this is how to replace newlines with something else:
select regexp_replace(a,'\\n',' newline ') from test_special_chars where a rlike '\\n';
Result:
OK
a newline b
Time taken: 1.502 seconds, Fetched: 1 row(s)

Related

SQLDeveloper removes spaces from a blankline when inserting or updating

When I execute the script (without the comments, and with each "." replaced by a space):
SET SQLBLANKLINES ON
UPDATE ANY_TABLE SET VARCHAR2_COLUMN='Hello
..... -- 1st empty line with trailing spaces
Kitty
...' -- last empty line with trailing spaces
WHERE ID='something';
it is saved in the database like:
'Hello
-- 1st empty line WITHOUT spaces
Kitty
...' -- last empty line WITH trailing spaces
So the spaces in the 1st blank line are lost, but not those in the last one, even though I am using "SET SQLBLANKLINES ON".
Can anybody explain what I am doing wrong, or what misconception I have?
The problem is the same with an insert and is independent of using simple spaces or tabs.
In fact, thanks to Alex Pool, we can simplify the example:
Why are the next queries returning different value lengths?
select length('Hello
...... --6 characters (spaces)
Kitty') from dual;
-- returns 12
select length('Hello
x..... -- 6 characters (an 'x' + 5 spaces)
Kitty') from dual;
-- returns 18
select length('Hello
'||'......
Kitty') from dual;
-- returns 18
What I am using:
SQLDeveloper 20.2.0.175
Oracle database 19
The problem may be related to this other one
The q'[...]' alternative quoting syntax is your friend.
create table multiline_strings
(id integer,
words varchar2(256));
insert into multiline_strings (id, words) values
(
1,
q'[Hello
Kitty
]');
See this question/answer - probably should close this as a duplicate.

Decimal input get rounded in create external table in Hive from CSV

I am creating a table in Hive from a CSV (comma separated) that I have in HDFS. I have three columns: two strings and a decimal one (with at most 18 digits after the decimal point and one before). This is what I do:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table(
col1 STRING, col2 STRING, col_decimal DECIMAL(19,18))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';
My col_decimal is being rounded to 0 when below 1 and to 1 when it's not actually decimal (I only have numbers between 0 and 1).
Any idea of what is wrong? By the way, I have also tried DECIMAL(38,37)
Thank you.

Break a row of words into word groups in Hive

I have some text that I would like to break down into two, three, or even four words at a time. I'm trying to pull meaningful phrases.
I have used split and explode to retrieve what I need, but I would like to have the row broken into two or three words at a time. This is what I have so far, which only breaks the row into one word at a time.
select explode(a.text) text
from (select split(text," ") text
from abc
where id = 123
and date = '2019-08-16'
) a
The Output I get:
text
----
thank
you
for
calling
your
tv
is
not
working
?
I would like an output like this:
text
----
Thank you
for calling
your tv
is not
working?
or something like this:
text
----
thank you for calling
your
tv is not working
?
CREATE TABLE IF NOT EXISTS db.test_string
(
text string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS orc
;
INSERT INTO TABLE db.test_string VALUES
('thank you for calling your tv is not working ?');
Below is the query:
select k,s from db.test_string
lateral view posexplode(split(text,' ')) pe as i,s
lateral view posexplode(split(text,' ')) ne as j,k
where ne.j=pe.i-1
and ne.j%2==0
;
thank you
for calling
your tv
is not
working ?
Time taken: 0.248 seconds, Fetched: 5 row(s)
Apply the above logic to your actual table, adding your where clause, and let me know how it goes.
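For the three- or four-word groups the question also asks about, the same idea (pair each word with its position, then group by position) can be sketched outside Hive. This is a minimal Python illustration, not Hive code; the helper name word_groups and its size parameter are made up for the example, and it splits on any whitespace rather than on a single space as split(text,' ') does.

```python
def word_groups(text, size=2):
    """Split text on whitespace and join consecutive words in groups of `size`."""
    words = text.split()
    # Step through the word list `size` words at a time; the last group
    # may be shorter when the word count is not a multiple of `size`.
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


sentence = "thank you for calling your tv is not working ?"

for group in word_groups(sentence, size=2):
    print(group)

for group in word_groups(sentence, size=3):
    print(group)
```

With size=2 this yields the same five pairs as the query above.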

How to convert currency to a number?

I am loading a file that has an amount column containing values like 123,56€.
When I loaded it into a Hive table, the euro symbol was replaced by a square box,
and the comma is the decimal separator.
Now I want a regex that converts this value into 123.56, i.e. replaces the comma with a dot and removes the euro symbol.
Try this:
regexp_extract(regexp_replace('123,56€',',','.' ),'([0-9.]+)', 1)
This will give 123.56
hive> select translate('123,56€',',€','.');
OK
123.56
And if you have unknown currency symbols
hive> select translate('123,56€',translate('123,56€','1234567890',''),'.');
OK
123.56
hive> select regexp_replace('123,56€','(\\d+),(\\d+).','$1.$2');
OK
123.56
and you probably want it as a number
hive> select cast(regexp_replace('123,56€','(\\d+),(\\d+).','$1.$2') as decimal(12,2));
OK
123.56
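If the conversion has to happen before the data reaches Hive, the same regex idea carries over. The following Python sketch is only an illustration under stated assumptions: the helper name parse_amount is invented here, the input is assumed to use a comma as the decimal separator, and thousands separators are assumed not to occur.

```python
import re
from decimal import Decimal


def parse_amount(raw):
    """Convert a string like '123,56€' to Decimal('123.56').

    Assumes a comma decimal separator and no thousands separator.
    """
    match = re.search(r"(\d+),(\d+)", raw)
    if match is None:
        raise ValueError(f"no amount found in {raw!r}")
    # Rebuild the number with a dot; anything after the digits
    # (the currency symbol, or a replacement character) is ignored.
    return Decimal(f"{match.group(1)}.{match.group(2)}")


print(parse_amount("123,56€"))
```

Because the pattern only looks at the digits and the comma, it does not matter whether the trailing symbol arrived intact or was replaced by a box character.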

How can I specify the record delimiter to be used in SQLite's output?

I am using the following command to output the result of an SQL query to a text file:
$ sqlite3 my_db.sqlite "SELECT text FROM message;" > out.txt
This gives me output like this:
text for entry 1
text for entry 2
Unfortunately, this breaks down when the text contains a newline:
text for entry 1
text for
entry 2
How can I specify an output delimiter (which I know doesn't exist in the text) for SQLite to use when outputting the data so I can more easily parse the result? E.g.:
text for entry 1
=%=
text for
entry 2
=%=
text for entry 3
Try the -separator option for this.
$ sqlite3 -separator '=%=' my_db.sqlite "SELECT text FROM message;" > out.txt
Update 1
I guess this is because of the default '-list' option. To turn this option off, you need to change the current mode.
This is a list of modes
.mode MODE ?TABLE?     Set output mode where MODE is one of:
                         csv      Comma-separated values
                         column   Left-aligned columns. (See .width)
                         html     HTML <table> code
                         insert   SQL insert statements for TABLE
                         line     One value per line
                         list     Values delimited by .separator string
                         tabs     Tab-separated values
                         tcl      TCL list elements
-list                  Query results will be displayed with the separator (|, by
                       default) character between each field value. The default.
-separator separator   Set output field separator. Default is '|'.
Found this info here
I had the same question and there is a simpler solution. I found this at https://sqlite.org/cli.html :
.separator COL ?ROW? Change the column and row separators
For example:
sqlite> .separator | ,
sqlite> select * from example_table;
1|3,1|4,1|15,1|21,1|33,2|13,2|16,2|32,
Or with no column separator:
sqlite> .separator '' ,
sqlite> select * from example_table;
13,14,115,121,133,213,216,232,
Or, to answer the specific question posed above, this is all that is needed:
sqlite> .separator '' \r\n=%=\r\n
sqlite> select * from message;
text for entry 1
=%=
text for
entry 2
=%=
text for entry 3
=%=
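When the output is post-processed by a script anyway, the delimiter can also be applied on the client side instead of in the sqlite3 shell. Below is a minimal Python sketch using the standard sqlite3 module; the in-memory database and the sample rows are stand-ins for my_db.sqlite, and RECORD_DELIMITER is a name chosen for the example.

```python
import sqlite3

RECORD_DELIMITER = "\n=%=\n"  # assumed not to occur in the data

conn = sqlite3.connect(":memory:")  # stand-in for my_db.sqlite
conn.execute("CREATE TABLE message (text TEXT)")
conn.executemany(
    "INSERT INTO message (text) VALUES (?)",
    [("text for entry 1",), ("text for\nentry 2",), ("text for entry 3",)],
)

# Fetch each value as a Python string; embedded newlines stay inside the value.
rows = [row[0] for row in conn.execute("SELECT text FROM message ORDER BY rowid")]
conn.close()

# Join the records with the custom delimiter, as in the desired output.
output = RECORD_DELIMITER.join(rows)
print(output)

# Splitting on the delimiter recovers the original records, newline included.
recovered = output.split(RECORD_DELIMITER)
print(recovered == rows)
```

Unlike the .separator output above, this version does not emit a trailing delimiter after the last record.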
In order to separate the records, you would have to work with group_concat and a separator.
Query evolution:
SELECT text FROM messages;
SELECT GROUP_CONCAT(text, "=%=") FROM messages;
SELECT GROUP_CONCAT(text, "\r\n=%=\r\n") FROM messages;
-- to get rid of the concat comma, use replace OR change separator
SELECT REPLACE(GROUP_CONCAT(text, "\r\n=%="), ',', '\r\n') FROM messages;
SQLFiddle
Alternative: SQLite to CSV export (with a custom separator), then work with that.
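One caveat with the GROUP_CONCAT approach: SQLite does not expand backslash sequences such as \r\n inside SQL string literals, so a real line break has to come from char(). The Python sketch below, run against a made-up in-memory messages table, shows both points; note that SQLite documents the order of concatenated elements as arbitrary.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (text TEXT)")
db.executemany(
    "INSERT INTO messages (text) VALUES (?)",
    [("text for entry 1",), ("text for\nentry 2",), ("text for entry 3",)],
)

# A backslash sequence in a SQL literal stays as two literal characters.
(literal_length,) = db.execute(r"SELECT length('\n')").fetchone()
print(literal_length)

# char(10) yields a real line feed, so build the separator from it.
(joined,) = db.execute(
    "SELECT group_concat(text, char(10) || '=%=' || char(10)) FROM messages"
).fetchone()
print(joined)
db.close()
```

Use char(13, 10) instead of char(10) if Windows-style line endings are wanted.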