Removing special characters using Hive

I have data stored in Cassandra 1.2, as shown below. The sValue column contains a special character at the end (the \x00). How can I use a Hive function to remove it?
Date | Timestam | payload_Timestamp | actDate | actHour | actMinute | sDesc | sName | sValue
---------------------------------+--------------------------------------+--------------------------+----------------------+----------------------+------------------------+---------------------------+--------------------------------+---------------------
2014-06-25 00:00:00-0400 | 2014-06-25 08:31:23-0400 | 2014-06-25 08:31:23-0400 | 06-25-2014 | 8 | 31 | lable | /t1/t2/100/200/11/99 | 2743326591.03\x00

You can use the regexp_replace() function.
More details are available in the Hive Language Manual (UDFs):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
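Hive's regexp_replace() is backed by Java regular expressions and replaces every match, which Python's re.sub can model locally. The sketch below is only such a model. It assumes, as the cqlsh output suggests, that the stray character is a trailing NUL byte and that sValue should otherwise contain only digits and a decimal point. Under that assumption the Hive call would be along the lines of regexp_replace(sValue, '[^0-9.]', '').

```python
import re

def strip_non_numeric(value: str) -> str:
    # Models regexp_replace(sValue, '[^0-9.]', ''): drop every character
    # that is not a digit or a decimal point.
    return re.sub(r"[^0-9.]", "", value)

def strip_nul(value: str) -> str:
    # Narrower alternative: remove only NUL (0x00) characters.
    return re.sub(r"\x00", "", value)

raw = "2743326591.03\x00"  # sValue as displayed by cqlsh, with a trailing NUL
print(repr(strip_non_numeric(raw)))  # '2743326591.03'
print(repr(strip_nul(raw)))          # '2743326591.03'
```

The character-class version also cleans up other junk bytes; the NUL-only version leaves everything else untouched.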

Related

Amending the output by adding extra value in SQL

I want to amend the output of a SQL query, for instance by adding extra text to a column from the selected table. The query below does not execute; it fails with a "mismatched input" error.
select date, '123' & text from database123
Current output:
| Date | Text |
| -------- | ------|
| 01/01/2021 | Car |
| 01/02/2021 | Car |
Expected output:
| Date | Text |
| -------- | ------ |
| 01/01/2021 | 123Car |
| 01/02/2021 | 123Car |
You can use either concat() or the || operator; this query shows both:
SELECT date, concat('123', text), '123' || text
FROM database123
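A small Python model of the expected output follows. It assumes, as is usual for SQL string functions but not verified here, that a NULL argument makes the concatenated result NULL; sql_concat is a hypothetical name used only for illustration.

```python
from typing import Optional

def sql_concat(*parts: Optional[str]) -> Optional[str]:
    # Hypothetical stand-in for concat() / ||. Assumption: any NULL (None)
    # argument makes the whole result NULL, as in most SQL engines.
    if any(p is None for p in parts):
        return None
    return "".join(parts)

rows = [("01/01/2021", "Car"), ("01/02/2021", "Car")]
for date, text in rows:
    print(date, sql_concat("123", text))  # first line: 01/01/2021 123Car
```

If text can be NULL and you still want the prefix on its own, wrapping the column as concat('123', coalesce(text, '')) avoids the NULL result.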

Filtering records not containing numbers

I have a table that stores numbers as strings. Ideally every value should be a 10-digit number, but the table contains many junk values. I want to find the records that do not fit this format.
Below is the sample table that I have:
+---------------+--------+----------------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 3423122334 | 10 | As expected, 10 character number |
| 6758439239 | 10 | As expected, 10 character number |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------------+
Expected Output:
+---------------+--------+----------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------+
Basically, I want all the records whose value is not exactly a 10-digit number.
I have tried the following query:
SELECT *
FROM t1
WHERE ID_UID LIKE '%[^0-9]%'
But it does not return any records.
P.S. The Length and ##Comment columns are only illustrative.
You want RLIKE, not LIKE:
SELECT *
FROM t1
WHERE ID_UID RLIKE '[^0-9]'
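Note that % is a LIKE wildcard, not a regular expression wildcard. Hive's LIKE supports only the % and _ wildcards, so the SQL Server-style [^0-9] bracket syntax is matched literally and your query finds nothing. Also, regular expressions match the pattern anywhere it occurs, so no wildcards are needed for the beginning and end of the string.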
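To see why the original query returns nothing, here is a rough Python model of LIKE matching. The hive_like name is hypothetical and escape characters are ignored; the point is only that % and _ are the sole wildcards, so the brackets in '%[^0-9]%' are compared literally.

```python
import re

def hive_like(value: str, pattern: str) -> bool:
    # Hypothetical, simplified model of SQL LIKE: '%' matches any run of
    # characters, '_' matches exactly one, everything else is literal.
    # Escape characters are not handled in this sketch.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

ids = ["+112323456705", "58_4323129", "4567$%6790", "4568 38901"]
print([v for v in ids if hive_like(v, "%[^0-9]%")])  # []
print(hive_like("x[^0-9]y", "%[^0-9]%"))             # True
```

Only a value containing the literal text [^0-9] would match the original pattern.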
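The query above still misses values that consist only of digits but have the wrong length, such as 45684938901. If you want to find every value that is not exactly ten digits, be explicit: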
SELECT *
FROM t1
WHERE ID_UID NOT RLIKE '^[0-9]{10}$'
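Because RLIKE reports a match when the pattern is found anywhere in the string, it behaves like Python's re.search. The sketch below applies both WHERE clauses to the intact sample IDs. The length-11 "Contains blank" row is left out because the position of its blank was lost from the table. Only the anchored pattern also catches the all-digit, 11-character value.

```python
import re

# Sample IDs from the question (the length-11 "Contains blank" row is omitted).
ids = ["+112323456705", "3423122334", "6758439239", "58_4323129",
       "4567$%6790", "45684938901", "4568 38901"]

# WHERE ID_UID RLIKE '[^0-9]'  -> a non-digit anywhere in the string
has_non_digit = [v for v in ids if re.search(r"[^0-9]", v)]

# WHERE ID_UID NOT RLIKE '^[0-9]{10}$'  -> anything that is not exactly ten digits
not_ten_digits = [v for v in ids if not re.search(r"^[0-9]{10}$", v)]

print(has_non_digit)
print(not_ten_digits)
```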

Adding space at the end of a line in Gherkin

My question is:
I have a data table with some values, and I want one of the values to end with a whitespace character.
For example:
| Name | Surname | Statement |
| AO | PO | This is a statement |
I want to add a whitespace after the word "statement". How can I do that?
Depending on your step definitions (using {string}, {word} or the ^...$ notation), you can quote the value:
| Name | Surname | Statement |
| AO | PO | "This is a statement " |
Or use the whitespace escape \s (Gherkin itself does not translate \s, so your step definition has to turn it into a space):
| Name | Surname | Statement |
| AO | PO | This is a statement\s |
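Gherkin trims the padding around table cells, so a trailing space has to be encoded in the feature file and decoded in the step definition. The helper below is hypothetical (normalize_cell is not part of any Cucumber API); it sketches how a step definition could undo either encoding shown above.

```python
def normalize_cell(cell: str) -> str:
    # Hypothetical step-definition helper, not a Cucumber API.
    # Option 1: the value was written in double quotes -> drop the quotes.
    if len(cell) >= 2 and cell.startswith('"') and cell.endswith('"'):
        return cell[1:-1]
    # Option 2: the value uses a literal \s marker -> turn it into a space.
    return cell.replace("\\s", " ")

print(repr(normalize_cell('"This is a statement "')))  # 'This is a statement '
print(repr(normalize_cell("This is a statement\\s")))  # 'This is a statement '
```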

Hive: Format string to look like phone number

I have phone numbers stored as text in a column of my table. How can I format them to look like phone numbers (for example 207-623-4568) using Hive?
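+--------------+--------------+
| Phone number | Formatted    |
+--------------+--------------+
| 2076234568   | 207-623-4568 |
| 2079425555   | 207-942-5555 |
| 3178723275   | 317-872-3275 |
| 2072367033   | 207-236-7033 |
| 2077832249   | 207-783-2249 |
+--------------+--------------+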
Use regexp_replace() with three capture groups:
select Phone_number
,regexp_replace(Phone_number,'(.{3})(.{3})(.{4})','$1-$2-$3') as Formatted
from t
;
+---------------+---------------+
| phone_number | formatted |
+---------------+---------------+
| 2076234568 | 207-623-4568 |
| 2079425555 | 207-942-5555 |
| 3178723275 | 317-872-3275 |
| 2072367033 | 207-236-7033 |
| 2077832249 | 207-783-2249 |
+---------------+---------------+
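The Hive query relies on capture groups: each parenthesised group is referenced as $1, $2, $3 in the replacement string (Java syntax). Python's re.sub uses \1, \2, \3 instead, so the sketch below is a translation rather than the same call. The strict, anchored digits-only variant is an added assumption, meant to leave malformed values unchanged instead of inserting dashes into them.

```python
import re

def format_phone(number: str) -> str:
    # Same pattern as the Hive query; '.' accepts any character.
    return re.sub(r"(.{3})(.{3})(.{4})", r"\1-\2-\3", number)

def format_phone_strict(number: str) -> str:
    # Variant: only rewrite values that are exactly ten digits.
    return re.sub(r"^(\d{3})(\d{3})(\d{4})$", r"\1-\2-\3", number)

for n in ["2076234568", "2079425555", "3178723275"]:
    print(n, format_phone(n))
print(format_phone_strict("207-623-45"))  # unchanged: not ten digits
```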

Replacing multiple strings from a database column with distinct replacements

I have a Hive table as below:
+----+---------------+-------------+
| id | name | partnership |
+----+---------------+-------------+
| 1 | sachin sourav | first |
| 2 | sachin sehwag | first |
| 3 | sourav sehwag | first |
| 4 | sachin_sourav | first |
+----+---------------+-------------+
In this table I need to replace strings such as "sachin" with "ST" and "sourav" with "SG". I am using the following query, but it does not give the result I need.
Query:
select
*,
case
when name regexp('\\bsachin\\b')
then regexp_replace(name,'sachin','ST')
when name regexp('\\bsourav\\b')
then regexp_replace(name,'sourav','SG')
else name
end as newName
from sample1;
Result:
+----+---------------+-------------+---------------+
| id | name | partnership | newname |
+----+---------------+-------------+---------------+
| 4 | sachin_sourav | first | sachin_sourav |
| 3 | sourav sehwag | first | SG sehwag |
| 2 | sachin sehwag | first | ST sehwag |
| 1 | sachin sourav | first | ST sourav |
+----+---------------+-------------+---------------+
Problem: for id = 1, the newName column should contain "ST SG", i.e. both strings should be replaced.
You can nest the replace() calls:
select s.*,
replace(replace(s.name, 'sachin', 'ST'), 'sourav', 'SG') as newName
from sample1 s;
You don't need regular expressions, so just use replace().
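One difference from the question's query is worth noting: replace() matches plain substrings, whereas the question's \b patterns require word boundaries, and an underscore counts as a word character. The Python sketch below uses str.replace and re.sub as stand-ins for Hive's replace() and regexp_replace(). It shows that the nested replace() turns sachin_sourav into ST_SG, while nesting the word-boundary patterns leaves it untouched. If replace() is not available in your Hive version, nesting regexp_replace() the same way should give the word-boundary behaviour.

```python
import re

names = ["sachin sourav", "sachin sehwag", "sourav sehwag", "sachin_sourav"]

def nested_replace(name: str) -> str:
    # Stand-in for replace(replace(name, 'sachin', 'ST'), 'sourav', 'SG'):
    # plain substring replacement, applied one after the other.
    return name.replace("sachin", "ST").replace("sourav", "SG")

def nested_word_replace(name: str) -> str:
    # Stand-in for nesting regexp_replace() with the question's \b patterns.
    # '_' is a word character, so "sachin_sourav" has no boundary
    # after "sachin" or before "sourav".
    name = re.sub(r"\bsachin\b", "ST", name)
    return re.sub(r"\bsourav\b", "SG", name)

for n in names:
    print(n, "|", nested_replace(n), "|", nested_word_replace(n))
```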