Hive: Format string to look like phone number

I have phone numbers saved as text in a column of my table. How can I format them to look like a phone number using Hive?
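+---------------+---------------+
| Phone number  | Formatted     |
+---------------+---------------+
| 2076234568    | 207-623-4568  |
| 2079425555    | 207-942-5555  |
| 3178723275    | 317-872-3275  |
| 2072367033    | 207-236-7033  |
| 2077832249    | 207-783-2249  |
+---------------+---------------+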

select Phone_number
,regexp_replace(Phone_number,'(.{3})(.{3})(.{4})','$1-$2-$3') as Formatted
from t
;
+---------------+---------------+
| phone_number | formatted |
+---------------+---------------+
| 2076234568 | 207-623-4568 |
| 2079425555 | 207-942-5555 |
| 3178723275 | 317-872-3275 |
| 2072367033 | 207-236-7033 |
| 2077832249 | 207-783-2249 |
+---------------+---------------+
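The query above relies on three capture groups (3, 3 and 4 characters) that are re-emitted with dashes between them. The same idea can be sketched outside Hive; the Python version below is only an illustration, and it makes two assumptions that the original query does not: it accepts digits only, and it anchors the pattern to the whole string so that values of the wrong length are returned unchanged. The function name format_phone is hypothetical.

```python
import re

def format_phone(raw: str) -> str:
    """Insert dashes into a 10-digit string, e.g. 2076234568 -> 207-623-4568."""
    # Same three capture groups as the Hive pattern, but limited to digits and
    # anchored to the full string (an assumption beyond the original query).
    return re.sub(r"^(\d{3})(\d{3})(\d{4})$", r"\1-\2-\3", raw)

samples = ["2076234568", "2079425555", "3178723275"]
formatted = [format_phone(s) for s in samples]
print(formatted)
```

Because the pattern in the Hive query is unanchored and uses ".", it also rewrites the first ten characters of longer or non-numeric values; whether that matters depends on how clean the column is.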

Loop to find multiple minimum and maximum values

I have a table (tblProduct) with a field (SerialNum).
I am trying to find multiple minimum and maximum values from the field SerialNum, or better put: ranges of sequential serial numbers.
The serial numbers are 5 digits and a letter. Most of the values are sequential, but NOT all!
I need the output for a report to look something like:
00001A - 00014A
00175A - 00180A
00540A - 00549A
12345A - 12349A
04500B - 04503B
04522B - 04529B
04595B
04627B - 04631B
If the values in-between are present.
I tried a loop, but I realized I was using record sets. I need one serial num to be compared to ALL the ranges. Record sets were looking at one range.
I have been able to determine the max and min of the entire series, but not of each sequential group.
| SerialNum |
| -------- |
| 00001A|
| 00002A|
| 00003A|
| 00004A|
| 00005A|
| 00006A|
| 00007A|
| 00008A|
| 00009A|
| 00010A|
| 00011A|
| 00012A|
| 00013A|
| 00014A|
| 00175A|
| 00176A|
| 00177A|
| 00178A|
| 00179A|
| 00180A|
| 00540A|
| 00541A|
| 00542A|
| 00543A|
| 00544A|
| 00545A|
| 00546A|
| 00547A|
| 00548A|
| 00549A|
| 12345A|
| 12346A|
| 12347A|
| 12348A|
| 12349A|
| 04500B|
| 04501B|
| 04502B|
| 04503B|
| 04522B|
| 04523B|
| 04524B|
| 04525B|
| 04526B|
| 04527B|
| 04528B|
| 04529B|
| 04595B|
| 04627B|
| 04628B|
| 04629B|
| 04630B|
| 04631B|
Try to group by the number found with Val:
Select
Min(SerialNum) As MinimumSerialNum,
Max(SerialNum) As MaximumSerialNum
From
tblProduct
Group By
Val(SerialNum)
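Grouping by Val(SerialNum) on its own appears to produce one group per distinct number, so each serial would end up as its own minimum and maximum. A common way to collapse consecutive values into ranges is to group on the difference between each number and its position in the sorted list, since that difference stays constant within an unbroken run. The Python sketch below illustrates that logic only; it is not Access SQL, it assumes every serial is five digits followed by one letter with no duplicates, and the function name serial_ranges is hypothetical.

```python
from itertools import groupby

def serial_ranges(serials):
    """Collapse serials like '00001A' into runs of consecutive numbers per letter."""
    # Split each serial into (letter suffix, numeric part) and sort by both.
    parsed = sorted((s[-1], int(s[:-1])) for s in serials)
    ranges = []
    # Inside a consecutive run, (number - position) is constant, so it works
    # as a group key together with the letter.
    for _, run in groupby(enumerate(parsed), key=lambda p: (p[1][0], p[1][1] - p[0])):
        items = [item for _, item in run]
        (letter, first), (_, last) = items[0], items[-1]
        lo, hi = f"{first:05d}{letter}", f"{last:05d}{letter}"
        # A run of length one is reported as a single serial, not a range.
        ranges.append(lo if first == last else f"{lo} - {hi}")
    return ranges

demo = ["00001A", "00002A", "00003A", "00175A", "04595B", "04627B", "04628B"]
print(serial_ranges(demo))
```

The same key (value minus row position) is what a query-based version would group on, with the row position computed by whatever means the database offers.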

PostgreSQL query: subtract counts within one table

I have one table in PostgreSQL and cannot work out how to build a query.
The table contains the columns nr_serii and deleting_time. I am trying to count the rows per nr_serii and subtract from that the rows that have a deleting_time.
My query:
select nr_serii , count(nr_serii ) as ilosc,count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
The part of table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
I need to subtract column ilosc_delete from column ilosc,
example:
nr_serii:333333 ilosc:3-1=2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think there is a very simple solution for this, but my mind has gone blank.
I see what you want now. You want to subtract the number where deleting_time is not null from the ones where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.
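The arithmetic behind that query can be checked by hand: each row without a deleting_time adds one to its nr_serii, and each row with one subtracts one. The Python sketch below reproduces that on the sample rows from the question, assuming the empty values shown there are NULL (None here); the function name active_minus_deleted is hypothetical.

```python
from collections import defaultdict

def active_minus_deleted(rows):
    """rows: iterable of (nr_serii, deleting_time), with None for a missing time."""
    totals = defaultdict(int)
    for nr_serii, deleting_time in rows:
        # +1 for a row with no deleting_time, -1 for a row that has one,
        # mirroring count(*) filter (...) - count(deleting_time).
        totals[nr_serii] += 1 if deleting_time is None else -1
    return dict(totals)

rows = [
    ("666666", "2020-11-20 14:08:13"),
    ("456456", None),
    ("333333", None), ("333333", None), ("333333", None),
    ("333333", "2020-11-20 14:02:23"),
    ("111111", None), ("111111", None),
    ("111111", "2020-11-20 14:08:04"),
    ("111111", None),
]
print(active_minus_deleted(rows))
```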

Filtering records not containing numbers

I have a table that stores numbers as strings. Ideally every value should be a 10-digit number, but the column contains many junk values. I want to select the records that do not match this format.
Below is the sample table that I have:
+---------------+--------+----------------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 3423122334 | 10 | As expected, 10 character number |
| 6758439239 | 10 | As expected, 10 character number |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------------+
Expected Output:
+---------------+--------+----------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------+
Basically I want all the records that are not exactly 10-digit numbers.
I have tried the query below:
SELECT *
FROM t1
WHERE ID_UID LIKE '%[^0-9]%'
But this does not return any records.
I have created a fiddle for the same.
P.S. The columns length and ##Comment are illustrative in nature.
You want RLIKE not LIKE:
SELECT *
FROM t1
WHERE ID_UID RLIKE '[^0-9]'
Note that % is a LIKE wildcard, not a regular expression wildcard. Also, regular expressions match the pattern anywhere it occurs, so no wildcards are needed for the beginning and end of the string.
If you want to find values that are not ten digits, then be explicit:
SELECT *
FROM t1
WHERE ID_UID NOT RLIKE '^[0-9]{10}$'
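The two patterns above select different rows: '[^0-9]' only flags values that contain a non-digit character, so an 11-digit value passes it, while the anchored ten-digit pattern also rejects values of the wrong length. The Python sketch below illustrates the difference on some of the sample values; it uses Python's re module rather than MySQL's RLIKE, and the function names are hypothetical.

```python
import re

def has_non_digit(value: str) -> bool:
    # Counterpart of RLIKE '[^0-9]': true if any character is not a digit.
    return re.search(r"[^0-9]", value) is not None

def is_ten_digits(value: str) -> bool:
    # Counterpart of RLIKE '^[0-9]{10}$': the whole string is exactly ten digits.
    return re.fullmatch(r"[0-9]{10}", value) is not None

values = ["+112323456705", "3423122334", "58_4323129", "45684938901", "4568 38901"]
junk = [v for v in values if not is_ten_digits(v)]
print(junk)
```

Here "45684938901" is all digits, so has_non_digit misses it, but it is still reported as junk by the ten-digit check.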

Create external table from csv on HDFS , all values come with quotes

I have a CSV file on HDFS and I am trying to create an Impala table from it. The table is created, but every value comes with its surrounding double quotes (").
CREATE external TABLE abc.def
(
name STRING,
title STRING,
last STRING,
pno STRING
)
row format delimited fields terminated by ','
location 'hdfs:pathlocation'
tblproperties ("skip.header.line.count"="1") ;
The output is
name title last pno
"abc" "mr" "xyz" "1234"
"rew" "ms" "pre" "654"
I just want to create table from csv file without quotes. Please guide where I am going wrong.
One way to do that is to create a staging table that loads the file with the quotes intact, and then use CTAS (CREATE TABLE AS SELECT) to create the final table, cleaning each field with the replace function.
As an example:
CREATE TABLE quote_stage(
id STRING,
name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
+-----+----------+
| id | name |
+-----+----------+
| "1" | "pepe" |
| "2" | "ana" |
| "3" | "maria" |
| "4" | "ramon" |
| "5" | "lucia" |
| "6" | "carmen" |
| "7" | "alicia" |
| "8" | "pedro" |
+-----+----------+
CREATE TABLE t_quote
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT replace(id,'"','') AS id, replace(name,'"','') AS name FROM quote_stage;
+----+--------+
| id | name |
+----+--------+
| 1 | pepe |
| 2 | ana |
| 3 | maria |
| 4 | ramon |
| 5 | lucia |
| 6 | carmen |
| 7 | alicia |
| 8 | pedro |
+----+--------+
Hope this helps.
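The reason the quotes survive is that splitting a line purely on commas treats the double quotes as ordinary characters, so they have to be removed afterwards (as replace does above) or handled by a quote-aware parser. The Python sketch below shows both behaviours on the sample rows from the question; it only illustrates the parsing difference, not how Impala reads the file, and the function name parse_quoted_csv is hypothetical.

```python
import csv
import io

def parse_quoted_csv(text: str):
    """Parse CSV text, letting the csv module strip the enclosing double quotes."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row is the header, as with skip.header.line.count
    return header, list(reader)

sample = 'name,title,last,pno\n"abc","mr","xyz","1234"\n"rew","ms","pre","654"\n'

# Plain comma splitting keeps the quotes as part of each value.
naive = sample.splitlines()[1].split(",")

# Stripping them afterwards is the counterpart of replace(col, '"', '').
cleaned = [field.replace('"', "") for field in naive]

header, rows = parse_quoted_csv(sample)
print(naive, cleaned, rows)
```

Removing every double quote, as the replace approach does, also drops any quotes that are legitimately part of a value, which a quote-aware parser would keep.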

Adding space at the end of a line in Gherkin

My question is:
I have a data table with some values, and I want one of the values to end with a whitespace character.
E.g.
| Name | Surname | Statement |
| AO | PO | This is a statement |
I want to add a whitespace character after the word "statement". How can I do that?
Depending on your step definitions (using {string}, {word} or the ^...$ notation), you can quote the value:
| Name | Surname | Statement |
| AO | PO | "This is a statement " |
Or use the whitespace character:
| Name | Surname | Statement |
| AO | PO | This is a statement\s |