split(regexp_replace ) Like Function In Presto : 331 - hive

Is there any way to split values based on trailing consecutive 0s in Presto? The first part must contain at least 6 digits: if there are fewer than 6 digits before the trailing 0s, some of the 0s have to be counted as digits before splitting; if there are 6 or more digits, the value just needs to be split into 2 groups.
The query below works as expected in Hive, but I am not able to do the same in Presto.
select low as orginal_Value,
split(regexp_replace(low,'(\\d{6,}?)(0+)$','$1|$2'),'\\|') Output_Value from test;
Presto Query:
presto> SELECT regexp_split('1234567890000', '(\d{6,}?)(0+)$') as output;
output
[1234567890000]
(1 row)

It works now:
select split(regexp_replace('1234567890000','(\d{6,}?)(0+)$','$1|$2'), '|') as output;
output
-------------------
[123456789, 0000]
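The pattern itself is not Presto-specific. As a rough cross-check outside the database, here is a minimal Python sketch of the same replace-then-split idea. The helper name split_trailing_zeros is made up for illustration, and Python's re module writes backreferences as \1 rather than $1.

```python
import re
from typing import List

# Lazily take at least six digits, then the run of zeros that ends the string.
PATTERN = re.compile(r'(\d{6,}?)(0+)$')


def split_trailing_zeros(value: str) -> List[str]:
    """Hypothetical helper mirroring split(regexp_replace(...), '|')."""
    # Put a '|' between the two captured groups, then split on it.
    # If the pattern does not match, the value comes back as a single element.
    return PATTERN.sub(r'\1|\2', value).split('|')


if __name__ == "__main__":
    print(split_trailing_zeros('1234567890000'))  # ['123456789', '0000']
    print(split_trailing_zeros('12000000'))       # ['120000', '00']
```

In the second call only two digits precede the zeros, so the lazy quantifier absorbs four of the zeros to reach the six-digit minimum.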

Related

max consecutive digits in a string

I am trying to find the maximum number of consecutive digits that appear in a string column. Here is an example to illustrate what I am trying to do. Suppose I have a table called email:
email
lucas1234#gmail.com
fer12#gmail.com
lupal#gmail.com
carlos1perez222#gmail.com
carlos11perez222#gmail.com
lucila1#gmail.com
My expected output would be:
email                        count_cons_digits
lucas1234#gmail.com          4
fer12#gmail.com              2
lupal#gmail.com              0
carlos1perez222#gmail.com    3
carlos11perez222#gmail.com   3
lucila1#gmail.com            1
Note that this question is very similar to:
Number of consecutive digits in a column string
The only difference is that the function from those results does not handle emails containing only one digit (like lucila1#gmail.com): the expected result is 1, but the proposed function gives 0. It also fails whenever the email contains two separate runs of consecutive digits (carlos11perez222#gmail.com): the expected output is 3, but it gives 5.
Consider the approach below:
select *,
  ifnull((
    select length(digits) len
    from unnest(regexp_extract_all(email, r'\d+')) digits
    order by len desc
    limit 1
  ), 0) as count_cons_digits
from your_table
If applied to the sample data in your question, the output matches the expected result above.
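The idea behind that query (extract every run of digits, keep the longest, fall back to 0) can be sketched in a few lines of Python. This is only an illustration of the logic, not BigQuery code, and the function name max_consecutive_digits is an assumption.

```python
import re


def max_consecutive_digits(text: str) -> int:
    """Length of the longest run of digits in text, or 0 if there is none."""
    runs = re.findall(r'\d+', text)  # every maximal run of digits
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    emails = [
        'lucas1234#gmail.com',
        'fer12#gmail.com',
        'lupal#gmail.com',
        'carlos1perez222#gmail.com',
        'carlos11perez222#gmail.com',
        'lucila1#gmail.com',
    ]
    for email in emails:
        print(email, max_consecutive_digits(email))
```

Because each run is measured separately, an address with two digit sections reports the longer one rather than the total number of digits.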
You may also try this approach using regex:
WITH email AS (
  SELECT 'lucas1234#gmail.com' mail,
  UNION ALL SELECT 'fer12#gmail.com',
  UNION ALL SELECT 'lupal#gmail.com',
  UNION ALL SELECT 'carlos1perez222#gmail.com',
  UNION ALL SELECT 'carlos11perez222#gmail.com',
  UNION ALL SELECT 'lucila1#gmail.com'
)
SELECT email,
  (LENGTH(REGEXP_REPLACE(REGEXP_REPLACE(email.mail, r'[A-Za-z]+\d+[A-Za-z]+', ''), r'[A-Za-z.#]+', ''))) AS count_cons_digits,
FROM email;
Output:

How to format a numeric value in Postgresql

I have a query that generates a number, shown below:
SELECT MAX(RIGHT("node_id", 3)::numeric) + 1 AS newnum
FROM sewers.structures
WHERE ST_WITHIN(
        ST_CENTROID(ST_SetSRID(structures.geom, 4326)),
        ST_SetSRID((SELECT geom FROM sewers."Qrtr_Qrtr_Sections" WHERE "plat_page" = '510A'), 4326)
      )
  AND "node_id" != 'PRIVATE'
  AND "node_id" != 'PRIV_SAN'
  AND "node_id" != 'PRIV_STORM'
When I run this, it generates a number based on the previously placed values. The output is a number of up to 3 digits. I want to take any output with fewer than three digits and force it into a 3-digit format.
For example, if I generate the number 93, I would like to format it as 093. The same goes for single-digit numbers like 2, which should be formatted as 002, and so on. However, if it generates a 3-digit number, it should keep the same format, so 121 stays as 121.
If I got your question right, you're looking for lpad():
WITH j (x) AS (
VALUES (2),(121),(93)
)
SELECT lpad(x::text,3,'0') FROM j;
lpad
------
002
121
093
(3 rows)
Since the output will be a string, you can also use to_char with a format of three 0s (use 'FM000' instead to suppress the leading space that to_char reserves for the sign):
select to_char(1,'000');
to_char
---------
001
(1 row)
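For comparison, the same left-padding rule looks like this in a minimal Python sketch. The function name pad3 is hypothetical. It pads values shorter than three digits and leaves longer ones untouched, as lpad does with these inputs.

```python
def pad3(n: int) -> str:
    """Left-pad a non-negative integer with zeros to at least three characters."""
    return str(n).zfill(3)


if __name__ == "__main__":
    for value in (2, 121, 93):
        print(pad3(value))  # 002, 121, 093
```

One difference worth knowing: lpad(x::text, 3, '0') truncates values longer than three characters, whereas zfill never shortens its input.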

Extracting number of specific length from a string in Postgres

I am trying to extract a set of numbers from comments like
"on april-17 transactions numbers are 12345 / 56789"
"on april-18 transactions numbers are 56789"
"on may-19 no transactions"
Which are stored in a column called "com" in table comments
My requirement is to get the numbers of a specific length, in this case 5, so 12345 and 56789 from the above string, separately. A comment may contain no five-digit numbers, or more than two of them.
I tried using regexp_replace, with the following result. I am trying to find an efficient regex or other method to achieve this:
select regexp_replace(com, '[^0-9]',' ', 'g') from comments;
regexp_replace
----------------------------------------------------
17 12345 56789
I expect to get only:
column1 | column2
12345 56789
There is no easy way to write a query that returns an arbitrary number of columns: the same query cannot produce one column for one input and two columns for the next.
For a fixed two columns:
SELECT
    matches[1] AS col1,
    matches[2] AS col2
FROM (
    SELECT
        array_agg(regexp_matches[1]) AS matches
    FROM
        regexp_matches(
            'on april-17 transactions numbers are 12345 / 56789',
            '\d{5}',
            'g'
        )
) s
regexp_matches() returns every match as a separate row.
array_agg() collects all of those elements into one array.
The array elements can then be output as separate columns.
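The match-then-collect steps above can be sketched in Python as a quick way to experiment with the pattern. The lookarounds that reject digits on either side are an added assumption, not part of the SQL answer: a bare \d{5} would also match the first five digits of a longer number. The function name five_digit_numbers is made up for illustration.

```python
import re
from typing import List

# Exactly five digits, not preceded or followed by another digit (assumption:
# the plain '\d{5}' from the SQL would also match inside longer digit runs).
FIVE_DIGITS = re.compile(r'(?<!\d)\d{5}(?!\d)')


def five_digit_numbers(comment: str) -> List[str]:
    """Return every standalone five-digit number in comment, in order."""
    return FIVE_DIGITS.findall(comment)


if __name__ == "__main__":
    comments = [
        'on april-17 transactions numbers are 12345 / 56789',
        'on april-18 transactions numbers are 56789',
        'on may-19 no transactions',
    ]
    for comment in comments:
        print(five_digit_numbers(comment))
```

Returning a list sidesteps the fixed-column limitation: each comment yields as many elements as it has matches, including none.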

Replace the digits of a number with subsequent higher digits

Given a number, I want to replace each digit with the next digit that is larger. If there is no next larger digit, leave the digit as it was.
E.g. input: 1234, output: 2344
Since in Oracle we can process everything row by row, I tried first to separate the digits of number into rows by using the below query.
SELECT REGEXP_SUBSTR ('1234','[[:digit:]]',1,LEVEL) txt
FROM dual
CONNECT BY LEVEL <= length('1234');
The query will give me this result.
TXT
----------------
1
2
3
4
But I am stuck here: how do I compare the rows and replace each digit with the next larger one?
Attempted expansion and clarification based on comments:
Treat the number as a string of digits. For each digit, find the first digit among the remaining digits to the right of the current one, that has a higher value than the current digit. That may not be the highest-value digit in the string, or even the highest among all the digits to the right, it is just the first higher value encountered. If there is no higher value then keep the current digit intact. Only consider following digits, preceding ones are ignored.
Some examples:
1234 -> 2344
1357 -> 3577
1157 -> 5577
1245638 -> 2456888
Breaking down the last one:
Digit 1 is 1; the first digit in the remaining string 245638 that is higher than 1 is 2.
Digit 2 is 2; the first digit in the remaining string 45638 that is higher than 2 is 4.
Digit 3 is 4; the first digit in the remaining string 5638 that is higher than 4 is 5.
Digit 4 is 5; the first digit in the remaining string 638 that is higher than 5 is 6.
Digit 5 is 6; the first digit in the remaining string 38 that is higher than 6 is 8.
Digit 6 is 3; the first digit in the remaining string 8 that is higher than 3 is 8.
Digit 7 is 8; no subsequent digit is higher than 8, so keep the existing digit 8.
After some clarification in comments:
WITH t AS (
  SELECT LEVEL AS pos,
         ROWNUM AS txt_order,
         REGEXP_SUBSTR ('1245638','[[:digit:]]',1,LEVEL) AS txt
  FROM dual
  CONNECT BY LEVEL <= LENGTH('1245638')
),
v AS (
  SELECT t1.pos, t1.txt,
         MIN(t2.txt) KEEP (DENSE_RANK FIRST ORDER BY t2.pos) as new_txt
  FROM t t1
  LEFT JOIN t t2 ON t2.pos > t1.pos AND t2.txt > t1.txt
  GROUP BY t1.pos, t1.txt
)
SELECT LISTAGG(NVL(new_txt, txt), NULL) WITHIN GROUP (ORDER BY pos) AS OUTPUT
FROM v;
OUTPUT
--------
2456888
The t CTE is just your original query. Now the v CTE is finding the first digit later in the list which is larger than the current one; the nvl uses the current digit if there isn't one larger. The listagg just sticks the digits back together in the right order.
There is also an SQL Fiddle of the same logic that uses a recursive CTE instead of the connect-by to generate the digits, so that multiple values from a table can be 'converted' in one go. It gives:
ORIGINAL                                 OUTPUT
---------------------------------------- --------
1234                                     2344
1157                                     5577
1357                                     3577
1245638                                  2456888
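The rule is easy to check procedurally. Below is a minimal Python sketch of the same "first higher digit to the right" logic, using a plain scan per digit that loosely mirrors the self-join in the query. The function name next_greater_digits is an assumption.

```python
def next_greater_digits(number: str) -> str:
    """Replace each digit with the first higher digit to its right, if any."""
    result = []
    for i, digit in enumerate(number):
        # First later digit that is strictly higher; keep the digit otherwise.
        # Comparing single characters '0'..'9' matches numeric order.
        replacement = next((d for d in number[i + 1:] if d > digit), digit)
        result.append(replacement)
    return ''.join(result)


if __name__ == "__main__":
    for value in ('1234', '1357', '1157', '1245638'):
        print(value, '->', next_greater_digits(value))
```

The scan is quadratic in the number of digits, which is fine for values of this size.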

Oracle SQL query count group by timestamp substring

Given a table that has a column of string "timestamps" (yyyyMMddHHmmssSSS format), I want to substring the first 8 characters, and get a count of how many rows have that substring, grouping the results.
Sample data...
TIMESTAMP
20100802123456123
20100803123456123
20100803123456123
20100803123456123
20100804123456123
20100805123456123
20100805123456123
20100805123456123
20100805123456123
20100806123456123
20100807123456123
20100807123456123
...and expected results...
SUBSTRING, COUNT
20100802, 1
20100803, 3
20100804, 1
20100805, 4
20100806, 1
20100807, 2
I know this should be easy, but I'm not having any luck at the moment.
I don't have a database to test with, but it seems like you are looking for
select
substr(timestamp, 1, 8),
count(*)
from
my_table
group by
substr(timestamp, 1, 8);
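Since the answer was written without a database to test against, here is a small Python sketch that applies the same prefix-and-count logic to the sample data as a sanity check. The function name count_by_day is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_day(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count rows per yyyyMMdd prefix, like GROUP BY substr(timestamp, 1, 8)."""
    return dict(Counter(ts[:8] for ts in timestamps))


if __name__ == "__main__":
    sample = [
        '20100802123456123',
        '20100803123456123', '20100803123456123', '20100803123456123',
        '20100804123456123',
        '20100805123456123', '20100805123456123',
        '20100805123456123', '20100805123456123',
        '20100806123456123',
        '20100807123456123', '20100807123456123',
    ]
    for day, count in sorted(count_by_day(sample).items()):
        print(day, count)
```

Note that substr in Oracle is 1-based, so substr(timestamp, 1, 8) corresponds to the slice ts[:8] here.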