PostgreSQL count number of times substring occurs in text

PostgreSQL count number of times substring occurs in text - sql

I'm writing a PostgreSQL function to count the number of times a particular text substring occurs in another piece of text. For example, calling count('foobarbaz', 'ba') should return 2.
I understand that to test whether the substring occurs, I use a condition similar to the below:
WHERE 'foobarbaz' like '%ba%'
However, I need it to return 2 for the number of times 'ba' occurs. How can I proceed?
Thanks in advance for your help.

I would highly suggest checking out this answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows, and the running the aggregate is just simply too high.
The fastest way to do this is as follows...
SELECT
(length(str) - length(replace(str, replacestr, '')) )::int
/ length(replacestr)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr);
Here we
Take the length of the string, L1
Subtract from L1 the length of the string with all of the replacements removed L2 to get L3 the difference in string length.
Divide L3 by the length of the replacement to get the occurrences
For comparison that's about five times faster than the method of using regexp_matches() which looks like this.
SELECT count(*)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');

How about use a regular expression:
SELECT count(*)
FROM regexp_matches('foobarbaz', 'ba', 'g');
The 'g' flag repeats multiple matches on a string (not just the first).

There is a
str_count( src, occurence )
function based on
SELECT (length( str ) - length(replace( str, occurrence, '' ))) / length( occurence )
and a
str_countm( src, regexp )
based on the #MikeT-mentioned
SELECT count(*) FROM regexp_matches( str, regexp, 'g')
available here: postgres-utils

Try with:
SELECT array_length (string_to_array ('1524215121518546516323203210856879', '1'), 1) - 1
--RESULT: 7

Related

Regex: how to get the text between a few colons?

So, i have a lot of strings like the ones below in my database:
product1:1stparty:single_aduls:android:
product2:3rdparty:married_adults:ios:
product3:3rdparty:other_adults:android:
I need a regex to get only the text after the product name and before the device category. So, in the first line I'd get 1stparty:single_aduls, in the second 3rdparty:married_adults and in the third 3rdparty:other_adults. I'm stuck and can't find a way to solve that. Could anyone help me please?

As a regular expression, you can use:
select regexp_extract('product1:1stparty:single_aduls:android:', '^[^:]*:(.*):[^:]*:$')
This returns every after the first colon and before the penultimate colon.

We can try using REGEXP_REPLACE here:
SELECT REGEXP_REPLACE(val, r"^.*?:|:[^:]+:$", "") AS output
FROM yourTable;
This approach removes either the leading ...: or trailing :...: from the column, leaving behind the content you want. Here is a demo showing that the regex replacement is working:
Demo

You can also use standard split function and access result array element by index, which is quite clear to read and understand.
with a as (
select split('product1:1stparty:single_aduls:android:', ':') as splitted
)
select splitted[ordinal(2)] || ':' || splitted[ordinal (3)] as subs
from a

Consider below example
with your_table as (
select 'product1:1stparty:single_aduls:android:' txt union all
select 'product2:3rdparty:married_adults:ios:' union all
select 'product3:3rdparty:other_adults:android:'
)
select *,
(
select string_agg(part, ':' order by offset)
from unnest(split(txt, ':')) part with offset
where offset in (1, 2)
) result
from your_table
with output

How to replace characters at specific position in several words using REGEX_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)

You don't need regexp_replace(), using substr() string operator with concat() function (or concatenation operators ||) is enough :
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
to_number() conversion is redundant, since Oracle implicitly considers a string as a number which is completely composed of digits for an arithmetic operation.

You can do use string operations:
with c as (
<your query here>
)
select
from year_code yc
where to_number(substr(yc.code, 1, 4)) = to_number(substr(c.code)) - 1 and
substr(yc.code, 5) = substr(c.code, 5)

Concatenate & Trim String

Can anyone help me, I have a problem regarding on how can I get the below result of data. refer to below sample data. So the logic for this is first I want delete the letters before the number and if i get that same thing goes on , I will delete the numbers before the letter so I can get my desired result.
Table:
SALV3000640PIX32BLU
SALV3334470A9CARBONGRY
TP3000620PIXL128BLK
Desired Output:
PIX32BLU
A9CARBONGRY
PIXL128BLK

You need to use a combination of the SUBSTRING and PATINDEX Functions
SELECT
SUBSTRING(SUBSTRING(fielda,PATINDEX('%[^a-z]%',fielda),99),PATINDEX('%[^0-9]%',SUBSTRING(fielda,PATINDEX('%[^a-z]%',fielda),99)),99) AS youroutput
FROM yourtable
Input
yourtable
fielda
SALV3000640PIX32BLU
SALV3334470A9CARBONGRY
TP3000620PIXL128BLK
Output
youroutput
PIX32BLU
A9CARBONGRY
PIXL128BLK
SQL Fiddle:http://sqlfiddle.com/#!6/5722b6/29/0

To do this you can use
PATINDEX('%[0-9]%',FieldName)
which will give you the position of the first number, then trim off any letters before this using SUBSTRING or other string functions. (You need to trim away the first letters before continuing with the next step because unlike CHARINDEX there is no starting point parameter in the PATINDEX function).
Then on the remaining string use
PATINDEX('%[a-z]%',FieldName)
to find the position of the first letter in the remaining string. Now trim off the numbers in front using SUBSTRING etc.
You may find this other solution helpful
SQL to find first non-numeric character in a string

Try this it may helps you
;With cte (Data)
AS
(
SELECT 'SALV3000640PIX32BLU' UNION ALL
SELECT 'SALV3334470A9CARBONGRY' UNION ALL
SELECT 'SALV3334470A9CARBONGRY' UNION ALL
SELECT 'SALV3334470B9CARBONGRY' UNION ALL
SELECT 'SALV3334470D9CARBONGRY' UNION ALL
SELECT 'TP3000620PIXL128BLK'
)
SELECT * , CASE WHEN CHARINDEX('PIX',Data)>0 THEN SUBSTRING(Data,CHARINDEX('PIX',Data),LEN(Data))
WHEN CHARINDEX('A9C',Data)>0 THEN SUBSTRING(Data,CHARINDEX('A9C',Data),LEN(Data))
ELSE NULL END AS DesiredResult FROM cte
Result
Data DesiredResult
-------------------------------------
SALV3000640PIX32BLU PIX32BLU
SALV3334470A9CARBONGRY A9CARBONGRY
SALV3334470A9CARBONGRY A9CARBONGRY
SALV3334470B9CARBONGRY NULL
SALV3334470D9CARBONGRY NULL
TP3000620PIXL128BLK PIXL128BLK

to_number from char sql

I have to select only the IDs which have only even digits (an ID looks like: p19 ,p20 etc). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought to use substr to get each number from the IDs and then see if it's even .
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;

IF you need all rows consist of 'p' in beginning and even digits on tail It should look like:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');

with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is to replace 0 with itself and DELETE all odd digits from the input string. (The '0' is included due to a strange but documented behavior of translate() - the third argument can't be empty). If the length of the input string doesn't change after the translation, that means the input string didn't have any odd digits.

where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0

SQL Summing digits of a number

i'm using presto. I have an ID field which is numeric. I want a column that adds up the digits within the id. So if ID=1234, I want a column that outputs 10 i.e 1+2+3+4.
I could use substring to extract each digit and sum it but is there a function I can use or simpler way?

You can combine regexp_extract_all from #akuhn's answer with lambda support recently added to Presto. That way you don't need to unnest. The code would be really self explanatory if not the need for cast to and from varchar:
presto> select
reduce(
regexp_extract_all(cast(x as varchar), '\d'), -- split into digits array
0, -- initial reduction element
(s, x) -> s + cast(x as integer), -- reduction function
s -> s -- finalization
) sum_of_digits
from (values 1234) t(x);
sum_of_digits
---------------
10
(1 row)

If I'm reading your question correctly you want to avoid having to hardcode a substring grab for each numeral in the ID, like substring (ID,1,1) + substring (ID,2,1) + ...substring (ID,n,1). Which is inelegant and only works if all your ID values are the same length anyway.
What you can do instead is use a recursive CTE. Doing it this way works for ID fields with variable value lengths too.
Disclaimer: This does still technically use substring, but it does not do the clumsy hardcode grab
WITH recur (ID, place, ID_sum)
AS
(
SELECT ID, 1 , CAST(substring(CAST(ID as varchar),1,1) as int)
FROM SO_rbase
UNION ALL
SELECT ID, place + 1, ID_sum + substring(CAST(ID as varchar),place+1,1)
FROM recur
WHERE len(ID) >= place + 1
)
SELECT ID, max(ID_SUM) as ID_sum
FROM recur
GROUP BY ID

First use REGEXP_EXTRACT_ALL to split the string. Then use CROSS JOIN UNNEST GROUP BY to group the extracted digits by their number and sum over them.
Here,
WITH my_table AS (SELECT * FROM (VALUES ('12345'), ('42'), ('789')) AS a (num))
SELECT
num,
SUM(CAST(digit AS BIGINT))
FROM
my_table
CROSS JOIN
UNNEST(REGEXP_EXTRACT_ALL(num,'\d')) AS b (digit)
GROUP BY
num
;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

PostgreSQL count number of times substring occurs in text - sql

How about use a regular expression: SELECT count(*) FROM regexp_matches('foobarbaz', 'ba', 'g'); The 'g' flag repeats multiple matches on a string (not just the first).

There is a str_count( src, occurence ) function based on SELECT (length( str ) - length(replace( str, occurrence, '' ))) / length( occurence ) and a str_countm( src, regexp ) based on the #MikeT-mentioned SELECT count(*) FROM regexp_matches( str, regexp, 'g') available here: postgres-utils

Try with: SELECT array_length (string_to_array ('1524215121518546516323203210856879', '1'), 1) - 1 --RESULT: 7

Related

Regex: how to get the text between a few colons?

How to replace characters at specific position in several words using REGEX_REPLACE

Concatenate & Trim String

to_number from char sql

SQL Summing digits of a number

Categories

Resources