Db2 locate_in_string equivalent in PostgreSQL

While migrating from DB2 to PostgreSQL, I found some views that use DB2's locate_in_string() function, which returns the position of a specified instance of a given substring.
For example:
LOCATE_IN_STRING('aaabaabbaaaab','b',1,3); -- returns 8, for the 3rd instance of 'b'
LOCATE_IN_STRING('aaabaabbaaaab','b',1,1); -- returns 4, for the 1st instance of 'b'
Unfortunately, PostgreSQL's position() function only gives me the position of the first instance.
I didn't find anything similar in PostgreSQL.
Is there an alternative or workaround (maybe a regex)?

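There may be a better method; this one is rather brute force.
It splits the string on the pattern you are looking for, then adds up the lengths of the first n pieces and the separators. Note that regexp_split_to_array() treats the pattern as a regular expression, so escape any metacharacters: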
select v.*,
       -- length of the first n pieces, plus the n - 1 earlier matches, plus 1 for the 1-based start
       (select coalesce(sum(length(el)), 0) + (count(*) - 1) * length(v.splitter) + 1
        from unnest( (regexp_split_to_array(v.val, v.splitter))[1:v.n] ) el
       ) as pos
from (values ('aaabaabbaaaab', 3, 'b'), ('aaabaabbaaaab', 1, 'b')
     ) v(val, n, splitter);
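The split-and-sum arithmetic may be easier to follow outside SQL. Below is a minimal Python sketch of the same idea; the helper name locate_in_string is only illustrative, the pattern is assumed to be a non-empty literal (not a regex), and returning 0 when there is no n-th match is a choice made for this sketch. It returns the start position of the match, which coincides with the query's result for single-character patterns.

```python
def locate_in_string(val: str, pattern: str, n: int) -> int:
    """1-based start position of the n-th occurrence of pattern, or 0 if absent."""
    pieces = val.split(pattern)  # literal split; pattern must be non-empty
    # n occurrences produce at least n + 1 pieces; fewer means no n-th match.
    if n < 1 or len(pieces) <= n:
        return 0
    head = pieces[:n]
    # Characters before the n-th match: the first n pieces plus the n - 1 earlier matches.
    return sum(len(p) for p in head) + (n - 1) * len(pattern) + 1


print(locate_in_string("aaabaabbaaaab", "b", 3))  # 8
print(locate_in_string("aaabaabbaaaab", "b", 1))  # 4
```

The explicit piece-count check matters: without it, asking for an instance that does not exist would still produce a plausible-looking number.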

Related

Trimming value from a column in Snowflake

I have a column called File with the values 'Mens_Purchaser_Segment_Report' and 'Loyalist_Audience_Segment_Report'. I want to capture everything that comes before the word Segment.
I used query:
select
TRIM(file,regexp_substr(file, '_Segment_Report.*')) as new_col
Output:
Mens_Purch
Loyalist_Audi
How do I capture everything before Segment?
I tried the following, with the same results:
TRIM(file,regexp_substr(file, 'S.*'))
TRIM(file,regexp_substr(file, '_S.*'))
You didn't specify whether the trailing text is always _Segment_Report or whether you want any text before _Segment. Depending on that, various solutions can be used; see below.
create or replace table foo(s string) as select * from values
('Mens_Purchaser_Segment_Report'),
('Loyalist_Audience_Segment_Report');
-- If you know the suffix you want to remove is always exactly '_Segment_Report'
select s, replace(s, '_Segment_Report', '') from foo;
-- If you know the suffix you want to remove starts with '_Segment' but can have something after
-- - approach 1, where we replace the _Segment and anything after it with nothing
select s, regexp_replace(s, '_Segment.*', '') from foo;
-- - approach 2, where we extract things before _Segment
-- Note: it will behave differently if there are many instances of '_Segment'
select s, regexp_substr(s, '(.*)_Segment.*', 1, 1, 'e') from foo;
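The note about multiple instances of '_Segment' can be illustrated with a small Python sketch that mimics the two regex approaches with the standard re module. The function names cut_first and cut_last are hypothetical, and returning None when nothing matches is an assumption of this sketch rather than a statement about Snowflake.

```python
import re


def cut_first(s: str) -> str:
    # Like regexp_replace(s, '_Segment.*', ''): drop everything from the first '_Segment' on.
    return re.sub(r"_Segment.*", "", s)


def cut_last(s: str):
    # Like the capture-group approach: the greedy (.*) keeps text up to the last '_Segment'.
    m = re.match(r"(.*)_Segment.*", s)
    return m.group(1) if m else None


print(cut_first("Mens_Purchaser_Segment_Report"))  # Mens_Purchaser
print(cut_last("Mens_Purchaser_Segment_Report"))   # Mens_Purchaser
print(cut_first("A_Segment_B_Segment_C"))          # A
print(cut_last("A_Segment_B_Segment_C"))           # A_Segment_B
```

With a single '_Segment' both give the same answer; they only diverge when the marker repeats.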
Try using regexp_replace:
select regexp_replace(fld1, 'Segment', '') from (
select 'Mens_Purchaser_Segment_Report and Loyalist_Audience_Segment_Report' fld1 from dual );

SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
When reading the documentation on BigQuery though, it says Nulls should be allowed.
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There don't seem to be any attributes or a SAFE prefix that can be used with ARRAY() to handle nulls.
So what is the best approach for this?
Per the documentation for the ARRAY type:
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if the query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, for your example, you can use the following "trick":
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
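With the above, grouped_columns is [1, 2] for the first row and [3] for the second, since the NULL element is filtered out.
Note: this approach does not require explicitly referencing all the columns involved!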
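To make the chain of string operations in that trick easier to follow, here is a rough Python model of it. It assumes, as the query implies, that format('%t', t) renders a row as text like "(3, NULL)"; the function name grouped_columns is just borrowed from the column alias, and the sketch only makes sense for integer columns whose text contains no commas or parentheses.

```python
def grouped_columns(row: tuple) -> list:
    # Stand-in for format('%t', t): render the row as text, e.g. "(3, NULL)".
    text = "(" + ", ".join("NULL" if v is None else str(v) for v in row) + ")"
    # translate(text, '()', ''): delete the parentheses.
    text = text.replace("(", "").replace(")", "")
    # split(text, ', '), keep the non-NULL elements, and cast them back to integers.
    return [int(el) for el in text.split(", ") if el != "NULL"]


print(grouped_columns((1, 2)))     # [1, 2]
print(grouped_columns((3, None)))  # [3]
```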
My current solution, and I'm not a fan of it, is to use a combination of IFNULL(), UNNEST() and ARRAY() like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test
Alternatively, you can replace the NULL value with a non-NULL placeholder using IFNULL(null, 0), as shown below:
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

Select hardcoded values in Informix DB

I need to select hardcoded values as one column, so that I can join them with a table in an Informix DB. I have tried different variations of something like this:
select a from ( values (1), (2), (3) ) ;
And I expect to get results:
1
2
3
I think that in other databases this, or one of the other variations I tried, would return the values. However, in Informix it does not work.
Could anyone suggest a solution that works in Informix, please?
Although what Gordon Linoff suggests will certainly work, there are also more compact notations available using Informix-specific syntax.
For example:
SELECT a
FROM TABLE(SET{1, 2, 3}) AS t(a)
This will generate a list of integers quite happily (and succinctly). You can use LIST or MULTISET in place of SET. A MULTISET can have repeated elements, unlike a SET; a LIST preserves order as well as allowing repeats.
With just a few simple values in the list, you often won't notice that order is not being preserved. Order is not guaranteed for SET or MULTISET; if order matters, use LIST.
You can find information about this in the IBM Informix 12.10 manual under Collection Constructors. No, it isn't obvious how you get to it — I started at SELECT, then FROM, then 'Selecting from a collection variable' and thence to 'Expression'; I spent a few seconds staring blankly at that, then looked at 'Constructor expressions' and hence 'Collection Constructors'.
INSERT INTO cccmte_pp ( cmte, pref, nro, eje, id_tri, id_cuo, fecha, vto1, vto2, id_tit, id_suj, id_bie, id_gru )
SELECT *
FROM TABLE (MULTISET {
row('RC', 4, 10, 2020, 1, 5, MDY(05,20,2020), MDY(05,20,2020),MDY(05,27,2020),101, 1, 96, 1 ),
row('RC', 4, 11, 2020, 1, 5, MDY(05,20,2020), MDY(05,20,2020),MDY(05,27,2020),101, 1, 96, 1 )
})
AS t( cmte, pref, nro, eje, id_tri, id_cuo, fecha, vto1, vto2, id_tit, id_suj, id_bie, id_gru )
This is a simple solution for a bulk insert; the SELECT part takes care of the rest.
It's very simple! :) Enjoy.
Informix requires an actual query statement. I think this will work:
select a
from (select 1 as a from systables where tabid = 1 union all
select 2 as a from systables where tabid = 1 union all
select 3 as a from systables where tabid = 1
) t;

PostgreSQL count number of times substring occurs in text

I'm writing a PostgreSQL function to count the number of times a particular text substring occurs in another piece of text. For example, calling count('foobarbaz', 'ba') should return 2.
I understand that to test whether the substring occurs, I use a condition similar to the below:
WHERE 'foobarbaz' like '%ba%'
However, I need it to return 2 for the number of times 'ba' occurs. How can I proceed?
Thanks in advance for your help.
I would highly suggest checking out this answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows and then running the aggregate is simply too high.
The fastest way to do this is as follows...
SELECT
(length(str) - length(replace(str, replacestr, '')) )::int
/ length(replacestr)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr);
Here we:
1. Take the length of the string, L1.
2. Subtract from L1 the length of the string with all occurrences of the replacement removed, L2, to get the difference in length, L3.
3. Divide L3 by the length of the replacement to get the number of occurrences.
For comparison that's about five times faster than the method of using regexp_matches() which looks like this.
SELECT count(*)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');
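The length-difference arithmetic of the fast method can be checked with a short Python sketch. The function name count_occurrences is hypothetical, and the substring is assumed to be non-empty (an empty one would divide by zero, here as in the SQL).

```python
def count_occurrences(s: str, sub: str) -> int:
    # Length lost by deleting every (non-overlapping) occurrence of sub.
    removed = len(s) - len(s.replace(sub, ""))
    # Each deleted occurrence accounts for len(sub) characters.
    return removed // len(sub)


print(count_occurrences("foobarbaz", "ba"))  # 2
```

Because the replacement deletes non-overlapping occurrences from left to right, count_occurrences("aaaa", "aa") gives 2, not 3.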
How about using a regular expression:
SELECT count(*)
FROM regexp_matches('foobarbaz', 'ba', 'g');
The 'g' flag makes the function return every match in the string (not just the first).
There is a
str_count( src, occurrence )
function based on
SELECT (length( str ) - length(replace( str, occurrence, '' ))) / length( occurrence )
and a
str_countm( src, regexp )
based on the query mentioned by @MikeT
SELECT count(*) FROM regexp_matches( str, regexp, 'g')
available here: postgres-utils
Try with:
SELECT array_length (string_to_array ('1524215121518546516323203210856879', '1'), 1) - 1
--RESULT: 7

Selecting data into a Postgres array

I have the following data:
name id url
John 1 someurl.com
Matt 2 cool.com
Sam 3 stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.
You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(Postgres 9.5 added a variant of array_agg() that accepts array input and can.)
What you get out of @Matt Ball's query is an array of records (the_table[]).
An array can only hold elements of the same base type. You obviously have number and string types. Convert all columns (that aren't already) to text to make it work.
You can create an aggregate function for this like I demonstrated to you here before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array (2-dimensional, to be precise).
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][1] AS arr_element_1_1, t[3][2] AS arr_element_3_2
FROM x;
You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;