Extract a string in Postgresql and remove null/empty elements - sql

I Need to extract values from string with Postgresql
But for my special scenario - if an element value is null i want to remove it and bring the next element 1 index closer.
e.g.
assume my string is: "a$$b"
If i will use
select string_to_array('a$$b','$')
The result is:
{a,,b}
If Im trying
SELECT unnest(string_to_array('a__b___d_','_')) EXCEPT SELECT ''
It changes the order
1.d
2.a
3.b
order changes which is bad for me.
I have found a other solution with:
select array_remove( string_to_array(a||','||b||','||c,',') , '')
from (
select
split_part('a__b','_',1) a,
split_part('a__b','_',2) b,
split_part('a__b','_',3) c
) inn
Returns
{a,b}
And then from the Array - i need to extract values by index
e.g. Extract(ARRAY,2)
But this one seems to me like an overkill - is there a better or something simpler to use ?

You can use with ordinality to preserve the index information during unnesting:
select a.c
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null
order by idx;
If you want that back as an array:
select array_agg(a.c order by a.idx)
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null;

Related

How to get string from value

I have an Customer_value column.
The column contains values like:
DAL123245,HC.533675,ABC.01232423
HC.3425364,ABC.045367544,DAL4346456
HC.35344,ABC.03543645754,ABC.023534454,DAL.4356433
ABC.043534553,HC.3453643,ABC.05746343
What I am trying to do is get the number after the first "ABC.0" string.
For example, this is what I would like to get:
1232423
5367544
3543645754
43534553
this is what I tried:
Substring(customer_value,charindex('ABC.', customer_value) + 5, len(customer_value)) as dataneeded
The issue that I got is for 1 and 2 I got that right data as needed, but for 3 and 4, because there are multiple ABC so it gave me everything after the first ABC.
How can I get the number after the first ABC. only?
Thank you so much
Just another option is to use a bit of JSON to parse and preserve the sequence in concert with a CROSS APPLY
Note: Use OUTER APPLY to see NULL values
Example
Select NewVal = replace(Value,'ABC.0','')
From YourTable A
Cross Apply (
Select Top 1 *
From OpenJSON( '["'+replace(string_escape(customer_value,'json'),',','","')+'"]' )
Where Value like 'ABC.0%'
Order by [key]
) B
Results
NewVal
1232423
45367544
3543645754
43534553
On the assumption you are using SQL Server (given your use of charindex()/substring()/len()) you can use apply to calculate the starting position and then find the next occurence utilising the start position optional parameter of charindex, then get the substring between the values.
select Substring(customer_value, p1.v, Abs(p2.v-p1.v)) as dataneeded
from t
cross apply(values(charindex('ABC.', customer_value)+5))p1(v)
cross apply(values(charindex(',', customer_value,p1.v)))p2(v)

SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
When reading the documentation on BigQuery though, it says Nulls should be allowed.
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There doesn't seem to be any attributes or safe prefix to be used with ARRAY() to handle nulls.
So what is the best approach for this?
Per documentation - for Array type
Currently, BigQuery has two following limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, as of your example - you can use below "trick"
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
above gives below output
Note: above approach does not require explicit referencing to all involved columns!
My current solution---and I'm not a fan of it---is to use a combo of IFNULL(), UNNEST() and ARRAY() like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test
An alternative way, you can replace NULL value to some NON-NULL figures using function IFNULL(null, 0) as given below:-
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

How to get maximum value of a specific part of strings?

I have below records
Id Title
500006 FS/97/98/037
500007 FS/97/04/035
500008 FS/97/01/036
500009 FS/97/104/040
I should split Title field and get 4th part of text and return maximum value. In this example my query should return 040 or 40.
select max(cast(right(Title, charindex('/', reverse(Title) + '/') - 1) as int))
from your_table
SQLFiddle demo
You can use PARSENAME function since you always have 4 parts(confirmed in comments section)
select max(cast(parsename(replace(Title,'/','.'),1) as int))
from yourtable
If you want to split the data in the Title column and get the part from the splitted text by position, you may try with one JSON-based approach with a simple string transformation. You need to transform the data in the Title column into a valid JSON array (FS/97/98/037 into ["FS","97","08","037"]) and after that to parse thе data with OPENJSON(). The result from OPENJSON() (using default schema and parsing JSON array) is a table with columns key, value and type, and the key column holds the index of the items in the JSON array:
Note, that using STRING_SPLIT() is not an option here, because the order of the returned rows is not guaranteed.
Table:
CREATE TABLE Data (
Id varchar(6),
Title varchar(50)
)
INSERT INTO Data
(Id, Title)
VALUES
('500006', 'FS/97/98/037'),
('500007', 'FS/97/04/035'),
('500008', 'FS/97/01/036'),
('500009', 'FS/97/104/040')
Statement:
SELECT MAX(j.[value])
FROM Data d
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(d.Title, '/', '","'), '"]')) j
WHERE (j.[key] + 1) = 4
If you data has fixed format with 4 parts, even this approach may help:
SELECT MAX(PARSENAME(REPLACE(Title, '/', '.'), 1))
FROM Data
You can also try the below query.
SELECT Top 1
CAST('<x>' + REPLACE(Title,'/','</x><x>') + '</x>' AS XML).value('/x[4]','int') as Value
from Data
order by 1 desc
You can find the live demo Here.

SQL Summing digits of a number

i'm using presto. I have an ID field which is numeric. I want a column that adds up the digits within the id. So if ID=1234, I want a column that outputs 10 i.e 1+2+3+4.
I could use substring to extract each digit and sum it but is there a function I can use or simpler way?
You can combine regexp_extract_all from #akuhn's answer with lambda support recently added to Presto. That way you don't need to unnest. The code would be really self explanatory if not the need for cast to and from varchar:
presto> select
reduce(
regexp_extract_all(cast(x as varchar), '\d'), -- split into digits array
0, -- initial reduction element
(s, x) -> s + cast(x as integer), -- reduction function
s -> s -- finalization
) sum_of_digits
from (values 1234) t(x);
sum_of_digits
---------------
10
(1 row)
If I'm reading your question correctly you want to avoid having to hardcode a substring grab for each numeral in the ID, like substring (ID,1,1) + substring (ID,2,1) + ...substring (ID,n,1). Which is inelegant and only works if all your ID values are the same length anyway.
What you can do instead is use a recursive CTE. Doing it this way works for ID fields with variable value lengths too.
Disclaimer: This does still technically use substring, but it does not do the clumsy hardcode grab
WITH recur (ID, place, ID_sum)
AS
(
SELECT ID, 1 , CAST(substring(CAST(ID as varchar),1,1) as int)
FROM SO_rbase
UNION ALL
SELECT ID, place + 1, ID_sum + substring(CAST(ID as varchar),place+1,1)
FROM recur
WHERE len(ID) >= place + 1
)
SELECT ID, max(ID_SUM) as ID_sum
FROM recur
GROUP BY ID
First use REGEXP_EXTRACT_ALL to split the string. Then use CROSS JOIN UNNEST GROUP BY to group the extracted digits by their number and sum over them.
Here,
WITH my_table AS (SELECT * FROM (VALUES ('12345'), ('42'), ('789')) AS a (num))
SELECT
num,
SUM(CAST(digit AS BIGINT))
FROM
my_table
CROSS JOIN
UNNEST(REGEXP_EXTRACT_ALL(num,'\d')) AS b (digit)
GROUP BY
num
;

pgsql parse string to get a string after certain position

I have a table column that has data like
NA_PTR_51000_LAT_CO-BOGOTA_S_A
NA_PTR_51000_LAT_COL_M_A
NA_PTR_51000_LAT_COL_S_A
NA_PTR_51000_LAT_COL_S_B
NA_PTR_51000_LAT_MX-MC_L_A
NA_PTR_51000_LAT_MX-MTY_M_A
I want to parse each column value so that I get the values in column_B. Thank you.
COLUMN_A COLUMN_B
NA_PTR_51000_LAT_CO-BOGOTA_S_A CO-BOGOTA
NA_PTR_51000_LAT_COL_M_A COL
NA_PTR_51000_LAT_COL_S_A COL
NA_PTR_51000_LAT_COL_S_B COL
NA_PTR_51000_LAT_MX-MC_L_A MX-MC
NA_PTR_51000_LAT_MX-MTY_M_A MX-MTY
I'm not sure of the Postgresql and I can't get SQL fiddle to accept the schema build...
substring and length may vary...
Select Column_A, substr(columN_A,18,length(columN_A)-17-4) from tableName
Ok how about this then:
http://sqlfiddle.com/#!15/ad0dd/56/0
Select column_A, b
from (
Select Column_A, b, row_number() OVER (ORDER BY column_A) AS k
FROM (
SELECT Column_A
, regexp_split_to_table(Column_A, '_') b
FROM test
) I
) X
Where k%7=5
Inside out:
Inner most select simply splits the data into multiple rows on _
middle select adds a row number so that we can use the use the mod operator to find all occurances of a 5th remainder.
This ASSUMES that the section of data you're after is always the 5th segment AND that there are always 7 segments...
Use regexp_matches() with a search pattern like 'NA_PTR_51000_LAT_(.+)_'
This should return everything after NA_PTR_51000_LAT_ before the next underscore, which would match the pattern you are looking for.