BQ function to transform caron character from a string to character without caron - google-bigquery

I am trying to get rid of a caron characters (č,ć,š,ž) and transform them to c,c,s,z :
with my_table as (
select 'abcčd sštuv' caron, 'abccd sstuv' no_caron union all
select 'uvzž', 'uvzz' union all
select 'ijkl cČd', 'ijkl cCd' union ALL
select 'Ćdef', 'Cdef'
)
SELECT *
FROM my_table
I was trying to get rid of them with
SELECT *,
regexp_replace(caron, r'š', 's') as no_caron
FROM my_table
but I think this is inefficient. I know there is an option to write your on function as described
here, but I have no idea how to use it in my case.
Thanks in advance!

Use below
SELECT *, regexp_replace(normalize(caron, NFD), r"\pM", '') output
FROM my_table
if applied to sample data in your question - output is

Related

How to fin and extract substring BIGQUERY

A have a string column at BigQuery table for example:
name
WW_for_all_feed
EU_param_1_for_all_feed
AU_for_all_full_settings_18+
WW_for_us_param_5_for_us_feed
WW_for_us_param_5_feed
WW_for_all_25+
and also have a list of variables, for example :
param_1_for_all
param_5_for_us
param_5
full_settings
And if string at column "name" contains one of this substrings needs to extract it :
name
param
WW_for_all_feed
None
EU_param_1_for_all_feed
param_1_for_all
AU_for_all_full_settings_18+
full_settings
WW_for_us_param_5_for_us_feed
param_5_for_us
WW_for_us_param_5_feed
param_5
WW_for_all_25+
None
I want to try regexp and replace, but don't know pattern for find substring
Use below
select name, param
from your_table
left join params
on regexp_contains(name, param)
if apply to sample data as in your question
with your_table as (
select 'WW_for_all_feed' name union all
select 'EU_param_1_for_all_feed' union all
select 'AU_for_all_full_settings_18+' union all
select 'WW_for_us_param_5_for_us_feed' union all
select 'WW_for_all_25+'
), params as (
select 'param_1_for_all' param union all
select 'param_5_for_us' union all
select 'full_settings'
)
output is
but I have an another issue (updated question) If one of params is substring for another?
use below then
select name, string_agg(param order by length(param) desc limit 1) param
from your_table
left join params
on regexp_contains(name, param)
group by name
if applied to your updated data sample - output is

sql repeat regex pattern unlimited times

I need to select where column contains numbers only and ends with a hyphen
I'm running SQL Server Management Studio v17.9.1
I have tried:
select * from [table] where [column] like '[0-9]*-'
select * from [table] where [column] like '[0-9]{1,}-'
select * from [table] where [column] like '[0-9]{1,2}-'
none of these work. The expression ([0-9]*-) works in any regex tester I've run it against, SQL just doesn't like it, nor the other variations I've found searching.
You can filter where any but the last character are not numbers and the last is a dash. DATALENGTH/2 assumes NVARCHAR type. If you're using VARCHAR, just use DATALENGTH
SELECT
*
FROM
[table]
WHERE
[column] like '%-'
AND
LEFT([column], (datalength([column])/2)-1) NOT LIKE '%[^0-9]%'
SQL Server does not support regular expressions -- just very limited extensions to like functionality.
One method is:
where column like '%-' and
column not like '%[^0-9]%-'
You can use left() and right() functions as below :
with [table]([column]) as
(
select '1234-' union all
select '{123]' union all
select '1234' union all
select '/1234-' union all
select 'test-' union all
select '1test-' union all
select '700-'
)
select *
from [table]
where left([column],len([column])-1) not like '%[^0-9]%'
and right([column],1)='-';
column
------
1234-
700-
Demo

Regex pattern inside REPLACE function

SELECT REPLACE('ABCTemplate1', 'Template\d+', '');
SELECT REPLACE('ABC_XYZTemplate21', 'Template\d+', '');
I am trying to remove the part Template followed by n digits from a string. The result should be
ABC
ABC_XYZ
However REPLACE is not able to read regex. I am using SQLSERVER 2008. Am I doing something wrong here? Any suggestions?
SELECT SUBSTRING('ABCTemplate1', 1, CHARINDEX('Template','ABCTemplate1')-1)
or
SELECT SUBSTRING('ABC_XYZTemplate21',1,PATINDEX('%Template[0-9]%','ABC_XYZTemplate21')-1)
More generally,
SELECT SUBSTRING(column_name,1,PATINDEX('%Template[0-9]%',column_name)-1)
FROM sometable
WHERE PATINDEX('%Template[0-9]%',column_name) > 0
You can use substring with charindex or patindex if the pattern being looked for is fixed.
select SUBSTRING('ABCTemplate1',1, CHARINDEX ( 'Template' ,'ABCTemplate1')-1)
My answer expects that "Template" is enough to determine where to cut the string:
select LEFT('ABCTemplate1', CHARINDEX('Template', 'ABCTemplate1') - 1)
Using numbers table..
;with cte
as
(select 'ABCTemplate1' as string--this can simulate your table column
)
select * from cte c
cross apply
(
select replace('ABCTemplate1','template'+cast(n as varchar(2)),'') as rplcd
from
numbers
where n<=9
)
b
where c.string<>b.rplcd
Using Recursive CTE..
;with cte
as
(
select cast(replace('ABCTemplate21','template','') as varchar(100)) as string,0 as num
union all
select cast(replace(string,cast(num as varchar(2)),'') as varchar(100)),num+1
from cte
where num<=9
)
select top 1 string from cte
order by num desc

Oracle : how to order hexadecimal field

Silly question but I can't find a reasonable answer.
I need to order a field containing hexadecimal values like :
select str from
(
select '2212A' str from dual union all
select '2212B' from dual union all
select '22129' from dual union all
select '22127' from dual union all
select '22125' from dual union all
select '22126' from dual
) t
order by str asc;
This request give :
STR
------------
2212A
2212B
22125
22126
22127
22129
I would like
STR
------------
22125
22126
22127
22129
2212A
2212B
How can I do that ?
Are these HEX numbers? Will the max letter be F? Then convert hex to decimal:
select str
from t
order by to_number(str,'XXXXXXXXXXXX');
EDIT: Stupid me. The title says it's hex numbers :P So this solution should work for you.
You have to clarify further what you want to achieve, but in general, you can sort your table on that column and the first row is the one with smallest value:
SELECT * FROM mytable ORDER BY mycolumn
If you want one record only:
SELECT * FROM mytable ORDER BY mycolumn WHERE rownum = 1
You can try something like a try_convert() in case the conversion fails.
SELECT * FROM table_name ORDER BY TRY_CONVERT(VARBINARY, column_name) ASC;
Hope this helps!

Help with T-sql special sorting rules

I have a field like:
SELECT * FROM
(
SELECT 'A9t' AS sortField UNION ALL
SELECT 'A10t' UNION ALL
SELECT 'A11t' UNION ALL
SELECT 'AB9F' UNION ALL
SELECT 'AB10t' UNION ALL
SELECT 'AB11t'
) t ORDER BY sortField
and the result is:
sortField
---------
A10t
A11t
A9t
AB10t
AB11t
AB9F
Actually I need is to combine the string and number sorting rules:
sortField
---------
A9t
A10t
A11t
AB9F
AB10t
AB11t
SELECT *
FROM (
SELECT 'A9t' AS sortField UNION ALL
SELECT 'A10t' UNION ALL
SELECT 'A11t' UNION ALL
SELECT 'AB9F' UNION ALL
SELECT 'AB10t' UNION ALL
SELECT 'AB11t'
)
t
ORDER BY LEFT(sortField,PATINDEX('%[0-9]%',sortField)-1) ,
CAST(substring(sortField,PATINDEX('%[0-9]%',sortField),1 + PATINDEX('%[0-9][A-Z]%',sortField) -PATINDEX('%[0-9]%',sortField) ) AS INT),
substring(sortField,PATINDEX('%[0-9][A-Z]%',sortField) + 1,LEN(sortField))
If the first character is always a letter, try:
SELECT * FROM
(
SELECT 'A9t' AS sortField UNION ALL
SELECT 'A10t' UNION ALL
SELECT 'A11t'
) t ORDER BY substring(sortField,2,len(sortField)-1) desc
I would say that you have combined the alpha and numeric sort. But what I think you're asking is that you want to sort letters in ascending order and numbers in descending order, and that might be hard to do in a nice looking way. The previous answers will not working for your problem, the problem is that Martin Smith's solution doesn't take strings with two letters as prefix and Parkyprg doesn't sort numbers before letters as you ask for.
What you need to do is to use a custom order, see example here: http://www.emadibrahim.com/2007/05/25/custom-sort-order-in-a-sql-statement/, but that is a tedious way to do it.
EDIT: Martins Smith's solution is updated and works just fine!