Incorrect regex_replace while decoding to Hebrew

The code below is not working because my regex_replace is not handling white spaces as it should. Help!
CREATE TEMP FUNCTION
decode(word string) AS ((
SELECT
IF
(STARTS_WITH(word, '&#x'),
safe.code_points_to_STRING(ARRAY(
SELECT
CAST(value AS int64)
FROM
UNNEST(SPLIT(REPLACE(REGEXP_REPLACE(word, '[^a-zA-Z0-9&#]', ''), '&#', '0'),';')) value
WHERE
NOT value = '' )),
word) ));
WITH
DATA AS (
SELECT
'שבחים לסוקולובסקי, האריס: ידענו שתהיה מלחמה' txt )
SELECT
(
SELECT
STRING_AGG(decode(word), ' ')
FROM
UNNEST(SPLIT(txt, ' ')) word ) AS Hebrew_txt
FROM
DATA ;
The expected result: שבחים לסוקולובסקי, האריס: ידענו שתהיה מלחמה

Consider below
create temp function decode(word string) as ((
select if(starts_with(word, '&#x'),
safe.code_points_to_string(array(
select ifnull(safe_cast(value as int64), ascii(value))
from unnest(split(replace(word, '&#', '0'),';')) value
where not value = ''
)),
word)
));
select (
select string_agg(decode(word), ' ')
from unnest(split(txt, ' ')) word
) as Hebrew_txt
from data
if applied to sample data in your question - output is
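To see why the ifnull(safe_cast(...), ascii(...)) fallback keeps trailing punctuation that the question's REGEXP_REPLACE stripped out, here is a rough Python re-implementation of the same per-word logic. It is only a sketch: the function name mirrors the SQL, `int(value, 0)` is an approximation of how BigQuery casts strings such as '0x5E9' to INT64, and the sample word is made up for illustration.

```python
def decode(word: str) -> str:
    """Approximate the SQL decode(): turn '&#x5E9;&#x5DC;,' into real characters."""
    if not word.startswith("&#x"):
        return word  # not an encoded word, return unchanged

    code_points = []
    # '&#x5E9;&#x5DC;,' -> '0x5E9;0x5DC;,' -> ['0x5E9', '0x5DC', ',']
    for value in word.replace("&#", "0").split(";"):
        if value == "":
            continue
        try:
            # Assumption: stands in for SAFE_CAST(value AS INT64) on hex strings.
            code_points.append(int(value, 0))
        except ValueError:
            # Stands in for ASCII(value): code of the first character,
            # so punctuation such as ',' or ':' survives.
            code_points.append(ord(value[0]))
    return "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    print(decode("&#x5E9;&#x5DC;&#x5D5;&#x5DD;,"))
```

The point of the fallback is that a piece like ',' cannot be cast to a number, so instead of being dropped it is converted to its own character code and emitted as-is.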

Related

PostgreSQL: Where condition using the type of elements in the Array

I have a column 'zips' of type 'text' in the table parcels.
Users can enter a single zip code, multiple comma-separated zip codes, or a range of zip codes separated by a hyphen.
Examples of possible data:
'10001'
'10002,10010,10015'
'10001,"10010-10025"'
I need to match the records that contain the zip code '10015', e.g.:
select *
from parcels
where '10015' = ANY(string_to_array(parcels.zips, ','))
The above code works for the comma-separated zips, but I am not sure how to deal with the ranges.
I am looking for something like
select *
from parcels
where (
loop through `string_to_array(parcels.zips, ',')` and, if the iterating
variable contains '-', then 'where 10015 BETWEEN 10010 AND 10025'.
ELSE, if the zip doesn't contain '-', then '10015' = '10001 (other elements in the array)'
)
and combine the loop conditions with OR

Try this:
SELECT *
FROM parcels p
CROSS JOIN LATERAL regexp_split_to_table (p.zips, ',') AS z
WHERE CASE
WHEN strpos (z, '-') > 0
THEN '10015' BETWEEN split_part (z, '-', 1) AND split_part (z, '-', 2)
ELSE z = '10015'
END

You can unnest the elements of the column and use them in an EXISTS condition that checks for ranges:
select *
from parcels p
where exists (select *
from (
select split_part(trim(both '"' from z.zip), '-', 1) as from_zip,
split_part(trim(both '"' from z.zip), '-', 2) as to_zip
from unnest(string_to_array(p.zip_codes, ',')) as z(zip)
) x
where (x.to_zip = '' and x.from_zip = '10015')
or (x.to_zip <> '' and '10015' between x.from_zip and coalesce(x.to_zip, '10015'))
);
I would put this into a function to make that easier:
create function contains_zip(p_codes text, p_zip_code text)
returns boolean
as
$$
select exists
(select *
from (
select split_part(trim(both '"' from z.zip), '-', 1) as from_zip,
split_part(trim(both '"' from z.zip), '-', 2) as to_zip
from unnest(string_to_array(p_codes, ',')) as z(zip)
) x
where (x.to_zip = '' and x.from_zip = p_zip_code)
or (x.to_zip <> '' and p_zip_code between x.from_zip and coalesce(x.to_zip, p_zip_code))
);
$$
language sql
immutable;
Then it is as easy as:
select *
from parcels p
where contains_zip(p.zip_codes, '10015');
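The matching rule behind contains_zip can also be sketched outside the database. The Python below is an illustrative approximation, not a translation of the SQL: it reuses the function name for readability, strips the double quotes around range entries, and (unlike the string comparison in the SQL) compares range bounds as integers, which assumes every zip code is purely numeric.

```python
def contains_zip(codes: str, zip_code: str) -> bool:
    """Return True if zip_code appears in codes, either directly or inside a range."""
    for item in codes.split(","):
        item = item.strip().strip('"')        # '"10010-10025"' -> '10010-10025'
        start, sep, end = item.partition("-")
        if sep:
            # Range entry: assumption is that both bounds are numeric.
            if int(start) <= int(zip_code) <= int(end):
                return True
        elif item == zip_code:
            # Single zip entry: exact match.
            return True
    return False


if __name__ == "__main__":
    for zips in ['10001', '10002,10010,10015', '10001,"10010-10025"']:
        print(zips, contains_zip(zips, "10015"))
```

Comparing as integers avoids surprises if the zip codes ever differ in length; for fixed-width five-digit codes the string comparison used in the SQL gives the same answer.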

Try to get your data into the format you need, with some CTEs:
with _data as (
select * from (values(1,'10001'),(2,'10002,10010,10015'), (3,'10001,"10010-10025"')) as _vals (i,x)
),
_data2 as (
select
i,
unnest(string_to_array(x,','))as x
from _data
),
_data3 as (
select
i,
x,
replace(split_part(x,'-',1),'"','') as x1,
replace(split_part(x,'-',2),'"','') as x2
from _data2
)
select * from _data3
where
case when x2 = '' then x1::int = 10015 end
or
case when x2 <> '' then 10015 between(x1::int) and (x2::int) end

TRIM in bigquery

I want to apply the TRIM function to my columns, but TRIM applied after the FORMAT function does not work: it does not trim the spaces.
If I apply it before FORMAT as below, I get a data type error, because the columns have data types other than STRING and BYTES as well.
Please suggest a solution.

In the meantime, you can apply some extra processing on top of the original query to get the desired result, as in the example below
select *,
trim(replace(regexp_replace(format('%t', t), r' *, *| *\)|\( *', '/'), '/NULL/', '/_/'), '/') HashColumn
from your_table t
if applied to sample data
with your_table as (
select ' 1' A, '2 ' B, null C, 4 D union all
select ' 12 ', null, '4', 5
)
output is
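To make the regular expression in this query easier to follow, here is a small Python sketch of the same three steps: turn the separators and parentheses (with their surrounding spaces) into '/', swap NULL for '_', and trim the outer slashes. It assumes that format('%t', t) renders a row roughly as `( 1, 2 , NULL, 4)`; that input shape and the helper name `hash_column` are assumptions made for illustration.

```python
import re


def hash_column(formatted_row: str) -> str:
    """Mimic trim(replace(regexp_replace(...), '/NULL/', '/_/'), '/')."""
    # Separators and parentheses, together with adjacent spaces, become '/'.
    slashed = re.sub(r" *, *| *\)|\( *", "/", formatted_row)
    # A NULL field is shown as '_' (as in the SQL, this is a plain replace).
    slashed = slashed.replace("/NULL/", "/_/")
    # Drop the leading and trailing '/' left over from the parentheses.
    return slashed.strip("/")


if __name__ == "__main__":
    print(hash_column("( 1, 2 , NULL, 4)"))
    print(hash_column("( 12 , NULL, 4, 5)"))
```

Because the spaces are consumed as part of each separator match, no separate TRIM per column is needed, which sidesteps the data type error mentioned in the question.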

Consider below approach
create temp function json_extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));""";
select *,
( select string_agg(trim(value), '/')
from unnest(json_extract_values(replace(to_json_string(t), ':null', ':"_"'))) value
) as HashColumn
from your_table t
if applied to dummy data as below
with your_table as (
select ' 1' A, '2 ' B, null C, 4 D union all
select ' 12 ', null, '4', 5
)
output is
which, I hope, is exactly what you are looking for

Unicode not translating correctly for Right-to-left languages (Hebrew and Arabic)

The BigQuery code below, provided by Mikhail Berlyant (thank you again!), works well on left-to-right languages such as Russian. However, it fails on right-to-left languages such as Arabic and Hebrew whenever there is a double quotation mark (") inside the text to be translated. The expected result should show all of the input text decoded, with no undecoded Unicode character codes left in the translation. Thanks!
CREATE TEMP FUNCTION
decode(word string) AS ((
SELECT
IF
(STARTS_WITH(word, '&#x'),
safe.code_points_to_STRING(ARRAY(
SELECT
ifnull(SAFE_CAST(value AS int64),
ASCII(value))
FROM
UNNEST(SPLIT(REPLACE(word, '&#', '0'),';')) value
WHERE
NOT value = '' )),
word) ));
WITH
DATA AS (
SELECT
'Arabic' AS lang,
'https://www.bbc.com/arabic/vert-fut-57352011' AS url,
'هل قوام "الساعة الرملية" يكسب المرأة جاذبية أكبر؟' AS title
UNION ALL
SELECT
'Arabic',
'https://www.bbc.com/arabic/world-57356844',
'الكنيسة الكاثوليكية: كاردينال ألماني يقدم استقالته للبابا فرانسيس بسبب "الفشل" في التصدي للانتهاكات الجنسية بحق الأطفال'
UNION ALL
SELECT
'Arabic',
'https://arabic.cnn.com/world/article/2021/06/04/munich-cardinal-submits-resignation-pope',
'كاردينال ميونيخ يقدم استقالته لبابا الفاتيكان بسبب "كارثة الاعتداء الجنسي" بالكنيسة الكاثوليكية'
UNION ALL
SELECT
'Arabic',
'https://alghad.com/%D9%88%D8%AA%D8%AA%D8%AC%D8%AF%D8%AF-%D8%A7%D9%84%D8%A2%D9%85%D8%A7%D9%84-%D8%A8%D8%B1%D8%AD%D9%8A%D9%84-%D9%86%D8%AA%D9%86%D9%8A%D8%A7%D9%87%D9%88-%D8%A7%D9%84%D9%85%D8%B1%D8%AA%D9%82%D8%A8-%D9%84/',
'وتتجدد الآمال برحيل نتنياهو المرتقب لكن بحذر'
UNION ALL
SELECT
'Arabic',
'https://alghad.com/%D9%81%D9%88%D8%A8%D9%8A%D8%A7-%D8%A7%D9%84%D8%A7%D8%B1%D8%AA%D8%A8%D8%A7%D8%B7-%D8%A7%D9%86%D8%B3%D8%AD%D8%A7%D8%A8-%D9%88%D9%87%D8%B1%D9%88%D8%A8-%D9%81%D9%8A-%D8%A7%D9%84%D9%84%D8%AD%D8%B8/',
'"فوبيا الارتباط".. انسحاب وهروب في اللحظات الأخيرة'
UNION ALL
SELECT
'Hebrew',
'https://www.srugim.co.il/568917-%D7%9C%D7%99%D7%90%D7%95%D7%9F-%D7%91%D7%90%D7%96%D7%9B%D7%A8%D7%94-%D7%9C%D7%A8%D7%91-%D7%90%D7%9C%D7%99%D7%94%D7%95-%D7%AA%D7%95%D7%A8%D7%AA%D7%95-%D7%94%D7%99%D7%99%D7%AA%D7%94-%D7%9B%D7%95%D7%9C',
'ליאון באזכרה לרב אליהו: "תורתו הייתה כולה של י-ם"'
UNION ALL
SELECT
'Hebrew',
'https://celebs.walla.co.il/item/3439567',
'הכוכבת הע-נ-קית שנתפסה בצעירותה עם סמים'
UNION ALL
SELECT
'Hebrew',
'https://www.srugim.co.il/568906-%D7%90%D7%9C%D7%A4%D7%99-%D7%9E%D7%A4%D7%92%D7%99%D7%A0%D7%99%D7%9D-%D7%9E%D7%95%D7%9C-%D7%91%D7%99%D7%AA%D7%94-%D7%A9%D7%9C-%D7%A9%D7%A7%D7%93-%D7%90%D7%9C-%D7%AA%D7%9C%D7%9B%D7%99-%D7%A0%D7%92',
'אלפי מפגינים מול ביתה של שקד: "אל תלכי נגד ישראל"'
UNION ALL
SELECT
'Russian',
'https://tass.ru/kultura/11559939',
'Фестиваль "Theatrum" открывается в Новом Манеже в Москве'),
data2 AS (
SELECT
(
SELECT
AS STRUCT url AS url,
lang AS lang,
STRING_AGG(decode(word), ' ') AS translation
FROM
UNNEST(SPLIT(title, ' ')) word ) AS Foreign_txt
FROM
DATA )
SELECT
Foreign_txt.lang AS lang,
Foreign_txt.translation AS translation,
Foreign_txt.url AS url
FROM
data2;

Consider below example
create temp function decode(word string) as ((
select if(starts_with(word, '&#x'),
safe.code_points_to_string(array(
select ifnull(safe_cast(value as int64), ascii(value))
from unnest(split(replace(word, '&#', '0'),';')) value
where not value = ''
)),
word)
));
select id, lang,
( select string_agg(decode(chars), '' order by offset)
from unnest(regexp_extract_all(title, r'(?:&#x.{3};)+|[^&]+')) chars with offset
) as translate
from data
if applied to sample data in your question - output is
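The key change in this answer is that the title is no longer split on spaces; regexp_extract_all cuts it into runs of `&#x...;` entities and runs of ordinary text, so a quotation mark glued to an encoded word ends up in its own chunk instead of hiding the `&#x` prefix. A minimal Python sketch of that tokenization follows. It assumes, as the SQL pattern does, that every entity has exactly three characters after `&#x`; the helper names and the sample title are hypothetical.

```python
import re

# Same pattern as in the SQL: runs of '&#xHHH;' entities, or runs without '&'.
TOKEN = re.compile(r"(?:&#x.{3};)+|[^&]+")


def decode_run(run: str) -> str:
    """Decode one chunk: entity runs become characters, other text is kept."""
    if not run.startswith("&#x"):
        return run
    # Assumption: the three characters after '&#x' are valid hex digits.
    return "".join(chr(int(h, 16)) for h in re.findall(r"&#x(.{3});", run))


def decode_title(title: str) -> str:
    """Split into chunks, decode each, and glue them back in order."""
    return "".join(decode_run(run) for run in TOKEN.findall(title))


if __name__ == "__main__":
    sample = '&#x5E9;&#x5DC;&#x5D5;&#x5DD; "&#x5D9;-&#x5DD;"'
    print(TOKEN.findall(sample))
    print(decode_title(sample))
```

In the sample, the opening quote lands in the chunk `' "'` and the encoded letters after it form their own chunk, which is exactly the case the space-based split could not handle.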

regexReplace using select

I want to use a regular expression to remove special characters (!, ", #, $, %, &, /, (, ), =, ?, |) from a table
SELECT
'|R!$#&2-_D%2' as Original,
UPPER
(
REPLACE
(
( MDS_Demo.mdq.regexReplace
('|R!2- _D%2',
'[!|”#$%&/()=?»«;,:._]', '', 0
)
)
, ' ', ' '
)
) as Correct
The characters and words to remove are listed in a table, so instead of hard-coding the list in the expression I want to use a SELECT against the table that lists all the special characters to be removed.
SELECT
'|R!$#&2-_D%2' as Original,
UPPER(REPLACE((MDS_Demo.mdq.regexReplace('|R!2- _D%2',
< SELECT SPECIAL_CHARACTERS FROM TABLE01 >
, '', 0)), ' ', ' ') ) as Correct
Any suggestions?

I believe you can replace any string expression with a (SELECT ...), provided it returns a single value,
e.g. SELECT ltrim( (SELECT ' trimmed') ) as test works here:
http://sqlfiddle.com/#!6/8222f/4
So where you have your < SELECT SPECIAL_CHARACTERS FROM TABLE01 >, just put the required SELECT inside parentheses and you're good to go.
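If the table stores one special character per row rather than a ready-made pattern, the rows first have to be combined into a single character class before they can serve as the pattern argument. The Python sketch below illustrates that idea only; the one-character-per-row layout and the helper names are assumptions, and the question does not say how TABLE01 is actually organized.

```python
import re


def build_char_class(chars):
    """Combine individual characters into one regex character class."""
    # re.escape keeps characters such as '|' or '$' literal inside the class.
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def clean(text, chars):
    """Remove every listed character from text and upper-case the result."""
    if not chars:
        return text.upper()  # nothing to remove; avoids an empty '[]' pattern
    return re.sub(build_char_class(chars), "", text).upper()


if __name__ == "__main__":
    # Hypothetical rows as they might come back from the lookup table.
    special = ["!", "|", "#", "$", "%", "&", "/", "(", ")", "=", "?", "_"]
    print(clean("|R!2- _D%2", special))
```

Escaping each character before joining matters because several of the listed characters have a special meaning in regular expressions.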

How to return string if no space is found

I need to return everything in a string before the first space:
select Substring('stack overflow', 1, CharIndex( ' ', 'stack overflow' ) - 1)
This yields stack.
However, if there is no space in the data, I would like to return the entire string:
select Substring('stackoverflow', 1, CharIndex( ' ', 'stackoverflow' ) - 1)
I would like that to return stackoverflow.
What is the correct way to handle this scenario?

;WITH T(C) AS
(
SELECT 'stack overflow' UNION ALL
SELECT 'stackoverflow'
)
SELECT LEFT(C, CharIndex( ' ', C + ' ' ) - 1)
FROM T
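The trick in this query is the `C + ' '`: appending a space guarantees that the search always finds one, so a string without spaces is cut at its own end. The same sentinel idea in Python, as a small illustrative sketch (the function name is made up):

```python
def first_word(text: str) -> str:
    """Return everything before the first space, or the whole string if none."""
    # Searching in text + " " always succeeds, mirroring CharIndex(' ', C + ' ').
    return text[: (text + " ").index(" ")]


if __name__ == "__main__":
    print(first_word("stack overflow"))
    print(first_word("stackoverflow"))
```

This avoids both a CASE expression and any assumption about the maximum string length.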

Borrowing @MartinSmith's definition:
;WITH T(C) AS
(
SELECT 'stack overflow'
UNION ALL
SELECT 'stackoverflow'
)
SELECT SUBSTRING(C, 1, COALESCE(NULLIF(CHARINDEX(' ', C)-1, -1), 255))
FROM T;
That said, I prefer Martin's. Both avoid checking the length or performing CASE etc., but mine assumes you know the max length of the string (here I assumed 255).

I'm late as per usual; here on SQL Fiddle
CREATE TABLE The_Table
(
TestString varchar(50)
);
INSERT INTO The_Table
(TestString)
VALUES
('stack overflow'),
('stackoverflow');
select
[myResult] = case
when CharIndex( ' ', TestString)> 0 then Substring(TestString, 1, CharIndex( ' ', TestString ) - 1)
when CharIndex( ' ', TestString)= 0 then TestString
else TestString
end
from The_Table

;With T(C) AS
(
Select 'Stack'
Union All
Select 'Stack OverFlow'
)
Select
Case When CharIndex(' ', C) > 0
Then SUBSTRING(C, 0, CharIndex(' ', C))
Else
C
End
From T

Declare @str varchar(100)='stack overflow'
SELECT CASE WHEN CHARINDEX(' ',@str,1) > 0 then LEFT(@str,CHARINDEX(' ',@str,1) - 1) else @str END
