Big Query Regexp_Extract using Google Analytics url - google-bigquery

How do I extract the id parameter below using BigQuery REGEXP_EXTRACT? I have some rows with page URLs in them that look similar to:
url.com/id=userIDmadeUPofletterandnumbers&em=MemberType
e.g. url.com/id=asd1221231sf&em=studentMember
I have tried using:
a. REGEXP_EXTRACT(urlValue,"id=\w+") as Idvalue but I get the error message:
Invalid string literal: "id=\w+"
I am pretty close with this: REGEXP_EXTRACT(urlValue,"(id=.*&em)"). However, it returns id=asd1221231sf&em, and I want to exclude the id= at the start and the &em at the end.

#standardSQL
WITH `project.dataset.table` AS (
SELECT 'url.com/id=userIDmadeUPofletterandnumbers&em=MemberType' urlValue UNION ALL
SELECT 'url.com/id=asd1221231sf&em=studentMember'
)
SELECT REGEXP_EXTRACT(urlValue, r'id=(\w+)') id, urlValue
FROM `project.dataset.table`
Row id urlValue
1 userIDmadeUPofletterandnumbers url.com/id=userIDmadeUPofletterandnumbers&em=MemberType
2 asd1221231sf url.com/id=asd1221231sf&em=studentMember
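The two ingredients of the query above, a raw string literal and a capture group, can also be tried outside BigQuery. The following is a minimal Python sketch, assuming Python's re module behaves like BigQuery's regex engine for this simple pattern; extract_id is a hypothetical helper name, not part of any BigQuery API.

```python
import re

# Raw string so the backslash in \w reaches the regex engine untouched,
# mirroring r'id=(\w+)' in the query above.
ID_PATTERN = re.compile(r"id=(\w+)")


def extract_id(url_value):
    """Return the text captured after 'id=', or None when nothing matches."""
    match = ID_PATTERN.search(url_value)
    # group(1) is only the parenthesised part, so 'id=' itself is excluded.
    return match.group(1) if match else None


print(extract_id("url.com/id=asd1221231sf&em=studentMember"))  # asd1221231sf
```

Because \w does not match &, the capture stops right before &em without any extra trimming.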

Related

How to extract text from jsonb array in PostgreSQL

I want to be able to extract text from jsonb array in a new column as text.
This is my SQL table.
Id ErrorCode
101 ["exit code: 1"]
102 ["exit code: 3"]
103 ["OOMKILLED"]
This is my column definition: '[]'::jsonb
I need help understanding which SELECT command I could use in this case. I used the query below, but with no success.
Select reasons -> ' ' as TTT from my_table
I want to get the results below from a SELECT command, so I can apply SQL filters like
where error = 'exit code: 1' or where error = 'OOMKILLED'
Id Error
101 exit code: 1
102 exit code: 3
103 OOMKILLED
Use ->> to extract the first array element as text:
Select id, reasons ->> 0 as reason
from my_table
where reasons ->> 0 in ('exit code: 1','OOMKILLED');
If you don't want to repeat the expression use a derived table:
select *
from (
select id, reasons ->> 0 as reason
from my_table
) t
where reason in ('exit code: 1','OOMKILLED');
Try this:
SELECT Id, jsonb_array_elements_text(ErrorCode :: jsonb) AS Error
FROM my_table
For more info about json functions see the manual
When you need to check whether a JSONB array contains some value, just use the ? operator:
select * from t where err ? 'OOMKILLED';
https://sqlize.online/s/eW
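To make the semantics of the two operators concrete, here is a rough Python sketch of what `->> 0` and `?` do for an array of strings. It assumes the column holds a JSON array of strings, as in the question; first_reason and contains_reason are hypothetical helper names, and the sketch only mirrors the behaviour, it does not talk to PostgreSQL.

```python
import json


def first_reason(error_code_json):
    """Rough analogue of `reasons ->> 0`: the first array element, or None if empty."""
    items = json.loads(error_code_json)
    return items[0] if items else None


def contains_reason(error_code_json, reason):
    """Rough analogue of `err ? 'OOMKILLED'`: is the string a top-level array element?"""
    return reason in json.loads(error_code_json)


# Sample rows copied from the question.
rows = {101: '["exit code: 1"]', 102: '["exit code: 3"]', 103: '["OOMKILLED"]'}
wanted = {"exit code: 1", "OOMKILLED"}

matching_ids = [row_id for row_id, raw in rows.items() if first_reason(raw) in wanted]
print(matching_ids)  # [101, 103]
```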

SparkSQL regexp_extract function java error

I am trying to extract the id's starting with srsa from the table structure below
id reason_text_field
34394 {"initial_customer":"sda_WWyfr4AXY1fIAS", "customer_result":"srsa_CAkAaAvNKL2OSD"}
in order to get the following output:
id srsa_id
34394 srsa_CAkAaAvNKL2OSD
but when I use the following SparkSQL function
REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*') as srsa_id
I get this error:
java.lang.IndexOutOfBoundsException: No group
You need to specify the group to capture. Try this:
SELECT id,
REGEXP_EXTRACT(reason_text_field, '\"(srsa[^"]*)\"', 1) as srsa_id
-- or REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*', 0) as srsa_id
FROM tb
Note however that you can also convert the text column reason_text_field into a map or struct using from_json then extract the field customer_result:
SELECT id,
from_json(reason_text_field, 'map<string,string>')['customer_result'] as srsa_id
FROM tb
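The "No group" error has a close analogue in Python, which may help to see why the group index matters. This is only an illustrative sketch using Python's re and json modules, not Spark itself; extract_group is a hypothetical helper, and the sample text assumes the field is valid JSON.

```python
import json
import re

text = '{"initial_customer":"sda_WWyfr4AXY1fIAS", "customer_result":"srsa_CAkAaAvNKL2OSD"}'


def extract_group(pattern, value, idx):
    """Return capture group idx of the first match, or None when nothing matches."""
    match = re.search(pattern, value)
    return match.group(idx) if match else None


# No parentheses in the pattern: only group 0 (the whole match) exists.
print(extract_group(r'srsa[^"]*', text, 0))

# Asking for group 1 of that same pattern fails, much like the Spark error.
try:
    extract_group(r'srsa[^"]*', text, 1)
except IndexError as exc:
    print("group 1 is not defined:", exc)

# With parentheses, group 1 is the id without the surrounding quotes.
print(extract_group(r'"(srsa[^"]*)"', text, 1))

# Parsing the JSON instead of using a regex, similar in spirit to from_json.
print(json.loads(text)["customer_result"])
```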

Searching a range when using like %

I'm searching SQL Server using the following and I want to find a way to reduce the query size when it comes to the range of postal codes being searched:
SELECT TOP (100) *
FROM XXXX (NOLOCK)
WHERE (Request like '%<BillCountry>US</BillCountry>%')
AND (Request like '%<BillPostal>83%' OR Request like '%<BillPostal>84%' OR Request like '%<BillPostal>85%' OR Request like '%<BillPostal>86%' OR Request like '%<BillPostal>87%' OR Request like '%<BillPostal>91%' OR Request like '%<BillPostal>92%' OR Request like '%<BillPostal>93%' OR Request like '%<BillPostal>94%')
AND (CreatedUTC between '2022-02-01' and '2022-03-01')
ORDER BY CreatedUTC DESC
The <BillPostal>XXXXX</BillPostal> is deep inside a saved XML response.
I'm searching for a range of BillPostal such as 83XXX-87XXX and 91XXX-94XXX. Maybe this is the only way?
In SQL Server you can use a character class [] in the pattern syntax for LIKE/PATINDEX.
So the criteria for Request can be shortened:
SELECT TOP (100) *
FROM XXXX
WHERE (Request like '%<BillCountry>US</BillCountry>%')
AND (Request like '%<BillPostal>8[3-7]%'
OR Request like '%<BillPostal>9[1-4]%')
AND (CreatedUTC between '2022-02-01' and '2022-03-01')
ORDER BY CreatedUTC DESC;
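As a sanity check that the two character classes cover exactly the nine prefixes from the original query, here is a small Python sketch. It uses a regular expression as a stand-in for the LIKE patterns (the [3-7] and [1-4] classes mean the same thing in both syntaxes); in_postal_range and the made-up "123" suffix are assumptions for illustration only.

```python
import re

# Regex counterpart of the two LIKE patterns; using search() plays the role
# of the leading and trailing % wildcards.
POSTAL_PREFIX = re.compile(r"<BillPostal>(?:8[3-7]|9[1-4])")


def in_postal_range(request):
    """True when the request contains a BillPostal starting with 83-87 or 91-94."""
    return POSTAL_PREFIX.search(request) is not None


# The nine prefixes spelled out in the original OR chain.
wanted = {"83", "84", "85", "86", "87", "91", "92", "93", "94"}

# Try every two-digit prefix and collect the ones the pattern accepts.
matched = {
    f"{n:02d}"
    for n in range(100)
    if in_postal_range(f"<BillPostal>{n:02d}123</BillPostal>")
}
print(matched == wanted)  # True
```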
You could offload the bulk of your criteria to an exists check with the lookup terms in a separate table. An example would be:
with lookups as (
select '<BillPostal>83' term union all
select '<BillPostal>84' union all
select '<BillPostal>85'
), testdata as (
select '<xml><element><BillPostal>85</billpostal></element></xml>' col union all
select '<xml><element><BillPostal>81</billpostal></element></xml>' union all
select '<xml><element><BillPostal>86</billpostal></element></xml>' union all
select '<xml><element><BillPostal>84</billpostal></element></xml>'
)
select *
from testdata
where exists (select * from lookups where CharIndex(term,col) > 0);

Regex extract in BigQuery issue

I'm trying to simplify a column in BigQuery by using BigQuery extract on it but I am having a bit of an issue.
Here are two examples of the data I'm extracting from:
dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html
dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html
I want to extract the portion between ;u1= and ;u2
Running the following legacy SQL Query
SELECT
Date(Event_Time),
Activity_ID,
REGEXP_EXTRACT(Other_Data, r'(?<=u1=)(.*\n?)(?=;u2)')
FROM
[sprt-data-transfer:dtftv2_sprt.p_activity_166401]
WHERE
Activity_ID in ('8179851')
AND Site_ID_DCM NOT IN ('2134603','2136502','2539719','2136304','2134604','2134602','2136701','2378406')
AND Event_Time BETWEEN 1563746400000000 AND 1563832799000000
I get the error...
Failed to parse regular expression "(?<=u1=)(.*\n?)(?=;u2)": invalid perl operator: (?<
And this is where my talent runs out. Is the error caused by my using legacy SQL, or is this an unsupported regex format?
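Just tried this, and it worked with "Standard SQL" enabled. It uses a capturing group instead of the lookbehind and lookahead, which the RE2 regex engine used by BigQuery does not support.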
select
other_data,
regexp_extract(other_data, ';u1=(.+?);u2') as some_part
from
unnest([
'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html',
'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
]) as other_data
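The capture-group form and the lookaround form from the question pick out the same text, which can be seen in an engine that accepts both. A minimal Python sketch follows; Python's re module supports lookbehind and lookahead, unlike RE2, and the sample string is a shortened version of the data in the question.

```python
import re

# Shortened sample of the Other_Data value from the question.
other_data = "gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined"


def u1_by_group(value):
    """Capture-group form, as in the query above."""
    match = re.search(r";u1=(.+?);u2", value)
    return match.group(1) if match else None


def u1_by_lookaround(value):
    """Lookaround form from the question; accepted by Python, rejected by RE2."""
    match = re.search(r"(?<=u1=)(.*\n?)(?=;u2)", value)
    return match.group(1) if match else None


print(u1_by_group(other_data))       # OVERDRFT
print(u1_by_lookaround(other_data))  # OVERDRFT
```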
Not using regex but it still works...
with test as (
select 1 as id, 'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html' as my_str UNION ALL
select 2 as id, 'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
),
temp as (
select
id,
split(my_str,';') as items
from test
),
flattened as (
select
id,
split(i,'=')[SAFE_OFFSET(0)] as left_side,
split(i,'=')[SAFE_OFFSET(1)] as right_side
from temp
left join unnest(items) i
)
select * from flattened
where left_side = 'u1'
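The split-based approach above translates almost directly into a dictionary lookup. This is a hedged Python sketch of the same idea; parse_pairs is a hypothetical helper, and unlike the SQL version it keeps everything after the first = as the value rather than only the second piece.

```python
def parse_pairs(my_str):
    """Split 'k1=v1;k2=v2;...' into a dict, like the temp and flattened steps above."""
    pairs = {}
    for item in my_str.split(";"):
        # partition splits on the first '=' only; value is '' when there is none.
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


# Shortened sample of the data in the question.
sample = "gtm=2wg7f1;gcldc=;u1=DDA13;u2=undefined;u5=SSA"
print(parse_pairs(sample)["u1"])  # DDA13
```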

Appending in middle of text in column data oracle sql

I need to insert text in the middle of existing text.
Example 1: existing text is "Monoclonal Anti-FLAG, Clone 6F7"
Result: "<HIDE>Monoclonal Anti-</HIDE>FLAG, Clone 6F7"
Example 2: existing text is "Anti-100-KD subunit"
Result: "<HIDE> Anti-</HIDE>100-KD subunit"
I want to wrap the terms Anti and Monoclonal Anti in a <HIDE> tag wherever they occur.
select * from table where regexp_like(colname, '^(Monoclonal Anti|Anti)')
Can you suggest how I should write this? So far I have only written the select query above.
Here is the regular expression solution you would expect:
-- Start test data
with test_data as
(select 'Monoclonal Anti-FLAG, Clone 6F7' as test_string from dual union all
select 'Anti-100-KD subunit' from dual)
-- End test data
select test_string,
REGEXP_REPLACE(test_string,'(Monoclonal Anti-|Anti-)','<HIDE>\1</HIDE>')
from test_data;
"TEST_STRING" "Result"
"Monoclonal Anti-FLAG, Clone 6F7" "<HIDE>Monoclonal Anti-</HIDE>FLAG, Clone 6F7"
"Anti-100-KD subunit" "<HIDE>Anti-</HIDE>100-KD subunit"
You could use replace and like:
select replace(colname, 'Anti-', '<HIDE> Anti-</HIDE>')
from table where colname like '%Anti%'
and colname not like 'Monoclonal Anti%';
Or, for the combined situation, you can use a CASE WHEN to choose the replacement:
select case when colname like '%Monoclonal Anti%'
then replace(colname, 'Monoclonal Anti-', '<HIDE> Monoclonal Anti-</HIDE>')
when colname like '%Anti%'
then replace(colname, 'Anti-', '<HIDE> Anti-</HIDE>')
else colname
end
from table where colname like '%Anti%';
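The REGEXP_REPLACE answer earlier in this thread relies on the backreference \1 to put the matched term back between the tags. The same substitution can be sketched in Python, assuming re.sub behaves like Oracle's REGEXP_REPLACE for this pattern; hide_terms is a hypothetical helper name.

```python
import re


def hide_terms(value):
    """Wrap 'Monoclonal Anti-' or 'Anti-' in <HIDE> tags using a backreference."""
    # The longer alternative is listed first so it wins when both could match.
    return re.sub(r"(Monoclonal Anti-|Anti-)", r"<HIDE>\1</HIDE>", value)


print(hide_terms("Monoclonal Anti-FLAG, Clone 6F7"))
print(hide_terms("Anti-100-KD subunit"))
```

Note that this variant emits no space after the opening tag, matching the output of the REGEXP_REPLACE answer rather than the REPLACE-based one.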