Find and replace pattern inside BigQuery string


Here is my BigQuery table. I am trying to find out the URLs that were displayed but not viewed.
create table dataset.url_visits(ID INT64 ,displayed_url string , viewed_url string);
select * from dataset.url_visits;
ID Displayed_URL Viewed_URL
1 url11,url12 url12
2 url9,url12,url13 url9
3 url1,url2,url3 NULL
In this example, I want to display
ID Displayed_URL Viewed_URL unviewed_URL
1 url11,url12 url12 url11
2 url9,url12,url13 url9 url12,url13
3 url1,url2,url3 NULL url1,url2,url3

Split each string into an array and unnest it. Use a case expression to check whether each displayed item appears in the viewed list, then aggregate the results into an array or a string.
Select ID, string_agg(viewing ) as viewed,
string_agg(not_viewing ) as not_viewed,
array_agg(viewing ignore nulls) as viewed_array
from (
Select ID ,
case when display in unnest(split(Viewed_URL)) then display else null end as viewing,
case when display in unnest(split(Viewed_URL)) then null else display end as not_viewing,
from (
Select 1 as ID, "url11,url12" as Displayed_URL, "url12" as Viewed_URL UNION ALL
Select 2, "url9,url12,url13", "url9" UNION ALL
Select 3, "url1,url2,url3", NULL UNION ALL
Select 4, "url9,url12,url13", "url9,url12"
),unnest(split(Displayed_URL)) as display
)
group by 1
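The split-and-compare logic above can be sanity-checked outside BigQuery. Below is a minimal Python sketch, not BigQuery code: the function name `unviewed_urls` is hypothetical, and the sample rows are the ones used in the query above.

```python
from typing import Optional


def unviewed_urls(displayed: str, viewed: Optional[str]) -> Optional[str]:
    """Return the displayed URLs that are not in the viewed list, comma-joined."""
    # A NULL viewed column behaves like an empty list of viewed URLs.
    viewed_set = set(viewed.split(",")) if viewed else set()
    remaining = [url for url in displayed.split(",") if url not in viewed_set]
    # Mirror string_agg, which yields NULL when there is nothing to aggregate.
    return ",".join(remaining) if remaining else None


rows = [
    (1, "url11,url12", "url12"),
    (2, "url9,url12,url13", "url9"),
    (3, "url1,url2,url3", None),
    (4, "url9,url12,url13", "url9,url12"),
]
result = {row_id: unviewed_urls(displayed, viewed) for row_id, displayed, viewed in rows}
```

Row 4 is the case worth checking: with two viewed URLs, only url13 should remain.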

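Consider the approach below:
select *, (
select string_agg(url)
from unnest(split(Displayed_URL)) url
where url not in unnest(split(ifnull(Viewed_URL, '')))
) unviewed_URL
from `project.dataset.table`
Applied to the sample data in your question, this returns the expected unviewed_URL values. Comparing against the split list rather than the whole Viewed_URL string also covers rows where more than one URL was viewed.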

Related

How to concatenate a conditional field and remove the same value

I am trying to create a column with a case statement, then concatenate the column. Here is an example code.
WITH base AS (
SELECT ID, Date, Action, case when Date is null then Action || '**' else Action end as Action_with_no_date
FROM <Table_Name>
)
SELECT ID, "array_join"("array_agg"(DISTINCT Action_with_no_date), ', ') Action_with_no_date
FROM base
GROUP BY ID;
Basically, Action_with_no_date displays the concatenated Action values for each ID, with the string '**' appended to the values where Date is null.
After I did this, I found an edge case.
If the same Action (e.g. play) occurs more than once for an ID, and one occurrence has a date while the other doesn't, the output contains both play and play** for that ID.
However, I want it to display just one play**.
Below is the example data for ID = 1
ID Date Action
1 1/2/22 read
1 1/3/22 play
1 NULL play
and expected result for the ID
ID Action_with_no_date
1 read, play**
How should I handle this?

You can compute the ** suffix per id and action with an analytic max() over a case expression: the suffix is '**' if any row in the partition has a null date, and empty otherwise. Then concatenate the suffix to the action.
Demo:
with mytable as (
SELECT * FROM (
VALUES
(1, '1/2/22', 'read'),
(1, '1/3/22', 'play'),
(1, NULL, 'play')
) AS t (id, date, action)
)
select id, array_join(array_agg(DISTINCT action||suffix), ', ')
from
(
select id, date, action,
max(case when date is null then '**' else '' end) over(partition by id, action) as suffix
from mytable
)s
group by id
Result:
1 play**, read
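If window functions are not available, the same result can be reached with two grouping steps. This is a sketch of that variant, not the analytic max() query above: it runs the first GROUP BY in SQLite through Python's sqlite3 module and does the final concatenation in Python. The table and column names follow the demo; the variable names are illustrative.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "1/2/22", "read"), (1, "1/3/22", "play"), (1, None, "play")],
)

# Step 1: one row per (id, action); the suffix is '**' if any date is NULL,
# because '**' sorts after the empty string and MAX picks it.
labelled = conn.execute(
    """
    SELECT id, action || MAX(CASE WHEN date IS NULL THEN '**' ELSE '' END) AS label
    FROM mytable
    GROUP BY id, action
    ORDER BY id, action
    """
).fetchall()

# Step 2: concatenate the labels per id.
labels_by_id = defaultdict(list)
for row_id, label in labelled:
    labels_by_id[row_id].append(label)
joined = {row_id: ", ".join(labels) for row_id, labels in labels_by_id.items()}
```

Because each action appears only once per id after the first step, no DISTINCT is needed in the second.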

Aggregate on a non group by column check if any value matches a criteria

Let's say I have a table Category with columns
id, childCategory, hasParts
Let's say I want to group by id and check if any value in hasParts has value true.
How to do this efficiently?

this has got to be the most vague post that i've seen on here but i'll take a stab at it. based on my own imagination and the 3 sentences provided, here we go:
create table category (id int, childcategory nvarchar(25), hasparts bit)
insert category
select 1, 'stroller', 1
union all
select 1, 'rocker', 1
union all
select 2, 'car', 0
union all
select 2, 'doll', 0
union all
select 3, 'nasal sprayer', 0
union all
select 3, 'thermometer', 1
select *,
case when exists (select 1 from category b where a.id = b.id and b.hasparts = 1) then 'has true value' end as truecheck
from
(
select id, count(*) as inventory
from category
group by id
) a
drop table category
this should theoretically get you what you want. adjust as needed.
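A shorter alternative to the exists subquery is a conditional aggregate inside the same GROUP BY. The sketch below runs it in SQLite through Python's sqlite3 module, assuming hasparts is stored as 0/1; the alias `any_has_parts` is made up for illustration. On SQL Server, a bit column would need a cast to int before MAX can be applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER, childcategory TEXT, hasparts INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, "stroller", 1),
        (1, "rocker", 1),
        (2, "car", 0),
        (2, "doll", 0),
        (3, "nasal sprayer", 0),
        (3, "thermometer", 1),
    ],
)

# MAX(hasparts) is 1 when at least one row in the group has hasparts = 1.
any_parts = conn.execute(
    """
    SELECT id, COUNT(*) AS inventory, MAX(hasparts) AS any_has_parts
    FROM category
    GROUP BY id
    ORDER BY id
    """
).fetchall()
```

This scans the table once instead of running a correlated subquery per group.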

how to convert jsonarray to multi column from hive

example:
there is a json array column(type:string) from a hive table like:
"[{"filed":"name", "value":"alice"}, {"filed":"age", "value":"14"}......]"
how to convert it into :
name age
alice 14
by hive sql?
I've tried lateral view explode but it's not working.
thanks a lot!

This is a working example of how it can be parsed in Hive. Customize it and debug it on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
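To check what the pivot should produce for a given string before debugging the Hive query, the same field/value reshaping can be done in a few lines of Python. This is only a reference sketch using the standard json module, not Hive code; the variable names are illustrative and the input is the sample string from the examples above.

```python
import json

raw = (
    '[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
    '{"field":"something_else", "value":"somevalue"}]'
)

# Parse the array, then turn the list of {field, value} objects into one mapping.
pairs = json.loads(raw)
record = {item["field"]: item["value"] for item in pairs}

# Keep only the columns of interest, as the max(case when ...) expressions do.
wanted = {key: record.get(key) for key in ("name", "age")}
```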

Split a Column with Delimited Values and Compare Each Value

I have a column that contains multiple values in a delimited (comma-separated) format -
id | code
------------
1 11,19,21
2 55,87,33
3 3,11
4 11
I want to be able to compare to each value inside the 'code' column as below -
SELECT id FROM myTbl WHERE code = '11'
This should return -
1
3
4
I've tried the solution below but it does not work for all cases -
SELECT id FROM myTbl WHERE POSITION('11' IN code) <> 0
This works for a two-digit number like '11', since it returns a value <> 0 when it finds a match. But it fails when searching for, say, '3', because the rows with id 2 and 3 are both returned ('33' also contains '3').
Here is link that talks about the POSITION function in REDSHIFT.
Any other approach that will solve this problem?

You can count the matches of a pattern that anchors the value between delimiters (or the start/end of the string):
SELECT id FROM myTbl WHERE regexp_count(code, '(^|,)11(,|$)') > 0

I think we can use regexp_substr() as follows.
select tb.id from myTbl tb where '11' in (
select regexp_substr( (select code from myTbl where id=tb.id),'[^,]+', 1, LEVEL) from dual
connect by regexp_substr((select code from myTbl where id=tb.id) , '[^,]+', 1, LEVEL) is not null);
just try this.

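Use the split_part() function (as written, this checks only the first three values; add more split_part() calls for longer lists)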
SELECT distinct id
FROM myTbl
WHERE '11' in ( split_part( code||',' , ',', 1 ),
split_part( code||',' , ',', 2 ),
split_part( code||',' , ',', 3 ) )

This is a very, very bad data model. You should be storing this information in a junction/association table, with one row per value.
But, if you have no choice, you can use like:
SELECT id
FROM myTbl
WHERE ',' || code || ',' LIKE '%,11,%';
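The delimiter-wrapping trick is easy to verify against the sample data. The sketch below runs the same LIKE predicate in SQLite through Python's sqlite3 module (SQLite rather than Redshift, so treat it as a check of the logic only); the helper name `ids_with_code` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (id INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO myTbl VALUES (?, ?)",
    [(1, "11,19,21"), (2, "55,87,33"), (3, "3,11"), (4, "11")],
)


def ids_with_code(value: str) -> list:
    """Return ids whose comma-separated code list contains exactly `value`."""
    # Wrapping both sides in commas means ',3,' cannot match inside ',33,'.
    pattern = "%," + value + ",%"
    rows = conn.execute(
        "SELECT id FROM myTbl WHERE ',' || code || ',' LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]
```

Searching for '3' is the case that POSITION got wrong: here it should match only id 3.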

How to use regex to split using last occurrence of forward slash in BigQuery

I have sample data as
with temp_table as
(
select "/category/sub-category/title-of-the-page" as pagename
union all
select "premier-league/splash"
union all
select "portal"
union all
select "news/1970/01/01/new-billion"
union all
select "/premier-league/transfers/"
union all
select "/premier-league/tfflive"
)
, clean_pagename as
(
select * ,
if (regexp_contains(pagename, "^/+" ) , regexp_extract(pagename, "^/+(.*)/?$") , pagename) as clean_page
from temp_table
)
, dated_content as
(
select *, if (
regexp_contains(clean_page , "/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]/") ,
regexp_replace(clean_page , "[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]", "dated-content" ),
clean_page
) as new_pagename
from clean_pagename
)
,category_and_titles as
(
select *, split(new_pagename, "/")[offset(0)] as page_category,
coalesce(REGEXP_EXTRACT(new_pagename, r'/([^/]+)?$') , "no-title") as title,
regexp_replace(new_pagename, r'[^/]+$', "") as path
from dated_content
)
select pagename,
page_category ,
path,
title
from category_and_titles
Here is what I am doing - I remove the first / in the string and replace date-content using a regex. Next I would like to extract 3 things
category - first section of the string before first /
path - the part of the string from the start up to and including the last /
title - everything after last / in the string.
There are instances where / is not present at all (record #3). In this case I want all the 3 parts to be equal to original string.
For example - for string as /premier-league/transfers/, I would like my output to be -
category = "premier-league" , path = "premier-league/transfers/" , title = ""
My current code gives me results as
Whereas, I need -

Without much refactoring, and leaving all your original logic intact, just make the changes below to the category_and_titles CTE
...
, category_and_titles AS (
SELECT *,
SPLIT(new_pagename, "/")[OFFSET(0)] AS page_category,
IF(REGEXP_CONTAINS(new_pagename, r'/'), REGEXP_REPLACE(new_pagename, r'[^/]+$', ""), new_pagename) AS path,
IF(REGEXP_CONTAINS(new_pagename, r'/'), COALESCE(REGEXP_EXTRACT(new_pagename, r'/([^/]+)?$'), "no-title"), new_pagename) AS title
FROM dated_content
)
...
With this minor change, the result will be as expected.
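The three-way split described in the question can also be written down as a small reference function to compare query output against. This is a Python sketch under stated assumptions, not BigQuery code: the function name `split_page` is hypothetical, the input is assumed to be the already cleaned new_pagename (leading slash removed), and a trailing slash yields an empty title, as in the question's /premier-league/transfers/ example, rather than "no-title".

```python
from typing import Tuple


def split_page(name: str) -> Tuple[str, str, str]:
    """Return (category, path, title) for a cleaned page name."""
    # With no slash at all, every part equals the original string.
    if "/" not in name:
        return name, name, name
    # rpartition splits on the LAST slash: head is everything before it,
    # title is everything after it (possibly empty).
    head, _, title = name.rpartition("/")
    category = name.split("/", 1)[0]
    path = head + "/"
    return category, path, title


examples = {
    name: split_page(name)
    for name in ("premier-league/transfers/", "premier-league/splash", "portal")
}
```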
