How to pass a list of words into the SQL 'LIKE' operator

I am trying to pass a list of words into the SQL LIKE operator.
The query should return a column called Customer Issue where Customer Issue matches any phrase in the list below.
my_list =['air con','no cold air','hot air','blowing hot air']
SELECT customer_comments
FROM table
where customer_comments like ('%air con%') #for single search
How do I pass my_list above?

A regular expression can help here. Another solution uses UNNEST, which has already been given in another answer.
SELECT customer_comments
FROM table
where REGEXP_CONTAINS(lower(customer_comments), r'air con|no cold air|hot air|blowing hot air');
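Since my_list in the question looks like a Python list, the alternation pattern can be built on the client before it is sent to the database. The following is a minimal sketch, not a definitive implementation: build_pattern is a hypothetical helper, the sample comments are made up, and Python's re module stands in for BigQuery's regex engine, which may treat some escapes differently.

```python
import re

my_list = ['air con', 'no cold air', 'hot air', 'blowing hot air']

def build_pattern(phrases):
    # Hypothetical helper: escape each phrase so characters such as '.' or '('
    # are matched literally, then join the phrases with the alternation operator.
    return '|'.join(re.escape(p) for p in phrases)

pattern = re.compile(build_pattern(my_list))

# Made-up sample comments, for illustration only.
comments = ['Air conditioner is noisy', 'cold air', 'too hot air']

# Lower-case the text first, mirroring lower(customer_comments) in the query.
matches = [c for c in comments if pattern.search(c.lower())]
print(matches)
```

Escaping matters as soon as a phrase contains a regex metacharacter; without it, a phrase like 'a.c' would also match 'abc'.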

A similar question, answered for SQL Server, is here:
Combining "LIKE" and "IN" for SQL Server
Basically, you'll have to chain a bunch of 'OR' conditions.
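The chain of OR conditions can be generated from the list rather than typed by hand. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database (not SQL Server or BigQuery), and the table name comments and its rows are invented for the example.

```python
import sqlite3

my_list = ['air con', 'no cold air', 'hot air', 'blowing hot air']

# In-memory SQLite database used purely as a stand-in for the real one.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE comments (customer_comments TEXT)')
conn.executemany(
    'INSERT INTO comments VALUES (?)',
    [('air conditioner',), ('cold air',), ('too hot air',)],
)

# One "LIKE ?" condition per phrase, chained with OR.
where = ' OR '.join('customer_comments LIKE ?' for _ in my_list)
sql = (
    'SELECT customer_comments FROM comments '
    f'WHERE {where} ORDER BY customer_comments'
)

# Bind the phrases as parameters instead of pasting them into the SQL text.
params = [f'%{phrase}%' for phrase in my_list]

rows = [r[0] for r in conn.execute(sql, params)]
print(rows)
```

Binding the phrases as parameters keeps quotes inside a phrase from breaking the statement. Placeholder syntax differs between database drivers, so the `?` markers here are specific to sqlite3.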

Based on the post @Jordi shared, I think the query below can be an option in BigQuery.
query:
SELECT DISTINCT customer_comments
FROM sample,
UNNEST(['air con','no cold air','hot air','blowing hot air']) keyword
WHERE INSTR(customer_comments, keyword) <> 0;
The query was run against this sample table:
CREATE TEMP TABLE sample AS
SELECT * FROM UNNEST(['air conditioner', 'cold air', 'too hot air']) customer_comments;

Consider the approach below:
with temp as (
select ['air con','no cold air','hot air','blowing hot air'] my_list
)
select customer_comments
from your_table, (
select string_agg(item, '|') list
from temp t, t.my_list item
)
where regexp_contains(customer_comments, r'' || list)
There are myriad ways to refactor the above for your specific use case, for example:
select customer_comments
from your_table
where regexp_contains(customer_comments, r'' ||
array_to_string(['air con','no cold air','hot air','blowing hot air'], '|')
)

Related

How to split column by delimiter on Google BigQuery

I've a column of emails and I'd like to split it into two columns using # as a delimiter.
Table:
Expected outcome
Try
select split(email, "#")[offset(0)], split(email, "#")[offset(1)]
Consider yet another [non-orthodox and maybe even silly but hopefully fun and exposing some extra features of BigQuery] approach
select * from (
select * from your_table,
unnest(regexp_extract_all(email, r'[^#]+')) piece with offset
)
pivot (min(piece) as email for offset in (0, 1))
if applied to sample data in your question - output is
Try SPLIT with subsequent OFFSET:
SELECT SPLIT(email, '#')[OFFSET(0)] as email1, SPLIT(email, '#')[OFFSET(1)] as email2
FROM mytable
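One thing to watch with this approach: in BigQuery, OFFSET raises an error when the array has no element at that index (for example, a value without the delimiter), whereas SAFE_OFFSET returns NULL. The Python sketch below mimics the SAFE_OFFSET behaviour; split_email is a hypothetical helper and the sample values are invented.

```python
def split_email(email, delimiter='#'):
    # Split on every occurrence of the delimiter, like SPLIT(email, '#').
    parts = email.split(delimiter)
    first = parts[0]
    # Return None when there is no second piece, mirroring SAFE_OFFSET(1),
    # instead of raising an error as a plain index lookup would.
    second = parts[1] if len(parts) > 1 else None
    return first, second

print(split_email('user#example.com'))
print(split_email('no-delimiter-here'))
```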

Regex: how to get the text between a few colons?

So, I have a lot of strings like the ones below in my database:
product1:1stparty:single_aduls:android:
product2:3rdparty:married_adults:ios:
product3:3rdparty:other_adults:android:
I need a regex to get only the text after the product name and before the device category. So, in the first line I'd get 1stparty:single_aduls, in the second 3rdparty:married_adults and in the third 3rdparty:other_adults. I'm stuck and can't find a way to solve that. Could anyone help me please?
As a regular expression, you can use:
select regexp_extract('product1:1stparty:single_aduls:android:', '^[^:]*:(.*):[^:]*:$')
This returns everything after the first colon and before the penultimate colon.
We can try using REGEXP_REPLACE here:
SELECT REGEXP_REPLACE(val, r"^.*?:|:[^:]+:$", "") AS output
FROM yourTable;
This approach removes both the leading ...: and the trailing :...: from the column, leaving behind the content you want.
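Both regular expressions above can be checked against the three sample strings from the question. This is a quick sketch using Python's re module as a stand-in for the database's regex engine; for these particular patterns the two are expected to behave the same, but that is an assumption rather than something verified against the database.

```python
import re

rows = [
    'product1:1stparty:single_aduls:android:',
    'product2:3rdparty:married_adults:ios:',
    'product3:3rdparty:other_adults:android:',
]

# Extract approach: capture everything between the first colon
# and the colon that precedes the last field.
extract_re = re.compile(r'^[^:]*:(.*):[^:]*:$')
extracted = [extract_re.search(r).group(1) for r in rows]

# Replace approach: delete the leading "name:" and the trailing ":device:".
replace_re = re.compile(r'^.*?:|:[^:]+:$')
replaced = [replace_re.sub('', r) for r in rows]

print(extracted)
print(replaced)
```

Both lists come out identical for this data, so the choice between the two approaches is mostly a matter of readability.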
You can also use standard split function and access result array element by index, which is quite clear to read and understand.
with a as (
select split('product1:1stparty:single_aduls:android:', ':') as splitted
)
select splitted[ordinal(2)] || ':' || splitted[ordinal(3)] as subs
from a
Consider below example
with your_table as (
select 'product1:1stparty:single_aduls:android:' txt union all
select 'product2:3rdparty:married_adults:ios:' union all
select 'product3:3rdparty:other_adults:android:'
)
select *,
(
select string_agg(part, ':' order by offset)
from unnest(split(txt, ':')) part with offset
where offset in (1, 2)
) result
from your_table
with output

How to easily remove count=1 on aliased field in SQL?

I have the following data in a table:
GROUP1|FIELD
Z_12TXT|111
Z_2TXT|222
Z_31TBT|333
Z_4TXT|444
Z_52TNT|555
Z_6TNT|666
And I engineer a field that removes the leading digits after the '_':
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_31TBT|Z_TBT|333 <- to be removed
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
How can I easily query the original table for only the GROUPs that correspond to a GROUP_ALIAS with more than one distinct FIELD in it (that is, drop every alias that occurs only once)?
Desired result:
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
This is how I get all the GROUP_ALIAS's I don't want:
SELECT GROUP_ALIAS
FROM
(SELECT
GROUP1,FIELD,
case when instr(GROUP1, '_') = 2
then
substr(GROUP1, 1, 2) ||
ltrim(substr(GROUP1, 3), '0123456789')
else
substr(GROUP1 , 1, 1) ||
ltrim(substr(GROUP1, 2), '0123456789')
end GROUP_ALIAS
FROM MY_TABLE)
GROUP BY GROUP_ALIAS
HAVING COUNT(FIELD)=1
Probably I could make the engineered field a second time simply on the original table and check that it isn't in the result from the latter, but want to avoid so much nesting. I don't know how to partition or do anything more sophisticated on my case statement making this engineered field, though.
UPDATE
Thanks for all the great replies below. Something about the SQL used must differ from what I thought because I'm getting info like:
GROUP1|GROUP_ALIAS|FIELD
111,222|,111|111
111,222|,222|222
etc.
Not sure why since the solutions work on my unabstracted data in db-fiddle. If anyone can spot what db it's actually using that would help but I'll also check on my end.
Here is one way, using analytic count. If you are not familiar with the with clause, read up on it - it's a very neat way to make your code readable. The way I declare column names in the with clause works since Oracle 11.2; if your version is older than that, the code needs to be re-written just slightly.
I also computed the "engineered field" in a more compact way. Use whatever you need to.
I used sample_data for the table name; adapt as needed.
with
add_alias (group1, group_alias, field) as (
select group1,
substr(group1, 1, instr(group1, '_')) ||
ltrim(substr(group1, instr(group1, '_') + 1), '0123456789'),
field
from sample_data
)
, add_counts (group1, group_alias, field, ct) as (
select group1, group_alias, field, count(*) over (partition by group_alias)
from add_alias
)
select group1, group_alias, field
from add_counts
where ct > 1
;
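The logic of the query above (compute the alias, count rows per alias, keep aliases that occur more than once) can be traced outside the database. The following is a rough Python sketch on the sample data from the question; group_alias is a hypothetical helper that assumes every value contains an underscore.

```python
import re
from collections import Counter

rows = [
    ('Z_12TXT', 111), ('Z_2TXT', 222), ('Z_31TBT', 333),
    ('Z_4TXT', 444), ('Z_52TNT', 555), ('Z_6TNT', 666),
]

def group_alias(group1):
    # Keep everything up to and including the first underscore,
    # then drop the digits that directly follow it.
    return re.sub(r'^([^_]*_)\d+', r'\1', group1)

# Equivalent of count(*) over (partition by group_alias).
counts = Counter(group_alias(g) for g, _ in rows)

# Equivalent of "where ct > 1".
kept = [(g, group_alias(g), f) for g, f in rows if counts[group_alias(g)] > 1]

for row in kept:
    print(row)
```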
With Oracle you can use REGEXP_REPLACE and analytic functions:
select Group1, group_alias, field
from (select group1, REGEXP_REPLACE(group1,'_\d+','_') group_alias, field,
count(*) over (PARTITION BY REGEXP_REPLACE(group1,'_\d+','_')) as count from test) a
where count > 1

Alternative to LIKE in an Oracle SELECT query

I have a table with a single column.
Table T has data like:
A11
B1
As112
DF123
VG112
I'm looking for an alternative to LIKE (VG% and DF%), because I think performance may suffer if there are more patterns to compare.
Here are some alternatives you could try. Performance may improve if you have a function-based index on the expressions used in the WHERE clause (here, SUBSTR).
select * FROM t where SUBSTR(data,1,2) = 'VG'; -- If you are always comparing first 2 characters.
select * FROM t where SUBSTR(data,1,2) IN ( 'VG' , 'DF'); -- Multiple comparisons
INSTR is another option but you cannot have an index which suits this comparison.
select * FROM t where INSTR(data,'VG') = 1;
I think that LIKE is the best choice. You can create a list of patterns and use a join, like here:
select * from t join p on t.text like p.pattern||'%'
It does not matter whether you use the built-in type odcivarchar2list or define the patterns using union all; it only makes the syntax shorter.
Example:
with t(text) as (
select column_value
from table(sys.odcivarchar2list('A11', 'B1', 'As112', 'DF123', 'VG112'))),
p(pattern) as (
select column_value
from table(sys.odcivarchar2list('DF', 'VG')))
select *
from t join p on t.text like p.pattern||'%'
TEXT PATTERN
------ -------
DF123 DF
VG112 VG
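The join-on-LIKE idea can be tried without an Oracle instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the tables are created with plain CREATE TABLE instead of odcivarchar2list. One difference to keep in mind: SQLite's LIKE is case-insensitive for ASCII letters by default, while Oracle's is case-sensitive, so results can differ for mixed-case data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (text TEXT)')
conn.execute('CREATE TABLE p (pattern TEXT)')

conn.executemany(
    'INSERT INTO t VALUES (?)',
    [(v,) for v in ['A11', 'B1', 'As112', 'DF123', 'VG112']],
)
# Adding another prefix is just another row in the pattern table.
conn.executemany('INSERT INTO p VALUES (?)', [('DF',), ('VG',)])

rows = conn.execute(
    "SELECT t.text, p.pattern "
    "FROM t JOIN p ON t.text LIKE p.pattern || '%' "
    "ORDER BY t.text"
).fetchall()
print(rows)
```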

What's the equivalent of Excel's `left(find(), -1)` in BigQuery?

I have names in my dataset and they include parentheses. But, I am trying to clean up the names to exclude those parentheses.
Example: ABC Company (Somewhere, WY)
What I want to turn it into is: ABC Company
I'm using standard SQL with google big query.
I've done some research and I know big query has left(), but I do not know the equivalent of find(). My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.
My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.
Good plan! In BigQuery Standard SQL - equivalent of LEFT is SUBSTR(value, position[, length]) and equivalent of FIND is STRPOS(value1, value2)
With this in mind your query can look like (which is exactly as you planned)
#standardSQL
WITH names AS (
SELECT 'ABC Company (Somewhere, WY)' AS name
)
SELECT SUBSTR(name, 1, STRPOS(name, '(') - 1) AS clean_name
FROM names
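For readers coming from Excel, the same left-of-the-parenthesis logic looks like this in Python. It is a sketch only: left_of_paren is a hypothetical helper. Note that cutting exactly one character before the '(' still leaves the space that preceded it, so the sketch trims trailing whitespace as an extra step that the query above does not perform.

```python
def left_of_paren(name):
    pos = name.find('(')  # -1 when there is no '(' in the string
    if pos == -1:
        # No parenthesis: return the value unchanged.
        return name
    # Everything before the '(' with trailing whitespace removed.
    return name[:pos].rstrip()

name = 'ABC Company (Somewhere, WY)'
untrimmed = name[:name.find('(')]  # still ends with a space

print(repr(untrimmed))
print(repr(left_of_paren(name)))
print(repr(left_of_paren('No Parentheses Inc')))
```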
Usually, string functions are less expensive than regular expression functions, so if your data follows the pattern in your example, you should go with the version above.
But in more generic cases, when the pattern to clean is more dynamic, as in Graham's answer, you should go with the solution in Graham's answer.
Just use REGEXP_REPLACE + TRIM. This will work with all variants (just not nested parentheses):
#standardSQL
WITH
names AS (
SELECT
'ABC Company (Somewhere, WY)' AS name
UNION ALL
SELECT
'(Somewhere, WY) ABC Company' AS name
UNION ALL
SELECT
'ABC (Somewhere, WY) Company' AS name)
SELECT
TRIM(REGEXP_REPLACE(name,r'\(.*?\)',''), ' ') AS cleaned
FROM
names
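The three variants can be traced in Python, with the re module standing in for BigQuery's regex engine (an assumption; the pattern is simple enough that both should agree). The sketch also shows one side effect: when the parenthesised part sits in the middle, the spaces on both sides of it remain, so an optional whitespace-collapsing step is added here that is not part of the answer above.

```python
import re

names = [
    'ABC Company (Somewhere, WY)',
    '(Somewhere, WY) ABC Company',
    'ABC (Somewhere, WY) Company',
]

# Same steps as TRIM(REGEXP_REPLACE(name, r'\(.*?\)', ''), ' ').
cleaned = [re.sub(r'\(.*?\)', '', n).strip(' ') for n in names]

# Optional extra step: collapse runs of whitespace into a single space.
collapsed = [re.sub(r'\s+', ' ', c) for c in cleaned]

print(cleaned)
print(collapsed)
```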
Use REGEXP_EXTRACT:
SELECT
RTRIM(REGEXP_EXTRACT(names, r'([^(]*)')) AS new_name
FROM yourTable
The regex used here will greedily consume and match everything up until hitting an opening parenthesis. I used RTRIM to remove any unwanted whitespace picked up by the regex.
Note that this approach is robust with respect to the edge case of an address record not having any term with parentheses. In this case, the above query would just return the entire original value.
I can't test this solution at the moment, but you can combine SUBSTR and INSTR, like this:
SELECT CASE WHEN INSTR(name, '(') > 0 THEN SUBSTR(name, 1, INSTR(name, '(') - 1) ELSE name END AS name FROM table;