BigQuery - Using regexp with LIKE operator (?) - sql

I'd like to get productids from url and I've almost finetuned a query to do it but still there is an issue I cannot solve.
The url usually looks like this:
/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/
or
/harry-potter-es-a-tuz-serlege-2019-m19247107/
As you can see there are two types of ids:
in general, ids start with '-p'
ids of some special products start with '-m'
I created this case when statement:
CASE
WHEN MAX(hits.page.pagePath) LIKE '%-p%'
THEN MAX(REGEXP_REPLACE(REGEXP_EXTRACT(
hits.page.pagePath, '-p[0-9]+/'), '\\-|p|/', ''))
WHEN MAX(hits.page.pagePath) LIKE '%-m%'
THEN MAX(REGEXP_REPLACE(REGEXP_EXTRACT(
hits.page.pagePath, '-m[0-9]+/'), '\\-|m|/', ''))
ELSE NULL
END AS productId
It looks a little complicated at first, but I really needed both regexp_replace and regexp_extract, because the '-p' or '-m' characters don't appear only before the id; they can occur multiple times in a url.
The problem with my code is that there are some special cases when the url looks like this:
/elveszett-profeciak-2019-m17855487/
As you can see the id starts with '-m' but the url also contains '-p'. In this case the query returns an empty value.
I think it could be solved by modifying the like operator in the when part of the case when statement: LIKE '%-p%' or LIKE '%-m%'
It would be great to have a regexp expression after or instead of the LIKE operator. Something similar to the parameter of '-p[0-9]+/' what I used in regexp_extract function.
So what I need is a way to test, in the when part of the statement, whether the '-p' or '-m' text is followed by digits in the url.
I'm not sure whether that's possible in BQ.

So what I would need is to define in the when part of the statement that if the '-p' or '-m' text is followed by numbers in the urls
I think you want '-p' and '-m' followed by digits. If so, I think this does what you want:
select regexp_extract(url, '-[pm][0-9]+')
from (select '/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/' as url union all
select '/elveszett-profeciak-2019-m17855487/' union all
select '/harry-potter-es-a-tuz-serlege-2019-m19247107/'
) x
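If only the digits are wanted, without the '-p' or '-m' prefix, a capture group can be added to the pattern; BigQuery's REGEXP_EXTRACT returns the captured group when the regex contains one. The Python sketch below only mirrors that regex logic on the sample URLs from the question. The helper name extract_product_id is hypothetical, and Python's re module stands in for BigQuery's regex engine here, so treat it as an illustration rather than the query itself.

```python
import re

# '-p' or '-m', then digits, then the trailing slash.
# The parentheses capture only the digits.
PRODUCT_ID = re.compile(r"-[pm]([0-9]+)/")

def extract_product_id(url):
    """Return the numeric product id from a URL path, or None if there is none."""
    match = PRODUCT_ID.search(url)
    return match.group(1) if match else None

urls = [
    "/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/",
    "/elveszett-profeciak-2019-m17855487/",
    "/harry-potter-es-a-tuz-serlege-2019-m19247107/",
]
ids = [extract_product_id(u) for u in urls]
```

Because the pattern requires digits right after '-p' or '-m', the '-pen', '-pro' and '-profeciak' fragments are skipped, which is the case the LIKE '%-p%' test got wrong.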

Related

T-SQL - How to pattern match for a list of values?

I'm trying to find the most efficient way to do some pattern validation in T-SQL and struggling with how to check against a list of values. This example works:
SELECT *
FROM SomeTable
WHERE Code LIKE '[0-9]JAN[0-9][0-9]'
OR Code LIKE '[0-9]FEB[0-9][0-9]'
OR Code LIKE '[0-9]MAR[0-9][0-9]'
OR Code LIKE '[0-9]APRIL[0-9][0-9]'
but I am stuck on wondering if there is a syntax that will support a list of possible values within the single like statement, something like this (which does not work)
SELECT *
FROM SomeTable
WHERE Code LIKE '[0-9][JAN, FEB, MAR, APRIL][0-9][0-9]'
I know I can leverage charindex, patindex, etc., just wondering if there is a simpler supported syntax for a list of possible values or some way to nest an IN statement within the LIKE. thanks!
I think the closest you'll be able to get is with a table value constructor, like this:
SELECT *
FROM SomeTable st
INNER JOIN (VALUES
('[0-9]JAN[0-9][0-9]'),
('[0-9]FEB[0-9][0-9]'),
('[0-9]MAR[0-9][0-9]'),
('[0-9]APRIL[0-9][0-9]')) As p(Pattern) ON st.Code LIKE p.Pattern
This is still less typing and slightly more efficient than the OR option, if not as brief as we hoped for. If you knew the month was always three characters we could do a little better:
Code LIKE '[0-9]___[0-9][0-9]'
Unfortunately, I'm not aware of a SQL Server pattern character for "0 or 1" characters. But if you want ALL months, we can use this to shorten the match:
SELECT *
FROM SomeTable
WHERE (Code LIKE '[0-9]___[0-9][0-9]'
OR Code LIKE '[0-9]____[0-9][0-9]'
OR Code LIKE '[0-9]_____[0-9][0-9]')
You'll want to test this to check if the data might contain false positive matches, and of course the table-value constructor could use this strategy, too. Also, I really hope you're not storing dates in a varchar column, which is a broken schema design.
One final option you might have is building the pattern on the fly. Something like this:
Code LIKE '[0-9]' + 'JAN' + '[0-9][0-9]'
But how you find that middle portion is up to you.
The native TSQL string functions don't support anything like that.
But you can use a workaround (dbfiddle) such as
WHERE CASE WHEN Code LIKE '[0-9]%[^ ][0-9][0-9]' THEN SUBSTRING(Code, 2, LEN(Code) - 3) END
IN
( 'JAN', 'FEB', 'MAR', 'APRIL' )
So first of all check that the string starts with a digit and ends in a non-space character followed by two digits and then check the remainder of the string (not matched by the digit check) is one of the values you want.
The reason for including the SUBSTRING inside the CASE is so that is only evaluated on strings that pass the LIKE check to avoid possible "Invalid length parameter passed to the LEFT or SUBSTRING function." errors if it was to be evaluated on a shorter string.
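The same two-step idea (check the shape first, then look the middle part up in a list) can be sketched outside SQL as well. This Python version is only an illustration of the logic; is_valid_code and MONTHS are hypothetical names, and the month list is the one from the question.

```python
DIGITS = "0123456789"
MONTHS = {"JAN", "FEB", "MAR", "APRIL"}

def is_valid_code(code):
    """True if code is one digit, a listed month name, then two digits."""
    # Shape check first, so the slice below is never taken on a too-short string.
    if len(code) < 4:
        return False
    if code[0] not in DIGITS or any(c not in DIGITS for c in code[-2:]):
        return False
    # The part between the leading digit and the two trailing digits
    # must be one of the allowed values.
    return code[1:-2] in MONTHS

results = [is_valid_code(c) for c in ["1JAN23", "1APRIL07", "1JUN23", "JAN23", "123"]]
```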

How run Select Query with LIKE on thousands of rows

Newbie here. Been searching for hours now but I can't seem to find the correct answer or properly phrase my search.
I have thousands of rows (orderids) that I want to put on an IN function, I have to run a LIKE at the same time on these values since the columns contains json and there's no dedicated table that only has the order_id value. I am running the query in BigQuery.
Sample Input:
ORD12345
ORD54376
Table I'm trying to Query: transactions_table
Query:
SELECT order_id, transaction_uuid,client_name
FROM transactions_table
WHERE JSON_VALUE(transactions_table,'$.ordernum') LIKE IN ('%ORD12345%','%ORD54376%')
Just doesn't work especially if I have thousands of rows.
Also, how do I add the order id that I am querying so that it appears under an order_id column in the query result?
Desired Output:
Option one
WITH transf as (Select order_id, transaction_uuid,client_name , JSON_VALUE(transactions_table,'$.ordernum') as o_num from transactions_table)
Select * from transf where o_num like '%ORD12345%' or o_num like '%ORD54376%'
Option two
split o_num by "-" as separator, create a table of orders like (select 'ORD12345' as num
Union all
Select 'ORD54376' as num) and inner join it with transf.o_num
One method uses OR:
WHERE JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD12345%' OR
JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD54376%'
An alternative method uses regular expressions:
WHERE REGEXP_CONTAINS(JSON_VALUE(transactions_table, '$.ordernum'), 'ORD12345|ORD54376')
According to the documentation, the LIKE operator works as follows:
Checks if the STRING in the first operand X matches a pattern
specified by the second operand Y. Expressions can contain these
characters:
A percent sign "%" matches any number of characters or
bytes.
An underscore "_" matches a single character or byte.
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If
you are using raw strings, only a single backslash is required. For
example, r"\%".
Thus, the syntax would be like the following:
SELECT
order_id,
transaction_uuid,
client_name
FROM
transactions_table
WHERE
JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD12345%'
OR JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD54376%'
Notice that we specify two conditions connected with the OR logical operator.
As a bonus, when querying large datasets it is good practice to select only the columns you need in your output (either in a temp table or the final view) instead of using *, because BigQuery is columnar, which is one of the reasons it is faster.
As an alternative for using LIKE, you can use REGEXP_CONTAINS, according to the documentation:
Returns TRUE if value is a partial match for the regular expression, regex.
Using the following syntax:
REGEXP_CONTAINS(value, regex)
However, it also works if you pass a plain STRING in single or double quotes instead of a regular expression. In addition, you can use the pipe operator (|) to OR the search terms together when you have more than one expression to search for, as follows:
where regexp_contains(email,"gary|test")
I hope it helps.
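With thousands of order ids, typing the alternation by hand is not practical, so the pattern is usually generated. The Python sketch below shows one way to build it; build_order_pattern and matched_order_id are hypothetical helper names, and passing the resulting string as the regex argument of REGEXP_CONTAINS is an assumption to verify, since a very long pattern may run into query size limits. Returning the matched text also gives a value that could fill an order_id column.

```python
import re

def build_order_pattern(order_ids):
    """Join order ids into one alternation, escaping any regex metacharacters."""
    return "|".join(re.escape(order_id) for order_id in order_ids)

order_ids = ["ORD12345", "ORD54376"]
pattern = build_order_pattern(order_ids)

def matched_order_id(ordernum):
    """Return the order id found inside ordernum, or None if none matches."""
    match = re.search(pattern, ordernum)
    return match.group(0) if match else None
```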

Using charlist wildcard in the middle of the string

I'm trying to run a query where the charlist wildcard is defined in the middle of the string as follows:
SELECT * FROM table WHERE key LIKE 'A___[AB]________',
and of course it doesn't work. Here I want to query for a 13-letter string with 'A' at the beginning and 'A' or 'B' in the 6th place. I do not want to use the keyword "OR" for this search since later I have to run more complicated queries, and I want to keep it simple.
Any suggestions?
LIKE does not understand regular expressions in Oracle. Use REGEXP_LIKE instead: http://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions018.htm
Your regexp should look like this: '^A.{4}[AB].{7}$'
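A quick way to sanity-check that pattern before putting it into REGEXP_LIKE is to try it on a few strings. The sketch below uses Python's re module as a stand-in for Oracle's regex engine, and the sample keys are made up for illustration.

```python
import re

# 'A', any 4 characters, 'A' or 'B' in the 6th place, any 7 characters: 13 in total.
KEY_PATTERN = re.compile(r"^A.{4}[AB].{7}$")

samples = [
    "A1234A1234567",  # 13 characters, 'A' in the 6th place
    "A1234B1234567",  # 13 characters, 'B' in the 6th place
    "A1234C1234567",  # wrong character in the 6th place
    "A1234B123456",   # only 12 characters
]
matches = [bool(KEY_PATTERN.search(s)) for s in samples]
```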

RegExp Find Numbers that have All Same Digits

I am working with an Oracle database and would like to write a REGEXP_LIKE expression that finds any number where all digits are the same, such as '999999999' or '777777777' without specifying the length of the field. Also, I would like it to be able to identify characters as well, such as 'aaaaa'.
I was able to get it working when specifying the field length, by using this:
select * from table1
where regexp_like (field1, '^([0-9a-z])\1\1\1\1\1\1\1\1');
But I would like it to be able to do this for any field length.
If a field contains '7777771', for example, I would not want to see it in the results.
Try this:
^([0-9a-z])\1+$
You're almost there. You just need to anchor the end of the regex.
^([0-9a-z])\1+$
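The effect of the end anchor can be checked on the values from the question. This is a small Python sketch using the re module in place of Oracle's REGEXP_LIKE; all_same is a hypothetical helper name.

```python
import re

# One captured character, then one or more repeats of it, up to the end of the string.
SAME_CHARS = re.compile(r"^([0-9a-z])\1+$")

def all_same(value):
    """True if value consists of one character repeated at least twice."""
    return SAME_CHARS.search(value) is not None

checks = {v: all_same(v) for v in ["999999999", "777777777", "aaaaa", "7777771", "7"]}
```

Note that a single-character value such as '7' does not match, because \1+ needs at least one repeat; using \1* instead would accept it, if that is wanted.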

regarding like query operator

For the below data (well..there are many more nodes in the team foundation server table which i need to refer to..below is just a sample)
Nodes
------------------------
\node1\node2\node3\
\node1\node2\node5\
\node1\node2\node3\node4\
\node1\node2\node3\node4\node5\
I was wondering if i can apply something like (below query does not give the required results)
select * from table_a where nodes like '\node1\node2\%\'
to get the below data
\node1\node2\node3\
\node1\node2\node5\
and something like (below does not give the required results)
select * from table_a where nodes like '\node1\node2\%\%\'
to get
\node1\node2\node3\
\node1\node2\node5\
\node1\node2\node3\node4\
Can the above be done with like operator? Pls. suggest.
Thanks
You'll need to combine two terms, LIKE and NOT LIKE:
select * from table_a where
nodes like '\node1\node2\%\' AND
nodes NOT like '\node1\node2\%\%\'
for the first query, and a similar solution for the second. That's with "plain SQL". There are probably SQL Server specific functions which will count the number of "\" characters in the column, for instance.
Maybe use the delimiter to get the results.
It is unclear what you are actually trying to get, but you could use the substr function to either count or find the position of the delimiter '\' character.
It seems like this would work (basically just eliminating the last backslash):
select * from table_a where nodes like '\node1\node2\%\%'
EDIT
You could also try this:
select * from table_a where
nodes like '\node1\node2\%\' or
nodes like '\node1\node2\%\%\'
A little late to the party, but it appears that the problem is still open. Could it be that the backslashes are escaping the wildcard meaning of the percent signs? And the backslash n could be getting interpreted as well.
Doesn't sql-server know a wildcard for a single character?
select * from table_a
where nodes LIKE '#node1#node2#node_#';
nodes
---------------------
#node1#node2#node5#
#node1#node2#node3#
I tested this on postgresql, where it is hard to insert a backslash, which is the reason why I replaced them with #.
Here is another possibility - negate more than one backslash (# used for my convenience):
SELECT * FROM table_a
WHERE (nodes LIKE '#node1#node2#%#'
AND NOT nodes LIKE '#node1#node2#%#%#');
On postgresql it is also possible to match against patterns, with SIMILAR TO, or ~:
SELECT * FROM table_a
WHERE nodes SIMILAR TO '#node1#node2#[^#]*#';
nodes
---------------------
#node1#node2#node5#
#node1#node2#node3#
[] encloses a set of allowed characters; for example, [aeiou] matches a lowercase vowel. But when the caret is the first character inside the brackets, the set is negated, so [^aeiou] means anything but a lowercase vowel, and [^#] means anything but a #.
The asterisk after that expression means that the preceding element can occur any number of times, including zero (+ would mean at least once, ? would mean 0 or 1 times).
So '#node1#node2#[^#]*#' means '#node1#node2#', followed by anything but a hash, 0 or single or multiple times, and then, finally a hash.
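The same "anything but the separator" idea works with the original backslashes once they are escaped. The Python sketch below builds the pattern from a parent path and filters the sample nodes from the question; direct_children is a hypothetical helper name, and re.fullmatch plays the role of SIMILAR TO, which also has to match the whole string.

```python
import re

def direct_children(nodes, parent, sep="\\"):
    """Return the nodes that sit exactly one level below parent."""
    # parent, then one segment containing no separator, then the closing separator.
    pattern = re.escape(parent) + "[^" + re.escape(sep) + "]*" + re.escape(sep)
    return [node for node in nodes if re.fullmatch(pattern, node)]

nodes = [
    "\\node1\\node2\\node3\\",
    "\\node1\\node2\\node5\\",
    "\\node1\\node2\\node3\\node4\\",
    "\\node1\\node2\\node3\\node4\\node5\\",
]
children = direct_children(nodes, "\\node1\\node2\\")
```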