I'm still new to BigQuery.
I'm trying to pick out rows where a field holds a string like:
> /a/arrow
> /b/bow
> /c/cheese
> /d/dog
> /e/edward
> /f/fruit
> ....
> /z/zebra
I've written:
WHEN
REGEXP_CONTAINS(LOWER(page_name),'/|^/a/|^/b/|^/c/|^/d/|^/e/|^/f/|^/g/|^/h/|^/i/|/^j/|^/k/|^/l/|^/m/|^/n/|^/o/|^/p|^/q/|^/r/|^/s/|^/t/|^/u/|^/v/|^/w/|^/x/|^/y/|^/z/') then 'library'
But it still doesn't work properly: the result is mixed with values that don't follow the pattern. How can I match only the correct values?
Thank you in advance for the help!
You can use
REGEXP_CONTAINS(entrance_page_name,'^/[a-z]/')
The regex matches
^ - start of string
/ - a / char
[a-z] - a lowercase ASCII letter
/ - a / char.
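A minimal sketch of why the original pattern over-matched, using Python's re module as a stand-in (BigQuery uses RE2, but these simple patterns behave the same in both). The first alternative in the question's pattern is a bare /, which matches any string that contains a slash anywhere. The sample values and the helper name is_library are made up for illustration.

```python
import re

# Shortened version of the question's pattern: the leading '/' alternative
# has no anchor, so it matches a slash anywhere in the string.
LOOSE = re.compile(r"/|^/a/|^/b/")

# Pattern from the answer: slash, one lowercase ASCII letter, slash, at the start.
STRICT = re.compile(r"^/[a-z]/")


def is_library(page_name: str) -> bool:
    """Hypothetical helper: lowercases first, as the question does with LOWER()."""
    return STRICT.search(page_name.lower()) is not None


# Illustrative sample values, not taken from a real table.
samples = ["/a/arrow", "/z/zebra", "/blog/post", "home/a/arrow"]

loose_hits = [s for s in samples if LOOSE.search(s)]
strict_hits = [s for s in samples if is_library(s)]

print(loose_hits)
print(strict_hits)
```

The loose pattern accepts every sample because each one contains a slash, while the anchored character class accepts only the paths that start with a single-letter segment.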
I want to format the strings in a table column, in a specific format.
Input table:
file_paths
my-file-path/wefw/wefw/2022-03-20
my-file-path/wefw/2022-01-02
my-file-path/wef/wfe/wefw/wef/2021-02-03
my-file-path/wef/wfe/wef/
I want to remove everything after the last / sign, if the only thing after it resembles a date (i.e. YYYY-MM-dd or ####-##-##).
Output:
file_paths
my-file-path/wefw/wefw/
my-file-path/wefw/
my-file-path/wef/wfe/wefw/wef/
my-file-path/wef/wfe/wef/
I'm thinking of doing something like:
SELECT regexp_replace(file_paths, 'regex_here', '', 1, 'i')
FROM my_table
I'm unsure of how to write the RegEx for this though. I'm also open to easier methods of string manipulation, if there are any. Thanks in advance!
You can use
REGEXP_REPLACE ( file_paths, '/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$', '/' )
The /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$ is a POSIX ERE compliant pattern matching
/ - a slash
[0-9]{4} - four digits
- - a hyphen
[0-9]{1,2} - one or two digits
-[0-9]{1,2} - a hyphen and one or two digits
$ - end of string.
If your values can contain trailing whitespace, insert [[:space:]]* before $: /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}[[:space:]]*$.
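The pattern above can be tried outside the database. The following is a sketch in Python's re module, not the SQL engine itself; it uses \Z (end of string) instead of $ because Python's $ also matches just before a trailing newline. The function name strip_date_suffix is an assumption for illustration; the input rows are the ones from the question.

```python
import re

# Slash, four digits, hyphen, one or two digits, hyphen, one or two digits,
# then the very end of the string.
DATE_TAIL = re.compile(r"/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}\Z")


def strip_date_suffix(path: str) -> str:
    """Replace a trailing '/YYYY-MM-DD' segment with a single '/'."""
    return DATE_TAIL.sub("/", path)


paths = [
    "my-file-path/wefw/wefw/2022-03-20",
    "my-file-path/wefw/2022-01-02",
    "my-file-path/wef/wfe/wefw/wef/2021-02-03",
    "my-file-path/wef/wfe/wef/",
]

cleaned = [strip_date_suffix(p) for p in paths]
for row in cleaned:
    print(row)
```

Rows without a date-like last segment, such as the fourth one, pass through unchanged because the pattern finds nothing to replace.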
You may use this regex:
^(.*?\/)\d{4}-\d{2}-\d{2}$
You may try this query:
select regexp_replace(file_paths, '^(.*?\/)\\d{4}-\\d{2}-\\d{2}$','$1')
Hello,
I need help creating a regular expression that extracts just the file name and extension from the following paths.
/home/user/work/file1.dbf
/opt/user/file2.dfb
I am trying to write an expression in Oracle 12c that outputs only "file1.dbf" and "file2.dbf".
I am currently trying to do the regular expression on the next page and reading the following documentation.
Thanks in advance and I hope I have explained correctly.
You don't need a regular expression to do this. A combination of substr and instr would be sufficient.
instr(colname,'/',-1) gets the last occurrence of / in the string. And the substring after that position would be the filename as per the data shown.
The filter instr(colname,'/') > 0 excludes the rows that don't have a / in them.
select substr(colname,instr(colname,'/',-1)+1) as filename
from tablename
where instr(colname,'/') > 0
A regular expression for the same would be
select regexp_substr(colname,'(.*/|^)(.+)$',1,1,null,2) as filename
from tablename
(.*/|^) - All the characters up to the last / occurrence in the string, or the start of the string if there are no / characters.
(.+)$ - All the characters after the last / if it exists in the string or the full string if / doesn't exist.
They are extracted as 2 groups and we are interested in the 2nd group. Hence the argument 2 at the end of regexp_substr.
Read about the arguments to REGEXP_SUBSTR here.
An alternative regex approach would be to strip everything up to the last /:
with demo as
( select '/home/user/work/file1.dbf' as path from dual union all
select '/opt/user/file2.dfb' from dual )
select path
, regexp_replace(path,'^..*/') as filename
from demo;
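Both ideas, cutting at the last slash by position and stripping everything up to the last slash with a regex, can be sketched in Python for comparison. This is only an analogue of the Oracle expressions, and the function names are made up for illustration. Python's rfind returns -1 when there is no slash, so the slice falls back to the whole string, much as substr with instr returning 0 does.

```python
import re


def filename_by_position(path: str) -> str:
    """Analogue of substr(colname, instr(colname, '/', -1) + 1)."""
    return path[path.rfind("/") + 1:]


def filename_by_regex(path: str) -> str:
    """Analogue of regexp_replace(path, '^..*/'): drop everything up to the last slash."""
    return re.sub(r"^..*/", "", path)


# The two paths from the question, plus a name with no slash at all.
demo = ["/home/user/work/file1.dbf", "/opt/user/file2.dfb", "plain.txt"]

by_position = [filename_by_position(p) for p in demo]
by_regex = [filename_by_regex(p) for p in demo]

print(by_position)
print(by_regex)
```

The greedy .* is what makes the regex version stop at the last slash rather than the first.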
I'm not sure how to write my regex command on Hive to pull the numerical prefix substring from this string: 211118-1_20569 - (DHCP). I need to return 211118, but also have the flexibility to return digits with smaller or larger values depending on the size of the numerical prefix.
hive> select regexp_extract('211118-1_20569 - (DHCP)','^\\d+',0);
OK
211118
or
hive> select regexp_extract('211118-1_20569 - (DHCP)','^[0-9]+',0);
OK
211118
^ - The beginning of a line
\d - A digit: [0-9]
[0-9] - the characters between '0' and '9'
X+ - X, one or more times
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
regexp_extract(string subject, string pattern, int index)
predefined character classes (e.g. \d) must be preceded by an additional backslash (\\d)
index = 0 matches the whole pattern
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators
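The same extraction can be checked quickly in Python, whose re syntax agrees with Java's for this simple pattern. This is a sketch rather than Hive itself: the function name numeric_prefix is hypothetical, and returning None when there is no leading digit is a choice made here, not a statement about what regexp_extract returns in that case.

```python
import re
from typing import Optional


def numeric_prefix(text: str) -> Optional[str]:
    """Return the run of digits at the start of text, or None if there is none."""
    # re.match anchors at the start of the string, like the leading ^ in the Hive pattern.
    m = re.match(r"[0-9]+", text)
    return m.group(0) if m else None


print(numeric_prefix("211118-1_20569 - (DHCP)"))
print(numeric_prefix("7-1_20569 - (DHCP)"))
```

Because + takes as many digits as are available, the prefix can be of any length.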
I came across an expression:
select * from table where regexp_like(field, '^\d+\D+$');
I'm not sure what the expression does; can someone please explain what '^\d+\D+$' refers to exactly?
Thanks.
^ beginning of string
\d single digit
+ one or more occurrences of preceding
\D nondigit character
+ one or more occurrences
$ end of string
So, it means one or more digits followed by one or more nondigits, and that should be the whole string, not a substring.
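A few test strings make the "whole string" point concrete. The sketch below uses Python's re module with the ASCII flag so that \d means [0-9]; the original query targets a SQL regexp_like, whose engine may differ in details, and the sample strings are invented for illustration.

```python
import re

# One or more digits followed by one or more non-digits, spanning the whole string.
PATTERN = re.compile(r"^\d+\D+$", re.ASCII)


def matches(value: str) -> bool:
    return PATTERN.search(value) is not None


# Invented sample values.
checks = {s: matches(s) for s in ["123abc", "7 apples", "123", "abc", "123abc4", "12ab34cd"]}
for value, ok in checks.items():
    print(repr(value), ok)
```

Only strings that begin with digits and then continue with non-digits all the way to the end are accepted; a digit reappearing later, as in "123abc4", breaks the match.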
Write a grammar that generates strings that contain matched brackets and parentheses. Examples of valid strings are:
[([])]
()()[[]]
[[]][()]()
Examples of invalid strings are:
[}
[[]
()())
][()
My answer:
< string > -> < term >*
< term > -> (< string >) | [< string >]
If this works the way I think it does, then a < string > turns into zero or more terms, each of which is wrapped in brackets or parentheses and then filled with zero or more terms again. However, I'm not sure about the asterisk and haven't been able to find any examples of someone using it the way I did.
Sorry if I'm way off.
Turns out the answer was:
< S > -> < S >< S > | (< S >) | [< S >] | () | []
Mine was marked "not valid in BNF, no base cases". Oh well.
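The language generated by the accepted grammar is the set of non-empty, properly nested strings of () and [], and that set can be recognized with a stack. The sketch below is a recognizer for that language, not a parser derived mechanically from the BNF; the name is_balanced is made up for illustration. It rejects the empty string because the grammar has no empty production.

```python
# Maps each closing symbol to the opening symbol it must pair with.
PAIRS = {")": "(", "]": "["}


def is_balanced(text: str) -> bool:
    """Return True if text is a non-empty, properly nested string of () and []."""
    if not text:
        return False  # <S> cannot derive the empty string
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False  # any other character is outside the alphabet
    return not stack  # leftover openers mean the string is unbalanced


valid = ["[([])]", "()()[[]]", "[[]][()]()"]
invalid = ["[}", "[[]", "()())", "][()"]

print([is_balanced(s) for s in valid])
print([is_balanced(s) for s in invalid])
```

The valid and invalid lists are the examples given in the exercise.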