PostgreSQL regexp_replace 'g' flag - sql

I have this string:
[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]
I need to remove every [[ whose following word is not followed by |.
What I do:
select regexp_replace('[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]',
'\[\[([^\|]+?(\[\[|\Z))', '\1', 'g')
What I get:
[[good|12345]] bad1 [[bad2 bad3 [[bad4 bad5 [[good|12345]]
What I want to get:
[[good|12345]] bad1 bad2 bad3 bad4 bad5 [[good|12345]]
It looks like the final [[ matched by my regexp is no longer available to the next iteration of the match.

You should use a look-ahead instead of a group:
select regexp_replace('[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]', '\[\[([^\|]+?(?=\[\[|\Z))', '\1', 'g')
The (?=\[\[|\Z) look-ahead only checks that [[ (or the end of the string) follows, but does not consume those characters: the match ends before them, so the search position does not move past them. Thus, the following [[ remains available for the next match.
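
To see the difference between consuming the trailing [[ and merely checking for it, here is a minimal Python sketch. It is an illustration only, not PostgreSQL: it uses Python's re module, which replaces all matches by default (like the 'g' flag), and on this input it reproduces both outputs quoted in the question.

```python
import re

s = "[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]"

# Group version: the trailing [[ is part of the match, so it is consumed
# and cannot serve as the start of the next match.
consuming = re.sub(r"\[\[([^|]+?(\[\[|\Z))", r"\1", s)

# Look-ahead version: the trailing [[ is only checked, never consumed,
# so the next match can begin right at it.
lookahead = re.sub(r"\[\[([^|]+?(?=\[\[|\Z))", r"\1", s)

print(consuming)
print(lookahead)
```

The first line printed is the unwanted result from the question (every second [[ survives), the second is the wanted one.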

Search for any of a list of strings inside another string

I need to identify records with valid addresses by comparing the address fields against a list of street-like words.
So the code would look something like:
set street_list = 'STREET', 'ROAD', 'AVENUE', 'DRIVE', 'WAY', 'PLACE' (etc.)
;
create table [new table] as
select *
from [source table]
where [address line 1] (contains any word from STREET_LIST) or
[address line 2] (contains any word from STREET_LIST) or
[address line 3] (contains any word from STREET_LIST)
;
Is this possible?
Using LostReality's regexp suggestion, I got as far as:
select *
from [source table]
where upper([address line 1]) regexp '.* STREET.*|.* ST.*|.* ROAD.*|.* RD.*|.* CLOSE.*|.* LANE.*|.* LA.*|.* AVENUE.*|.* AVE.*|.* DRIVE.*|.* DR.*|.* HOUSE.*|.* WAY.*|.* PLACE.*|.* SQUARE.*|.* WALK.*|.* GROVE.*|.* GREEN.*|.* PARK.*|.* PK.*|.* CRESCENT.*|.* TERRACE.*|.* PARADE.*|.* GARDEN.*|.* GARDENS.*|.* COURT.*|.* COTTAGES.*|.* COTTAGE.*|.* MEWS.*|.* ESTATE.*|.* RISE.*|.* FARM.*'
;
and it seems to work.
But I have two small problems with it:
1) how do I write the regexp on more than one line so it's easier to read?
2) is there any way of putting that regexp into a macro variable because I want to check 5 address lines and I don't want 5 copies of the same expression.
Thanks
Solution for Hive: you can put the regexp pattern in a variable and wrap the check in a macro. Here is your template, fixed:
set hivevar:street_list ='STREET|ST|ROAD|RD|CLOSE|LANE|LA|AVENUE|AVE|DRIVE|DR|HOUSE|WAY|PLACE|SQUARE|WALK|GROVE|GREEN|PARK|PK|CRESCENT|TERRACE|PARADE|GARDEN|GARDENS|COURT|COTTAGES|COTTAGE|MEWS|ESTATE|RISE|FARM';
--boolean macro for using in the WHERE
create temporary macro contains_word(s string) (upper(s) rlike ${hivevar:street_list} ) ;
with some_table as ( --use your table instead of this synthetic example
select stack(2,'some string containing STREET and WALK',
'some string containing something else') as str
) --use your table instead of this synthetic example
--use macro in your query
select str from some_table
where contains_word(str);
Result:
OK
some string containing STREET and WALK
Time taken: 0.229 seconds, Fetched: 1 row(s)
Use OR like in your question:
where contains_word(address_line_1) OR contains_word(address_line_2) ...
Hope you have got the idea
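
The same idea (one pattern defined once, one reusable check applied to every address line) can be sketched outside Hive as well. The Python below is only an illustration: the shortened word list, the names STREET_WORDS, contains_word and has_street_word are hypothetical, and the \b word boundaries are an addition of this sketch. Because RLIKE succeeds when any substring matches, a short alternative such as ST can also match inside a longer word (for example STRING), which the boundaries avoid.

```python
import re

# Keeping the words in a list lets the pattern be written over several lines.
STREET_WORDS = [
    "STREET", "ST", "ROAD", "RD", "CLOSE", "LANE", "LA",
    "AVENUE", "AVE", "DRIVE", "DR", "WAY", "PLACE", "WALK",
]

# \b word boundaries are an assumption of this sketch, not part of the Hive macro.
STREET_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, STREET_WORDS)) + r")\b")


def contains_word(s):
    """True if s contains any street word as a whole word (case-insensitive)."""
    return s is not None and STREET_RE.search(s.upper()) is not None


def has_street_word(*address_lines):
    """True if at least one of the given address lines contains a street word."""
    return any(contains_word(line) for line in address_lines)


print(has_street_word("Flat 2", "12 High St", None))
```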

SQL: run only one of two statements if an internal condition is met

Don't know if someone here knows Tasker Android app, but I think everyone could globally understand what I'm looking to accomplish, because I will basically talk about "raw" SQL code, as it's written on most common languages.
First, this is what I want, roughly:
IF (SELECT * FROM ("january") WHERE ("day") = (19)) MATCHES [%records(#) = 1] END
ELSE
SELECT * FROM ("january") WHERE ("day") = (19) ORDER BY ("timea") DESC END
What I mean above is: if the first part of the code (IF ... END) returns exactly one record with 19 in the 'day' column, stop there; but if more than one record is found, jump to the part after ELSE.
And if you are a Tasker user, you will understand the next (my current) setup:
A1: SQL Query [ Mode:Raw File:Tasker/Resources/Calendar Express/calendar_db Table:january Columns:day Query:SELECT * FROM ("january") WHERE ("day") = (19) Selection Parameters: Order By: Output Column Divider: Variable Array:%records Use Root:Off ]
A2: SQL Query [ Mode:Raw File:Tasker/Resources/Calendar Express/calendar_db Table:january Columns:day Query:SELECT * FROM ("january") WHERE ("day") = (19) ORDER BY ("timea") DESC Selection Parameters: Order By: Output Column Divider: Variable Array:%records Use Root:Off ] If [ %records(#) > 1 ]
An:...
So, as you can see, A1 always runs, without exception, and stores its result in the variable array '%records()' (% is how Tasker marks variables, like $ in other languages, and it uses parentheses rather than brackets). Then, if the array holds just one entry, A2 is skipped (its condition is %records(#) > 1) and the following actions run.
But if, after running A1, the %records() array contains 3 entries, A2 runs and overwrites the previously set contents of %records(). It will hold the same number of records (3), only reordered.
Is it possible to do this in just one line of code? Thanks ;)
As 'sticky bit' pointed out in a comment, I can simply always use the second action, since ORDER BY does not change the output when there is only a single record. Solved!
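
A quick way to convince yourself of this is to run the ordered query against a throwaway SQLite database. The sketch below uses Python's built-in sqlite3 module rather than Tasker; the table and column names follow the question, while the sample rows and the helper records_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "january" ("day" INTEGER, "timea" TEXT)')
conn.executemany(
    'INSERT INTO "january" VALUES (?, ?)',
    [(19, "08:00"), (20, "09:30"), (20, "18:15"), (20, "12:00")],  # made-up rows
)

# The single query used for every case: ORDER BY is harmless for one row.
QUERY = 'SELECT * FROM "january" WHERE "day" = ? ORDER BY "timea" DESC'


def records_for(day):
    """Return all rows for the given day, latest 'timea' first."""
    return conn.execute(QUERY, (day,)).fetchall()


single = records_for(19)   # one matching record: ordering changes nothing
several = records_for(20)  # three matching records: returned in sorted order
print(single)
print(several)
```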

Parsing SQL Queries by semicolon

I'm trying to read, using Scala, a sql file full of queries to be executed, however, I'm struggling to parse special cases that contain a semicolon that is not the terminator. For example, if the query is:
SELECT * FROM table WHERE name LIKE "%;%",
It separates this into two statements even though it should be one.
Assuming that the query terminator is always a ; at the end of a line, we can use either of:
.split(";\\s*\\n"), which matches the ; followed by zero or more whitespace characters and a newline character,
or .split("(?m);\\s*$"), which uses the inline (?m) multiline modifier so that $ matches at the end of each line.
Sample Code:
val a = """SELECT * FROM table WHERE name LIKE "%;%"
AND regexp_replace(
'abcd1234df-TEXT_I-WANT' -- use your input column here instead
, '^[a-z0-9]{10}-(.*)\$' -- matches whole string, captures "TEXT_I-WANT" in \$1
, '\$1' -- inserts \$1 to return TEXT_I-WANT
) = 'TEXT_I-WANT'
;
SELECT * FROM table WHERE name LIKE "%;%"
AND regexp_replace(
'abcd1234df-TEXT_I-WANT' -- use your input column here instead
, '^[a-z0-9]{10}-(.*)\$' -- matches whole string, captures "TEXT_I-WANT" in \$1
, '\$1' -- inserts \$1 to return TEXT_I-WANT
) = 'TEXT_I-WANT'
;""".split(";\\s*\\n")
println(a.mkString("Next Query:"))
If you prefer to match, this pattern can do a good job too: "(?m)^[\\s\\S]*?;$"
(add additional whitespace \s as needed)
Full Sample:
import scala.util.matching.Regex
object Demo {
def main(args: Array[String]): Unit = {
val pattern = new Regex("(?m)^[\\s\\S]*?;\\s*$")
val str = """SELECT * FROM table WHERE name LIKE "%;%"
AND regexp_replace(
'abcd1234df-TEXT_I-WANT' -- use your input column here instead
, '^[a-z0-9]{10}-(.*)\$' -- matches whole string, captures "TEXT_I-WANT" in \$1
, '\$1' -- inserts \$1 to return TEXT_I-WANT
) = 'TEXT_I-WANT'
;
SELECT * FROM table WHERE name LIKE "%;%"
AND regexp_replace(
'abcd1234df-TEXT_I-WANT' -- use your input column here instead
, '^[a-z0-9]{10}-(.*)\$' -- matches whole string, captures "TEXT_I-WANT" in \$1
, '\$1' -- inserts \$1 to return TEXT_I-WANT
) = 'TEXT_I-WANT'
;"""
println((pattern findAllIn str).mkString("\n----------------\n"))
}
}
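
For comparison, the end-of-line split can also be sketched in Python, whose re module accepts the same (?m);\s*$ pattern. This is a minimal illustration under the same assumption as above (the terminator is always a ; at the end of a line); the function name split_statements and the sample script are hypothetical.

```python
import re


def split_statements(script):
    """Split a SQL script on ';' at end of line, dropping empty pieces."""
    parts = re.split(r"(?m);\s*$", script)
    return [p.strip() for p in parts if p.strip()]


script = 'SELECT * FROM t WHERE name LIKE "%;%"\n  AND id = 1\n;\nSELECT 2\n;'

statements = split_statements(script)
for stmt in statements:
    print(stmt)
    print("----------------")
```

The ; inside "%;%" is left alone because it is not followed by the end of a line.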
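Try the regex ^.*?;$ with the m (multiline) option, so that ^ and $ match at line boundaries. Note that . does not match a newline unless dotall mode ((?s)) is also enabled, so as written it only matches statements that sit on a single line.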

MariaDB: JSON_ARRAY_APPEND to empty array

I find using JSON_ARRAY_APPEND with empty JSON arrays confusing: it does not work the way the docs describe. I'm using the newest version of MariaDB, 10.2.6.
When I do:
SELECT JSON_ARRAY_APPEND('[1]', '$', JSON_EXTRACT('{"test":123}', '$'));
Result is as expected:
[1, {"test": 123}]
(the same with:
SELECT JSON_ARRAY_APPEND(JSON_EXTRACT('[1]', '$'), '$', JSON_EXTRACT('{"test":123}', '$'));
)
But, when I operate on empty array:
SELECT JSON_ARRAY_APPEND('[]', '$', JSON_EXTRACT('{"test":123}', '$'));
The result is:
(NULL)
Probably because of this, I cannot update a field that holds an empty array. When I do:
UPDATE `test` SET `test`.`log` = JSON_ARRAY_APPEND(`test`.`log`, '$', JSON_EXTRACT('{"test":123}', '$'))
I get an error:
(4038) Syntax error in JSON text in argument 1 to function 'json_array_append' at position 2
Am I getting something wrong or is it some kind of bug or caveat?
Regards,
JK.
MariaDB [test]> SELECT JSON_ARRAY_APPEND(JSON_ARRAY(''), '$', JSON_EXTRACT('{"test":123}', '$'))\G
*************************** 1. row ***************************
JSON_ARRAY_APPEND(JSON_ARRAY(''), '$', JSON_EXTRACT('{"test":123}', '$')): ["", {"test": 123}]
1 row in set (0.00 sec)

REGEXP_SUBSTR : extracting portion of string between [ ] including []

I am on Oracle 11gR2.
I am trying to extract the text between '[' and ']' including [].
ex:
select regexp_substr('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]','\[(.*)\]') from dual
Output:
[REQ.UID] and username=[REQD.VP.UNAME]
Output needed:
[REQ.UID][REQD.VP.UNAME]
Please let me know how to get the needed output.
Thanks & Regards,
Bishal
Assuming you are just going to have two occurrences of [] then the following should suffice. The ? in the .*? means that it is non-greedy so that it doesn't gobble up the last ].
select
regexp_replace('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]'
,'.*(\[.*?\]).*(\[.*?\]).*','\1\2')
from dual
;
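
The effect of the non-greedy .*? is easy to check in isolation. The sketch below uses Python's re module rather than Oracle, so it only illustrates the general greedy versus non-greedy behaviour and the same replacement pattern; it does not claim anything about Oracle-specific regex quirks.

```python
import re

sql = ("select userid,username from tablename "
       "where user_id=[REQ.UID] and username=[REQD.VP.UNAME]")

# Greedy: .* runs to the LAST ], which is the output the question complains about.
greedy = re.search(r"\[.*\]", sql).group(0)

# Non-greedy: .*? stops at the FIRST ].
lazy = re.search(r"\[.*?\]", sql).group(0)

# Same shape as the regexp_replace above: keep only the two bracketed groups.
both = re.sub(r".*(\[.*?\]).*(\[.*?\]).*", r"\1\2", sql)

print(greedy)
print(lazy)
print(both)
```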
I'm not an Oracle user, but from a quick perusal of the docs, I think this should be close:
REGEXP_REPLACE('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]',
'^[^\[]*(\[[^\]]*\])[^\[]*(\[[^\]]*\])$', '\1 \2')
It looks much nastier than it is.
The pattern is:
^[^\[]* Match all characters up to (but not including) the first [
(\[[^\]]*\]) Capture into group 1 anything like [<not "]">]
[^\[]* Match everything up to (but not including) the next [
(\[[^\]]*\])$ Capture into group 2 anything like [<not "]">], at the end of the string
Then the replacement is simple, just <grp 1> <grp 2>. Use '\1\2' instead of '\1 \2' to drop the space and get exactly the output you asked for.
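
The "[ followed by anything that is not ], then ]" building block also works when the number of bracketed tokens is not known in advance: collect every occurrence and join them. The sketch below shows this in Python, not Oracle; the name bracketed_tokens is hypothetical, and the \] escape inside the character class is Python syntax that may not carry over unchanged to other regex dialects.

```python
import re

# One bracketed token: '[', then any run of characters other than ']', then ']'.
BRACKETED = re.compile(r"\[[^\]]*\]")


def bracketed_tokens(text):
    """Concatenate every [...] token found in text, brackets included."""
    return "".join(BRACKETED.findall(text))


sql = ("select userid,username from tablename "
       "where user_id=[REQ.UID] and username=[REQD.VP.UNAME]")
print(bracketed_tokens(sql))
```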