How to join tables on regex - sql

Say I have two tables msg for messages and mnc for mobile network codes.
They share no relation, but I want to join them:
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code
But query fails with error:
psql:marketing.sql:28: ERROR: argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...
Is there a way to do such a join? Or am I going about this the wrong way?

A very odd way to join. Every match on one side is combined with every row from the other table ...
regexp_matches() is probably the wrong function for your purpose. You want a simple regular expression match (~). Actually, the LIKE operator will be faster:
Presumably fastest with LIKE
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
WHERE length(mnc.code) = 3;
In addition, you only want mnc.code of exactly 3 characters.
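The LIKE join above can be tried without a PostgreSQL server. The following is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module; SQLite is only a stand-in here, and the sample rows and operator names are made up for illustration.

```python
import sqlite3

def like_join_rows():
    # In-memory SQLite database as a stand-in for PostgreSQL (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE msg (message TEXT, src_addr TEXT, dst_addr TEXT);
        CREATE TABLE mnc (code TEXT, name TEXT);
        -- Hypothetical sample data
        INSERT INTO msg VALUES ('hello', '380501234567', '380671234567');
        INSERT INTO msg VALUES ('other', '491701234567', '491711234567');
        INSERT INTO mnc VALUES ('050', 'Operator A');
        INSERT INTO mnc VALUES ('067', 'Operator B');
        INSERT INTO mnc VALUES ('0999', 'Too long');
    """)
    # Same join condition as in the answer: '38' followed by the code,
    # anywhere in either address; codes must be exactly 3 characters.
    rows = conn.execute("""
        SELECT msg.message, msg.src_addr, msg.dst_addr, mnc.name
        FROM   mnc
        JOIN   msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
                   OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
        WHERE  length(mnc.code) = 3
        ORDER  BY mnc.name
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in like_join_rows():
        print(row)
```

The first message matches one code through its source address and another through its destination address, so it appears once per matching operator.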
With regexp match
You could write the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON (msg.src_addr || '+' || msg.dst_addr) ~ ('38' || mnc.code)
AND length(mnc.code) = 3;
This also requires msg.src_addr and msg.dst_addr to be NOT NULL.
The second query demonstrates how the additional check length(mnc.code) = 3 can go into the JOIN condition or a WHERE clause. Same effect here.
With regexp_matches()
You could make this work with regexp_matches():
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON EXISTS (
SELECT *
FROM regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
WHERE y[1] = mnc.code
);
But it will be slow in comparison.
Explanation:
Your regexp_matches() expression just returns an array of all captured substrings of the first match. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.
You get all matches with the additional "globally" switch 'g' - but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in an EXISTS - semi-join and you arrive at what you wanted.
Maybe you can report back with a performance test of all three?
Use EXPLAIN ANALYZE for that.
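The difference between the first match and all matches described in the explanation can be illustrated outside the database. This is only an analogy using Python's re module, not PostgreSQL itself, and the sample addresses are made up.

```python
import re

def first_match_groups(text, pattern):
    # Analogous to regexp_matches() without 'g': captured groups of the first match only.
    m = re.search(pattern, text)
    return list(m.groups()) if m else None

def all_match_groups(text, pattern):
    # Analogous to regexp_matches(..., 'g'): one list of groups per match.
    return [list(m.groups()) for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    combined = "380501234567" + "+" + "380671234567"  # hypothetical src_addr and dst_addr
    print(first_match_groups(combined, r"38(...)"))
    print(all_match_groups(combined, r"38(...)"))
```

With a single capture group, every match yields a one-element list, which is why the EXISTS query compares y[1] against mnc.code.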

Your immediate problem is that regexp_matches could return one or more rows.

Try using "substring" instead, which extracts a substring given a regex pattern.
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name
FROM "msg"
JOIN "mnc"
ON substring(msg.src_addr || '+' || msg.dst_addr from '38(...)') = mnc.code

Related

How to select rows with only Numeric Characters in Oracle SQL

I would like to keep rows only with Numeric Character i.e. 0-9. My source data can have any type of character e.g. 2,%,( .
Input (postcode)
3453gds sdg3
454232
sdg(*d^
452
Expected Output (postcode)
454232
452
I have tried using WHERE REGEXP_LIKE(postcode, '^[[:digit:]]+$');
however in my version of Oracle I get an error saying
function regexp_like(character varying, "unknown") does not exist
You want regexp_like() and your version should work:
select t.*
from t
where regexp_like(t.postcode, '^[0-9]+$');
However, your error looks more like a Postgres error, so perhaps this will work:
t.postcode ~ '^[0-9]+$'
For Oracle 10 or higher you can use the regexp functions. In earlier versions the translate function will help:
SELECT postcode
FROM table_name
WHERE translate(postcode, 'x0123456789', 'x') IS NULL
AND postcode IS NOT NULL;
Or:
SELECT translate(postcode, '0123456789' || translate(postcode,'x123456789','x'),'0123456789') nums
FROM table_name ;
The above answer also works for me:
SELECT translate('1234bsdfs3#23##PU', '0123456789' || translate('1234bsdfs3#23##PU','x123456789','x'),'0123456789') nums
FROM dual ;
Nums:
1234323
For an alternative to the Gordon Linoff answer, we can try using REGEXP_REPLACE:
SELECT *
FROM yourTable
WHERE REGEXP_REPLACE(postcode, '[0-9]+', '') IS NULL;
The idea here is to strip away all digit characters, and then assert that nothing was left behind. For a mixed digit-letter value, the regex replacement would result in a non-empty string.
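The two ideas in this thread (match the whole value against digits, or strip the digits and check that nothing remains) can be compared on the sample postcodes. This is a small sketch in Python's re module as an analogy, not Oracle; the function names are made up.

```python
import re

# Sample input from the question
POSTCODES = ["3453gds sdg3", "454232", "sdg(*d^", "452"]

def only_digits_by_matching(value):
    # Analogous to REGEXP_LIKE(postcode, '^[0-9]+$')
    return re.fullmatch(r"[0-9]+", value) is not None

def only_digits_by_stripping(value):
    # Analogous to REGEXP_REPLACE(postcode, '[0-9]+', '') IS NULL:
    # remove every digit and check that nothing remains.
    return re.sub(r"[0-9]+", "", value) == ""

if __name__ == "__main__":
    print([p for p in POSTCODES if only_digits_by_matching(p)])
    print([p for p in POSTCODES if only_digits_by_stripping(p)])
```

The two checks agree on the sample data but differ on empty input: stripping accepts it, matching does not. In Oracle, where an empty string is NULL, the REGEXP_REPLACE version would likewise let NULL postcodes through unless they are excluded separately.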

SQL special group by on list of strings ending with *

I would like to perform a "special group by" on strings in SQL, some of which end with "*". I use PostgreSQL.
I cannot formulate this problem clearly, even though I have partially solved it with SELECT, UNION and nested queries, which are not elegant.
For example:
1) INPUT: I have a list of strings:
thestrings
varchar(9)
--------------
1000
1000-0001
1000-0002
2000*
2000-0001
2000-0002
3000*
3000-00*
3000-0001
3000-0002
2) OUTPUT: What I would like my "special group by" to return:
1000
1000-0001
1000-0002
2000*
3000*
Because 2000-0001 and 2000-0002 are included in 2000*,
and because 3000-00*, 3000-0001 and 3000-0002 are included in 3000*
3) The SQL query I use:
SELECT every string ending with *
UNION
SELECT every string whose beginning is NOT IN (SELECT every string ending with *) <-- with multiple inelegant left() functions and NOT IN subqueries
4) What my query returns:
1000
1000-0001
1000-0002
2000*
3000*
3000-00* <-- the problem
The problem is: 3000-00* stays in my result.
So my question is:
How can I generalize my solution, to remove every string that begins with the same prefix as another string in the list ending with *?
I thought of regular expressions, but how do I pass a list from a select into a regex?
Thanks for your help.
Select only strings for which no master string exists in the table:
select str
from mytable
where not exists
(
select *
from mytable master
where master.str like '%*'
and master.str <> mytable.str
and rtrim(mytable.str, '*') like rtrim(master.str, '*') || '%'
);
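To check the NOT EXISTS query against the sample data from the question, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. SQLite stands in for PostgreSQL as an assumption; rtrim(), LIKE and || behave the same way for this data. The table and column names follow the query above.

```python
import sqlite3

# Sample input from the question
STRINGS = [
    "1000", "1000-0001", "1000-0002",
    "2000*", "2000-0001", "2000-0002",
    "3000*", "3000-00*", "3000-0001", "3000-0002",
]

def strings_without_master():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (str TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(s,) for s in STRINGS])
    # Keep a string only if no other string ending in '*' covers it as a prefix.
    rows = conn.execute("""
        select str
        from mytable
        where not exists
        (
          select *
          from mytable master
          where master.str like '%*'
            and master.str <> mytable.str
            and rtrim(mytable.str, '*') like rtrim(master.str, '*') || '%'
        )
        order by str
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

if __name__ == "__main__":
    print(strings_without_master())
```

Note that 3000-00* is dropped because 3000* covers it, which is the case the UNION approach in the question missed.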
Assuming that only one general pattern can match any given string, the following should do what you want:
select coalesce(tpat.thestring, t.thestring) as thestring
from t left join
t tpat
on t.thestring like replace(tpat.thestring, '*', '%') and
t.thestring <> tpat.thestring
group by coalesce(tpat.thestring, t.thestring);
That is not your case, though. You can adjust this with distinct on:
select distinct on (t.thestring) coalesce(tpat.thestring, t.thestring)
from t left join
t tpat
on t.thestring like replace(tpat.thestring, '*', '%') and
t.thestring <> tpat.thestring
order by t.thestring, length(tpat.thestring)

How to add a title and substring

I need to get a list of names as per the following format
"Mr."+first name initial+last name+"."
There is only one table for this
salesperson (f_name, l_name)
What i have been trying is;
SELECT 'Mr.' ||' ' || SUBSTRING(f_name,1,1) || ' ' || l_name ||'.'
FROM salesperson;
It works without the substring or left, but not if I include them.
Use concat instead of the || operator to concatenate strings in MySQL. By default (unless the PIPES_AS_CONCAT SQL mode is enabled), MySQL interprets || as a logical OR, so your expression does not concatenate anything.
SELECT CONCAT('Mr.',' ',SUBSTRING(f_name,1,1),' ',l_name,'.')
FROM salesperson;
Oracle solution
SELECT 'Mr.'||' '||SUBSTR(f_name,1,1)||' '||l_name||'.'
FROM salesperson;
It is better practice anyways to grab the name data in full and then format it in the view part of your application with languages that are more suited to string manipulation. This also makes your code more reusable.
That being said, use this:
SELECT CONCAT("Mr. ",SUBSTRING( f_name, 1, 1 ) ," ",l_name,".") FROM salesperson

Using CASE on empty string

I have code that goes like this:
SELECT
'"35933-14",' ||
'"' || us_1.gr_UniqueName || '",' ||
'"' || (CASE WHEN us_1.mls0_PrimaryString = '' THEN 'This is empty'
WHEN CAST(Length(us_1.mls0_PrimaryString) AS INT) < 4 THEN ('Less than 4: '|| SUBSTR(us_1.mls0_PrimaryString,1,10000))
ELSE SUBSTR(us_1.mls0_PrimaryString,1,10000) END) || '",' ||
'"",' ||
'"",' ||
'""'
FROM
us_GroupTab us_1
WHERE (us_1.gr_Active = 1)
AND (us_1.gr_PurgeState = 0)
AND (us_1.gr_PartitionNumber = 0)
AND (us_1.gr_UniqueName IN ('US_HARDWARE_1', 'US_HARDWARE_2','GROUP_NULL'));
Basically, the problem is that not every empty string is handled: some users enter only multiple spaces, which the first CASE branch does not catch. Is there any way to do this? I have tried using the TRIM function, but it does not work.
Thanks!
An empty string is the same as null in Oracle, and you can't compare anything to null. You need to use is null instead of = null or = ''.
CASE WHEN TRIM(us_1.mls0_PrimaryString) IS null THEN 'This is empty' ...
You also don't need to cast the length check to int. And the maximum length of a varchar2 before 12c is 4000 chars, so there's no point using 10000 in your substr. In fact the first substr isn't going to do anything anyway as you already know the length is less than 4.
If you want to remove new lines and carriage returns before checking - and that is perhaps something you should be doing client-side, unless you want to store those too - then you can either replace them first:
CASE WHEN TRIM(REPLACE(REPLACE(us_1.mls0_PrimaryString, CHR(10)), CHR(13))) IS null
THEN ...
Or more generically remove all whitespace which would catch tabs etc. too:
CASE WHEN REGEXP_REPLACE(us_1.mls0_PrimaryString, '[[:space:]]') IS NULL THEN ...
Or:
CASE WHEN REGEXP_LIKE(us_1.mls0_PrimaryString, '^[[:space:]]*$') THEN ...
Note that you don't need a separate trim with regexp_replace.
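The difference between trimming spaces only and removing all whitespace can be seen in a small sketch. This uses Python as an analogy rather than Oracle: strip(" ") plays the role of TRIM with its default of removing spaces, and \s stands in for [[:space:]]. The function names are made up.

```python
import re

def blank_after_trim(value):
    # Analogous to TRIM(value) IS NULL: only leading/trailing spaces are removed.
    return value.strip(" ") == ""

def blank_ignoring_whitespace(value):
    # Analogous to REGEXP_LIKE(value, '^[[:space:]]*$'): any whitespace counts.
    return re.fullmatch(r"\s*", value) is not None

if __name__ == "__main__":
    for sample in ["", "   ", " \t\n", "abc"]:
        print(repr(sample), blank_after_trim(sample), blank_ignoring_whitespace(sample))
```

A value made of spaces is caught by both checks, while a value containing a tab or newline is caught only by the whitespace-class version.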
Best solution would be to validate and filter that kind of input before it even enters the database.
But as that is not the case, a solution that could work:
regexp_matches()

SQL Query inside a function

I am using PostgreSQL with PostGis. I am executing a query like this:
select st_geomfromtext('point(22 232)',432)
It works fine. But now I want to take a value through a query. for example:
select st_geomfromtext('point((select x from data_name where id=1) 232)' , 432)
Here data_name is a table I am using and x stores some values. The inner query is treated as part of the string literal, so no value is returned.
Please help.
ERROR: syntax error at or near "select"
Try this:
select st_geomfromtext('point(' || x || ' 232)', 432) from data_name where id=1
Postgis has a function ST_MakePoint that is faster than ST_GeomFromText.
select ST_SetSRID(ST_MakePoint(x, 232),432) from data_name where id=1;
While @muratgu's answer is generally the way to go, one minor note:
It gives a different result than a subquery when no row is found for id = 1. With the query above you get nothing back (no row), whereas a subquery yields NULL and you get the result of:
select st_geomfromtext(NULL, 432)
If you need a drop-in replacement:
select st_geomfromtext('point('
|| (select x from data_name where id=1)
|| ' 232)' , 432)
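The no-row versus NULL distinction can be reproduced without PostGIS by looking only at the string that would be passed to st_geomfromtext. This is a minimal sketch in SQLite through Python's sqlite3 module; SQLite is a stand-in, the sample row is made up, and the function names are hypothetical.

```python
import sqlite3

def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_name (id INTEGER, x INTEGER)")
    conn.execute("INSERT INTO data_name VALUES (1, 22)")  # hypothetical sample row
    return conn

def wkt_via_from_clause(conn, row_id):
    # Pattern of the first answer: no matching row means no result row at all.
    return conn.execute(
        "SELECT 'point(' || x || ' 232)' FROM data_name WHERE id = ?", (row_id,)
    ).fetchall()

def wkt_via_subquery(conn, row_id):
    # Drop-in pattern: always one result row; NULL when the subquery finds nothing,
    # because concatenating with NULL yields NULL.
    return conn.execute(
        "SELECT 'point(' || (SELECT x FROM data_name WHERE id = ?) || ' 232)'", (row_id,)
    ).fetchall()

if __name__ == "__main__":
    conn = make_conn()
    print(wkt_via_from_clause(conn, 1), wkt_via_subquery(conn, 1))
    print(wkt_via_from_clause(conn, 2), wkt_via_subquery(conn, 2))
```

For an existing id both variants build the same text; for a missing id the first returns an empty result set and the second returns a single NULL.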